Working with Selenium

Warning

This extension is currently ALPHA.

Things will change, break, or not work as expected, and the documentation still needs serious work.

This section is here to give a brief overview but is neither complete nor definitive.

You’ve been warned.

Writing web crawlers with Bonobo and Selenium is easy.

First, install bonobo-selenium:

$ pip install bonobo-selenium

The idea is to have each callable crawl one thing and delegate drill-downs to callables further down the chain.

An example chain could be:

digraph { rankdir = LR; login -> paginate -> list -> details -> "ExcelWriter(...)"; }

Where each step would do the following:

  • login() is in charge of opening an authenticated session in the browser.

  • paginate() opens each page of a fictitious list and passes it to the next step.

  • list() takes every list item and yields it.

  • details() extracts the data you’re interested in.

  • … and the writer saves it somewhere.
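The division of labour described above can be sketched in plain Python, without a browser. This is only an illustration under stated assumptions, not the bonobo-selenium API: FAKE_SITE, run_chain and the bodies of the four steps are hypothetical stand-ins, and list() is renamed list_items() here to avoid shadowing the Python builtin. Each step is a generator that handles one level and yields what the next step should drill into.

```python
from typing import Callable, Dict, Iterable, Iterator, List

# Hypothetical in-memory "site" standing in for pages a real browser would load.
FAKE_SITE: Dict[str, List[Dict[str, str]]] = {
    "/items?page=1": [
        {"name": "alpha", "price": "10"},
        {"name": "beta", "price": "20"},
    ],
    "/items?page=2": [
        {"name": "gamma", "price": "30"},
    ],
}


def login() -> Iterator[dict]:
    """Open an 'authenticated session' (here just a dict) and pass it on."""
    yield {"user": "demo", "authenticated": True}


def paginate(session: dict) -> Iterator[str]:
    """Yield the address of each page of the list, one at a time."""
    if not session["authenticated"]:
        return
    for url in sorted(FAKE_SITE):
        yield url


def list_items(url: str) -> Iterator[Dict[str, str]]:
    """Yield every raw item found on one page."""
    for item in FAKE_SITE[url]:
        yield item


def details(item: Dict[str, str]) -> Iterator[dict]:
    """Extract and normalise the fields of interest from one item."""
    yield {"name": item["name"].upper(), "price": int(item["price"])}


def _pipe(rows: Iterable, step: Callable) -> Iterator:
    """Call `step` once per incoming row and forward everything it yields."""
    for row in rows:
        yield from step(row)


def run_chain(first: Callable, *steps: Callable) -> list:
    """Hypothetical helper: feed the output of each step into the next one."""
    rows = first()
    for step in steps:
        rows = _pipe(rows, step)
    return list(rows)


if __name__ == "__main__":
    for row in run_chain(login, paginate, list_items, details):
        print(row)
```

In a real crawler, run_chain would not be needed: the same callables could presumably be handed to a Bonobo graph (for example bonobo.Graph(login, paginate, list_items, details, writer)), with the browser session and a writer such as the ExcelWriter(...) from the diagram taking the place of the stand-ins above.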

Installation

Overview

Details