Thursday, October 1 • 15:45 - 16:35
Frontera: Open Source, Large Scale Web Crawling Framework - Alexander Sibiryakov, Scrapinghub Ltd.


In this talk he will introduce Frontera (https://github.com/scrapinghub/frontera), a new open source framework for building real-time, large-scale, distributed web crawlers as well as focused crawlers that target specific websites. It offers:
  • customizable storage (RDBMS or Key-Value based),
  • crawling strategies management,
  • transport layer abstraction,
  • fetcher abstraction.
Along with the framework description, he'll demonstrate how to build a distributed crawler using Scrapy, Apache Kafka and HBase, and hopefully present some statistics on the Spanish internet collected with the newly built crawler.
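
Two of the features listed above, customizable storage and crawling strategy management, are easiest to picture as separate pluggable pieces of a crawl frontier. The following is a minimal, self-contained Python sketch of that idea under stated assumptions. It is not Frontera's API: every class and method name here (Storage, InMemoryStorage, CrawlingStrategy, DomainFocusedStrategy, Frontier) is hypothetical, and the in-memory heap only stands in for the RDBMS or key-value backends a real deployment would use.

```python
from __future__ import annotations

import heapq
from abc import ABC, abstractmethod
from urllib.parse import urlparse


class Storage(ABC):
    """Hypothetical storage interface (stand-in for an RDBMS or key-value backend)."""

    @abstractmethod
    def add(self, url: str, score: float) -> None: ...

    @abstractmethod
    def pop_batch(self, n: int) -> list[str]: ...


class InMemoryStorage(Storage):
    """Priority queue in memory: highest score is returned first, duplicates dropped."""

    def __init__(self) -> None:
        self._heap: list[tuple[float, int, str]] = []
        self._seen: set[str] = set()
        self._counter = 0  # tie-breaker: equal scores keep insertion order

    def add(self, url: str, score: float) -> None:
        if url in self._seen:
            return
        self._seen.add(url)
        heapq.heappush(self._heap, (-score, self._counter, url))  # negate for max-heap
        self._counter += 1

    def pop_batch(self, n: int) -> list[str]:
        batch: list[str] = []
        while self._heap and len(batch) < n:
            batch.append(heapq.heappop(self._heap)[2])
        return batch


class CrawlingStrategy(ABC):
    """Hypothetical strategy interface: return a priority, or None to skip the URL."""

    @abstractmethod
    def score(self, url: str) -> float | None: ...


class DomainFocusedStrategy(CrawlingStrategy):
    """Illustrative focused strategy: stay on one domain, prefer shallow paths."""

    def __init__(self, domain: str) -> None:
        self.domain = domain

    def score(self, url: str) -> float | None:
        parsed = urlparse(url)
        host = parsed.hostname or ""
        if host != self.domain and not host.endswith("." + self.domain):
            return None  # outside the focus domain: discard
        depth = len([seg for seg in parsed.path.split("/") if seg])
        return 1.0 / (1 + depth)


class Frontier:
    """Ties a strategy (what to crawl, in what order) to a storage (where URLs live)."""

    def __init__(self, storage: Storage, strategy: CrawlingStrategy) -> None:
        self.storage = storage
        self.strategy = strategy

    def add_seeds(self, urls: list[str]) -> None:
        for url in urls:
            self._schedule(url)

    def page_crawled(self, url: str, links: list[str]) -> None:
        # A fetcher would call this with the links extracted from `url`.
        for link in links:
            self._schedule(link)

    def get_next_requests(self, n: int) -> list[str]:
        return self.storage.pop_batch(n)

    def _schedule(self, url: str) -> None:
        score = self.strategy.score(url)
        if score is not None:
            self.storage.add(url, score)


if __name__ == "__main__":
    frontier = Frontier(InMemoryStorage(), DomainFocusedStrategy("example.com"))
    frontier.add_seeds(["http://example.com/"])
    print(frontier.get_next_requests(10))  # ['http://example.com/']
    frontier.page_crawled("http://example.com/", [
        "http://example.com/a/b", "http://example.com/about", "http://other.org/x",
    ])
    print(frontier.get_next_requests(10))  # ['http://example.com/about', 'http://example.com/a/b']
```

Keeping scoring in the strategy and ordering and deduplication in the storage means either piece could, in principle, be swapped independently (for example a breadth-first strategy, or a persistent backend) without changing the frontier loop.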


Alexander Sibiryakov

Core developer of the web crawling framework Frontera at Scrapinghub Ltd. A performance geek, data scientist and former Yandex engineer (search quality department). He has presented at Berlin Buzzwords and at local Yandex events.

Thursday October 1, 2015 15:45 - 16:35 CEST
Jozsef / Kolcsey
