Thursday, October 1 • 15:45 - 16:35
Frontera: Open Source, Large Scale Web Crawling Framework - Alexander Sibiryakov, Scrapinghub Ltd.

In this talk, the speaker will introduce Frontera, a new open source framework (https://github.com/scrapinghub/frontera). Frontera makes it possible to build real-time, large-scale, distributed web crawlers as well as crawlers focused on individual websites. It offers:
  • customizable storage (RDBMS or Key-Value based),
  • crawling strategies management,
  • transport layer abstraction,
  • fetcher abstraction.
Along with a description of the framework, he will demonstrate how to build a distributed crawler using Scrapy, Apache Kafka and HBase, and hopefully present some statistics on the Spanish internet collected with the newly built crawler.

Speakers
Alexander Sibiryakov

Core developer of the web crawling framework Frontera at Scrapinghub Ltd. A performance geek, data scientist and ex-Yandex engineer (search quality department). Presenter at Berlin Buzzwords and local Yandex events.


Thursday October 1, 2015 15:45 - 16:35
Jozsef / Kolcsey
