Scrapy – An Introduction

Let’s learn about Scrapy. But before that… What is Scrapy? What does it do? Why would I even want to learn about it?

What is Scrapy? What can it do?

Scrapy is a Python-based framework that crawls websites and extracts data from them.

So, with Scrapy, we can write code that collects every ounce of data on a website, downloads it following a structure we have defined, and stores it in a .json or .csv file or adds it to a database. Pretty cool, right?

Why would I even want to learn about it?

Web scraping has a wide range of uses if you think about it. The ability to extract information from a website and save it (and maybe put it to use afterwards) automates work you would otherwise do by hand and frees up your busy schedule so you can work more (or rest more).

Let’s see some use cases:

Your company tells you to collect all the prices from a rival shoe shop so that it can set the prices of its own shoes accordingly. The rival has hundreds of shoes, doing this by hand takes you 2 or 3 days, and you have to repeat it every 2 weeks.

But you, wisely, did my tutorial, so your program does those 3 days of work in two minutes and stores all the information in an Excel file. This would please your employers (and maybe, if you feel cheeky, you can ask for a raise).

Or maybe you like sports and you want to keep track of your national football and basketball leagues, with all their divisions.

Why not create a Scrapy program that does the heavy lifting for you, storing all the data (every match result, current team goals, cards, etc.) in a MongoDB database?

Seems cool, right? All you need to do to get started is click here.