Recap of scraping resources

First and foremost, hold on to the Mitchell book. It is a great resource for any scraping project you want to do in the future. Use the index before you go to Google.

We read chapters 1–6 before spring break. The rest of the chapters are more specialized. That doesn’t mean they’re harder; they just cover things that go beyond basic everyday scraping (like how to scrape a site that requires you to log in; see page 142).

My Google slide decks based on Mitchell:

Other resources I provided for scraping are linked on the Course Schedule page under weeks 6, 7 and 8.

The one additional resource from me (not linked under those three weeks) is My first homemade Web scraper, which, as you know, is linked to an extensive GitHub repo. The CSV code is covered on pages 9 and 10. Remember that the code shown on page 10 is partial, although it includes all of the CSV parts.
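As a quick reminder of what the CSV part of a scraper looks like, here is a minimal sketch using Python’s built-in csv module. It is not the code from the repo: the function name write_rows_to_csv, the file name people.csv, and the sample rows are all made up for illustration, and the rows stand in for whatever values your scraper pulls out of the pages.

```python
import csv


def write_rows_to_csv(rows, filename, header):
    """Write a header line and a list of rows to a CSV file."""
    # newline="" stops the csv module from adding blank lines on Windows
    with open(filename, "w", newline="", encoding="utf-8") as csvfile:
        writer = csv.writer(csvfile)
        writer.writerow(header)   # column names go in the first line
        writer.writerows(rows)    # then one line per scraped record


# Hypothetical data standing in for values pulled out of scraped pages
scraped_rows = [
    ["Ada Lovelace", "London", "1815"],
    ["Grace Hopper", "New York City", "1906"],
]

write_rows_to_csv(scraped_rows, "people.csv", ["name", "birthplace", "born"])
```

In a real scraper you would typically build the rows list inside your scraping loop and write the file once at the end, or open the file first and call writer.writerow() each time you finish a page.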

API examples

My Wikipedia viewer code is here (an example of an API project). The live version is here.
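To show the general shape of an API request, here is a rough Python sketch that searches Wikipedia through the MediaWiki opensearch action. This is not the code from my Wikipedia viewer; the function names and the User-Agent string are invented for this example, and it assumes the API returns its usual four-item list (search term, titles, descriptions, URLs).

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_search_url(term, limit=5):
    """Build the request URL for an opensearch query."""
    params = {
        "action": "opensearch",
        "search": term,
        "limit": limit,
        "format": "json",
    }
    # urlencode handles spaces and special characters in the search term
    return API_ENDPOINT + "?" + urlencode(params)


def parse_search_results(data):
    """Pair each article title with its URL."""
    # Assumed response shape: [search term, [titles], [descriptions], [urls]]
    titles, urls = data[1], data[3]
    return list(zip(titles, urls))


def search_wikipedia(term, limit=5):
    """Send the request and return a list of (title, url) pairs."""
    # The User-Agent value is a placeholder; identify your own script here
    request = Request(
        build_search_url(term, limit),
        headers={"User-Agent": "scraping-class-example/0.1"},
    )
    with urlopen(request) as response:
        return parse_search_results(json.load(response))
```

Calling search_wikipedia("web scraping") sends the request and gives you back title and URL pairs that you can loop over. Note that there is no HTML parsing at all: the API hands you structured JSON, which is the main reason to look for an API before you write a scraper.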

Here is a different personal API project I did: Random Quote Machine.

My best tip for finding APIs to use is to search with Google and include “api” in your search terms. That’s how I found the APIs for The New York Times and for Google.

