First and foremost, hold on to the Mitchell book. It is a great resource for any scraping project you want to do in the future. Use the index before you go to Google.
We read chapters 1–6 before spring break. The remaining chapters are more specialized. That doesn’t mean they’re harder; they just cover things beyond basic everyday scraping (such as scraping a site that requires you to log in; see page 142).
My Google slide decks based on Mitchell:
- Web scraping tools (setup)
- Web scraping (Mitchell, chapter 2)
- Intro to regex — the Kevin Bacon files (Mitchell, chapter 3)
- APIs and storing files while scraping (Mitchell, chapters 4 and 5)
- No slides for chapter 6, “Reading Documents” (about scraping files other than web pages, such as PDFs)
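The chapter 2 material boils down to: get a page, walk its tags, pull out the text you want. Mitchell does this with urllib and BeautifulSoup; as a dependency-free sketch of the same idea, here is the standard library’s `html.parser` run against a made-up page snippet (the HTML and class name are hypothetical):

```python
from html.parser import HTMLParser

# Hypothetical snippet standing in for a fetched page; a real scrape
# would download this with urllib.request first.
PAGE = """
<html><body>
<h1>Headlines</h1>
<ul>
  <li class="story">Scraper ships</li>
  <li class="story">Regex saves the day</li>
</ul>
</body></html>
"""

class StoryParser(HTMLParser):
    """Collect the text of every <li class="story"> element."""
    def __init__(self):
        super().__init__()
        self.in_story = False
        self.stories = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        if tag == "li" and ("class", "story") in attrs:
            self.in_story = True

    def handle_endtag(self, tag):
        if tag == "li":
            self.in_story = False

    def handle_data(self, data):
        if self.in_story and data.strip():
            self.stories.append(data.strip())

parser = StoryParser()
parser.feed(PAGE)
print(parser.stories)  # → ['Scraper ships', 'Regex saves the day']
```

BeautifulSoup collapses the whole parser class into one line (`soup.find_all("li", class_="story")`), which is why the book reaches for it; the stdlib version just makes the underlying event-driven parsing visible.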
Other resources I provided for scraping are linked on the Course Schedule page under weeks 6, 7 and 8.
The one additional resource from me (not linked under those three weeks) is My first homemade Web scraper, which links to an extensive GitHub repo. The CSV code is covered on pages 9 and 10. Keep in mind that the code shown on page 10 is partial, although it includes all of the CSV parts.
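The CSV side of a scraper is small and worth committing to memory. This is not the repo’s code; it is a minimal stand-in sketch (the rows and filename are made up) showing the `csv.writer` pattern that any scrape-to-CSV script follows:

```python
import csv

# Hypothetical scraped rows; in a real scraper these would come out
# of the parsing step, but the csv calls are the same either way.
rows = [
    ("title", "url"),
    ("Scraper ships", "https://example.com/a"),
    ("Regex saves the day", "https://example.com/b"),
]

# newline="" matters: without it, csv adds blank lines on Windows.
with open("scraped.csv", "w", newline="", encoding="utf-8") as f:
    csv.writer(f).writerows(rows)

# Read it back to confirm the round trip.
with open("scraped.csv", newline="", encoding="utf-8") as f:
    header = next(csv.reader(f))
print(header)  # → ['title', 'url']
```

Writing the header row first, then `writerows()` for the data, keeps the file openable in Excel or pandas with no extra cleanup.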
Here is a different personal API project I did: Random Quote Machine.
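The API pattern from chapter 4 is the same no matter the service: fetch JSON, parse it, pull out the fields you need. I don’t reproduce the Random Quote Machine’s code here; this is a hedged offline sketch with a hypothetical payload (a real version would fetch the JSON with urllib.request or requests first):

```python
import json
import random

# Hypothetical API response body, inlined so the sketch runs offline.
payload = json.loads("""
{
  "quotes": [
    {"text": "Simple is better than complex.", "author": "Tim Peters"},
    {"text": "Readability counts.", "author": "Tim Peters"}
  ]
}
""")

def random_quote(data, rng=random):
    """Pick one quote dict from the parsed JSON and format it."""
    q = rng.choice(data["quotes"])
    return '"{}" ({})'.format(q["text"], q["author"])

print(random_quote(payload))
```

Once the response is parsed with `json.loads`, it is plain dicts and lists, so the rest of the program never touches the network layer, which makes it easy to test.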