We reviewed the ideas about web scraping covered in the two articles by Canadian journalist Nael Shiab, including:
- Why do we scrape?
- What kinds of data do we seek?
- Examples of what Shiab has scraped, as a journalist
- Ethics questions about scraping
- What kinds of sites should we not scrape?
We then installed the BeautifulSoup library in a new Python 3 virtualenv and tested it with commands from chapters 1 and 2 of Mitchell's book. Don't forget to use Mitchell's updated code from her repo instead of the code printed in her book.
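If you want to confirm the installation without relying on a live website, a minimal sketch like the following parses a hard-coded HTML string. It assumes the package was installed with `pip install beautifulsoup4` inside the activated virtualenv. The sample HTML is invented for illustration and does not come from Mitchell's book or repo.

```python
# Quick offline check that BeautifulSoup is installed and can parse HTML.
# Assumption: installed with `pip install beautifulsoup4` in the active virtualenv.
from bs4 import BeautifulSoup

# A small hard-coded page (made up for this test), so no network connection is needed.
SAMPLE_HTML = """
<html>
  <head><title>Test page</title></head>
  <body>
    <h1>An Interesting Title</h1>
    <p class="story">Scraping lets us collect data from web pages.</p>
  </body>
</html>
"""

# "html.parser" is Python's built-in parser, so nothing extra has to be installed.
soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

print(soup.title.get_text())                      # Test page
print(soup.h1.get_text())                         # An Interesting Title
print(soup.find("p", class_="story").get_text())  # the paragraph's text
```

If the import fails, the virtualenv is probably not activated, or the package was installed into a different Python.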
We reviewed the basics of writing and running Python 3 functions, covered in Sweigart's chapter 3. Because our review was brief, please refer to the week02 section of my python-beginners repo. In particular, study the chapter outline there and the slide deck, which is linked below the outline. You will write your own functions when you build your own web scraper (Assignment 9).
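As a reminder of the function basics reviewed in class (`def`, parameters, `return`, and the `None` value a function gives back when it has no `return` statement), here is a small sketch. The names `clean_text` and `report` are hypothetical, chosen only to suggest the kind of helper a scraper might need; they are not from the course repo.

```python
# Hypothetical helper of the kind a scraper might use:
# it takes one parameter and returns a cleaned-up string.
def clean_text(raw_text):
    """Strip surrounding whitespace and collapse internal runs of spaces and newlines."""
    words = raw_text.split()   # split() with no argument splits on any whitespace
    return " ".join(words)     # rejoin the words with single spaces


# Hypothetical function with no return statement, so calling it returns None.
def report(label, value):
    """Print a labeled value."""
    print(label + ": " + value)


headline = clean_text("   Scraping\n   the   news  ")
print(headline)                       # Scraping the news
result = report("Headline", headline) # prints: Headline: Scraping the news
print(result)                         # None
```

Splitting the cleanup into its own function means you can reuse it on every piece of text your scraper pulls out of a page.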