Recap from class (week 6)

We reviewed the ideas about web scraping covered in the two articles by Canadian journalist Nael Shiab, including:

  • Why do we scrape?
  • What kinds of data do we seek?
  • Examples of what Shiab has scraped, as a journalist
  • Ethics questions about scraping
  • What kinds of sites should we not scrape?

We then installed the BeautifulSoup library in a new Python 3 virtualenv and tested it, using commands from Mitchell’s chapters 1 and 2. Remember to use the updated code from Mitchell’s repo rather than the code printed in her book.
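If you want to re-check your install outside of class, here is a minimal sketch. It is not Mitchell’s code: it assumes the library was installed in the active virtualenv with `pip install beautifulsoup4`, and it parses a small made-up HTML string (the `html` variable and its contents are invented for illustration) so that it needs no network connection.

```python
from bs4 import BeautifulSoup

# Made-up stand-in for a downloaded page (an assumption for this sketch),
# so the check runs without any network access.
html = """
<html>
  <head><title>Test page</title></head>
  <body>
    <h1>An Interesting Title</h1>
    <p class="intro">First paragraph.</p>
  </body>
</html>
"""

# Parse the string with Python's built-in HTML parser.
soup = BeautifulSoup(html, "html.parser")

# Grab the first <h1> tag and the <p> tag with class "intro".
heading = soup.h1.get_text()
intro = soup.find("p", {"class": "intro"}).get_text()

print(heading)
print(intro)
```

If the import fails with ModuleNotFoundError, the most likely cause is that the virtualenv is not activated.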

We used the web-scraping section of my python-beginners repo.

We reviewed the basics of writing and running Python 3 functions, covered in Sweigart’s chapter 3. The review was brief, so please refer to the week02 section of my python-beginners repo. In particular, look at the chapter outline there and at the slide deck, which is linked below the outline. You will need to write your own functions when you build your own web scraper (Assignment 9).
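As a reminder of the basic pattern (define a function, pass in an argument, return a value), here is a small sketch. The function name `clean_text` and the sample string are made up for illustration; they are not from Sweigart’s chapter or from the assignment.

```python
def clean_text(raw):
    """Return raw with runs of spaces, tabs, and newlines collapsed to single spaces."""
    # split() with no arguments breaks on any whitespace and drops empty pieces;
    # join() then glues the pieces back together with one space between each.
    return " ".join(raw.split())


# Text scraped from a web page often carries stray whitespace like this.
messy = "  Hello \n   world  "
tidy = clean_text(messy)
print(tidy)
```

A scraper usually ends up with several small functions like this, each doing one job, which makes each piece easier to test on its own.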

