Scraping a random link

What is Mitchell’s program on p. 34 doing? (It is 1-getWikiLinks.py in her Chapter 3 files.)

  • Seeds the random-number generator with the current time (using datetime)
  • Gets all the local links from the Kevin Bacon Wikipedia page
  • Stores them in the variable named links
  • Then this loop:
    while len(links) > 0:
        newArticle = links[random.randint(0, len(links)-1)].attrs["href"]
        print(newArticle)
        links = getLinks(newArticle)
  • That while-loop randomly selects one link tag from the list named links and reads its href attribute
  • It prints only that new link
  • It gets all the local links from the NEW Wikipedia page and REPLACES the contents of links with them
  • It loops back and does this again until it lands on a page with no local links, which might take a VERY LONG TIME
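Mitchell's getLinks() fetches each article with urlopen and parses it with BeautifulSoup; as a rough, self-contained sketch of the same random-walk idea (standard library only, with a canned dictionary of made-up pages standing in for the live HTTP requests), the loop behaves like this:

```python
import random
import re

# Stand-ins for downloaded Wikipedia pages (hypothetical content);
# Mitchell's real getLinks() fetches and parses the live article instead.
FAKE_PAGES = {
    "/wiki/Kevin_Bacon": '<a href="/wiki/Footloose"></a><a href="/wiki/Apollo_13"></a>',
    "/wiki/Footloose": '<a href="/wiki/Kevin_Bacon"></a>',
    "/wiki/Apollo_13": "",  # a dead end: no local links, so the walk stops here
}

def getLinks(articleUrl):
    # Extract every href that looks like a local /wiki/ article link
    html = FAKE_PAGES.get(articleUrl, "")
    return re.findall(r'href="(/wiki/[^":#]*)"', html)

links = getLinks("/wiki/Kevin_Bacon")
while len(links) > 0:
    # Pick one link at random, print it, then fetch THAT page's links,
    # overwriting the old list
    newArticle = links[random.randint(0, len(links) - 1)]
    print(newArticle)
    links = getLinks(newArticle)
```

One difference: in Mitchell's version, links holds BeautifulSoup tag objects, hence the .attrs["href"] lookup; here re.findall already returns the href strings directly.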

So if each chosen link were written to a file as it was printed, one per line, the file would keep getting longer and longer.
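For instance (a hypothetical variation, not in Mitchell's code), you could open a file before the loop and append each chosen link inside it; the stub getLinks() here just returns a shrinking canned list so the example terminates quickly:

```python
import random

# Hypothetical helper: stands in for Mitchell's BeautifulSoup-based
# getLinks(), returning a canned list of local links per call
canned = [["/wiki/Footloose", "/wiki/Apollo_13"], ["/wiki/Tom_Hanks"], []]

def getLinks(articleUrl):
    return canned.pop(0) if canned else []

links = getLinks("/wiki/Kevin_Bacon")
with open("visited_links.txt", "w") as f:
    while len(links) > 0:
        newArticle = links[random.randint(0, len(links) - 1)]
        f.write(newArticle + "\n")  # one link per line
        links = getLinks(newArticle)
```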

Note that even with a seed, these sequences are pseudorandom, not truly random: starting from the same seed reproduces exactly the same sequence.
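You can see that reproducibility directly with Python's random module (the same module Mitchell's program uses); the helper function here is just for illustration:

```python
import random

def five_draws(seed):
    # Re-seeding with the same value restarts the generator at the
    # same point in its pseudo-random sequence
    random.seed(seed)
    return [random.randint(0, 100) for _ in range(5)]

print(five_draws(42))
print(five_draws(42))  # identical to the line above: same seed, same sequence
print(five_draws(7))   # a different seed starts a different sequence
```

This is why Mitchell seeds with the current time, datetime.datetime.now(): each run of her program then starts the walk at a different point in the sequence.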
