I wanted to find a real-life list I could use to reinforce the last thing I told you about regex. Here is a screen capture from the Pythex regex editor:
The full data set would be a list of all the basketball players in the NBA that I scraped from somewhere. In the “test string,” I only pasted in nine lines to serve as my test data.
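Also, I clicked MULTILINE. This is very important when you want the regex string to bring back every line that matches your criteria: with MULTILINE on, ^ and $ match at the start and end of each line, instead of only at the start and end of the whole test string.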
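To see what the MULTILINE checkbox changes, here is a small Python sketch. It assumes the checkbox corresponds to Python's re.MULTILINE flag, and the three-line sample text is made up for illustration, not taken from the player list.

```python
import re

# Hypothetical sample text: three short lines, not the real player list.
text = "alpha\nbeta\ngamma"

# Without MULTILINE, ^ and $ anchor only to the start and end of the whole
# string, and . cannot cross a newline, so the pattern finds nothing.
without_flag = re.findall(r"^.*$", text)

# With MULTILINE, ^ and $ also anchor at the start and end of every line.
with_flag = re.findall(r"^.*$", text, flags=re.MULTILINE)

print(without_flag)  # []
print(with_flag)     # ['alpha', 'beta', 'gamma']
```

Without the flag, the same pattern only succeeds when the test string is a single line.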
Like in class, I want to get only the point guards (indicated by PG). I want to get the complete line for each point guard, so I must make sure the green highlights the entire line.
My regex string:
^(.)*(, PG)(.)*$
Starting with ^ and ending with $ ensures that I’ll get the complete line.
(.)* means any character except a newline, repeated any number of times (including zero). It is in my string twice: at the beginning, and at the end.
(, PG) means I want those exact four characters, together, in order, to be in the line. Yes, a space is a character. If any line has more than one space between the comma and PG, I won’t get that line.
The green highlighting tells me my regex is good: It has all the point guards and no one else.
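If you want to try the same thing outside Pythex, here is a minimal Python sketch using the pattern assembled from the pieces described above. The test lines are hypothetical placeholders in an assumed "Name, POSITION, Team" layout, not the real scraped data, and the variable names are my own.

```python
import re

# Hypothetical test lines; the real list's layout may differ.
test_string = """Player A, PG, Team One
Player B, SF, Team Two
Player C, PG, Team Three
Player D,  PG, Team Four
Player E, C, Team Five"""

# ^ and $ anchor each line (because of MULTILINE), (.)* soaks up everything
# before and after, and (, PG) must appear exactly: comma, one space, P, G.
pattern = re.compile(r"^(.)*(, PG)(.)*$", re.MULTILINE)

# group(0) is the entire match, which here is the complete line.
point_guards = [m.group(0) for m in pattern.finditer(test_string)]

print(point_guards)
# ['Player A, PG, Team One', 'Player C, PG, Team Three']
```

Player D is left out because that line has two spaces between the comma and PG, which is the situation described above.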
Links to Python regex resources are on the Course Schedule under Week 7.