One last regex example

I wanted to find a real-life list I could use to reinforce the last thing I told you about regex. Here is a screen capture from the Pythex regex editor:


The full data set would be a list of all the basketball players in the NBA that I scraped from somewhere. In the “test string,” I only pasted in nine lines to serve as my test data.

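Also, I clicked MULTILINE. This is very important when you want the regex string to bring back every line that matches your criteria, because it makes ^ and $ match at the start and end of each line instead of only at the start and end of the whole test string.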

Like in class, I want to get only the point guards (indicated by PG). I want to get the complete line for each point guard, so I must make sure the green highlights the entire line.

My regex string: ^(.)*(, PG)(.)*$

Starting with ^ and ending with $ ensures that I’ll get the complete line.
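To see why MULTILINE matters alongside ^ and $, here is a minimal sketch in Python's re module. The two-line sample text is made up for illustration (placeholder names, not the real scraped data), and I am assuming the flag behaves in Python the way the Pythex checkbox does.

```python
import re

# Hypothetical two-line sample; the names and layout are placeholders.
text = "Player One, PG\nPlayer Two, PG"

pattern = r"^(.)*(, PG)(.)*$"

# Without MULTILINE, ^ and $ anchor only to the whole string, and "." cannot
# cross the newline, so no line matches.
without_flag = [m.group(0) for m in re.finditer(pattern, text)]

# With MULTILINE, ^ and $ also match at the start and end of every line.
with_flag = [m.group(0) for m in re.finditer(pattern, text, re.MULTILINE)]

print(without_flag)
print(with_flag)
```

Using group(0) returns the whole matched line rather than the pieces captured by the parentheses.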

(.)* means any characters, and any number of them (including none), except a newline. It is in my string twice: once at the beginning and once at the end. The parentheses only group the dot; .* would match the same text.

(, PG) means I want those exact four characters, together, in order, to be in the line. Yes, a space is a character. If any line has more than one space between the comma and PG, I won’t get that line.

The green highlighting tells me my regex is good: It has all the point guards and no one else.
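If you want to run the same check outside Pythex, a rough Python sketch might look like this. The five sample rows are invented placeholders, and the column layout (name, position, team) is my assumption, not the real data set. The last row has two spaces after the comma on purpose, to show the line that the pattern misses.

```python
import re

# Hypothetical sample data: invented placeholder rows, not the real list.
# The "name, position, team" layout is an assumption.
test_string = """Player One, PG, Team A
Player Two, SF, Team B
Player Three, PG, Team C
Player Four, C, Team A
Player Five,  PG, Team B"""

# Same regex string as above; MULTILINE lets ^ and $ work line by line.
pattern = re.compile(r"^(.)*(, PG)(.)*$", re.MULTILINE)

# group(0) is the entire match, which here is the complete line.
point_guards = [m.group(0) for m in pattern.finditer(test_string)]

for line in point_guards:
    print(line)
```

Only the first and third rows come back. The "Player Five" row is skipped because of its extra space between the comma and PG.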

Links to Python regex resources are on the Course Schedule under Week 7.

