Updates for SQL + PHP in this course

This course is divided into three parts: databases with PHP, web scraping with Python, and using a web framework, Flask (also Python). I started teaching this course in January 2016. Of course, I had done a ton of research before starting, sitting down with and emailing numerous journalism practitioners of code.

After the second offering of this course concluded in April 2017, I resolved to take time during the summer to revisit, re-evaluate, and revise as needed.

I started with the intro-to-Python materials, which don’t come into the course until Week 5. I very much wanted to replace Learn Python the Hard Way as the base for learning Python. I tried using Head First Python in class, but it was a disaster. It’s just not a good book for beginners. Luckily, I discovered Automate the Boring Stuff with Python, and I couldn’t be happier! I’ll write more about that in a future post.

Much as I would have liked to just work on Python all summer, I had to go back to the first segment, databases with PHP, and get that organized.

Learning objectives

Inspired by a couple of articles by my friend Erika Biga Lee, I wanted to specify what I wanted the students to be able to do as a result of this four-week segment. The use of learning outcomes in teaching is intended to focus the instruction, assignments, and tests or quizzes on what you actually want the students to learn. This means not a big, vague idea like “I want them to be able to make web databases for journalism projects” (although that is the broad, final outcome I want) but rather the individual components that lead to that in the end.

Here are the atomic-level learning objectives for weeks 1–4. Links are included.

SQL

SQL (Structured Query Language) is a language for getting answers from your database. It’s an elegantly simple, rigid language with the power to extract very precise results across multiple tables. Since I’m not teaching data analysis in this course, I really just need the students to understand how to insert a new record and select records based on ID. However, I want them also to get a glimpse of the full power of SQL in case they do go on to need more sophisticated queries in the future. (Check out this 2-minute video.)

Databases

Most of the students have used a spreadsheet, but pretty much no one has seen database structure before. Database architecture is important to all kinds of journalism nowadays, so I want my students to have hands-on experience building a database from scratch. I want them to understand how field names are used as variables to search for and extract data from a table. I want them to be able to organize data in a table. I want them to have a glimpse of multiple tables interacting in one database, but I don’t need them to understand joins.

To make all this less abstract, I want them to have visual access to the database and its tables. I want them to see the grid as columns and rows and recognize how it differs from a spreadsheet.

I’m aware of the whole NoSQL movement, but since SQL queries are so useful in data journalism, I really want the students to get their feet wet with SQL.

XAMPP

My broad aims for SQL and databases led me to XAMPP, which has the advantages of being free and working on Windows as well as Mac OS. It very painlessly enables students to install a fully functional web server on their own laptops, and the server is already configured with a MySQL database server and PHP, a general-purpose scripting language that’s available by default on almost every web server in the world.

PHP

I have no great love for PHP, and in fact, I spent some time this summer trying to come up with a reasonable way to eliminate it from this course. However, in the end, I kept it. In spite of its age, it still offers the easiest way to whip up a script to process a web form (on most web servers) and return feedback to the user. I’ve Googled a lot and decided it’s still relevant. Plus, it’s already set up and ready to roll as soon as you’ve installed XAMPP — which is as clean an install as you could ever hope to find.

HTML and CSS

My students already spent a full semester in a course about front-end web development, but they are challenged in this SQL-PHP-database segment to review and revisit things they probably have not mastered, including properly constructed HTML forms, and HTML tables. At this point, though, they should not need me to hold their hand and make detailed instruction sheets about how to use form and table elements in their markup.

In that previous course, they already set up their own domain at a shared-hosting company, Reclaim Hosting (which is great for students and educators). Since Reclaim servers have the same technology stack as XAMPP, we don’t have much trouble transferring our database apps from our laptops to the live web.

Projects

Students complete two projects in this segment of the course. The first is a shoutbox, for which I give them all the code. They have to set up XAMPP and build one database table. They have to get it working on XAMPP, and then for a grade they have to move it to their own hosted live site.

A big part of what I needed to revise in this part of the course was how to teach about the jQuery Ajax code used in both projects. Last spring, I realized students were really unclear on the role played by this code in the shoutbox project, so I spent a lot of time Googling and thinking about how to fit that in. I ended up spending about two weeks on Progressive Web Apps as a result of that, but in the end I decided to stick with jQuery Ajax for now, and I created a new slide deck to help clarify it.

The second project is a build-from-scratch database app — although I’ve given them all the PHP code they need to build it. It was important to ensure that students understand very well the dangers posed by exposing any SQL database to attacks by malicious hackers. Even though with a journalism project we’re not exposing sensitive data (unlike banks, and any site that stores your credit card information), if you’re careless with your PHP commands for SQL, someone could delete your entire database when they submit a web form.

This led to another major revision of the materials I had developed for this course. As I reviewed them and slowly worked out my atomic-level learning objectives for weeks 1–4, I realized that my GitHub repo included too many files, and my explanations of the code were too long and too mixed up — a result of writing the detailed instructions as I was deciding what to teach and learning new things myself. So I had to thoroughly rewrite two handouts, separating out the explanation of using SQL prepared statements to prevent injection attacks from the explanation of how the files in the app interact with one another (including even more about jQuery Ajax).

You can find all my materials linked in that learning objectives document.

Feel free to copy and use anything here in your own journalism courses. Commercial and for-profit use of my materials is not granted or authorized.

If you know of ways I can improve anything here, please contact me through Twitter (@macloo) or Gmail.

Advertisements

Final projects 2017

The final apps students completed at the end of the semester (in alphabetical order by last name).

TruthRepo: Topical Bible verses and faith-based quotes. Search the MySQL database by topic; add your own quotes or verses to the database. Flask app. (Hailey)

Deported: Data from the top six states for deportations in the United States. View interactive Google Maps within a Flask app. (Gabrielle)

Florida Congressional Representatives: All 27 of the state’s members of the U.S. House of Representatives, with demographic info about their districts — and their PolitiFact Truth-O-Meter scores. Flask app with Python datasets. (Nicole)

Experience Fashion: A multimedia experience with images, sounds, and data about 469 fashion designers, with links to buy their creations. Flask app with Python dataset. (Caroline)

Florida’s Death Row: See and learn about every inmate. Flask app with Python dataset. (Caitlin)

University Explorer: An Alexa Skill enabling you to ask Alexa questions about more than 7,000 U.S. schools. Flask app with Python dataset. (Ryan)

Professionalizing your GitHub repo

Now that you know how most code professionals use GitHub, you probably realize that people are going to look at your GitHub repos when they are thinking about whether to hire you or offer you an internship.

Here’s how to make any GitHub repo more professional:

github

  • Write and add a proper README.md file. Use GitHub’s flavor of Markdown (slight differences from generic Markdown). This is super easy. The purpose of the README is to explain what’s in your repo, what it does, and how to use it. Make sure to use headings intelligently — it makes the document so much more readable.
  • Last but far from least, name your repo appropriately. Short and descriptive is the way to go. Notice that most repo names are all lowercase, with hyphens. The repo name is the same as the folder name on your computer. (If you rename an existing repo on GitHub, your GitHub Desktop app will certainly get confused!)

Database terminology (and Python)

What is an ORM? ORM stands for object-relational mapper, or object-relational mapping tool. It’s a translational layer of code between your script and the database, and it converts the database data into objects your code can use easily. Think of the database as a collection of items that are encoded into a complex structure, like a maze with secret doors and hidden passages. An ORM translates the structure of the maze into a parallel structure that is usable by the language of your script or program — in our case, Python.

SQLAlchemy is an ORM for Python, but it’s not the only one.

Peewee is another ORM for Python.

When using an ORM, you need to know what kind of DBMS (or RDBMS)  you are using. DBMS stands for database management system, and that means the software that runs your database. The database itself is the collection of data. The common SQL DBMSes are SQLite, MySQL and PostgreSQL. Although SQLite is a DBMS, it’s quite different from others in that it is self-contained and does not use a server.

These various DBMSes are not interchangeable, and therefore, you need to know how to make your chosen ORM communicate with the DBMS you are using.

Python ORMs are discussed in detail — but quite clearly — here. That page includes links to many resources.

Working with SQL databases in Flask

Activate your virtualenv in the folder where you’ve been doing Flask projects.

Install the Flask-SQLAlchemy extension like this (use uppercase as shown):

$ pip3 install Flask-SQLAlchemy

That installs SQLAlchemy as well as Flask-SQLAlchemy. No need for a separate install.

Using a database with Flask

You’ll have to connect your app to an existing SQL database, whether the app reads from the database, writes to the database, or both. Connecting will require your own database username and database password. (You are the owner of the database.)

You can create the SQL database using Python, but that is not required. You might already have a database. Then all you need to worry about is how to connect it.

If you do use Python to create a SQL database (and that’s an “if,” not a necessity), you will only do it once. You don’t create the same database again and again.

Your database may have one table, or more than one table. That depends on what you need, or the structure of the existing SQL database.

Your app might only read from your SQL database. You can write SQL queries to accomplish this — use Flask-SQLAlchemy commands to do so.

Your app might allow people to write to your SQL database. In that case, it is possible you will want people to log in securely.

You might write a Python script to populate your database from the contents of a CSV file. This would be fairly simple if you only need to run it once. If you need to add records repeatedly (say, once per month) to an existing database, you might need to check whether you are duplicating records that are already there. If you need to check for existing records and update them, that’s more challenging.

If people are writing into your database, you will want to give them a form, or forms, for doing so. Remember that Bootstrap and Flask-WTF are your very best friends for that!

Make a checklist for your database project

Figure out what your own app will need to do:

  1. Create a database (optional; only once)
  2. Connect to the database (required)
  3. Read from the database: Query
    • Display all records?
    • Display a subset of records?
    • Search for multiple records?
    • Search for one record at a time?
  4. Write to the database
  5. Allow users to write to the database
    • Insert a new record?
    • Update an existing record?
    • Delete a record?

For all Python and SQL commands, refer to the links listed under “User’s Guide” in the Flask-SQLAlchemy documentation.

Type hints in Python

In the Barry book (Head First Python, 2nd edition), he always includes something known as “type hints” in his code. This is akin to making a note that a particular variable is intended to be a string, an integer, a Boolean value, etc.

Every time Barry specifies a variable type like so:

def greeting(name: str) -> str:
  return 'Hello, ' + name

You can OMIT the boldfaced parts. Those are the markup for type hints, and hardly anyone uses them in Python.

This works perfectly well (and is so much nicer):

def greeting(name):
  return 'Hello, ' + name

The version with no type hints is what you have learned up until now, and there’s nothing wrong with using it.

Type hints are completely unnecessary:

“This PEP introduces a provisional module to provide these standard definitions and tools, along with some conventions for situations where annotations are not available.”

“While these annotations are available at runtime through the usual __annotations__ attribute, no type checking happens at runtime.”

See also this Reddit post for more about how type hints are not needed and also not standard practice in Python.

What journalist developers think about

This is a thoughtful and fascinating article by Tyler Fisher, journalist and web developer at NPR.org: Subverting the Story Model. Here’s an excerpt:

Screen Shot 2017-03-31 at 4.03.49 PM

I may have told you what a huge fan I am of NPR’s live annotation system for speeches and debates — it’s also discussed in the article. It runs off a shared Google doc. So sweet.

What I like most about Fisher’s article is that he’s thinking about how to serve the public better, and that’s something all journalists should do more often.

Summary: Using virtualenv

A few students experienced some confusion over using a virtualenv this week. Let’s review:

“virtualenv is a tool to create isolated Python environments. virtualenv creates a folder which contains all the necessary executables to use the packages that a Python project would need.” (source)

You can think of it as a quarantine space: It is completely sealed off from the rest of your computer. When a virtualenv is activated, and you install things in it, those things are NOT AVAILABLE outside. When you are not IN that virtualenv, you cannot access those things (such as BeautifulSoup).

When we created our two separate, different virtualenvs in class (on different days), we installed different things in the two. So you have ONE virtualenv that is for scraping, and it contains the BeautifulSoup library. You have ANOTHER virtualenv — in another folder — that contains the Flask framework and its many dependencies.

Here is how we set up our two virtualenvs in class:

  1. We created a new folder for that set of projects.
  2. We cd’d into the new folder, using Terminal.
  3. While inside the folder, we entered this long command on one line:

virtualenv --python=/Users/username/Software/Python-3.6.0/mybuild/bin/python3 env

(In place of “username,” you typed your own username for your own laptop.)

That command created a new virtualenv running Python 3 (not 2). The long path in the long command refers to the exact Mac OS location where we installed Python 3. If you’re on Windows, or if you never did a safe Python 3 install on a Mac, you don’t have that. There are instructions for Windows at the end of this document (pages 5–6).

Note that by activating a virtualenv that was set up in this way, you are automatically running Python 3 (not 2) for all the things.

If you had wanted to create a Python 2 (not 3) virtualenv on a Mac, you could instead just type this:

virtualenv env

Note that in both cases, the command creates a new folder named env inside the folder you are currently in. If you wanted it to be named foobar, you would type: virtualenv foobar

For that reason, you must go inside the project folder before you activate the virtualenv with this command (it is different for Windows):

source env/bin/activate

Don’t go inside the env folder.

(If your folder was foobar instead of env, the command would be source foobar/bin/activate)

While the virtualenv is activated, you’ll see (env) at the left side of your Terminal prompt. While you see that, you have access to everything you installed in that virtualenv.

To install new Python packages, libraries, frameworks, etc., for the project, make sure the virtualenv is activated. This is what makes the “quarantine space” work: You are installing all the new things within the enclosed, sequestered space, and they will not affect anything on the rest of your system. That is the beauty of the virtualenv.

When you’re finished working, you can close down the virtualenv with one command:

deactivate

My practice for handing projects, or a set of projects, is to create a new folder and name it something broad, like scraping (for all my web scraping projects). Then I go inside scraping, create a new virtualenv there, named env, and install what I need to do scraping projects — such as BeautifulSoup. For each scraping project, I create a new folder inside scraping and name the new one something that matches the project — such as soccer-players, for example.

When I wanted to start making some Flask projects, I created a new folder — outside scraping, because this is no longer about scraping work — and created a new virtualenv there, named env. I activated the virtualenv and then installed what I need to do Flask projects.

Other methods to create a virtualenv

Other people use their own systems to make virtualenvs. Some people always name theirs venv instead of env. Some people name their virtualenv after their project (for example, soccer-players), which confused me very, very horribly when I was starting to learn about virtualenvs.

I mention this because you will no doubt see tutorials where people use these other systems or methods. They all work — but YOU have to know what YOU’RE doing. YOU have to keep track of what you named your folder, your virtualenv, your project, etc.

In our class, I stick to my one way of creating a new virtualenv because it seems clear and simple to me. I didn’t get the hang of it right away, but now it seems quite natural to me. It’s consistent, and it keeps all my projects well organized.

Demonstration

Screen Shot 2017-03-22 at 10.19.59 AM

Above, I used a series of commands in Terminal to demonstrate that I am running Python 3.6.0 while I am in my virtualenv inside a directory named headfirst. When the virtualenv is deactivated, I am running Python 2.7.10.

Recap of scraping resources

First and foremost, hold on to the Mitchell book. It is a great resource for any scraping project you want to do in the future. Use the index before you go to Google.

We read chapters 1–6 before spring break. The rest of the chapters are more specialized. It doesn’t mean they’re harder — they just refer to things that go beyond basic everyday scraping (like how to scrape a site that requires you to log in; see page 142).

My Google slide decks based on Mitchell:

Other resources I provided for scraping are linked on the Course Schedule page under weeks 6, 7 and 8.

The one additional resource from me (not linked under those three weeks) is My first homemade Web scraper, which, as you know, is linked to an extensive GitHub repo. The CSV code is covered on pages 9 and 10. Remember that the code shown on page 10 is partial, although it includes all of the CSV parts.

API examples

My Wikipedia viewer code is here (an example of an API project). The live version is here.

Here is a different personal API project I did: Random Quote Machine.

My best tip for finding APIs to use is just search with Google and include “api” in your search. That’s how I found the APIs for The New York Times and for Google.