Recap from class (week 6)

We reviewed the ideas about web scraping covered in the two articles by Canadian journalist Nael Shiab, including:

  • Why do we scrape?
  • What kinds of data do we seek?
  • Examples of what Shiab has scraped, as a journalist
  • Ethics questions about scraping
  • What kinds of sites should we not scrape?

We then installed the BeautifulSoup library in a new Python3 virtualenv and tested it, using commands from Mitchell’s chapters 1 and 2. Don’t forget to use Mitchell’s updated code from her repo instead of the code in her book.

We used the web-scraping section of my python-beginners repo.

We reviewed the basics of writing and running Python3 functions, covered in Sweigart’s chapter 3. This was quick and short, so please refer to the week02 section of my python-beginners repo. In particular, you should examine the chapter outline there and the slide deck, which is linked below the outline. You will be writing your own functions when you write your own web scraper (Assignment 9).

Advertisements

Recap from class (week 5)

Yesterday in class, we did the following:

  • Reviewed the bash CLI commands, especially those used for navigating into and out of directories (folders).
  • Went over these Python programs and review materials, based on Sweigart’s chapters 1 and 2.
  • Listened to Mary-Lou’s and Danny’s presentations about cool web apps. They will post their resources and links for us in the designated Discussion in Canvas.
  • Installed virtualenv for Python.
  • Created a new virtualenv and activated it.
  • Installed Jupyter Notebook in that virtualenv.
  • Launched Jupyter Notebook (at the bash command).
  • Previewed Assignment 3 in the Jupyter Notebook browser app.
  • Saved and quit  Jupyter Notebook.
  • Deactivated the virtualenv.

We used this Google doc to do the installations. Commands can be copy/pasted from there. You will not need that document in the future, because now everything is installed.

You may want to refer to this Jupyter Notebook cheat sheet in the future as you complete Python assignments based on the Sweigart book.

Python: Do not use IDLE

Look — a new Python 3 resource for you.

In the middle of chapter 1 in Automate the Boring Stuff with Python, Sweigart invites you to leave the interactive Python shell (where you have the >>> prompt) and create a little program in a file.

Do not use IDLE. We are never using IDLE.

Instead, you can use Atom, our trusted code editor. Code is code. You can write any code in Atom. Instead of saving the file with a .html or .js extension, we save it with a .py extension when it is a Python file.

Then, how do you run it? Never try to run a file at the >>> prompt. You need to be at the bash prompt ($) in Terminal, or the PowerShell prompt if you’re using Windows.

It’s easiest if you are in the same directory where the .py file was saved. Use your cd command (Command Line Tips) to get there. (I made a folder named automate for my files from the book.)

At the bash prompt, type this (using your actual filename, of course)

python3 myfilename.py

And it runs!

Here’s where that comes in the chapter:

Screen Shot 2018-02-02 at 6.38.49 PM

Here’s what the program looks like in Atom:

Screen Shot 2018-02-02 at 6.26.11 PM

Here’s how I ran it in Terminal:

Screen Shot 2018-02-02 at 6.44.01 PM

Recap from class (week 4)

This week, you’re writing your first database app from scratch.

It’s not totally from scratch, though — you have the Sock Market GitHub repo for all the JavaScript and PHP code. You’ve successfully transferred your Shoutbox project from your own laptop to your hosted website, so you know how to do that. Refer to the docs from that assignment.

One new thing is how to export and import your MySQL database. You’ll do this using phpMyAdmin. Instructions are here: Import or Export a Database Table. It’s surprisingly easy.

In Assignment 1 (bottom of page 1), database privileges (permissions) were mentioned. When you’re setting up a new database on your hosted server, it’s good to think carefully about the minimum privileges that the users of this database will need. Refer to that section of Assignment 1 when you set up your new database for Assignment 2. Privileges are also covered in the “Make your SQL safer” section of the Khan Academy SQL course.

The document that explains all the PHP mysqli_ commands: Sockmarket: MySQL and PHP. These are the details about prepared statements, which prevent SQL injection attacks. This is also the document that explains the while loop (see page 8) that you’ll use to get the data from your MySQL table into an HTML table (or other HTML) on a normal web page.

The document that explains the interconnection of all the files in the Sockmarket GitHub repo: Using the Sock Market files. This is like a flowchart showing how the files talk to one another, and in which order.

The last slide deck about SQL (based on Khan Academy) covers UPDATE and DELETE, as well as database transactions. You will not be using transactions in your new database.

Recap from class (week 3)

Yesterday in class we covered these three things:

  • Review of Khan Academy’s “More advanced SQL queries” (slides)
  • Step by step, how jQuery Ajax operates in the Shoutbox project (slides)
  • How to use PHP’s mysqli_ commands and prepared statements to prevent SQL injection attacks (Google doc — we covered pages 1–7)

We also got our first look at the Sock Market database app, which serves as an example for your Assignment 2.

The slides from the previous week cover the first part of Khan Academy’s SQL course, which introduces the essential INSERT INTO and SELECT FROM commands.

Plan to attend NICAR 2018

What is NICAR? It’s the world’s biggest and best gathering of news nerds!

Who’s a news nerd? All the folks who write code and/or analyze data for journalism and storytelling.

NICAR is a four-day conference, and this year it will be in Chicago, March 8–11. The UF College of Journalism and Communications and the Knight Chair will sponsor up to eight students to attend all four days. We have already reserved rooms at the conference hotel. There are tons of hands-on workshops led by professionals. All kinds of skills are covered: basic and advanced Excel, Python and JavaScript programming, D3, R, data visualization, mapping, and more.

Students can join IRE (the parent organization of NICAR) for only $25/year, which gives you access to all kinds of resources and a very helpful Listserv where professional journalists post questions and answers daily about journalism code.

If you would like to apply to attend this amazing conference, please fill and submit this form. Current UF students only!

Updates for SQL + PHP in this course

This course is divided into three parts: databases with PHP, web scraping with Python, and using a web framework, Flask (also Python). I started teaching this course in January 2016. Of course, I had done a ton of research before starting, sitting down with and emailing numerous journalism practitioners of code.

After the second offering of this course concluded in April 2017, I resolved to take time during the summer to revisit, re-evaluate, and revise as needed.

I started with the intro-to-Python materials, which don’t come into the course until Week 5. I very much wanted to replace Learn Python the Hard Way as the base for learning Python. I tried using Head First Python in class, but it was a disaster. It’s just not a good book for beginners. Luckily, I discovered Automate the Boring Stuff with Python, and I couldn’t be happier! I’ll write more about that in a future post.

Much as I would have liked to just work on Python all summer, I had to go back to the first segment, databases with PHP, and get that organized.

Learning objectives

Inspired by a couple of articles by my friend Erika Biga Lee, I wanted to specify what I wanted the students to be able to do as a result of this four-week segment. The use of learning outcomes in teaching is intended to focus the instruction, assignments, and tests or quizzes on what you actually want the students to learn. This means not a big, vague idea like “I want them to be able to make web databases for journalism projects” (although that is the broad, final outcome I want) but rather the individual components that lead to that in the end.

Here are the atomic-level learning objectives for weeks 1–4. Links are included.

SQL

SQL (Structured Query Language) is a language for getting answers from your database. It’s an elegantly simple, rigid language with the power to extract very precise results across multiple tables. Since I’m not teaching data analysis in this course, I really just need the students to understand how to insert a new record and select records based on ID. However, I want them also to get a glimpse of the full power of SQL in case they do go on to need more sophisticated queries in the future. (Check out this 2-minute video.)

Databases

Most of the students have used a spreadsheet, but pretty much no one has seen database structure before. Database architecture is important to all kinds of journalism nowadays, so I want my students to have hands-on experience building a database from scratch. I want them to understand how field names are used as variables to search for and extract data from a table. I want them to be able to organize data in a table. I want them to have a glimpse of multiple tables interacting in one database, but I don’t need them to understand joins.

To make all this less abstract, I want them to have visual access to the database and its tables. I want them to see the grid as columns and rows and recognize how it differs from a spreadsheet.

I’m aware of the whole NoSQL movement, but since SQL queries are so useful in data journalism, I really want the students to get their feet wet with SQL.

XAMPP

My broad aims for SQL and databases led me to XAMPP, which has the advantages of being free and working on Windows as well as Mac OS. It very painlessly enables students to install a fully functional web server on their own laptops, and the server is already configured with a MySQL database server and PHP, a general-purpose scripting language that’s available by default on almost every web server in the world.

PHP

I have no great love for PHP, and in fact, I spent some time this summer trying to come up with a reasonable way to eliminate it from this course. However, in the end, I kept it. In spite of its age, it still offers the easiest way to whip up a script to process a web form (on most web servers) and return feedback to the user. I’ve Googled a lot and decided it’s still relevant. Plus, it’s already set up and ready to roll as soon as you’ve installed XAMPP — which is as clean an install as you could ever hope to find.

HTML and CSS

My students already spent a full semester in a course about front-end web development, but they are challenged in this SQL-PHP-database segment to review and revisit things they probably have not mastered, including properly constructed HTML forms, and HTML tables. At this point, though, they should not need me to hold their hand and make detailed instruction sheets about how to use form and table elements in their markup.

In that previous course, they already set up their own domain at a shared-hosting company, Reclaim Hosting (which is great for students and educators). Since Reclaim servers have the same technology stack as XAMPP, we don’t have much trouble transferring our database apps from our laptops to the live web.

Projects

Students complete two projects in this segment of the course. The first is a shoutbox, for which I give them all the code. They have to set up XAMPP and build one database table. They have to get it working on XAMPP, and then for a grade they have to move it to their own hosted live site.

A big part of what I needed to revise in this part of the course was how to teach about the jQuery Ajax code used in both projects. Last spring, I realized students were really unclear on the role played by this code in the shoutbox project, so I spent a lot of time Googling and thinking about how to fit that in. I ended up spending about two weeks on Progressive Web Apps as a result of that, but in the end I decided to stick with jQuery Ajax for now, and I created a new slide deck to help clarify it.

The second project is a build-from-scratch database app — although I’ve given them all the PHP code they need to build it. It was important to ensure that students understand very well the dangers posed by exposing any SQL database to attacks by malicious hackers. Even though with a journalism project we’re not exposing sensitive data (unlike banks, and any site that stores your credit card information), if you’re careless with your PHP commands for SQL, someone could delete your entire database when they submit a web form.

This led to another major revision of the materials I had developed for this course. As I reviewed them and slowly worked out my atomic-level learning objectives for weeks 1–4, I realized that my GitHub repo included too many files, and my explanations of the code were too long and too mixed up — a result of writing the detailed instructions as I was deciding what to teach and learning new things myself. So I had to thoroughly rewrite two handouts, separating out the explanation of using SQL prepared statements to prevent injection attacks from the explanation of how the files in the app interact with one another (including even more about jQuery Ajax).

You can find all my materials linked in that learning objectives document.

Feel free to copy and use anything here in your own journalism courses. Commercial and for-profit use of my materials is not granted or authorized.

If you know of ways I can improve anything here, please contact me through Twitter (@macloo) or Gmail.

Final projects 2017

The final apps students completed at the end of the semester (in alphabetical order by last name).

TruthRepo: Topical Bible verses and faith-based quotes. Search the MySQL database by topic; add your own quotes or verses to the database. Flask app. (Hailey)

Deported: Data from the top six states for deportations in the United States. View interactive Google Maps within a Flask app. (Gabrielle)

Florida Congressional Representatives: All 27 of the state’s members of the U.S. House of Representatives, with demographic info about their districts — and their PolitiFact Truth-O-Meter scores. Flask app with Python datasets. (Nicole)

Experience Fashion: A multimedia experience with images, sounds, and data about 469 fashion designers, with links to buy their creations. Flask app with Python dataset. (Caroline)

Florida’s Death Row: See and learn about every inmate. Flask app with Python dataset. (Caitlin) Link updated 1/2/2018.

University Explorer: An Alexa Skill enabling you to ask Alexa questions about more than 7,000 U.S. schools. Flask app with Python dataset. (Ryan)

Professionalizing your GitHub repo

Now that you know how most code professionals use GitHub, you probably realize that people are going to look at your GitHub repos when they are thinking about whether to hire you or offer you an internship.

Here’s how to make any GitHub repo more professional:

github

  • Write and add a proper README.md file. Use GitHub’s flavor of Markdown (slight differences from generic Markdown). This is super easy. The purpose of the README is to explain what’s in your repo, what it does, and how to use it. Make sure to use headings intelligently — it makes the document so much more readable.
  • Last but far from least, name your repo appropriately. Short and descriptive is the way to go. Notice that most repo names are all lowercase, with hyphens. The repo name is the same as the folder name on your computer. (If you rename an existing repo on GitHub, your GitHub Desktop app will certainly get confused!)

Database terminology (and Python)

What is an ORM? ORM stands for object-relational mapper, or object-relational mapping tool. It’s a translational layer of code between your script and the database, and it converts the database data into objects your code can use easily. Think of the database as a collection of items that are encoded into a complex structure, like a maze with secret doors and hidden passages. An ORM translates the structure of the maze into a parallel structure that is usable by the language of your script or program — in our case, Python.

SQLAlchemy is an ORM for Python, but it’s not the only one.

Peewee is another ORM for Python.

When using an ORM, you need to know what kind of DBMS (or RDBMS)  you are using. DBMS stands for database management system, and that means the software that runs your database. The database itself is the collection of data. The common SQL DBMSes are SQLite, MySQL and PostgreSQL. Although SQLite is a DBMS, it’s quite different from others in that it is self-contained and does not use a server.

These various DBMSes are not interchangeable, and therefore, you need to know how to make your chosen ORM communicate with the DBMS you are using.

Python ORMs are discussed in detail — but quite clearly — here. That page includes links to many resources.