CS21: Lab 8

Run update21 to create the cs21/labs/08 directory (and some additional files). Then cd into your cs21/labs/08 directory and create the Python programs for lab 8 in this directory.

Overview

For this lab, you will write a program to explore a collection of Twitter feeds. That program, which we will put in a file called twitex.py, will read a prepared database of tweets from a text file. It will then give the user the opportunity to perform sorting and filtering operations on those tweets and then display them.

Here's an example of how the program might run. The parenthesized notes that appear between prompts are annotations and are not part of the program output.

$ python twitex.py
Welcome to TwitEx!

Valid commands in this program are: 
  [s]ort
  [f]ilter
  [d]isplay
  [q]uit
You can use a command by typing its first letter.

What now ([s]ort,[f]ilter,[d]isplay,[q]uit)? f
Filter by what field ([n]ame,[s]creenname,[c]ontent)? s
Filter using what string? @swarthmore
Filtering complete; there are 137 tweet(s) that match that filter.
What now ([s]ort,[f]ilter,[d]isplay,[q]uit)? s
Sort by what field ([s]creenname,[n]ame,[t]ime,[r]etweets)? t
Sorted tweets by time.
What now ([s]ort,[f]ilter,[d]isplay,[q]uit)? d
How many? 3
Swarthmore College (@swarthmore) at 2015-11-06 14:01:04 EST (0 retweets):
    Will Indonesia continue as a highly multilingual society or move toward monolingualism? Linguist Abby Cohn lectures at 4:15pm in Trotter 203

Swarthmore College (@swarthmore) at 2015-11-06 13:32:45 EST (0 retweets):
    Garnet Athletics Set to Battle for Conference Titles - https://t.co/9UEjgNWacQ https://t.co/1EkXDxPlrQ

Swarthmore College (@swarthmore) at 2015-11-06 11:30:30 EST (0 retweets):
    Way-Ting Chen '94 of Blue Garnet hosts a workshop on developing your social impact formula today at 1 pm in the Lang Center.

(Notice that the above tweets are sorted in order of time and are all from @swarthmore.)
What now ([s]ort,[f]ilter,[d]isplay,[q]uit)? f
Filter by what field ([n]ame,[s]creenname,[c]ontent)? c
Filter using what string? Halloween
Filtering complete; there are 2 tweet(s) that match that filter.
What now ([s]ort,[f]ilter,[d]isplay,[q]uit)? d
How many? 2
Swarthmore College (@swarthmore) at 2015-10-30 14:16:51 EDT (1 retweets):
    Happening now: @swatlibrary is hosting Halloween arts and crafts in McCabe Atrium #SwatStudentLife #Swatoberfest https://t.co/HHbNwxDlRe

Swarthmore College (@swarthmore) at 2015-10-29 15:30:31 EDT (0 retweets):
    Follow the Pumpkin trail from Parrish Hall to celebrate Halloween with the dean! 4-7 pm #swatstudentlife

(These tweets are also from @swarthmore; filtering again further reduces the list of tweets, so these are all @swarthmore tweets containing "Halloween".)
What now ([s]ort,[f]ilter,[d]isplay,[q]uit)? q
Goodbye!

Here are some more examples of execution.
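The display format in the sample run above can be sketched as a small helper. This is only an illustration: the name format_tweet is hypothetical, and it assumes each tweet is stored as a five-item list (name, screenname, time, retweet count as an integer, content), as suggested under "Getting Started" below.

```python
def format_tweet(tweet):
    """Build the two-line display text for one tweet.

    Assumes tweet is [name, screenname, time, retweets, content],
    with retweets stored as an int.
    """
    name, screenname, time, retweets, content = tweet
    header = "%s (%s) at %s (%d retweets):" % (name, screenname, time, retweets)
    # The content goes on its own line, indented four spaces as in the sample run.
    return header + "\n    " + content + "\n"


print(format_tweet(["Big Ben", "@big_ben_clock", "2015-10-29 08:00:02 EDT", 4, "BONG"]))
```

Returning a string rather than printing inside the helper keeps it easy to test.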

Data File

You have been provided a data file containing some pre-fetched tweets; that file is named /usr/local/doc/twitter-data.txt. Here are a select few lines of that file:

science|||@science|||2015-09-29 23:13:05 EDT|||21|||Hubble revisits the beautiful Veil Nebula http://t.co/4HyactBrPA
Wikipedia|||@Wikipedia|||2015-10-14 11:44:08 EDT|||0|||@Wikisteff Thanks, Steffen!
@RioGrandeGames|||@RioGrandeGames|||2014-01-17 21:58:37 EST|||1|||I am absolutely in love with Dominion.
NAACP|||@NAACP|||2015-11-02 09:21:52 EST|||23|||.@POTUS bans the box https://t.co/Q5TrSfVXbq via @MSNBC
NPR|||@NPR|||2015-11-04 17:04:59 EST|||91|||What happens when physicists say #YOLO. https://t.co/YInJz5PJ3U

The file contains almost four thousand tweets.

More precisely, the data file is organized in the following manner:

Each line represents exactly one tweet.
Each tweet is stored as five pieces of data, separated by the string "|||".
The data for a tweet are, in order:
1. Twitter username
2. Twitter screenname
3. Time of post
4. Number of retweets
5. The content itself

Requirements

Your program must perform the following tasks:

Read all of the tweets from a data file.
Accept commands from the user:
- Filtering commands: You must be able to allow the user to work with just tweets with a particular username, screenname, or content. If the user runs more than one filter, all of the filters should apply. (That is, filtering to screenname "@NPR" and content "festival" should give only posts from @NPR that include the word "festival".) Filtering should be partial; filtering screenname to "@NPR", for instance, would keep tweets from "@NPRFan" and filtering content to "#Downton" would keep all of the tweets using that hashtag.
- Sorting commands: You must be able to sort the list of tweets by username, screenname, time of post, or retweet count. Sorting by username or screenname should be alphabetic. Sorting by time should give the most recent post first. Sorting by retweet count should give the highest number first.
- Printing: Once a user has filtered and sorted the tweets, you should be able to show the tweets that meet those criteria (up to a specified limit).
- Usability: The user should be able to quit at any point where he or she could enter a command (for example, a printing command).
Handle bad input. Your program should *never* crash with an exception, no matter what text the user types.
Do its own sorting. You may not use the list.sort() method.
Do the sorting, filtering, and printing in functions other than main().
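The partial-match filtering described above might be sketched as follows. The function name filter_tweets and the field-index constants are hypothetical, the tweets are assumed to be five-item lists, and matching is assumed to be case-sensitive (the requirements do not say either way).

```python
# Field positions, assuming the order used in the data file.
NAME, SCREENNAME, TIME, RETWEETS, CONTENT = 0, 1, 2, 3, 4


def filter_tweets(tweets, field, text):
    """Return a new list of the tweets whose chosen field contains text.

    The original list is not modified.
    """
    matches = []
    for tweet in tweets:
        if text in tweet[field]:  # substring test, so "@NPR" matches "@NPRFan"
            matches.append(tweet)
    return matches


sample = [
    ["NPR", "@NPR", "2015-11-04 17:04:59 EST", 91, "A festival story"],
    ["NPR Fan", "@NPRFan", "2015-11-03 10:00:00 EST", 0, "I love NPR"],
    ["science", "@science", "2015-09-29 23:13:05 EDT", 21, "Science festival"],
]
# Applying a second filter to the result of the first makes both filters apply.
kept = filter_tweets(sample, SCREENNAME, "@NPR")
kept = filter_tweets(kept, CONTENT, "festival")
```

Because the function returns a new list, the program can keep the filtered list in one variable and replace it after each filter command.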

Getting Started

Here are some tips to get you started.

Do some design. Although you are not required to submit a full top-down design of your program, you should still use what you have learned from the previous lab to help you organize your thoughts. Identify functions that you think you might need in order to perform these tasks. Write down their purposes, the parameters they need, and the return values they will produce. Also document whether they will modify their list parameters (as, for example, the shuffle function from the random library does).

Be clever if it makes sense to you. Using what you learned from class, it's possible to complete this assignment using one sorting function and one filtering function. You can also just write a different sorting function for each sorting task you need to perform; it's quite a bit more work, but it's less complex.

To sort in reverse, use your sorting function followed by list.reverse(). You're required to sort some fields (time, for example) in descending order, but you can do that by first sorting the regular way and then calling list.reverse() to flip the list around.
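As one possible illustration of the two tips above, a single selection sort can take the field position as a parameter, and list.reverse() can then turn an ascending result into a descending one. The name sort_by_field and the index constants are assumptions made for this sketch, not part of the lab's starter files.

```python
# Field positions, assuming the list-of-lists layout suggested in this lab.
NAME, SCREENNAME, TIME, RETWEETS, CONTENT = 0, 1, 2, 3, 4


def sort_by_field(tweets, field):
    """Selection sort: reorder tweets in place, ascending by tweet[field]."""
    for i in range(len(tweets) - 1):
        smallest = i
        for j in range(i + 1, len(tweets)):
            if tweets[j][field] < tweets[smallest][field]:
                smallest = j
        # Move the smallest remaining tweet into position i.
        tweets[i], tweets[smallest] = tweets[smallest], tweets[i]


sample = [
    ["NPR", "@NPR", "2015-11-04 17:04:59 EST", 91, "..."],
    ["science", "@science", "2015-09-29 23:13:05 EDT", 21, "..."],
    ["NAACP", "@NAACP", "2015-11-02 09:21:52 EST", 23, "..."],
]
sort_by_field(sample, TIME)  # oldest first
sample.reverse()             # most recent first
```

Note that comparing strings with < is case-sensitive, so uppercase names sort before lowercase ones unless you choose to handle that yourself.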

Use str.split(...) to read each line. For example, consider:

>>> s = 'Big Ben|||@big_ben_clock|||2015-10-29 08:00:02 EDT|||4|||BONG'
>>> lst = s.split('|||')
>>> lst
['Big Ben', '@big_ben_clock', '2015-10-29 08:00:02 EDT', '4', 'BONG']

Keep your data in a convenient format within your program. When you read the data file at the start of the program, it will be helpful to store the tweets as a list of lists. For instance, your list might look like this:

[ ["NAACP", "@NAACP", "2015-10-19 14:30:57 EDT", 4, ...],
  ["Team Fortress 2", "@TeamFortress", "2011-12-21 20:20:29 EST", 6, ...],
  ... ]

Notice that we store the dates as strings but we store the retweet counts as integers. This is helpful because, when we sort:

We want to sort the retweet count as numbers, not strings.
We can sort the dates as strings. (We have chosen a date format in which an alphabetically greater string is also a chronologically greater string; that is, if time1 < time2, then time1 is before time2.)
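Reading the file into that list-of-lists format could look roughly like the sketch below. The function names parse_tweet and read_tweets are hypothetical, and the choice to skip malformed lines silently is an assumption; your program may handle them differently.

```python
def parse_tweet(line):
    """Convert one line of the data file into a five-item list.

    Returns None if the line has fewer than five fields or if the
    retweet count is not an integer.
    """
    # Split at most four times so a "|||" inside the content is left alone.
    fields = line.rstrip("\n").split("|||", 4)
    if len(fields) != 5:
        return None
    try:
        fields[3] = int(fields[3])  # store the retweet count as a number
    except ValueError:
        return None
    return fields


def read_tweets(filename):
    """Read every well-formed tweet from filename into a list of lists."""
    tweets = []
    with open(filename, "r") as infile:
        for line in infile:
            tweet = parse_tweet(line)
            if tweet is not None:
                tweets.append(tweet)
    return tweets
```

A call such as read_tweets("/usr/local/doc/twitter-data.txt") would then produce the list used by the rest of the program.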

It's okay to handle the commands in main(). You may wish to handle input in its own function or you may wish to deal with it in main(). It's okay if your main() function handles the input; just make sure you write functions for your sorting, filtering, and so on.
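Wherever the input is handled, it has to survive bad text. One way to keep a prompt like "How many?" from crashing is to validate the typed text in a small helper, sketched here. The name parse_positive_int is hypothetical, and treating zero and negative numbers as invalid is an assumption.

```python
def parse_positive_int(text):
    """Return text as an int if it is a whole number >= 1, otherwise None."""
    try:
        value = int(text.strip())
    except ValueError:
        # The text was not a whole number (for example "three" or "2.5").
        return None
    if value < 1:
        return None
    return value
```

The calling code can keep asking until the helper returns something other than None, and then show at most that many tweets.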

Don't worry much about speed. You can use O(n²) sorting algorithms like bubble sort or selection sort on this data.

Write tests! You will write fairly complex functions which sort or filter lists of lists in different ways. It's hard to be sure you know what's happening in all of them; it's easier to be confident in your code if you've written some tests. For instance, the following might be a good test for a sorting function:

def test_my_name_sort():
  lst = [["B","@B","TIME",0,"Post"],["A","@A","TIME",0,"Post"],["C","@C","TIME",0,"Post"]]
  sort_by_name(lst)
  expected = [["A","@A","TIME",0,"Post"],["B","@B","TIME",0,"Post"],["C","@C","TIME",0,"Post"]]
  # TODO: now verify that lst == expected.  (Your approach may vary depending on instructor.)
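If your instructor has not prescribed a testing approach, one simple way to finish the test above is a plain assert statement. In the sketch below, sort_by_name is only a stand-in (a small bubble sort on the name field) so that the example runs on its own; replace it with your own function.

```python
def sort_by_name(lst):
    """Stand-in for your own function: in-place bubble sort on the name field."""
    for end in range(len(lst) - 1, 0, -1):
        for i in range(end):
            if lst[i][0] > lst[i + 1][0]:
                lst[i], lst[i + 1] = lst[i + 1], lst[i]


def test_my_name_sort():
    lst = [["B", "@B", "TIME", 0, "Post"], ["A", "@A", "TIME", 0, "Post"], ["C", "@C", "TIME", 0, "Post"]]
    sort_by_name(lst)
    expected = [["A", "@A", "TIME", 0, "Post"], ["B", "@B", "TIME", 0, "Post"], ["C", "@C", "TIME", 0, "Post"]]
    # assert stops the program with an error message if the lists differ.
    assert lst == expected, "sort_by_name produced %s" % lst


test_my_name_sort()
```

If the assertion fails, the message shows the list your function actually produced, which usually points to the mistake.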

A Bit of Fun

This section is some optional entertainment and is not necessary for the lab.

We have written a program to help you customize your Lab 08 experience by including data from Twitter accounts that you might be interested in. You may start by copying the Twitter data from its system-wide location to your own directory like so:

cp /usr/local/doc/twitter-data.txt ./twitter-data.txt

Once you have done this, you may add to that file by running our data file updating program like so:

python /usr/local/lib/twitter/twitter_getter.py ProBirdRights

In the above example, we are adding tweets from the user with the screenname @ProBirdRights. (Note that we left off the @ above.) This program will add tweets to the file in your current folder; make sure you're in your lab directory before you run it.

After you are done, you will (of course) need to make sure your program reads tweets from ./twitter-data.txt rather than /usr/local/doc/twitter-data.txt in order to see your new tweets.

Please note: do not overuse this program. The Twitter API will only support a few hundred calls per hour from an account and this program uses the credentials for a single Twitter account to do its work. These operations are shared among all CS21 students. For your own purposes, please only run the script a couple times to add some Twitter feeds you know about; do not try to create a massive file of several hundred Twitter accounts. If you receive errors when running this script, don't worry; you don't need to use it at all to complete this assignment.

Submit

Once you are satisfied with your program, hand it in by typing handin21 in a terminal window.

CS21 Lab 8: Twitter Explorer