Make sure all programs are saved to your cs21/labs/09 directory. Files outside this directory will not be graded.
$ update21
$ cd ~/cs21/labs/09
This is a two-part assignment:
Sort Visualization - In visualize_sort.py, you will write a sorting algorithm with the goal of visualizing how the algorithm sorts a list of numbers.
Analyzing YouTube Data - In youtube.py, you will write a program that analyzes data about YouTube channels and prints out the channels that have the most subscribers, uploads, and total video views.
Note: Feel free to use whatever sorting algorithm you want for each part of the assignment; however, you must use both SelectionSort and BubbleSort at least once each.
Part 1 of this assignment is more open-ended than our typical lab programs. Your task is to write a program, visual_sort.py, that displays a visualization of how one particular sorting algorithm works. This visualization can work by printing strings to the terminal or, for an extra challenge, by creating an animation using Zelle graphics. Your program should allow its users to see how a particular sorting algorithm changes a list from unsorted to sorted, one swap at a time.
main()
Here are three examples of visualizing selection sort:
We would like to see what you can come up with. Note that the graphics example is not the expectation - this is more of a challenge/extension if you are interested. The goal is to help your users understand the inner workings of your chosen algorithm, however you see fit. Along the way, hopefully you will also deepen your own understanding.
TIP: It is a good idea to pause your program after each major step. In the graphics example above, you can use getMouse() or getKey() to make the program wait until the user is ready to proceed. For the command-line versions, you can use input(), e.g.,
#output some result
input("Hit Enter to Continue...")
#resume execution of program
It doesn't matter what the user inputs, so we did not bother to save the value in a variable.
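As one possible starting point, here is a minimal sketch of a terminal visualization of selection sort. The function names format_state and selection_sort_steps are hypothetical (they are not part of any starter code), and marking the two swapped positions with brackets is just one design choice among many. The sketch records one snapshot per swap; the input() pause is left commented out so the example runs without waiting for the user.

```python
def format_state(values, i, j):
    """Return the list as a string, with [brackets] around positions i and j."""
    parts = []
    for k in range(len(values)):
        if k == i or k == j:
            parts.append("[%d]" % values[k])
        else:
            parts.append(" %d " % values[k])
    return "".join(parts)


def selection_sort_steps(values):
    """Sort values in place (ascending); return one snapshot string per swap."""
    snapshots = []
    n = len(values)
    for i in range(n - 1):
        # find the index of the smallest item in values[i:]
        smallest = i
        for j in range(i + 1, n):
            if values[j] < values[smallest]:
                smallest = j
        if smallest != i:
            values[i], values[smallest] = values[smallest], values[i]
            snapshots.append(format_state(values, i, smallest))
    return snapshots


def main():
    data = [5, 2, 9, 1]
    for line in selection_sort_steps(data):
        print(line)
        # input("Hit Enter to Continue...")  # uncomment to pause after each swap
    print("sorted:", data)


main()
```

Returning the snapshots instead of printing inside the sort keeps the sorting logic separate from the display, so the same steps could later drive a graphics version.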
In the second part of this lab, you will write a program to analyze YouTube data. YouTube is a video-sharing website where people can upload videos they created and/or view others' videos. Over 400 hours of content are uploaded to YouTube each minute, and each day one billion hours of content are watched. Video producers make money by having YouTube run advertisements with the videos they produce on their channel. The more subscribers a channel has and the more times consumers view a channel's videos, the more money the videos will earn.
Your task is to write a program to read in a file of data about YouTube channels and print out the top channels in terms of video uploads, total subscribers, and total views. Here is a snippet of a data file containing information about YouTube Channels:
Channel name,Video Uploads,Subscribers,Video views
10-Minute Crafts,180,346691,114871583
10-Minutes Amazing Life,193,170335,63251474
1000virtudes,59,770249,170131284
18th Asian Games 2018,306,598154,104006040
1Kilo Oficial,94,4795224,887547430
1MILLION Dance Studio,1202,11058659,2616585255
1theK (원더케이),12942,12918410,10657097331
20sarasa(にーさら),475,1525288,1225995975
20th Century Fox,1847,3113635,1773496880
...
Note: the first line is a "header line" and not actual information about a YouTube channel.
Each line of the YouTube channel file contains information about a different YouTube channel. For each channel, there are four pieces of information as follows:
Channel name: the name of the YouTube channel.
Video Uploads: the total number of YouTube video uploads from this channel.
Subscribers: how many subscribers the YouTube channel has.
Video views: the total number of times a video from this channel has been viewed.
The ytchannel library contains a class for managing the YouTube channel data:
from ytchannel import *
This library contains a single Channel class which encapsulates the information we know about a YouTube channel. You can create a new Channel object by providing a channel name, number of video uploads, number of subscribers, and number of video views. It is expected that the channel name is a string and the number of uploads, number of subscribers, and number of views are all integers. The following Channel methods allow you to access the information of a single Channel object:
getName(): return the name of the YouTube channel.
getSubscribers(): return the number of subscribers for this YouTube channel.
getUploads(): return the number of video uploads for this YouTube channel.
getViews(): return the number of video views for this YouTube channel.
As an example, here is a snippet of code that creates a Channel object and calls its methods:
>>> ch = Channel("Gritty Fans", 4000, 89000, 5235262346436)
>>> print(ch.getName())
Gritty Fans
>>> print(ch.getSubscribers())
89000
>>> print(ch.getUploads())
4000
>>> print(ch.getViews())
5235262346436
>>>
Your program should read the data file, creating a Channel object for each channel and storing these objects in a list.
The largest data file is /usr/local/doc/youtube/ytDataLarge.csv. However, you might want to test your code on smaller files: /usr/local/doc/youtube/ytDataSmall.csv or /usr/local/doc/youtube/ytDataMed.csv.
Sample output is shown below.
$ python3 youtube.py
Welcome to my YouTube Channel program!
Enter the name of the YouTube data file: /usr/local/doc/youtube/ytDataMed.csv
How many top channels do you want to see? 6
Top channels by number of uploads:
channel uploads
AP Archive 422326
Various Artists - Topic 207072
Various Artists - Topic 203934
AlHayah TV Network 129941
Ennahar tv 121387
ABP NEWS 109223
Top channels by number of subscribers:
channel subscribers
Canal KondZilla 39409726
EminemMusic 30470865
JuegaGerman 28889480
EminemVEVO 26650488
VEGETTA777 23775389
VanossGaming 23590547
Top channels by number of views:
channel views
Canal KondZilla 19291034467
ABS-CBN Entertainment 17202609850
EminemVEVO 11317532576
Maroon5VEVO 10355362290
Markiplier 10053970560
VanossGaming 9880562011
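One way to print a "top n" table like the sample output is with string formatting. The sketch below is only an illustration: the function name format_top and the column widths (30 and 15 characters) are assumptions, not requirements of the lab, and the rows are plain (name, count) pairs rather than Channel objects so that the example runs on its own.

```python
def format_top(rows, n, heading):
    """Return a list of text lines: a header plus the first n (name, count) rows.

    rows is assumed to be already sorted in descending order by count.
    """
    lines = ["%-30s %15s" % ("channel", heading)]
    for i in range(min(n, len(rows))):
        name, count = rows[i]
        # left-justify the name, right-justify the number
        lines.append("%-30s %15d" % (name, count))
    return lines


def main():
    uploads = [("AP Archive", 422326), ("ABP NEWS", 109223)]
    for line in format_top(uploads, 6, "uploads"):
        print(line)


main()
```

Using min(n, len(rows)) means a request for more channels than the file contains simply prints every channel instead of crashing.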
For each of your sorting algorithms (sort by number of uploads, views, subscribers), you must sort the Channel objects in descending order. Think about what comparisons you will need to make to sort Channel objects by, e.g., number of subscribers in descending order. What about sorting by number of video views in descending order?
No matter how you need to compare two items, the general structure of the sorting algorithm should be the same.
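To illustrate that only the comparison changes, here is a minimal sketch of bubble sort in descending order. Because the ytchannel library is only available on the lab machines, the sketch defines a stand-in Channel class that assumes the constructor order and getter names described above. The function name bubble_sort_descending and the idea of passing the getter as a parameter are hypothetical design choices; writing one sort function per field, each differing only in the comparison, works just as well.

```python
class Channel:
    """Stand-in for the ytchannel Channel class (assumed to match the
    constructor order and getter names in the lab description)."""

    def __init__(self, name, uploads, subscribers, views):
        self.name = name
        self.uploads = uploads
        self.subscribers = subscribers
        self.views = views

    def getName(self):
        return self.name

    def getUploads(self):
        return self.uploads

    def getSubscribers(self):
        return self.subscribers

    def getViews(self):
        return self.views


def bubble_sort_descending(channels, get_value):
    """Sort channels in place, largest get_value(channel) first."""
    n = len(channels)
    for end in range(n - 1, 0, -1):
        for i in range(end):
            # swap neighbors that are out of descending order
            if get_value(channels[i]) < get_value(channels[i + 1]):
                channels[i], channels[i + 1] = channels[i + 1], channels[i]


def main():
    chans = [
        Channel("10-Minute Crafts", 180, 346691, 114871583),
        Channel("1Kilo Oficial", 94, 4795224, 887547430),
        Channel("20th Century Fox", 1847, 3113635, 1773496880),
    ]
    bubble_sort_descending(chans, Channel.getSubscribers)
    for ch in chans:
        print(ch.getName(), ch.getSubscribers())
    bubble_sort_descending(chans, Channel.getViews)
    for ch in chans:
        print(ch.getName(), ch.getViews())


main()
```

The only line that depends on the field being sorted is the comparison inside the inner loop; flipping `<` to `>` would give ascending order instead.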
Unlike some of the recent lab assignments, the data file includes a single "header" line that is not an actual Channel object. You'll need to create a Channel object for every line of text that is not the first line. There are a couple of ways to do this; perhaps the most straightforward is to use a Boolean variable to keep track of whether you've processed the first line of text, and to create a Channel object only after processing the first line.
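The Boolean-variable approach might look something like the sketch below. The function name parse_channel_lines and the variable seen_header are hypothetical, and the sketch builds plain tuples from a list of strings so that it runs without the data files or the ytchannel library. It also assumes the three numeric fields are always the last three comma-separated values on a line, which is why it splits from the right.

```python
def parse_channel_lines(lines):
    """Return a list of (name, uploads, subscribers, views) tuples,
    skipping the first (header) line."""
    records = []
    seen_header = False
    for line in lines:
        if not seen_header:
            # the first line is the header; remember that we have passed it
            seen_header = True
        else:
            # split from the right so the three numeric fields come off the end
            fields = line.strip().rsplit(",", 3)
            name = fields[0]
            uploads = int(fields[1])
            subscribers = int(fields[2])
            views = int(fields[3])
            # in the lab, a Channel(name, uploads, subscribers, views) object
            # would be created here instead of a tuple
            records.append((name, uploads, subscribers, views))
    return records


def main():
    sample = [
        "Channel name,Video Uploads,Subscribers,Video views\n",
        "10-Minute Crafts,180,346691,114871583\n",
        "1Kilo Oficial,94,4795224,887547430\n",
    ]
    for record in parse_channel_lines(sample):
        print(record)


main()
```

Since looping over an open file also yields one line at a time, the same function could be given a file object in place of the list of strings.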
Once you are satisfied with your programs, fill out the questionnaire in QUESTIONS-09.txt. Then run handin21 a final time to make sure we have access to the most recent versions of your files.
The YouTube dataset comes from a publicly available data set, Top 5000 YouTube Channels Data, hosted on the data science website Kaggle.