Today we are going to continue working on Lab 5: Movie Database. Be sure to examine MovieDB Introduction for tips on speeding up your program and on helpful tips.
It may be helpful to refer to past information about SQLite:
as well as these resources:
1. Saving Query Output as a Relational Table
For each of the required queries, there exists (at least one) single query solution. However, as computer scientists, we learn that it is good to break a hard problem into multiple parts. For some of the more difficult queries, it may be useful to break the problem into multiple parts and solve the smaller subproblems. One useful strategy is to write multiple queries and use the output of one query as an input to the next (i.e., compose your queries). Below, we show an example of this can be done.
Let’s say we want to satisfy the following query: "List all actors who have been cast in at more than 4 Quentin Tarantino films." Take a minute and think about how you might solve this problem before moving on.
Okay, here is one potential solution where we break the query into two parts. First, we want to identify all of the movies that Quentin Tarantino has directed:
queryPart1 = """
SELECT DM.movieID
FROM DirectsMovie DM, Director D
WHERE DM.directorID = D.id AND D.lname = 'Tarantino' and D.fname = 'Quentin'
"""
We join the DirectsMovie
and Director
tables and only keep films where the
director is named "Quentin Tarantino". Now we’d like to use this as the input
to the second part of our solution: "find the actors that were in those movies".
First, we need to save the results from part 1. We can do this using the
CREATE TABLE X AS
command which defines a table based on a query result:
queryPart1 = """
CREATE TABLE TarantinoFilms AS
SELECT DM.movieID
FROM DirectsMovie DM, Director D
WHERE DM.directorID = D.id AND D.lname = 'Tarantino' and D.fname = 'Quentin'
"""
db.execute(queryPart1) #at this point, you should check the table in sqlite3
2. Using the Saved Query Output
You should load up sqlite3
and verify that the table exists and the results
look reasonable (here is his IMDB page. Note that our dataset is a bit outdated so newer movies won’t show up
in your query result).
Now, we can use TarantinoFilms
just like any other relation in our database.
Here is how we can find all actors who have been in one of the
Tarantino films from Part 1:
queryPart2 = """
SELECT A.id, A.fname, A.lname, COUNT(*)
FROM TarantinoFilms T, Casts C, Actor A
WHERE T.movieID = C.movieID and A.id = C.actorID
GROUP BY A.id
HAVING COUNT(*)>3
"""
We join Casts
with Actors
and TarantinoFilms
which gives us information
about all Actors who have starred in a Tarantino movie. Then, we add a
GROUP BY
and HAVING
command to group all of the instances for each actor
and only keep actors with more than 3 instances. We can now execute, fetch
results, and print results for this query.
id fname lname COUNT(*) -------------------------------------------------------------------------------- 51688 Michael Bacall 4 82772 Lawrence Bender 5 496647 Samuel L. Jackson 4 647382 Michael Madsen 4 806512 Michael Parks 4 840200 Stevo Polyi 4 1038507 Quentin Tarantino 8 1141844 Bruce Willis 4 1343554 Julie Dreyfus 4 1794091 Uma Thurman 4 1810514 Venessia Valentino 4
3. Cleaning Up Temporary Tables
At this point, you’ve done the hard work. But you will need to drop the temporary table so it doesn’t linger in the database (the DB should be in the same state as it was before you ran the query).
After your query is done, you can run a DROP TABLE
command
db.execute("DROP TABLE TarantinoFilm")
Also, since your query may crash during development, you will want to
preemptively drop any tables so that you don’t have to recreate the whole
database. Add IF EXISTS
to
drop any tables that need to be cleaned up before starting your query:
db.execute("DROP TABLE IF EXISTS TarantinoFilm")
4. Result
Bringing it all together, this is the complete solution:
db.execute("DROP TABLE IF EXISTS TarantinoFilm") # in case your program crashed before cleaning this up last time
queryPart1 = """CREATE TABLE TarantinoFilms AS
SELECT DM.movieID
FROM DirectsMovie DM, Director D
WHERE DM.directorID = D.id AND D.lname = 'Tarantino' and D.fname = 'Quentin'
"""
queryPart2 = """
SELECT A.id, A.fname, A.lname, COUNT(*)
FROM TarantinoFilms T, Casts C, Actor A
WHERE T.movieID = C.movieID and A.id = C.actorID
GROUP BY A.id
HAVING COUNT(*)>3
"""
db.execute(queryPart1) #create the table
db.execute(queryPart2) #process the query
# todo, fetch and print results. This will only output results from the most
# recent call to db.execute()
db.execute("DROP TABLE TarantinoFilm") # delete the temporary table