One performance issue is where to store a large database. It is too big to store in your home directory (and accesses to it there would be too slow anyway).
There are a couple of other options for storing files on our system, but not all are equal:
Although /scratch sounds great, DO NOT use /scratch for this lab. Here is the problem: /scratch is network space, and creating a database requires frequently transferring large amounts of data over the network. This is slow.
Here are some suggestions (I suggest doing #4 and either #1 or #2, depending on which part of the lab you are currently working on):
$ python createDB.py /local/me_and_pal/movie.db
First you should create a subdirectory and set acls for you and your partner to access it:
$ mkdir /local/me_and_pal
$ easyfacl   # and follow prompts to enter user names and directory name
some information about acls and permissions
PROS: this reduces run time about 20-fold (33 minutes down to 90
seconds for my Python program).
CON #1: /local is the hard drive of a particular machine. If you log in to
a different machine, you can't get to the data. The workaround is to move
the file after creation, which takes a few seconds. While annoying, you only need
to do this once, after you get createDB.py working:
$ mv /local/me_and_pal/movie.db /scratch/me/movie.db
You can also use scp to copy from one machine's /local to another:
# from cumin, copy movie.db in paprika's /local into cumin's /local
[cumin] $ scp newhall@paprika:/local/me_and_pal/movie.db /local/me_and_pal/.
CON #2: if you are debugging, you may accidentally leave big files scattered across machines in the CS department. Be sure to clean up /local if your file creation doesn't finish completely.
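The pattern described in CON #1 and CON #2 (create the file on local disk, move it once when done, and remove it if creation fails) could be sketched in Python roughly as follows. This is only an illustration: build_then_move and its fill callback are hypothetical names, not part of the lab's createDB.py, and the two directory arguments stand in for places like /local/me_and_pal and /scratch/me.

```python
import os
import shutil
import sqlite3


def build_then_move(local_dir, final_dir, fill, name="movie.db"):
    """Create the database in local_dir, then move it to final_dir.

    `fill` is a hypothetical callback that takes an open sqlite3
    connection and creates and populates the tables.
    """
    local_path = os.path.join(local_dir, name)
    final_path = os.path.join(final_dir, name)
    connection = sqlite3.connect(local_path)
    try:
        fill(connection)
        connection.commit()
    except Exception:
        # Creation did not finish: remove the partial database file
        # so it is not left behind on the local disk.
        connection.close()
        if os.path.exists(local_path):
            os.remove(local_path)
        raise
    connection.close()
    # One-time move to the final location (the `mv` step).
    shutil.move(local_path, final_path)
    return final_path
```

A call might look like build_then_move("/local/me_and_pal", "/scratch/me", my_fill_function), where my_fill_function is whatever code populates your tables.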
connection = sqlite3.connect(":memory:")
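The connect(":memory:") call above keeps the whole database in RAM, so nothing touches a disk until you explicitly save it. Here is a minimal sketch of one way to build in memory and then write the result to a file, assuming Python 3.7 or later (for Connection.backup). The function name, table, and row are made up for illustration and are not the lab's schema.

```python
import sqlite3


def build_in_memory_then_save(out_path):
    # Build the database entirely in RAM.
    mem = sqlite3.connect(":memory:")
    mem.execute("CREATE TABLE movie (id INTEGER PRIMARY KEY, title TEXT)")
    mem.execute("INSERT INTO movie (title) VALUES (?)", ("Example Title",))
    mem.commit()

    # Copy the finished in-memory database to a file in one pass.
    disk = sqlite3.connect(out_path)
    mem.backup(disk)
    disk.close()
    mem.close()
```

This only works if the entire database fits in memory on the machine you are using.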