Lab 4: Tips for Performance
This page may be updated with more tips as we go, so refer back to this page.
One problem related to performance is related to where to store a large
database. You cannot store it in your home directory due to its size
being too big (and accesses to it would be too slow here as well). You may have
used /scratch/
in other courses, but reading large files over the network is slow.
Here are some suggestions (I recommend doing #4 and #2. Once you have created your db, switch to strategy #1):
-
Save to local space (some information about using /local and /scratch):
$ python createDB.py /local/me_and_pal/movie.db
First you should create a subdirectory and set acls for you and your partner to access it:
mkdir /local/me_and_pal easyfacl # and follow prompts to enter user names and directory name
some information about acls and permisions
PRO: this reduces run time about 20-fold (33 minutes down to 90 seconds for my python program)
CON #1: local is the hard drive for a particular machine. If you log in to a different machine, you can’t get to the data. The work around is to remember your machine and ssh to it every time.
Or, move the file after creation which takes a few seconds. While annoying, you only need to do this once after you getcreateDB.py
working:$ mv /local/me_and_pal/movie.db /scratch/me/movie.db
You can also use scp to copy from one machine’s
/local
to another:# from cumin, cp movie.db in paprika's /local into cumin's /local [cumin] $ scp soni@paprika:/local/me_and_pal/movie.db/ /local/me_and_pal/ /local/me_and_pal/
CON #2: if you are debugging you may accidentally leave some big files all over machines in the CS department. Be sure to clean up the
/local
disk if your file creation doesn’t finish completely -
Use an in-memory database just for debugging, then write to disk once you have
createDB.py
working. This gives the same speed-ups as #1 without leaving files on a bunch of disks. To use in memory, create a DB connection as follows:connection = sqlite3.connect(":memory:")
- If you really want to get fancy, look into using transactions
(changed isolation level to
"DEFERRED"
, wrap"BEGIN TRANSACTION"
and acommit()
statement around your inserts). Also, execute"PRAGMA synchronous=OFF"
to reduce the concurrency frequency. These gave me 2x speed-up. - insert values for a table using
executeMany
. There are examples of this on the links provided on the lab write-up.