CS44: Lab 6

This page may be updated with more tips as we go, so check back.

One performance problem is where to store a large database. You cannot store it in your home directory because it is too big (and accesses to it there would be too slow as well).

There are a couple of other options for storing files on our system, but they are not all equal:

Store it in a subdirectory in /local/. /local is a disk partition that is local to a particular machine. You can only access it on the particular machine; each machine has its own /local file system and contents.
Store it in a subdirectory in /scratch/. /scratch is an NFS (Network File System) partition that is hosted on the CS network file server. /scratch can be accessed from any lab machine; there is a single /scratch partition shared by all machines on our network, so its contents are the same on every machine.
Although /scratch sounds great, DO NOT use /scratch for this lab. Here is the problem: /scratch is network space, and creating a database requires frequently transferring large amounts of data over the network. This is slow.

Here are some suggestions (I suggest doing #4 and either #1 or #2, depending on which part of the lab you are currently working on):

1. Save to local space (some information about using /local and /scratch):
```
$ python createDB.py /local/me_and_pal/movie.db
```
First you should create a subdirectory and set ACLs so that you and your partner can access it:
```
mkdir /local/me_and_pal
easyfacl   # and follow prompts to enter user names and directory name
```
Some information about ACLs and permissions.
PRO: this reduces run time about 20-fold (from 33 minutes down to 90 seconds for my Python program).
CON #1: /local is the hard drive of a particular machine. If you log in to a different machine, you can't get to the data. The workaround is to move the file after creation, which takes a few seconds. While annoying, you only need to do this once, after you get createDB.py working:
```
$ mv /local/me_and_pal/movie.db  /scratch/me/movie.db
```
You can also use scp to copy from one machine's /local to another:
```
# from cumin, copy movie.db from paprika's /local into cumin's /local
[cumin] $ scp newhall@paprika:/local/me_and_pal/movie.db /local/me_and_pal/
```
CON #2: if you are debugging, you may accidentally leave big files scattered across machines in the CS department. Be sure to clean up the /local disk if your file creation doesn't finish completely.

2. Use an in-memory database just for debugging, then write to disk once you have createDB.py working. This gives the same speed-up as #1 without leaving files on a bunch of disks. To use an in-memory database, create the DB connection as follows:
```
import sqlite3
connection = sqlite3.connect(":memory:")
```
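Once the in-memory version works, the contents still have to reach a file. One possible way to do that, sketched below, is `Connection.backup`, which assumes Python 3.7 or later. The `Movie` table, its columns, and the temporary directory (standing in for your /local subdirectory) are hypothetical and only there to make the example runnable.

```python
import os
import sqlite3
import tempfile

# Build the database in memory first (fast, leaves no files behind).
mem_conn = sqlite3.connect(":memory:")
mem_conn.execute("CREATE TABLE Movie (id INTEGER PRIMARY KEY, title TEXT)")  # hypothetical schema
mem_conn.execute("INSERT INTO Movie VALUES (1, 'Example Title')")
mem_conn.commit()

# Once everything works, copy the in-memory database to a file.
# A temporary directory stands in for /local/me_and_pal here.
db_path = os.path.join(tempfile.mkdtemp(), "movie.db")
disk_conn = sqlite3.connect(db_path)
mem_conn.backup(disk_conn)  # Connection.backup requires Python 3.7+
mem_conn.close()

# Check that the row made it into the on-disk copy.
row_count = disk_conn.execute("SELECT COUNT(*) FROM Movie").fetchone()[0]
disk_conn.close()
```

If your Python version is older than 3.7, simply rerunning createDB.py with a file path instead of ":memory:" achieves the same result.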

3. If you really want to get fancy, look into using transactions (change the isolation level to "DEFERRED", and wrap "BEGIN TRANSACTION" and a commit() call around your inserts). Also, execute "PRAGMA synchronous=OFF" so that SQLite does not wait for each write to be flushed to disk. These gave me a 2x speed-up.
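A minimal sketch of the transaction approach follows. The `Actor` table and the generated rows are hypothetical, and an in-memory database is used only so the example runs anywhere; with a real file you would pass its path to `connect`. The pragma is issued before the transaction starts, since SQLite does not let you change the synchronous setting inside an open transaction.

```python
import sqlite3

# Ask the sqlite3 module to use deferred transactions.
conn = sqlite3.connect(":memory:", isolation_level="DEFERRED")

# Run the pragma before any transaction is open.
conn.execute("PRAGMA synchronous=OFF")
conn.execute("CREATE TABLE Actor (id INTEGER PRIMARY KEY, name TEXT)")  # hypothetical table

# Group all inserts into one transaction instead of committing each one.
conn.execute("BEGIN TRANSACTION")
for actor_id in range(1000):
    conn.execute("INSERT INTO Actor VALUES (?, ?)", (actor_id, "actor%d" % actor_id))
conn.commit()

actor_count = conn.execute("SELECT COUNT(*) FROM Actor").fetchone()[0]
conn.close()
```

The gain comes from committing once at the end rather than after every insert; how large it is will depend on your data and machine.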

4. Insert values for a table using executemany. There are examples of this in the links provided in the lab write-up.
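As a rough illustration of the executemany approach, the sketch below inserts a list of tuples with a single call. The `Movie` schema and the sample rows are made up for the example; your tables and parsing code will differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Movie (id INTEGER PRIMARY KEY, title TEXT, year INTEGER)")  # hypothetical schema

# Collect the rows first (for example, while parsing the input file) ...
rows = [
    (1, "First Movie", 1999),
    (2, "Second Movie", 2005),
    (3, "Third Movie", 2012),
]

# ... then insert them all with one executemany call.
conn.executemany("INSERT INTO Movie VALUES (?, ?, ?)", rows)
conn.commit()

titles = [row[0] for row in conn.execute("SELECT title FROM Movie ORDER BY id")]
conn.close()
```

The `?` placeholders let sqlite3 handle quoting, so titles containing apostrophes do not break the insert statement.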
