This week's lab is broken into two segments: a set of problems you will answer, followed by a programming assignment to implement the local pairwise alignment algorithm from class.
The goal of this week's lab is to obtain an understanding of dynamic programming. You will devise solutions to general problems and analyze the efficiency gains dynamic programming provides over recursive solutions. The second goal is learn how to implement a pairwise sequence alignment algorithm using the dynamic programming algorithm discussed in class.
You may hand in your problem set manually or through handin68. Hard copies must be turned in the mail slot outside my office door. Solutions to the programming assignment should be handed in using handin68. All solutions, whether electronic or hard copy, are due Wednesday, February 13 by midnight. Your problem set is to be done individually. For the programming assignment, you make work in pairs.
This part of the lab should be completed individually
Answer the following questions, which should be handed in either electronically or on paper. If handing in a hard copy, ensure that it is in the mail slot outside my office door by the assignment deadline.
Coin Problem: The minimum coin change problem is as follows: given the denomination of coins for a currency and a particular amount of cents for change, produce the minimum number of coins necessary to make exact change. For example, if a customer pays a bill and expects 77 cents in change, the teller's task is to produce 77 cents using as few coins as possible.
For US currency, there is a known greedy solution that is correct. For an abitrary set of denominations, however, the greedy solution is not optimal. For this problem, assume a simple scenario where there are three coins in the currency: 1, 3, and 7 cents. Note that for this problem, you are trying to determine the number of coins in the optimal solution, not the actual coins needed.
Implement a program that uses dynamic programming to compute a pairwise local alignment using an affine gap penalty function. The program should work on protein sequences (you can generalize it to DNA sequences as well, but this is not required).
You may work with one individual on this assignment. Please ensure that both partner's names are at the top of each file submitted and that only one individual in the pair submit the assignment.
You are allowed to complete your assignment in either C++ or Python, whichever you are more comfortable with. In either case, however, the execution style should meet the requirements below to making testing easier. If handing in C++ files, ensure that you submit an executable named alignment alongside your code. For Python, your main program should be in alignment.py. You should make this file executable (see the instructions in Tips and Hints).
Your program should take in 4 command-line arguments in this specific order:
To report your results, output the maximum alignment score followed by the local alignment. For example, a sample run with the simple example and a affine penalty of h = -10, g = -1 would look like:
$ ./alignment.py wisc/example blosum-62 -10 -1 Alignment Score: 30 Alignment: MQR---GGDEQ MQRGGGGGDEQYou can/should have extra output byt those results should appear at the end of the file. For debugging purposes, I have provided the final value of all three matrices for this example (-8888 refers to negative infinity) in addition to the results for the more complex examples:
from sys import *You can now access a list argv containing all of the arguments. Recall that for both languages, argv[0] is the name of the program.
#! /usr/bin/env pythonNext, make your file executable in unix:
$ chmod +x alignment.pyThen, when you execute, you can simply call the program directly without invoking python:
$ ./alignment.py pair blosum -12 -4
Another technique is to use a two-dimensional dictionary. Each row in the file will translate into a dictionary. They key is the character at the beginning of the line. The value it maps to is itself a dictionary containing all of the (column,value) pairs for that row. If I do this in python for just the first two rows and columns and print the resulting dictionary of dictionaries, I get the following value:
{'A': {'A': 4, 'R': -1}, 'R': {'A': -1, 'R': 5}}The first pair in the dictionary has a key 'A' and value which itself is a dictionary {'A': 4, 'R': -1}. If I want the substitution value for replacing an A with an R, I simple call:
>>> print s['A']['R'] -1where s is the dictionary structure I stored the values in. It's a bit more complicated to think about, but easier to use since you will not need to map the amino acids to integers.
A third approach is to stick with the two-dimensional array, but place it in a class SubstitutionMatrix such that you can create an object and call a getScore method that does all of the mapping for you. Either technique is acceptable, but your code should be readable and well designed. If using C++, either of the two- dimensional array options are probably easier to implement.
Once you are satisfied with your program, hand it in by typing handin68 at the unix prompt.
You may run handin68 as many times as you like, and only the most recent submission will be recorded. This is useful if you realize after handing in some programs that you'd like to make a few more changes to them.