Overview of Requirements
- A one-paragraph abstract emailed to me by Monday, April 29, at noon.
Your abstract should state what topic you would like to cover and what aspect
of the topic you will go into detail on (see below).
- An approximately 5-page paper discussing your topic, due the last day of final exams (May 18). Please email me a PDF of your paper as soon as
it is complete.
This paper is to be done individually, but you are welcome and encouraged to discuss
ideas with anyone in the class and to team up to
provide feedback on each other's reports. Although the due date is late, you should treat
this more as a 1-2 week assignment. The 5-page suggestion assumes generous
double-spacing and margins. I provide it simply as a marker to ensure you do
not write too much. I do not want you to spend more time than necessary, and I also
do not want to read 10 pages from each of 30 students. That being said, be sure
to read the requirements below to understand what components you should place
in your paper.
Introduction
Throughout the course, we have seen several algorithms and structures for
organizing and analyzing biological data. With each, you have built the foundations
of your computational knowledge base that will aid you as you continue to
mature as scientists.
The constraints of a 14-week course limited the number of
bioinformatics topics covered as well as the number of algorithms proposed to solve each
problem. The central goal of this course, however, was not to
memorize every bioinformatics algorithm. Rather, it was to develop the
skills of algorithmic analysis and design that are a prerequisite to becoming a skilled
computer scientist.
For your last assignment, you will write a medium-length paper reviewing a topic of research
and study not discussed in class. There are dozens of potential areas of interest as well
as various types of reports you can prepare. Below, I outline a few options as well as the
expectations of your reports. You can pick from any topic related to the course (see below), including extensions to topics covered in class (e.g., probabilistic methods for tree inference). Your paper can have one of two styles:
a review of a field of study or a review of a specific approach to a problem.
I give examples of each and the expectations; in general, the former will
require you to discuss a little about many approaches/problems, while the latter
requires a more in-depth focus on one approach. Please be sure to email me a short preview of your topic by April 29.
Choosing a Topic
There is a wide range of topics available to you. In general, your topic should be
related to bioinformatics, medical informatics, or machine learning. You may decide
to cover something completely novel or to extend something covered in class. If delving
into something we covered, you will be expected to go more in depth on the methodology,
since introducing the problem will be fairly basic.
Some example biology problems:
- Genome assembly (taking many, small overlapping fragments and reconstructing
the original genome)
- Single nucleotide polymorphisms (SNPs) and their uses (e.g., genome-wide association studies (GWAS))
- Systems biology
- Network inference
- Genomics and evolution
- Protein structure; protein function; secondary structure; protein disorder
- Mass-spec and proteomics
- RNA structure
- Protein-protein interactions; protein-DNA interactions
- Next-generation sequencing data
- Medical informatics
- Translational bioinformatics
- Computational Immunology
- Image Analysis
- Databases and ontologies
- Biomedical text mining
- Disease models/ontologies
- Drug discovery
- Metabolic networks
If you are more interested in exploring general algorithms, you can use the following list to
explore various techniques:
- Probabilistic graphical models (e.g., Bayesian networks, HMMs, conditional random fields, Markov random fields)
- Supervised learning algorithms (e.g., neural networks, support vector machines, deep learning)
- Unsupervised learning algorithms
- Sequential algorithms
- Vision/image analysis
- Memory-efficient models
- Tree-learning algorithms
- Semi-supervised learning
- Search
Paper Options
You have two options for your paper:
A) Survey Paper
In this type of paper, you will review a general field of study.
For example, you could motivate a problem such as
RNA structure prediction and then briefly cover some of the proposed solutions (~1 paragraph each for 3 to 4 methods).
Your paper will most likely rely on touching upon the findings of many papers
or a central tutorial/review paper in the literature.
B) Method Review Paper
By this, I mean you are choosing to go in-depth in reviewing one specific approach
to a particular problem. For example, I had you write a one-page discussion on
the T-Coffee approach to multiple sequence alignment. This type of paper will
require you to concentrate on a detailed explanation of the technique, but you will
only need to read one paper in depth and possibly another paper or two that provide
further context.
Writeup Requirements
Your writeup should be well-structured and follow scientific writing principles. You
can structure it as you see fit, but at a minimum, a scientific paper touches upon these
core topics:
- Introduction - Introduce the problem (computational and biological); preview
contents of paper; this should sell me on reading the rest of your paper.
- Motivation - Is it clear what the purpose of the algorithm is?
Why is the problem important to solve? Why is it difficult/not solved yet?
- Related work - How does the problem relate to a topic covered in class?
How does it connect to other bioinformatics problems? How does it relate to
non-bioinformatics topics (e.g., NLP, economics, robotics)?
- Methods - For paper A): what are the different types of approaches
proposed to solve the problem? There may be dozens, but you should be able to describe
the general theme of these methods (e.g., probabilistic; greedy; search-based). What
are the most popular/common methods? Spend a paragraph each describing how 3-4 methods
approach the problem.
For paper B) you should go in depth on explaining the algorithm. You probably will
want to write out pseudo-code and/or show an illustration of the algorithm overview.
You should also have a paragraph outlining competing methods.
- Results and Experimental Methodology - How is/are the technique(s) evaluated? Is
there a standardized data set that is publicly available? What are the measures used
and what do they show?
- Discussions/Analysis/Conclusions - What are the advantages and disadvantages
of the methods? Are there particular types of data the approach works well on?
What are the deficiencies? These should suggest future directions of research -
what do you consider the most promising possible direction?
I do not expect you to report on every aspect of these topics, but you should touch on at least
some of each core topic. Depending
on your choice, it is okay to limit your research to one central paper or a
light reading of many papers. While this is hard to quantify, my expectation is that
your paper will demonstrate your ability to frame the problem in a concise
yet informative and accurate manner. I also want to obtain an understanding of what
you find interesting about the problem.
Your paper should be typed; no handwritten writeups will be accepted. The
exception is illustrations, which can be neatly drawn and attached to your
writeup. I would prefer you scan and attach the images to your PDF document, if possible.
It may be useful to learn to use LaTeX, a popular
typesetting system that is used widely for writing scientific and mathematical papers.
If you are interested, feel free to email me any questions about this. You can also
find primers and sample guides online.