Provide anonymous course feedback here. Please be constructive in any comments.
This syllabus is a living document; please be aware that many elements on this page will change throughout the semester, including the course schedule. It is the student's responsibility to review this page periodically for updates.
Course Title: CPSC 68 - Bioinformatics (cross-listed as BIO 68). 1 credit. Satisfies Group 3 major requirement.
Lecture: TR 1:15pm - 2:30pm Science Center 183
Lab: F 1:00-2:30pm Science Center 240
Required textbook: Biological Sequence Analysis by Durbin, Eddy, Krogh, and Mitchison.
Prerequisites: Data Structures and Algorithms (CS 35); interest in learning basic molecular biology and probability theory
Professor: Ameet Soni
Email:
Office: Science Center 253
Phone: 610-957-6288
Office hours: Mondays, 2pm - 4pm OR by appointment
Welcome to CS68. This course is an introduction to the fields of bioinformatics and computational biology, with a central focus on algorithms and their application to a diverse set of computational problems in molecular biology. Computational themes will include dynamic programming, greedy algorithms, supervised learning and classification, data clustering, trees, graphical models, data management, and structured data representation. Applications will include genetic sequence analysis, pairwise-sequence alignment, phylogenetic trees, motif finding, gene-expression analysis, and protein-structure prediction.
While significant time will be spent exploring the biological significance of problems, the central focus in this course will be on understanding how to develop algorithms for complex problems. In particular, the general question we will answer is "How does one reason about large amounts of complex data?" That is, how do we uncover underlying phenomena and draw conclusions in the face of large data sets with noisy, intricate relationships? While this question is presented in the context of problems in molecular biology, it applies to open problems across all of the sciences. We will see that many of the algorithms we cover have applications and foundations in far-reaching domains including natural language, social network analysis, security, and search.
By the end of the course, you will understand:
- the wide diversity of data produced by biological experiments
- the inherent difficulties in analyzing/understanding "real-world" data sets that are large, complex, and noisy
- the computational problems that arise in an effort to store, process, and analyze these data sets
- the core set of algorithms utilized in computational biology to handle many of these problems
- the biological and societal impact of these algorithms in being able to uncover biological phenomena and/or generate novel hypotheses
- the basics of probabilistic modeling for handling uncertainty of information, and the idea of inference to draw conclusions under uncertainty
- several categories of algorithms seen across computer science including dynamic programming, search, greedy algorithms, and approximate algorithms
- the connection between the theory of algorithms covered and the real-world practical problems that arise with large data sets and limited computational resources
- the connection from theory to bench: the bioinformatics toolkits currently used by biologists
I have outlined the skills and objectives this course promises to provide you. For these promises to be upheld, you will need to commit to the policies outlined below.
Lecture attendance is required. While I am more than happy to help with any material in office hours, priority will be given to students who show up and participate regularly in class. Office hours are not to make up for a missed lecture.
Lab attendance is required. New lab assignments will be introduced in our Friday lab sessions, and lab sessions will sometimes contain new course material, required practice exercises, and written quizzes.
Merely showing up to class is not sufficient for success in this course. Students are expected to be active in the learning process. This includes asking and answering questions as well as working with classmates during small group break sessions. Students are expected to review the previous lecture's notes and the reading prior to showing up to class. Studies have shown that active involvement is the number one determinant of student success. Besides, your participation grade is based on your involvement in classroom discussion!
Lab assignments are meant to help you learn the material. It is very important that students approach assignments with this view. This means that labs should be started early and completed with academic integrity (see policy below). If you work with a lab partner on an assignment, you must design and complete most of the program together; failing to do so will only hurt you on exams. You are expected to contribute equally to the assignment; otherwise, I will intervene and reassign lab partners.
- 5% Class participation
- 35% Lab assignments
- 35% Midterms
- 25% Final exam
If you believe that you need accommodations for a disability, please contact Leslie Hempling in the Office of Student Disability Services, located in Parrish 130, or e-mail lhempli1 to set up an appointment to discuss your needs and the process for requesting accommodations. Leslie Hempling is responsible for reviewing and approving disability-related accommodation requests and, as appropriate, she will issue students with documented disabilities an Accommodation Authorization Letter. Since accommodations may require early planning and are not retroactive, please contact her as soon as possible. For details about the Student Disabilities Service and the accommodations process, visit here.
You are also welcome to contact me privately to discuss your academic needs. However, all disability-related accommodations must be arranged through Leslie Hempling in the Office of Student Disability Services.
To receive an accommodation for a course activity, you must have an Accommodation Authorization Letter from Leslie Hempling and you need to meet with me to work out the details of your accommodation at least two weeks prior to any activity requiring accommodations.
Academic honesty is required in all work you submit to be graded. With the exception of your lab partner on approved lab assignments, you may not submit work done with (or by) someone else, or examine or use work done by others to complete your own work. Your code should never be shared with anyone; you may not examine or use code belonging to someone else, nor may you let anyone else look at or make a copy of your code. This includes sharing solutions after the due date of the assignment.
All code you submit must be your own with the following permissible exceptions: code distributed in class, code found in the course textbook, and code worked on with an assigned partner. In these cases, you should always include detailed comments that indicate which parts of the assignment you received help on, and what your sources were.
Discussing ideas and approaches to problems with others on a general level is fine (in fact, we encourage you to discuss general strategies with each other), but you should never read anyone else's code or let anyone else read your code. You may discuss assignment specifications and requirements with others in the class to be sure you understand the problem. In addition, you are allowed to work with others to help learn the course material. However, with the exception of your lab partner, you may not work with others on your assignments in any capacity.
"It is the opinion of the faculty that for an intentional first offense, failure in the course is normally appropriate. Suspension for a semester or deprivation of the degree in that year may also be appropriate when warranted by the seriousness of the offense." - Swarthmore College Bulletin (2008-2009, Section 7.1.2)
Please see me if there are any questions about what is permissible.
Note: this is a tentative schedule. As this is my first time teaching the course at Swarthmore, the schedule will be adjusted to account for the pacing of the course and student interest in later topics.
| WEEK | DAY | ANNOUNCEMENTS | TOPIC & READING | LAB |
| 1 | Jan 22 | | Introduction to Bioinformatics; Molecular Biology | Lab 1 - Databases and Central Dogma |
| | Jan 24 | | | |
| 2 | Jan 29 | | Pairwise Sequence Alignment - Detecting homology | Lab 2 - Dynamic Programming and Pairwise Seq. Alignment |
| | Jan 31 | Drop/Add ends (Feb 01) | | |
| 3 | Feb 05 | | Heuristic Alignment Methods | |
| | Feb 07 | | | |
| 4 | Feb 12 | | Multiple Sequence Alignment | Lab 3 - Multiple Sequence Alignment |
| | Feb 14 | | | |
| 5 | Feb 19 | | Phylogenetic Trees - Inferring evolutionary relationships | Lab 4 - Inferring the Tree of Life |
| | Feb 21 | | | |
| 6 | Feb 26 | Midterm 1 study guide | | |
| | Feb 28 | Midterm 1 (in lab) | | |
| 7 | Mar 05 | | Introduction to Probability and Statistics | |
| | Mar 07 | | | |
| | Mar 12 | Spring break | | |
| | Mar 14 | | | |
| 8 | Mar 19 | | Probabilistic Sequence Models - Finding patterns using Markov Models | Lab 5 - Gene Finding using Markov Models |
| | Mar 21 | | | |
| 9 | Mar 26 | | Probabilistic Sequence Models - Hidden Markov Models and their applications | Lab 6 - Problem Set for HMMs |
| | Mar 28 | CR/NC and Withdraw deadline (Mar 29) | | |
| 10 | Apr 02 | | | Lab 7 - Profile HMMs for Multiple Sequence Alignment |
| | Apr 04 | | | |
| 11 | Apr 09 | Midterm 2 study guide | Introduction to Functional Genomics - Gene Expression Data | |
| | Apr 11 | Midterm 2 (in lab) | | |
| 12 | Apr 16 | | Clustering Algorithms | |
| | Apr 18 | | | |
| 13 | Apr 23 | | More clustering; Supervised Learning | Lab 8 - Classification and Clustering of Gene Expression Data |
| | Apr 25 | | | |
| 14 | Apr 30 | | Protein Structure Prediction | Review paper |
| | May 02 | Final Exam study guide | | |
| | May 14 | Final 2:00pm–5:00pm Sci 183 | | |