Search engines, such as Google, YouTube and Apple iTunes, have had a huge impact on how people find and use information. In this course, we will explore how these text and multimedia information retrieval (IR) systems are designed and implemented.
The first half of the class will be devoted to developing traditional IR skills such as web crawling, text & multimedia processing, Boolean & vector-space modeling, classification, clustering, and recommendation.
The second half of the course will be devoted to creating SWAMOODIE: a Swarthmore Music Discovery Engine. This will be a collaborative class project in which groups of students will design and develop individual components of this large-scale music IR system. In the final weeks we will combine these components and (if all goes well) have a powerful new tool that helps people find music.
Professors: Douglas Turnbull / Richard Wicentowski
Office: Science Center 255 / 251
Phone: (610) 597-6071 / (610) 690-5643
Office hours: TBA or by appointment
Room: Science Center 240
Time: Tuesday & Thursday 9:55am–11:10am
Text: Manning, Raghavan, & Schütze. Introduction to Information Retrieval (2008).
Wiki: mugwort.cs.swarthmore.edu/67wiki (link no longer available)
WEEK | DAY | ANNOUNCEMENTS | TOPIC & READING | LAB |
1 | Jan 20 | | Introduction to IR; Chapter 1 (Both) | Lab 1 (1/29) |
1 | Jan 22 | | Web Crawling and Basic SQL; Chapter 20 (Rich) | |
2 | Jan 27 | | Advanced SQL and Database Design (Doug) | Lab 2 (2/4) |
2 | Jan 29 | Drop/Add Ends (Jan 30) | Basic IR Models (Boolean, vector-space, TF-IDF); Chapters 1, 6 (Rich) | |
3 | Feb 03 | | | Lab 3 (2/17) |
3 | Feb 05 | | Performance Evaluation; Chapter 9 (Doug) | |
4 | Feb 10 | | | |
4 | Feb 12 | | Document Classification; Chapters 13, 14, 15 (Doug/Rich) | Lab 4 (2/24) |
5 | Feb 17 | | | |
5 | Feb 19 | | | Lab 5 (3/19) |
6 | Feb 24 | | Document Clustering; Chapters 16, 17 (Doug/Rich) | |
6 | Feb 26 | | | |
7 | Mar 03 | | | |
7 | Mar 05 | | In-Class Exam (rescheduled for Monday, March 23, 7pm) | |
| Mar 10 | | Spring Break | |
| Mar 12 | | Spring Break | |
8 | Mar 17 | | Audio Signal Processing Tutorial (Doug T) | |
8 | Mar 19 | | Music Classification Lab (Doug T) | |
9 | Mar 24 | | SWAMOODIE Planning Day; Five Approaches to Collecting Tags for Music, Turnbull, Barrington, Lanckriet (2008) (Doug T) | |
9 | Mar 26 | | Recommender Systems; Music Recommendation and Discovery in the Long Tail, Celma (2009), Chapter 2 only (Joon) | |
10 | Mar 31 | | SWAMOODIE Architecture Session | |
10 | Apr 02 | | Search Engine Architecture; The anatomy of a large-scale hypertextual Web search engine, Brin, Page (1998), focus on Section 4 (Doug W.) | |
11 | Apr 07 | | Hubs and Authorities; Authoritative Sources in a Hyperlinked Environment, Kleinberg (1999) (Nick) | |
11 | Apr 09 | | PageRank; The PageRank citation ranking: Bringing order to the web, Page, Brin, Motwani, Winograd (1998) (Malcolm) | |
12 | Apr 14 | | Autotagging; Autotagger: A model for predicting social tags from acoustic features on large music databases, Bertin-Mahieux, Eck, Maillet, Lamere (2009) (Derek) | |
12 | Apr 16 | | HCI and Visualization; MusicSun: A new approach to artist recommendation, Pampalk, Goto (2007); skim the paper, and also check out Pandora, Last.fm, Musicovery, Echotron, & other music discovery websites (Ashley) | |
13 | Apr 21 | | Other Topics: Combining Data Sources (Brian); Text-based Multimedia IR (Meggie); Social Tags and IR (Jeff) | |
13 | Apr 23 | | | |
14 | Apr 28 | | SWAMOODIE Final Presentations | |
14 | Apr 30 | | | |
35% | Lab Assignments |
20% | Final Exam |
35% | SWAMOODIE Project |
10% | Class Participation |
Academic honesty is required in all work you submit to be graded. With the exception of your lab partner on lab assignments, you may not submit work done with (or by) someone else, or examine or use work done by others to complete your own work. You may discuss assignment specifications and requirements with others in the class to be sure you understand the problem. In addition, you are allowed to work with others to help learn the course material. However, with the exception of your lab partner, you may not work with others on your assignments in any capacity.
All code you submit must be your own, with the following permissible exceptions: code distributed in class, code found in the course textbook, and code worked on with an assigned partner. In these cases, you should always include detailed comments that indicate which parts of the assignment you received help on and what your sources were.
"It is the opinion of the faculty that for an intentional first offense, failure in the course is normally appropriate. Suspension for a semester or deprivation of the degree in that year may also be appropriate when warranted by the seriousness of the offense." - Swarthmore College Bulletin (2007-2008, Section 7.1.2)
Please see me if there are any questions about what is permissible.