CS 91.3 Lab 4: Research Datasets
Due Tuesday, February 15, by midnight (23:59, EST)
Goals
The goals for this lab assignment are:
-
Learn how to find a research dataset
-
Learn how to find the related code
-
Get familiar with the code for LDA
-
Get familiar with the code for SVM
1. Download Datasets (10 min)
-
Download 'Data sets 2a' from the official website, following the procedure shown in class.
-
Download the 'description' file from the same page.
-
Download the pre-processed dataset.
2. Run the time() function (10 min)
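A minimal sketch of this step, assuming that 'time()' here means the `time` function in Python's standard `time` module: take a timestamp before and after a piece of work and subtract. The `busy_work` function is a hypothetical stand-in for whatever code you want to time.

```python
import time


def busy_work(n):
    """Hypothetical workload: sum the squares of 0 .. n-1."""
    return sum(i * i for i in range(n))


start = time.time()            # wall-clock seconds since the epoch
result = busy_work(100_000)
elapsed = time.time() - start  # elapsed wall-clock time in seconds

print(f"result  = {result}")
print(f"elapsed = {elapsed:.6f} seconds")
```

For very short measurements, `time.perf_counter()` is usually the better choice, because it is monotonic and has a higher resolution than `time.time()`.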
3. Reproducibility (50 min)
-
Paper 1: 'Fast and Accurate Multiclass …',
-
Follow the instructions on the GitHub page.
-
Paper 2: 'Exploring Embedding Methods …',
-
Follow the instructions on the GitHub page.
-
For both papers from PWC above:
-
Download the existing code
-
Set up the coding environment
-
Run the existing code
-
Track the execution time and write it down in your notes.txt file. For example, 'Total running time of the script: ( 0 minutes 4.159 seconds)'.
-
Take screenshots of your results after running the code.
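The timing and note-taking steps above could be automated roughly as follows. This is only a sketch: the helper names `format_running_time`, `time_script`, and `append_note` are hypothetical, and the output format simply mirrors the example line quoted above.

```python
import subprocess
import sys
import time


def format_running_time(seconds):
    """Format a duration in the style of the example line in this lab."""
    minutes, secs = divmod(seconds, 60)
    return (
        "Total running time of the script: "
        f"( {int(minutes)} minutes {secs:.3f} seconds)"
    )


def time_script(script_path):
    """Run a Python script in a subprocess; return wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run([sys.executable, script_path], check=True)
    return time.perf_counter() - start


def append_note(label, seconds, notes_path="notes.txt"):
    """Append one labelled timing line to the notes file."""
    with open(notes_path, "a", encoding="utf-8") as f:
        f.write(f"{label}: {format_running_time(seconds)}\n")


# Example usage (the script name is a placeholder):
# append_note("my_script.py", time_script("my_script.py"))
```

Timing the whole subprocess includes interpreter start-up and imports, so the number will be slightly larger than a timer placed inside the script itself.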
4. LDA (10 min)
-
Example 1: 'Normal, Ledoit-Wolf …'
-
Download Python source code: plot_lda.py
-
Example 2: 'Comparison of LDA and PCA ..'
-
Download Python source code: plot_pca_vs_lda.py
-
For both examples above:
-
Download the existing code
-
Set up the coding environment
-
Run the existing code
-
Track the execution time and write it down in your notes.txt file.
-
Take screenshots of your results after running the code.
-
Write in your own words what LDA is, in four to five sentences, in your notes.txt file.
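Before writing your description, it may help to see LDA in its two common roles: as a classifier and as a supervised dimensionality reducer. The sketch below is not taken from either example script; it assumes scikit-learn is installed and uses the built-in Iris dataset purely for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Illustration data (an assumption, not part of the lab): 150 samples,
# 4 features, 3 classes.
X, y = load_iris(return_X_y=True)

# With 3 classes, LDA can project onto at most 3 - 1 = 2 components.
lda = LinearDiscriminantAnalysis(n_components=2)

# Supervised projection: the class labels y are used to find directions
# that separate the classes.
X_2d = lda.fit_transform(X, y)

# The same fitted model also acts as a classifier.
train_accuracy = lda.score(X, y)

print("projected shape:", X_2d.shape)
print("training accuracy:", train_accuracy)
```

Unlike PCA, which ignores labels and keeps the directions of largest variance, LDA uses the labels to find the directions that best separate the classes.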
5. SVM (10 min)
-
Example 1: 'SVM: Maximum margin …'
-
Download Python source code: plot_separating_hyperplane.py
-
Example 2: 'Plot different SVM …'
-
Download Python source code: plot_iris_svc.py
-
For both examples above:
-
Download the existing code
-
Set up the coding environment
-
Run the existing code
-
Track the execution time and write it down in your notes.txt file.
-
Take screenshots of your results after running the code.
-
Write in your own words what SVM is, in four to five sentences, in your notes.txt file.
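As a starting point for your description, here is a minimal sketch of a linear SVM on a small two-class toy problem. It assumes scikit-learn and NumPy are installed; the synthetic data from `make_blobs` and the parameter values are illustrative choices, not requirements of the lab.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Illustration data (an assumption): 40 points in two clusters.
X, y = make_blobs(n_samples=40, centers=2, random_state=6)

# A large C penalizes margin violations heavily, which approximates
# a hard-margin SVM.
clf = SVC(kernel="linear", C=1000)
clf.fit(X, y)

# For a linear kernel, the decision function is w . x + b.
w = clf.coef_[0]
b = clf.intercept_[0]

# Width of the margin between the two supporting hyperplanes.
margin_width = 2 / np.linalg.norm(w)

print("number of support vectors:", len(clf.support_vectors_))
print("margin width:", margin_width)
print("training accuracy:", clf.score(X, y))
```

Only the support vectors, the training points closest to the boundary, determine the separating hyperplane; the remaining points could be moved or removed without changing it, as long as they stay outside the margin.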
6. Your research project (4 hours)
-
Example Paper 1: 'Fast and Accurate Multiclass …'
-
Example Paper 2: 'Exploring Embedding Methods …'
-
Write a 'Dataset' section for both your poster and your paper, including:
-
Dataset source
-
Dataset description
-
You may use the dataset-related sections of the two example papers above as references.
7. Submission Guide
-
Each team submits only one file, lab_4_lastname1_lastname2.zip, including:
-
lab_4_lastname1_lastname2.PDF for your poster and paper drafts, including the 'Introduction' and 'Dataset' sections.
-
notes_lab_4_lastname1_lastname2.txt for your notes.
-
A screenshot folder for all the screenshot files (PNG or JPEG), with a total size of less than 10 MB.
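The packaging described above can be done by hand, but a short script makes the size limit easy to check. This is a sketch under stated assumptions: the helper names `folder_size` and `build_submission` are hypothetical, and the 10 MB limit is interpreted here as 10 * 1024 * 1024 bytes.

```python
import zipfile
from pathlib import Path

# Assumption: the 10 MB limit is read as 10 MiB.
MAX_SCREENSHOT_BYTES = 10 * 1024 * 1024


def folder_size(folder):
    """Total size in bytes of all files under a folder."""
    return sum(p.stat().st_size for p in Path(folder).rglob("*") if p.is_file())


def build_submission(zip_path, pdf_path, notes_path, screenshot_dir):
    """Bundle the PDF, the notes file, and the screenshot folder into one ZIP."""
    screenshot_dir = Path(screenshot_dir)
    if folder_size(screenshot_dir) >= MAX_SCREENSHOT_BYTES:
        raise ValueError("screenshot folder must be smaller than 10 MB")
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.write(pdf_path, Path(pdf_path).name)
        zf.write(notes_path, Path(notes_path).name)
        for p in sorted(screenshot_dir.rglob("*")):
            if p.is_file():
                # Keep the screenshots inside their folder within the archive.
                rel = p.relative_to(screenshot_dir).as_posix()
                zf.write(p, f"{screenshot_dir.name}/{rel}")
    return zip_path


# Example usage (file and folder names are placeholders):
# build_submission(
#     "lab_4_lastname1_lastname2.zip",
#     "lab_4_lastname1_lastname2.PDF",
#     "notes_lab_4_lastname1_lastname2.txt",
#     "screenshots",
# )
```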
8. Notes
-
Each team only needs to submit one ZIP file, with both names on it.
-
Email your Lab 4 files as lab_4_lastname1_lastname2.zip to xqu1@swarthmore.edu.
-
Members of the same team may receive the same score.
-
Lab assignments will typically be released on Wednesday and will be due by midnight on the following Tuesday. This lab was released on 02/09 and will be due by midnight on 02/15.