Lab 3: Exploring Datasets

For this lab we will be working with real-world datasets from CORGIS (an acronym for The Collection of Really Great, Interesting, Situated Datasets).

You may find it helpful to refer to this short Pandas Cheat Sheet as you are doing this lab.

Biases in real-world data

Part A: Explore the Hospitals data set

This data set contains information about hospitals throughout the United States, with the goal of helping consumers make informed choices about which hospitals are most cost-effective and have the best ratings.

  1. Go to the CORGIS website and find the link to the data set about Hospitals (the data sets are listed in alphabetical order).
  2. Click on the link and read through the description of the Hospitals data set.
  3. Click on the link to download the data set, which is called "hospitals.csv". Your browser will likely place the file in the Downloads folder on your Mac. Use the Finder app to move the file into your S3P folder.
  4. Download the Jupyter notebook ExploreHospitalsData.ipynb and save it in the S3P folder.
  5. Open a terminal window and type: cd Desktop/S3P to move into your S3P directory.
  6. In the terminal type: python3 -m notebook to start up Jupyter notebook.
  7. Double-click on the file named ExploreHospitalsData.ipynb.
  8. Read through this notebook and complete the exercises that are given.
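
Before starting the exercises, it can help to confirm that the file you downloaded loads correctly. The following is only a minimal sketch, not part of the provided notebook: it assumes that pandas is installed, that the file is named hospitals.csv, and that it sits in the folder where you started Jupyter. The helper name first_look is made up for illustration.

```python
from pathlib import Path

import pandas as pd


def first_look(csv_path):
    """Load a CSV file and print a quick overview of its contents."""
    df = pd.read_csv(csv_path)
    print("Rows, columns:", df.shape)          # size of the table
    print("Column names:", list(df.columns))   # what each column is called
    print(df.head())                           # first five rows
    return df


# Assumed file name and location: hospitals.csv in the current (S3P) folder.
hospitals_path = Path("hospitals.csv")
if hospitals_path.exists():
    hospitals = first_look(hospitals_path)
else:
    print("hospitals.csv not found; check that it is in your S3P folder.")
```

If the "not found" message appears, the file is probably still in your Downloads folder, or Jupyter was started from a different directory.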

Part B: Find a data set of interest to you

Now it's time for you to become a data scientist! Go back to the CORGIS website and explore what data sets are available. Read through the descriptions and think about what is most interesting to you.

  1. Once you select a data set to focus on, follow the same steps as above to download its CSV file onto your computer and move it into your S3P folder.
  2. Download the Jupyter notebook ExploreMyData.ipynb and save it in the S3P folder.
  3. Open a terminal window and type: cd Desktop/S3P to move into your S3P directory.
  4. In the terminal type: python3 -m notebook to start up Jupyter notebook.
  5. Double-click on the file named ExploreMyData.ipynb.
  6. This notebook provides a template for you to begin exploring the data set of your choice. Feel free to add or remove cells in this notebook as needed.
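
One possible way to judge quickly whether a data set is worth pursuing is to check each column's type, how many values are missing, and how many distinct values it holds. The sketch below is an illustration, not part of the template notebook: the function name summarize is hypothetical, and the small table is made-up stand-in data, not taken from CORGIS. In practice you would pass in the DataFrame returned by pd.read_csv for your own CSV file.

```python
import pandas as pd


def summarize(df):
    """Return one row per column: data type, missing count, distinct count."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),   # how pandas stores the column
        "missing": df.isna().sum(),       # number of empty cells
        "unique": df.nunique(),           # distinct non-missing values
    })


# Tiny made-up table standing in for a real CSV file.
example = pd.DataFrame({
    "state": ["PA", "PA", "NY", None],
    "score": [3.0, 4.5, None, 2.0],
})

print(summarize(example))
print(example.describe())  # basic statistics for the numeric columns
```

Columns with many missing values, or with only one distinct value, tend to offer little to analyze, which may be a sign to try a different data set.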

You may try one data set and discover it isn't quite what you expected, or that it doesn't yield many interesting insights. Feel free to try another until you find one that works for you. The goal is that the data set that you choose will become the focus of your final poster.