Lab 4: Content Moderation on Swatchan

Due Date

Monday, March 3, 11:59pm EST

Overview

Learning Goals

By the end of this lab you will be able to...

  • Web Development: Build a dynamically served website using Flask, HTML, CSS, and JavaScript. You'll also learn how to create a database and use it to store and retrieve information.
  • Social Computing: Navigate the technical and ethical dimensions of content moderation and the challenges of moderating content on an anonymous platform.

Before you begin...

"Social media's greatest assets - anonymity, virality, interconnectedness - are also its main weaknesses." - Evgeny Morozov, author of To Save Everything, Click Here

This is a one-week lab that consists of two parts. The first part involves editing a Jupyter Notebook to create an initial set of database queries, insert, delete, and update statements for a SQLite database. The second part involves creating a Flask web application that uses those statements to create a better alternative to 4chan.

Background

4chan is an anonymous English-language imageboard website: a forum where people can post text and images. Launched in October 2003 by a 15-year-old, the site hosts boards dedicated to a wide variety of topics, from video games and television to literature, cooking, weapons, music, history, technology, anime, physical fitness, politics, and sports, among others. As of 2022, 4chan receives more than 22 million unique monthly visitors, of whom approximately half are from the United States.

4chan was created as an unofficial English-language counterpart to the Japanese imageboard Futaba Channel, also known as 2chan, and its first boards were originally used for posting images and discussion related to anime. The site has been described as a hub of Internet subculture, its community being influential in the formation and popularization of prominent Internet memes, such as lolcats, Rickrolling, rage comics, wojaks, Pepe the Frog, as well as hacktivist and political movements, such as Anonymous and the alt-right.

Posting is ephemeral, as threads receiving recent replies are "bumped" to the top of their respective board and old threads are deleted as new ones are created. As making a post without filling in the "Name" field causes posts to be attributed to "Anonymous", general understanding on 4chan holds that Anonymous is not a single person but a collective (hive) of users.

"[A] significant and influential element of contemporary internet culture", 4chan is responsible for many early memes and the site has received positive attention for its association with memes. This included "So I herd u liek mudkipz" [sic], which involved a phrase based on Pokémon and which generated numerous YouTube tribute videos. Another example of memes popularized by 4chan is the lolcat, an image combining a photograph of a cat with solecistic text intended to contribute humour, widely popularized by 4chan in the form of a weekly post dedicated to them and a corresponding theme.

However, 4chan is perhaps better known for being one of the worst places on the internet. This includes hosting non-consensual sexual imagery and being a breeding ground for racism, hate speech, and the alt-right. The site has been linked to a number of high-profile incidents, including the leaking of celebrity photos in 2014 (Celebgate), an online harassment campaign against women gamers (Gamergate), the spread of the Pizzagate conspiracy theory, and fueling online conversation in the lead up to the 2017 Unite the Right rally in Charlottesville, Virginia.

How did things get so bad on 4chan? According to the Washington Post, a couple things make 4chan unusual as a forum. For one thing, unlike Reddit, users never need to make an account or pick a username — even a pseudonymous one. That means participants can say and do virtually anything they want with only the most remote threat of accountability. It also means you can’t message other users or establish any kind of social relationship with them, unless they reveal their identity in some way. For a social network, that’s pretty weird.

To further complicate things, 4chan threads expire after a certain amount of time — less time for R-rated boards, more time for G or PG ones — which lends a sense of impermanence to the whole operation and means that users rarely see the exact same thing. Few posts last more than a few days before they’re deleted from 4chan’s servers. Posts are organized reverse-chronologically — although “organized” maybe overstates it. 4chan’s interface is deliberately, anachronistically minimalist, which can make it difficult for non-regular users to access.

Bottom line: 4chan is a forum — nothing crazy or mysterious there. It’s just a forum with no names, few rules and few consequences, which is (a) the philosophical antithesis to virtually every other mainstream social property and (b) means people can (and do!) say just about anything they want.

Requirements

Your mission this week is to make a better version of 4chan for Swarthmore. You can call it Swatchan, 4chan++, Swatboard, or whatever you like.

By the deadline, you should have:

  1. Come up with content moderation rules for your online forum and add them to the README.md file. These rules can be serious (e.g., disallowing hate speech, harassment, misogyny) or silly (e.g., disallowing humor, politics, or revealing personal information). You should also define what will happen to users who break the rules. Will they be banned? Will their posts be hidden or altered in some way?

  2. Created a Jupyter Notebook that contains the following:

    • A set of Models that defines the structure of your database tables. This includes Thread and Comment. The models should have relevant fields (columns). See the Content Rules below for examples of fields you might want to include and queries you might want to write.
    • Two SQLAlchemy queries that search for data in each of the two tables.
    • Two SQLAlchemy queries that insert data into each of the two tables.
    • Two SQLAlchemy queries that update data in each of the two tables.
    • Two SQLAlchemy queries that delete data from each of the two tables.
  3. Your project should have the following file structure:

    swatchan/
    ├── templates/                  # view
    │   ├── base.html
    │   ├── home.html
    │   └── thread.html
    ├── static/
    │   ├── css/
    │   │   ├── base.css
    │   │   └── thread.css
    │   ├── js/
    │   │   └── thread.js
    │   └── images/
    │       └── logo.png
    ├── instance/
    │   └── database.db             # populated with threads and comments
    ├── app.py                      # controller
    ├── models.py                   # model
    ├── utils.py                    # helper functions
    ├── database-notebook.ipynb
    ├── requirements.txt
    ├── .gitignore
    └── README.md
    • Your README.md file should include:
      • Instructions on how to install all of the requirements for your project and run your flask server. (Anyone who reads your README should be able to set up and run your project on their local machine assuming they have python, pip, and uv pre-installed.)
      • Information about how content moderation works on your platform.
  4. Set up your flask server with three pages: (1) a homepage ("/"), (2) a threads page to display each thread and comments ("/thread/<id>"), and (3) an error page ("/error") that tells the user that a particular author has posted too much. Each page should extend a "base.html" HTML template that includes a header, footer, and navigation bar.

  5. Homepage: Create and style a homepage for your forum. This includes:

    • Display your content moderation rules at the top of the page.
    • Organize threads based on the number of upvotes, in descending order.
    • Each thread should display the thread title, the number of comments, the number of upvotes, and the date it was created.
    • Clicking on a thread should redirect to that thread's page ("/thread/<id>").
    • On the homepage, users can also create a thread. To create a thread, a user needs to provide a title, author, content, and choose from one of several pre-defined categories.
  6. Thread: Create and style a thread page that is dynamically created for each thread. This includes:

    • Customize the default thread page (e.g., button color, style, background color, etc.)
    • Display the thread title, author, content, and number of upvotes.
    • Display all comments associated with the thread. Each comment should display the author, content, and date it was created.
    • Users can add a comment to the thread. To add a comment, a user needs to provide author and content.
    • Users can also upvote and downvote threads anonymously. Upvotes and downvotes only change the upvotes field in the Thread table, and the resulting value can be negative.
    • Threads and comments can't be edited or deleted by users.
  7. Content Rules: Create helper functions that enforce the following content moderation rules:

    • The rules you came up with in Requirement #1.
    • Threads that are older than one week should not be shown (but should still be stored in the database).
    • Comments that are older than 24 hours should not be shown (but should still be stored in the database).
    • Threads and comments should be automatically deleted after 30 days.
    • A specific "author" should not be able to post more than 3 threads or 5 comments per day. If they try to do so, you can redirect them to an "Error" page that says they've reached their limit. You can keep track of this with the "author" field in the Thread and Comment tables. (An alternative option — that you should consider for your final project and not this lab — is having a separate Author table.)
  8. Your final submission should include a database.db file that is populated with threads and comments. You can use the database-notebook.ipynb file to populate the database.
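
The time-based rules and the per-author limits in Requirement 7 can be sketched as small pure functions, which makes them easy to test before wiring them into utils.py. This is only one possible shape: the function and constant names are hypothetical, and "per day" is assumed here to mean a rolling 24-hour window rather than a calendar day.

```python
from datetime import datetime, timedelta, timezone

# Limits taken from the Content Rules; the names are hypothetical.
THREAD_VISIBLE_FOR = timedelta(weeks=1)
COMMENT_VISIBLE_FOR = timedelta(hours=24)
DELETE_AFTER = timedelta(days=30)
MAX_THREADS_PER_DAY = 3
MAX_COMMENTS_PER_DAY = 5


def current_time():
    """Naive UTC 'now'. Assumption: created_at values are stored as naive UTC."""
    return datetime.now(timezone.utc).replace(tzinfo=None)


def is_visible(created_at, max_age, now=None):
    """True if a post created at `created_at` is still young enough to display."""
    now = now or current_time()
    return now - created_at <= max_age


def should_delete(created_at, now=None):
    """True if a post is older than 30 days and can be removed from the database."""
    now = now or current_time()
    return now - created_at > DELETE_AFTER


def over_daily_limit(post_times, limit, now=None):
    """True if an author already has `limit` posts within the last 24 hours.

    `post_times` is a list of created_at values for one author's threads
    (or comments), for example gathered with a query filtered on author.
    """
    now = now or current_time()
    recent = [t for t in post_times if now - t < timedelta(days=1)]
    return len(recent) >= limit
```

A route could call over_daily_limit(...) before saving a new thread and redirect to the error page when it returns True. Passing `now` explicitly keeps the functions deterministic in tests.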

Stretch Goals:

  1. Let users upload images when creating a thread or comment. You can use Pillow (a fork of the Python Imaging Library) to resize images to a smaller size before storing them in the database.
  2. Use an ML library like nltk or an API like Google Jigsaw's Perspective API or OpenAI's omni moderation to automate moderating comments and threads.

Setup

a. Clone Your Lab Repository

Get your Lab SSH URL from the CS77-S25 GitHub organization. The repository to clone is named Lab4.

$ git clone https://github.swarthmore.edu/cs77-s25/lab4.git
$ mv lab4 lab4-<username1>-<username2>
$ cd lab4-<username1>-<username2>
$ ls
Note
  • Here's a handy visual guide to help you better understand git.
  • If you need help getting your git ssh keys setup or cloning the repository, please refer to the using git guide.
  • If you are having trouble ssh-ing into a lab machine, make sure you've followed the instructions in the ssh setup guide.

b. Create a Team Compact

You'll need to work with a partner for this lab. Teams are a common feature of the workplace in virtually every industry, and HCI, UX, and Software Engineering are no exception. Many of you will have been, or currently are, involved in work teams, either formally or informally. In both face-to-face and remote contexts, team members have to practice good communication and collaboration. Job descriptions routinely ask for these skills, and I hope this class will help you to develop and reflect on these skills.

If you haven't already, read the syllabus for tips on effective teamwork. It's good to set expectations before you start a project so you have something to refer back to if (when) you encounter friction in your partnership.

Now, sit next to your partner and create a team compact. The team compact should include:

  • Team Name: Come up with a team name that represents your partnership.
  • Team Members: List the names of your team members.
  • Meeting Times: List the times you plan to meet outside of class.
  • Communication: How will you communicate with each other? (e.g., Slack, Discord, text, email)
  • Conflict Resolution: How will you resolve conflicts that arise during the lab? Feel free to use me as a scapegoat: “Prof. V will not find this convincing. Why don’t we try…?”
  • Division of Labor: How will you divide the work? Will you work on different parts of the lab or work together on everything? Remember, if one person does all the work, you won't be able to make much progress in a future lab. You also have to fill in a partnership survey at the end of each lab to indicate how much you and your partner contributed to the lab.
  • Timeline: What is your timeline for completing the lab? When will you have certain requirements completed by?

c. Set Up Your Virtual Environment

  1. In your lab repository, create a virtual environment:

    uv venv 
  2. To activate your virtual environment, type source .venv/bin/activate. You should see your shell change to have a (folder name) prefix.

  3. To deactivate your virtual environment, type deactivate.

  4. Check that your virtual environment is using the correct version of Python:

    (.venv) $ which python
    <path_to_directory>/.venv/bin/python
  5. Great, now install Flask and Flask-SQLAlchemy and other required Python libraries:

    uv pip install flask flask-cors Flask-SQLAlchemy jupyter
    #OR
    uv pip install -r requirements.txt

d. Run Your Flask App

If your main flask application is set up in a file called app.py, you can start your flask web server using the following command:

flask run --debug --reload

The --debug and --reload flags will automatically restart your server whenever you change a file.

Part 1: Database Setup and Queries

1.1 Database Models

In your Jupyter Notebook, create a set of models that define the structure of your database tables. This includes Thread and Comment. The models should have relevant fields (columns).

Your models.py file should contain the following code:

from flask_sqlalchemy import SQLAlchemy

db = SQLAlchemy()  # creates a Flask-SQLAlchemy object


class Thread(db.Model):  # Models must be subclasses of the Flask-SQLAlchemy Model class
    id = db.Column(db.Integer, primary_key=True)
    title = db.Column(db.String(255), nullable=False)
    content = db.Column(db.Text, nullable=False)
    created_at = db.Column(db.DateTime, default=db.func.now())

    # defines a one-to-many relationship between Thread and Comment, one thread and many comments
    # cascade refers to what happens when a thread is deleted. In this case, the child object
    # (comment) should follow along with its parent in all cases, and be deleted once it is no
    # longer associated with that parent.
    comments = db.relationship('Comment', backref='thread', cascade="all, delete-orphan", lazy=True)


class Comment(db.Model):
    id = db.Column(db.Integer, primary_key=True)

    # thread_id is a foreign key that references the primary key (id) in the Thread table
    thread_id = db.Column(db.Integer, db.ForeignKey('thread.id'), nullable=False)

    content = db.Column(db.Text, nullable=False)
    created_at = db.Column(db.DateTime, default=db.func.now())

TODO: Add additional fields to the Thread and Comment models: upvotes and author. Note that only threads have upvotes; both threads and comments have an author.

1.2 Database Queries

Refer to the slides and the example from class on how to generate queries.

To open the Jupyter Notebook, make sure your virtual environment is active (source .venv/bin/activate). Then, from the lab4 directory, type jupyter notebook in your terminal and navigate to the database-notebook.ipynb file.

To view data in your database, download a database viewer like DBeaver or DB Browser for SQLite. Then navigate to the instance folder in your project directory and open the database.db file.
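
If you would rather check the database from Python than install a viewer, the standard-library sqlite3 module can list the tables and how many rows each one holds. The sketch below is an optional alternative, not part of the starter code; the helper name count_rows is hypothetical, and the demo uses an in-memory database with made-up tables.

```python
import sqlite3


def count_rows(conn):
    """Return {table_name: row_count} for every table in the connected database."""
    names = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    return {
        name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
        for name in names
    }


# Demo on a throwaway in-memory database with made-up tables.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE thread (id INTEGER PRIMARY KEY, title TEXT)")
demo.execute("INSERT INTO thread (title) VALUES ('Hello')")
summary = count_rows(demo)
demo.close()
```

For the lab database, the connection would instead be opened with sqlite3.connect("instance/database.db"), run from the project directory.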

Part 2: Create Your Forum

Edit the starting point code to meet the requirements above. You'll need to create additional routes, templates, and static files to complete the lab, as well as helper functions in utils.py.
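
One possible shape for the three routes named in the requirements is sketched below. To stay self-contained it renders an inline template string; in the lab these views would render templates that extend base.html and would load data from the database. The view function names and placeholder text are assumptions, not the starter code.

```python
from flask import Flask, render_template_string

app = Flask(__name__)

# Stand-in for the real templates (base.html, home.html, thread.html).
PAGE = "<h1>{{ heading }}</h1><p>{{ body }}</p>"


@app.route("/")
def home():
    # The real homepage would list visible threads sorted by upvotes.
    return render_template_string(
        PAGE, heading="Swatchan", body="Threads will be listed here."
    )


@app.route("/thread/<int:thread_id>")
def thread(thread_id):
    # The real view would look up the thread and its visible comments.
    return render_template_string(
        PAGE, heading=f"Thread {thread_id}", body="Comments will be listed here."
    )


@app.route("/error")
def error():
    # Shown when an author has reached the daily posting limit.
    return render_template_string(
        PAGE, heading="Slow down", body="This author has posted too much today."
    )
```

Using the int converter in "/thread/<int:thread_id>" means non-numeric ids fail to match the route before any database lookup happens.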