Lab 3: Word Embeddings

Introduction

Creating appropriate word embeddings is the first step in building Large Language Models like ChatGPT. Word embeddings are numeric representations of words that reflect the rich context of how words are used within the training data. Words that appear in close proximity in the training data will have similar word embeddings, and words that never or rarely appear in close proximity will have very different word embeddings. This lab allows you to explore word embeddings for a tiny grammar with a limited set of words.
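To make the "similar contexts give similar embeddings" idea concrete, here is a minimal sketch using made-up 4-dimensional vectors (real models learn vectors with hundreds of dimensions). The vocabulary and vector values are invented for illustration; similarity is measured with cosine similarity, a standard way to compare embedding vectors.

```python
import numpy as np

# Hypothetical embeddings for a tiny vocabulary (values invented for illustration).
# "cat" and "dog" appear in similar contexts, so their vectors point in
# similar directions; "the" is used very differently, so its vector does not.
embeddings = {
    "cat": np.array([0.9, 0.8, 0.1, 0.0]),
    "dog": np.array([0.8, 0.9, 0.2, 0.1]),
    "the": np.array([0.0, 0.1, 0.9, 0.8]),
}

def cosine_similarity(u, v):
    """Cosine of the angle between u and v: 1.0 means same direction."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

print(cosine_similarity(embeddings["cat"], embeddings["dog"]))  # close to 1
print(cosine_similarity(embeddings["cat"], embeddings["the"]))  # much smaller
```

Running this shows a high similarity for the word pair that shares contexts and a low one for the pair that does not, which is exactly the property the lab's embeddings will exhibit.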

We will again be using Google Colab to experiment with neural networks in a Python notebook. Follow this link to start the lab. Before you begin working, save a copy of the notebook to your own Drive.