linked lists

Linked Lists vs Python lists

How do python lists work? We have used them a lot this semester. Is there another way to implement lists? And if so, what are the pros and cons of each way?

What things do we normally do with python lists?

  L = []                 # construct
  L.append("cat")        # add to
  print L, len(L), L[1]  # print, length, indexing
  L.pop(0)               # remove from

Can we write/create lists from scratch (without using python lists)?

Computer Memory

There are two main options for representing lists internally:

Store the list items in consecutive locations in memory (python list)
Store "nodes" anywhere in memory. A Node consists of the list item, plus the address (or something like the address) of where the next node can be found. This is known as a linked list

Here is a klunky/simplified picture of how each option might look:

computer memory

For the python list, because everything is at known, easy-to-calculate memory addresses, that makes indexing fast and efficient:

L[0] is at 2000
L[1] is at 2000 + 1
L[2] is at 2000 + 2
... 
L[i] is at 2000 + i

So, to get any value from the list, all we have to do is one quick calculation (base memory address + i). This is a constant time operation (i.e., it doesn't depend on the size of the list).

It is also easy to append to the python list, by using the length of the list to locate the next location to store a new item.

Some disadvantages of using consecutive memory:

what happens when you want to insert a new item in the middle of a list, or at the beginning of the list? all following items need to be shifted down one spot
what happens when the computer runs out of large chunks of consecutive memory? think of a crowded movie theater, and how you can't always sit with all of your friends

Linked Lists:

A linked list is a collection of objects/nodes, where each node contains both the item stored in the list, as well as a "pointer" to the next node in the list.

Linked lists typically have three instance variables: a head pointer (self.head), a tail pointer (self.tail), and the number of nodes in the list (self.size)

In the above picture, the linked list has 5 nodes, and each node stores a simple string, as well as a pointer to the next node in the list. In class I often draw linked lists and nodes as follows:

              ------    ------   ------   ------   ------
self.head --> |rich|    |lisa|   |andy|   |zach|   |tia |<--self.tail
              ------    ------   ------   ------   ------
              |    |--> |    |-->|    |-->|    |-->|    |-->None
              ------    ------   ------   ------   ------

                             self.size = 5

Linked lists are often used to create other useful data structures, such as stacks and queues. They are also a good introduction to more advanced data structures, such as binary trees.

One big advantage of the linked list is not needing large chunks of consecutive memory. Each node in the list is just put in memory wherever there is space, and all nodes are "linked" together into one "list".

Also, adding an item into the middle of the list does not require shifting all other elements down one slot. Simply reset the next "pointers" to include the new node.

Implementing A Linked List

We will implement a linked list using two classes:

Node, which will store a data value (the list item) and a "pointer" to the next node in the list
LinkedLlist, which will store pointers to the head Node and the tail Node, as well as the current size of the list.

the `Node` class:

Let's write a Node class, where each node has two instance variables: one to hold the node data (call it "data"), and one to hold a pointer to another node (call it "next"). Later we will use these node objects to build a linked-list!

Add two mutators and two accessors for the data and next variables, then test it out.

class Node(object):

  def __init__(self, item):
    """
    construct new node, with given item.
    next initially set to None.
    """
    self.item = item
    self.next = None

  def __str__(self):
    """return a string representation of the node"""
    s = "("+str(self.item)+")"
    if self.next == None:
      s += "--|"
    else:
      s += "-->"
    return s

  def getItem(self): return self.item
  def getNext(self): return self.next

  def setItem(self, i):
    """set node item to i"""
    self.item = i

  def setNext(self, n):
    """set node next to n"""
    self.next = n


if __name__ == "__main__":

  n1 = Node("jeff")
  n2 = Node("zach")
  n3 = Node("lauri")
  n4 = Node("rich")

  n1.setNext(n2)
  print(n1)
  n2.setNext(n3)
  print(n2)
  n3.setNext(n4)

the `LinkedList` class

Last time we wrote a Node class to be used to create our linked lists. A Node is just a "container" to hold two things: the item or data we want stored in the list, plus a next "pointer" to point to the next node in the list. Here is an example of how we could manually create nodes and link them together:

>>> from node import *
>>> n1 = Node("jeff")
>>> n2 = Node("zach")
>>> n3 = Node("lauri")
>>> 
>>> print(n1)
(jeff)--|
>>> 
>>> n1.setNext(n2)
>>> n2.setNext(n3)
>>> 
>>> print(n1)
(jeff)-->
>>> print(n2)
(zach)-->
>>> print(n3)
(lauri)--|

Instead of doing this manually, we could create a LinkedList class to allow us to create linked lists just like python lists. Here is what I would like to be able to do:

>>> from linkedlist import *
>>> LL = LinkedList()
>>> LL.append("jeff")
>>> LL.append("zach")
>>> LL.append("lauri")
>>> print(LL)
head-->(jeff)(zach)(lauri)<--tail

Each linked list should have three instance variables: self.head, self.tail, and self.size. All methods we write should make sure all three are updated accordingly. For example, when appending to the list, the self.tail pointer should be adjusted, as well as the size of the list. And if it is the first node being added/appended to the list, then the self.head pointer will also need to be set (i.e., if there is just one node in the list, both the head and the tail should point to that one node).

the constructor

For the constructor, we want to build an empty linked list object, so self.head and self.tail don't point to anything, and the size is zero. We use python's None type when things don't point to anything:

def __init__(self):
  """construct an empty linked list"""

  self.head = None
  self.tail = None
  self.size = 0

the `append` method

To append a new node, we need to create the node with the given item stored in the node, and then add it to the end of the list. That means connecting the current tail of the list to the newly-created node, then resetting the tail to be the newly-created node. Something like this:

n = Node(item)
self.tail.setNext(n)    # connect to end
self.tail = n           # reset tail pointer
self.size += 1

However, this only works if there are already nodes in the list. If this is the first node in the list, we have to account for that (set the head to point to this node, and don't connect the current tail). Here is the final version of our append() method:

def append(self, item):
  """add new node, containing item, to end of the list"""

  n = Node(item)
  if self.size == 0:
    self.head = n
  else:
    self.tail.setNext(n)
  self.tail = n
  self.size += 1

the `prepend` method

Can you create a similar method called prepend, to add nodes to the beginning of the list?

the `str` method

It would be nice to be able to print the list, to make sure it is set up correctly. Something like the above example would be useful:

head-->(jeff)(zach)(lauri)<--tail

However, to get each item stored in the list, we need to access each node in the list. This is typically done with a loop to visit each node in the list (i.e., list traversal). Here is how you could manually set a pointer to the first node in the list, get the item stored there, and then move the pointer over to the next node in the list:

s = "head-->"
curr = self.head
s += "(%s)" % str(curr.getItem())
curr = curr.getNext()

Note the curr = curr.getNext()...this moves the pointer from one node to the next. Putting this in a loop, we can visit each node in the list and create/accumulate a string-representation of the list:

def __str__(self):
  """return string representation of linked list"""
  s = "head-->"
  curr = self.head
  for i in range(self.size):
    s += "(%s)" % str(curr.getItem())
    curr = curr.getNext()
  s+="<--tail"
  return s

Note: if the list is empty, the for loop never executes, so all we get is "head--><--tail", which shows an empty list.

If you want you can change the str method to use square brackets and commas between items, so the linked list prints exactly like the python list.

the `len()` method

Adding a getter for self.size could be useful. Calling it __len__() allows you to use len(LL), just like we've done with python lists.

testing

Once you have the above methods written, add some test code to make sure everything works. Try to test all possible cases. Thoroughly testing now will help eliminate bugs that can cause lots of headaches later!!

Use the assert statement to add as many tests as you can think of.

# test init,str,append,len
LL = LinkedList()
assert len(LL) == 0
assert str(LL) == "head--><--tail"
LL.append("A")
LL.append("B")
LL.append("C")
assert len(LL) == 3
assert str(LL) == "head-->(A)(B)(C)<--tail"

the `deleteHead()` method

Last time we created methods to add nodes to the list. Today we want to write methods to delete nodes from the list. We will write two methods: deleteHead() and deleteTail(), the opposites of prepend and append. Later we will write methods to add and delete nodes from anywhere in the list.

Typically, when deleting a node or item from a list, we want to do something with that item. So for all delete methods, we want to return the item that is stored in the node we are deleting.

To delete the head node, we need to do these six steps:

save the item from the current head node
find what will eventually be the new head node
cut the link from the current head to the new head
reset the head to be the new head node
decrease the size of the list by one
return the item we saved in step 2

However, if there are 1 or fewer nodes in the list, we will have to take care of those special cases:

return None if there were zero nodes in the list (i.e., no nodes to delete)
set both head and tail to None if there was only one node in the list (which we just deleted)

See if you can write the deleteHead() method. Make sure you add lots of test code. Here is how we would like it to work:

LL = LinkedList()
LL.append("A")
LL.append("B")
LL.append("C")
item = LL.deleteHead()
assert item == "A"
assert len(LL) == 2
item = LL.deleteHead()
assert item == "B"
assert len(LL) == 1
item = LL.deleteHead()
assert item == "C"
assert len(LL) == 0
item = LL.deleteHead()
assert item == "None"
assert len(LL) == 0

If you are having trouble getting your method to work, try drawing the linked list on paper and working through your code by hand. Make sure you understand what is happening at each step.

the `deleteTail()` method

Deleting the tail node is very similar to deleting the head node. However, the second step above, finding what will be the new tail node, requires a list traversal. For example, if we have 5 nodes in a list, and we want to delete the tail node, the new tail will be the 4th node in the current list. And to get to the 4th node, we could do something like this:

newtail = self.head                # start at head
for i in range(self.size - 2):     # move over 3 times
  newtail = newtail.getNext()

Once you have what will be the new tail node, the rest is very similar to the deleteHead method. This includes the special cases when the size of the list is zero or one.

See if you can add the deleteTail method to your linked list class. Make sure you test it thoroughly!

LL = LinkedList()
LL.append("A")
LL.append("B")
LL.append("C")
item = LL.deleteTail()
assert item == "C"
assert len(LL) == 2
item = LL.deleteTail()
assert item == "B"
assert len(LL) == 1
item = LL.deleteTail()
assert item == "A"
assert len(LL) == 0
item = LL.deleteTail()
assert item == "None"
assert len(LL) == 0

analysis

Note, to delete the tail node, we need to find the node before the current tail node, which requires a list traversal. If the list is longer, the list traversal will take longer (more steps!). That makes deleteTail() an O(N) algorithm (where N is the number of nodes/items in the list). Is L.pop() also an O(N) algorithm?

Summary...Linked Lists vs Python lists

Now that we have a working LinkedList class, what are some of the pros and cons of linked lists vs python lists?

Remember, here is a picture of how each is stored/implemented in the computer's memory:

computer memory

append

To add an item to the end of a python list, we need to find the correct spot in memory (python does this for us). Since this is just a simple calculation (base address + len(L)), appending to a python list is an O(1) operation. That is, it doesn't matter how long the list is (assuming there is available free memory at the end of the list).

The same is true for a linked list: adding a node to the end of the list is O(1), since it doesn't depend on the size of the list. There are only a few steps required: create new node, connect it to the tail, reset the tail, increment the size.

So for appending to the list, both are the same.

prepend

To add an item to the beginning of a python list, if we want to preserve the current base address, requires moving all other items in the list over by one slot. This clearly depends on the size of the list, making it an O(N) operation.

You can see this by running some simple tests using append(item) and insert(0,item):

$ cat testpythonlists.py 
"""
test/time various python list operations
"""

from time import time

# ------------------------------------------------ #

def main():

  N = int(raw_input("N: "))

  L = []
  t1 = time()
  for i in range(N):
    L.append(i)
  t2 = time()
  total = t2 - t1
  print("append time:  %6.3f  (for L of size %d)" % (total, len(L)))

  L = []
  t1 = time()
  for i in range(N):
    L.insert(0, i)
  t2 = time()
  total = t2 - t1
  print("insert@0 time:  %6.3f  (for L of size %d)" % (total, len(L)))

# ------------------------------------------------ #

main()

Here's a simple run of the above program. Notice the difference between adding to the end and adding to the beginning of a python list:

$ python testpythonlists.py 
N: 100000
append time:   0.015  (for L of size 100000)
insert@0 time:   2.573  (for L of size 100000)

$ python testpythonlists.py 
N: 200000 
append time:   0.021  (for L of size 200000)
insert@0 time:  10.259  (for L of size 200000)

For the linked list, again, just a few steps needed to add a node to the beginning of a list: create node, connect it to the current head, reset the head, increase the size. This is clearly an O(1) operation, so the linked list does better here.

indexing

How about for indexing, or finding the ith item in a list? For the python list, that is an O(1) operation, since it is just a simple calculation to find the correct memory location (base address + i). For the linked list, to get to the ith node, we need to start at the head of the list and move over i times. This is an O(N) operation, so the python list does better at indexing (which we've used a lot this semester!).

summary

See if you can figure out whether each method we have written (e.g., deleteHead(), __str__(), deleteTail(), etc) is O(1), O(N), or something else. And compare that to the python list.

CS21 Topics