How do python lists work? We have used them a lot this semester. Is there another way to implement lists? And if so, what are the pros and cons of each way?
What things do we normally do with python lists?
L = [] # construct
L.append("cat") # add to
print L, len(L), L[1] # print, length, indexing
L.pop(0) # remove from
Can we write/create lists from scratch (without using python lists)?
There are two main options for representing lists internally:
Here is a klunky/simplified picture of how each option might look:
For the python list, because everything is at known, easy-to-calculate memory addresses, that makes indexing fast and efficient:
L[0] is at 2000
L[1] is at 2000 + 1
L[2] is at 2000 + 2
...
L[i] is at 2000 + i
So, to get any value from the list, all we have to do is one quick calculation (base memory address + i). This is a constant time operation (i.e., it doesn't depend on the size of the list).
It is also easy to append to the python list, by using the length of the list to locate the next location to store a new item.
Some disadvantages of using consecutive memory:
A linked list is a collection of objects/nodes, where each node contains both the item stored in the list, as well as a "pointer" to the next node in the list.
Linked lists typically have three instance variables: a head pointer (self.head
),
a tail pointer (self.tail
), and the number of nodes in the list (self.size
)
In the above picture, the linked list has 5 nodes, and each node stores a simple string, as well as a pointer to the next node in the list. In class I often draw linked lists and nodes as follows:
------ ------ ------ ------ ------
self.head --> |rich| |lisa| |andy| |zach| |tia |<--self.tail
------ ------ ------ ------ ------
| |--> | |-->| |-->| |-->| |-->None
------ ------ ------ ------ ------
self.size = 5
Linked lists are often used to create other useful data structures, such as stacks and queues. They are also a good introduction to more advanced data structures, such as binary trees.
One big advantage of the linked list is not needing large chunks of consecutive memory. Each node in the list is just put in memory wherever there is space, and all nodes are "linked" together into one "list".
Also, adding an item into the middle of the list does not require shifting all other elements down one slot. Simply reset the next "pointers" to include the new node.
We will implement a linked list using two classes:
Node
, which will store a data value (the list item) and a
"pointer" to the next node in the listLinkedLlist
, which will store pointers to the head Node and
the tail Node, as well as the current size of the list.Node
class:Let's write a Node
class, where each node has two instance
variables: one to hold the node data (call it "data"), and
one to hold a pointer to another node (call it "next").
Later we will use these node objects to build a linked-list!
Add two mutators and two accessors for the data and next variables, then test it out.
class Node(object):
def __init__(self, item):
"""
construct new node, with given item.
next initially set to None.
"""
self.item = item
self.next = None
def __str__(self):
"""return a string representation of the node"""
s = "("+str(self.item)+")"
if self.next == None:
s += "--|"
else:
s += "-->"
return s
def getItem(self): return self.item
def getNext(self): return self.next
def setItem(self, i):
"""set node item to i"""
self.item = i
def setNext(self, n):
"""set node next to n"""
self.next = n
if __name__ == "__main__":
n1 = Node("jeff")
n2 = Node("zach")
n3 = Node("lauri")
n4 = Node("rich")
n1.setNext(n2)
print(n1)
n2.setNext(n3)
print(n2)
n3.setNext(n4)
LinkedList
classLast time we wrote a Node
class to be used to create our linked lists.
A Node
is just a "container" to hold two things: the item or data we want
stored in the list, plus a next
"pointer" to point to the next node in the list.
Here is an example of how we could manually create nodes and link them together:
>>> from node import *
>>> n1 = Node("jeff")
>>> n2 = Node("zach")
>>> n3 = Node("lauri")
>>>
>>> print(n1)
(jeff)--|
>>>
>>> n1.setNext(n2)
>>> n2.setNext(n3)
>>>
>>> print(n1)
(jeff)-->
>>> print(n2)
(zach)-->
>>> print(n3)
(lauri)--|
Instead of doing this manually, we could create a LinkedList
class to allow
us to create linked lists just like python lists. Here is what I would like
to be able to do:
>>> from linkedlist import *
>>> LL = LinkedList()
>>> LL.append("jeff")
>>> LL.append("zach")
>>> LL.append("lauri")
>>> print(LL)
head-->(jeff)(zach)(lauri)<--tail
Each linked list should have three instance variables: self.head
, self.tail
, and self.size
.
All methods we write should make sure all three are updated accordingly. For example, when appending
to the list, the self.tail
pointer should be adjusted, as well as the size of the list. And if
it is the first node being added/appended to the list, then the self.head
pointer will also
need to be set (i.e., if there is just one node in the list, both the head and the tail should
point to that one node).
For the constructor, we want to build an empty linked list object, so self.head
and
self.tail
don't point to anything, and the size is zero. We use python's None
type
when things don't point to anything:
def __init__(self):
"""construct an empty linked list"""
self.head = None
self.tail = None
self.size = 0
append
methodTo append a new node, we need to create the node with the given item stored in the node, and then add it to the end of the list. That means connecting the current tail of the list to the newly-created node, then resetting the tail to be the newly-created node. Something like this:
n = Node(item)
self.tail.setNext(n) # connect to end
self.tail = n # reset tail pointer
self.size += 1
However, this only works if there are already nodes in the list. If this is the
first node in the list, we have to account for that (set the head to point to
this node, and don't connect the current tail). Here is the final version of
our append()
method:
def append(self, item):
"""add new node, containing item, to end of the list"""
n = Node(item)
if self.size == 0:
self.head = n
else:
self.tail.setNext(n)
self.tail = n
self.size += 1
prepend
methodCan you create a similar method called prepend
, to add nodes to the beginning
of the list?
str
methodIt would be nice to be able to print the list, to make sure it is set up correctly. Something like the above example would be useful:
head-->(jeff)(zach)(lauri)<--tail
However, to get each item stored in the list, we need to access each node in the list. This is typically done with a loop to visit each node in the list (i.e., list traversal). Here is how you could manually set a pointer to the first node in the list, get the item stored there, and then move the pointer over to the next node in the list:
s = "head-->"
curr = self.head
s += "(%s)" % str(curr.getItem())
curr = curr.getNext()
Note the curr = curr.getNext()
...this moves the pointer from one node to the next.
Putting this in a loop, we can visit each node in the list and create/accumulate a
string-representation of the list:
def __str__(self):
"""return string representation of linked list"""
s = "head-->"
curr = self.head
for i in range(self.size):
s += "(%s)" % str(curr.getItem())
curr = curr.getNext()
s+="<--tail"
return s
Note: if the list is empty, the for
loop never executes, so all we get is
"head--><--tail", which shows an empty list.
If you want you can change the str
method to use square brackets and commas
between items, so the linked list prints exactly like the python list.
len()
methodAdding a getter for self.size
could be useful. Calling it __len__()
allows you to use len(LL)
, just like we've done with python lists.
Once you have the above methods written, add some test code to make sure everything works. Try to test all possible cases. Thoroughly testing now will help eliminate bugs that can cause lots of headaches later!!
Use the assert
statement to add as many tests as you can think of.
# test init,str,append,len
LL = LinkedList()
assert len(LL) == 0
assert str(LL) == "head--><--tail"
LL.append("A")
LL.append("B")
LL.append("C")
assert len(LL) == 3
assert str(LL) == "head-->(A)(B)(C)<--tail"
deleteHead()
methodLast time we created methods to add nodes to the list. Today we want to write
methods to delete nodes from the list. We will write two methods: deleteHead()
and deleteTail()
, the opposites of prepend and append. Later we will write methods
to add and delete nodes from anywhere in the list.
Typically, when deleting a node or item from a list, we want to do something with that item. So for all delete methods, we want to return the item that is stored in the node we are deleting.
To delete the head node, we need to do these six steps:
However, if there are 1 or fewer nodes in the list, we will have to take care of those special cases:
None
if there were zero nodes in the list (i.e., no nodes to delete)None
if there was only one node in the list (which
we just deleted)See if you can write the deleteHead()
method. Make sure you add lots of
test code. Here is how we would like it to work:
LL = LinkedList()
LL.append("A")
LL.append("B")
LL.append("C")
item = LL.deleteHead()
assert item == "A"
assert len(LL) == 2
item = LL.deleteHead()
assert item == "B"
assert len(LL) == 1
item = LL.deleteHead()
assert item == "C"
assert len(LL) == 0
item = LL.deleteHead()
assert item == "None"
assert len(LL) == 0
If you are having trouble getting your method to work, try drawing the linked list on paper and working through your code by hand. Make sure you understand what is happening at each step.
deleteTail()
methodDeleting the tail node is very similar to deleting the head node. However, the second step above, finding what will be the new tail node, requires a list traversal. For example, if we have 5 nodes in a list, and we want to delete the tail node, the new tail will be the 4th node in the current list. And to get to the 4th node, we could do something like this:
newtail = self.head # start at head
for i in range(self.size - 2): # move over 3 times
newtail = newtail.getNext()
Once you have what will be the new tail node, the rest is very similar to the
deleteHead
method. This includes the special cases when the size of the list
is zero or one.
See if you can add the deleteTail
method to your linked list class. Make sure
you test it thoroughly!
LL = LinkedList()
LL.append("A")
LL.append("B")
LL.append("C")
item = LL.deleteTail()
assert item == "C"
assert len(LL) == 2
item = LL.deleteTail()
assert item == "B"
assert len(LL) == 1
item = LL.deleteTail()
assert item == "A"
assert len(LL) == 0
item = LL.deleteTail()
assert item == "None"
assert len(LL) == 0
Note, to delete the tail node, we need to find the node before the current tail node,
which requires a list traversal. If the list is longer, the list traversal will take
longer (more steps!). That makes deleteTail()
an O(N) algorithm (where N is the number
of nodes/items in the list). Is L.pop()
also an O(N) algorithm?
Now that we have a working LinkedList
class, what are some of the
pros and cons of linked lists vs python lists?
Remember, here is a picture of how each is stored/implemented in the computer's memory:
To add an item to the end of a python list, we need to find the correct spot in memory (python does this for us). Since this is just a simple calculation (base address + len(L)), appending to a python list is an O(1) operation. That is, it doesn't matter how long the list is (assuming there is available free memory at the end of the list).
The same is true for a linked list: adding a node to the end of the list is O(1), since it doesn't depend on the size of the list. There are only a few steps required: create new node, connect it to the tail, reset the tail, increment the size.
So for appending to the list, both are the same.
To add an item to the beginning of a python list, if we want to preserve the current base address, requires moving all other items in the list over by one slot. This clearly depends on the size of the list, making it an O(N) operation.
You can see this by running some simple tests using append(item)
and insert(0,item)
:
$ cat testpythonlists.py
"""
test/time various python list operations
"""
from time import time
# ------------------------------------------------ #
def main():
N = int(raw_input("N: "))
L = []
t1 = time()
for i in range(N):
L.append(i)
t2 = time()
total = t2 - t1
print("append time: %6.3f (for L of size %d)" % (total, len(L)))
L = []
t1 = time()
for i in range(N):
L.insert(0, i)
t2 = time()
total = t2 - t1
print("insert@0 time: %6.3f (for L of size %d)" % (total, len(L)))
# ------------------------------------------------ #
main()
Here's a simple run of the above program. Notice the difference between adding to the end and adding to the beginning of a python list:
$ python testpythonlists.py
N: 100000
append time: 0.015 (for L of size 100000)
insert@0 time: 2.573 (for L of size 100000)
$ python testpythonlists.py
N: 200000
append time: 0.021 (for L of size 200000)
insert@0 time: 10.259 (for L of size 200000)
For the linked list, again, just a few steps needed to add a node to the beginning of a list: create node, connect it to the current head, reset the head, increase the size. This is clearly an O(1) operation, so the linked list does better here.
How about for indexing, or finding the ith item in a list? For the python list, that is an O(1) operation, since it is just a simple calculation to find the correct memory location (base address + i). For the linked list, to get to the ith node, we need to start at the head of the list and move over i times. This is an O(N) operation, so the python list does better at indexing (which we've used a lot this semester!).
See if you can figure out whether each method we have written (e.g., deleteHead()
,
__str__()
, deleteTail()
, etc) is O(1), O(N), or something else. And compare
that to the python list.