The Memory Hierarchy
In our characterization of common memory devices, we see a pattern emerge: as devices grow in their storage capacities, their performance drops. Said another way, we have devices that are fast and devices that store a large amount of data, but no single device does both. This trade-off between performance and capacity gives rise to the memory hierarchy, an arrangement of devices from small and fast to large and slow, which figure NNN depicts visually. Practical systems must offer a combination of devices to meet the performance and capacity requirements of programs, and a typical system today incorporates most if not all of the devices described in the figure.
The reality of the memory hierarchy is unfortunate for programmers, who would prefer not to worry about the performance implications of where their data resides. For example, when declaring an integer in most applications, you ideally wouldn't need to agonize over the differences between it being stored in a cache or main memory. Always having to micromanage which type of memory each variable occupies would be burdensome, although it may be necessary for certain performance-critical sections of code.
For the remainder of this chapter, we'll focus on data movement for primary storage devices only (i.e., CPU caches and main memory). We'll reintroduce secondary storage and see how it fits into the bigger picture of memory management later in a discussion of virtual memory.
Locality
Luckily, most programs exhibit common memory access patterns, known as locality, and system designers have built hardware that exploits locality to automatically move data into an appropriate storage location. This automatic data movement accounts for why you've been able to write programs without worrying about memory devices thus far, aside from perhaps very coarse-grained decisions like in-memory variables vs. files. Intuitively, we want data that is being actively used to be as close to the CPU's computation circuitry as possible (e.g., in a register or cache on the CPU), whereas data that is less likely to be used can safely reside farther away (e.g., in main memory) until we eventually need it.
To a system designer, building a system that exploits locality is an abstraction problem. The system designer's goal is to provide an abstract view of memory devices such that it feels to programmers as if they have all the capacity of memory with the performance characteristics of fast on-chip storage. Of course, providing this rosy illusion to users can't be achieved perfectly, but by exploiting program locality, modern systems achieve good performance for most programs.
There are two primary forms of locality that programs exhibit:
Temporal Locality: Programs tend to access the same data repeatedly over time. That is, if a program has used a variable recently, it's likely to use that variable again soon.
Spatial Locality: Programs tend to access data that is nearby other, previously-accessed data. "Nearby" here refers to the data's memory address. For example, if a program accesses data at addresses N and N+4, it's likely to access N+8 soon.
To help illustrate the concepts of temporal and spatial locality, we'll start with an example involving real-world objects that you're already familiar with: books. Suppose you have a desk in your room where you do all your homework, and the desk has a small amount of space for storing only three books. Just outside your room you have a bookshelf, which has much more space than the desk. Finally, across campus you have a library with a huge variety of books. Your "book storage hierarchy" might look something like Figure NNN. Given this scenario, we'll explore how locality can help guide which storage location we should use for our books.
Temporal Locality
Temporal locality suggests that, if there's a book you use frequently, you should keep it as close to your desk as possible. If you occasionally need to move it to your shelf to clear up temporary work space, the cost isn't too high, but it would be silly to take a book back to the library if you're just going to need it again the next day. The inverse is also true: if there's a book taking up valuable space on your desk or shelf, and you haven't used it for quite a while, that book seems like a good candidate for returning to the library.
So, which books should you move to your precious desk space? In this example, real students would probably look at their upcoming assignments and select the books that will be most useful. In other words, to make the best storage decision, we would ideally need information about future usage.
Unfortunately, hardware designers haven't discovered how to build circuits that can predict the future. As an alternative to prediction, you might propose instead that we ask the programmer or user to tell the system in advance how data will be used so that its placement can be optimized. Such a strategy may work well in specialized applications (e.g., large databases) that exhibit very regular access patterns. However, in a general-purpose system like a personal computer, requiring advance notice from the user is typically considered too large of a burden. Many users would not want to (or would be unable to) provide enough detail to help the system make good decisions.
Thus, instead of relying on future access information, systems look to the past as a predictor of what will likely happen in the future. Applying this idea to our book example allows us to build a relatively simple (but still quite effective) strategy for governing our book storage spaces:
When you need to use a book, retrieve it from wherever it currently is and move it to your desk.
If the desk is already full, move the book that you've used least recently (that is, the book that has been sitting on your desk untouched for the longest amount of time) to your shelf.
If the shelf is full, return the shelf's least recently used book to the library to free up space.
While this scheme may not be perfect, the simplicity makes it attractive. All it requires is the ability to move books between storage locations and a small amount of meta-information regarding which book was used least recently. Furthermore, it captures our two initial temporal locality statements well:
If there's a book you use frequently, it's likely to remain on your desk or shelf, preventing unnecessary trips to the library.
If there's a book you haven't used for a while, it will eventually become the least recently used book, at which point returning it to the library makes sense.
Applying this strategy to primary storage devices looks remarkably similar to the book example: as data is loaded into CPU registers from main memory, make room for it in the CPU cache. If the cache is already full, make room in the cache by evicting the least recently used cache data to main memory. In the following section, we'll explore the details of how such mechanisms are built into modern caching systems.
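To make the bookkeeping concrete, here's a minimal sketch in C of least-recently-used (LRU) replacement, modeled on the three-slot desk from the book example. The names (use_book, DESK_SLOTS) and the timestamp-counter approach are illustrative assumptions rather than how any particular cache works; real caches implement comparable logic directly in hardware:
/* A minimal sketch of least-recently-used (LRU) replacement, modeled on
 * the three-book desk. Names and sizes here are hypothetical; real caches
 * implement comparable logic in hardware rather than software. */
#include <stdio.h>

#define DESK_SLOTS 3

static int desk[DESK_SLOTS];       /* book IDs currently on the desk (-1 = empty) */
static int last_used[DESK_SLOTS];  /* "time" each slot was last accessed */
static int clock_ticks = 0;        /* monotonically increasing access counter */

/* Access a book: if it's on the desk, just update its timestamp. Otherwise,
 * place it in an empty slot, or evict the least recently used book. */
void use_book(int book_id) {
    int i, victim = 0;

    clock_ticks++;
    for (i = 0; i < DESK_SLOTS; i++) {
        if (desk[i] == book_id) {        /* already on the desk: temporal locality pays off */
            last_used[i] = clock_ticks;
            return;
        }
    }
    for (i = 0; i < DESK_SLOTS; i++) {   /* not on the desk: find the LRU slot to replace */
        if (last_used[i] < last_used[victim]) {
            victim = i;
        }
    }
    if (desk[victim] != -1) {
        printf("returning book %d to the shelf\n", desk[victim]);
    }
    desk[victim] = book_id;              /* "fetch" the requested book into the freed slot */
    last_used[victim] = clock_ticks;
}

int main(void) {
    int i;
    for (i = 0; i < DESK_SLOTS; i++) {
        desk[i] = -1;
        last_used[i] = 0;
    }
    use_book(1); use_book(2); use_book(3);
    use_book(1);     /* book 1 used again: it becomes the most recently used */
    use_book(4);     /* desk full: book 2 (least recently used) is evicted   */
    return 0;
}
Tracing through main, books 1 through 3 fill the desk, reusing book 1 refreshes its timestamp, and fetching book 4 evicts book 2, the least recently used, just as the strategy above prescribes.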
Spatial Locality
Spatial locality suggests that, when making a trip to the library, we should retrieve more than one book to reduce the likelihood of future library trips. Specifically, we should also retrieve books that are "nearby" the one we need, since nearby books are good candidates for ones that might otherwise require additional library visits.
Suppose you're taking a literature course on the topic of Shakespeare's histories. If in the first week of the course you're assigned to read Henry VI, Part I, when you find yourself in the library to retrieve it, you're likely to also find parts II and III close by on the shelves. Even if you don't yet know whether the course will assign those other two parts, it's not unreasonable to think that you might need them. That is, the likelihood of needing them is much higher than for a random book in the library, precisely because they sit near the book you do need.
In this scenario, the likelihood increases due to the way we arrange books on library shelves, but data in memory is often similarly organized. For example, constructs like arrays or structs store collections of related data in contiguous regions of memory. When iterating over all the elements in an array, there is clearly a spatial pattern in the memory addresses we access! Applying these spatial locality lessons to primary storage devices implies that, when retrieving data from main memory, we should also retrieve the data immediately surrounding it.
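As a small illustrative sketch (the array name and values here are made up for illustration), the following C program prints the address of each element in an array; on a typical system the addresses differ by exactly sizeof(int) bytes, which is the contiguous layout that spatial locality rewards:
/* Sketch: array elements occupy contiguous addresses in memory. */
#include <stdio.h>

int main(void) {
    int values[4] = {10, 20, 30, 40};
    int i;

    for (i = 0; i < 4; i++) {
        /* Casting to void * is the portable way to print an address with %p. */
        printf("values[%d] lives at address %p\n", i, (void *) &values[i]);
    }
    return 0;
}
On most machines you'll see consecutive addresses four bytes apart, so fetching the block that holds one element brings its neighbors into the cache along with it.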
Locality Examples in Code
Fortunately, common programming patterns exhibit both forms of locality quite frequently. Take the following function for example:
/* Sum up the elements in an integer array of length len. */
int sum_array(int *array, int len) {
    int i;
    int sum = 0;
    for (i = 0; i < len; i++) {
        sum += array[i];
    }
    return sum;
}
In the code above, the repetitive nature of the for loop introduces temporal locality for i, len, sum, and array (the base address of the array), as each of these variables is accessed within every iteration of the loop. Exploiting this temporal locality allows a system to load each variable from main memory only once into the CPU cache. Every subsequent access can be serviced out of the significantly faster cache.
Accesses to the array's contents benefit from spatial locality. Even though each element of the array is accessed only once, a modern system loads more than one int at a time from memory into the CPU cache. That is, accessing the first array index fills the cache not only with the first integer but also with the next few integers after it. Exactly how many additional integers get moved into the cache depends on the cache's block size. For example, if the block size is 16 bytes, the system loads four integers at a time. Thus, accessing the first integer incurs the relatively high cost of a main memory access, but accesses to the next three (assuming the array begins at the start of a block) are served out of the cache, even though we have never accessed them previously.
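If you're curious what the block size is on your own machine, the following sketch queries it at run time. It assumes a Linux system with glibc, which exposes the L1 data cache line size through sysconf; other platforms report this information through different interfaces, and some systems may simply report 0:
/* Sketch (Linux/glibc): query the L1 data cache block (line) size and
 * compute how many ints fit in one block. _SC_LEVEL1_DCACHE_LINESIZE is
 * a glibc extension; systems that don't expose it may return 0 or -1. */
#include <stdio.h>
#include <unistd.h>

int main(void) {
    long block_size = sysconf(_SC_LEVEL1_DCACHE_LINESIZE);

    if (block_size > 0) {
        printf("L1 data cache block size: %ld bytes\n", block_size);
        printf("ints per block:           %ld\n", block_size / (long) sizeof(int));
    } else {
        printf("block size not reported on this system\n");
    }
    return 0;
}
On a machine with 64-byte blocks, for example, this reports 16 ints per block, so a sequential scan like sum_array pays for main memory only once every 16 elements.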
Up next, we'll characterize cache parameters and describe the mechanisms that hardware uses to identify and exploit locality automatically.