Using Valgrind
What is Valgrind?
Valgrind is a debugger and memory profiler for Linux executables. In Data Structures and Algorithms we will focus on using the debugging tool to diagnose memory leaks and errors in the programs that we write.
What are memory leaks and errors?
A memory leak occurs when a program allocates memory, e.g. using the C++
keyword new
, and that memory is never released by the program. Memory leaks
occur in C++ when you use new
to allocate memory but forget to delete
that
memory. This is bad because it wastes resources that other programs could use,
and can eventually lead to the computer running your program freezing up.
A memory error occurs when a program uses memory in an inappropriate way. The two most common types of memory errors in Data Structures and Algorithms are uninitialized memory errors and invalid read/write errors.
- Uninitialized memory is when memory is set aside by a program, but never given an explicit value. This means that its value is whatever is left over from the previous user of that memory. If you try to use uninitialized memory, your program will have unpredictable behavior!
- An invalid read/write occurs when a program attempts to use memory that
it hasn’t properly allocated. This most often happens when you try to
access memory that is outside the bounds of an array, e.g. index
n
in an array of sizen
. (The array would only have indices0
ton-1
.)
Between memory leaks and errors, errors are far more important. An error means that your program’s behavior is unpredictable, and so incorrect. A leak merely indicates that resources are being wasted, but the program may still be functioning properly otherwise. Prioritize first fixing errors, and then leaks.
Using Valgrind
To use Valgrind, first make sure that you compile using -g
flag. (The
Makefile
s provided in your lab assignments will do this for you.) This
flag, which is supported by most popular C++ compilers, adds extra debugging
information to the output executable that helps Valgrind display more
information about the program. For example, if you have a one-file C++
program named foo.cpp
, you could run:
clang++ -std=c++11 -g foo.cpp -o foo
Then, you can run Valgrind by putting valgrind
before a normal execution of
your program. For example, assuming you would normally run ./foo
, you instead
run
valgrind ./foo
Examples
Finding uninitialized memory
The following C++ program contains an uninitialized memory error.
#include <iostream>
using namespace std;
int main() {
// We are declaring n, but we don't initialize it.
int n;
cout << "Hello!" << endl;
for (int i=0;i<n;i++) {
cout << i << endl;
}
cout << "Goodbye!" << endl;
return 0;
}
The variable n
is not initialized, so it contains what we call “junk”: there’s
some int
value in the n
variable, but we can’t be sure what it is and it
doesn’t mean anything to us. A program that depends on junk values has
unpredictable behavior: sometimes it might work properly, but other times it
might behave incorrectly or even crash. Finding these bugs can be challenging
because a program might behave correctly even if it encounters one.
We will use valgrind
to identify cases in which this might happen.
valgrind
can identify memory errors even if they don’t cause the program to
crash. For instance, running valgrind
on the above program might produce the
following output:
$ valgrind ./foo
==136134== Memcheck, a memory error detector
==136134== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==136134== Using Valgrind-3.19.0 and LibVEX; rerun with -h for copyright info
==136134== Command: ./foo
==136134==
Hello!
==136134== Conditional jump or move depends on uninitialised value(s)
==136134== at 0x1091FE: main (foo.cpp:11)
==136134==
Goodbye!
==136134==
==136134== HEAP SUMMARY:
==136134== in use at exit: 0 bytes in 0 blocks
==136134== total heap usage: 2 allocs, 2 frees, 73,728 bytes allocated
==136134==
==136134== All heap blocks were freed -- no leaks are possible
==136134==
==136134== Use --track-origins=yes to see where uninitialised values come from
==136134== For lists of detected and suppressed errors, rerun with: -s
==136134== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
In this output, lines that start iwth ==136134==
are Valgrind’s output. The
output of the program itself is printed as usual. The number 136134
is the
process ID of the program. We can safely ignore it here.
Let’s break the output into sections based on the information provided. The first section prints Valgrind’s copyright information, information about its version, as well as how Valgrind was invoked. This part essentially says “Hello! I’m Valgrind.”
==136134== Memcheck, a memory error detector
==136134== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==136134== Using Valgrind-3.19.0 and LibVEX; rerun with -h for copyright info
==136134== Command: ./foo
==136134==
Next, the program starts running. It prints out the “Hello!
” string, but
Valgrind identifies an error and prints it.
==136134== Conditional jump or move depends on uninitialised value(s)
==136134== at 0x1091FE: main (foo.cpp:11)
We will discuss this error below. The program continues running (despite the
error) and prints the “Goodbye
” string and finishes running. Valgrind then
prints a summary of how much heap memory the program used (which, roughly
speaking, is the amount of memory we dynamically allocated).
==136134== HEAP SUMMARY:
==136134== in use at exit: 0 bytes in 0 blocks
==136134== total heap usage: 2 allocs, 2 frees, 73,728 bytes allocated
==136134==
==136134== All heap blocks were freed -- no leaks are possible
The C++ runtime allocates some memory for us to handle things like command-line arguments, but this summary says that we don’t have any memory leaks. Finally, Valgrind prints out a summary and suggestions for further debugging steps.
==136134== Use --track-origins=yes to see where uninitialised values come from
==136134== For lists of detected and suppressed errors, rerun with: -s
==136134== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
During this session, Valgrind reported an error: “Conditional jump or move
depends on unitialised value(s)”. It includes the text foo.cpp:11
, which
indicates the file and line number where the error occurred. This corresponds
to the following line of code:
for (int i=0;i<n;i++) {
Now that Valgrind has helped us out, we have to do the reasoning ourselves.
There are two variables used on this line: i
and n
. We can see that the
variable i
is initialzied in the for
loop, so it can’t be the source of
our problem. We conclude that n
must be uninitialized. Note that this is
not the line that we need to fix; it’s just the first line where Valgrind saw
an uninitialized variable being used. It’s up to us to figure out why n
was
never given a value.
Finding an invalid read/write
Suppose we have the following snippet in our program:
#include <iostream>
using namespace std;
int main() {
// Initialize size and arr
int size = 10;
int* arr = new int[size];
for (int i=0; i<size; i++) {
arr[i] = i*i;
}
// Print out arr
for (int j=0; j<=size; j++) {
cout << arr[j] << endl;
}
// Clean up
delete[] arr;
return 0;
}
If we look carefully, the second loop exceeds the bounds of the array arr
.
In a more complicated program, this kind of mistake could be easy to overlook.
Thankfully, Valgrind can help us find this problem. Here’s an example session
of Valgrind running the above program:
$ valgrind ./foo
==136387== Memcheck, a memory error detector
==136387== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==136387== Using Valgrind-3.19.0 and LibVEX; rerun with -h for copyright info
==136387== Command: ./foo
==136387==
0
1
4
9
16
25
36
49
64
81
==136387== Invalid read of size 4
==136387== at 0x10925A: main (foo.cpp:15)
==136387== Address 0x4d7bca8 is 0 bytes after a block of size 40 alloc'd
==136387== at 0x484220F: operator new[](unsigned long) (vg_replace_malloc.c:640)
==136387== by 0x109207: main (foo.cpp:8)
==136387==
0
==136387==
==136387== HEAP SUMMARY:
==136387== in use at exit: 0 bytes in 0 blocks
==136387== total heap usage: 3 allocs, 3 frees, 73,768 bytes allocated
==136387==
==136387== All heap blocks were freed -- no leaks are possible
==136387==
==136387== For lists of detected and suppressed errors, rerun with: -s
==136387== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
As with the previous program, this program prints output to the terminal. That
output is mixed with Valgrind’s output, but it’s easy to determine which is
which because Valgrind adds a prefix (here, “==136387==
”) to all of its
output.
Let’s focus on Valgrind’s error message:
Because this program outputs to the terminal, that output is mixed in with Valgrind’s output. Let’s look at the error message.
Invalid read of size 4
at 0x10925A: main (foo.cpp:15)
Address 0x4d7bca8 is 0 bytes after a block of size 40 alloc'd
at 0x484220F: operator new[](unsigned long) (vg_replace_malloc.c:640)
by 0x109207: main (foo.cpp:8)
The Invalid read of size 4
at foo.cpp:15
points to the line:
cout << arr[j] << endl;
An “invalid read” is when our program looks at memory that it does not own.
This line uses arr[j]
to compute the location of the j
-indexed slot of the
arr
array. The problem is that, during the very last execution of the loop,
j
is equal to size
, which is out of the bounds of the array we allocated.
Valgrind helpfully points out where arr
was allocated in the rest of the
error message: foo.cpp:8
, mentioned on the last line of the error, corresponds
to the following line of code:
int* arr = new int[size];
Again: it’s up to us to figure out why the program went wrong. With this
information, though, we can deduce that the length of the array in arr
is
size
. This encourages us to inspect the loop for which j
is the iterator
variable. We can see that this loop’s condition is j<=size
, which will cause
it to run once when j
is equal to size
. This is out of the bounds of the
array: our loop went too far! We can fix this by correcting the loop bound to
j<size
.
Finding a memory leak
Valgrind can also detect when our program leaks memory. If we run Valgrind on the program
#include <iostream>
using namespace std;
int main() {
// Initialize size and arr
int* arr = new int[8000];
return 0;
}
then we might see the following session output:
==136597== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==136597== Using Valgrind-3.19.0 and LibVEX; rerun with -h for copyright info
==136597== Command: ./foo
==136597==
==136597==
==136597== HEAP SUMMARY:
==136597== in use at exit: 32,000 bytes in 1 blocks
==136597== total heap usage: 2 allocs, 1 frees, 104,704 bytes allocated
==136597==
==136597== LEAK SUMMARY:
==136597== definitely lost: 32,000 bytes in 1 blocks
==136597== indirectly lost: 0 bytes in 0 blocks
==136597== possibly lost: 0 bytes in 0 blocks
==136597== still reachable: 0 bytes in 0 blocks
==136597== suppressed: 0 bytes in 0 blocks
==136597== Rerun with --leak-check=full to see details of leaked memory
==136597==
==136597== For lists of detected and suppressed errors, rerun with: -s
==136597== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Note that there are no memory errors: Valgrind’s output moves straight from the
introduction to the heap summary. However, the leak summary indicates that we
lost 32,000 bytes of memory. Here, “lost” means that the memory was allocated
by our program but never freed. In this small program, the problem is fairly
obvious: we never used delete[] arr
to free the array we allocated.
While it’s useful for Valgrind to tell us that it discovered a leak, we need to
tell Valgrind to be even more thorough if we want more information. Valgrind
gives us the hint: “Rerun with --leak-check=full
to see details of leaked
memory”. So we run Valgrind again, but instead of running valgrind ./foo
, we
run valgrind --leak-check=full ./foo
:
==136650== Memcheck, a memory error detector
==136650== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==136650== Using Valgrind-3.19.0 and LibVEX; rerun with -h for copyright info
==136650== Command: ./foo
==136650==
==136650==
==136650== HEAP SUMMARY:
==136650== in use at exit: 32,000 bytes in 1 blocks
==136650== total heap usage: 2 allocs, 1 frees, 104,704 bytes allocated
==136650==
==136650== 32,000 bytes in 1 blocks are definitely lost in loss record 1 of 1
==136650== at 0x484220F: operator new[](unsigned long) (vg_replace_malloc.c:640)
==136650== by 0x1091B8: main (foo.cpp:7)
==136650==
==136650== LEAK SUMMARY:
==136650== definitely lost: 32,000 bytes in 1 blocks
==136650== indirectly lost: 0 bytes in 0 blocks
==136650== possibly lost: 0 bytes in 0 blocks
==136650== still reachable: 0 bytes in 0 blocks
==136650== suppressed: 0 bytes in 0 blocks
==136650==
==136650== For lists of detected and suppressed errors, rerun with: -s
==136650== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
This time, Valgrind produced an error message after the heap summary. This
message shows us the line of our program where the leaked memory was originally
allocated: foo.cpp:7
, which corresponds to
int* arr = new int[8000];
As with the previous examples, there’s not necessarily anything wrong with this
line of code. That is: Valgrind didn’t point us to the location of a bug.
Valgrind only shared information about what went wrong: that this allocation was
never freed with delete[]
. It’s up to us to decide what to do about it.