Lab 3: Basic Encryption
Due on Sunday, February 14th at 11:59 PM. This is a team lab. You and your assigned lab partner(s) will complete this lab together. Make sure that you are familiar with the Partner Etiquette guidelines. You may discuss the concepts of this lab with other classmates, but you may not share your code with anyone other than course staff and your lab partner(s). Do not look at solutions written by students other than your team. If your team needs help, please post on the Piazza forum or contact the instructor or ninjas. If you have any doubts about what is okay and what is not, it’s much safer to ask than to risk violating the Academic Integrity Policy.
Overview
In this lab, you will work with a partner to write a program which can apply a variety of simple encryption and decryption techniques to text files. This will help you gain the following:
- More experience with files and command-line arguments in C++
- Familiarity with object-oriented programming, especially polymorphism
- An understanding of abstraction via interfaces
- Practice writing tests for your own code
- Experience developing software with another programmer
This is your first lab working as a pair, so your repository URL will be a bit different this time. You will clone git@github.swarthmore.edu:cs35-s16/lab03-<team-name>
. Your team name will be e-mailed to you prior to the start of this lab. You can see a list of all of your repositories on the Swarthmore GitHub in the lower-right side of the main page; make sure to switch your view to the organization for this class to see them.
Encryption and Ciphers
Encryption is the process of encoding messages in such a way that only authorized parties can read the message. The idea of encrypting messages is thousands of years old and has been particularly useful in the military, where it is important to exchange messages (e.g. “Attack at dawn!” vs “Hold your ground”) without enemies being able to understand them in the event a message is intercepted.
A cipher is a kind of encryption scheme where each letter is replaced by another. Ciphers have been used since at least Roman times. Julius Caesar used an cipher in which each letter of the message was replaced with one three positions earlier in the alphabet. Using the English alphabet, for instance, ‘A’ is encrypted as ‘X’, ‘B’ is encrypted as ‘Y’, ‘C’ is encrypted as ‘Z’, ‘D’ is encrypted as ‘A’, and so on. An encrypted message is called the ciphertext. If an enemy intercepted the ciphertext, it would look like a jumble of letters. Caesar’s subordinates knew the encryption scheme, however, and could could decrypt the message by employing doing the opposite of the encryption scheme: ‘A’ is decrypted as ‘D’, ‘B’ is decrypted as ‘E’, etc. This cipher was used so effectively that it became known as the Caesar cipher.
In this lab, you will write a program equipped with multiple different ciphers:
- The Caesar cipher, as described above. The code for this cipher has been given as an example in your initial repository.
- A rotation cipher, which works similarly to the Caesar cipher but allows the user to specify the number of rotation steps. The Caesar cipher is a rotation cipher with a shift of -3 (that is, three to the left); we can also imagine rotation ciphers with a distance of 1, 4, -5, 12, etc.
- A substitution cipher, which replaces each letter with another letter according to a pattern given by the user. In this cipher, the user explicitly indicates which letter replaces ‘A’, which letter replaces ‘B’, and so on; they do not have to follow a rotation.
- A simplified version of the Enigma cipher. The Enigma Machine, a device created to implement the Enigma cipher, first appeared in the 1920s; later versions of this machine were used by the German military during World War II. The success of Polish, British, and French engineers in breaking the Enigma cipher provided military intelligence which is said to have shortened the war in Europe by two years; this effort also led to the first electrical digital computers. Our simplification of the Enigma cipher will be discussed below.
Our ciphers will concern the encryption only of English alphabetic characters ('a'
through 'z'
as well as A
through Z
). The ciphers will ignore all other characters; the digit character '1'
, for instance, will be encrypted to itself. Although this would be a poor decision in any real cryptographic setting, it helps to simplify our assignment.
Your Starting Point
In this lab, you will create one class to represent each cipher you must write. Each of these classes will be subclasses of the abstract Cipher
class, which represents the interface of all ciphers. Your subclasses will implement functions (such as encrypt
and decrypt
); the behavior of these functions will differ with each cipher. In addition, you will write the main program to allow these ciphers to be used to encrypt and decrypt text files.
Your starting repository consists of the following:
- A
Makefile
. In the fashion of the last lab,make
will build the program whilemake tests
will build your tests. cipher.h
, which defines the abstractCipher
class.- Files for the individual ciphers:
caesar.h
/caesar.cpp
: An implementation of the Caesar cipher which has been provided for you.rotation.h
/rotation.cpp
: Skeleton files for the rotation cipher.substitution.h
/substitution.cpp
: Skeleton files for the substitution cipher.enigma.h
/enigma.cpp
: Skeleton files for the simplified Enigma cipher.
crypto.cpp
: The file in which you will writecrypto_main
, themain
-like function for this program. As in the last lab, the realmain
function will callcrypto_main
.crypto.h
: The header forcrypto.cpp
; you won’t need to edit this file.program.cpp
: A file containing the simplemain
function that callscrypto_main
; you won’t need to edit this file.tests.cpp
: A file containing your tests.test_utils.h
/test_utils.cpp
: A file containing testing utilities (e.g.RUN_MAIN
). You won’t need to edit this file, although it contains an example of how to declare functions with a variable number of arguments in C++ if you’re curious.test_data
: A directory containing some sample data for testing.test_output
: A directory to use during testing for output files.
User Interface
As with the previous lab, we will control your program via command-line arguments. Invocations to your program will take the following form:
./crypto <input-file> <operation> <configuration-file> <output-file>
Your program will always have four arguments (other than the name of the program itself). If the user provides invalid arguments (e.g. files that don’t exist or an invalid mode of operation), you must produce an appropriate error message and terminate your program with a non-zero exit code.
The input-file
argument is the name of an existing text file which we will encrypt or decrypt. The output-file
argument is the name of the file to which we will write our results. The operation
argument is one of encrypt
or decrypt
, depending upon the user’s needs. Finally, the configuration-file
argument is the name of an existing text file which tells us which cipher to use and how to configure it. For instance, we might invoke our program in the following ways:
./crypto secret.txt encrypt caesar.cfg secret.txt.encrypted
./crypto message.txt.encrypted decrypt caesar.cfg message.txt
./crypto orders.txt.encrypted decrypt enigma.cfg orders.txt
Configuration Files
For this assignment, configuration files are simply text files that describe how to create our Cipher
objects. Each time your program is run, the user must name a configuration file; you will read that file to determine how to create your Cipher
object and then use that object to encrypt or decrypt the given data. If, when reading the configuration file, you encounter an I/O error or discover invalid contents, you must generate an appropriate error message and terminate your program with a non-zero exit code.
All configuration files will begin by naming, in a single word, the cipher being used. The contents of the rest of the file depend upon this cipher name:
caesar
: The Caesar cipher is to be used. A configuration file indicating the Caesar cipher will contain no other information.rotation
: A rotation cipher is to be used. The next and only content of this file will be an integer indicating the shift of the cipher.-
substitution
: A substitution cipher is to be used. The next and only content of this file will be a 26-character upper-case string containing the substitution. For instance, if the configuration file containssubstitution QPWOEIRUTYLAMZKSXCJDNFHVGB
then we must create a
SubstitutionCipher
such thatA
encrypts toQ
,B
encrypts toP
,C
encrypts toW
, and so on. Your program should check that the substitution string is exactly 26 letters long and contains no repeats; that is,AAAAAAAAAAAAAAAAAAAAAAAAAA
is not a valid substitution string. enigma
: The simplified Enigma cipher is to be used. The file contains three 26-character upper-case strings (similar to the substitution cipher above) and you must check their validity in the same fashion. These three substitution strings represent the rotors of our Enigma Machine, which we will discuss below.
Testing Your Program
The test_data
directory for this assignment contains three different kinds of files:
- Configuration files, named as
caesar.cfg
orenigma1.cfg
. You may create your own configuration files, but several have been provided for you. - Unencrypted text files, named as
message1.txt
. - Encrypted text files, named as
message1__caesar.txt.encrypted
ormessage1__enigma1.txt.encrypted
. The portion of the filename following the two underscores indicates the configuration file which was used to encrypt the text. Thus,message1.txt
encrypted usingenigma1.cfg
should produce the same text as appears inmessage1__enigma1.txt.encrypted
; likewise,message1__enigma1.txt.encrypted
decrypted usingenigma1.cfg
should produce the same text as appears inmessage1.txt
.
You are required to write tests for each of your ciphers as well as for your main program. This lab provides you with a test_utils.h
file which defines RUN_MAIN
and CHECK_FILES_EQUAL
in a fashion similar to the previous lab. Make sure you write tests as soon as you can! This means that you should be writing test code after completing each encrypt
and decrypt
method and for your main
as soon as it is able to process each type of settings
file.
To get you started, the initial tests.cpp
contains a few tests for the Caesar cipher.
Writing Your Program
This lab is most easily approached when broken down into the following steps.
- Understand what is happening in
cipher.h
andcaesar.h
/caesar.cpp
. It might help to look at the example tests intests.cpp
. - Write the rotation cipher. This involves writing both the
encrypt
anddecrypt
methods, which can be written separately. After you complete each of those methods, write appropriate tests for them! - Write the substitution cipher similarly to the rotation cipher above.
- Write the
crypto_main
function. This will involve writing code that will read the input file one character at a time, give each character to the cipher, and write the resulting character to the output file. You will also need to create the cipher based on the provided configuration file; you will need to updatecrypto_main
for each cipher you write. - Write the simplified Enigma cipher. You should not do this until your other ciphers are finished! The ninjas will be instructed not to help you with the Enigma cipher until your other ciphers are finished.
The above steps do not have to be performed strictly in that order; the diagram shows the dependencies between the different tasks. A solid arrow points from a task that must wait to the task it must wait for; for instance, you should understand the Cipher
class before you do anything else, but you can work on the substitution cipher without having completed the rotation cipher. The dotted lines indicate that the crypto_main
task will need more work once each cipher is finished. We present this diagram because it may be of particular interest to your team, depending upon how you work with your partner. You might divide up the ciphers or have one member work on crypto_main
while the other gets started on a cipher.
Most of the rest of this lab will give you hints on how to handle the tasks above.
1. Understanding Cipher
and CaesarCipher
The Cipher
class serves as the base class for all cipher definitions. It is an interface in that its sole purpose is to describe what a Cipher
can do (as opposed to how the Cipher
does those things). In particular, it defines two methods:
char encrypt(char c)
: Encrypts a single character.char decrypt(char c)
: Decrypts a single character.
The encrypt
and decrypt
methods are pure virtual methods, meaning that they do not have an implementation. Instead, each subclass must implement these functions. We can see that e.g. encrypt
is a pure virtual method because it is declared as follows:
virtual char encrypt(char c) = 0;
The virtual
keyword makes encrypt
a virtual method; for our purposes, this means that the definition of the method can be replaced by a subclass. The = 0
makes a virtual method become pure virtual, which means that subclasses must replace the definition of the method (because it has no definition otherwise). Classes which contain pure virtual methods – those methods without implementations – are called abstract classes.
Abstract classes cannot be instantiated (since they aren’t really finished or ready to use). By polymorphism, however, every instance of a subclass is an instance of its parent class, so we can get instances of Cipher
by creating instances of one of its subclasses (e.g. CaesarCipher
). For instance, consider:
Cipher* my_cipher = new CaesarCipher();
The variable my_cipher
is guaranteed to point to some instance of the Cipher
class. Since Cipher
is an abstract class, we know that the actual object must be one of the subclasses of Cipher
but, depending on the code, we might not know which one. Here, we can see that the pointer always refers to a CaesarCipher
. But the specific cipher isn’t necessarily known in the following:
Cipher* my_cipher;
if (x%2==0) { // or any other condition...
my_cipher = new CaesarCipher();
} else {
my_cipher = new RotationCipher(4);
}
char c = my_cipher->encrypt('a');
In the above, my_cipher
contains either a CaesarCipher
or a RotationCipher
. We don’t know which, but we don’t care: as long as the object in question is a Cipher
object, we know that it has an encrypt
method and that’s all we want from it.
For our purposes, we will assume that only upper-case letters are passed to the encrypt
method. This assumption will hold for all implementations of Cipher
. It will be the responsibility of crypto_main
to make sure that this is true.
2. Writing the Rotation Cipher
The provided rotation.h
/rotation.cpp
files give a skeleton for the RotationCipher
class. Like CaesarCipher
, it is a specialization of the Cipher
class and is defined accordingly. Although we have provided signatures for the encrypt
and decrypt
methods, we have not provided any private
data members. You’ll need to add your own private
data, constructor arguments, and implement the methods of the class appropriately.
In the encrypt
method, we are assuming that the input is an upper-case letter character. A RotationCipher
should move that letter some number of positions in the alphabet, but that movement will vary based on the RotationCipher
’s configuration. For that reason, you should modify the provided files so that RotationCipher
’s constructor takes an int
argument (named e.g. rotation
) and stores it in a field. Then, within the encrypt
method, you can use rotation
to determine the correct letter.
Remember that you might have more work to do than a simple addition, since character addition might move you outside of the range of capital letters. If rotation
is 1
, for instance, Z
should encrypt to A
; in C++, however, 'Z'+1
evaluates to '[
(since [
is the next ASCII character). There are many ways to solve this problem, but we’d recommend converting the letter to an index (0
for A
, 1
for B
, etc.) and using modulus to keep the index within range.
The decrypt
method will work very much like the above, but it must perform the opposite rotation. This means you will have to consider different boundary conditions.
3. Writing the Substitution Cipher
Similar to the rotation cipher above, substitution.h
and substitution.cpp
contain very basic skeletons of the SubstitutionCipher
class; you will need to provide your own constructor arguments and fields. Instead of accepting a rotation int
as an argument to store as a field, you should instead accept a substitution string
. In order to encrypt a capital letter, you must then perform the following steps:
- Calculate the index of the letter in the alphabet (
0
forA
,1
forB
, etc.). - Look up the letter in the substitution string at that index.
Decryption is a little trickier in this case. Instead of calculating a position by index, you will need to perform a linear search of the substitution string (look at each character in turn) to find the encrypted character you have. Then, you can calculate the decrypted character based upon the index of that character.
4. Writing crypto_main
The main
function of this program simply calls crypto_main
(similar to picfilter_main
from the last lab), so crypto_main
serves as the entry point to your program. The execution of this function can be broken down into the following steps:
- Read the configuration file to create a cipher.
- Open the input and output files.
- Read each character from the input file and write some corresponding character to the output file.
- Close the input and output files and clean everything up.
If anything goes wrong along the way (input file doesn’t exist, configuration file has a bad format, etc.), then crypto_main
should return a non-zero exit code to terminate the program.
Recall that we have been assuming in the Cipher
classes that the input is always an upper-case letter. This may not be true of every character in your input file, so crypto_main
has to bridge this gap. The logic for handling a given character from the input file is as follows:
- If it is an upper-case letter (between
'A'
and'Z'
, inclusive) then pass it to the appropriate function (encrypt
ordecrypt
). - If it is a lower-case letter (between
'a'
and'z'
, inclusive) then convert it to upper-case, pass it to the appropriate function, and then convert it back to lower-case. - For any other character, do nothing; simply write the character to the output file unchanged.
Here are some suggestions to make the task of writing crypto_main
manageable:
- Create a separate function
read_configuration
which takes the name of the configuration file and returns aCipher*
. If the configuration file is present and readable, theCipher*
will be a pointer to the appropriate kind ofCipher
; if anything goes wrong, the function returnsNULL
. Theread_configuration
function can also write a message tocout
to explain what went wrong. In yourcrypto_main
, you can simply call this function to get aCipher*
and then, if the result isNULL
, exit with a non-zero status. - Feel free to use
cout
to write updates throughout your program. If your program crashes, this will help you to narrow down where things went wrong. In larger, more sophisticated projects, there are special ways in which you would output debugging messages; for now,cout
is just fine. - You won’t be able to write the entire
read_configuration
until you’ve finished all of your ciphers. At the beginning, however, you can write code to create aCaesarCipher
for Caesar cipher configurations and returnNULL
for everything else. You can then updateread_configuration
with each new cipher as you complete it. - If you add
read_configuration
tocrypto.h
, you can write unit tests for it! This could help you determine whether yourread_configuration
function is working separately from the rest of yourcrypto_main
. - Remember that you’re required to validate the contents of the configuration files; if a substitution or Enigma configuration contains a string with duplicate letters, for instance, you must reject it and return a non-zero exit code. It’d be best to write a function (e.g.
is_good_substitution_string
) so you can separate the work and then use that function from inside ofread_configuration
.
5. The Simplified Enigma Cipher
Remember: you should complete your other ciphers before attempting this cipher. The ninjas will not help you with the Enigma cipher until your other ciphers are complete and tested. The other ciphers are built upon concepts that you will need to use here, so this restriction is intended for your benefit.
The original Enigma Machine operated by using a set of three alphabetic rotors, wheels on which different letters were printed. The sequence of letters on a rotor could vary and different rotors could be installed in different locations inside of the machine. In order to encrypt a character, the machine would perform the following steps:
- Let the input character be called \(c_1\). Find the position of \(c_1\) on the first rotor and call that position \(p_1\).
- Find the character on the third rotor at position \(p_1\). Let this character be called \(c_2\).
- Find the position of \(c_2\) on the second rotor. Let this position be called \(p_2\).
- Find the character on the third rotor at position \(p_2\). Let this character be called \(c_3\). \(c_3\) is the encrypted character.
The above process seems somewhat complex, but is actually not difficult to implement. However, the security of the Enigma Machine largely came from the fact that the rotors would move after each character. You will also implement that behavior. After each character you encrypt, you will adjust the rotors as follows:
- Rotate the first rotor by one position. If the first rotor were originally the identity alphabet (
"ABC...XYZ"
), it is now effectively shifted to the left ("BCD...YZA"
). - Once the first rotor has made a full rotation (after 26 characters have been encrypted), rotate the second rotor by one position. If the first two rotors were originally the identity alphabet, then the second rotor would effectively be
"BCD..YZA"
after 26 characters are encrypted.
To perform rotations, don’t modify the rotor variables; that approach is inefficient and we care about performance in this course. Instead, add two more fields of type int
. These fields should be initialized to 0
in your constructor; they represent how far each of the first two rotors have turned. You can use them when you are performing lookups on the rotors. For example, imagine that your first rotor was the identity alphabet ("ABC...XYZ"
). After the first substitution, you would not change the value of first_rotor
; you would simply increment first_rotor_offset
. Then, when you look up the position of a given character (e.g. 'D'
), you find the position in the original string (e.g. 3
) and use the first_rotor_offset
to correct it (2
) based on the number of rotations that have occurred (1
).
To get you started on the Enigma encryption implementation, here are some hints:
- Your
EnigmaCipher
class will need fields for each of the three different rotors. You could name them e.g.first_rotor
if you like; they should have typestring
. Your constructor will need to take values for these fields so that they can be provided by the configuration file. - You can add new methods to your
EnigmaCipher
class. It might help you to add a methodposition_to_char
which, given a rotor number (0
,1
, or2
) and a position on that rotor (0
through25
) will give you back the character at that rotor position. It could also be helpful to write another methodchar_to_position
which does the opposite: given a rotor number and a character, it tells you the position of that character on the rotor. Remember to use the rotation fields mentioned above in order to write these methods. - You should also add a method
advance_rotors
which you can use to change the rotor positions (by updating your rotation fields). That way, you can call that method from bothencrypt
anddecrypt
.
Decrypting the simplified Enigma cipher might seem tricky, but it’s similar to decrypting the substitution cipher: you perform the same series of steps but in the opposite order. You’d start, for instance, by finding the position \(p_2\) of the encrypted character \(c_3\) on the third rotor; then, you’d look up the character \(c_2\) at that position on the second rotor, and so on. After you decrypt a character, make sure that you rotate the rotors in the same way as you did during encryption!
Coding Style Requirements
You are required to observe some good coding practices:
-
You should pick meaningful variable names.
// Good int* pixels = new int[size]; // Bad int* p = new int[size];
-
You should use correct and consistent indentation. Lines of code within a block (that is, surrounded by
{
and}
) should be indented four spaces further than the lines surrounding them.// Good if (condition) { cout << "Test" << endl; } // Bad if (condition) { cout << "Test" << endl; }
-
You should use a block whenever possible, even if it’s not necessary. This helps you avoid subtle or messy bugs in the future.
// Good if (condition) { cout << "Something" << endl; } // Bad if (condition) cout << "Something" << endl;
Peer Review
As this is a team assignment, you are required to complete a peer review of your lab partner or partners. You must do this even if everything is fine. You should complete this peer review after you have turned in your assignment. The peer review comes in the form of a simple Google Form and can be accessed here; in most cases, it will likely take less than a minute to complete. If you have trouble accessing this peer review, please make sure that you are logged into Google using your Swarthmore credentials.
Your peer review will be kept private; only course staff will have access to this information. You can also update your peer review after you have submitted it. However, if you do not submit a peer review, you will lose points on your lab. Please don’t forget!
Summary of Requirements
When you are finished, you should have
- Completed implementations of the above-mentioned ciphers
- Tests for all of these ciphers
- A
crypto_main
function which allows the user to encrypt and decrypt files based upon command-line arguments and configuration files - Validation checks in
crypto_main
to detect bad arguments or bad configuration files - Tests for
crypto_main
- Code which conforms to the style requirements above
- A completed peer review
You should also have comments on any code that isn’t easy to read or understand. Good commenting will eventually become part of the grading criteria for these assignments, so you should develop good habits now!
Acknowledgements
The Enigma Machine cipher part of the lab is based on David Reed’s NIFTY 2009 submission entitled “Enigma Encryption”.