Lab 3: Basic Encryption
Due on Wednesday, September 26th at 11:59 PM. This is an individual lab. You are to complete this lab on your own, although you may discuss the lab concepts with your classmates. Remember the Academic Integrity Policy: do not show your code to anyone outside of the course staff and do not look at anyone else’s code for this lab. If you need help, please post on the Piazza forum or contact the instructor or ninjas. If you have any doubts about what is okay and what is not, it’s much safer to ask than to risk violating the Academic Integrity Policy.
Overview
In this lab, you will write a program to apply a variety of simple encryption and decryption techniques to text files. This will help you to gain
- More experience with files and command-line arguments in C++
- Familiarity with object-oriented programming, especially polymorphism
- An understanding of abstraction via interfaces
As in previous labs, your repository can be found at git@github.swarthmore.edu:cs35-f18/lab03-<your-username>
. Start by cloning that repository.
Encryption and Ciphers
Encryption is the process of encoding messages in such a way that only authorized parties can read the message. The idea of encrypting messages is thousands of years old and has been particularly useful in the military, where it is important to exchange messages (e.g. “Attack at dawn!” vs “Hold your ground”) without enemies being able to understand them in the event a message is intercepted.
A cipher is a kind of encryption scheme where each letter is replaced by another. Ciphers have been used since at least Roman times. Julius Caesar used an cipher in which each letter of the message was replaced with one three positions earlier in the alphabet. Using the English alphabet, for instance, ‘A’ is encrypted as ‘X’, ‘B’ is encrypted as ‘Y’, ‘C’ is encrypted as ‘Z’, ‘D’ is encrypted as ‘A’, and so on. An encrypted message is called the ciphertext. If an enemy intercepted the ciphertext, it would look like a meaningless jumble of letters. Caesar’s subordinates knew the encryption scheme, however, and could decrypt the message by employing doing the opposite of the encryption scheme: ‘A’ is decrypted as ‘D’, ‘B’ is decrypted as ‘E’, etc. This cipher was used so effectively that it became known as the Caesar cipher.
In this lab, you will write a program equipped with multiple different ciphers:
- The Caesar cipher, as described above. The code for this cipher has been given as an example in your initial repository.
- A rotation cipher, which works similarly to the Caesar cipher but allows the user to specify the number of rotation steps. The Caesar cipher is a rotation cipher with a shift of -3 (that is, three to the left); we can also imagine rotation ciphers with a shift of 1, 4, -5, 12, etc.
- A substitution cipher, which replaces each letter with another letter according to a pattern given by the user. In this cipher, the user explicitly indicates which letter replaces ‘A’, which letter replaces ‘B’, and so on; they do not have to follow a rotation.
- A simplified stream cipher which will change behavior as it reads the message. This means that the letter ‘A’ may be replaced by ‘D’ in one part of the message but ‘Q’ in another part. Nonetheless, we will be able to decrypt the message with the right information.
Our ciphers will concern the encryption only of upper-case English alphabetic characters. The ciphers will ignore all other characters; the characters '?'
, '1'
, 'q'
, and ' '
, for instance, will be encrypted to themselves. Although this would be a poor decision in any real cryptographic setting, it helps to simplify our assignment.
Your Starting Point
In this lab, you will create one class to represent each cipher you must write. Each of these classes will be subclasses of the abstract Cipher
class, which represents the interface of all ciphers. Your subclasses will implement functions (such as encrypt
and decrypt
); the behavior of these functions will differ with each cipher. In addition, you will write the main program to allow these ciphers to be used to encrypt and decrypt text files.
Your starting repository consists of the following:
- A
Makefile
. In the fashion of the last lab,make
will build the program whilemake tests
will build your tests. cipher.h
, which defines the abstractCipher
class.- Files for the individual ciphers:
caesar.h
/caesar.cpp
: An implementation of the Caesar cipher which has been provided for you.rotation.h
/rotation.cpp
: Skeleton files for the rotation cipher.substitution.h
/substitution.cpp
: Skeleton files for the substitution cipher.stream.h
/stream.cpp
: Skeleton files for the stream cipher.enigma.h
/enigma.cpp
: Skeleton files for the simplified Enigma cipher. (This is an optional part of the lab.)
crypto.cpp
: The file in which you will write yourmain
function.tests.cpp
: A program which will test your ciphers to determine if they are correct.test_utils.h
/test_utils.cpp
: Resources used by the test program.test_data
: A directory containing some sample data for testing.test_output
: A directory to use during testing for output files. This will keep your output from cluttering up your repository.
User Interface
As with the previous lab, we will control your program via command-line arguments. Invocations to your program will take the following form:
./crypto <input-file> <operation> <cipher> <configuration> <output-file>
Your program will always have five arguments (other than the name of the program itself). If the user provides invalid arguments (e.g. files that don’t exist or an invalid mode of operation), you must throw an appropriate exception to terminate the program. The arguments have the following meanings:
input-file
is the name of the existing text file which we would like to encrypt or decrypt.output-file
is the name of the file to which we will write the result.operation
is eitherencrypt
ordecrypt
to tell us what to do with the input file.cipher
is one ofcaesar
,rotation
,substitution
, orstream
. This indicates which cipher to use.configuration
is a string which will be used to create the cipher.
For example, we might run our program in the following ways:
./crypto secret.txt encrypt caesar nothing secret.txt.encrypted
./crypto orders.txt.encrypted decrypt caesar nothing orders.txt
./crypto message.txt encrypt rotation 5 message.txt.encrypted
./crypto diary.txt.encrypted decrypt stream mypassword diary.txt
Writing Your Program
This lab is most easily approached when broken down into the following steps.
- Understand what is happening in
cipher.h
andcaesar.h
/caesar.cpp
. It might help to look at the example tests intests.cpp
. - Write a version of
crypto.cpp
that can support the Caesar cipher. To do this, you will need to implement themain
function as well as theread_file
andwrite_file
functions. See below for help with that. - Write the rotation cipher. This involves writing both the
encrypt
anddecrypt
methods, which can be written separately. After you complete these methods, make sure to test them! Then, incorporate the rotation cipher intomain
. - Write the substitution cipher similarly to the rotation cipher above.
- Write the stream cipher. You should not do this until your other ciphers are finished; this cipher is a little more complex than the others, so it’s best if you complete them first.
The above steps do not have to be performed strictly in that order; the diagram shows the dependencies between the different tasks. A solid arrow points from a task that must wait to the task it must wait for; for instance, you should understand the Cipher
class before you do anything else, but you can work on the substitution cipher without having completed the rotation cipher. The dotted lines indicate that the main
task will need more work once each cipher is finished. We present this diagram because it may be of particular interest to your team, depending upon how you work with your partner.
Most of the rest of this lab will give you hints on how to handle the tasks above.
1. Understanding Cipher
and CaesarCipher
The Cipher
class serves as the base class for all cipher definitions. It is an interface in that its sole purpose is to describe what a Cipher
can do (as opposed to how the Cipher
does those things). In particular, it defines two methods:
char encrypt(char c)
: Encrypts a single character.char decrypt(char c)
: Decrypts a single character.
The encrypt
and decrypt
methods are pure virtual methods, meaning that they do not have an implementation. Instead, each subclass must implement these functions. We can see that e.g. encrypt
is a pure virtual method because it is declared as follows:
virtual char encrypt(char c) = 0;
The virtual
keyword makes encrypt
a virtual method; for our purposes, this means that the definition of the method can be replaced by a subclass. The = 0
makes a virtual method become pure virtual, which means that subclasses must replace the definition of the method (because it has no definition otherwise). Classes which contain pure virtual methods – those methods without implementations – are called abstract classes.
Abstract classes cannot be instantiated (since they aren’t really finished or ready to use). By polymorphism, however, every instance of a subclass is an instance of its parent class, so we can get instances of Cipher
by creating instances of one of its subclasses (e.g. CaesarCipher
). For instance, consider:
Cipher* my_cipher = new CaesarCipher();
The variable my_cipher
is guaranteed to point to some instance of the Cipher
class. Since Cipher
is an abstract class, we know that the actual object must be one of the subclasses of Cipher
but, depending on the code, we might not know which one. Here, we can see that the pointer always refers to a CaesarCipher
. But the specific cipher isn’t necessarily known in the following:
Cipher* my_cipher;
if (x%2==0) { // or any other condition...
my_cipher = new CaesarCipher();
} else {
my_cipher = new RotationCipher(4);
}
char c = my_cipher->encrypt('a');
In the above, my_cipher
contains either a CaesarCipher
or a RotationCipher
. We don’t know which, but we don’t care: as long as the object in question is a Cipher
object, we know that it has an encrypt
method and that’s all we want from it.
For our purposes, we will assume that only upper-case letters are passed to the encrypt
method. This assumption will hold for all implementations of Cipher
. It will be the responsibility of main
to make sure that this is true.
2. Starting the main
Function
The main
function in your program will need to perform the following steps:
- Get the five command-line arguments and store them in
string
variables. - Where possible, check the variables to determine if they make sense. Throw an exception if you discover a problem. (See below.)
- Read the input file into a
string
. - Create an appropriate
Cipher
object based on the arguments, storing a pointer to it in aCipher*
variable. You must use polymorphism here. - Use the
Cipher
to convert the file contents into a newstring
. - Write the resulting
string
to the output file.
There are a few things that will be helpful to know about these steps, which we briefly mention here.
-
Remember that you can convert
char*
tostring
. Thestring
class in C++ can be created from achar*
just by writingchar* c = ...; string s = c;
-
Let’s keep exceptions simple. You can
#include <stdexcept>
and thenthrow runtime_error("Message")
to throw an exception. For this lab, if something goes wrong, you may throw aruntime_error
which crashes your program. See below for more information about bad input you are required to detect. -
We can use
>>
to read whole files. You can useifstream
to read a file one character at a time. Thechar
data type holds just a single character, sochar c; myFile >> c;
will read the next
char
from the file. To read the entire file, you can write a loop which accumulates all of the characters into astring
. You can see whether we’ve reached the end of the file by calling theeof()
method of theifstream
after trying to read from the file. So one way to loop through all of the characters of a file might be:char c; myFile >> c; while (!myFile.eof()) { // do stuff... myFile >> c; }
This way, every time we reach the loop condition, we’ve just tried to read a character from the file and we can see whether or not we succeeded. This pattern is called a priming loop. We prime the character
c
before we enter the loop. We then immediately test for end of file. Inside the loop we process the character, and then as the final step again read the next character. -
ifstream
skips spaces. This is a problem if you’re trying to use the loop above to include spaces (and, when reading a whole file into a string, you shouldn’t skip spaces). To fix this, theifstream
library provides us the following strange-looking operation:myFile >> noskipws;
Here,
noskipws
is part of theifstream
library. Running the above line will changemyFile
so that it no longer skips spaces, allowing you to read the entire file without interference. -
We can test to see if a character is a capital letter. Characters are treated quite similarly to numbers in C++ (and, in fact, are numbers as far as the computer is concerned). So we can ask if a character variable
c
contains a capital by writingc >= 'A' && c <= 'Z'
-
To write to a file, use
ofstream
. In the same way thatifstream
lets you read from a file in a way that feels like usingcin
,ofstream
lets you write to a file in a way that feels like usingcout
.ofstream outFile; outFile.open("filename.txt"); outFile << "Hello!"; outFile.close();
Your task in writing main
is to put the above pieces together. We strongly recommend writing non-main
functions for at least steps 3, 5, and 6 above!
- Remember to encrypt or decrypt only capital letters. The
Cipher
objects we create will assume that theirencrypt
anddecrypt
methods receive only upper-case letters, so we must make good on that promise here!
Required Error Handling
In your main
function, you are required to handle the following cases of bad input:
- The user provides an incorrect number of command-line arguments
- The user provides an operation other than
encrypt
ordecrypt
- The user provides the name of a cipher which is not mentioned in this lab
You are permitted to assume that the user provides valid filenames.
3. Writing the Rotation Cipher
The provided rotation.h
/rotation.cpp
files give a skeleton for the RotationCipher
class. Like CaesarCipher
, it is a specialization of the Cipher
class and is defined accordingly. Although we have provided signatures for the encrypt
and decrypt
methods, we have not provided any private
data members. You’ll need to add your own private
data and implement the methods of the class appropriately.
In the encrypt
method, we are assuming that the input is an upper-case letter character. A RotationCipher
should move that letter some number of positions in the alphabet, but that movement will vary based on the RotationCipher
’s configuration. For that reason, you should modify the provided files so that RotationCipher
’ has an int
field (named e.g. rotation
). In the constructor, you can convert the string
to an int
by using the C++ stoi
function (which is available because we #include <string>
). Then, within the encrypt
method, you can use rotation
to determine the correct letter.
Rotation is more work than simple addition. You can convert an alphabetic letter into its 0
-based index in the alphabet by writing int(c-'A')
. Likewise, you can convert an int
into the corresponding letter by writing char(i+'A')
. Note, though, that char(26+'A')
produces '['
; the computer uses the ASCII table to represent letters as numbers and '['
is the next character after 'Z'
. You’ll need to use the modulus operator (%
) to ensure that you are rotating correctly.
The decrypt
method will work very much like the above, but it must perform the opposite rotation. This means you will have to consider different boundary conditions.
Once you’ve finished writing the cipher itself, update main
to create a RotationCipher
object when appropriate.
4. Writing the Substitution Cipher
Similar to the rotation cipher above, substitution.h
and substitution.cpp
contain very basic skeletons of the SubstitutionCipher
class; you will need to provide your own fields. Here, the string
accepted by your constructor is a substitution string (instead of a rotation).
In order to encrypt a capital letter, you must then perform the following steps:
- Calculate the index of the letter in the alphabet (
0
forA
,1
forB
, etc.). - Look up the letter in the substitution string at that index.
Decryption is a little trickier in this case. Instead of calculating a position by index, you will need to perform a linear search of the substitution string (look at each character in turn) to find the encrypted character you have. Then, you can calculate the decrypted character based upon the index of that character.
Required Error Handling
Your SubstitutionCipher
class must validate the provided substitution string. You should throw a runtime_error
if
- Any character in the substitution string is not a capital letter
- Any letter is missing from the substitution string
- Any letter appears in the substitution string more than once
Developing an algorithm to check these properties is part of the task of writing the SubstitutionCipher
constructor.
5. Writing the Stream Cipher
A stream cipher has the quality of encrypting each character in a file differently. Our stream cipher will be a rotation cipher, but the rotation of the cipher will change as we encrypt. The parameter to the StreamCipher
constructor will be a password that the user selected to encrypt the text. This password must contain only upper-case letters; you should raise an exception if it contains anything else.
Our stream cipher will adjust the amount by which it rotates each letter based upon the letters in the password. For each character, it will:
- Determine the index of the current letter in the password (
0
for'A'
,1
for'B'
, etc.). - Increase rotation by an amount equal to one more than that index.
- Rotate the character.
- Move to the next letter in the password (going back to the beginning if necessary).
For example, let’s consider encrypting the message “HI THERE
” with the password "ACB"
. The result is IM ZOODR
. This is because:
- We start trying to encrypt
H
. Since the first letter of the password isA
, we set our rotation to1
. This yieldsI
, sinceI
is the next character afterH
. - We next encrypt
I
. The second letter of the password isC
, which adds3
to our rotation. This brings our rotation to4
, so we yieldM
(four letters afterI
). - We next encrypt
T
(sincemain
skips the space for us). The third letter of the password isB
, so our rotation is now6
; thus, we produceZ
as an output. - We next encrypt
H
. Since we’re out of letters in the password, we start again withA
. This brings our rotation to7
, so we produce anO
. - We next encrypt
E
. TheC
in the password brings us to a rotation of10
, so we also produceO
here. - The process continues likewise for
R
andE
, producingD
andR
, respectively.
Think carefully about information the StreamCipher
class will need to remember between calls to encrypt
or decrypt
before you start writing code. Then, create those fields and initialize them in the StreamCipher
constructor.
Once you’ve completed this cipher and the tests succeed, you’ve completed the lab!
Testing Your Program
The test_data
directory in this assignment contains a number of test files for you to use. There are two types of files:
- Unencrypted text files, named things like
message1.txt
. - Encrypted text files, named things like
message1__caesar__nothing.txt
andmessage1__rotation__4.txt
.
We’ve also included a directory test_output
for you to use to store your output files. Using this directory will help keep your lab tidy.
The names of the encrypted files tell you the manner in which they were encrypted. The file message1__rotation__4.txt
, for instance, is the message1.txt
file encrypted using the rotation cipher with a configuration string of 4
.
The Makefile
also produces a program called ./tests
which performs simple tests on your cipher classes. While the contents of test_data
are far more rigorous, the ./tests
program is easy to run and can detect many simple mistakes. Make sure to run it when you’re finished to see if it helps you find any bugs!
Coding Style Requirements
You are required to observe some good coding practices:
-
You should pick meaningful variable names.
// Good int* pixels = new int[size]; // Bad int* p = new int[size];
-
You should use correct and consistent indentation. Lines of code within a block (that is, surrounded by
{
and}
) should be indented four spaces further than the lines surrounding them.// Good if (condition) { cout << "Test" << endl; } // Bad if (condition) { cout << "Test" << endl; }
-
You should use a block whenever possible, even if it’s not necessary. This helps you avoid subtle or messy bugs in the future.
// Good if (condition) { cout << "Something" << endl; } // Bad if (condition) cout << "Something" << endl;
Summary of Requirements
When you are finished, you should have
- Completed implementations of the above-mentioned ciphers
- A
main
function which allows the user to encrypt and decrypt files based upon command-line arguments - Validation checks in
main
and in the cipher constructors to detect bad arguments and raise exceptions - Code which conforms to the style requirements above
You should also have comments on any code that isn’t easy to read or understand. Good commenting will eventually become part of the grading criteria for these assignments, so you should develop good habits now!
Going Further: The Simplified Enigma Cipher
If you would like to try a somewhat more complex cipher, you can attempt to implement a simplified version of the Enigma cipher. The Enigma Machine, a device created to implement the Enigma cipher, first appeared in the 1920s; later versions of this machine were used by the German military during World War II. The success of Polish, British, and French engineers in breaking the Enigma cipher provided military intelligence which is said to have shortened the war in Europe by two years; this effort also led to the first electrical digital computers.
The original Enigma Machine operated by using a set of three alphabetic rotors, wheels on which different letters were printed. The sequence of letters on a rotor could vary and different rotors could be installed in different locations inside of the machine. In order to encrypt a character, the machine would perform the following steps:
- Let the input character be called \(c_1\). Find the position of \(c_1\) on the first rotor and call that position \(p_1\).
- Find the character on the third rotor at position \(p_1\). Let this character be called \(c_2\).
- Find the position of \(c_2\) on the second rotor. Let this position be called \(p_2\).
- Find the character on the third rotor at position \(p_2\). Let this character be called \(c_3\). \(c_3\) is the encrypted character.
The above process seems somewhat complex, but is actually not difficult to implement. However, the security of the Enigma Machine largely came from the fact that the rotors would move after each character. You will also implement that behavior. After each character you encrypt, you will adjust the rotors as follows:
- Rotate the first rotor by one position. If the first rotor were originally the identity alphabet (
"ABC...XYZ"
), it is now effectively shifted to the left ("BCD...YZA"
). - Once the first rotor has made a full rotation (after 26 characters have been encrypted), rotate the second rotor by one position. If the first two rotors were originally the identity alphabet, then the second rotor would effectively be
"BCD..YZA"
after 26 characters are encrypted.
To perform rotations, don’t modify the rotor variables; that approach is inefficient and error-prone. Instead, add two more fields of type int
. These fields should be initialized to 0
in your constructor; they represent how far each of the first two rotors have turned. You can use them when you are performing lookups on the rotors. For example, imagine that your first rotor was the identity alphabet ("ABC...XYZ"
). After the first substitution, you would not change the value of first_rotor
; you would simply increment first_rotor_offset
. Then, when you look up the position of a given character (e.g. 'D'
), you find the position in the original string (e.g. 3
) and use the first_rotor_offset
to correct it (2
) based on the number of rotations that have occurred (1
).
To get you started on the Enigma encryption implementation, here are some hints:
- Your
EnigmaCipher
class will need fields for each of the three different rotors. You could name them e.g.first_rotor
if you like; they should have typestring
. Since your constructor only takes a singlestring
, we will expect the user to describe the rotors separated by dashes. One example might beMQULBESHPFJITVGZCKWRODXAYN-BVUISDKMTNERCZHLQAWXJOFGYP-ATKPEFYSOMILHBXWVUCJDQNGRZ
to describe the three rotors. You can use thesubstring
method of thestring
class to separate the rotor strings. - You can add new methods to your
EnigmaCipher
class. It might help you to add a methodposition_to_char
which, given a rotor number (0
,1
, or2
) and a position on that rotor (0
through25
) will give you back the character at that rotor position. It could also be helpful to write another methodchar_to_position
which does the opposite: given a rotor number and a character, it tells you the position of that character on the rotor. Remember to use the rotation fields mentioned above in order to write these methods. - You should also add a method
advance_rotors
which you can use to change the rotor positions (by updating your rotation fields). That way, you can call that method from bothencrypt
anddecrypt
.
Decrypting the simplified Enigma cipher might seem tricky, but it’s similar to decrypting the substitution cipher: you perform the same series of steps but in the opposite order. You’d start, for instance, by finding the position \(p_2\) of the encrypted character \(c_3\) on the third rotor; then, you’d look up the character \(c_2\) at that position on the second rotor, and so on. After you decrypt a character, make sure that you rotate the rotors in the same way as you did during encryption!
Acknowledgements
The Enigma Machine cipher part of the lab is based on David Reed’s NIFTY 2009 submission entitled “Enigma Encryption”.