Advent of Code, Day 5: cracking vigenere
Update: There was an issue with the key length guessing algorithm description. You should be comparing the first segment to subsequent segments because you are looking for breaks in the in the key length matching up with repeating lyrics.
Jay likes to sing along to music, but he has trouble remembering the lyrics. He keeps the text, to the lyrics of even his most favorite song. However, he's very secretive of his favorite music and encrypts these lyrics. Not only that, Jay is quite paranoid so he's using Vigenere and the key might be up to 40 characters long. That's going to take too long to try every single key like when cracking a Caeser Cipher.
However, if you think about it, a Vigenere secret message is just a key_length number of Caeser messages interleaved.
Rather than trying more than 26^40+26^39...etc
key attempts, we could slice up the text and reassemble it into
multiple Caeser cipher texts. Divided up we only have to try 26 * 40 + 26 * 39...etc
brute force attempts
that's not too bad on a computer, but we can do even better. If we can figure out the length, we only have
to make 26 * key_length
brute force attempts.
The first useful function is this endeaver, would be a normalize function. Since Jay left all his spaces and punction in his encrypted lyrics, having something that makes it easy to strip out all the non-letters and make them all lowercase will make analysis easier.
The Second useful function to have would be a hamming_distance function.
Hamming Distance is a way to compare two equal length
strings. We can use the fact that not only is the Vigenere key repeating itself, but the music lyrics
repeat themselves. We can find the likely key_length
by dividing the encrypted text by all the possible
key lengths, taking the first 10 of these, and then use the hamming_distance to score those possible key lengths. More specifically you would do this by comparing the first segment to each of the following segments with this function and then summing them together
and dividing by the trial key_length
. Lowest score is likely the key length.
Once you have the likely key length, take every nth
character and put it into key_length
number of
Caeser cipher texts. Perform Caeser analysis on each one of these like in LC101 advent of code day 3,
revealing each nth
letter of the key.
Here are the encrypted song lyrics. So what is the title of the song?