Advent of Code, Day 5: cracking vigenere


Update: There was an issue with the key length guessing algorithm description. You should be comparing the first segment to subsequent segments because you are looking for breaks in the in the key length matching up with repeating lyrics.

Jay likes to sing along to music, but he has trouble remembering the lyrics. He keeps the text, to the lyrics of even his most favorite song. However, he's very secretive of his favorite music and encrypts these lyrics. Not only that, Jay is quite paranoid so he's using Vigenere and the key might be up to 40 characters long. That's going to take too long to try every single key like when cracking a Caeser Cipher.

However, if you think about it, a Vigenere secret message is just a key_length number of Caeser messages interleaved. Rather than trying more than 26^40+26^39...etc key attempts, we could slice up the text and reassemble it into multiple Caeser cipher texts. Divided up we only have to try 26 * 40 + 26 * 39...etc brute force attempts that's not too bad on a computer, but we can do even better. If we can figure out the length, we only have to make 26 * key_length brute force attempts.

The first useful function is this endeaver, would be a normalize function. Since Jay left all his spaces and punction in his encrypted lyrics, having something that makes it easy to strip out all the non-letters and make them all lowercase will make analysis easier.

The Second useful function to have would be a hamming_distance function. Hamming Distance is a way to compare two equal length strings. We can use the fact that not only is the Vigenere key repeating itself, but the music lyrics repeat themselves. We can find the likely key_length by dividing the encrypted text by all the possible key lengths, taking the first 10 of these, and then use the hamming_distance to score those possible key lengths. More specifically you would do this by comparing the first segment to each of the following segments with this function and then summing them together and dividing by the trial key_length. Lowest score is likely the key length.

Once you have the likely key length, take every nth character and put it into key_length number of Caeser cipher texts. Perform Caeser analysis on each one of these like in LC101 advent of code day 3, revealing each nth letter of the key.

Here are the encrypted song lyrics. So what is the title of the song?

Bn lohr zatk exgxc kwm sml'c km
Qaacr fyoqr cku C ofgp fwm urmj
I to cslcjo jakh qq fxd rgsrr uig
Jhsg ccn'g lon
Kgl mv owd ngtgl wvt vt, vhnmy cx
Fgp owfoc fbnry poze tiqx
JERQ ses ch sr kl ele
Loh gqtgt Vlevs uho osxf n lbt n sguhf vg on
Kanx pb ly cmhr srvead neg'l yjmf
Wolzqlsbl lboxia' av mr
Zruruag' apw umb
Jifh loh nkg jty xefubu' bzo ngt
Hrrr wvtj tuby befvlvuw umb
Pita-pitat rvznn jjgf Jctk-Gsoa
V jhsg bktr bz
Cx'k xhy bzo pmbk V dbn'g lkgum on
Mddphg lri yzal oa tue cnpxjuc gjysiq
Qmtr ig uc jvgiy ztqy ml xxlt dsoc soeeclny
Ao zr iuvvah pa axjgaige
Ua ua
Dit pofpaw lagdo'w yyl vn vt
 
Tevtvg pckyq ppb ad
Ry aa aa aa aa pa at turs
Ft ui fk ry aaaa
Tegtkn wbmac oam pb
Fk ry aa aa aa aa pa attu
Rs ft ui fk ryaa
Tegtvn litze qml am
Ui fk ry aa aa aa aa paat
Tu rs ft ui fkry
Tegtvn wiigl pon ml
Ft ui fk ry aa aa aa aapa
At tu rs ft uifk
 
Afnt loh wnnv tb ugfp oamo bzo ogq
Wntph lowr fmkj cgm fpozd jyyl
Grliag vo qh cbel A wpl
Ekqy-hnu mnmn-uph ztsu ggex jtgce'
Ga tue ziqdne by zbi udni eadl rue euo-a-qud
Nb eupi xgk apw rerrrf
Mnd pawsr B mix xdhvz koerf ag tue Yameel
Yyi ew hu bzo jgstl ynrq lknr ponl lzx Yianipf
Mrt Nlv hg tbej gi A'e mom ybiygeft
V gbt vhr ykpij xhy bzo jjnvbr bf n ctojw vfiskxy
LB zpyl aaoghrr
Hrbf zbi hjbukw yj ruif
Ybue hkgugkmw
Gfef jsn gfvcxs eiqe kn zr cbmhk
Lvclr xm ghr wrsg tq tux kuwl lh apw xspgh
Oohgut oy ubzm efv phbur 'ik to bfs a to qfs
Tn sik qxz g'svp wn dbn'g sgor
Ia mny aafmlz gb xfr (shmzeetkmr)
B suowk ba pgd
 
Kcgtvn witga wvm on
Rs ft ui fk ry aa aa aaaa
Pa at tu rs ftui
Yoxrvn witgl wkt vm
Tu rs ft ui fk ry aa aaaa
Aa pa at tu rsft
Nmldml witgl wvt kt
At tu rs ft ui fk ry aaaa
Aa aa pa at turs
Yxabax ngtgl wvt vt
Pa at tu rs ft ui fk ryaa
Aa aa aa pa attu
 
Iayaa-napxw V.S. vf loh ngeq t rcjl
Oav'a lri ivd vn ghr dtoc
Pni idkx Dqdv Wkvtu
Lvvvn' vhnm rcjw khtm uyrqvdrr n mltj
Rbvq zvge lvclr wreert go bng tjh lcjlz
Pvuwx yqrd go gensg mr
Zopi al mv uw xsu aipe nnq ecsl
Lohgw A fvdwn yn yixe Tebrie ngj Qlwwslg
Ubiyz tb tue zazizns C fw slrqf' 'oq
Ubuyd loh lkkr mu vsmfvl eadl rue orbtuet tutz'm tdsmpvmw
Rciee sre Jinl nmzugcag' 'lu
A beruee pyal bcly ponl Kztx ifn 'ik
Slntgea 'eo
Pfribi
Zamaqf' iss ghbuthg I vobd g mtade
Icl S hgqn'g
Teuft vhr egxc gx ff tapi que uigtvn'
Jig akl aala h ljyt rbp jigh ghg rvuhir
Ujbi ngb qw zoz oa tue quglqcvlk hm Xzspjl
Ybu grlipg gh lfip gg tm
Vyr'r oe fiyll
 
Ggtgbt dmyyr dql sx
Ln nn nn nn nc nn gg hefs
Gh vs xe ln nnnn
Grtvia coakq oba ql
Xe ln nn nn nn nc nn gghe
Fs gh vs xe lnnn
Grtgip jvzms aal ba
Vs xe ln nn nn nn nc nngg
He fs gh vs xeln
Grtgia jkgtr ccx al
Gh vs xe ln nn nn nn ncnn
Gg he fs gh vsxe

Credit to Jay Tuley for this problem