--- title: Keyed data encoding with Python description: XOR is pretty cool date: 2019-08-24 tags: - project - python aliases: - /blog/2019/08/24/shift2 - /blog/shift2 --- I have always been interested in text and data encoding, so last year, I made my first encoding tool. [Shift64](https://github.com/Ewpratten/shift64) was designed to take plaintext data with a key, and convert it into a block of base64 that could, in theory, only be decoded with the original key. I had a lot of fun with this tool, and a very stripped down version of it actually ended up as a bonus question on the [5024 Programming Test](https://github.com/frc5024/Programming-Test/blob/master/test.md) for 2018/2019. Yes, the key was in fact `5024`. This tool had some issues. Firstly, the code was a mess and only accepted hard-coded values. This made it very impractical as an every-day tool, and a nightmare to continue developing. Secondly, the encoder made use of entropy bits, and self modifying keys that would end up producing encoded files >1GB from just the word *hello*. ## Shift2 One of the oldest items on my TODO list has been to rewrite shift64, so I made a brand new tool out of it. [Shift2](https://github.com/Ewpratten/shift) is both a command-line tool, and a Python3 library that can efficiently encode and decode text data with a single key (unlike shift64, which used two keys concatenated into a single string, and separated by a colon). ### How it works Shift2 has two inputs. A `file`, and a `key`. These two strings are used to produce a single output, the `message`. When encoding a file, shift2 starts by encoding the raw data with [base85](https://en.wikipedia.org/wiki/Ascii85), to ensure that all data being passed to the next stage can be represented as a UTF-8 string (even binary data). This base85 data is then XOR encrypted with a rotating key. This operation can be expressed with the following (this example ignores the base85 encoding steps): ```python file = "Hello reader! I am some input that needs to be encoded" key = "ewpratten" message = "" for i, char in enumerate(file): message += chr( ord(char) ^ ord(key[i % len(key) - 1]) ) ``` The output of this contains non-displayable characters. A second base85 encoding is used to fix this. Running the example snippet above, then base85 encoding the `message` once results in: ``` CIA~89YF>W1PTBJQBo*W6$nli7#$Zu9U2uI5my8n002}A3jh-XQWYCi2Ma|K9uW=@5di ``` If using the shift2 commandline tool, you would see a different output: ``` B2-is8Y&4!ED2H~Ix<~LOCfn@P;xLjM_E8(awt`1YCe0@WAWn2bqc6M1yP+DqF4M_kSCp0uA5h->H ``` This is for a few reasons. Firstly, as mentioned above, shift2 uses base85 **twice**. Once before, and once after XOR encryption. Secondly, a file header is prepended to the output to help the decoder read the file. This header contains version info, the file length, and the encoding type. ### Try it yourself with PIP I have published shift2 on [pypi.org](https://pypi.org/project/shift-tool/) for use with PIP. To install shift2, ensure both `python3` and `python3-pip` are installed on your computer, then run: ```sh # Install shift2 pip3 install shift-tool # View the help for shift2 shift2 -h ```
### Try it in the browser I have ported the core code from shift2 to [run in the browser](http://www.brython.info/index.html). This demo is entirely client-side, and may take a few seconds to load depending on your device.
### Future plans Due to the fact that shift2 can also be used as a library (as outlined in the [README](https://github.com/Ewpratten/shift/blob/master/README.md)), I would like to write a program that allows users to talk to eachother IRC style over a TCP port. This program would use either a pre-shared, or generated key to encode / decode messages on the fly. If you are interested in helping out, or taking on this idea for yourself, send me an email.