ewpratten.com/_posts/2019-08-24-Shift2.md
2019-08-24 15:39:37 -04:00

layout: post
title: Keyed data encoding with Python
description: XOR is pretty cool
date: 2019-08-24 13:13:00
categories: projects

I have always been interested in text and data encoding, so last year I made my first encoding tool. Shift64 was designed to take plaintext data and a key, and convert them into a block of base64 that could, in theory, only be decoded with the original key. I had a lot of fun with this tool, and a very stripped-down version of it actually ended up as a bonus question on the 5024 Programming Test for 2018/2019. Yes, the key was in fact 5024.

This tool had some issues. Firstly, the code was a mess and only accepted hard-coded values. This made it very impractical as an everyday tool, and a nightmare to continue developing. Secondly, the encoder made use of entropy bits and self-modifying keys, which could end up producing encoded files larger than 1 GB from just the word hello.

Shift2

One of the oldest items on my TODO list has been to rewrite shift64, so I made a brand new tool out of it. Shift2 is both a command-line tool and a Python 3 library that can efficiently encode and decode text data with a single key (unlike shift64, which used two keys concatenated into a single string and separated by a colon).

How it works

Shift2 takes two inputs: a file and a key. These two strings are used to produce a single output, the message.

When encoding a file, shift2 starts by encoding the raw data with base85, to ensure that all data being passed to the next stage can be represented as a UTF-8 string (even binary data). This base85 data is then XOR encrypted with a rotating key. This operation can be expressed with the following (this example ignores the base85 encoding steps):

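file = "Hello reader! I am some input that needs to be encoded"
key = "ewpratten"

message = ""

for i, char in enumerate(file):
    # XOR each character with a key character chosen by position.
    # Note: `i % len(key) - 1` parses as `(i % len(key)) - 1`, so the
    # first character is paired with key[-1], the last key character.
    message += chr(
        ord(char) ^ ord(key[i % len(key) - 1])
    )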

The output of this contains non-displayable characters. A second base85 encoding is used to fix this. Running the example snippet above, then base85 encoding the message once results in:

CIA~89YF>W1PTBJQBo*W6$nli7#$Zu9U2uI5my8n002}A3jh-XQWYCi2Ma|K9uW=@5di
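Because XOR is its own inverse, decoding the XOR stage is the same loop run again with the same key. The following is a minimal sketch of that round trip, assuming the same key indexing as the snippet above. The helper name `xor_rotate` is hypothetical and is not part of shift2.

```python
def xor_rotate(text: str, key: str) -> str:
    """XOR each character of text with a rotating character of key.

    Hypothetical helper for illustration; not part of the shift2 API.
    """
    return "".join(
        # Same indexing as the example: (i % len(key)) - 1, so position 0
        # uses key[-1], the last character of the key.
        chr(ord(char) ^ ord(key[i % len(key) - 1]))
        for i, char in enumerate(text)
    )


plaintext = "Hello reader! I am some input that needs to be encoded"
key = "ewpratten"

message = xor_rotate(plaintext, key)
decoded = xor_rotate(message, key)  # applying the same XOR again undoes it
print(decoded == plaintext)  # True
```

Decoding with a different key runs without error at this stage, but produces different text.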

If you use the shift2 command-line tool, you will see a different output:

B2-is8Y&4!ED2H~Ix<~LOCfn@P;xLjM_E8(awt`1YC<SaOLbpaL^T!^W_ucF8Er~?NnC$>e0@WAWn2bqc6M1yP+DqF4M_kSCp0uA5h->H

There are a few reasons for this. First, as mentioned above, shift2 applies base85 twice: once before and once after the XOR step. Second, a file header is prepended to the output to help the decoder read the file. This header contains version info, the file length, and the encoding type.
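The two base85 passes around the XOR step can be sketched roughly as follows. This is an illustration under stated assumptions, not shift2's actual implementation: it uses `base64.b85encode` from the standard library (an assumption about the base85 variant), reuses the key indexing from the example above, and leaves out the file header entirely because its layout is not described here. The function names are hypothetical.

```python
import base64


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR every byte with a rotating key byte (same indexing as the example)."""
    return bytes(b ^ key[i % len(key) - 1] for i, b in enumerate(data))


def encode(raw: bytes, key: str) -> str:
    # Pass 1: base85, so arbitrary binary input becomes plain ASCII.
    stage1 = base64.b85encode(raw)
    # XOR the ASCII text with the rotating key.
    mixed = xor_bytes(stage1, key.encode("utf-8"))
    # Pass 2: base85 again, so the result is printable.
    return base64.b85encode(mixed).decode("ascii")


def decode(message: str, key: str) -> bytes:
    # Undo the steps of encode() in reverse order.
    mixed = base64.b85decode(message)
    stage1 = xor_bytes(mixed, key.encode("utf-8"))
    return base64.b85decode(stage1)


original = b"Hello reader!"
token = encode(original, "ewpratten")
print(decode(token, "ewpratten") == original)  # True
```

Since this sketch has no header and may use a different base85 variant, its output will not match what the real tool prints.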

Try it yourself

I have published shift2 on PyPI (pypi.org) for installation with pip. To install shift2, ensure both python3 and python3-pip are installed on your computer, then run:

# Install shift2
pip3 install shift-tool

# View the help for shift2
shift2 -h

Future plans

Because shift2 can also be used as a library (as outlined in the README), I would like to write a program that allows users to talk to each other IRC-style over a TCP port. This program would use either a pre-shared or a generated key to encode and decode messages on the fly.

If you are interested in helping out, or taking on this idea for yourself, send me an email.