3.6 KiB
layout | title | description | date | categories |
---|---|---|---|---|
post | Keyed data encoding with Python | XOR is pretty cool | 2019-08-24 13:13:00 | projects |
I have always been interested in text and data encoding, so last year, I made my first encoding tool. Shift64 was designed to take plaintext data with a key, and convert it into a block of base64 that could, in theory, only be decoded with the original key. I had a lot of fun with this tool, and a very stripped down version of it actually ended up as a bonus question on the 5024 Programming Test for 2018/2019. Yes, the key was in fact 5024
.
This tool had some issues. Firstly, the code was a mess and only accepted hard-coded values. This made it very impractical as an every-day tool, and a nightmare to continue developing. Secondly, the encoder made use of entropy bits, and self modifying keys that would end up producing encoded files >1GB from just the word hello.
Shift2
One of the oldest items on my TODO list has been to rewrite shift64, so I made a brand new tool out of it. Shift2 is both a command-line tool, and a Python3 library that can efficiently encode and decode text data with a single key (unlike shift64, which used two keys concatenated into a single string, and separated by a colon).
How it works
Shift2 has two inputs. A file
, and a key
. These two strings are used to produce a single output, the message
.
When encoding a file, shift2 starts by encoding the raw data with base85, to ensure that all data being passed to the next stage can be represented as a UTF-8 string (even binary data). This base85 data is then XOR encrypted with a rotating key. This operation can be expressed with the following (this example ignores the base85 encoding steps):
file = "Hello reader! I am some input that needs to be encoded"
key = "ewpratten"
message = ""
for i, char in enumerate(file):
message += chr(
ord(char) ^ ord(key[i % len(key) - 1])
)
The output of this contains non-displayable characters. A second base85 encoding is used to fix this. Running the example snippet above, then base85 encoding the message
once results in:
CIA~89YF>W1PTBJQBo*W6$nli7#$Zu9U2uI5my8n002}A3jh-XQWYCi2Ma|K9uW=@5di
If using the shift2 commandline tool, you would see a different output:
B2-is8Y&4!ED2H~Ix<~LOCfn@P;xLjM_E8(awt`1YC<SaOLbpaL^T!^W_ucF8Er~?NnC$>e0@WAWn2bqc6M1yP+DqF4M_kSCp0uA5h->H
This is for a few reasons. Firstly, as mentioned above, shift2 uses base85 twice. Once before, and once after XOR encryption. Secondly, a file header is prepended to the output to help the decoder read the file. This header contains version info, the file length, and the encoding type.
Try it yourself
I have published shift2 on pypi.org for use with PIP. To install shift2, ensure both python3
and python3-pip
are installed on your computer, then run:
# Install shift2
pip3 install shift-tool
# View the help for shift2
shift2 -h
Future plans
Due to the fact that shift2 can also be used as a library (as outlined in the README), I would like to write a program that allows users to talk to eachother IRC style over a TCP port. This program would use either a pre-shared, or generated key to encode / decode messages on the fly.
If you are interested in helping out, or taking on this idea for yourself, send me an email.