1
ewpratten.com/content/blog/2019-08-24-Shift2.md
Evan Pratten 66528d6284 Revert "The great migration"
This reverts commit f184e610368cedc50f9dd557953c83f70b55f329.
2024-11-14 12:45:30 -05:00

83 lines
4.5 KiB
Markdown

---
title: Keyed data encoding with Python
description: XOR is pretty cool
date: 2019-08-24
tags:
- project
- python
aliases:
- /blog/2019/08/24/shift2
- /blog/shift2
---
I have always been interested in text and data encoding, so last year, I made my first encoding tool. [Shift64](https://github.com/Ewpratten/shift64) was designed to take plaintext data with a key, and convert it into a block of base64 that could, in theory, only be decoded with the original key. I had a lot of fun with this tool, and a very stripped down version of it actually ended up as a bonus question on the [5024 Programming Test](https://github.com/frc5024/Programming-Test/blob/master/test.md) for 2018/2019. Yes, the key was in fact `5024`.
This tool had some issues. Firstly, the code was a mess and only accepted hard-coded values. This made it very impractical as an every-day tool, and a nightmare to continue developing. Secondly, the encoder made use of entropy bits, and self modifying keys that would end up producing encoded files >1GB from just the word *hello*.
## Shift2
One of the oldest items on my TODO list has been to rewrite shift64, so I made a brand new tool out of it. [Shift2](https://github.com/Ewpratten/shift) is both a command-line tool, and a Python3 library that can efficiently encode and decode text data with a single key (unlike shift64, which used two keys concatenated into a single string, and separated by a colon).
### How it works
Shift2 has two inputs. A `file`, and a `key`. These two strings are used to produce a single output, the `message`.
When encoding a file, shift2 starts by encoding the raw data with [base85](https://en.wikipedia.org/wiki/Ascii85), to ensure that all data being passed to the next stage can be represented as a UTF-8 string (even binary data). This base85 data is then XOR encrypted with a rotating key. This operation can be expressed with the following (this example ignores the base85 encoding steps):
```python
file = "Hello reader! I am some input that needs to be encoded"
key = "ewpratten"
message = ""
for i, char in enumerate(file):
message += chr(
ord(char) ^ ord(key[i % len(key) - 1])
)
```
The output of this contains non-displayable characters. A second base85 encoding is used to fix this. Running the example snippet above, then base85 encoding the `message` once results in:
```
CIA~89YF>W1PTBJQBo*W6$nli7#$Zu9U2uI5my8n002}A3jh-XQWYCi2Ma|K9uW=@5di
```
If using the shift2 commandline tool, you would see a different output:
```
B2-is8Y&4!ED2H~Ix<~LOCfn@P;xLjM_E8(awt`1YC<SaOLbpaL^T!^W_ucF8Er~?NnC$>e0@WAWn2bqc6M1yP+DqF4M_kSCp0uA5h->H
```
This is for a few reasons. Firstly, as mentioned above, shift2 uses base85 **twice**. Once before, and once after XOR encryption. Secondly, a file header is prepended to the output to help the decoder read the file. This header contains version info, the file length, and the encoding type.
### Try it yourself with PIP
I have published shift2 on [pypi.org](https://pypi.org/project/shift-tool/) for use with PIP. To install shift2, ensure both `python3` and `python3-pip` are installed on your computer, then run:
```sh
# Install shift2
pip3 install shift-tool
# View the help for shift2
shift2 -h
```
<div id="demo" markdown="1">
### Try it in the browser
I have ported the core code from shift2 to [run in the browser](http://www.brython.info/index.html). This demo is entirely client-side, and may take a few seconds to load depending on your device.
<input type="radio" id="encode" name="shift-action" value="encode" checked>
<label for="encode">Encode</label>
<input type="radio" id="decode" name="shift-action" value="decode">
<label for="decode">Decode</label>
<input type="text" id="key" name="key" placeholder="Encoding key" required><br>
<input type="text" id="msg" name="msg" placeholder="Message" required size="30">
<button type="button" class="btn btn-primary" id="shift-button" disabled>shift2 demo is loading... (this may take a few seconds)</button>
</div>
### Future plans
Due to the fact that shift2 can also be used as a library (as outlined in the [README](https://github.com/Ewpratten/shift/blob/master/README.md)), I would like to write a program that allows users to talk to eachother IRC style over a TCP port. This program would use either a pre-shared, or generated key to encode / decode messages on the fly.
If you are interested in helping out, or taking on this idea for yourself, send me an email.
<!-- Python code -->
<script type="text/python" src="/assets/python/shift2/shift2demo.py"></script>