<head>
<title>Evan Pratten</title>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1, user-scalable=no" />
<!-- Begin Jekyll SEO tag v2.6.1 -->
<title>Keyed data encoding with Python | Evan Pratten</title>
<meta name="generator" content="Jekyll v4.0.0" />
<meta property="og:title" content="Keyed data encoding with Python" />
<meta property="og:locale" content="en_US" />
<meta name="description" content="XOR is pretty cool" />
<meta property="og:description" content="XOR is pretty cool" />
<link rel="canonical" href="http://0.0.0.0:4000/blog/2019/08/24/shift2" />
<meta property="og:url" content="http://0.0.0.0:4000/blog/2019/08/24/shift2" />
<meta property="og:site_name" content="Evan Pratten" />
<meta property="og:type" content="article" />
<meta property="article:published_time" content="2019-08-24T09:13:00-04:00" />
<script type="application/ld+json">
{"datePublished":"2019-08-24T09:13:00-04:00","mainEntityOfPage":{"@type":"WebPage","@id":"http://0.0.0.0:4000/blog/2019/08/24/shift2"},"@type":"BlogPosting","url":"http://0.0.0.0:4000/blog/2019/08/24/shift2","headline":"Keyed data encoding with Python","description":"XOR is pretty cool","dateModified":"2019-08-24T09:13:00-04:00","@context":"https://schema.org"}</script>
<!-- End Jekyll SEO tag -->
<link rel="stylesheet" href="https://stackpath.bootstrapcdn.com/bootstrap/4.3.1/css/bootstrap.min.css"
integrity="sha384-ggOyR0iXCbMQv3Xipma34MD+dH/1fQ784/j6cY/iJTQUOhcWr7x9JvoRxT2MZw1T" crossorigin="anonymous">
<link rel="stylesheet" href="/assets/css/main.css">
<link rel="stylesheet" href="/assets/css/github-syntax.css">
<link href="https://fonts.googleapis.com/css?family=IBM+Plex+Mono:400,400i|IBM+Plex+Sans:100,100i,400,400i,700,700i" rel="stylesheet">
<link href="https://stackpath.bootstrapcdn.com/font-awesome/4.7.0/css/font-awesome.min.css" rel="stylesheet" integrity="sha384-wvfXpqpZZVQGK6TAh5PVlGOfQNHSoD2xbE+QkPxCAFlNEevoEH3Sl0sibVcOQVnN" crossorigin="anonymous">
</head>
<body>
<div class="site-ctr">
<!-- Navbar -->
<nav class="navbar navbar-dark sticky-top bg-dark navbar-expand-lg">
<!-- Navbar content -->
<!-- <div class="container"> -->
<a class="navbar-brand" href="/">Evan Pratten</a>
<button class="navbar-toggler" type="button" data-toggle="collapse" data-target="#navbarNavAltMarkup" aria-controls="navbarNavAltMarkup" aria-expanded="false" aria-label="Toggle navigation">
<span class="navbar-toggler-icon"></span>
</button>
<div class="collapse navbar-collapse" id="navbarNavAltMarkup">
<div class="navbar-nav ml-auto">
<a class="nav-item nav-link" href="/blog">Blog</a>
<a class="nav-item nav-link" href="/projects">Projects</a>
<!-- <a class="nav-item nav-link" href="/documentation">Documentation</a> -->
<a class="nav-item nav-link" href="/about">About</a>
</div>
<!-- </div> -->
</div>
</nav>
<!-- <div style="height:5vh"></div> -->
<!-- Header -->
<!-- <div class="header">
<div class="container">
<div class="content">
</div>
</div>
<div class="header-gap"></div>
</div> -->
<div class="reactive-bg">
<div class="post container">
<h1>Keyed data encoding with Python
</h1>
<h4>XOR is pretty cool
</h4>
<hr>
<p><em>2019-08-24 09:13:00 -0400
</em></p>
<br>
<p>I have always been interested in text and data encoding, so last year I made my first encoding tool. <a href="https://github.com/Ewpratten/shift64">Shift64</a> was designed to take plaintext data and a key, and convert them into a block of base64 that could, in theory, only be decoded with the original key. I had a lot of fun with this tool, and a heavily stripped-down version of it actually ended up as a bonus question on the <a href="https://github.com/frc5024/Programming-Test/blob/master/test.md">5024 Programming Test</a> for 2018/2019. Yes, the key was in fact <code class="highlighter-rouge">5024</code>.</p>
<p>This tool had some issues. Firstly, the code was a mess and only accepted hard-coded values. This made it very impractical as an everyday tool, and a nightmare to continue developing. Secondly, the encoder made use of entropy bits and self-modifying keys, which could end up producing encoded files larger than 1 GB from just the word <em>hello</em>.</p>
<h2 id="shift2">Shift2</h2>
<p>One of the oldest items on my TODO list has been to rewrite shift64, so I made a brand new tool out of it. <a href="https://github.com/Ewpratten/shift">Shift2</a> is both a command-line tool and a Python 3 library that can efficiently encode and decode text data with a single key (unlike shift64, which used two keys concatenated into a single string and separated by a colon).</p>
<h3 id="how-it-works">How it works</h3>
<p>Shift2 takes two inputs: a <code class="highlighter-rouge">file</code> and a <code class="highlighter-rouge">key</code>. These two strings are used to produce a single output, the <code class="highlighter-rouge">message</code>.</p>
<p>When encoding a file, shift2 starts by encoding the raw data with <a href="https://en.wikipedia.org/wiki/Ascii85">base85</a> so that everything passed to the next stage, even binary data, can be represented as a UTF-8 string. This base85 data is then XOR-encrypted with a rotating key. The XOR step can be expressed as follows (this example ignores the base85 encoding steps):</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">file</span> <span class="o">=</span> <span class="s">"Hello reader! I am some input that needs to be encoded"</span>
<span class="n">key</span> <span class="o">=</span> <span class="s">"ewpratten"</span>
<span class="n">message</span> <span class="o">=</span> <span class="s">""</span>
<span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">char</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="nb">file</span><span class="p">):</span>
    <span class="c1"># The index parses as (i % len(key)) - 1, so -1 wraps to the key's last character</span>
    <span class="n">message</span> <span class="o">+=</span> <span class="nb">chr</span><span class="p">(</span>
        <span class="nb">ord</span><span class="p">(</span><span class="n">char</span><span class="p">)</span> <span class="o">^</span> <span class="nb">ord</span><span class="p">(</span><span class="n">key</span><span class="p">[</span><span class="n">i</span> <span class="o">%</span> <span class="nb">len</span><span class="p">(</span><span class="n">key</span><span class="p">)</span> <span class="o">-</span> <span class="mi">1</span><span class="p">])</span>
    <span class="p">)</span>
</code></pre></div></div>
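<p>Because XOR is its own inverse, running the same loop over the encoded text with the same key recovers the input. Here is a minimal sketch of that round trip. It wraps the loop above in a helper and keeps the same key indexing; <code class="highlighter-rouge">xor_rotate</code> is a name made up for this example, not part of shift2:</p>

```python
def xor_rotate(text: str, key: str) -> str:
    """XOR each character of text with a rotating key (hypothetical helper)."""
    out = []
    for i, char in enumerate(text):
        # Same indexing as the snippet above: (i % len(key)) - 1
        out.append(chr(ord(char) ^ ord(key[i % len(key) - 1])))
    return "".join(out)


key = "ewpratten"
original = "Hello reader! I am some input that needs to be encoded"

message = xor_rotate(original, key)    # encode
recovered = xor_rotate(message, key)   # decode: the exact same operation

print(recovered == original)
```

<p>The encoder and decoder are the same function here, which is why a single shared key is enough for both directions.</p>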
<p>The raw XOR output contains non-displayable characters, so a second base85 encoding is used to make it printable. Running the example snippet above, then base85 encoding the <code class="highlighter-rouge">message</code> once, results in:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>CIA~89YF&gt;W1PTBJQBo*W6$nli7#$Zu9U2uI5my8n002}A3jh-XQWYCi2Ma|K9uW=@5di
</code></pre></div></div>
<p>If you use the shift2 command-line tool instead, you will see a different output:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>B2-is8Y&amp;4!ED2H~Ix&lt;~LOCfn@P;xLjM_E8(awt`1YC&lt;SaOLbpaL^T!^W_ucF8Er~?NnC$&gt;e0@WAWn2bqc6M1yP+DqF4M_kSCp0uA5h-&gt;H
</code></pre></div></div>
<p>There are two reasons for this. Firstly, as mentioned above, shift2 uses base85 <strong>twice</strong>: once before and once after XOR encryption. Secondly, a file header is prepended to the output to help the decoder read the file. This header contains version info, the file length, and the encoding type.</p>
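<p>The base85, XOR, base85 pipeline described above can be sketched roughly as follows. This is an illustration under assumptions rather than shift2's actual source: it assumes Python's <code class="highlighter-rouge">base64.b85encode</code> as the base85 variant, starts the rotating key at its first character, and leaves out the file header because its layout is not described here. The names <code class="highlighter-rouge">xor_bytes</code>, <code class="highlighter-rouge">encode</code>, and <code class="highlighter-rouge">decode</code> are hypothetical.</p>

```python
import base64


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR data with a repeating key; applying it twice restores the input."""
    if not key:
        raise ValueError("key must not be empty")
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def encode(raw: bytes, key: str) -> str:
    """Hypothetical encoder: base85, then XOR, then base85 again (no header)."""
    key_bytes = key.encode("utf-8")
    stage1 = base64.b85encode(raw)           # arbitrary bytes become ASCII text
    stage2 = xor_bytes(stage1, key_bytes)    # may contain non-printable bytes
    return base64.b85encode(stage2).decode("ascii")  # printable again


def decode(message: str, key: str) -> bytes:
    """Undo encode() by reversing each stage in the opposite order."""
    key_bytes = key.encode("utf-8")
    stage2 = base64.b85decode(message)
    stage1 = xor_bytes(stage2, key_bytes)
    return base64.b85decode(stage1)


secret = encode(b"Hello reader!", "ewpratten")
print(secret)
print(decode(secret, "ewpratten"))
```

<p>Since this sketch has no header and may not index the key the same way, its output is not expected to match what the real command-line tool prints.</p>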
<h3 id="try-it-yourself-with-pip">Try it yourself with pip</h3>
<p>I have published shift2 on <a href="https://pypi.org/project/shift-tool/">pypi.org</a> for use with pip. To install shift2, make sure both <code class="highlighter-rouge">python3</code> and <code class="highlighter-rouge">python3-pip</code> are installed on your computer, then run:</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Install shift2</span>
pip3 <span class="nb">install </span>shift-tool
<span class="c"># View the help for shift2</span>
shift2 <span class="nt">-h</span>
</code></pre></div></div>
<div id="demo">
<h3 id="try-it-in-the-browser">Try it in the browser</h3>
<p>I have ported the core code from shift2 to <a href="http://www.brython.info/index.html">run in the browser</a>. This demo is entirely client-side, and may take a few seconds to load depending on your device.</p>
<p><input type="radio" id="encode" name="shift-action" value="encode" checked>
<label for="encode">Encode</label>
<input type="radio" id="decode" name="shift-action" value="decode">
<label for="decode">Decode</label></p>
<p><input type="text" id="key" name="key" placeholder="Encoding key" required=""><br>
<input type="text" id="msg" name="msg" placeholder="Message" required="" size="30"></p>
<p><button type="button" class="btn btn-primary" id="shift-button" disabled>shift2 demo is loading… (this may take a few seconds)</button></p>
</div>
<h3 id="future-plans">Future plans</h3>
<p>Because shift2 can also be used as a library (as outlined in the <a href="https://github.com/Ewpratten/shift/blob/master/README.md">README</a>), I would like to write a program that lets users talk to each other IRC-style over a TCP port. This program would use either a pre-shared or a generated key to encode and decode messages on the fly.</p>
<p>If you are interested in helping out, or taking on this idea for yourself, send me an email.</p>
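<p>As a rough starting point for that idea, here is a small sketch of encoding and decoding messages on the fly over a socket with a pre-shared key. Everything in it is an assumption made for illustration: it uses a local <code class="highlighter-rouge">socket.socketpair()</code> in place of a real TCP connection, a plain XOR plus base85 step instead of the shift2 library, and newline-delimited framing. <code class="highlighter-rouge">send_message</code> and <code class="highlighter-rouge">recv_message</code> are hypothetical helpers.</p>

```python
import base64
import socket


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR data with a repeating key (stand-in for the real shift2 encoder)."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def send_message(sock: socket.socket, text: str, key: bytes) -> None:
    """Encode one chat line and send it, terminated by a newline."""
    payload = base64.b85encode(xor_bytes(text.encode("utf-8"), key))
    # The b85 alphabet contains no newline, so "\n" is a safe delimiter
    sock.sendall(payload + b"\n")


def recv_message(reader, key: bytes) -> str:
    """Read one newline-terminated line and decode it with the shared key."""
    line = reader.readline().rstrip(b"\n")
    return xor_bytes(base64.b85decode(line), key).decode("utf-8")


shared_key = b"ewpratten"

# A connected pair of local sockets stands in for the two chat clients
left, right = socket.socketpair()
with left, right, right.makefile("rb") as reader:
    send_message(left, "hello over the wire", shared_key)
    received = recv_message(reader, shared_key)

print(received)
```

<p>A real client would replace the socket pair with a listening TCP socket on one side and a connecting socket on the other, and would need to decide how the key is shared or generated.</p>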
<!-- Python code -->
<script type="text/python" src="/assets/python/shift2/shift2demo.py"></script>
</div>
</div>
</div>
<!-- <div id="particles-js"></div> -->
<div class="container foot" style="text-align:center;">
<br>
<span class="site-info">
Site design by: <a href="https://retrylife.ca">Evan Pratten</a> |
This site was last updated at: 2019-11-30 11:37:59 -0500
</span>
</div>
<!-- Brython -->
<script src="/assets/js/brython.js"></script>
<script src="/assets/js/brython_stdlib.js"></script>
<script>
function startPY(){
brython();
console.log("Started Python")
}
window.onload = startPY;
</script>
<script src="https://code.jquery.com/jquery-3.3.1.slim.min.js" integrity="sha384-q8i/X+965DzO0rT7abK41JStQIAqVgRVzpbzo5smXKp4YfRvH+8abtTE1Pi6jizo" crossorigin="anonymous"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/popper.js/1.14.7/umd/popper.min.js" integrity="sha384-UO2eT0CpHqdSJQ6hJty5KVphtPhzWj9WO1clHTMGa3JDZwrnQq4sF86dIHNDz0W1" crossorigin="anonymous"></script>
<script src="https://stackpath.bootstrapcdn.com/bootstrap/4.3.1/js/bootstrap.min.js" integrity="sha384-JjSmVgyd0p3pXB1rRibZUAYoIIy6OrQ6VrjIEaFf/nJGzIxFDsf4x0xIM+B07jRM" crossorigin="anonymous"></script>
<!-- Offsets for links -->
<script>
(function ($, window) {
var adjustAnchor = function () {
var $anchor = $(':target'),
fixedElementHeight = 100;
if ($anchor.length > 0) {
window.scrollTo(0, $anchor.offset().top - fixedElementHeight);
}
};
$(window).on('hashchange load', function () {
adjustAnchor();
});
})(jQuery, window);
</script>
<!-- Global site tag (gtag.js) - Google Analytics -->
<script async src="https://www.googletagmanager.com/gtag/js?id=UA-74118570-2"></script>
<script>
window.dataLayer = window.dataLayer || [];
function gtag() { dataLayer.push(arguments); }
gtag('js', new Date());
gtag('config', 'UA-74118570-2');
</script>
<!-- particles -->
<script>
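var body = document.body
var particles = document.getElementById("particles-js")
// The particles container may be commented out of the page, so guard against null
if (particles) {
particles.style.height = body.scrollHeight + "px"
}
console.log(body.scrollHeight)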
</script>
<script src="/assets/js/particles.min.js"></script>
<script>
particlesJS.load('particles-js', '/assets/js/particles.json', function () {
console.log('callback - particles.js config loaded');
});
</script>
<!-- Twitter embeds -->
<script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
</body>