<head> <title>Evan Pratten</title> <meta charset="utf-8" /> <meta name="viewport" content="width=device-width, initial-scale=1, user-scalable=no" /> <!-- Begin Jekyll SEO tag v2.6.1 --> <title>Keyed data encoding with Python | Evan Pratten</title> <meta name="generator" content="Jekyll v4.0.0" /> <meta property="og:title" content="Keyed data encoding with Python" /> <meta property="og:locale" content="en_US" /> <meta name="description" content="XOR is pretty cool" /> <meta property="og:description" content="XOR is pretty cool" /> <link rel="canonical" href="http://0.0.0.0:4000/blog/2019/08/24/shift2" /> <meta property="og:url" content="http://0.0.0.0:4000/blog/2019/08/24/shift2" /> <meta property="og:site_name" content="Evan Pratten" /> <meta property="og:type" content="article" /> <meta property="article:published_time" content="2019-08-24T09:13:00-04:00" /> <script type="application/ld+json"> {"datePublished":"2019-08-24T09:13:00-04:00","mainEntityOfPage":{"@type":"WebPage","@id":"http://0.0.0.0:4000/blog/2019/08/24/shift2"},"@type":"BlogPosting","url":"http://0.0.0.0:4000/blog/2019/08/24/shift2","headline":"Keyed data encoding with Python","description":"XOR is pretty cool","dateModified":"2019-08-24T09:13:00-04:00","@context":"https://schema.org"}</script> <!-- End Jekyll SEO tag --> <link rel="stylesheet" href="https://stackpath.bootstrapcdn.com/bootstrap/4.3.1/css/bootstrap.min.css" integrity="sha384-ggOyR0iXCbMQv3Xipma34MD+dH/1fQ784/j6cY/iJTQUOhcWr7x9JvoRxT2MZw1T" crossorigin="anonymous"> <link rel="stylesheet" href="/assets/css/main.css"> <link rel="stylesheet" href="/assets/css/github-syntax.css"> <link href="https://fonts.googleapis.com/css?family=IBM+Plex+Mono:400,400i|IBM+Plex+Sans:100,100i,400,400i,700,700i" rel="stylesheet"> <link href="https://stackpath.bootstrapcdn.com/font-awesome/4.7.0/css/font-awesome.min.css" rel="stylesheet" integrity="sha384-wvfXpqpZZVQGK6TAh5PVlGOfQNHSoD2xbE+QkPxCAFlNEevoEH3Sl0sibVcOQVnN" crossorigin="anonymous"> </head> 
<body> <div class="site-ctr"> <!-- Navbar --> <nav class="navbar navbar-dark sticky-top bg-dark navbar-expand-lg"> <!-- Navbar content --> <!-- <div class="container"> --> <a class="navbar-brand" href="/">Evan Pratten</a> <button class="navbar-toggler" type="button" data-toggle="collapse" data-target="#navbarNavAltMarkup" aria-controls="navbarNavAltMarkup" aria-expanded="false" aria-label="Toggle navigation"> <span class="navbar-toggler-icon"></span> </button> <div class="collapse navbar-collapse" id="navbarNavAltMarkup"> <div class="navbar-nav ml-auto"> <a class="nav-item nav-link" href="/blog">Blog</a> <a class="nav-item nav-link" href="/projects">Projects</a> <!-- <a class="nav-item nav-link" href="/documentation">Documentation</a> --> <a class="nav-item nav-link" href="/about">About</a> </div> <!-- </div> --> </div> </nav> <!-- <div style="height:5vh"></div> --> <!-- Header --> <!-- <div class="header"> <div class="container"> <div class="content"> </div> </div> <div class="header-gap"></div> </div> --> <div class="reactive-bg"> <div class="post container"> <h1>Keyed data encoding with Python </h1> <h4>XOR is pretty cool </h4> <hr> <p><em>2019-08-24 09:13:00 -0400 </em></p> <br> <p>I have always been interested in text and data encoding, so last year, I made my first encoding tool. <a href="https://github.com/Ewpratten/shift64">Shift64</a> was designed to take plaintext data with a key, and convert it into a block of base64 that could, in theory, only be decoded with the original key. I had a lot of fun with this tool, and a very stripped down version of it actually ended up as a bonus question on the <a href="https://github.com/frc5024/Programming-Test/blob/master/test.md">5024 Programming Test</a> for 2018/2019. Yes, the key was in fact <code class="highlighter-rouge">5024</code>.</p> <p>This tool had some issues. Firstly, the code was a mess and only accepted hard-coded values. 
This made it impractical as an everyday tool, and a nightmare to continue developing. Secondly, the encoder made use of entropy bits and self-modifying keys that would end up producing encoded files larger than 1 GB from just the word <em>hello</em>.</p>
<h2 id="shift2">Shift2</h2>
<p>One of the oldest items on my TODO list has been to rewrite shift64, so I made a brand new tool out of it. <a href="https://github.com/Ewpratten/shift">Shift2</a> is both a command-line tool and a Python 3 library that can efficiently encode and decode text data with a single key (unlike shift64, which used two keys concatenated into a single string and separated by a colon).</p>
<h3 id="how-it-works">How it works</h3>
<p>Shift2 has two inputs: a <code class="highlighter-rouge">file</code> and a <code class="highlighter-rouge">key</code>. These two strings are used to produce a single output, the <code class="highlighter-rouge">message</code>.</p>
<p>When encoding a file, shift2 starts by encoding the raw data with <a href="https://en.wikipedia.org/wiki/Ascii85">base85</a>, to ensure that all data being passed to the next stage can be represented as a UTF-8 string (even binary data). This base85 data is then XOR encrypted with a rotating key. This operation can be expressed with the following (this example ignores the base85 encoding steps):</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">file</span> <span class="o">=</span> <span class="s">"Hello reader! I am some input that needs to be encoded"</span>
<span class="n">key</span> <span class="o">=</span> <span class="s">"ewpratten"</span>

<span class="n">message</span> <span class="o">=</span> <span class="s">""</span>

<span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">char</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="nb">file</span><span class="p">):</span>
    <span class="c1"># XOR each character with a key character chosen by a rotating index.</span>
    <span class="c1"># % binds tighter than -, so the index is (i % len(key)) - 1;</span>
    <span class="c1"># whenever that is -1, Python wraps around to the last key character.</span>
    <span class="n">message</span> <span class="o">+=</span> <span class="nb">chr</span><span class="p">(</span>
        <span class="nb">ord</span><span class="p">(</span><span class="n">char</span><span class="p">)</span> <span class="o">^</span> <span class="nb">ord</span><span class="p">(</span><span class="n">key</span><span class="p">[</span><span class="n">i</span> <span class="o">%</span> <span class="nb">len</span><span class="p">(</span><span class="n">key</span><span class="p">)</span> <span class="o">-</span> <span class="mi">1</span><span class="p">])</span>
    <span class="p">)</span>
</code></pre></div></div>
<p>The output of this contains non-displayable characters. A second base85 encoding is used to fix this. Running the example snippet above, then base85 encoding the <code class="highlighter-rouge">message</code> once results in:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>CIA~89YF&gt;W1PTBJQBo*W6$nli7#$Zu9U2uI5my8n002}A3jh-XQWYCi2Ma|K9uW=@5di
</code></pre></div></div>
<p>If using the shift2 command-line tool, you would see a different output:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>B2-is8Y&amp;4!ED2H~Ix&lt;~LOCfn@P;xLjM_E8(awt`1YC&lt;SaOLbpaL^T!^W_ucF8Er~?NnC$&gt;e0@WAWn2bqc6M1yP+DqF4M_kSCp0uA5h-&gt;H
</code></pre></div></div>
<p>This is for two reasons. Firstly, as mentioned above, shift2 uses base85 <strong>twice</strong>: once before and once after XOR encryption.
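</p>
<p>As a rough illustration of that ordering, here is a minimal sketch of the base85, XOR, base85 pipeline. It is not shift2's actual source. The function names <code class="highlighter-rouge">xor_with_key</code>, <code class="highlighter-rouge">encode</code> and <code class="highlighter-rouge">decode</code> are made up for this example, it assumes Python's <code class="highlighter-rouge">base64.b85encode</code> as the base85 variant, it starts the rotating key at its first character, and it leaves out the file header, so its output should not be expected to match the command-line tool.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>
```python
import base64
from itertools import cycle


def xor_with_key(data: bytes, key: bytes) -> bytes:
    """XOR every byte of data with a repeating key (assumes a non-empty key)."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


def encode(raw: bytes, key: str) -> str:
    """Hypothetical encoder: base85, then XOR, then base85 again."""
    inner = base64.b85encode(raw)                     # make any input plain ASCII
    mixed = xor_with_key(inner, key.encode("utf-8"))  # keyed step
    return base64.b85encode(mixed).decode("ascii")    # make the result printable


def decode(message: str, key: str) -> bytes:
    """Hypothetical decoder: undo encode() by reversing its three steps."""
    mixed = base64.b85decode(message)
    inner = xor_with_key(mixed, key.encode("utf-8"))  # XOR is its own inverse
    return base64.b85decode(inner)


if __name__ == "__main__":
    secret = encode(b"Hello reader!", "ewpratten")
    print(secret)
    print(decode(secret, "ewpratten"))
```
</code></pre></div></div>
<p>Because XOR is its own inverse, decoding in this sketch is the same three steps run backwards with the same key.</p>
<p>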
Secondly, a file header is prepended to the output to help the decoder read the file. This header contains version info, the file length, and the encoding type.</p>
<h3 id="try-it-yourself-with-pip">Try it yourself with pip</h3>
<p>I have published shift2 on <a href="https://pypi.org/project/shift-tool/">pypi.org</a> for use with pip. To install shift2, ensure both <code class="highlighter-rouge">python3</code> and <code class="highlighter-rouge">python3-pip</code> are installed on your computer, then run:</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Install shift2</span>
pip3 <span class="nb">install </span>shift-tool

<span class="c"># View the help for shift2</span>
shift2 <span class="nt">-h</span>
</code></pre></div></div>
<div id="demo"> <h3 id="try-it-in-the-browser">Try it in the browser</h3> <p>I have ported the core code from shift2 to <a href="http://www.brython.info/index.html">run in the browser</a>. This demo is entirely client-side, and may take a few seconds to load depending on your device.</p> <p><input type="radio" id="encode" name="shift-action" value="encode" checked> <label for="encode">Encode</label> <input type="radio" id="decode" name="shift-action" value="decode"> <label for="decode">Decode</label></p> <p><input type="text" id="key" name="key" placeholder="Encoding key" required=""><br> <input type="text" id="msg" name="msg" placeholder="Message" required="" size="30"></p> <p><button type="button" class="btn btn-primary" id="shift-button" disabled>shift2 demo is loading… (this may take a few seconds)</button></p> </div>
<h3 id="future-plans">Future plans</h3>
<p>Because shift2 can also be used as a library (as outlined in the <a href="https://github.com/Ewpratten/shift/blob/master/README.md">README</a>), I would like to write a program that allows users to talk to each other IRC-style over a TCP port.
This program would use either a pre-shared, or generated key to encode / decode messages on the fly.</p> <p>If you are interested in helping out, or taking on this idea for yourself, send me an email.</p> <!-- Python code --> <script type="text/python" src="/assets/python/shift2/shift2demo.py"></script> </div> </div> </div> <!-- <div id="particles-js"></div> --> <div class="container foot" style="text-align:center;"> <br> <span class="site-info"> Site design by: <a href="https://retrylife.ca">Evan Pratten</a> | This site was last updated at: 2019-11-30 11:37:59 -0500 </span> </div> <!-- Brython --> <script src="/assets/js/brython.js"></script> <script src="/assets/js/brython_stdlib.js"></script> <script> function startPY(){ brython(); console.log("Started Python") } window.onload = startPY; </script> <script src="https://code.jquery.com/jquery-3.3.1.slim.min.js" integrity="sha384-q8i/X+965DzO0rT7abK41JStQIAqVgRVzpbzo5smXKp4YfRvH+8abtTE1Pi6jizo" crossorigin="anonymous"></script> <script src="https://cdnjs.cloudflare.com/ajax/libs/popper.js/1.14.7/umd/popper.min.js" integrity="sha384-UO2eT0CpHqdSJQ6hJty5KVphtPhzWj9WO1clHTMGa3JDZwrnQq4sF86dIHNDz0W1" crossorigin="anonymous"></script> <script src="https://stackpath.bootstrapcdn.com/bootstrap/4.3.1/js/bootstrap.min.js" integrity="sha384-JjSmVgyd0p3pXB1rRibZUAYoIIy6OrQ6VrjIEaFf/nJGzIxFDsf4x0xIM+B07jRM" crossorigin="anonymous"></script> <!-- Offsets for links --> <script> (function ($, window) { var adjustAnchor = function () { var $anchor = $(':target'), fixedElementHeight = 100; if ($anchor.length > 0) { window.scrollTo(0, $anchor.offset().top - fixedElementHeight); } }; $(window).on('hashchange load', function () { adjustAnchor(); }); })(jQuery, window); </script> <!-- Global site tag (gtag.js) - Google Analytics --> <script async src="https://www.googletagmanager.com/gtag/js?id=UA-74118570-2"></script> <script> window.dataLayer = window.dataLayer || []; function gtag() { dataLayer.push(arguments); } gtag('js', new 
Date()); gtag('config', 'UA-74118570-2'); </script> <!-- particles --> <script> var body = document.body var particles = document.getElementById("particles-js") particles.style.height = body.scrollHeight + "px" console.log(body.scrollHeight) </script> <script src="/assets/js/particles.min.js"></script> <script> particlesJS.load('particles-js', '/assets/js/particles.json', function () { console.log('callback - particles.js config loaded'); }); </script> <!-- Twitter embeds --> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script> </body>