Evan Pratten

Keyed data encoding with Python

XOR is pretty cool

I have always been interested in text and data encoding, so last year I made my first encoding tool. Shift64 was designed to take plaintext data and a key, and convert them into a block of base64 that could, in theory, only be decoded with the original key. I had a lot of fun with this tool, and a very stripped-down version of it actually ended up as a bonus question on the 5024 Programming Test for 2018/2019. Yes, the key was in fact 5024.

This tool had some issues. Firstly, the code was a mess and only accepted hard-coded values. This made it very impractical as an everyday tool, and a nightmare to continue developing. Secondly, the encoder made use of entropy bits and self-modifying keys, which would end up producing encoded files larger than 1GB from just the word hello.

Shift2

One of the oldest items on my TODO list has been to rewrite shift64, so I made a brand new tool out of it. Shift2 is both a command-line tool and a Python 3 library that can efficiently encode and decode text data with a single key (unlike shift64, which used two keys concatenated into a single string and separated by a colon).

How it works

Shift2 has two inputs: a file and a key. These two strings are used to produce a single output, the message.

When encoding a file, shift2 starts by encoding the raw data with base85, to ensure that all data being passed to the next stage can be represented as a UTF-8 string (even binary data). This base85 data is then XOR encrypted with a rotating key. This operation can be expressed with the following (this example ignores the base85 encoding steps):

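file = "Hello reader! I am some input that needs to be encoded"
key = "ewpratten"

message = ""

# XOR each character of the input with a character of the repeating key
for i, char in enumerate(file):
    # Note: % binds tighter than -, so the index is (i % len(key)) - 1,
    # which means the rotation starts on the last character of the key
    message += chr(
        ord(char) ^ ord(key[i % len(key) - 1])
    )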

The output of this contains non-displayable characters. A second base85 encoding is used to fix this. Running the example snippet above, then base85 encoding the message once results in:

CIA~89YF>W1PTBJQBo*W6$nli7#$Zu9U2uI5my8n002}A3jh-XQWYCi2Ma|K9uW=@5di
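Because XOR is its own inverse, decoding this simplified example is just the same steps in reverse: base85 decode, then XOR with the same rotating key. The sketch below assumes that "base85" here means Python's `base64.b85encode` / `base64.b85decode` (the post does not name the exact function), and `xor_with_key` is a helper name made up for illustration. It is not taken from the shift2 source.

```python
import base64

KEY = "ewpratten"


def xor_with_key(text: str, key: str) -> str:
    """Apply the same rotating-key XOR as the example snippet (hypothetical helper)."""
    return "".join(
        chr(ord(char) ^ ord(key[i % len(key) - 1]))
        for i, char in enumerate(text)
    )


plaintext = "Hello reader! I am some input that needs to be encoded"

# Encode: XOR first, then base85 so the result is printable
message = xor_with_key(plaintext, KEY)
encoded = base64.b85encode(message.encode("utf-8")).decode("ascii")

# Decode: undo the base85 step, then XOR again with the same key
decoded_message = base64.b85decode(encoded).decode("utf-8")
recovered = xor_with_key(decoded_message, KEY)

print(encoded)
print(recovered == plaintext)
```

Applying `xor_with_key` twice with the same key returns the original text, which is why a single key is enough for both directions.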

If you use the shift2 command-line tool, you will see a different output:

B2-is8Y&4!ED2H~Ix<~LOCfn@P;xLjM_E8(awt`1YC<SaOLbpaL^T!^W_ucF8Er~?NnC$>e0@WAWn2bqc6M1yP+DqF4M_kSCp0uA5h->H

This is for a few reasons. Firstly, as mentioned above, shift2 uses base85 twice: once before and once after XOR encryption. Secondly, a file header is prepended to the output to help the decoder read the file. This header contains version info, the file length, and the encoding type.
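The base85, XOR, base85 pipeline described above could look roughly like the following. This is only a sketch under a few assumptions: it uses Python's `base64.b85encode` as the base85 step, it leaves out the file header entirely (its exact layout is not described here), and the names `rotating_xor`, `encode` and `decode` are hypothetical rather than shift2's real API. It also starts the key rotation at the first key character, which differs slightly from the snippet earlier in this post.

```python
import base64


def rotating_xor(data: bytes, key: bytes) -> bytes:
    """XOR every byte of data with the repeating key."""
    if not key:
        raise ValueError("key must not be empty")
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def encode(raw: bytes, key: str) -> str:
    """base85 -> XOR -> base85, with no header (illustrative only)."""
    stage1 = base64.b85encode(raw)  # make any input, even binary, plain ASCII
    stage2 = rotating_xor(stage1, key.encode("utf-8"))
    return base64.b85encode(stage2).decode("ascii")  # printable again


def decode(message: str, key: str) -> bytes:
    """Reverse of encode(): base85 -> XOR -> base85."""
    stage2 = base64.b85decode(message)
    stage1 = rotating_xor(stage2, key.encode("utf-8"))
    return base64.b85decode(stage1)


if __name__ == "__main__":
    secret = encode(b"hello", "ewpratten")
    print(secret)
    print(decode(secret, "ewpratten"))
```

With the wrong key, the inner base85 data is usually no longer valid, so `decode` will typically either raise a `ValueError` or return different bytes.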

Try it yourself with pip

I have published shift2 on pypi.org for use with pip. To install shift2, ensure both python3 and python3-pip are installed on your computer, then run:

# Install shift2
pip3 install shift-tool

# View the help for shift2
shift2 -h

Try it in the browser

I have ported the core code from shift2 to run in the browser. This demo is entirely client-side, and may take a few seconds to load depending on your device.


Future plans

Because shift2 can also be used as a library (as outlined in the README), I would like to write a program that allows users to talk to each other IRC-style over a TCP port. This program would use either a pre-shared or a generated key to encode and decode messages on the fly.

If you are interested in helping out, or taking on this idea for yourself, send me an email.