Learn how binary encoding works (it’s more fun and useful than you think)

Human language encodes ideas

Encoding isn’t just for computers. Any mapping of data from one form to another is an encoding.

Computer encodings map data onto binary

When we’re dealing with computers and data, we don’t have the luxury of using redundancy to help the computer understand what we’re trying to do. There is no “close enough”, because the computer cannot think. It can only follow our instructions. For computers it’s all about providing context.

What’s binary?

Computers are electronic devices that store data using tiny physical things that can be in one of two states: “on” or “off”. For convenience, we map these onto 1 and 0 (an encoding!) so that we can treat these states like binary numbers.

To build a binary encoding, you need:

  • A tally of how many unique data values you’re going to encode.
  • Enough bits to have at least that many unique binary sequences.
  • For each unique data value, a way to reliably find a unique binary sequence to map it to.
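The three requirements above can be sketched in a few lines of Python. This is only an illustration: the helper names `bits_needed` and `build_encoding` are made up for this example, and sorting the values is just one arbitrary way to assign sequences reliably.

```python
def bits_needed(unique_values: int) -> int:
    """Smallest number of bits that gives at least `unique_values` sequences."""
    if unique_values < 1:
        raise ValueError("need at least one value to encode")
    # (n - 1).bit_length() is the smallest b such that 2**b >= n
    return (unique_values - 1).bit_length()


def build_encoding(values):
    """Map each unique value onto a unique, fixed-width binary string."""
    unique = sorted(set(values))              # 1. tally the unique values
    width = max(bits_needed(len(unique)), 1)  # 2. enough bits to cover them
    # 3. a repeatable rule: the value's position in sorted order, in binary
    return {value: format(index, f"0{width}b") for index, value in enumerate(unique)}


directions = build_encoding(["north", "south", "east", "west"])
print(directions)
```

Four values fit in two bits; adding a fifth value would push the width to three bits, leaving three sequences unused.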

Example encodings

Let’s go through some example binary encodings.

Booleans

“Boolean” values can only be true or false. That makes for only two unique values, so we only need one bit to get a matching number of unique sequences in binary. We could assign 0 to false and 1 to true. We could just as validly assign 1 to false and 0 to true. After all, the 0 and 1 bit values don’t mean anything on their own. They are just unique sequences we’re mapping onto our data.

Integers

At root, counting is a mechanical process. In base ten, I cycle through the numbers in a specified order (0–9) and, when I hit the highest one, I increment the number to the left and set the current one back to 0 (so 09 becomes 10 and 19 becomes 20). This works the same way for any base, including base two (binary).
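To make the "mechanical process" concrete, here is a minimal sketch of counting by the carry rule described above, written so the same code works in any base up to 36. The names `increment` and `count` are invented for this example.

```python
DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"


def increment(digits: str, base: int) -> str:
    """Add one to a digit string using only the 'cycle and carry' rule."""
    symbols = list(digits)
    position = len(symbols) - 1               # start at the rightmost digit
    while position >= 0:
        value = DIGITS.index(symbols[position])
        if value < base - 1:
            symbols[position] = DIGITS[value + 1]  # room to move up: done
            return "".join(symbols)
        symbols[position] = "0"               # highest digit: reset, carry left
        position -= 1
    return "1" + "".join(symbols)             # carried past the leftmost digit


def count(steps: int, base: int) -> list:
    """Count from zero, returning every number seen along the way."""
    current = "0"
    seen = [current]
    for _ in range(steps):
        current = increment(current, base)
        seen.append(current)
    return seen


print(count(5, 2))    # counting in binary
print(count(12, 10))  # the same rule in base ten
```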

Bitfields

Earlier I talked about booleans, and how wasteful they are when encoded using full bytes. Fortunately, for typical applications this waste doesn’t matter very much. But when every extra byte matters (like with high-throughput network traffic, or costly database storage) these wasted bits can add up.

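A bitfield packs several booleans into one number by giving each its own bit. For example, a set of user permissions:

  • Can read posts (Bin 001, Dec 1)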
  • Can edit posts (Bin 010, Dec 2)
  • Can create posts (Bin 100, Dec 4)
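The three permission bits above can be combined and tested with bitwise operators. The sketch below assumes nothing beyond those three values; the constant and function names (`CAN_READ`, `grant`, `revoke`, `has`) are hypothetical.

```python
CAN_READ = 0b001    # Dec 1
CAN_EDIT = 0b010    # Dec 2
CAN_CREATE = 0b100  # Dec 4


def grant(permissions: int, flag: int) -> int:
    """Turn a permission bit on with bitwise OR."""
    return permissions | flag


def revoke(permissions: int, flag: int) -> int:
    """Turn a permission bit off by AND-ing with the inverted flag."""
    return permissions & ~flag


def has(permissions: int, flag: int) -> bool:
    """Check whether a permission bit is on."""
    return (permissions & flag) == flag


user = grant(grant(0, CAN_READ), CAN_CREATE)
print(format(user, "03b"), user)   # all three booleans live in one small number
print(has(user, CAN_EDIT))
```

Three booleans stored this way fit in three bits of a single byte, instead of taking a full byte each.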

Text

We’ve now talked about booleans and integers, but what about text? Text is just another kind of data, after all.
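As a small illustration, Python's built-in `str.encode` and `bytes.decode` show text being mapped onto bytes and back. ASCII and UTF-8 are used here only as two familiar examples of text encodings; the variable names are arbitrary.

```python
text = "Hi!"

# ASCII maps each of these characters onto one number that fits in a byte.
ascii_bytes = text.encode("ascii")
codes = list(ascii_bytes)
bits = [format(byte, "08b") for byte in ascii_bytes]
print(codes)
print(bits)

# Decoding only works because we know which encoding was used.
round_trip = ascii_bytes.decode("ascii")

# UTF-8 needs more than one byte for characters outside the ASCII range.
accented = "é".encode("utf-8")
print(len(accented), accented)
```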

Displaying binary: Hexadecimal and Base64

Displaying binary data is a problem: we need to be able to map each possible byte onto a “printable” character, or sequence of printable characters. ASCII doesn’t do this for us, since many of its characters are unprintable.
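Here is a brief sketch of both display encodings using Python's standard library (`bytes.hex`, `bytes.fromhex`, and the `base64` module). The three sample bytes are arbitrary, chosen so that none of them is a printable ASCII character.

```python
import base64

raw = bytes([0x00, 0xFF, 0x10])   # three bytes with no printable ASCII form

# Hexadecimal: every byte becomes exactly two printable characters.
as_hex = raw.hex()

# Base64: every three bytes become four printable characters.
as_base64 = base64.b64encode(raw).decode("ascii")

print(as_hex)
print(as_base64)

# Both are reversible, so no information is lost by displaying data this way.
from_hex = bytes.fromhex(as_hex)
from_base64 = base64.b64decode(as_base64)
```

Hex doubles the size of the data when written out, while Base64 grows it by about a third, which is why Base64 is often preferred for larger payloads.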

“Types” inform encoding

In programming, we’re constantly dealing with “types”, meaning whether a given value is a string, an integer, or something else. The computer has to know what we’re trying to represent with our data so that it can perform the right tasks on that data. A “type” is just an encoding!
When we specify type, what we’re really doing is providing metadata to inform the programming language how our data is encoded.
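One way to see a type acting as decoding metadata is to read the same four bytes under different type declarations. The sketch below uses Python's `struct` module; the byte values are chosen only because they decode to a tidy float.

```python
import struct

raw = bytes([0x42, 0x48, 0x00, 0x00])   # the same four bytes throughout

# ">I" says: treat these bytes as one big-endian unsigned 32-bit integer.
as_integer = struct.unpack(">I", raw)[0]

# ">f" says: treat these bytes as one big-endian 32-bit float.
as_float = struct.unpack(">f", raw)[0]

# "4B" says: treat these bytes as four separate unsigned 8-bit integers.
as_small_integers = struct.unpack("4B", raw)

print(as_integer, as_float, as_small_integers)
```

Nothing about the bytes changes between the three calls. Only the format string, which plays the role of the type, changes how they are interpreted.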

Truthiness, falsiness, and fuzzy types

In many languages we have both boolean false values and the more general concept of falsey: values that are not boolean false but that we interpret as false for convenience.

  • In PHP, an empty array [] is falsey.
  • In GameMaker Studio 2 (GMS2, a program for making video games) any number that is 0.5 or less is falsey.
  • In JavaScript, NaN==false yields false (NaN means "Not a Number"), but casting it to a boolean does yield false, and it's also treated as false in if statements. So NaN is both falsey and not falsey, depending on context.
  • In JavaScript, using a bitwise operator on a number (which is likely a 64-bit float) first converts it into a 32-bit integer.
  • In JavaScript, 10+"10" is "1010", while 10*"10" is 100.
  • In Python 2.X, 3/2=1 but 3.0/2=1.5, while in Python 3.X both are 1.5.
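For comparison, here is a short sketch of how Python 3 handles a few of the same situations. It is not a translation of the JavaScript, PHP, or GMS2 behaviour listed above; it only shows that each language makes its own choices, and the variable names are arbitrary.

```python
nan = float("nan")

# Values Python treats as falsey without being the boolean False.
falsey_values = [0, 0.0, "", [], {}, None]
all_falsey = all(not value for value in falsey_values)

# NaN is not equal to False and, unlike in JavaScript, it is truthy in Python.
nan_equals_false = (nan == False)
nan_is_truthy = bool(nan)

# Python 3 division: "/" always gives a float, "//" rounds down.
true_division = 3 / 2
floor_division = 3 // 2

# Python refuses to guess when a number is added to a string.
try:
    10 + "10"
    mixed_add = "no error"
except TypeError:
    mixed_add = "TypeError"

# Explicit conversions state which encoding we mean.
as_text = str(10) + "10"
as_number = 10 * int("10")

print(all_falsey, nan_equals_false, nan_is_truthy)
print(true_division, floor_division, mixed_add, as_text, as_number)
```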

Metadata and inference

Two of the recurring themes above were:

  1. Data encodings are inventions, and you can encode things however you want to.
  2. You must know how something was encoded to be able to interpret it.

Bespoke encodings

You may someday find yourself needing to store a whole bunch of data in a compact way. If your use case is specific enough, that might mean coming up with your own custom encoding.
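As one hypothetical example of a custom encoding, a calendar date can be squeezed into 16 bits if we accept some limits. Everything here is an assumption made for illustration: the layout (7 bits for years since 2000, 4 bits for the month, 5 bits for the day) and the names `pack_date` and `unpack_date`.

```python
def pack_date(year: int, month: int, day: int) -> int:
    """Pack a date into 16 bits: 7 bits year offset, 4 bits month, 5 bits day."""
    # Range checks only; this sketch does not check the day against the month.
    if not (2000 <= year <= 2127 and 1 <= month <= 12 and 1 <= day <= 31):
        raise ValueError("date is outside what this encoding can represent")
    return ((year - 2000) << 9) | (month << 5) | day


def unpack_date(packed: int):
    """Reverse pack_date by shifting and masking each field back out."""
    year = (packed >> 9) + 2000
    month = (packed >> 5) & 0b1111
    day = packed & 0b11111
    return year, month, day


packed = pack_date(2024, 3, 15)
stored = packed.to_bytes(2, "big")   # two bytes, versus ten for "2024-03-15"
print(packed, len(stored), unpack_date(packed))
```

The saving comes from baking knowledge about the data into the encoding: anyone reading those two bytes must already know the layout, which is exactly the metadata problem described earlier.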

Encoding is worth thinking about

Thinking of data problems as encoding problems can make it easier to learn programming languages, do data analysis, and solve data storage problems.

Adam Coster

CTO and Fullstack Webdev at Butterscotch Shenanigans