03 October, 2011

10 kinds of people

The distinction between binary and not binary probably came from the same mindset which thinks that percentages are integers, and that percentages and fractions have nothing in common.

Let's face it - how do people decide that a value is binary? The most common test is "it displays little squares when I open it in Notepad". Why not call it printables vs nonprintables?

Because, deep down, it's all binary. All the computer memory in known history is built from two-state elements such as flip-flops: each can be in one of two states, and the hardware interprets one state as a 0 and the other as a 1. This element is called a bit, short for binary digit.

There, I said it.

We use software at various levels to aggregate these elements into larger units, specifically the immediate next level, an octet of bits, which we customarily call a byte (but a byte doesn't have to have 8 bits - there were systems with 9-bit bytes back in the day - the 9th being the parity check bit). Now a byte can contain any value between 0 and 255 (or -128 and 127 when it is read as signed), which is then assigned a meaning by the software. It may be (part of) a command in the program, it may be a part of the text, a number, a pixel, a sound, a physical measurement (a number then again, eh?), or just a collection of bits (if used for bitwise logic, zipping and some encoding or encryption techniques). Which of these are binary? All of them. They are made out of bits, which are, c'mon, you just (re)learned it, binary digits.
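To make the "meaning is assigned by the software" point concrete, here is a minimal sketch in Python. The function name `interpretations` and the sample byte 0xE9 are just picks for illustration; the same eight bits come out as four different things depending on who is reading.

```python
import struct

def interpretations(byte_value: int) -> dict:
    """Read one 8-bit pattern in several ways (illustrative helper)."""
    raw = bytes([byte_value])
    return {
        "unsigned": raw[0],                      # plain 0..255 reading
        "signed": struct.unpack("b", raw)[0],    # two's complement, -128..127
        "latin1_char": raw.decode("latin-1"),    # one possible text meaning
        "bits": format(raw[0], "08b"),           # the bit pattern spelled out as text
    }

print(interpretations(0xE9))
```

None of these readings is more "binary" than the others; they are all the same byte.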

So I'm ranting about people who confuse binary with nonprintable? Partly, yes. What's deemed binary nowadays is anything that contains bytes beyond the 7-bit text standard, which is the printable characters in the 32-126 range, plus carriage return, line feed, tab and sometimes form feed. There are still machines out there which will not accept anything that falls outside this range, or they will, but with unpredictable results if you don't know how to specifically tell them what to do. This was the cause of all those clips in mp3 files in the early days of Napster, because they were sent around via uucp without bothering to specify a certain flag, and every carriage return plus line feed pair in them (the Windows end-of-line marker) was replaced with a bare line feed (the Unix one). Because they were treated as text.
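Both halves of that paragraph can be sketched in a few lines of Python. This is only an illustration under assumptions: `looks_like_text` is a hypothetical stand-in for the "little squares in Notepad" test (127 is DEL, so the printable range in the sketch stops at 126), `text_mode_transfer` simulates a transfer that rewrites line endings and is not the code of any real tool, and the payload bytes are made up.

```python
# Printable 7-bit characters plus tab, line feed, form feed, carriage return.
TEXT_BYTES = set(range(32, 127)) | {9, 10, 12, 13}

def looks_like_text(data: bytes) -> bool:
    """The informal test: every byte falls inside the 7-bit text range."""
    return all(b in TEXT_BYTES for b in data)

def text_mode_transfer(data: bytes) -> bytes:
    """Simulate a transfer that rewrites CR LF as a bare LF."""
    return data.replace(b"\r\n", b"\n")

# Made-up payload: arbitrary bytes that happen to contain 0x0D 0x0A.
payload = bytes([0xFF, 0xFB, 0x0D, 0x0A, 0x90, 0x64])
damaged = text_mode_transfer(payload)

print(looks_like_text(b"hello\r\n"), looks_like_text(payload))
print(len(payload), len(damaged))
```

The payload comes out one byte shorter, and everything after the missing byte has shifted, which is the kind of damage an audio decoder cannot shrug off.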

Is any particular value binary? Maybe, or it depends. Any number you store in memory is most likely binary, even if it is an integer. Binary formats are so much easier for the processor - they are the native format and ready to use. Other formats require conversion - which will happen sooner or later, because you will want to see the result, or send it. Humans don't read binary - and even 10100001001001000 is not the actual binary, it's a textual representation of it. It just doesn't make sense to do that conversion into readable form every time a number is used in a calculation, considering that we may want to see only the result, and that may come after millions of basic operations. Converting back and forth every time we add one of a million numbers is a waste.
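A rough Python sketch of the two points in that paragraph: the digit string is text about a number rather than the number itself, and converting on every step is wasted work. The helper names `sum_via_text` and `sum_native` are invented for the illustration, and the 4-byte big-endian layout is just one possible choice.

```python
digits = "10100001001001000"           # the string from the text: 17 characters
value = int(digits, 2)                 # parsed once into a native integer
text_size = len(digits.encode("ascii"))
native = value.to_bytes(4, "big")      # the same value in 4 bytes (assumed layout)

def sum_via_text(numbers) -> str:
    """Keep the running total as text, converting on every single step."""
    total = "0"
    for n in numbers:
        total = str(int(total) + n)    # text -> number -> text, each time
    return total

def sum_native(numbers) -> str:
    """Keep the running total native, convert once when a human wants it."""
    total = 0
    for n in numbers:
        total += n
    return str(total)

print(value, text_size, native.hex())
print(sum_via_text(range(1000)) == sum_native(range(1000)))
```

Both sums give the same answer; the first one just pays for a pair of conversions per addition.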

Your value may be stored on disk as text or binary - i.e. converted into a readable format, or taken verbatim from memory (or not verbatim, but still binary - that's industry standards for you). Text you can read in Notepad; binary you can't.
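As a small sketch of the two storage options, here is the same integer serialized both ways in Python. The function names are made up for the example, and the fixed 32-bit little-endian layout is an assumption standing in for whatever a given file format standardizes on.

```python
import struct

def to_text(value: int) -> bytes:
    """Readable form: decimal digits as ASCII characters."""
    return str(value).encode("ascii")

def to_binary(value: int) -> bytes:
    """Non-readable form: 32-bit little-endian integer (assumed layout)."""
    return struct.pack("<i", value)

def from_text(data: bytes) -> int:
    return int(data.decode("ascii"))

def from_binary(data: bytes) -> int:
    return struct.unpack("<i", data)[0]

number = 82504
print(to_text(number))     # what Notepad would show as digits
print(to_binary(number))   # four bytes, two of them nonprintable
```

Either form round-trips to the same number; only the text one is meant for human eyes.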

At the bottom of it, it's still all binary. But this being English, nobody bothered to invent a word to distinguish readable from unreadable formats, so they created the usual confusion by burdening the same donkey (the word binary) with two unrelated tasks.

Is this a rant? Maybe not. Just felt I'd clear things up for the latecomers to the game.
