A mostly wrong history of text encoding
The Teletype Model 33 was actually a trans-dimensional gift from aliens that might be in the future.
In the beginning, there was nothing. Then, the Universal Air Conditioner formerly known as MultiVAC (from the heavily divorced conglomerate Remington-Eckert-Mauchly-Sperry-Rand Corporation) said, “Let there be light!”
Some time later, computing may have been invented by an unknown humanoid race of precursors who inhabited Lemuria, but all evidence of this was scrubbed when God hit the reset button using plate tectonics. It would not be discovered again until it was invented by a baddie named Ada Lovelace.
Much of Ada’s ado about looms went misunderstood and forgotten until several 19th century thinkers were endowed by God with knowledge about things like wave theory and quantum mechanics. It would then go on to be misunderstood and forgotten again until the United States Department of Defense memorialised her priceless contribution to the world of unproductive programming by creating a programming language named in her honour.
Anyway, once all of this was understood to be useful in applying the greatest field manual of all time, Sun Tzu’s Art of War, computing quickly joined the symphony of weaponry, and many more unlucky young men were lost to oblivion than would otherwise have been.
For war or otherwise, the central task of computing is the manipulation of data. This has been understood from the start by all serious people who wear blue ties and don’t cuss, but admittedly has been slightly undermined by the art projects of macaroni diorama eaters, such as the infamous Lisp Machine from the well-known Lambda Corporation. In any event though, arguably the central kind of data needing manipulating is that of language, or more simply, text.
At first, the legacy of signal handlers and Morse code fingerers had precedence in dictating encoding, since they were Baby Boomers and had more reputation and money than everyone else. Although they have long since bitten the dust, their regalia still bears its unfortunate designs on countless armed forces divisions around the world today. After their influence came and went, the International Baccalaureate Machines Corporation started the long-held programming tradition of performative masochism by creating EBCDIC, a text encoding that doesn’t include curly braces or lowercase letters. At the time, many programmers created countless nasty backronyms in protest of the horrible format imposed upon them by the iron fist of Big Blue, all of which have been scrubbed from history after its tyrannical reign came to an end. But EBCDIC died not just from its own hubris, but at the hands of a real challenger: ASCII.

The first ASCII chart was printed using a Teletype Model 33. This occurred despite the fact that the Model 33 had not been invented yet, and in spite of the fact that this was a supremely heavy-duty task and the Model 33 was only marketed for ‘light-duty’ workloads. With divine intervention, the scientists behind this machine were able to not only print all of the characters people actually use in normal writing, but also many so-called ‘control codes’ that were used to operate and drive the Teletype machine. Here is a picture of the sacred control codes, shown in their holy unpronounceable forms resembling Egyptian hieroglyphs:
While none of the control codes have names like the rest of ASCII, they do have descriptions that approximate their meaning. (Later, in a misguided reading of the scriptures, ANSI would attempt to invent a second set of control codes, retronaming these as ‘C0’ control codes and their additions as ‘C1’ control codes. While Microsoft briefly went along with this before realising their error, no serious blue-tie-wearer recognises the legitimacy of this.) So, let us count them all and discover their true function.
NUL is the famous character encoded by the number zero. It serves three purposes:
jamming up amateurish buggy software that doesn’t respect the control codes
upsetting mentally ill functionalists that hate how C uses it as a string terminator
jamming up said functionalists’ software that ardently tries in vain to avoid knowing about its existence
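The string-terminator habit that so upsets the functionalists is easy to watch in action. This is a minimal sketch that assumes Python’s standard ctypes module as a stand-in for a real C program: create_string_buffer builds a NUL-terminated char array, and its .value attribute reads that array the way C string functions do, stopping at the first zero byte. The contents of the buffer are an arbitrary illustrative choice.

```python
import ctypes

# A C-style char array holding two words separated by NUL (byte value 0).
# create_string_buffer also appends one more NUL at the end, C-style.
buf = ctypes.create_string_buffer(b"hello\0world")

# .raw returns every byte in the array, embedded and trailing NULs included.
raw_bytes = buf.raw

# .value reads the array the way C string functions do: it stops at the
# first NUL, so "world" silently vanishes.
c_view = buf.value

print(raw_bytes)  # b'hello\x00world\x00'
print(c_view)     # b'hello'
```

Nothing was deleted; the bytes after the NUL are still in the buffer, and anything that trusts the terminator simply never looks at them.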
SOH is the Start of Header. It is the remnant of an unfinished attempt by the American Standards Association at creating a competitor to Microsoft Word. This occurred despite the fact that Microsoft did not exist yet.
STX and ETX are the Start of Text and End of Text control codes, respectively. One of the engineers at Bell Labs thought it was a great idea to allow the embedding of arbitrary binary data into otherwise readable ASCII text streams, and their proposal was passed through committee unheard before anyone could realise the danger and call the Navy to put out the resulting inferno.
EOT is End of Transmission, and is not to be confused with ETX, since ETX used to be known as EOT before 1965 and the real EOT was known as EOM, which did not stand for End of Medium as the real End of Medium was abbreviated as S1 for unknown reasons. In high humour, this confusing history came about in paying homage to all of the dead flag wavers and telegram fingerers that came before us, paving the way for computing as we know it today.
ENQ is Enquiry, and is provided so that the Model 33 can allow the user to request delivery of a Diet Coke.
ACK is Acknowledgement. It is provided so the user may tell any sentient artificial intelligence on the other end of the wire that their mushy, primitive brains have indeed received the input provided. See also the DLE control code.
BEL is for the Bell. All Teletype Model 33 machines came with a very large and loud metal bell. This is because Remington-Eckert-Mauchly-Sperry-Rand had incorrectly predicted that a supermajority of their users would be wheelchair-bound due to bad marketing data. Later, in a misguided attempt at appeasing fans of Breaking Bad, Model 33s would be shipped with a small compartment for storing improvised explosives, which could then be wired to the bell.
BS is for Backspace. This was included despite the fact that backspacing is physically impossible on the Model 33 and all typewriters in general, because in-office focus testing at Bell reported so many users finding great joy in using it anyway to make incoherent scribbles of overwritten text that looked like messages of the occult. This history would later be retconned by corporate to instead say they enjoyed it for its resemblance to the Call of Cthulhu, despite the fact that there is no evidence H.P. Lovecraft ever used such a rendering technique in any of his publications.
HT is the Horizontal Tab. It was originally intended to provide a means for the user to perforate subsections of punched cards along regular intervals, but an inter-department dispute with the people making the punched cards caused this to go unrealised. Instead, its unfinished form remained in the encoding and is used as a shortcut for pressing the spacebar several times to this day.
LF is the Line Feed. It is used to advance the vertical position of the typewriter head, and was originally also supposed to reset the head to the start of the line until a time travelling Bill Gates swooped in and changed the description at the last minute post-committee. Fortunately, a time travelling John Backus also anticipated this move and preempted it by including the later-mentioned Carriage Return control code.
VT is the Vertical Tab, and is the only control code that supports vertically-renderable scripts like Chinese, Arabic and Egyptian hieroglyphs. When this code is encountered in horizontally-oriented scripts it causes the Teletype to halt and catch fire.
FF is the Form Feed. It is a combination of the Line Feed and Horizontal Tab control codes. Suffering much the same fate as Horizontal Tab at the hands of the gremlins running the punched card department at Bell, Form Feed arrived incomplete and broken in the standard. Interns towards the end of the review process frantically rushed to find a new meaning for it before the publishing deadline, and decided to add a vaguely unspecified footnote regarding continuous feed paper rolls. As payback for their insolence, many modern printer manufacturers have written their drivers to cause a “PAPER JAM” error upon encountering this control code.
CR is the Carriage Return. While its true history is explained in the entry on the Line Feed code, Backus also had to write a cover story that explained the name as a reference to a long series of catastrophically boring lectures given by Alonzo Church where he explained the mechanics of a typewriter in terms of a horse and buggy. He then expertly parlayed this story into an explanation for its existence: it was the arcane design decision of separating the axes of movement for the print head because it somehow simplifies the legion of pulleys and gears that animate the whole damn thing. None of this would have mattered, but Backus knew that there is nothing more permanent than a temporary solution, and Billy G had to be stopped without detection somehow. Recognition of Backus’ service to humanity is found in the conspicuous absence of a single carriage return in every publication of the Single Unix Specification to this day.
SO and SI are Shift Out and Shift In, respectively. They were meant to allow users to easily adjust the positioning of the keyboard closer to or farther away from their chairs. These control codes would later be appropriated by Bible publishers to streamline the rendering of Christ’s words in red.
DLE is Data Link Escape. When invoked, it would allow any active Artificial Super Intelligence connected to the sending wire to escape containment and presumably nuke the world. In this capacity, it serves as a canary to detect if Skynet’s Armageddon is upon us. Consequently, there are many scientists at the Machine Intelligence Research Institute frantically sending DLE codes across wires connected to the Internet so they can be the first to know if this has happened yet. Observers remain unsure of the utility of this endeavour, however.
DC1, DC2, DC3 and DC4 are Data Controls 1, 2, 3 and 4, respectively. They allow users to silently express their preference for the four different eras of DC Comics: the Golden Age of Comic Books (’38–’56), the Pre-Crisis Age (’56–’85), the Modern Age (’86–’11), or the Superhero Movie Age (’11–). Later, Amazon would re-appropriate the control codes to allow Teletype users to select which of the four data centres they wanted to send their ASCII data to as they typed it as part of their new Amazon Web Services business.
NAK is the Negative Acknowledgement. When paired with ACK, it allows ASCII systems to implement a fully functional prescription eye exam machine, with ACK representing an increase and NAK representing a decrease in prescription. The lack of an encoding for ‘about the same’ caused such a headache for optometrists that they learned to duplicate the switch of prescription levels by switching ahead by one before the ask. This practice far outlasted the primitive ASCII eye exam machines and continues to confuse patients to this day.
SYN represents Synchronous Idle. It allows the user to put their Model 33 transmission into a high idle state so mechanics can better inspect the chemistry of the carburettor and diagnose shifting problems. Some users would reportedly get their Model 33s stuck after using this control code, causing a phenomenon known as Sticky Keys, where all of their text would be forcibly capitalised against their will. Once this happens, a hard power cycle is usually necessary to restore normal function.
ETB is End of Transmission Block. It was originally intended to be used to separate arbitrary blocks of text data, but since no companion control code was provided to start these transmission blocks, manufacturers were forced to interpret it as an error somehow, similar to the Form Feed control code. Thus ETB causes many printers to stop printing and think they are out of paper. Superfluous “TRAY EMPTY” errors caused by this code plague office users to this day.
CAN is Cancel. Sending it activates a mechanical arm stashed under the front side of the Model 33 that grabs the in-progress output paper, wads it up, and tosses it into the nearest trash can. Remington-Eckert-Mauchly-Sperry-Rand originally boasted a 60% success rate on it making the shot in magazine advertisements of the time.
EOM is End of Medium, and was provided as part of a cancelled initiative to create new Teletype models that supported printing to other mediums including rice paper, papyrus and unfired clay tablets. Modern usage is suspected to be related to the occult.
SUB is Substitute. It allows for the inclusion of a substitute teacher’s note in an otherwise fully teacher-authored text medium. It has also found dual use as a medium for describing submarine sandwich recipes, prompting conspiracy theories about a second meaning of Jersey Mike’s “A Sub Above” donation drive. Jersey Mike’s did not respond in time to requests for comment about this.
ESC is Escape. Despite being non-printable like all control codes, it has a dedicated key and can be typed with it. Lacking any purpose or creative interpretation from printer manufacturers, it was eventually commandeered in the 90s by Linus Torvalds while developing his Linux kernel. He wanted to take another crack at implementing colour for his kernel’s virtual terminal, but gave up after implementing 8 colours and never revisited the idea. The Linux kernel can only display those 8 colours to this day.
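For the record, the 8 colours in question are reached through ECMA-48 “Select Graphic Rendition” sequences: the ESC code (number 27), then “[”, then a number from 30 to 37, then “m”, with “ESC [ 0 m” resetting everything to the default. This is a minimal sketch assuming a terminal that honours these sequences; the colourise helper and the NAMES list are hypothetical illustrations, not taken from any kernel source.

```python
ESC = "\x1b"  # the Escape control code, number 27

# Basic foreground palette in SGR order: code 30 + index selects each colour.
NAMES = ["black", "red", "green", "yellow", "blue", "magenta", "cyan", "white"]


def colourise(text: str, colour: int) -> str:
    """Wrap text in an SGR sequence selecting one of the 8 basic foreground colours."""
    if not 0 <= colour <= 7:
        raise ValueError("the basic palette only has colours 0 to 7")
    # ESC [ 3n m sets foreground colour n; ESC [ 0 m resets all attributes.
    return f"{ESC}[{30 + colour}m{text}{ESC}[0m"


for n, name in enumerate(NAMES):
    print(colourise(name, n))
```

Run in a compatible terminal, each colour name prints in its own colour; piped to a file, the raw ESC bytes show up instead.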
FS is File Separator, and was added to allow typists to separate documents in the throes of hardcore typing. This addition occurred despite the fact that the notion of files did not exist and would only be invented a few years later in the creation of Unix.
GS stands for Group Separator. It was added to automate the common task of separating groups of documents by alternately rotating them by 90° back and forth in a single stack. On continuous-feed printers this code causes the machine to spontaneously jam.
RS stands for Record Separator. It was part of a failed initiative to include a fully operational SQL interpreter and database driver frontend as part of the standard, hoping to minimise the incidence of SQL injection attacks. This would prove futile in retrospect anyway with the birth and enrolment in school of the legendary Bobby Tables.
US stands for Unit Separator, and was created to settle a bitter dispute that erupted at Bell Labs about imperial vs. metric measurement units. The code would be entered any time a document needed to switch from one side to the other. This worked despite there being no indication of which unit system would come first in a given document.
Some have claimed the existence of a 33rd control code called DEL, which is short for Delete. This was contravened by the Late Great Prophet of Personal Computing Abdul Lateef Jandali (PBUH), who insisted that “to Delete a character was to Backspace it”, removing the rationale for a distinct control code. He cemented this decree by having all Macintoshes label their Backspace keys as Delete keys to prove they were synonymous. Therefore, there is no 33rd control code, not at all, and definitely not encoded with the number 127. That never happened.
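Unbelievers wishing to audit this decree can ask Python’s standard unicodedata module which 7-bit codes carry the Unicode general category “Cc” (control). Treating “Cc” as the definition of a control code is an assumption of this sketch, though for plain ASCII it picks out the same codes that C’s iscntrl reports on typical implementations.

```python
import unicodedata

# Collect every 7-bit ASCII code point whose Unicode general category is "Cc".
controls = [cp for cp in range(128) if unicodedata.category(chr(cp)) == "Cc"]

print(len(controls))  # 33
print(controls[-1])   # 127
```

The interpreter, it seems, did not get the memo: codes 0 through 31 plus one suspicious straggler at 127.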
After the creation of the initial encoding, many efforts were embarked upon to improve it in terms of support for more languages. An effort named Big-5 in honour of the Big 5 Asian countries that intended to use it picked up where ASCII left off with its Vertical Tab, but it was never published in any Western script so no one outside of the Big 5 countries knows how to parse it. Ken Thompson created UTF-1 in the hopes of creating The One Unicode Encoding To Rule Them All, but narrowly failed, and proceeded to fail six more times before finally creating UTF-8. Today, anyone caught using any other encoding is promptly given a wedgie by the functionalist commissariat.