Modular memory
Large, flat addressing in programs is wasteful. Here’s how we can do better with a little geometry: enter modular memory.
I hinted at this in some paragraphs written back in September of last year, which appeared in an early draft of the README of the Byblos SDK’s new assembler, Oración. It starts from the initial context of cleanly targeting DOS and its 16-bit memory model in an extensible way, then expands outward to talk about highly parallel high-performance computing and how the model can be generalised to platforms with larger pointers. I have transcribed it here for posterity:
In PC/DR/MS-DOS
It is most natural to generate binaries that are COM files or can easily be turned into them. This means the conventionally-named “tiny memory model” is used, and assembler output is limited to 64 KiB minus 256 bytes — the 256 bytes being taken by the Program Segment Prefix that precedes a loaded COM image.
Segmenting larger RAM capacities in this natural way becomes an obvious basis for a “module format” of sorts, on top of which a schematic can be laid for a program head (loader), modules, and “heaps” of additional working memory for file buffering and the like. We call these blocks “16-modules” for the 16-bit addressing they use internally. They can be further distinguished as “16-ROMs”, for code and data typically loaded read-only from file, and “16-RAMs”, for modifiable heaps of working memory, including file contents in the process of being created.
Beyond DOS, esp. in parallel
For compute kernels, 16-bit addressing presents an attractive bound at which to cap the size of code and data modules. This is thanks to the compact size of the pointer data relative to memory capacity: no bits in pointers are wasted, and furthermore the signal-to-noise ratio of such pointers stays high, as the program typically uses the entire addressing capacity.
It is straightforward, then, to adopt wholesale the 16-module model outlined for DOS and give it a secondary purpose: compelling programmers to make room for logical extensibility in the programs they write. Parallel and/or concurrent computing thrives on programs whose kernels are as subdivided and simple as possible, so running into RAM limits is a clue that the architecture is being stretched beyond its optimal parameters anyway.
Modules in flat land, incl. the GBA
The Game Boy Advance presents a favourable context on top of which we can model programs in larger, flat address spaces where pointers are typically 32 bits if not a higher power of 2 (e.g. 64 or 128). When a machine provides pointers this large, we can take the 16-module concept and make it configurable in size; we call these “X-modules”, as the number of bits their local addressing schemes consume is arbitrary. We describe here a notation not unlike the Internet’s “Classless Inter-Domain Routing” or “CIDR”, which we have come to call “Module Segment Notation” or “MSN”:
{leading nybbles from MSB}:{number of significant bits from LSB}
For example, if one wished to describe a segment that began with hexadecimal 05, the last 9 bits of which were significant, the MSN would be:
05000000:9
Alternatively, one may abbreviate all of the remaining zeroes using an additional colon in IPv6 fashion, like so:
05::9
This kind of configurable modelling has useful properties for bounding the analytical task of law enforcement in C*, and may also be of use with conventional trap conditions in C runtime implementations. As GBA programmers will know, this maps trivially onto the complete memory map of the console for all regions, with the added benefit of providing for unlimited further subdivision (e.g. sub-regions within ROM, or a “heap” within EWRAM, or such). It is a kind of soft memory segmenting that gives programmers much greater analytical power at a negligible overhead energy cost — something that cannot be said for MMU-based protections.
This kind of model can be ported over without much change to similar machine platforms, including later revisions of ARM, IA-32 and AMD64, and RISC-V.
Much experimentation was done in January of ‘23 in the v2 branch of Hinterlib to figure out a basic model in ANSI C for such modulation of memory. Eventually, a few terms were settled upon: twigs, trunks, and addresses (distinguished from pointers).
Twigs are the central unit of information at hand here. Theoretically they may be any size, but for practicality’s sake Hinterlib defines 8-bit, 12-bit and 16-bit twig sizes, along with corresponding trunk and address type names for each. Regardless of your machine’s pointer size, you can be sure that you can always directly apprehend and work with a twig without worrying about second-order effects that degrade performance. Or, at least that’s the theory. You may still run into automagical processor caches that don’t play well with it. But fundamentally, the idea is incredibly friendly to the economies of SRAM, so it’s really more on your processor vendor if we’re being honest.
Anyway, with the twigs being so small, you will often need a larger component that contextualises it in the grand scheme of things, so to speak; this is precisely what trunks are for. Like twigs, trunks are also arbitrarily sized, up to 128 bits, which is enough granularity to store the coordinates of every atom in the observable universe with micrometre precision. (If it wasn’t, we would provide 256 bits as the maximum.) So, for whatever twig size you may have, you can get a trunk for it that ratchets it up to 32, 64 or 128 remaining bits, irrespective of your machine’s pointer size. In larger systems, those higher bits can become quite relevant.
This neatly brings us to our last concept, namely addresses. Addresses are a quick way to take a twig–trunk combination and size it down to whatever size pointers are on your machine, while still retaining the structure that separates the twig from the trunk.
A big point that I want to drive home about all of these provisions is that they are not types! In fact, this whole system is nothing more than a ruleset for bit manipulation, and I even went so far as to define and use a CPP macro, HN_TYPELESS, that tells Clang and GCC as much. I know that, according to their docs, the may_alias attribute doesn’t literally mean typelessness, but for all intents and purposes it entirely shuts them up about abstract data type rules, because it forces the compiler to treat the data as nothing more particular than a set of octets.
This radical typelessness is a central pillar of Hinterlib 2.0, and it’s also something I discussed in-depth on this blog before:
Without the type system enforcing its arbitrary conceptual supremacy on code, all that remains are voluntary boundaries and distinctions about octets and bits, and code that manipulates the data as such. This is a key of mechanicalism and a serious attempt at implementing the Harvard architecture in software.