Probably one of the more boring and contrived subjects to discuss, especially on old platforms like this, is how to handle memory in applications, from demos to games or even a normal GEM application. However, since people are still confused about this in anything but the trivial cases, it’s probably a good idea to review some concepts and offer some food for thought.
How to allocate RAM from the system in the first place
Let’s hop right into it without further ado. What are some common ways to allocate RAM for our program?
Let the assembler/compiler handle it
“Just use ds.w bro!”. And indeed that method gives a lot of mileage. Just tell the assembler or compiler, when creating our TOS PRG, to reserve some RAM in the BSS and give us its start address by means of a symbol:
bss
my_awesome_buffer: ds.w 16384
or in C:
unsigned short my_awesome_buffer[16384];
Boom, done. For most cases this is what people do and get away with.
…until the computer runs out of RAM and there is need for more.
Hardcoded addresses
An equivalent to that is to go one step above and not even ask the assembler. Just hardcode all the buffers as equates and move on. This is very valuable when one needs absolute control over RAM, doesn’t care to be system friendly, and does not intend to return to the system at all (barring a total system reset).
Perfectly legal and doable, as long as the program and other data stay out of the way of those addresses. Which means that a relocator is probably needed after loading. Or a loader that takes over the machine at boot time and disables the OS. Which increases complexity. And, in this age, reduces the number of people that are going to be bothered to use the software, as most people these days own some sort of hard disk solution and don’t really want to mess with floppies. (Unless one also writes a low-level hard disk driver for their purposes, and manages not to trash other data on disk, etc.)
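(To make the idea concrete, here is what a hardcoded layout might look like on the C side. The addresses and names are made up for this example; it is entirely up to us to guarantee that nothing else, program, stack or OS variables, lives there.)

/* Hypothetical hardcoded layout: addresses invented for illustration only.
   We must guarantee that nothing else (program, stack, OS) lives here. */
#define SCREEN_BUFFER ((unsigned char *)0x78000) /* 32k screen */
#define UNPACK_BUFFER ((unsigned char *)0x60000) /* depacker scratch space */
#define MUSIC_BUFFER  ((unsigned char *)0x50000) /* sound data */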
Allocation via spreadsheet
Not to diss the above ideas of course. A good amount of shipped software has had its RAM allocation managed manually by people using spreadsheets. There is an anecdote from one of the people who worked on the original God Of War games on PlayStation 2 (Christer Ericson) describing this exact practice. Unfortunately the story is now probably hidden behind Twitter’s “log in to see post” wall, so it’s not easy to find (and the author might have taken the post down anyway).
System allocator
If static buffers don’t cut it, then it’s time to let the system assist.
The relevant functions we need are Malloc and Mfree in GEMDOS. One asks the system for a block of contiguous RAM like so:
move.l amount,-(sp) ; number of bytes to allocate
move.w #$48,-(sp) ; Malloc
trap #1 ; call GEMDOS
addq.l #6,sp ; fix up the stack
and the other frees it:
pea startadr ; address previously returned by Malloc
move.w #$49,-(sp) ; Mfree
trap #1 ; call GEMDOS
addq.l #6,sp ; fix up the stack
There are of course some drawbacks with this as well:
- It is common knowledge (or at least it used to be) that the early TOS versions (especially 1.0) have many bugs in Malloc, especially when many different blocks are requested. The situation was so bad that the early C SDKs used their own Malloc replacement as part of their runtime. So the best practice is to reserve as much RAM as possible in one block, and then split the allocation manually
- Freeing a block of RAM does not automatically mean that it’s available. For example, if 3 blocks are allocated with sizes a, b and c, then b is freed, and we request a block of size d > b, the system cannot give the b block to the user. This is known as fragmentation, and although TOS tries to do its best to work around the problem (for example by joining adjacent freed blocks into one large block), the system can run out of RAM under certain circumstances
- On later TOS versions there is another call in GEMDOS named Mxalloc which can also give blocks of “fast” (or “TT” or “alt”) RAM
Consolidating all the above points, we come up with something like the following C code to allocate almost all RAM available:
#include <osbind.h>

#ifndef MX_PREFTTRAM
#define MX_PREFTTRAM 3 /* some osbind.h versions don't define this */
#endif

long ram_available;
void *free_ram_start;
void *allocation_start;

void grab_ram()
{
    // Check if GEMDOS is new enough to use Mxalloc
    // (so we can allocate TT/FastRAM)
    unsigned short gemdos_version = Sversion();
    unsigned char use_mxalloc = gemdos_version >= 0x1900;
    if (use_mxalloc)
    {
        ram_available = Mxalloc(-1, MX_PREFTTRAM);
        if (!ram_available)
        {
            Cconws("\x1b" "ECannot allocate RAM!\r\nPress any key to exit.");
            Cconin();
            Pterm0();
        }
        free_ram_start = (void *)Mxalloc(ram_available, MX_PREFTTRAM);
        if ((unsigned long)free_ram_start < 0xe00000)
        {
            // No FastRAM allocated, so let's mark that we didn't
            // (needed for allocating screen RAM for pictures below)
            use_mxalloc = 0;
        }
    }
    else
    {
        ram_available = Malloc(-1);
        if (ram_available < 65536)
        {
            // 64k is probably enough for caching all the music files
            Cconws("\x1b" "ENot enough RAM!\r\nPress any key to exit.");
            Cconin();
            Pterm0();
        }
        ram_available -= 1024; // leave 1k to the system for luck
        free_ram_start = (void *)Malloc(ram_available);
    }
    allocation_start = free_ram_start;
}

void free_ram()
{
    Mfree(free_ram_start);
}
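For completeness, typical usage of the functions above might look something like this (do_stuff is a hypothetical placeholder for whatever the program actually does with the block):

extern void do_stuff(void *start, long size); /* hypothetical */

int main(void)
{
    grab_ram();
    /* carve up the block starting at allocation_start,
       e.g. with the bump allocator described below */
    do_stuff(allocation_start, ram_available);
    free_ram();
    return 0;
}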
Scared of C? See the assembly code for the above here.
Using RAM
Using one of the methods above (or even something completely different, it doesn’t matter), we are now the proud owners of a block of RAM. How can we use (and possibly re-use) it effectively?
Assembler handles it (hardcoded)
If we simply asked the assembler to reserve some memory statically (either from TEXT, DATA or BSS segments), and we don’t care much, we can just go ahead and fill it in with the data we want to generate, and that’s all.
In case the buffer needs to be used more than once, we simply write to it again, taking care that the two (or more) use cases don’t overlap. This can lead to waste if the amount of RAM needed differs significantly between the use cases.
Structs
XiA has written a piece on this which is a recommended read. Here we will present a quick review of the ideas discussed.
Assume we want to store a few parameters in that buffer, like game stats. Traditionally one can do something like the following:
lives: ds.w 1
energy: ds.w 1
powerup_charge: ds.b 1
hidden_level_trigger: ds.b 1
game_state: ds.w 1
sprite_base_pointer: ds.l 1
sprite_animation_frame_index: ds.b 1
And so on and so forth. Then we can simply access the data by referring to the labels:
subq.w #1,lives
beq game_over
move.w #$3f,energy
rts
That’s all good, but what if in the example above we want to have 2 players? The easy solution is to just duplicate the variables and also the controlling code that accesses them. What if we need 20 though? In this case we can use something more sophisticated (note: the code has been written with the assembler rmac in mind; for devpac/vasm follow the instructions in the comments):
.abs 0 ; devpac: rsreset
player_lives: ds.w 1 ; devpac: rs.w
player_energy: ds.w 1 ; devpac: rs.w
powerup_charge: ds.b 1 ; devpac: rs.b
hidden_level_trigger: ds.b 1 ; devpac: rs.b
game_state: ds.w 1 ; devpac: rs.w
sprite_base_pointer: ds.l 1 ; devpac: rs.l
sprite_animation_frame_index: ds.b 1 ; devpac: rs.b
player_data_size equ ^^abscount ; devpac: player_data_size equ rscount
.text ; devpac: not needed
Here we asked the assembler to create a series of symbols whose values start at 0 and increment depending on the size requested in the ds (rs) directive. So in practice this is player_lives equ 0, player_energy equ 2, etc.
Now we can instantiate two or more of those structures wherever we want in RAM; in our case we could do something like this (assuming a0 holds our buffer’s start):
move.l a0,player_1_data
lea player_data_size(a0),a0
move.l a0,player_2_data
...
bss
player_1_data: ds.l 1
player_2_data: ds.l 1
And then we can simply use them like this:
move.l player_1_data,a0
bsr handle_lives
move.l player_2_data,a0
bsr handle_lives
...
handle_lives: subq.w #1,player_lives(a0)
beq game_over
move.w #$3f,player_energy(a0)
rts
...
This way, more reusable and flexible code can be written at the expense of a speed penalty. But in practice this shouldn’t be too bad (depending on the case of course).
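(For reference, a C compiler does all of this offset bookkeeping automatically with structs. A rough, untested equivalent of the assembly above could look like this; game_over is a hypothetical routine:)

extern void game_over(void); /* hypothetical */

typedef struct
{
    short lives;
    short energy;
    char powerup_charge;
    char hidden_level_trigger;
    short game_state;
    void *sprite_base_pointer;
    char sprite_animation_frame_index;
} player_data;

player_data *player_1, *player_2;

void handle_lives(player_data *p)
{
    if (--p->lives == 0)
    {
        game_over();
        return;
    }
    p->energy = 0x3f;
}

Note that the compiler is free to pad fields for alignment, so the layout is not guaranteed to be byte-identical to the assembly version.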
Bump allocator (ask for RAM from allocated block)
Once we have obtained a block of RAM, no matter how, we can start carving it up as we want. There has been a substantial amount of research and development done in this field, but for most purposes the simplest scheme will do. And that’s what is commonly known as a bump allocator.
There are many resources to consult about the bump allocator (for example Wikipedia), but in the end it boils down to this: keep a pointer to an address in RAM, starting at the beginning of the allocated block. Every time a routine asks for some RAM, return that address, and then increment it by the amount of RAM requested. That’s all there is to it: it’s terrible, it doesn’t support freeing RAM (perhaps just the last block allocated, but that’s mostly up to the caller), but it’s super fast and that’s what most people need most of the time. It also comes with the benefit of instantly deallocating all buffers by resetting the allocation pointer.
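As an illustration, here is a minimal bump allocator sketch in C (the names and layout are ours, not any standard API), assuming we feed it a block obtained by any of the methods above:

static unsigned char *bump_start;   /* start of the block we own */
static unsigned char *bump_current; /* next free byte */
static unsigned char *bump_end;     /* one byte past the end of the block */

void bump_init(void *block, unsigned long size)
{
    bump_start = bump_current = (unsigned char *)block;
    bump_end = bump_start + size;
}

void *bump_alloc(unsigned long size)
{
    unsigned char *p = bump_current;
    size = (size + 3) & ~3UL; /* keep allocations 4-byte aligned */
    if (bump_current + size > bump_end)
        return 0; /* out of RAM */
    bump_current += size;
    return p;
}

void bump_free_all(void)
{
    bump_current = bump_start; /* instantly "frees" every allocation */
}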
“Fixed” and “Dynamic” allocation
As said above, bump allocators have a hidden cost: allocated memory cannot be freed easily (which might also lead to fragmentation). Suppose we have a use case where some allocated buffers have to be reused many times (let’s call them “fixed”), while others have to be wiped before the fixed buffers (let’s call those “dynamic”). This will lead to slightly more complicated handling code, and we don’t like complications.
One solution would be to perform the allocation of the fixed buffers before the dynamic ones, so at least we know where to reset our allocator when it’s time to let the dynamic buffers go. At best this is mildly uncomfortable, but it could even be impossible, as a fixed buffer might need to be allocated after the dynamic ones due to execution order (for example, we might not know how much memory to allocate until it’s too late).
Another idea is to have two different bump allocators, each with their own allocated block of RAM. But this immediately brings up the problem of how much RAM to allocate to each allocator. In cases where RAM availability is low, one might have to change these values quite a bit until everything fits.
A more elegant solution is inspired by The Black Lotus, and simply involves having either the fixed or the dynamic blocks allocated from the start of the block towards the end, and the other from the end towards the start. (Check out the original article under “Memory Allocation”.) So basically: 2 bump allocators, one starting from the start, and the other from the end.
The following implementation was used almost verbatim in TONY - Montezuma’s Gold:
DYNAMIC_RAM_SIZE .equ $30000
; d0=how many bytes does the idiot want.
; if negative, reserve the absolute value from the static reserve
; (the idea here is that we reserve from the same block of ram. "dynamic" allocations
; run from the start of the buffer towards the end, and "static" allocations start
; from the end of the buffer towards the start. Obviously, if the static and dynamic
; cursors cross we're out of RAM)
; On exit: d0=address or 0 on failure
ask_for_some_ram:
lea current_dynamic_ram_pointer,a1
tst.l d0
bgt.s ask_for_some_ram_dynamic
lea current_static_ram_pointer,a1
move.l (a1),a0
add.l a0,d0
and.w #$fffc,d0 ; round it down to 4 bytes
move.l d0,(a1)
bra.s ask_for_some_ram_check_bounds
ask_for_some_ram_dynamic:
move.l (a1),a0
add.l a0,d0
addq.l #3,d0 ; round it up to 4 bytes
and.w #$fffc,d0
move.l d0,(a1)
move.l a0,d0 ; return value
ask_for_some_ram_check_bounds:
move.l current_dynamic_ram_pointer,a1
cmp.l current_static_ram_pointer,a1
bgt.s ask_for_some_ram_fail ; static and dynamic buffers crossed, we're out of RAM
rts
ask_for_some_ram_fail:
moveq #0,d0 ; insert appropriate swear word here
rts
.data
current_dynamic_ram_pointer: .dc.l dynamic_ram_buffer
current_static_ram_pointer: .dc.l dynamic_ram_buffer+DYNAMIC_RAM_SIZE
.bss
dynamic_ram_buffer: .ds.l DYNAMIC_RAM_SIZE>>2
.text
free_dynamic_buffers:
move.l #dynamic_ram_buffer,current_dynamic_ram_pointer
rts
The code should be self-explanatory, but two things are worth mentioning: a) there is bounds checking, so if the two pointers cross, the allocator is out of RAM and signals the caller by returning 0; b) RAM blocks are aligned to 4-byte increments to save the calling code some logic.
Initialising RAM
One common trick which is very handy for debugging (and runtime checks too) is to fill the allocated buffer with a known pattern before handing it off to the caller, for example $12345678. That way it becomes much easier to find problems where more RAM is used than was allocated, because the spillover is easy to spot under a debugger: memory that should still contain the pattern will have been overwritten.
An extra trick here is for the allocator to reserve more RAM than asked for (for example 16 extra bytes), and fill that extra bit with yet another known pattern, like $87654321. Then, if the guard pattern between two consecutive buffers is corrupted, that is an indication that one buffer tried to spill into the next one. This might even prevent the program from crashing, but that is not the intended use.
(Note that these can also be runtime checks, added to the code at any suspicious place. That can make locating the source of memory spillovers easier. Alternatively, if a debugger supports it, write breakpoints can be employed.)
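A sketch of both tricks in C, layered on top of the bump_alloc sketch from earlier (the pattern values match the ones mentioned above; all names are ours, and for simplicity sizes are assumed to be multiples of 4):

#define GUARD_BYTES 16

static const unsigned long fill_pattern  = 0x12345678UL;
static const unsigned long guard_pattern = 0x87654321UL;

/* Allocate size bytes plus a guard zone, filling both with known patterns.
   Assumes size is a multiple of 4. */
void *debug_alloc(unsigned long size)
{
    unsigned long i;
    unsigned long *p = (unsigned long *)bump_alloc(size + GUARD_BYTES);
    if (!p)
        return 0;
    for (i = 0; i < size / 4; i++)
        p[i] = fill_pattern; /* caller's area */
    for (; i < (size + GUARD_BYTES) / 4; i++)
        p[i] = guard_pattern; /* guard zone after the buffer */
    return p;
}

/* Returns nonzero if something wrote past the end of the buffer */
int buffer_spilled(void *buffer, unsigned long size)
{
    unsigned long i;
    unsigned long *guard = (unsigned long *)((unsigned char *)buffer + size);
    for (i = 0; i < GUARD_BYTES / 4; i++)
        if (guard[i] != guard_pattern)
            return 1;
    return 0;
}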
Linked list - Arena allocator
With more complex problems we might not know how much memory we need ahead of time. We might want to store homogeneous data (i.e. data of the same structure) or heterogeneous data (i.e. different structures). And we might not know how many of each structure we need.
One of the simpler ways to solve this problem is often called an Arena allocator. The general idea is that each time we run out of RAM, we allocate a fixed amount that is reasonable and we use it.
Some code might make this concept clearer:
ARENA_SIZE=4096
; Struct 1
.abs
struct_1_next: ds.l 1 ; points to the next structure of this type
struct_1_field_1: ds.l 1
struct_1_field_2: ds.w 1
struct_1_field_3: ds.w 1
struct_1_field_4: ds.b 1
struct_1_field_5: ds.b 1
struct_1_size = ^^ABSCOUNT
.text
; Struct 2
.abs
struct_2_next: ds.l 1 ; points to the next structure of this type
struct_2_field_1: ds.w 1
struct_2_field_2: ds.w 1
struct_2_size = ^^ABSCOUNT
.text
; Arena data
.abs
arena_next: ds.l 1 ; points to the next arena (0=no more)
arena_data_size = ^^ABSCOUNT
.text
;
; Our setup code
;
move.w #struct_1_size,d7
bsr add_struct_to_arena ; allocate the first one
move.l a0,struct_1_first ; save the address of the first one
move.l a0,a1
moveq #120-1-1,d0 ; need to allocate 120 "struct_1"s (the first is allocated above)
allocate_1:
bsr add_struct_to_arena ; allocate another one
move.l a0,struct_1_next(a1) ; link the previous one to this
move.l a0,a1 ; the new one becomes the list tail
dbra d0,allocate_1
clr.l struct_1_next(a1) ; signal end of list
move.w #struct_2_size,d7
bsr add_struct_to_arena ; allocate the first one
move.l a0,struct_2_first ; save the address of the first one
move.l a0,a1
moveq #600-1-1,d0 ; need to allocate 600 "struct_2"s (the first is allocated above)
allocate_2:
bsr add_struct_to_arena ; allocate another one
move.l a0,struct_2_next(a1) ; link the previous one to this
move.l a0,a1 ; the new one becomes the list tail
dbra d0,allocate_2
clr.l struct_2_next(a1) ; signal end of list
; We now need to allocate and append a few more "struct_1"s to the chain.
; We could have kept the last address when the first batch was added, but
; if we want to avoid that, we can now traverse the list of "struct_1"s
; until we find one which has a 0 in its "next" element
move.l struct_1_first,a1
scan_1:
move.l struct_1_next(a1),d0
beq.s scan_1_done
move.l d0,a1
bra.s scan_1
scan_1_done:
; When we reach this, a1 has the last item in the chain
move.w #struct_1_size,d7
moveq #93-1,d0 ; need to allocate 93 "struct_1"s
allocate_1_part_2:
bsr add_struct_to_arena ; allocate another one
move.l a0,struct_1_next(a1) ; link the previous one to this
move.l a0,a1 ; the new one becomes the list tail
dbra d0,allocate_1_part_2
clr.l struct_1_next(a1) ; terminate the extended list
illegal ; end of demonstration, folks!
; Add struct to arena if there's free space for it, or allocate new
; Input: d7.w=struct size
; Output: a0.l=pointer to struct's start
add_struct_to_arena:
move.l d1,-(sp)
move.w current_arena_free_bytes,d1
cmp.w d7,d1
bge.s add_struct_to_arena_go
; Out of size for current arena, allocate a new one
move.l #ARENA_SIZE,d0
bsr ask_for_some_ram ; use the above allocator, or any else
; TODO: panic if this fails (i.e. d0=0)
move.w #ARENA_SIZE-arena_data_size,current_arena_free_bytes ; set the number of free bytes for this arena
move.w #arena_data_size,current_arena_offset ; set the offset of the first free byte for this arena
move.l current_arena_address,d1 ; sets the flags for the bne below
move.l d1,a0 ; (a move to an address register doesn't touch the flags)
bne.s add_struct_to_arena_link ; skip if we already have an arena
; This is the first arena, so we need to do some extra steps
move.l d0,first_arena_address
move.l d0,a0
bra.s add_struct_to_arena_finalise
; Point current arena to new one
add_struct_to_arena_link:
move.l d0,arena_next(a0) ; point current arena to next
move.l d0,a0
add_struct_to_arena_finalise:
clr.l arena_next(a0) ; indicate that this is the last arena in the chain
move.l a0,current_arena_address
add_struct_to_arena_go:
move.l current_arena_address,a0
adda.w current_arena_offset,a0 ; a0 now contains the start address we return
add.w d7,current_arena_offset ; update offset for next addition
sub.w d7,current_arena_free_bytes ; update number of free bytes
move.l (sp)+,d1
rts
; Static arena bookkeeping data
.data
current_arena_address: dc.l 0
current_arena_offset: dc.w 0
current_arena_free_bytes: dc.w 0
first_arena_address: dc.l 0
.bss
struct_1_first: ds.l 1
struct_2_first: ds.l 1
Our scenario is that we have two structs, struct_1 and struct_2, and we require a number of copies of each. Unfortunately we don’t have any guarantee about the ordering, so we need to keep the address of each copy. This is stored in the struct_1_next or struct_2_next field as an address. So if we want to process all struct_1 copies, we can simply start from the address of the first copy, struct_1_first, do some work, then acquire the next copy by reading the current copy’s struct_1_next field. We’re at the end of our linked list when struct_1_next reads 0. See scan_1 above for an example.
Because the number of copies of each struct we need might be arbitrary, we allocate RAM for them using the subroutine add_struct_to_arena. This routine allocates chunks of ARENA_SIZE bytes (set to 4k in our example) and adds each struct using a bump allocator. When the current arena is exhausted, we simply allocate a new one, and link the arenas together. This is not strictly necessary, but if for some reason we want to deallocate the RAM at some point and are using the system allocator, then this gives us a way to free the arenas (as Mfree requires the start address of each allocation).
There are more things one can do with linked lists/arenas like deleting entries, reclaiming space, sorting, etc. but these are omitted here. Some links at the end should cover these topics.
Please note: the code above has not been tested, but is more there to demonstrate the concept. If anyone discovers bugs in it, don’t hesitate to get in touch and they will be corrected.
List pools
A different way to solve the problem of the previous section is presented here. We have a pool of list items (structs) that are initially unlinked. The list is doubly linked, i.e. each item contains a “next” as well as a “previous” member. This means that the list can be traversed in both directions. In order to know where the list starts and ends, two “sentinel” items are placed at the start and the end.
As said, at first (during “reset”) the list contains only the sentinel items. Then, during “initialise”, an item is pulled from the pool of available items (the last one) and is added to the chain. Then another item is pulled from the pool and added to the chain. Finally, one item is deleted from the chain and placed back into the pool.
This kind of structure offers plenty of flexibility. The pool is statically allocated in this example, but it could easily be grown by adding a new block of RAM and carving more structs out of it (for example, see the arena allocator above).
(Again, this code has not been tested and exists for demonstration purposes. Any bugs reported are appreciated)
max_list_items equ 64 ; assumption: pick a pool size that suits your needs

; Reset list pools
lea list_pool(PC),A6
move.w #max_list_items*4,list_count ; the pool starts out full
moveq #max_list_items-1,D7
lea list+(max_list_items-1)*my_list_len(PC),A2
pathfind_resetpool:move.l A2,(A6)+
lea -my_list_len(A2),A2
dbra D7,pathfind_resetpool
; Initialise lists
subq.l #4,A6 ;grab first pointer from pool for initial entry
movea.l (A6),A3 ;the pool holds pointers, so fetch the item itself
subq.w #4,list_count
move.l #list_first,my_list_prev(A3) ;terminator for first item
move.l #list_last,my_list_next(A3) ;terminator for first item
move.l A3,list_first+my_list_next ;initialise sentinels
move.l A3,list_last+my_list_prev
; Let's add a list from the pool
lea list_pool,A1 ;get a new object from the pool
move.w list_count,D0
; TODO panic if D0=0
subq.w #4,D0
movea.l 0(A1,D0.w),A1
move.w D0,list_count
;update pointers
move.l #list_last,my_list_next(A1) ;last in the list for now
move.l A1,list_last+my_list_prev
move.l A3,my_list_prev(A1)
move.l A3,my_list_parent(A1)
move.l A1,my_list_next(A3)
;
; Remove item in a4 from the chain and release it back into the pool
;
move.l A1,A4
lea list_pool,A3
move.w list_count,D5
move.l A4,0(A3,D5.w) ;release item into the pool
addq.w #4,D5 ;update pool counter
move.w D5,list_count
movea.l my_list_prev(A4),A2 ;fixup the chain around the removed item
movea.l my_list_next(A4),A3
move.l A3,my_list_next(A2)
move.l A2,my_list_prev(A3)
illegal ; And that is all
.abs
my_list_x: DS.W 1
my_list_y: DS.W 1
my_list_dir: DS.W 1
my_list_scor_mov: DS.W 1
my_list_scor_est: DS.W 1
my_list_parent: DS.L 1
my_list_next: DS.L 1
my_list_prev: DS.L 1
my_list_len EQU ^^ABSCOUNT
.text
; Space for lists
even
list: DS.B my_list_len*max_list_items
; Space for sentinel objects for lists
; Should be initialised before use!
list_first:DS.B my_list_len
list_last: DS.B my_list_len
list_pool: DS.L max_list_items
list_count:DS.W 1
Closing words
This article grew substantially in size from its initial conception. It is a good example of how a seemingly simple and boring subject like this can get very nuanced/complicated.
It is not expected that people will try to implement or use in practice all the ideas presented. At the very least this should have provided some food for thought, and possibly inspiration for people to mutate the ideas presented to suit their own needs.
Further reading
Niklas Gray has written 3 articles about data structures in the now-defunct blog for “Our Machinery”: Part 1, Part 2, Part 3. Recommended.
Dmitry Soshnikov has written a more extensive article on pool allocators, well worth a read.