Why Atari ST games were slower than the Amiga versions
The dust settled on the Amiga vs Atari ST war a long time ago, with the Amiga clearly dominating the poor old ST. The Amiga often had 32 colour games, with smooth scrolling and stereo sound. The ST had 16 colour games and did not have the horsepower to match the Amiga. ST coders usually had 4 options:
- Reduce the size of the screen (often with ugly borders)
- Reduce the frame rate (resulting in jerky games)
- Reduce the number of objects on the screen (empty looking games)
- Use push scrolling near the screen edges rather than smooth pixel scrolling
Most people who owned an Amiga or Atari ST were fully aware that the Amiga had a blitter chip that allowed it to mask, rotate and copy images extremely quickly. The blitter operated independently of the CPU, meaning the CPU could carry on with other tasks while the blitter continued working. But why were ST games so slow and jerky in comparison, even with a faster CPU? What was it having to do?
The Atari ST screen layout was the first nail in the coffin, and really killed performance. In 16 colour mode, 4 bitplanes are required. But the ST interleaves every word of each line, so the first word of bitplane 1 is followed by the first word of bitplane 2, then the first word of bitplane 3, and finally the first word of bitplane 4. This pattern repeats for each successive word of data for the entire width of the screen, then repeats from top to bottom. The memory layout is continuous, and does not have any spare areas on the side to play with.
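As a rough illustration of that layout, the sketch below models where each bitplane word lives in ST screen memory. It is a Python model rather than ST code, the function and constant names are made up for this example, and it assumes the usual 320 pixel wide, 4 bitplane screen with 160 bytes per line:

```python
# Model of the Atari ST 16 colour screen layout (320 pixels wide, 4 bitplanes).
# Names are illustrative only; they do not come from any ST game or OS.
ST_PLANES = 4
ST_BYTES_PER_LINE = 160          # 20 groups of 4 words, 2 bytes per word

def st_word_offset(x, y, plane):
    """Byte offset from the screen base of the word holding pixel (x, y) in one bitplane."""
    group = x // 16              # which 16 pixel column the pixel falls in
    return y * ST_BYTES_PER_LINE + group * ST_PLANES * 2 + plane * 2

# The 4 planes of the same 16 pixels sit next to each other...
first_group = [st_word_offset(0, 0, p) for p in range(ST_PLANES)]
# ...so the next 16 pixels of bitplane 1 are 8 bytes further on.
plane1_steps = [st_word_offset(x, 0, 0) for x in (0, 16, 32)]

print(first_group)    # [0, 2, 4, 6]
print(plane1_steps)   # [0, 8, 16]
```

Any routine that wants to touch a horizontal run of pixels in a single bitplane therefore has to hop 8 bytes at a time, which is why the drawing code later on is full of offsets like 8, $10 and $18.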
As an example, a part of the title screen of the Atari ST version of Road Runner is shown in 16 colours, along with what memory looks like if you view it in one chunk:
Compare the Atari ST layout with the Amiga's interleaved bitmap (ILBM) format, which is much more logical. The Amiga can also optionally set up a buffer in memory that is larger than the screen, so that there are spare pixels on the sides that you can draw into without them being displayed. This is extremely useful for objects that are partially off the screen.
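For comparison, here is a minimal Python sketch of a line-interleaved Amiga bitmap. The names are hypothetical, and it assumes 4 bitplanes with each row of a plane stored as one contiguous run, which matches the 40 byte step between planes used in the blitter code later in this article. The buffer_width parameter is an assumption used to show the oversized buffer idea:

```python
# Model of a line-interleaved Amiga bitmap: for each screen line, one full row of
# plane 1, then one full row of plane 2, and so on. Names are illustrative only.
PLANES = 4

def amiga_word_offset(x, y, plane, buffer_width=320):
    """Byte offset of the word holding pixel (x, y) in one bitplane.

    buffer_width is the width of the bitmap in memory, which may be wider
    than the 320 pixels that are actually displayed.
    """
    row_bytes = buffer_width // 8            # bytes in one row of one plane
    return (y * PLANES + plane) * row_bytes + (x // 16) * 2

# Consecutive words of one plane are adjacent, so a 64 pixel wide row is 8 straight bytes.
print([amiga_word_offset(x, 0, 0) for x in (0, 16, 32, 48)])   # [0, 2, 4, 6]
# The same 16 pixels in the next plane are one whole row further on.
print(amiga_word_offset(0, 0, 1))                              # 40
# With a 352 pixel wide buffer each row grows to 44 bytes, leaving 4 bytes
# (32 pixels) per row that can be drawn into without being displayed.
print(amiga_word_offset(0, 0, 1, buffer_width=352))            # 44
```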
As an example, here is what Turrican 2 would look like if the screen was only 128 pixels wide while viewing the Amiga memory:
But to see why the ST suffered so badly in comparison, let's have a look at some example code from some commercial Atari ST games.
Drawing a 64x64 pixel alien on the Atari ST
This particular example is from a game that draws a 64x64 pixel alien. Undoubtedly for speed reasons, the ST version only draws it in 2 planes, allowing 3 colours (plus transparent). The game still has to clear the remaining 2 bitplanes, otherwise some other object that happened to overlap the alien would corrupt the graphics.
In this example, d0 is the X co-ordinate of the alien, a2 points to the address where the alien will be drawn, a4 points to the mask of the alien, and a5 is the alien itself. This code only draws the alien on 16 pixel boundaries, so there is nothing particularly hard to do. If the alien is to be drawn 16 pixels in, 16/16=1, multiply by 8 = 8 byte indent to find the destination:
    movea.l #$4600,a2               ;Position of alien from base of screen
    adda.l  (_WorkScreen).L,a2      ;Address of current screen
    subi.w  #48,d0                  ;Subtract hot-spot of alien
    asr.w   #4,d0                   ;Divide X co-ordinate by 16
    asl.w   #3,d0                   ;Multiply by 8 to get destination
    andi.l  #$FFFF,d0               ;Ensure sensible numbers
    adda.l  d0,a2                   ;Add to destination
And onto the drawing loop itself:
.DrawLoop
    move.w  (a4)+,d0        ;Read the 1st word from the mask
    and.w   d0,(a2)         ;Mask out pixels from bitplane 1
    and.w   d0,(2,a2)       ;Mask out pixels from bitplane 2
    and.w   d0,(4,a2)       ;Mask out pixels from bitplane 3
    and.w   d0,(6,a2)       ;Mask out pixels from bitplane 4
    move.w  (a4)+,d0        ;Read the 2nd word from the mask
    and.w   d0,(8,a2)
    and.w   d0,(10,a2)
    and.w   d0,(12,a2)
    and.w   d0,(14,a2)
    move.w  (a4)+,d0        ;Read the 3rd word from the mask
    and.w   d0,($10,a2)
    and.w   d0,($12,a2)
    and.w   d0,($14,a2)
    and.w   d0,($16,a2)
    move.w  (a4)+,d0        ;Read the 4th word from the mask
    and.w   d0,($18,a2)
    and.w   d0,($1A,a2)
    and.w   d0,($1C,a2)
    and.w   d0,($1E,a2)
    move.w  (a5)+,d0        ;Read 1st word of bitplane 1 graphics from alien
    or.w    d0,(a2)         ;Combine pixels into bitplane 1
    move.w  (a5)+,d0        ;Read 1st word of bitplane 2 graphics from alien
    or.w    d0,(2,a2)       ;Combine pixels into bitplane 2
    move.w  (a5)+,d0        ;Read 2nd word of bitplane 1 graphics from alien
    or.w    d0,(8,a2)       ;Combine pixels into bitplane 1
    move.w  (a5)+,d0        ;Read 2nd word of bitplane 2 graphics from alien
    or.w    d0,(10,a2)      ;Combine pixels into bitplane 2
    move.w  (a5)+,d0        ;Read 3rd word of bitplane 1 graphics from alien
    or.w    d0,($10,a2)     ;Combine pixels into bitplane 1
    move.w  (a5)+,d0        ;Read 3rd word of bitplane 2 graphics from alien
    or.w    d0,($12,a2)     ;Combine pixels into bitplane 2
    move.w  (a5)+,d0        ;Read 4th word of bitplane 1 graphics from alien
    or.w    d0,($18,a2)     ;Combine pixels into bitplane 1
    move.w  (a5)+,d0        ;Read 4th word of bitplane 2 graphics from alien
    or.w    d0,($1A,a2)     ;Combine pixels into bitplane 2
    lea     ($A0,a2),a2     ;Move destination to next line
    dbra    d7,.DrawLoop    ;And repeat for the next row of pixels
    rts
That entire block of code has to be repeated for every single row of pixels in the alien! The alien is 64 pixels high, so that loop runs 64 times!
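To make it clearer what each group of and.w and or.w instructions is doing, here is a small Python model of a single 16 pixel group from one row. It is only a sketch: the function name and all of the data values are invented, and it assumes the mask has a 1 wherever the background should be kept:

```python
# Model of one 16 pixel group being drawn the way the ST loop does it.
# screen holds the 4 bitplane words for those 16 pixels; all values are made up.
def draw_group(screen, mask, gfx_planes):
    """AND the mask into all 4 planes, then OR the graphics into the planes supplied."""
    out = [word & mask for word in screen]        # punch a hole in every plane
    for plane, gfx in enumerate(gfx_planes):      # the alien only has 2 planes of graphics
        out[plane] |= gfx
    return out

screen = [0xFFFF, 0x0F0F, 0xAAAA, 0x5555]   # background, planes 1 to 4
mask = 0xF00F                               # 1 = keep background, 0 = alien pixel
gfx = [0x0660, 0x0FF0]                      # alien graphics for planes 1 and 2

result = draw_group(screen, mask, gfx)
print([f"{w:04X}" for w in result])         # ['F66F', '0FFF', 'A00A', '5005']
```

Planes 3 and 4 receive no graphics but are still masked, which is the clearing step described earlier. Counting the instructions in the listing gives 38 per row, or 2432 for all 64 rows before any unrolling.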
The programmer made a minimal optimisation by unrolling the loop again and running it only half the number of times, to save 32 iterations of the dbra instruction at the expense of memory. The original game also has a large case statement above this code so that when the alien is partially off the edge of the screen, there are versions of the code that skip drawing part of the alien. As you can imagine, the code becomes very long and messy!
Drawing a 64x64 pixel alien on the Amiga
The same ST setup code can be used on the Amiga, with one instant speedup. There is no need to perform the second shift (which is an expensive operation on the 68000) because the bitplanes are stored with an entire row of pixels in each bitplane. If you were drawing the alien 16 pixels in, 16/16 = 1, add 1 to itself to give 2 and therefore we add 2 bytes to find the destination. (You could also divide by 8 and clear the lowest bit instead):
    movea.l #$4600,a2               ;Position of alien from base of screen
    adda.l  (_WorkScreen).L,a2      ;Address of current screen
    subi.w  #48,d0                  ;Subtract hot-spot of alien
    asr.w   #4,d0                   ;Divide X co-ordinate by 16
    add.w   d0,d0                   ;Double to get destination
    andi.l  #$FFFF,d0               ;Ensure sensible numbers
    adda.l  d0,a2                   ;Add to destination
And onto the drawing loop itself, this time using the blitter. We can just as easily blit a 16 colour version of the alien, with smaller code, and it will operate a hell of a lot quicker than the ST version:
    move.l  a6,-(sp)                        ;Save a6 since we are trashing it
    lea     $dff000,a6
    bsr     .BlitWait
    move.l  #$0fac0000,bltcon0(a6)          ;D = !AB+AC with no preshift
    move.l  #$ffffffff,bltafwm(a6)          ;Mask
    move.w  #0,bltamod(a6)                  ;Mask modulo
    move.w  #(64>>3)*1,bltbmod(a6)          ;Source modulo
    move.w  #((320*4)-64)>>3,bltcmod(a6)
    move.w  #((320*4)-64)>>3,bltdmod(a6)
    moveq   #4-1,d7                         ;Draw 4 bitplanes in a loop! :)
.DrawLoop
    move.l  a4,bltapt(a6)                   ;A = Mask source
    move.l  a5,bltbpt(a6)                   ;B = Gfx source
    move.l  a2,bltcpt(a6)                   ;C = Screen
    move.l  a2,bltdpt(a6)                   ;D = Destination
    move.w  #(64<<6)+(64>>4),bltsize(a6)    ;Start 64x64 blit
    lea     (64>>3,a5),a5
    lea     (40,a2),a2
    bsr     .BlitWait
    dbf     d7,.DrawLoop
    move.l  (sp)+,a6                        ;Restore a6
    rts
If the alien is partially off the screen, the blitter can handle that easily by skipping parts of the image using the modulo registers. It can also indent the alien any number of pixels if it had been required.
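The magic numbers in the blitter setup are easier to follow when derived step by step. The Python sketch below rebuilds three of them: the $ac logic function, the bltsize value and the screen modulo. The function names are made up for this example, and it assumes the mask uses 1 for background pixels and that the screen is a 320 pixel wide, 4 bitplane interleaved bitmap:

```python
# Derive the blitter values used above from first principles.
# Register meanings follow the Amiga hardware conventions; function names are made up.
def minterm(func):
    """Build the 8-bit logic function byte: bit (A*4 + B*2 + C) holds func(A, B, C)."""
    value = 0
    for a in (0, 1):
        for b in (0, 1):
            for c in (0, 1):
                if func(a, b, c):
                    value |= 1 << (a * 4 + b * 2 + c)
    return value

# A = mask (1 keeps the background), B = alien graphics, C = screen.
cookie_cut = minterm(lambda a, b, c: (not a and b) or (a and c))

def bltsize(width_pixels, height):
    """Pack height (upper 10 bits) and width in words (lower 6 bits)."""
    return (height << 6) + (width_pixels >> 4)

def screen_modulo(width_pixels, planes=4, screen_width=320):
    """Bytes to skip after each blitted row to reach the same plane on the next line."""
    return (screen_width * planes - width_pixels) >> 3

print(hex(cookie_cut))          # 0xac
print(hex(bltsize(64, 64)))     # 0x1004
print(screen_modulo(64))        # 152
```

If only 48 pixels of the alien were visible, screen_modulo(48) gives 154, and the mask and graphics modulos would each need to grow by the same 2 bytes so that the hidden word is skipped on every row.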
The poor old blitter-less Atari ST doesn't stand a chance of competing with the Amiga version!
Drawing a 16x16 pixel alien anywhere on the Atari ST
A second example really highlights what a nightmare the Atari ST was to work with. In this example, the game is drawing a small 16x16 alien anywhere onto the screen. The alien graphics are in register a0, the mask is in a1, the destination is a2, the buffer is a3, the indent is d6, and the alien height is in d7.
At the start of each line, it saves the existing graphics behind the alien into the buffer a3. The values in the comments are just made up examples to trace what is happening to the numbers as the routine runs:
.DrawAlienST
    move.l  (a2),(a3)           ;Store graphics behind alien
    move.l  (4,a2),(4,a3)       ;into buffer a3
    move.l  (8,a2),(8,a3)
    move.l  (12,a2),(12,a3)
    move.l  #$FFFFFFFF,d4       ;d4 = $ffffffff
    move.w  (a1),d4             ;d4 = $ffffc005
    swap    d4                  ;d4 = $c005ffff
    ror.l   d6,d4               ;d4 = $f0017fff (rotate twice)
    clr.l   d0
    clr.l   d1
    clr.l   d2
    clr.l   d3
    move.w  (a0),d0             ;d0 = $0000aaaa
    move.w  (2,a0),d1           ;d1 = $0000bbbb
    move.w  (4,a0),d2           ;d2 = $0000cccc
    move.w  (6,a0),d3           ;d3 = $0000dddd
    swap    d0                  ;d0 = $aaaa0000
    swap    d1                  ;d1 = $bbbb0000
    swap    d2                  ;d2 = $cccc0000
    swap    d3                  ;d3 = $dddd0000
    ror.l   d6,d0               ;d0 = $2aaa8000 (rotate twice)
    ror.l   d6,d1               ;d1 = $2eeec000 (rotate twice)
    ror.l   d6,d2               ;d2 = $33330000 (rotate twice)
    ror.l   d6,d3               ;d3 = $37774000 (rotate twice)
    and.w   d4,(8,a2)           ;Mask 2nd words of destination with $7fff
    and.w   d4,(10,a2)
    and.w   d4,(12,a2)
    and.w   d4,(14,a2)
    swap    d4                  ;Get high word of mask ($f001)
    and.w   d4,(a2)             ;Mask first words of destination with $f001
    and.w   d4,(2,a2)
    and.w   d4,(4,a2)
    and.w   d4,(6,a2)
    or.w    d0,(8,a2)           ;Merge 2nd words of alien graphics bitplane 1
    or.w    d1,(10,a2)
    or.w    d2,(12,a2)
    or.w    d3,(14,a2)
    swap    d0                  ;Get high words of alien graphics
    swap    d1
    swap    d2
    swap    d3
    or.w    d0,(a2)             ;Merge 1st words of alien graphics bitplane 1
    or.w    d1,(2,a2)
    or.w    d2,(4,a2)
    or.w    d3,(6,a2)
    lea     (160,a3),a3         ;Increase save buffer
    lea     (160,a2),a2         ;Next row of destination for alien
    lea     (160,a1),a1         ;Next row of mask
    lea     (160,a0),a0         ;Next row of source alien
    dbra    d7,.DrawAlienST
You can see how inefficient this is, with 50 instructions (including 5 slow rotate instructions) just to draw a single line of a 16 pixel wide alien onto the screen!
The Amiga version can be rewritten far more efficiently even without the blitter for such a tiny object:
; eg. Gfx a0 = $0558, Mask a1 = $f003, a2 = Destination, a3 = Save buffer, Shift d6 = 8 pixels
.DrawAlienAmiga
    moveq   #-1,d4
    move.w  (a1),d4             ;d4 = $fffff003
    swap    d4                  ;d4 = $f003ffff
    ror.l   d6,d4               ;d4 = $fff003ff
    moveq   #4-1,d3
.MaskInnerLoop
    move.l  (a2),(a3)           ;Save background (32 pixels)
    moveq   #0,d0               ;d0 = $00000000
    move.w  (a0),d0             ;d0 = $00000558
    swap    d0                  ;d0 = $05580000
    ror.l   d6,d0               ;d0 = $00055800
    and.l   d4,(a2)             ;Mask out new background
    or.l    d0,(a2)             ;Insert new graphics
    lea     (40,a3),a3
    lea     (40,a2),a2
    lea     (40,a0),a0
    dbra    d3,.MaskInnerLoop
    lea     (160,a1),a1
    dbra    d7,.DrawAlienAmiga
The inner loop of the Amiga version could be unrolled for even more speed. And if the blitter was used, this would be even faster!
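The swap and ror.l trick in the Amiga listing can be checked with a few lines of Python. This is only a model of the arithmetic, with made up helper names, using the same example values as the comments in the listing (graphics $0558, mask $f003, shift of 8 pixels):

```python
# Model of the 68000 "swap then ror.l" trick used to shift a 16 pixel wide
# word across a 32 pixel window. Helper names are made up for this sketch.
def ror32(value, count):
    """Rotate a 32-bit value right by count bits, like ror.l."""
    count %= 32
    return ((value >> count) | (value << (32 - count))) & 0xFFFFFFFF

def shifted_mask(mask_word, shift):
    """Mask in the top word, ones in the bottom word, then rotate right."""
    return ror32((mask_word << 16) | 0xFFFF, shift)

def shifted_gfx(gfx_word, shift):
    """Graphics in the top word, zeros in the bottom word, then rotate right."""
    return ror32(gfx_word << 16, shift)

def draw_long(background, mask_word, gfx_word, shift):
    """and.l the mask into 32 pixels of background, then or.l the graphics in."""
    return (background & shifted_mask(mask_word, shift)) | shifted_gfx(gfx_word, shift)

print(f"{shifted_mask(0xF003, 8):08X}")                     # FFF003FF
print(f"{shifted_gfx(0x0558, 8):08X}")                      # 00055800
print(f"{draw_long(0xFFFFFFFF, 0xF003, 0x0558, 8):08X}")    # FFF55BFF
```

Because each row of an Amiga bitplane is contiguous, the whole 32 pixel window can be handled with a single and.l and or.l per plane. On the ST the two halves of the same window sit 8 bytes apart, so each half has to be handled as a separate word.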
The combination of the awful screen layout of the ST and the lack of a blitter means the ST has to perform far more operations with the CPU than the Amiga does. At a certain point, the CPU won't be able to maintain the frame rate, and the game will "drop a frame".
A 50fps game will instantly become half speed and run at 25fps, or even worse! Most coders do not like frame rates that vary wildly, so will lock the game to the most consistent frame rate. Some 3D games can handle the screen update speed varying, but platform games in particular become unplayable if you are trying to line up a jump and the game either speeds up or slows down at that critical stage.
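The sudden halving is easy to see with a little arithmetic. The Python sketch below models a game that can only show a new frame on a vertical blank. It assumes a 50Hz display, and the workloads in milliseconds are invented purely for illustration:

```python
import math

# Model of a game locked to the display's vertical blank. The 50Hz refresh rate
# is an assumption; the workloads below are invented for illustration.
REFRESH_HZ = 50

def locked_fps(frame_work_ms, refresh_hz=REFRESH_HZ):
    """Frame rate when every frame must wait for the next vertical blank."""
    frame_period_ms = 1000 / refresh_hz
    vblanks_needed = max(1, math.ceil(frame_work_ms / frame_period_ms))
    return refresh_hz / vblanks_needed

for work_ms in (18, 21, 41):
    print(work_ms, "ms of work ->", locked_fps(work_ms), "fps")
```

With 20ms available per frame, 18ms of work still runs at 50fps, but 21ms of work misses the deadline and has to wait for the following vertical blank, giving 25fps. There is nothing in between, which is why a small amount of extra drawing work can halve the speed of a game.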
And that is the main reason games are much slower on the Atari ST!