Codetapper's Amiga Site

Why Atari ST games were slower than the Amiga versions

The dust settled on the Amiga vs Atari ST war a long time ago, with the Amiga clearly dominating the poor old ST. The Amiga often had 32 colour games, with smooth scrolling and stereo sound. The ST had 16 colour games and did not have the horsepower to match the Amiga. ST coders usually had 4 options:

  1. Reduce the size of the screen (often with ugly borders)
  2. Reduce the frame rate (resulting in jerky games)
  3. Reduce the number of objects on the screen (empty looking games)
  4. Use push scrolling near the screen edges rather than smooth pixel scrolling

Most people who owned an Amiga or Atari ST were fully aware that the Amiga had a blitter chip that allowed it to mask, shift and copy images extremely quickly. The blitter operated independently of the CPU, meaning the CPU could carry on with other tasks while the blitter continued working. But why were ST games so slow and jerky in comparison, even with a slightly faster CPU? What was the ST's CPU having to do?

Screen Layout

The Atari ST screen layout was the first nail in the coffin, and really killed performance. In 16 colour mode, 4 bitplanes are required. But the ST interleaves every word of each line, so the first word of bitplane 1 is followed by the first word of bitplane 2, then the first word of bitplane 3, and finally the first word of bitplane 4. This pattern repeats for each successive word of data for the entire width of the screen, then repeats from top to bottom. The memory layout is continuous, and does not have any spare areas on the side to play with.
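As a rough illustration of the layout just described, here is a small Python sketch that works out where a pixel's word lives in ST screen memory. It assumes the usual 320 pixel wide, 4 bitplane low resolution screen; the function name and constants are made up for this example, and bitplanes are numbered 0 to 3 rather than 1 to 4:

```python
# Sketch only: models the word-interleaved ST layout described above.
# Names and constants are illustrative, not taken from any game.
SCREEN_WIDTH = 320      # pixels per line (assumed low resolution mode)
PLANES = 4              # 4 bitplanes = 16 colours
BYTES_PER_WORD = 2

def st_word_offset(x, y, plane):
    """Byte offset of the word holding pixel (x, y) for bitplane 0-3."""
    bytes_per_line = (SCREEN_WIDTH // 16) * PLANES * BYTES_PER_WORD  # 160
    group = x // 16                       # which 16 pixel group on the line
    return (y * bytes_per_line
            + group * PLANES * BYTES_PER_WORD
            + plane * BYTES_PER_WORD)

print(st_word_offset(0, 0, 0))    # 0
print(st_word_offset(0, 0, 3))    # 6
print(st_word_offset(16, 0, 0))   # 8
print(st_word_offset(0, 1, 0))    # 160
```

The four words for any 16 pixel group sit next to each other, so moving one group to the right means stepping 8 bytes, and there is no padding at the end of a line.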

As an example, a part of the title screen of the Atari ST version of Road Runner is shown in 16 colours, along with what memory looks like if you view it in one chunk:

Atari ST screen layout

Compare the Atari ST layout with the Amiga's interleaved bitmap (ILBM) format, which is much more logical. The Amiga can also optionally set up a buffer in memory that is larger than the screen, so that there are spare pixels on the sides that you can draw into without them being displayed. This is extremely useful for objects that are partially off the screen.

As an example, here is what Turrican 2 would look like if the screen was only 128 pixels wide while viewing the Amiga memory: 

Amiga ILBM screen layout
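For comparison, a similar Python sketch for the row-interleaved Amiga layout, where each line stores a complete row of bitplane 1, then a complete row of bitplane 2, and so on. The function name is invented for this example, bitplanes are numbered 0 to 3, and the 352 pixel buffer width is just an assumed value to show the effect of spare pixels at the sides:

```python
# Sketch only: models a row-interleaved (ILBM style) Amiga bitmap.
# The name and the 352 pixel example width are assumptions for illustration.
def amiga_word_offset(x, y, plane, width=320, planes=4):
    """Byte offset of the word holding pixel (x, y) for bitplane 0-3."""
    bytes_per_plane_row = width // 8               # 40 for a 320 pixel row
    bytes_per_line = bytes_per_plane_row * planes  # 160 for 4 bitplanes
    return y * bytes_per_line + plane * bytes_per_plane_row + (x // 16) * 2

print(amiga_word_offset(16, 0, 0))             # 2
print(amiga_word_offset(0, 0, 1))              # 40
print(amiga_word_offset(0, 1, 0))              # 160
print(amiga_word_offset(0, 0, 1, width=352))   # 44
```

Within one bitplane the words for neighbouring pixels are adjacent, so stepping 16 pixels to the right costs 2 bytes, and a wider buffer only changes the row sizes.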

But to see why the ST suffered so badly in comparison, let's have a look at some example code from some commercial Atari ST games.

Drawing a 64x64 pixel alien on the Atari ST

This particular example is from a game that draws a 64x64 pixel alien. Undoubtedly for speed reasons, the ST version only draws it in 2 planes, allowing 3 colours (plus transparent). The game still has to clear the remaining 2 bitplanes, otherwise some other object that happened to overlap the alien would corrupt the graphics.

In this example, d0 is the X co-ordinate of the alien, a2 points to the address where the alien will be drawn, a4 points to the mask of the alien, and a5 points to the alien graphics. This code only draws the alien on 16 pixel boundaries, so there is nothing particularly hard to do. Each group of 16 pixels occupies 8 bytes (4 bitplanes of one word each), so if the alien is to be drawn 16 pixels in, 16/16 = 1, multiplied by 8 gives an 8 byte indent to find the destination:

                movea.l #$4600,a2                    ;Position of alien from base of screen
                adda.l  (_WorkScreen).L,a2           ;Address of current screen
                subi.w  #48,d0                       ;Subtract hot-spot of alien
                asr.w   #4,d0                        ;Divide X co-ordinate by 16
                asl.w   #3,d0                        ;Multiply by 8 to get destination
                andi.l  #$FFFF,d0                    ;Ensure sensible numbers
                adda.l  d0,a2                        ;Add to destination

And onto the drawing loop itself:

.DrawLoop       move.w  (a4)+,d0                     ;Read the 1st word from the mask
                and.w   d0,(a2)                      ;Mask out pixels from bitplane 1
                and.w   d0,(2,a2)                    ;Mask out pixels from bitplane 2
                and.w   d0,(4,a2)                    ;Mask out pixels from bitplane 3
                and.w   d0,(6,a2)                    ;Mask out pixels from bitplane 4
                move.w  (a4)+,d0                     ;Read the 2nd word from the mask
                and.w   d0,(8,a2)
                and.w   d0,(10,a2)
                and.w   d0,(12,a2)
                and.w   d0,(14,a2)
                move.w  (a4)+,d0                     ;Read the 3rd word from the mask
                and.w   d0,($10,a2)
                and.w   d0,($12,a2)
                and.w   d0,($14,a2)
                and.w   d0,($16,a2)
                move.w  (a4)+,d0                     ;Read the 4th word from the mask
                and.w   d0,($18,a2)
                and.w   d0,($1A,a2)
                and.w   d0,($1C,a2)
                and.w   d0,($1E,a2)
                move.w  (a5)+,d0                     ;Read 1st word of bitplane 1 graphics from alien
                or.w    d0,(a2)                      ;Combine pixels into bitplane 1
                move.w  (a5)+,d0                     ;Read 1st word of bitplane 2 graphics from alien
                or.w    d0,(2,a2)                    ;Combine pixels into bitplane 2
                move.w  (a5)+,d0                     ;Read 2nd word of bitplane 1 graphics from alien
                or.w    d0,(8,a2)                    ;Combine pixels into bitplane 1
                move.w  (a5)+,d0                     ;Read 2nd word of bitplane 2 graphics from alien
                or.w    d0,(10,a2)                   ;Combine pixels into bitplane 2
                move.w  (a5)+,d0                     ;Read 3rd word of bitplane 1 graphics from alien
                or.w    d0,($10,a2)                  ;Combine pixels into bitplane 1
                move.w  (a5)+,d0                     ;Read 3rd word of bitplane 2 graphics from alien
                or.w    d0,($12,a2)                  ;Combine pixels into bitplane 2
                move.w  (a5)+,d0                     ;Read 4th word of bitplane 1 graphics from alien
                or.w    d0,($18,a2)                  ;Combine pixels into bitplane 1
                move.w  (a5)+,d0                     ;Read 4th word of bitplane 2 graphics from alien
                or.w    d0,($1A,a2)                  ;Combine pixels into bitplane 2
                lea     ($A0,a2),a2                  ;Move destination to next line
                dbra    d7,.DrawLoop                 ;And repeat for the next row of pixels
                rts

That entire block of code has to be repeated for every single row of pixels in the alien! The alien is 64 pixels high, so that loop runs 64 times!
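To get a feel for the amount of work involved, the Python sketch below models one row of that loop: the mask is ANDed into all 4 bitplanes of each 16 pixel group, then the 2 bitplanes of graphics are ORed in. The function name and the sample data are invented for illustration; only the order of operations follows the listing above:

```python
# Sketch only: a model of one row of the ST draw loop above.
# screen is a list of 16-bit words in ST order (4 words per 16 pixel group).
def draw_row(screen, dest, masks, gfx):
    """Mask 4 bitplanes and merge 2 bitplanes of graphics.

    Returns the number of read-modify-write memory operations performed.
    """
    operations = 0
    for group, mask in enumerate(masks):
        base = dest + group * 4
        for plane in range(4):              # AND the mask into every bitplane
            screen[base + plane] &= mask
            operations += 1
    for group in range(len(masks)):
        base = dest + group * 4
        for plane in range(2):              # OR in bitplanes 1 and 2 only
            screen[base + plane] |= gfx[group * 2 + plane]
            operations += 1
    return operations

screen = [0xFFFF] * 16                       # one 64 pixel wide row, all set
masks = [0x0FFF, 0xFFFF, 0xFFFF, 0xFFFF]     # made-up mask: punch 4 pixels
gfx = [0xA000, 0x5000, 0, 0, 0, 0, 0, 0]     # made-up 2 bitplane graphics
per_row = draw_row(screen, 0, masks, gfx)
print(per_row)                               # 24
print(per_row * 64)                          # 1536
print([hex(w) for w in screen[:4]])          # ['0xafff', '0x5fff', '0xfff', '0xfff']
```

That is 24 read-modify-write operations on screen memory for every row, or 1536 for the whole alien, before counting the reads of the mask and graphics data.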

The programmer made a minimal optimisation by unrolling the loop again and running it only half the number of times, to save 32 iterations of the dbra instruction at the expense of memory. The original game also has a large case statement above this code so that when the alien is partially off the edge of the screen, there are versions of the code that skip drawing part of the alien. As you can imagine, the code becomes very long and messy!

Drawing a 64x64 pixel alien on the Amiga

The same ST setup code can be used on the Amiga, with one instant speedup. There is no need to perform the second shift (which is an expensive operation on the 68000) because the bitplanes are stored with an entire row of pixels in each bitplane. If you were drawing the alien 16 pixels in, 16/16 = 1, add 1 to itself to give 2 and therefore we add 2 bytes to find the destination. (You could also divide by 8 and clear the lowest bit instead):

                movea.l #$4600,a2                    ;Position of alien from base of screen
                adda.l  (_WorkScreen).L,a2           ;Address of current screen
                subi.w  #48,d0                       ;Subtract hot-spot of alien
                asr.w   #4,d0                        ;Divide X co-ordinate by 16
                add.w   d0,d0                        ;Double to get destination
                andi.l  #$FFFF,d0                    ;Ensure sensible numbers
                adda.l  d0,a2                        ;Add to destination
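The difference between the two setup routines comes down to one line of arithmetic. The Python sketch below mirrors both calculations for an alien that is fully on screen; the function names are made up, and it deliberately ignores the clipped cases that the original game handles with separate code:

```python
# Sketch only: the column offset arithmetic from the two setup routines.
# Assumes x is at least the hot-spot value, so the result is never negative.
def st_column_offset(x, hot_spot=48):
    """(x - hot_spot) / 16 groups, each 8 bytes wide on the ST."""
    return ((x - hot_spot) >> 4) << 3

def amiga_column_offset(x, hot_spot=48):
    """(x - hot_spot) / 16 words, each 2 bytes wide within one bitplane."""
    word = (x - hot_spot) >> 4
    return word + word                  # same trick as add.w d0,d0

print(st_column_offset(64), amiga_column_offset(64))   # 8 2
print(st_column_offset(80), amiga_column_offset(80))   # 16 4
```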

And onto the drawing loop itself, this time using the blitter. We can just as easily blit a 16 colour version of the alien, with smaller code, and it will operate a hell of a lot quicker than the ST version:

                move.l  a6,-(sp)                     ;Save a6 since we are trashing it

                lea     $dff000,a6
                bsr     .BlitWait
                move.l  #$0fac0000,bltcon0(a6)       ;D = !AB+AC with no preshift
                move.l  #$ffffffff,bltafwm(a6)       ;Mask
                move.w  #0,bltamod(a6)               ;Mask modulo
                move.w  #(64>>3)*1,bltbmod(a6)       ;Source modulo
                move.w  #((320*4)-64)>>3,bltcmod(a6)
                move.w  #((320*4)-64)>>3,bltdmod(a6)

                moveq   #4-1,d7                      ;Draw 4 bitplanes in a loop! :)
.DrawLoop       move.l  a4,bltapt(a6)                ;A = Mask source
                move.l  a5,bltbpt(a6)                ;B = Gfx source
                move.l  a2,bltcpt(a6)                ;C = Screen
                move.l  a2,bltdpt(a6)                ;D = Destination
                move.w  #(64<<6)+(64>>4),bltsize(a6) ;Start 64x64 blit
                lea     (64>>3,a5),a5
                lea     (40,a2),a2
                bsr     .BlitWait
                dbf     d7,.DrawLoop

                move.l  (sp)+,a6                     ;Restore a6
                rts
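The $ac in the bltcon0 value is the blitter's logic function byte, which has one bit for each combination of the A, B and C inputs. As a sanity check, this Python sketch builds that byte from the equation in the comment (D = !AB + AC). The helper names are invented for this example; the $0f00 part enables all four channels and the shift in the top 4 bits is zero:

```python
# Sketch only: derive the blitter logic function byte from a boolean equation.
def minterm(func):
    """Return the 8-bit logic function for a boolean function of A, B, C."""
    value = 0
    for a in (0, 1):
        for b in (0, 1):
            for c in (0, 1):
                if func(a, b, c):
                    # bit 7 = ABC ... bit 0 = !A!B!C
                    value |= 1 << (a * 4 + b * 2 + c)
    return value

def keep_where_mask_set(a, b, c):
    # D = !AB + AC: where the mask (A) is 0 take the graphics (B),
    # where the mask is 1 keep the screen (C)
    return (not a and b) or (a and c)

USE_ABCD = 0x0F00                       # enable channels A, B, C and D
bltcon0 = USE_ABCD | minterm(keep_where_mask_set)
print(hex(bltcon0))                     # 0xfac
```

This matches the ST style mask used in the earlier listings, where a set mask bit means "keep the background". With the opposite mask polarity the equation becomes D = AB + !AC, and the same helper gives $ca.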

If the alien is partially off the screen, the blitter can handle that easily by skipping parts of the image using the modulo registers. It could also indent the alien by any number of pixels, had that been required.
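The size and modulo values in the listing are simple arithmetic, and the same arithmetic shows how a narrower blit skips part of the image. The Python sketch below reproduces the numbers; the helper names are made up, and the 32 pixel case is only a hypothetical clipped blit (a real routine would also need to adjust the pointers and possibly the first and last word masks):

```python
# Sketch only: the arithmetic behind the bltsize and modulo values above.
def bltsize(width_px, height):
    """Height in the upper bits, width in words in the lower 6 bits."""
    return (height << 6) + (width_px >> 4)

def modulo_bytes(row_bytes, width_px):
    """Bytes to skip at the end of each row to reach the next row."""
    return row_bytes - (width_px >> 3)

SCREEN_ROW_BYTES = (320 * 4) >> 3       # 160: one line of all 4 bitplanes
MASK_ROW_BYTES = 64 >> 3                # 8: the mask is stored 64 pixels wide

print(hex(bltsize(64, 64)))                     # 0x1004
print(modulo_bytes(SCREEN_ROW_BYTES, 64))       # 152
print(modulo_bytes(MASK_ROW_BYTES, 64))         # 0

# Hypothetical clipped blit showing only 32 pixels of the alien
print(hex(bltsize(32, 64)))                     # 0x1002
print(modulo_bytes(SCREEN_ROW_BYTES, 32))       # 156
print(modulo_bytes(MASK_ROW_BYTES, 32))         # 4
```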

The poor old blitter-less Atari ST doesn't stand a chance of competing with the Amiga version!

Drawing a 16x16 pixel alien anywhere on the Atari ST

A second example really highlights what a nightmare the Atari ST was to work with. In this example, the game is drawing a small 16x16 alien anywhere onto the screen. Register a0 points to the alien graphics, a1 to the mask, a2 to the destination and a3 to the save buffer, while d6 holds the pixel indent and d7 the alien height.

At the start of each line, it saves the existing graphics behind the alien into the buffer a3. The values in the comments are just made up examples to trace what is happening to the numbers as the routine runs:

.DrawAlienST    move.l  (a2),(a3)                    ;Store graphics behind alien
                move.l  (4,a2),(4,a3)                ;into buffer a3
                move.l  (8,a2),(8,a3)
                move.l  (12,a2),(12,a3)
                move.l  #$FFFFFFFF,d4                ;d4 = $ffffffff
                move.w  (a1),d4                      ;d4 = $ffffc005
                swap    d4                           ;d4 = $c005ffff
                ror.l   d6,d4                        ;d4 = $f0017fff (rotate twice)
                clr.l   d0
                clr.l   d1
                clr.l   d2
                clr.l   d3
                move.w  (a0),d0                      ;d0 = $0000aaaa
                move.w  (2,a0),d1                    ;d1 = $0000bbbb
                move.w  (4,a0),d2                    ;d2 = $0000cccc
                move.w  (6,a0),d3                    ;d3 = $0000dddd
                swap    d0                           ;d0 = $aaaa0000
                swap    d1                           ;d1 = $bbbb0000
                swap    d2                           ;d2 = $cccc0000
                swap    d3                           ;d3 = $dddd0000
                ror.l   d6,d0                        ;d0 = $2aaa8000 (rotate twice)
                ror.l   d6,d1                        ;d1 = $2eeec000 (rotate twice)
                ror.l   d6,d2                        ;d2 = $33330000 (rotate twice)
                ror.l   d6,d3                        ;d3 = $37774000 (rotate twice)
                and.w   d4,(8,a2)                    ;Mask 2nd words of destination with $7fff
                and.w   d4,(10,a2)
                and.w   d4,(12,a2)
                and.w   d4,(14,a2)
                swap    d4                           ;Get high word of mask ($f001)
                and.w   d4,(a2)                      ;Mask first words of destination with $f001
                and.w   d4,(2,a2)
                and.w   d4,(4,a2)
                and.w   d4,(6,a2)
                or.w    d0,(8,a2)                    ;Merge 2nd words of alien graphics bitplane 1
                or.w    d1,(10,a2)
                or.w    d2,(12,a2)
                or.w    d3,(14,a2)
                swap    d0                           ;Get high words of alien graphics
                swap    d1
                swap    d2
                swap    d3
                or.w    d0,(a2)                      ;Merge 1st words of alien graphics bitplane 1
                or.w    d1,(2,a2)
                or.w    d2,(4,a2)
                or.w    d3,(6,a2)
                lea     (160,a3),a3                  ;Increase save buffer
                lea     (160,a2),a2                  ;Next row of destination for alien
                lea     (160,a1),a1                  ;Next row of mask
                lea     (160,a0),a0                  ;Next row of source alien
                dbra    d7,.DrawAlienST

You can see how inefficient this is, with 50 instructions (including 5 slow rotate instructions) just to draw a single line of a 16 pixel wide alien onto the screen!

The Amiga version can be rewritten far more efficiently even without the blitter for such a tiny object:

; eg. Gfx a0 = $0558, Mask a1 = $f003, a2 = Destination, a3 = Save buffer, Shift d6 = 8 pixels

.DrawAlienAmiga moveq   #-1,d4
                move.w  (a1),d4                       ;d4 = $fffff003
                swap    d4                            ;d4 = $f003ffff
                ror.l   d6,d4                         ;d4 = $fff003ff

                moveq   #4-1,d3
.MaskInnerLoop  move.l  (a2),(a3)                     ;Save background (32 pixels)

                moveq   #0,d0                         ;d0 = $00000000
                move.w  (a0),d0                       ;d0 = $00000558
                swap    d0                            ;d0 = $05580000
                ror.l   d6,d0                         ;d0 = $00055800

                and.l   d4,(a2)                       ;Mask out new background
                or.l    d0,(a2)                       ;Insert new graphics

                lea     (40,a3),a3
                lea     (40,a2),a2
                lea     (40,a0),a0
                dbra    d3,.MaskInnerLoop

                lea     (160,a1),a1
                dbra    d7,.DrawAlienAmiga

The inner loop of the Amiga routine could be unrolled for even more speed. And if the blitter was used, this would be even faster!
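The rotate trick used in both routines can be checked with a few lines of Python. This sketch models the 32-bit ror.l and reproduces the example values from the comments in the Amiga listing (mask $f003, graphics $0558, shift of 8 pixels). The function names and the background value are invented for illustration:

```python
# Sketch only: a model of the swap + ror.l trick from the listings above.
def ror32(value, count):
    """Rotate a 32-bit value right by count bits."""
    count %= 32
    value &= 0xFFFFFFFF
    return ((value >> count) | (value << (32 - count))) & 0xFFFFFFFF

def shifted_mask(mask_word, shift):
    # Low word starts as all ones so set bits rotate in around the mask
    return ror32((mask_word << 16) | 0xFFFF, shift)

def shifted_gfx(gfx_word, shift):
    # Low word starts as zero so nothing extra gets ORed into the screen
    return ror32(gfx_word << 16, shift)

mask = shifted_mask(0xF003, 8)
gfx = shifted_gfx(0x0558, 8)
print(f"{mask:08x}")                    # fff003ff
print(f"{gfx:08x}")                     # 00055800

background = 0x12345678                 # made-up screen contents
print(f"{(background & mask) | gfx:08x}")   # 12355a78
```

Filling the spare half of the mask with ones is what keeps the background intact on either side of the alien once it has been shifted across two words.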

Conclusion

The combination of the awful screen layout of the ST and the lack of a blitter means the ST has to perform far more operations with the CPU than the Amiga does. At a certain point, the CPU won't be able to maintain the frame rate, and the game will "drop a frame".

A 50fps game will instantly become half speed and run at 25fps, or even worse! Most coders do not like frame rates that vary wildly, so will lock the game to the most consistent frame rate. Some 3D games can handle the screen update speed varying, but platform games in particular become unplayable if you are trying to line up a jump and the game either speeds up or slows down at that critical stage.
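The sudden drop from 50fps to 25fps follows from waiting for the next screen refresh before showing a new frame. A minimal Python sketch of that arithmetic, assuming a 50Hz PAL display and a game that always waits for the vertical blank (the function name and the timings are made up for illustration):

```python
import math

# Sketch only: frame rate of a game that waits for the vertical blank.
def effective_fps(work_ms, refresh_hz=50):
    """Frame rate when each update takes work_ms of CPU time."""
    frame_ms = 1000 / refresh_hz                    # 20ms per refresh at 50Hz
    refreshes = max(1, math.ceil(work_ms / frame_ms))
    return refresh_hz / refreshes

print(effective_fps(19))    # 50.0
print(effective_fps(21))    # 25.0
```

Going just 1ms over the 20ms budget halves the frame rate, which is why a handful of extra objects on screen can make such a visible difference.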

And that is the main reason games are much slower on the Atari ST!


Comments

  • Gravatar for Leroy

    But in 3D The ST rips the amiga ???????????????????? , epic, starglider, etc

    Leroy 12/11/2022 10:38pm (17 months ago)

  • Gravatar for The Paranoid

    Nice comparison. The code is not necessarily optimized for all use cases, but for showing how the major differences between the Atari ST and the Amiga affected graphics programming in games, it certainly does its job and i enjoyed reading it.
    I do have to agree to Cyprian though: There are some use cases where the more compact bitplane layout of the Atari ST comes in handy. And in case you're interested, have a closer look at "chunky 2 planar" algorithms on both the Amiga and the ST where the ST's compact bitplane layout turns out to be a major advantage - Even though this didn't affect the 16-Bit era your article primarily refers to.

    Many greetings,

    The Paranoid / Paradox

    The Paranoid 28/09/2020 2:22pm (3 years ago)

  • Gravatar for Cyprian

    some comments:

    1) code posted by you looks very very inefficient. E.g.:
    .DrawAlienST move.l (a2),(a3) ;Store graphics behind alien
    move.l (4,a2),(4,a3) ;into buffer a3
    move.l (8,a2),(8,a3)
    move.l (12,a2),(12,a3)

    on ST we use much faster code:
    movem.l (a2),D0-D4
    movem.l D0-D4,(a3)


    2) on ST sprites and masks are usually preshifted, therefore there is no need to use any rol/lsr instructions:
    ror.l d6,d0 ;d0 = $2aaa8000 (rotate twice)
    ror.l d6,d1 ;d1 = $2eeec000 (rotate twice)
    ror.l d6,d2 ;d2 = $33330000 (rotate twice)
    ror.l d6,d3 ;d3 = $37774000 (rotate twice)


    3) ST bitplane organisation is faster than amiga's one.
    Below the fill code on ST and Amiga. Set pixel is also faster.

    ST:
    movem.l d0-d1,(a0)+

    Amiga:
    move.l d0,(a0)
    swap d0
    move.l d0,(a1)
    move.l d1,(a2)
    swap d1
    move.l d1,(a3)


    4) and finally, since 1987 ST has blitter chip (Mega ST) or free socket for it (520/1040 ST)


    Cyprian 08/11/2019 7:11pm (4 years ago)

  • Gravatar for Petari

    What a shallow and outdated (even for year 1987) writing.
    And not that don't know enough about Atari ST, not knowing about Amiga too. There is much more than blitter in it. There are many games with excellent scroll on bare Atari ST, without blitter - what ? Oh yeah, there is blitter chip in Atari STs, at 1987. Myself made some blitter game codes, and oh man, it scrolls like a dream.
    No blitter, good scroll, needs only 512 KB RAM = Potsworth & Co, Terry's Big Adventure .

    Petari 29/06/2019 5:19pm (5 years ago)

