What's screwed up about cpu speed comparison charts

Discussion in 'Game Development General Discussion' started by darkangel, Mar 4, 2009.

  1. Jamtex

    Jamtex Adult Orientated Mahjong Connoisseur

    Joined:
    Feb 21, 2007
    Messages:
    5,472
    Likes Received:
    16
    One thing I noticed about the code is that you can only do 128 nybbles with a loop. The 68000 could do more than 128 nybbles (I'm sure it could do the entire RAM area of a computer without any changes), but the others would require more code.

    Although it is a subroutine, it would be interesting to see how many cycles it would take if you write it as a stand alone program with code to set the pointers.

    I enclose some Z80 examples. The unrolled code would save about 13 cycles per iteration and would just require the DJNZ to be removed.

    It would be interesting to see if any coders can do TMS9900, NS16032, Z8000, RCA 1802 and CP1610 versions of the code...
     

    Attached Files:

    • z80.txt
      File size:
      1.7 KB
      Views:
      187
    Last edited: Mar 8, 2009
  2. darkangel

    darkangel Guest

    I just did a second CPU comparison chart, only this time I'm only using 68000 vs. 65816. I'm not afraid of comparing them side by side anymore, thanks to Tom's and Modzilla's posts.

    This time I wrote an algorithm for testing what type of tile a point on a large 2D playfield of tiles is overlapping. The point has 16-bit x and y coordinates in units of subpixels, the playfield is 256 by 256 tiles, and each tile is 256 by 256 subpixels. That way the top 8 bits of each coordinate mark which tile the point is on, and the low 8 bits mark where the point is within the tile.
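A rough Python model of the lookup described above, just to pin down the bit layout. The flat row-major map with one byte per tile, and all function names, are assumptions for illustration; the original attachment is not available here.

```python
# Sketch of a point-to-tile lookup on a 256 x 256 tile playfield.
# Assumption: the map is a flat, row-major array with one byte per tile.
MAP_SIZE = 256  # tiles per row and per column


def tile_index(x, y):
    """Return (column, row) of the tile under a 16-bit subpixel point."""
    return (x >> 8) & 0xFF, (y >> 8) & 0xFF


def offset_in_tile(x, y):
    """Return the point's subpixel position inside its tile."""
    return x & 0xFF, y & 0xFF


def map_offset(x, y):
    """Offset into the flat map: row * 256 + column.

    Because the row is already the high byte of y, this is the same as
    masking y with $FF00 and OR-ing in the high byte of x.
    """
    return (y & 0xFF00) | ((x >> 8) & 0xFF)


def tile_type_at(tile_map, x, y):
    """Look up the tile type under the point (x, y)."""
    return tile_map[map_offset(x, y)]
```

Since the row number is simply the high byte of y, no multiply is needed on either CPU, which is part of why the test is so cheap.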

    This test gave some really scary results that would rip most people's knowledge of the two CPUs to shreds.
     

    Last edited by a moderator: Mar 8, 2009
  3. Twimfy

    Twimfy Site Supporter 2015

    Joined:
    Apr 10, 2006
    Messages:
    3,570
    Likes Received:
    32
    :thumbsup:
     
  4. darkangel

    darkangel Guest

    The 65816 is more than 3x as cycle efficient as the 68000 when running through that algorithm! Isn't that scary?
     
  5. Calpis

    Calpis Champion of the Forum

    Joined:
    Mar 13, 2004
    Messages:
    5,906
    Likes Received:
    21
    Considering your algorithm, it's hardly scary. For a real-world comparison, how about trying CRC32 (using a LUT)?
     
  6. darkangel

    darkangel Guest

    Well, sprite to tile collision uses points to check collision for holes, walls, floors, ramps, etc. This would be very useful in platformer and run 'n' gun videogames. Unless somebody has something better?
     
  7. Jamtex

    Jamtex Adult Orientated Mahjong Connoisseur

    Joined:
    Feb 21, 2007
    Messages:
    5,472
    Likes Received:
    16
    Apart from not reading any posts, wasn't the whole point of your thread to compare the 68000 and 6809 rather than the 65816?

    CPU tests are all well and good, but as stated before, certain tests highlight the strengths of certain CPUs and the weaknesses of others: for example, 32-bit math is a 68000 strength, whereas byte calculations are something the 6809 would do faster.

    However, unless you demonstrate a number of real-life code tests, any results are going to be a waste of time. As stated, in the test you did the 6809 and 6502-based processors could only handle a maximum of 128 nybbles using the code you provided (unless you are going to have >128 unrolled pieces of code...), whereas the 68000 could easily handle more than that.

    The code also depends on the coder. Comparing exact like for like is a waste of time, as the 68000 and Z80 have lots of registers and the 6502, 6809 and 65816 are more memory based, so simple algorithms make one CPU look better than another. The same goes for just using the algorithm rather than showing a complete working program.

    Brute force tests are pretty to look at but precalculating tables and the like does mean that some tests like multiplication can be done much much quicker.
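As one illustration of the table idea above: the quarter-square trick is a well-known way to multiply on CPUs without a multiply instruction. This is a sketch of that general technique, not necessarily what the poster had in mind, and the names are made up for the example.

```python
# Quarter-square multiplication: a*b = q(a+b) - q(|a-b|), with q(n) = n*n // 4.
# The floors cancel exactly because a+b and a-b always have the same parity.
# Assumption: 8-bit unsigned operands, so a+b never exceeds 510.
QUARTER_SQUARES = [n * n // 4 for n in range(511)]


def mul8(a, b):
    """Multiply two 8-bit values using two table lookups and a subtract."""
    return QUARTER_SQUARES[a + b] - QUARTER_SQUARES[abs(a - b)]
```

On real hardware the table costs memory (511 16-bit entries here), which is the usual trade against a shift-and-add loop.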

    Although you might have shown that the 6809, 65816 and 6502 are faster than a 68000 in one specific test, that ignores the fact that in real life the 6809 and 6502 were clocked slower than even the Z80. Taking the Commodore 64 (1 MHz 6502), Sinclair Spectrum (3.5 MHz Z80A) and Dragon 32 / Tandy CoCo (890 kHz 6809), for example, the Z80 will still execute the code faster despite taking more cycles. Taking like for like, the Atari ST (68000) would outperform the Apple IIGS (65816), if only because it was running at about 2.5 times the clock speed (8 MHz vs 2.8 MHz).
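The cycles-versus-clock point can be put into numbers with a small sketch. The clock speeds are the ones quoted in the post; the cycle counts in any call are placeholders you would have to measure yourself, and the function names are hypothetical.

```python
# Wall-clock time of a routine = cycles / clock frequency.
def elapsed_us(cycles, clock_hz):
    """Time in microseconds for a routine taking `cycles` at `clock_hz`."""
    return cycles * 1_000_000 / clock_hz


def break_even_cycles(cycles_a, clock_a_hz, clock_b_hz):
    """Cycles CPU B may spend and still tie CPU A's wall-clock time."""
    return cycles_a * clock_b_hz / clock_a_hz
```

For example, a routine that takes 10 cycles on a 1 MHz 6502 lasts 10 microseconds, so a 3.5 MHz Z80 can spend up to 35 cycles on the same job before it falls behind.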
     
  8. tomaitheous

    tomaitheous Spirited Member

    Joined:
    Jun 29, 2007
    Messages:
    100
    Likes Received:
    0
    Which the 6809, 65x, and '816 all excel at over the 68k for the most part.

    Only because the Z80 is running at over 3 times the clock speed, and even at that, the performance is still comparable to the 65x/6809 at 1 MHz. But I don't see how any of that is relevant to this thread. It's a discussion of overall cycles to cycles, not differently clocked processors.

    Also, I posted some real world scenario code. I have more as well. I'm writing my game engine in both 68k and 65x, with each code optimized for the strengths of the processors.


    Also, Darkangel - you have errors in the 65x and '816 code examples.
     
  9. darkangel

    darkangel Guest

    what errors do I have?

    I believe I used an unnecessary AND.W $FF00,d2 in that last 68000 code that I don't even need there.

    and I am using real world examples. That byte to nibbles thing helps make compression and software tricks like scaling and rotation easier to do. The second was a type of tile to sprite collision.

    BTW, Tom's game engine sounds interesting.
     
    Last edited by a moderator: Mar 9, 2009
  10. tomaitheous

    tomaitheous Spirited Member

    Joined:
    Jun 29, 2007
    Messages:
    100
    Likes Received:
    0
    For the 65x, there is no lda [zp],x. There's lda [zp,x] or [zp],y. It doesn't even have a straight [zp] like the 65816 until you get into the 'C02 and later processors.

    Second, you don't want to ROL, as it rotates the carry flag into the register on the 65x family. It's a 9-bit rotate system. The reason for it is so that you can do signed arithmetic shifting on an 8-bit register ( cmp #$80, rol a, etc ) and rotate values larger than 8 bits (like 16, 24, 32 bits and larger). So you'd want to use a shift instruction (ASL) instead.
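To make the ROL versus ASL difference concrete, here is a small Python model of the two 8-bit operations. It is only a behavioural sketch (no flags other than carry are modelled), and the helper names are made up for the example.

```python
# Behavioural model of the 65x ASL and ROL on an 8-bit accumulator.
def asl(a):
    """Shift left: bit 7 goes to carry, a zero enters bit 0."""
    return (a << 1) & 0xFF, (a >> 7) & 1


def rol(a, carry_in):
    """Rotate left through carry: old carry enters bit 0, bit 7 goes to carry."""
    return ((a << 1) & 0xFF) | (carry_in & 1), (a >> 7) & 1


def shift_left_16(lo, hi):
    """16-bit shift built the usual way: ASL the low byte, ROL the high byte."""
    lo, carry = asl(lo)
    hi, carry = rol(hi, carry)
    return lo, hi, carry


def four_rols(a, carry):
    """What happens if ROL is used to move a nybble up with a stale carry."""
    for _ in range(4):
        a, carry = rol(a, carry)
    return a
```

With the carry left set from earlier code, four ROLs on $05 leave a stray bit in the low nybble, while four ASLs give a clean $50.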


    6502 version:

    45 cycles.

    The '816 and 65C02 versions would be 2 cycles faster because you can do this:

    For the 65816 version, I think you meant lda $0000,x in syntax. I assume you're treating the instruction as lda [x]. Nice optimization :) Oh, the BNE is 3 cycles if branch taken.
     
  11. darkangel

    darkangel Guest

    doesn't $00,x mean the address is x+#$00? and doesn't ($00),x mean the address is x+$00?

    I was using x and y like they were a0-a7 for the 65816 because I thought they made good address registers, but because they're only 8-bit on the 6502, I used $00 and $02 as address registers to have a wider memory range, and used x and y to increment the addresses without actually incrementing $00 and $02.
     
  12. tomaitheous

    tomaitheous Spirited Member

    Joined:
    Jun 29, 2007
    Messages:
    100
    Likes Received:
    0
    An assembler will most likely assemble opc $00,x as opc ZP,x. If DP bank register is pointing to $00 range, then I guess it's fine. [ZP],x doesn't exist as an address mode.
     
  13. darkangel

    darkangel Guest

    fixed


    Okay, wouldn't this work for the 65816?

    loop LDA $0000,x ;4
    INX ;2
    ASL a ;2
    ASL a ;2
    ASL a ;2
    ASL a ;2
    ORA $0000,x ;4
    INX ;2
    STA $0000,y ;4
    INY ;2
    DEC $00 ;5
    BNE loop ;3


    34 cycles
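A reference model of what the loop above computes, plus a sum of the cycle annotations exactly as posted. The model assumes an 8-bit accumulator and an even number of input nybbles; the cycle figures are the poster's own and are not checked against a datasheet here.

```python
# Pack pairs of nybbles (one per input byte) into single bytes,
# mirroring the LDA / ASL x4 / ORA / STA sequence in the post.
def pack_nybbles(src):
    """Return the packed bytes for an even-length sequence of nybbles."""
    out = bytearray()
    for i in range(0, len(src), 2):
        a = src[i]                 # LDA $0000,x
        for _ in range(4):         # four ASLs on an 8-bit accumulator
            a = (a << 1) & 0xFF
        a |= src[i + 1]            # ORA $0000,x
        out.append(a)              # STA $0000,y
    return bytes(out)


# Per-instruction cycle counts as annotated in the post, in program order.
POSTED_CYCLES = [4, 2, 2, 2, 2, 2, 4, 2, 4, 2, 5, 3]
CYCLES_PER_ITERATION = sum(POSTED_CYCLES)
```

Assembling the routine and stepping it in an emulator is still the only way to confirm the real count.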
     
    Last edited by a moderator: Mar 10, 2009
  14. MottZilla

    MottZilla Champion of the Forum

    Joined:
    Feb 1, 2006
    Messages:
    5,066
    Likes Received:
    102
    Why don't you assemble it and try rather than ask?
     
  15. darkangel

    darkangel Guest

    That was a rhetorical question.
     
  16. Piglet

    Piglet Spirited Member

    Joined:
    May 28, 2008
    Messages:
    175
    Likes Received:
    0
    68000 always used an even number of cycles (2,4,6,8 and so on) for each instruction. On the Amiga it was rounded up to 4 cycle multiples (4,8,12,16,20).
    It's important on those old machines just what instructions you got. I grew up thinking that the 6510 (1 MHz) in the C64 was more powerful than the 8080 in the ZX Spectrum because the 8080 seemed to use many, many cycles. Thing is, the numbers were CPU cycles that ran at 4x the bus speed, so it might take 12 cycles, but really it was only 3 cycles at 3.58 MHz.
    When I did a Master System game I became very impressed. I wrote a sound system with an editor on PC, and the game engine used by another coder called Martin Gibbons for his first game ever, Asterix and the Great Rescue. He and the artist (and the musician) did a superb job. I wrote a groovy sound driver so it sounded great as well. I think I am credited, so someone download and look for Sean.

    There was a hidden level. When you lost all your lives, the game went to a screen with ground and a tree on it as well as 2 signs. One was <-End Game, the other Continue ->. I think you went left a couple of paces, jumped 3 times and walked off to the right. The hidden level was a roller coaster with loads and loads of coins on it. Very pretty.
    Back to the plot. The 8080 begat the Z80, and the Z80 at 3.58 MHz was powerful for an 8-bit CPU. It had paired registers (so 16-bit maths), global base registers (IX & IY) and many, many other things that let it do a lot in few instructions.
    6502 did most instructions in 2 or 3 cycles, but it only had A, X & Y, so 8-bit maths. The only good thing that they SHOULD have added to the Z80 was a zero-page. Sega used a neat trick on the Megadrive: the 64K of work-RAM was at $ff0000. The top 8 address bits were set to 0, so if you pointed one of your address registers to the middle (32K in) of the work-RAM, then you could use short-word addressing for all work-RAM addressing.
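The address arithmetic behind that Megadrive trick can be sketched as follows. It relies on two 68000 facts: 16-bit absolute addresses and 16-bit displacements are sign-extended, and only 24 address bits reach the bus. The function names are hypothetical, and this is one reading of the post rather than Sega's documented method.

```python
# How 16-bit addressing can reach work-RAM at $FF0000-$FFFFFF on a 68000.
ADDRESS_MASK = 0xFFFFFF  # only 24 address bits leave the chip


def sign_extend16(value):
    """Interpret a 16-bit value as signed, as the 68000 does."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def abs_short(addr16):
    """Effective address of an absolute short (xxx).W operand."""
    return sign_extend16(addr16) & ADDRESS_MASK


def base_displacement(base, disp16):
    """Effective address of d16(An): base register plus signed displacement."""
    return (base + sign_extend16(disp16)) & ADDRESS_MASK
```

Absolute short addressing alone reaches only the top 32K ($FF8000 and up), while an address register parked at $FF8000 covers the whole 64K with displacements from -32768 to +32767.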
     
  17. tomaitheous

    tomaitheous Spirited Member

    Joined:
    Jun 29, 2007
    Messages:
    100
    Likes Received:
    0
    Yeah, but 3 cycles @ 1120ns or 12 cycles @ 280ns is still the same thing. It's not 3 cycles @ 280ns (3 cycles of 3.58 MHz). I thought the Speccy used a Z80. It has faster cycle times than the 8080: it's not fixed at a constant 4 cycles, but uses T-state cycles instead. Or did the Speccy have wait states on the bus?
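The arithmetic in that reply is easy to check. A minimal sketch, using the figures from the post (function names are just for illustration):

```python
# Duration of an instruction = number of cycles * length of one cycle.
def duration_ns(cycles, period_ns):
    """Total time in nanoseconds for `cycles` cycles of `period_ns` each."""
    return cycles * period_ns


def period_ns(clock_hz):
    """Length of one clock cycle in nanoseconds."""
    return 1e9 / clock_hz
```

Three cycles of 1120 ns and twelve cycles of 280 ns both come to 3360 ns, whereas three cycles of 280 ns would be only 840 ns. One cycle at 3.58 MHz is roughly 279 ns, which is where the 280 ns figure comes from.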
     
  18. Jamtex

    Jamtex Adult Orientated Mahjong Connoisseur

    Joined:
    Feb 21, 2007
    Messages:
    5,472
    Likes Received:
    16
    To show how pointless this is, my page-aligned Z80 code runs in 24 cycles on an eZ80. A CISC CPU running as fast as the 'RISC' 6502... we'll ignore the fact that it runs many, many times faster than the 68000 ever did.
     
  19. darkangel

    darkangel Guest

    Why is it that whenever I find old documents on the internet about speed comparisons of old CPUs (such as articles that were copied from tech magazines a gazillion years ago), the people who are benchmarking the CPUs always blatantly suck at all accumulator-based CPUs?

    for instance this page: http://www.amigau.com/68k/dg/dg12.htm
    and this: http://www.amigau.com/68k/dg/dg01.htm

    He focuses on 32-bit addition way too much, and he relies on the "load", "then add", "then store" method of adding.
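For reference, this is what a 32-bit add looks like when done one byte at a time through an accumulator, which is the load / add-with-carry / store pattern being complained about. It is a sketch with made-up names, using little-endian byte order as an assumption.

```python
# 32-bit addition done bytewise with a carry, as an 8-bit accumulator CPU would.
def add32_bytewise(a_bytes, b_bytes):
    """Add two little-endian 4-byte values; return (result_bytes, carry_out)."""
    carry = 0
    result = []
    for a, b in zip(a_bytes, b_bytes):
        total = a + b + carry      # load, then add with carry
        result.append(total & 0xFF)  # store the low 8 bits
        carry = total >> 8         # carry into the next byte
    return result, carry


def to_le_bytes(value):
    """Split a 32-bit value into four little-endian bytes."""
    return list((value & 0xFFFFFFFF).to_bytes(4, "little"))
```

That is four load/add/store rounds on an 8-bit machine against a single ADD.L on the 68000, so a benchmark built around it favours the 68000 by construction.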
     
    Last edited by a moderator: Mar 15, 2009
  20. Twimfy

    Twimfy Site Supporter 2015

    Joined:
    Apr 10, 2006
    Messages:
    3,570
    Likes Received:
    32
    Because in HIS OPINION that might have been the best way to do it.
     