What's screwed up about cpu speed comparison charts

Jamtex · Mar 8, 2009

One thing I noticed about the code is that you can only do 128 nybbles with a loop. The 68000 could do more than 128 nybbles (I'm sure it could do the entire RAM area of a computer without any changes), but the others would require more code.

Although it is a subroutine, it would be interesting to see how many cycles it would take if you wrote it as a stand-alone program with code to set the pointers.

I enclose some Z80 examples. The unrolled code would save about 13 cycles per iteration and would just require the DJNZ to be removed.

It would be interesting to see if there are any coders who can do TMS9900, NS16032, Z8000, RCA 1802 and CP1610 versions of the code...

darkangel · Mar 8, 2009

I just did a second CPU comparison chart, only this time I'm only comparing the 68000 and the 65816. I'm not afraid of comparing them side by side anymore, thanks to Tom's and Modzilla's posts.

This time I wrote an algorithm that tests which type of tile a given point on a large 2D playfield of tiles is overlapping. The point has 16-bit x and y coordinates in units of subpixels, the playfield is 256 by 256 tiles, and each tile is 256 by 256 subpixels. That way the top 8 bits of each coordinate say which tile the point is on, and the low 8 bits say where the point is within the tile.
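The coordinate split described above can be modelled in a few lines. This is only a sketch in Python, not the code from the attached chart: the names `tile_at` and `offset_in_tile` are made up, and the row-major layout of the map (one byte per tile, 256 bytes per row) is an assumption.

```python
TILE_BITS = 8      # low 8 bits of a coordinate: subpixel offset inside the tile
MAP_WIDTH = 256    # tiles per row, as stated in the post

def tile_at(tile_map, x, y):
    """Return the tile type under the 16-bit subpixel point (x, y)."""
    tile_x = (x >> TILE_BITS) & 0xFF   # top 8 bits of x: tile column
    tile_y = (y >> TILE_BITS) & 0xFF   # top 8 bits of y: tile row
    # Assumed layout: row-major, one byte per tile.
    return tile_map[tile_y * MAP_WIDTH + tile_x]

def offset_in_tile(x, y):
    """Return where the point sits inside its tile, in subpixels."""
    return x & 0xFF, y & 0xFF

# Hypothetical map: all zeroes except one tile of type 7 at column $34, row $12.
tile_map = bytearray(MAP_WIDTH * MAP_WIDTH)
tile_map[0x12 * MAP_WIDTH + 0x34] = 7
```

With that layout the byte offset into the map is simply the high byte of y followed by the high byte of x, which is why the lookup needs so little arithmetic on either CPU.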

This test gave some really scary results that would rip most people's knowledge of the two CPUs to shreds.

Twimfy · Mar 9, 2009

darkangel said:

This test gave some really scary results that I presume would rip most people's knowledge of the two CPUs to shreds.

:thumbsup:

darkangel · Mar 9, 2009

The 65816 is more than 3x as cycle efficient as the 68000 when running through that algorithm! Isn't that scary?

Calpis · Mar 9, 2009

Considering your algorithm, it's hardly scary. For a real-world comparison, how about trying CRC32 (using a LUT)?

darkangel · Mar 9, 2009

Well, sprite-to-tile collision uses points to check collision for holes, walls, floors, ramps, etc. This would be very useful in platformer and run 'n' gun videogames. Unless somebody has something better?

Jamtex · Mar 9, 2009

Apart from not reading any posts, wasn't the whole point of your thread to compare the 68000 and 6809 rather than the 65816?

CPU tests are all well and good, but as stated before, certain tests highlight the strengths of certain CPUs and the weaknesses of others: for example, 32-bit math, which is a 68000 strength, versus byte calculations, which the 6809 would do faster.

However, unless you demonstrate a number of real-life code tests, any results are going to be a waste of time. As stated, in the test you did the 6809 and 6502-based processors could only handle a maximum of 128 nybbles using the code you provided (unless you are going to have >128 unrolled pieces of code...), whereas the 68000 could easily handle more than that.

The code also depends on the coder. Comparing exact like for like is a waste of time, as the 68000 and Z80 have lots of registers while the 6502, 6809 and 65816 are more memory based, so simple algorithms make one CPU look better than another. The same goes for just using the algorithm rather than showing a complete working program.

Brute force tests are pretty to look at, but precalculating tables and the like does mean that some tests, like multiplication, can be done much, much quicker.

Although you might have shown that the 6809, 65816 and 6502 are faster than a 68000 in one specific test, it does seem to ignore the fact that in real life the 6809 and 6502 ran at lower clock speeds than even the Z80 (for example, the Commodore 64 (1 MHz 6502), Sinclair Spectrum (3.5 MHz Z80A) and Dragon 32 / Tandy CoCo (890 kHz 6809)), so the Z80 will still execute the code faster despite taking more cycles. Taking like for like, the Atari ST (68000) would outperform the Apple IIGS (65816), if mainly because it was running at about 2.9 times the clock speed (8 MHz vs 2.8 MHz).
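The cycles-versus-clock argument comes down to one division. A minimal Python sketch, using the clock speeds quoted in the post; the cycle counts are hypothetical placeholders and the function names are made up for illustration.

```python
def run_time_us(cycles, clock_hz):
    """Wall-clock time in microseconds for a routine of `cycles` clock cycles."""
    return cycles * 1_000_000 / clock_hz

def break_even_ratio(fast_clock_hz, slow_clock_hz):
    """How many times fewer cycles the slower-clocked CPU needs just to tie."""
    return fast_clock_hz / slow_clock_hz

ST_CLOCK = 8_000_000     # Atari ST 68000, as quoted in the post
GS_CLOCK = 2_800_000     # Apple IIGS 65816, as quoted in the post

# Hypothetical routine: 100 cycles on the 68000.
st_time = run_time_us(100, ST_CLOCK)

# The same job in 40 cycles (2.5x fewer) still loses on wall-clock time,
# while 30 cycles (about 3.3x fewer) wins.
gs_time_40 = run_time_us(40, GS_CLOCK)
gs_time_30 = run_time_us(30, GS_CLOCK)
```

So a per-cycle advantage only turns into a real-time advantage once it exceeds the clock ratio, which for these two machines is 8 / 2.8, or roughly 2.86.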

tomaitheous · Mar 9, 2009

Jamtex said:

Brute force tests are pretty to look at, but precalculating tables and the like does mean that some tests, like multiplication, can be done much, much quicker.

Which the 6809, 65x, and '816 all excel at over the 68k for the most part.

Jamtex said:

Although you might have shown that the 6809, 65816 and 6502 are faster than a 68000 in one specific test, it does seem to ignore the fact that in real life the 6809 and 6502 ran at lower clock speeds than even the Z80 (for example, the Commodore 64 (1 MHz 6502), Sinclair Spectrum (3.5 MHz Z80A) and Dragon 32 / Tandy CoCo (890 kHz 6809)), so the Z80 will still execute the code faster despite taking more cycles. Taking like for like, the Atari ST (68000) would outperform the Apple IIGS (65816), if mainly because it was running at about 2.9 times the clock speed (8 MHz vs 2.8 MHz).

Only because the Z80 is running at over 3 times the clock speed, and even at that, the performance is still comparable to the 65x/6809 at 1 MHz. But I don't see how any of that is relevant to this thread. It's a discussion of cycles versus cycles, not differently clocked processors.

Also, I posted some real world scenario code. I have more as well. I'm writing my game engine in both 68k and 65x, with each code optimized for the strengths of the processors.

Also, Darkangel - you have errors in the 65x and '816 code examples.

darkangel · Mar 9, 2009

What errors do I have?

I believe I used an unnecessary AND.W #$FF00,d2 in that last 68000 code.

And I am using real-world examples. That byte-to-nibbles thing helps make compression and software tricks like scaling and rotation easier to do. The second was a type of tile-to-sprite collision.

BTW, Tom's game engine sounds interesting.

tomaitheous · Mar 9, 2009

darkangel said:

What errors do I have?

For the 65x, there is no lda [zp],x. There's lda [zp,x] or [zp],y. It doesn't even have a straight [zp] like the 65816 until you get into the 'C02 and later processors.

Second, you don't want to use ROL, because on the 65x family it rotates the carry flag into the register. It's a 9-bit rotate. The reason for that is so you can do signed arithmetic shifting on an 8-bit register ( cmp #$80, rol a, etc ) and rotate values larger than 8 bits (16, 24, 32 bits and larger). So you'd want to use a shift instruction (ASL) instead.
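The difference between the two instructions is easy to see in a small model. This Python sketch mimics ASL and ROL on an 8-bit accumulator plus a carry flag; the helper names are made up, and it models only the result and carry, not the other status flags.

```python
def asl(a):
    """ASL A: shift left, bit 0 becomes 0, old bit 7 goes to carry."""
    return (a << 1) & 0xFF, (a >> 7) & 1

def rol(a, carry):
    """ROL A: shift left, old carry enters bit 0, old bit 7 goes to carry."""
    return ((a << 1) & 0xFF) | carry, (a >> 7) & 1

def four_asl(a):
    """Four ASLs in a row, as in the nibble-packing loop."""
    carry = 0
    for _ in range(4):
        a, carry = asl(a)
    return a

def four_rol(a, carry):
    """Four ROLs in a row, starting from whatever carry was left behind."""
    for _ in range(4):
        a, carry = rol(a, carry)
    return a
```

With a stale carry of 1, `four_rol(0x1A, 1)` gives `0xA8` where `four_asl(0x1A)` gives `0xA0`. Even with the carry clear, any set bits in the top nibble wrap back into the bottom through the carry, so `four_rol(0xFA, 0)` gives `0xA7`.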

6502 version:

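    ldx #$00            ;   X = 0 (clx is a HuC6280 opcode, not plain 6502)
.loop
    lda [$00],y         ;6  first source byte, via the pointer at $00/$01
    asl a               ;2  four shifts move its low nibble to the high nibble
    asl a               ;2
    asl a               ;2
    asl a               ;2
    iny                 ;2
    ora [$00],y         ;6  merge in the second source byte
    iny                 ;2
    sta [$02,x]         ;6  store via the pointer at $02/$03 (X = 0)
    inc $02             ;5  advance the destination pointer, low byte
    beq .msb            ;2  low byte wrapped: go bump the high byte
.dec_cntr
    dec <$04            ;5  loop counter in zero page
    bne .loop           ;3
    rts                 ;   done; keeps the loop from falling through into .msb

.msb
    inc $02+1           ;   carry into the destination pointer's high byte
    jmp .dec_cntr       ;   bra would need a 65C02 or later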

45 cycles.
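As a rough model of what the loop above computes (not a translation of the assembly), here is a minimal Python sketch. It assumes each source byte carries one nibble in its low four bits; `pack_nibbles` is a made-up name.

```python
def pack_nibbles(src):
    """Pack pairs of source bytes into one byte each: first << 4, OR second."""
    if len(src) % 2:
        raise ValueError("source length must be even")
    out = bytearray()
    for i in range(0, len(src), 2):
        high = (src[i] << 4) & 0xFF      # the four ASL A instructions
        out.append(high | src[i + 1])    # the ORA with the next source byte
    return bytes(out)

# Y is 8 bits wide and advances twice per pass, so one pointer setup
# covers at most 256 source bytes, i.e. 128 passes of the loop.
MAX_PASSES_PER_POINTER = 256 // 2
```

For example, `pack_nibbles(bytes([0x1, 0x2, 0xA, 0xB]))` returns `bytes([0x12, 0xAB])`. The Y limit in the last line is the same 128 ceiling raised at the top of the thread.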

The '816 and 65C02 versions would be 2 cycles faster because you can do this:

.loop
    lda [$00],y         ;6
    tax                 ;2
    lda shift_left,x    ;4
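The `shift_left` table in that snippet is not shown in the thread. A plausible reading, sketched here in Python as an assumption, is a 256-entry table where entry n holds n shifted left by four bits, so one indexed load replaces the four ASLs.

```python
# Assumed contents of the shift_left table: entry n = (n << 4) truncated to 8 bits.
SHIFT_LEFT = bytes((n << 4) & 0xFF for n in range(256))

def shift_left_4(value):
    """Table lookup standing in for `tax` followed by `lda shift_left,x`."""
    return SHIFT_LEFT[value]

# Cycle figures as annotated in the listings: tax (2) + indexed load (4)
# against four ASL A at 2 cycles each.
LOOKUP_CYCLES = 2 + 4
SHIFT_CYCLES = 4 * 2
```

That accounts for the 2 cycles saved per pass, at the cost of 256 bytes of table and the use of X as the index.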

For the 65816 version, I think you meant lda $0000,x in syntax. I assume you're treating the instruction as lda [x]. Nice optimization. Oh, and the BNE is 3 cycles if the branch is taken.

darkangel · Mar 10, 2009

Doesn't $00,x mean the address is x+#$00? And doesn't ($00),x mean the address is x+$00?

I was using x and y like they were a0-a7 for the 65816 because I thought they made good address registers, but because they're only 8-bit on the 6502, I used $00 and $02 as address registers to have a wider memory range, and used x and y to increment the addresses without actually incrementing $00 and $02.

tomaitheous · Mar 10, 2009

darkangel said:

Doesn't $00,x mean the address is x+#$00? And doesn't ($00),x mean the address is x+$00?

An assembler will most likely assemble opc $00,x as opc ZP,x. If the direct page register is pointing to the $00 range, then I guess it's fine. [ZP],x doesn't exist as an addressing mode.

darkangel · Mar 10, 2009

Doesn't $00,x mean the address is x+#$00? And doesn't ($00),y mean the address is y+$00?

Fixed.

Okay, wouldn't this work for the 65816?

loop    LDA $0000,x     ;4
        INX             ;2
        ASL a           ;2
        ASL a           ;2
        ASL a           ;2
        ASL a           ;2
        ORA $0000,x     ;4
        INX             ;2
        STA $0000,y     ;4
        INY             ;2
        DEC $00         ;5
        BNE loop        ;3

34 cycles
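The per-pass totals quoted for the two loops can be checked by adding up the cycle annotations as posted. A small Python tally follows; the figures are copied from the listings' comments, not independently measured, and the one-off setup and the `.msb` path are left out.

```python
# Cycle annotations from the 6502 listing, in order.
LOOP_6502 = [
    ("lda [$00],y", 6), ("asl a", 2), ("asl a", 2), ("asl a", 2), ("asl a", 2),
    ("iny", 2), ("ora [$00],y", 6), ("iny", 2), ("sta [$02,x]", 6),
    ("inc $02", 5), ("beq .msb", 2), ("dec <$04", 5), ("bne .loop", 3),
]

# Cycle annotations from the 65816 listing, in order.
LOOP_65816 = [
    ("LDA $0000,x", 4), ("INX", 2), ("ASL a", 2), ("ASL a", 2), ("ASL a", 2),
    ("ASL a", 2), ("ORA $0000,x", 4), ("INX", 2), ("STA $0000,y", 4),
    ("INY", 2), ("DEC $00", 5), ("BNE loop", 3),
]

def cycles_per_pass(listing):
    """Sum the annotated cycle counts for one pass through a loop."""
    return sum(cycles for _, cycles in listing)

def total_cycles(listing, passes):
    """Annotated cycles for a given number of passes."""
    return cycles_per_pass(listing) * passes
```

Both posted totals add up: 45 for the 6502 loop and 34 for the 65816 loop, a difference of 11 cycles per pass.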

MottZilla · Mar 10, 2009

Why don't you assemble it and try rather than ask?

darkangel · Mar 10, 2009

That was a rhetorical question.

Piglet · Mar 12, 2009

68000 always used an even number of cycles (4, 6, 8 and so on) for each instruction. On the Amiga it was rounded up to 4-cycle multiples (4, 8, 12, 16, 20).
It's important on those old machines just which instructions you had. I grew up thinking that the 6510 (1 MHz) in the C64 was more powerful than the 8080 in the ZX Spectrum because the 8080 seemed to use many, many cycles. Thing is, the numbers were CPU cycles that ran at 4x the bus speed, so it might take 12 cycles, but really it was only 3 cycles at 3.58 MHz.
When I did a Master System game I became very impressed. I wrote a sound system with an editor on PC, and the game engine used by another coder called Martin Gibbons for his first game ever, Asterix and the Great Rescue. He and the artist (and the musician) did a superb job. I wrote a groovy sound driver so it sounded great as well. I think I am credited, so someone download it and look for Sean.

There was a hidden level. When you lost all your lives, the game went to a screen with ground and a tree on it as well as 2 signs. One was <-End Game, the other Continue ->. I think you went left a couple of paces, jumped 3 times and walked off to the right. The hidden level was a roller coaster with loads and loads of coins on it. Very pretty.
Back to the plot. 8080 begat Z80, and Z80 at 3.58 MHz was powerful for an 8-bit CPU. It had paired registers (so 16-bit maths), global base registers (IX & IY) and many instructions that did a lot of work each.
6502 did most instructions in 2 or 3 cycles, but it only had A, X & Y, so 8-bit maths. The only good thing that they SHOULD have added to Z80 was a zero page. Sega used a neat trick on the Megadrive with the 64K of work RAM at $ff0000. The top 8 address bits were set to 0 so, if you pointed one of your address registers to the middle of the work RAM (32K in), then you could use short-word addressing for all work-RAM addressing.
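One way to read the work-RAM trick is as 68000 address-register-indirect addressing with a signed 16-bit displacement, d16(An). That reading is an assumption about what the post means; the arithmetic itself is sketched below in Python with made-up helper names.

```python
WORK_RAM_START = 0xFF0000   # 64K of work RAM, as stated in the post
WORK_RAM_END = 0xFFFFFF
BASE = 0xFF8000             # address register parked 32K into work RAM

def sign_extend_16(word):
    """Treat a 16-bit value as signed, the way the 68000 extends displacements."""
    word &= 0xFFFF
    return word - 0x10000 if word & 0x8000 else word

def effective_address(base, disp16):
    """d16(An): base plus signed 16-bit displacement, on a 24-bit address bus."""
    return (base + sign_extend_16(disp16)) & 0xFFFFFF

lowest = effective_address(BASE, 0x8000)    # displacement -32768
highest = effective_address(BASE, 0x7FFF)   # displacement +32767
```

With the base in the middle, the signed displacement reaches 32K down and 32K up, so a single address register covers the whole 64K block with one-word offsets.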

tomaitheous · Mar 12, 2009

Piglet said:

It's important on those old machines just which instructions you had. I grew up thinking that the 6510 (1 MHz) in the C64 was more powerful than the 8080 in the ZX Spectrum because the 8080 seemed to use many, many cycles. Thing is, the numbers were CPU cycles that ran at 4x the bus speed, so it might take 12 cycles, but really it was only 3 cycles at 3.58 MHz.

Yeah, but 3 cycles @ 1120ns or 12 cycles @ 280ns is still the same thing. It's not 3 cycles @ 280ns (3 cycles of 3.58 MHz). I thought the Speccy used a Z80. It has faster cycle times than the 8080: it's not fixed at a constant 4 cycles, but uses T-state cycles instead. Or did the Speccy have wait states on the bus?
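The timing point is plain multiplication, and a few lines of Python make it explicit. The nanosecond figures are the ones used in the post; the function names are made up for illustration.

```python
def duration_ns(cycles, ns_per_cycle):
    """Total time for a number of cycles of a given length."""
    return cycles * ns_per_cycle

def period_ns(clock_hz):
    """Length of one clock period in nanoseconds."""
    return 1e9 / clock_hz

bus_view = duration_ns(3, 1120)     # 3 bus cycles of 1120 ns each
clock_view = duration_ns(12, 280)   # 12 clock cycles of 280 ns each
mistaken = duration_ns(3, 280)      # 3 cycles at the full clock rate
```

Both ways of counting give 3360 ns, while 3 cycles at the 3.58 MHz rate would be only 840 ns. One period at 3.58 MHz is about 279 ns, which the post rounds to 280.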

Jamtex · Mar 14, 2009

To show how pointless this is, my page-aligned Z80 code runs in 24 cycles on an eZ80. A CISC CPU running as fast as the 'RISC' 6502... we'll ignore the fact that it runs many, many times faster than the 68000 ever did.

darkangel · Mar 15, 2009

Why is it that whenever I find old documents on the internet about speed comparisons of old CPUs (such as articles that were copied from tech magazines a gazillion years ago), the people benchmarking the CPUs always blatantly suck at all accumulator-based CPUs?

for instance this page: http://www.amigau.com/68k/dg/dg12.htm
and this: http://www.amigau.com/68k/dg/dg01.htm

He focuses on 32-bit addition way too much, and he relies on the "load", "then add", "then store" method of adding.

Twimfy · Mar 15, 2009

darkangel said:

Why is it that whenever I find old documents on the internet about speed comparisons of old CPUs (such as articles that were copied from tech magazines a gazillion years ago), the people benchmarking the CPUs always blatantly suck at all accumulator-based CPUs?

for instance this page: http://www.amigau.com/68k/dg/dg12.htm
and this: http://www.amigau.com/68k/dg/dg1.htm

He focuses on 32-bit addition way too much, and he relies on the "load", "then add", "then store" method of adding.

Because in HIS OPINION that might have been the best way to do it.
