Just wondering which would be faster for 2D or 2.5D? My guess would be the quad, as it would use the GTE for scaling; I'm not sure if there's hardware sprite acceleration. Anyone know, or can take a guess? Here's a function to draw a 3D sprite:

    void draw_3dsprite(GsSPRITE *sprite, long x, long y, long z, short precision)
    {
        VECTOR transformedPosition, position = {x, y, z};

        /* rotate by the 3x3 part of the world-screen matrix, then translate */
        ApplyMatrixLV(&GsWSMATRIX, &position, &transformedPosition);
        transformedPosition.vz += GsWSMATRIX.t[2];

        if (transformedPosition.vz > 0)  /* only draw in front of the camera */
        {
            transformedPosition.vx += GsWSMATRIX.t[0];
            transformedPosition.vy += GsWSMATRIX.t[1];

            /* 16777216 = 4096*4096, so scale is ONE (1:1) at vz = 4096 */
            sprite->scalex = sprite->scaley = 16777216 / transformedPosition.vz;

            /* perspective divide; 250 is the projection distance */
            sprite->x = (transformedPosition.vx * 250) / transformedPosition.vz;
            sprite->y = (transformedPosition.vy * 250) / transformedPosition.vz;

            GsSortSprite(sprite, &OTable_Header[CurrentBuffer],
                         transformedPosition.vz >> (precision - 2));
        }
    }

Substitute the "250" for whatever you've set your projection distance to.
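For context, a minimal sketch of how it might be called each frame; the sprite variable, coordinates and precision value here are made up for illustration:

    /* hypothetical: place a billboarded pickup at world position (0, -512, 2048) */
    draw_3dsprite(&pickupSprite, 0, -512, 2048, 14);

The precision parameter just controls how far vz gets shifted down to fit your OT depth, so it depends on how many OT levels you allocate.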
True, I forgot about the texture distortion issue. There's auto-subdivision support in the GS libs (i.e. GsDOBJ2) to avoid this for close-up models though. I'm just wondering how they did it for Die Hard Trilogy (game 2, the on-rails FPS). I know the characters are flat, I just can't tell if they are textured polys or sprite skeletons. The characters' heads are billboarded (always facing the camera); the rest, I'm not sure:
No$PSX lets you open a VRAM viewer as you play, so you can see what's going on in a single frame (it draws a red line for each poly, tri or quad) and work it out from there, I suppose.
PSX GPU sprites can't be rotated around any axis, or in fact scaled. If you tell libgs to scale a sprite, then it draws it as a quad anyway. If you want the fastest drawing then you should be using libgpu. I'm not sure your picture is particularly relevant; the PSX cannot do perspective-correct texturing, as the GPU only works in 2D. The best you can do is subdivide large polygons down into lots of smaller ones to make the errors less noticeable. libgs can do this for you, which is another reason why it's slower than libgpu. Your sprite example does use the GTE (ApplyMatrixLV); the GPU doesn't access the GTE when it's rendering.
In non-interlaced mode sprites are always faster, precisely because they can't be scaled or rotated in the slightest (it's a straightforward blit to the framebuffer). As pointed out above, the only way to perform scaling is by using either quads or tris, but those are limited in UV range (you can only safely map UVs as 1-254 values; 0-255 can create glitches when scaled, while sprites can use the full 0-256 and above for tiling). As for libgs being slow, it is indeed, and most games use custom low-level implementations for that specific reason. Nobody should ever bother with that sluggish code; it has so much customization that it actually hurts performance in multiple ways.
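For reference, here's a minimal sketch of what the raw libgpu path looks like for a fixed-size sprite; the coordinates, tpage and CLUT values are placeholders and the OT is assumed to be set up already:

    #include <libgpu.h>

    SPRT sprt;       /* fixed-size sprite primitive: plain blit, no scale/rotate */
    DR_TPAGE tpri;   /* SPRT carries no tpage of its own, so set one separately */

    setSprt(&sprt);
    setXY0(&sprt, 32, 32);            /* screen position */
    setUV0(&sprt, 0, 0);              /* texture offset within the tpage */
    setWH(&sprt, 64, 64);             /* size in pixels */
    setRGB0(&sprt, 128, 128, 128);    /* neutral brightness */
    setClut(&sprt, clutX, clutY);     /* hypothetical CLUT VRAM coordinates */

    setDrawTPage(&tpri, 0, 0, getTPage(0, 0, tpageX, tpageY));  /* hypothetical */

    AddPrim(ot + otz, &sprt);
    AddPrim(ot + otz, &tpri);   /* linked after sprt, so it executes first */

That's the whole cost per sprite; no attribute checks, no conversion to a quad behind your back.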
Thanks for the info guys! @PixelButts thanks for the tip, that works well. @smf @Gemini Cool, I didn't know 2D scaled/rotated sprites get converted to textured quads. GsSortFastSprite renders sprites without scaling/rotating as well. Well, I have a Net Yaroze, which uses libgs; I have it set up, I know it and I can debug it etc, so I'm not looking at Psy-Q, PSXSDK etc. Thanks again.
My only wish is that the VRAM viewer were able to do dumps during emulation. It would save so much time, since everything (palette, location, ID etc) is basically worked out for you.
IIRC GsSortFastSprite is pretty much a stripped version of GsSortSprite, just without the checks on attributes used to tell whether a sprite or a polygon is required. So yeah, it's a lot faster, but still bloated like hell, and it does weird stuff with sort order (AddPrim is LIFO, but libgs works as FIFO). It doesn't really matter what you're using; most transformation effects are usually handled with extremely low-level commands (i.e. inlined assembly). libgs keeps that away from the user, but you're still free to use whatever you need, on any toolchain.
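To make the LIFO point concrete, a tiny sketch (prim names are hypothetical):

    /* Two packets at the same OT entry: AddPrim links at the head of the
       list, so the GPU executes them in reverse order of insertion (LIFO). */
    AddPrim(ot + otz, &polyA);   /* added first  -> drawn second (on top)     */
    AddPrim(ot + otz, &polyB);   /* added second -> drawn first  (underneath) */

libgs presents the opposite (FIFO) order to the user, which is part of the extra bookkeeping it does on top of AddPrim.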
True, I'm sure the no$psx dev gets asked that a lot. The painter's algorithm is FIFO, which makes sense as the OT is FIFO. But I haven't used more than a few OT levels, so I don't know 100%. I thought PSXSDK was only 2D; I just noticed it has some GS support. The Yaroze can only sort (add to the OT) TMDs, but that source adds primitives! Also, the Net Yaroze libGS (by design) doesn't support 3D lines or double-sided polys. So I might try that also.
The OT is actually supposed to be LIFO; it's a linked list that sends packets in reverse order. Technically you can even reverse the Z order, depending on how the OT is cleared. Most of those sort methods are the same even on Psy-Q, but the overall idea behind the linking doesn't make much sense: it tends to allocate packets that need to be repopulated quite often, unless you're using specific functions that simply regenerate screen-space coordinates, avoiding 90% of static attributes such as clut, tpage, and UV coordinates. I don't remember if libgs does smart writes of attributes, but that is also another bottleneck with other people's code. RAM is slow as hell, and you need it for storing primitives anyway, so it's better to avoid writing as much as possible. If you check out some of the official documentation, or no$psx's, you can definitely come up with custom solutions that work with no extra dependencies. That means you can manage, populate, and sort whatever primitive type you like, no matter what libgs forces you to use (which is crap anyway).
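On the "depending on how the OT is cleared" point, a minimal sketch of the usual libgpu setup; OTLEN is an arbitrary example size:

    #include <libgpu.h>

    #define OTLEN 4096
    u_long ot[OTLEN];

    ClearOTagR(ot, OTLEN);        /* reverse-link entries: higher index = farther */
    /* ... AddPrim(ot + otz, &prim) for every packet ... */
    DrawOTag(ot + OTLEN - 1);     /* start execution from the far end */

Using ClearOTag instead, and drawing from ot, gives you the opposite Z direction; that's what the "reverse the Z order" remark is about.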
Nocash PSXSPX Playstation Specifications

I'm sure it wouldn't be a big deal for those with MIPS assembly experience, but yeah, it's over my head :/ It's a shame really... there are hundreds of PSX emulators, yet only one open-source PSX SDK, and it only does 2D.
libgpu and libgte give you a C interface to the hardware; you should have those already. They are quite low level though, because you need that for performance. You have to design your code around the limitations of the hardware to get decent results; if you start with an idea of how to write a modern PC game and fit that onto a PS1, then it will be very slow. Roll boss rush (I think) appears to suffer from that a bit, made worse because he seems to do all his development on an emulator, and they generally run code quicker than consoles (because it's hard to make them run at the same speed). libgs was made so you could hire a developer with no experience and put out a sub-par game (AKA shovelware), with Sony making some money on the deal when unsuspecting people buy it. With slow memory, no data cache and a small write FIFO, you really need to think about every load and store to memory. This includes trying to keep as much code in the instruction cache as you can. If you look at a well-written PS1 game, then you may not even recognise it as a way to write a game.
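To show what that low-level path looks like, a minimal sketch of transforming and sorting one textured quad with libgte/libgpu; the vertex data, matrix, packet and OT variables are assumed to exist already:

    #include <libgte.h>
    #include <libgpu.h>

    /* assumed: SVECTOR v[4] (model-space corners), MATRIX m (rotation+translation),
       POLY_FT4 *quad (initialised with setPolyFT4, UVs, tpage, clut),
       u_long ot[OTLEN] already cleared for this frame */
    long otz, p, flag;

    SetRotMatrix(&m);
    SetTransMatrix(&m);

    /* the GTE transforms all four vertices, writing screen XY straight
       into the packet; the return value is a depth for OT sorting */
    otz = RotTransPers4(&v[0], &v[1], &v[2], &v[3],
                        (long *)&quad->x0, (long *)&quad->x1,
                        (long *)&quad->x2, (long *)&quad->x3,
                        &p, &flag);

    if (otz > 0 && otz < OTLEN)
        AddPrim(ot + otz, quad);

No structure traversal, no attribute checks: that's roughly the whole per-quad cost, which is why hand-rolled libgpu code beats libgs.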
I think the only emulator that can return decent results for tests is no$psx (Xebra is a slug, but at least you can force interlaced mode to work more or less like hardware with the GPU tweaks); the rest just ignore timing altogether. In some cases that's good: see pSX running games with heavy lag, like Sokaigi or Chrono Cross, at full speed. No wonder Resident Evil Survivor sucked so badly. That one uses LibGS for general purposes and LibHMD for the actual 3D handling; the lag is so real you can almost touch it around every corner. Speaking of slow memory, one thing has always puzzled me: the GTE macros abusing the stack (or supposedly the scratchpad) to store the GTE flag and otz values. Sony's macros always require a memory location to write those values to, which you then have to reload soon after for nclip and depth tests. If you use the mfc2 instruction you can retrieve the same values and keep them cached in registers. I'm not sure if that gives a massive boost in performance, but from my tests it works exactly the same and could possibly push a few more polys in the long run. I suspect at some point my bottleneck won't even be the GTE+CPU anymore, but GPU fillrate (mip-mapping may produce better results with environments).
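A sketch of the idea, assuming a GCC-style toolchain with inline asm; the register number and hazard nop are the usual R3000 details, but treat it as illustrative:

    /* read OTZ (GTE data register 7) straight into a CPU register,
       skipping the memory round-trip the stock macros do */
    static inline long read_otz(void)
    {
        long otz;
        __asm__ volatile(
            "mfc2 %0, $7\n\t"
            "nop"              /* mfc2 result needs one delay slot */
            : "=r"(otz));
        return otz;
    }

(The FLAG register lives on the control side of COP2, so reading that one needs cfc2 rather than mfc2.)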
I don't know if there is any reason why mfc2 would not work, although you should try it on all the different PS1 revisions if you're going off-piste. There are plenty of bugs in the SDK/BIOS/hardware if you don't follow the magic incantations that Sony gave game developers, which is one of the reasons I wouldn't recommend a third-party SDK (I would need to be satisfied it did all the BIOS patches the official SDK does, so that the GTE doesn't get upset by interrupts etc). The scratchpad is fast, so the difference between storing there and using mfc2 is probably minute. mfc2 will stall until the current GTE opcode is complete, so you probably want to make sure the GTE is idle; however, that means you can't saturate it as much. mtc2, on the other hand, is safe to use while the GTE is running, as long as you don't overwrite inputs that the current operation is using. For some operations you can set up the inputs before reading the result and starting the next opcode.
I should probably look into some of those troublesome games, like Tekken 2 and others known to use weird-ass optimizations, to see if anything similar has ever been attempted. mtc2 should theoretically work, and GCC should handle it correctly; no idea if the stall is anything you can actually notice, but at least it's not writing to and reading from the scratchpad more than it should. I guess I'll also try to run some stress tests on console with the profiler turned on.