New coordinate system

After spending some time toying with the OpenGL coordinate system (OCS), I was able to make it match the Saturn’s. That’s really interesting, as up to now I was using the original OCS (ie with values between -1.0 and 1.0), which meant doing all sorts of conversions to turn a dot in the Saturn coordinate system (SCS) into its OpenGL equivalent. For instance, the dot [50,50] in the SCS had to be converted into something like [0.215, 0.1483] in the OCS. Now it’s one for one, so no more conversions are needed.
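To give an idea, the kind of per-dot conversion that is no longer needed looks roughly like the sketch below. It’s only an illustration: the `toNormalized` helper, the `Point2f` struct and the 320*224 area used in the usage note are made up for the example and aren’t taken from the emulator’s code. It assumes a top-left origin with y pointing down on the pixel side, and OpenGL’s default -1.0 to 1.0 range with y pointing up.

```cpp
#include <cassert>

// Hypothetical types and names, for illustration only.
struct Point2f {
    float x;
    float y;
};

// Maps a dot given in pixels (origin top-left, y pointing down) into
// OpenGL's default coordinate range [-1.0, 1.0] (y pointing up).
Point2f toNormalized(float px, float py, float width, float height) {
    assert(width > 0.0f && height > 0.0f);
    return { 2.0f * px / width - 1.0f,
             1.0f - 2.0f * py / height };
}
```

With these assumptions, `toNormalized(160, 112, 320, 224)` lands on the center of the screen, [0, 0]. Once both systems use the same range, this step simply disappears.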

I never dug into that aspect until now, as everything worked fine. But with the cache system I want to set up for the VDP2, it could have become a major drawback.

Fewer calculations automatically lead to more speed, but on the other hand the video card has to support the same viewport size as the Saturn’s, which is 2048*2048. I’ve sent a version to beta testers to check the maximum viewport supported by various graphics cards, and so far none has failed (even an old S3 from 2001 was able to do it …)
So I have decided to drop support for graphics cards that can’t handle a 2048*2048 viewport in the current version. If there’s a demand I’ll try to do something for older cards, but you’ll have to be convincing 😉

Yesterday I finished modifying the VDP1 to take into account the new coordinate system and everything went smoothly. I even added the VDP2 planes 🙂 Now the bios is almost back to the way it should look, minus the scrolling (which should be taken care of quickly).

The next step is to add the cache detection, as the cache is reloaded every frame right now. Once that’s done, I’ll have a better view of the cache’s performance …

VDP2 texturing

Back from holidays !
I’m making some progress on the VDP2 : 512*512 textures are now correctly filled with data.
Now I have to add it to the rendering engine (with the VDP1).
After that, depending on how good the perfs are, I’ll extend the code to the whole VDP2 (currently just the code used in the bios is changed)
Stay tuned, there’s more to come

VDP1 updated

I didn’t think it would be that hard to add this feature to my VDP1 rendering system …
Anyway, for those interested, the discussion regarding this matter started on this page.

Now for the good stuff : backgrounds aren’t plugged in as I haven’t finished my VDP2 cache yet, but the sprites are fully functional.

  • Previous rendering :

  • Current rendering :

bakubaku_ok1 bakubaku_ok2

Quite neat heh 🙂
Perspective correction won’t work in some particular cases (like non-trapezoid quads), but it’s marginal. I’m trying to get info on how to make it work in every possible quad configuration, but it’s getting really technical and mathematical, and I’m not that good at it :p

Now I’m moving back to my VDP2 cache 😉

And now for something a little different

I’ve always been worried about the way OpenGL renders the Saturn’s distorted polygons. As the Saturn doesn’t specify any Z coordinate (aka depth coordinate) when displaying a polygon, OpenGL has to approximate its value to apply a texture to it.
When the polygon is a regular quadrangle (ie a square, rectangle, etc.), the texture coordinates and the polygon coordinates are identical, so OpenGL’s texture mapping is correct. In the case of a distorted quadrangle, only half of the texture coordinates are identical to the polygon coordinates, and the texture seems to be mapped on the polygon as 2 different triangles. (OpenGL always splits quadrangles into 2 triangles, as modern graphics cards only work with triangles)

Maybe a graphical example will be better to grasp the idea :

  • original texture / texture coordinates (will be used to map the texture to the polygon)

  • texture coordinates + regular quadrangle coordinates (identical to the texture coordinates) = correct texture mapping on the quad

  • texture coordinates + distorted quadrangle coordinates (different from the texture coordinates) = incorrect texture mapping on the quad

So I did some research. I wasn’t really sure that this problem could be solved without using a software renderer, but I was wrong. Working in the texture projective space makes it possible to change the way OpenGL maps the texture coordinates to the quad coordinates, so distorted quads are rendered neatly (I won’t go into the details :p )
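For the curious, here is a minimal sketch of one commonly described way to get projective texture coordinates for a quad: intersect the two diagonals, measure the distance from each corner to the intersection, and derive a q value per corner from the ratio of opposite distances. This is not necessarily what the emulator does. `projectiveTexCoords`, `Vec2` and `TexCoord` are hypothetical names, and the quad is assumed to be convex with its corners given in order.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical types, for illustration only.
struct Vec2 { float x; float y; };
// s and t are stored already multiplied by q, ready for a 4-component texcoord.
struct TexCoord { float s; float t; float q; };

// Computes (u*q, v*q, q) for the 4 corners of a convex quad.
// p holds the corner positions in order, uv the matching texture coordinates.
std::array<TexCoord, 4> projectiveTexCoords(const std::array<Vec2, 4>& p,
                                            const std::array<Vec2, 4>& uv) {
    // Direction of the two diagonals: p0 -> p2 and p1 -> p3.
    const float ax = p[2].x - p[0].x, ay = p[2].y - p[0].y;
    const float bx = p[3].x - p[1].x, by = p[3].y - p[1].y;
    const float cross = ax * by - ay * bx;
    assert(std::fabs(cross) > 1e-6f);  // diagonals must not be parallel

    // Solve p0 + t * a = p1 + s * b for t to find the intersection point.
    const float cx = p[1].x - p[0].x, cy = p[1].y - p[0].y;
    const float t = (cx * by - cy * bx) / cross;
    const Vec2 center{ p[0].x + t * ax, p[0].y + t * ay };

    // Distance from each corner to the intersection of the diagonals.
    std::array<float, 4> d{};
    for (std::size_t i = 0; i < 4; ++i) {
        d[i] = std::hypot(p[i].x - center.x, p[i].y - center.y);
    }

    std::array<TexCoord, 4> out{};
    for (std::size_t i = 0; i < 4; ++i) {
        const float opposite = d[(i + 2) % 4];
        assert(opposite > 0.0f);  // a corner must not sit on the intersection
        const float q = (d[i] + opposite) / opposite;
        out[i] = { uv[i].x * q, uv[i].y * q, q };
    }
    return out;
}
```

In fixed-function OpenGL the resulting values could then be passed to `glTexCoord4f(s, t, 0.0f, q)` in place of the usual two-component call. For a square or a rectangle every corner gets q = 2, which gives the same result as the regular mapping.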

Here is a sample. I won’t use the same example as above as I haven’t implemented it in the VDP1 renderer yet, so the following screenshots were made with a test renderer in the emu. The left one is rendered the way it has been done so far, and the right one uses the above technique. Both use a 4*4 black and white checkerboard as texture.

qcoord_ok

Slowness, the sequel

Ok. After some more testing, I have to face it : my cache isn’t that good. When the cache is used at full capacity (ie nothing is read from the Saturn memory, everything is already in the vector and cells are just displayed to the framebuffer), I only get a 0.5 fps increase …

So I did some more thinking.
The cache is organized like this :

  • one map storing 8*8 pixel textures (one texture from the map can be used by one or more cells)
  • one vector storing cells (up to 4096 per page), each cell being linked to a texture in the map

Currently the cache detects when a cell has changed in the Saturn memory, and reloads it if necessary. So when the framebuffer is filled with vector data, each cell is displayed.
Here’s the catch : this method doesn’t use the graphics card memory to store texture data. So every cell displayed is loaded in memory, displayed, then discarded. That costs a lot performance-wise …

So I’ve decided to do it another way :

  • the map and vector contents will stay the same as before
  • display to the framebuffer won’t be done directly : instead, 512*512 pixel textures will be defined, filled with cell data, and stored in the graphics card memory. That way a whole page (4096 cells) will be cached at once, and reused at will.
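To make the idea more concrete, here is a rough sketch of filling a 512*512 page buffer with 8*8 cells on the CPU side, before it would be handed over to the graphics card. Everything in it is an assumption made for the example : the names (`makePage`, `blitCell`, `CellPixels`), the 32 bit pixel format, and the row-major placement of cells inside the page.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kCellSize = 8;                                // a cell is 8*8 pixels
constexpr std::size_t kPageSize = 512;                              // a page is 512*512 pixels
constexpr std::size_t kCellsPerRow = kPageSize / kCellSize;         // 64 cells per row
constexpr std::size_t kCellsPerPage = kCellsPerRow * kCellsPerRow;  // 4096 cells per page

// Assumed pixel format: one 32 bit value per pixel.
using CellPixels = std::array<std::uint32_t, kCellSize * kCellSize>;

// Creates an empty (all zero) page buffer.
std::vector<std::uint32_t> makePage() {
    return std::vector<std::uint32_t>(kPageSize * kPageSize, 0u);
}

// Copies one cell into the page, cells being placed row by row
// (hypothetical layout, the real one may differ).
void blitCell(std::vector<std::uint32_t>& page,
              std::size_t cellIndex,
              const CellPixels& cell) {
    assert(page.size() == kPageSize * kPageSize);
    assert(cellIndex < kCellsPerPage);
    const std::size_t originX = (cellIndex % kCellsPerRow) * kCellSize;
    const std::size_t originY = (cellIndex / kCellsPerRow) * kCellSize;
    for (std::size_t row = 0; row < kCellSize; ++row) {
        for (std::size_t col = 0; col < kCellSize; ++col) {
            page[(originY + row) * kPageSize + originX + col] =
                cell[row * kCellSize + col];
        }
    }
}
```

Once a page is complete, the buffer could be uploaded a single time (with `glTexImage2D` for instance) and the texture reused every frame until one of its cells changes.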

I need to be careful not to saturate the graphics card memory, but I expect a huge perf increase 🙂

And as a nice bonus, this way I can handle per-dot and per-cell priority without much effort 😉

Now that’s the theory. I hope that I won’t be disappointed by the results …

Slowness …

My cache isn’t crashing anymore, but there’s another problem now : speed is way too slow 🙁
At the end of the Sega Saturn logo assembly in the bios, where the VDP2 is used for the first time, speed drops to 2 fps … I’ve tried to profile the program to see where the bottleneck is, but it was of no use.
However, I suspect the cause is that I’m now filling the list with the whole VDP2 page (512*512 pixels) instead of just the display area as before (320*224 in that case). That means a lot of extra calculations. I’ll do some testing tonight to see if I’m right, and what can be done if that’s the case …
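To put numbers on that suspicion, counting the 8*8 cells in both areas gives a rough idea of the extra work. The `cellCount` helper below is a made-up name for the example, and it assumes both areas are aligned on cell boundaries.

```cpp
#include <cassert>

// Number of cells needed to cover an area, assuming square cells
// and an area aligned on cell boundaries (hypothetical helper).
int cellCount(int widthPx, int heightPx, int cellSizePx = 8) {
    assert(widthPx > 0 && heightPx > 0 && cellSizePx > 0);
    return (widthPx / cellSizePx) * (heightPx / cellSizePx);
}
```

`cellCount(512, 512)` gives 4096 cells for the whole page, against 1120 for `cellCount(320, 224)`, so about 3.7 times more cells to process per page.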

Stay tuned !

VDP2 problem found

Great news ! I think I’ve found out where my problem is …
Actually I was using one vector to store texture data, and another one to store the parts to be displayed. Each VDP2 background is split into smaller parts of 8*8 pixels, each of them having a texture linked to it.
But this link was a pointer to the texture data taken outside the vector instead of inside it, meaning that a texture value which was correct when the part was created was no longer correct when the part was displayed, leading to a crash as the pointers were invalid …

I’ve decided to use a map instead of a vector to store the list of textures, as the key to access data can be easily calculated (texture address + color mode). It needs a lot of modifications as the program wasn’t supposed to work that way, but I think that’s the right way, as now the texture will be referenced by its map key in the VDP2 part instead of a pointer to the texture …
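Here is a minimal sketch of what referencing textures by key instead of by pointer could look like. The names (`TextureCache`, `Part`, `makeTextureKey`) and the way the address and the color mode are packed into a single 64 bit key are assumptions made for the example, not the actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical types, for illustration only.
struct Texture {
    std::vector<std::uint32_t> pixels;
};

// A VDP2 part stores the key of its texture, never a pointer to it.
struct Part {
    int x;
    int y;
    std::uint64_t textureKey;
};

// Assumed packing: address in the low 32 bits, color mode just above.
std::uint64_t makeTextureKey(std::uint32_t address, std::uint8_t colorMode) {
    return (static_cast<std::uint64_t>(colorMode) << 32) | address;
}

class TextureCache {
public:
    // Stores the texture if the key is not present yet (an existing
    // entry is left untouched) and returns the key.
    std::uint64_t store(std::uint32_t address, std::uint8_t colorMode, Texture texture) {
        const std::uint64_t key = makeTextureKey(address, colorMode);
        textures_.emplace(key, std::move(texture));
        return key;
    }

    bool contains(std::uint64_t key) const {
        return textures_.find(key) != textures_.end();
    }

    // Looks the texture up at display time; the key must exist.
    const Texture& get(std::uint64_t key) const {
        const auto it = textures_.find(key);
        assert(it != textures_.end());
        return it->second;
    }

    std::size_t size() const { return textures_.size(); }

private:
    std::map<std::uint64_t, Texture> textures_;
};
```

The lookup happens when the part is displayed, so adding textures in between can’t leave a part with a dangling reference. The same address with a different color mode gives a different key, so both versions can live in the cache side by side.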

On a side note, I remembered a mail from Fabien from early 2006 stating that he corrected a bug in the SCSP that was responsible for stopped / choppy video display … I applied his correction to Saturnin (it was about time ^^)
We’ll see later if it really changes something.

Cd block is done

The cdblock is now finished, I only had one bug left in my SPTI code which was quickly corrected after some testing.
I’m quite happy as it went smoothly, that was somewhat unexpected 🙂

Update : I did some testing without the VDP2 activated, and the SPTI code works great ! Without backgrounds it’s not really interesting, but sound is working for games, meaning that SPTI is fully functional 🙂
Now back to the VDP2 cache. (for real :p)

Code cleaning

Not much time lately, but I’m still making progress :

  • all the access method code is now removed from the cdrom class, and added to the corresponding files. I took the opportunity to get rid of a bunch of unused code,
  • C code used to build the file system tree is now converted to a more maintainable C++ / STL code.

The only thing left to do is to create the ReadTOC function in SPTI. It won’t be a problem, and I expect to finish it tonight. When that’s done, I’ll get back to the VDP2 cache problem 🙂

Done with the dll

The “dll compliant” code is in place 🙂
What does that mean :

  • the wnaspi32.dll isn’t loaded at the start, it’s only loaded when needed (reading cdrom system id, displaying the cd drive list, etc.)
  • when you choose the access method to the cd drive (ASPI or SPTI), Saturnin asks you to choose the correct drive from a list. When using ASPI the SCSI address is displayed (1:0:0 for instance), while the drive letter is displayed when using SPTI (E: for instance)
  • all the cd access code is now split into separate files, which means that very little work is needed to switch to a full dll application. If I had a little more spare time, I would make an SPTI dll for Satourne 😉
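The “load only when needed” part can be pictured with the small sketch below. It’s a portable illustration, not the real code : `LazyLibrary` is a made-up name, and the actual loading call (on Windows that would typically be `LoadLibrary`) is injected as a function so the pattern itself stays visible.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical wrapper, for illustration only.
class LazyLibrary {
public:
    using Handle = void*;

    // The loader is the platform specific call that really loads the library.
    explicit LazyLibrary(std::function<Handle()> loader)
        : loader_(std::move(loader)) {
        assert(loader_ != nullptr);
    }

    // Loads the library on the first call only; later calls reuse the handle.
    // If loading fails (null handle), the next call simply tries again.
    Handle get() {
        if (handle_ == nullptr) {
            handle_ = loader_();
        }
        return handle_;
    }

    bool isLoaded() const { return handle_ != nullptr; }

private:
    std::function<Handle()> loader_;
    Handle handle_ = nullptr;
};
```

Nothing is loaded when the object is created; the first feature that calls `get()` (listing the drives for instance) pays the loading cost, and everything after that reuses the same handle.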

Now that the harder part is done, let’s get to the longer one :

  • creating the missing SPTI functions, not many are missing (read TOC, and a few others)
  • converting the ASPI functions still in the cdrom class (same as above : read TOC and a few more)
  • converting some of the cdrom functions to full C++ and STL, as they were coded by Fabien in C originally and aren’t compatible anymore with my code …

That is starting to look pretty good !
All this will need extensive testing once the cache problem is solved 😀