I figured out what made most of the cast shadows look wrong. Once again it's the problem of 1/Z -> Z conversion and the really limited precision of 32/24 bit floating point numbers. But that doesn't mean it's resolved now, I'm afraid.
Quick summary: PVR2 expects 3D data to be registered as [X; Y; 1/Z], or [X; Y; 1/W]. X and Y are already in screen space, so those actually hold pixel position value. 1/Z is used for two things: Z-buffering and perspective-correct texture mapping.
What is W exactly anyway? Well, all 3D transformations can be undone one way or the other. That's because the end result is always a homogeneous space. To simplify things, think of it as a subset of space that can be fitted into a cube. For example, to keep visibility clipping fast the hardware assumes everything needs to fit into the [-1; 1] data range. So your task is to write the matrices in such a way that in the end everything you want on the screen fits into the [-1; -1; -1] / [1; 1; 1] cube. That's the price for having the hardware do it for you so the CPU can spend more time on AI or whatever.
But there's a catch, as always. Namely, perspective transformation is one-way, and once applied it cannot be reversed to produce exactly the same input data. That's because the last step is a division that looks like this: K / (K + N). Having only a number as the end result, you can't tell what the original K & N were. However, since the division is the very last thing to do, there is a workaround:
Keep 3D data as 4D vectors, that is [X; Y; Z] becomes [X; Y; Z; W] where W is always 1. Always, except after the perspective transformation, when W becomes something else and the last division is actually [X/W; Y/W; Z/W]. Since we keep W we can always reverse it if needed. What's more, you only need to invert W once into 1/W - one simple reciprocal followed by 3 multiplies is faster than 3 full-blown divisions. Not by much perhaps, but still, every cycle counts if you want a high fill rate.
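Here's a rough sketch of that trick in Python. The function names are made up for illustration, this isn't code from any real pipeline. It divides by W using a single reciprocal, keeps 1/W around, and uses it later to undo the division.

```python
def perspective_divide(x, y, z, w):
    """Divide a homogeneous point by w with one reciprocal and three multiplies."""
    inv_w = 1.0 / w  # the only division
    # Return 1/W alongside the divided coordinates so the step can be undone.
    return (x * inv_w, y * inv_w, z * inv_w, inv_w)


def undo_divide(xd, yd, zd, inv_w):
    """Recover [X; Y; Z; W] from the divided point and the stored 1/W."""
    w = 1.0 / inv_w
    return (xd * w, yd * w, zd * w, w)


if __name__ == "__main__":
    divided = perspective_divide(2.0, 4.0, 6.0, 2.0)
    print(divided)                # (1.0, 2.0, 3.0, 0.5)
    print(undo_divide(*divided))  # (2.0, 4.0, 6.0, 2.0)
```

The round trip is exact here only because the numbers are powers of two and small integers. With arbitrary values the recovered vector will usually be off by a rounding error or two.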
Since PVR2 has no hardware transform engine of any kind (all it does is rasterization) it will happily accept the bare minimum - that is X, Y (already after the division!) and 1/W. In most cases you can freely substitute 1/Z for 1/W, as both will produce the Z variation needed for Z-buffering and the correction factor for texturing. In other words, if you are emulating PVR2 you don't know whether what you get is 1/W or 1/Z, and you still need to recreate all four vector elements for the PC graphics card to work with.
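To give an idea of what "recreate all four vector elements" means, here's a minimal sketch. Everything in it is an assumption made for illustration: the function name, the 640x480 viewport, the flipped Y axis, and above all the ndc_z argument, which the caller has to invent somehow from 1/W. That last part is exactly where the trouble is.

```python
def to_clip_space(x_px, y_px, inv_w, ndc_z, width=640.0, height=480.0):
    """Rebuild a 4D clip-space vertex from screen-space X, Y and 1/W.

    ndc_z is a depth in normalized device coordinates that the caller
    has already derived from 1/W by some mapping (hypothetical here).
    """
    w = 1.0 / inv_w
    # Pixel position -> [-1; 1] range; screen Y is assumed to grow downwards.
    ndc_x = 2.0 * x_px / width - 1.0
    ndc_y = 1.0 - 2.0 * y_px / height
    # Multiply back by W so the graphics card's own divide lands on the same pixel.
    return (ndc_x * w, ndc_y * w, ndc_z * w, w)


if __name__ == "__main__":
    # Centre of the screen, 1/W = 0.5, made-up depth of 0.25.
    print(to_clip_space(320.0, 240.0, 0.5, 0.25))  # (0.0, 0.0, 0.5, 2.0)
```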
And it's not only that... the 1/Z isn't actually 1/Z, it's just referred to as such so you don't have to write 200 / (Z + 200) every time. And this is more likely what it is. Except of course 200 can be any number, and you don't know it anyway.
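A tiny sketch of why the unknown constant hurts. The names encode_depth and decode_depth are hypothetical, and K / (Z + K) is simply the form described above, not a documented formula. Two different (Z, K) pairs produce the same stored value, and decoding gives a different Z for every K you guess.

```python
def encode_depth(z, k):
    """What a game might really store as '1/Z': k / (z + k)."""
    return k / (z + k)


def decode_depth(d, k):
    """Invert encode_depth, which is only possible if k is known."""
    return k / d - k


if __name__ == "__main__":
    # Same stored value from two unrelated inputs.
    print(encode_depth(200.0, 200.0))  # 0.5
    print(encode_depth(1.0, 1.0))      # 0.5
    # The recovered Z depends entirely on the guessed constant.
    print(decode_depth(0.5, 200.0))    # 200.0
    print(decode_depth(0.5, 1.0))      # 1.0
```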
Not difficult enough yet? Then how about this: when you start messing with Z too much you suddenly realize you're losing precision bits. Fast. If you do (N * N) / N, or 1 / (1 / N), you may well not get exactly N in return. Worse, if N is very big or very small, the end result might be quite different from what you'd expect.
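To make the precision loss concrete, here's a small sketch that emulates 32-bit floats in Python by rounding every intermediate result through struct. The helper names are made up. It checks the 1/N round trip (the same kind of 1/Z -> Z conversion mentioned at the top) and (N * N) / N at both extremes.

```python
import math
import struct


def f32(x):
    """Round a Python float to the nearest 32-bit float."""
    try:
        return struct.unpack("<f", struct.pack("<f", x))[0]
    except OverflowError:
        # struct refuses to pack values beyond float32 range; hardware gives infinity.
        return math.copysign(math.inf, x)


def recip_roundtrip(n):
    """Compute 1 / (1 / n) with float32 rounding after each step."""
    return f32(1.0 / f32(1.0 / f32(n)))


def mul_then_div(n):
    """Compute (n * n) / n with float32 rounding after each step."""
    n = f32(n)
    return f32(f32(n * n) / n)


if __name__ == "__main__":
    print(recip_roundtrip(7.0) == 7.0)  # False, the result is one step below 7
    print(mul_then_div(1e-30))          # 0.0, the square underflowed
    print(mul_then_div(1e30))           # inf, the square overflowed
```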
Ah, but that is still not all there is to it. Some games out there are not shy about registering Z values as low as 0.00001 and as big as 10000 at the same time. Now try to compress that into the required [0; 1] range and you'll realize there are simply not enough precision bits in modern 24-bit Z-buffer implementations. 24 because it's 32 minus 8 bits for the stencil buffer (and the stencil buffer is no longer optional, there is simply no support for plain 32-bit Z-buffering).
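A quick sketch of the squeeze. It assumes the simplest possible scheme, a linear mapping of the registered range onto a 24-bit fixed-point depth value. That's not necessarily what any real driver or emulator does, and quantize_depth is a made-up name. Under that assumption one depth step is about 0.0006 wide, so everything at the small end collapses into the same bucket.

```python
DEPTH_MAX = (1 << 24) - 1  # largest value a 24-bit depth buffer can hold


def quantize_depth(value, lo=1e-5, hi=1e4):
    """Linearly map value from [lo; hi] to [0; 1], then to a 24-bit integer."""
    t = (value - lo) / (hi - lo)
    return round(t * DEPTH_MAX)


if __name__ == "__main__":
    print(quantize_depth(0.00001))  # 0
    print(quantize_depth(0.0002))   # 0, twenty times larger yet the same bucket
    print(quantize_depth(10000.0))  # 16777215
```

A non-linear mapping moves the problem around rather than removing it: whatever resolution you add at one end of the range comes out of the other.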
Wait, did I forget to mention that some titles go as far as throwing infinities and even NaNs into 1/Z? Have fun with these.
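One possible way to defuse such values before they reach the depth math, sketched below. The policy is entirely an assumption (NaN gets treated as the smallest allowed value, infinities get clamped to the range limits) and sanitize_inv_z is a hypothetical helper. A different policy may suit a particular game better.

```python
import math


def sanitize_inv_z(inv_z, lo=1e-5, hi=1e4):
    """Force a 1/Z value into [lo; hi], replacing NaN with lo (arbitrary choice)."""
    if math.isnan(inv_z):
        return lo
    # min/max also take care of +inf and -inf.
    return min(max(inv_z, lo), hi)


if __name__ == "__main__":
    print(sanitize_inv_z(float("nan")))  # 1e-05
    print(sanitize_inv_z(float("inf")))  # 10000.0
    print(sanitize_inv_z(0.5))           # 0.5
```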
So... shadows look much better (some issues still remain). Train wheels in the Evolution 2 intro should no longer be visible through the floor. But that broke the Dead or Alive 2 intro in a few places (eyeballs sticking out of skulls, anyone?). That too can be fixed, more or less, but it will break the Soul Calibur character selection screen. Getting this to work will in turn cause menus and other stuff in Soul Reaver to become invisible. And so on...
I see no other way now but to break some games on purpose (keeping the glitches to a minimum) in order to make others work at all. The alternative would be to try and create several profiles for different games and use those. I'll have to think about it some more.