The VAO surface cache uses glBufferSubData and triggers a very slow
path in some GLES implementations. Specifically I have observed 10x
frame times under Emscripten with ANGLE/Metal on macOS and with Mali
on Android.
Co-authored-by: Zack Middleton <zack@cloemail.com>
Co-authored-by: WofWca <wofwca@protonmail.com>
Using GPU vertex skinning is significantly faster than CPU vertex
skinning. Especially since OpenGL2 has to run R_VaoPackNormal() and
R_VaoPackTangent() each vertex each frame which causes CPU vertex
skinning to be significantly slower than OpenGL1 renderer.
Moved all the code using Altivec intrinsics to separate files. This
means we can optionally use GCC's -maltivec on just these files, which
are chosen at runtime if the CPU supports Altivec, and compile the rest
without it, making a single binary that has Altivec optimizations but
can still work on G3.
Unlike SSE and similar extensions on x86, there does not seem to be
a way to enable conditional, targeted use of Altivec based on runtime
detection (which is what ioquake3 wants to do) without also giving the
compiler permission to use Altivec in code generation; so to not crash
on CPUs that do not implement Altivec, we'll have to turn it off
altogether, except in translation units that are only entered when
runtime Altivec detection is successful.
This has been tested on Linux PPC (on an Altivec-enabled CPU),
but we may need further work after testing trickles out to other
PowerPC devices and ancient Mac OS X builds.
I did a little work on this patch, but the majority of the effort belongs
to Simon McVittie (thanks!).