Comments (277)

ptitSeb avatar ptitSeb commented on June 29, 2024

There's not much in that log, but the workaround is used. So without it, it probably won't run properly until the driver is fixed.

I'll also build ioquake3 with the GL driver on my side tonight.

from gl4es.

kas1e avatar kas1e commented on June 29, 2024

Uhm, the latest version doesn't build:

libgl4es.a(fpe.o): In function `builtin_CheckVertexAttrib':
fpe.c:(.text+0x1310): undefined reference to `isBuiltinAttrib'
libgl4es.a(fpe.o): In function `builtin_CheckUniform':
fpe.c:(.text+0x1400): undefined reference to `isBuiltinMatrix'
libgl4es.a(shader.o): In function `redoShader':
shader.c:(.text+0x1cb0): undefined reference to `ConvertShader'
libgl4es.a(shader.o): In function `glShaderSourceARB':
shader.c:(.text+0x20b0): undefined reference to `ConvertShader'
collect2: ld returned 1 exit status

Probably because of the new preproc.c changes, etc.?

kas1e avatar kas1e commented on June 29, 2024

Ah, that's because shaderconv.c doesn't compile:

src/gl/shaderconv.c: In function 'ConvertShader':
src/gl/shaderconv.c:220: error: 'pBuffer' undeclared (first use in this function)
src/gl/shaderconv.c:220: error: (Each undeclared identifier is reported only once
src/gl/shaderconv.c:220: error: for each function it appears in.)

kas1e avatar kas1e commented on June 29, 2024

OK, it was DBG(printf("Shader source%s:\n%s\n", pBuffer, fpeShader?" (FPEShader generated)":""); so I commented it out for the moment, rebuilt quake3_gl4es and ran it.

Visually almost the same, maybe with slight differences:

http://kas1e.mikendezign.com/aos4/gl4es/games/quake3/screen2.jpg

And full debug output:

http://kas1e.mikendezign.com/aos4/gl4es/games/quake3/output_full_debug.txt

It could be other issues, not related to vertex attribs. After all, it's just the menu, not the game itself...

kas1e avatar kas1e commented on June 29, 2024

As I can see from the log, the shaders are simple, primitive ones (understandable, since it's just the menu), and it's mostly just texturing going on. So it may indeed be some other issue.

kas1e avatar kas1e commented on June 29, 2024

By the way, I tested Cadog without the workaround: of course it draws some things a bit wrong, but it works. Some parts are just missing, but there is no mess like in quake3.

ptitSeb avatar ptitSeb commented on June 29, 2024

I have fixed the issue with DEBUG.

Now, for the menu, I have to check tonight at home. The quake3 engine can use lots of different texture types, so maybe one type is not big-endian friendly yet.

ptitSeb avatar ptitSeb commented on June 29, 2024

Mmmm, after checking the log, I see no exotic textures. Only regular RGBA/UBYTE here...

kas1e avatar kas1e commented on June 29, 2024

Yeah, there really isn't much happening, just a menu. The same quake code works OK with MiniGL, so the code is OK too (which means the PPC-related parts of the game code are also OK). Also, if cadog and letters fall kind of work with my hacked SDL1, then quake should too, IMHO...

To add: I don't use the Pandora settings from your Makefile, but my own. But IMHO nothing gl4es-specific was done in quake's code, it's as pure as it gets?

ptitSeb avatar ptitSeb commented on June 29, 2024

Yeah, the Pandora settings in the Makefile are for activating ARM CPU optimisations, a special keymap, and the GLES 1.1 renderer. You need none of that, so custom settings sound good to me.

I'll test tonight with the GLES2 backend to see what is happening.
(Maybe you can try running it with the environment variable LIBGL_MIPMAP=1, to force all textures to use mipmaps.)

kas1e avatar kas1e commented on June 29, 2024

Tested with LIBGL_MIPMAP set to 1: no difference. Maybe the distortion looks a little different, but it's still the same.

It should also be noted that the whole game logic works: if I press "enter" a few times, I can see changes in that distorted mess, etc. (so I get into the game). I can also hear sounds.

I can also, for example, press "enter twice, up once, then y" to exit the game.

After launching, I can even press "enter 4 times, down 6 times, enter twice", and the game kind of starts, but it all looks reaaally straaaaange:
http://kas1e.mikendezign.com/aos4/gl4es/games/quake3/ingame3.jpg

And the music starts to "jump" (probably because it's synced with the frame rate, which is broken).

But since you say it works for you with the GLES1.1 renderer, maybe the GLES2 renderer needs fixes? We'll see once you build your version over GLES2, and then we'll know where to start :)

kas1e avatar kas1e commented on June 29, 2024

Some good news !

After I lowered every possible setting to the minimum, it starts!! The menu works, the game works! Though it's slow as hell (1 fps), probably because debugging is enabled.

So first I'll try disabling debugging, and second I'll try to find which options exactly cause the problems.

ptitSeb avatar ptitSeb commented on June 29, 2024

A texture size issue?

Note that you can also trigger internal texture shrinking in gl4es, using LIBGL_SHRINK=10 (10 is just an example; there are many possible values, refer to the USAGE.md file for details). Shrinking is not useful here, as the engine has integrated downsizing, but it's interesting that reducing the texture size makes it work.

kas1e avatar kas1e commented on June 29, 2024

OK, with debugging disabled, it gives only 5fps :( In the same place, MiniGL gives 22fps.

That's very strange, I expected at least 60-80 fps compared with MiniGL...

ptitSeb avatar ptitSeb commented on June 29, 2024

Yeah, something is wrong.
Also, I checked the texture sizes in the previous log, and they are not that big: the max is 512x512. All textures are POT (power of two). Some are not square, but there is really not much pressure on the texture side.
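
To put "not much pressure" in numbers, a rough sketch, assuming uncompressed RGBA/UNSIGNED_BYTE textures and ignoring mipmaps and any driver-side padding; `is_pot` and `rgba8_bytes` are illustrative helper names, not gl4es code.

```c
#include <stdbool.h>
#include <stddef.h>

/* True when n is a power of two (1, 2, 4, ..., 256, 512, ...). */
static bool is_pot(unsigned n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* Bytes for the base level of an uncompressed RGBA8 texture:
 * 4 bytes per texel, no mipmap chain, no driver padding. */
static size_t rgba8_bytes(unsigned width, unsigned height)
{
    return (size_t)width * height * 4;
}
```

So the largest texture in the log (512x512) is about 1 MiB for its base level, and a non-square POT texture simply has both dimensions passing `is_pot`.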

What did you disable in the config exactly?

kas1e avatar kas1e commented on June 29, 2024

I also found that it's the GL extensions that make it look messy. Once I disable them, everything works; once I enable them, it's all a mess.

But the most important question is: where are the FPS? :)

kas1e avatar kas1e commented on June 29, 2024

I mean, it should be about 50, 100, even 150 fps, as MiniGL has 22 (and that with software TCL, etc.)

ptitSeb avatar ptitSeb commented on June 29, 2024

It should at least be 20fps, and probably more, not 5.

Which GL extension seems to break rendering?

kas1e avatar kas1e commented on June 29, 2024

Do you think it should be on par with MiniGL? I mean, MiniGL does TCL in software, so anything in gl4es that uses more than a few arrays should be a lot faster. At least 2-4 times for sure?

As for which GL extensions: dunno. ioquake3 just has an option to "disable / enable GL extensions"; which ones exactly, dunno. I just disabled them from the MiniGL version, and then I could run the gl4es version.

kas1e avatar kas1e commented on June 29, 2024

I was also able to build a version of quake3 we have on OS4 without SDL1 at all (pure MiniGL), so there I swapped in gl4es/OGLES2 context creation, etc.: same 5fps :( So we can rule out SDL1 as a potential performance penalty.

ptitSeb avatar ptitSeb commented on June 29, 2024

Well, Transform and Lighting will be done in a vertex shader with gl4es. If vertex shaders are executed on the GPU, then yes, you should see some improvement. But vertex shaders can be implemented on the CPU (unlike fragment shaders), and in that case, speed will not change much.
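
To make "T&L in software" concrete: the core per-vertex cost is a 4x4 matrix times vector multiply (lighting is omitted here). A minimal C sketch, assuming OpenGL's column-major matrix layout; `mat4`, `vec4` and `transform_vertices` are illustrative names, not gl4es or MiniGL code. A vertex shader line such as `gl_Position = _gl4es_ModelViewProjectionMatrix * _gl4es_Vertex;` does the same multiply per vertex, so any speedup depends on where the driver actually runs it.

```c
#include <stddef.h>

/* 4x4 matrix, column-major like OpenGL: element (row, col) is m[col * 4 + row]. */
typedef struct { float m[16]; } mat4;
typedef struct { float x, y, z, w; } vec4;

/* Returns M * v. */
static vec4 mat4_mul_vec4(const mat4 *M, vec4 v)
{
    const float *m = M->m;
    vec4 r;
    r.x = m[0] * v.x + m[4] * v.y + m[8]  * v.z + m[12] * v.w;
    r.y = m[1] * v.x + m[5] * v.y + m[9]  * v.z + m[13] * v.w;
    r.z = m[2] * v.x + m[6] * v.y + m[10] * v.z + m[14] * v.w;
    r.w = m[3] * v.x + m[7] * v.y + m[11] * v.z + m[15] * v.w;
    return r;
}

/* Software transform step: every vertex of every draw goes through this
 * loop on the CPU before anything is sent to the hardware. */
static void transform_vertices(const mat4 *mvp, const vec4 *in, vec4 *out,
                               size_t count)
{
    for (size_t i = 0; i < count; ++i)
        out[i] = mat4_mul_vec4(mvp, in[i]);
}
```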

I'm home, I'll build ioquake soon and see how it runs on the Pandora.

kas1e avatar kas1e commented on June 29, 2024

But vertex shader can be implemented on the CPU (not the Fragment Shaders), in that case, speed will not change much

But probably not in our case, as gl4es works over OGLES2, which works over Warp3D, which implements all shaders on the GPU only.

That's why I think it should be around 50 fps at minimum. Also, our MiniGL is coded a bit "wrong", so it's not only TCL that has an impact; overall, everything should be better.

I was thinking SDL1 might be the problem, but as I posted in my previous message, I built a version without SDL1 at all, which gives the same 5fps.

Though, the good news is that there are almost no visual glitches, i.e. gl4es/OGLES2 renders practically everything correctly.

The main problem for now is: where are the FPS :))

ptitSeb avatar ptitSeb commented on June 29, 2024

About the extensions, looking at the log, here is what ioquake finds:

Initializing OpenGL extensions
...ignoring GL_EXT_texture_compression_s3tc
...GL_S3_s3tc not found
...using GL_EXT_texture_env_add
...using GL_ARB_multitexture
...using GL_EXT_compiled_vertex_array
...GL_EXT_texture_filter_anisotropic not found

So there are 3 candidates: GL_EXT_texture_env_add, GL_ARB_multitexture and GL_EXT_compiled_vertex_array.

kas1e avatar kas1e commented on June 29, 2024

I also changed the quake3 "lighting" option from "lightmap" to "vertex"; it then gives 10 fps. But MiniGL with the same settings gives 40fps.

As for extensions, yeah... but surely that can't be the reason for the low fps?

ptitSeb avatar ptitSeb commented on June 29, 2024

No. I have no explanation for the low fps for now.

But it would be interesting to understand which extension it is. Multitexturing may have a big impact on fps, and having it disabled is not good.

kas1e avatar kas1e commented on June 29, 2024

Related to extensions: on the other hand, when I press "driver info" in the quake3 settings, it shows me a lot of extensions from gl4es. But I'm not sure they will all be used, of course (or maybe it tries, and crashes because of it?)

ptitSeb avatar ptitSeb commented on June 29, 2024

No, "glinfo" lists all the extensions present in the driver. The ones used are simply the few I listed.

kas1e avatar kas1e commented on June 29, 2024

This is what I get in the MiniGL version when I press "driver info" in the quake3 settings:
http://kas1e.mikendezign.com/aos4/gl4es/games/quake3/mgl_extensions.jpg

And this is what I get in the GL4ES version when I press "driver info" in the quake3 settings:
http://kas1e.mikendezign.com/aos4/gl4es/games/quake3/gl4es_extensions.jpg

ptitSeb avatar ptitSeb commented on June 29, 2024

That's quite a lot of extensions supported in gl4es ;)

ptitSeb avatar ptitSeb commented on June 29, 2024

Just to be sure it doesn't try to use an OpenGL 2.0 renderer, can you launch it with the environment variable "LIBGL_GL=15" to force OpenGL 1.5 instead of 2.0?

kas1e avatar kas1e commented on June 29, 2024

Setting the LIBGL_GL environment variable to 15 makes no difference.
Also, when I build the quake objects, I can see that only the "renderer1" ones are built. Or do you mean gl4es may try to use OpenGL 2.0 by default?

And setting LIBGL_GL to 20 changes nothing at all either. Same look, same 5 fps.

ptitSeb avatar ptitSeb commented on June 29, 2024

You should try to find which extension breaks things: reactivate the extensions in ioquake3, and mangle the extensions string in gl4es so iq3 doesn't find them. It's in src/gl/getter.c. Then reactivate the extensions one by one to see which one breaks rendering.
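
For context on why mangling the string works: an application decides whether to use an extension by looking its name up in the GL_EXTENSIONS string. A minimal sketch of such a lookup, as an assumption for illustration, not the actual ioquake3 or gl4es code (`has_extension` is a hypothetical name). It matches whole space-separated tokens, since a bare strstr() would also find a name that is only a prefix of a longer one.

```c
#include <stdbool.h>
#include <string.h>

/* True if `name` occurs as a whole, space-separated token in the
 * GL_EXTENSIONS-style string `ext`. */
static bool has_extension(const char *ext, const char *name)
{
    size_t len = strlen(name);
    if (len == 0)
        return false;
    for (const char *p = ext; (p = strstr(p, name)) != NULL; p += len) {
        bool starts = (p == ext) || (p[-1] == ' ');
        bool ends = (p[len] == ' ') || (p[len] == '\0');
        if (starts && ends)
            return true;
    }
    return false;
}
```

With a reported string containing "GL_EXT_!!!!compiled_vertex_array", a lookup of "GL_EXT_compiled_vertex_array" fails, so the game takes the code path it uses when that extension is absent.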

kas1e avatar kas1e commented on June 29, 2024

Ok, will try now.

But it would be interesting to know how it runs for you over gl4es/GLES2 :) Maybe there's some general problem.

ptitSeb avatar ptitSeb commented on June 29, 2024

It's not built yet... The Pandora is just an ARM @ 1GHz, it's not super fast...

kas1e avatar kas1e commented on June 29, 2024

To add: in MiniGL I also get 22 fps with extensions disabled. With them enabled it gives 23 fps, which jumps back to 22 most of the time, so probably no difference at all.

But I will try to find which one messes things up.

kas1e avatar kas1e commented on June 29, 2024

OK, I probably found it right from the beginning; I just did this in getter.c:

"GL_EXT_!!!!compiled_vertex_array "

So, when I enable GL extensions in quake, I have:
multitexture: enabled
compiled vertex arrays: disabled
texenv add: enabled
compressed textures: disabled

And then, no trashing. I rechecked with a gl4es where I didn't mangle that extension in getter.c, and it trashes. So it's clearly GL_EXT_compiled_vertex_array that messes things up. I'll report that one to Daniel.

But then, as with MiniGL, those extensions make almost no difference.

kas1e avatar kas1e commented on June 29, 2024

By the way, in the MiniGL version compiled vertex arrays are disabled as well (just to rule out GL extensions as a performance penalty).

ptitSeb avatar ptitSeb commented on June 29, 2024

Finished building on my side.

It works perfectly with the GLES2 backend on the Pandora.
Graphics are just fine, and the demo runs at 40fps, with some parts slowing down to 22, and sometimes up to 60fps...

kas1e avatar kas1e commented on June 29, 2024

That's kind of bad news then!
My machine has a 2GHz CPU with a good Radeon (only 2 years old), so it should be about 100 fps then.

Uhm! Strange!

ptitSeb avatar ptitSeb commented on June 29, 2024

So the GL_EXT_compiled_vertex_array extension enables the use of glDrawElements(...). Without it, it uses glBegin(...)/glEnd() blocks. So this is consistent with the issue with vertex size.

ptitSeb avatar ptitSeb commented on June 29, 2024

I'm still unsure why it would be that slow. Are you able to do some profiling?

kas1e avatar kas1e commented on June 29, 2024

Daniel asked me to pack everything up so he can profile it; let's see.
But you get even better results on a 1GHz machine than with MiniGL... That says something!

Btw, when I say "22 fps", I mean you just run the game, go to the first stage, and once it loads, you don't move. What do you get there?

kas1e avatar kas1e commented on June 29, 2024

Also, you set no special environment variables? It all just works as is? With the latest commit of gl4es?

ptitSeb avatar ptitSeb commented on June 29, 2024

Ah yes, at the main menu I'm only at 10fps. Strange, I think I remember it was (much) faster on the GLES1.1 backend.

Also yes, it runs without setting anything, with all default settings (geometry high, lightmap...)

ptitSeb avatar ptitSeb commented on June 29, 2024

The slowdown seems to be caused by the large mirror in front of the starting point. If I move and look elsewhere, fps jumps to 40fps.

kas1e avatar kas1e commented on June 29, 2024

OK, 10 fps with lighting set to lightmap or vertex? Could it be that gl4es gives us that framerate, for example?

kas1e avatar kas1e commented on June 29, 2024

I mean, it's strangely the same fps for both of us.

ptitSeb avatar ptitSeb commented on June 29, 2024

I have done a quick profile, and most of the time was spent in actual drawing (in the GLES2 driver). Nothing really wrong; I'll see if I can get a GLES2 trace of that.

kas1e avatar kas1e commented on June 29, 2024

Same for me. If I move away from the mirror, fps in some places can even be 70 (if you move back from the mirror), and if you move left or right of the mirror, around 30.

But those are pretty bad values. Our MiniGL, which is coded really badly, has 22 fps when I look at the mirror, 80 when I move to the left, 100 when I move to the right, and 90 when I turn around so the mirror is at my back.

Can you check with GLES1.1 as well? Maybe it's just the GLES2 rendering that slows things down (some slow shaders, or something).

ptitSeb avatar ptitSeb commented on June 29, 2024

I have just checked with gl4es, but using the GLES1.1 backend. That should be equivalent to your MiniGL.
I get 22fps looking at the mirror...
I'll try to get a GLES2 Trace of that rendering, to see what is happening.

ptitSeb avatar ptitSeb commented on June 29, 2024

Got my trace. Now I need to analyse it

kas1e avatar kas1e commented on June 29, 2024

It's not only slow with the mirror; the problem is just more visible there. The whole framerate is about 3-4 times slower everywhere, so it's not related to a particular effect, but to something general...

ptitSeb avatar ptitSeb commented on June 29, 2024

I think I have an idea why it's slower: it seems to use a clip plane.
Look at this vertex shader:

#version 100
precision mediump float;
precision mediump int;
uniform highp mat4 _gl4es_ModelViewMatrix;
uniform highp mat4 _gl4es_ModelViewProjectionMatrix;
attribute highp vec4 _gl4es_Vertex;
attribute lowp vec4 _gl4es_Color;
attribute highp vec4 _gl4es_MultiTexCoord0;
// FPE_Shader generated
varying vec4 Color;
uniform highp vec4 _gl4es_ClipPlane_0;
varying mediump float clippedvertex_0;
varying vec2 _gl4es_TexCoord_0;

void main() {
vec4 vertex = _gl4es_ModelViewMatrix * _gl4es_Vertex;
clippedvertex_0 = dot(vertex, _gl4es_ClipPlane_0);
gl_Position = _gl4es_ModelViewProjectionMatrix * _gl4es_Vertex;
Color = _gl4es_Color;
_gl4es_TexCoord_0 = _gl4es_MultiTexCoord0.xy;
}

I know the way I implemented clip planes is probably not the best. I'll try disabling them to see if it improves things.
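
For reference, the per-vertex value in the shader above is just a dot product between the eye-space vertex and the plane equation (GLES2 has no fixed-function glClipPlane, hence this emulation in the shader). A minimal C sketch of the math, assuming, since the fragment shader is not shown here, that fragments whose interpolated distance is negative get discarded; the function names are illustrative only.

```c
#include <stdbool.h>

typedef struct { float x, y, z, w; } vec4;

/* Signed distance of an eye-space vertex to the plane (a, b, c, d):
 * the value the shader writes to clippedvertex_0. */
static float clip_distance(vec4 eye, vec4 plane)
{
    return eye.x * plane.x + eye.y * plane.y + eye.z * plane.z + eye.w * plane.w;
}

/* The varying is interpolated across the primitive; between two vertices
 * this is a linear blend with t in [0, 1]. */
static float lerp(float a, float b, float t)
{
    return a + (b - a) * t;
}

/* Assumed fragment-side test: keep only fragments on the non-negative side. */
static bool fragment_kept(float interpolated_distance)
{
    return interpolated_distance >= 0.0f;
}
```

If the test is done with a discard in the fragment shader, that can also cost something beyond the extra math, as discard can defeat early depth testing on some GPUs.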

ptitSeb avatar ptitSeb commented on June 29, 2024

Yes, if I disable clip planes (change line 219 of src/glx/hardext.c to hardext.maxplanes = 0;//6;) I get 22fps (like on GLES1.1). But the mirror doesn't render correctly.

kas1e avatar kas1e commented on June 29, 2024

Will check now. If without them it's at least on par with GLES1.1 for you, maybe it will be faster than the MiniGL version for me now; 10 minutes and we will know :)

kas1e avatar kas1e commented on June 29, 2024

In my case, sadly, disabling clip planes makes almost no difference, just +4 fps.
I ran a test with timedemo1/demo four, and:

minigl: 1260 frames 15.1 seconds, 83.2 fps
gl4es_no_clipplane: 1260 frames 56.6 seconds, 22.3 fps
gl4es_all_as_before: 1260 frames 58.2 seconds, 21.7 fps

About 4 times slower, while it should probably be 2-3 times faster :) At least in theory.
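
A quick check of the timedemo numbers above. This is a trivial sketch; note that the seconds are printed rounded, so recomputing fps from them can differ in the last digit (1260 / 15.1 gives about 83.4 rather than the printed 83.2).

```c
/* Average fps over a timedemo run: frames rendered / elapsed seconds. */
static double timedemo_fps(int frames, double seconds)
{
    return (double)frames / seconds;
}

/* How many times slower run B is than run A, from their fps values. */
static double slowdown(double fps_a, double fps_b)
{
    return fps_a / fps_b;
}
```

For example, `slowdown(83.2, 21.7)` is about 3.83, which matches the "about 4 times slower" estimate.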

ptitSeb avatar ptitSeb commented on June 29, 2024

Well, are you sure MiniGL is doing TnL in software?

Also, let's wait for some profiling from Daniel.

kas1e avatar kas1e commented on June 29, 2024

Yes, 100%. And it also works through Warp3D (the one OGLES2 uses).
And even if it didn't, the GLES2 version shouldn't be slower, but at least the same, or a little faster (because of shaders). But as MiniGL does TCL in software, the gl4es version should be 100% faster.

Let's wait and see what Daniel finds...

kas1e avatar kas1e commented on June 29, 2024

While Daniel is checking it, I also got a note from Hans (the Warp3D developer): "Maybe something is flushing the pipeline like crazy? That'll give a performance hit, because there's a limit to how many draw operations/command-queues can be submitted per second."

But that's probably not about gl4es, but about our OGLES2 driver...

ptitSeb avatar ptitSeb commented on June 29, 2024

The GLES2 trace I have done shows that when facing the mirror, there are around 550 draw commands. This seems reasonable.

Much GLES2 hardware also doesn't like many draw commands, so gl4es tries to group them as much as it can.
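
The grouping idea can be sketched like this: consecutive draws that share state and have contiguous vertex data are collapsed into one call. This is a simplified illustration with a hypothetical `draw_cmd` structure, not gl4es's actual merging code, which presumably has to compare much more state (blending, matrices, texture environment, ...). It only applies to independent primitives such as GL_TRIANGLES; strips need converting first.

```c
#include <stddef.h>

/* Hypothetical recorded draw: `count` vertices starting at `first` in a
 * shared vertex array, drawn with a given texture and program bound. */
typedef struct {
    unsigned texture;
    unsigned program;
    size_t first;
    size_t count;
} draw_cmd;

/* Merges, in place, runs of consecutive commands that use the same texture
 * and program and whose vertex ranges are contiguous.
 * Returns the number of commands left. */
static size_t batch_draws(draw_cmd *cmds, size_t n)
{
    if (n == 0)
        return 0;
    size_t out = 0;
    for (size_t i = 1; i < n; ++i) {
        draw_cmd *cur = &cmds[out];
        const draw_cmd *next = &cmds[i];
        if (next->texture == cur->texture && next->program == cur->program &&
            next->first == cur->first + cur->count) {
            cur->count += next->count;   /* extend the current batch */
        } else {
            cmds[++out] = *next;         /* state changed: start a new batch */
        }
    }
    return out + 1;
}
```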

kas1e avatar kas1e commented on June 29, 2024

As the author of Warp3D says: "5fps * 550 = 2750 draw calls/s. We can manage a lot more than that, so something must be getting in the way."

Also, if you say gl4es tries to group them as much as it can... then it shouldn't be a "flushing the pipeline like crazy" issue :(

ptitSeb avatar ptitSeb commented on June 29, 2024

What is even stranger is that the other games that already work use similar stuff anyway.
Still, there must be either something OGLES2/Warp3D doesn't like in the shaders, or something in how the data are fed to the driver.
Just to note, gl4es doesn't use any VBOs for now (I plan to try using them, but for now, all VBOs are emulated), and the arrays generated by glBegin(..) / glEnd() are not interleaved, they are separate arrays (I'll try to work on that also, I think it can help performance).
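
To illustrate the difference between separate per-attribute arrays and an interleaved layout, a minimal sketch with a hypothetical `vertex` struct (position, RGBA color, one texture coordinate, mirroring the attributes of the vertex shader shown earlier). This is not gl4es code, and whether interleaving actually helps depends on the driver and hardware.

```c
#include <stddef.h>

/* One interleaved vertex: all attributes of a vertex sit next to each other. */
typedef struct {
    float pos[4];
    unsigned char color[4];
    float tex[2];
} vertex;

/* Copies separate (one array per attribute) data into a single
 * interleaved array of `count` vertices. */
static void interleave(const float *pos, const unsigned char *color,
                       const float *tex, vertex *out, size_t count)
{
    for (size_t i = 0; i < count; ++i) {
        for (int k = 0; k < 4; ++k) {
            out[i].pos[k] = pos[i * 4 + k];
            out[i].color[k] = color[i * 4 + k];
        }
        out[i].tex[0] = tex[i * 2 + 0];
        out[i].tex[1] = tex[i * 2 + 1];
    }
}
```

With such an array, each attribute would be passed to glVertexAttribPointer with a stride of `sizeof(vertex)` and a pointer offset by `offsetof(vertex, pos)`, `offsetof(vertex, color)` and so on.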

kas1e avatar kas1e commented on June 29, 2024

Oh, emulated VBOs :( Could that be the reason? Daniel says other projects also use VBOs a lot, and everything works fine, but until now he didn't know that gl4es emulates them in software... IMHO, that could surely be the reason?

kas1e avatar kas1e commented on June 29, 2024

And the games that already work, which we tested, are just 3: bloboats, letters fall and cadog. All of them are very small and can't show any speed problems... Quake3 IMHO is the first test that makes things "a little harder".

ptitSeb avatar ptitSeb commented on June 29, 2024

Just to note, VBOs are not used by Quake3, like in most OpenGL 1.x games. But maybe the OGLES2 driver expects all its data in VBOs, yes.
Using actual VBOs in gl4es requires some work. It was not designed to use VBOs in the first place, so I need to alter many critical places. Using real VBOs is on my TODO, as I expect some speed boost on some architectures (but not on the Pandora, according to some preliminary tests done with Doom3), but it's not a small change...

kas1e avatar kas1e commented on June 29, 2024

Thanks for explaining; we'll see what Daniel says about it after he profiles it on our side.

kas1e avatar kas1e commented on June 29, 2024

Btw, does doom3 also work over gl4es?

ptitSeb avatar ptitSeb commented on June 29, 2024

No, regular Doom3 doesn't use GLSL and will not work on gl4es (but I have to try Dhewm3 https://github.com/dhewm/dhewm3 and the BFG edition, which I think supports GLSL).

I use the Dante project, that is a direct GLES2 port of Doom3: https://github.com/omcfadde/dante (slightly adapted to the Pandora...)

ptitSeb avatar ptitSeb commented on June 29, 2024

Mmmm, when I analysed the number of draw calls, I used the regular Pandora version, so it was using glDrawElements(...), which are quite optimized by the idTech3 engine.
But on AmigaOS/GLES for now, it's using the glBegin(...)/glEnd() code path, which I don't know well. I have to check whether something odd or broken is happening in that case.

kas1e avatar kas1e commented on June 29, 2024

@ptitSeb
I had a very HOT discussion with the OGLES2 author, and... first, he profiled it a bit, and here is what he says:

The reason for Q3 being so slow is that the game does practically zero batching. ogles2 is flooded by glDraw-calls of practically always 10 triangles or fewer. If I artificially limit ogles2 to ignore any draw-calls with more than 10 triangles, then everything looks like before. Drawing a scene like that is the most inefficient way to do things and one of the big "don'ts" in terms of GL. OGLES2 is not optimized for what Q3+gl4es deliver right now and I probably won't optimize it for that kind of stuff. Everything will be fine as soon as you start to feed it with something other than single triangles. So the obvious solution is: extend gl4es instead to collect the data of such small draw calls and then issue a bigger one.

When he says "game", he probably means "when it's compiled over gl4es". I don't think he meant the quake3 code, as that one surely does things right?

When you say "I used the regular Pandora version", do you mean a non-gl4es version, just some regular one? I mean, since it uses glBegin/glEnd on AmigaOS/GLES, it should probably do the same for all other gl4es ports everywhere when they work over the GLES2 backend?

kas1e avatar kas1e commented on June 29, 2024

@ptitSeb
Daniel also explained a bit further, so I'll just copy+paste his answer; I hope he doesn't mind (it will just help make things better):

Right now Q3/gl4es draws a scene in a way that's no good for ogles2/Nova. The latter likes rather large amounts of triangles. That's what they are designed for. And this is how you get good performance from it.

Making hundreds or thousands of draw-calls with less than 10 triangles each misses the point of those libs. And it was never a good idea. Apparently you're lucky that other ogles2 implementations on other hardware aren't hurt by that so much. And apparently you're lucky that MiniGL/Warp3D(SI) is of some help in the background.

The thing is, as said before, that this type of inefficient drawing is not what 99% of ogles2 programs do. That's why I'm absolutely not convinced that it makes sense to optimize ogles2 for this niche task too. IMHO something like that has to be implemented at the next higher level (where it's also most likely easier to do and where other systems also benefit from it). The next higher level in the constellation here would be gl4es.

ptitSeb avatar ptitSeb commented on June 29, 2024

By regular, I mean using gl4es and all extensions. So it uses glDrawElements(...) and the calls are batched (with 550 calls to draw the initial scene in front of the mirror).

But remember, on AmigaOS4 we have disabled that extension, and Q3 uses a glBegin(...)/glEnd() loop to draw.
Now, GL4ES should try to batch these calls. There is some code to do just that. But I haven't checked on Q3 whether the "collapse" code is working. I'll check tonight on the Pandora: I'll disable the glDrawElements extension and do another GLES2 trace capture, to see what is happening. If there are many small batches of 10 triangles, the Pandora will just run at 1fps, so I'll see it. I'll then try to see why the collapse code is not working (as it should: batching draw calls is a sure source of fps!).
In the meantime, you can try the env. variable LIBGL_BEGINEND=2 to make gl4es try harder to batch glBegin/glEnd calls. Maybe it will help?

(also, don't forget that once the fix for the vertex attribs is done, you can enable all extensions and have 550 calls per frame)

ptitSeb avatar ptitSeb commented on June 29, 2024

Just to be clear: I do agree that making many little calls of a few triangles instead of a single large call with many triangles is bad for performance.
I'm well aware of that, and that's why I worked on gl4es to try to avoid it, by batching as much as I can.
Unfortunately, this kind of thing is not uncommon in games, but usually gl4es is able to batch reasonable chunks of triangles. The Warp3D hardware is more powerful than what the Pandora has, so there must be something wrong indeed, and we'll find out what it is.

kas1e avatar kas1e commented on June 29, 2024

OK, thanks for the help! In the meantime I will try LIBGL_BEGINEND=2.

But did I understand you right that once GL_EXT_compiled_vertex_array works, it will do the 550 calls as one call (i.e. batch them all), and not like now, where it does 550 little calls?

ptitSeb avatar ptitSeb commented on June 29, 2024

Yes, with GL_EXT_compiled_vertex_array you will have the same rendering as I had yesterday on the Pandora.
The idTech3 engine is pretty good at batching calls. That was not the case with idTech1 and 2 (because of the software rendering), but, again, I don't know the "glBegin/glEnd" path well; maybe it's fragmented. I'll check tonight.

But you will have 550 "large" calls.
If I believe Daniel, the current Q3 renders with thousands of small calls, not hundreds of large ones.
550 call per frame is a good value.

kas1e avatar kas1e commented on June 29, 2024

Yeah, OGLES2/Nova can do much more than 550 calls per frame; the problem is just throwing hundreds or thousands of such micro draw calls per frame at OGLES2/Nova AND expecting it to deliver fast results... :)

If it's 550 large calls holding all of quake's triangle data, that will probably boost performance a lot?

ptitSeb avatar ptitSeb commented on June 29, 2024

I took a quick look at the code.

Main drawing function is here: https://github.com/ptitSeb/ioq3/blob/master/code/renderergl1/tr_shade.c#L177
There is some #ifdef HAVE_GLES, which is the pure GLES 1.1 renderer. But for gl4es, it's a regular build, so HAVE_GLES is not defined.
As you can see, because qglLockArraysEXT is undefined (it comes with the extension), the code will call R_DrawStripElements( numIndexes, indexes, qglArrayElement );
This function does the following (quoting the code comments):

/*
==================
R_DrawElements
Optionally performs our own glDrawElements that looks for strip conditions
instead of using the single glDrawElements call that may be inefficient
without compiled vertex arrays.
==================
*/

If you look at the code of this function (start here: https://github.com/ptitSeb/ioq3/blob/master/code/renderergl1/tr_shade.c#L177 )
you'll see it tries to do TRIANGLE_STRIPs, so it makes more calls instead of one single glBegin/glEnd, as that was more optimised in the early days.
I don't know how your AmigaOS MiniGL handles this, but the only way to collapse those triangle strips is to turn them back into individual triangles... gl4es is supposed to do that, but maybe it doesn't for some reason (and so LIBGL_BEGINEND=2 will not help here).
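
Turning strips back into individual triangles is a small index transformation: a strip of n vertices yields n - 2 triangles, and every odd triangle has its first two vertices swapped so the winding stays consistent. A minimal sketch, assuming 16-bit indices; `strip_to_triangles` is a hypothetical helper illustrating the technique, not gl4es's code. Once converted, several strips can be appended into one GL_TRIANGLES list and drawn with a single call.

```c
#include <stddef.h>

/* Converts an indexed GL_TRIANGLE_STRIP of n indices into an indexed
 * GL_TRIANGLES list. `out` must hold at least 3 * (n - 2) indices.
 * Degenerate triangles (two equal indices, often used to stitch strips
 * together) are dropped. Returns the number of indices written. */
static size_t strip_to_triangles(const unsigned short *strip, size_t n,
                                 unsigned short *out)
{
    size_t w = 0;
    for (size_t i = 0; i + 2 < n; ++i) {
        unsigned short a = strip[i], b = strip[i + 1], c = strip[i + 2];
        if (a == b || b == c || a == c)
            continue;                      /* degenerate: draws nothing */
        if (i % 2 == 0) {                  /* even triangle: keep order */
            out[w++] = a; out[w++] = b; out[w++] = c;
        } else {                           /* odd triangle: swap to keep winding */
            out[w++] = b; out[w++] = a; out[w++] = c;
        }
    }
    return w;
}
```

For example, the strip 0 1 2 3 4 becomes the triangles (0,1,2), (2,1,3), (2,3,4).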

from gl4es.

kas1e avatar kas1e commented on June 29, 2024

I think (well, not me, Daniel pointed it out) that MiniGL, which doesn't run over our shader-capable Warp3D but over an older Warp3D, probably has "call batching" inside.

But we need to know how it reacts for you once you disable extensions. Though, maybe your OGLES2 driver also has "batching" inside?

But since with the current gl4es build we have many calls with less than 10 triangles, the "batching" code in gl4es probably doesn't work when we disable extensions? We'll see once you run quake3 without extensions on the Pandora.

Btw, is there any way for me to know whether environment settings work at all? I.e. some simple test variable that, when set, visibly shows gl4es changing things?
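
One generic way to check that a variable is actually visible to programs, independent of gl4es: a tiny C program that prints what getenv() returns. This is a sketch, assuming gl4es reads its LIBGL_* settings from the process environment via getenv(); `report_env` is a made-up helper, called from a small main() with the variable names to check.

```c
#include <stdio.h>
#include <stdlib.h>

/* Prints each variable's value as seen by this process, or that it is
 * unset. Returns how many of the variables were found. */
static int report_env(const char *const *names, size_t count)
{
    int found = 0;
    for (size_t i = 0; i < count; ++i) {
        const char *value = getenv(names[i]);
        if (value != NULL) {
            printf("%s=%s\n", names[i], value);
            ++found;
        } else {
            printf("%s is not set\n", names[i]);
        }
    }
    return found;
}
```

If, for example, LIBGL_MIPMAP shows up here when the program is started the same way as the game, a library loaded into the game's process should see it too.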

kas1e avatar kas1e commented on June 29, 2024

Btw, about the "550 calls": that's from your Pandora check; how many we have, dunno. All Daniel says is: it's a lot, lot, lot of calls with less than 10 triangles each.

ptitSeb avatar ptitSeb commented on June 29, 2024

OK, I have done an analysis using the glBegin/glEnd path: it's bad. It seems gl4es doesn't collapse the blocks (but it should). So I have a bug in gl4es that leads to bad performance in this case.
I have 3906 draw calls (instead of 550), and yeah, performance is terrible on the Pandora too.

I have to fix that; the code for collapsing the calls is there, so it's "just" a bug somewhere...

kas1e avatar kas1e commented on June 29, 2024

Oh, that's a start! That probably gives you only 1 fps? :)

ptitSeb avatar ptitSeb commented on June 29, 2024

Yeah, a few fps, 2 or 3, not sure... awful anyway, I didn't let it run for long.

kas1e avatar kas1e commented on June 29, 2024

:) Well, since I have 5-6fps in that case, we can expect better values than on MiniGL (22fps there).

Fixing batching with glBegin/glEnd will probably help all the other stuff too.

ptitSeb avatar ptitSeb commented on June 29, 2024

Yep, batching helps a few games...

kas1e avatar kas1e commented on June 29, 2024

I can help with debugging, but there's probably no need, as you have seen it all on the Pandora... But since I can compile it all quickly... let me know if I can be of any use there :)

ptitSeb avatar ptitSeb commented on June 29, 2024

I found something. I'll push something soon.

ptitSeb avatar ptitSeb commented on June 29, 2024

OK, I have pushed the change. I get my 10fps back when facing the mirror. You should notice some significant improvements too.

ptitSeb avatar ptitSeb commented on June 29, 2024

I made a capture, and it's back to ~500 calls.
There are still a few draw calls that are not merged but seem to be compatible, but that's much better than before.

from gl4es.

ptitSeb avatar ptitSeb commented on June 29, 2024

Ok, found the last small bug that kept some commands from merging when they should.

Number of calls is now down to ~300. I hope you'll like it :p

from gl4es.

kas1e avatar kas1e commented on June 29, 2024

Tested with 500 calls: without clamp it was only 13 fps. Now I'll check with 300 :)

from gl4es.

kas1e avatar kas1e commented on June 29, 2024

With 300 I get only 14 fps :( and that's without clamp in the shaders. I'll enable them again and see what I get.

from gl4es.

kas1e avatar kas1e commented on June 29, 2024

When I put the clamp back in the shaders, it gives about the same 13-14 fps.
For the same timedemo1/demo four, I now have:

MiniGL: 1260 frames, 15.1 seconds, 83.2 fps
gl4es: 1260 frames, 39.1 seconds, 32.2 fps

Better than before (~22 fps), but still faaar away from MiniGL, which does its TCL in software... We're moving step by step, but still... Uhm, I'll probably need to upload a new Quake3 binary to Daniel for another round of profiling now...
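Timedemo results are just frames divided by elapsed seconds, so the two runs can be compared directly. A small C sketch of that arithmetic follows; the function names are made up for illustration. 1260 / 39.1 ≈ 32.2 fps for gl4es and 1260 / 15.1 ≈ 83.4 fps for MiniGL (the printed 83.2 suggests the 15.1 s figure is itself rounded), and 83.2 / 32.2 ≈ 2.6, i.e. gl4es currently needs roughly 2.6 times as long for the same demo.

```c
#include <assert.h>

/* Average frame rate of a timedemo run: frames divided by elapsed seconds. */
static double timedemo_fps(int frames, double seconds) {
    assert(frames >= 0 && seconds > 0.0);
    return frames / seconds;
}

/* How many times faster run A is than run B on the same demo. */
static double speed_ratio(double fps_a, double fps_b) {
    assert(fps_b > 0.0);
    return fps_a / fps_b;
}
```

Because both runs render the same 1260 frames, the fps ratio and the inverse ratio of the elapsed times describe the same gap.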

from gl4es.

ptitSeb avatar ptitSeb commented on June 29, 2024

Well, now you can ask Daniel to do some profiling again. Activating the glDrawElements(...) path will not reduce the number of calls at this stage, but it will reduce the CPU load and the number of malloc()/free() calls...
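A toy model of why a glDrawElements() path can cut malloc()/free() traffic even when the number of draw calls stays the same. It assumes the wrapper copies vertices into its own buffer in both cases; vbuf_t, vbuf_reserve(), append_vertex() and gather_indexed() are hypothetical and are not gl4es internals. When vertices arrive one at a time (as with glArrayElement() between glBegin()/glEnd()), the final size is unknown and the buffer has to grow repeatedly; with an index list the count is known before copying, so a single reservation is enough.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable vertex buffer that counts its (re)allocations. */
typedef struct {
    float  *data;
    size_t  used;    /* floats in use */
    size_t  cap;     /* floats allocated */
    size_t  allocs;  /* successful realloc() calls */
} vbuf_t;

/* Ensures room for `need` floats, doubling the capacity (minimum 16). */
static int vbuf_reserve(vbuf_t *b, size_t need) {
    assert(b != NULL);
    if (need <= b->cap)
        return 1;
    size_t cap = b->cap ? b->cap : 16;
    while (cap < need)
        cap *= 2;
    float *p = realloc(b->data, cap * sizeof *p);
    if (p == NULL)
        return 0;
    b->data = p;
    b->cap = cap;
    b->allocs++;
    return 1;
}

/* Immediate-mode style: one vertex at a time, total size unknown. */
static int append_vertex(vbuf_t *b, const float *v, size_t comps) {
    if (!vbuf_reserve(b, b->used + comps))
        return 0;
    memcpy(b->data + b->used, v, comps * sizeof *v);
    b->used += comps;
    return 1;
}

/* Indexed style: `count` is known up front, so one reservation covers
   the whole batch before the vertices are gathered. */
static int gather_indexed(vbuf_t *b, const float *verts, size_t comps,
                          const unsigned short *idx, size_t count) {
    if (!vbuf_reserve(b, b->used + count * comps))
        return 0;
    for (size_t i = 0; i < count; ++i)
        memcpy(b->data + b->used + i * comps, verts + (size_t)idx[i] * comps,
               comps * sizeof *verts);
    b->used += count * comps;
    return 1;
}
```

With this doubling policy, appending 100 three-component vertices one by one triggers 6 (re)allocations, while gathering the same 100 vertices through an index list allocates once. A real driver may go further and pass the indices straight to the GPU without gathering at all.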

from gl4es.

kas1e avatar kas1e commented on June 29, 2024

Btw, when I enable GL extensions in the MiniGL version, it gives me 41-42 fps instead of 22 when I look at the mirror.

from gl4es.

ptitSeb avatar ptitSeb commented on June 29, 2024

I did some quick CPU profiling on the Pandora.
With the extension, it's clearly GPU-limited when facing the mirror, but without the extension it's more CPU-limited (and I get 10 fps vs 6 fps at the initial position). So there may be some stuff to optimize.

from gl4es.

kas1e avatar kas1e commented on June 29, 2024

I just did some checking; this is what I have now on AOS4 with the current MiniGL and gl4es/ogles2:

With GL extensions OFF:

MGL/SDL1: 50.2 fps
MGL/SDL2: 54.9 fps
GL4ES/SDL1: 31.8 fps

With GL extensions ON:

MGL/SDL1: 81.9 fps
MGL/SDL2: 89.1 fps
GL4ES/SDL1: 35.8 fps

That clearly shows that the extensions help A LOT... Which ones, though, I don't know. The list of extensions MiniGL supports is not as big as GL4ES's, but when running, Quake3 reports using only the same 3 extensions:

using GL_EXT_texture_env_add
using GL_ARB_multitexture
using GL_EXT_compiled_vertex_array

So is it probably GL_EXT_compiled_vertex_array that gives that huge boost? The other 2 only add 4 fps in the GL4ES version.
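GL_EXT_compiled_vertex_array lets an application bracket several draws of the same vertex range with glLockArraysEXT()/glUnlockArraysEXT(), which permits (but does not require) the implementation to transform those vertices once and reuse them. Quake 3's renderer locks the arrays before iterating over a shader's stages, so if an implementation does cache, the saving grows with the number of passes. The toy cost model here only counts vertex transforms; lock_arrays(), unlock_arrays() and draw_pass() are hypothetical stand-ins, it assumes every pass draws the same locked range, and nothing in it describes how MiniGL or gl4es actually implement the extension.

```c
#include <assert.h>
#include <stddef.h>

/* Toy cost model: counts vertices pushed through the transform stage. */
typedef struct {
    size_t transforms;   /* total vertex transforms performed */
    int    locked;       /* nonzero between lock and unlock */
    int    cache_valid;  /* a transformed copy of the locked range exists */
} pipeline_t;

/* Stand-ins for glLockArraysEXT()/glUnlockArraysEXT(). */
static void lock_arrays(pipeline_t *p)   { p->locked = 1; p->cache_valid = 0; }
static void unlock_arrays(pipeline_t *p) { p->locked = 0; p->cache_valid = 0; }

/* One rendering pass over `count` vertices of the current arrays. */
static void draw_pass(pipeline_t *p, size_t count) {
    assert(p != NULL);
    if (p->locked && p->cache_valid)
        return;              /* transform skipped; rasterisation not modelled */
    p->transforms += count;
    if (p->locked)
        p->cache_valid = 1;  /* later passes in this lock reuse the result */
}
```

Under this model, three passes over 1000 vertices cost 3000 transforms without locking and 1000 with it.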

But the "no extensions" GL4ES version also looks about 1.6-1.7 times slower than MiniGL (31.8 fps vs 50.2/54.9 fps).

from gl4es.

kas1e avatar kas1e commented on June 29, 2024

That's for timedemo1/demofour.

from gl4es.

ptitSeb avatar ptitSeb commented on June 29, 2024

You cannot really trust the benchmark for ogles2/gl4es with GL_EXT_compiled_vertex_array as long as the vertex attribute bug is there. You don't know what it is trying to draw, or how much drawing the garbage slows things down.

from gl4es.

kas1e avatar kas1e commented on June 29, 2024

Of course, I measured it all without GL_EXT_compiled_vertex_array; I disabled it in hardtex.c, so only the 2 other extensions are loaded.

But yes, that remains to be seen once it's fixed.

The problem, IMHO, is the non-extensions version, which is still slower than MiniGL with its software TCL... But let's see what Daniel says. He'll probably just wait for the vertexattrib fix first...

from gl4es.

ptitSeb avatar ptitSeb commented on June 29, 2024

I have pushed one last optimization on glArrayElement that also helps Quake3 when the extension is off.
I think gl4es now works correctly, so I'll stop trying to squeeze more fps out of Quake3 and the glBegin/glEnd code path for now.

from gl4es.
