Planet Renderer – Week 2: Basic OpenGL Framework

[GIF: icosphere subdivision program running in the OpenGL/C++ framework]

This week I implemented a basic OpenGL framework as the foundation for the planet renderer. It can be found on Github, and the repository will be updated frequently to keep track of the rendering tech's progress.

I reused a fair amount of (modified) code from the OpenGL framework I wrote last spring, but without building the project inside it, in order to avoid making the tech demo unnecessarily complex. The goal, after all, is a tech demo that contains the code necessary for rendering planets and nothing more, so that the relevant code is easy to identify. It will always be possible to transfer the project into a more game-friendly engine later.

So for now, the framework contains the following components:

  • A shader class that combines all shader types and handles loading and simple preprocessing of .glsl files
  • An input manager
  • A transform class that contains basic model-to-world matrix transformations and information about the position, rotation and scale of an object
  • A camera class that handles view and projection matrices
  • OpenGL and SDL initialisation, and a main loop
  • Window, time and settings information
  • A scene that handles all important objects
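To give an idea of how these pieces hang together, here is a minimal sketch of what the entry point could look like with SDL2 and an OpenGL context; the commented-out scene calls are placeholders, not the actual framework API.

```cpp
#include <SDL.h>

int main(int, char**)
{
    SDL_Init(SDL_INIT_VIDEO);
    SDL_GL_SetAttribute(SDL_GL_CONTEXT_MAJOR_VERSION, 3);
    SDL_GL_SetAttribute(SDL_GL_CONTEXT_MINOR_VERSION, 3);
    SDL_Window* window = SDL_CreateWindow("PlanetTech",
        SDL_WINDOWPOS_CENTERED, SDL_WINDOWPOS_CENTERED, 1280, 720, SDL_WINDOW_OPENGL);
    SDL_GLContext context = SDL_GL_CreateContext(window);

    bool running = true;
    while (running)
    {
        // The input manager would translate these events into key/mouse state
        SDL_Event event;
        while (SDL_PollEvent(&event))
        {
            if (event.type == SDL_QUIT)
                running = false;
        }

        // scene.Update();  // camera, transforms, time
        // scene.Draw();    // shaders, vertex buffers

        SDL_GL_SwapWindow(window);
    }

    SDL_GL_DeleteContext(context);
    SDL_DestroyWindow(window);
    SDL_Quit();
    return 0;
}
```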

As the need arises I will probably also add sprite rendering for post-processing with framebuffers, as well as bitmap text rendering.

I also took the opportunity to test the generation of icospheres directly in the framework, a process I outlined in my last post. In a first step, the 12 vertices are generated using the golden ratio; then the triangles are recursively subdivided, with the newly generated vertices pushed out onto the radius of the ideal sphere.

[Image: recursive subdivision of triangles on a sphere for planet rendering]
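For illustration, here is a self-contained sketch of that construction (not the exact framework code): the 12 icosahedron vertices from the golden ratio, followed by recursive subdivision where each new vertex is projected back onto the sphere.

```cpp
#include <cmath>
#include <vector>

struct Vec3 { float x, y, z; };

// Project a point onto the sphere of the given radius.
static Vec3 Normalized(Vec3 v, float radius)
{
    float l = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    return { v.x / l * radius, v.y / l * radius, v.z / l * radius };
}

// Recursively split a triangle; emit leaf triangles into 'out'.
static void Subdivide(Vec3 a, Vec3 b, Vec3 c, int level, float radius, std::vector<Vec3>& out)
{
    if (level == 0) // a distance or visibility heuristic could also stop the recursion here
    {
        out.push_back(a); out.push_back(b); out.push_back(c);
        return;
    }
    // Edge midpoints, pushed back out onto the ideal sphere
    Vec3 ab = Normalized({ (a.x + b.x) * 0.5f, (a.y + b.y) * 0.5f, (a.z + b.z) * 0.5f }, radius);
    Vec3 bc = Normalized({ (b.x + c.x) * 0.5f, (b.y + c.y) * 0.5f, (b.z + c.z) * 0.5f }, radius);
    Vec3 ca = Normalized({ (c.x + a.x) * 0.5f, (c.y + a.y) * 0.5f, (c.z + a.z) * 0.5f }, radius);
    Subdivide(a, ab, ca, level - 1, radius, out);
    Subdivide(ab, b, bc, level - 1, radius, out);
    Subdivide(ca, bc, c, level - 1, radius, out);
    Subdivide(ab, bc, ca, level - 1, radius, out);
}

std::vector<Vec3> BuildIcosphere(int level, float radius)
{
    const float t = (1.f + std::sqrt(5.f)) * 0.5f; // golden ratio
    // The 12 icosahedron vertices lie on three orthogonal golden rectangles.
    std::vector<Vec3> v = {
        {-1,  t, 0}, {1,  t, 0}, {-1, -t, 0}, {1, -t, 0},
        {0, -1,  t}, {0, 1,  t}, {0, -1, -t}, {0, 1, -t},
        { t, 0, -1}, {t, 0,  1}, {-t, 0, -1}, {-t, 0,  1} };
    for (Vec3& p : v) p = Normalized(p, radius);

    // The 20 faces of the icosahedron as index triples.
    const int f[20][3] = {
        {0,11,5},{0,5,1},{0,1,7},{0,7,10},{0,10,11},
        {1,5,9},{5,11,4},{11,10,2},{10,7,6},{7,1,8},
        {3,9,4},{3,4,2},{3,2,6},{3,6,8},{3,8,9},
        {4,9,5},{2,4,11},{6,2,10},{8,6,7},{9,8,1} };

    std::vector<Vec3> triangles;
    for (const int* face : f)
        Subdivide(v[face[0]], v[face[1]], v[face[2]], level, radius, triangles);
    return triangles;
}
```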

The nice thing about this approach is that it allows stopping the recursion based on a different heuristic than simply reaching the maximum level; for example, the distance from the camera or visibility in the viewport could be used. 😉 For actual planet rendering the entire thing should be placed in some sort of tree hierarchy, but for now this works.

I also tested real-time updating of the vertex buffer with the simplest possible approach (simply rebinding it). When changing the buffer, recursively generating the geometry actually takes the longest time. When I took out this step (while still rebinding the vertex buffer), I was able to send nearly a million triangles to the GPU (NVidia Quadro K3100M, Intel i7-4700MQ) at around 180 fps. Without rebinding the buffer, those triangles ran at around 550 fps. I was quite impressed with the amount of data I was able to push to the GPU; extrapolating from these measurements, I could probably send one triangle per pixel to the GPU every frame and stay above 60 fps.

[Image: runtime vertex buffer updates in the planet rendering program]
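The "simple rebinding" I tested boils down to re-specifying the vertex buffer's data store every frame. A rough sketch using standard OpenGL calls (reusing the Vec3 struct from the sketch above):

```cpp
#include <vector>
// Requires an OpenGL loader header (e.g. GLAD or GLEW) providing the gl* functions and GLuint.

void UploadTriangles(GLuint vbo, const std::vector<Vec3>& triangles)
{
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    // Re-specifying the whole data store each frame ("buffer orphaning") lets the driver
    // hand out fresh memory instead of stalling on the copy still in use by the GPU.
    glBufferData(GL_ARRAY_BUFFER,
                 triangles.size() * sizeof(Vec3),
                 triangles.data(),
                 GL_DYNAMIC_DRAW);
}
```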

Of course, spending all of the performance budget on streaming vertices is less than ideal, since in a real-world application a lot more calculations need to be done apart from terrain generation (not to mention actually generating the vertex data, which was not done every frame here). It is definitely not necessary to have one triangle per pixel, though. If every triangle covers a 5-10 pixel square, that should still be enough to represent curves rather nicely, and for areas that do not form silhouettes from the camera's perspective even less vertex detail is necessary, since a lot can be done with good shaders and textures.

On top of that, most modern terrain-rendering algorithms I have been researching use some hybrid of performing LOD calculations and keeping patches of precalculated data, as a way of minimising the bandwidth used. This is done because, as I demonstrated earlier, the main performance killer is not having a lot of triangles on screen, but sending a lot of them from the CPU to the GPU, and having predefined patches is an effective way of reducing the amount of data that needs to be sent (for instance with an index list). It is not inconceivable to even generate low-level details with a geometry or tessellation shader on the GPU, to minimise that effect where it no longer affects things like collision.
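A hedged sketch of that idea, with made-up names: the patch vertices and their index list live in static GPU buffers created once, only a small per-instance buffer is refreshed when the camera moves, and a single instanced draw call covers all visible patches.

```cpp
#include <vector>
// Again assumes an OpenGL loader header for the gl* functions and types.

// Illustrative only; PatchInstance and the buffer layout are assumptions, not real framework code.
struct PatchInstance
{
    float level;        // subdivision level of the patch
    float posX, posY;   // where the patch sits on the sphere face
};

void UpdateInstances(GLuint instanceVbo, const std::vector<PatchInstance>& instances)
{
    // Only this small buffer changes per frame; the patch geometry itself stays on the GPU.
    glBindBuffer(GL_ARRAY_BUFFER, instanceVbo);
    glBufferData(GL_ARRAY_BUFFER, instances.size() * sizeof(PatchInstance),
                 instances.data(), GL_DYNAMIC_DRAW);
}

void DrawPatches(GLuint patchVao, GLsizei indexCount, GLsizei instanceCount)
{
    // The VAO references the static patch vertex buffer, its static index list,
    // and the instance buffer (set up with glVertexAttribDivisor).
    glBindVertexArray(patchVao);
    glDrawElementsInstanced(GL_TRIANGLES, indexCount, GL_UNSIGNED_INT, nullptr, instanceCount);
}
```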

I have noticed that there are two ways of dealing with low-resolution details. The first is putting a lot of terrain data into paged files on the hard drive and streaming in the data when that resolution is needed; the second is procedural generation. Since I am researching planet rendering for use in games, holding detailed information about entire planets is probably an impractical solution to pursue: many game applications will require moving between a multitude of planets, and storing all that data is likely to be inconvenient for both users and developers. Therefore it seems more adequate to either generate the planets completely procedurally, or to have some low-resolution data and use procedural refinement at lower altitudes. This doesn't throw low-resolution data completely out of the window, since it might still be useful to be able to add detailed modifications in specific positions (though a lot of that could be achieved by loading specific meshes at certain distances).
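As an illustration of what "procedural refinement" could mean in practice, here is a tiny fractal height function built on hash-based value noise. It is a generic sketch, not necessarily the method I will end up using.

```cpp
#include <cmath>

// Integer-lattice hash giving a repeatable pseudo-random value in [0, 1).
static float Hash(int x, int y, int z)
{
    unsigned int h = 73856093u * (unsigned int)x
                   ^ 19349663u * (unsigned int)y
                   ^ 83492791u * (unsigned int)z;
    h = (h ^ (h >> 13u)) * 1274126177u;
    return (h & 0xFFFFFFu) / float(0x1000000);
}

static float Lerp(float a, float b, float t) { return a + (b - a) * t; }

// Smoothly interpolated value noise on the integer lattice.
static float ValueNoise(float x, float y, float z)
{
    int xi = (int)std::floor(x), yi = (int)std::floor(y), zi = (int)std::floor(z);
    float tx = x - xi, ty = y - yi, tz = z - zi;
    tx = tx * tx * (3.f - 2.f * tx);   // smoothstep fade curves
    ty = ty * ty * (3.f - 2.f * ty);
    tz = tz * tz * (3.f - 2.f * tz);
    float c00 = Lerp(Hash(xi, yi,     zi),     Hash(xi + 1, yi,     zi),     tx);
    float c10 = Lerp(Hash(xi, yi + 1, zi),     Hash(xi + 1, yi + 1, zi),     tx);
    float c01 = Lerp(Hash(xi, yi,     zi + 1), Hash(xi + 1, yi,     zi + 1), tx);
    float c11 = Lerp(Hash(xi, yi + 1, zi + 1), Hash(xi + 1, yi + 1, zi + 1), tx);
    return Lerp(Lerp(c00, c10, ty), Lerp(c01, c11, ty), tz);
}

// Fractal sum of octaves: each octave adds finer, weaker detail to the height value.
float TerrainHeight(float x, float y, float z, int octaves)
{
    float height = 0.f, amplitude = 0.5f, frequency = 1.f;
    for (int i = 0; i < octaves; ++i)
    {
        height += amplitude * ValueNoise(x * frequency, y * frequency, z * frequency);
        amplitude *= 0.5f;
        frequency *= 2.f;
    }
    return height; // would be used to displace a vertex along its sphere normal
}
```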

On the side of generating the geometry in real time, a lot can be done to minimise the amount of time the CPU is busy with it. The most important step is keeping all the generated data in a tree structure and only updating the tree when the camera moves. Also, all triangles that are not in the viewport can be culled (including their children); this can be checked by transforming their positions with the viewProjection matrix into screen coordinates. A second form of culling is checking whether a triangle is backfacing from the camera's perspective; a simple dot product suffices for this.
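A small sketch of both tests, assuming glm-style vector math and a planet centred at the origin (names are illustrative):

```cpp
#include <glm/glm.hpp>

// Backface test: on a sphere the outward normal is just the normalised position,
// so a triangle faces away from the camera when its normal points along the view direction.
bool IsBackfacing(const glm::vec3& triangleCenter, const glm::vec3& cameraPos)
{
    glm::vec3 normal = glm::normalize(triangleCenter);                // planet centred at origin
    glm::vec3 toTriangle = glm::normalize(triangleCenter - cameraPos);
    return glm::dot(normal, toTriangle) > 0.f;                        // facing away -> cull it and its children
}

// Viewport test: project with the viewProjection matrix and check the result against
// normalised device coordinates, with a margin so border cases are not culled too eagerly.
bool IsOutsideViewport(const glm::vec3& pos, const glm::mat4& viewProjection, float margin)
{
    glm::vec4 clip = viewProjection * glm::vec4(pos, 1.f);
    if (clip.w <= 0.f)
        return true;                                                  // behind the camera
    glm::vec3 ndc = glm::vec3(clip) / clip.w;
    return ndc.x < -1.f - margin || ndc.x > 1.f + margin ||
           ndc.y < -1.f - margin || ndc.y > 1.f + margin;
}
```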

Lastly, an interesting trick used in the ROAM algorithm is to queue pointers to triangles that are to be split in a list sorted by their priority, and triangle children that are to be merged in another queue. This way the subdivision level of a triangle only changes by one level per frame, which limits the amount of calculation done and changes pushed, and as a result prevents spikes in calculation time during rapid camera movement.
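A rough sketch of the split-queue half of that idea (the types and the priority metric are made up for illustration):

```cpp
#include <queue>

struct TriangleNode;                 // node in the subdivision tree

struct SplitCandidate
{
    TriangleNode* node;
    float priority;                  // e.g. screen-space error or inverse camera distance
    bool operator<(const SplitCandidate& other) const { return priority < other.priority; }
};

// Apply at most a fixed number of subdivision changes per frame, so rapid camera
// movement cannot cause a spike in recalculation time.
void ProcessSplits(std::priority_queue<SplitCandidate>& splitQueue, int maxSplitsPerFrame)
{
    for (int i = 0; i < maxSplitsPerFrame && !splitQueue.empty(); ++i)
    {
        SplitCandidate candidate = splitQueue.top();
        splitQueue.pop();
        // Split(candidate.node);    // subdivide by one level; a mirrored queue handles merges
    }
}
```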

 

The research I have done mainly served as a reminder of how complex this topic can get, but I have distilled some general rules for my approach:

  • Limit the amount of geometry to recalculate by caching data in a tree hierarchy and only allowing one subdivision change per frame
  • Cull invisible triangles at the parent position in their tree structure
  • The maximum triangle resolution needed is a square of roughly 5 px
  • Stop using LOD at a higher resolution and fill the in-between space with non-LOD-dependent methods
  • Don't page an entire planet into memory; rather, refine the terrain procedurally
  • Possibly only resend changed parts of the vertex information to the GPU
  • Additional quality improvements are still possible with shaders and meshes on top afterwards

I hope this post summarizes the progress I made since last week nicely.

Author: Leah Lindner

Hi, you are looking at the portfolio of a German-English game developer with a focus on graphics programming. I love finding out how things work and visualizing them in a creative way using computer technology. I first became interested in computer graphics when I started creating 3D art using Blender in 2008. After majoring in programming at secondary school and also teaching myself digital painting, I moved to Belgium to take a Bachelor in Digital Arts and Entertainment. Following work on projects in both film and video games, I have increasingly focused on graphics programming, and moved to Brighton to work as a GameDev at Electric Square.

8 thoughts on “Planet Renderer – Week 2: Basic OpenGL Framework”

  1. Hi Robert,

    I've been working towards what you have done for a few years now, but with months in between the tries 🙂 Thanks for this blog, it's really helping me get closer to my goal. I implemented frustum culling yesterday and now I'm looking into the big FPS drop I got from it. I use DirectX, compared to your OpenGL.

    I'm running on an integrated graphics card, an Intel 620 or something like that. Right now I'm mapping and unmapping the vertex buffer after having updated the geometry, and I do about 4 subdivisions at the moment.

    30 FPS seems really low for sending 15,360 triangles to the GPU. This low FPS of course only occurs when I move the camera around and the geometry needs to be updated. I just can't get my head around what is causing the big drop in FPS compared to your tests.

    Any ideas or pointers are much appreciated

    Best Regards
    Johan

    1. Hi Johan,

      have you been using the CDLOD method that I wrote about in later blog posts, or simple triangle recursion? Triangle recursion had really bad performance for me too, which is why I implemented CDLOD for triangles in a tree structure.
      If you have implemented that, I would say it sounds like your GPU has a hard time dealing with high dynamic vertex throughput (unsurprisingly for integrated graphics). You could try playing around with different vertex streaming methods.
      I would definitely try increasing the size of patches; larger patches mean fewer instances need to be sent to the GPU dynamically.
      Did you use a graphics debugger to check the timings? Might help identify the bottleneck.

      1. Hi, thanks for the fast reply

        The plan for tonight is to check with a graphics debugger and see, but when I remove the recursive recalculations it runs very smoothly, which should mean, if I'm correct, that the mapping and unmapping isn't the issue and that the recalculation is. I haven't implemented CDLOD yet; I'm taking it step by step since I'm learning C++ at the same time. It's part of the plan in the end, so I might look into it sooner than I thought 🙂 I'm also not backface culling right now, but that's mostly because I thought my CPU could handle 15,000 vertices without issues, which doesn't seem to be the case.

        I haven't implemented a patch system, which from my understanding means several vertex buffers instead of one big one, and you decide which ones to send to the GPU instead of sending everything in one big buffer. Is that correct?

        Thanks in advance
        Johan

        1. I think that if you are using simple recursion for the planet, there is actually no point in using a graphics debugger, because your bottleneck is probably not the GPU but the CPU.

          Think about it: your CPU basically has to generate 15,000 vertices and their higher subdivisions every time you move the camera, and for each of those there are about 50 lines of code (a horrible measure, but I couldn't be bothered to count the instructions accurately; it proves the point). After that it still needs to actually upload all those triangles to the GPU, and then render them. I don't think you will ever get good performance that way; you will need to implement some optimisation for terrain rendering eventually (I didn't have good performance on a dedicated GPU either until I did).

          Backface culling will be a good first step, I suspect, because if you're already frustum culling, more than half of the remaining triangles will probably be facing away and therefore invisible.

          The patch system is part of CDLOD, think of it as instead of rendering individual triangles you render instances of large triangle patches which have subtriangles, which are morphed into the correct shape by the GPU.
          These patches are basically pre-subdivided triangles which you generate once and then never have to calculate again.

          For example, if your camera angle dictated that you need around 200,000 triangles on the screen to make the terrain look good, and you had to generate all of those every frame on the CPU, your computer would crash and burn. But you could have patches which are subdivided 5 times, which means every patch contains 4^5 (1024) triangles. This means your CPU would only need to calculate the positions of about 200 patches, and leave the rest of the calculations, for morphing those patches into the correct shape, to the GPU.

          And you can balance this perfectly for your system: for instance, if you notice your CPU is still the bottleneck, you could increase the number of subdivisions per patch to 6, making each patch contain ~4000 triangles, so your CPU would only calculate about 50 patches. (Your GPU will do 4 times the work instead, but GPUs are built for parallel processing, so it will be way more efficient to do it there.)

          All this can be done in a single drawcall if you use instancing.

          1. I will start looking into this right away tonight; always nice to learn something new 😀 I'm using your code as guidelines and ideas while building the rendering framework for the planet.

            Thanks a lot for your help

  2. Last question only,

    “The patch system is part of CDLOD, think of it as instead of rendering individual triangles you render instances of large triangle patches which have subtriangles, which are morphed into the correct shape by the GPU.
    These patches are basically pre-subdivided triangles which you generate once and then never have to calculate again.”

    Are you using shaders to morph them into the right shape?

  3. Hi,
    I have a question about your patch shader; I am trying to learn from your code. The thing I don't understand is how the shader knows which “vec2 pos” and “vec2 morph” to use for the vec3 a, r, s.
    When you start the program, the size of m_Vertices is 153 and the size of m_Indices is 768.
    Now the thing I don't understand: you send PatchInstances to the shader with a size of 14.
    So is there some magic behind that? 🙂 How do those values fit together?
