➟ Go back to April 2009.
➟ Proceed to June 2009.
| In these blog entries, you will find information on the things that I am currently working on. Whatever you read in recent entries does not necessarily describe features that are available for public download. The sole purpose of this blog is to inform you about the progress of development, which will eventually, that is, in the future, result in an actual release of those features. Do not mistake this blog for a changelog. |
I am pleased to say that I have found a way to get rid of the fringes that occasionally occur when blending alpha polygons, an issue that also affects the Smooth transparency mode. Here is an explanation of the cause, along with the solution.
When painting a picture, you need to paint from back to front, because foreground objects might obscure background objects. Computer graphics usually do not work this way, because determining the correct rendering order (depth sorting) is far too expensive if you want to get it exactly right.
Instead, computer graphics employ the so-called z-buffer. With this technique, you can draw polygons in any order: whenever a pixel is about to be drawn to the screen, the hardware first checks whether that pixel has already been drawn and, if so, compares the distances of both to the camera. If the previously drawn pixel was farther away from the camera, it is overdrawn; if it was nearer, the current pixel is not rendered at all. As said, this works at the pixel level.
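To make the pixel-level idea concrete, here is a minimal Python sketch of a z-buffer on a one-row "screen". It is only an illustration under simplifying assumptions (smaller depth means nearer, colors are just labels); real hardware performs this test per fragment on the graphics card. The helper names `draw_fragment` and `render` are hypothetical.

```python
import math

def draw_fragment(color_buf, depth_buf, x, depth, color):
    """Depth-tested write of one fragment at pixel x (smaller depth = nearer)."""
    if depth < depth_buf[x]:   # nearer than what is stored: overdraw it
        color_buf[x] = color
        depth_buf[x] = depth
        return True
    return False               # farther: the fragment is discarded

def render(polygons, width=4):
    """Draw (name, depth, pixel range) tuples in the given order."""
    color_buf = ["sky"] * width
    depth_buf = [math.inf] * width   # empty buffer: everything infinitely far
    for name, depth, pixels in polygons:
        for x in pixels:
            draw_fragment(color_buf, depth_buf, x, depth, name)
    return color_buf

wall = ("wall", 10.0, range(0, 2))   # near object covering pixels 0-1
hill = ("hill", 50.0, range(1, 4))   # far object covering pixels 1-3
print(render([wall, hill]))  # ['wall', 'wall', 'hill', 'hill']
print(render([hill, wall]))  # same result: drawing order does not matter
```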
One problem with the z-buffer is that it cannot be meaningfully used with translucent polygons. You cannot simply skip rendering behind an already drawn pixel, because if that pixel is translucent, you need to see what is behind it. This requires that you render your data in the correct order. However, doing the depth sorting right is too computationally expensive, which is why compromises are used. In openBVE, the compromise is to compute one distance to the camera per polygon (not the Cartesian distance, though) and then sort the polygons by that distance. Sounds good, doesn't it? The problem is twofold, however. First, you cannot get pixel-perfect results this way, as some pixels of a polygon might be in front of another polygon while others are behind it. Second, you cannot convey the complexity of 3D geometry in just one number.
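The per-polygon compromise can be sketched in Python as follows. The text does not specify openBVE's actual metric (only that it is not the Cartesian distance), so this sketch assumes a hypothetical one: the depth of the polygon's centroid along the camera's viewing direction. Polygons are then drawn back to front. The last example shows the weakness described above: a long polygon gets a single key, even though part of it lies nearer than another polygon.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]

def sort_key(polygon: List[Vec3], camera: Vec3, view_dir: Vec3) -> float:
    """Hypothetical metric: centroid depth along the (unit-length) view direction."""
    n = len(polygon)
    centroid = [sum(v[i] for v in polygon) / n for i in range(3)]
    return sum((centroid[i] - camera[i]) * view_dir[i] for i in range(3))

def back_to_front(polygons, camera, view_dir):
    """Sort so that the farthest polygon (largest key) is drawn first."""
    return sorted(polygons, key=lambda p: sort_key(p, camera, view_dir), reverse=True)

camera, view_dir = (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)   # looking down +z
near = [(-1.0, 0.0, 5.0), (1.0, 0.0, 5.0), (0.0, 1.0, 5.0)]
far = [(-1.0, 0.0, 20.0), (1.0, 0.0, 20.0), (0.0, 1.0, 20.0)]
print(back_to_front([near, far], camera, view_dir) == [far, near])  # True

# One number cannot describe a long polygon: this ground strip runs from
# z = 1 to z = 41, so its key (21) makes it drawn before `far` (20), and
# `far` may then paint over the ground's much nearer front part.
ground = [(-5.0, -1.0, 1.0), (5.0, -1.0, 1.0), (5.0, -1.0, 41.0), (-5.0, -1.0, 41.0)]
print(sort_key(ground, camera, view_dir))  # 21.0
```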
This is why even with openBVE's depth sorting, you will occasionally end up with polygons that are rendered in the wrong order, resulting in this effect:
What happens here is that the foreground object is accidentally rendered first. All of its pixels that are not fully transparent write to the z-buffer and thus prevent the subsequently rendered back tree from being drawn there, so the background image shows through in its place.
In the Sharp transparency mode, this does not happen, because only fully opaque pixels are drawn, resulting in a non-smooth look:
My solution is to draw all transparent polygons twice. In the first round, the polygon is rendered smoothly, with z-buffer writing disabled. This way, subsequently rendered faces may draw over it, e.g. the back tree drawing over the fringes. However, as this alone would let the back tree hide the fore tree entirely, a second round draws only the opaque pixels, with z-buffer writing enabled. This way, at least all opaque pixels are in the correct order, while fringe pixels might still be in the wrong order. When the polygons have similar colors, as with all the trees here, this is not even noticeable, even though it is incorrect:
If the polygons had very dissimilar colors, you would still see a fringe, but in this case it would show not the background image but the background object, which partly overdraws the fringe region of the foreground object.
This technique requires all transparent polygons to be rendered twice, which costs some performance, but it gives much better quality than the Sharp mode. As a small optimization, the first stage renders only the non-opaque pixels (thus only the fringe) and the second stage only the fully opaque pixels (thus not the fringe); still, there is a performance impact. I think this solution is highly acceptable nonetheless. I will integrate it into the Smooth transparency mode in the version following 1.0.5, which will make this mode more attractive to use.
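The two rounds can be simulated in a few lines of Python on a one-row framebuffer, with grey values as colors. This is a sketch under stated assumptions: the first round runs over all transparent polygons before the second round starts, blending is plain `alpha*src + (1-alpha)*dst`, and the fragment data is made up for illustration. In OpenGL, such passes would typically use the alpha test together with `glDepthMask`, but the sketch does not claim this is how openBVE implements it. The front tree is deliberately drawn first, i.e. in the wrong order.

```python
import math

BACKGROUND = 0.0  # grey value of the background image

def blend(src, alpha, dst):
    """Standard alpha blending: alpha*src + (1 - alpha)*dst."""
    return alpha * src + (1.0 - alpha) * dst

def render_single_pass(polygons, width):
    """Smooth mode without the fix: every non-transparent fragment writes depth."""
    color, depth = [BACKGROUND] * width, [math.inf] * width
    for fragments in polygons:
        for x, z, c, a in fragments:
            if a > 0.0 and z < depth[x]:
                color[x] = blend(c, a, color[x])
                depth[x] = z
    return color

def render_two_pass(polygons, width):
    color, depth = [BACKGROUND] * width, [math.inf] * width
    # Round 1: only partially transparent fragments (the fringe),
    # depth test on, depth writing off.
    for fragments in polygons:
        for x, z, c, a in fragments:
            if 0.0 < a < 1.0 and z < depth[x]:
                color[x] = blend(c, a, color[x])
    # Round 2: only fully opaque fragments, depth test on, depth writing on.
    for fragments in polygons:
        for x, z, c, a in fragments:
            if a == 1.0 and z < depth[x]:
                color[x] = c
                depth[x] = z
    return color

# Fragments are (pixel, depth, grey value, alpha); smaller depth = nearer.
front_tree = [(0, 1.0, 1.0, 1.0), (1, 1.0, 1.0, 0.5)]   # pixel 1 is fringe
back_tree = [(0, 2.0, 0.25, 1.0), (1, 2.0, 0.25, 1.0), (2, 2.0, 0.25, 1.0)]
wrong_order = [front_tree, back_tree]

print(render_single_pass(wrong_order, 3))  # [1.0, 0.5, 0.25]
print(render_two_pass(wrong_order, 3))     # [1.0, 0.25, 0.25]
```

In the single-pass result, pixel 1 is the front tree's fringe blended with the background image, and the back tree can no longer reach it. With two rounds, the back tree's opaque pixel replaces that fringe: still the wrong order, but barely visible when both trees have similar colors.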
I have started to lay out the first code for the 2.0 architecture in a new project. Basically all low-level code will need to be rewritten from scratch, and I am starting with the renderer.
On some routes, at low quality settings, you might achieve frame rates in the hundreds, so it may seem that openBVE 1.0's renderer offers quite high performance, but this is not actually the case.
So far, I have been using OpenGL's immediate mode, which basically means that every frame, the renderer sends instructions on vertices, colors, normals, textures, etc., to the graphics card. This is wasteful for static scenery, which doesn't change between frames and thus doesn't need to be sent to the graphics card over and over again. OpenGL offers a concept called display lists: instructions are compiled once, optimized for and stored on the graphics card, and can later be recalled without exchanging unnecessary data between RAM and the card.
I have made first experiments with display lists by rendering 10,000 solid-color quads. Without display lists, I get about 214 fps, but with display lists, this goes up to over 7,700 fps (about 36 times faster). I am not sure whether this is a best-case or worst-case scenario; after all, this data is too theoretical compared to what is eventually needed. There are textures, there is lighting, there are different quality options to allow for smooth alpha, etc. Some of these things might achieve an even better performance ratio, maybe not, but some of them cannot meaningfully profit from display lists. Interestingly, a first test with lighting enabled, textures used, and texture coordinates and normals supplied shows the same performance as the above scenario, both in immediate mode and with display lists. I am not sure why this is. I have validated the OpenGL code in all cases, and no errors were reported.
The biggest issue, though, is dynamic content, and unfortunately this includes static scenery as well, in particular all scenery that has to be depth-sorted, which, for the best quality, includes screendoor polygons and full-alpha polygons. However, assuming that the depth order stays relatively stable as long as the camera doesn't move too far (e.g. a few meters), there isn't much benefit in depth-sorting static scenery on every frame, which allows pre-compiled display lists to be used to some degree within short periods of time.
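The idea of not re-sorting on every frame could look like this minimal Python sketch. It assumes a hypothetical `CachedDepthOrder` helper, uses plain Euclidean distance to polygon centroids as the sort key for simplicity, and picks 5 meters as an arbitrary value for "a few meters". The comment marks where a real renderer would recompile its display list.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]

class CachedDepthOrder:
    """Back-to-front order that is only recomputed after the camera has moved
    more than `threshold` meters since the last sort (hypothetical helper)."""

    def __init__(self, centroids: Sequence[Vec3], threshold: float = 5.0):
        self.centroids = list(centroids)
        self.threshold = threshold
        self.sorted_at: Optional[Vec3] = None  # camera position of the last sort
        self.order: List[int] = []
        self.sort_count = 0

    def get_order(self, camera: Vec3) -> List[int]:
        if self.sorted_at is None or math.dist(camera, self.sorted_at) > self.threshold:
            # Farthest first; Euclidean distance is a simplifying assumption.
            self.order = sorted(range(len(self.centroids)),
                                key=lambda i: math.dist(camera, self.centroids[i]),
                                reverse=True)
            self.sorted_at = camera
            self.sort_count += 1
            # A real renderer would recompile its display list here.
        return self.order

cache = CachedDepthOrder([(0.0, 0.0, 10.0), (0.0, 0.0, 20.0), (0.0, 0.0, 30.0)])
for z in (0.0, 1.0, 2.0, 3.0, 4.0):        # camera creeps forward 4 m in total
    order = cache.get_order((0.0, 0.0, z))
print(order, cache.sort_count)             # [2, 1, 0] 1
```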
Fully dynamic content is a bit more difficult, depending on how dynamic it is. It might be possible to use display lists to some extent, but my experiments are not far enough along to give more details. Eventually, though, the slowest elements will dictate overall performance. For example, while rendering 10,000 polygons in my above experiment gave about 7,700 fps with display lists, rendering an additional 100 polygons in immediate mode reduces performance to 5,900 fps. Although these immediate-mode polygons make up less than 1 percent of all polygons, they reduce performance by almost a quarter here. Admittedly, I might be unable to use display lists with extremely dynamic data, and even a small amount of such data will decrease performance dramatically. However, overall performance will still improve significantly, allowing for more detailed routes in the future.
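Converting the quoted frame rates into frame times shows why a handful of immediate-mode polygons hurts so much. This back-of-the-envelope Python calculation assumes that frame time is spent entirely on polygons (ignoring any fixed per-frame overhead), so the per-polygon figures are rough estimates only.

```python
def frame_time_us(fps: float) -> float:
    """Time per frame in microseconds."""
    return 1_000_000 / fps

dl_frame = frame_time_us(7700)     # 10,000 polygons via display lists
mixed_frame = frame_time_us(5900)  # the same plus 100 immediate-mode polygons
im_frame = frame_time_us(214)      # 10,000 polygons in immediate mode

per_dl_polygon = dl_frame / 10_000                 # about 0.013 us
per_im_polygon = (mixed_frame - dl_frame) / 100    # about 0.40 us
per_im_polygon_alone = im_frame / 10_000           # about 0.47 us
fps_drop_percent = (7700 - 5900) / 7700 * 100      # about 23.4 %

print(f"display list polygon:   {per_dl_polygon:.3f} us")
print(f"immediate-mode polygon: {per_im_polygon:.3f} us "
      f"(pure immediate-mode test: {per_im_polygon_alone:.3f} us)")
print(f"fps drop: {fps_drop_percent:.1f} %")
```

Under these assumptions, an immediate-mode polygon costs roughly 30 times as much as one in a display list, which is in the same range as the pure immediate-mode measurement of 214 fps.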
I have started designing the API for loading-stage plugins. Currently, the API includes functions to register textures from a file, to load raw texture data for post-processing, and to register such raw data. On openBVE's side, the texture manager has been entirely rewritten and is basically complete, with the option to load and unload textures as necessary, to extract clips of textures, to handle transparent colors, to convert textures to power-of-two dimensions, and to convert them into the required internal format. I do not have any plugins that make use of the API yet, though.
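Three of the texture manager's jobs are simple enough to sketch in Python on a plain list of pixels. The function names and the pixel representation (row-major lists of RGB tuples) are hypothetical; the actual texture manager is part of openBVE's .NET code, and resampling an image to the new power-of-two size is left out.

```python
from typing import List, Optional, Tuple

RGB = Tuple[int, int, int]
RGBA = Tuple[int, int, int, int]

def next_power_of_two(n: int) -> int:
    """Smallest power of two that is >= n, e.g. the target size for a 200 px edge."""
    if n < 1:
        raise ValueError("size must be positive")
    return 1 << (n - 1).bit_length()

def apply_transparent_color(pixels: List[RGB], key: Optional[RGB]) -> List[RGBA]:
    """Turn RGB into RGBA; pixels matching the transparent color get alpha 0."""
    return [(r, g, b, 0 if (r, g, b) == key else 255) for (r, g, b) in pixels]

def extract_clip(pixels: list, width: int, x: int, y: int, w: int, h: int) -> list:
    """Copy the w x h rectangle whose top-left corner is (x, y) from a row-major image."""
    return [pixels[(y + row) * width + (x + col)] for row in range(h) for col in range(w)]

print(next_power_of_two(200), next_power_of_two(256), next_power_of_two(257))  # 256 256 512
print(apply_transparent_color([(255, 0, 255), (10, 20, 30)], key=(255, 0, 255)))
# [(255, 0, 255, 0), (10, 20, 30, 255)]
print(extract_clip([0, 1, 2, 3, 4, 5], width=3, x=1, y=0, w=2, h=2))  # [1, 2, 4, 5]
```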
As a little background information, the API is an abstraction layer between a host program (e.g. openBVE) and a plugin (e.g. a B3D object loader). Technically, a .NET plugin needs to link against something. Instead of linking directly against OpenBve.exe, which would prevent other programs (e.g. an object viewer, a route viewer, or any fork of openBVE) from using the plugins, plugins will link against the API, which resides in a separate DLL and only includes the specification (interface). The host program (e.g. openBVE) then implements this interface and gives each loaded plugin an instance of it to work with. This approach also allows openBVE's internals to change at any time, as no plugin links directly against or makes use of them. The API exposes some functions in a convenient form that openBVE itself will handle somewhat differently internally for performance reasons, especially regarding object management.
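The host/plugin split can be illustrated with Python's abstract base classes standing in for the .NET interface in the separate API DLL. Everything here (`HostApi`, `ObjectLoaderPlugin`, `ViewerHost`, `DummyLoader`, `register_texture`) is a hypothetical name chosen for the sketch, not openBVE's actual API. The point is the dependency direction: the plugin only ever sees the API types, so any host that implements them can use it.

```python
from abc import ABC, abstractmethod
from typing import List, Tuple

# ---- API "DLL": only the specification, no implementation ----
class HostApi(ABC):
    @abstractmethod
    def register_texture(self, path: str) -> int:
        """Register a texture file and return a handle for it."""

class ObjectLoaderPlugin(ABC):
    @abstractmethod
    def load_object(self, host: HostApi, path: str) -> List[Tuple[str, int]]:
        """Load an object file, using the host for textures."""

# ---- a host program, e.g. an object viewer ----
class ViewerHost(HostApi):
    def __init__(self) -> None:
        self.textures: List[str] = []

    def register_texture(self, path: str) -> int:
        self.textures.append(path)       # internals are free to change
        return len(self.textures) - 1

# ---- a plugin: depends on HostApi only, never on ViewerHost ----
class DummyLoader(ObjectLoaderPlugin):
    def load_object(self, host: HostApi, path: str) -> List[Tuple[str, int]]:
        texture = host.register_texture(path.rsplit(".", 1)[0] + ".png")
        return [("face", texture)]

host = ViewerHost()
print(DummyLoader().load_object(host, "tree.csv"), host.textures)
# [('face', 0)] ['tree.png']
```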
Ok, to keep it short, I basically can render textured polygons, but that's it for now.
The B3D/CSV object parser has been adapted as a plugin, making use of the object loading API. Not all features are supported yet, such as glow and nighttime textures. As I am moving away from immediate mode in favor of display lists, I have to think about how to handle anything that is not opaque or is dynamic with the best efficiency. As such, I will incorporate these features later, once everything has developed a bit further.
Something I have not mentioned earlier is that the API exposes other types of faces besides plain polygons. The basic organization is the face-vertex model, meaning that a list of vertices is stored, along with a list of faces that reference those vertices. In addition to its list of vertex references, each face also stores its type, which can be POLYGON, QUADS, QUAD_STRIP, TRIANGLES, TRIANGLE_STRIP or TRIANGLE_FAN, as illustrated in the following image.
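A minimal Python sketch of the face-vertex model: vertices are stored once, and each face holds its type plus indices into the vertex list. The class and field names are hypothetical; only the six type names come from the API described here.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Tuple

Vec3 = Tuple[float, float, float]

class FaceType(Enum):
    POLYGON = 1
    QUADS = 2
    QUAD_STRIP = 3
    TRIANGLES = 4
    TRIANGLE_STRIP = 5
    TRIANGLE_FAN = 6

@dataclass
class Face:
    kind: FaceType
    indices: List[int]          # positions in Mesh.vertices, not copies

@dataclass
class Mesh:
    vertices: List[Vec3] = field(default_factory=list)
    faces: List[Face] = field(default_factory=list)

    def corners(self, face: Face) -> List[Vec3]:
        """Resolve a face's indices to actual vertex positions."""
        return [self.vertices[i] for i in face.indices]

# A unit square made of two triangles that share vertices 0 and 2.
square = Mesh(vertices=[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)],
              faces=[Face(FaceType.POLYGON, [0, 1, 2]), Face(FaceType.POLYGON, [0, 2, 3])])
print(len(square.vertices), square.corners(square.faces[1]))
# 4 [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
```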
Basically, all existing objects, be it B3D, CSV or X, use only POLYGON. When you render a mesh of adjacent faces, there are usually a few vertices shared between those faces. Choosing a primitive other than POLYGON allows you to avoid transmitting these shared vertices to the graphics pipeline more than once. Let's illustrate this with the following picture I found on the internet:
Here, a track was built whose pieces could be individual polygons, or quads. However, the QUAD_STRIP structure is used to make the whole mesh a single "face". The structure basically specifies the first four vertices that make up one quad and then supplies only two additional vertices per additional quad, which is implicitly connected to the previous quad. For example, instead of rendering vertices (v0, v1, v2, v3) and vertices (v2, v3, v4, v5) as two POLYGONs, you would just render vertices (v0, v1, v2, v3, v4, v5) as a single QUAD_STRIP, omitting two vertices. A performance gain will usually not be observed in such small-scale examples. However, as a single strip grows longer, there is usually a noticeable performance gain. The following is a good example, where individual QUAD_STRIPs are shaded with the same color:
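The vertex saving can be checked with a small Python function that expands a QUAD_STRIP back into separate quads. It follows OpenGL's `GL_QUAD_STRIP` convention, in which strip vertices alternate between the two edges of the strip, so quad i is outlined by vertices 2i, 2i+1, 2i+3 and 2i+2. The function names are hypothetical.

```python
from typing import List, Tuple

def quad_strip_to_quads(strip: List[int]) -> List[List[int]]:
    """Expand a QUAD_STRIP (vertex indices) into individual quads."""
    if len(strip) < 4 or len(strip) % 2:
        raise ValueError("a quad strip needs an even number of at least 4 vertices")
    return [[strip[2 * i], strip[2 * i + 1], strip[2 * i + 3], strip[2 * i + 2]]
            for i in range((len(strip) - 2) // 2)]

def vertices_sent(quad_count: int) -> Tuple[int, int]:
    """Vertices submitted as separate quads vs. as one QUAD_STRIP."""
    return 4 * quad_count, 2 * quad_count + 2

print(quad_strip_to_quads([0, 1, 2, 3, 4, 5]))  # [[0, 1, 3, 2], [2, 3, 5, 4]]
print(vertices_sent(2), vertices_sent(100))     # (8, 6) (400, 202)
```

For long strips the ratio approaches one half, which fits the observation that the gain only becomes noticeable as strips grow longer.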
By using these structures efficiently, performance should increase. I am thinking of introducing a new object format for advanced developers who would like to make use of such structures. There are also some practical reasons to use such structures, as they can take less time to code. Compare, for example, the four sides of a pyramid built from four triangular POLYGONs (12 vertices altogether) against a single TRIANGLE_FAN with six vertices.
There are some other structures. TRIANGLES is just a list of separate triangles, e.g. supplying (v0, v1, v2, v3, v4, v5) renders two triangles (three vertices form one triangle). QUADS is similarly a list of separate quadrilaterals. TRIANGLE_STRIP supplies three vertices for an initial triangle plus one more for every additional triangle, QUAD_STRIP uses four to define an initial quad plus two more for every additional quad, and TRIANGLE_FAN creates a series of adjacent triangles that all share one common vertex.
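How each structure maps to triangles can be written down as one Python function. It follows OpenGL's conventions for the corresponding primitives (in a TRIANGLE_STRIP, every second triangle has its first two vertices swapped so that all triangles keep the same winding); POLYGON is assumed to be convex, as OpenGL requires. The function name is hypothetical. The example also checks the pyramid case: a six-vertex TRIANGLE_FAN yields exactly the four side triangles.

```python
from typing import List

def to_triangles(kind: str, v: List[int]) -> List[List[int]]:
    """Split one face of the given type into triangles (lists of vertex indices)."""
    n = len(v)
    if kind == "TRIANGLES":
        return [v[i:i + 3] for i in range(0, n - n % 3, 3)]
    if kind == "QUADS":
        tris = []
        for i in range(0, n - n % 4, 4):
            a, b, c, d = v[i:i + 4]
            tris += [[a, b, c], [a, c, d]]
        return tris
    if kind == "TRIANGLE_STRIP":
        # one new triangle per extra vertex; odd ones swap the first two vertices
        return [[v[i], v[i + 1], v[i + 2]] if i % 2 == 0 else [v[i + 1], v[i], v[i + 2]]
                for i in range(n - 2)]
    if kind in ("TRIANGLE_FAN", "POLYGON"):
        # every triangle shares the first vertex (valid for convex polygons)
        return [[v[0], v[i], v[i + 1]] for i in range(1, n - 1)]
    if kind == "QUAD_STRIP":
        tris = []
        for i in range((n - 2) // 2):
            a, b, c, d = v[2 * i], v[2 * i + 1], v[2 * i + 3], v[2 * i + 2]
            tris += [[a, b, c], [a, c, d]]
        return tris
    raise ValueError(f"unknown face type: {kind}")

print(to_triangles("TRIANGLES", [0, 1, 2, 3, 4, 5]))   # [[0, 1, 2], [3, 4, 5]]
print(to_triangles("TRIANGLE_STRIP", [0, 1, 2, 3]))    # [[0, 1, 2], [2, 1, 3]]
# Pyramid sides: apex 4 plus base corners 0-3, closing the fan with 0 again.
print(len(to_triangles("TRIANGLE_FAN", [4, 0, 1, 2, 3, 0])))  # 4
```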
Of course, existing objects don't make use of such structures. I have written an object optimizer which tries to create such structures automatically from POLYGONs. Similar to the openBVE 1 object optimizer, the algorithm first eliminates all unused and duplicate vertices and material properties, leading to much lower storage requirements and a small performance boost due to shared materials where possible. Beyond that, the algorithm creates TRIANGLES and QUADS out of POLYGONs which have three or four vertices, respectively. Then it tries to join them into strips. Take a cube, for example. Using six faces with four vertices each, you end up submitting 24 vertices to the rendering pipeline. By using a QUAD_STRIP structure for the side walls, you can reduce this to 18 overall: 10 for the side walls, and 8 for the isolated top and bottom quads in a single QUADS structure. This is not much of a saving, though, and a single QUADS structure with 24 vertices seems to be equally fast here. With larger structures, however, a performance benefit is noticeable. With existing BVE objects, there is little chance of ending up with an optimal structure; objects have to be built with those structures in mind in order to fully exploit the potential. Still, just joining the individual POLYGONs into QUADS boosts performance by grouping faces with the same material properties, thus reducing state changes in the rendering pipeline.
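The optimizer's first step, merging duplicate vertices, dropping unused ones and grouping faces by material, might look like this Python sketch. It is not the actual openBVE optimizer: vertices are compared by position only (a real vertex would also carry texture coordinates and normals, all of which would have to match), materials are plain names, and the later steps of forming QUADS and strips are left out. The function name `optimize` is hypothetical.

```python
from typing import Dict, List, Tuple

Vertex = Tuple[float, float, float]
Face = Tuple[str, List[int]]  # (material name, vertex indices)

def optimize(vertices: List[Vertex], faces: List[Face]) -> Tuple[List[Vertex], List[Face]]:
    new_vertices: List[Vertex] = []
    index_of: Dict[Vertex, int] = {}  # vertex value -> index in new_vertices
    new_faces: List[Face] = []
    for material, indices in faces:
        remapped = []
        for i in indices:
            v = vertices[i]
            if v not in index_of:       # first time this vertex is used
                index_of[v] = len(new_vertices)
                new_vertices.append(v)
            remapped.append(index_of[v])
        new_faces.append((material, remapped))
    # Unused vertices never enter new_vertices. Sorting by material puts faces
    # with identical properties next to each other, so fewer state changes occur.
    new_faces.sort(key=lambda face: face[0])
    return new_vertices, new_faces

# Two adjacent quads stored with duplicated edge vertices, plus one unused vertex.
verts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0),
         (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (2.0, 1.0, 0.0), (1.0, 1.0, 0.0),
         (9.0, 9.0, 9.0)]
faces = [("glass", [0, 1, 2, 3]), ("brick", [4, 5, 6, 7])]
new_verts, new_faces = optimize(verts, faces)
print(len(verts), "->", len(new_verts))  # 9 -> 6
print(new_faces)  # [('brick', [1, 4, 5, 2]), ('glass', [0, 1, 2, 3])]
```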
Here is some data to compare:
| Object | Vertices | Faces | IM, not optimized | IM, optimized | DL, not optimized | DL, optimized |
| 3200.csv | 132 | 33 | 878 fps | 1157 fps | 7593 fps | 7886 fps |
| ElStaIsland.csv | 2520 | 550 | 62 fps | 78 fps | 4053 fps | 7656 fps |
IM = immediate mode, DL = display list.
Non-optimized immediate-mode rendering is the current rendering paradigm in openBVE 1, while optimized display lists will be the paradigm of openBVE 2. Please note, though, that the above data is still very theoretical. In a real route, there will be things that slow everything down, including alpha channels, glow, the interface, etc., not to mention sound and, of course, the actual simulation.
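To put the table into perspective, this short Python calculation turns the measured frame rates into per-frame times and compares the openBVE 1 paradigm (immediate mode, not optimized) with the planned openBVE 2 paradigm (display lists, optimized). It uses only the numbers from the table; as noted, they are theoretical single-object figures.

```python
results = {
    "3200.csv":        {"IM not": 878, "IM opt": 1157, "DL not": 7593, "DL opt": 7886},
    "ElStaIsland.csv": {"IM not": 62,  "IM opt": 78,   "DL not": 4053, "DL opt": 7656},
}

def speedup(fps: dict) -> float:
    """Frame-rate ratio of the openBVE 2 paradigm over the openBVE 1 paradigm."""
    return fps["DL opt"] / fps["IM not"]

for name, fps in results.items():
    print(f"{name}: {1000 / fps['IM not']:.2f} ms -> {1000 / fps['DL opt']:.3f} ms "
          f"per frame, {speedup(fps):.1f}x")
```

The larger object benefits far more: for ElStaIsland.csv, optimization alone nearly doubles the display-list frame rate (4053 to 7656 fps), whereas for the small 3200.csv it barely matters.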