A Pixel Format Guide (to the galaxy) – Update

A few weeks ago I introduced the Pixel Format Guide — a collection of documents and the accompanying pfg tool which aim to help people navigate the wilderness of pixel format definitions. In this post I will list the most exciting improvements that have been made since the original announcement.

New pixel format families

The core mission of the Pixel Format Guide is to become a comprehensive reference for pixel format definitions. Therefore, it’s no surprise that I have put a lot of effort into adding more pixel format families. At the time the previous post was written, the Pixel Format Guide supported 3 pixel format families: Vulkan, OpenGL and Wayland-DRM. Since then, I have added the Cairo, DirectFB, Pixman, SDL2 and V4L2 pixel format families, bringing the total number of supported families to 8 and the total number of supported pixel format definitions to 459!

Bit indices

While working with packed pixel formats, I noticed that the ordering of the component bits is sometimes difficult to figure out. This happens especially when the bits of a component are split across multiple bytes, as in the 16-bit RGB565 format:

Format:               SDL_PIXELFORMAT_RGB565
Described as:         Native 16-bit type
Native type:          M              L
                      RRRRRGGGGGGBBBBB
Memory little-endian: 0        1
                      M      L M      L
                      GGGBBBBB RRRRRGGG
Memory big-endian:    0        1
                      M      L M      L
                      RRRRRGGG GGGBBBBB

Each byte in memory holds 3 bits of the G component, but it’s not easy to tell exactly which bits are in each byte. To fix this, the latest version of the pfg tool introduces component bit indices. Every component bit is now accompanied by its index, making the bit order crystal clear:

Format:               SDL_PIXELFORMAT_RGB565
Described as:         Native 16-bit type
Native type:          M                              L
                      R₄R₃R₂R₁R₀G₅G₄G₃G₂G₁G₀B₄B₃B₂B₁B₀
Memory little-endian: 0                1
                      M              L M              L
                      G₂G₁G₀B₄B₃B₂B₁B₀ R₄R₃R₂R₁R₀G₅G₄G₃
Memory big-endian:    0                1
                      M              L M              L
                      R₄R₃R₂R₁R₀G₅G₄G₃ G₂G₁G₀B₄B₃B₂B₁B₀

If you prefer not to see the bit indices, you can use the --hide-bit-indices flag.
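To see where the split G bits come from, here is a minimal Python sketch that packs an RGB565 value (R in the most significant bits, as in the native-type row above) and serializes it in both byte orders with the standard struct module. The helper names are hypothetical and this is not how pfg itself is implemented.

```python
import struct


def pack_rgb565(r: int, g: int, b: int) -> int:
    """Pack 5-bit R, 6-bit G and 5-bit B into a 16-bit value, R in the MSBs."""
    assert 0 <= r < 32 and 0 <= g < 64 and 0 <= b < 32
    return (r << 11) | (g << 5) | b


def memory_bytes(value: int, little_endian: bool) -> bytes:
    """The two bytes a native 16-bit value occupies in memory (byte 0 first)."""
    return struct.pack("<H" if little_endian else ">H", value)


# Set only G0..G2: they land in byte 0 on little-endian and in byte 1 on big-endian.
px = pack_rgb565(0, 0b000111, 0)
le = memory_bytes(px, little_endian=True)
be = memory_bytes(px, little_endian=False)
print(" ".join(f"{byte:08b}" for byte in le))  # 11100000 00000000
print(" ".join(f"{byte:08b}" for byte in be))  # 00000000 11100000
```

In both cases the set bits occupy the top of whichever byte holds G₂G₁G₀, matching the indexed diagram.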

Discovery of compatible pixel formats

The inspiration for the Pixel Format Guide was a series of frustrating experiences trying to manually match pixel formats with other, compatible pixel formats from different families. The latest version of the pfg tool finally includes support for automating such operations, in the form of the find-compatible command.

With the find-compatible command, discovering which OpenGL formats are compatible with the PIXMAN_b5g6r5 format is now as easy as:

$ python3 -m pfg find-compatible PIXMAN_b5g6r5 opengl
Format: PIXMAN_b5g6r5
Is compatible on all systems with:
        GL_RGB+GL_UNSIGNED_SHORT_5_6_5_REV
        GL_RGB_INTEGER+GL_UNSIGNED_SHORT_5_6_5_REV
Is compatible on little-endian systems with:
Is compatible on big-endian systems with:

Similarly, to find out which SDL2 formats are compatible with the VK_FORMAT_R8G8B8A8_UNORM format, you can run:

$ python3 -m pfg find-compatible VK_FORMAT_R8G8B8A8_UNORM sdl2
Format: VK_FORMAT_R8G8B8A8_UNORM
Is compatible on all systems with:
        SDL_PIXELFORMAT_RGBA32
Is compatible on little-endian systems with:
        SDL_PIXELFORMAT_ABGR8888
Is compatible on big-endian systems with:
        SDL_PIXELFORMAT_RGBA8888
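The idea behind find-compatible can be illustrated by comparing memory layouts per endianness: two formats are compatible on a system if they place the same components in the same bytes there. The following Python sketch reproduces the result above under a deliberately simplified model that only covers formats with 8-bit components; the Format class and the "bytes"/"native" classification are assumptions of this sketch, not pfg's internals. SDL_PIXELFORMAT_RGBA32 is modelled as a byte-order format because SDL defines it as an alias that resolves to RGBA8888 or ABGR8888 depending on host endianness.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Format:
    name: str
    kind: str                    # "bytes": fixed byte order; "native": packed 32-bit integer
    components: Tuple[str, ...]  # "bytes": byte 0 first; "native": most significant byte first


def memory_layout(fmt: Format, little_endian: bool) -> Tuple[str, ...]:
    """Order of the 8-bit components in memory, byte 0 first."""
    if fmt.kind == "bytes":
        return fmt.components  # byte-order formats ignore endianness
    # A native integer is stored least significant byte first on little-endian hosts.
    return tuple(reversed(fmt.components)) if little_endian else fmt.components


def compatibility(a: Format, b: Format) -> Optional[str]:
    le = memory_layout(a, True) == memory_layout(b, True)
    be = memory_layout(a, False) == memory_layout(b, False)
    if le and be:
        return "all systems"
    if le:
        return "little-endian systems"
    if be:
        return "big-endian systems"
    return None


VK_R8G8B8A8_UNORM = Format("VK_FORMAT_R8G8B8A8_UNORM", "bytes", ("R", "G", "B", "A"))
candidates = [
    Format("SDL_PIXELFORMAT_RGBA32", "bytes", ("R", "G", "B", "A")),
    Format("SDL_PIXELFORMAT_ABGR8888", "native", ("A", "B", "G", "R")),
    Format("SDL_PIXELFORMAT_RGBA8888", "native", ("R", "G", "B", "A")),
]
for c in candidates:
    print(c.name, "->", compatibility(VK_R8G8B8A8_UNORM, c))
```

Packed formats with components narrower than a byte (such as RGB565) need a bit-level comparison instead, which this sketch does not attempt.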

Listing supported pixel formats and families

The pfg tool now supports the list-formats and list-families commands. The former lists the supported pixel formats for the specified family, while the latter lists all the supported pixel format families. These commands can be very useful when writing scripts involving the pfg tool.

As an example, with the list-formats command you can find which OpenGL formats are compatible with each Cairo format by running:

$ for f in $(python3 -m pfg list-formats cairo); do python3 -m pfg find-compatible $f opengl; done

I hope you enjoy the improvements!

Once again, I would like to thank my employer, Collabora, for sponsoring my work on the Pixel Format Guide as an R&D project.


A Pixel Format Guide (to the galaxy)

I spend a lot of my time in various corners of the graphics domain of the FOSS world. In my time there I frequently have to deal with a variety of pixel format definitions. Every graphics API and project has its own way of describing and naming pixel formats: the different flavors used by Vulkan, OpenGL, Mesa, Gallium, Wayland, DRM, GBM, Pixman, Mir and SDL are just a few of the beasts one can encounter in the graphics wilderness.

It could be my aging memory, but, for some reason, I never seem to be able to remember how to interpret all the different formats. “How are the components laid out in memory?”, “Does it matter if the system is little-endian or big-endian?” are some of the questions I often have to look up, with varying degrees of success.

It turns out that I am not the only one facing this issue. It’s not uncommon for developers to misinterpret pixel formats, often with entertaining and psychedelic effects. If you are lucky you may catch a glimpse of uncanny blue foxes running under alien red skies.

Despite the potential for entertainment, this problem is a constant cause of frustration for developers. I finally decided to do something about it — I have started the Pixel Format Guide.

The Pixel Format Guide consists of two components. The first is the guide itself — a collection of documents describing how to interpret the pixel format definitions of various APIs and projects.

The second component is the pfg tool, which performs the interpretation of pixel formats automatically. Did you ever wonder how the GL_RGBA with GL_UNSIGNED_INT_2_10_10_10_REV OpenGL pixel format is laid out in memory? Now it’s easy to figure out:

$ python3 -m pfg describe GL_RGBA+GL_UNSIGNED_INT_2_10_10_10_REV
Format:               GL_RGBA+GL_UNSIGNED_INT_2_10_10_10_REV
Described as:         Native 32-bit type
Native type:          M                              L
                      AABBBBBBBBBBGGGGGGGGGGRRRRRRRRRR
Memory little-endian: 0        1        2        3
                      M      L M      L M      L M      L
                      RRRRRRRR GGGGGGRR BBBBGGGG AABBBBBB
Memory big-endian:    0        1        2        3
                      M      L M      L M      L M      L
                      AABBBBBB BBBBGGGG GGGGGGRR RRRRRRRR
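The layout above follows from the packing rule of GL_UNSIGNED_INT_2_10_10_10_REV, where the first component (R) occupies the least significant bits (R in bits 0-9, G in 10-19, B in 20-29, A in 30-31). This Python sketch, with a hypothetical helper name, packs a pixel with only R set and prints its bytes in both byte orders, reproducing where the two R bits spill into the second byte on little-endian systems.

```python
import struct


def pack_2_10_10_10_rev(r: int, g: int, b: int, a: int) -> int:
    """Pack RGBA into a 32-bit value: R in bits 0-9, G 10-19, B 20-29, A 30-31."""
    assert max(r, g, b) < 1024 and a < 4
    return (a << 30) | (b << 20) | (g << 10) | r


value = pack_2_10_10_10_rev(r=0x3FF, g=0, b=0, a=0)
le = struct.pack("<I", value)  # memory bytes on a little-endian system
be = struct.pack(">I", value)  # memory bytes on a big-endian system
print(" ".join(f"{byte:08b}" for byte in le))  # 11111111 00000011 00000000 00000000
print(" ".join(f"{byte:08b}" for byte in be))  # 00000000 00000000 00000011 11111111
```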

How about the WL_DRM_FORMAT_ARGB8888 Wayland-DRM format? Again, it’s easy:

$ python3 -m pfg describe WL_DRM_FORMAT_ARGB8888
Format: WL_DRM_FORMAT_ARGB8888
Described as: Bytes in memory
Memory little-endian: 0        1        2        3
                      M      L M      L M      L M      L
                      BBBBBBBB GGGGGGGG RRRRRRRR AAAAAAAA
Memory big-endian:    0        1        2        3
                      M      L M      L M      L M      L
                      BBBBBBBB GGGGGGGG RRRRRRRR AAAAAAAA
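Both rows are identical because the DRM fourcc definitions (drm_fourcc.h) describe ARGB8888 as a packed [31:0] A:R:G:B value that is always little-endian, whatever the host byte order. A short Python sketch (hypothetical helper name) makes this explicit by forcing little-endian serialization:

```python
import struct


def drm_argb8888_bytes(a: int, r: int, g: int, b: int) -> bytes:
    """Memory bytes of a DRM ARGB8888 pixel: the packed A:R:G:B value is
    always stored least significant byte first, independent of the host."""
    value = (a << 24) | (r << 16) | (g << 8) | b
    return struct.pack("<I", value)  # "<" forces little-endian on any host


print(drm_argb8888_bytes(a=0xAA, r=0x11, g=0x22, b=0x33).hex(" "))  # 33 22 11 aa
```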

The Pixel Format Guide is a work in progress. It currently supports many Vulkan, OpenGL and Wayland(-DRM) formats, and it’s constantly growing. This project was conceived as a collaborative endeavor, not a one-person effort (but I’ll do my best), so, if you are familiar with a pixel format family, please consider contributing to the guide and the Python tool!

Before signing off I would like to thank my employer, Collabora, for sponsoring my work on the Pixel Format Guide as an R&D project.

vkmark: more than a Vulkan benchmark

Ever since Vulkan was announced a few years ago, the idea of creating a Vulkan benchmarking tool in the spirit of glmark2 had been floating in my mind. Recently, thanks to my employer, Collabora, this idea has materialized! The result is the vkmark Vulkan benchmark, hosted on GitHub:

https://github.com/vkmark/vkmark

Like its glmark2 sibling project, vkmark’s goals are different from the goals of big, monolithic and usually proprietary benchmarks. Instead of providing a single, complex benchmark, vkmark aims to provide an extensible suite of targeted, configurable benchmarking scenes. Most scenes exercise specific Vulkan features or usage patterns (e.g., desktop 2.5D scenarios), although we are also happy to have more complex, visually intriguing scenes.

Benchmarking scenes can be configured with options that affect various aspects of their rendering. We hope that the ease with which developers can use different options will make it painless to perform targeted tests and eventually provide best practices advice.

A few years ago we were pleasantly surprised to learn that developers were using glmark2 as a testing tool for driver development, especially in free (as in freedom) software projects. This is a goal that we want to actively pursue for vkmark, too. The flexible benchmarking approach is a natural fit for this kind of development; the developer can start with getting the simple scenes working and then, as the driver matures, move to scenes that use more advanced features. vkmark has already proved useful in this regard, being a valuable testing aid for my own experiments in the Mesa Vulkan WSI implementation.

With vkmark we also want to be on the cutting edge of software development practices and tools. vkmark is a modern C++14 codebase, using the vulkan-hpp bindings, the Meson build system and the Catch test framework. To ensure a high quality codebase, the core of vkmark is developed using test-driven development.

It is still early days, but vkmark already has support for X11, Wayland and DRM/KMS, and provides two simple scenes: a “clear” scene, and a “cube” scene that renders a simple colored cube based on the vkcube example (which is itself based on kmscube). The future looks bright!

We are looking forward to getting more feedback on vkmark and, of course, contributions are always welcome!

C vs C++11: C++ goes to eleven!

One of the top web results when searching for “C vs C++” is Jakob Østergaard’s article of the same name. In his article, Jakob presents the challenge of writing a program that counts the unique words in a text file, and tries out various versions that he collected or wrote himself. Although Jakob’s text can’t really be considered a comprehensive comparison of C vs C++, it does provide some insight into how powerful C++ can be “out of the box”.

The original C++ implementation given by Jakob is:

#include <set>
#include <string>
#include <iostream>

int main(int argc, char **argv)
{
    // Declare and Initialize some variables
    std::string word;
    std::set<std::string> wordcount;
    // Read words and insert in rb-tree
    while (std::cin >> word) wordcount.insert(word);
    // Print the result
    std::cout << "Words: " << wordcount.size() << std::endl;
    return 0;
}

Unfortunately, the concise and highly readable solution presented above leaves a lot to be desired on the performance front. So, I set out to improve it, trying to also take advantage of any relevant C++11 features. My updated C++11 version is:

#include <unordered_set>
#include <string>
#include <iostream>
#include <utility>  // std::move

int main(int argc, char **argv)
{
    // Declare and Initialize some variables
    std::string word;
    std::unordered_set<std::string> wordcount;
    std::ios_base::sync_with_stdio(false);
    // Read words and insert in set
    while (std::cin >> word) wordcount.insert(std::move(word));
    // Print the result
    std::cout << "Words: " << wordcount.size() << std::endl;
    return 0;
}

There are three changes in the new code. The first change is using the new C++11 std::unordered_set container instead of std::set. Internally, unordered_set uses a hash table instead of a balanced tree, losing support for item ordering, but gaining significantly in average performance.

The second change is actually an old C++ option, not particular to C++11: disabling stdio synchronization. This is a big performance booster for intensive I/O. It is highly recommended to turn synchronization off, unless you really, really need to use the C and C++ standard streams at the same time.

The third change is explicitly taking advantage of C++11 move semantics (std::move()). In my benchmarks the change didn’t have a noticeable impact, perhaps because the compiler was eliding the copy anyway, or because the strings were small enough that a copy and a move weren’t significantly different in performance.

To test the different versions, I created a series of word files, each containing 4 million words but a different number of unique words. The tested versions include all the versions from Jakob’s article, plus the new cpp4, c2 and py (Python) versions.

Name        Description                          SLOC
cpp1        Original C++ version                 11
cpp1-fixed  “Fixed” C++ version (using scanf)    12
cpp2        C++ version of c1                    100
cpp3        Jakob’s Ego-booster                  83
cpp4        C++11 version                        12
c1          C hash                               71
c2          Glib hash                            73
py          Python                               5

Here are the run time results:

Here are the results for the maximum RSS:

The updated C++11 version (cpp4) is about 5 times (!) faster than the original, partly because of using unordered_set, and partly because of not synchronizing with stdio. The memory usage has decreased by a decent amount, too! For lower numbers of unique words the performance results are somewhat mixed, but, as the number of unique words grows, the C++11 and Glib versions become clear winners. C++ goes to 11, indeed!

Based on the results above, here are some tips:

  1. Rolling your own implementation is probably not worth it.
  2. In C++11, when you don’t need item ordering, you are probably better off using the unordered variants of the containers (but don’t forget to benchmark).
  3. If you use standard streams, and don’t need to be in sync with stdio streams, be sure to turn synchronization off. If you need to be in sync, try hard to stop needing it!
  4. If you just want to quickly create something with decent performance, consider using Python.
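For reference, the same unique-word count can be written in a few lines of Python. This is an illustrative sketch, not necessarily the py version used in the benchmark; like cpp4, it relies on a hash-based set, and str.split() without arguments splits on runs of whitespace, roughly like `std::cin >> word`.

```python
import sys
from typing import Iterable


def count_unique_words(lines: Iterable[str]) -> int:
    """Count distinct whitespace-separated words, processing input line by line."""
    words = set()  # hash-based, like std::unordered_set
    for line in lines:
        words.update(line.split())
    return len(words)


if __name__ == "__main__":
    print("Words:", count_unique_words(sys.stdin))
```

Iterating over the input line by line, instead of reading the whole file at once, keeps memory proportional to the number of unique words rather than the file size.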

You can find the code and scripts used for benchmarking here. To create the sample text files (‘make texts’) you need to have an extracted copy of scowl in the project directory.

Changing gdm/lightdm user login settings programmatically

Recently, as part of the automated testing efforts in Linaro, I needed to programmatically change the default X session for a user. It used to be the case that editing ~/.dmrc was enough to achieve this. Display managers (DMs), such as gdm and lightdm, would read the contents of ~/.dmrc at login time to decide which language and X session to use (among other settings). When a user changed a setting through the GUI, the DM would write the new settings to ~/.dmrc (only after successfully logging in, of course).

However, all of this changed with the introduction of AccountsService. As the name suggests, AccountsService manages user accounts and, among other things, some of the login settings that were previously configured in ~/.dmrc. It offers this functionality through the org.freedesktop.Accounts DBus service on the system bus.

Modern display managers use AccountsService to manipulate user login settings, falling back to ~/.dmrc only when the service is unavailable (at least in the case of lightdm). For the end user, this means that editing ~/.dmrc manually has no effect. For example, you can change the Session field in ~/.dmrc all you want, but the next time you log in you will end up in the same session, and your ~/.dmrc changes will have been overwritten.

In order to actually change any settings, we need to communicate with AccountsService through DBus. The first step is to find the object that corresponds to the user whose settings we want to change. The path of this object has the form /org/freedesktop/Accounts/<USER>, where <USER> is usually of the form User<UID>, but there is no guarantee of that. Thankfully, the /org/freedesktop/Accounts object provides the org.freedesktop.Accounts.FindUserByName and org.freedesktop.Accounts.FindUserById methods, which we can use to get the object path for a user.

Having the user object path, we can use the org.freedesktop.Accounts.User.* methods on the user object to change the required settings.

We can use the dbus-send program to perform the operations mentioned above. For example:

$ USER_PATH=$(dbus-send --print-reply=literal --system --dest=org.freedesktop.Accounts /org/freedesktop/Accounts org.freedesktop.Accounts.FindUserById int64:1000)
$ dbus-send --print-reply --system --dest=org.freedesktop.Accounts $USER_PATH org.freedesktop.Accounts.User.SetXSession string:'xterm'
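The same two dbus-send calls can be driven from Python with the subprocess module. This is a minimal sketch with hypothetical helper names; it only wraps the exact commands shown above and assumes, as the shell example does, that the literal reply of FindUserById is just the object path surrounded by whitespace. Passing the arguments as a list means no shell is involved, so there are no quoting pitfalls.

```python
import subprocess
from typing import List

ACCOUNTS = "org.freedesktop.Accounts"


def find_user_by_id_cmd(uid: int) -> List[str]:
    """dbus-send invocation asking AccountsService for a user's object path."""
    return ["dbus-send", "--print-reply=literal", "--system", f"--dest={ACCOUNTS}",
            "/org/freedesktop/Accounts", f"{ACCOUNTS}.FindUserById", f"int64:{uid}"]


def set_xsession_cmd(user_path: str, session: str) -> List[str]:
    """dbus-send invocation calling SetXSession on the given user object."""
    return ["dbus-send", "--print-reply", "--system", f"--dest={ACCOUNTS}",
            user_path, f"{ACCOUNTS}.User.SetXSession", f"string:{session}"]


def set_user_xsession(uid: int, session: str) -> None:
    """Look up the user's object path, then set the X session (needs a system bus)."""
    reply = subprocess.run(find_user_by_id_cmd(uid), check=True,
                           capture_output=True, text=True).stdout
    user_path = reply.strip()  # literal reply: the object path plus whitespace
    subprocess.run(set_xsession_cmd(user_path, session), check=True)
```

For example, set_user_xsession(1000, "xterm") performs the same steps as the two shell commands.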

As I needed to get and set the X session for a user in a user-friendly manner, I decided to create a small Python program instead. You can find the program here: user_xsession.py

You can use user_xsession.py like:

$ ./user_xsession.py [--user-id <ID>|--user-name <NAME>] set <SESSION>
$ ./user_xsession.py [--user-id <ID>|--user-name <NAME>] get

where <SESSION> is one of the sessions available in /usr/share/xsessions/. Note that you may need to run as root, depending on the account you want to change the settings for.

For example:

$ ./user_xsession.py --user-id 1000 set xterm

You can easily tweak the program to change another setting instead of the X session.

Update 2012-08-10: Fixed problem with wordpress converting -- (double-dash) to – (en-dash) in code snippets.

glmark2: more than a benchmark

Genesis

Almost 1.5 years ago, we (at Linaro) set out on a mission to provide consolidation and optimization of GNU/Linux for ARM hardware. Soon after, the Linaro Graphics Working Group was formed to focus on the graphics and user interface parts of the stack. Like all other groups within Linaro, the Graphics WG strived (and still does, of course!) to provide palpable and measurable improvements. One of the tools we needed to achieve this goal, and found missing, was a Free Software OpenGL ES 2.0 benchmark.

Why did we even care about this when there surely are professional, proprietary alternatives used in the industry? The answer is simple: we couldn’t imagine doing this any other way. At Linaro, both as an organization and as individuals, we strongly believe that Free Software is good for society. Even if we didn’t believe in the ethics of Free Software, using a proprietary solution would have been the wrong choice from a practical point of view. Many of our goals, which reach beyond plain benchmarking, would be very difficult to achieve with a proprietary solution. We wanted a tool that was freely (in every sense) available to all, so that it would provide a common reference point for all developers and users that didn’t have access to the proprietary tools.

Instead of starting completely from scratch, we leveraged an existing GPL licensed desktop GL benchmark, called glmark, and ported it to support OpenGL ES 2.0. We decided to call the new benchmark glmark2. Although OpenGL ES 2.0 was the primary goal for us (this API is prevalent in the ARM world), we continued to treat desktop OpenGL as a first class citizen. This mindset eventually led to what we call the “subset approach”: using only the common subset of desktop OpenGL 2.1 and OpenGL ES 2.0 APIs to produce a single, easily maintainable code base, working happily with both versions.

Goals

After the initial porting to OpenGL ES 2.0 was done, and as we continued to work on new features, a set of goals for glmark2 began to crystallize in our minds. These goals transcended the limits of plain benchmarking, and can be summarized as: flexible benchmarking, best practices, validation and educating new developers.

Flexible benchmarking

The primary function of glmark2 is, of course, to provide a comprehensive benchmarking suite. What differentiates glmark2 from other tools is the unique flexibility it delivers. Most existing benchmarking tools just provide the option to run benchmarks from a predefined fixed set. For glmark2, however, we decided that we didn’t want to force our own selections on users. In this spirit, glmark2 offers a suite of scenes that can be used to evaluate many aspects of OpenGL (ES) 2.0 performance. The way in which each scene is rendered is configurable through a set of scene-specific options, that range from the simple, like selecting the texturing mode for the texturing scene, to the complex, like specifying the convolution matrix for the GPU convolution scene. A benchmark is just a scene instantiated with specific options.

For the casual user, who just wants to get an overview of the graphics stack’s performance, glmark2 comes with a predefined set of default benchmarks. For users that need to explore a particular aspect in more depth, we have made it trivial to specify and execute a custom set of benchmarks.
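Since a benchmark is just a scene plus options, a benchmark description can be written as a scene name followed by colon-separated option=value pairs, which is the form glmark2 accepts on its command line (for example `glmark2 -b texture:texture-filter=mipmap`). The Python parser below is a hypothetical illustration of that idea, not glmark2’s actual (C++) implementation.

```python
from typing import Dict, Tuple


def parse_benchmark(spec: str) -> Tuple[str, Dict[str, str]]:
    """Split 'scene:opt1=val1:opt2=val2' into the scene name and its options."""
    scene, *opts = spec.split(":")
    options: Dict[str, str] = {}
    for opt in opts:
        key, sep, value = opt.partition("=")
        if not sep:
            raise ValueError(f"option without a value: {opt!r}")
        options[key] = value
    return scene, options


print(parse_benchmark("texture:texture-filter=mipmap"))
# ('texture', {'texture-filter': 'mipmap'})
```

A custom benchmark set is then simply a list of such descriptions, each instantiating a scene with its own options.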

Regarding the actual benchmark content, we draw inspiration from typical applications that use OpenGL, like games and modern user interfaces, and from our own experience of which features are important. We have given glmark2 a focus on fundamental techniques used in 3D and 2.5D graphics, so most scenes are relatively simple, but we don’t shy away from other kinds of benchmarks. We already have low-level benchmarks for specific shader features, and we are planning to add high-level benchmarks involving more complex and visually intriguing scenes in the future.

Best practices

The flexibility offered by our option-driven benchmarking approach lends itself naturally to another one of our goals: answering developer questions and providing best practices. “Should I use X or Y to get the best performance, the best quality, or the best of both worlds on this class of hardware?” is a common form of question among developers. For example, we have implemented a benchmark to test how different methods of uploading data to the GPU (glBufferData vs glMapBuffer, interleaved buffers vs separate buffers, etc.) affect performance. We hope that the ease with which developers can use different options will make it painless to perform targeted tests and eventually provide best practices advice.

Validation

Besides measuring the graphics performance, we also care about output quality. That is, we want to validate the correctness of the graphics stack. Of course, we don’t want to perform validation manually, by having someone look at pictures. We want the process to be automatic, ideally as part of our continuous integration efforts.

To handle validation in glmark2 we added a special mode in which we just draw the first frame of each benchmark and fuzzily compare some pixel values against expected reference values. We rely on the 3D pipeline being deterministic, so, if a single pixel is correct, chances are that all pixels are correct. Is this a 100% robust validation solution? No, but it is more than enough for our needs; it’s not our aim to provide a conformance suite.
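The fuzzy comparison can be sketched in a few lines. In the Python sketch below, the function names, the normalized [0, 1] RGBA representation, the (x, y) reference map and the default tolerance of 0.02 are all assumptions made for illustration; glmark2 itself is written in C++ and reads back the rendered frame through OpenGL.

```python
from typing import Callable, Dict, Sequence, Tuple

RGBA = Sequence[float]  # channels normalized to [0, 1] (assumption)


def pixels_match(actual: RGBA, expected: RGBA, tolerance: float = 0.02) -> bool:
    """Fuzzy match: every channel may differ from the reference by at most `tolerance`."""
    if len(actual) != len(expected):
        return False
    return all(abs(a - e) <= tolerance for a, e in zip(actual, expected))


def validate_frame(read_pixel: Callable[[int, int], RGBA],
                   references: Dict[Tuple[int, int], RGBA],
                   tolerance: float = 0.02) -> bool:
    """Check a few reference pixels of the first rendered frame.

    read_pixel(x, y) returns the rendered RGBA value at (x, y); references maps
    (x, y) to the expected RGBA value.
    """
    return all(pixels_match(read_pixel(x, y), rgba, tolerance)
               for (x, y), rgba in references.items())
```

Checking only a handful of pixels is cheap, and it is enough in practice because of the determinism argument above.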

Educating new developers

The last (but not least) goal we have for glmark2 is a surprising but important one: educating new developers. We found that one of the main issues developers have when trying to move to modern, programmable 3D APIs, and in particular OpenGL ES 2.0, is the lack of concrete information on how to work with the new APIs, like EGL, and, also, how to apply fundamental 3D techniques that were straightforward before, e.g., lighting. Due to our focus on benchmarks for fundamental techniques, we are actually providing clear examples of how to achieve useful results. We make a special effort to ensure that both the C++ and the shader code are understandable, including comments explaining why and how we are doing things. Developers can use the glmark2 code base as a launchpad to explore the wonders of modern 3D graphics.

Our journey with glmark2 has been very exciting so far, and the future looks brighter than ever! We are constantly working on new features, and the recent addition of support for Android has made glmark2 one of the most versatile Free Software 3D benchmarking tools available. You can learn more about what we are planning by visiting our blueprints page.

What are you waiting for? Grab glmark2 and start exploring!