Moved to

I have decided to move this blog to my own domain, and also to use a static site generator for the new site. The new site contains copies of all posts from this blog, and will host all my future online activity. Enjoy!


Running Chromium with Ozone-GBM on a GNU/Linux desktop system

Ozone is Chromium’s next-gen platform abstraction layer for graphics and input. When developing either Ozone itself or an application that uses Ozone, it is often beneficial to be able to run the code on the development machine, which is usually a typical GNU/Linux desktop system, since doing so speeds up the development cycle.

The X11 backend for Ozone works without much trouble on a Linux desktop system. However, getting the DRM/GBM backend to run on such a system, which I recently needed to do as part of my work at Collabora, turned out to be significantly less straightforward. In this guide I will describe all the steps that are required to run Chromium with Ozone-GBM on a typical GNU/Linux desktop system.

Building Chromium

The Chromium developer documentation provides detailed build instructions for Linux. For this guide, we have to ensure that we enable Ozone and that the target OS for the build is “chromeos”:

$ gn gen out/OzoneChromeOS
$ gn args --args='use_ozone=true target_os="chromeos"' out/OzoneChromeOS
$ ninja -C out/OzoneChromeOS chrome

Building a functional minigbm

Ozone-GBM uses the GBM API to create buffers. However, it doesn’t use Mesa’s GBM implementation, but ships its own in the form of the minigbm library. The Chromium source code contains a copy of the library under third_party, but uses it only for building and testing purposes without enabling any of the minigbm hardware drivers.

In order to run Ozone-GBM on real hardware we need to create a build of minigbm that supports our target GPU. For the purposes of this guide, the simplest way to provide a functional minigbm is to build it independently and provide it at runtime to Chromium using LD_LIBRARY_PATH.

First we need to get the minigbm source code with:

$ git clone

minigbm depends on libdrm, so we have to ensure that we have the development files for the libdrm library and the vendor specific extensions. On a Debian/Ubuntu system we can get everything we need by installing the libdrm-dev package:

$ sudo apt install libdrm-dev

We can now build minigbm with the correct flags to ensure the proper GPU driver is supported:

$ make CPPFLAGS="-DDRV_I915" DRV_I915=1

Note that we need to provide the driver flag both as a preprocessor definition and a Make variable. Other driver flags for common desktop GPUs are DRV_RADEON and DRV_AMDGPU (but see below for amdgpu).

Finally we need to create a link with the proper file name so that chrome can find the library:

$ ln -s

Building minigbm with amdgpu support

The amdgpu driver for minigbm depends on the elusive amdgpuaddr library. This library is part of Mesa, but it’s not installed, and thus not provided in any package on most distributions.

To get it we need to build Mesa ourselves and extract it from the built objects. An easy way to get all the dependencies and build Mesa with all the required flags is to use the distribution’s build method. On a Debian/Ubuntu system this translates to a command sequence like:

$ sudo apt build-dep mesa
$ apt source mesa
$ cd [mesa-dir] && DEB_BUILD_OPTIONS="parallel=$(nproc)" debian/rules build

After the build is done, we have to copy (and rename) the addrlib library and the required headers to the minigbm directory:

$ cp [mesa-dir]/build/src/amd/addrlib/.libs/libamdgpu_addrlib.a [minigbm-dir]/libamdgpuaddr.a
$ cp [mesa-dir]/src/amd/addrlib/*.h [minigbm-dir]

Finally, we are able to build minigbm with amdgpu support with:

$ make CPPFLAGS="-DDRV_AMDGPU" DRV_AMDGPU=1

Running Chromium

We are almost there, but one last detail remains. If we try to run Chromium we will get an error message informing us that EGL initialization failed. The reason is that, when using Ozone-GBM, the EGLDisplay is created using EGL_DEFAULT_DISPLAY as the native display argument, under the assumption that the EGL implementation knows to interpret this as a request for a so-called surfaceless platform. However, Mesa doesn’t recognize this hint. We need to explicitly tell Mesa to use the surfaceless platform with the EGL_PLATFORM environment variable.

The following command line brings all the pieces of the puzzle together, allowing us to run Chromium with Ozone-GBM on a typical GNU/Linux desktop:

$ sudo LD_LIBRARY_PATH=[minigbm-dir] EGL_PLATFORM=surfaceless out/OzoneChromeOS/chrome --ozone-platform=gbm --no-sandbox --mash

[Update March 2018]

To run Chromium on a typical GNU/Linux desktop we must now use the following command:

$ sudo LD_LIBRARY_PATH=[minigbm-dir] EGL_PLATFORM=surfaceless out/OzoneChromeOS/chrome --ozone-platform=gbm --no-sandbox --force-system-compositor-mode


Converting emails from top-posted to bottom-posted

I have been an adherent of bottom-posting, or, to be more precise, of interleaved-posting, for as long as I can remember. I don’t recall why I selected this posting style when I first started sending emails — I guess it just felt more intuitive and natural.

These days I find top-posted messages so annoying to deal with that I manually convert them to bottom-posted before replying. Unfortunately, many modern email clients and web interfaces promote a top-posting style, so I end up performing such conversions more often than I would like. A few weeks ago I received an 8-level-deep top-posted email and decided that it was time for me to welcome our automated email conversion overlords. The result is the top2bottom command-line tool.

There are many forms of top-posting, but the one I have to deal with the most, and therefore the one that the top2bottom tool handles, looks like:

2017-10-28 B wrote:
> bbbbb
> bbbbb
> 2017-10-28 A wrote:
>> aaaaaa
>> aaaaaa

The number of quote-prefix markers (>) in the beginning of each line indicates the quote-depth of that line.

This form consists of two kinds of line blocks:

  • attribution blocks, denoted by aD, where D is the quote-depth of the block
  • body blocks, denoted by bD, where D is the quote-depth of the block

In a top-posted message the blocks are interleaved:

b0, a0, b1, a1, b2, a2, …

To convert such a message to bottom-posted we need to move all attribution blocks to the top in quote-depth order, followed by all body blocks in reverse quote-depth order:

a0, a1, a2, …, b2, b1, b0
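This reordering is easy to express as a sort with a two-part key. The sketch below is hypothetical — the block representation as (kind, depth, lines) tuples is my own, not top2bottom’s actual code — but it captures the ordering rule: attribution blocks first by ascending quote-depth, body blocks after by descending quote-depth.

```python
def block_sort_key(block):
    kind, depth, lines = block
    # Attribution blocks ("a") sort first, in ascending quote-depth order;
    # body blocks ("b") follow, in descending quote-depth order.
    return (0, depth) if kind == "a" else (1, -depth)

# A top-posted message as interleaved blocks: b0, a0, b1, a1, b2
blocks = [("b", 0, ["reply"]), ("a", 0, ["B wrote:"]),
          ("b", 1, ["> earlier reply"]), ("a", 1, ["> A wrote:"]),
          ("b", 2, [">> original"])]

ordered = sorted(blocks, key=block_sort_key)
# Resulting (kind, depth) order: a0, a1, b2, b1, b0
```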

The block reordering operation can be easily implemented by sorting with an appropriate comparison function. The more interesting and non-trivial part is figuring out which lines belong to which block. To do this, we first need to define some terms:

  • An empty line is a line that has no message content, although it may have one or more quote-prefix markers (>), i.e., a non-zero quote-depth.
  • A paragraph is a sequence of non-empty, consecutive lines of the same quote-depth.
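These two definitions translate directly into code. Here is a hypothetical sketch (not the actual top2bottom implementation) that splits a message into paragraphs:

```python
import re
from itertools import groupby

def quote_depth(line):
    # Quote-depth = number of leading '>' markers (spaces between them allowed).
    return re.match(r"^[> ]*", line).group(0).count(">")

def paragraphs(lines):
    """Split a message into paragraphs: maximal runs of consecutive,
    non-empty lines sharing the same quote-depth."""
    def key(line):
        empty = not line.strip("> ")
        return (empty, None if empty else quote_depth(line))
    return [list(group) for (empty, _), group in groupby(lines, key=key)
            if not empty]

msg = ["X wrote:", "> hello", "> hi", ">", "> Y wrote:", ">> hey"]
# paragraphs(msg) → [['X wrote:'], ['> hello', '> hi'], ['> Y wrote:'], ['>> hey']]
```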

Now we can provide precise definitions for attribution and body blocks:

  • An attribution block of quote-depth D is the sequence of all consecutive lines of quote-depth D which start from the beginning of the last paragraph before a quote-depth increase from D to D+1, and end at the last line before the quote-depth increase.
  • A body block of quote-depth D is the sequence of all lines (not necessarily consecutive) of quote-depth D that do not belong to an attribution block.

Based on these definitions, I experimented with a number of classification algorithms. For my first attempt I used a chain of functional operations to process the message, but I found it unintuitive to express the intricacies of block detection in this manner. My next attempt was based on a finite state machine and performed a single pass of the message from bottom to top. The finite state machine approach worked, but it was more complex and difficult to reason about than I would like. Finally, I developed an algorithm that I found to be both simpler and more obviously correct than the alternatives.

The algorithm works by processing lines from top to bottom, marking them as body lines of their respective quote-depth. When a quote-depth increase is detected, and before processing the line with the increased quote-depth, the algorithm backtracks until it reaches the beginning of a paragraph, marking all lines it visits as attribution lines. It then continues processing from the point it stopped before backtracking.
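The following is a simplified Python sketch of this classification pass. It is hypothetical code, not the actual top2bottom implementation, and it omits the special handling of empty lines described below:

```python
import re

def quote_depth(line):
    # Quote-depth = number of leading '>' markers (spaces between them allowed).
    return re.match(r"^[> ]*", line).group(0).count(">")

def classify(lines):
    """Label each line as a body ('b') or attribution ('a') line of its depth."""
    depths = [quote_depth(l) for l in lines]
    labels = []
    prev_depth = 0
    for i, d in enumerate(depths):
        if d > prev_depth:
            # Quote-depth increase: backtrack to the beginning of the last
            # paragraph of the previous depth, relabeling the visited lines
            # as attribution lines.
            j = i - 1
            while j >= 0 and depths[j] == prev_depth and lines[j].strip("> "):
                labels[j] = ("a", prev_depth)
                j -= 1
        labels.append(("b", d))
        prev_depth = d
    return labels

msg = ["Hi", "", "X wrote:", "> hello", "> Y wrote:", ">> hey"]
# classify(msg) → [('b', 0), ('b', 0), ('a', 0), ('a', 1), ('a', 1), ('b', 2)]
```

Note that "> hello" ends up in the attribution block of depth 1: it belongs to the same paragraph as "> Y wrote:", exactly as the attribution block definition above prescribes.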

The empty line (if any) just before an attribution block needs special care. It is an artifact of top-posting and serves to separate the preceding message body from the attribution lines. In top2bottom I opted to repurpose this line as an extra space after a quote-depth change in the bottom-posted output. This works well because top-posted replies usually don’t start with an empty line, so adding this line makes the bottom-posted version look better. In algorithmic terms, this line is marked as belonging to a special before-body block (denoted by b′D), which is placed just before the corresponding bD block in the final output.

Here is the algorithm in action:

[figure: the algorithm processing an example top-posted message]

The algorithm works well if the input is top-posted, but what happens if the input has some other form? It turns out that the algorithm works well for two additional classes of messages that are not strictly top-posted and are often encountered in the wild. The first class consists of messages that end with a cascade of empty lines of decreasing quote-depth, which is a result of (inadvertently) adding empty lines to the end of a message when replying. The example used above to show the algorithm in action actually belongs to this class. The second class comprises messages that started as (strictly) bottom-posted, but for which the replies changed to top-posted at some point.

There is, however, a third class of messages that is also frequently seen, but the algorithm I described so far cannot handle without some enhancements. This class is composed of messages that started as interleaved-posted and then turned to top-posted. To handle this class we need a way to detect the section of the message that is interleaved-posted and ensure that we don’t reorder its parts, but rather treat it as an indivisible whole.

Fortunately, there is an easy way to decide which lines of the message belong to the interleaved section. The quote-depth curve of interleaved messages has a characteristic pattern of multiple peaks (lines with the highest quote-depth locally) and valleys (lines with lower quote-depths between successive peaks). In a deeply interleaved message the base quote-depth of the interleaved section is the quote-depth of the lowest valley. All lines with quote-depth equal to or higher than the base quote-depth belong to the interleaved section of the message. In the following example, the lowest (and only) valley has a quote-depth of 2, so we treat the marked part of the message, which consists of all lines with quote-depth 2 or more, as indivisible:

2017-10-28 C wrote:    0
> ccccc                1
>                      1
> 2017-10-28 B wrote:  1
>> 2017-10-28 A wrote: 2  --+
>>> aaaaa              3    |
>>> aaaaa              3    |
>>                     2    |
>> bbbbb               2    | indivisible
>>                     2    |
>>> aaaaa              3    |
>>                     2    |
>> bbbbb               2  --+
>                      1
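The lowest-valley computation can be written compactly. The helper below is hypothetical (not the actual top2bottom code) and operates on the sequence of per-line quote-depths:

```python
def interleaved_base_depth(depths):
    # A valley is a line whose quote-depth is lower than some peak on each
    # side; the base depth of the interleaved section is the lowest valley.
    valleys = [d for i, d in enumerate(depths)
               if 0 < i < len(depths) - 1
               and d < max(depths[:i]) and d < max(depths[i + 1:])]
    return min(valleys) if valleys else None

# Quote-depths of the example message above:
depths = [0, 1, 1, 1, 2, 3, 3, 2, 2, 2, 3, 2, 2, 1]
# interleaved_base_depth(depths) → 2, so lines of depth 2 or more are indivisible
```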

The latest version of top2bottom implements the aforementioned enhancement, and, with it, can handle almost all of the top-posted messages of this form that I have encountered. The few messages which top2bottom refuses to convert are either badly formatted or use a different form of top-posting.

I have been using top2bottom for a few weeks now and I am very pleased with it. I am using Vim for composing and replying to emails, so my preferred way to convert emails is by using the Vim filter command: :%!top2bottom. I hope you find top2bottom useful!

A closing note: while doing some research for this post I came across another, much older program that also performs top-to-bottom email conversion, and which, unsurprisingly, is also named top2bottom. The code for it was posted to an old Red Hat mailing list. I have tried it on a few sample top-posted emails, but it doesn’t seem to work well, at least not for the kinds of emails that I get. YMMV.

A Pixel Format Guide (to the galaxy) – Update

A few weeks ago I introduced the Pixel Format Guide — a collection of documents and the accompanying pfg tool which aim to help people navigate the wilderness of pixel format definitions. In this post I will list the most exciting improvements that have been made since the original announcement.

New pixel format families

The core mission of the Pixel Format Guide is to become a comprehensive reference for pixel format definitions. Therefore, it’s no surprise that I have put a lot of effort into adding more pixel format families. At the time the previous post was written, the Pixel Format Guide supported 3 pixel format families: Vulkan, OpenGL and Wayland-DRM. Since then, I have added the Cairo, DirectFB, Pixman, SDL2 and V4L2 pixel format families, bringing the total number of supported families to 8 and the total number of supported pixel format definitions to 459!

Bit indices

While working with packed pixel formats, I noticed that the ordering of the component bits is sometimes difficult to figure out. This happens especially when the bits of a component are split between multiple bytes, like, for example, in an RGB565 16-bit format:

Format:               SDL_PIXELFORMAT_RGB565
Described as:         Native 16-bit type
Native type:          M              L
                      RRRRRGGGGGGBBBBB
Memory little-endian: 0        1
                      M      L M      L
                      GGGBBBBB RRRRRGGG
Memory big-endian:    0        1
                      M      L M      L
                      RRRRRGGG GGGBBBBB

Each byte in memory holds 3 bits of the G component, but it’s not easy to tell exactly which bits are in each byte. To fix this, the latest version of the pfg tool introduces component bit indices. Every component bit is now accompanied by its index, making the bit order crystal clear:

Format:               SDL_PIXELFORMAT_RGB565
Described as:         Native 16-bit type
Native type:          M                              L
                      R₄R₃R₂R₁R₀G₅G₄G₃G₂G₁G₀B₄B₃B₂B₁B₀
Memory little-endian: 0                1
                      M              L M              L
                      G₂G₁G₀B₄B₃B₂B₁B₀ R₄R₃R₂R₁R₀G₅G₄G₃
Memory big-endian:    0                1
                      M              L M              L
                      R₄R₃R₂R₁R₀G₅G₄G₃ G₂G₁G₀B₄B₃B₂B₁B₀

If you prefer not to see the bit indices you can use the --hide-bit-indices flag.
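The split can also be double-checked with a few lines of Python. The component values below are illustrative choices of my own, unrelated to the pfg tool itself:

```python
import struct

# Pack sample R, G, B values into a native 16-bit RGB565 word
# (R in bits 15-11, G in bits 10-5, B in bits 4-0).
r, g, b = 0b10101, 0b110101, 0b01010
value = (r << 11) | (g << 5) | b
lo, hi = struct.pack("<H", value)  # little-endian byte order

# The low byte holds G2..G0 (bits 7-5) and all of B; the high byte holds
# all of R and G5..G3 (bits 2-0), exactly as the bit indices above show.
assert lo >> 5 == g & 0b111   # G2..G0
assert lo & 0x1F == b         # B4..B0
assert hi >> 3 == r           # R4..R0
assert hi & 0b111 == g >> 3   # G5..G3
```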

Discovery of compatible pixel formats

The inspiration for the Pixel Format Guide was a series of frustrating experiences trying to manually match pixel formats with other, compatible pixel formats from different families. The latest version of the pfg tool finally includes support for automating such operations, in the form of the find-compatible command.

With the find-compatible command, discovering which OpenGL formats are compatible with the PIXMAN_b5g6r5 format is now as easy as:

$ python3 -m pfg find-compatible PIXMAN_b5g6r5 opengl
Format: PIXMAN_b5g6r5
Is compatible on all systems with:
Is compatible on little-endian systems with:
Is compatible on big-endian systems with:

Similarly, to find out which SDL2 formats are compatible with the VK_FORMAT_R8G8B8A8_UNORM format, you can run:

$ python3 -m pfg find-compatible VK_FORMAT_R8G8B8A8_UNORM sdl2
Format: VK_FORMAT_R8G8B8A8_UNORM
Is compatible on all systems with:
Is compatible on little-endian systems with:
Is compatible on big-endian systems with:

Listing supported pixel formats and families

The pfg tool now supports the list-formats and list-families commands. The former lists the supported pixel formats for the specified family, while the latter lists all the supported pixel format families. These commands can be very useful when writing scripts involving the pfg tool.

As an example, with the list-formats command you can find which OpenGL formats are compatible with each Cairo format by running:

$ for f in $(python3 -m pfg list-formats cairo); do python3 -m pfg find-compatible $f opengl; done

I hope you enjoy the improvements!

Once again, I would like to thank my employer, Collabora, for sponsoring my work on the Pixel Format Guide as an R&D project.

A Pixel Format Guide (to the galaxy)

I spend a lot of my time in various corners of the graphics domain of the FOSS world. In my time there I frequently have to deal with a variety of pixel formats definitions. Every graphics API and project has its own way of describing and naming pixel formats — the different flavors used by Vulkan, OpenGL, Mesa, Gallium, Wayland, DRM, GBM, Pixman, Mir and SDL are just a few of the beasts one can encounter in the graphics wilderness.

It could be my aging memory, but, for some reason, I never seem to be able to remember how to interpret all the different formats. “How are the components laid out in memory?”, “Does it matter if the system is little-endian or big-endian?” are some of the questions I often have to look up, with varying degrees of success.

It turns out that I am not the only one facing this issue. It’s not uncommon for developers to misinterpret pixel formats, often with entertaining and psychedelic effects. If you are lucky you may catch a glimpse of uncanny blue foxes running under alien red skies.

Despite the potential for entertainment, this problem is a constant cause of frustration for developers. I finally decided to do something about it — I have started the Pixel Format Guide.

The Pixel Format Guide consists of two components. The first is the guide itself — a collection of documents describing how to interpret the pixel format definitions of various APIs and projects.

The second component is the pfg tool which performs the interpretation of pixel formats automatically. Did you ever wonder how the GL_RGBA with GL_UNSIGNED_INT_2_10_10_10_REV OpenGL pixel format is laid out in memory? Now it’s easy to figure it out:

$ python3 -m pfg describe GL_RGBA+GL_UNSIGNED_INT_2_10_10_10_REV
Format:               GL_RGBA+GL_UNSIGNED_INT_2_10_10_10_REV
Described as:         Native 32-bit type
Native type:          M                              L
                      AABBBBBBBBBBGGGGGGGGGGRRRRRRRRRR
Memory little-endian: 0        1        2        3
                      M      L M      L M      L M      L
                      RRRRRRRR GGGGGGRR BBBBGGGG AABBBBBB
Memory big-endian:    0        1        2        3
                      M      L M      L M      L M      L
                      AABBBBBB BBBBGGGG GGGGGGRR RRRRRRRR

How about the WL_DRM_FORMAT_ARGB8888 Wayland-DRM format? Again, it’s easy:

$ python3 -m pfg describe WL_DRM_FORMAT_ARGB8888
Format:               WL_DRM_FORMAT_ARGB8888
Described as:         Bytes in memory
Memory little-endian: 0        1        2        3
                      M      L M      L M      L M      L
                      BBBBBBBB GGGGGGGG RRRRRRRR AAAAAAAA
Memory big-endian:    0        1        2        3
                      M      L M      L M      L M      L
                      BBBBBBBB GGGGGGGG RRRRRRRR AAAAAAAA

The Pixel Format Guide is a work in progress. It currently supports many Vulkan, OpenGL and Wayland(-DRM) formats, and it’s constantly growing. This project was conceived as a collaborative endeavor, not a one-person effort (but I’ll do my best), so, if you are familiar with a pixel format family, please consider contributing to the guide and the python tool!

Before signing off I would like to thank my employer, Collabora, for sponsoring my work on the Pixel Format Guide as an R&D project.

vkmark: more than a Vulkan benchmark

Ever since Vulkan was announced a few years ago, the idea of creating a Vulkan benchmarking tool in the spirit of glmark2 had been floating in my mind. Recently, thanks to my employer, Collabora, this idea has materialized! The result is the vkmark Vulkan benchmark, hosted on GitHub.

Like its glmark2 sibling project, vkmark’s goals are different from the goals of big, monolithic and usually proprietary benchmarks. Instead of providing a single, complex benchmark, vkmark aims to provide an extensible suite of targeted, configurable benchmarking scenes. Most scenes exercise specific Vulkan features or usage patterns (e.g., desktop 2.5D scenarios), although we are also happy to have more complex, visually intriguing scenes.

Benchmarking scenes can be configured with options that affect various aspects of their rendering. We hope that the ease with which developers can use different options will make it painless to perform targeted tests and eventually provide best practices advice.

A few years ago we were pleasantly surprised to learn that developers were using glmark2 as a testing tool for driver development, especially in free (as in freedom) software projects. This is a goal that we want to actively pursue for vkmark, too. The flexible benchmarking approach is a natural fit for this kind of development; the developer can start with getting the simple scenes working and then, as the driver matures, move to scenes that use more advanced features. vkmark has already proved useful in this regard, being a valuable testing aid for my own experiments in the Mesa Vulkan WSI implementation.

With vkmark we also want to be on the cutting edge of software development practices and tools. vkmark is a modern, C++14 codebase, using the vulkan-hpp bindings, the Meson build system and the Catch test framework. To ensure a high quality codebase, the core of vkmark is developed using test-driven development.

It is still early days, but vkmark already has support for X11, Wayland and DRM/KMS, and provides two simple scenes: a “clear” scene, and a “cube” scene that renders a simple colored cube based on the vkcube example (which is itself based on kmscube). The future looks bright!

We are looking forward to getting more feedback on vkmark and, of course, contributions are always welcome!

C vs C++11: C++ goes to eleven!

One of the top web results when searching for “C vs C++” is Jakob Østergaard’s article of the same name. In his article, Jakob presents the challenge of writing a program that counts the unique words in a text file, and tries out various versions he got or created himself. Although Jakob’s text can’t really be considered a comprehensive comparison of C vs C++, it does provide some insight into how powerful C++ can be “out of the box”.

The original C++ implementation given by Jakob is:

#include <set>
#include <string>
#include <iostream>

int main(int argc, char **argv)
{
    // Declare and Initialize some variables
    std::string word;
    std::set<std::string> wordcount;
    // Read words and insert in rb-tree
    while (std::cin >> word) wordcount.insert(word);
    // Print the result
    std::cout << "Words: " << wordcount.size() << std::endl;
    return 0;
}

Unfortunately, the concise and highly readable solution presented above leaves a lot to be desired on the performance front. So, I set out to improve it, trying to also take advantage of any relevant C++11 features. My updated C++11 version is:

#include <unordered_set>
#include <string>
#include <iostream>

int main(int argc, char **argv)
{
    // Declare and Initialize some variables
    std::string word;
    std::unordered_set<std::string> wordcount;
    // Disable synchronization with the C stdio streams
    std::ios_base::sync_with_stdio(false);
    // Read words and insert in set
    while (std::cin >> word) wordcount.insert(std::move(word));
    // Print the result
    std::cout << "Words: " << wordcount.size() << std::endl;
    return 0;
}

There are three changes in the new code. The first change is using the new C++11 std::unordered_set container instead of std::set. Internally, unordered_set uses a hash table instead of a balanced tree, losing support for item ordering, but gaining significantly in average performance.

The second change is actually an old C++ option, not particular to C++11: disabling stdio synchronization. This is a big performance booster for intensive I/O. It is highly recommended to turn synchronization off, unless you really, really need to use the C and C++ standard streams at the same time.

The third change is explicitly taking advantage of C++11 move semantics (std::move()). In my benchmarks the change didn’t have a noticeable impact, perhaps because the compiler was eliding the copy anyway, or because the strings were small enough that a copy and a move weren’t significantly different in performance.

To test the different versions, I created a series of word files containing 4 million words each, each one consisting of a different number of unique words. The tested versions include all the versions from Jakob’s article, plus the new cpp4, c2, and python versions.

Name        Description                           SLOC
cpp1        Original C++ version                    11
cpp1-fixed  “Fixed” C++ version (using scanf)       12
cpp2        C++ version of c1                      100
cpp3        Jakob’s Ego-booster                     83
cpp4        C++11 version                           12
c1          C hash                                  71
c2          Glib hash                               73
py          Python                                   5

Here are the run time results:

Here are the results for the maximum RSS:

The updated C++11 version (cpp4) is about 5 times (!) faster than the original, partly because of using unordered_set, and partly because of not synchronizing with stdio. The memory usage has decreased by a decent amount, too! For lower numbers of unique words the performance results are somewhat mixed, but, as the number of unique words grows, the C++11 and Glib versions become clear winners. C++ goes to 11, indeed!

Based on the results above, here are some tips:

  1. Rolling your own implementation is probably not worth it.
  2. In C++11, when you don’t need item ordering, you are probably better off using the unordered variants of the containers (but don’t forget to benchmark).
  3. If you use standard streams, and don’t need to be in sync with stdio streams, be sure to turn synchronization off. If you need to be in sync, try hard to stop needing it!
  4. If you just want to quickly create something having decent performance, consider using Python.

You can find the code and scripts used for benchmarking here. To create the sample text files (‘make texts’) you need to have an extracted copy of scowl in the project directory.