We first introduced support for dmabufs and graphics offload last fall, and it is included in GTK 4.14. Since then, some improvements have happened, so it is time for an update.
↫ GTK Development Blog
This one’s for the ones smarter than me.
This is interesting. But it also shows that GNOME is really starting to catch up with modern graphical systems. (Not sure where KDE is on this, though.)
Basically, you have two approaches when working with pixels in applications:
1. Have a concrete bitmap as an array, and pass that around (even as a pointer)
2. Have an abstract pixel “buffer” which is defined as more or less a mathematical formula, and pass that instead
The second one has the advantage of not needing to be materialized, and it can be processed on other compute units (like the GPU). In fact, it might never be fully realized on the CPU if, for example, we read the buffer from an MPEG stream on the SSD, decompress it with hardware offload, ask the GPU to apply effects (say, sharpening, or black and white), and pass it directly to the screen output.
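To make the contrast concrete, here is a minimal C sketch; the type names are mine and purely illustrative:

#include <stdint.h>

/* Approach 1: a concrete bitmap, fully materialized in CPU memory. */
typedef struct {
    int      width, height, stride;
    uint8_t *pixels;             /* width * height * 4 bytes of RGBA, owned by us */
} Bitmap;

/* Approach 2: an abstract buffer, described rather than stored.  Only a handle
   and some metadata get passed around; the pixels may live in GPU or device
   memory and may never be copied into CPU RAM (a Linux dmabuf fd works this way). */
typedef struct {
    int      fd;                 /* opaque handle, e.g. a dmabuf file descriptor */
    int      width, height;
    uint32_t fourcc;             /* pixel format as a FOURCC code, e.g. NV12 */
    uint64_t modifier;           /* tiling/compression layout chosen by the driver */
} BufferHandle;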
For reference, the macOS version is roughly this one (CVBuffer, CVPixelBuffer, …):
https://developer.apple.com/documentation/corevideo/cvbuffer-nfm
And the Windows equivalent seems to be (IBuffer, PixelBuffer):
https://learn.microsoft.com/en-us/uwp/api/windows.ui.xaml.media.imaging.writeablebitmap.pixelbuffer?view=winrt-22621
sukru,
I don’t follow your distinction here. A pixel array and a pixel buffer are the exact same thing under the hood; one can even be cast to the other. Aren’t they the same solution accomplished with different syntactic sugar?
Of course nearly all graphics APIs will have this kind of abstraction, but I don’t think I follow your point. Are you talking about the difference between a software API and a hardware-accelerated one? If so, I would say the distinction isn’t so much the format of the pixel buffer as the use of a hardware-accelerated interface to manipulate it.
Alfman,
The difference is like the one between procedural and functional languages.
In one of them, you describe each operation along with the byte formats, and you also handle the storage of the data structures yourself.
In the other, you describe the flow of operations in an abstract way and ask the compiler (or, here, the framework) to resolve them.
For example,
Let’s say we want to build a video chat application that replaces the background:
While (forever)
1. Grab buffer from device
2. Run image segmentation algorithm to build a mask of the person
3. Replace the inverse of that masked region with a predetermined background image
4. Compress that image for streaming
5. Shrink that image to a smaller size
6. Send the compressed image to streaming service
7. Send the shrunk image to window server to be displayed as local preview
8. Repeat
(This will most likely also contain format conversion operations)
If done the old way, you’d need to allocate buffers in main system RAM, specify their concrete types (RGBA or YUV, etc.), materialize the output of every intermediate operation while possibly dispatching some of them to compute accelerators (like the GPU or a video encoder), and manually synchronize the stages.
When you use “buffers”, you only have abstract “handles” to the image, which may or may not be materialized in the intermediate stages. The framework will choose when and where to allocate the actual physical buffers, and possibly even their pixel formats. And it has the ability to replace them with better options as the hardware or software changes.
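To make that second style concrete, here is roughly what the streaming and preview legs of such a pipeline could look like with GStreamer. This is an illustrative sketch, not a working chat app: I am leaving out the segmentation/background-replacement step (there is no stock element for it), and the source, encoder and sinks are just common stand-ins. The point is that we only describe the flow; GStreamer decides where the intermediate buffers live and which formats get negotiated between elements.

#include <gst/gst.h>

int main(int argc, char **argv)
{
    gst_init(&argc, &argv);

    /* Describe the flow; let GStreamer pick buffer locations and formats.
       One branch encodes and streams, the other shows a scaled-down preview. */
    GError *error = NULL;
    GstElement *pipeline = gst_parse_launch(
        "v4l2src ! videoconvert ! tee name=t "
        "t. ! queue ! videoconvert ! x264enc tune=zerolatency ! rtph264pay "
        "   ! udpsink host=127.0.0.1 port=5000 "
        "t. ! queue ! videoscale ! video/x-raw,width=320,height=240 "
        "   ! videoconvert ! autovideosink",
        &error);
    if (pipeline == NULL) {
        g_printerr("Failed to build pipeline: %s\n", error->message);
        g_clear_error(&error);
        return 1;
    }

    gst_element_set_state(pipeline, GST_STATE_PLAYING);

    GMainLoop *loop = g_main_loop_new(NULL, FALSE);
    g_main_loop_run(loop);   /* blocks; a real app would watch the bus for errors/EOS */

    gst_element_set_state(pipeline, GST_STATE_NULL);
    gst_object_unref(pipeline);
    g_main_loop_unref(loop);
    return 0;
}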
As one extreme case, say we have a DMA-capable Thunderbolt capture card. The card might feed the YUV buffer directly to a neural network core, which would generate the segmentation masks. Then both could be sent to the GPU’s mapped memory, again skipping the CPU, and be processed directly there, later being copied to both the screen and the video encoder hardware. Maybe even the TCP/IP stack could do zero copy for the encoded buffer. (Most likely none of these will happen, but they can.)
At no point would there be a bitmap allocated in main CPU memory.
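For what it’s worth, the “handle” half of this already exists on Linux today: a V4L2 capture buffer can be exported as a dmabuf file descriptor, and that fd is all a downstream consumer needs. A minimal sketch, assuming video_fd is an already-configured /dev/video* device with buffers requested via VIDIOC_REQBUFS:

#include <fcntl.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/videodev2.h>

/* Export capture buffer `index` as a dmabuf fd.  The fd is just a handle;
   the pixels stay wherever the driver placed them and can be handed to the
   GPU, an encoder, or the compositor without a CPU-side copy. */
static int export_capture_buffer(int video_fd, unsigned int index)
{
    struct v4l2_exportbuffer expbuf;
    memset(&expbuf, 0, sizeof(expbuf));
    expbuf.type  = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    expbuf.index = index;
    expbuf.flags = O_CLOEXEC;

    if (ioctl(video_fd, VIDIOC_EXPBUF, &expbuf) < 0)
        return -1;

    return expbuf.fd;   /* opaque, shareable handle to the pixel data */
}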
sukru,
What is confusing me is that you are describing local allocation as “the old way” versus “modern graphical systems” using handles. But windowing operating systems have used these kinds of graphics handles forever. Look at “ancient” APIs including X11, Win32, OpenGL, DirectX: they all use graphics context handles to render output to the screen or to GPU buffers without needing to locally allocate any pixel buffers. But most modern GUI toolkits have stopped using them to render primitives altogether, preferring instead to render everything into local pixel buffers and then blit to the screen in one go.
I suspect I may be completely missing your point, though; if you are talking about the “dmabufs” from the article, then I completely missed that that might be what you were referring to.
Alfman,
I think the issue stems from the GTK APIs still being pretty much incomplete and in their early stages. I tried to look into their documentation:
https://docs.gtk.org/gdk4/class.DmabufTextureBuilder.html
But they basically refer to GStreamer for concrete examples. (That is not too unexpected, though.)
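Still, the builder itself gives a rough idea of the model. A minimal, untested sketch going by the GdkDmabufTextureBuilder documentation linked above: it wraps an already-exported dmabuf fd (for instance one obtained the way the earlier V4L2 snippet shows) as a GdkTexture, which GTK can then try to hand to the compositor without a CPU copy. The fd, dimensions, stride and format here are assumptions about whatever produced the buffer, and the DRM_FORMAT_* constants come from libdrm’s drm_fourcc.h.

#include <gtk/gtk.h>       /* requires GTK 4.14+, and GTK must already be initialized */
#include <drm_fourcc.h>    /* DRM_FORMAT_* and DRM_FORMAT_MOD_* from libdrm */

static GdkTexture *
wrap_dmabuf_as_texture(int dmabuf_fd, unsigned int width, unsigned int height,
                       unsigned int stride)
{
    GdkDmabufTextureBuilder *builder = gdk_dmabuf_texture_builder_new();
    GError *error = NULL;

    gdk_dmabuf_texture_builder_set_display(builder, gdk_display_get_default());
    gdk_dmabuf_texture_builder_set_width(builder, width);
    gdk_dmabuf_texture_builder_set_height(builder, height);
    gdk_dmabuf_texture_builder_set_fourcc(builder, DRM_FORMAT_XRGB8888);
    gdk_dmabuf_texture_builder_set_modifier(builder, DRM_FORMAT_MOD_LINEAR);
    gdk_dmabuf_texture_builder_set_n_planes(builder, 1);
    gdk_dmabuf_texture_builder_set_fd(builder, 0, dmabuf_fd);
    gdk_dmabuf_texture_builder_set_offset(builder, 0, 0);
    gdk_dmabuf_texture_builder_set_stride(builder, 0, stride);

    /* build() records the handle; GTK imports the buffer when it needs it. */
    GdkTexture *texture =
        gdk_dmabuf_texture_builder_build(builder, NULL, NULL, &error);
    g_object_unref(builder);

    if (texture == NULL) {
        g_printerr("dmabuf import failed: %s\n", error->message);
        g_clear_error(&error);
    }
    return texture;
}

(Whether this actually avoids a copy in the end still depends on the compositor and the driver.)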
But if you look at, for example, the Microsoft docs:
(once again): https://learn.microsoft.com/en-us/uwp/api/windows.ui.xaml.media.imaging.writeablebitmap.pixelbuffer?view=winrt-22621
You’ll see it is quite different from just using HBITMAP OS resources:
using (IRandomAccessStream fileStream = await file.OpenAsync(Windows.Storage.FileAccessMode.Read))
{
    BitmapDecoder decoder = await BitmapDecoder.CreateAsync(fileStream);

    // Scale image to appropriate size
    BitmapTransform transform = new BitmapTransform()
    {
        ScaledWidth = Convert.ToUInt32(Scenario4WriteableBitmap.PixelWidth),
        ScaledHeight = Convert.ToUInt32(Scenario4WriteableBitmap.PixelHeight)
    };

    PixelDataProvider pixelData = await decoder.GetPixelDataAsync(
        BitmapPixelFormat.Bgra8,                   // WriteableBitmap uses BGRA format
        BitmapAlphaMode.Straight,
        transform,
        ExifOrientationMode.IgnoreExifOrientation, // This sample ignores Exif orientation
        ColorManagementMode.DoNotColorManage
    );

    // An array containing the decoded image data, which could be modified before being displayed
    byte[] sourcePixels = pixelData.DetachPixelData();

    // Open a stream to copy the image contents to the WriteableBitmap's pixel buffer
    using (Stream stream = Scenario4WriteableBitmap.PixelBuffer.AsStream())
    {
        await stream.WriteAsync(sourcePixels, 0, sourcePixels.Length);
    }
}
(Why do we still not have code blocks in here?)
Anyway, the main keyword is async. Even when materializing pixel data, they use await calls and explicitly detach the buffers. That means the buffers themselves are managed internally by the framework and could be processed anywhere.
This is becoming more common, as the buffer could be a texture on the GPU or a tensor on an NPU.
(And it seems like GTK is also relying on the Linux kernel’s DRM architecture for this.)