
News

Posted over 2 years ago
Everyone Knows…

That the one true benchmark for graphics is glxgears. It’s been the standard for 20+ years, and it’s going to remain the standard for a long time to come.

Gears Through Time

Zink has gone through a couple of phases of glxgears performance. Everyone remembers weird glxgears, which was getting illegal amounts of frames due to its misrendering. We salute you, old friend.

Now, however, some number of you have become aware of the new threat posed by heavy gears in the Mesa 21.3 release. Whereas glxgears is usually a lightweight, performant benchmarking tool, heavy gears is the opposite, chugging away at up to 20% of a single CPU core with none of the accompanying performance. Terrifying.

What Creates Such A Monster?

The answer won’t surprise you: GL_QUADS. Indeed, because zink is a driver built on the Vulkan API, only the primitive types supported by Vulkan can be drawn directly. This means any app using GL_QUADS is going to have a very bad time. glxgears is exactly one of those apps, and (now that there’s a ticket open) I was forced to take action.

Transquadmation

The root of the problem here is that gears passes its vertices into GL to be drawn as quads, but zink can only draw triangles. This (currently) results in a very non-performant readback of the index buffer before every draw call to convert the draw to a triangle-based one. A smart person might say, “Why not just convert the vertices to triangles as you get them instead of waiting until they’re in the buffer?” Thankfully, a smart person did say that and then did the accompanying work. The result is that finally, after all these years, zink can actually perform well in a real benchmark.

Stay Tuned

For more exciting zink updates. You won’t want to miss them.
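An aside for the curious: the core of such an up-front quads-to-triangles expansion is tiny. A minimal sketch, assuming an unindexed GL_QUADS draw; the helper name is mine for illustration, not zink's actual code:

#include <stdint.h>

/* Expand an unindexed GL_QUADS draw of vertex_count vertices into a
 * triangle-list index buffer. Each quad (v0, v1, v2, v3) becomes two
 * triangles: (v0, v1, v2) and (v0, v2, v3).
 * 'indices' must have room for (vertex_count / 4) * 6 entries.
 * Returns the number of indices written.
 */
static unsigned
quads_to_triangles(unsigned vertex_count, uint32_t *indices)
{
   unsigned num_indices = 0;
   for (unsigned i = 0; i + 3 < vertex_count; i += 4) {
      indices[num_indices++] = i + 0;
      indices[num_indices++] = i + 1;
      indices[num_indices++] = i + 2;
      indices[num_indices++] = i + 0;
      indices[num_indices++] = i + 2;
      indices[num_indices++] = i + 3;
   }
   return num_indices;
}

Doing this as the vertices arrive keeps the index buffer GPU-resident and avoids the per-draw readback entirely.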
Posted over 2 years ago
UberCTS

In the course of working on more CI-related things for zink, I came across a series of troublesome tests (KHR-GL46.geometry_shader.rendering.rendering.triangles_*) that triggered a severe performance issue. Specifically, the LLVM optimizer spends absolute ages trying to optimize ubershaders like this one used in the tests:

#version 440

in vec4 position;
out vec4 vs_gs_color;

uniform bool is_lines_output;
uniform bool is_indexed_draw_call;
uniform bool is_instanced_draw_call;
uniform bool is_points_output;
uniform bool is_triangle_fan_input;
uniform bool is_triangle_strip_input;
uniform bool is_triangle_strip_adjacency_input;
uniform bool is_triangles_adjacency_input;
uniform bool is_triangles_input;
uniform bool is_triangles_output;
uniform ivec2 renderingTargetSize;
uniform ivec2 singleRenderingTargetSize;

void main()
{
    gl_Position = position + vec4(float(gl_InstanceID)) *
                  vec4(0, float(singleRenderingTargetSize.y) / float(renderingTargetSize.y), 0, 0) * vec4(2.0);
    vs_gs_color = vec4(1.0, 0.0, 0.0, 0.0);

    if (is_lines_output) {
        if (!is_indexed_draw_call) {
            if (is_triangle_fan_input) {
                switch (gl_VertexID) {
                case 0: vs_gs_color = vec4(0.1, 0.2, 0.3, 0.4); break;
                case 1: case 5: vs_gs_color = vec4(0.2, 0.3, 0.4, 0.5); break;
                case 2: vs_gs_color = vec4(0.3, 0.4, 0.5, 0.6); break;
                case 3: vs_gs_color = vec4(0.4, 0.5, 0.6, 0.7); break;
                case 4: vs_gs_color = vec4(0.5, 0.6, 0.7, 0.8); break;
                default: vs_gs_color = vec4(1.0, 1.0, 1.0, 1.0); break;
                }
            } else if (is_triangle_strip_input) {
                switch (gl_VertexID) {
                case 1: case 6: vs_gs_color = vec4(0.2, 0.3, 0.4, 0.5); break;
                case 0: case 4: vs_gs_color = vec4(0.1, 0.2, 0.3, 0.4); break;
                case 2: vs_gs_color = vec4(0.3, 0.4, 0.5, 0.6); break;
                case 3: vs_gs_color = vec4(0.4, 0.5, 0.6, 0.7); break;
                case 5: vs_gs_color = vec4(0.5, 0.6, 0.7, 0.8); break;
                default: vs_gs_color = vec4(1.0, 1.0, 1.0, 1.0); break;
                }
            } else if (is_triangle_strip_adjacency_input) {
                switch (gl_VertexID) {
                case 2: case 12: vs_gs_color = vec4(0.2, 0.3, 0.4, 0.5); break;
                case 0: case 8: vs_gs_color = vec4(0.1, 0.2, 0.3, 0.4); break;
                case 4: vs_gs_color = vec4(0.3, 0.4, 0.5, 0.6); break;
                case 6: vs_gs_color = vec4(0.4, 0.5, 0.6, 0.7); break;
                case 10: vs_gs_color = vec4(0.5, 0.6, 0.7, 0.8); break;
                default: vs_gs_color = vec4(1.0, 1.0, 1.0, 1.0); break;
                }
            } else if (is_triangles_input) {
                switch (gl_VertexID) {
                case 0: vs_gs_color = vec4(0.1, 0.2, 0.3, 0.4); break;
                case 1: vs_gs_color = vec4(0.2, 0.3, 0.4, 0.5); break;
                case 2: vs_gs_color = vec4(0.3, 0.4, 0.5, 0.6); break;
                case 3: vs_gs_color = vec4(0.1, 0.2, 0.3, 0.4); break;
                case 4: vs_gs_color = vec4(0.3, 0.4, 0.5, 0.6); break;
                case 5: vs_gs_color = vec4(0.4, 0.5, 0.6, 0.7); break;
                case 6: vs_gs_color = vec4(0.1, 0.2, 0.3, 0.4); break;
                case 7: vs_gs_color = vec4(0.4, 0.5, 0.6, 0.7); break;
                case 8: vs_gs_color = vec4(0.5, 0.6, 0.7, 0.8); break;
                case 9: vs_gs_color = vec4(0.1, 0.2, 0.3, 0.4); break;
                case 10: vs_gs_color = vec4(0.5, 0.6, 0.7, 0.8); break;
                case 11: vs_gs_color = vec4(0.2, 0.3, 0.4, 0.5); break;
                }
            } else if (is_triangles_adjacency_input) {
                vs_gs_color = vec4(1.0, 1.0, 1.0, 1.0);
                switch (gl_VertexID) {
                case 0: vs_gs_color = vec4(0.1, 0.2, 0.3, 0.4); break;
                case 2: vs_gs_color = vec4(0.2, 0.3, 0.4, 0.5); break;
                case 4: vs_gs_color = vec4(0.3, 0.4, 0.5, 0.6); break;
                case 6: vs_gs_color = vec4(0.1, 0.2, 0.3, 0.4); break;
                case 8: vs_gs_color = vec4(0.3, 0.4, 0.5, 0.6); break;
                case 10: vs_gs_color = vec4(0.4, 0.5, 0.6, 0.7); break;
                case 12: vs_gs_color = vec4(0.1, 0.2, 0.3, 0.4); break;
                case 14: vs_gs_color = vec4(0.4, 0.5, 0.6, 0.7); break;
                case 16: vs_gs_color = vec4(0.5, 0.6, 0.7, 0.8); break;
                case 18: vs_gs_color = vec4(0.1, 0.2, 0.3, 0.4); break;
                case 20: vs_gs_color = vec4(0.5, 0.6, 0.7, 0.8); break;
                case 22: vs_gs_color = vec4(0.2, 0.3, 0.4, 0.5); break;
                }
            }
        } else {
            if (is_triangles_input) {
                switch (gl_VertexID) {
                case 11: vs_gs_color = vec4(0.1, 0.2, 0.3, 0.4); break;
                case 10: vs_gs_color = vec4(0.2, 0.3, 0.4, 0.5); break;
                case 9: vs_gs_color = vec4(0.3, 0.4, 0.5, 0.6); break;
                case 8: vs_gs_color = vec4(0.1, 0.2, 0.3, 0.4); break;
                case 7: vs_gs_color = vec4(0.3, 0.4, 0.5, 0.6); break;
                case 6: vs_gs_color = vec4(0.4, 0.5, 0.6, 0.7); break;
                case 5: vs_gs_color = vec4(0.1, 0.2, 0.3, 0.4); break;
                case 4: vs_gs_color = vec4(0.4, 0.5, 0.6, 0.7); break;
                case 3: vs_gs_color = vec4(0.5, 0.6, 0.7, 0.8); break;
                case 2: vs_gs_color = vec4(0.1, 0.2, 0.3, 0.4); break;
                case 1: vs_gs_color = vec4(0.5, 0.6, 0.7, 0.8); break;
                case 0: vs_gs_color = vec4(0.2, 0.3, 0.4, 0.5); break;
                }
            } else if (is_triangle_fan_input) {
                switch (gl_VertexID) {
                case 5: vs_gs_color = vec4(0.1, 0.2, 0.3, 0.4); break;
                case 4: case 0: vs_gs_color = vec4(0.2, 0.3, 0.4, 0.5); break;
                case 3: vs_gs_color = vec4(0.3, 0.4, 0.5, 0.6); break;
                case 2: vs_gs_color = vec4(0.4, 0.5, 0.6, 0.7); break;
                case 1: vs_gs_color = vec4(0.5, 0.6, 0.7, 0.8); break;
                default: vs_gs_color = vec4(1.0, 1.0, 1.0, 1.0); break;
                }
            } else if (is_triangle_strip_input) {
                switch (gl_VertexID) {
                case 5: case 0: vs_gs_color = vec4(0.2, 0.3, 0.4, 0.5); break;
                case 6: case 2: vs_gs_color = vec4(0.1, 0.2, 0.3, 0.4); break;
                case 4: vs_gs_color = vec4(0.3, 0.4, 0.5, 0.6); break;
                case 3: vs_gs_color = vec4(0.4, 0.5, 0.6, 0.7); break;
                case 1: vs_gs_color = vec4(0.5, 0.6, 0.7, 0.8); break;
                default: vs_gs_color = vec4(1.0, 1.0, 1.0, 1.0); break;
                }
            } else if (is_triangle_strip_adjacency_input) {
                switch (gl_VertexID) {
                case 11: case 1: vs_gs_color = vec4(0.2, 0.3, 0.4, 0.5); break;
                case 13: case 5: vs_gs_color = vec4(0.1, 0.2, 0.3, 0.4); break;
                case 9: vs_gs_color = vec4(0.3, 0.4, 0.5, 0.6); break;
                case 7: vs_gs_color = vec4(0.4, 0.5, 0.6, 0.7); break;
                case 3: vs_gs_color = vec4(0.5, 0.6, 0.7, 0.8); break;
                default: vs_gs_color = vec4(1.0, 1.0, 1.0, 1.0); break;
                }
            } else if (is_triangles_adjacency_input) {
                vs_gs_color = vec4(1.0, 1.0, 1.0, 1.0);
                switch (gl_VertexID) {
                case 23: vs_gs_color = vec4(0.1, 0.2, 0.3, 0.4); break;
                case 21: vs_gs_color = vec4(0.2, 0.3, 0.4, 0.5); break;
                case 19: vs_gs_color = vec4(0.3, 0.4, 0.5, 0.6); break;
                case 17: vs_gs_color = vec4(0.1, 0.2, 0.3, 0.4); break;
                case 15: vs_gs_color = vec4(0.3, 0.4, 0.5, 0.6); break;
                case 13: vs_gs_color = vec4(0.4, 0.5, 0.6, 0.7); break;
                case 11: vs_gs_color = vec4(0.1, 0.2, 0.3, 0.4); break;
                case 9: vs_gs_color = vec4(0.4, 0.5, 0.6, 0.7); break;
                case 7: vs_gs_color = vec4(0.5, 0.6, 0.7, 0.8); break;
                case 5: vs_gs_color = vec4(0.1, 0.2, 0.3, 0.4); break;
                case 3: vs_gs_color = vec4(0.5, 0.6, 0.7, 0.8); break;
                case 1: vs_gs_color = vec4(0.2, 0.3, 0.4, 0.5); break;
                }
            }
        }
    } else if (is_points_output) {
        if (!is_indexed_draw_call) {
            if (is_triangles_adjacency_input) {
                vs_gs_color = vec4(1.0, 1.0, 1.0, 1.0);
                switch (gl_VertexID) {
                case 0: case 6: case 12: case 18: vs_gs_color = vec4(0.4, 0.5, 0.6, 0.7); break;
                case 2: case 22: vs_gs_color = vec4(0.5, 0.6, 0.7, 0.8); break;
                case 4: case 8: vs_gs_color = vec4(0.1, 0.2, 0.3, 0.4); break;
                case 10: case 14: vs_gs_color = vec4(0.2, 0.3, 0.4, 0.5); break;
                case 16: case 20: vs_gs_color = vec4(0.3, 0.4, 0.5, 0.6); break;
                }
            } else if (is_triangle_fan_input) {
                switch (gl_VertexID) {
                case 0: vs_gs_color = vec4(0.4, 0.5, 0.6, 0.7); break;
                case 1: case 5: vs_gs_color = vec4(0.5, 0.6, 0.7, 0.8); break;
                case 2: vs_gs_color = vec4(0.1, 0.2, 0.3, 0.4); break;
                case 3: vs_gs_color = vec4(0.2, 0.3, 0.4, 0.5); break;
                case 4: vs_gs_color = vec4(0.3, 0.4, 0.5, 0.6); break;
                default: vs_gs_color = vec4(1.0, 1.0, 1.0, 1.0); break;
                }
            } else if (is_triangle_strip_input) {
                switch (gl_VertexID) {
                case 1: case 4: vs_gs_color = vec4(0.4, 0.5, 0.6, 0.7); break;
                case 0: case 6: vs_gs_color = vec4(0.5, 0.6, 0.7, 0.8); break;
                case 2: vs_gs_color = vec4(0.1, 0.2, 0.3, 0.4); break;
                case 3: vs_gs_color = vec4(0.2, 0.3, 0.4, 0.5); break;
                case 5: vs_gs_color = vec4(0.3, 0.4, 0.5, 0.6); break;
                default: vs_gs_color = vec4(1.0, 1.0, 1.0, 1.0); break;
                }
            } else if (is_triangle_strip_adjacency_input) {
                switch (gl_VertexID) {
                case 2: case 8: vs_gs_color = vec4(0.4, 0.5, 0.6, 0.7); break;
                case 0: case 12: vs_gs_color = vec4(0.5, 0.6, 0.7, 0.8); break;
                case 4: vs_gs_color = vec4(0.1, 0.2, 0.3, 0.4); break;
                case 6: vs_gs_color = vec4(0.2, 0.3, 0.4, 0.5); break;
                case 10: vs_gs_color = vec4(0.3, 0.4, 0.5, 0.6); break;
                default: vs_gs_color = vec4(1.0, 1.0, 1.0, 1.0); break;
                }
            } else if (is_triangles_input) {
                switch (gl_VertexID) {
                case 0: case 3: case 6: case 9: vs_gs_color = vec4(0.4, 0.5, 0.6, 0.7); break;
                case 1: case 11: vs_gs_color = vec4(0.5, 0.6, 0.7, 0.8); break;
                case 2: case 4: vs_gs_color = vec4(0.1, 0.2, 0.3, 0.4); break;
                case 5: case 7: vs_gs_color = vec4(0.2, 0.3, 0.4, 0.5); break;
                case 8: case 10: vs_gs_color = vec4(0.3, 0.4, 0.5, 0.6); break;
                }
            }
        } else {
            if (is_triangle_fan_input) {
                switch (gl_VertexID) {
                case 5: vs_gs_color = vec4(0.4, 0.5, 0.6, 0.7); break;
                case 4: case 0: vs_gs_color = vec4(0.5, 0.6, 0.7, 0.8); break;
                case 3: vs_gs_color = vec4(0.1, 0.2, 0.3, 0.4); break;
                case 2: vs_gs_color = vec4(0.2, 0.3, 0.4, 0.5); break;
                case 1: vs_gs_color = vec4(0.3, 0.4, 0.5, 0.6); break;
                default: vs_gs_color = vec4(1.0, 1.0, 1.0, 1.0); break;
                }
            } else if (is_triangle_strip_input) {
                switch (gl_VertexID) {
                case 5: case 2: vs_gs_color = vec4(0.4, 0.5, 0.6, 0.7); break;
                case 6: case 0: vs_gs_color = vec4(0.5, 0.6, 0.7, 0.8); break;
                case 4: vs_gs_color = vec4(0.1, 0.2, 0.3, 0.4); break;
                case 3: vs_gs_color = vec4(0.2, 0.3, 0.4, 0.5); break;
                case 1: vs_gs_color = vec4(0.3, 0.4, 0.5, 0.6); break;
                default: vs_gs_color = vec4(1.0, 1.0, 1.0, 1.0); break;
                }
            } else if (is_triangle_strip_adjacency_input) {
                switch (gl_VertexID) {
                case 11: case 5: vs_gs_color = vec4(0.4, 0.5, 0.6, 0.7); break;
                case 13: case 1: vs_gs_color = vec4(0.5, 0.6, 0.7, 0.8); break;
                case 9: vs_gs_color = vec4(0.1, 0.2, 0.3, 0.4); break;
                case 7: vs_gs_color = vec4(0.2, 0.3, 0.4, 0.5); break;
                case 3: vs_gs_color = vec4(0.3, 0.4, 0.5, 0.6); break;
                default: vs_gs_color = vec4(1.0, 1.0, 1.0, 1.0); break;
                }
            } else if (is_triangles_adjacency_input) {
                vs_gs_color = vec4(1.0, 1.0, 1.0, 1.0);
                switch (gl_VertexID) {
                case 23: case 17: case 11: case 5: vs_gs_color = vec4(0.4, 0.5, 0.6, 0.7); break;
                case 21: case 1: vs_gs_color = vec4(0.5, 0.6, 0.7, 0.8); break;
                case 19: case 15: vs_gs_color = vec4(0.1, 0.2, 0.3, 0.4); break;
                case 13: case 9: vs_gs_color = vec4(0.2, 0.3, 0.4, 0.5); break;
                case 7: case 3: vs_gs_color = vec4(0.3, 0.4, 0.5, 0.6); break;
                }
            } else if (is_triangles_input) {
                switch (gl_VertexID) {
                case 11: case 8: case 5: case 2: vs_gs_color = vec4(0.4, 0.5, 0.6, 0.7); break;
                case 10: case 0: vs_gs_color = vec4(0.5, 0.6, 0.7, 0.8); break;
                case 9: case 7: vs_gs_color = vec4(0.1, 0.2, 0.3, 0.4); break;
                case 6: case 4: vs_gs_color = vec4(0.2, 0.3, 0.4, 0.5); break;
                case 3: case 1: vs_gs_color = vec4(0.3, 0.4, 0.5, 0.6); break;
                }
            }
        }
    } else if (is_triangles_output) {
        int vertex_id = 0;
        if (!is_indexed_draw_call && is_triangles_adjacency_input && (gl_VertexID % 2 == 0)) {
            vertex_id = gl_VertexID / 2 + 1;
        } else {
            vertex_id = gl_VertexID + 1;
        }
        vs_gs_color = vec4(float(vertex_id) / 48.0,
                           float(vertex_id % 3) / 2.0,
                           float(vertex_id % 4) / 3.0,
                           float(vertex_id % 5) / 4.0);
    }
}

By ages I mean upwards of 10 minutes per test. Yikes.

When In Doubt, Inline

Fortunately, zink already has tools to combat exactly this problem: ZINK_INLINE_UNIFORMS. This feature analyzes shaders to determine if inlining uniform values will be beneficial, and if so, it rewrites the shader with the uniform values as constants rather than loads. This brings the resulting NIR for the shader from 4000+ lines down to just under 300. The tests all become near-instant to run as well.

Uniform inlining has been in zink for a while, but it’s been disabled by default (except on zink-wip for testing), as this isn’t a feature that’s typically desirable when running apps/games: every time the uniforms are updated, a new shader must be compiled, and this causes (even more) stuttering, making games on zink (even more) unplayable. On CPU-based drivers like lavapipe, however, the time to compile a shader is usually less than the time to actually run a shader, so the trade-off becomes worth doing.

Stay tuned for exciting announcements in the next few days.
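A technical appendix for the curious: under the hood this kind of inlining maps onto a pair of NIR passes in Mesa. The sketch below shows how a driver might drive them; the function and field names (nir_find_inlinable_uniforms, nir_inline_uniforms, and the shader_info fields) are recalled from Mesa of roughly this era and may differ in other versions, so treat this as an assumption-laden illustration rather than zink's actual code:

#include "nir.h"

/* Sketch: inline the values of control-flow-feeding uniforms into a
 * shader before handing it to the backend compiler.
 */
static void
inline_shader_uniforms(nir_shader *nir, const uint32_t *uniform_storage)
{
   /* At shader-create time: mark uniforms used in control flow as
    * inlinable and record their dword offsets in shader_info. */
   nir_find_inlinable_uniforms(nir);

   /* At draw time, on a per-variant clone of the shader: substitute the
    * currently bound values, so every `if (is_foo)` in the ubershader
    * becomes a constant branch that later opt passes delete outright. */
   uint32_t values[MAX_INLINABLE_UNIFORMS];
   for (unsigned i = 0; i < nir->info.num_inlinable_uniforms; i++)
      values[i] = uniform_storage[nir->info.inlinable_uniform_dw_offsets[i]];

   nir_inline_uniforms(nir, nir->info.num_inlinable_uniforms,
                       values, nir->info.inlinable_uniform_dw_offsets);
}

The cost model is exactly as described above: every distinct set of uniform values produces a new shader variant, which is why this only pays off when compilation is cheaper than execution.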
Posted over 2 years ago
A few weeks ago I watched Victor's excellent talk on Vulkan Video. This made me question my skills in this area. I'm pretty vague on video processing hardware, and I really have no understanding of H264 or any of the standards. I've been loosely following the Vulkan video group inside of Khronos, but I can't say I've understood it or been useful.

radeonsi has a gallium vaapi driver that talks to the firmware-based encoder/decoder on the hardware; surely copying what it is programming can't be that hard. I got an mpv/vaapi setup running and tested some videos on that setup just to get comfortable, and looked at what sort of data was being pushed about. The thing is, the firmware is doing all the work here; the driver is mostly just responsible for taking semi-parsed h264 bitstream data structures and giving them in memory buffers to the fw API. The resulting decoded image should then be magically in a buffer.

I then got the demo nvidia video decoder application mentioned in Victor's talk. I ported the code to radv in a couple of days, but then began a long journey into the unknown. The firmware is quite particular about exactly what it wants and when it wants it. After fixing some interactions with the video player, I started to dig.

Now vaapi and DXVA (Windows) are context-based APIs. This means they are like OpenGL, where you create a context, do a bunch of work, and tear it down; the driver does all the hw queuing of commands internally, and all the state is held in the context. Vulkan is a command-buffer-based API: the application records command buffers and then enqueues those command buffers to the hardware itself.

So the vaapi driver works like this for a video:

create hw ctx, flush, decode, flush, decode, flush, decode, flush, decode, flush, destroy hw ctx, flush

However, Vulkan wants things to be more like:

Create Session, record command buffer with (begin, decode, end), send to hw, (begin, decode, end), send to hw, End Session

There is no way at Create/End session time to submit things to the hardware. After a week or two of hair removal and insightful irc chats, I stumbled over a decent enough workaround to avoid the hw dying and managed to decode a H264 video of some jellyfish.

The work is based on a bunch of other stuff, and is in no way suitable for upstreaming yet, not to mention the Vulkan specification is only beta/provisional so can't be used anywhere outside of development. The preliminary code is in my gitlab repo here [1]. It has a start on h265 decode, but that's not working at all yet, and I think the h264 code is a bit hangy randomly. I'm not sure where this is going yet, but it was definitely an interesting experiment.

[1]: https://gitlab.freedesktop.org/airlied/mesa/-/commits/radv-vulkan-video-prelim-decode
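For contrast with the vaapi flush-based flow, here is roughly what one decode submission looks like on the Vulkan side, sketched against the then-provisional VK_KHR_video_queue / VK_KHR_video_decode_queue entry points (beta headers required). Session creation, parameter objects, and DPB slot management, the genuinely hard parts, are assumed to exist elsewhere, and most structure fields are elided:

#define VK_ENABLE_BETA_EXTENSIONS /* the video API was provisional */
#include <vulkan/vulkan.h>

/* Minimal sketch of one decode submission: each frame is an explicit
 * begin/decode/end recorded into a command buffer and queued by the
 * application, unlike vaapi where the driver flushes an internal
 * context on your behalf.
 */
static void
decode_one_frame(VkCommandBuffer cmd, VkQueue decode_queue,
                 VkVideoSessionKHR session,
                 const VkVideoDecodeInfoKHR *decode_info)
{
   const VkCommandBufferBeginInfo cmd_begin = {
      .sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO,
   };
   vkBeginCommandBuffer(cmd, &cmd_begin);

   const VkVideoBeginCodingInfoKHR begin_info = {
      .sType = VK_STRUCTURE_TYPE_VIDEO_BEGIN_CODING_INFO_KHR,
      .videoSession = session,
      /* ...reference slot bindings elided... */
   };
   vkCmdBeginVideoCodingKHR(cmd, &begin_info);
   vkCmdDecodeVideoKHR(cmd, decode_info);

   const VkVideoEndCodingInfoKHR end_info = {
      .sType = VK_STRUCTURE_TYPE_VIDEO_END_CODING_INFO_KHR,
   };
   vkCmdEndVideoCodingKHR(cmd, &end_info);

   vkEndCommandBuffer(cmd);

   const VkSubmitInfo submit = {
      .sType = VK_STRUCTURE_TYPE_SUBMIT_INFO,
      .commandBufferCount = 1,
      .pCommandBuffers = &cmd,
   };
   vkQueueSubmit(decode_queue, 1, &submit, VK_NULL_HANDLE);
}

Note there is no queue submission hook at session create/destroy time, which is exactly the mismatch with the firmware's expectations described above.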
Posted over 2 years ago
A basic example of the git alias function syntax looks like this.

[alias]
    shortcut = "!f() \
    {\
        echo Hello world!; \
    }; f"

This syntax defines a function f and then calls it. These aliases are executed in a sh shell, which means there's no access to Bash/Zsh specific functionality. Every command is ended with a ; and each line is ended with a \. This is easy enough to grok. But when we try to clean up the above snippet and add some quotes to "Hello world!", we hit this obtuse error message.

}; f: 1: Syntax error: end of file unexpected (expecting "}")

This syntax error is caused by quotes needing to be escaped. The reason for this comes down to how git tokenizes and executes these functions. If you're curious …
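To close the loop on the fix itself (this snippet is my illustration, not the article's continuation): the inner quotes have to be backslash-escaped so they survive git's own config parsing before the alias body ever reaches sh.

[alias]
    shortcut = "!f() \
    {\
        echo \"Hello world!\"; \
    }; f"

Here git's config parser turns each \" back into a plain ", so the shell finally sees echo "Hello world!"; as intended.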
Posted over 2 years ago
What’s Even Happening

Where does the time go? I’m not sure, but I’m making the most of it. As we all know, the Mesa 21.3 release cycle is well underway, primarily to enable me to jam an increasingly ludicrous number of bug fixes in before the final tarball ships. But is it possible that I’m doing other things too? Why yes. Yes it is.

CI

We all like CI. It helps prevent us from footgunning, even when we’re totally sure that our patches are exactly what the codebase needs. That’s why I decided to add GL 4.6 CI runs over the past day or so. No more will tests be randomly fixed or failed by my commits! Unless they’re part of the Khronos Confidential GL Test Suite, of course, but we don’t talk about that one. Intrepid readers will note that there’s now a file in the repo which lists exactly how many failures there are on lavapipe, so now everyone knows how many thousands of tests are doing the opposite.

Rendering

A new Vulkan spec was released this week, and it included something I’ve been excited about for a while: KHR_dynamic_rendering. This extension is going to let me cut down on some CPU overhead, and so some time ago I wrote the lavapipe implementation to get a feel for it. Surprisingly, this means that lavapipe is now the only mesa driver implementing it, though I don’t expect that to be the case for long. I’m looking forward to seeing more projects switch to this, since, let’s be honest, nobody ever liked renderpasses anyway.
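For the unfamiliar, the appeal is that KHR_dynamic_rendering replaces VkRenderPass/VkFramebuffer object churn with a simple begin/end pair on the command buffer. A minimal hedged sketch, with attachment setup abbreviated:

#include <vulkan/vulkan.h>

/* Sketch: drawing with VK_KHR_dynamic_rendering instead of building
 * VkRenderPass + VkFramebuffer objects up front. 'view' is an already
 * created color attachment image view.
 */
static void
draw_with_dynamic_rendering(VkCommandBuffer cmd, VkImageView view,
                            uint32_t width, uint32_t height)
{
   const VkRenderingAttachmentInfoKHR color = {
      .sType = VK_STRUCTURE_TYPE_RENDERING_ATTACHMENT_INFO_KHR,
      .imageView = view,
      .imageLayout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL,
      .loadOp = VK_ATTACHMENT_LOAD_OP_CLEAR,
      .storeOp = VK_ATTACHMENT_STORE_OP_STORE,
   };
   const VkRenderingInfoKHR info = {
      .sType = VK_STRUCTURE_TYPE_RENDERING_INFO_KHR,
      .renderArea = { .extent = { width, height } },
      .layerCount = 1,
      .colorAttachmentCount = 1,
      .pColorAttachments = &color,
   };
   /* No render pass or framebuffer objects anywhere in sight. */
   vkCmdBeginRenderingKHR(cmd, &info);
   /* ...bind pipeline/descriptors and vkCmdDraw*() here... */
   vkCmdEndRenderingKHR(cmd);
}

For a layered driver like zink, skipping render pass and framebuffer object creation is where the CPU-overhead savings mentioned above come from.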
Posted over 2 years ago
…For 2021

Zink is done. It’s finished. I’ve said it before, I’ve said it again after that, and I even said I meant it that one time, but now I’m serious. Super serious. We’re done here. There’s no need to check this blog anymore, and you don’t have to update zink ever again if you’ve pulled in the last week.

Mesa 21.3

This is it. This is the release everyone’s been waiting for. Why is that? Because zink is Pretty Fast™ now. And it can run lots of stuff.

Blog enthusiasts will recall all that time ago when zink was over that I noted a number of Big Name™ games that zink could now run, including Metro: Last Light Redux and HITMAN (Agent 47’s Deluxe Psychedelic Trip Edition). True blog connoisseurs will recall when zink started to pull ahead of real GL drivers in features in order to run Warhammer 40k: Dawn of War. But how many die-hard blog fans are here from the future and can remember when I posted that Bioshock Infinite now actually runs on RADV instead of hanging?

It’s hard to overstate the amount of work that’s gone into zink for this release. Over 400 patches amounted to ES 3.2, a suballocator, and a slew of random extensions to improve compatibility and performance across the board. If you find a game or app* which doesn’t run at all on zink 21.3, play the lottery. It’s your day.

* Except using it for your Xorg session or Wayland compositor. Don’t try this without supervision.

Bioshock Infinite Now Runs On Zink

Zink+RADV: New BFFs

As part of a cross-training exercise, I’ve been hanging around with the hangmaster himself, the Baron of Black Screens, the Liege of Lost Sessions, Samuel Pitoiset. Together we’ve (but mostly him while I watch) been stamping out a colossal number of pathological hangs and crashes with zink on RADV. At present, zink on RADV has only around 200 failures in the GL 4.6 conformance suite. It’s not quite there yet, but considering the number was well over 1000 just a week ago, there’s a lot of easy cleanup work that can be done here in both drivers to knock that down further.

Will 2022 Be The Year Of Zink Conformance?

Maybe.

What’s Next?

Bug fixes. Lots of bug fixes. Seriously, there’s so, so many bug fixes coming. I have some fun surprises to unveil in the near future too. For example: any guesses which Vulkan driver I’ll be showing zink off on next? Hint: B I G F P S.
Posted over 2 years ago
In 2007, Jan Arne Petersen added a D-Bus API to what was still pretty much an import into gnome-control-center of the "acme" utility I wrote to have all the keys on my iBook working. It switched the code away from remapping keyboard keys to "XF86Audio*", to expecting players to contact the D-Bus daemon and ask to be forwarded key events.

Multimedia keys circa 2003

In 2013, we added support for controlling media players using MPRIS, as another interface. Fast-forward to 2021, and MPRIS support is ubiquitous, whether in free software, proprietary applications or even browsers. So we'll be parting with the "org.gnome.SettingsDaemon.MediaKeys" D-Bus API. If your application still wants to work with older versions of GNOME, it is recommended to at least handle the MediaKeys API's unavailability quietly.

Multimedia keys in 2021

TL;DR: Remove code that relies on gnome-settings-daemon's MediaKeys API, and make sure to add MPRIS support to your app.
Posted over 2 years ago
I’m Bad At Blogging

I’m responsible enough to admit that I’m bad at blogging. I’ve said repeatedly that I’m going to blog more often, and then I go and do the complete opposite. I don’t know why I’m like this, but here we are, and now it’s time for another blog post.

What’s Been Happening

In short: not a lot. All the features I’ve previously blogged about have landed, and zink is once again in “release mode” until the branchpoint next week, to avoid having to rush patches in at the last second. This means there probably won’t be any interesting patches at all for zink until then. We’re in a good spot though, and I’m pleased with the state of the driver for this release. You probably still won’t be using it to play any OpenGL games you pick up from the Winter Steam Sale, but potentially those days aren’t too far off.

With that said, I do have to actually blog about something technical for once, so let’s roll the dice and see what it’s going to be.

ARB_bindless_texture

We did it. We got a good roll. This is actually a cool extension for an implementation deep dive because of how (relatively) simple Vulkan makes it to handle.

First, an overview: what is ARB_bindless_texture? This is an extension used by only the most elite GL apps to enable texture streaming, namely the ability to continually add more images into the rendering pipeline, either for sampling or shader write operations. An image is bound to a “handle”, and from there it can be made “resident” at any time to use it in shaders. This is different from the general GL methodology, where an image must be explicitly bound to a specific slot (instead, each image has its own slot), and it allows for both greater flexibility and more images to be in use at any given time.

At the implementation level, this actually amounts to three distinct features:

- the ability to track and manage unique “handles” for each image that can be made resident
- the ability to access these images from shaders
- the ability to pass these images between shader stages as normal I/O

In zink, I tackled these in the order I’ve listed.

Handle Management

This wouldn’t have been (as) possible without one very special, very awful Vulkan extension. You knew this was coming: VK_EXT_descriptor_indexing. That’s right, it’s a requirement for this, but not for the reason you might think. Zink has no need for the impossibly large descriptor sets enabled by this extension, but I did need the other features it provides:

- VK_DESCRIPTOR_BINDING_UPDATE_AFTER_BIND_BIT - enables binding a “bindless” descriptor set once and then performing updates on it without needing to have multiple sets
- VK_DESCRIPTOR_BINDING_PARTIALLY_BOUND_BIT - enables invalidating deleted members of an active set and leaving them as garbage values in the descriptor set as long as they won’t be accessed in shaders (don’t worry, this is totally safe)
- VK_DESCRIPTOR_BINDING_UPDATE_UNUSED_WHILE_PENDING_BIT - enables updating members of an active set that aren’t currently in use by any shaders

With these, it becomes possible to implement bindless textures using the existing Gallium convention:

- create a u_idalloc instance to track and generate integer handle IDs
- map these handle IDs to slots in a large-ish (1024) sized descriptor array
- dynamically update the slots in the set as textures are made resident/not-resident
- return handle IDs to the u_idalloc pool once they are destroyed and the image is no longer in use

This creates a cycle where a handle ID is allocated, an image is bound to that slot in the descriptor array, the image can be unbound, the handle ID is deleted, and then finally the ID is recycled, all while only binding and updating a single descriptor set as draws continue.

Shader Access

Now that the images are accessible to the GPU in the bindless descriptor array, shaders have to be updated to read them. In NIR, bindless instructions come in two variants:

- nir_intrinsic_bindless_image_*
- nir_instr_type_tex with nir_tex_src_texture_handle

These have their own unique semantics that I didn’t bother to look into; I only need to do completely normal array derefs, so what I actually needed was just to rewrite them back into normal-er instructions. For the image intrinsics, that ended up being the following snippet:

nir_intrinsic_instr *instr = nir_instr_as_intrinsic(in);
nir_intrinsic_op op;
#define OP_SWAP(OP) \
   case nir_intrinsic_bindless_image_##OP: \
      op = nir_intrinsic_image_deref_##OP; \
      break;

/* convert bindless intrinsics to deref intrinsics */
switch (instr->intrinsic) {
OP_SWAP(atomic_add)
OP_SWAP(atomic_and)
OP_SWAP(atomic_comp_swap)
OP_SWAP(atomic_dec_wrap)
OP_SWAP(atomic_exchange)
OP_SWAP(atomic_fadd)
OP_SWAP(atomic_fmax)
OP_SWAP(atomic_fmin)
OP_SWAP(atomic_imax)
OP_SWAP(atomic_imin)
OP_SWAP(atomic_inc_wrap)
OP_SWAP(atomic_or)
OP_SWAP(atomic_umax)
OP_SWAP(atomic_umin)
OP_SWAP(atomic_xor)
OP_SWAP(format)
OP_SWAP(load)
OP_SWAP(order)
OP_SWAP(samples)
OP_SWAP(size)
OP_SWAP(store)
default:
   return false;
}

enum glsl_sampler_dim dim = nir_intrinsic_image_dim(instr);
nir_variable *var = dim == GLSL_SAMPLER_DIM_BUF ? bindless_buffer_array : bindless_image_array;
if (!var)
   var = create_bindless_image(b->shader, dim);
instr->intrinsic = op;

b->cursor = nir_before_instr(in);
nir_deref_instr *deref = nir_build_deref_var(b, var);
if (glsl_type_is_array(var->type))
   deref = nir_build_deref_array(b, deref, nir_u2uN(b, instr->src[0].ssa, 32));
nir_instr_rewrite_src_ssa(in, &instr->src[0], &deref->dest.ssa);

In short, swap the intrinsic back to a regular image one, then rewrite the image src as a deref of a bindless image variable (which is just image[1024]). In long… it’s the same thing. It’s actually that simple.

The tex instruction is where things get trickier.

nir_variable *var = tex->sampler_dim == GLSL_SAMPLER_DIM_BUF ? bindless_buffer_array : bindless_texture_array;
if (!var)
   var = create_bindless_texture(b->shader, tex);
b->cursor = nir_before_instr(in);
nir_deref_instr *deref = nir_build_deref_var(b, var);
if (glsl_type_is_array(var->type))
   deref = nir_build_deref_array(b, deref, nir_u2uN(b, tex->src[idx].src.ssa, 32));
nir_instr_rewrite_src_ssa(in, &tex->src[idx].src, &deref->dest.ssa);

This part is the same as the image rewrite: just rewriting the instruction as a deref. This part, however, is different:

unsigned needed_components = glsl_get_sampler_coordinate_components(glsl_without_array(var->type));
unsigned c = nir_tex_instr_src_index(tex, nir_tex_src_coord);
unsigned coord_components = nir_src_num_components(tex->src[c].src);
if (coord_components < needed_components) {
   nir_ssa_def *def = nir_pad_vector(b, tex->src[c].src.ssa, needed_components);
   nir_instr_rewrite_src_ssa(in, &tex->src[c].src, def);
   tex->coord_components = needed_components;
}

The thing about bindless textures is that by the time zink sees them, they have no dimensionality. They’re just textures in an array, regardless of whether they’re 1D, 2D, 3D, or arrayed. This means the variables used for derefs might not have the right number of coordinate components, or the instructions using them might not have the right number. To fix this, an extra cleanup is needed here to match up the number of components with the variable being used.

With all of that in place, basic bindless operations are working. But wait…

Shader I/O

This was the tricky part. According to the spec, it now becomes legal to have an image or a sampler as an input or an output in a shader. But is it really, truly necessary to pass images between the shaders? No. No it isn’t.

nir_deref_instr *src_deref = nir_src_as_deref(instr->src[0]);
nir_variable *var = nir_deref_instr_get_variable(src_deref);
if (var->data.bindless)
   return false;
if (var->data.mode != nir_var_shader_in && var->data.mode != nir_var_shader_out)
   return false;
if (!glsl_type_is_image(var->type) && !glsl_type_is_sampler(var->type))
   return false;

var->type = glsl_int64_t_type();
var->data.bindless = 1;

b->cursor = nir_before_instr(in);
nir_deref_instr *deref = nir_build_deref_var(b, var);
if (instr->intrinsic == nir_intrinsic_load_deref) {
   nir_ssa_def *def = nir_load_deref(b, deref);
   nir_instr_rewrite_src_ssa(in, &instr->src[0], def);
   nir_ssa_def_rewrite_uses(&instr->dest.ssa, def);
} else {
   nir_store_deref(b, deref, instr->src[1].ssa, nir_intrinsic_write_mask(instr));
}
nir_instr_remove(in);
nir_instr_remove(&src_deref->instr);

Bindless shader i/o is really just passing array indices that masquerade as images. If they’re rewritten back to integer types, that all goes away, and they become regular i/o that needs no additional handling.

Just This Once

The translation to Vulkan made everything incredibly easy. I didn’t need any special hacks or corner case behavior, and I didn’t have to spend time reading code from other drivers to figure out what the hell I was doing wrong. Validation even works for it! Truly miraculous.
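As a footnote to the handle-management section: those three binding flags have to be supplied when the bindless descriptor set layout is created. A hedged sketch of that setup; the binding number, descriptor type, and stage mask are illustrative, not zink's actual values:

#include <vulkan/vulkan.h>

#define BINDLESS_ARRAY_SIZE 1024 /* matches the "large-ish" array above */

/* Sketch: create a descriptor set layout for one bindless sampler array
 * using the VK_EXT_descriptor_indexing (core in 1.2) binding flags.
 */
static VkDescriptorSetLayout
create_bindless_layout(VkDevice dev)
{
   const VkDescriptorBindingFlags flags =
      VK_DESCRIPTOR_BINDING_UPDATE_AFTER_BIND_BIT |
      VK_DESCRIPTOR_BINDING_PARTIALLY_BOUND_BIT |
      VK_DESCRIPTOR_BINDING_UPDATE_UNUSED_WHILE_PENDING_BIT;

   const VkDescriptorSetLayoutBindingFlagsCreateInfo binding_flags = {
      .sType = VK_STRUCTURE_TYPE_DESCRIPTOR_SET_LAYOUT_BINDING_FLAGS_CREATE_INFO,
      .bindingCount = 1,
      .pBindingFlags = &flags,
   };
   const VkDescriptorSetLayoutBinding binding = {
      .binding = 0,
      .descriptorType = VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER,
      .descriptorCount = BINDLESS_ARRAY_SIZE,
      .stageFlags = VK_SHADER_STAGE_ALL,
   };
   const VkDescriptorSetLayoutCreateInfo info = {
      .sType = VK_STRUCTURE_TYPE_DESCRIPTOR_SET_LAYOUT_CREATE_INFO,
      .pNext = &binding_flags,
      .flags = VK_DESCRIPTOR_SET_LAYOUT_CREATE_UPDATE_AFTER_BIND_POOL_BIT,
      .bindingCount = 1,
      .pBindings = &binding,
   };
   VkDescriptorSetLayout layout;
   vkCreateDescriptorSetLayout(dev, &info, NULL, &layout);
   return layout;
}

With a layout like this, the single bindless set can stay bound while individual slots are written as handles come and go, which is the whole cycle described above.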
Posted over 2 years ago
Wim Taymans laying out the vision for the future of Linux multimedia

PipeWire has already made great strides forward in terms of improving the audio handling situation on Linux, but one of the original goals was to also bring along the video side of the house. In fact, in the first few releases of Fedora Workstation where we shipped PipeWire, we solely enabled it as a tool to handle screen sharing for Wayland and Flatpaks. So with PipeWire having stabilized a lot for audio now, we feel the time has come to go back to the video side of PipeWire and work to improve the state of the art for video capture handling under Linux.

Wim Taymans did a presentation to our team inside Red Hat on the 30th of September talking about the current state of the world and where we need to go to move forward. I thought the information and ideas in his presentation deserved wider distribution, so this blog post is building on that presentation to share it more widely and also, hopefully, rally the community to support us in this endeavour.

The current state of video capture (usually webcams) handling on Linux is basically the v4l2 kernel API. It has served us well for a lot of years, but we believe that just like you don't write audio applications directly to the ALSA API anymore, you should not write video applications directly to the v4l2 kernel API anymore either. With PipeWire we can offer a lot more flexibility, security and power for video handling, just like it does for audio. The v4l2 API is an open/ioctl/mmap/read/write/close based API, meant for a single application to access at a time. There is a library called libv4l2, but nobody uses it because it causes more problems than it solves (no mmap, slow conversions, quirks). But there is no need to rely on the kernel API anymore, as there are GStreamer and PipeWire plugins for v4l2 allowing you to access it using the GStreamer or PipeWire API instead. So our goal is not to replace v4l2, just as it is not our goal to replace ALSA; v4l2 and ALSA are still the kernel driver layer for video and audio.

It is also worth considering that new cameras are getting more and more complicated, and thus configuring them is getting more complicated too. Driving this change is a new set of cameras on the way, often called MIPI cameras, as they adhere to the API standards set by the MIPI Alliance. Partly driven by this, v4l2 is in active development with a codec API addition (stateful/stateless), DMABUF, the request API, and also a Media Controller (MC) graph with nodes, ports and links of processing blocks. This means that the threshold for an application developer to use these APIs directly is getting very high, in addition to the aforementioned issues of single application access, the security issues of direct kernel access and so on.

Libcamera is meant to be the userland library for v4l2. Of course we are not the only ones seeing the growing complexity of cameras as a challenge for developers, and thus libcamera has been developed to make interacting with these cameras easier. Libcamera provides a unified API for setup and capture for cameras; it hides the complexity of modern camera devices and is supported on ChromeOS, Android and Linux. One way to describe libcamera is as the Mesa of cameras. Libcamera provides hooks to run (out-of-process) vendor extensions, for example for image processing or enhancement. Using libcamera is considered pretty much a requirement for embedded systems these days, and newer Intel chips will also have IPUs configurable with media controllers.
Libcamera is still under heavy development upstream and does not yet have a stable ABI, but they did add a .so version very recently, which will make packaging in Fedora and elsewhere a lot simpler. In fact, we have builds in Fedora ready now. Libcamera also ships with a set of GStreamer plugins, which means you should be able to get, for instance, Cheese working through libcamera in theory (although, as we will go into, we think this is the wrong approach).

Before I go further, an important thing to be aware of here is that unlike with ALSA, where PipeWire can provide a virtual ALSA device to provide backwards compatibility with older applications using the ALSA API directly, there is no such option possible for v4l2. So application developers will need to port to something new here, be that libcamera or PipeWire.

So what do we feel is the right way forward?

How we envision the Linux multimedia stack going forward

Above you see an illustration of what we believe the stack should look like going forward. If you made this drawing of the current state, then thanks to our backwards compatibility with ALSA, PulseAudio and Jack, all the applications would be pointing at PipeWire for their audio handling like they are in the illustration you see above, but all the video handling from most applications would be pointing directly at v4l2 in this diagram. At the same time, we don't want applications to port to libcamera either, as it doesn't offer a lot of the flexibility that using PipeWire will; instead, what we propose is that all applications target PipeWire in combination with the video camera portal API. Be aware that the video portal is not an alternative to or an abstraction of the PipeWire API; it is just a way to set up the connection to PipeWire that has the added bonus of working if your application is shipped as a Flatpak or another type of desktop container. PipeWire would then be in charge of talking to libcamera or v4l2 for video, just like PipeWire is in charge of talking with ALSA on the audio side.

Having PipeWire be the central hub means we get a lot of the same advantages for video that we get for audio. For instance, as the application developer you interact with PipeWire regardless of whether what you want is a screen capture, a camera feed or a video being played back. Multiple applications can share the same camera, and at the same time there are security protections to avoid the camera being used without your knowledge to spy on you. And we can also have patchbay applications that support video pipelines and not just audio, like Carla provides for Jack applications. To be clear, this feature will not come for 'free' from Jack patchbays, since Jack only does audio, but hopefully new PipeWire patchbays like Helvum can add video support.

So what about GStreamer, you might ask. Well, GStreamer is a great way to write multimedia applications and we strongly recommend it, but we do not recommend your GStreamer application use the v4l2 or libcamera plugins; instead, we recommend that you use the PipeWire plugins. This is of course a little different from the audio side, where PipeWire supports the PulseAudio and Jack APIs and thus you don't need to port, but by targeting the PipeWire plugins in GStreamer your GStreamer application will get the full PipeWire feature set.

So what is our plan of action?

We will start putting the pieces in place for this step by step in Fedora Workstation.
We have already started on this by working on the libcamera support in PipeWire and packaging libcamera for Fedora. We will set it up so that PipeWire has the option to switch between v4l2 and libcamera, so that most users can keep using v4l2 through PipeWire for the time being while we work with upstream and the community to mature libcamera and its PipeWire backend. We will also enable the device discoverer for PipeWire.

We are also working on maturing the GStreamer elements for PipeWire for the video capture use case, as we expect a lot of application developers will just be using GStreamer as opposed to targeting PipeWire directly. We will start with Cheese as our initial testbed for this work, as it is a fairly simple application, using Cheese as a proof of concept to have it use PipeWire for camera access. We are still trying to decide if we will make Cheese speak directly with PipeWire, or have it talk to PipeWire through the pipewiresrc GStreamer plugin; both have their pros and cons in the context of testing and verifying this.

We will also start working with the Chromium and Firefox projects to have them use the Camera portal and PipeWire for camera support, just like we worked with them through WebRTC for the screen sharing support using PipeWire.

There are a few major items we are still trying to decide upon in terms of the interaction between PipeWire and the Camera portal API. It would be tempting to see if we can hide the Camera portal API behind the PipeWire API, or failing that, at least hide it for people using the GStreamer plugin. That way all applications would get the portal support for free when porting to GStreamer, instead of requiring use of the Camera portal API as a second step. On the other side, you need to set up the screen sharing portal yourself, so it would probably make things more consistent if we left it to application developers to do for camera access too.

What do we want from the community here? The first step is just to help us with testing as we roll this out in Fedora Workstation and Cheese. While libcamera was written motivated by MIPI cameras, all webcams are meant to work through it, and thus all webcams are meant to work with PipeWire using the libcamera backend. At the moment that is not the case, and thus community testing and feedback is critical for helping us and the libcamera community to mature libcamera. We hope that by allowing you to easily configure PipeWire to use the libcamera backend (and switch back after you are done testing) we can get a lot of you to test and let us know which cameras are not working well yet.

A little further down the road, please start planning to move any application you maintain or contribute to away from the v4l2 API and towards PipeWire. If your application is a GStreamer application, the transition should be fairly simple, going from the v4l2 plugins to the pipewire plugins, but beyond that you should familiarize yourself with the Camera portal API and the PipeWire API for accessing cameras.

For further news and information on PipeWire follow our @PipeWireP twitter account, and for general news and information about what we are doing in Fedora Workstation make sure to follow me on twitter @cfkschaller.
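As a small illustration of the "target the PipeWire plugins in GStreamer" recommendation above, here is a minimal hedged sketch of a camera preview using the pipewiresrc element. In a sandboxed app you would first obtain a file descriptor and node from the camera portal and set them on the element, which is omitted here:

#include <gst/gst.h>

/* Minimal sketch: preview a camera through GStreamer's pipewiresrc
 * element instead of v4l2src. Outside a sandbox, pipewiresrc picks a
 * default node; Flatpak apps would get an fd/node from the camera
 * portal and configure the element with them first.
 */
int
main(int argc, char **argv)
{
   gst_init(&argc, &argv);

   GError *error = NULL;
   GstElement *pipeline = gst_parse_launch(
      "pipewiresrc ! videoconvert ! autovideosink", &error);
   if (!pipeline) {
      g_printerr("Failed to build pipeline: %s\n", error->message);
      return 1;
   }

   gst_element_set_state(pipeline, GST_STATE_PLAYING);

   /* Run until an error or end-of-stream arrives on the bus. */
   GstBus *bus = gst_element_get_bus(pipeline);
   GstMessage *msg = gst_bus_timed_pop_filtered(bus, GST_CLOCK_TIME_NONE,
                                                GST_MESSAGE_ERROR | GST_MESSAGE_EOS);
   gst_message_unref(msg);
   gst_object_unref(bus);
   gst_element_set_state(pipeline, GST_STATE_NULL);
   gst_object_unref(pipeline);
   return 0;
}

The only application-visible change from a v4l2 pipeline is the source element, which is why the porting effort for GStreamer apps is expected to be small.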
Posted over 2 years ago
For the hw-enablement for Bay- and Cherry-Trail devices which I do as a side project, sometimes it is useful to play with the Android which comes pre-installed on some of these devices.

Sometimes the Android-X86 boot-loader (kernelflinger) is locked and the standard "Developer-Options" -> "Enable OEM Unlock" -> "Run 'fastboot oem unlock'" sequence does not work (e.g. I got the unlock yes/no dialog, and could move between yes and no, but I could not actually confirm the choice). Luckily there is an alternative: kernelflinger checks an "OEMLock" EFI variable to see if the device is locked or not. Like with some of my previous adventures changing hidden BIOS settings, this EFI variable is hidden from the OS as soon as the OS calls ExitBootServices, but we can use the same modified grub to change this EFI variable. After booting from a USB stick with the relevant grub binary installed as "EFI/BOOT/BOOTX64.EFI" or "BOOTIA32.EFI", entering the following command on the grub cmdline will unlock the bootloader:

setup_var_cv OEMLock 0 1 1

Disabling dm-verity support is pretty easy on these devices because they can just boot a regular Linux distro from a USB drive. Note that booting a regular Linux distro may cause the Android "system" partition to get auto-mounted, after which dm-verity checks will fail! Once we have a regular Linux distro running, step 1 is to find out which partition is the android_boot partition. To do this, as root run:

blkid /dev/mmcblk?p#

Replace the ? with the mmcblk number for the internal eMMC, and # with 1 to n, until one of the partitions is reported as having 'PARTLABEL="android_boot"'; usually "mmcblk?p3" is the one you want, so you could try that first. Now make an image of the partition by running e.g.:

dd if=/dev/mmcblk1p3 of=android_boot.img

And then copy the "android_boot.img" file to another computer. On this computer, extract the file and then the initrd like this:

abootimg -x android_boot.img
mkdir initrd
cd initrd
zcat ../initrd.img | cpio -i

Now edit the fstab file and remove "verify" from the line for the system partition. After this, update android_boot.img like this:

find . | cpio -o -H newc -R 0.0 | gzip -9 > ../initrd.img
cd ..
abootimg -u android_boot.img -r initrd.img

The easiest way to test the new image is using fastboot. Boot the tablet into Android and connect it to the PC, then run:

adb reboot bootloader
fastboot boot android_boot.img

And then from an "adb shell" do "cat /fstab" and verify that the "verify" option is gone now. After this you can (optionally) dd the new android_boot.img back to the android_boot partition to make the change permanent.

Note: if Android is not booting, you can force the bootloader to enter fastboot mode on the next boot by downloading this file and then, under regular Linux, running the following command as root:

cat LoaderEntryOneShot > /sys/firmware/efi/efivars/LoaderEntryOneShot-4a67b082-0a4c-41cf-b6c7-440b29bb8c4f
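For orientation, the fstab line being edited typically looks something like the following; the block device path and flag set vary per device, so treat both lines as a hypothetical illustration rather than what you will find on your tablet:

# before: fs_mgr enforces dm-verity on /system
/dev/block/mmcblk0p5  /system  ext4  ro  wait,verify
# after: "verify" dropped, so a modified system image will still boot
/dev/block/mmcblk0p5  /system  ext4  ro  wait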