
News

Posted over 2 years ago
Khronos submission indicating Vulkan 1.1 conformance for Turnip on the Adreno 618 GPU. It is a great feat, especially for a driver created without hardware documentation, and we support features far beyond the bare minimum required for conformance. But first of all, I want to thank and congratulate everyone working on the driver: Connor Abbott, Rob Clark, Emma Anholt, Jonathan Marek, Hyunjun Ko, Samuel Iglesias. And special thanks to Samuel Iglesias and Ricardo Garcia for tirelessly improving the Khronos Vulkan Conformance Tests.

At the start of the year, when I started working on Turnip, I looked at the list of failing tests and thought “It wouldn’t take a lot to fix them!” Right, sure… And so I started fixing issues alongside looking for missing features. In June there were even more failures than there had been in January. How could that be? Of course we were adding new features, and that accounted for some of them. However, even this list was likely not exhaustive, because in GitLab CI, instead of running the whole Vulkan CTS suite, we ran 1/3 of it. We didn’t have enough devices to run the whole suite fast enough to make it usable in CI, so I just ran it locally from time to time.

1/3 of the tests doesn’t sound bad, and for the most part it’s good enough, since we have a huge number of tests looking like this:

dEQP-VK.image.mutable.2d_array.b8g8r8a8_unorm_r32_sfloat_clear_copy
dEQP-VK.image.mutable.2d_array.b8g8r8a8_unorm_r32_sfloat_clear_copy_format_list
dEQP-VK.image.mutable.2d_array.b8g8r8a8_unorm_r32_sfloat_clear_load
dEQP-VK.image.mutable.2d_array.b8g8r8a8_unorm_r32_sfloat_clear_load_format_list
dEQP-VK.image.mutable.2d_array.b8g8r8a8_unorm_r32_sfloat_clear_texture
dEQP-VK.image.mutable.2d_array.b8g8r8a8_unorm_r32_sfloat_clear_texture_format_list
dEQP-VK.image.mutable.2d_array.b8g8r8a8_unorm_r32_sfloat_copy_copy
dEQP-VK.image.mutable.2d_array.b8g8r8a8_unorm_r32_sfloat_copy_copy_format_list
dEQP-VK.image.mutable.2d_array.b8g8r8a8_unorm_r32_sfloat_copy_load
dEQP-VK.image.mutable.2d_array.b8g8r8a8_unorm_r32_sfloat_copy_load_format_list
dEQP-VK.image.mutable.2d_array.b8g8r8a8_unorm_r32_sfloat_copy_texture
dEQP-VK.image.mutable.2d_array.b8g8r8a8_unorm_r32_sfloat_copy_texture_format_list
...

Every format, every operation, etc. Tens of thousands of them. Unfortunately, the selection of tests for a fractional run is as straightforward as possible: just every third test. Which bites us when there are single, unique tests, like:

dEQP-VK.fragment_operations.early_fragment.no_early_fragment_tests_depth
dEQP-VK.fragment_operations.early_fragment.no_early_fragment_tests_stencil
dEQP-VK.fragment_operations.early_fragment.early_fragment_tests_depth
dEQP-VK.fragment_operations.early_fragment.early_fragment_tests_stencil
dEQP-VK.fragment_operations.early_fragment.no_early_fragment_tests_depth_no_attachment
dEQP-VK.fragment_operations.early_fragment.no_early_fragment_tests_stencil_no_attachment
dEQP-VK.fragment_operations.early_fragment.early_fragment_tests_depth_no_attachment
dEQP-VK.fragment_operations.early_fragment.early_fragment_tests_stencil_no_attachment
...

Most of them test something unique that has a much higher probability of triggering a special path in the driver compared to the uncountable image tests. And they fell through the cracks. I even had to fix one test twice because the CI didn’t run it. A possible solution is to skip tests only when there is a large swath of them and run smaller groups as-is. But it’s likely more productive to just throw more hardware at the issue =).
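For illustration, a group-aware selection could look like the hypothetical C sketch below. The helper name, the threshold, and the shard logic are all invented for this example; the real CI scripts work on dEQP case lists and are not shown here.

    /* Hypothetical sketch: group-aware sharding of a CTS case list.
     * Small groups (the "unique" tests) always run; only very large
     * groups keep the every-third-test split. Threshold is arbitrary. */
    #include <stdbool.h>

    static bool
    should_run_test(unsigned index_in_group, unsigned group_size,
                    unsigned shard /* 0, 1 or 2 */)
    {
       const unsigned large_group_threshold = 100;

       if (group_size < large_group_threshold)
          return true;                     /* run small groups in full */

       return index_in_group % 3 == shard; /* keep 1/3 of the huge groups */
    }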
Not enough hardware in CI

Another trouble is that we had only one 6xx sub-generation present in CI: Adreno 630. We distinguish four sub-generations. Not only do they have some different capabilities, there are also differences in the existing ones, causing the same test to pass on CI while being broken on another, newer GPU. Presently in CI we test only Adreno 618 and 630, which are “Gen 1” GPUs, and we claimed conformance only for Adreno 618.

Yet another issue is that we can render in tiling and bypass (sysmem) modes. That’s because there are a few features we can support only when there is no tiling and we render directly into sysmem, and sometimes rendering directly into sysmem is just faster. At the moment we use tiled rendering by default unless we meet an edge case, so by default CTS tests only tiled rendering. We force sysmem mode for a subset of tests on CI; however, it’s not enough, because the difference between the modes is relevant for more than just a few tests. Thus ideally we should run twice as many tests, and even better would be three times as many, to account for tiling mode without a binning vertex shader.

That issue became apparent when I implemented a magical eight-ball to choose between tiling and bypass modes depending on run-time information, in order to squeeze out more performance (it’s still work-in-progress; a rough sketch of the idea is at the end of this post). The basic idea is that a single draw call, or a few small draw calls, is faster to render directly into system memory instead of loading the framebuffer into tile memory and storing it back. But almost every single CTS test does exactly this! They do a single draw call or a few draw calls per render pass, which causes all tests to run in bypass mode. Fun! Now we would be forced to deal with this issue, since with the magic eight-ball games would partly run in tiling mode and partly in bypass mode, making both equally important for real-world workloads.

Does conformance matter? Does it reflect anything real-world?

Unfortunately, no test suite could wholly reflect what game developers do in their games. However, the number of tests grows, and new tests are being contributed based on issues found in games and other applications. When I ran my stash of D3D11 game traces through DXVK on Turnip for the first time, I found a bunch of new crashes and hangs, but it took fixing just a few of them for the majority of games to render correctly. This shows that the Khronos Vulkan Conformance Tests are doing their job, and we at Igalia are striving to make them even better.
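As promised above, here is a rough, hypothetical sketch of what such a tiling-versus-sysmem decision could look like. The struct, names, and thresholds are invented for illustration and are not the actual turnip heuristic.

    /* Hypothetical sketch: decide whether a render pass is cheaper in
     * direct (sysmem) mode than in tiled (gmem) mode. A few small draws
     * do not amortize the cost of loading and storing the framebuffer
     * to tile memory. */
    #include <stdbool.h>
    #include <stdint.h>

    struct renderpass_stats {
       uint32_t draw_count;  /* draws recorded in the render pass    */
       uint64_t fb_bytes;    /* bytes to load/store the framebuffer  */
       uint64_t draw_cost;   /* rough estimate of the rendering work */
    };

    static bool
    prefer_sysmem(const struct renderpass_stats *s)
    {
       /* Few, cheap draws: skip the gmem round-trip entirely. */
       return s->draw_count <= 4 && s->draw_cost < s->fb_bytes;
    }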
Posted over 2 years ago
One of the extensions released as part of Vulkan 1.2.199 was the VK_EXT_image_view_min_lod extension. I’m happy to see it published, as I have participated in the release process of this extension: from reviewing the spec exhaustively (I even contributed a few things to improve it!) to developing CTS tests for it that will eventually be merged into the CTS repo.

This extension was proposed by Valve to mirror a feature present in Direct3D 12 (check ResourceMinLODClamp here) and Direct3D 11 (check SetResourceMinLOD here). In other words, this extension allows clamping the minimum LOD value accessed by an image view to a minLod value set at image view creation time. That way, any library or API layer that translates Direct3D 11/12 calls to Vulkan can use the extension to mirror that behavior on Vulkan directly, without workarounds, facilitating the port of Direct3D applications such as games to Vulkan. For example, projects like Vkd3d, Vkd3d-proton and DXVK could benefit from it.

Going into more detail, this extension changes how the image level selection is calculated and, when enabled, sets an additional minimum on the image level used for integer texel coordinate operations.

The way to use this feature in an application is very simple. Check that the extension is supported and that the physical device supports the respective feature:

    // Provided by VK_EXT_image_view_min_lod
    typedef struct VkPhysicalDeviceImageViewMinLodFeaturesEXT {
        VkStructureType    sType;
        void*              pNext;
        VkBool32           minLod;
    } VkPhysicalDeviceImageViewMinLodFeaturesEXT;

Once you know everything is working, enable both the extension and the feature when creating the device. When you want to create a VkImageView that defines a minLod for image accesses, add the following structure, filled with the value you want, in VkImageViewCreateInfo’s pNext:

    // Provided by VK_EXT_image_view_min_lod
    typedef struct VkImageViewMinLodCreateInfoEXT {
        VkStructureType    sType;
        const void*        pNext;
        float              minLod;
    } VkImageViewMinLodCreateInfoEXT;

And that’s all! As you see, it is a very simple extension. A small sketch putting these steps together follows below. Happy hacking!
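As a rough end-to-end illustration, a sketch of the two steps above might look like the following. This is not from the spec: the function name, the chosen minLod value, and the simplified feature-query placement are assumptions, and all error handling is omitted.

    /* Sketch only; assumes <vulkan/vulkan.h> and that the device is
     * created with VK_EXT_image_view_min_lod and its minLod feature
     * enabled. Most fields and all error handling are omitted. */
    #include <vulkan/vulkan.h>

    static VkImageView
    create_clamped_view(VkPhysicalDevice physical_device, VkDevice device,
                        VkImage image, VkFormat format)
    {
        VkPhysicalDeviceImageViewMinLodFeaturesEXT min_lod_features = {
            .sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_IMAGE_VIEW_MIN_LOD_FEATURES_EXT,
        };
        VkPhysicalDeviceFeatures2 features2 = {
            .sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_FEATURES_2,
            .pNext = &min_lod_features,
        };
        vkGetPhysicalDeviceFeatures2(physical_device, &features2);
        if (!min_lod_features.minLod)
            return VK_NULL_HANDLE;

        /* Chain the LOD clamp into image view creation. */
        VkImageViewMinLodCreateInfoEXT min_lod_info = {
            .sType  = VK_STRUCTURE_TYPE_IMAGE_VIEW_MIN_LOD_CREATE_INFO_EXT,
            .minLod = 2.0f, /* clamp accesses through this view to LOD >= 2.0 */
        };
        VkImageViewCreateInfo view_info = {
            .sType    = VK_STRUCTURE_TYPE_IMAGE_VIEW_CREATE_INFO,
            .pNext    = &min_lod_info,
            .image    = image,
            .viewType = VK_IMAGE_VIEW_TYPE_2D,
            .format   = format,
            .subresourceRange = {
                .aspectMask = VK_IMAGE_ASPECT_COLOR_BIT,
                .levelCount = VK_REMAINING_MIP_LEVELS,
                .layerCount = VK_REMAINING_ARRAY_LAYERS,
            },
        };
        VkImageView view = VK_NULL_HANDLE;
        vkCreateImageView(device, &view_info, NULL, &view);
        return view;
    }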
Posted over 2 years ago
I was interested in how much work a vaapi on top of vulkan video proof of concept would be. My main reason for being interested is actually video encoding; there is no good vulkan video encoding demo yet, and I'm not experienced enough in the area to write one, but I can hack stuff. I think it is probably easier to hack a vaapi encode into vulkan video encode than to write a demo app myself.

With that in mind I decided to see what decode would look like first. I talked to Mike B (most famous zink author) before he left for holidays, then I ignored everything he told me and wrote a super hack. This morning I convinced zink vaapi on top of anv, with iris GL doing the presents in mpv, to show me some useful frames of video. However zink vaapi on anv with zink GL is failing miserably (well, green jellyfish).

I'm not sure how much more I'll push on the decode side at this stage; I really wanted it to validate the driver side code, and I've found a few bugs in there already. The WIP hacks are at [1]. I might push on to the encode side and see if I can work out what it entails, though the encode spec work is a lot more changeable at the moment.

[1] https://gitlab.freedesktop.org/airlied/mesa/-/commits/zink-video-wip
Posted over 2 years ago
Last Post Of The Year

Yes, we’ve finally reached that time. It’s mid-November, and I’ve been storing up all this random stuff to unveil now because I’m taking the rest of the year off. This will be the final SGC post for 2021. As such, it has to be a good one, doesn’t it?

Zink Roundup

It’s been a wild year for zink. Does anybody even remember how many times I finished the project? I don’t, but it’s been at least a couple. Somehow there’s still more to do though.

I’ll be updating zink-wip one final time later today with the latest Copper snapshot. This is going to be crashier than the usual zink-wip, but that’s because zink-wip doesn’t have nearly as much cool future-zink stuff as it used to these days. Nearly everything is already merged into mainline, or at least everything that’s likely to help with general use, so just use that if you aren’t specifically trying to test out Copper.

One of those things that’s been a thorn in zink’s side for a long time is PBO handling, specifically for unsupported formats like ARGB/ABGR, ALPHA, LUMINANCE, and InTeNsItY. Vulkan has no analogs for any of these, and any app/game which tries to do texture upload or download from them with zink is going to have a very, very bad time, as has been the case with CS:GO, which would take literal days to reach the menus due to performing fullscreen GL_LUMINANCE texture downloads. This is now fixed in the course of landing compute PBO download support, which I blogged about forever ago since it also yields a 2-10x performance improvement for a number of other cases in all Gallium drivers. Or at least the ones that enable it. CS:GO should now run out of the box in Mesa 22.0, and things like RPCS3 which do a lot of PBO downloading should also see huge improvements.

That’s all I’ve got here for zink, so now it’s time once again…

THIS IS NO LONGER A ZINK BLOG

That’s right, it’s happening. Change your hats, we’re a Gallium blog again for the first time in nearly five months.

Everyone remembers when I promised that you’d be able to run native Linux D3D9 games on the Nine state tracker. Well, I suppose that’s a fancy way of saying Source Engine games, aka the ones Valve ships with native Linux ports, since probably nobody else has shipped any kind of native Linux app that uses the D3D9 API, but still!

That time is now. Right now. No more waiting, no new Mesa release required, you can just plug it in and test it out this second for instantly improved performance. As long as you first acknowledge that this is not a Valve-official project, and it’s only to be used for educational purposes. But also, please benchmark it lots and tell me your findings. Again, just for educational purposes. Wink.

How?

This has been a long time in the making. After the original post, I knew that the goal here was to eventually be able to run these games without needing any kind of specialized Mesa build, since that’s annoying and also breaks compatibility with running Nine for other purposes. Thus I enlisted the further help of Nine expert and image enthusiast, Axel Davy, to help smooth out the rough edges once I was done fingerpainting my way to victory.

The result is a simple wrapper which can be preloaded to run any DXVK-compatible (i.e., any of them that support -vulkan) Source Engine game on Nine—and obviously this won’t work on the NVIDIA blob at all, so don’t bother trying.
In short:

- clone that repo
- right click on Properties for e.g. Left 4 Dead 2
- change the command line to LD_PRELOAD=/path/to/Xnine/nine_sdl.so %command% -vulkan

For Portal 2 (at present, though this won’t always be the case), you’ll also need to add NINE_VHACKS=1 to work around some frogs that were accidentally added to the latest version of the game as a developer-only easter egg.

Then just run the game normally, and if everything went right and you have Nine installed in one of the usual places, you should load up the game with Gallium Nine. More details on that in the repo’s README.

GPU Goes Brrr?

Yes. Very brrr.

Here’s your normal GL performance from a simple Portal 2 benchmark:

Around 400 FPS.

Here’s Gallium Nine:

Around 600 FPS.

A 50% improvement with the exact same backend GPU driver isn’t too bad for a simple preload shim.

Can I Get A Side Of SHOTS FIRED With That?

You got it. What about DXVK? This isn’t an extensive benchmark, but here we go with that too:

Also around 600 FPS.

I say “around” here because the variation is quite extreme for both Nine and DXVK based on slight changes in variable clock speeds because I didn’t pin them: Nine ranges between 590-610 FPS, and DXVK is 590-620 FPS.

So now there’s two solid, open source methods for improving performance in these games over the normal GL version. But what if we go even deeper? What if we check out some real performance numbers?

Power Consumption

If you’ve never checked out PowerTOP, it’s a nice way to get an overview of what’s using up system resources and consuming power. If you’ve never used it for benchmarking, don’t worry, I took care of that too. Here’s some PowerTOP figures for the same Portal 2 timedemo:

DXVK

Nine

What’s interesting here is that DXVK uses 90%+ CPU, while Nine is only using about 25%. This is a gap that’s consistent across runs, and it likely explains why a number of people find that DXVK doesn’t work on their systems: you still need some amount of CPU to run the actual game calculations, so if you’re on older hardware, you might end up using all of your available CPU just on DXVK internals.

GPU Usage?

Got you covered. Here’s a per-second poll (one row per second) from radeontop.
(The VRAM Usage, GTT Usage, Memory Clock and Shader Clock columns each show a percentage followed by an absolute value.)

DXVK:

GPU Usage  VGT Usage  TA Usage  SX Usage  SH Usage  SPI Usage  SC Usage  PA Usage  DB Usage  CB Usage  VRAM Usage  GTT Usage  Memory Clock  Shader Clock
35.83% 17.50% 23.33% 28.33% 17.50% 29.17% 28.33% 5.00% 27.50% 26.67% 12.75% 1038.15mb 7.82% 638.53mb 52.19% 0.457ghz 33.52% 0.704ghz
35.83% 17.50% 23.33% 28.33% 17.50% 29.17% 28.33% 5.00% 27.50% 26.67% 12.75% 1038.15mb 7.82% 638.53mb 52.19% 0.457ghz 33.52% 0.704ghz
36.67% 30.00% 33.33% 35.00% 30.00% 35.00% 32.50% 18.33% 30.83% 28.33% 12.76% 1038.57mb 7.82% 638.53mb 48.88% 0.428ghz 36.95% 0.776ghz
75.83% 63.33% 62.50% 66.67% 63.33% 68.33% 65.00% 27.50% 60.83% 53.33% 12.76% 1038.73mb 7.82% 638.53mb 100.00% 0.875ghz 95.82% 2.012ghz
71.67% 60.00% 60.00% 64.17% 60.00% 66.67% 60.83% 23.33% 56.67% 51.67% 12.76% 1038.73mb 7.82% 638.53mb 100.00% 0.875ghz 96.31% 2.023ghz
75.00% 62.50% 66.67% 66.67% 62.50% 69.17% 68.33% 23.33% 65.83% 59.17% 12.76% 1038.73mb 7.82% 638.53mb 100.00% 0.875ghz 96.71% 2.031ghz
63.33% 55.00% 56.67% 58.33% 55.00% 59.17% 59.17% 17.50% 52.50% 50.00% 12.76% 1038.73mb 7.82% 638.53mb 100.00% 0.875ghz 89.77% 1.885ghz
78.33% 64.17% 64.17% 65.00% 64.17% 69.17% 70.83% 30.00% 63.33% 58.33% 12.76% 1038.73mb 7.82% 638.53mb 100.00% 0.875ghz 97.33% 2.044ghz
73.33% 60.83% 64.17% 65.00% 60.83% 67.50% 64.17% 29.17% 59.17% 51.67% 12.76% 1038.73mb 7.82% 638.53mb 100.00% 0.875ghz 97.39% 2.045ghz
60.83% 50.83% 50.00% 53.33% 50.83% 55.00% 50.83% 25.83% 48.33% 45.00% 12.76% 1038.73mb 7.82% 638.53mb 100.00% 0.875ghz 95.35% 2.002ghz
67.50% 50.00% 55.00% 59.17% 50.00% 60.00% 58.33% 28.33% 52.50% 45.00% 12.76% 1038.73mb 7.82% 638.53mb 100.00% 0.875ghz 87.91% 1.846ghz

Nine:

GPU Usage  VGT Usage  TA Usage  SX Usage  SH Usage  SPI Usage  SC Usage  PA Usage  DB Usage  CB Usage  VRAM Usage  GTT Usage  Memory Clock  Shader Clock
17.50% 11.67% 15.00% 10.83% 11.67% 15.00% 10.83% 3.33% 10.83% 10.00% 7.38% 600.56mb 1.60% 130.48mb 50.38% 0.441ghz 15.76% 0.331ghz
17.50% 11.67% 15.00% 10.83% 11.67% 15.00% 10.83% 3.33% 10.83% 10.00% 7.38% 600.56mb 1.60% 130.48mb 50.38% 0.441ghz 15.76% 0.331ghz
70.83% 63.33% 65.83% 60.00% 63.33% 68.33% 57.50% 24.17% 56.67% 54.17% 7.35% 598.43mb 1.60% 130.48mb 89.50% 0.783ghz 77.09% 1.619ghz
74.17% 70.00% 67.50% 60.00% 70.00% 70.83% 61.67% 17.50% 60.83% 58.33% 7.35% 598.42mb 1.60% 130.47mb 100.00% 0.875ghz 91.03% 1.912ghz
78.33% 69.17% 72.50% 65.00% 69.17% 75.83% 65.83% 15.00% 65.83% 64.17% 7.37% 599.80mb 1.60% 130.47mb 100.00% 0.875ghz 93.92% 1.972ghz
70.83% 67.50% 64.17% 55.00% 67.50% 67.50% 57.50% 20.83% 55.83% 53.33% 7.35% 598.42mb 1.60% 130.47mb 100.00% 0.875ghz 91.93% 1.930ghz
65.00% 64.17% 60.00% 51.67% 64.17% 61.67% 53.33% 18.33% 52.50% 50.83% 7.37% 599.80mb 1.60% 130.47mb 100.00% 0.875ghz 89.95% 1.889ghz
74.17% 68.33% 70.00% 60.83% 68.33% 72.50% 65.00% 24.17% 64.17% 58.33% 7.35% 598.42mb 1.60% 130.47mb 100.00% 0.875ghz 92.53% 1.943ghz
77.50% 73.33% 73.33% 62.50% 73.33% 75.00% 61.67% 22.50% 62.50% 57.50% 7.35% 598.42mb 1.60% 130.47mb 100.00% 0.875ghz 91.21% 1.915ghz
70.00% 65.83% 60.00% 57.50% 65.83% 61.67% 59.17% 24.17% 55.00% 54.17% 7.35% 598.42mb 1.60% 130.47mb 100.00% 0.875ghz 92.69% 1.946ghz
70.00% 65.83% 60.00% 57.50% 65.83% 61.67% 59.17% 24.17% 55.00% 54.17% 7.35% 598.42mb 1.60% 130.47mb 100.00% 0.875ghz 92.69% 1.946ghz

Again, here we see a number of interesting things. DXVK consistently provokes slightly higher clock speeds (because I didn’t pin them), which may explain why it skews slightly higher in the benchmark results. DXVK also uses nearly 2x more VRAM and nearly 5x more GTT.
On more modern hardware it’s unlikely that this would matter, since we all have more GPU memory than we can possibly use in an OpenGL game, but on older hardware—or in cases where memory usage might lead to power consumption that should be avoided because we’re running on battery—this could end up being significant.

Conclusion

Source Engine games run great on Linux. That’s what we all care about at the end of the day, isn’t it?

But also, if more Source Engine games get ported to DXVK, give them a try with Nine. Or just test the currently ported ones, Portal 2 and Left 4 Dead 2. I want data. Lots of data. Post it here, email it to me, whatever.

Until 2022

Lots of cool projects still in the works, so stay tuned next year!
Posted over 2 years ago
If you own a laptop (Dell, HP, Lenovo) with a WWAN module, it is very likely that the module is FCC-locked on every boot, and the special FCC unlock procedure needs to be run before it can be used. Until ModemManager 1.18.2, the procedure was automatically run for the FCC unlock procedures we knew about, but this will no longer happen. Once 1.18.4 is out, the procedure will need to be explicitly enabled by each user, under their own responsibility, or otherwise implicitly enabled after installing an official FCC unlock tool provided by the manufacturer itself. See a full description of the rationale behind this change in the ModemManager documentation site and the suggested code changes in the gitlab merge request.

If you want to enable the ModemManager provided unofficial FCC unlock tools once you have installed 1.18.4, run (assuming sysconfdir=/etc and datadir=/usr/share) this command (*):

    sudo ln -sft /etc/ModemManager/fcc-unlock.d /usr/share/ModemManager/fcc-unlock.available.d/*

The user-enabled tools in /etc should not be removed during package upgrades, so this should be a one-time setup.

(*) Updated to have one single command instead of a for loop; thanks heftig!
Posted over 2 years ago
What If Zink Was Actually The Fastest GL Driver?

In an earlier post I talked about Copper and what it could do on the way to a zink future. What I didn’t talk about was WSI, or the fact that I’ve already fully implemented it in the course of bashing Copper into a functional state.

Window System Integration

…was the final step for zink to become truly usable. At present, zink has a very hacky architecture where it loads through the regular driver path, but then for every image that is presented on the screen, it keeps a shadow copy which it blits to just before scanout, and this is the one that gets displayed. Usually this works great, other than the obvious (but minor) overhead that the blit incurs.

Where it doesn’t work great, however, is on non-Mesa drivers.

That’s right. I’m looking at you, NVIDIA.

As long-time blog enthusiasts will remember, I had NVIDIA running on zink some time ago, but there was a problem as it related to performance. Specifically, that single blit turned into a blit and then a full-frame CPU copy, which made getting any sort of game running with usable FPS a bit of a challenge.

WSI solves this by letting the Vulkan driver handle the scanout image entirely, removing all the copies to let zink render more like a normal driver (or game/app).

So How Is it?

That’s what everyone’s probably wondering. I have zink. I have WSI. I have my RTX 2070 with the NVIDIA blob driver. How does NVIDIA’s Vulkan driver (with zink) stack up to NVIDIA’s GL driver?

Everything below is using the 495.44 beta driver, as that’s the latest one at the time of my testing, and the non-beta driver didn’t work at all.

But first, can NVIDIA’s GL driver even render the game I want to show? Confusingly, the answer is no: this version of NVIDIA’s GL driver can’t correctly render Tomb Raider, which is my go-to for all things GL and benchmarking. I’m gonna let that slide though since it’s still pumping out those frames at a solid rate. It’s frustrating, but sometimes just passing CTS isn’t enough to be able to run some games, or there are certain extensions (ARB_bindless_texture) which are under-covered.

The Numbers Don’t Lie

I’ll say as a prelude that it was a bit challenging to get a AAA game running in this environment. There are some very strange issues happening with the NVIDIA Vulkan driver which prevented me from running quite a lot of things. Tomb Raider was the first one I got going after two full days of hacking at it, and that’s about what my time budget allowed for the excursion, so I stopped at that.

Up first: NVIDIA’s GL driver (495.44)

Second: NVIDIA’s Vulkan driver (495.44)

As we can see, zink with NVIDIA’s Vulkan driver is roughly 25-30% faster than NVIDIA’s GL driver for Tomb Raider.

In Closing

I doubt that zink maintains this performance gap for all titles, but now we know that there are already at least some cases where it can pull ahead. Given that most vendors are shifting resources towards current-year graphics APIs like Vulkan and D3D12, it won’t be surprising if maintenance-mode GL drivers start to fall behind actively developed Vulkan drivers.

In short, there’s a real possibility that zink can provide tangible benefits to vendors who only want to ship Vulkan drivers, and those benefits might be more than (eventually) providing a conformant GL implementation.

Stay tuned for tomorrow when I close out the week strong with one final announcement for the year.
Posted over 2 years ago
Previously I mentioned having AMD VCN h264 support. Today I added initial support for the older UVD engine [1]. This is found on chips from Vega back to SI. I've only tested it on my Vega so far.

I also worked out the "correct" answer to how to send the reset command correctly; however, the nvidia player I'm using as a demo doesn't do things that way yet, so I've forked it for now [2]. The answer is to use vkCmdControlVideoCodingKHR to send a reset the first time a session is used (a rough sketch is at the end of this post). However I can't see how the app is meant to know this is necessary, but I've asked the appropriate people.

The initial anv branch I mentioned last week is now here [3].

[1] https://gitlab.freedesktop.org/airlied/mesa/-/commits/radv-vulkan-video-uvd-h264
[2] https://github.com/airlied/vk_video_samples/tree/radv-fixes
[3] https://gitlab.freedesktop.org/airlied/mesa/-/tree/anv-vulkan-video-prelim-decode
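As referenced above, a minimal sketch of that reset, using the structure and flag names from the released VK_KHR_video_queue extension (the provisional spec being discussed here may differ in details); it has to be recorded between vkCmdBeginVideoCodingKHR and vkCmdEndVideoCodingKHR:

    /* Sketch: reset a video session the first time it is used. Assumes
     * <vulkan/vulkan.h> exposes the VK_KHR_video_queue types, and that
     * cmd_buf is already inside a vkCmdBeginVideoCodingKHR /
     * vkCmdEndVideoCodingKHR scope. */
    #include <vulkan/vulkan.h>

    static void
    reset_video_session(VkCommandBuffer cmd_buf)
    {
        VkVideoCodingControlInfoKHR control = {
            .sType = VK_STRUCTURE_TYPE_VIDEO_CODING_CONTROL_INFO_KHR,
            .flags = VK_VIDEO_CODING_CONTROL_RESET_BIT_KHR,
        };
        vkCmdControlVideoCodingKHR(cmd_buf, &control);
    }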
Posted over 2 years ago
Copper: It’s A Thing (Sort of)

Over the past months, I’ve been working with Adam “X Does What I Say” Jackson to try and improve zink’s path through the arcane system of chutes and ladders that comprises Gallium’s loader architecture. The recent victory in getting a Wayland system compositor running is the culmination of those efforts. I wanted to write at least a short blog post detailing some of the Gallium changes that were necessary to make this happen, if only so I have something to refer back to when I inevitably break things later, so let’s dig in.

Pipes: How Do They Work?

It’s questionable to me whether anyone really knows how all the Gallium loader and DRI frontend interfaces work without first taking a deep read of the code and then having a nice, refreshing run around the block screaming to let all the crazy out.

From what I understand of it, there’s the DRI (userspace) interface, which is used by EGL/GBM/GLX/SMH to manage buffers for scanout. DRI itself is split between software and platform; each DRI interface is a composite made of all the “extensions” which provide additional functionality to enable various API-level extensions. It’s a real disaster to have to work with, and ideally the eventual removal of classic drivers will allow it to be simplified so that mere humans like me can comprehend its majesty.

Beyond all this, however, there’s the notion that the DRI frontend is responsible for determining the size of the scanout buffer as well as various other attributes. The software path through this is nice and simple since there’s no hardware to negotiate with, and the platform path exists.

Currently, zink runs on the platform path, which means that the DRI frontend is what “runs” zink. It chooses the framebuffer size, manages resizes, and handles multisample resolve blits as needed for every frame that gets rendered.

Too Many Cooks Pipes

The problem with this methodology is that there are effectively two WSI systems active simultaneously: the Mesa DRI architecture, and the (eventual) Vulkan WSI infrastructure. Vulkan WSI isn’t going to work at all if it isn’t in charge of deciding things like window size, which means that the existing DRI architecture can’t work, neither in the platform mode nor the software mode. As we know, there can be only one.

Thus Adam has been toiling away behind the scenes, taking neither vacation nor lunch break for the past ten years in order to iterate on a more optimal solution.

The result?

Copper

If you’re a Mesa developer or just a metallurgist, you know why the name Copper was chosen. The premise of Copper is that it’s a DRI interface extension which can be used exclusively by zink to avoid any of the problem areas previously mentioned. The application will create a window, create a GL context for it, and (eventually) Vulkan WSI can figure things out by just having the window/surface passed through. This shifts all the “driving” WSI code out of DRI and into Vulkan WSI, which is much more natural.

In addition to Copper, zink can now be bound to a slight variation of the Gallium software loader to skip all the driver-querying bits. There’s no longer anything to query, as DRI doesn’t have to make decisions anymore. It just calls through to zink normally, and zink can handle everything using the Vulkan API. Simple and clean.

Unfortunately

This all requires a ton of code. Looking at the two largest commits:

29 files changed, 1413 insertions(+), 540 deletions(-)
23 files changed, 834 insertions(+), 206 deletions(-)

Is a big yikes.
I can say with certainty that these improvements won’t be landing before 2022, but eventually they will in one form or another, and then zink will become significantly more flexible.
Posted over 2 years ago
Last week I mentioned I had the basics of h264 decode using the proposed vulkan video on radv. This week I attempted to do the same thing with Intel's Mesa vulkan driver "anv". Now I'd previously, unsuccessfully, tried to get vaapi on crocus working but got sidetracked back into other projects. The Intel h264 decoder hasn't changed a lot across the ivb/hsw/gen8/gen9 era. I ported what I had from crocus to anv and started trying to get something to decode on my WhiskeyLake.

I wrote the code pretty early on and figured out all the things I had to send the hardware. The first anv-side bridge to cross was that Vulkan does H264 picture-level decode API, so you get handed the encoded slice data. However, to program the Intel hw you need to decode the slice header, so I wrote a slice header decoder in some common code. The other thing you need to give the Intel hw is the number of bits of slice header, which in some encoding schemes is rounded to bytes and in some isn't. Slice headers also have a 3-byte header on them, which the Intel hardware wants you to discard or skip before handing the data to it (a tiny sketch of that is at the end of this post).

Once I'd fixed up that sort of thing in anv + crocus, I started getting grey I-frames decoded, with later B/P frames using the grey frames as references, so you'd see this kinda weird motion. That was, I think, 3 days ago. I have stared at this intently for those 3 days, blaming everything from bitstream encoding to rechecking all my packets (not enough times though). I had someone else verify they could see grey frames.

Today, after a long discussion about possibilities, I was randomly comparing a frame from the intel-vaapi-driver and from crocus, and I spotted a packet header the docs say is 34 dwords long, but intel-vaapi was only encoding 16 dwords. I switched crocus to explicitly state a 16-dword length and I started seeing my I-frames. Now the B/P frames still have issues; I don't think I'm getting the ref frames logic right yet, but it felt like a decent win after 3 days of staring at it.

The crocus code is [1]. The anv code isn't cleaned up enough to post a pointer to yet; enterprising people might find it. Next week I'll clean it all up, and then start to ponder upstream paths and shared code for radv + anv. Then h265, maybe.

[1] https://gitlab.freedesktop.org/airlied/mesa/-/tree/crocus-media-wip
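As mentioned above, here is a tiny, illustrative sketch of skipping that 3-byte header (the usual Annex-B 00 00 01 start code; some streams use a 4-byte form as well) before handing the slice payload to the hardware. This is generic H.264 plumbing, not the actual anv/crocus code.

    /* Sketch: strip a leading Annex-B start code from a slice NAL so the
     * hardware only sees the payload. Returns the payload pointer and
     * writes the remaining size to *out_size. */
    #include <stddef.h>
    #include <stdint.h>

    static const uint8_t *
    skip_start_code(const uint8_t *data, size_t size, size_t *out_size)
    {
       if (size >= 4 && data[0] == 0 && data[1] == 0 &&
           data[2] == 0 && data[3] == 1) {
          *out_size = size - 4;   /* 4-byte start code */
          return data + 4;
       }
       if (size >= 3 && data[0] == 0 && data[1] == 0 && data[2] == 1) {
          *out_size = size - 3;   /* 3-byte start code */
          return data + 3;
       }
       *out_size = size;          /* no start code found */
       return data;
    }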
Posted over 2 years ago
A Long Time Coming

Zink can now run all display platform flavors of Weston (and possibly other compositors?). Expect it in zink-wip later today once it passes another round of my local CI.

Here it is in DRM running weston-simple-egl and weston-simple-dmabuf-egl, all on zink:

Under Construction

This has a lot of rough edges, mostly as it relates to X11. In particular:

- xservers (including xwayland) can’t run because GLAMOR is hard
- some apps (e.g., Unigine Heaven) randomly get killed by the xserver for unknown reasons
- if you’re very lucky, you can hit a Vulkan WSI deadlock

How?

I’d go into details on this, but honestly it’s going to be like a week of posts to detail the sheer amount of chainsawing that’s gone into the project. Stay tuned for that and more next week.