News
Posted over 2 years ago
The Struggle Continues

Everyone’s seen the Phoronix benchmark numbers by now, and though there’s a lot of confusion over how to calculate the percentage increase between “game did not run a year ago” and “game runs”, it seems like a couple of people out there at Big Triangle are starting to take us seriously. With that said, even my parents are asking me what the deal is with this one result in particular:

Performance isn’t supposed to go down. Everyone knows this. The version numbers go up and so does the performance, as long as it’s not Javascript-based.

Enraged, I sprinted to my computer and searched for tesseract game, which gave me the entirely wrong result, but I eventually did manage to find the right one. I fired up zink-wip, certain that this would end up being some bug I’d already fixed. Unfortunately, this was not the case. I vowed not to sleep, rebase, leave my office, or even run another application until this was resolved, so you can imagine how pleased I am to be writing this post after spending way too much time getting to the bottom of everything.

Speculation Interlude

Full disclosure: I didn’t actually check why performance went down. I’m pretty sure it’s just the result of having improved buffer mapping to be better in most cases, which ended up hurting this case.

But Why

…is the performance so bad? A quick profiling revealed that this was down to a Gallium component called vbuf, used for translating vertex buffers and attributes from the ones specified by the application to ones that drivers can actually support. The component itself is fine; the problem is that, ideally, it’s not something you ever want to be hitting when you want performance. Consider the usual sequence of drawing a frame:

- generate and upload vertex data
- bind some descriptors
- maybe throw in a query or two if you need some spice
- draw
- repeat until frame is done

This is all great and normal, but what would happen—just hypothetically of course—if instead it looked like this:

- generate and upload vertex data
- stall and read vertex data
- rewrite vertex data in another format and reupload
- bind some descriptors
- maybe throw in a query or two if you need some spice
- draw
- repeat until frame is done

Suddenly the driver is now stalling multiple times per frame on top of doing lots of CPU work! Incidentally, this is (almost certainly) why performance appeared to have regressed: the vertex buffer is now device-local and can’t be mapped directly, so it has to be copied to a new buffer before it can be read, which is even slower.

Just AMD Problems

DISCLAIMER: We’re going deep into meme territory now, so let’s all dial down the seriousness about a thousand notches before posting about how much I hate AMD or whatever.

Unlike cool hardware, AMD opts to not support features which might be less performant. I assume this is in the hopes that developers will Make The Right Choice and not use those features, but obviously developers are gonna develop, and so it is that Tesseract-The-Game-But-Not-The-One-On-Steam uses 3-component vertex attributes that aren’t supported by AMD hardware, necessitating the use of vbuf to translate them to 4-component attributes that can be safely used.

Decomposition

The vertex buffer format at work here was R8G8B8_SNORM, which is a perfectly cromulent format as long as you hate yourself. A shader would read this as a vec4, which, by the power of buffer robustness, gets translated to vec4(x, y, z, 1.0) because the w component is missing.
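To make the “unsupported 3-component attribute” point concrete, here is a small hypothetical sketch (mine, not from the post) of the Vulkan-level check that tells you whether a format such as VK_FORMAT_R8G8B8_SNORM can be fed straight to the vertex fetcher; when the VERTEX_BUFFER feature bit is absent, something—vbuf in Gallium, or the decomposition described below—has to rewrite the data or the attribute layout:

```c
/* Minimal sketch (not from the post): query whether a format can be used
 * directly as a vertex attribute. If VK_FORMAT_FEATURE_VERTEX_BUFFER_BIT is
 * missing from bufferFeatures, the driver must translate the vertex data or
 * decompose the attribute. "physical_device" is assumed to be a valid
 * VkPhysicalDevice obtained elsewhere. */
#include <stdbool.h>
#include <vulkan/vulkan.h>

static bool
format_usable_for_vertex_input(VkPhysicalDevice physical_device, VkFormat format)
{
   VkFormatProperties props;
   vkGetPhysicalDeviceFormatProperties(physical_device, format, &props);
   return (props.bufferFeatures & VK_FORMAT_FEATURE_VERTEX_BUFFER_BIT) != 0;
}

/* Usage:
 * if (!format_usable_for_vertex_input(pdev, VK_FORMAT_R8G8B8_SNORM))
 *    split the attribute into three VK_FORMAT_R8_SNORM attributes instead.
 */
```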
The approach I took to solving this was to decompose the vertex attribute into three separate R8_SNORM attributes, as this single-component format is wimpy enough for AMD to handle. Thus, a vertex input state containing three separate attributes including this one would now contain five, as the original R8G8B8_SNORM one is split into three, each reading a single component at an offset to simulate the original attribute. The tricky part to this is that it requires a vertex shader prolog and variant in order to successfully split the shader’s input in such a way that the read value is the same. It also requires a NIR pass. Let’s check out the NIR pass, since this blog has gone for way too long without seeing any real work:

```c
struct decompose_state {
  nir_variable **split;
  bool needs_w;
};

static bool
decompose_attribs(nir_shader *nir, uint32_t decomposed_attrs, uint32_t decomposed_attrs_without_w)
{
   uint32_t bits = 0;
   nir_foreach_variable_with_modes(var, nir, nir_var_shader_in)
      bits |= BITFIELD_BIT(var->data.driver_location);
   bits = ~bits;
   u_foreach_bit(location, decomposed_attrs | decomposed_attrs_without_w) {
      nir_variable *split[5];
      struct decompose_state state;
      state.split = split;
      nir_variable *var = nir_find_variable_with_driver_location(nir, nir_var_shader_in, location);
      assert(var);
      split[0] = var;
      bits |= BITFIELD_BIT(var->data.driver_location);
      const struct glsl_type *new_type = glsl_type_is_scalar(var->type) ? var->type : glsl_get_array_element(var->type);
      unsigned num_components = glsl_get_vector_elements(var->type);
      state.needs_w = (decomposed_attrs_without_w & BITFIELD_BIT(location)) != 0 && num_components == 4;
      for (unsigned i = 0; i < (state.needs_w ? num_components - 1 : num_components); i++) {
         split[i+1] = nir_variable_clone(var, nir);
         split[i+1]->name = ralloc_asprintf(nir, "%s_split%u", var->name, i);
         if (decomposed_attrs_without_w & BITFIELD_BIT(location))
            split[i+1]->type = !i && num_components == 4 ? var->type : new_type;
         else
            split[i+1]->type = new_type;
         split[i+1]->data.driver_location = ffs(bits) - 1;
         bits &= ~BITFIELD_BIT(split[i+1]->data.driver_location);
         nir_shader_add_variable(nir, split[i+1]);
      }
      var->data.mode = nir_var_shader_temp;
      nir_shader_instructions_pass(nir, lower_attrib, nir_metadata_dominance, &state);
   }
   nir_fixup_deref_modes(nir);
   NIR_PASS_V(nir, nir_remove_dead_variables, nir_var_shader_temp, NULL);
   optimize_nir(nir);
   return true;
}
```

First, the base of the pass; two masks are provided, one for attributes that are being fully split (i.e., four components) and one for attributes that have fewer than four components and thus need to have a w component added, as in the Tesseract case. Each variable in the mask is split into four, with slightly different behavior for the ones needing a w and the ones that don’t.
The new variables are all given new driver locations matching the ones given to the split attributes for the vertex input pipeline state, and the decompose_state is passed along to the per-instruction part of the pass:

```c
static bool
lower_attrib(nir_builder *b, nir_instr *instr, void *data)
{
   struct decompose_state *state = data;
   nir_variable **split = state->split;
   if (instr->type != nir_instr_type_intrinsic)
      return false;
   nir_intrinsic_instr *intr = nir_instr_as_intrinsic(instr);
   if (intr->intrinsic != nir_intrinsic_load_deref)
      return false;
   nir_deref_instr *deref = nir_src_as_deref(intr->src[0]);
   nir_variable *var = nir_deref_instr_get_variable(deref);
   if (var != split[0])
      return false;
   unsigned num_components = glsl_get_vector_elements(split[0]->type);
   b->cursor = nir_after_instr(instr);
   nir_ssa_def *loads[4];
   for (unsigned i = 0; i < (state->needs_w ? num_components - 1 : num_components); i++)
      loads[i] = nir_load_deref(b, nir_build_deref_var(b, split[i+1]));
   if (state->needs_w) {
      loads[3] = nir_channel(b, loads[0], 3);
      loads[0] = nir_channel(b, loads[0], 0);
   }
   nir_ssa_def *new_load = nir_vec(b, loads, num_components);
   nir_ssa_def_rewrite_uses(&intr->dest.ssa, new_load);
   nir_instr_remove_v(instr);
   return true;
}
```

The existing variable is passed along with the new variable array. Where the original is loaded, instead the new variables are all loaded in sequence and assembled into a vec matching the length of the original one. For attributes needing a w component, the first new variable is loaded as a vec4 so that the w component can be reused naturally. Then the original load instruction is removed, and with it, the original variable and its brokenness.

Immediate Results

Sort of. The frames were definitely there, but the graphics…

Occlusion Queries

It turns out there’s almost zero coverage for occlusion queries in Vulkan’s CTS. There’s surprisingly little coverage for most query-related things, in fact, which means it wasn’t too surprising when it turned out that there were RADV query bugs at play. What was surprising was how they manifested, but that was about par for anything that reads garbage memory. A simple one-liner later (just kidding, this fucken thing took like 4 days to find) and, magically, things were happening:

We Did It

A big thanks to Bas Nieuwenhuizen for consulting along the way even despite being so busy getting a RADV raytracing MR up and, as always, preparing his next blog post.
Posted over 2 years ago
A year ago, I first announced libei - a library to support emulated input. After an initial spurt of development, it was left mostly untouched until a few weeks ago. Since then, another flurry of changes has been added, including some initial integration into GNOME's mutter. So, let's see what has changed.

A Recap

First, a short recap of what libei is: it's a transport layer for emulated input events to allow for any application to control the pointer, type, etc. But, unlike the XTEST extension in X, libei allows the compositor to be in control over clients, the devices they can emulate and the input events as well. So it's safer than XTEST but also a lot more flexible. libei already supports touch and smooth scrolling events, something XTEST doesn't have or is struggling with.

Terminology refresher: libei is the client library (used by an application wanting to emulate input), EIS is the Emulated Input Server, i.e. the part that typically runs in the compositor.

Server-side Devices

So what has changed recently: first, the whole approach has been flipped on its head - now a libei client connects to the EIS implementation and "binds" to the seats the EIS implementation provides. The EIS implementation then provides input devices to the client. In the simplest case, that's just a relative pointer, but we have capabilities for absolute pointers, keyboards and touch as well. Plans for the future are to add gestures and tablet support too. Possibly joysticks, but I haven't really thought about that in detail yet.

So basically, the initial conversation with an EIS implementation goes like this:

- Client: Hello, I am $NAME
- Server: Hello, I have "seat0" and "seat1"
- Client: Bind to "seat0" for pointer, keyboard and touch
- Server: Here is a pointer device
- Server: Here is a keyboard device
- Client: Send relative motion event 10/2 through the pointer device

Notice how the touch device is missing? The capabilities the client binds to are just what the client wants; the server doesn't need to actually give the client a device for that capability.

One of the design choices for libei is that devices are effectively static. If something changes on the EIS side, the device is removed and a new device is created with the new data. This applies for example to regions and keymaps (see below), so libei clients need to be able to re-create their internal states whenever the screen or the keymap changes.

Device Regions

Devices can now have regions attached to them, also provided by the EIS implementation. These regions define areas reachable by the device and are required for clients such as Barrier. On a dual-monitor setup you may have one device with two regions or two devices with one region (representing one monitor); it depends on the EIS implementation. But either way, as a libei client you will know that there is an area and you will know how to reach any given pixel on that area. Since the EIS implementation decides the regions, it's possible to have areas that are unreachable by emulated input (though I'm struggling a bit for a real-world use-case).
So basically, the conversation with an EIS implementation goes like this:

- Client: Hello, I am $NAME
- Server: Hello, I have "seat0" and "seat1"
- Client: Bind to "seat0" for absolute pointer
- Server: Here is an abs pointer device with regions 1920x1080@0,0, 1080x1920@1920,0
- Server: Here is an abs pointer device with regions 1920x1080@0,0
- Server: Here is an abs pointer device with regions 1080x1920@1920,0
- Client: Send abs position 100/100 through the second device

Notice how we have three absolute devices? A client emulating a tablet that is mapped to a screen could just use the third device. As with everything, the server decides what devices are created and the clients have to figure out what they want to do and how to do it.

Perhaps unsurprisingly, the use of regions makes libei clients windowing-system independent. The Barrier EI support WIP no longer has any Wayland-specific code in it. In theory, we could implement EIS in the X server and libei clients would work against that unmodified.

Keymap handling

The keymap handling has been changed so the keymap too is provided by the EIS implementation now, effectively in the same way as the Wayland compositor provides the keymap to Wayland clients. This means a client knows what keycodes to send, it can handle the state to keep track of things, etc. Using Barrier as an example again - if you want to generate an "a", you need to look up the keymap to figure out which keycode generates an A, then you can send that through libei to actually press the key. Admittedly, this is quite messy. XKB (and specifically libxkbcommon) does not make it easy to go from a keysym to a keycode. The existing Barrier X code is full of corner-cases with XKB already; I expect those to be necessary for the EI support as well.
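To illustrate how messy the keysym-to-keycode direction is, here is a rough sketch (mine, not from the post) of the brute-force lookup a client like Barrier ends up doing with libxkbcommon; it only looks at the first shift level of the first layout, and real code also has to worry about other levels, groups and modifiers:

```c
/* Minimal sketch, assuming an already-compiled struct xkb_keymap
 * (e.g. built from the keymap string the EIS implementation provides). */
#include <xkbcommon/xkbcommon.h>

struct lookup {
    xkb_keysym_t wanted;
    xkb_keycode_t keycode; /* XKB_KEYCODE_INVALID if not found */
};

static void
find_keycode(struct xkb_keymap *keymap, xkb_keycode_t keycode, void *data)
{
    struct lookup *lookup = data;
    const xkb_keysym_t *syms;
    /* layout 0, level 0 only - good enough for an illustration */
    int nsyms = xkb_keymap_key_get_syms_by_level(keymap, keycode, 0, 0, &syms);

    if (lookup->keycode == XKB_KEYCODE_INVALID &&
        nsyms > 0 && syms[0] == lookup->wanted)
        lookup->keycode = keycode;
}

xkb_keycode_t
keycode_for_keysym(struct xkb_keymap *keymap, xkb_keysym_t keysym)
{
    struct lookup lookup = { .wanted = keysym, .keycode = XKB_KEYCODE_INVALID };
    xkb_keymap_key_for_each(keymap, find_keycode, &lookup);
    return lookup.keycode; /* press/release of this keycode is then sent via libei */
}
```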
Scrolling

Scroll events have four types: pixel-based scrolling, discrete scrolling, and scroll stop/cancel events. The first should be obvious; discrete scrolling is for mouse wheels. It uses the same 120-based API that Windows (and the kernel) use, so it's compatible with high-resolution wheel mice. The scroll stop event notifies an EIS implementation that the scroll interaction has stopped (e.g. lifting fingers off), which in turn may start kinetic scrolling - just like the libinput/Wayland scroll stop events. The scroll cancel event notifies the EIS implementation that scrolling really has stopped and no kinetic scrolling should be triggered. There's no equivalent in libinput/Wayland for this yet, but it helps to get the hook in place.

Emulation "Transactions"

This has fairly little functional effect, but interactions with an EIS server are now sandwiched in a start/stop emulating pair. While this doesn't matter for one-shot tools like xdotool, it does matter for things like Barrier, which can send the start emulating event when the pointer enters the local window. This again allows the EIS implementation to provide some visual feedback to the user. To correct the example from above, the sequence is actually:

- ...
- Server: Here is a pointer device
- Client: Start emulating
- Client: Send relative motion event 10/2 through the pointer device
- Client: Send relative motion event 1/4 through the pointer device
- Client: Stop emulating

Properties

Finally, there is now a generic property API, something copied from PipeWire. Properties are simple key/value string pairs and cover those things that aren't in the immediate API. One example here: the portal can set things like "ei.application.appid" to the Flatpak's appid. Properties can be locked down, and only libei itself can set properties before the initial connection. This makes them reliable enough for the EIS implementation to make decisions based on their values. Just like with PipeWire, the list of useful properties will grow over time; it's too early to tell what is really needed.

Repositories

Now, for the actual demo bits: I've added enough support to Barrier, XWayland, Mutter and GNOME Shell that I can control a GNOME on Wayland session through Barrier (note: the controlling host still needs to run X since we don't have the ability to capture input events under Wayland yet). The keymap handling in Barrier is nasty, but it's enough to show that it can work. GNOME Shell has a rudimentary UI, again just to show what works: the status icon shows ... if libei clients are connected and changes to !!! while the clients are emulating events. Clients are listed by name and can be disconnected at will. I am not a designer; this is just a PoC to test the hooks. Note how xdotool is listed in this screenshot: that tool is unmodified, it's the XWayland libei implementation that allows it to work and show up correctly.

The various repositories are in the "wip/ei" branch of:

- XWayland
- Barrier
- Mutter
- GNOME Shell

And of course libei itself.

Where to go from here?

The last weeks were driven by rapid development, so there's plenty of test cases to be written to make sure the new code actually works as intended. That's easy enough. Looking at the Flatpak integration is another big ticket item; once the portal details are sorted, all the pieces are (at least theoretically) in place. That aside, improving the integrations into the various systems above is obviously what's needed to get this working OOTB on the various distributions. Right now it's all very much in alpha stage and I could use help with all of those (unless you're happy to wait another year or so...). Do ping me if you're interested to work on any of this.
Posted over 2 years ago
Hi all, hope you all are doing fine! Finally today the part 5 of my Outreachy Saga came out; week 9 was on 7/19/21, and as you can see I'm a little late too ( ;P )... This week had the theme: “Career opportunities / Career Goals”.

When I read the Outreachy organizers' email, I had an anxiety crisis, starting to think about what I want to do after the internship and what my career goals are, and I panicked... The Imposter Syndrome hit hard, and it still haunts my thoughts, and it has been very challenging (as my therapist says) to work with this feeling of not being good enough to apply for a job opening or thinking that my resume is worthless... But week 11 arrived #SPOILERALERT with the theme “Making connections”, and talking to some people I could get to know their experiences in companies that work with free software and their contributions to it, and I could feel that I am on the right path, that this is the area I want to work in. So let's get back to the topic of today's post!!

What am I looking for? I'm looking for a job, preferably remote, which can be full or part-time! But also some other opportunity where I can improve my CV and help me to continue working with the Linux Kernel, preferably. The end of my Outreachy internship is fast approaching (or rather, it's the day after tomorrow (o_o) ), so after August 24th, I will be available to work full or part-time. I currently live in Fundão, Portugal, and I am open to remote positions based anywhere in the world, along with the possibility of international relocation (my dream is to live in Canada!!! 😜).

What types of work would you like to contribute to? I would like to continue working with the Linux Kernel, and I also really like Embedded Systems (I've played a little with Raspberry Pi, BeagleBone, and Arduino).

What tools or skills do you have that would help you with that work? You know... I have a lot of difficulty answering this question because I always think I don't have many skills, but I'll do my best!!! During my Outreachy internship, I created vkms_config_show(), the function which aims to print the data in drm_debugfs_create_files(), and I also started to learn how to debug the code. I have already worked a little with the Coccinelle tool. And as I mentioned earlier, I've already used Raspberry Pi, BeagleBone, and Arduino.

What languages do you speak, and at what school grade level? Portuguese (native), English (intermediate).

And reviewing now my CV and LinkedIn, I could see that I've already done a lot! I have experience in remote work with people from different parts of the world. I was the DAECOMP (Academic Directory of Computer Engineering - that is how we call our students' association) President, where I needed to interact with my classmates to find out which improvements they thought the course needed, I also needed to interact with the campus management and our professors to get things to improve the course and the campus, and I organized Free Software dissemination events. I was also a flute tutor (yes, I study music!!!) and a robotics tutor, working with young people and children. Well, I'll stop here... To see more about my experience, feel free to visit my LinkedIn.

Once again thank you for following me so far; my adventure with Outreachy is almost over, and every day has been a lot of learning! Please feel free to comment! And stay tuned to the next chapters of this Saga!!! Take care and have a great day!
Posted over 2 years ago
We Back

Just a quick update today while I dip my toes back into the blogosphere to remind myself that it’s not so scary.

Remember when I blogged about how nice it would be to have a suballocator all those months ago? Now it’s landed, and it’s nice indeed to have a suballocator. Remember when everyone wanted GL 4.6 compatibility contexts so they could play Feral ports of their favorite games? zink-wip did that 6 months ago.

What this all means is that it’s more or less open testing season on zink. I’ve already got a sizable number of tickets open for various Steam games based on zink-wip testing, but this is hardly conclusive. What games work for you? What games don’t work? I’m not gonna test them all myself, so get out there and start crashing.
Posted over 2 years ago
Here I’m playing “Spelunky 2” on my laptop and simultaneously replaying the same Vulkan calls on an ARM board with an Adreno GPU running the open source Turnip Vulkan driver. Hint: it’s an x64 Windows game that doesn’t run on ARM. The bottom right is the game I’m playing on my laptop, the top left is GFXReconstruct immediately replaying Vulkan calls from the game on the ARM board. How is it done? And why would it be useful for debugging? Read below!

Debugging issues a driver faces with real-world applications requires the ability to capture and replay graphics API calls. However, for mobile GPUs it becomes even more challenging, since for a Vulkan driver the main “source” of real-world workload are x86-64 apps that run via Wine + DXVK, mainly games which were made for desktop x86-64 Windows and do not run on ARM. Efforts are being made to run these apps on ARM but it is still work-in-progress. And we want to test the drivers NOW.

The obvious solution would be to run those applications on an x86-64 machine capturing all Vulkan calls, then replaying those calls on a second machine where we cannot run the app. This way it would be possible to test the driver even without running the application directly on it. The main trouble is that Vulkan calls made on one GPU + driver combo are not generally compatible with another GPU + driver combo, sometimes even for one GPU vendor. There are different memory capabilities (VkPhysicalDeviceMemoryProperties), different memory requirements for buffers and images, different extensions available, and different optional features supported. It is easier with OpenGL but there are also some incompatibilities there.

There are two open-source vendor-agnostic tools for capturing Vulkan calls: RenderDoc (captures a single frame) and GFXReconstruct (captures multiple frames). RenderDoc at the moment isn’t suitable for the task of capturing applications on desktop GPUs and replaying on mobile because it doesn’t translate memory types and requirements (see issue #814). GFXReconstruct on the other hand has the necessary features for this. I’ll show a couple of tricks with GFXReconstruct I’m using to test things on Turnip.

Capturing with GFXReconstruct

At this point you either have the application itself or, if it doesn’t use Vulkan, a trace of its calls that could be translated to Vulkan. There is a detailed instruction on how to use GFXReconstruct to capture a trace on a desktop OS. However there is no clear instruction on how to do this on Android (see issue #534); fortunately there is one in Android’s documentation:

Android how-to:
- For Android 9 you should copy layers to the application which will be traced
- For Android 10+ it's easier to copy them to com.lunarg.gfxreconstruct.replay
- You should have a userdebug build of Android or probably rooted Android

```
# Push GFXReconstruct layer to the device
adb push libVkLayer_gfxreconstruct.so /sdcard/
# Since there is no APK for the capture layer,
# copy the layer to e.g. the folder of com.lunarg.gfxreconstruct.replay
adb shell run-as com.lunarg.gfxreconstruct.replay cp /sdcard/libVkLayer_gfxreconstruct.so .

# Enable layers
adb shell settings put global enable_gpu_debug_layers 1
# Specify target application
adb shell settings put global gpu_debug_app <package_name>
# Specify layer list (from top to bottom)
adb shell settings put global gpu_debug_layers VK_LAYER_LUNARG_gfxreconstruct
# Specify packages to search for layers
adb shell settings put global gpu_debug_layer_app com.lunarg.gfxreconstruct.replay
```

If the target application doesn’t have rights to write into external storage, you should change where the capture file is created:

```
adb shell "setprop debug.gfxrecon.capture_file '/data/data/<package_name>/files/'"
```

However, when trying to replay the trace you captured on another GPU, most likely it will result in an error:

```
[gfxrecon] FATAL - API call vkCreateDevice returned error value VK_ERROR_EXTENSION_NOT_PRESENT that does not match the result from the capture file: VK_SUCCESS. Replay cannot continue.
Replay has encountered a fatal error and cannot continue: the specified extension does not exist
```

Or other errors/crashes. Fortunately we could limit the capabilities of the desktop GPU with VK_LAYER_LUNARG_device_simulation. When simulating another GPU, VK_LAYER_LUNARG_device_simulation should be told to intersect the capabilities of both GPUs, making the capture compatible with both of them. This could be achieved by recently added environment variables:

```
VK_DEVSIM_MODIFY_EXTENSION_LIST=whitelist
VK_DEVSIM_MODIFY_FORMAT_LIST=whitelist
VK_DEVSIM_MODIFY_FORMAT_PROPERTIES=whitelist
```

The whitelist name is rather confusing because it essentially means “intersection”.

One would also need to get a json file which describes the target GPU capabilities; this should be done by running:

```
vulkaninfo -j &> <target_gpu>.json
```

The final command to capture a trace would be:

```
VK_LAYER_PATH=<gfxreconstruct_layer_path>:<devsim_layer_path> \
VK_INSTANCE_LAYERS=VK_LAYER_LUNARG_gfxreconstruct:VK_LAYER_LUNARG_device_simulation \
VK_DEVSIM_FILENAME=<target_gpu>.json \
VK_DEVSIM_MODIFY_EXTENSION_LIST=whitelist \
VK_DEVSIM_MODIFY_FORMAT_LIST=whitelist \
VK_DEVSIM_MODIFY_FORMAT_PROPERTIES=whitelist \
<application>
```

Replaying with GFXReconstruct

```
gfxrecon-replay -m rebind --skip-failed-allocations <trace>.gfxr
```

- -m: enable memory translation for replay on GPUs with memory types that are not compatible with the capture GPU’s memory types
- rebind: change memory allocation behavior based on resource usage and replay memory properties; resources may be bound to different allocations with different offsets
- --skip-failed-allocations: skip vkAllocateMemory, vkAllocateCommandBuffers, and vkAllocateDescriptorSets calls that failed during capture

Without these options replay would fail.

Now you could easily test any app/game on your ARM board, if you have enough RAM =) I even successfully ran a capture of “Metro Exodus” on Turnip.

But what if I want to test something that requires interactivity? Or you don’t want to save a huge trace on disk, which could grow to tens of gigabytes if the application is running for a considerable amount of time. During the recording GFXReconstruct just appends calls to a file; there are no additional post-processing steps. Given that, the next logical step is to just skip writing to a disk and send Vulkan calls over the network! This would allow us to interact with the application and immediately see the results on another device with a different GPU. And so I hacked together a crude support of over-the-network replay.

The only difference with ordinary tracing is that now instead of a file we have to specify a network address of the target device:

```
VK_LAYER_PATH=<gfxreconstruct_layer_path>: \
...
GFXRECON_CAPTURE_FILE="<target_ip>:<port>" \
```

And on the target device:

```
while true; do gfxrecon-replay -m rebind --sfa ":<port>"; done
```

Why while true? It is common for DXVK to call vkCreateInstance several times, leading to the creation of several traces. When replaying over the network we therefore want gfxrecon-replay to immediately restart when one trace ends, to be ready for another.

You may want to bring the FPS down to match the capabilities of the lower-power GPU in order to prevent constant hiccups. It could be done either with libstrangle or with mangohud:

```
stranglevk -f 10
MANGOHUD_CONFIG=fps_limit=10 mangohud
```

You have seen the result at the start of the post.
Posted over 2 years ago
I’ve been silent here for quite some time, so here is a quick summary of some of the new functionality we have been exposing in V3DV, the Vulkan driver for Raspberry Pi 4, over the last few months:

- VK_KHR_bind_memory2
- VK_KHR_copy_commands2
- VK_KHR_dedicated_allocation
- VK_KHR_descriptor_update_template
- VK_KHR_device_group
- VK_KHR_device_group_creation
- VK_KHR_external_fence
- VK_KHR_external_fence_capabilities
- VK_KHR_external_fence_fd
- VK_KHR_external_semaphore
- VK_KHR_external_semaphore_capabilities
- VK_KHR_external_semaphore_fd
- VK_KHR_get_display_properties2
- VK_KHR_get_memory_requirements2
- VK_KHR_get_surface_capabilities2
- VK_KHR_image_format_list
- VK_KHR_incremental_present
- VK_KHR_maintenance2
- VK_KHR_maintenance3
- VK_KHR_multiview
- VK_KHR_relaxed_block_layout
- VK_KHR_sampler_mirror_clamp_to_edge
- VK_KHR_storage_buffer_storage_class
- VK_KHR_uniform_buffer_standard_layout
- VK_KHR_variable_pointers
- VK_EXT_custom_border_color
- VK_EXT_external_memory_dma_buf
- VK_EXT_index_type_uint8
- VK_EXT_physical_device_drm

Besides that list of extensions, we have also added basic support for Vulkan subgroups (this is a Vulkan 1.1 feature) and Geometry Shaders (we use this to implement multiview). I think we now meet most (if not all) of the Vulkan 1.1 mandatory feature requirements, but we still need to check this properly, and we also need to start doing Vulkan 1.1 CTS runs and fix test failures. In any case, the bottom line is that Vulkan 1.1 should be fairly close now.
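As a side note (my own illustration, not part of the post), an application would typically check for any of the extensions above with vkEnumerateDeviceExtensionProperties before adding it to the enabled-extension list at device creation, along these lines:

```c
/* Minimal sketch: check whether a device extension (e.g. VK_KHR_multiview)
 * is advertised before enabling it in VkDeviceCreateInfo.
 * "physical_device" is assumed to be a valid VkPhysicalDevice. */
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <vulkan/vulkan.h>

static bool
device_extension_supported(VkPhysicalDevice physical_device, const char *name)
{
   uint32_t count = 0;
   vkEnumerateDeviceExtensionProperties(physical_device, NULL, &count, NULL);

   VkExtensionProperties *props = calloc(count, sizeof(*props));
   vkEnumerateDeviceExtensionProperties(physical_device, NULL, &count, props);

   bool found = false;
   for (uint32_t i = 0; i < count; i++) {
      if (strcmp(props[i].extensionName, name) == 0) {
         found = true;
         break;
      }
   }
   free(props);
   return found;
}

/* Usage: if (device_extension_supported(pdev, VK_KHR_MULTIVIEW_EXTENSION_NAME)) ... */
```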
Posted over 2 years ago
Just about a year after the original announcement, I think it's time to see the progress on power-profiles-daemon. Note that I would still recommend you read the up-to-date project README if you have questions about why this project was necessary, and why a new project was started rather than building on an existing one.

The project was born out of the need to make a firmware feature available to end-users for a number of lines of Lenovo laptops for them to be fully usable on Fedora. For that, I worked with Mark Pearson from Lenovo, who wrote the initial kernel support for the feature and served as our link to the Lenovo firmware team, and Hans de Goede, who worked on making the kernel interfaces more generic.

More generic, but in a good way

With the initial kernel support written for (select) Lenovo laptops, Hans implemented a more generic interface called platform_profile. This interface is now the one that power-profiles-daemon will integrate with, and means that it also supports a number of Microsoft Surface, HP, Lenovo's own Ideapad laptops, and maybe Razer laptops soon.

The next item to make more generic is Lenovo's "lap detection", which still relies on a custom driver interface. This should soon be transformed into a generic proximity sensor, which will mean I get to work some more on iio-sensor-proxy.

Working those interactions

power-profiles-daemon landed in a number of distributions, sometimes enabled by default, sometimes not enabled by default (sigh, the less said about that the better), which fortunately meant that we had some early feedback available.

The goal was always to have the user in control, but we still needed to think carefully about how the UI would look and how users would interact with it when a profile was temporarily unavailable, or the system started a "power saver" mode because battery was running out.

The latter is something that David Redondo's work on the "HoldProfile" API made possible. Software can programmatically switch to the power-saver or performance profile for the duration of a command. This is useful to switch to the Performance profile when running a compilation (eg. powerprofilesctl jhbuild --no-interact build gnome-shell), or for gnome-settings-daemon to set the power-saver profile when low on battery.

The aforementioned David Redondo and Kai Uwe Broulik also worked on the KDE interface to power-profiles-daemon, as Florian Müllner implemented the gnome-shell equivalent. Promised by me, delivered by somebody else :)

I took this opportunity to update the Power panel in Settings, which shows off the temporary switch to the performance mode, and the setting to automatically switch to power-saver when low on battery.

Low-Power, everywhere

Talking of which, while it's important for the system to know that it's targeting a power-saving behaviour, it's also pretty useful for applications to try and behave better.

Maybe you've already integrated with "low memory" events using GLib, but thanks to Patrick Griffis you can be an even better ecosystem citizen and monitor whether the system is in "Power Saver" mode and adjust your application's behaviour.

This feature will be available in GLib 2.70 along with documentation of useful steps to take. GNOME Software will already be using this functionality to avoid large automated downloads when energy saving is needed.
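For illustration, here is a minimal sketch (mine, not from the post) of how an application can consume that GLib 2.70 API; adjust_background_work() is a hypothetical application callback:

```c
/* Sketch: reacting to the system's "Power Saver" mode with GLib >= 2.70. */
#include <gio/gio.h>

static void
adjust_background_work(gboolean power_saver_enabled)
{
  /* e.g. pause large downloads, lower polling frequency, ... */
  g_message("power saver %s", power_saver_enabled ? "enabled" : "disabled");
}

static void
on_power_saver_changed(GObject *object, GParamSpec *pspec, gpointer user_data)
{
  GPowerProfileMonitor *monitor = G_POWER_PROFILE_MONITOR(object);
  adjust_background_work(g_power_profile_monitor_get_power_saver_enabled(monitor));
}

int
main(void)
{
  GPowerProfileMonitor *monitor = g_power_profile_monitor_dup_default();
  GMainLoop *loop = g_main_loop_new(NULL, FALSE);

  g_signal_connect(monitor, "notify::power-saver-enabled",
                   G_CALLBACK(on_power_saver_changed), NULL);
  /* Apply the current state once at startup, then react to changes. */
  adjust_background_work(g_power_profile_monitor_get_power_saver_enabled(monitor));

  g_main_loop_run(loop);

  g_main_loop_unref(loop);
  g_object_unref(monitor);
  return 0;
}
```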
Availability

The majority of the above features are available in the GNOME 41 development branches and should get to your favourite GNOME-friendly distribution for their next release, such as Fedora 35.
Posted over 2 years ago
I've been chasing a crocus misrendering bug shown in a qt trace. The bottom image is crocus vs 965 on top. This only happened on Gen4->5, so Ironlake and GM45 were my test machines.

I burned a lot of time trying to work this out. I trimmed the traces down, dumped a stupendous amount of batchbuffers, turned off UBO push constants, dumped all the index and vertex buffers, and tried some RGBx changes, but nothing was rushing to hit me, except that the vertex shaders produced were different. However, they were different for many reasons, due to the optimization pipelines the mesa state tracker runs vs the 965 driver. Inputs and UBO loads were in different places, so there was a lot of noise in the shaders.

I ported the trace to a piglit GL application so I could more easily hack on the shaders and GL, and with that I trimmed it down even further (even if I did burn some time on a misplaced */+ typo). Using the ported app, I removed all uniform buffer loads and then split the vertex shader in half (it was quite large, but had two chunks). I finally then could spot the difference in the NIR shaders.

What stood out was that the 965 shader had an if which the crocus shader had converted to a bcsel. This is part of peephole optimization, and the mesa/st calls it, and sure enough removing that call fixed the rendering. But why? It is a valid optimization.

In a parallel thread on another part of the planet, Ian Romanick filed an MR to mesa https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12191 fixing a bug in the gen4/5 fs backend with conditional selects. This was something he noticed while debugging elsewhere. However, his fix was for the fragment shader backend, and my bug was in the vec4 vertex shader backend. I tracked down where the same changes were needed in the vec4 backend and tested a fix on top of his branch, and the misrendering disappeared.

It's a strange coincidence we both started hitting the same bug in different backends in the same week via different tests, but he's definitely saved me a lot of pain in working this out! Hopefully we can combine them and get it merged this week.

Also thanks to Angelo on the initial MR for testing crocus with some real workloads.
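For readers unfamiliar with the transform mentioned above: the peephole-select optimization (nir_opt_peephole_select in Mesa, if memory serves) flattens a small if/else into a select instruction, which the hardware backend must then lower into correct conditional-move code, the part that turned out to be buggy on gen4/5. A rough C-level analogue, purely for illustration:

```c
/* Illustrative only: a C-level analogue of what the peephole-select
 * optimization does; the real pass operates on NIR instructions. */

/* Before: a real branch in the shader. */
static float select_with_branch(int cond, float a, float b)
{
   float result;
   if (cond)
      result = a;
   else
      result = b;
   return result;
}

/* After: a branchless select (the NIR equivalent is bcsel). */
static float select_flat(int cond, float a, float b)
{
   return cond ? a : b;
}
```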
Posted over 2 years ago
Hi all, hope you all are doing fine! Finally today the part 4 of my Outreachy Saga came out; the mid-point was on 5/7/21, and as you can see I'm really late. This week had the theme: “Modifying Expectations”.

But why did it take me so long to post? First, I had to internalize the topic a lot, because in my head I thought that when I reached this point, I would have achieved all the goals I had proposed at the beginning of the internship, but when the mid-point arrived, it seemed to me that I hadn't done anything and that my internship was going to end, as I didn't fulfill expectations.

The project aimed at 2 tasks:

- Clean up the debugfs support
- Remove custom dumb_map_offset implementations

During the development of the first task, we found that it could not be carried out as intended, so it needed to be restructured and resulted in: create the vkms_config_show() function, which aims to print the data in drm_debugfs_create_files(). It has already been reviewed and approved to be part of the drm-misc tree.

During the development of this function, I came across an improvement in the code: replace a macro in vkms_release(). It has also been reviewed and approved to be part of the drm-misc tree.

As part of this week's assignments, I needed to talk to my advisors about the internship progress and schedule review. During our conversation, I could see/understand that I managed to achieve one of the goals, as presented above (I thought I hadn't achieved anything!!), and we also realized that I was not going to be able to do the second task, as it was in another context and could take a long time to understand how to solve it. Thus, for the second half of the internship, it was decided to convert vkms_config_debugfs into the structure proposed by Wambui Karuga, and here I'm working on it.

During this time I'm learning that Linux Kernel development is not linear, that several things can happen (setup problems again, breaking the kernel, not knowing what to do...), so I realized that one of the goals of Outreachy is learning to contribute and work with the project I've chosen. So I started to take advantage of my journey to learn how to contribute as much as I can to the Linux Kernel, and as I identified a lot with the development, maybe I can find a job to keep working on/contributing to Linux Kernel development.

Thank you for following me so far, please feel free to comment! And stay tuned to the next chapters of this Saga!!! Take care and have a great day!