Code Monkey home page Code Monkey logo

Comments (29)

bradgrantham-lunarg avatar bradgrantham-lunarg commented on July 28, 2024

Hi, @williamih ! Can you upload the logcat with the gfxrecon logs in them? That's what I was hoping to see but there don't seem to be any in there.

from gfxreconstruct.

bradgrantham-lunarg avatar bradgrantham-lunarg commented on July 28, 2024

It may be that the lack of gfxrecon lines in the logcat means that gfxrecon isn't being loaded. Does a trace file get created anywhere?

from gfxreconstruct.

williamih avatar williamih commented on July 28, 2024

Thanks @bradgrantham-lunarg - sorry, was just trying to save time parsing through a lot of logs. There are gfxrecon lines, but (at least to me) I'm not seeing anything obviously wrong.

gfxreconstruct_logs_quest.txt

from gfxreconstruct.

williamih avatar williamih commented on July 28, 2024

I can also confirm that a trace file is being created, although I'd think it doesn't contain much, since the app doesn't finish starting up

from gfxreconstruct.

williamih avatar williamih commented on July 28, 2024

A few further findings:

  1. The unassisted memory tracking option works correctly on Quest; however, the performance impact is large and unfortunately is too slow to use the app normally.
  2. I tried to work around (1) by using debug.gfxrecon.capture_android_trigger and only setting this to true once I've moved to the location in the app that I want to capture. However, it seems this option is not responding at runtime -- when I enable this while the app is already running, the gfxrecon trace does not capture anything or write anything to disk.
  3. I also tried setting debug.gfxrecon.capture_frames to a frame range so that the trace only starts once I'm loaded into the app, but when I set this option, nothing is traced or saved to disk.

If I use unassisted and don't set debug.gfxrecon.capture_frames, and either leave debug.gfxrecon.capture_android_trigger as unset or set it to true, the trace works and saves to disk, but the app performance is so slow that I can't get to the place in the app that I want to capture.

from gfxreconstruct.

panos-lunarg avatar panos-lunarg commented on July 28, 2024

Hi @williamih

I think this is clearly a case of our signal handler being replaced. The backtrace suggests that the segmentation faults takes place during a texture uploading (writing to mapped memory) and also unassisted method is not crashing.

The absence of the message PageGuardManager: Signal handler has been removed. Re-installing. from the logcat suggests that the watcher thread did not get a chance to detect the substitution before the crash happened

Nevertheless can you please try raising debug.gfxrecon.page_guard_signal_handler_watcher_max_restores from the default 1 to something like 2 or 3 or maybe even to something more aggressive like -1 which will effectively keep the watcher thread running? Maybe you can get the chance to get it to work

from gfxreconstruct.

williamih avatar williamih commented on July 28, 2024

Hi @panos-lunarg

Thanks for looking into this - I'd missed that setting when looking at the docs originally, so it's good to know.

Oddly, when I tried to reproduce the issue today (with the same settings as last time), our app has been working correctly even using gfxrecon with page_guard. The issue of getting stuck on the app opening screen isn't happening. I'm not sure what's changed here.

However, there are still some further problems.

Regarding debug.gfxrecon.page_guard_signal_handler_watcher_max_restores, here's what I've found:

  1. Even after setting this to -1, I still don't see PageGuardManager: Signal handler has been removed. Re-installing. anywhere in the logs.
  2. Similarly, the error messages about a crash due to SIGSEGV are still being printed even after setting that option.

Although the app is running correctly with page_guard enabled now, it seems the trace is not being written out correctly. There are two .gfxr files being generated, one of which is 0 bytes and the other is only 4 KB, even after staying in the app for several minutes. Do you know why this might be?

Thanks!

from gfxreconstruct.

panos-lunarg avatar panos-lunarg commented on July 28, 2024

That's odd. Can you please attach an unabridged logcat of a "successful" run?

from gfxreconstruct.

williamih avatar williamih commented on July 28, 2024

Thanks @panos-lunarg

Logs here:
logs-gfxrecon-infinite-restores.txt

from gfxreconstruct.

panos-lunarg avatar panos-lunarg commented on July 28, 2024

Right, can't see anything new. I can't really explain how the application is not complete crashing but I still think this is because of our signal handler being replaced and the watcher thread does not get a chance to notice it.

I gave it a quick try to raise watcher's thread priority (or lowering parent's thread priority) but it looks like it's not that simple to do

Another user faced a similar issue in the past (#990). IIRC he got it working by doing a hack using https://github.com/iqiyi/xHook
If you feel up to it you can give it a try.

from gfxreconstruct.

williamih avatar williamih commented on July 28, 2024

That xHook library looks interesting - thanks for the suggestion @panos-lunarg.

I have a couple of other thoughts/questions:

  1. It would be fairly easy to modify our Vulkan code to use vkFlushMappedMemoryRanges() so that we could then use the assisted memory tracking mode. However, part of the app's rendering (mostly UI elements, etc) is handled by Unity, and we can't modify Unity's Vulkan code, so some modifications to memory mapped by Unity would be missed by gfxreconstruct. Could this cause corruption of the generated trace file or other errors, if the Unity-related writes were missed?

  2. As mentioned a bit earlier - I couldn't get the debug.gfxrecon.capture_android_trigger and debug.gfxrecon.capture_frames settings to work. Do you have any thoughts on how to investigate this?

from gfxreconstruct.

panos-lunarg avatar panos-lunarg commented on July 28, 2024
  1. It would be fairly easy to modify our Vulkan code to use vkFlushMappedMemoryRanges() so that we could then use the assisted memory tracking mode. However, part of the app's rendering (mostly UI elements, etc) is handled by Unity, and we can't modify Unity's Vulkan code, so some modifications to memory mapped by Unity would be missed by gfxreconstruct. Could this cause corruption of the generated trace file or other errors, if the Unity-related writes were missed?

Yes this will cause corruption. Imagine you are uploading a texture but gfxr fails to notice that, so the texture's content are not dumped into the capture file. When you replay the file, gfxr will not update the device memory so the texture will be filled with uninitialized data.

  1. As mentioned a bit earlier - I couldn't get the debug.gfxrecon.capture_android_trigger and debug.gfxrecon.capture_frames settings to work. Do you have any thoughts on how to investigate this?

IIRC in order to use debug.gfxrecon.capture_android_trigger you have to set it to false when you start capturing and then set it to true to trigger trimming (then false again to stop). If it isn't set at all at the beginning then you'll be capturing as normal (non trimmed). I prepared a patch that should now print to logcat when this changes values (#1147)
It's strange that debug.gfxrecon.capture_frames isn't doing anything either. Can you see anything in logcat? Gfxr should print something there when capturing trimmed even before you enter the trim range.

from gfxreconstruct.

williamih avatar williamih commented on July 28, 2024

Thanks for the info - makes sense. With debug.gfxrecon.capture_android_trigger, I've been setting it to false before starting the app, then setting it to true once the app is running. However, nothing is captured. I can see from the logs that gfxr is detecting the false value when the app starts up, but it's not logging detection of the later modification to true. (Although maybe it's not meant to log that, since you created a patch? But it's definitely logging the initial false value, and not logging the change to true, and also not writing out capture data).

I can try both options again tomorrow to double check.

from gfxreconstruct.

panos-lunarg avatar panos-lunarg commented on July 28, 2024

BTW doing a trimmed capture is not going to make the crash go away. page_guard mechanism needs to keep track of mapped device memory while doing trimmed captures, even outside the frame-range / trigger-regions.

from gfxreconstruct.

panos-lunarg avatar panos-lunarg commented on July 28, 2024

I started experimenting with integrating xHook sources in GFXR in patch #1150
I couldn't get it to work on desktop but it did work on Android with Unreal (Fortnite) where we observe the same behavior.
Please give it a try and let me know it it works for you too. Basically apply #1150 and do not use the signal watcher option (set debug.gfxrecon.page_guard_signal_handler_watcher to 0)

from gfxreconstruct.

williamih avatar williamih commented on July 28, 2024

Thanks @panos-lunarg - I'll try the xHook patch out!

I was thinking of using a trimmed capture with unassisted memory tracking as I thought that might reduce some of the performance hit until I get to the part of the app that I want to trace. But based on your explanation of how page_guard works, I'd assume unassisted also still does all the tracking even with trimmed captures, and so there probably wouldn't be any performance benefit there?

from gfxreconstruct.

panos-lunarg avatar panos-lunarg commented on July 28, 2024

Ah right. I would expect unassisted to be much faster outside the trim regions as it will not be dumping the mapped memory to the capture file. So it should be worth a try if for example you only want to capture a few frames from a specific scene and you don't care too much about precision.

from gfxreconstruct.

williamih avatar williamih commented on July 28, 2024

Makes sense. I still can't get the debug.gfxrecon.capture_frames setting working unfortunately. When I set this option, nothing seems to be captured (no files written). Any thoughts on this?

from gfxreconstruct.

williamih avatar williamih commented on July 28, 2024

Also - to test with the xHook patch (#1150), is there a way to get the Android binary from the CI build, or should I just build from source locally?

from gfxreconstruct.

bradgrantham-lunarg avatar bradgrantham-lunarg commented on July 28, 2024

Makes sense. I still can't get the debug.gfxrecon.capture_frames setting working unfortunately. When I set this option, nothing seems to be captured (no files written). Any thoughts on this?

Does it work if you set to "1-100"? What are you setting it to now?

from gfxreconstruct.

panos-lunarg avatar panos-lunarg commented on July 28, 2024

I was about to suggest the same as @bradgrantham-lunarg: What are you setting debug.gfxrecon.capture_frames to? If you request a very high frame number, depending on the application's frame rate, it can take a considerable time to reach it, and until it is reached GFXR will not create a capture file. Try something low to make sure that it works

Choosing the correct frame range can be a tough guessing game so I would suggest trying debug.gfxrecon.capture_android_trigger as you have a better control of when to start capturing. I just merged #1147 which should make visible in logcat whencapture_android_trigger changes values.

Regarding #1150, yes it would require you to build GFXR locally. I don't think the CI builds are visible externally so If you are not comfortable with building it locally I can upload it for you here.

from gfxreconstruct.

williamih avatar williamih commented on July 28, 2024

Oddly, when I run adb shell "setprop debug.gfxrecon.capture_frames '1-100'" (following your idea @bradgrantham-lunarg) it works fine, i.e. saves a trace file.

But when I run adb shell "setprop debug.gfxrecon.capture_frames '2-100'" nothing gets traced or saved to a file. Logs here for the 2-100 frame range which isn't working:

logs-2-100.txt

from gfxreconstruct.

williamih avatar williamih commented on July 28, 2024

I'll try building myself as that's probably useful to be able to do, but can let you know if I run into any issues - thanks.

from gfxreconstruct.

panos-lunarg avatar panos-lunarg commented on July 28, 2024

Interesting. Seems like we are missing the starting frame. Drafted a PR (#1153) to address it

from gfxreconstruct.

bradgrantham-lunarg avatar bradgrantham-lunarg commented on July 28, 2024

Oddly, when I run adb shell "setprop debug.gfxrecon.capture_frames '1-100'" (following your idea @bradgrantham-lunarg) it works fine, i.e. saves a trace file.

Your logs indicate that the "2-100" range was detected, but I don't understand how the logic in the code could miss the beginning of the range. I'd have to suspect a strange Android behavior for strings or some such or maybe even a memory stomp.

Sorry to keep bombarding you with requests for logs. Could you send the logs for the successful run for 1-100?

from gfxreconstruct.

williamih avatar williamih commented on July 28, 2024

@bradgrantham-lunarg Thanks, logs here for 1-100:
logs-1-100.txt

from gfxreconstruct.

bradgrantham-lunarg avatar bradgrantham-lunarg commented on July 28, 2024

I think that because you're running an XR app on Quest, you're never getting a QueuePresentKHR, so we're never incrementing our frame counter. Thus we never enter frame 2. @davidd-lunarg , is this app a good candidate for testing out #1138 ?

from gfxreconstruct.

davidd-lunarg avatar davidd-lunarg commented on July 28, 2024

is this app a good candidate for testing out #1138 ?

Yes, this may be a good app to test with #1138. If the VR app is not using vkQueuePresentKHR then GFXR will not detect frame boundaries and the entire capture will treated as frame 1 by GFXR. Note that to test #1138 the android property debug.vr.profiler must be set to 1 for capture.

As to why trim frames 1-100 produces a capture file but 2-100 does not, this may be due to the fact that trim ranges starting at frame 1 are treated more like full captures--there is no need to track and write a trim state block. However if vkQueuePresentKHR is not being called I would guess the capture does not stop at frame 100. Instead it will capture until the app is closed.

from gfxreconstruct.

williamih avatar williamih commented on July 28, 2024

Sorry for the delayed reply - not sure if/when I'll get a chance to test out those PRs as there's a few other work priorities I need to focus on right now. Thanks for the help investigating - this does seem like a really useful tool so hopefully I can come back to this sometime and get it going!

from gfxreconstruct.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.