Code Monkey home page Code Monkey logo

Comments (9)

Carter12s avatar Carter12s commented on June 2, 2024

Continuing to investigate and this may not be a bug in aravis, but some kind of low level kernel contention / resource contention.

I set up the two seperate processes (one reading with aravis, one reading with pylon) in two separate docker containers and used docker's macvlan driver to fully isolate their networking from each other. In this setup will still saw framerates for the aravis process dropping substantially.

from aravis.

Carter12s avatar Carter12s commented on June 2, 2024

Have attempted a wide variety of different test setups and isolation layers:

Our aravis based code runs perfectly:

  • while stress is being used to load the majority of all CPU cores
  • while stress-ng --malloc is being used to thrash memory
  • while memtester is being used to thrash memory

The only scenario is which the aravis code starts choking and spitting out bad buffers is when the Basler Pylon code is running at the same time, and this affect persists through completely isolating the two processes networking from each other with docker. Our Pylon based code appears unaffected and its framerate does not dip when aravis code is running, but not the other way around.

Interesting the magnitude of the slowdown we observe does not change whether we compile the two processes in Debug or Release mode...

Having tried to eliminate as many possible sources of contention, as far as I can tell there must be some interaction at the kernel level between aravis and pylon that is causing misbehavior.

from aravis.

EmmanuelP avatar EmmanuelP commented on June 2, 2024

Did you try to increase the socket buffer size for aravis ?
Did you try the packet socket ?

https://aravisproject.github.io/aravis/ethernet.html

from aravis.

Carter12s avatar Carter12s commented on June 2, 2024

Excellent leads to try! Thank you for the quick response!

Results from testing:

  • Setting socket-buffer size to Auto with the code from here made no difference on the performance. Interested in trying some other modes/settings here, but it is unclear to me what are other valid settings for this (these?) properties.
  • I'm not using aravis's receive thread setup (can try switching to that), manually calling arv_stream_timeout_pop_buffer() in my own thread. I did play with using pthread_setschedparam to increase the thread priority of my thread, and observed no effect on this performance issue. Let me know if this isn't equivalent?
  • I increased the MTU on the receiving interface to 9000, and set the packet size via aravis to 8192. Confirmed jumbo packets were being used via a packet capture, and again observed no affect on the performance issue.
  • I'm interested in trying out the Packet Socket support, but don't see any guidelines for how to enable that in user code? I'll keep digging. I'm running on a headless server so can't directly use arv-viewer to test

I am pretty suspicious about some conflict between Pylon and Aravis. The system running this has ample resources to handle both camera streams, and it appears that any connection to our Pylon camera (even one running at an extremely low framerate) causes issues with the aravis connection.

from aravis.

EmmanuelP avatar EmmanuelP commented on June 2, 2024

You may want to try arv-camera-test, which can help you test the different parameters. Also, try to enable the debug output, especially the stream log which will give you some statistics about the packet handling.

from aravis.

Carter12s avatar Carter12s commented on June 2, 2024

Great suggestions again. Thank you for taking the time to assist me here.

arv-camera-test luckily exhibits the exact same performance as my code does directly. It runs flawlessly while running by itself, and has copious errors while running at the same time as our Basler Pylon code.

Running by itself:

$ arv-camera-test-0.8 -n Specim-FX17e-076900060266
Looking for camera 'Specim-FX17e-076900060266'
vendor name            = Specim
model name             = FX17e
device serial number   = 076900060266
image width            = 320
image height           = 224
horizontal binning     = 2
vertical binning       = 1
exposure               = 2400 ยตs
gain                   = 3 dB
payload                = 71680 bytes
gv n_stream channels   = 1
gv current channel     = 0
gv packet delay        = 0 ns
gv packet size         = 7204 bytes
399 frames/s -    28.6 MiB/s
401 frames/s -    28.7 MiB/s
400 frames/s -    28.7 MiB/s
400 frames/s -    28.7 MiB/s
401 frames/s -    28.7 MiB/s
400 frames/s -    28.7 MiB/s
401 frames/s -    28.7 MiB/s
400 frames/s -    28.7 MiB/s
401 frames/s -    28.7 MiB/s
400 frames/s -    28.7 MiB/s
^C401 frames/s -    28.7 MiB/s
n_completed_buffers    = 4404
n_failures             = 0
n_underruns            = 0
n_timeouts             = 0
n_aborted              = 0
n_missing_frames       = 0
n_size_mismatch_errors = 0
n_received_packets     = 52848
n_missing_packets      = 0
n_error_packets        = 0
n_ignored_packets      = 0
n_resend_requests      = 0
n_resent_packets       = 0
n_resend_ratio_reached = 0
n_resend_disabled      = 0
n_duplicated_packets   = 0
n_transferred_bytes    = 316295280
n_ignored_bytes        = 0

Running while our Pylon code is active:

$ arv-camera-test-0.8 -n Specim-FX17e-076900060266
Looking for camera 'Specim-FX17e-076900060266'
vendor name            = Specim
model name             = FX17e
device serial number   = 076900060266
image width            = 320
image height           = 224
horizontal binning     = 2
vertical binning       = 1
exposure               = 2400 ยตs
gain                   = 3 dB
payload                = 71680 bytes
gv n_stream channels   = 1
gv current channel     = 0
gv packet delay        = 0 ns
gv packet size         = 7204 bytes
373 frames/s -    26.7 MiB/s - 18 errors
370 frames/s -    26.5 MiB/s - 19 errors
340 frames/s -    24.4 MiB/s - 31 errors
431 frames/s -    30.9 MiB/s - 17 errors
350 frames/s -    25.1 MiB/s - 17 errors
386 frames/s -    27.7 MiB/s - 12 errors
373 frames/s -    26.7 MiB/s - 30 errors
375 frames/s -    26.9 MiB/s - 21 errors
372 frames/s -    26.7 MiB/s - 28 errors
^C390 frames/s -      28 MiB/s - 10 errors
n_completed_buffers    = 3761
n_failures             = 203
n_underruns            = 72
n_timeouts             = 203
n_aborted              = 0
n_missing_frames       = 4
n_size_mismatch_errors = 0
n_received_packets     = 47202
n_missing_packets      = 1564
n_error_packets        = 79
n_ignored_packets      = 72
n_resend_requests      = 190
n_resent_packets       = 71
n_resend_ratio_reached = 135
n_resend_disabled      = 79
n_duplicated_packets   = 0
n_transferred_bytes    = 281657752
n_ignored_bytes        = 381136

from aravis.

Carter12s avatar Carter12s commented on June 2, 2024

Investigating with ARV_DEBUG=stream:3,stream-thread:3 arv-camera-test-0.8 while our Basler Pylon code is running

I can find occasional entries for:

[16:41:27.799] ๐Ÿ†† stream-thread> [GvStream::check_frame_completion] Timeout for frame 1199 at dt = 121014  
[16:41:27.699] ๐Ÿ…ณ stream-thread> [GvStream::missing_packet_check] Resend request at dt = 20621, packet id = 11 (12 packets/frame)                           
[16:41:27.699] ๐Ÿ…ณ stream-thread> [GvStream::send_packet_request] frame_id = 1199 (from packet 0 to 0)                                                       
[16:41:27.699] ๐Ÿ…ธ stream-thread> [GvStream::process_packet] Error packet at dt = 20870, packet id = 0 frame id = 1199   
[16:41:26.065] ๐Ÿ…ธ stream-thread> [GvStream::missing_packet_check] Maximum number of requests reached at dt = 21940, n_packet_requests = 4 (12 packets/frame), frame_id = 545

An example of a longer run of issues:

[16:41:25.929] ๐Ÿ…ธ stream-thread> [GvStream::process_packet] Error packet at dt = 21783, packet id = 7 frame id = 491                                                                                                                                                                                        [4654/92657]
[16:41:25.931] ๐Ÿ…ณ stream-thread> [GvStream::find_frame_data] Start frame 501                                                                                                                                                                                                                                            
[16:41:25.934] ๐Ÿ…ณ stream-thread> [GvStream::find_frame_data] Start frame 502                                                                                                                                                                                                                                            
[16:41:25.936] ๐Ÿ…ณ stream-thread> [GvStream::find_frame_data] Start frame 503                                                                                                                                                                                                                                            
[16:41:25.939] ๐Ÿ…ณ stream-thread> [GvStream::find_frame_data] Start frame 504                                                                                                                                                                                                                                            
[16:41:25.941] ๐Ÿ…ณ stream-thread> [GvStream::find_frame_data] Start frame 505                                                                                                                                                                                                                                            
[16:41:25.944] ๐Ÿ…ณ stream-thread> [GvStream::find_frame_data] Start frame 506                                                                                                                                                                                                                                            
[16:41:25.946] ๐Ÿ…ณ stream-thread> [GvStream::find_frame_data] Start frame 507                                                                                                                                                                                                                                            
[16:41:25.949] ๐Ÿ…ณ stream-thread> [GvStream::find_frame_data] Start frame 508                                                                                                                                                                                                                                            
[16:41:25.950] ๐Ÿ…ณ stream-thread> [GvStream::missing_packet_check] Resend request at dt = 21545, packet id = 11 (12 packets/frame)                                                                                                                                                                                       
[16:41:25.950] ๐Ÿ…ณ stream-thread> [GvStream::send_packet_request] frame_id = 500 (from packet 2 to 4)                                                                                                                                                                                                                    
[16:41:25.950] ๐Ÿ…ธ stream-thread> [GvStream::process_packet] Error packet at dt = 21767, packet id = 2 frame id = 500                                                                                                                                                                                                    
[16:41:25.951] ๐Ÿ…ณ stream-thread> [GvStream::find_frame_data] Start frame 509                                                                                                                                                                                                                                            
[16:41:25.952] ๐Ÿ…ณ stream-thread> [GvStream::missing_packet_check] Resend request at dt = 1006, packet id = 8 (12 packets/frame)                                                                                                                                                                                         
[16:41:25.952] ๐Ÿ…ณ stream-thread> [GvStream::send_packet_request] frame_id = 509 (from packet 0 to 0)                                                                                                                                                                                                                    
[16:41:25.953] ๐Ÿ…ณ stream-thread> [GvStream::process_data_leader] Received resent packet 0 for frame 509                                                                                                                                                                                                                 
[16:41:25.954] ๐Ÿ…ณ stream-thread> [GvStream::find_frame_data] Start frame 510                                                                                                                                                                                                                                            
[16:41:25.956] ๐Ÿ…ณ stream-thread> [GvStream::find_frame_data] Start frame 511                                                                                                                                                                                                                                            
[16:41:25.957] ๐Ÿ†† stream-thread> [GvStream::check_frame_completion] Timeout for frame 466 at dt = 100642                                                                                                                                                                                                                
[16:41:25.957] ๐Ÿ…ณ stream-thread> [GvStream::close_frame] Close frame 466                                                                                                                                                                                                                                                
[16:41:25.959] ๐Ÿ…ณ stream-thread> [GvStream::find_frame_data] Start frame 512                                                                                                                                                                                                                                            
[16:41:25.961] ๐Ÿ…ณ stream-thread> [GvStream::find_frame_data] Start frame 513                                                                                                                                                                                                                                            
[16:41:25.964] ๐Ÿ…ณ stream-thread> [GvStream::find_frame_data] Start frame 514                                                                                
[16:41:25.966] ๐Ÿ…ณ stream-thread> [GvStream::find_frame_data] Start frame 515                                                                                
[16:41:25.969] ๐Ÿ…ณ stream-thread> [GvStream::find_frame_data] Start frame 516                                                                                
[16:41:25.971] ๐Ÿ…ณ stream-thread> [GvStream::find_frame_data] Start frame 517                                                                                
[16:41:25.974] ๐Ÿ…ณ stream-thread> [GvStream::find_frame_data] Start frame 518                                                                                
[16:41:25.979] ๐Ÿ†† stream-thread> [GvStream::check_frame_completion] Timeout for frame 467 at dt = 121496                                                    
[16:41:25.979] ๐Ÿ…ณ stream-thread> [GvStream::close_frame] Close frame 467                                                                                    
[16:41:25.979] ๐Ÿ†† stream-thread> [GvStream::check_frame_completion] Timeout for frame 468 at dt = 121496                                                    
[16:41:25.979] ๐Ÿ…ณ stream-thread> [GvStream::close_frame] Close frame 468                                                                                    
[16:41:25.979] ๐Ÿ†† stream-thread> [GvStream::check_frame_completion] Timeout for frame 471 at dt = 120974                                                    
[16:41:25.979] ๐Ÿ…ณ stream-thread> [GvStream::close_frame] Close frame 471                                                                                    
[16:41:25.979] ๐Ÿ…ณ stream-thread> [GvStream::check_frame_completion] Completed frame 472
[16:41:25.979] ๐Ÿ…ณ stream-thread> [GvStream::close_frame] Close frame 472
[16:41:25.979] ๐Ÿ…ณ stream-thread> [GvStream::check_frame_completion] Completed frame 473
[16:41:25.979] ๐Ÿ…ณ stream-thread> [GvStream::close_frame] Close frame 473
[16:41:25.979] ๐Ÿ†† stream-thread> [GvStream::check_frame_completion] Timeout for frame 474 at dt = 113030

From looking at these myself not know aravis too well all of this appears to be very "normal" logs and do look exactly like a system that is bottle-necking or have some communication issues. What I can't for the life of me figure out is why only our Basler Pylon code causes these to manifest and no other synthetic load I place the system under...

from aravis.

Carter12s avatar Carter12s commented on June 2, 2024

As a last hurrah to work around this issue I re-wrote our code that was using Pylon to talk to the Basler camera to use Aravis instead.

I now successfully have both cameras reliably communicating with the server using Aravis to talk to both of them.

I do believe there very likely a real compatibility issue somehow between the two libraries, but I cannot explain it.

The only thing out of numerous attempts to resolve the performance issue was to remove Pylon from running on the affected system...

from aravis.

Carter12s avatar Carter12s commented on June 2, 2024

Okay I'm going to go ahead and close this issue. I'm not really sure aravis is in anyway the problem here, but I am going to explain my findings in case anyone else stumbles across this issue themselves.

  1. We're attempting to run our Specim camera at a very high frame rate (400 Hz) which means there is a very limited window for any packet retries should a packet be dropped anywhere in the network stack.
  2. Aravis + Specim behaves perfectly when 0 packets are being dropped, but as soon as packets start being dropped none of the packet retries are successful and performance tanks.
  3. I believe this is largely due to the Specim camera likely having a single internal frame buffer, and very quickly having to clear that buffer to make way for the next frame (2.5ms at most). Different values for "initial-packet-timeout", "packet-timeout", and "frame-retention" did not seem to make a big difference here on the Aravis side.
  4. For whatever reason (still very unclear to me) our Pylon camera software causes a small percentage of packets for the aravis camera to be dropped whenever it is running regardless of the networking configuration, running the two cameras on different dedicated Ethernet cables, thread priority, whatever we tried. The inability of the Specim Camera + Aravis to successfully retry packets magnifies how detrimental it is to performance.
  5. My final setup has us talking to both the Specim camera and Basler camera using aravis, and in this setup (with identical cameras and networking setup) that small % of packet loss is not present, and both camera connections behave well.

Some remarks:

  • arv_camera_test was immensely helpful in diagnosing this situation thank you for that tool
  • The documentation around how to set "packet-timeout" and its friends is a bit lacking, and we could we use definitions of units for those fields and some convenience methods for setting them.

from aravis.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.