Comments (12)
Would you be able to provide a packet capture of (at least) the TCP part of this exchange? Preferably with Wireshark?
from pvxs.
epicsThreadGetCPUs() -> 7
Unrelated to the issue reported. What kind of system has an odd number of CPU cores/hyperthreads? Is this some kind of VM?
Yes, I am running this in VirtualBox. I intentionally assign one core fewer than I have, so that commands like make -j $(nproc) do not entirely "kill" my laptop.
I hope this Wireshark log will help.
I had CSS running with the PV formula pva://topic1 and then ran ./example/O.linux-x86_64/mailbox topic1
mailbox-topic1.zip
Maybe not relevant, but this is an error from CSS:
2021-01-12T09:19:40.304+01 SEVERE [Thread 1] org.csstudio.logging.PluginLogListener (logging) - Unhandled event loop exception
java.lang.NullPointerException
at org.diirt.support.pva.PVAChannelHandler.getProperties(PVAChannelHandler.java:314)
at org.csstudio.diag.pvmanager.probe.DetailsPanel.setChannelProperties(DetailsPanel.java:214)
at org.csstudio.diag.pvmanager.probe.DetailsPanel$1$1.run(DetailsPanel.java:194)
at org.eclipse.swt.widgets.RunnableLock.run(RunnableLock.java:40)
at org.eclipse.swt.widgets.Synchronizer.runAsyncMessages(Synchronizer.java:185)
at org.eclipse.swt.widgets.Display.runAsyncMessages(Display.java:5026)
at org.eclipse.swt.widgets.Display.readAndDispatch(Display.java:4582)
at org.eclipse.e4.ui.internal.workbench.swt.PartRenderingEngine$5.run(PartRenderingEngine.java:1173)
at org.eclipse.core.databinding.observable.Realm.runWithDefault(Realm.java:338)
at org.eclipse.e4.ui.internal.workbench.swt.PartRenderingEngine.run(PartRenderingEngine.java:1062)
at org.eclipse.e4.ui.internal.workbench.E4Workbench.createAndRunUI(E4Workbench.java:155)
at org.eclipse.ui.internal.Workbench.lambda$3(Workbench.java:644)
at org.eclipse.core.databinding.observable.Realm.runWithDefault(Realm.java:338)
at org.eclipse.ui.internal.Workbench.createAndRunWorkbench(Workbench.java:566)
at org.eclipse.ui.PlatformUI.createAndRunWorkbench(PlatformUI.java:150)
at org.csstudio.utility.product.Workbench.runWorkbench(Workbench.java:99)
at org.csstudio.startup.application.Application.startApplication(Application.java:265)
at org.csstudio.startup.application.Application.start(Application.java:119)
at org.csstudio.iter.css.product.ITERApplication.start(ITERApplication.java:120)
at org.eclipse.equinox.internal.app.EclipseAppHandle.run(EclipseAppHandle.java:203)
at org.eclipse.core.runtime.internal.adaptor.EclipseAppLauncher.runApplication(EclipseAppLauncher.java:137)
at org.eclipse.core.runtime.internal.adaptor.EclipseAppLauncher.start(EclipseAppLauncher.java:107)
at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:400)
at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:255)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at org.eclipse.equinox.launcher.Main.invokeFramework(Main.java:661)
at org.eclipse.equinox.launcher.Main.basicRun(Main.java:597)
at org.eclipse.equinox.launcher.Main.run(Main.java:1476)
at org.eclipse.equinox.launcher.Main.main(Main.java:1449)
I hope this Wireshark log will help.
It looks like you captured only the UDP (search) traffic. The relevant part is the TCP traffic. I've added a section on packet capture to the documentation. Please let me know if this is helpful (and correct).
I may have an idea of what is going wrong. Can you re-test with the master branch (at e9ce808)? If this doesn't fix the issue, I've also added some more detail to the error message which will hopefully give some further clue.
I can confirm that the issue is fixed now.
I do not know if it is related, but while running the same example described in my first post, I noticed an error in the CSS Log Messages view which pops up exactly every 60 s:
2021-01-13T12:17:29.262+01 WARNING [Thread 188] org.epics.pvaccess.impl.remote.codec.AbstractCodec (processHeader) - Invalid header received from client /10.0.2.15:59504, disconnecting...
I started capturing data a few seconds before the event and stopped about a second after the event.
Invalid header received from client.pcapng.gz
Maybe another issue should be opened for this?
I can confirm that the issue is fixed now.
Good.
Invalid header received from client /10.0.2.15:59504, disconnecting...
I think this error message is itself in error. It indicates a protocol framing error. Based on your last packet capture, and some local tests, I think the actual cause is that the server is timing out and closing the connection.
I can see an unacknowledged CMD_ECHO from the client, and ~200 µs later the server RSTs the connection. I guess this abnormal close is somehow not handled properly in pvAccessJava, and maybe junk in the RX buffer is then being processed?
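For context on what "Invalid header" means: a receiver that has lost track of message boundaries ends up decoding arbitrary bytes as if they were a message header. The following is a minimal Python illustration of such a framing check. It assumes the 8-byte header layout from the pvAccess protocol specification: magic byte 0xCA, version, flags, command, and a 32-bit payload size, with flags bit 7 selecting big-endian byte order. It also assumes CMD_ECHO = 0x02. parse_header is a hypothetical helper, not pvAccessJava's actual processHeader logic.

```python
import struct

PVA_MAGIC = 0xCA
CMD_ECHO = 0x02  # assumption: echo command code per the pvAccess protocol spec


def parse_header(buf: bytes) -> dict:
    """Decode an 8-byte pvAccess message header, or raise ValueError.

    Assumed layout: magic (1 byte), version (1), flags (1), command (1),
    payload size (4). Flags bit 7 set means the size is big-endian.
    """
    if len(buf) < 8:
        raise ValueError("short header")
    magic, version, flags, command = buf[0], buf[1], buf[2], buf[3]
    if magic != PVA_MAGIC:
        # This is the check that fails when leftover or junk bytes
        # are interpreted as the start of a new message.
        raise ValueError(f"invalid header: magic byte 0x{magic:02x}")
    order = ">" if flags & 0x80 else "<"
    (size,) = struct.unpack(order + "I", buf[4:8])
    return {
        "version": version,
        "control": bool(flags & 0x01),      # bit 0: control vs. application message
        "from_server": bool(flags & 0x40),  # bit 6: direction
        "command": command,
        "payload_size": size,
    }


if __name__ == "__main__":
    # Client-to-server echo with an empty payload; version byte is illustrative.
    echo = bytes([PVA_MAGIC, 2, 0x00, CMD_ECHO]) + struct.pack("<I", 0)
    print(parse_header(echo))
    try:
        parse_header(b"\x00" * 8)  # junk, as might be left in an RX buffer
    except ValueError as err:
        print(err)
```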
If I run export PVXS_LOG=*=DEBUG (or WARN) before starting the mailbox server, I see e.g.:
2021-01-13T09:58:59.610581953 WARN pvxs.tcp.io Client 192.168.210.1:55892 connection timeout
I don't see this every time though.
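Because the timeout warning only appears some of the time, it can help to scan a saved server log for it. The following is a minimal Python sketch. It assumes the "timestamp LEVEL logger message" line format inferred from the single sample line above. find_timeouts is a hypothetical helper, not part of PVXS.

```python
import re

# Line format inferred from one sample: <timestamp> <LEVEL> <logger> <message>
LOG_LINE = re.compile(
    r"^(?P<ts>\S+)\s+(?P<level>[A-Z]+)\s+(?P<logger>\S+)\s+(?P<msg>.*)$"
)
TIMEOUT_MSG = re.compile(r"Client (?P<peer>\S+) connection timeout")


def find_timeouts(lines):
    """Yield (timestamp, peer address) for each pvxs.tcp.io timeout warning."""
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if not m or m["logger"] != "pvxs.tcp.io":
            continue  # not a log line, or from another logger
        t = TIMEOUT_MSG.search(m["msg"])
        if t:
            yield m["ts"], t["peer"]


if __name__ == "__main__":
    import sys

    # Usage: python find_timeouts.py < mailbox.log
    for ts, peer in find_timeouts(sys.stdin):
        print(ts, peer)
```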
The long story of inactivity timeouts with pvAccessCPP is laid out in epics-base/pvAccessCPP#139. The short story is that originally C++ clients were not sending CMD_ECHO, and C++ servers would never time out. I tried to address this with epics-base/pvAccessCPP#144.
I knew that pvAccessJava clients were sending CMD_ECHO, but it looks like I misinterpreted the meaning of the timeout configuration parameter. pvAccessJava clients send an echo every 30 seconds and time out after 60 seconds, while pvAccessCPP (and now PVXS) servers time out after 30 seconds.
So with a C++ server and a Java client, there is a tight race between the client sending CMD_ECHO and the server timing out. On my laptop it seems that the client echo won often enough that I didn't notice this at the time. I do sometimes see the "Invalid header" message now, though.
I guess the only reasonable course of action is to increase the timeout in pvAccessCPP and PVXS from 30 seconds to 60, while leaving the echo interval at 15 seconds?
@kasemir fyi.
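The echo/timeout race described above can be reproduced with a toy model. In this model, the server restarts an inactivity timer on every message it receives, and the client's only traffic on an idle connection is a periodic echo. This Python sketch assumes a fixed echo period and a constant one-way delay. The function names are hypothetical, and the real timer handling in pvAccessJava, pvAccessCPP and PVXS is not necessarily structured this way.

```python
def echo_arrivals(interval_s, delay_s, until_s):
    """Arrival times at the server of periodic client echoes, each delayed by delay_s."""
    times = []
    k = 1
    while k * interval_s + delay_s <= until_s:
        times.append(k * interval_s + delay_s)
        k += 1
    return times


def server_close_time(rx_times, timeout_s, until_s):
    """Return when an inactivity-timeout server closes the connection, or None.

    The timer starts at connect (t=0) and restarts on every received message.
    A message arriving exactly at the deadline counts as too late, since at
    equality the outcome depends on which event is processed first.
    """
    last = 0.0
    for t in sorted(rx_times):
        if t > until_s:
            break
        if t - last >= timeout_s:
            return last + timeout_s
        last = t
    if until_s - last >= timeout_s:
        return last + timeout_s
    return None  # connection survived the observation window


if __name__ == "__main__":
    # Java-style client: echo every 30 s, each arriving 0.2 ms "late".
    arrivals = echo_arrivals(30.0, 0.0002, 120.0)
    for timeout in (30.0, 40.0, 60.0):
        print(timeout, server_close_time(arrivals, timeout, 120.0))
```

In this model, a 30 second server timeout closes the connection at the first echo as soon as any delay is added. A 40 or 60 second timeout leaves 10 or 30 seconds of slack against a 30 second echo interval.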
6861f03 increases the inactivity timeout to 40 seconds. A future change will make this configurable.
@mdavidsaver thanks for your quick response and detailed explanations.
With the latest commit, the "Invalid header received" warning does not show up any more.
Should we tag the latest commit with 0.1.1?
Or at least the commit which fixed the original error.
Should we tag the latest commit with 0.1.1?
Since you didn't find a third issue today, sure!
2021-01-14 18:34:59.061 SEVERE [Thread 1] org.csstudio.logging.PluginLogListener (logging) - Unhandled event loop exception
java.lang.NullPointerException
at org.epics.pvaccess.client.impl.remote.ChannelImpl.getRemoteAddress(ChannelImpl.java:558)
at org.diirt.support.pva.PVAChannelHandler.getProperties(PVAChannelHandler.java:313)
at org.csstudio.diag.pvmanager.probe.DetailsPanel.setChannelProperties(DetailsPanel.java:214)
...
Also, I was seeing, and continue to see, a log message similar to #13 (comment). So I don't think it is related to the issue with processing of CMD_GET_FIELD requests (aka Introspect). Thinking about null is what led me to 0356eee, though.
In https://github.com/mdavidsaver/pvxs/releases/tag/0.1.1