openlvc / portico

Portico is an open source, cross-platform, fully supported HLA RTI implementation. Designed with modularity and flexibility in mind, Portico is a production-grade RTI for the Simulation and Training Community, so come say hi!

Home Page: http://www.porticoproject.org

Shell 0.20% C++ 21.73% C 0.52% XSLT 1.73% Java 68.70% Perl 0.07% Python 0.02% NSIS 6.72% Batchfile 0.30%


portico's Issues

Refactor the way message handling plugins are loaded

Summary

Back in the day, the original idea was that Portico would be extensible and would support research by allowing whole sections to be replaced (say in the pursuit of testing a new time management algorithm). There are two problems in the reality of this:

It's not that easy to split off the internals. Even if you can replace the classes that handle the messages there are still interdependencies between services that make it practically unachievable.
No fool would ever want to do this

I'll blame the naivety and optimism of youth for ignoring the basic tenets of a Minimum Viable Product.

Given that this will never be used, the dynamic loading of plugins should be removed. It does cause some problems as the classpath has to be set just right, and we do see reports of people complaining that they just get RTIinternalErrors referencing "Null response for message..." which is code for: there is no handler (because it hasn't loaded).

A class somewhere that manually creates all the links would be much simpler, clearer and more robust, sacrificing only a feature that is never used.

Acceptance Criteria

Once complete:

The plugins shall no longer be dynamically loaded from the filesystem
The classpath searching shall be removed
Message handlers will be statically created and linked into the framework
- See org.portico.container.Container:297
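
A minimal sketch of what static wiring could look like. `MessageHandler`, `HandlerRegistry` and the message-type strings are hypothetical names for illustration only, not Portico's actual classes. The point is just that a missing handler fails loudly at lookup time rather than surfacing later as a null response.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical handler contract; Portico's real handler interface differs.
interface MessageHandler {
    String process(String payload);
}

// Hypothetical registry: every handler is created and linked in one place,
// with no classpath scanning or reflection.
final class HandlerRegistry {
    private final Map<String, MessageHandler> handlers = new HashMap<>();

    HandlerRegistry() {
        // Static wiring: adding a handler means adding one line here.
        register("CreateFederation", payload -> "created:" + payload);
        register("JoinFederation", payload -> "joined:" + payload);
    }

    private void register(String messageType, MessageHandler handler) {
        if (handlers.putIfAbsent(messageType, handler) != null) {
            throw new IllegalStateException("Duplicate handler for " + messageType);
        }
    }

    MessageHandler handlerFor(String messageType) {
        MessageHandler handler = handlers.get(messageType);
        if (handler == null) {
            // Fail with a clear message instead of returning a null response.
            throw new IllegalArgumentException("No handler registered for " + messageType);
        }
        return handler;
    }
}
```

With everything linked in one constructor, a forgotten handler shows up as a descriptive exception rather than depending on how the classpath happened to be set.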

Add tick processing logging to LRC

Story

{As a} Portico developer,
{I want} to see how many messages were processed each tick cycle,
{So that I can} debug strange problems with message releasing

Context

Working on the WAN test federate I have been trying to debug strange tick() problems and figure out if messages are hitting remote federates and being thrown away, or if they are not making it there at all.

I put some debugging print statements in around the tick processing that listed how many messages were in the queue at the start and end of the tick cycle, and how many were processed during the cycle. This was really handy information.

It would be nice to be able to see this information (if desired) in the Portico log file. It is most powerful when it is the only information you see, so we may need some way to filter out the log output around it. Overall, it was useful information and we should find a way to get it back into the distribution.
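
As a rough illustration of the counters described above, here is a small sketch. `TickStats` is a hypothetical stand-in for the tick loop, not the actual LRC code, and the summary format is an assumption.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical stand-in for the LRC tick loop; not Portico's actual class.
final class TickStats {
    private final Queue<String> incoming = new ArrayDeque<>();

    void offer(String message) {
        incoming.add(message);
    }

    // Processes at most maxMessages and returns a one-line summary of the
    // queue size at the start, the number processed, and the size at the end.
    String tick(int maxMessages) {
        int atStart = incoming.size();
        int processed = 0;
        while (processed < maxMessages && incoming.poll() != null) {
            processed++;
        }
        return String.format("tick: queued(start)=%d processed=%d queued(end)=%d",
                             atStart, processed, incoming.size());
    }
}
```

Writing the summary line to a logger of its own would be one way to show it in isolation, since that logger could be enabled while the rest stay quiet.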

Acceptance Criteria

Once complete, I shall be able to:

MomManager not able to cope with 1516e-style names

The MomManager is responsible for handling Portico MOM related requests (defined in the MIM when talking in 1516e terminology).

In HLA v1.3 this part of the FOM was standardized with a certain set of names for classes, attributes, etc... In 1516 (and 1516e) the standard group changed all the names. On federation creation, Portico is now throwing NPE's when it attempts to register an instance of Manager.Federation. The problem is that this class doesn't exist in the standard 1516e MIM, but HLAmanager.HLAfederation does. So, fetching by the v1.3 name returns null and Portico then tries to use that to instantiate from, hence the NPE.

I thought I had fixed this in the 1516 interface with a name translator. Regardless, it isn't happening in 1516e yet and this needs to be fixed.

Glance estimate of 2h to fix and write a couple of unit tests. It'll get fixed pretty easy, but tests are required to validate it for future regression testing.
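
A sketch of the kind of name translator mentioned above. `MomNameTranslator` is a hypothetical name, and only the `Manager.Federation` mapping is taken from this issue; the rest of the table would have to be filled in from the MIM.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical translator; only the Manager.Federation entry comes from the
// issue text. Other MOM names are deliberately left out.
final class MomNameTranslator {
    private static final Map<String, String> HLA13_TO_1516E = new HashMap<>();
    static {
        HLA13_TO_1516E.put("Manager.Federation", "HLAmanager.HLAfederation");
    }

    // Returns the 1516e name for a v1.3 MOM class name, or the input unchanged
    // when no translation is known.
    static String toIeee1516e(String hla13Name) {
        return HLA13_TO_1516E.getOrDefault(hla13Name, hla13Name);
    }
}
```

A regression test would then assert that the translated name resolves to a non-null class in the 1516e model before any instance is registered.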

Portico logging to Log4j 2.x

Summary

He has risen! Log4j 2.x is back from the dead and now getting quite a bit of development activity. Time to jump. I am sure this will cause problems I'm not even aware of yet.

Note: Check whether we bother with #80 if this is done first.

Changelog as text or PDF in Windows distribution

Story

{As a} Windows user,
{I want} Changelog I can read in Notepad,
{So that I can} figure out what has changed

Context

The current changelog is a markdown text file. Markdown rocks - it's easy to read in its native text form and you can generate many more awesome formats (like PDF) from it.

Problem is this: the line endings are all unix, and the extension is .md. On the Mac or Linux this isn't as much of a problem as people can just hit it in Vim or the like - users on these platforms tend to be down with that.

On Windows however, you can't easily open the file because of the extension, and if you do fire it up in Notepad it's all weird because of the line endings anyway.

For Windows packages we should either:

Change the extension to .txt and convert line endings during the build -or-
Convert the doc to something like a PDF or webpage during the build and include both in the distribution. Whether it can be easily converted as part of the build will largely dictate which option we take. An HTML page is probably the most portable, although a PDF is pretty good as well, but the simplicity of just publishing it as a text file is also appealing.
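
For the first option, the line-ending conversion itself is small. A sketch, assuming the changelog is read into a string during the build (`LineEndings` is a hypothetical helper, not part of the Portico build):

```java
// Hypothetical build helper; not part of the Portico build scripts.
final class LineEndings {
    // Normalises LF and CRLF line endings to CRLF so that Notepad renders the
    // line breaks. Lone CR characters (old Mac style) are left untouched.
    static String toCrlf(String text) {
        return text.replace("\r\n", "\n").replace("\n", "\r\n");
    }
}
```

Collapsing CRLF to LF first keeps the conversion safe to run on a file that has already been converted.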

Acceptance Criteria

All distributions will include a PDF or HTML version of the Change Log (generated as part of the build)

Audit log should print current date in its kick-off header section

Summary

I found myself today wanting to know when I recorded a particular audit log. Having the log print the current date in the header (current header below) would be useful:

Starting Audit log for federate [LVCGame] in federation [LVCGameRPR2]
Portico 2.0.1 (build 0) - JGroups 3.2.0.Beta1
Active Filters:
     direction: []
       message: []
       fomtype: []

Acceptance Criteria

Add current date to the audit log start header
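
A sketch of the header with a date line added. `AuditHeader`, the `Recorded:` label and the timestamp pattern are assumptions for illustration; the first line mirrors the current header shown above.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper; the real audit header is built elsewhere in Portico.
final class AuditHeader {
    // Assumed timestamp format; any unambiguous pattern would do.
    private static final DateTimeFormatter STAMP =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // The clock value is passed in so that the output is testable.
    static String build(String federate, String federation, LocalDateTime now) {
        return "Starting Audit log for federate [" + federate + "] in federation ["
             + federation + "]\n"
             + "Recorded: " + STAMP.format(now) + "\n";
    }
}
```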

Test Issue

Just playing around with GitHub issues to see what is going on.

Portico internal Object Model representation needs refactor to become clearer and more maintainable and better support ieee1516e.

Summary

In IEEE-1516e the ability to merge FOMs was added. This has shown up a number of flakey aspects in the way the ObjectModel in-memory hierarchy is implemented that really need to be addressed. They're causing all sorts of subtle bugs and it is not a clean or easy to understand process when you factor in the runtime modification of models.

The ObjectModel type is implemented with two stores for efficiency. The first is an object graph, starting at object root. Inserting an object into this graph is easy enough. However, lookups based on name or handle (which happen frequently as part of the interaction and reflection semantics of the RTI) are slow when using the graph. As such, a second cache of object classes is maintained in a map: <Handle,OCMetadata>. When doing lookups, this map is used.

Folding into this, handles are assigned sequentially when a FOM is parsed. This means that multiple parsings of different files cause handle clashes (two classes with the same handle).

Problems caused and others observed:

FOMs can be locked to prevent modification (temporarily unlocked for merges). Locking prevents adding to the model, but it doesn't prevent object hierarchy modification (you can add children to a class, change parents etc... any time). This causes subtle bugs where it looks like a class is in the model, but it's only in the hierarchy and not in the lookup cache.
Adding classes to a locked model fails silently (no logging as it's part of the model, not the controller piece and thus just basic POJO data).
Handles assigned on parse. As mentioned above, this causes clashes. The optimization semantics should be disconnected entirely from the parsing and object model construction and done as a final pre-execution step.
Code gymnastics required to add a class to a model (have to insert it into both the hierarchy and the cache).
- Good code should be clear and not require any non-obvious steps. The cache is a non-obvious step.
The data captured by the object model is quite thin.
- In HLA v1.3 there wasn't much in the FOM we cared about. Now that there are rules around what can and can't be merged, additional data that is available in later FOM formats is needed. For now Portico just ignores these constraints, which probably has little practical impact, but isn't standards compliant and really should be fixed.

All in all, the object model system needs a bit of a rethink and refactor. This will touch the FOM parsers as well as the connections and their federation creation and joining processes. A full unit-test suite is also needed. Could easily get lost in here for a while so have to be careful.
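
To make the direction concrete, here is a heavily simplified sketch of a model with a single entry point for adding classes, a lock that fails loudly, and handle assignment deferred to a final step. All names (`SimpleObjectModel`, `ClassNode`, `assignHandles`) are hypothetical and do not match Portico's types; class names are treated as simple unique strings, which the real model would not do.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, heavily simplified model; names do not match Portico's types.
final class SimpleObjectModel {
    static final class ClassNode {
        final String name;
        final ClassNode parent;
        final List<ClassNode> children = new ArrayList<>();
        int handle = -1; // assigned by assignHandles(), not at parse time
        ClassNode(String name, ClassNode parent) { this.name = name; this.parent = parent; }
    }

    private final ClassNode root = new ClassNode("ObjectRoot", null);
    private final Map<String, ClassNode> byName = new HashMap<>();
    private final Map<Integer, ClassNode> byHandle = new HashMap<>();
    private boolean locked = false;

    SimpleObjectModel() { byName.put(root.name, root); }

    ClassNode root() { return root; }
    void lock() { locked = true; }

    // Single entry point: updates the hierarchy and the name index together,
    // and throws (rather than failing silently) when the model is locked.
    ClassNode addClass(String name, ClassNode parent) {
        if (locked) throw new IllegalStateException("Model is locked: cannot add " + name);
        if (byName.containsKey(name)) throw new IllegalArgumentException("Duplicate class " + name);
        ClassNode node = new ClassNode(name, parent);
        parent.children.add(node);
        byName.put(name, node);
        return node;
    }

    // Final pre-execution step: walk the complete (possibly merged) hierarchy
    // breadth-first and assign handles, so separate parses cannot clash.
    void assignHandles() {
        byHandle.clear();
        int next = 1;
        List<ClassNode> pending = new ArrayList<>();
        pending.add(root);
        for (int i = 0; i < pending.size(); i++) {
            ClassNode node = pending.get(i);
            node.handle = next++;
            byHandle.put(node.handle, node);
            pending.addAll(node.children);
        }
    }

    ClassNode getByHandle(int handle) { return byHandle.get(handle); }
}
```

Because callers never touch the cache directly, a class cannot end up in the hierarchy without also being in the lookup index.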

Local handling of looping back own messages for performance improvement

Summary

Portico requires that a number of messages be looped back to the input queue for processing. This is done so that some processing logic can be put all in one place.

Message handling logic has to be present for incoming handlers so that all federates can respond to messages they receive appropriately, so it makes sense in some circumstances to defer processing until we get the message back from ourselves. This means we don't have to duplicate logic in the outgoing handlers as well.

However, reflections and interactions (the bulk of traffic during a federation) are not included in the types of messages that benefit from this. The net effect being that we are pushing all these messages through both the JGroups and Portico stacks just so we can simplify the sending rule and say "we must loop back all messages".

There are three potential improvements I can think of right now:

Special Header: Put a special, 1-byte header on each message sent to show whether it requires local processing or not. This would help prevent us processing unnecessary loopback messages through the Portico stack. This header would be checked before we unpack the message, thus also saving the amount of time required to inflate it back into a PorticoMessage object only to learn we don't care about it.
Manual Loopback: Disable loopback in the networking stack and manually insert messages into the incoming queue ourselves. Even though most OS networking stacks will short-path messages for the local host, they still have to go through to the OS level.
Loopback Prevention: Don't loop back attributes/interactions at all (requires us to handle looping back ourselves as a minimum). This would prevent them from hitting the network or Portico stack at all.

These measures can help deliver performance improvements by reducing both the amount of traffic coming up the network stack, and the amount of time we spend processing messages that we're ultimately going to throw away.
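
The Special Header idea could be sketched roughly as follows. The framing, the flag values and the `LoopbackHeader` name are assumptions for illustration; Portico's actual wire format is not shown here.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical wire framing; Portico's actual message format is not shown here.
final class LoopbackHeader {
    static final byte NO_LOCAL_PROCESSING = 0;
    static final byte NEEDS_LOCAL_PROCESSING = 1;

    // Prepends the 1-byte flag to an already-serialised message body.
    static byte[] frame(boolean needsLocalProcessing, byte[] body) {
        return ByteBuffer.allocate(1 + body.length)
                         .put(needsLocalProcessing ? NEEDS_LOCAL_PROCESSING : NO_LOCAL_PROCESSING)
                         .put(body)
                         .array();
    }

    // Cheap check made before inflating the body: messages we sent ourselves
    // that do not need local processing can be dropped on receipt.
    static boolean shouldProcess(byte[] frame, boolean sentByUs) {
        if (frame.length == 0) throw new IllegalArgumentException("Empty frame");
        return !sentByUs || frame[0] == NEEDS_LOCAL_PROCESSING;
    }

    // Strips the header so the body can be deserialised as before.
    static byte[] body(byte[] frame) {
        return Arrays.copyOfRange(frame, 1, frame.length);
    }
}
```

Only the first byte is inspected on receipt, so a looped-back reflection or interaction is discarded without paying the cost of deserialisation.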

With regard to approach, we should first build each of these in order and then benchmark at each step. Each one increasingly adds additional complexity and we may find we hit the law of diminishing returns.

Acceptance Criteria

Implement each of the Special Header, Manual Loopback and Loopback Prevention approaches successively after validating each with benchmarking
If there is no appreciable gain across Windows, Linux or the Mac, stop!
Benchmark for both Java and C++ (and 1.3, 1516e if possible).

Add support for optimistic time management

The current Portico time management implementation does not provide support for optimistic time management. Although you can call the RTIambassador methods, no valid retraction handle is returned and retract() is not supported.

Time things are always fiddly. Schedule some time to add a solid set of tests and test across multiple communications bindings and platforms.

reflectAttributeValue delivered at incorrect timestep

Summary

Originally reported by Peter Ross on the old JIRA instance (PORT-143)

When interfaced to a time stepped federate, Portico will sometimes deliver reflectAttributeValue() callbacks at the incorrect timestep. This occurs when the sending and receiving federates do not commence their execution at the same wallclock time.

The problem is not apparent in the Portico example program (portico-2.0.0/examples/java/{hla13,ieee1516e}) because it uses a synchronisation point to ensure all federates commence execution at approximately the same time.

Reproduction Steps

Modify the example program (Example13Federate.java or ExampleFederate.java) to remove the "ReadyToRun" synchronisation point. Additionally remove sendInteraction and updateAttributeValues(RO) calls, and add some sleep to make the problem easier to witness.

Modified file and diff enclosed: https://gist.github.com/timpokorny/f8a11d1fbfa9f503ad83
Start two example federates in separate terminals. Press ENTER in the first federate window to start its main loop, then wait until it has advanced (say T=~10), and then press ENTER in the second federate window. The ENTER pressing sequence simulates a late joining federate.
Examine the program output for occurrences where a federate is delivered more than one attribute reflection per timestep. (Remembering that example program calls updateAttributeValues(TSO) only once per timestep). Examples of this below for both HLA13 and IEEE1516e examples.

====HLA13 Example Output====
ExampleFederate   : Joined Federation as B
ExampleFederate   :  >>>>>>>>>> Press Enter to Continue <<<<<<<<<<

ExampleFederate   : Time Policy Enabled
ExampleFederate   : Published and Subscribed
ExampleFederate   : Registered Object, handle=2097153
ExampleFederate   : Time Advanced to 1.0
ExampleFederate   : Time Advanced to 2.0
ExampleFederate   : Time Advanced to 3.0
ExampleFederate   : Time Advanced to 4.0
ExampleFederate   : Time Advanced to 5.0
ExampleFederate   : Time Advanced to 6.0
ExampleFederate   : Time Advanced to 7.0
ExampleFederate   : Time Advanced to 8.0
ExampleFederate   : Time Advanced to 9.0
FederateAmbassador: Discoverd Object: handle=2, classHandle=508, name=HLA2
FederateAmbassador: Reflection for object: handle=2, tag=138984811504, time=9.0, attributeCount=3
      attributeHandle=509, attributeValue=aa:9.0
      attributeHandle=510, attributeValue=ab:9.0
      attributeHandle=511, attributeValue=ac:9.0

FederateAmbassador: Reflection for object: handle=2, tag=138984811515, time=10.0, attributeCount=3
      attributeHandle=509, attributeValue=aa:10.0
      attributeHandle=510, attributeValue=ab:10.0
      attributeHandle=511, attributeValue=ac:10.0

ExampleFederate   : Time Advanced to 10.0
FederateAmbassador: Reflection for object: handle=2, tag=138984811525, time=11.0, attributeCount=3
      attributeHandle=509, attributeValue=aa:11.0
      attributeHandle=510, attributeValue=ab:11.0
      attributeHandle=511, attributeValue=ac:11.0

ExampleFederate   : Time Advanced to 11.0
FederateAmbassador: Reflection for object: handle=2, tag=138984811540, time=12.0, attributeCount=3    
      attributeHandle=509, attributeValue=aa:12.0
      attributeHandle=510, attributeValue=ab:12.0
      attributeHandle=511, attributeValue=ac:12.0

ExampleFederate   : Time Advanced to 12.0
FederateAmbassador: Reflection for object: handle=2, tag=138984811550, time=13.0, attributeCount=3
      attributeHandle=509, attributeValue=aa:13.0
      attributeHandle=510, attributeValue=ab:13.0
      attributeHandle=511, attributeValue=ac:13.0
====

====IEEE 1516e Example Output====
ExampleFederate   : Joined Federation as A
ExampleFederate   :  >>>>>>>>>> Press Enter to Continue <<<<<<<<<<

ExampleFederate   : Time Policy Enabled
ExampleFederate   : Published and Subscribed
ExampleFederate   : Registered Object, handle=2097153
ExampleFederate   : Time Advanced to 1.0
ExampleFederate   : Time Advanced to 2.0
ExampleFederate   : Time Advanced to 3.0
ExampleFederate   : Time Advanced to 4.0
ExampleFederate   : Time Advanced to 5.0
ExampleFederate   : Time Advanced to 6.0
ExampleFederate   : Time Advanced to 7.0
ExampleFederate   : Time Advanced to 8.0
ExampleFederate   : Time Advanced to 9.0
FederateAmbassador: Discoverd Object: handle=2, classHandle=793, name=HLA2
FederateAmbassador: Reflection for object: handle=2, tag=(timestamp) 1389849601340, time=9.0, attributeCount=2
      attributeHandle=774 (NumberCups), attributeValue=8
      attributeHandle=794 (Flavor)    , attributeValue=Cola

FederateAmbassador: Reflection for object: handle=2, tag=(timestamp) 1389849601441, time=10.0, attributeCount=2
      attributeHandle=774 (NumberCups), attributeValue=9
      attributeHandle=794 (Flavor)    , attributeValue=Cola

ExampleFederate   : Time Advanced to 10.0
FederateAmbassador: Reflection for object: handle=2, tag=(timestamp) 1389849601542, time=11.0, attributeCount=2
      attributeHandle=774 (NumberCups), attributeValue=10
      attributeHandle=794 (Flavor)    , attributeValue=Cola

ExampleFederate   : Time Advanced to 11.0
FederateAmbassador: Reflection for object: handle=2, tag=(timestamp) 1389849601696, time=12.0, attributeCount=2
      attributeHandle=774 (NumberCups), attributeValue=11
      attributeHandle=794 (Flavor)    , attributeValue=Cola
====

Add Summary-Only option to Audit Mode

Description

The current audit mode prints details for every message that passes through the RTI. This can be enormous. When you're only looking for coarse information you often just want a summary. Extend the audit mode to include an option that only prints the summary table.

Acceptance Criteria

When complete, I shall be able to:

Set an option in the RID file to only print an audit summary
When set, this will print just the header and summary table, not the individual message lines for each sent/received message
This will not be the default

Portico C++ example federates don't compile on Mavericks

Summary

In 2.0.1 we fixed the C++ interface for Mavericks so that everything was cool after Apple LLVM switched over to default to Clang.

Life was good and there was happiness, until I tried to run the example federate... Can't win them all.

The ./macos.sh compile for the 1516e interface currently fails with the output below. I suspect this is just because the same changes that were made to get the interfaces compiling still have to be rolled into the examples. Will it never end!?

[tim@zapp:ieee1516e (maintenance-2.0.x)]$ ./macos.sh compile
RTI_HOME environment variable is set to /Users/tim/Developer/workspace/opensource/portico/codebase/dist/portico-2.0.1
compiling example federate
ExampleCPPFederate.cpp:54:25: error: implicit instantiation of undefined template 'std::auto_ptr<rti1516e::RTIambassador>'
        this->rtiamb = factory.createRTIambassador().release();
                               ^
/Users/tim/Developer/workspace/opensource/portico/codebase/dist/portico-2.0.1/include/ieee1516e/RTI/RTIambassadorFactory.h:24:29: note: template is declared here
   template <class T> class auto_ptr;
                            ^
ExampleCPPFederate.cpp:269:31: error: implicit instantiation of undefined template 'std::auto_ptr<rti1516e::HLAfloat64Interval>'
        auto_ptr<HLAfloat64Interval> interval( new HLAfloat64Interval(lookahead) );
                                     ^
/Users/tim/Developer/workspace/opensource/portico/codebase/dist/portico-2.0.1/include/ieee1516e/RTI/RTIambassadorFactory.h:24:29: note: template is declared here
   template <class T> class auto_ptr;
                            ^
ExampleCPPFederate.cpp:387:27: error: implicit instantiation of undefined template 'std::auto_ptr<rti1516e::HLAfloat64Time>'
        auto_ptr<HLAfloat64Time> time( new HLAfloat64Time(fedamb->federateTime+
                                 ^
/Users/tim/Developer/workspace/opensource/portico/codebase/dist/portico-2.0.1/include/ieee1516e/RTI/RTIambassadorFactory.h:24:29: note: template is declared here
   template <class T> class auto_ptr;
                            ^
ExampleCPPFederate.cpp:424:27: error: implicit instantiation of undefined template 'std::auto_ptr<rti1516e::HLAfloat64Time>'
        auto_ptr<HLAfloat64Time> time( new HLAfloat64Time(fedamb->federateTime+
                                 ^
/Users/tim/Developer/workspace/opensource/portico/codebase/dist/portico-2.0.1/include/ieee1516e/RTI/RTIambassadorFactory.h:24:29: note: template is declared here
   template <class T> class auto_ptr;
                            ^
ExampleCPPFederate.cpp:438:27: error: implicit instantiation of undefined template 'std::auto_ptr<rti1516e::HLAfloat64Time>'
        auto_ptr<HLAfloat64Time> newTime( new HLAfloat64Time(fedamb->federateTime+timestep) );
                                 ^
/Users/tim/Developer/workspace/opensource/portico/codebase/dist/portico-2.0.1/include/ieee1516e/RTI/RTIambassadorFactory.h:24:29: note: template is declared here
   template <class T> class auto_ptr;
                            ^
5 errors generated.

Environment and Logs

Mac OS X Mavericks
C++ 1516e and HLA v1.3 example federates

Reproduction Steps

Build a sandbox
Navigate to dist/portico-2.0.1/examples/cpp/ieee1516e
Run ./macos.sh compile and wait for errors

Portico cannot keep up with object create messages in high-traffic environment

Summary

Stack traces are being reported from a user who is registering a lot of objects (and doing it quickly).

When registering more than around 10,000 object instances the following error message appears:

ERROR [AWT-EventQueue-0] portico.lrc: org.portico.lrc.compat.JRTIinternalError: Problem sending message: channel=SEMSim, error message=Task org.jgroups.protocols.TP$5@2e95db03 rejected from java.util.concurrent.ThreadPoolExecutor@45d6ff96[Running, pool size = 10, active threads = 1, queued tasks = 1000, completed tasks = 11782]

It seems that items of the internal queue with the size 1000 cannot be polled fast enough. Registering new object instances is faster than processing items of this queue making it grow until around 10,000 items (11,782 in this example but this number varies in each execution).

We helped ourselves in just letting our pushing thread wait for a couple of milliseconds and retry registering object instances. This way we were able to register around 200,000 object instances. But it seems that there is no way of configuring the queue size in the RTI.rid file. Is there any other workaround for this problem, maybe increasing the queue size?

via email from David Ciechanowicz, TUM Create, testing with Portico on large traffic simulator
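
The wait-and-retry workaround described in the report could look roughly like this on the federate side. `RetryingSender` is a hypothetical helper, and catching every `Exception` is a simplification; real code would retry only on the specific RTI internal error.

```java
import java.util.concurrent.Callable;

// Hypothetical federate-side helper; not part of Portico.
final class RetryingSender {
    // Retries the action, pausing a little longer after each failed attempt
    // (pauseMillis, then 2 * pauseMillis, and so on).
    static <T> T withRetry(Callable<T> action, int maxAttempts, long pauseMillis) throws Exception {
        if (maxAttempts < 1) throw new IllegalArgumentException("maxAttempts must be >= 1");
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                last = e;
                // Give the sender's queue time to drain before trying again.
                if (attempt < maxAttempts) Thread.sleep(pauseMillis * attempt);
            }
        }
        throw last;
    }
}
```

This only treats the symptom. The underlying limit is the bounded task queue shown in the error message (queued tasks = 1000), which is what would need to be made configurable.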

Environment and Logs

Portico v2.0.0
Windows 7 SP1 64-bit w/Java 1.8

Reproduction Steps

Create a truck-load of objects quickly
Watch the logs for the crashy

WAN Support for Clusters of Federates

Story

{As a} federate operator,
{I want} to be able to connect to a federation over a WAN,
{So that I can} participate in exercises or experiments that are not running on my local network

Context

As part of #45, enable support for clusters of federates connected over a WAN.

This will cover deployment scenarios where one or more federates running on a local network will be able to connect with clusters of one or more federates running on separate networks:

Acceptance Criteria

Shall be able to run a single federation across a WAN with some federates running on a local LAN and others over a WAN
Shall support clustering networks with groups of federates where there is a single information relay point for local federates rather than having all federates connect as if they were remote, even if some of them are on the same network.

No tests exist for ObjectModel class and associated types

Summary

No unit tests currently exist for ObjectModel or any of the other types in org.portico.lrc.model. This is bad. Slap.

Fix.

Quite a few here to write tests for. Largely straightforward, just time consuming.

Acceptance Criteria

Have tests. Duh.

Portico WAN Support

Story

{As a} federate operator,
{I want} to be able to connect to a federation over a WAN,
{So that I can} participate in exercises or experiments that are not running on my local network

Context

Portico's current communications facilities use multicast to enable the efficient delivery of messages between large groups of federates. As multicast comms are blocked at the edge of the network, this has meant that for a federate to participate, it has to be on the same local network as the rest of the federation.

It has become increasingly common to run federations across geographically distributed areas, allowing more people to participate in an experiment or exercise without having to physically be present in the same location.

To support users wishing to run federations in this type of environment, enhancements to Portico are required to enable connection and communication across a WAN. This includes both single federates connecting into a central network, as well as enabling interoperation between clusters of federates located on different networks, where some federates are local while others are remote.

Design Considerations

Supporting federations across a WAN involves considering a number of different topologies. From 1-1 federations, to those with a single external connection (1-Many), to those that involve clusters of federates running at various sites (Many-to-Many).

One approach is to put everyone on a VPN and rely on the lower level network facilities to provide the connection. Unfortunately this does require a significant amount of network-level configuration and is not something that would typically be supported by IT administration due to the blanket network access it provides.

A simple, native Portico facility is required. This solution must deal appropriately with Firewall and NAT Traversal issues, and be capable of supporting the deployment types described above.

Requirements

The following high-level requirements are needed to complete this work:

Solution must support Firewall and NAT Traversal
Solution must enable Point-to-Point connections
- A single federate joining a federation running on a LAN with one or more participants
Solution must enable Clustered Connections
- Two networks connected which each have multiple local federates
Solution must enable Hybrid Federations
- A combination of the two modes above - with single federates and networks of federates joining a single federation
Solution must work across Java and C++ Interfaces
- Initial testing on 1516e federation

Acceptance Criteria

Once complete, I shall be able to:

Connect a single federate into a federation running on another network
Connect multiple federates running on a local network with a federation running on another separate local network
Create hybrid federations where some federates are the only ones running on their local network and other local networks have many active federates
Deploy the above facilities in environments that use Firewalls and NATs
Deploy this facility whether using the Java or C++ interface

Stories and Tasks

#44 WAN Support for Single Federates
#46 WAN Support for Clustered Federations
#47 WAN Test Federate

Distribution build without log4j

Story

{As a} java federate author,
{I want} a version of Portico that doesn't package log4j,
{So that I can} avoid conflicts with the version I'm trying to use!

Context

I've had a couple of requests for this now. As Portico ships in a single jar with all its dependencies, it can create conflicts. Given that Log4j is quite popular, this is where we see people having problems.

The one-jar approach simplifies things significantly, so I am hesitant to abandon it, however perhaps a build target or switch that lets it build without log4j would be possible for those who are comfortable hand-rolling their own distributions.

Acceptance Criteria

Build target or switch added to the Portico build scripts that allow the jar file to be built without log4j support
Documentation on the website about creating the Portico build

Move license from CDDL to Apache Software License

Story

{As a} Portico user,
{I want} Portico licensed under the Apache Software License,
{So that I can} have stronger, less ambiguous patent protections than what is available in the CDDL

Context

Over the last 10 years the CDDL has slowly drifted into relative obscurity. It was originally chosen because it was permissive and required anyone distributing Portico with changes to release those back to the community.

The reality is that the second concern is not a significant one, meaning the only advantage is its permissive nature. There are better licenses out there that provide some stronger, clearer provisions around patents and the like. The Apache Software License v2.0 is the most common, and most widely accepted.

For these reasons, we shall switch the license for Portico to the ASL.

Acceptance Criteria

Once complete:

All headers shall be updated to reference the ASL
License artefacts in the system shall be updated to reference the ASL
Documentation shall be updated to reference the ASL

NPE when Portico attempts to callback for failed sync point registration

Summary

As registered by Jeremy Coulon on the Portico Users mailing list

When attempting to run two copies of the example federate, the second one prints out an exception after pressing Enter the first time

java.lang.NullPointerException
    at org.portico.impl.cpp1516e.ProxyFederateAmbassador.synchronizationPointRegistrationFailed(ProxyFederateAmbassador.java:170)
    at org.portico.impl.hla1516e.handlers.SyncRegResultCallbackHandler.process(SyncRegResultCallbackHandler.java:70)
    at org.portico.utils.messaging.MessageSink.process(MessageSink.java:187)
    at org.portico.lrc.LRC.tickProcess(LRC.java:678)
    at org.portico.lrc.LRC.tick(LRC.java:547)
    at org.portico.impl.hla1516e.Impl1516eHelper.evokeMultiple(Impl1516eHelper.java:157)
    at org.portico.impl.hla1516e.Rti1516eAmbassador.evokeMultipleCallbacks(Rti1516eAmbassador.java:5402)
    at org.portico.impl.cpp1516e.ProxyRtiAmbassador.evokeMultipleCallbacks(ProxyRtiAmbassador.java:2127)

The second federate is attempting to register the already registered sync point and failing (as it should), but the callback is causing an NPE for some reason. Perhaps the sync point failure reason is null (we call reason.name() on it, which would cause this if it were null).
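
If the null reason turns out to be the cause, a guard along these lines would stop the NPE while the source of the null is tracked down. `SyncFailureReason` and `SyncCallbacks` are hypothetical stand-ins, not the 1516e or Portico types, and the `UNKNOWN` fallback is an assumption.

```java
// Hypothetical enum standing in for the 1516e failure-reason type.
enum SyncFailureReason { LABEL_NOT_UNIQUE, SET_MEMBER_NOT_JOINED }

// Hypothetical helper for the Java-to-C++ callback proxy.
final class SyncCallbacks {
    // Maps a possibly-null reason to a name that is safe to pass across the
    // Java/C++ boundary instead of dereferencing it directly.
    static String reasonName(SyncFailureReason reason) {
        return reason == null ? "UNKNOWN" : reason.name();
    }
}
```

The guard hides the symptom only; the handler that builds the callback should still be checked to see why no reason is being set.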

What I'm trying to do

Run two copies of the example federate.

Steps to Reproduce

Steps to reproduce:

Win32-vc10.bat compile
Win32-vc10.bat execute fed1
Win32-vc10.bat execute fed2
Press Enter on 1st federate
Press Enter on 2nd federate

Note that the behaviour changes depending on whether I press Enter first on fed1 or fed2.
One of the federates also crashes randomly when closing.

Move C++/Java interface link code out of its own package and into a sub-package of the main HLA-interface packages

Summary

Currently, the packages for HLA implementation code look like this:

cpp13
cpp1516e
hla13
hla1516
hla1516e

Splitting out the C++ code into separate packages just seems a bit silly now that we've got more than one. The C++ link stuff should be aligned with all the other code for the HLA-interface implementation. Move it so that it now looks like this:

hla13
hla13/cpp
ieee1516 (no 1516 C++ impl)
ieee1516e
ieee1516e/cpp

Will use the opportunity to do a bit of a rename as well :) Build and test on all platforms at the end to make sure everything is good. Should do this as a quick task as it will touch a lot of things and best not to do it while anyone else is working as well.

Sporadic exceptions when federates are resigning

Summary

While building the WAN test federate to validate some behaviour for the site-connector functionality currently being added, I have noticed that exceptions are sporadically encountered during the resign process.

From the exceptions, it looks to be happening when a remote federate receives a resignation (or some other message close to that event in the test federate code), not anything on the active federate end. However, I'm not certain of that.

These exceptions are always the same basic problem: Error processing received message: null. Perhaps there is a race condition happening and by the time a receiving federate attempts to process the message, the subject is gone and there is no null check. Or, it could be that a resigning federate has its infrastructure reset and then before it has a chance to shut down its receivers it gets a message it is not prepared to handle at that point in time.

Further investigation required. Two example stack traces included below:

INFO  [main] wantest: Resigned from Federation
ERROR [Incoming,WAN Test Federation,zapp-10678] portico.lrc.jgroups: Error processing received message: null
java.lang.NullPointerException
    at org.portico.bindings.jgroups.MessageReceiver.receiveAsynchronous(MessageReceiver.java:81)
    at org.portico.bindings.jgroups.channel.FederationListener.receive(FederationListener.java:246)
    at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.handleUpEvent(MessageDispatcher.java:528)
    at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:606)
    at org.jgroups.JChannel.up(JChannel.java:715)
    at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:1020)
    at org.jgroups.protocols.pbcast.FLUSH.up(FLUSH.java:504)
    at org.jgroups.protocols.pbcast.STATE_TRANSFER.up(STATE_TRANSFER.java:178)
    at org.jgroups.protocols.FRAG2.up(FRAG2.java:181)
    at org.jgroups.protocols.FlowControl.up(FlowControl.java:400)
    at org.jgroups.protocols.FlowControl.up(FlowControl.java:418)
    at org.jgroups.protocols.pbcast.GMS.up(GMS.java:896)
    at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:244)
    at org.jgroups.protocols.RSVP.up(RSVP.java:188)
    at org.jgroups.protocols.UNICAST2.up(UNICAST2.java:432)
    at org.jgroups.protocols.pbcast.NAKACK2.handleMessage(NAKACK2.java:754)
    at org.jgroups.protocols.pbcast.NAKACK2.up(NAKACK2.java:570)
    at org.jgroups.protocols.BARRIER.up(BARRIER.java:126)
    at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:143)
    at org.jgroups.protocols.FD_ALL.up(FD_ALL.java:177)
    at org.jgroups.protocols.MERGE3.up(MERGE3.java:290)
    at org.jgroups.protocols.Discovery.up(Discovery.java:359)
    at org.jgroups.protocols.TP.passMessageUp(TP.java:1293)
    at org.jgroups.protocols.TP$IncomingPacket.handleMyMessage(TP.java:1856)
    at org.jgroups.protocols.TP$IncomingPacket.run(TP.java:1824)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

INFO  [main] wantest:   5|++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ 
INFO  [main] wantest:    |--|--|--|--| 5|--|--|--|--|10|--|--|--|--|15|--|--|--|--|20|
WARN  [main] portico.lrc: MOM support is currently unsupported in IEEE-1516e federations.
ERROR [Incoming,WAN Test Federation,zapp-46900] portico.lrc.jgroups: Error processing received message: null
java.lang.NullPointerException
    at org.portico.bindings.jgroups.MessageReceiver.receiveAsynchronous(MessageReceiver.java:81)
    at org.portico.bindings.jgroups.channel.FederationListener.receive(FederationListener.java:246)
    at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.handleUpEvent(MessageDispatcher.java:528)
    at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:606)
    at org.jgroups.JChannel.up(JChannel.java:715)
    at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:1020)
    at org.jgroups.protocols.pbcast.FLUSH.up(FLUSH.java:504)
    at org.jgroups.protocols.pbcast.STATE_TRANSFER.up(STATE_TRANSFER.java:178)
    at org.jgroups.protocols.FRAG2.up(FRAG2.java:181)
    at org.jgroups.protocols.FlowControl.up(FlowControl.java:400)
    at org.jgroups.protocols.FlowControl.up(FlowControl.java:418)
    at org.jgroups.protocols.pbcast.GMS.up(GMS.java:896)
    at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:244)
    at org.jgroups.protocols.RSVP.up(RSVP.java:188)
    at org.jgroups.protocols.UNICAST2.up(UNICAST2.java:432)
    at org.jgroups.protocols.pbcast.NAKACK2.handleMessage(NAKACK2.java:754)
    at org.jgroups.protocols.pbcast.NAKACK2.up(NAKACK2.java:570)
    at org.jgroups.protocols.BARRIER.up(BARRIER.java:126)
    at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:143)
    at org.jgroups.protocols.FD_ALL.up(FD_ALL.java:177)
    at org.jgroups.protocols.MERGE3.up(MERGE3.java:290)
    at org.jgroups.protocols.Discovery.up(Discovery.java:359)
    at org.jgroups.protocols.TP.passMessageUp(TP.java:1293)
    at org.jgroups.protocols.TP$5.run(TP.java:1211)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

Environment and Logs

Portico v2.0.1
Wan test federate (ieee1516e)
Ubuntu 14.04LTS (VM - Digital Ocean)

Reproduction Steps

Observed at random times while running the WAN test federate. I was not able to uncover a repeatable process, but the problem was reasonably frequent.

Calling tick(5.0,5.0) equiv does not always tick for at least 5 seconds

Summary

While building the WAN test federate I have noticed some strange behaviour in the implementation of tick() (or evokeXxx() in 1516/e).

Calling the equivalent of tick(5.0,5.0) does not appear to put a federate into ticking mode for at least 5 seconds if there is at least one event in the queue to be processed. It appears to process that message and then immediately return.

Test case needed

I have not cross-checked this against hla13 yet to see if the same behaviour exists there. I suspect it is in the core, and so the interface will make little difference, but some of the tick semantics are implemented in the impl-helper, so it could well be that it is interface specific.

Environment and Logs

Portico v2.0.1
Ubuntu 1404x64
WAN Test Federate

Reproduction Steps

With two test federates, go through the usual pub/sub process
In Federate A, send an update to Federate B
In Federate B, call tick(5.0,5.0) and notice that it does not pause for the minimum tick time of 5 seconds.

Acceptance Criteria

Once complete I shall be able to:

Specify a RID setting that tells the RTI to ignore the minimum wait time and just tick until the queue is drained, returning right away if it is empty. Off by default.
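
The intended semantics can be sketched with a plain blocking queue. This is an illustration under assumptions, not the LRC implementation: TickSketch, its tick() signature and the ignoreMinimum flag (standing in for the proposed RID setting) are all hypothetical names.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class TickSketch
{
    // Processes queued callbacks for at least minMillis and at most maxMillis.
    // When ignoreMinimum is true, returns as soon as the queue is drained.
    // Returns the number of callbacks processed.
    static int tick( BlockingQueue<Runnable> queue, long minMillis, long maxMillis,
                     boolean ignoreMinimum )
    {
        long start = System.nanoTime();
        int processed = 0;
        while( true )
        {
            long elapsed = TimeUnit.NANOSECONDS.toMillis( System.nanoTime() - start );
            if( elapsed >= maxMillis )
                break;

            // only block for more work while the minimum time has not passed
            long remainingMin = minMillis - elapsed;
            boolean mustWait = !ignoreMinimum && remainingMin > 0;
            Runnable callback;
            try
            {
                callback = mustWait ? queue.poll( remainingMin, TimeUnit.MILLISECONDS )
                                    : queue.poll();
            }
            catch( InterruptedException e )
            {
                Thread.currentThread().interrupt();
                break;
            }

            if( callback == null )
            {
                if( !mustWait )
                    break;    // queue drained and minimum satisfied (or ignored)
                continue;     // timed out waiting; re-check the elapsed time
            }

            callback.run();
            processed++;
        }
        return processed;
    }

    public static void main( String[] args )
    {
        BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
        queue.add( () -> System.out.println("callback delivered") );
        System.out.println( "processed: " + tick(queue, 500, 500, false) );
    }
}
```

The key point is that processing one callback does not end the tick: the loop keeps waiting until the minimum time has passed unless the flag says otherwise.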

HLA 1.3 64-bit DLL named incorrectly

Summary

Although I could not find any reference to a particular standard convention for naming 64-bit binaries in any HLA v1.3 related document, looking at other RTIs it seems they do adopt a similar approach that is opposite to what is used in 1516/e (see PORT-125).

Other common RTIs use the following:

MaK: libRTI-NG_64.dll
Pitch: libRTI-NG_64.dll
NG-Pro: libRTI-NG.dll (helpful)

LVC Game expected the libRTI-NG_64.dll format, so that's what we'll produce by default so that Portico is at least compatible with VBS, Steel Beasts, X-Plane, Prepar3D and Unity. Got to have friends and that.

Environment and Logs

Portico v2.0.x
VC10 x64

Reproduction Steps

Install Portico
Set up LVC Game for VBS to use HLA v1.3 (setting env vars and LVCG config)
Start VBS and watch as it complains it can't find the DLL

Enable parallel building for the C++ interface to speed the compile time up

Summary

The compile for the C++ interface (all of them or even individually) takes a little bit of time. Throw in a bit more when you're building debug/release across three interfaces and a test suite.

To speed this up, look into enabling parallel builds. Not sure if this will work for cpptask as it's constructing a separate build command for each file at the moment, so may need some modifications there as well.

Acceptance Criteria

Any C++ code with the specified parameter (-j <X>) will use X threads to compile separate files in parallel and get through a compilation faster
Everything will still work and the logging of the compilation process won't change (it's nice to see the file names tick past now)
Some thought required to managing the build output in a parallel environment

Portico 1516e always uses DoubleTime, even when user requests integer time

Summary

The guts of the problem here is that the 1516e (and 1516 for that matter) specification is completely, utterly, irrevocably broken. Designed to support edge cases present in the 0.001% of situations, they have made everybody pay through the introduction of an API that is even more convoluted, abstract and entirely impenetrable than one could imagine. It takes a special sort of genius to come up with something this bad.

The current implementation of 1516e will internally always use DoubleTime, equating to the standard HLAfloat64Time type. Users can specify the name of their time type in the create federation call, and currently Portico just checks that it's one of the standard types, but then goes ahead and uses the 64-bit float version internally.

You can see this in the implementation of RTIambassador.getTimeFactory():

public LogicalTimeFactory getTimeFactory() throws FederateNotExecutionMember, NotConnected
{
    return new DoubleTimeFactory();
}

This should be updated so that it at least works with the HLAinteger64Time-based types as well.
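
One way to approach this is to select the factory from the time type name supplied at federation creation. The sketch below is an assumption-laden illustration: TimeFactory, LongTimeFactory and TimeFactorySelector are hypothetical names, and the local DoubleTimeFactory is a stand-in for Portico's real class rather than a copy of it.

```java
// Hypothetical stand-in for the 1516e LogicalTimeFactory interface.
interface TimeFactory
{
    String getName();
}

// Stand-in for the existing 64-bit float factory.
class DoubleTimeFactory implements TimeFactory
{
    public String getName() { return "HLAfloat64Time"; }
}

// Hypothetical factory for the integer-based time type.
class LongTimeFactory implements TimeFactory
{
    public String getName() { return "HLAinteger64Time"; }
}

class TimeFactorySelector
{
    // Picks the factory matching the time type name the user asked for,
    // rather than always falling back to the double-based one.
    static TimeFactory forName( String timeTypeName )
    {
        switch( timeTypeName )
        {
            case "HLAfloat64Time":
                return new DoubleTimeFactory();
            case "HLAinteger64Time":
                return new LongTimeFactory();
            default:
                throw new IllegalArgumentException( "Unsupported time type: "+timeTypeName );
        }
    }

    public static void main( String[] args )
    {
        System.out.println( forName("HLAinteger64Time").getName() );
    }
}
```

getTimeFactory() would then return whatever was selected for the federation instead of constructing a DoubleTimeFactory unconditionally.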

Environment and Logs

Portico 2.0.1
ieee1516e Java

Reproduction Steps

N/A

WAN Support for Single Federates

Story

{As a} federate operator,
{I want} to be able to connect to a federation over a WAN,
{So that I can} participate in exercises or experiments that are not running on my local network

Context

As part of #45, enable support for federations where a single remote federate is tunnelled into a federation running on a different network over a WAN.

This will cover two possible deployment topologies:

Two federates connected together over a WAN
A single federate connecting over a WAN with two or more federates that are running on the same LAN

Investigate the use of the JGroups TUNNEL protocol to support this requirement, however a more sophisticated approach may be needed.

Acceptance Criteria

Two federates will be able to connect to and execute a federation over a WAN
A federate will be able to connect over a WAN into a federation that is already running between two or more other federates which are on the same network

ObjectModel needs refactor to remove need for classes to exist both in the hierarchy and in a flat store

Summary

The current implementation of ObjectModel requires that classes be stored both in a hierarchy rooted at Object/Interaction root, AND in a flat map store of <Name,Class>.

Thinking back, I believe I did this for performance reasons. The only problem is that it makes FOM merging code awkward (can't just move classes around between model hierarchies) and is a bit inelegant. Not at all reasons to justify a change, except for the fact that I think the performance reason is a bit of a false positive. Yes, lookups are faster with the flat map. However, classes are rarely looked up like this. Usually this is only done when federates cache a handle.

Go through some of the object class registration and updating code to ensure that the above is true (it might actually look stuff up, and be used during a federation run more than frequently enough to justify the optimization). If it is, refactor `ObjectModel` to only use the tree/hierarchy.
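
For reference, a tree-only lookup by qualified name could look roughly like the following. ClassNode is a hypothetical, simplified type used purely for illustration; it is not Portico's real class metadata.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified class node; not Portico's real metadata types.
class ClassNode
{
    final String name;
    final List<ClassNode> children = new ArrayList<>();

    ClassNode( String name )
    {
        this.name = name;
    }

    ClassNode addChild( String childName )
    {
        ClassNode child = new ClassNode( childName );
        children.add( child );
        return child;
    }

    // Resolves a dot-separated qualified name by walking down from this node.
    // Returns null if any segment of the name cannot be found.
    ClassNode find( String qualifiedName )
    {
        String[] parts = qualifiedName.split( "\\." );
        if( !parts[0].equals(name) )
            return null;

        ClassNode current = this;
        for( int i = 1; i < parts.length; i++ )
        {
            ClassNode next = null;
            for( ClassNode child : current.children )
            {
                if( child.name.equals(parts[i]) )
                {
                    next = child;
                    break;
                }
            }
            if( next == null )
                return null;
            current = next;
        }
        return current;
    }

    public static void main( String[] args )
    {
        ClassNode root = new ClassNode( "ObjectRoot" );
        root.addChild( "A" ).addChild( "B" );
        System.out.println( root.find("ObjectRoot.A.B").name );
    }
}
```

The cost is one child scan per name segment, which only matters if lookups turn out to be frequent during a federation run.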

Again, lots of tests to run if this is going to get changed.

(attached to their parent object class and with attachments to children)

Developers should not have to link to jvm.dll at compile time or have it present on their path to start their federate

Story

{As a} user of a Portico federate,
{I want} to only have to specify RTI_HOME on my path when running my federate
{So that} I don't have to mess around with obscure environment paths (such as those that point to Java).

Context

At present, when we link to jvm.lib in the C++ bindings we create a dependency that must be resolved before anyone can load our RTI libraries. This means that jvm.dll (or the applicable library for *nix) must be present on the path whenever anyone tries to use their Portico-based federate. The problem here is that it is yet another thing that the user has to maintain in their environment.

At runtime we actually do set up the path appropriately so that the JVM can load our library back. However, the pure C++ portion of the RTI won't actually start unless it can see the JVM libraries, thus requiring them to be present on the path.

Ideally, the user would only have to specify RTI_HOME and Portico could figure the rest out.

Requirements/Tasks

Find out a nice cross-platform way to delay-load the JVM DLL/SO's so that they don't have to be present on the path at runtime. We can figure out their location dynamically and load them from there.

Windows has delay-load DLLs: http://msdn.microsoft.com/en-us/library/hf3f62bz(v=vs.80).aspx
@loader_path and @rpath specifications may be a way to hide this requirement on Mac OS X: link
We can probably use the rpath on Linux to get around this problem as well
- -Wl,-rpath,'$ORIGIN/lib:$ORIGIN/../../jre/bin/server:etc...'

Acceptance Criteria

Do not have to set anything except RTI_HOME to compile/link Portico federates on Windows
Do not have to set anything except RTI_HOME to compile/link Portico federates on Mac OS X
Do not have to set anything except RTI_HOME to compile/link Portico federates on Linux

Each of the above tested with the example federate in a clean environment

Portico zero's out bytes when trying to send large attribute/parameter values

I've had two reports now that indicate there is a problem when trying to transfer large parameter/attribute values: after a certain point, bytes begin to get zeroed out.

According to this report on the forums, when trying to send 1024-byte updates, bytes from index 987 onward are zero'd out (while 986 and below are correct):

Using the example federation i tried to exchange an Object that consists of one single attribute of 1024 Bytes length. When this attribute is reflected only the values of bytes 0 to 986 are correct, bytes 987 to 1024 have value zero. If I add an additional attribute of size 16 byte in front of the 1024 attribute, then I get zeros already at byte 963 which means 23 more zeros. Is there a limitation for the length of an Object or attribute?

I've also had a report from Anthony Cramp, who provided some log files showing the problem (attached). The files he provided are:

A log file showing the problem
The modified versions of the message handlers that were used
Support class used for serialization/deserialization

UPDATE
I now have a test that reproduces this problem. I'm about to check it in to the HLA 1.3 ReflectAttributesTest:

@Test(groups="jgroups")
public void testROUpdateWithLargeAttributeValue()
{
    byte[] sent1024 = new byte[1024];
    Random random = new Random();
    random.nextBytes( sent1024 );

    // package this into an update and send it from the sender to the receiver
    Map<String,byte[]> updatedAttributes = new HashMap<String,byte[]>();
    updatedAttributes.put( "aa", sent1024 );
    defaultFederate.quickReflect( objectHandle, updatedAttributes, null );

    // validate that the values reach the other side ok
    secondFederate.fedamb.waitForROUpdate( objectHandle );
    Test13Instance temp = secondFederate.fedamb.getInstances().get( objectHandle );

    // ensure that it has all the appropriate values
    byte[] received1024 = temp.getAttributeValue( aaHandle );
    assertNotNull( received1024, "did not receive update for correct attribute" );
    assertEquals( received1024.length, 1024, "received wrong number of bytes in update" );
    for( int i = 0; i < 1024; i++ )
    {
        assertEquals( received1024[i], sent1024[i], "byte at ["+i+"] was incorrect" );
    }
}

This generates the following error:

java.lang.AssertionError: byte at [989] was incorrect expected:<84> but was:<0>
    at hlaunit.hla13.object.ReflectAttributesTest.testROUpdateWithLargeAttributeValue(ReflectAttributesTest.java:213)

This only shows up when using the JGroups binding. With the JVM binding, the test seems to pass happily. Further investigation underway.

Add documentation describing how to use the message audits for performance tuning

Summary

As part of #60 we added an auditing feature that allows a federate to log all its sent and received messages, types, sizes and some summary information. This in turn can be used to set various configuration values to tune performance for your federation. However, to use this effectively some information on what it all means and how to apply it is necessary. That should be written into the website documentation.

Acceptance Criteria

Performance tuning guide is added to the website

Portico crashes when registerObjectInstance() is invoked from a thread that is different from the one that created the RTIambassador

Summary

When calling registerObjectInstance() on the HLA v1.3 RTIambassador from a different thread to the one that created it, the application will crash somewhere inside Portico.

What I'm trying to do

Invoke calls on the RTIambassador from a different thread to the one that created the RTIambassador instance. Basically, have multiple threads call on the same RTIambassador instance.

I expected the application to happily run to termination, but it crashes with an access violation. The JVM crash dump is also attached.

Steps to Reproduce

Demonstration code that reproduces the problem is attached: https://gist.github.com/timpokorny/0c6e4e3832dd06c579eb

Drop the provided main.cpp into the HLA v1.3 example directory
Edit win32-vc10.bat so that it only compiles main.cpp (and not the other example classes)
Call win32-vc10.bat execute to run the provided example
Hit enter when prompted to kick off the second thread
Wait for the crash

README is out of date

Summary

The current README that is distributed with Portico contains quite a bit of stale information. It really needs to be re-read and updated.

It mentions the old location for the announcement mailing list. It also mentions that a central RTI server is no longer required (only for about 5 years now - probably can take this notification out now :P). Those are just the things I spotted in the first 15 seconds.

Re-author the README.

WAN Test Federate to measure performance and conformity

Summary

As part of the Portico WAN Epic (#44) we require a testing federate that will validate and profile performance of federates connected with this facility rather than the standard multicast.

This federate will be used to do the following:

Confirm all traffic successfully traverses the WAN link
Confirm the integrity of traffic sent across the WAN link
Measure the throughput of data sent across the WAN link
Measure the latency of data sent across the WAN link
Test the above in a multi-federate federation
Test with above with varying packet sizes

Acceptance Criteria

Add federate call tracing feature to Portico

Story

{As a} federate developer or anyone trying to figure out what their federate is doing,
{I want} to see a log file of all calls made to the RTIambassador (with parameters and values) and all callbacks sent by the RTI to the FederateAmbassador.
{So that I} can get some help understanding the methods my federate is calling, in what order, when, and the results they are triggering.

Context

When trying to figure out exactly what a particular federate is doing it can be quite useful to see what calls are being made on the RTI and what is being delivered back to the federate. This helps build a profile of how the application is using the RTI and can be useful in debugging both the RTI itself and the federate.

Acceptance Criteria

A separate log file would be produced that would outline a trace of the calls to the RTIambassador/FederateAmbassador.
This setting would be disabled by default
This setting would be enabled from the RID file
There would be settings to specify the log file name that is output
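
One possible way to capture such a trace without touching every method is a dynamic proxy around the ambassador. This is a sketch only: the cut-down Ambassador interface, DummyAmbassador and CallTracer are hypothetical names for illustration, and a real version would write to the configured log file rather than an in-memory list.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, cut-down ambassador interface used only for illustration.
interface Ambassador
{
    void joinFederation( String federate, String federation );
    int registerObject( int classHandle );
}

// Trivial implementation so the sketch can run on its own.
class DummyAmbassador implements Ambassador
{
    public void joinFederation( String federate, String federation ) {}
    public int registerObject( int classHandle ) { return 42; }
}

class CallTracer
{
    // Wraps target in a proxy that forwards each call and then records the
    // method name, its arguments and the returned value in the trace list.
    @SuppressWarnings("unchecked")
    static <T> T trace( Class<T> type, T target, List<String> trace )
    {
        InvocationHandler handler = (proxy, method, args) -> {
            Object result = method.invoke( target, args );
            trace.add( method.getName() + Arrays.toString(args) + " -> " + result );
            return result;
        };

        return (T)Proxy.newProxyInstance( type.getClassLoader(),
                                          new Class<?>[]{ type },
                                          handler );
    }

    public static void main( String[] args )
    {
        List<String> trace = new ArrayList<>();
        Ambassador rtiamb = trace( Ambassador.class, new DummyAmbassador(), trace );
        rtiamb.joinFederation( "one", "WAN Test Federation" );
        rtiamb.registerObject( 7 );
        trace.forEach( System.out::println );
    }
}
```

The same wrapper could be applied to the FederateAmbassador side to record callbacks; exceptions thrown by the target would need unwrapping from InvocationTargetException in a fuller version.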

JGroups rejects requests to send messages when it is overloaded

Summary

Under very heavy load, Portico can crash because JGroups refuses to send any more messages. This happens when its sender thread pool is completely filled up. The rejection policy for tasks seems to be "reject" or something similar, when it should be "execute" so that the task just gets executed on the thread submitting the job.

The following is a stack trace showing the problem when running the wantest federate (immediate callback mode - two federate federation running on the same computer - very fast send speeds).

ERROR [main] portico.lrc: org.portico.lrc.compat.JRTIinternalError: Problem sending message: channel=WAN Test Federation, error message=Task org.jgroups.protocols.TP$5@7fd751de rejected from java.util.concurrent.ThreadPoolExecutor@e162a35[Running, pool size = 10, active threads = 7, queued tasks = 995, completed tasks = 193678]
Exception in thread "main" hla.rti1516e.exceptions.RTIinternalError: org.portico.lrc.compat.JRTIinternalError: Problem sending message: channel=WAN Test Federation, error message=Task org.jgroups.protocols.TP$5@7fd751de rejected from java.util.concurrent.ThreadPoolExecutor@e162a35[Running, pool size = 10, active threads = 7, queued tasks = 995, completed tasks = 193678]
    at org.portico.impl.hla1516e.Rti1516eAmbassador.updateAttributeValues(Rti1516eAmbassador.java:2081)
    at wantest.throughput.ThroughputDriver.loop(ThroughputDriver.java:252)
    at wantest.throughput.ThroughputDriver.execute(ThroughputDriver.java:122)
    at wantest.Federate.execute(Federate.java:111)
    at wantest.Main.main(Main.java:49)
Caused by: org.portico.lrc.compat.JRTIinternalError: Problem sending message: channel=WAN Test Federation, error message=Task org.jgroups.protocols.TP$5@7fd751de rejected from java.util.concurrent.ThreadPoolExecutor@e162a35[Running, pool size = 10, active threads = 7, queued tasks = 995, completed tasks = 193678]
    at org.portico.bindings.jgroups.channel.FederationChannel.send(FederationChannel.java:252)
    at org.portico.bindings.jgroups.JGroupsConnection.broadcast(JGroupsConnection.java:153)
    at org.portico.lrc.services.object.handlers.outgoing.UpdateAttributesHandler.process(UpdateAttributesHandler.java:105)
    at org.portico.utils.messaging.MessageSink.process(MessageSink.java:187)
    at org.portico.impl.hla1516e.Impl1516eHelper.processMessage(Impl1516eHelper.java:99)
    at org.portico.impl.hla1516e.Rti1516eAmbassador.processMessage(Rti1516eAmbassador.java:5554)
    at org.portico.impl.hla1516e.Rti1516eAmbassador.updateAttributeValues(Rti1516eAmbassador.java:2063)
    ... 4 more
Caused by: java.util.concurrent.RejectedExecutionException: Task org.jgroups.protocols.TP$5@7fd751de rejected from java.util.concurrent.ThreadPoolExecutor@e162a35[Running, pool size = 10, active threads = 7, queued tasks = 995, completed tasks = 193678]
    at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2047)
    at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:823)
    at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1369)
    at org.jgroups.protocols.TP.down(TP.java:1209)
    at org.jgroups.protocols.Discovery.down(Discovery.java:576)
    at org.jgroups.protocols.FD_ALL.down(FD_ALL.java:201)
    at org.jgroups.protocols.VERIFY_SUSPECT.down(VERIFY_SUSPECT.java:80)
    at org.jgroups.protocols.BARRIER.down(BARRIER.java:94)
    at org.jgroups.protocols.pbcast.NAKACK2.send(NAKACK2.java:673)
    at org.jgroups.protocols.pbcast.NAKACK2.down(NAKACK2.java:453)
    at org.jgroups.protocols.UNICAST2.down(UNICAST2.java:523)
    at org.jgroups.protocols.RSVP.down(RSVP.java:143)
    at org.jgroups.protocols.pbcast.STABLE.down(STABLE.java:328)
    at org.jgroups.protocols.pbcast.GMS.down(GMS.java:965)
    at org.jgroups.protocols.FlowControl.down(FlowControl.java:351)
    at org.jgroups.protocols.MFC.handleDownMessage(MFC.java:116)
    at org.jgroups.protocols.FlowControl.down(FlowControl.java:341)
    at org.jgroups.protocols.FRAG2.down(FRAG2.java:147)
    at org.jgroups.protocols.pbcast.STATE_TRANSFER.down(STATE_TRANSFER.java:238)
    at org.jgroups.protocols.pbcast.FLUSH.down(FLUSH.java:312)
    at org.jgroups.stack.ProtocolStack.down(ProtocolStack.java:1025)
    at org.jgroups.JChannel.down(JChannel.java:729)
    at org.jgroups.JChannel.send(JChannel.java:445)
    at org.portico.bindings.jgroups.channel.FederationChannel.send(FederationChannel.java:247)
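
The trace shows ThreadPoolExecutor$AbortPolicy doing the rejecting. The difference between that and a "run on the submitting thread" policy can be shown with plain java.util.concurrent. This sketch does not show how JGroups itself is configured; RejectionPolicySketch and runAll() are hypothetical names.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class RejectionPolicySketch
{
    // Submits the given number of short jobs to a deliberately tiny pool and
    // returns how many completed. With CallerRunsPolicy, jobs that do not fit
    // in the queue run on the submitting thread instead of being rejected
    // with a RejectedExecutionException (which is what AbortPolicy does).
    static int runAll( int tasks )
    {
        ThreadPoolExecutor pool =
            new ThreadPoolExecutor( 1, 1, 0L, TimeUnit.MILLISECONDS,
                                    new ArrayBlockingQueue<Runnable>(2),
                                    new ThreadPoolExecutor.CallerRunsPolicy() );

        AtomicInteger completed = new AtomicInteger();
        for( int i = 0; i < tasks; i++ )
            pool.execute( completed::incrementAndGet );

        // let any queued jobs finish before counting
        pool.shutdown();
        try
        {
            pool.awaitTermination( 10, TimeUnit.SECONDS );
        }
        catch( InterruptedException e )
        {
            Thread.currentThread().interrupt();
        }
        return completed.get();
    }

    public static void main( String[] args )
    {
        System.out.println( "completed: " + runAll(10000) );
    }
}
```

A side effect of this policy is natural back-pressure: a sender that outruns the pool ends up doing the work itself, which slows its submission rate.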

Environment and Logs

Portico 2.1.0-beta
Any
Tim's big iMac
wantest federate 1.0.0-beta

Reproduction Steps

Start two test federates and run their throughput tests

./wantest.sh --federate-name one --peers two --no-latency-test --packet-size 1K --loops 10000

Wait for the exception to pop up in one federate mid-test. May only happen sometimes.

TBC

Linux 32-bit profile not present in the build system

Story

{As a} Linux user on old hardware,
{I want} a native 32-bit package,
{So that I can} actually use Portico

Context

We want to support 32-bit Linux as a native deployment target for Portico, but some moron forgot to write a build profile and link it in! The Java side of things is currently not a problem, but all generated sandboxes have no C++ libraries.

Acceptance Criteria

Will be able to successfully build and run the example application from a sandbox on CentOS 6.5 x32
C++ unit tests execute and pass on CentOS 6.5 x32
Release build generates packages that contain appropriate C++ libs on 32-bit

Add Save/Restore Support for 1516e

Story

{As a} 1516e federate developer and user,
{I want} Save/Restore support,
{So that I can} use my federates!

Context

Save/Restore support has not currently been added to the Portico 1516e interface. Support for the calls is present in the HLA 1.3 interface, but hasn't been ported yet.

Acceptance Criteria

Add support for the Save/Restore calls in the 1516e interface in both Java and C++
Unit tests in the 1516e test suites for Java and C++

RegisterSyncPoint failures in 1516e should not log at ERROR level

Summary

Sync point registration failures in 1516e cause Portico to log a message at the ERROR level in its log. This is typically above the threshold for the logger, so it both appears in the output of the application and in the Portico log file. This isn't a Portico failure, but rather something that should be assessed and handled by the federate itself.

Drop the level down to DEBUG

ERROR [main] portico.lrc: FAILURE Regiser sync point [FINISH_THROUGHPUT_TEST] by [one]: already registered
ERROR [main] portico.lrc: FAILURE Register sync point [FINISH_THROUGHPUT_TEST] by [one]: label already registered
ERROR [main] portico.lrc: FAILURE Regiser sync point [FINISH_LATENCY_TEST] by [one]: already registered
ERROR [main] portico.lrc: FAILURE Register sync point [FINISH_LATENCY_TEST] by [one]: label already registered

Acceptance Criteria

Level of sync point registration failures is dropped from ERROR to DEBUG

Update architecture documentation on the website

The current architecture documentation on the website is so woefully out of date that I can't even begin to explain. It is from a time when Portico was still a client/server framework and not decentralized like it is at the moment.

This is also a good time to review some architectural decisions and look at filing tickets to address unnecessary or overly complicated areas.

This will take a little while. Drawing diagrams is time consuming.

Add support for dumping metadata about state of LRC queue on federate resignation or on command

Story

{As a} Portico developer,
{I want} to be able to view metadata about the current LRC queue for a federate,
{So that I can} debug strange behaviour and fix issues

Context

While developing the WAN test federate I have noticed some strange behaviour around the tick() calls and the processing of an LRC's message queue (#53). Having very little insight into the state of the queue at any given time makes it difficult to debug problems like this, or to gain a better understanding of how various states affect LRC behaviour.

Add a facility that would allow an LRC to dump information about the state of its queue to the log or to a separate dump file for further analysis. Ideally we'd have some sort of monitoring capability, but that is a much larger task.

The initial set of tools would just be methods to capture the information and put it into in-memory structures, before dumping those to a log file. This could be configured to happen on resignation from a federation (of particular interest given the current problems being experienced) or manually invoked via a special interaction or API call.

The API call would require that federate code make use of a portico-specific management API, which seems acceptable. Another option would be management extensions to the MOM, although they are ultimately just as specific as a special Portico API as RTIs often fill in a default MOM when those classes are left out of FOMs (common) - and non-Portico RTIs obviously won't have these extensions, causing problems around publication and subscription time. The API route seems less complex and doesn't have any practical cost compared to alternatives, but a bit of thought should validate this first.

Metadata
The following should be considered an incomplete list of things that would be handy to view in a dump file:

Queue size
For each message in the queue:
-- Message type
-- Sending federate
-- Time properties (TSO/RO? and timestamp)
-- If reflection or interaction, number of atts/params and their handles and size

Acceptance Criteria

Once complete, I shall be able to:

Trigger the Portico LRC to dump metadata to a file based on an API call
Configure the RID file to dump metadata on a specific event
- Presence of a particular message type (reflect, resign, sync point, etc.)
- At recurring intervals
Dump this information to separate dump files
Include an option to dump summary information to the log file

Update license to ASL 2.0

Story

{As a} Portico user,
{I want} Portico licensed under the Apache Software License,
{So that I can} have stronger, less ambiguous patent protections than what is available in the CDDL

Context

For these reasons, we shall switch the license for Portico to the ASL.

Acceptance Criteria

Once complete:

All headers shall be updated to reference the ASL
License artefacts in the system shall be updated to reference the ASL
Documentation shall be updated to reference the ASL

Self-extracting install on Linux/Mac

Story

{As a} Mac or Linux user,
{I want} Portico to auto-install rather than just extract itself as a tarball,
{So that I can} ensure my install follows intended defaults

Context

We used to ship Portico as a self-extracting shell file. Add this functionality back in.

Acceptance Criteria

Once complete:

Portico Mac and Linux installers shall ship as tar-zipped self-extracting shell files (have to zip them so they download properly)
Installer shall ask the user where it should be installed, defaulting to {{/opt/portico/portico-x.y.z}}
The installer shall take care of any permission issues such that once installed, the application directory can be written to and read from by the user
Installer shall provide information about suggested environment variable changes to the user at the completion of the installation process
Installer shall be able to automatically append these changes to any detected {{~/.bash_profile}} files (or related) once completed

Update Audit Logger to include class breakdown for Reflections and Interactions

Story

{As a} federation developer,
{I want} the message audit log to break down reflects/interactions by class,
{So that I can} get a deeper insight into the size and relative frequency of various types of messages happening in my federation

Context

JIRA Reference: PORT-187
Currently, the message audit log (introduced in #43) provides a breakdown of information along the lines of RTI calls. While this is useful for some high-level performance tuning, the information is a bit coarse if you want a deeper insight into what a federate is doing while it is running. An example summary report is shown below:

==============================================
  Execution Summary
==============================================
     Finish Time: 22:28:00.163
      Total Sent:    189 ( 63.0 KB)
  Total Received:      0 ()

  |-----------------------------|-------------------------------|-------------------------------|
  |                             |  Sent                         |  Received                     |
  |                             |---------------------------------------------------------------|
  | Message Name                |  Count  |   Size    |   Avg   |  Count  |   Size    |   Avg   |
  |-----------------------------|-------------------------------|-------------------------------|
  |            UpdateAttributes |     137 |   39.0 KB |  297 B  |         |           |         |
  |          PublishObjectClass |      14 |   10.0 KB |  739 B  |         |           |         |
  |        SubscribeObjectClass |       7 |    4.0 KB |  715 B  |         |           |         |
  |             SendInteraction |      15 |    3.0 KB |  270 B  |         |           |         |
  |   SubscribeInteractionClass |       5 |   1335 B  |  267 B  |         |           |         |
  |     PublishInteractionClass |       5 |   1180 B  |  236 B  |         |           |         |
  |              DiscoverObject |       2 |    808 B  |  404 B  |         |           |         |
  |                    RoleCall |       1 |    552 B  |  552 B  |         |           |         |
  |            ResignFederation |       1 |    475 B  |  475 B  |         |           |         |
  |                DeleteObject |       2 |    426 B  |  213 B  |         |           |         |
  |-----------------------------|-------------------------------|-------------------------------|

To better understand what is happening in my federation, I need to drill in a little further. For example, I might want to see what VBS is doing on the network when it is active with regard to how it is sending out entity state updates or weapon fire / detonations. I can only partially draw this information from the current logging format.

To assist this, some additional information about the most common calls (reflections and interactions) should be added. This would include more detailed information on each event line, as well as in the summary.

For an individual event log, the following would be added:

Reflections: Class name, attribute count, object name and ID
Interactions: Class name, parameter count
Discover Object: Class name, object name and ID
Delete Object: object name and ID

For the summary table, additional indented sub-lines under reflect and interaction calls would be added for specific classes.
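
A minimal sketch of the per-class bookkeeping this would need, assuming the audit logger can see the class name and encoded size of each reflection or interaction. The class and method names (ClassBreakdownSketch, record, renderSubLines) and the sub-line layout are illustrative assumptions, not existing Portico code.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch only; not part of the Portico audit logger. */
final class ClassBreakdownSketch
{
    // class name -> { message count, total bytes }
    private final Map<String,long[]> totals = new TreeMap<>();

    /** Records one message of the given class and encoded size. */
    void record( String className, int sizeBytes )
    {
        long[] entry = totals.computeIfAbsent( className, k -> new long[2] );
        entry[0]++;
        entry[1] += sizeBytes;
    }

    /** Number of messages recorded for the class, or 0 if none. */
    long count( String className )
    {
        long[] entry = totals.get( className );
        return entry == null ? 0 : entry[0];
    }

    /** Average size in bytes (integer division), or 0 if none recorded. */
    long averageSize( String className )
    {
        long[] entry = totals.get( className );
        return entry == null ? 0 : entry[1] / entry[0];
    }

    /** One indented sub-line per class, sorted by class name. */
    String renderSubLines()
    {
        StringBuilder out = new StringBuilder();
        for( Map.Entry<String,long[]> e : totals.entrySet() )
        {
            long[] t = e.getValue();
            out.append( String.format( "  |    - %s: count=%d, size=%d B, avg=%d B%n",
                                       e.getKey(), t[0], t[1], t[1] / t[0] ) );
        }
        return out.toString();
    }
}
```

One instance per message type (for example one for UpdateAttributes and one for SendInteraction) would give the indented sub-lines under each of those rows.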

Acceptance Criteria

Once complete, I shall be able to:

The logWithNames RID setting does not appear to be affecting exception messages

Summary

The RID file contains an option that allows users to specify whether logging is done using names, or using handles (the former being slower but more descriptive) via the logWithNames and logWithHandles properties. For example, when logging about an attribute publication, the names of the attributes received would be logged rather than just their handles.

This setting does not appear to be affecting messages enclosed in exceptions. When setting the system to log attribute names rather than handles, I only see handles in any exceptions for problems relating to publish and subscribe, and data exchange.

This may be due to hard limitations - for example, when trying to send a publication notice for an attribute that doesn't exist, it's going to be hard to specify its name. However, the class name, for example, should still be present in those exceptions.

Some investigation required as this does limit the usefulness of this function from an end-user perspective. An exception is often the first and foremost thing they'll see when problems arise, which is exactly when you want the information.

Environment and Logs

portico-2.0.1
Ubuntu 14.04LTS

Reproduction Steps

Test case required. Attempt to publish an attribute that does not exist, catch the exception and validate the content.

Portico throws error if plugin path doesn't exist

I have a small note here saying that Portico will throw a large error and crash if a specified plug-in path doesn't exist (could be RTI configured).

Tone this down so that a crash doesn't happen and perhaps consider just removing the ability to add plug-ins at all, considering nobody is going to use them. Need a bit of mulling/thought time here.
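
If plug-in loading is kept, the softer behaviour could look something like the sketch below: paths that do not exist are logged and skipped rather than raising an error. The names here (PluginPathSketch, usablePluginDirs) are made up for illustration, and java.util.logging stands in for whatever logger Portico actually uses.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;

/** Hypothetical sketch only; not Portico's plug-in loader. */
final class PluginPathSketch
{
    private static final Logger LOG = Logger.getLogger( "portico.sketch" );

    /** Returns only the configured entries that are existing directories. */
    static List<Path> usablePluginDirs( List<String> configured )
    {
        List<Path> usable = new ArrayList<>();
        for( String entry : configured )
        {
            Path dir = Paths.get( entry );
            if( Files.isDirectory( dir ) )
                usable.add( dir );
            else
                LOG.warning( "Plug-in path does not exist, skipping: " + dir );
        }
        return usable;
    }
}
```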

Erroneous reporting of federate restore not supported in ieee1516e implementation

#70 will override this.

The featureNotSupported notifications in the restore service methods in org.portico.impl.hla1516e.Rti1516eAmbassador all report "queryFederationSaveStatus()" instead of the actual method names.
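
One way to rule out this kind of copy-paste error is to derive the method name from the call stack instead of passing it in as a string. The sketch below is an assumption about how that could be done, not how Rti1516eAmbassador is written; the two stand-in methods only borrow the names of the IEEE 1516e restore services.

```java
/** Hypothetical sketch only; not the Portico Rti1516eAmbassador. */
final class FeatureNotSupportedSketch
{
    /** Builds the notification text from the name of the calling method. */
    static String featureNotSupported()
    {
        // Frame 0 is this helper; frame 1 is the service method that called it.
        StackTraceElement caller = new Throwable().getStackTrace()[1];
        return caller.getMethodName() + "() is not supported";
    }

    // Stand-ins for two of the restore service methods.
    static String requestFederationRestore()
    {
        return featureNotSupported();
    }

    static String queryFederationRestoreStatus()
    {
        return featureNotSupported();
    }
}
```

The simpler fix is of course to correct the string literals; the stack-based helper just keeps them from drifting again.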

Set default log level to FATAL

Summary

Set the default Portico log level to FATAL. Let's be honest; nobody cares about the RTI logging unless they want to know something is wrong and go looking on purpose. Even a level of WARN is either going to provide no useful information (because they want more) or is just going to be unwanted noise.

Add tick metrics logging after each tick

Story

{As a} Portico developer,
{I want} some information about how many messages were processed, and in how much time, on each tick call,
{So that I can} track down strange bugs with tick processing!

Context

Primarily this is to add support for logging some tick metrics at the conclusion of each tick (or equivalent) call to assist with debugging problems in the tick implementation. However, the rate at which federates exchange data, and expect other federates to provide data, is critical for ensuring the smooth operation of a federation, so even once any short-term ticking issues are fixed, some coarse information logged on each tick call would be helpful.

Acceptance Criteria

Once complete, I shall be able to:

View basic metrics on what was processed during a tick call in the log file
- Logged at the DEBUG level (or should it be TRACE?)
- Includes number of messages processed, time spent in tick mode and number of messages that remain in the queue
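
A minimal sketch of the metrics described above, assuming the tick loop drains a queue of callbacks. The queue of Runnable, the tick signature and the class name are all illustrative rather than Portico's real internals, and java.util.logging's FINE level stands in for DEBUG.

```java
import java.util.Queue;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical sketch only; not Portico's tick implementation. */
final class TickMetricsSketch
{
    private static final Logger LOG = Logger.getLogger( "portico.sketch" );

    /** Processes up to maxMessages callbacks and logs the tick metrics. */
    static int tick( Queue<Runnable> queue, int maxMessages )
    {
        long start = System.nanoTime();
        int processed = 0;
        while( processed < maxMessages && !queue.isEmpty() )
        {
            queue.poll().run();
            processed++;
        }

        long elapsedMicros = (System.nanoTime() - start) / 1_000;
        LOG.log( Level.FINE,
                 "tick(): processed={0}, elapsed={1}us, remaining={2}",
                 new Object[]{ processed, elapsedMicros, queue.size() } );
        return processed;
    }
}
```

Using the parameterised form of the log call means the message is only formatted when the level is actually enabled, which matters for something logged on every tick.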

Small typos and fixes for 1516e C++ interface

A number of small issues have been identified by JPL from ForwardSim. Given that they are all small, this issue just aggregates them for quick rectification:

1. CppTask Logging Level
From CPPTask, CompilerMSVC.java, function link()

task.log( "Starting Link" );
task.log( "Running link command: ", Project.MSG_DEBUG );
for( String argument : commandline.getCommandline() )
    task.log( argument, Project.MSG_DEBUG );

should be

task.log( "Starting Link" );
task.log( "Running link command: ", Project.MSG_VERBOSE );
for( String argument : commandline.getCommandline() )
    task.log( argument, Project.MSG_VERBOSE );

2. Path Generation off JAVA_HOME
From runtime.cpp, generateWinPath:

The function calls string jrelocation( getenv("JAVA_HOME") ). On Windows, getenv can return NULL (which, in my case, results in a crash).

3. RID Processing Error Message

Typo in void Runtime::processRid() throw( RTIinternalError ): if org/portico/impl/cpp1516e/ProxyRtiAmbassador is not found, the logger message and the internal message are not the same.

4. HLA Version Return Value

In string Runtime::getHlaVersion() throw( RTIinternalError ), the function will never return anything other than HLA13 or DLC13.

5. Native Library Loading Suggestion

In NativeLibraryLoader.java, you might want to use System.load instead of System.loadLibrary when using the full path to load the DLL. Somehow Matlab plays with the library path, so I had to resort to loading the DLL using the full path.
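
The difference between the two calls can be sketched as below: System.load takes an absolute file path including the extension, while System.loadLibrary takes a bare library name and searches java.library.path. The wrapper class and its method names are hypothetical and are not the contents of NativeLibraryLoader.java.

```java
import java.io.File;

/** Hypothetical sketch only; not Portico's NativeLibraryLoader. */
final class NativeLoadSketch
{
    /** True when the argument is an absolute file path rather than a bare name. */
    static boolean usesFullPath( String nameOrPath )
    {
        return new File( nameOrPath ).isAbsolute();
    }

    /** Loads by full path when one is given, otherwise by library name. */
    static void loadNative( String nameOrPath )
    {
        if( usesFullPath( nameOrPath ) )
            System.load( nameOrPath );         // absolute path, including extension
        else
            System.loadLibrary( nameOrPath );  // bare name, resolved via java.library.path
    }
}
```

Both calls throw UnsatisfiedLinkError when the library cannot be found, so the caller's error handling stays the same either way.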