sstsimulator / sst-core
SST Structural Simulation Toolkit Parallel Discrete Event Core and Services
Home Page: http://www.sst-simulator.org
License: Other
When I run ./configure with --with-pin=${INTEL_PIN_DIRECTORY} and the directory I point to contains a version of PIN that is too old to work with the current SST, configure should report an error rather than blindly accept it. It is not nice to have configure succeed and then have make fail with a very non-obvious compilation error (missing header files).
Component has the member function isPortConnected(), but SubComponent does not.
This ticket documents an issue with SST's interaction with Boost (version 1.56 at the time of this issue).
The SST serialization code (sst/core/serialization/core.h, sst/core/serialization/element.h, and others) generates a significant number of warnings related to Boost serialization. Many of these warnings are in the Boost file mpl/print.hpp and are signed/unsigned comparison or divide-by-zero warnings.
According to boost issue #4953, these warnings (such as code causing a divide by zero) are intentional to assist users in tracing the templates used in the code.
These warnings are a significant distraction from real warnings and have had no impact on operation of SST. Therefore they are being ignored via some gcc and clang pragmas
search for
These pragmas are being created for gcc 4.6 and later and current versions of clang. gcc versions earlier than 4.6 will still generate the warnings.
While the warnings are being ignored, they still exist for a reason. In the future, it is advised that the issues causing serialization warnings be investigated and addressed as necessary.
make html (which kicks off a doxygen build) does not generate any files.
Arrange for the configure script to check the compiler version, in the case of known compilers (e.g., gcc, Intel cc), to make sure that it is a new enough version to compile SST successfully. In the case of an older, unsupported compiler version, report an error and halt configuration.
When StopAction fires on a rank, all threads are stopped, even if they haven't reached the stopAt time. StopAction needs to be a collective operation and fire once all threads have made it to the stopAt time.
When the user runs ./configure --disable-mpi --with-zoltan=somepath, configure reports an error that Zoltan was not found. This is not accurate; rather, the Zoltan autoconf checks require the MPI compilers (#include <mpi.h>), which are not being used.
Two items:
SST with OpenMPI leaves many thousands of residual directories in /tmp that accumulate run after run. These residual directories slowly eat up filesystem space. At the time of this issue creation, there are over 20,000 residual directories consuming 81MB of space.
$ ls /tmp/openmpi-sessions-jwilso\@sst-test_0/ | wc -w
20264
$ du -d0 -h /tmp/openmpi-sessions-jwilso\@sst-test_0/
81M /tmp/openmpi-sessions-jwilso@sst-test_0/
$ ls /tmp/openmpi-sessions-jwilso\@sst-test_0/
32768 35968 38862 41576 44514 47469 50489 53566 56683 59753 62665
32769 35969 38863 41577 44516 47470 50497 53567 56686 59754 62666
32770 35970 38864 41578 44518 47471 50498 53568 56687 59756 62667
32775 35971 38866 41579 44520 47472 50500 53569 56688 59758 62668
...
35963 38854 41573 44511 47464 50486 53559 56680 59744 62655 65532
35965 38855 41574 44512 47465 50487 53562 56681 59749 62659 65533
35967 38857 41575 44513 47467 50488 53565 56682 59751 62662 65535
$ ls /tmp/openmpi-sessions-jwilso\@sst-test_0/ | wc -w
20196
The residual directories can be grouped by their disk usage: 4k, 12k, and 16k.
The residual 4k directories are empty.
$ ls -l 34616
total 0
$ ls -ld 34616
drwx------ 2 jwilso jwilso 4096 May 18 12:39 34616
$ du -h 34616
4.0K 34616
The residual 12k directories have the following directory structure:
$ ls -ld 57532
drwx------ 3 jwilso jwilso 4096 May 18 12:35 57532
$ tree 57532
57532
└── 1
    └── 0
$ du -h 57532
4.0K 57532/1/0
8.0K 57532/1
12K 57532
The residual 16k directories have the following directory structure:
$ ls -ld 56572
drwx------ 3 jwilso jwilso 4096 May 18 14:28 56572
$ tree 56572
56572
├── 0
│   ├── 0
│   └── debugger_attach_fifo
└── contact.txt
2 directories, 2 files
$ du -h 56572
4.0K 56572/0/0
8.0K 56572/0
16K 56572
$ file 56572/0/debugger_attach_fifo
56572/0/debugger_attach_fifo: fifo (named pipe)
$ stat 56572/0/debugger_attach_fifo
File: ‘56572/0/debugger_attach_fifo’
Size: 0 Blocks: 0 IO Block: 4096 fifo
Device: fd03h/64771d Inode: 917729 Links: 1
Access: (0644/prw-r--r--) Uid: (17341/ jwilso) Gid: (17341/ jwilso)
Access: 2016-05-18 14:13:58.414379047 -0600
Modify: 2016-05-18 14:13:58.414379047 -0600
Change: 2016-05-18 14:13:58.414379047 -0600
Birth: -
$ cat 56572/contact.txt
3707502592.0;tcp://134.253.243.30:35477
24151
The Merlin Dragon-12 test using 2 threads failed with a segmentation fault that is possibly attributable to taking such a path thru the core. (This is not a reproducible failure.)
Dragon12-seq-fault.txt
This version of SharedRegion merge is not implemented which means that the Bulk Merger fails. You are put to Bulk Moves as soon as you call getRawPtr(), which means you can't get raw access to the region and have it propagate to other ranks.
Currently SST provides the stopAtCycle option (set via setProgramOption) to stop a simulation at a given simulated time. Most real SST simulation jobs are submitted to machines that enforce queue time limits. It would be beneficial to have an option like stopAtWalltime so that the simulation can exit within the allocated time and users get some results back rather than having the program terminated.
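One possible shape for such a check is sketched below. WallClockLimit is a hypothetical helper invented for this illustration (SST has no class by that name), and how the simulation core would poll it is left open. The sketch only shows a monotonic-clock deadline that can be queried cheaply.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper (not an SST class): reports when a wall-clock
// budget, measured on a monotonic clock, has been used up.
class WallClockLimit {
public:
    using Clock = std::chrono::steady_clock;

    explicit WallClockLimit(std::chrono::seconds budget)
        : deadline_(Clock::now() + budget) {
        assert(budget.count() >= 0 && "budget must not be negative");
    }

    // True once the current time has reached the deadline.
    bool expired() const { return expired(Clock::now()); }

    // Overload taking an explicit time point, useful for testing.
    bool expired(Clock::time_point now) const { return now >= deadline_; }

private:
    Clock::time_point deadline_;
};
```

A caller would probably want to poll expired() only at coarse intervals (for example at synchronization points) rather than on every event, so the clock read does not show up in the hot path.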
Use case from the ISCA tutorial: we would like to be able to prototype components in Python and have them connect to SST simulations (which may use non-Python components). This requires a connector from C++ classes to Python and raises issues such as a Python interpreter per component.
The symptoms look the same as #213. However, #7087, which fixed the iris test and the portals_sm tests, does not fix the Sirius Zodiac Trace test case. To observe the failure, go to the directory sst/elements/zodiac/test/allreduce and enter:
sst --output-config newPyFile.py --model-options --shape=27 --run-mode init allreduce.py > all.out
If the --run-mode init is removed, it executes correctly. A non-gdb output from the seg-fault is attached.
Use case from the ISCA tutorial: can we provide a way to profile a simulation's links/components, record this information, and load it in a future run so that the partitioning scheme is optimized for that run?
You can search sstinfo by component (sstinfo miranda.BaseCPU), but you currently cannot search for subcomponents; i.e., sstinfo miranda.GUPSGenerator does not return anything.
Need an atomic output ability to the output class to allow a start_block_output() and stop_block_output() that collects output data. When stop_block_output() is called, all collected output data is then output. This is designed to prevent interleaving of output data.
Possible implementation with a counter that increments on each start and decrements on each stop, and outputs when counter reaches 0.
Requested by Scott H.
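The counter-based idea above could look roughly like the following sketch. BlockOutput is a hypothetical wrapper written for this illustration and is not part of SST::Output. It is single-threaded as written, and a real implementation would presumably need a lock around the final write to prevent interleaving across threads.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper (not SST::Output): collects text between
// start_block_output() and stop_block_output() and emits it in one write.
class BlockOutput {
public:
    explicit BlockOutput(std::ostream& sink) : sink_(sink) {}

    // Each start increments the nesting counter.
    void start_block_output() { ++depth_; }

    // Each stop decrements it; the buffer is flushed when it reaches 0.
    void stop_block_output() {
        assert(depth_ > 0 && "stop without matching start");
        if (--depth_ == 0) {
            sink_ << buffer_.str();      // single write of everything collected
            sink_.flush();
            buffer_.str(std::string());  // empty the buffer for the next block
            buffer_.clear();
        }
    }

    // Buffered while inside a block, written straight through otherwise.
    void output(const std::string& text) {
        if (depth_ > 0) {
            buffer_ << text;
        } else {
            sink_ << text;
        }
    }

private:
    std::ostream& sink_;
    std::ostringstream buffer_;
    int depth_ = 0;
};
```

Nested start/stop pairs are harmless with this design: only the outermost stop triggers the write.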
In file sst/core/statapi/stathistogram.h, template class HistogramStatistic, private member function registerOutputFields():
When the bin fields are generated in the for loop, the generated field name includes the range of bin values, formatted as -, where the lower and upper limits are formatted in decimal.
Sometimes, as in the case of collecting histograms over memory address ranges, BinDataType ends up being instantiated as a 64-bit type (uint64_t). Meanwhile, the number of bins (returned by getNumBins()) is always assumed by the code to fit in 32 bits (due to the uint32_t type on the bin index y), and meanwhile getBinWidth() (whose type is HistoBinType) may also return a value declared as 32 bits.
Therefore, in the following line of code which calculates binLL:
binLL = (y * getBinWidth()) + getBinsMinValue();
the multiplication operator here can be doing a 32-bit multiply (since both operands are 32 bits), yet the result may not actually fit in 32 bits, in which case the result of the multiplication expression will overflow and be truncated to 32 bits before being stored in binLL, and so, a corrupt value may end up in binLL (and also binUL). This will result in the bin field labels being incorrect, which may lead to incorrect display or sorting of data in some later processing of the output .csv file.
This can happen, in particular, when running memsieve to collect memory access histograms over address ranges which exceed the 4GB limit for 32-bit addresses. The bin width (page size) and number of bins can both be declared as 32-bit values, yet their product may not fit in 32 bits. E.g., in the example sst configuration for sieve, I encountered the particular case where 4 million bins representing memory pages of size 4 KB each were being generated, so that the address ranges spanned a full 16GB.
However, the overflow is easily avoided by forcing the above expression to always invoke a 64-bit multiply operation, by casting either of the multiplicands to 64 bits before the multiplication occurs, e.g.:
binLL = (y * (uint64_t)getBinWidth()) + getBinsMinValue();
And in cases when BinDataType happens to be only 32 bits, this change is harmless.
The above change has been implemented in a branch labeled "bugFix_binLabelOverflw" which I will push to the sst-core repository.
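The truncation can be reproduced outside SST with a small standalone sketch. The two functions below are hypothetical stand-ins for the binLL computation (they are not the HistogramStatistic code) and assume both the bin index and the bin width are 32-bit unsigned values, as in the case described above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the original expression: both operands are
// 32-bit, so the product wraps modulo 2^32 before it is widened.
inline uint64_t binLowerLimit32(uint32_t y, uint32_t binWidth, uint64_t minValue) {
    uint32_t product = y * binWidth;  // 32-bit multiply, may wrap
    return product + minValue;
}

// Hypothetical stand-in for the fixed expression: one operand is widened
// first, so the multiply is carried out in 64 bits.
inline uint64_t binLowerLimit64(uint32_t y, uint32_t binWidth, uint64_t minValue) {
    uint64_t result = (y * static_cast<uint64_t>(binWidth)) + minValue;
    assert(result >= minValue && "final 64-bit add wrapped");
    return result;
}
```

For bin index 2000000 with a 4096-byte bin width, the 64-bit version yields 8192000000, while the 32-bit version yields 3897032704 (the true product minus 2^32).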
I suggest the following simple change in the .csv format
./configure should check for the existence of zlib.h
With the new autoconf setup, the support for auto-detecting the '-mt' variants of MacPorts' Boost libraries has disappeared (i.e., 'libboost_serialization-mt.dylib'). ./configure passes, which suggests that our autoconf macro for Boost does not actually attempt a library test, only a header test.
This ticket collects two related enhancements. Developers are often unsure when to do an svn update in their sandbox. They don't really know whether the head of the trunk produced a good build or not. We need to publish, in some very convenient place, the svn revision number that was used for the last successful build and test of the trunk during the nightly process. In a similar vein, some non-developer users may need a version of the source code from the trunk in order to obtain a new feature or bug fix. They too are at risk if they simply check out the trunk head. Instead, provide a tarball of the last successful build and test of the trunk during the nightly process.
Another feature gone missing in the autoconf of the new split-core: '--enable-debug'. Without this, debug output is not available.
There are summary statistics that should be collected for a component as the component is being shut down; therefore we need to move the shutdown of the Statistics objects to a time after the component objects have been shut down.
I've made small changes in main and simulation that work in my testing, though someone more familiar with the Statistics class should review them.
core-diffs.tar.gz
When dealing with long simulations, memHierarchy's output can be quite big (hundreds of GB) even if the debug level is low.
Ali, from ARM, would like a feature that activates the --enable-debug output only after some simulation time has elapsed since the start of the simulation. Nothing urgent though.
We have to be careful not to introduce a slowdown (e.g., a comparison every clock tick).
We should provide, on our download site, a nightly tarball of the current state of SST's trunk repository, i.e., the result of running make dist. #152 could then download and use that tarball as the basis for running tests.
Five partitioning options are tested, and the "single" results seem unexpectedly different:
test_linear2.out:Simulation is complete, simulated time: 31.949 us
test_linear4.out:Simulation is complete, simulated time: 31.949 us
test_linear8.out:Simulation is complete, simulated time: 31.949 us
test_roundrobin2.out:Simulation is complete, simulated time: 31.949 us
test_roundrobin4.out:Simulation is complete, simulated time: 31.949 us
test_roundrobin8.out:Simulation is complete, simulated time: 31.949 us
test_simple2.out:Simulation is complete, simulated time: 31.949 us
test_simple4.out:Simulation is complete, simulated time: 31.949 us
test_simple8.out:Simulation is complete, simulated time: 31.949 us
test_single2.out:Simulation is complete, simulated time: 18.4467 Ms
test_single4.out:Simulation is complete, simulated time: 18.4467 Ms
test_single8.out:Simulation is complete, simulated time: 18.4467 Ms
static void
throw_exc(){
    SST::Output ser_abort("", 5, -1, SST::Output::STDERR);
    ser_abort.fatal(CALL_INFO_LONG, -1, "ERROR: type %s should not be serialized\n",#obj);
} \
At startup of simulation, main() creates a Config object that decodes the runtime parameters. A pointer to this object is passed to the SimulationBase, but is not stored. When lower level objects want access to runtime parameters, they must go through various methods (either storing a parameter in simulation or passing a ptr to the config object) to get access.
If SimulationBase stored the pointer to the config object (which never goes away in main), and provided a getConfig() method, then objects could easily retrieve the runtime parameters.
Use case from ISCA: can we automatically connect to a debugger when a terminate signal is received? This would be a good outcome for long-running simulations.
Reported by a customer:
-> sst-info miranda
ERROR: No such file or directory - When trying to open Directory @SST_ELEMLIB_DIR@
PROCESSED 0 .so (SST ELEMENT) FILES FOUND IN DIRECTORY @SST_ELEMLIB_DIR@
Looks like sst-info doesn't pull the search path from the SST configuration, but rather the hard-coded #define.
It would be nice to be able to select which element libraries are built at ./configure time. While one can put a .ignore file in each element library directory, it would be nice to have configure arguments like:
./configure --disable-elements=all --enable-elements=Merlin,MemHierarchy
This would enable only those two element libraries. It would also be an error if a user specifically requested an element library that was unable to configure itself, e.g., requesting Ariel when PIN wasn't found.
sst-info does not find the registered external elements
configure creates incorrect config when pin is found in the shell environment but not on the command line
Marsaglia's RNG system only uses 32 bits of random data to calculate a 64-bit value (double, Int64, UInt64). This leads to cases where the results appear non-random. (For example, most of the time, a call to generateNextUInt64() will return a number that is a multiple of 512.)
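One common way to avoid this is to build each 64-bit value from two successive 32-bit draws. The sketch below uses an illustrative multiply-with-carry generator in the style of Marsaglia's MWC. Mwc32 and its method names are assumptions made for this example and are not the SST RNG classes.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative multiply-with-carry generator (a stand-in, not SST code).
class Mwc32 {
public:
    Mwc32(uint32_t z, uint32_t w) : z_(z), w_(w) {
        assert(z != 0 && w != 0 && "zero seeds give a degenerate stream");
    }

    // One 32-bit draw.
    uint32_t next32() {
        z_ = 36969u * (z_ & 65535u) + (z_ >> 16);
        w_ = 18000u * (w_ & 65535u) + (w_ >> 16);
        return (z_ << 16) + w_;
    }

    // A 64-bit draw built from two successive 32-bit draws, so that
    // both halves of the result carry generator output.
    uint64_t next64() {
        uint64_t hi = next32();
        uint64_t lo = next32();
        return (hi << 32) | lo;
    }

private:
    uint32_t z_;
    uint32_t w_;
};
```

The trade-off is that each 64-bit value consumes two draws, which changes the sequence seen by existing models that mix 32-bit and 64-bit calls.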
The compiler segfaults when attempting to build Patterns -- Issue 439.
There is an sst seg-fault in the qsim test.
The Prospero PIN library appears to be incompatible with gcc-5.1.
The SST build gets an ugly number of warnings from Boost (1.56).
With a dot-ignore in pattern, successfully ran 27 test Suites with 98 tests in addition to the 150 tests in the Ember Sweep test.
emberLoad.py is used as input to a number of tests. The "--output-config" option seems to be yielding invalid Python on these.
Here is a sample snippet:
# Automatically generated by SST
import sst
# Define SST Program Options:
sst.setProgramOption("timebase", "1 ps")
sst.setProgramOption("stopAtCycle", "0 ns")
# Define SST Statistics Options:
# Define the SST Components:
comp_rtr_0x0x0 = sst.Component( < < < < < < < Broken line
rtr.0x0x0", "merlin.hr_router")
comp_nic0 = sst.Component(
nic0", "firefly.nic")
comp_nic0.addParams({
"link_bw" : """4GB/s""",
"nic2host_lat" : """150ns""",
"num_vNics" : """1""",
"packetSize" : """2048B""",
"module" : """merlin.linkcontrol""",
"rxMatchDelay_ns" : """100""",
"verboseLevel" : """0""",
"buffer_size" : """14KB""",
"txDelay_ns" : """50""",
"nid" : """0"""
})
It is currently possible for events that arrive at the same simulated time to be delivered in different orders depending on the partitioning. In general, models should be written to be agnostic to event ordering, but there are some models for which this is difficult. Need to add a method for guaranteeing that events are always delivered in the same order for serial and parallel jobs.
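A common approach to this kind of problem is to break ties on keys that do not depend on how the simulation was partitioned. The sketch below assumes each event carries a globally unique link id and a per-link send counter. The Event fields, LaterThan, and drain are hypothetical names for this illustration and do not reflect the SST event classes.

```cpp
#include <cassert>
#include <cstdint>
#include <queue>
#include <tuple>
#include <vector>

// Hypothetical event record; field names are assumptions for this sketch.
struct Event {
    uint64_t deliveryTime;  // simulated time of delivery
    uint32_t linkId;        // globally unique id of the delivering link
    uint64_t sendOrder;     // per-link counter assigned by the sender
};

// Orders by time first, then by keys that are the same regardless of
// which rank or thread handled the event, so ties resolve identically.
struct LaterThan {
    bool operator()(const Event& a, const Event& b) const {
        return std::tie(a.deliveryTime, a.linkId, a.sendOrder) >
               std::tie(b.deliveryTime, b.linkId, b.sendOrder);
    }
};

using EventQueue = std::priority_queue<Event, std::vector<Event>, LaterThan>;

// Drains a copy of the queue and returns the events in delivery order.
inline std::vector<Event> drain(EventQueue queue) {
    std::vector<Event> order;
    while (!queue.empty()) {
        // Sanity check: the next event is never earlier than the last one.
        assert(order.empty() || !LaterThan()(order.back(), queue.top()));
        order.push_back(queue.top());
        queue.pop();
    }
    return order;
}
```

With such a comparator the delivery order depends only on the events themselves, not on the order in which they were inserted into the queue.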
Apparently this problem is not limited to Ariel.
Ariel leaves files around on /tmp as a consequence. (until the next system reboot)
It would be nice for a component to be able to have a new exit condition which is essentially a global count of some event defined by the component. Each component could define a condition by name and their portion of the global count. All components which are part of that count will exit when the global count is reached.
When a simulation is run in parallel, one expects the exact same results as when a simulation is run in serial.
The SimTime field, seen in the CSV statistic output, does not match between serial and parallel. This is probably due to the sync intervals between ranks.
Implement the Statistics Histogram using a sparse array to help reduce its memory footprint.
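As a rough illustration of the idea, the sketch below stores only the bins that actually received a sample, using an ordered map keyed by bin index. SparseHistogram and its methods are hypothetical names for this example and do not match the existing statistics API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>

// Hypothetical sparse histogram: memory grows with the number of
// non-empty bins rather than with the total number of bins.
class SparseHistogram {
public:
    SparseHistogram(uint64_t minValue, uint64_t binWidth)
        : minValue_(minValue), binWidth_(binWidth) {
        assert(binWidth > 0 && "bin width must be positive");
    }

    // Counts a sample; values below the minimum are tallied separately.
    void addData(uint64_t value) {
        if (value < minValue_) {
            ++outOfBounds_;
            return;
        }
        ++bins_[(value - minValue_) / binWidth_];
    }

    // Count for one bin; bins never touched report 0 without being stored.
    uint64_t binCount(uint64_t binIndex) const {
        std::map<uint64_t, uint64_t>::const_iterator it = bins_.find(binIndex);
        return it == bins_.end() ? 0 : it->second;
    }

    std::size_t storedBins() const { return bins_.size(); }
    uint64_t outOfBounds() const { return outOfBounds_; }

private:
    uint64_t minValue_;
    uint64_t binWidth_;
    uint64_t outOfBounds_ = 0;
    std::map<uint64_t, uint64_t> bins_;
};
```

An ordered map keeps the bins sorted for output. A hash map would be an alternative if lookup speed matters more than ordered iteration.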
The merge() function in SharedRegionMerger copies the change set data without checking for conflicts. I'm not sure if the conflict check is supposed to happen elsewhere, but I was able to write multiple times to the same location in a shared region, and neither the local SharedRegionManager nor the merge detected the conflicts.
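For illustration only, a conflict check during merge might look like the sketch below. Change and mergeWithConflictCheck are hypothetical and do not mirror the SharedRegionMerger interface. The sketch treats any two writes of different values to the same offset as a conflict, which is an assumption about the intended semantics.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical change record: one byte written at an offset in the region.
struct Change {
    std::size_t offset;
    uint8_t value;
};

// Applies the change sets to `region`. Returns false, leaving `region`
// untouched, if two writes put different values at the same offset.
inline bool mergeWithConflictCheck(
        std::vector<uint8_t>& region,
        const std::vector<std::vector<Change> >& changeSets) {
    std::map<std::size_t, uint8_t> pending;  // offset -> value to apply
    for (const std::vector<Change>& set : changeSets) {
        for (const Change& c : set) {
            assert(c.offset < region.size() && "change lies outside the region");
            std::pair<std::map<std::size_t, uint8_t>::iterator, bool> result =
                pending.emplace(c.offset, c.value);
            if (!result.second && result.first->second != c.value) {
                return false;  // same offset, different value: conflict
            }
        }
    }
    // No conflicts found, so it is safe to apply everything.
    for (const std::pair<const std::size_t, uint8_t>& entry : pending) {
        region[entry.first] = entry.second;
    }
    return true;
}
```

Staging the writes before applying them means a detected conflict leaves the region in its previous state, at the cost of one extra pass over the changes.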
Need a setThreadCount in Python for cases where threading shouldn't happen (components are known not to be thread safe).
@mjleven was performing a new install with MacPorts and OpenMPI is not listed in the how-to. We should probably add this.
checking for boost/filesystem.hpp... yes
configure: Performing linking checks for Boost Filesystem...
configure: Boost Filesystem configuration successful.
checking whether the Boost::Thread library is available... yes
checking for exit in -lboost_thread... no
checking for exit in -lboost_thread... (cached) no
checking for exit in -lboost_thread... (cached) no
configure: error: Could not link against boost_thread !
make: *** [config.status] Error 1
[mjleven@sst-devel build]$
Use case from ISCA tutorial, need checkpointing for long running simulations.
Wants to be able to link some elements into SST Core so we can remove the need for dynamic linking. Can choose which elements to link in. Requested by Scott H.
Feedback from ISCA tutorial, want to see simple interfaces in the SST info output dump.
The sst/core/stats directory should be removed. Note: this is an older implementation of statistics. The new statistics API is in the sst/core/statapi directory.