pypop's People

Contributors

jonathan-boyle, jonathan3145, ptooley

pypop's Issues

Spurious message about missing .row and .pcf files

Running trace analysis on trace_np48_nt1_test.filter3.prv complains about not being able to find trace_np48_nt1_test.pcf and trace_np48_nt1_test.row, i.e.

/fserver/jonathanb/pop/pypop_stuff/pypop/pypop/dimemas.py:145: UserWarning: Could not find /fserver/jonathanb/POP_trace_data/CASTEP/Solid_Benzene/junk/trace_np48_nt1_test.row, dimemas may fail or produce invalid data
warnings.warn(
/fserver/jonathanb/pop/pypop_stuff/pypop/pypop/dimemas.py:145: UserWarning: Could not find /fserver/jonathanb/POP_trace_data/CASTEP/Solid_Benzene/junk/trace_np48_nt1_test.pcf, dimemas may fail or produce invalid data
warnings.warn(

Analysis should fail gracefully on missing trace data

E.g. for a missing PAPI_TOT_INS event:

  • Analysis should gracefully fail with a warning about missing events and insert np.nan
  • Reloading a summary with missing data should raise a (silenceable) warning of missing data
  • Metric creation should silently accept nans in metric input
  • Metric table plotting should support hide_nans={"never", "all", "any"} to control plotting of rows containing nans.
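The hide_nans option proposed above could behave roughly as follows. This is only a sketch: filter_nan_rows is a hypothetical helper, not an existing PyPOP function, and it assumes that "all" hides rows in which every value is nan while "any" hides rows containing at least one nan.

```python
import numpy as np
import pandas as pd


def filter_nan_rows(table, hide_nans="never"):
    """Drop metric rows according to a hide_nans policy (hypothetical helper)."""
    if hide_nans == "never":
        # Keep every row, including rows made up entirely of nans.
        return table
    if hide_nans == "all":
        # Assumed meaning: hide only rows in which every value is missing.
        return table.dropna(axis=0, how="all")
    if hide_nans == "any":
        # Assumed meaning: hide rows in which at least one value is missing.
        return table.dropna(axis=0, how="any")
    raise ValueError(
        f"hide_nans must be 'never', 'all' or 'any', got {hide_nans!r}"
    )


# Illustrative metric table: one complete row, one partial row, one empty row.
metrics = pd.DataFrame(
    {"np48": [0.91, np.nan, np.nan], "np96": [0.85, 0.70, np.nan]},
    index=["Parallel Efficiency", "IPC Scaling", "Instruction Scaling"],
)

visible_rows = {
    policy: list(filter_nan_rows(metrics, policy).index)
    for policy in ("never", "all", "any")
}
```

With this reading, "never" keeps all three rows, "all" drops only the fully empty row, and "any" keeps only the complete row.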

Normalize use of __repr__ and _repr_html_ on classes

[ ] - A meaningful repr for metadata may be generically useful here
[ ] - RunData should have a repr for use when printing (e.g.) a dict
[ ] - TraceSet should have a meaningful repr and _repr_html_
[ ] - PRV should have a meaningful repr and _repr_html_
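As an illustration of the pattern, the sketch below gives one class both a plain-text repr and the _repr_html_ hook that Jupyter looks for when displaying an object. RunMetadata and its field names are hypothetical stand-ins, not PyPOP's actual classes.

```python
import html


class RunMetadata:
    """Hypothetical metadata container, used only to illustrate the pattern."""

    def __init__(self, **fields):
        self._fields = dict(fields)

    def __repr__(self):
        # Plain-text form, used by print() and when the object sits in a dict.
        body = ", ".join(f"{k}={v!r}" for k, v in self._fields.items())
        return f"{type(self).__name__}({body})"

    def _repr_html_(self):
        # Rich form picked up by Jupyter; values are escaped to stay valid HTML.
        rows = "".join(
            f"<tr><th>{html.escape(str(k))}</th><td>{html.escape(str(v))}</td></tr>"
            for k, v in self._fields.items()
        )
        return f"<table>{rows}</table>"


meta = RunMetadata(num_processes=48, threads_per_process=1)
text_form = repr({"run": meta})
html_form = meta._repr_html_()
```

Building both forms from the same field dictionary keeps the console and notebook views consistent.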

RuntimeError: Paramedir execution failed

Hi, I am trying to rerun the Jupyter Notebook sample in our cluster and got this error:

RuntimeError: Paramedir execution failed:
/tmp/tmp8abkgbf6/serial_useful_computation.cfg: Some of the events specified in the filter doesn't appear in the trace. File creation aborted.

As far as I checked os.path.exists(outfile) returns False and returncode is 0.

Is this related to how the cluster is set up to create temporary directories? Is there any suggestion for fixing this problem?

Add caching to prv.py

Enable (compressed) pickle caching of processed prv data and openmp_analyze results
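A minimal sketch of what such a cache could look like, using gzip and pickle from the standard library. cached_call, the cache file name and the expensive() stand-in are all hypothetical; only the ignore_cache flag name is borrowed from the profile_openmp_regions signature shown in a traceback elsewhere on this page.

```python
import gzip
import os
import pickle
import tempfile


def cached_call(cache_path, compute, ignore_cache=False):
    """Return compute(), reusing a gzip-compressed pickle when one exists."""
    if not ignore_cache and os.path.exists(cache_path):
        with gzip.open(cache_path, "rb") as fh:
            return pickle.load(fh)
    result = compute()
    with gzip.open(cache_path, "wb") as fh:
        pickle.dump(result, fh, protocol=pickle.HIGHEST_PROTOCOL)
    return result


calls = []


def expensive():
    # Stand-in for parsing a PRV file or running the OpenMP region analysis.
    calls.append(1)
    return {"regions": [1, 2, 3]}


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "trace.prv.cache.gz")
    first = cached_call(path, expensive)    # computes and writes the cache
    second = cached_call(path, expensive)   # served from the cache
    third = cached_call(path, expensive, ignore_cache=True)  # recomputes
```

A real implementation would also need to invalidate the cache when the trace file changes, for example by comparing modification times, which this sketch does not attempt.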

pypop-hybrid-metrics can fail when run on imagemagick_example_traces

pypop-hybrid-metrics can fail when run on imagemagick_example_traces, e.g. v0.2 branch on MN4 using
Python 3.6.1 (default, Jun 28 2017, 08:47:14)
[GCC Intel(R) C++ gcc 4.8 mode] on linux

Traceback (most recent call last):
File "/home/pr1efk00/pr1efk02/.local/bin/pypop-hybrid-metrics", line 8, in
sys.exit(hybrid_cli_metrics())
File "/home/pr1efk00/pr1efk02/.local/lib/python3.6/site-packages/pypop/cli.py", line 174, in hybrid_cli_metrics
scaling_plot = metrics.plot_scaling(title=config.scaling_title)
File "/home/pr1efk00/pr1efk02/.local/lib/python3.6/site-packages/pypop/metrics/metricset.py", line 350, in plot_scaling
return self._plot_scaling(x_key, y_key, label, title)
File "/home/pr1efk00/pr1efk02/.local/lib/python3.6/site-packages/pypop/metrics/metricset.py", line 361, in _plot_scaling
cores_min = numpy.nanmin([cores_min, self.metric_data[x_key].min()])
File "/apps/PYTHON/3.6.1/INTEL/lib/python3.6/site-packages/numpy/lib/nanfunctions.py", line 298, in nanmin
res = np.amin(a, axis=axis, out=out, **kwargs)
File "/apps/PYTHON/3.6.1/INTEL/lib/python3.6/site-packages/numpy/core/fromnumeric.py", line 2618, in amin
initial=initial)
File "/apps/PYTHON/3.6.1/INTEL/lib/python3.6/site-packages/numpy/core/fromnumeric.py", line 86, in _wrapreduction
return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
TypeError: cannot perform reduce with flexible type
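NumPy's "flexible" types are strings, bytes and void, so this error suggests that at least one value passed to numpy.nanmin was not numeric. One plausible cause, which is an assumption and not confirmed by the traceback, is that the core-count column was read as text. The sketch below shows a possible workaround on made-up data: coerce the column to numbers before reducing.

```python
import numpy as np
import pandas as pd

# Assumption for illustration: core counts that ended up stored as strings.
cores = pd.Series(["48", "96", "24"])

# Mixing a float with a string makes NumPy build a string ("flexible") array,
# which is the kind of input the reduction in the traceback rejects.
mixed_kind = np.array([np.nan, cores.min()]).dtype.kind

# Possible workaround: coerce to numbers first, turning bad entries into nan.
numeric_cores = pd.to_numeric(cores, errors="coerce")
cores_min = np.nanmin([np.nan, numeric_cores.min()])
```

Here cores_min is a float, so later comparisons and axis limits behave numerically instead of lexicographically.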

CLI routines still use v0.2 API in v0.3rc1

Issue with 0.3 cli routines:

[orudyy@fju749 serial] pypop-hybrid-metrics mhdg-parallel-3d.div1.1Nodes.1x1.prv
Traceback (most recent call last):                                                                                                                             
  File "/home/orudyy/.local/bin/pypop-hybrid-metrics", line 11, in <module>
    load_entry_point('pypop==0.3rc1+g121db96', 'console_scripts', 'pypop-hybrid-metrics')()
  File "/home/orudyy/.local/lib/python3.7/site-packages/pypop/cli.py", line 279, in hybrid_cli_metrics
    metric_table.savefig(config.metric_table)
AttributeError: 'MetricTable' object has no attribute 'savefig'

Merge PRV and PRVTrace classes

There is a lot of duplicated code and overlapping functionality here. The two classes can be merged into a single Trace object supporting both summary and detail analysis models.

Metrics Review Tracker

  • MPI_Metrics
  • MPI_OpenMP_Metrics
  • MPI_OpenMP_Multiplicative_Metrics
  • OpenMP_Metrics
  • Judit_Hybrid_Metrics
  • PRV (loader and OpenMP Region Analyzer)

Extrae analysis could be sped up

Paramedir does support multiple config files at once, which would improve processing speed, but it would require some pre-testing of the prv file to check what events are present. This is entirely feasible but needs a rewrite of the PRVloader to build the correct list of configs to use based on the events in the prv file and to gracefully handle failure in some way.

Need to see how paramedir handles failures in multi-config mode and then work out what can be done.
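The pre-testing step could be sketched as below. It assumes the usual Paraver text layout, in which event records start with "2:" and carry type:value pairs after the timestamp field. The config names, the mapping from configs to required event types, and the sample records are all made up for illustration.

```python
import io


def event_types_in_prv(lines):
    """Collect the event type ids that occur in a PRV file's event records."""
    found = set()
    for line in lines:
        if not line.startswith("2:"):
            continue  # skip the header, state records and communications
        fields = line.rstrip("\n").split(":")
        # Assumed layout: 2:cpu:appl:task:thread:time:type:value[:type:value...]
        found.update(fields[6::2])
    return found


def usable_configs(required_events, present):
    """Keep only the configs whose required event types all appear in the trace."""
    return [
        cfg for cfg, needed in required_events.items() if set(needed) <= present
    ]


# Made-up trace content and a hypothetical config-to-events mapping.
sample = io.StringIO(
    "#Paraver header line\n"
    "1:1:1:1:1:0:100:1\n"
    "2:1:1:1:1:100:42000050:12345:42000059:6789\n"
)
required = {
    "instructions.cfg": ["42000050"],
    "cache_misses.cfg": ["42000002"],
}

present = event_types_in_prv(sample)
configs = usable_configs(required, present)
```

Only the configs that survive this filter would then be handed to paramedir in a single multi-config run. An open file handle can be passed in place of the StringIO object, so the trace is streamed line by line.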

Multi-locus Analysis

I ran an analysis on ~25000 samples with 5 loci each. The analysis runs smoothly, but in the results it looks like the Multi-locus analysis was not performed: there is a comment saying "Too many rows for haplotype program", and the Pairwise LD estimates are blank.

Is there a way to process this huge number of samples?

NameError: free variable 'cut_trace' referenced before assignment in enclosing scope

Running pypop-preprocess with --chop-to-roi on the develop branch of PyPOP gives the error below. I've swapped to my branch and that seems to work OK.

pypop-preprocess --outfile-path ./pypop_files --tmpdir-path ../junk --chop-to-roi parody_pdaf.prv

Traceback (most recent call last):                                                                                                            
  File "/fserver/jonathanb/.local/bin/pypop-preprocess", line 11, in <module>
    load_entry_point('pypop', 'console_scripts', 'pypop-preprocess')()
  File "/fserver/jonathanb/pop/pypop_stuff/pypop/pypop/cli.py", line 308, in preprocess_traces
    TraceSet(
  File "/fserver/jonathanb/pop/pypop_stuff/pypop/pypop/traceset.py", line 83, in __init__
    self.add_traces(
  File "/fserver/jonathanb/pop/pypop_stuff/pypop/pypop/traceset.py", line 145, in add_traces
    Trace.load(
  File "/fserver/jonathanb/pop/pypop_stuff/pypop/pypop/trace/trace.py", line 66, in load
    return loader(
  File "/fserver/jonathanb/pop/pypop_stuff/pypop/pypop/trace/trace.py", line 133, in __init__
    self._load_trace()
  File "/fserver/jonathanb/pop/pypop_stuff/pypop/pypop/trace/trace.py", line 139, in _load_trace
    self._gather_statistics()
  File "/fserver/jonathanb/pop/pypop_stuff/pypop/pypop/trace/prvtrace.py", line 119, in _gather_statistics
    self._statistics = self._analyze_tracefile(
  File "/fserver/jonathanb/pop/pypop_stuff/pypop/pypop/trace/prvtrace.py", line 167, in _analyze_tracefile
    stats = [
  File "/fserver/jonathanb/pop/pypop_stuff/pypop/pypop/trace/prvtrace.py", line 169, in <listcomp>
    cut_trace, cfg, index_by_thread=True, statistic_names=[name]
NameError: free variable 'cut_trace' referenced before assignment in enclosing scope
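This kind of NameError typically appears when a list comprehension uses a variable that the enclosing function binds on only some code paths. The sketch below reproduces the pattern with hypothetical functions; which branch fails to bind cut_trace in PyPOP itself is not known from the traceback, so the condition used here is an assumption.

```python
def analyze_broken(trace, configs, chop_to_roi):
    if not chop_to_roi:
        cut_trace = trace
    # With chop_to_roi=True nothing has bound cut_trace at this point.
    return [(cut_trace, cfg) for cfg in configs]


def analyze_fixed(trace, configs, chop_to_roi):
    # Bind cut_trace on every path before the comprehension uses it.
    cut_trace = f"{trace}.roi" if chop_to_roi else trace
    return [(cut_trace, cfg) for cfg in configs]


try:
    analyze_broken("parody_pdaf.prv", ["a.cfg"], chop_to_roi=True)
    raised = False
except NameError:
    # Newer Python versions report UnboundLocalError, a NameError subclass.
    raised = True

fixed = analyze_fixed("parody_pdaf.prv", ["a.cfg"], chop_to_roi=True)
```

The exact wording of the message varies between Python versions, but the remedy is the same: make sure the name is assigned on every path that reaches the comprehension.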

Don't calculate IPC scaling using stats["IPC"].mean()

Instead use something like

(stats["Useful Instructions"].sum() / stats["Useful Cycles"].sum())/
(self._stats_dict[ref_key].stats["Useful Instructions"].sum()/self._stats_dict[ref_key].stats["Useful Cycles"].sum())
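The difference matters because the mean of per-thread IPC values weights every thread equally, whereas the ratio of sums weights each thread by the cycles it actually spent. A small sketch with made-up numbers follows; the column names come from the expression above, while the helper functions are hypothetical.

```python
import pandas as pd


def ipc(stats):
    """Aggregate IPC: total useful instructions over total useful cycles."""
    return stats["Useful Instructions"].sum() / stats["Useful Cycles"].sum()


def ipc_scaling(stats, ref_stats):
    """IPC of a run relative to the reference run."""
    return ipc(stats) / ipc(ref_stats)


# Made-up per-thread statistics for a reference run and a scaled run.
ref = pd.DataFrame(
    {"Useful Instructions": [100.0, 100.0], "Useful Cycles": [100.0, 100.0]}
)
run = pd.DataFrame(
    {"Useful Instructions": [900.0, 100.0], "Useful Cycles": [300.0, 100.0]}
)
run["IPC"] = run["Useful Instructions"] / run["Useful Cycles"]

mean_of_ratios = run["IPC"].mean()   # (3.0 + 1.0) / 2
ratio_of_sums = ipc(run)             # 1000 / 400
scaling = ipc_scaling(run, ref)
```

In this example the mean of the per-thread IPC values is 2.0 while the aggregate IPC is 2.5, so the two approaches give different scaling values whenever the threads do unequal amounts of work.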

LOOP and TASK the wrong way around in prv.py

In prv.py I think LOOP and TASK are the wrong way around?

K_EVENT_OMP_TASK_FUNCTION = "60000018"
K_EVENT_OMP_LOOP_FUNCTION = "60000023"
K_EVENT_OMP_TASK_FILE_AND_LINE = "60000118"
K_EVENT_OMP_LOOP_FILE_AND_LINE = "60000123"

prv.profile_openmp_regions() on chopped traces gives ValueError: array length 21409 does not match index length 21410

When calling prv.profile_openmp_regions() on chopped traces from hybrid (MPI + OpenMP with MPI comms inside the OpenMP parallel regions) and using the develop branch I get the following warning

/fserver/jonathanb/pop/pypop_stuff/pypop/pypop/prv.py:409: UserWarning: Incomplete OpenMP region found. This likely means the trace was cut through a region
warn(

and then this error


ValueError Traceback (most recent call last)
in
----> 1 omp_region_stats = prv1.profile_openmp_regions()

~/pop/pypop_stuff/pypop/pypop/prv.py in profile_openmp_regions(self, no_progress, ignore_cache)
503 )
504
--> 505 rank_stats[irank] = pd.DataFrame(
506 {
507 "Rank": np.full(region_starts.shape, irank),

~/miniconda3/envs/PyPop/lib/python3.9/site-packages/pandas/core/frame.py in init(self, data, index, columns, dtype, copy)
466
467 elif isinstance(data, dict):
--> 468 mgr = init_dict(data, index, columns, dtype=dtype)
469 elif isinstance(data, ma.MaskedArray):
470 import numpy.ma.mrecords as mrecords

~/miniconda3/envs/PyPop/lib/python3.9/site-packages/pandas/core/internals/construction.py in init_dict(data, index, columns, dtype)
281 arr if not is_datetime64tz_dtype(arr) else arr.copy() for arr in arrays
282 ]
--> 283 return arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
284
285

~/miniconda3/envs/PyPop/lib/python3.9/site-packages/pandas/core/internals/construction.py in arrays_to_mgr(arrays, arr_names, index, columns, dtype, verify_integrity)
76 # figure out the index, if necessary
77 if index is None:
---> 78 index = extract_index(arrays)
79 else:
80 index = ensure_index(index)

~/miniconda3/envs/PyPop/lib/python3.9/site-packages/pandas/core/internals/construction.py in extract_index(data)
409 f"length {len(index)}"
410 )
--> 411 raise ValueError(msg)
412 else:
413 index = ibase.default_index(lengths[0])

ValueError: array length 21409 does not match index length 21410

Need better checking for damaged PRV files

Currently, loading PRV files with event values outside the range of an int64 causes Pandas to silently promote the event column dtype to float. This causes odd behaviour in, e.g., the OpenMP region analysis functions.

We need some way to identify this situation and either warn the user or raise an error.
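One possible check, sketched below, is to verify the dtype of the event column straight after loading and raise if it is not int64. check_event_dtype and the "value" column name are hypothetical, and the damaged frame is built by hand here instead of being loaded from a real PRV file.

```python
import numpy as np
import pandas as pd


def check_event_dtype(events, column="value"):
    """Raise if the event column is not int64 (hypothetical helper)."""
    dtype = events[column].dtype
    if dtype != np.dtype("int64"):
        raise ValueError(
            f"Event column {column!r} has dtype {dtype}, expected int64; "
            "the PRV file may be damaged"
        )
    return events


# Made-up data: an intact column and one that has been promoted to float.
good = pd.DataFrame({"value": np.array([60000018, 60000023], dtype="int64")})
damaged = pd.DataFrame({"value": [60000018.0, 1.0e19]})

checked = check_event_dtype(good)
try:
    check_event_dtype(damaged)
    rejected = False
except ValueError:
    rejected = True
```

Whether to raise or only warn is a design choice. Raising is the safer default, since later integer comparisons on event values would otherwise fail silently.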
