arporter / habakkuk

Fortran code analysis for performance prediction

Python 70.86% Fortran 29.14%

dag fortran performance-analysis performance-prediction parsing hpc

habakkuk's Issues

Fails to recognise array accesses when arrays are parts of a derived type

When analysing code such as:

rtmp1 = (sshn_u%data(ji  ,jj ) + hu%data(ji  ,jj  ))*un%data(ji  ,jj)

Habakkuk reports that there are no array accesses.

Use python bindings to graphviz to generate dag output

Currently Habakkuk contains code to generate a dot-format file for use with graphviz. However, graphviz has an eponymously titled Python package and it would be better to use that. We can probably make it optional so as to avoid a mandatory dependence on graphviz (i.e. Habakkuk is still useful, even if graphviz is not installed).
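
A minimal sketch of how the optional dependency might look. The function names `to_dot` and `dag_source` and the edge-list representation are assumptions for illustration, not Habakkuk's actual code; only `graphviz.Digraph`, its `edge` method and its `source` attribute come from the graphviz Python package.

```python
def to_dot(edges):
    """Build DOT-format text for a list of (parent, child) edges."""
    lines = ["digraph dag {"]
    for parent, child in edges:
        lines.append('    "{0}" -> "{1}";'.format(parent, child))
    lines.append("}")
    return "\n".join(lines)


def dag_source(edges):
    """Return DOT source, using the graphviz package when it is available."""
    try:
        import graphviz  # optional dependency
    except ImportError:
        # Fall back to hand-written DOT so the tool still works without it.
        return to_dot(edges)
    graph = graphviz.Digraph(name="dag")
    for parent, child in edges:
        graph.edge(parent, child)
    return graph.source


# Hypothetical two-edge DAG for the expression a(i) * b(i)
edges = [("*", "a(i)"), ("*", "b(i)")]
print(dag_source(edges))
```

Importing inside the function keeps the package out of the mandatory requirements: a missing graphviz only changes which code path produces the output.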

Increase test coverage

Testing is rudimentary at the moment. In this ticket I will improve the coverage of the test suite. This will then provide firmer foundations for future development.

Strip-out fparser code and use pypi package instead

Habakkuk currently includes a patched version of the fparser package as extracted from f2py.
The patches are currently being put into the fparser package under stfc/fparser#9. Once that is done we can remove all of the fparser code from Habakkuk and make it depend on the fparser package on pypi.

Attempt to provide SIMD performance estimates

Habakkuk currently makes no attempt to deduce what performance might be obtained by SIMD-vectorising the loops that it finds. The only way to account for this at the moment is to assume perfect SIMD and multiply the performance estimate it produces by the vector length (e.g. 2 for SSE, 4 for AVX2, in double precision).

Since Habakkuk already has support for loop-unrolling we could, in principle, unroll the loop by the vector length and look to pack contiguous array accesses into 'vector' variables/nodes. We will investigate the feasibility of doing this in this ticket.
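
The packing step could be sketched roughly as follows. This assumes the unrolled loop body has already been reduced to (array name, index offset) pairs; that representation and the name `pack_accesses` are hypothetical and are not taken from the Habakkuk source.

```python
def pack_accesses(accesses, vector_length):
    """Group unrolled accesses into candidate 'vector' nodes.

    accesses is a list of (array_name, offset) pairs. Accesses to the same
    array at consecutive offsets are packed together, with at most
    vector_length entries per group.
    """
    by_array = {}
    for name, offset in accesses:
        by_array.setdefault(name, set()).add(offset)

    packed = []
    for name in sorted(by_array):
        offsets = sorted(by_array[name])
        group = [offsets[0]]
        for off in offsets[1:]:
            if off == group[-1] + 1 and len(group) < vector_length:
                group.append(off)  # contiguous: extend the current vector
            else:
                packed.append((name, group))  # gap or full vector: start anew
                group = [off]
        packed.append((name, group))
    return packed


# Loop unrolled by 4: a(i+0..3) is contiguous, b(2*i) is strided
unrolled = [("a", k) for k in range(4)] + [("b", 2 * k) for k in range(4)]
for name, group in pack_accesses(unrolled, 4):
    print(name, group)
```

Here the contiguous accesses to `a` collapse into a single group of four, while the strided accesses to `b` stay as four separate groups and so would not benefit from a vector load.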

Support MAX/MIN intrinsics

We don't currently recognise MAX and MIN as Fortran intrinsics. We need to add them (including an estimation of their cost in FLOPs and cycles).

Running habakkuk with no arguments causes crash!

It should produce a usage message at the very least...
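
One way to get this behaviour, sketched with the standard `argparse` module. The positional argument name `filenames` and the example file name are assumptions for illustration and do not reflect Habakkuk's real command-line interface.

```python
import argparse


def parse_args(argv):
    """Parse the command line; an empty one produces a usage message."""
    parser = argparse.ArgumentParser(
        prog="habakkuk",
        description="Fortran code analysis for performance prediction")
    # nargs="+" makes at least one file mandatory, so running with no
    # arguments prints the usage text and exits instead of crashing later.
    parser.add_argument("filenames", nargs="+", metavar="FILE",
                        help="Fortran source file(s) to analyse")
    return parser.parse_args(argv)


args = parse_args(["kernel.f90"])
print(args.filenames)
```

With an empty argument list, `argparse` writes the usage message to stderr and raises `SystemExit` with status 2.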

Introduce 'scalar' node type

Currently the type of a DAGNode is left as None if it represents a scalar variable.
This is not very nice so in this issue we will change Habakkuk to give such nodes a scalar type.

Make CPU architecture configurable

Although CPU microarchitecture details are pulled out of a file, this is currently hard-wired to be config_ivy_bridge.py. Since this microarchitecture has no FMA support (and thus no cost associated with an FMA) this causes problems if the user attempts to have the code generate FMAs. Currently two tests are set to xfail because of this limitation.

We need to build upon the existing functionality to make the choice of micro-architecture configurable by the user.
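
As a rough sketch of the idea, the choice could be made by name from a registry. Everything here (the `ARCHITECTURES` dictionary, the `has_fma` key, the function names) is hypothetical; only the fact that Ivy Bridge lacks FMA comes from the description above, and Haswell is included as an example of a microarchitecture that does provide it.

```python
# Hypothetical registry of supported microarchitectures.
ARCHITECTURES = {
    "ivy_bridge": {"has_fma": False},
    "haswell": {"has_fma": True},
}


def get_architecture(name="ivy_bridge"):
    """Look up a microarchitecture description by name."""
    try:
        return ARCHITECTURES[name]
    except KeyError:
        raise ValueError("Unknown architecture '{0}'; choose from {1}".format(
            name, sorted(ARCHITECTURES)))


def check_fma_request(arch_name, use_fma):
    """Reject a request to generate FMAs on hardware that has none."""
    arch = get_architecture(arch_name)
    if use_fma and not arch["has_fma"]:
        raise ValueError(
            "Architecture '{0}' has no FMA support".format(arch_name))
    return arch


print(check_fma_request("haswell", use_fma=True))
```

Failing early with a clear message is a design choice: it avoids producing a schedule that silently assigns no cost to an operation the hardware cannot execute.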

Processing pert_pressure_gradient_kernel_mod.F90 fails

Processing the named dynamo source file fails with:

dag_node.DAGError: "DAG Error: Unrecognised child type: <class 'fparser.Fortran2003.Mult_Operand'>, (temp1 * rho_ref_at_quad(i) * theta_ref_at_quad(i)) ** temp2"

This is because in #3 I changed the code so that it no longer silently skips any parser-generated object that it doesn't recognise.

Process DAG to ensure integer ops are not counted as FLOPs

During DAG construction a node can be tagged as being an integer quantity if it is found to be used as an array index. However, we currently make no attempt to ensure that other nodes are updated to be consistent. This can mean that an arithmetic operation can be wrongly identified as a FLOP when in fact its arguments are integer.

In this issue we'll add a further processing step to ensure that information on which nodes are integer is propagated through the DAG.
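
The extra processing step might look something like this sketch. It assumes a simplified DAG stored as a dictionary mapping each operator node to the names of its operands, and it only propagates upwards (an operation whose operands are all integer is itself integer). The node names and function names are invented for the example.

```python
def propagate_integer(operands, integer_nodes):
    """Return the set of all nodes known to be integer.

    operands maps each operator node to the names of its operand nodes;
    integer_nodes holds the nodes already tagged as integer (e.g. because
    they appear in an array index).
    """
    known = set(integer_nodes)
    changed = True
    while changed:  # repeat until no new node can be tagged
        changed = False
        for node, args in operands.items():
            if node not in known and args and all(a in known for a in args):
                known.add(node)
                changed = True
    return known


def count_flops(operands, integer_nodes):
    """Count operator nodes that are not purely integer arithmetic."""
    known = propagate_integer(operands, integer_nodes)
    return sum(1 for node in operands if node not in known)


# add1 = i + 1, mul1 = add1 * j (both integer); mul2 = x * y (floating point)
dag = {"add1": ["i", "one"], "mul1": ["add1", "j"], "mul2": ["x", "y"]}
print(count_flops(dag, {"i", "one", "j"}))
```

Iterating to a fixed point means the order in which nodes are visited does not matter: `mul1` is picked up once `add1` has been tagged, whichever comes first in the dictionary.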

Correct cache-line access counts when array is written after read

For code of the form:

b(i) = 3*a(i)
a(i) = 2

Habakkuk will count three cache-line accesses. This is because the assignment to a(i) creates a new node in the DAG (named "a'(i)") which is then seen as a new array access. This is clearly incorrect.
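
A minimal sketch of one possible fix, assuming accesses are identified by node name and that the only difference between the read and the subsequent write is the prime character. The function name is hypothetical, and this counts distinct array references rather than modelling cache lines in any detail.

```python
def count_array_accesses(node_names):
    """Count distinct array references, treating a'(i) and a(i) as one."""
    return len({name.replace("'", "") for name in node_names})


# Nodes produced for:  b(i) = 3*a(i) ;  a(i) = 2
nodes = ["b(i)", "a(i)", "a'(i)"]
print(count_array_accesses(nodes))  # prints 2
```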

Re-structure for PyPi

Repository needs re-structuring if it is to work with pypi and (cleanly) with travis.

I'll try to follow the suggestions here:

https://hynek.me/articles/sharing-your-labor-of-love-pypi-quick-and-dirty/

documentation question

Hi,

Which Fortran dialects are supported? Can this handle F2008 object oriented style code?

Add support for Python 3

We currently only support (and test with) Python 2.7. In this issue we will extend Habakkuk to support both 2.7 and Python 3.

Integrate use of codecov with Travis

CodeCov (https://codecov.io/) offers coverage information on git diffs which will be very useful for reviewing pull requests. This Issue is to attempt integration with that service.

Support loop-unrolling for kernels containing indirect array accesses

Although the parser code in parse2003.py recognises indirect array accesses, it currently flattens such expressions into strings, i.e. my_array(map(i)+1) results in an array index stored as "map(i)+1". If i is the loop variable and we wish to unroll the loop then this is going to cause problems.

This issue can possibly be thought of as identifying contiguous and non-contiguous array accesses for the purposes of memory-bandwidth usage and potential SIMD vectorisation.
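
To illustrate the difficulty, here is a sketch of a purely textual stopgap that rewrites the loop variable inside a flattened index string. The function name is hypothetical and this is not how Habakkuk handles unrolling.

```python
import re


def shift_loop_variable(index_expr, loop_var, step):
    """Replace whole-word occurrences of loop_var with (loop_var+step)."""
    pattern = r"\b{0}\b".format(re.escape(loop_var))
    return re.sub(pattern, "({0}+{1})".format(loop_var, step), index_expr)


# Index expression for the second copy of the body when unrolling by 2
print(shift_loop_variable("map(i)+1", "i", 1))  # prints map((i+1))+1
```

The substitution yields a valid index string, but the string says nothing about whether map(i+1) is adjacent in memory to map(i), which is exactly the information needed to classify the access as contiguous or non-contiguous.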

Improve the test suite

Currently running the test suite results in a lot of .gv files in the CWD.
In this issue we will change the tests to use temporary directories to avoid this.
We'll also investigate the intermittent test failures that I've sometimes seen when running pytest in parallel.
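
A sketch of the temporary-directory approach. The helper `write_dot_file` is invented for the example; `tmpdir` is the per-test temporary directory fixture that pytest provides. Giving every test its own directory may also help with the parallel failures if those turn out to be caused by tests overwriting each other's files, though that is only a guess.

```python
import os
import shutil
import tempfile


def write_dot_file(directory, name, text):
    """Write DOT text to <directory>/<name>.gv and return the path."""
    path = os.path.join(directory, name + ".gv")
    with open(path, "w") as handle:
        handle.write(text)
    return path


def test_dot_file_written(tmpdir):
    """pytest injects tmpdir, a fresh directory unique to this test."""
    path = write_dot_file(str(tmpdir), "dag", "digraph {}")
    assert os.path.isfile(path)


# Outside pytest the same helper works with the standard library:
scratch = tempfile.mkdtemp()
try:
    created = write_dot_file(scratch, "dag", "digraph {}")
    print(os.path.basename(created))
finally:
    shutil.rmtree(scratch)  # nothing is left behind in the CWD
```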

Habakkuk does not store array-index expression in parentheses

Habakkuk fails to store the array-index expression when it is enclosed within parentheses, e.g.:

a((i+j)) = 2.0*b(i)

Requesting the full_name of the node representing the LHS of this expression just returns "a()".

Test failures with latest version (0.0.6) of fparser

The updates to fparser have broken Habakkuk.

Get FLOP counting working for NEMO

This issue (and branch) will address any issues found in getting habakkuk working for the (pre-processed) NEMO code base.

Update documentation now project is on pypi

Need to update installation instructions to say that habakkuk may be installed from pypi.
Also need to remove text about f2py installation since it now just depends on the fparser package.

Sub-expressions within array-index expressions not handled correctly

Somewhere along the line Habakkuk's ability to handle indirect array accesses (e.g. my_array(map(i) + 1)) has been broken. Similarly, we now fail to handle e.g. my_array(2*i). We will fix these problems in this issue.

Bring up-to-date with latest fparser

Currently setup.py does not specify which fparser version to install. Since the API has changed in the latest release (0.0.7) this means Habakkuk does not work out of the box. We will fix this by updating Habakkuk's use of fparser.

Schedule generator does not account for cost of intrinsics

While working on #8 I've realised that the schedule generator quietly ignores the cost of any intrinsic operations - it considers only operators. This is a significant omission because the intrinsics that we do currently recognise (sin, cos, **) are computationally costly. In this issue we'll think about how we might remedy the situation.

Automate process of releasing to pypi

Travis can automate the process of making a release to pypi.
In this issue we'll configure this project to make use of that functionality.

Allow for overlap of DIVSD with other operations

It seems that a DIVSD does not exclusively occupy the execution port - i.e. it's possible for an independent MULSD to make use of the multiplication hardware while a DIVSD is in progress.
Unfortunately this doesn't appear to be documented anywhere. We therefore need to be able to produce a performance estimate that allows for this overlapping, maybe with bounds since we don't know just how much the two operations can be overlapped.

In conjunction with this, Agner's published results for 1/throughput of a DIVSD have a range of 8-14 cycles. Currently habakkuk produces a performance estimate using a single value for this throughput but again, we could do with producing bounds on the estimate.

I'm extending this issue to cope with overlapping of ADDSDs with DIVSDs as well.
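
The kind of bounded estimate described above could be sketched as follows. The 8-14 cycle range is the one quoted for DIVSD; the one-cycle cost for the other operations, the function name and the whole model (throughput only, ignoring latency and dependencies) are simplifying assumptions.

```python
def cycle_bounds(n_div, n_other, div_min=8, div_max=14, other_cost=1):
    """Return (lower, upper) cycle estimates for one loop iteration.

    lower: divisions at their best throughput, with the other operations
           hidden completely behind them (perfect overlap).
    upper: divisions at their worst throughput, with no overlap at all.
    """
    other_cycles = n_other * other_cost
    lower = max(n_div * div_min, other_cycles)
    upper = n_div * div_max + other_cycles
    return lower, upper


# One DIVSD plus four independent multiplies/adds
print(cycle_bounds(1, 4))  # prints (8, 18)
```

Since the degree of overlap is undocumented, the true cost would be expected to lie somewhere between the two values.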

Produce estimate of Working Set Size

It would be very useful if Habakkuk were able to generate a parameterised expression for the working-set size of a loop body. This will probably have to be expressed in terms of the loop bounds.
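
As a very rough sketch of what such an estimate involves, the following evaluates the size for given loop bounds instead of producing a symbolic expression. It assumes each array is indexed directly by loop variables and holds 8-byte reals; the function name and data layout are hypothetical, and the array names are borrowed from the derived-type example earlier in this list.

```python
def working_set_bytes(arrays, extents, bytes_per_element=8):
    """Estimate the bytes touched by a loop nest.

    arrays maps each array name to the loop variables that index it;
    extents maps each loop variable to its trip count.
    """
    total = 0
    for loop_vars in arrays.values():
        elements = 1
        for var in loop_vars:
            elements *= extents[var]
        total += elements * bytes_per_element
    return total


# Three 2D arrays, each swept over ji and jj
arrays = {"sshn_u": ("ji", "jj"), "hu": ("ji", "jj"), "un": ("ji", "jj")}
print(working_set_bytes(arrays, {"ji": 100, "jj": 50}))  # prints 120000
```

For this example the parameterised form would be 3 * 8 * N_ji * N_jj bytes, where N_ji and N_jj are the trip counts of the two loops.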

Bring up-to-date with fparser 0.0.8

A lot of work has been done on fparser and Habakkuk needs some work to make use of the latest version.
