habakkuk's People

Contributors

arporter

habakkuk's Issues

Use python bindings to graphviz to generate dag output

Currently Habakkuk contains code to generate a dot-format file for use with graphviz. However, graphviz has an eponymously titled Python package and it would be better if we used that. We can probably make it optional so as to avoid a mandatory dependency on graphviz (i.e. Habakkuk is still useful even if graphviz is not installed).
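
One way the optional dependency might look is sketched below. The function name dag_to_dot and the (parent, child) edge-list input are assumptions made for illustration; they are not Habakkuk's actual interface. If the graphviz package can be imported it builds the dot source, otherwise the text is written by hand.

```python
try:
    import graphviz  # optional: the "graphviz" package from PyPI
except ImportError:
    graphviz = None


def dag_to_dot(edges, name="dag"):
    """Return dot-format text for a list of (parent, child) node names.

    Hypothetical helper: node names are assumed to be simple identifiers
    that need no quoting in the dot language.
    """
    if graphviz is not None:
        # Let the Python bindings assemble the dot source.
        graph = graphviz.Digraph(name=name)
        for parent, child in edges:
            graph.edge(parent, child)
        return graph.source
    # Fallback: emit the dot text ourselves so graphviz stays optional.
    lines = ["digraph {0} {{".format(name)]
    for parent, child in edges:
        lines.append("    {0} -> {1}".format(parent, child))
    lines.append("}")
    return "\n".join(lines)


print(dag_to_dot([("a", "plus"), ("b", "plus")]))
```

Either branch yields text containing the edges "a -> plus" and "b -> plus", so callers need not know whether the bindings are installed.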

Increase test coverage

Testing is rudimentary at the moment. In this ticket I will improve the coverage of the test suite. This will then provide firmer foundations for future development.

Strip-out fparser code and use pypi package instead

Habakkuk currently includes a patched version of the fparser package as extracted from f2py.
The patches are currently being put into the fparser package under stfc/fparser#9. Once that is done we can remove all of the fparser code from Habakkuk and make it depend on the fparser package on pypi.

Attempt to provide SIMD performance estimates

Habakkuk currently makes no attempt to deduce what performance might be obtained by SIMD-vectorising the loops that it finds. The only way to account for this at the moment is to assume perfect SIMD and multiply the performance estimate it produces by the vector length (e.g. 2 for SSE, 4 for AVX2).

Since Habakkuk already has support for loop-unrolling we could, in principle, unroll the loop by the vector length and look to pack contiguous array accesses into 'vector' variables/nodes. We will investigate the feasibility of doing this in this ticket.
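
As a rough illustration of both ideas, the sketch below scales a scalar estimate by the vector length (the perfect-SIMD assumption) and checks whether the index offsets produced by unrolling one array access form a unit-stride run that could be packed. Both function names and the representation of an access as a list of integer offsets are assumptions, not part of Habakkuk.

```python
def perfect_simd_estimate(scalar_estimate, vector_length):
    """Scale a scalar performance estimate, assuming perfect vectorisation."""
    return scalar_estimate * vector_length


def can_pack(offsets, vector_length):
    """True if the offsets from one unrolled access are unit-stride.

    offsets holds the index offset (relative to the loop variable) of the
    same array access in each of the unrolled iterations.
    """
    if len(offsets) != vector_length:
        return False
    ordered = sorted(offsets)
    return all(b - a == 1 for a, b in zip(ordered, ordered[1:]))


# a(i), a(i+1), a(i+2), a(i+3): contiguous, so a candidate 'vector' node.
print(can_pack([0, 1, 2, 3], 4))
# a(2*i) unrolled four times: stride 2, so not packable.
print(can_pack([0, 2, 4, 6], 4))
```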

Support MAX/MIN intrinsics

We don't currently recognise MAX and MIN as Fortran intrinsics. We need to add them (including an estimation of their cost in FLOPs and cycles).

Introduce 'scalar' node type

Currently the type of a DAGNode is left as None if it represents a scalar variable.
This is not very nice so in this issue we will change Habakkuk to give such nodes a scalar type.
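
A minimal sketch of the idea follows. The class Node is a hypothetical stand-in rather than Habakkuk's DAGNode, and the set of type names is assumed for illustration only.

```python
class Node(object):
    """Hypothetical stand-in for a DAG node (not Habakkuk's DAGNode)."""

    # Assumed type names; the real set may differ.
    VALID_TYPES = ("scalar", "array_ref", "operator", "intrinsic")

    def __init__(self, name, node_type="scalar"):
        # Reject None (and anything else unknown) instead of treating
        # a missing type as meaning "scalar".
        if node_type not in self.VALID_TYPES:
            raise ValueError("Unknown node type: {0}".format(node_type))
        self.name = name
        self.node_type = node_type

    @property
    def is_scalar(self):
        return self.node_type == "scalar"
```

With an explicit type, code that inspects nodes can test node.is_scalar rather than comparing against None.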

Make CPU architecture configurable

Although CPU microarchitecture details are pulled out of a file, this is currently hard-wired to be config_ivy_bridge.py. Since this microarchitecture has no FMA support (and thus no cost associated with an FMA) this causes problems if the user attempts to have the code generate FMAs. Currently two tests are set to xfail because of this limitation.

We need to build upon the existing functionality to make the choice of micro-architecture configurable by the user.
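
One possible approach, sketched under the assumption that each micro-architecture lives in a module named config_<arch> alongside config_ivy_bridge.py, is to import the module by name at run time. The function names and the naming scheme for other architectures are hypothetical.

```python
import importlib

DEFAULT_ARCH = "ivy_bridge"


def config_module_name(arch=DEFAULT_ARCH):
    """Map a user-supplied architecture name onto a module name."""
    return "config_" + arch.strip().lower().replace("-", "_")


def load_arch_config(arch=DEFAULT_ARCH):
    """Import the configuration module for the requested architecture."""
    module_name = config_module_name(arch)
    try:
        return importlib.import_module(module_name)
    except ImportError:
        raise ValueError(
            "No configuration for '{0}' (looked for module {1})".format(
                arch, module_name))
```

The architecture name could then come from a command-line flag or an environment variable, with ivy_bridge kept as the default so existing behaviour is unchanged.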

Processing pert_pressure_gradient_kernel_mod.F90 fails

Processing the named dynamo source file fails with:

dag_node.DAGError: "DAG Error: Unrecognised child type: <class 'fparser.Fortran2003.Mult_Operand'>, (temp1 * rho_ref_at_quad(i) * theta_ref_at_quad(i)) ** temp2"

This is because in #3 I changed the code to no longer silently skip any parser-generated object that it didn't recognise.

Process DAG to ensure integer ops are not counted as FLOPs

During DAG construction a node can be tagged as being an integer quantity if it is found to be used as an array index. However, we currently make no attempt to ensure that other nodes are updated to be consistent. This can mean that an arithmetic operation can be wrongly identified as a FLOP when in fact its arguments are integer.

In this issue we'll add a further processing step to ensure that information on which nodes are integer is propagated through the DAG.
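
The extra processing step could be a fixed-point iteration along the lines of the sketch below. It assumes that every non-leaf node is a plain arithmetic operator, so that an integer result implies integer operands and all-integer operands imply an integer result; conversion intrinsics such as INT would need special handling. The dictionary representation of the DAG is an assumption made for illustration.

```python
def propagate_integers(operands, integer_nodes):
    """Return the set of all nodes that must hold integer values.

    operands maps each node name to the names of the nodes it is computed
    from (an empty list for leaves); integer_nodes are those already tagged.
    """
    integers = set(integer_nodes)
    changed = True
    while changed:
        changed = False
        for node, inputs in operands.items():
            if not inputs:
                continue
            if node in integers:
                # Integer result => every operand is an integer.
                new = set(inputs) - integers
            elif all(name in integers for name in inputs):
                # All operands integer => the result is an integer.
                new = {node}
            else:
                new = set()
            if new:
                integers |= new
                changed = True
    return integers


# idx = i + one (used as an array index), j = i * two, y = x * two_r
dag = {"idx": ["i", "one"], "j": ["i", "two"], "y": ["x", "two_r"],
       "i": [], "one": [], "two": [], "x": [], "two_r": []}
ints = propagate_integers(dag, {"idx", "two"})
flops = [n for n, ins in dag.items() if ins and n not in ints]
print(sorted(ints), flops)
```

Here only y is counted as a FLOP: idx is tagged because it is an array index, which marks i and one as integer, and j then becomes integer because both of its operands are.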

documentation question

Hi,

Which Fortran dialects are supported? Can this handle F2008 object oriented style code?

Add support for Python 3

We currently only support (and test with) Python 2.7. In this issue we will extend Habakkuk to support both 2.7 and Python 3.

Support loop-unrolling for kernels containing indirect array accesses

Although the parser code in parse2003.py recognises indirect array accesses, it currently flattens such expressions into strings, i.e. my_array(map(i)+1) results in an array index stored as "map(i)+1". If i is the loop variable and we wish to unroll the loop then this is going to cause problems.

This issue can possibly be thought of as identifying contiguous and non-contiguous array accesses for the purposes of memory-bandwidth usage and potential SIMD vectorisation.

Improve the test suite

Currently running the test suite results in a lot of .gv files in the CWD.
In this issue we will change the tests to use temporary directories to avoid this.
We'll also investigate the intermittent test failures that I've sometimes seen when running pytest in parallel.
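
A sketch of the temporary-directory approach using pytest's built-in tmp_path fixture is given below. The function write_dag is a hypothetical stand-in for whichever Habakkuk routine writes the .gv file; the point is only that the output location is passed in rather than defaulting to the CWD.

```python
import os


def write_dag(directory, name="dag"):
    """Hypothetical stand-in for the code under test: writes <name>.gv."""
    path = os.path.join(str(directory), name + ".gv")
    with open(path, "w") as handle:
        handle.write("digraph {0} {{\n}}\n".format(name))
    return path


def test_dag_written_to_tmp_dir(tmp_path):
    """pytest supplies tmp_path, a fresh directory unique to this test."""
    path = write_dag(tmp_path)
    assert os.path.isfile(path)
    assert os.path.dirname(path) == str(tmp_path)
```

Because each test gets its own directory, tests running in parallel no longer write to the same file names, which may also be relevant to the intermittent failures.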

Update documentation now project is on pypi

Need to update installation instructions to say that habakkuk may be installed from pypi.
Also need to remove text about f2py installation since it now just depends on the fparser package.

Bring up-to-date with latest fparser

Currently setup.py does not specify which fparser version to install. Since the API has changed in the latest release (0.0.7) this means Habakkuk does not work out of the box. We will fix this by updating Habakkuk's use of fparser.

Schedule generator does not account for cost of intrinsics

While working on #8 I've realised that the schedule generator quietly ignores the cost of any intrinsic operations - it considers only operators. This is a significant omission because the intrinsics that we do currently recognise (sin, cos, **) are computationally costly. In this issue we'll think about how we might remedy the situation.

Allow for overlap of DIVSD with other operations

It seems that a DIVSD does not exclusively occupy the execution port - i.e. it's possible for an independent MULSD to make use of the multiplication hardware while a DIVSD is in progress.
Unfortunately this doesn't appear to be documented anywhere. We therefore need to be able to produce a performance estimate that allows for this overlapping, maybe with bounds since we don't know just how much the two operations can be overlapped.

In conjunction with this, Agner's published results for 1/throughput of a DIVSD have a range of 8-14 cycles. Currently habakkuk produces a performance estimate using a single value for this throughput but again, we could do with producing bounds on the estimate.

I'm extending this issue to cope with overlapping of ADDSDs with DIVSDs as well.
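
The bounds might be computed as in the sketch below. The 8-14 cycle range for DIVSD is the one quoted above; the cost of 1 cycle for each other operation and the treatment of MULSD and ADDSD as a single pool are simplifying assumptions, as is the function name.

```python
def cycle_bounds(n_div, n_other, div_cost=(8, 14), other_cost=1):
    """Return (lower, upper) cycle estimates for a mix of operations.

    n_div is the number of divisions and n_other the number of independent
    multiplications/additions; other_cost is an assumed placeholder.
    """
    div_min, div_max = div_cost
    # Best case: cheapest divisions, other work fully hidden behind them.
    lower = max(n_div * div_min, n_other * other_cost)
    # Worst case: dearest divisions and no overlap at all.
    upper = n_div * div_max + n_other * other_cost
    return lower, upper


print(cycle_bounds(2, 10))  # (16, 38)
```

For two divisions and ten other operations the lower bound is max(2*8, 10) = 16 cycles and the upper bound is 2*14 + 10 = 38 cycles.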

Produce estimate of Working Set Size

It would be very useful if Habakkuk were able to generate a parameterised expression for the working-set size of a loop body. This will probably have to be expressed in terms of the loop bounds.
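
As a simple illustration of what such an expression could look like, the sketch below handles a single loop with unit-stride accesses. Each array is described by the smallest and largest index offset (relative to the loop variable) used in the loop body, and 8-byte (double-precision) elements are assumed. The function and its input format are hypothetical.

```python
def working_set_bytes(accesses, trip_count, bytes_per_element=8):
    """Estimate the bytes touched by a 1D loop of trip_count iterations.

    accesses maps each array name to the (min, max) index offsets with
    which the loop body accesses that array.
    """
    total = 0
    for lo, hi in accesses.values():
        # trip_count elements plus the halo implied by the offsets.
        total += (trip_count + hi - lo) * bytes_per_element
    return total


# b(i) = a(i-1) + a(i) + a(i+1) for 100 iterations
print(working_set_bytes({"a": (-1, 1), "b": (0, 0)}, 100))  # 1616
```

For this three-point stencil the loop touches 102 elements of a and 100 of b, giving (102 + 100) * 8 = 1616 bytes.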
