
chroma's Introduction

Release compatibility
=================================
Chroma/QDP/QMP have release tags enumerated as

    major.minor.maintenance 

with cvs tags labelled as

    major-minor-maintenance 

Chroma version 3.43 or higher requires QDP++ 1.44.0 or higher and
QMP 2.X or higher. The latter follows the version 2.X specification
of the API. QDP++ and Chroma no longer support the 1.X QMP API.

As of these recent versions, both Chroma and QDP++ require at
least g++ 4.X and use the C++11 standard (formerly known as C++0x).

In JLab CVS module name and tag-ology, the current compatibility is

module        tag                          description
chroma        chroma3-43-X   or higher
qdp++         qdp1-44-0      or higher
qmp           qmp2-3-X       or higher     MPI and single node versions


Quick installation instructions for CHROMA
=================================

It is assumed that QDP++ is compiled and **installed** somewhere. You
can read the INSTALL file in this directory for more details on the
building of chroma.

To build CHROMA, it is recommended you make a subdirectory for the
build and keep the build tree separate from the source tree. E.g., say
the "scalar" version of QDP++ is installed in
/usr/local/share/qdp++/scalar

Then to build, you would execute:

% cd chroma
% mkdir ./scalar-build
% cd ./scalar-build
% ../configure --with-qdp=/usr/local/qdp++/scalar
% make

which should build the CHROMA library using a scalar version 
of QDP++. 

To build a main program

% cd chroma/scalar-build/mainprogs/tests
% make t_mesplq

which will build the executable "t_mesplq" using "t_mesplq.cc" as
the main program file and linking against the library in
chroma/scalar-build/lib .

You can execute the program simply by

% ./t_mesplq

which will compute the average plaquette on a random gauge
field and write the result into  "t_mesplq.xml" .



chroma's People

Contributors

arjunqcd, azrael417, bglaessle, bjoo, bvds, cpviolator, cthomas0, evanberkowitz, fwinter, grokqcd, martin-ueding, sunwayihep


chroma's Issues

Build for Multigrid failed

Branch: devel
commit: 2d52a12
System: CentOS 7.5
GCC: 7.3.1

Problem 1:
../chroma/lib/actions/ferm/invert/quda_solvers/quda_mg_utils.h:401:28: error: incompatible types in assignment of ‘char’ to ‘char [256]’
mg_param.vec_infile[0] = '\0';
../chroma/lib/actions/ferm/invert/quda_solvers/quda_mg_utils.h:402:29: error: incompatible types in assignment of ‘char’ to ‘char [256]’
mg_param.vec_outfile[0] = '\0';

../chroma/lib/actions/ferm/invert/quda_solvers/syssolver_linop_wilson_quda_multigrid_w.h:560:28: error: incompatible types in assignment of ‘char’ to ‘char [256]’
mg_param.vec_infile[0] = '\0';
^~~~
../chroma/lib/actions/ferm/invert/quda_solvers/syssolver_linop_wilson_quda_multigrid_w.h:561:29: error: incompatible types in assignment of ‘char’ to ‘char [256]’
mg_param.vec_outfile[0] = '\0';

../chroma/lib/actions/ferm/invert/quda_solvers/syssolver_mdagm_clover_quda_multigrid_w.cc:330:93: error: incompatible types in assignment of ‘char’ to ‘char [256]’
if( subspace_prefix.size() > 255 ) { (subspace_pointers->mg_param).vec_outfile[255] = '\0'; } ^~~~
../chroma/lib/actions/ferm/invert/quda_solvers/syssolver_mdagm_clover_quda_multigrid_w.cc:335:52: error: incompatible types in assignment of ‘char’ to ‘char [256]’
(subspace_pointers->mg_param).vec_outfile[0] ='\0';

Problem 2:
cannot convert char (*)[256] to char * for vec_outfile at function std::strncpy((subspace_pointers->mg_param).vec_outfile, subspace_prefix.c_str(), 256);

For the first problem, I replaced the assignments with strcpy(), which suggests that mg_param.vec_outfile and mg_param.vec_infile are now 2-D char arrays (one string per level).

For the second, I replaced (subspace_pointers->mg_param).vec_outfile with (subspace_pointers->mg_param).vec_outfile[0].
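
If that reading is right, the intended initialization is presumably per-level, e.g. (a sketch, assuming the fields are now char[...][256] arrays, as the error messages suggest):

    mg_param.vec_infile[0][0]  = '\0';   // clear the level-0 string...
    mg_param.vec_outfile[0][0] = '\0';   // ...rather than assigning a char to the whole char[256]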

So, is this how it is designed to work?

QDP-JIT clover term broken for Nc > 3 builds

I've built some Nc=3,4,5 stacks using a variety of combinations of QDPXX, QDP-JIT, and a color agnostic branch of QUDA:

CHROMA + QDPXX
CHROMA + QDPJIT
CHROMA + QDPXX + QUDA
CHROMA + QDPJIT + QUDA

where in the +QUDA cases QUDA handles the inversions only as normal.

For Nc=3, all 4 stacks produce 2 Flavour clover HMC configurations (no link smearing) with excellent agreement in Delta H when using the same seed for all 4 stacks. When I move to Nc>3, only the CHROMA + QDPXX and CHROMA + QDPXX + QUDA are stable, in that their Delta H values are in agreement, and the solver converges. For both CHROMA + QDPJIT and CHROMA + QDPJIT + QUDA stacks, the solver diverges quite quickly. This isolates the problem to Chroma's QDPJIT clover term code. If QUDA's treatment of the QDPJIT ordered clover term were bad, only the CHROMA + QDPJIT + QUDA builds would produce poor data. Furthermore, the fact that both the CHROMA + QDPJIT + QUDA and CHROMA + QDPJIT stacks produce good data for Nc=3 demonstrates that the issue is in the Nc != 3 QDPJIT code only.

I'm using CHROMA devel with master merged in and some minor modifications to place guards around baryon code. QDPXX and QDPJIT are also the devel branches with similar harmless guards around baryon code. The QUDA branch is feature/Nc_agnostiscism which is up-to-date with QUDA's develop branch.

If someone could cast a more experienced eye than mine over the QDPJIT clover code in CHROMA to look for an error I cannot see, I would be very appreciative. I can also provide a stack reproducer, but I do not have push rights to QDPXX or QDPJIT, so I will have to fork them for downloading.

Deprecate libraries: CG-DWF SSE_WILSON_DSLASH

Hi Folks,
Should we consider deprecating sse_wilson_dslash and cg-dwf before making this source public?

CG-DWF is not strictly speaking ours; it belongs to MIT, but even they are not really using it anymore.

sse_wilson_dslash has essentially been superseded by cpp_wilson_dslash.

Thoughts?
Best, B

Cannot import chroma in latest pip release

Hi,
pip install chroma has installed chroma-0.2.0 in my python 3.6.6 environment on Kubuntu 18.04.

However, when I try to import chroma, I get this error:

  Traceback (most recent call last):

  File "/home/gally/Software/anaconda3/envs/npfc/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 3267, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)

  File "<ipython-input-85-dd9842f76628>", line 1, in <module>
    import chroma

  File "/home/gally/Software/anaconda3/envs/npfc/lib/python3.6/site-packages/chroma/__init__.py", line 9, in <module>
    from .core import *

  File "/home/gally/Software/anaconda3/envs/npfc/lib/python3.6/site-packages/chroma/core.py", line 273
    except Exception, e:
                    ^
SyntaxError: invalid syntax

Did I do something wrong or is this project compatible only with python 2?

If so, maybe you could specify it in the requirements.

Cheers,
Jose Manuel

Building Current Master HEAD with QDP-JIT

Hi,
The current chroma master HEAD revision needs the latest QDPXX code to build,
with support for user-created SubLattices. These were introduced into QDPXX in late May. QDP-JIT/PTX master was last committed to before this, on April 24th, and I had trouble building the latest chroma master (3b3adc2) against the latest QDP-JIT (1438924c6c211aa0e960d31e65b4e910a4a0fae7), though I was successful building against the regular QDPXX (229c86dbacf8254439c98afa5a288f24895529b6).

Ultimately, after a bit of a faff tracking it down, I found that the last version of Chroma I could build against QDP-JIT in full double precision was Chroma commit 55c054f. Versions of Chroma immediately before this commit had a type mismatch on the template used in MOD_t (LatticeColorVector):

This broke builds in DP as the TimeSliceIO param of MOD_t was LatticeColorVector, but in the calling code a TimeSliceIO with template param LatticeColorVecF was explicitly used in the eigen_source.get() call.

A workaround for this issue is: use Chroma commit 55c054f with QDP-JIT

A full fix would involve bringing QDP-JIT up to consistency with QDPXX regarding the SubLattice stuff.

Parallel IO for USQCD DD PAIRS prop writes

The readers and writers are hardwired for QDPIO_SERIAL. Add a Boolean ParallelIO tag to the XML, and then choose the QDPIO_SERIAL or QDPIO_PARALLEL option for the readers and writers based on it.
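
A hedged sketch of the proposed wiring (the tag name, the parameter-reading context, and the writer call here are assumptions, not the final design):

    // Read an optional ParallelIO flag; the default preserves today's serial behavior.
    bool parallel_io = false;
    if (paramtop.count("ParallelIO") != 0)
      read(paramtop, "ParallelIO", parallel_io);

    QDP_serialparallel_t serpar = parallel_io ? QDPIO_PARALLEL : QDPIO_SERIAL;
    QDPFileWriter to(file_xml, filename, volfmt, serpar, QDPIO_OPEN);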

Chroma test cases

While running the chroma test cases with QDP, QMP, and QUDA, all test cases fail except "t_mesplq". Can you suggest a solution for these test cases:
t_leapfrog, t_lwldslash_array, t_lwldslash_new, t_lwldslash_pab, t_lwldslash_sse, t_meas_wilson_flow, t_minvert, t_ritz_KS

The error we get is "MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 1."

Chroma does not build in double precision and soalen=2 on AVX2

We want to do a lattice with Lx = 28. After checkerboarding this is 14 and must be divisible by the SoA length. Therefore we need soalen = 2 in a build. We want to have double precision and it has to run on KNL, where the vector length is 8 for double and 16 for float.

Taking a look into codegen/jinja/isa.js, we see that AVX512 does not offer the SoA length that we need.

    "avx512": {
        "fptypes": {
            "double": {"veclen": 8, "soalens": [4, 8]},
            "float": {"veclen": 16, "soalens": [4, 8, 16]},
            "half": {"veclen": 16, "soalens": [4, 8, 16]}
        },
        "extra_includes_global": ["immintrin.h"],
        "extra_includes_local": []
    },

However, with AVX2 we have this option:

    "avx2": {
        "fptypes": {
            "double": {"veclen": 4, "soalens": [2, 4]},
            "float": {"veclen": 8, "soalens": [4, 8]},
            "half": {"veclen": 8, "soalens": [4, 8]}
        },
        "extra_includes_global": ["immintrin.h"],
        "extra_includes_local": ["qphix_codegen/avx_utils.h"]
    },

The problem is that Chroma seems to compile both single and double precision kernels, and there is no single-precision kernel with SoA length 2:

/usr/local/software/jurecabooster/Stages/2018a/software/impi/2018.2.199-iccifort-2018.2.199-GCC-5.5.0/bin64/mpiicpc -I/homec/hbn28/hbn28e/Chroma-jureca-booster-avx2-soa2/sources/chroma/mainprogs/main -I/homec/hbn28/hbn28e/Chroma-jureca-booster-avx2-soa2/sources/chroma/lib -I../../lib  -xCORE-AVX2 -O3 -fopenmp -std=c++11 -I/homec/hbn28/hbn28e/Chroma-jureca-booster-avx2-soa2/local-icc/include -I/homec/hbn28/hbn28e/Chroma-jureca-booster-avx2-soa2/local-icc/include/libxml2 -I/usr/local/software/jurecabooster/Stages/2018a/software/GMP/6.1.2-GCCcore-5.5.0/include -xCORE-AVX2 -O3 -fopenmp -std=c++11    -I/homec/hbn28/hbn28e/Chroma-jureca-booster-avx2-soa2/local-icc/include       -xCORE-AVX2 -O3 -fopenmp -std=c++11 -L../../lib  -L/homec/hbn28/hbn28e/Chroma-jureca-booster-avx2-soa2/local-icc/lib -L/usr/local/software/jurecabooster/Stages/2018a/software/GMP/6.1.2-GCCcore-5.5.0/lib     -L/homec/hbn28/hbn28e/Chroma-jureca-booster-avx2-soa2/local-icc/lib -L../../other_libs/qdp-lapack/lib       -o chroma chroma.o -lchroma  -lqdp -lXPathReader -lxmlWriter -lqio -llime -L/homec/hbn28/hbn28e/Chroma-jureca-booster-avx2-soa2/local-icc/lib -lxml2 -lm -lqmp -lqmp -lintrin -lfiledb -lfilehash -lgmp      -lqphix_solver -lqphix_codegen  -lqdp-lapack      
../../lib/libchroma.a(syssolver_linop_clover_qphix_w.o): In function `QPhiX::ClovDslash<float, 8, 2, true>::completeFaceDir(int, float const*, float (*) [3][4][2][2], float const (*) [8][2][3][2][8], QPhiX::Types<float, 8, 2, true>::CloverBlock const*, double, int, int, int, bool)':
syssolver_linop_clover_qphix_w.cc:(.text._ZN5QPhiX10ClovDslashIfLi8ELi2ELb1EE15completeFaceDirEiPKfPA3_A4_A2_A2_fPA8_A2_A3_A2_A8_S2_PKNS_5TypesIfLi8ELi2ELb1EE11CloverBlockEdiiib[_ZN5QPhiX10ClovDslashIfLi8ELi2ELb1EE15completeFaceDirEiPKfPA3_A4_A2_A2_fPA8_A2_A3_A2_A8_S2_PKNS_5TypesIfLi8ELi2ELb1EE11CloverBlockEdiiib]+0x45e): undefined reference to `void QPhiX::face_clov_finish_dir_plus<float, 8, 2, true, true>(float const*, QPhiX::Types<float, 8, 2, true>::SU3MatrixBlock const*, QPhiX::Types<float, 8, 2, true>::FourSpinorBlock*, QPhiX::Types<float, 8, 2, true>::CloverBlock const*, int const*, int const*, int, int, int, int, float, unsigned int, int, float const (*) [2])'

More than a year ago, before the great refactoring of QPhiX, this would have compiled because the kernel code was header-only and there was a default definition which would just raise a runtime error. This way one could still compile in this fashion but just had to be careful not to call it with single precision calls. Now we have to decide what we want:

  • Add non-functioning kernels to QPhiX which raise exceptions when called. This way Chroma would not need to be changed but QPhiX would have to generate broken kernels.

  • Chroma needs to be more careful which QPhiX objects are instantiated.

Monomials are referred to by their C++ classes and not the user's monomial_id

While going through the log files (solver, forces), the monomials are all referred to by their C++ classes. This makes sense from the solver or force computation code as it does not care what physics the user wants to describe with those. In the N_f = 2 + 1 simulations that I run with clover term, there are two log-det terms. From the output of the solvers and the force monitor I cannot tell which block belongs to which monomial (light log-det or strange log-det):

<elem>
  <EvenOddPrecLogDetEvenEvenMonomial>
    <S>-1078819.68276751</S>
  </EvenOddPrecLogDetEvenEvenMonomial>
</elem>
<EvenOddPrecLogDetEvenEvenMonomial>
  <Forces>
    <F_sq_per_direction>5.06968661479971e-09 5.17655652393928e-09 5.15228167670478e-09 5.09518950267906e-09</F_sq_per_direction>
    <F_avg_per_direction>6.77851025083714e-05 6.83683866616008e-05 6.82822311070698e-05 6.78154167053338e-05</F_avg_per_direction>
    <F_max_per_direction>0.000193982091162627 0.000175413372051822 0.00017905863116576 0.000194948531090264</F_max_per_direction>
    <F_sq>5.12342857953071e-09</F_sq>
    <F_avg>6.80627842455939e-05</F_avg>
    <F_max>0.000194948531090264</F_max>
  </Forces>
</EvenOddPrecLogDetEvenEvenMonomial>

In the flat text output, there seems to be no way to connect the iterations of the solvers to the monomial. It would be really interesting to know the relative iteration counts and residuals for two similar monomials (two Hasenbusch terms on different time scales for instance).

Looking through the code I saw that the monomials themselves do not know their monomial_id; only the singleton map knows those. So I thought one could attach the name to the base class of the monomials, as sketched below. However, the XML emission code is duplicated for every monomial type, so getting this done consistently would mean changes in a lot of places.
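
A minimal sketch of that idea (member and method names are hypothetical, and the real base class is more involved):

    // Attach the user's monomial_id to the monomial base class when the
    // singleton map creates the monomial.
    class Monomial {
    public:
      void setMonomialId(const std::string& id) { monomial_id = id; }
      const std::string& getMonomialId() const { return monomial_id; }
    private:
      std::string monomial_id;
    };

    // The XML emission could then label each block, e.g.
    //   write(xml_out, "monomial_id", getMonomialId());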

Is there perhaps an easier way to make this more end-user friendly?

Git Modules

Hi, Just noticed that we need to repoint the gitmodules for every branch potentially. I've done it for 'master' and 'ptxquda'. Beware...

Interface changes made chroma incompatible with qphix/devel

hello,

using the latest qphix/devel branch together with chroma causes the following issues:

/project/projectdirs/mpccc/tkurth/NESAP/USQCD/install/intel/qphix_knl/include/qphix/inv_richardson_multiprec.h(126): error: no instance of overloaded function "QPhiX::AbstractSolver<FT, V, S, compress12, num_flav>::operator() [with FT=float, V=16, S=8, compress12=true, num_flav=1]" matches the argument list
argument types are: (float (*const [1])[3][4][2][8], float (*const [1])[3][4][2][8], double, int, double, unsigned long, unsigned long, int, bool, int)
object type is: QPhiX::AbstractSolver<float, 16, 8, true, 1>
solver_inner(dx_inner,
^
/project/projectdirs/mpccc/tkurth/NESAP/USQCD/install/intel/qphix_knl/include/qphix/abs_solver.h(94): note: this candidate was rejected because arguments do not match
virtual void operator()(Spinor *x[num_flav],
^
/project/projectdirs/mpccc/tkurth/NESAP/USQCD/install/intel/qphix_knl/include/qphix/abs_solver.h(76): note: this candidate was rejected because arguments do not match
virtual void operator()(Spinor *x[num_flav],
^
/project/projectdirs/mpccc/tkurth/NESAP/USQCD/install/intel/qphix_knl/include/qphix/abs_solver.h(41): note: this candidate was rejected because arguments do not match
virtual void operator()(Spinor *x,

Additionally, some std:: scopes are not explicitly specified, causing issues after using namespace std was removed, e.g.

/project/projectdirs/mpccc/tkurth/NESAP/USQCD/src/chroma/lib/actions/ferm/invert/qphix/syssolver_mdagm_clover_qphix_iter_refine_w.h(506): error: identifier "endl" is undefined
QDPIO::cout << "QPHIX_MDAGM_SOLVER: total time: " << swatch.getTimeInSeconds() << " (sec)" << endl;

The commits are:
chroma: 0618847
qphix: aca7bb148deb7b2762372a76ef25002ce0c48551

Best Regards
Thorsten Kurth

Undefined behavior if static multiXd LatticeObjects are used

Static variables like the phases in lib/util/gauge/stag_phases_s.cc and

static multiXd<LatticeABC> abc;

cause undefined behavior if they have size>0 when the program exits.

This is due to the fact that the order of initialization (and therefore destruction) of the qdp++ allocator map (the_alignment_map) and the empty multiXd is undefined. Maybe worse, the qdp allocator hooks atexit, whereas the multiXd, which requires the allocator, is definitely destroyed after that.

A hacky patch for the staggered phases issue is bglaessle/chroma@e2a1313, but one could also replace the static variables by objects in the NamedObjMap, as sketched below.
I can create a merge request if you're interested.
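
A sketch of the NamedObjMap variant (the object name here is hypothetical, and this assumes NamedObjMap entries are torn down while the QDP++ allocator is still alive):

    // Store the phases in the NamedObjMap instead of a static variable.
    TheNamedObjMap::Instance().create< multi1d<LatticeReal> >("stag_phases");
    multi1d<LatticeReal>& phases =
        TheNamedObjMap::Instance().getData< multi1d<LatticeReal> >("stag_phases");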

Generally one should either prevent these kinds of variables, or perhaps make the_alignment_map a member of the allocator; for that I can open an issue in qdp++.

Furthermore, I have not checked chroma or other code bases for more of these static variables.

Build with Nc!=3 broken

Build fails when compiling lib/meas/gfix/coulgauge.cc: there is a call to reunit() which fails, even though lib/util/gauge/reunit.cc has code for arbitrary Nc.

I believe this is due to commit d147c2a

Issues with Intel Compiler

I have had several issue reports with the Intel compiler, probably relating to QDP++ under Chroma (perhaps I should move / cross-list this issue to a QDP++ tracker if we ever get one):

Brendan Fahy reported this in March:

I found a very strange bug. lib/actions/gauge/gaugeacts/plaq_gaugeact.cc,
when compiled with the Intel compiler, gives completely wrong results for
the backwards staple. I was getting completely wrong results when running
with an Intel compiler and was able to track it down to the staple function
in the Wilson action. Very shocked, but rewriting the multiplication using
an extra temp variable seems to sort out whatever the Intel compiler was
doing wrong.

This does not yet implicate the high optimization level (I sent a query to Brendan),
but Jie also had a similar issue which we did track down to using -O3. This made me think
that Brendan's issue may have also been due to -O3 vs -O2.

Finally, Will Detmold reported incorrect solver convergence (and in fact nonconvergence) on Edison at NERSC, using a configuration created on Intrepid
(Argonne BG/P). He was trying to continue the run. He tried a variety of optimization
combinations:

(op qmp) + (op qdp++) + (op chroma + sseBICGkernels) = BAD
(unop qmp) + (unop qdp++) + (unop chroma) = GOOD
(op qmp) + (op qdp++) + (unop chroma) = BAD
(op qmp) + (unop qdp++) + (unop chroma) = GOOD
(op qmp) + (unop qdp++) + (op chroma + sseBICGkernels) = GOOD
(op qmp) + (partop qdp++) + (op chroma + sseBICGkernels) = GOOD

op means with -O3 (and -ax=avx for chroma and qdp++)

partop qdp++ means with SSE but without -O3

It seems the issue is in QDP++ when -O3 is used.

I would add to this that I suspect the issue is in a .cc file in QDP++,
since if it were in a .h file, then the (op chroma) builds would likely also be bad.

However, Will's test hopefully used a double prec build. I don't know if Jie and Brendan saw this issue in double prec or not.

QOP MG and linkage

The current templated LinOpSysSolverQOPMG is written so that templated member functions are in the .cc file and not in the defining .h file. This seems to cause problems when folks try to create these classes directly. The issue is the templating, as usual. It should all be inlined. A workaround is to remove the templating for this solver.

t_mesplq produces empty xml

The README states

You can execute the program simply by

% ./t_mesplq

which will compute the average plaquette on a random gauge
field and write the result into  "t_mesplq.xml" .

However, doing so (consistently) produces the following output

Lattice initialized:
  problem size = 4 4 4 8
  layout size = 4 4 4 8
  logical machine size = 1 1 1 1
  subgrid size = 4 4 4 8
  total number of nodes = 1
  total volume = 512
  subgrid volume = 512
Initializing QDPDefaultAllocator.
Finished init of RNG
Finished lattice layout
Start gaussian
w_plaq = 0.00127763216854267
link = 0.0123963951342982
w_plaq = 0.00127763036754105
link = 0.000377129760333143

and creates a completely empty xml file t_mesplq.xml. I assume this is not the expected result?

Is there a definitive list of allowed XML tags?

Currently when I write an input file, I more or less do these things:

  1. Look through my list of std::string name entities that I parsed from the source code to find some particular gauge action or fermion state. If that does not help, I give some wrong name and let hmc crash; all the keys in the singleton map are then printed, so one knows which keys are allowed. A case where this is needed is when finding a solver to use with rational monomials, where a multi-shift solver is a must.
  2. Do a git grep LW_TREE_GAUGEACT to find the file where it is defined.
  3. Go through the C++ source or header file and find the read or write calls that interact with the XML. There I can see which parameters are loaded and which are optional.
  4. Look at the header file (or the separate parameters file) to see which types those parameters are to get an idea what values I can supply.

This works, it does not need much knowledge of C++. However, I think it would be very nice to have a definitive list of possible XML tags and values. Is there something like that hidden in the documentation directory?

Other than manually looking through all the code, I thought about using the Clang C++ parser to extract all the calls to read. I have looked into the Clang API a bit, but I think it would take too much time to implement properly; it would cost a multiple of the time needed to just perform the above steps a couple of times for the actions I want to simulate.

Is that pretty much it or am I missing something?

The errors of make

The "make" ends with the following errors:

In file included from ./util/ferm/key_peram_distillution.h:10,
from ./meas/hadron/distillution_factory.h:19,
from meas/hadron/distillution_factory.cc:7:
./util/ferm/key_val_db.h:38:50: error: ISO C++17 does not allow dynamic exception specifications
38 | void writeObject (std::string& output) const throw (SerializeException) {

  |                                                  ^~~~~

./util/ferm/key_val_db.h:44:48: error: ISO C++17 does not allow dynamic exception specifications
44 | void readObject (const std::string& input) throw (SerializeException) {
| ^~~~~
./util/ferm/key_val_db.h:86:50: error: ISO C++17 does not allow dynamic exception specifications
86 | void writeObject (std::string& output) const throw (SerializeException) {
| ^~~~~
./util/ferm/key_val_db.h:92:48: error: ISO C++17 does not allow dynamic exception specifications
92 | void readObject (const std::string& input) throw (SerializeException) {
| ^~~~~
./util/ferm/key_val_db.h: In instantiation of ‘class Chroma::SerialDBKey<Chroma::KeyPeramDistillution_t>’:
./util/ferm/key_peram_distillution.h:43:42: required from here
./util/ferm/key_val_db.h:36:26: error: conflicting return type specified for ‘const short unsigned int Chroma::SerialDBKey::serialID() const [with K = Chroma::KeyPeramDistillution_t]’
36 | const unsigned short serialID (void) const {return 456;}
| ^~~~~~~~
In file included from /usr/local/include/DBKey.h:43,
from /usr/local/include/DBCursor.h:55,
from /usr/local/include/ConfDataStoreDB.h:71,
from /usr/local/include/qdp_db_imp.h:10,
from /usr/local/include/qdp_db.h:12,
from ./util/ferm/key_val_db.h:10,
from ./util/ferm/key_peram_distillution.h:10,
from ./meas/hadron/distillution_factory.h:19,
from meas/hadron/distillution_factory.cc:7:
/usr/local/include/Serializable.h:58:28: note: overridden function is ‘virtual short unsigned int FILEDB::Serializable::serialID() const’
58 | virtual unsigned short serialID (void) const = 0;
| ^~~~~~~~
In file included from ./util/ferm/key_peram_distillution.h:10,
from ./meas/hadron/distillution_factory.h:19,
from meas/hadron/distillution_factory.cc:7:
./util/ferm/key_val_db.h: In instantiation of ‘class Chroma::SerialDBData<Chroma::ValPeramDistillution_t>’:
./util/ferm/key_peram_distillution.h:44:42: required from here
./util/ferm/key_val_db.h:84:26: error: conflicting return type specified for ‘const short unsigned int Chroma::SerialDBData::serialID() const [with D = Chroma::ValPeramDistillution_t]’
84 | const unsigned short serialID (void) const {return 123;}
| ^~~~~~~~

It seems that the errors come from the source code itself (dynamic exception specifications were removed in C++17) rather than from my setup.
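
A hedged sketch of C++17-compatible declarations for key_val_db.h (the dynamic exception specifications simply dropped, and the top-level const removed from the return type so it matches the FILEDB::Serializable declaration):

    unsigned short serialID (void) const {return 456;}          // was: const unsigned short ...
    void writeObject (std::string& output) const { /* ... */ }  // throw (SerializeException) dropped
    void readObject (const std::string& input) { /* ... */ }    // throw (SerializeException) dropped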

HEX smearing is not implemented for HMC but does not fail

HEX smearing appears to be usable in HMC: one can create an hmc input XML which includes HEX_FERM_STATE and the simulation runs. However, lib/actions/ferm/fermstates/hex_fermstate_w.h reveals missing parts:

/* Not implemneted   **/
virtual void deriv(P& F) const 
{
  START_CODE();
  
  END_CODE();
}

I had started a simulation with this, assuming that if there was a key in the factory, it would be implemented. In order to prevent people from creating false results, I think one should throw some exception in the missing methods to make the program fail fast, with an error message saying that HEX smearing is not currently supported, as sketched below.
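
A minimal sketch of that fail-fast behavior (using QDP_error_exit, as elsewhere in Chroma):

    virtual void deriv(P& F) const
    {
      START_CODE();

      // Fail fast instead of silently returning an unfilled force:
      QDP_error_exit("HEX fermstate: deriv() is not implemented; "
                     "HEX smearing cannot be used in HMC yet");

      END_CODE();
    }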

Should I implement that?

Does Chroma support without QMP/MPI on single GPU?

I am building Chroma with QUDA and QDP, without QMP/MPI.

Build steps followed:
QUDA: (QMP=OFF)
git clone --recursive https://github.com/lattice/quda.git
mkdir build_quda
cd build_quda
CXX=g++ CC=gcc cmake ../quda -DQUDA_GPU_ARCH=sm_70 -DQUDA_DIRAC_STAGGERED=ON -DQUDA_DIRAC_DOMAIN_WALL=OFF -DQUDA_DIRAC_TWISTED_MASS=OFF -DQUDA_DOWNLOAD_EIGEN=ON -DQUDA_DIRAC_WILSON=ON -DQUDA_DIRAC_CLOVER=ON -DQUDA_DIRAC_TWISTED_CLOVER=ON -DQUDA_DYNAMIC_CLOVER=ON -DQUDA_DIRAC_CLOVER_HASENBUSCH=ON -DQUDA_LINK_HISQ=OFF -DQUDA_MULTIGRID=ON -DQUDA_MPI=OFF -DQUDA_INTERFACE_MILC=OFF -DQUDA_QMP=OFF -DQUDA_QIO=OFF -DQUDA_BUILD_SHAREDLIB=ON -DQUDA_BUILD_ALL_TESTS=ON -DQUDA_TEX=OFF -DCMAKE_BUILD_TYPE=DEVEL -DCMAKE_CXX_FLAGS="-O3 -march=core2 -funroll-all-loops -fargument-noalias-global -finline-limit=50000 -fpeel-loops" -DCMAKE_CFLAGS="-O3 -fargument-noalias-global -funroll-all-loops -fpeel-loops -march=core2"
make -j

QDP: (QMP=OFF)
git clone --recursive https://github.com/usqcd-software/qdpxx.git
cd qdpxx
autoreconf -i
mkdir build
cd build
../configure --prefix=/opt/singlegpu/qdpxx --enable-parallel-arch=scalar --enable-precision=single --enable-sse2 CXX=g++ CC=gcc CXXFLAGS="-O3 -march=core2 -funroll-all-loops -fargument-noalias-global -finline-limit=50000 -fpeel-loops" CFLAGS="-O3 -fargument-noalias-global -funroll-all-loops -fpeel-loops -march=core2"
make -j
make install

Chroma:
git clone --recursive https://github.com/JeffersonLab/chroma.git -b devel
cd chroma
autoreconf -i
mkdir build
cd build
../configure --prefix=/opt/singlegpu/chroma --with-qdp=/opt/singlegpu/qdpxx --with-quda=/home/build_quda --enable-gmp --enable-sse2 --enable-sse3 CC=gcc CXX=g++ CXXFLAGS="-O3 -march=core2 -funroll-all-loops -fargument-noalias-global -finline-limit=50000 -fpeel-loops" CFLAGS="-O3 -fargument-noalias-global -funroll-all-loops -fpeel-loops -march=core2"

make -j
make install

The build succeeded.

Error:
When I run any test case, I get the error below:
Invalid device number -1 (/home/quda/lib/interface_quda.cpp:559 in initQudaDevice())
last kernel called was (name=,volume=,aux=)

When I build with QMP ON, it works and picks the device ID correctly. So, can we use Chroma without QMP? If so, please suggest what I am missing.

System Configuration
CUDA version: 10.1
gcc version: 7.5
cmake version: 3.16.5
OS: Ubuntu 16.04.1

Thanks in advance :)

Multinode Support

I want to try CHROMA on a multi-node, multi-GPU setup. Does it support this, and does it scale?

git clone fails

Recently git clone --recursive git@github.com:JeffersonLab/chroma.git has begun failing.

The issue seems to be a missing submodule, wilsonmg hosted on @bjoo's bitbucket, which seems to be a private repo.

> git clone --recursive git@github.com:JeffersonLab/chroma.git
Cloning into 'chroma'...
Warning: Permanently added the RSA host key for IP address '192.30.253.113' to the list of known hosts.
remote: Counting objects: 74093, done.
remote: Total 74093 (delta 0), reused 0 (delta 0), pack-reused 74093
Receiving objects: 100% (74093/74093), 32.50 MiB | 1.69 MiB/s, done.
Resolving deltas: 100% (59162/59162), done.
Checking connectivity... done.
Submodule 'other_libs/cg-dwf' (git@github.com:JeffersonLab/cg-dwf.git) registered for path 'other_libs/cg-dwf'
Submodule 'other_libs/cpp_wilson_dslash' (git@github.com:JeffersonLab/cpp_wilson_dslash.git) registered for path 'other_libs/cpp_wilson_dslash'
Submodule 'other_libs/qdp-lapack' (git@github.com:JeffersonLab/qdp-lapack.git) registered for path 'other_libs/qdp-lapack'
Submodule 'other_libs/sse_wilson_dslash' (git@github.com:JeffersonLab/sse_wilson_dslash.git) registered for path 'other_libs/sse_wilson_dslash'
Submodule 'other_libs/wilsonmg' (git@bitbucket.org:bjoo/wilsonmg.git) registered for path 'other_libs/wilsonmg'
Cloning into 'other_libs/cg-dwf'...
remote: Counting objects: 276, done.
remote: Total 276 (delta 0), reused 0 (delta 0), pack-reused 276
Receiving objects: 100% (276/276), 1.24 MiB | 773.00 KiB/s, done.
Resolving deltas: 100% (166/166), done.
Checking connectivity... done.
Submodule path 'other_libs/cg-dwf': checked out '7e850581dc552004b2234af78387c174dd35770a'
Cloning into 'other_libs/cpp_wilson_dslash'...
remote: Counting objects: 610, done.
remote: Total 610 (delta 0), reused 0 (delta 0), pack-reused 610
Receiving objects: 100% (610/610), 413.65 KiB | 691.00 KiB/s, done.
Resolving deltas: 100% (439/439), done.
Checking connectivity... done.
Submodule path 'other_libs/cpp_wilson_dslash': checked out '56a4abf64c4d586a73bc2fd7747067f6e92329c9'
Cloning into 'other_libs/qdp-lapack'...
remote: Counting objects: 501, done.
remote: Total 501 (delta 0), reused 0 (delta 0), pack-reused 501
Receiving objects: 100% (501/501), 264.80 KiB | 0 bytes/s, done.
Resolving deltas: 100% (349/349), done.
Checking connectivity... done.
Submodule path 'other_libs/qdp-lapack': checked out 'c1e61d3593b42f39afc68b23752992c88a1e6ef9'
Cloning into 'other_libs/sse_wilson_dslash'...
remote: Counting objects: 888, done.
remote: Total 888 (delta 0), reused 0 (delta 0), pack-reused 888
Receiving objects: 100% (888/888), 482.48 KiB | 384.00 KiB/s, done.
Resolving deltas: 100% (663/663), done.
Checking connectivity... done.
Submodule path 'other_libs/sse_wilson_dslash': checked out 'af5c190b8d63d0129a830a2326a8c30f8ccce4ee'
Cloning into 'other_libs/wilsonmg'...
ssh: connect to host bitbucket.org port 22: Connection refused
fatal: protocol error: bad line length character: f
fatal: clone of 'git@bitbucket.org:bjoo/wilsonmg.git' into submodule path 'other_libs/wilsonmg' failed

Typo in XML: “WislonFlowGaugeObservables”

The XML output of the Wilson flow has a typo in one of the XML tags: In the file lib/meas/inline/glue/inline_wilson_flow.cc it says MesPlq(xml_out, "WislonFlowGaugeObservables", wf_u);.

Is that something that should be fixed, or is it likely that people have written XML parsing scripts that now depend on the name being misspelled?

Allow the regression running program to take extra args for chroma

Currently regression checks are run with 'make xcheck' and this makes up the command:

${RUN} -i <input.xml> -o <output.xml> -l <log.xml> 2> .err > .out

In some builds we may want extra arguments e.g. -geom 1 1 1 1
or for QDP-JIT one may want a -ptxdb or similar argument
or for QPhIX builds one may want a list of -by -bz etc options.

These would need to go after the executable (so they cannot simply be part of the RUN variable),
which would place them either before the executable or after the redirects.

The issue can be solved if the RUN script is made a bit more sophisticated: e.g. rather than just
doing $*, it could split the command into a head and a tail, so the extra args can be sandwiched between them.

Alternatively the orchestrator in chroma/scripts/run_chroma_xmldiff.pl could be changed.

QMP Thread safety level in QDP++

QMP_initialize allows us to request a thread safety level (QMP_THREAD_SINGLE, QMP_THREAD_FUNNELED, QMP_THREAD_SERIALIZED, QMP_THREAD_MULTIPLE) mimicking the levels offered by MPI. A recent commit of QMP ( commit 9cdf6875d0e7077cf3df2e8a5b56f26646302e52 ) performs the MPI Init properly requiring the corresponding MPI thread safety level (before it was always MPI_THREAD_SINGLE==QMP_THREAD_SINGLE).

QDP++ should call this init function 'properly', with the right thread level. Non-threaded code should use QMP_THREAD_SINGLE, and threaded code may need higher levels. Right now our master-thread + fork-join model probably indicates QMP_THREAD_FUNNELED as the right choice, but if we intend to use Chroma with multi-threaded libraries, where individual threads can make MPI calls (you never know), it may be worth going for QMP_THREAD_MULTIPLE (=> MPI_THREAD_MULTIPLE).

This may use code paths in MPI with higher overhead than MPI_THREAD_SINGLE.
In terms of code, we could choose QMP/MPI_THREAD_SINGLE or QMP/MPI_THREAD_MULTIPLE based on how many threads there are (qdpNumThreads()), as sketched below.
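
A hedged sketch of such an init call, assuming the QMP 2.x QMP_init_msg_passing signature with required/provided thread levels:

    // Request a thread level based on how QDP++ is threaded.
    QMP_thread_level_t required =
        (qdpNumThreads() > 1) ? QMP_THREAD_FUNNELED : QMP_THREAD_SINGLE;
    QMP_thread_level_t provided;

    if (QMP_init_msg_passing(&argc, &argv, required, &provided) != QMP_SUCCESS
        || provided < required)
    {
      // The MPI layer cannot give us the level we asked for: abort or fall back.
    }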

XML files are not properly closed when running into a timeout

I occasionally face the problem that I specified too many updates for a single invocation of hmc and therefore run into the six hour walltime limit on JURECA. The last block started in the output XML might then look like this:

      <elem>
        <Update>
          <update_no>193</update_no>
          <WarmUpP>false</WarmUpP>
          <HMCTrajectory>
            <WarmUpP>false</WarmUpP>
            <H_old>
              <KE_old>5010.4276207191</KE_old>
              <PE_old>-43007097.8619257</PE_old>
            </H_old>
            <H_new>
              <KE_new>5791.19429249987</KE_new>
              <PE_new>-43007878.5434836</PE_new>
            </H_new>
            <deltaKE>780.766671780766</deltaKE>
            <deltaPE>-780.681557863951</deltaPE>
            <deltaH>0.0851139168153168</deltaH>
            <AccProb>0.918407656366748</AccProb>
            <AcceptP>true</AcceptP>
          </HMCTrajectory>
          <seconds_for_trajectory>721.725699</seconds_for_trajectory>
          <InlineObservables>
            <elem>
              <Plaquette>
                <update_no>193</update_no>
                <w_plaq>0.441936165724557</w_plaq>
                <s_plaq>0.441920568668605</s_plaq>
                <t_plaq>0.441951762780509</t_plaq>
                <plane_01_plaq>0.441707477639388</plane_01_plaq>
                <plane_02_plaq>0.441705823111355</plane_02_plaq>
                <plane_12_plaq>0.442348405255071</plane_12_plaq>
                <plane_03_plaq>0.442113163975531</plane_03_plaq>
                <plane_13_plaq>0.441967064555441</plane_13_plaq>
                <plane_23_plaq>0.441775059810555</plane_23_plaq>
                <link>-3.00385949850094e-05</link>
              </Plaquette>
            </elem>
            <elem>
              <PolyakovLoop>
                <update_no>193</update_no>

The file is just truncated there. Reading it with an XML parser will not work, although there are options in my XML library (lxml, which somewhere deep down uses libxml2) to still parse such a file.

I thought it would be a better user experience if the XML files were valid even when the job scheduler sends a termination signal. This could be achieved by letting push create an object that calls pop in its destructor. That would of course mean that the whole codebase would have to be converted from push(...) to XmlPush foo(...), or that push is turned into a macro that generates a unique object each time (using the TOKENPASTE trick from http://stackoverflow.com/a/1597129):

#define TOKENPASTE(x, y) x ## y
#define TOKENPASTE2(x, y) TOKENPASTE(x, y)
#define push(...) XmlPush TOKENPASTE2(xml_push_, __LINE__)(__VA_ARGS__)
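
The RAII wrapper itself could look like this (a sketch, assuming XMLWriter's openTag/closeTag members that push/pop use internally, to avoid recursing through the macro):

    class XmlPush {
    public:
      XmlPush(XMLWriter& xml, const std::string& tag) : xml_(xml) {
        xml_.openTag(tag);       // what push() does
      }
      ~XmlPush() {
        xml_.closeTag();         // close the tag when the scope exits
      }
    private:
      XMLWriter& xml_;
    };

Note this only helps when scopes unwind normally (including via exceptions); a hard kill by the scheduler would still truncate the file.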

I'm just going to write a little script that will close those XML files for now.

This seems like a rather minor feature that would nonetheless touch virtually the whole codebase. Is there any interest in this? Is it even sensible?

Remove calls to “tune” in QPhiX

QPhiX devel does not contain the tuning any more, so the tune() member function is also gone. There seem to be four spots where the calls need to be removed:

lib/actions/ferm/invert/qphix/syssolver_linop_clover_qphix_w.h:   if( invParam.TuneP ) cg_solver->tune();
lib/actions/ferm/invert/qphix/syssolver_linop_clover_qphix_w.h:   if( invParam.TuneP ) bicgstab_solver->tune();        
lib/actions/ferm/invert/qphix/syssolver_mdagm_clover_qphix_w.h:   if( invParam.TuneP ) cg_solver->tune();              
lib/actions/ferm/invert/qphix/syssolver_mdagm_clover_qphix_w.h:   if( invParam.TuneP ) bicgstab_solver->tune();

I could do it, but it is not clear to me whether devel or master is the right branch. They have diverged and there seems to be some cherry-picking going on (screenshot of the branch divergence omitted).
