Comments (16)
I'll try and take a look into this more tomorrow, thank you so much for investigating!
from mumps-feedstock.
As a partial update @traversaro @akhmerov we have some friends at Quansight looking into fixing the METIS 5.2.1 build (hopefully). Will keep you posted.
Personally I am in favor of stopping the metis 5.1.1 migration and switching back to metis 5.1.0 here and in the other migrated feedstocks (see https://conda-forge.org/status/#metis511). Once metis 5.2.1 is ready, we can try it and, if it works fine, proceed with the 5.2.1 migration. I am not sure whether anyone else in @conda-forge/mumps or @conda-forge/metis has a different opinion. I would be happy to open the necessary PRs to stop the migration and revert the migrated repos to 5.1.0.
Thanks a lot for the thorough investigation @akhmerov, I totally agree. At this point, considering also the other failures we are seeing with metis 5.1.1 (conda-forge/gtsam-feedstock#21), we could consider reverting the migration to 5.1.1 at the conda-forge level and sticking to 5.1.0 until metis 5.2.1 is available. This has the downside that dgl will not be installable side-by-side with other conda-forge packages that depend on metis, but anyone who really needs that can invest in the work, either by packaging metis 5.2.1 or by ensuring that the package of interest builds for both metis 5.1.0 and 5.1.1.
Any opinion on this @conda-forge/metis @conda-forge/dgl @conda-forge/mumps ?
conda-forge/conda-forge-pinning-feedstock#5396 halts the migration. Rebuilds can start once that lands. Looks like only 10 packages. @Traverso feel free to ping me on any un-migrate PRs
@akhmerov just to understand, is this crash related to python -c "import kwant; kwant.test()" (that was in theory fixed in #88) or something else?
It's a different segfault caught by our updated tests (that's why kwant.test() passes now, while the unreleased version of tests crashes). Because it works with the same version of mumps and with different orderings though, I expect that the problem is not on our side.
Here is the crash in CI, but I'll provide a more self-contained example in a bit.
I have investigated the failure more, and I have arrived at the following reproducer that works on Ubuntu 23.10.
First, make an environment:
name: mumps_bug
channels:
  - conda-forge
dependencies:
  - mumps-seq=5.2.1
  - metis=5.1.1
  - kwant=1.4.4
  - valgrind
  - pytest
  - pip
  - pip:
    - pytest-valgrind
Download this test file (contents below):
import itertools
import numpy as np
import pytest
from pytest import raises
from numpy.testing import assert_almost_equal
import kwant
from kwant._common import ensure_rng
import kwant.solvers.sparse
import kwant.solvers.mumps

no_mumps = False

mumps_solver_options = [
    {'nrhs': 10, 'ordering': 'metis'},
    {'nrhs': 10, 'sparse_rhs': True, 'ordering': 'metis'},
    {'nrhs': 2, 'ordering': 'metis', 'sparse_rhs': True},
]

solvers = list(itertools.chain(
    [("mumps", opts) for opts in mumps_solver_options],
))


def solver_id(s):
    solver_name, opts = s
    args = ", ".join(f"{k}={repr(v)}" for k, v in opts.items())
    return f"{solver_name}({args})"


@pytest.fixture(scope="function", params=mumps_solver_options)
def solver(request):
    solver_opts = request.param
    solver = kwant.solvers.mumps
    solver.options(**solver_opts)
    return solver


@pytest.fixture
def smatrix(solver):
    return solver.smatrix


@pytest.fixture
def greens_function(solver):
    return solver.greens_function


@pytest.fixture
def wave_function(solver):
    return solver.wave_function


@pytest.fixture(scope="function")
def twolead_builder():
    rng = ensure_rng(4)
    system = kwant.Builder()
    left_lead = kwant.Builder(kwant.TranslationalSymmetry((-1,)))
    right_lead = kwant.Builder(kwant.TranslationalSymmetry((1,)))
    for b, site in [(system, chain(0)), (system, chain(1)),
                    (left_lead, chain(0)), (right_lead, chain(0))]:
        h = rng.random_sample((n, n)) + 1j * rng.random_sample((n, n))
        h += h.conjugate().transpose()
        b[site] = h
    for b, hopp in [(system, (chain(0), chain(1))),
                    (left_lead, (chain(0), chain(1))),
                    (right_lead, (chain(0), chain(1)))]:
        b[hopp] = (10 * rng.random_sample((n, n)) +
                   1j * rng.random_sample((n, n)))
    system.attach_lead(left_lead)
    system.attach_lead(right_lead)
    return system


n = 5
chain = kwant.lattice.chain(norbs=n)
sq = square = kwant.lattice.square(norbs=n)


def test_output(twolead_builder, smatrix):
    fsyst = twolead_builder.finalized()

    result1 = smatrix(fsyst)
    s, modes1 = result1.data, result1.lead_info
    assert s.shape == 2 * (sum(len(i.momenta) for i in modes1) // 2,)
    s1 = result1.submatrix(1, 0)
    result2 = smatrix(fsyst, 0, (), [1], [0])
    s2, modes2 = result2.data, result2.lead_info
    assert s2.shape == (len(modes2[1].momenta) // 2,
                        len(modes2[0].momenta) // 2)
    assert_almost_equal(abs(s1), abs(s2))
    assert_almost_equal(np.dot(s.T.conj(), s),
                        np.identity(s.shape[0]))
    raises(ValueError, smatrix, fsyst, out_leads=[])
    modes = smatrix(fsyst).lead_info
    h = fsyst.leads[0].cell_hamiltonian()
    t = fsyst.leads[0].inter_cell_hopping()
    modes1 = kwant.physics.modes(h, t)[0]
    h = fsyst.leads[1].cell_hamiltonian()
    t = fsyst.leads[1].inter_cell_hopping()
    modes2 = kwant.physics.modes(h, t)[0]
    raise


def test_smatrix_shape(smatrix):
    chain = kwant.lattice.chain(norbs=1)
    system = kwant.Builder()
    lead0 = kwant.Builder(kwant.TranslationalSymmetry((-1,)))
    lead1 = kwant.Builder(kwant.TranslationalSymmetry((1,)))
    for b, site in [(system, chain(0)), (system, chain(1)),
                    (system, chain(2))]:
        b[site] = 2
    lead0[chain(0)] = lambda site: lead0_val
    lead1[chain(0)] = lambda site: lead1_val

    for b, hopp in [(system, (chain(0), chain(1))),
                    (system, (chain(1), chain(2))),
                    (lead0, (chain(0), chain(1))),
                    (lead1, (chain(0), chain(1)))]:
        b[hopp] = -1
    system.attach_lead(lead0)
    system.attach_lead(lead1)
    fsyst = system.finalized()

    lead0_val = 4
    lead1_val = 4
    s = smatrix(fsyst, 1.0, (), [1], [0]).data
    assert s.shape == (0, 0)

    lead0_val = 2
    lead1_val = 2
    s = smatrix(fsyst, 1.0, (), [1], [0]).data
    assert s.shape == (1, 1)

    lead0_val = 4
    lead1_val = 2
    s = smatrix(fsyst, 1.0, (), [1], [0]).data
    assert s.shape == (1, 0)

    lead0_val = 2
    lead1_val = 4
    s = smatrix(fsyst, 1.0, (), [1], [0]).data
    assert s.shape == (0, 1)


def test_reflection_no_open_modes(greens_function):
    # Build system
    syst = kwant.Builder()
    lead = kwant.Builder(kwant.TranslationalSymmetry((-1, 0)))
    syst[(square(i, j) for i in range(3) for j in range(3))] = 4
    lead[(square(0, j) for j in range(3))] = 4
    syst[square.neighbors()] = -1
    lead[square.neighbors()] = -1
    syst.attach_lead(lead)
    syst.attach_lead(lead.reversed())
    syst = syst.finalized()

    # Sanity check; no open modes at 0 energy
    _, m = syst.leads[0].modes(energy=0)
    assert m.nmodes == 0

    assert np.isclose(greens_function(syst).transmission(0, 0), 0)
Place the file in an empty folder and activate the environment. Observe that running py.test in that folder sometimes finishes (disregard the test errors, they are not relevant) and sometimes segfaults with:
================================================================================= test session starts ==================================================================================
platform linux -- Python 3.12.1, pytest-7.4.4, pluggy-1.3.0
rootdir: /home/anton/tmp/mumps_bug
plugins: valgrind-0.2.0
collected 9 items
test_bug.py FFF...FFFatal Python error: Segmentation fault
Current thread 0x00007f6926919740 (most recent call first):
File "/home/anton/micromamba/envs/mumps_bug/lib/python3.12/site-packages/kwant/linalg/mumps.py", line 243 in analyze
File "/home/anton/micromamba/envs/mumps_bug/lib/python3.12/site-packages/kwant/linalg/mumps.py", line 320 in factor
File "/home/anton/micromamba/envs/mumps_bug/lib/python3.12/site-packages/kwant/solvers/mumps.py", line 104 in _factorized
...TRUNCATED
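Because the crash is intermittent, it can help to run the test suite several times and count how often it dies. Here is a minimal sketch, assuming a POSIX system where a child process killed by a signal reports a negative return code; `count_segfaults` is a hypothetical helper, not part of kwant or pytest.

```python
import signal
import subprocess
import sys


def count_segfaults(cmd, runs=10):
    """Run `cmd` `runs` times and count the runs that were killed by SIGSEGV."""
    crashes = 0
    for _ in range(runs):
        proc = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        # On POSIX, a child terminated by signal N has returncode == -N.
        if proc.returncode == -signal.SIGSEGV:
            crashes += 1
    return crashes
```

For example, calling count_segfaults([sys.executable, "-m", "pytest"], runs=20) from the folder containing the test file would report how many of 20 runs segfaulted.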
Furthermore, running valgrind using
PYTHONMALLOC=malloc valgrind --show-leak-kinds=definite --log-file=valgrind-output py.test --valgrind --valgrind-log=valgrind-output
gives (after a fairly long wait) this error that looks relevant:
________________________________________________________________________ test_reflection_no_open_modes[solver0] ________________________________________________________________________
[VALGRIND ERROR+LEAK]
Valgrind detected both an error(s) and a leak(s):
**3904598**
**3904598** **********************************************************************
**3904598** test_bug.py::test_reflection_no_open_modes[solver0]
**3904598** **********************************************************************
==3904598==
==3904598== More than 100 errors detected. Subsequent errors
==3904598== will still be recorded, but in less detail than before.
==3904598== Conditional jump or move depends on uninitialised value(s)
==3904598== at 0x53A159F8: libmetis__genmmd (in /home/anton/micromamba/envs/mumps_bug/lib/libmetis.so)
==3904598== by 0x53A16CBC: libmetis__MMDOrder (in /home/anton/micromamba/envs/mumps_bug/lib/libmetis.so)
==3904598== by 0x53A17090: libmetis__MlevelNestedDissection (in /home/anton/micromamba/envs/mumps_bug/lib/libmetis.so)
==3904598== by 0x53A175DB: METIS_NodeND (in /home/anton/micromamba/envs/mumps_bug/lib/libmetis.so)
==3904598== by 0x53963A5F: __mumps_ana_ord_wrappers_MOD_mumps_metis_nodend_mixedto32 (in /home/anton/micromamba/envs/mumps_bug/lib/libmumps_common_seq-5.2.1.so)
==3904598== by 0x53756C60: __zmumps_ana_aux_m_MOD_zmumps_ana_f (in /home/anton/micromamba/envs/mumps_bug/lib/libzmumps_seq-5.2.1.so)
==3904598== by 0x5384CA2F: zmumps_ana_driver_ (in /home/anton/micromamba/envs/mumps_bug/lib/libzmumps_seq-5.2.1.so)
==3904598== by 0x538D3700: zmumps_ (in /home/anton/micromamba/envs/mumps_bug/lib/libzmumps_seq-5.2.1.so)
==3904598== by 0x538D8ABD: zmumps_f77_ (in /home/anton/micromamba/envs/mumps_bug/lib/libzmumps_seq-5.2.1.so)
==3904598== by 0x538CFB25: zmumps_c (in /home/anton/micromamba/envs/mumps_bug/lib/libzmumps_seq-5.2.1.so)
==3904598== by 0x53713DE9: __pyx_pw_5kwant_6linalg_6_mumps_6zmumps_5call (in /home/anton/micromamba/envs/mumps_bug/lib/python3.12/site-packages/kwant/linalg/_mumps.cpython-312-x86_64-linux-gnu.so)
==3904598== by 0x32F93E: UnknownInlinedFun (pycore_call.h:92)
==3904598== by 0x32F93E: PyObject_Vectorcall (call.c:325)
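To see at a glance which libraries the reported frames come from, the log can be summarized per shared object. This is a rough sketch, assuming the log uses the "at/by 0x...: symbol (in /path/to/lib.so)" frame format shown above; `frames_by_library` is a hypothetical helper, not a valgrind or pytest-valgrind API.

```python
import re
from collections import Counter

# Matches valgrind stack frames such as
# "==123== at 0x53A159F8: libmetis__genmmd (in /path/to/libmetis.so)".
# Frames reported with source locations, e.g. "(call.c:325)", are skipped.
FRAME_RE = re.compile(r"(?:at|by) 0x[0-9A-Fa-f]+: (\S+) \(in (.+)\)")


def frames_by_library(log_text):
    """Count stack frames per shared library (file name only) in a valgrind log."""
    counts = Counter()
    for line in log_text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            library = match.group(2).rsplit("/", 1)[-1]
            counts[library] += 1
    return counts
```

Applied to the contents of the valgrind-output file, the result is a Counter keyed by library file name, which makes it easier to tell whether the uninitialised-value reports originate in libmetis.so or elsewhere.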
Also: installing metis=5.1.0 makes the segfault disappear and removes the Valgrind error (the leak stays, but it's likely irrelevant).
After investigating the issues with Metis 5.1.1, they seem unavoidable. I suggest skipping 5.1.1 and waiting until 5.2.1 arrives in the feedstock (conda-forge/metis-feedstock#41 if it succeeds).
This is becoming a blocker for using the feedstock on Windows. There:
- Version 5.2.1 misplaces .lib files so that mumps isn't found by meson
- Version 5.6.2 lacks mumps_int_def.h (#100)
(I'm just working on a feedstock over here: conda-forge/staged-recipes#25042)
The basic blocker is KarypisLab/GKlib#23 (comment).
If we don't get a response soon we will do a release targeting latest sha.
I'm a bit concerned about counting on that, see the evaluation of Metis 5.2 by SuiteSparse DrTimothyAldenDavis/SuiteSparse#291 (comment)
Would it be an appropriate solution to have build variants for metis 5.1.0 and 5.2.1 (when it's available)? Looking at #88, the only difference is whether to apply the patch.
On the other hand, packaging only for a library version that wasn't tested (metis 5.2.1), or that isn't even releasable right now, seems likely to cause a lot of pain for the users.
Since the packages are now effectively broken, I would really appreciate that.
More examples of metis 5.1.1 problems in the wild: ami-iit/bipedal-locomotion-framework#799 .