lab-cosmo / metatensor Goto Github PK

View Code? Open in Web Editor NEW

47.0 47.0 16.0 10.55 MB

Self-describing sparse tensor data format for atomistic machine learning and beyond

Home Page: https://docs.metatensor.org

License: BSD 3-Clause "New" or "Revised" License

CMake 2.46% Python 59.03% Rust 18.53% C 2.05% Shell 0.30% Dockerfile 0.03% C++ 16.92% Julia 0.69%

metatensor's People

Contributors

Stargazers

Watchers

Forkers

davidetisi zhangylch picocentauri m-stack-org luthaf cbenmahm ecignoni bananenpampe soniasalomoni victorprincipe arthur-lin1027 gitarthakalita aqhali agoscinski rudolfweeber

metatensor's Issues

Rename `TensorBlock` and `TensorMap`

In the last dev meeting, we talked about renaming the two core classes TensorBlock and TensorMap.

Naming our classes Tensor might confuse users because they think that these classes are tensors and they can apply standard operations on them. Up to a certain point, this is true but due to gradients, there is an additional complexity layer that goes beyond standard tensor operations.

We concluded that TensorBlock should be renamed to Block

However, for the TensorMap, we haven't found a git name. Simply calling it Map is a bad idea because it clashes with the function map. In my view, the TensorMap is a classical map for Blocks. Therefore, we could also name it BlockMap or BlockFrame (in a pandas spirit). If somebody has other ideas we can also discuss them here.

Counterintuitive parameters for `sum_over_samples`

Perhaps it's just me, but I find the syntax for sum/mean over samples to be very confusing.
If I have a function sum_over_samples with a parameter sample_names I would expect it to sum over the indices given by sample_names - instead effectively it sums over all other indices "gathering" by sample_names.

Perhaps calling the argument reduced_samples would be enough, but IMO - in analogy with what numpy does for sum and mean where you specify with axis the axis to sum over - I would keep the name but change the logic so you specify the sample names to sum over, and it matches all the others.

Thoughts?

Better API for labels in Rust

Right now users need to manually check and ensure that label values and names match their expectation. It would be great to be able to use named structure fields instead. This could work by adding a define_labels! macro

equistore::define_labels!(StructureCenter, [structure, center]);

// the macro expands to something like
#[repr(C)]
pub struct StructureCenter {
     structure: i32,
     center: i32,
}

impl From<[i32; 2]> for StructureCenter {
    fn from(value: [i32; 2]) -> StructureCenter {
        // check that the names match
        // ...
 
        // and transmute
        unsafe { std::mem::transmute(value) }
    }
}

That could then be used something like

for sample in block.samples.iter_as::<StructureCenter>() {
    println!("{} {}", sample.structure, sample.center);
}

equistore.operation.sum_over_values() discrepancy

There is a discrepancy between an old and new commit of equistore:

Using commit e85ecba:

block = equistore.TensorBlock(
    values=np.array([
        [1, 2, 4],
        [3, 5, 6],
        [7, 8, 9],
        [10, 11, 12],
    ]),
    samples=equistore.Labels(
        ["structure", "center"],
        np.array([
            [0, 0],
            [0, 1],
            [1, 0],
            [1, 1],
        ]),
    ),
    components=[],
    properties=equistore.Labels(
        ["properties"], np.array([[0], [1], [2]])
    ),
)
keys = Labels(names=["key"], values=np.array([[0]]))

tensor = equistore.TensorMap(keys, [block])

tensor_sum = equistore.operations.sum_over_samples(tensor, ["center"])

print(tensor_sum.block(0).values)
## [[ 8 10 13]
 ##[13 16 18]]

Using master commit:

block = equistore.TensorBlock(
    values=np.array([
        [1, 2, 4],
        [3, 5, 6],
        [7, 8, 9],
        [10, 11, 12],
    ]),
    samples=equistore.Labels(
        ["structure", "center"],
        np.array([
            [0, 0],
            [0, 1],
            [1, 0],
            [1, 1],
        ]),
    ),
    components=[],
    properties=equistore.Labels(
        ["properties"], np.array([[0], [1], [2]])
    ),
)
keys = equistore.Labels(names=["key"], values=np.array([[0]]))

tensor = equistore.TensorMap(keys, [block])

tensor_sum = equistore.operations.sum_over_samples(tensor, samples_names="center")

print(tensor_sum.block(0).values)
##[[ 4  7 10]
 ##[17 19 21]]

Can you confirm that this change is correct and anticipated? What should the proper result be?

Documentation points to the old commit as being correct, so looking for clarity.

@rosecers also working with me here -- tagging so she gets updates.

Hi Guillaume! - Rosy

equistore operation dot does not support gradients

Is there a reason why it is not supported? Because the error message specifically talks about the second tensor map
https://github.com/lab-cosmo/equistore/blob/e65cd07d2f0d5aefa41a7b2beade835b9b1dc03b/python/src/equistore/operations/dot.py#L49
and then the values and gradients are dotted together
https://github.com/lab-cosmo/equistore/blob/e65cd07d2f0d5aefa41a7b2beade835b9b1dc03b/python/src/equistore/operations/dot.py#L63
I cannot make sense out of it

Update Rascaline hypers in `python/tests/data/README.md` documentation to reflect new format.

Currently, in the file python/tests/data/README.md, the Rascaline hypers are reported in the old format:

cutoff=3.5,
max_radial=4,
max_angular=4,
atomic_gaussian_width=0.3,
radial_basis={"Gto": {}},
center_atom_weight=1.0,
gradients=True,
cutoff_function={"ShiftedCosine": {"width": 0.5}},

But should be updated to reflect the new format, along the lines of:

hypers = {
        "cutoff": 3.5,
        "max_radial": 4,
        "max_angular": 4,
        "atomic_gaussian_width": 0.3,
        "radial_basis": {"Gto": {}},
        "cutoff_function": {"ShiftedCosine": {"width": 0.5}},
        "center_atom_weight": 1.0,
    }

while pointing out that the 'positions' and 'cell' gradients were calculated, or including the python code, i.e.:

calculator = SphericalExpansion(**hypers)
descriptor = calculator.compute(frames, gradients=['positions', 'cell'])

calculator = SoapPowerSpectrum(**hypers)
descriptor = calculator.compute(frames, gradients=['positions', 'cell'])

Though the exact values of the hypers should be checked, and potentially the descriptors should be re-calculated.

Operations on `Labels`, `TensorBlock`: public, privates, mixed?

Currently, operations that act on TensorBlocks and Labels are (at least) for me inconsistently publicly available or only private i.e for TensorBlocks:

public: allclose_block, slice_block, ...
pirvate: _join_blocks, _dot_block, _solve_block, ...

and for Labels:

public: unique
private: _join_labels

I think we should either make them consistently public or private. This also important for #116 where the idea came up that the new unique function does take Labels and not a TensorMap.

For a consistent user experience it might make sense that all our operations take only TensorMaps and we try to hide Blocks and Labels under the hood.

Ideas, Opinions?

Adding a way to evaluate == for TensorMaps and TensorBlock

I talked today with @Luthaf about this issue.
It would be helpful to have a way of evaluating the operator == between two TensorMaps and TensorBlocks. One main issue can come from the fact that we need a new _dispach function to handle the case if having the values in a np.array or a pythorch.tensor.

Add overview and basic tutorials to the documentation

The documentation only contains the API reference for now, we should add an overview of the package (why do we need this?) and the types (how does a TensorMap looks like?); as well as some basic tutorials on how to use this package.

An interesting tutorial could be to create an equivariant Linear model.

Can we standardize the access of the arrays gradients and values?

For the implementations in equisolve we are writing a lot of functions that can access the values or the gradients of a tensor block and then do something with it. Because the values and gradients are accessed differently, we have to write this kind of code

data_keys =  [...] # allowed are 'values' and everything that is in gradients, e.g. 'positions' or 'cell'
for data_key in data_keys:
    if data_key == "values":
        property_len = len(tensor_block.values)
    else:
        property_len = len(tensor_block.gradient(data_key).data)

the handling would be simpler if we could access values and gradients the same way

data_keys =  [...] # allowed are 'values' and everything that is in gradients, e.g. 'positions' or 'cell'
for parameter_key in data_keys:
    property_len = len(tensor_block.data(parameter_key))

This should work similar as tensor_block.gradient(parameter_key). What might be confusing is that data here is a method and in on the gradients level it is member variable. But this is related to the inconsistency between obtaining the raw array values of a tensor block and of a gradient tensor block (one uses values but for gradents one uses data).

Should `slice()` have the arg `axis`, like operations `split()`, and `join()`?

Currently, the operations split and join take the axis argument to specify whether splitting or joining should be performed over the "samples" or "properties" axis. The slice function is written to allow users to slice over both "samples" and "properties" in the same function call, using the args slice(tensor, samples=Labels(...), properties=Labels(...)).

After discussion with @PicoCentauri , we are wondering if the the arguments should be made consistent, such that slice only works over a single axis at once, specifying the axis with the axis argument. This would be a breaking change, and mean that users would have to make 2 separate function calls to slice along samples and properties. However, the change feels more numpy-esque.

Old behaviour:

from equistore.operations import slice
...
sliced_tensor = slice(tensor, samples=samples_labels, properties=properties_labels)

Proposed new behaviour (with the same result as above):

sliced_tensor_along_samples = slice(tensor, axis="samples", labels=samples_labels)
sliced_tensor_along_properties = slice(sliced_tensor_along_samples, axis="properties", labels=properties_labels)

To Do

Change the args of all functions in slice.py to have axis and labels
Update function calls in tests/.../slice.py
For the test class TestSliceBoth, update the single function call to now be 2 function calls
Bonus test: Check that slicing samples then properties gives the same result as slicing properties then samples.

Implement Python's pickle protocol for TensorMap

This would give us a better integration with the overall Python ecosystem.

This should be done using equistore.io.load/equistore.io.save, but the functions themselves might require a bit of rework to write to (resp. load from) in-memory buffer instead of writing to (resp. loading from) the filesystem.

Broadcasted TensorMap/TensorBlock x scalar operations (add, minus, multiplication, division)

Because issue #101 is too large as one feature request, I split it up into chunks. This is the first chunk.

So we have this use case

# A is TensorMap
a = 10
A = add(A, a)

lets say we use an add function for this, then it would look like

def add(A, a):
    _A = copy(A)
    for key, block in _A.blocks():
        block.values += a
        for key, grad in block.gradients():
            grad.data += a
    return _A

QUESTION/PROBLEM I think one conceptual problem is how should we differ between a operation considering gradient and just broadcasting operation as shown above

So the above example would be fully broadcasting operation and the below one would recognize the gradients

def add(A, a):
    _A = copy(A)
    for key, block in _A.blocks():
        block.values += a
        # no operation applied on gradients since ∇(A+a) = ∇A
    return _A

Maybe this can be solved with an additional argument gradient_operation

def add(A, a, gradient_operation=False):
    _A = copy(A)
    for key, block in A.blocks():
        block.values += a
        if not(gradient_operation):
            for key, grad in block.gradients():
                grad.data += a
    return _A

QUESTION/PROBLEM how to do inplace operations?

The above ignores inplace operations, which would be required at some point, but I don't fully understand how numpy's inplace operation work

A += 10 # seems to be inplace
B = A + 10 # not this inplace
A = A + 10 # is this inplace? I think not
np.add(A, 1, out= A) # seems to also not be inplace, but works different than the above ones

in the end we would like something as add(A, scalar, out=A), to be inplace.

`Labels` object should have a sliceable `values` attribute of type np.ndarray

Current Behaviour

When creating a Labels object, the user needs to pass both names and values as parameters.

import numpy as np

from equistore import Labels

a = Labels(names=('structure', 'atom'), values=np.array([(0, 1), (0, 6), (0, 7), (0, 8), (0, 9)]))

If we want to reaccess the names, it is as simple as calling:

a.names
>>> ('structure', 'atom')

which returns the tuple we passed as a parameter when creating the object.

However, in order to access the values we passed, we must call a.asarray() in order to receive a sliceable numpy array.

# No 'values' attribute
a.values
>>> AttributeError: 'Labels' object has no attribute 'values'

# Sliceable Array
a.asarray()[:,1]
>>> Labels([1, 6, 7, 8, 9], dtype=int32)

Proposed Behaviour

I think it would make more sense to be able to have an attribute Labels.values, consistent with the name of the parameter used to instantiate the Labels object, that returns a sliceable array, such as:

a.values[:,1]
>>> Labels([1, 6, 7, 8, 9], dtype=int32)

This would be more intuitive for the un-familiar user and not require (in my experience at least) searching through the attributes of the Label object for the one that would return me a sliceable array - such as trying but failing to slice or get relevant values from a, a.base, a.data, or having to manipulate the objects returned by them to make them sliceable.

Unhelpful error message when trying to write a ``TensorBlock`` to file

When trying to write an TensorBlock object to file using the equistore.io.save() function, the following error message is given:

ArgumentError: argument 2: <class 'TypeError'>: expected LP_eqs_tensormap_t instance instead of LP_eqs_block_t

Example

import numpy as np

from equistore import io, TensorBlock

# Create an example TensorBlock
block = TensorBlock(values=np.random.random((2, 2)),
                    samples=Labels(names=('structure','center'), values=np.array([[0, 1], [0, 2]])),
                    components=[],
                    properties=Labels(names=('n',), values=np.array([(0,), (1,),])),
                   )

# Attempt to save to file
io.save('tensorblock.npz', block)
>>> ArgumentError: argument 2: <class 'TypeError'>: expected LP_eqs_tensormap_t instance instead of LP_eqs_block_t

Desired Behaviour

The error message could be more helpful by explaining more explicitly / less cryptically that only TensorMap objects, and not TensorBlock objects, can be saved.

Error message when adding gradient with single Labels object as components argument is not clear, doc is wrong

When doing

mean_block.add_gradient(
    parameter,
    mean_values.reshape(1,1,-1),
    Labels(['sample'], np.array([[0]], dtype=np.int32)),
    Labels.single()
)

The error message is

.../equistore/block.py:221, in TensorBlock.add_gradient(self, parameter, data, samples, components)
    219 components_array = ctypes.ARRAY(eqs_labels_t, len(components))()
    220 for i, component in enumerate(components):
--> 221     components_array[i] = component._as_eqs_labels_t()
    223 data = ArrayWrapper(data)
    225 self._lib.eqs_block_add_gradient(
    226     self._ptr,
    227     parameter.encode("utf8"),
   (...)
    231     len(components_array),
    232 )

AttributeError: 'numpy.void' object has no attribute '_as_eqs_labels_t'

The solution is to put the components Labels into a list

mean_block.add_gradient(
    parameter,
    mean_values.reshape(1,1,-1),
    Labels(['sample'], np.array([[0]], dtype=np.int32)),
    [Labels.single()]
)

The doc also seems wrong here: the text for components looks exactly like for samples

help(X.block().add_gradient)
...python
    :param samples: labels describing the gradient samples
    :param components: labels describing the gradient components

Create a utility function to convert ``TensorMap`` block values to type ``torch.tensor``

In order to be compatible as input to PyTorch models, the TensorBlock.values tensors in each block of a TensorMap need to be of type torch.tensor, instead of a ndarray, i.e. numpy.ndarray or _RustNDArray.

Desired Behaviour

There should be a utility function accessible through the Python API, perhaps in a new file such as equistore/python/src/equistore/operations/tensormap_to_torch.py, that is used in the following way:

new_tensormap = tensormap_to_torch(tensormap)

Implications

This necessarily has to be implemented as part of the Python API as there is no Rust-side solution to creating torch.tensor objects. This means that the above function tensormap_to_torch() will have to return a new TensorMap object, thus creating a copy of the object in memory. However, doing it this way round means that all the infrastructure that builds descriptors need not be written to be compatible with PyTorch; only at the step before building a model do we need to convert our blocks to torch.tensor objects.

Allow `names` paramter in `Labels` to be a string or raise a more clear error message

Expected behavior

When creating a Label instance with only one name I would write

Labels(names="species", values=np.array([(1,)]))

and should get

Labels([(1,)], dtype=[('species', '<i4')])

Actual behavior

I get an ValueError raised by labels.py in line 75 stating that names parameter must have an entry for each column of the array. The reason is that the code is treating the string a list and assumes that there have to be also 7 columns in my array.

Solution I would like

Either we convert single string inputs to list (as we do it in the operations) or we raise a more clear error message stating that the input type is wrong.

Additionally, I would rephrase the error message to something like names parameter has 7 entries but the array only has 1 column(s).

Join function for two `TensorMap`s

With LODE I have intensive calculations to perform. This requires pre-computing and saving the TensorMaps to quickly play around with some models based on the pre-computed values. To use as much as possible computer power I split my dataset into many chunks. After pre-computing these chunks, it would be nice to join the individual TensorMaps into one for convenient model training.

However, AFAIK equistore is currently missing such a join function merging two TensorMaps into a new object. Below is a minimal example in Python of the user facing API.

import ase.io
import equistore
from rascaline import SphericalExpansion

frames = ase.io.load("dataset.xyz")

calculator = SoapPowerSpectrum(
    cutoff=3.0,
    max_radial=6,
    max_angular=4,
    atomic_gaussian_width=0.3,
    center_atom_weight=1.0,
    radial_basis={"Gto": {},},
    cutoff_function={"ShiftedCosine": {"width": 0.5},}
)

descriptor_1 = calculator.compute(frames[:len(frames)//2])
descriptor_2 = calculator.compute(frames[len(frames)//2:])

# define a join function, that joins two TensorMaps
descriptor = equistore.join(descriptor_1, descriptor_2)

# Overload the `+` operator and allow
descriptor = descriptor_1 + descriptor_2

Pinging @DivyaSuman14 and @jwa7 since they were part of the discussion.

Segmentation fault when accessing TensorBlock that doesn't exist

This bug was discovered when working through the notebook tutorial-equivariant-models.ipynb in the lab-cosmo/equistore-examples repo. I have included some MWE code below to illustrate the bug. This code should be executed in the relative directory equistore-examples/ as it depends on an example water dataset found in the equistore-examples/data/ directory.

Set Up

The dataset is loaded, hypers defined, and descriptors computed. Then, we can access a block of desired angular momentum, species center, and species neighbour as follows:

import ase.io
import equistore
from rascaline import SphericalExpansion

frames = ase.io.read("data/chemrev_nuprime-theta-grid_computed.xyz", ":")

hypers = {
    "cutoff": 2.0,
    "max_radial": 6,
    "max_angular": 4,
    "atomic_gaussian_width": 0.2,
    "radial_basis": {"Gto": {}},
    "cutoff_function": {"ShiftedCosine": {"width": 0.5}},
    "center_atom_weight": 1.0,
}

calculator = SphericalExpansion(**hypers)
descriptor = calculator.compute(frames, gradients=['positions'])

# Define block with l=2, center=O, neighbour=H
block1 = descriptor.block(spherical_harmonics_l=2,species_center=8,species_neighbor=1)

If we want to move the species_neighbour index from keys to properties, we can do so with the key_to_properties() function:

# Move the index from keys to properties
descriptor.keys_to_properties('species_neighbor')

Expected Behaviour

Because species_neighbour is no longer a valid key, we cannot define a block as above with the specification of species_neighbour. Doing so throws an error, as expected and desired:

# Define block with l=2, center=O, and invalid parameter neighbour=H
block2 = descriptor.block(spherical_harmonics_l=2,species_center=8,species_neighbor=1)

>>> EquistoreError: invalid parameter: 'species_neighbor' is not part of the keys for this tensor

Instead, a block must be accessed only by specifying the remaining valid keys:

block2 = descriptor.block(spherical_harmonics_l=2,species_center=8)

Unexpected Behaviour

If a block is defined, i.e. block1, and then its index is moved from keys to properties (as shown above), the block previously assigned to the variable block1 no longer exists. If attempts are made to access this block, instead of an error being thrown a Segmentation Fault occurs at runtime and, if executing the code in a notebook, this result in the kernel dying.

# Define block with l=2, center=O, neighbour=H
block1 = descriptor.block(spherical_harmonics_l=2,species_center=8,species_neighbor=1)

# Move the index from keys to properties
descriptor.keys_to_properties('species_neighbor')

# Try to access block
block1

# >>> Kernel dies due to segmentation fault

Ideally, such user behaviour should result in a well-reported error as opposed to a runtime fault.

The path to TorchScript

Here is what I currently see as the path to enabling the use of equistore within TorchScript.

For reference, TorchScript is a small, Python-like language that is implemented inside the torch library (in C++). It can be extended with new operations/classes in Python (if the new operation/class only use the TorchScript-compatible subset of Python); or in C++ (completely custom operations/classes are possible).

To enable using equistore with TorchScript, we need to add a couple of custom classes (TensorMap, TensorBlock, Labels) to TorchScript, and we need to implement these in C++ (since we need to call into the core equistore library). This mean there will be two version of each class available from Python: the current, pure Python ones, and the TorchScript ones. See point 4 below for how to handle this.

Once we have the basic classes and member functions (such as keys_to_properties) available from TorchScript, we'll have to make sure everything in equistore.operations is available from

0) Clean separation of the core shared library

A pre-requisite to the TorchScript version of equistore (and the ability to use this version together with rascaline) is to ensure only one version of the core functions/structs exists, by splitting it into a standalone shared library that can then be loaded by both rascaline and TorchScript.

pull request: #85

1) Write the new TorchScript classes

We can then start writing C++ code to create new classes inheriting from torch::CustomClassHolder, and registering these classes with TorchScript. This means re-wrapping all classes and exposing all functions in a TorchScript-compatible way (e.g. all data uses torch::Tensor, …). This will first leave in a separate package (equistore_torch) in this repository. This can happen as a set of PR to a long-lived branch if we want to parallelize the work

create a long-lived branch with basic infrastructure to build and install the code #233
implement the new classes: #263
- Labels
- Implement equistore::DataArrayBase for torch::Tensor
- TensorBlock/Proxy for gradients
- TensorMap
merge the long-lived branch
Implement serialization for the torch version, this requires the same core functions as #94.

2) Change how `Labels` works

Because all the data needs to be either a custom class or one of TorchScript type, we will have a problem with Labels. Right now, these use numpy's structure arrays to carry around the names of the columns in Labels and allow indexing with these columns names. Unfortunately, structured data types are not supported with torch tensors, meaning we will need to have a different API for TorchScript.

We could have a different API for standard Python & TorchScript, but I think it would be better to be able to drop in one or the other version in the same Python code, to make it easier to move between them (see point 4). So it might be better to refactor the Python version of Labels to match what the TorchScript version will be. Instead of inheriting from np.ndarray, the new Labels should be a standalone class, with a values and names attributes (fixing #63), with some functions to support fancy indexing & slicing of the Labels.

change the pure Python Labels class
- design the class and update the core tests
- update all operations to use the new Labels API
make sure the TorchScript and pure Python version of Labels agree

The first part of this could be done in parallel to point 1.

3) Ensure `equistore.operations` is usable from TorchScript

Once we have all the base classes available in TorchScript, we'll need to go through the operations one by one, and add a test that they can be exported to pure TorchScript, without any Python.

4) Select between pure Python & TorchScript versions of classes

Before or at the same time as point 3, we will also have to pick how the user should interact with the TorchScript version of the code. I can basically see two (and a half) API:

# version A, "explicit"
from equistore import TensorMap  # gives the pure Python class
from equistore.torch import TensorMap  # gives the TorchScript class

# version B.1, "implicit"
from equistore import TensorMap  # if torch can be imported, gives the TorchScript class; otherwise gives the pure Python class

# version B.2, "a bit less implicit", not sure if this is possible
from equistore import TensorMap  # gives the pure Python class

# somehow tell pytorch to replace TensorMap with `equistore.torch.TensorMap` when exporting to TorchScript.

This is a decision we can mostly delay until step 1 is done.

BUG reduction operation when sum/mean over all samples

Now the following code gives an error:

import numpy as np
from equistore import TensorBlock, TensorMap, Labels
import equistore.operations as fn

block_1 = TensorBlock(
            values=np.array(
                [
                    [1, 2, 4],
                    [3, 5, 6],
                    [-1.3, 26.7, 4.54],
                ]
            ),
            samples=Labels(
                ["samples"],
                np.array(
                    [[0], [1], [2]],
                    dtype=np.int32,
                ),
            ),
            components=[],
            properties=Labels(
                ["properties"], np.array([[0], [1], [5]], dtype=np.int32)
            ),
        )

keys = Labels(
            names=["key_1", "key_2"], values=np.array([[0, 0]], dtype=np.int32)
        )
X = TensorMap(keys, [block_1])
sum_X = fn.sum_over_samples(X, samples_names=["samples"])

Gets an error like:

---------------------------------------------------------------------------
EquistoreError                            Traceback (most recent call last)
<ipython-input-7-17d5c8a12820> in <module>
     23         )
     24 X = TensorMap(keys, [block_1])
---> 25 sum_X = fn.sum_over_samples(X, samples_names=["samples"])

~/anaconda3/lib/python3.8/site-packages/equistore/operations/reduce_over_samples.py in sum_over_samples(tensor, samples_names)
    212     """
    213 
--> 214     return _reduce_over_samples(
    215         tensor=tensor, samples_names=samples_names, reduction="sum"
    216     )

~/anaconda3/lib/python3.8/site-packages/equistore/operations/reduce_over_samples.py in _reduce_over_samples(tensor, samples_names, reduction)
    141     for _, block in tensor:
    142         blocks.append(
--> 143             _reduce_over_samples_block(
    144                 block=block,
    145                 remaining_samples=remaining_samples,

~/anaconda3/lib/python3.8/site-packages/equistore/operations/reduce_over_samples.py in _reduce_over_samples_block(block, remaining_samples, reduction)
     58         )
     59 
---> 60     result_block = TensorBlock(
     61         values=values_result,
     62         samples=Labels(

~/anaconda3/lib/python3.8/site-packages/equistore/block.py in __init__(self, values, samples, components, properties)
     62             properties._as_eqs_labels_t(),
     63         )
---> 64         _check_pointer(self._ptr)
     65 
     66     @staticmethod

~/anaconda3/lib/python3.8/site-packages/equistore/status.py in _check_pointer(pointer)
     46             raise EquistoreError(last_error()) from e
     47         else:
---> 48             raise EquistoreError(last_error())
     49 
     50 

EquistoreError: internal error (this is likely a bug, please report it): attempt to divide by zero

The error is related to how it construct the new samples Label now that it should be empty

Translate setup.cfg into pyproject.toml

Once we dropped support for Python 3.6 and 3.7 we can move everything from the setup.cfg into the pyproject.toml as suggested in #88.

Introduce a consistent and clear format for function docstrings

Re: the ML development meeting, let's decide on a consistent format for function docstrings, using @PicoCentauri 's suggestion as a starting point. Then, once agreed, we can do an overhaul of all function docstrings, then enforce the format in subsequent PRs.

Add finite difference tests to all `operations`

This should be able to catch most mistakes regarding gradients. It would either require to add a dependency on rascaline in tests, or write a small calculator in pure Python to be used in tests. I think the second option is best and should be relatively simple.

Better examples when joining_labels

We have to better explain how labels are joined. The current state leads to a lot of confusion as discussed in #107:

I still dont really understand case 3 of the join operation, but I am okay to do this in a separate PR, might be good to add examples in this separate PR that make it clear. From my side approve.

Originally posted by @agoscinski in #107 (review)

Add a `repr` function to TensorMap and TensorBlock

Right now we get not very informative output:

>>> tensor
<equistore.tensor.TensorMap at 0x136bcd6f0>

>>> tensor.block(0)
<equistore.block.TensorBlock at 0x1044768c0>

Suggested by @DavideTisi

Reusing TensorBlock for creating a TensorMap causes segmentation fault

When I reuse the TensorBlock twice for creating a TensorMap (that happens quite easily when playing around with TensorMaps in a notebook, then my notebook crashes.

# block_1 = some Tensor definition ...
TensorMap(Labels.single(), [block_1])
TensorMap(Labels.single(), [block_1]) <--- causes segmentation fault

I think it is unrelated to the TensorBlock definition, but here the whole script

from equistore import TensorMap, TensorBlock, Labels
import numpy as np

block_1 = TensorBlock(
    values=np.random.rand(4, 2),
    samples=Labels(
        ["sample", "structure"], np.array([[0,0], [1,1], [2,2], [3,3]], dtype=np.int32)
    ),
    components=[],
    properties=Labels(["properties"], np.array([[0], [1]], dtype=np.int32)),
)
block_1.add_gradient(
    "positions",
    data=np.random.rand(7, 3, 2),
    samples=Labels(
        ["sample", "structure", "center"],
        np.array([[0, 0, 1], [0, 0, 2], [1, 1, 0], [1, 1, 1], [1, 1, 2], [2, 2, 0], [3, 3, 0]], dtype=np.int32)
    ),
    components=[Labels(["direction"], np.array([[0], [1], [2]], dtype=np.int32))],
)

TensorMap(Labels.single(), [block_1])
TensorMap(Labels.single(), [block_1])

copy function for tensor map

I think that would be just a simple helper function, because the copy function for tensor block already exists. So one needs to only create one loop. I am just not sure where this should be implemented. Python API or in the core?

method `keys_to_samples('center_species')` throws index error for pyLODE

The keys_to_samples method throws an IndexError when called with a pyLODE expansion computed from a system with more than one atom. I started looking into this together with @Luthaf
and we thought that something might goes crazy in

https://github.com/Luthaf/equistore/blob/a8cc33b6f4a7815934628f60212620a75b6108d2/src/tensor/keys_to_samples.rs#L116-L149

The error does not occur for a single species as well as normal SOAP. I am running Python 3.8.10 and numpy version 1.22.3.

Code to reproduce the behavior

import ase.io
from utils.models.soap import compute_power_spectrum
from utils.pylode import PyLODESphericalExpansion

frames = ase.io.read("nacl_aq_short.xyz", index=":")

hypers = {
    'smearing': 1,
    'max_radial': 1,
    'max_angular': 0,
    'cutoff_radius': 1.,
    'potential_exponent': 1,
    'radial_basis': 'gto',
    'compute_gradients': False,
}

calculator = PyLODESphericalExpansion(hypers)
descriptor = calculator.compute(frames)
ps = compute_power_spectrum(descriptor)
ps.keys_to_properties(['neighbor_species_1', 'neighbor_species_2'])
ps.keys_to_samples('center_species')

IndexError                                Traceback (most recent call last)
File ~/.local/lib/python3.8/site-packages/equistore/utils.py:27, in catch_exceptions.<locals>.inner(*args, **kwargs)
     26 try:
---> 27     function(*args, **kwargs)
     28 except Exception as e:

File ~/.local/lib/python3.8/site-packages/equistore/data.py:245, in _eqs_array_move_samples_from(this, input, samples_ptr, samples_count, property_start, property_end)
    244 properties = slice(property_start, property_end)
--> 245 output[output_samples, ..., properties] = input[input_samples, ..., :]

IndexError: index 124 is out of bounds for axis 0 with size 124

Attachments

nacl_aq_short.xyz.zip

equistore.version

Currently, we have a version info when installing equistore via pip etc. However, there is no __version__ attribute when using equistore from the Python interpreter.

We should add this attribute. __version__ helps a lot with debugging production code, especially when using rascaline together equistore which could lead to some weird version clash behavior.

A quick and dirty solution is the following. We are pulling the version from the rust code using

https://github.com/lab-cosmo/equistore/blob/e65cd07d2f0d5aefa41a7b2beade835b9b1dc03b/setup.py#L91-L98

in setup.py. During this step we can write a version.py file to the source and import this in our __init__.py.

Another maybe cleaner would be providing a version string from rust over c to Python.

Labels for `center_species` and `neighbor_species` should be chemical symbols not atomic numbers

Descriptor labels (at least for SOAP and LODE), labeling different blocks currently are tuples of integers: `('spherical_harmonics_l', 'center_species', 'neighbor_species'), where a species label is the atomic number of the corresponding species.

To distinguish the spherical harmonic label (clearly integers) from species labels, I suggest using strings of chemical symbols for the two species labels on the user-facing side. Tuples of an integer and two strings make label interpretation and selection much easier for new users. Internally, one can map these to integers again.

Advanced indexing of tensor blocks

Adding index access functionalities which are inspired by numpy

indices = np.where(X == something)[0]
X[indices]

The position function already returns one index

index = tensor_block.properties.position((0,1))

But something in a similar spirit as the blocks_matching function would be nice

indices = tensor_block.properties.position(Labels(names=["n"], values=[[0]]) # returns a tensor block with all properties that fulfill n=0 in the Labels

Already talked with @Luthaf about this. Here details from him which might be important for an implementation

The function is called from here, with a Labels for selection https://github.com/lab-cosmo/equistore/blob/e65cd07d2f0d5aefa41a7b2beade835b9b1dc03b/python/src/equistore/tensor.py#L266-L271
(i.e. to get all n=0, you would give Labels(names=["n"], values=[[0]])
It ends up here after passing through C: https://github.com/lab-cosmo/equistore/blob/e65cd07d2f0d5aefa41a7b2beade835b9b1dc03b/src/tensor/mod.rs#L190, so this is the function we could extract out of the TensorMap and make available for arbitrary Labels (something like Labels::matching(Labels selection).
We are also doing something similar in rascaline, so standardizing everything in equistore would be nice

A function like `equal(tensor, only_metadata=True)`, but with control over the metadata axes checked

A key step in the ML workflow is checking that the metadata between input and output TensorMaps is exactly equivalent in keys, and along the samples and components axes. However, the properties should not be checked, as the relationship between different input and output features is the thing being learned.

There should be a function, similar to equal, where the user can control which metadata Labels should be checked. Maybe this should be a modification to equal, where flags turn off checking of specified axes:

from equistore.operations import equal
...
assert equal(tensor_1, tensor_2, only_metadata=True, do_not_check=["properties"])

or a new function completely:

from equistore.operations import equal_metadata
...
assert equal_metadata(tensor_1, tensor_2, metadata_to_check=["keys", "samples", "components"])

Some potentially useful operations, open for discussion

These suggestions are motivated by the desire to have convenience functions that do the following jobs and eliminate developer time hard coding them. This issue aims to be a starting point for discussion; the function names, exact functionalities, whether they are in place or not, or indeed whether they are actually useful (!) etc., are up for debate. I'm sure there are also some technicalities with memory allocation and immutability.

1. `insert` or `update`

A function to update, perhaps in a similar style to update with Python dictionaries, a TensorMap with new key/block pairs, provided the new keys are not currently in the existing TensorMap

from equistore.operations import insert_blocks

tensor = io.load(...)

# Define new blocks to be inserted
blocks_to_insert = [TensorBlock(...), TensorBlock(...)]

# Insert the blocks
new_tensor = insert_blocks(tensor, keys=Labels(names=["key_1", "key_2"], values=np.array([[5, 8],]), blocks=blocks_to_insert)

2. `drop_blocks`

Suppose I want to remove a block by its key name. Assuming this returns a new tensor:

from equistore.operations import drop_blocks

tensor = equistore.io.load(...)

# tensor with 12 blocks
tensor
>>>
TensorMap with 12 blocks
keys: ['spherical_harmonics_l' 'species_center']
                  0                   1
                  1                   1
                  2                   1
               ...
                  3                   8
                  4                   8
                  5                   8

new_tensor = drop_blocks(
    tensor, 
    keys=Labels(
        names=["spherical_harmonics_l", "species_center"],
        values=np.array([[2, 1], [3, 8]]),
    )
)

# Now has 10 blocks
new_tensor
>>>
TensorMap with 10 blocks
keys: ['spherical_harmonics_l' 'species_center']
                  0                   1
                  1                   1
               ...
                  4                   8
                  5                   8

The drop_blocks function presented here could be viewed like a kind of slice but for key/block pairs. Should then it work like slice, specifying the keys that should be kept, as opposed to the ones that should be dropped?

3. `merge` or `unify` or `join_along_keys`

Similar to join() but doesn't change the blocks themselves or join along a block axis. Just takes 2 TensorMaps and returns a new larger TensorMap whose keys are just the union of the keys of the original TensorMaps. Assumes that the intersection in key names between the 2 (or in principle more) TensorMaps is empty.

...
# First tensor with a certain set of keys
tensor_1
>>>
TensorMap with 7 blocks
keys: ['spherical_harmonics_l' 'species_center']
                  0                   1
               ...
                  5                   8

# second tensor with a different set of keys
tensor_2
>>>
TensorMap with 10 blocks
keys: ['spherical_harmonics_l' 'species_center']
                  1                   1
               ...
                  4                   8

# merged tensor with keys and blocks from both
merged_tensor = merge(tensors=[tensor_1, tensor_2])
merged_tensor
>>>
TensorMap with 17 blocks
keys: ['spherical_harmonics_l' 'species_center']
                  0                   1
                  1                   1
               ...
                  4                   8
                  5                   8

4. `drop_metadata_names`

a) keys

Suppose that I have dropped some blocks in my TensorMap such that some of the values of the key names are the same and therefore redundant.

As an example: generating a lambda-SOAP descriptor for both even (+1) and odd (-1) inversion parities. If all blocks with uneven parity are dropped we are left with blocks that have a redundant "inversion_sigma"=+1 key name. We therefore might want to drop the key name "inversion_sigma". This function would assume that the value for the chosen name(s) is equivalent for all keys.

from equistore.operations import drop_metadata_names

tensor = io.load(...)

# tensor with 10 blocks but a redundant key name
tensor
>>>
TensorMap with 10 blocks
keys: ['inversion_sigma', 'spherical_harmonics_l' 'species_center']
                  1                   0                   1
                  1                   1                   1
               ...
                  1                   4                   8
                  1                   5                   8

# remove redundant key name
cleaner_tensor = drop_metadata_names(tensor, axis="keys", names=["inversion_sigma"])
cleaner_tensor
>>>
TensorMap with 10 blocks
keys: ['spherical_harmonics_l' 'species_center']
                  0                   1
                  1                   1
               ...
                  4                   8
                  5                   8

b) samples/properties

Now suppose I have a TensorMap where I have used slice to include, for instance only 1 structure. The following example is one I have encountered when using external codes (i.e. Q-Stack/equio.py) and have needed to drop the structure label from my samples axis. Again this assumes that the values of the named metadata along the given axis is equivalent for all blocks.

from equistore.operations import drop_metadata_names
...
# only structure index zero exists in this (and all) blocks
tensor.block(0).samples
>>>
Labels([( 0, 1), ( 0, 2), ( 0, 3), ( 0, 4)], dtype=[('structure', '<i4'), ('center', '<i4')])

new_tensor = drop_metadata_names(tensor, axis="samples", names=["structure"])

# remove the redundant samples name "structure"
new_tensor.block(0).samples
>>>
Labels([( 1,), ( 2,), (3,), (4,)], dtype=[('center', '<i4')])

and the equivalent along properties.

5. `insert_metadata_names`

As the inverse of drop_metadata_names, suppose we want to introduce either keys/samples/properties names of fixed values into a TensorMap.

a) keys

Suppose I want to use the merge function from above but with 2 TensorMaps with identical metadata. It could be useful to introduce a new key name that distinguishes them, before merging:

...
# first and second tensors have identical keys 
tensor_1
>>>
TensorMap with 10 blocks
keys: ['spherical_harmonics_l' 'species_center']
                  0                   1
                  1                   1
               ...
                  4                   8
                  5                   8

tensor_2
>>>
TensorMap with 10 blocks
keys: ['spherical_harmonics_l' 'species_center']
                  0                   1
                  1                   1
               ...
                  4                   8
                  5                   8

# Insert a new key name to distinguish the 2 TensorMaps
new_tensor_1 = insert_key_names(tensor_1, names=["inversion_sigma"], values=[+1], prepend=True)
new_tensor_2 = insert_key_names(tensor_2, names=["inversion_sigma"], values=[-1], prepend=True)

new_tensor_1
>>>
TensorMap with 10 blocks
keys: ['inversion_sigma', 'spherical_harmonics_l' 'species_center']
                  1                   0                   1
                  1                   1                   1
               ...
                  1                   4                   8
                  1                   5                   8

new_tensor_2
>>>
TensorMap with 10 blocks
keys: ['inversion_sigma', 'spherical_harmonics_l' 'species_center']
                  -1                   0                   1
                  -1                   1                   1
               ...
                  -1                   4                   8
                  -1                   5                   8

# Now merge safely
merged_tensor = merge([new_tensor_1, new_tensor_2])

Where prepend=False would instead append the new key name to the list of key names.

b) samples/properties

When using the join function as currently implemented, the function attempts to resolve metadata conflicts should there be any. If we have 2 TensorMaps with exactly the same metadata, we could first introduce another samples name (the naming of which we have control) to the blocks in each respective TensorMap, with different values, and then join safely.

from equistore.operations import insert_metadata_names
...
tensor_1.block(0).samples
>>>
Labels([( 1,), ( 2,), (3,), (4,)], dtype=[('center', '<i4')])

tensor_2.block(0).samples
>>>
Labels([( 1,), ( 2,), (3,), (4,)], dtype=[('center', '<i4')])

new_tensor_1 = insert_metadata_names(tensor, axis="samples", names=["structure"], values=[0,], prepend=True)
new_tensor_1.block(0).samples
>>>
Labels([( 0, 1), ( 0, 2), ( 0, 3), ( 0, 4)], dtype=[('structure', '<i4'), ('center', '<i4')])

new_tensor_2 = insert_metadata_names(tensor, axis="samples", names=["structure"], values=[1,], prepend=True)
new_tensor_2.block(0).samples
>>>
Labels([( 1, 1), ( 1, 2), ( 1, 3), ( 1, 4)], dtype=[('structure', '<i4'), ('center', '<i4')])

joined_tensor = join([new_tensor_1, new_tensor_2], axis="samples")

Where prepend=False would instead append the new sample name to the list of sample names.

6. inverse operations of `keys_to_x` etc?

Would it be useful to have the inverse of keys_to_properties and keys_to_samples, and components_to_properties? (Not come across a use for these personally, but think I've heard it mentioned before by people)

Let users rename Labels for keys, samples, properties (and perhaps components)

It would be useful to have some TensorMap and TensorBlock class methods that let the user rename Labels, such as keys, samples, properties, and perhaps components too.

This would allow for greater transferability and interoperability between equistore-based workflows, such as those produced by individuals/groups in different fields and with different naming conventions.

For example, a user might convert into a typical equistore-native naming convention from data produced by an outside source in the following way:

from equistore import io, Labels, TensorBlock, TensorMap

descriptor = io.load('descriptor.npz', use_numpy=False)

# i.e. for TensorMap keys
descriptor.keys.names
>>> ('spherical_harmonics_l', 'element')

descriptor.rename_keys(['spherical_harmonics_l', 'species_center']
descriptor.keys.names
>>> ('spherical_harmonics_l', 'species_center')

# i.e. for TensorBlock samples
descriptor.block(0).samples.names
>>> ('mol_id', 'atom_id')

descriptor.rename_samples(['structure', 'atom'])
descriptor.block(0).samples.names
>>> ('structure', 'atom')

Currently, the only solution (AFAIK) is to manually build new TensorMaps with the desired label names by brute-force iteration over every dimension of every block.

Of course, considerations need to be made wrt (im)mutability and memory allocation - i.e. create a new TensorMap and just reassign the variable like descriptor = descriptor.rename_keys() (similar to #58) or just reassign Label names Rust-side.

Move`_check_maps` and `_check_blocks` into public `equal` and `equal_block` functions

In #115 we discussed that the two functions _check_maps and _check_blocks which check if meta data between two TensorMaps/TensorBlocks is the same. Since @Luthaf has a valid point of not testing private function I propose we make them public but change their scope a bit.

The function should also have the option to check the values/gradients and not only the meta data. With this we can create a functions similar to allclose. I think of something like:

def equal(
    tensor1: TensorMap,
    tensor2: TensorMap,
    test_data: bool=True,
) -> bool:
    ...
   
def equal_raise(
    tensor1: TensorMap,
    tensor2: TensorMap,
    test_data: bool=True,
) -> bool:
    ...

def equal_block(
    tensor1: TensorMap,
    tensor2: TensorMap,
    test_data: bool=True,
) -> bool:
    ...

def equal_block_raise(
    tensor1: TensorMap,
    tensor2: TensorMap,
    test_data: bool=True,
) -> bool:
    ...

Benchmarking operations

Performance is a crucial part when performing operations ond equistore objects. This is important for repeating tasks during the training of a model but also when an model is applied during a simulation. To track the performance of our operation functions @ceriottm suggested to add benchmarks.

I support this idea and anybody who would like to start on this can give it a try. I suggest we use something like ASV to also track the speed over over lifetime. But, I am also open for other ideas how we can realize benchmarking.

Give an explicit target to rustc when building libequistore for Python

Otherwise it is pretty easy to have a rustup install targeting x86_64-apple-darwin when the host & Python are expecting aarch64-apple-darwin, or similarly on windows where Python expects MSVC but people can have the GNU toolchain as a default.

It should be possible to detect the required target from setup.py, and pass it through cmake to rustc.

Equistore is too keen on keeping references to tensor blocks, resulting in memory leaks

Code below is an example - any large dataset will make the problem clear. Basically values.sum() returns a _RustNDArray that holds a reference to the parent block, so the memory for the environment-level descriptors is never freed. Wrapping the .sum() in a np.array() solves the problem, but I see no reason why the reference should be kept around.

import numpy as np
import rascaline, equistore, itertools
import ase.io as aseio
import tqdm.notebook as tqdm

frames = aseio.read("./shiftml2_training_structures.xyz", ":")

hypers = {
    "cutoff": 6.0,
    "max_radial": 6,
    "max_angular": 4,
    "atomic_gaussian_width": 0.5,
    "radial_basis": {"Gto": {}},
    "cutoff_function": {"ShiftedCosine": {"width": 0.5}},
    "center_atom_weight": 1.0,    
}

calculator = rascaline.SoapPowerSpectrum(**hypers)

species = np.unique(np.concatenate([f.numbers for f in frames]))
feats_frm = []
for f in tqdm.tqdm(frames): 
    rho2i = calculator.compute(f)
    rho2i.keys_to_properties(equistore.Labels(['species_neighbor_1', 'species_neighbor_2'],
                                              values=np.array(list(itertools.product(species, species)), dtype=np.int32)) )
    print(rho2i.block(0).values.shape)
    feats_frm.append(rho2i.block(0).values.sum(axis=0))

Introduce an `absolute` operation

Mentioned in #145, but perhaps more useful as a separate issue as mathematically distinct from those mentioned in that issue.

Analogous to numpy.absolute() or torch.abs(), takes a TensorMap and returns a TensorMap of the same metadata, but where the blocks contain the absolute values of the original blocks. Make sure to include an absolute() function in _dispatch so that it can handle both numpy- and torch- based tensors.

Better error message when trying to assign to block.values directly

It could be useful to manipulate the descriptor values of a block after they have been calculated. However, expressions like

block.values *= -1

are not allowed an lead to AttributeError: can't set attribute. There is a workaround like

for r in block.values:
    r *= -1

which works fine, but in my view the first option should also work.

User layer for TensorBlock index operations through bracket operator

I wonder if could have these advance indexing operations available with the bracket operator like in numpy [ ] to have a similar feel of numpy arrays. I think it is easy to think about the tensor block (TB) as a 3D array and then do very similar things like slicing and indexing. Under the hood the equistore operations are still used. For some index operations I think it becomes not so trivial to do, but for slicing this should be easy to do.

Use a lexicographic Labels order when constructing a `TensorBlock` and a `TensorMap`

When doing operations on equistore objects, like a TensorBlock or TensorMap we perform checks to verify that the shapes are valid for the requested operation. Especially when comparing two instances it is important that the order of the values is the same. Swapped labels of otherwise identical instances could make it hard to debug the code for users.

Therefore I suggest that we introduce a Lexicographic order of Labels when constructing a TensorBlock and a TensorMap. Since this is something that affects the core of the whole library it should be done on the lowest Rust/C level.

Make sure the operations can be serialized to TorchScript

#51 started to implement operations on TensorMaps, using pure Python code. Once we are able to save TensorMap/TensorBlock to TorchScript, we should add tests making sure the code in operations can also be saved to TorchScript.

Non clear error message when an empty label is passed for samples

related to #127:

When trying to do something like this:

import numpy as np
from equistore import TensorBlock, TensorMap, Labels
 TensorBlock(values=np.array([[5,5,5]]),
     samples=Labels(names=[], values=np.array([[]],dtype=np.int32)),
     components=[],
     properties=Labels(["properties"], np.array([[0], [1], [5]], dtype=np.int32)))

you get a weird error message:

---------------------------------------------------------------------------
EquistoreError                            Traceback (most recent call last)
<ipython-input-14-c197761074a5> in <module>
----> 1 TensorBlock(values=np.array([[5,5,5]]),
      2 samples=Labels(names=[], values=np.array([[]],dtype=np.int32)),
      3 components=[],
      4 properties=Labels(
      5                 ["properties"], np.array([[0], [1], [5]], dtype=np.int32)

~/anaconda3/lib/python3.8/site-packages/equistore/block.py in __init__(self, values, samples, components, properties)
     62             properties._as_eqs_labels_t(),
     63         )
---> 64         _check_pointer(self._ptr)
     65 
     66     @staticmethod

~/anaconda3/lib/python3.8/site-packages/equistore/status.py in _check_pointer(pointer)
     46             raise EquistoreError(last_error()) from e
     47         else:
---> 48             raise EquistoreError(last_error())
     49 
     50 

EquistoreError: internal error (this is likely a bug, please report it): attempt to divide by zero

Missing operations transpose, swapaxis and basic arithmetic operation

Support for DLPack interchange format

DLPack is an existing standard, already implemented and supported by multiple frameworks, to share arrays between these frameworks without a copy. https://dmlc.github.io/dlpack/latest/index.html

We already support it for values/gradients in Python (since these will return numpy arrays or torch tensors), but we should explore how we can add support for it in the Rust/C/C++ API both as output (transforming eqs_array_t to DLManagedTensor) and as input (wrapping DLManagedTensor into eqs_array_t).

Documentation for keys_to_xxx is confusing when keys_to_move is a set of `Labels`

The Python documentation in particular does not do a good job of explaining what happens when you pass a single string, a list of strings or a set of Labels to keys_to_samples or keys_to_properties.

lab-cosmo / metatensor Goto Github PK

metatensor's People

Contributors

Stargazers

Watchers

Forkers

metatensor's Issues

QUESTION/PROBLEM I think one conceptual problem is how should we differ between a operation considering gradient and just broadcasting operation as shown above

QUESTION/PROBLEM how to do inplace operations?

Expected behavior

Actual behavior

Solution I would like

0) Clean separation of the core shared library

1) Write the new TorchScript classes

2) Change how Labels works

3) Ensure equistore.operations is usable from TorchScript

4) Select between pure Python & TorchScript versions of classes

Code to reproduce the behavior

Attachments

1. insert or update

2. drop_blocks

3. merge or unify or join_along_keys

4. drop_metadata_names

a) keys

b) samples/properties

5. insert_metadata_names

a) keys

b) samples/properties

6. inverse operations of keys_to_x etc?

Recommend Projects

Recommend Topics

Recommend Org

2) Change how `Labels` works

3) Ensure `equistore.operations` is usable from TorchScript

1. `insert` or `update`

2. `drop_blocks`

3. `merge` or `unify` or `join_along_keys`

4. `drop_metadata_names`

5. `insert_metadata_names`

6. inverse operations of `keys_to_x` etc?