
sxs's People

Contributors

dependabot[bot], keefemitman, markscheel, moble


sxs's Issues

Failing tests

The latest tests have been failing on Ubuntu with Python 3.9 due to weird issues with tqdm. I've tried to fix the issues here. Until that PR is merged, it might be worthwhile restricting to tqdm<4.61.2 and, once it's fixed, tqdm>4.63 or whatever. I think this is fairly specific to GitHub's Ubuntu container, and won't show up for, e.g., a human running Ubuntu.

The time of merger of BBH in sxs

Hi,
I want to know the time at which the two black holes merge. How can I find the merger time for a waveform in the SXS catalog? Thank you!
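[There's no sharp moment when the holes "begin" to merge, but a common convention is to take the time of peak waveform amplitude. A minimal sketch using the norm attribute of loaded waveforms; the simulation ID and extrapolation order are illustrative:]

import numpy as np
import sxs

# Illustrative ID and extrapolation order
w = sxs.load("SXS:BBH:0123/Lev/rhOverM", extrapolation_order=2)

# A common convention: "merger" is the time of peak amplitude, measured
# by the L2 norm over all modes at each time step
t_merger = w.t[np.argmax(w.norm)]
print(t_merger)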

Backwards compatibility with waveforms

I forgot to add the basic code needed for backwards compatibility with NRAR-format waveform files, so that modes can be indexed as "Y_l2_m2.dat" or whatever, in analogy with how Horizons objects work. It would also be nice to return a subclassed dict when the user doesn't specify the extrapolation level, so that this indexing can even work from the top level, as in

import sxs

with sxs.loadcontext("SXS:BBH:0123/Lev/rhOverM") as f:
    h22 = f["OutermostExtraction.dir/Y_l2_m2.dat"]

Convenience methods for Horizons objects

The Horizons object currently has a newtonian_com property and an average_com_motion method. It might be nice to include several other features needed for the LVC format, which they name as follows (see the sketch after this list):

  • nhat: the separation vector from BH B to BH A (x_A - x_B)
  • omega_orbit: the orbital angular frequency
  • LNhat: the direction of the (Newtonian) orbital angular velocity
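A minimal sketch of how these could be computed from the horizon trajectories, assuming the coord_center_inertial and time attributes of the Horizons members, and an illustrative simulation ID:

import numpy as np
import sxs

horizons = sxs.load("SXS:BBH:0123/Lev/Horizons.h5")  # illustrative ID

# nhat: unit separation vector from BH B to BH A
d = horizons.A.coord_center_inertial - horizons.B.coord_center_inertial
nhat = d / np.linalg.norm(d, axis=1)[:, np.newaxis]

# Newtonian orbital angular velocity Omega = (d x d_dot) / |d|^2; its
# magnitude is omega_orbit and its direction is LNhat
d_dot = np.gradient(d, horizons.A.time, axis=0)
Omega = np.cross(d, d_dot) / (np.linalg.norm(d, axis=1) ** 2)[:, np.newaxis]
omega_orbit = np.linalg.norm(Omega, axis=1)
LNhat = Omega / omega_orbit[:, np.newaxis]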

tempfile replace problems

A user sent me this:

I got the following error message when I was trying to load the catalog:

>>> catalog = sxs.load("catalog")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/the_user_name/anaconda3/lib/python3.8/site-packages/sxs/handlers.py", line 213, in load
    return Catalog.load(download=download)
  File "/home/the_user_name/anaconda3/lib/python3.8/site-packages/sxs/catalog/catalog.py", line 82, in load
    zip_path.replace(cache_path)
  File "/home/the_user_name/anaconda3/lib/python3.8/pathlib.py", line 1369, in replace
    self._accessor.replace(self, target)
OSError: [Errno 18] Invalid cross-device link: '/tmp/tmp9yb781zh/catalog.zip' -> '/home/the_user_name/.cache/sxs/catalog.zip'

It looks like one line (the .replace call) in catalog.py is not compatible with all systems. You need to change .replace to shutil.move in order to fix this issue:

/anaconda3/lib/python3.8/site-packages/sxs/catalog/catalog.py

Line 4: Add:

  import shutil

Line 82: Change
#zip_path.replace(cache_path)
dest_path = shutil.move(zip_path, cache_path)

On further investigation, I find that the replace function only works if the source and destination are on the same filesystem — suggesting that python's tempfile module made a temporary directory on some disk other than the one holding the user's home directory. This seems like it could be a fairly common use case.

I think a more robust fix would be to ask for the temporary files in the same folder as we expect to write the catalog zip file in, because read/write errors should occur if and only if there will be errors for the final file. Plus, there will be no copying between file systems.
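A minimal sketch of that fix, with illustrative paths and a placeholder standing in for the actual download:

import tempfile
from pathlib import Path

cache_path = Path.home() / ".cache" / "sxs" / "catalog.zip"  # illustrative
cache_path.parent.mkdir(parents=True, exist_ok=True)

# Create the temporary directory next to the destination, so the final
# rename never crosses a filesystem boundary
with tempfile.TemporaryDirectory(dir=cache_path.parent) as temp_dir:
    zip_path = Path(temp_dir) / "catalog.zip"
    zip_path.write_bytes(b"")  # placeholder for the actual download
    zip_path.replace(cache_path)  # same-filesystem rename; no copying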

Windows path error

A user reported this error on windows:

I got an error when loading the metadata:

OSError: [WinError 123] The filename, directory name, or volume label syntax is incorrect: 'SXS:BBH:0123\\Lev\\metadata.json'

I had thought it might be something to do with the direction of the slashes, but I now realize it's the colons — they're apparently not legal on Windows. It looks like I had run across this error previously, but had not recognized it here. I guess I'll have to write some wrapper functions any time the cache is accessed (which I guess is just in sxs.load and download) to rewrite those on Windows.
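A minimal sketch of such a wrapper; the function name is hypothetical, and only the colon replacement is grounded in the error above:

import platform

def sanitize_cache_path(path_str):
    """Hypothetical helper: rewrite characters that are illegal in Windows
    file names; colons (as in 'SXS:BBH:0123') are the known offender."""
    if platform.system() == "Windows":
        return path_str.replace(":", "_")
    return path_str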

While I'm at it, it would be nice if I could look into failures like this one. I think the problem may be that I convert things to pathlib.Path objects, and it looks like they may be expanding to full user-space paths somewhere.

LVC conversion: better tolerances

@geoffrey4444 wants to convert our entire catalog to LVC format. The main problem with that is that the conversion process takes of order hours per waveform, so without any improvements, we would expect this process to take of order a year on a single computer.

After looking through the data, it's become clear that the real problem here is the phase — more specifically, the use of phase as one of the data series while requiring the same tolerance on the amplitude and phase. The result is that splines of the phase in the output LVC format

  1. take hundreds to thousands of times longer to construct than they need to, and
  2. occupy tens to hundreds of times more space than they need to

for the resulting waveform to have the desired accuracy.

Each mode of the input waveform is first decomposed into amp and (unwrapped) phase, and then passed to romspline.ReducedOrderSpline, which builds a spline that should be able to reproduce the input data to within an L-infinity tolerance of 1e-6. Note that this tolerance is absolute, not relative.

Presumably, the goal here is to produce a waveform for consumption by, e.g., LAL that is accurate to 1e-6. This is basically achieved with amp. The amp will typically be no larger than 0.1 (or orders of magnitude less for higher-order modes) except very near merger, meaning the relative tolerance is more like 1e-5 or less restrictive — but the absolute error will indeed be 1e-6.

Meanwhile the phase will generally be tens to tens of thousands (depending on junk and/or precession), meaning that the relative tolerance will be 1e-7 or more restrictive. This is why the phase splines perform so much worse than amp.

The answer seems obvious: rather than requiring |δphase|<1e-6, we need to be able to require amp*|δphase|<1e-6. Better yet, to keep the results as accurate as before, we should require |δamp|<5e-7 and amp*|δphase|<5e-7, because now the latter will be contributing roughly equally with the former and L-infinity errors should add linearly.
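A sketch of that rescaling; since romspline takes a single scalar tolerance, the conservative choice below scales the phase tolerance by the peak amplitude (the data series is a synthetic stand-in for a real mode):

import numpy as np
import romspline

# Synthetic stand-in for one waveform mode
time = np.linspace(0.0, 1000.0, 10000)
mode_data = 0.1 * np.exp(1j * 0.2 * time)

tol_total = 1e-6
amp = np.abs(mode_data)
phase = np.unwrap(np.angle(mode_data))

tol_amp = tol_total / 2                    # |δamp| < 5e-7
tol_phase = (tol_total / 2) / np.max(amp)  # so that amp * |δphase| < 5e-7

amp_spline = romspline.ReducedOrderSpline(time, amp, tol=tol_amp)
phase_spline = romspline.ReducedOrderSpline(time, phase, tol=tol_phase)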

But then, would LVC people accept such waveforms? I know everyone talks about how the phase really needs to be accurate, but I claim it will still be (at least nearly) as accurate as it was before. Specifically, this is not a cumulative phase error; it is an interpolation error. This means that there will still be many times throughout the output waveform that will have precisely zero phase error (relative to the input waveform), but there will be small fluctuations of the phase error between those points. These enter into the waveform in very much the same way as the amplitude errors, so they should be considered equivalent.

References to missing modules `sxs.data`, `sxs.data.api`

Currently, setup.py refers to modules sxs.data and sxs.data.api:

sxs/setup.py

Lines 56 to 62 in 1691b00

setup(name='sxs',
      packages=['sxs', 'sxs.metadata', 'sxs.doxygen',
                'sxs.format', 'sxs.format.lvc', 'sxs.references',
                'sxs.utilities', 'sxs.utilities.decimation',
                'sxs.validate', 'sxs.zenodo', 'sxs.zenodo.api',
                'sxs.data', 'sxs.data.api'],
      scripts=['scripts/sxs'],

However, there are no such modules. A pip install will therefore error (and this is blocking me from adding auto-generated documentation on RTD).

Enhancement: Consider adding support to load data from RIT/Maya/etc. data

For example, if somebody calls

sxs.load('RIT:BBH:0123/rPsi4')

then, based on the string before the colon, sxs would look in the RIT catalog instead of the SXS catalog. Is this an appropriate interface? Should this be within the sxs package, even though it is others' data? Or should it instead be in a different package? Pinging @moble for comment.
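As an interface sketch, the dispatch could key on the prefix before the first colon; every loader below is hypothetical:

def load_sxs(key, **kwargs): ...   # existing behavior
def load_rit(key, **kwargs): ...   # hypothetical
def load_maya(key, **kwargs): ...  # hypothetical

def load(key, **kwargs):
    catalog_name = key.split(":", 1)[0]
    loaders = {"SXS": load_sxs, "RIT": load_rit, "MAYA": load_maya}
    return loaders[catalog_name](key, **kwargs)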

Variable used before assigned in catalog.load()

In catalog.py, lines 66-104: suppose the user passes download=False and has no cache (so cache_path.exists() is false). Then line 99 will be hit, testing the value of download_failed, which was never assigned, because the if block starting on line 66 was never entered.

I think the test for download is False just needs to come before the test of download_failed to resolve this.
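A minimal reproduction of the pattern, using a simplified shape for the function (illustrative, not the actual code):

from pathlib import Path

def load(download=None, cache_path=Path("catalog.zip")):
    if download:
        download_failed = False  # only ever assigned inside this block
        # ... attempt the download, setting download_failed on error ...
    if not cache_path.exists():
        if download_failed:    # UnboundLocalError when download was falsy
            raise RuntimeError("Failed to download the catalog")
        if download is False:  # moving this test first avoids the bug
            raise ValueError("Catalog not cached, and downloading is disabled")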

Found by @akhairna.

Converting waveforms to the frequency domain

[A user asked this question through a separate channel recently. It'll be easier to answer it here.]


I need to do an FFT to transform the time-domain waveform to the frequency domain, but after many tries I can't get the correct frequency-domain waveform. For the Python FFT code, I used:

import numpy as np
import sxs

waveform = sxs.load("SXS:BBH:0123/Lev/rhOverM", extrapolation_order=2)

h = waveform
fs = 4096
datafreq = np.fft.fftfreq(h.size) * fs
h_f = (np.fft.fft(h) / fs)[0:int(len(h) / 2)]
freqs = datafreq[0:int(len(h) / 2)]

When I take the FFT of the (2, 2) mode, the result has much stronger oscillations at high frequencies in a log-log plot of the absolute value of h(f), and I think the trend of the waveform with increasing frequency is also incorrect.
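[One likely culprit, offered as a sketch of one issue rather than a full answer: SXS waveforms are sampled on a non-uniform time grid, so applying np.fft to the raw data is not meaningful. Interpolating onto uniform times and tapering the ends before transforming should help; the time step and window choice below are illustrative, and the interpolate and index methods are assumed to behave as in the sxs documentation:]

import numpy as np
import sxs

w = sxs.load("SXS:BBH:0123/Lev/rhOverM", extrapolation_order=2)

# Interpolate the whole waveform onto a uniform time grid (dt in units of M)
dt = 0.1
t_uniform = np.arange(w.t[0], w.t[-1], dt)
w_uniform = w.interpolate(t_uniform)

# Extract the (2, 2) mode and taper to reduce spectral leakage
h22 = np.asarray(w_uniform[:, w_uniform.index(2, 2)]).ravel()
window = np.hanning(h22.size)

h_f = np.fft.fft(window * h22) * dt
freqs = np.fft.fftfreq(h22.size, d=dt)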

sxs.load('...') fails for ExtraWaveforms.h5

sxs.load('...') fails for 'ExtraWaveforms', but sxs.rpdmb.load('ExtraWaveforms.h5/rMPsi4_Asymptotic_GeometricUnits_CoM_Mem/Extrapolated_N2.dir') works.

Can we make it so that sxs.load('...') checks whether there's string content after the '.h5' and, if so, looks in the JSON file to check the sxs_format for the target waveform, then loads it however that waveform is supposed to be loaded?

@moble

url not found error when using sxs.load(...)

Hello!

When I try to load any metadata file or waveform from the catalog,

import sxs
sxs.load("SXS:BBH:0123/Lev/metadata.json")

I run into a 404 url not found error:

Found the following files to load from the SXS catalog:
    SXS:BBH:0123v5/Lev5/metadata.json
An error occurred when trying to access <https://zenodo.org/api/files/260cc66c-43d6-4dc9-ba2b-39be9a7b28ee/SXS:BBH:0123/Lev5/metadata.json>.
{'message': 'The requested URL was not found on the server. If you entered the URL manually please check your spelling and try again.', 'status': 404}
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/markc/sxs_test/lib/python3.9/site-packages/sxs/handlers.py", line 230, in load
    download_file(download_url, path, progress=progress)
  File "/home/markc/sxs_test/lib/python3.9/site-packages/sxs/utilities/downloads.py", line 59, in download_file
    r.raise_for_status()
  File "/home/markc/sxs_test/lib/python3.9/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: NOT FOUND for url: https://zenodo.org/api/files/260cc66c-43d6-4dc9-ba2b-39be9a7b28ee/SXS:BBH:0123/Lev5/metadata.json

Is this a problem with Zenodo? I think this error only started occurring this week.

Add sxs.load() support for Ext_CCE catalog

Adapt sxs.load() so that it can be used for simulations formatted as SXS:BBH_ExtCCE:XXXX and not just for extrapolated waveforms in the main catalog. As an example, the code below would be able to run:

import sxs
md = sxs.load("SXS:BBH_ExtCCE:0001/Lev3/metadata.json")

LVC conversion: Are error outputs needed?

The errors produced by romspline and stored in the output LVC format files do not seem useful for our purposes; they look like they're more relevant to studying the greedy algorithm itself. Storing them increases the size of each LVC-format file by almost 50%.

Each time series in LVC format is currently stored as a group containing five datasets:

  1. X, representing the sample times (aka knots of the spline)
  2. Y, representing the data values at those times
  3. deg, giving the degree of the spline used in the greedy algorithm
  4. tol, giving the target tolerance allowed by the greedy algorithm
  5. errors

This last dataset represents the largest error that the greedy algorithm encountered at each step as it was progressively choosing more and more knots to use for the spline. For example, the first element in any given errors dataset is the largest error when approximating the entire data series in question by a spline with just 6 knots; the second element gives the largest error with 7 knots. I don't see any connection between these quantities and anything we actually care about with the final result, which will typically use hundreds to thousands of knots. In particular, the points with those largest errors are then added to the set of knots, so that the error for the final result at those points should actually be 0.0 in all cases.

The romspline.ReducedOrderSpline.write function that we use to write these files has a slim option that, if set to True, skips writing this errors dataset. Can we do so? Does anyone use those datasets? This would reduce our LVC file sizes by almost a third, and might simplify certain other things that I think might need to happen.
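For reference, a sketch of the write call with that option, on a synthetic data series (the deg and tol values are the defaults discussed in these issues):

import numpy as np
import romspline

# Synthetic stand-in for one data series
times = np.linspace(0.0, 100.0, 1000)
values = np.sin(0.3 * times)

spline = romspline.ReducedOrderSpline(times, values, deg=5, tol=1e-6)
spline.write("spline_group.h5", slim=True)  # skips the errors dataset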

Add support for new metadata fields `keywords`, `warnings`, `superseded_by`

  • Use keywords field, which has short descriptions of systems
    • Can then add deprecated as a keyword
  • warnings should be a new field
  • superseded_by should be a new field
  • If sxs sees deprecated, refuse to load the file
    • If you really want to load this, use a new option
    • If superseded_by exists, recommend the newer waveform
  • If sxs sees superseded_by, suggest a different waveform
  • If warnings exist, issue the warnings
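A sketch of what the load-time checks might look like; the metadata fields are the proposals above, while the function and option names are illustrative:

import warnings

def check_metadata(metadata, ignore_deprecation=False):
    keywords = metadata.get("keywords", [])
    if "deprecated" in keywords and not ignore_deprecation:
        message = "This simulation is deprecated."
        if "superseded_by" in metadata:
            message += f" Consider loading {metadata['superseded_by']} instead."
        raise ValueError(message)
    if "superseded_by" in metadata:
        warnings.warn(f"A newer version exists: {metadata['superseded_by']}")
    for warning in metadata.get("warnings", []):
        warnings.warn(warning)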


Cache manipulation

It would be helpful to have an sxs.cache submodule with functions for tidying up the cache (a possible interface is sketched after this list). Things like

  • Remove all files
  • Remove everything bigger than X
  • Remove everything older than Y
  • Remove files with newer versions in cache (dealing with rhOverM gracefully)
  • Update (look through every file in cache and replace any old files with newer versions)
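An interface sketch; nothing below exists yet, and every name is illustrative:

import sxs.cache  # hypothetical submodule

sxs.cache.clear()                      # remove all files
sxs.cache.prune(larger_than="1 GB")    # remove everything bigger than X
sxs.cache.prune(older_than="90 days")  # remove everything older than Y
sxs.cache.deduplicate()                # drop files with newer versions in cache
sxs.cache.update()                     # replace old files with newer versions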

Add retries

A lot of our errors come from Zenodo responding to requests with 504 codes, which should just be retried after some delay. We also get a lot of what are apparently OpenSSL errors, of the form OSError("(32, 'EPIPE')"). My best guess is that these are caused by file access errors on our end, and should also be retried. These can be handled in the code that calls the API functions, but it would be nice if the API code would handle retries on its own, especially if it could be done just for these cases — not for status codes that should not be retried, like 403 errors, for example.

This page demonstrates how to make a Session object retry automatically. Here's some pseudocode showing how it would work for the Login object:

import requests
from requests.adapters import HTTPAdapter
from requests.packages.urllib3.util.retry import Retry

class Login:
    def __init__(self, retries=15, backoff_factor=0.05, status_forcelist=(504,), session=None):
        # Mount an adapter that retries the given status codes with
        # exponential backoff
        session = session or requests.Session()
        retry = Retry(
            total=retries,
            backoff_factor=backoff_factor,
            status_forcelist=status_forcelist,
        )
        adapter = HTTPAdapter(max_retries=retry)
        session.mount('https://', adapter)
        self.session = session

Unify treatment of Modes objects

We now have spherical.Modes, scri.ModesTimeSeries, scri.AsymptoticBondiData, sxs.TimeSeries, and sxs.WaveformModes objects — with sporadic relationships between them. I'm thinking that scri.ModesTimeSeries should be moved to sxs, and inherit from both spherical.Modes and sxs.TimeSeries. Then, sxs.WaveformModes can inherit from the new sxs.ModesTimeSeries and WaveformMixin, while sxs.WaveformGrid can inherit from sxs.TimeSeries and WaveformMixin.
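In outline, the proposal would look something like this (a sketch of the proposal, not existing code; the WaveformMixin import location is an assumption):

import spherical
import sxs
from sxs.waveforms import WaveformMixin  # assumed location

class ModesTimeSeries(spherical.Modes, sxs.TimeSeries):
    """Moved from scri; mode indexing plus time-series behavior."""

class WaveformModes(ModesTimeSeries, WaveformMixin):
    """Mode-weighted waveforms."""

class WaveformGrid(sxs.TimeSeries, WaveformMixin):
    """Waveforms evaluated on a spatial grid."""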

LVC conversion: "comparable" eccentricities

Moving one part of sxs-collaboration/catalog_tools#3 to this repo, because the code has moved.

@SergeiOssokine said

The function that looks at comparable cases currently only considers the mass ratio and magnitudes of the spins and not their directions (it also does not take into account eccentricity). So a q=5, chi1=(0.9,0.0,0.0) would be considered close to q=5, chi1=(0.0,0.0,0.9).

@geoffrey4444 said

I agree it's not optimal, but it's not obvious to me how to properly fold in everything you might want to look at. Is eccentricity more or less important than mass ratios and spins for determining accuracy?

@aaronbzimmerman said

Eccentricity should probably not be very different for the "similar run" used in error estimates, since an eccentricity-0.1 run isn't really the same as a circular run. Add logic to check that the eccentricities are close enough that only quasi-circular runs are used to estimate errors for quasi-circular runs (maybe check that the difference in eccentricity is less than, say, 0.01 or 0.05?)

I don't see any problem adding such logic to this function, but I wouldn't feel comfortable making the decisions when I haven't used this format, and don't have any feeling for what LVC people would want. So this looks like a job for Geoffrey.
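If that logic is added, the check itself could be as simple as this sketch (the threshold values are the ones floated above):

def eccentricities_comparable(e1, e2, threshold=0.05):
    """Treat two runs as 'similar' for error estimates only if their
    eccentricities differ by less than the threshold (0.01-0.05 suggested)."""
    return abs(e1 - e2) < threshold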

Imported property 'bar' is not callable

In the spherical package 'bar' is a property, but it seems the implementation in sxs treats it as a function, which leads to the following error:

File "/home/hannes/.local/lib/python3.8/site-packages/sxs/waveforms/waveform_modes.py", line 288, in bar
    return spherical.modes.algebra.bar(self)
TypeError: 'property' object is not callable
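A likely fix, sketched under the assumption that spherical.modes.algebra.bar is the property object itself: access its underlying getter rather than calling it.

# In sxs/waveforms/waveform_modes.py (sketch of a possible fix):
@property
def bar(self):
    # bar is a property object in spherical, so use its getter directly
    return spherical.modes.algebra.bar.fget(self)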

Slicing WaveformModes objects

One problem a user ran into during the ICERM workshop was that given w.norm and w[:, 2], he expected w[:, 2].norm to work. I suggested that this failure was a bug, but now I'm not convinced it should work at all, because it changes the meaning of "norm". Instead, w[:, 2] should return a basic TimeSeries object — or maybe some Waveform object that isn't WaveformModes — because it really doesn't contain the information I expect. Specifically, whenever the modes are sliced, I need to make sure that ell_min and ell_max are reset correctly, and if the slice does not respect mode boundaries, it should return a different type of object. I can imagine that norm might still make some sense, but only in the general sense of abs**2.

In any case, I have work to do to ensure that slicing modes deals with the mode metadata sensibly.
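For concreteness, the pattern in question (the ID is illustrative; the slice is from the report above):

import sxs

w = sxs.load("SXS:BBH:0123/Lev/rhOverM", extrapolation_order=2)
w.norm        # L2 norm over all modes at each time: works
w2 = w[:, 2]  # slice out a single data column
w2.norm       # expected to work, but ell_min/ell_max are no longer
              # consistent with the sliced data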

Improving automatic loading of metadata

It's useful to have the metadata.json file loaded with an object, and really useful to have the corresponding <same_name>.json file loaded as well. But these may be cached in separate directories if one was updated but not the other in the latest Zenodo version. The catalog logic that could solve this problem is currently handled in Catalog and the highest-level load function, so that sub-loaders only need to solve the simpler problem of just understanding the data format itself.

Maybe sxs.load should take an include_json argument, and pass the file path to the loaders where relevant.
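The proposed call might look like this; the option doesn't exist yet:

import sxs

w = sxs.load("SXS:BBH:0123/Lev/rhOverM", include_json=True)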

LVC conversion: spline degree

Moving another part of sxs-collaboration/catalog_tools#3 to this repo, because the code has moved.

@SergeiOssokine said

Currently when we call romspline to generate the spline we use the default value for the degree of the spline, which is 5. However, in LALSuite, when the data is being reconstructed the interpolation is done using cubic splines. Should we also just use cubic splines?

@geoffrey4444 said

I vote for using the same degree as before, not cubic splines, for consistency.

I vote for changing it to use cubic splines. I think there are two very good reasons (and one maybe-good reason) for this:

  1. Cubic splines are significantly faster. In one quick test on the ell<=3 modes of one waveform, it's more than twice as fast to use cubic splines.
  2. That same test actually shows a ~3% reduction in output file size. This actually agrees with what I expected, because higher-degree splines tend to get unstable, so they effectively need more control points — especially when the data are noisy or otherwise not particularly smooth. While the biggest modes are fairly smooth, most of the modes are not. There's a rule of thumb that says cubic splines are typically the best choice for data that's appropriately sampled, and that seems to be true here.
  3. The tolerance doesn't mean what we think it means when comparing splines of different degrees. If tolerance 1e-6 is achieved for the default deg=5, I see no reason to expect that to be true of a waveform reconstructed with deg=3. This may seem to conflict with the previous point, because some data series have more points when using deg=5; but some data series also have fewer points, so there are surely some series that will not be interpolated properly.

It looks like points 1 and 2 are the same or stronger if I go to higher ell values, but I just don't have the time to wait around to completely answer that question.

I've added options to sxs.format.lvc.convert_simulation allowing the user to select the desired spline degree and tolerance. They're currently set to the romspline defaults, but I'm suggesting changing the former to 3, rather than 5.
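Usage would then look something like the following; the exact argument names for the new options are assumptions:

from sxs.format.lvc import convert_simulation

# Hypothetical call with the proposed settings (paths and argument
# names are illustrative)
convert_simulation(
    "path/to/sxs_simulation", "path/to/output",
    spline_degree=3,  # proposed, instead of romspline's default deg=5
    tolerance=1e-6,   # the romspline default
)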
