sxs-collaboration / sxs
Python code for manipulating data from the SXS collaboration
License: MIT License
There appears to be a bug in the to_json_file method of the Metadata class. In particular, this line:
https://github.com/sxs-collaboration/sxs/blob/main/sxs/metadata/metadata.py#L273
This line creates a directory ending in .json, which breaks things. Removing it seems to solve the issue.
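A minimal sketch of the presumably intended behavior, assuming the bug is an errant mkdir on the full file path (the function and argument names here are illustrative, not copied from metadata.py):

```python
import json
from pathlib import Path

def to_json_file(metadata, json_path):
    # Create the *parent* directory, not a directory named after the
    # .json file itself -- the latter is what breaks things.
    json_path = Path(json_path)
    json_path.parent.mkdir(parents=True, exist_ok=True)
    json_path.write_text(json.dumps(metadata))
    return json_path
```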
The latest tests have been failing on ubuntu python 3.9 due to weird issues with tqdm. I've tried to fix the issues here. Until that PR is merged, it might be worthwhile restricting to tqdm<4.61.2 and, once it's fixed, tqdm>4.63 or whatever. I think this is fairly specific to github's ubuntu container, and won't show up for, e.g., a human running ubuntu.
This line should also get a 1 / (reference_mass1 + reference_mass2)
factor, as mentioned here sxs-collaboration/catalog_tools#7
sxs/sxs/utilities/lvcnr/waveforms.py
Line 20 in cb7f2cc
Hi,
I want to know the time when the two black holes begin to merge. How can I find the merger time for a waveform in the SXS catalog? Thank you!
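One common convention (a suggested recipe, not an official definition) is to take the merger time as the time of peak waveform amplitude; for a real sxs waveform w you would use w.t[np.argmax(w.norm)]. Sketched here with a synthetic signal instead of a catalog download:

```python
import numpy as np

# Toy stand-in for a waveform amplitude that peaks at merger (t = 0 here);
# for an actual sxs.WaveformModes object, replace `amp` with w.norm and `t` with w.t.
t = np.linspace(-1000.0, 100.0, 10001)
amp = np.exp(-(t / 50.0) ** 2)
t_merger = t[np.argmax(amp)]   # time of peak amplitude
```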
I forgot to add the basic code needed for backwards compatibility with NRAR-format waveform files, so that modes can be indexed as "Y_l2_m2.dat" or whatever, in analogy with how Horizons objects work. It would also be nice to return a subclassed dict when the user doesn't specify the extrapolation level, so that this indexing can even work from the top level, as in

    with sxs.loadcontext("SXS:BBH:0123/Lev/rhOverM") as f:
        h22 = f["OutermostExtraction.dir/Y_l2_m2.dat"]
The Horizons object currently has a newtonian_com property and an average_com_motion method. It might be nice to include several other features needed for the LVC format, which they name
x_A - x_B (cf. the existing newtonian_com and average_com_motion)
n⃗, n̂, λ̂, ℓ̂
chi_prec (Sec. 3 of https://arxiv.org/abs/1408.1810)
Sigma and Sigma__perp_λ̂ (Eq. (5) of https://arxiv.org/abs/1409.4431)

The notebooks I created for the ICERM workshop are here. I should ask for input from other SXS members and add the repo2docker action to build automatically on new releases of sxs.
A user sent me this:
I got the following error message when I was trying to load the catalog:
>>> catalog = sxs.load("catalog")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/the_user_name/anaconda3/lib/python3.8/site-packages/sxs/handlers.py", line 213, in load
return Catalog.load(download=download)
File "/home/the_user_name/anaconda3/lib/python3.8/site-packages/sxs/catalog/catalog.py", line 82, in load
zip_path.replace(cache_path)
File "/home/the_user_name/anaconda3/lib/python3.8/pathlib.py", line 1369, in replace
self._accessor.replace(self, target)
OSError: [Errno 18] Invalid cross-device link: '/tmp/tmp9yb781zh/catalog.zip' -> '/home/the_user_name/.cache/sxs/catalog.zip'
It looks like one line (the function .replace) in catalog.py is not compatible with all systems. You need to change .replace to shutil.move in order to fix this issue:
/anaconda3/lib/python3.8/site-packages/sxs/catalog/catalog.py
Line 4: Add:
import shutil
Line 82: Change
#zip_path.replace(cache_path)
dest_path = shutil.move(zip_path, cache_path)
On further investigation, I find that the replace function only works if the source and destination are on the same filesystem, suggesting that python's tempfile module made a temporary directory on some disk other than the one holding the user's home directory. This seems like it could be a fairly common use case.
I think a more robust fix would be to ask for the temporary files in the same folder as we expect to write the catalog zip file in, because read/write errors should occur if and only if there will be errors for the final file. Plus, there will be no copying between file systems.
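A rough sketch of that idea, assuming we control where the temporary file is created (the function name and the download step are placeholders):

```python
import tempfile
from pathlib import Path

def download_to_cache(cache_path):
    # Create the temporary file *next to* the final destination, so the
    # final rename never crosses a filesystem boundary and stays atomic.
    cache_path = Path(cache_path)
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    with tempfile.NamedTemporaryFile(dir=cache_path.parent, delete=False) as f:
        f.write(b"downloaded bytes")  # placeholder for the real download
        tmp_path = Path(f.name)
    tmp_path.replace(cache_path)  # same filesystem, so no EXDEV error
    return cache_path
```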
A user reported this error on windows:
I got an error when loading the metadata:
OSError: [WinError 123] The filename, directory name, or volume label syntax is incorrect: 'SXS:BBH:0123\\Lev\\metadata.json'
I had thought it might be something to do with the direction of the slashes, but I now realize it's the colons; they're apparently not legal on Windows. It looks like I had run across this error previously, but had not recognized it here. I guess I'll have to write some wrapper functions any time the cache is accessed (which I guess is just in sxs.load and download) to rewrite those on Windows.
While I'm at it, it would be nice if I could look into failures like this one. I think the problem may be that I convert things to pathlib.Path objects, and it looks like they may be expanding to full user-space paths somewhere.
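A sketch of such a wrapper (the underscore replacement is an assumption, not necessarily the scheme sxs should adopt; the system argument exists only to make the behavior testable):

```python
import platform
from pathlib import Path

def sanitize_cache_path(relative_path, system=None):
    # Colons are not legal in Windows file names, so rewrite them before
    # touching the cache; other systems keep the path unchanged.
    system = system or platform.system()
    if system == "Windows":
        relative_path = relative_path.replace(":", "_")
    return Path(relative_path)
```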
@geoffrey4444 wants to convert our entire catalog to LVC format. The main problem with that is that the conversion process takes of order hours per waveform, so without any improvements, we would expect this process to take of order a year on a single computer.
After looking through the data, it's become clear that the real problem here is the phase; more specifically, the use of phase as one of the data series while requiring the same tolerance on the amplitude and phase. The result is that the splines of the phase in the output LVC format must be far larger than necessary for the resulting waveform to have the desired accuracy.
Each mode of the input waveform is first decomposed into amp and (unwrapped) phase, and then passed to romspline.ReducedOrderSpline, which builds a spline that should be able to reproduce the input data to within an L-infinity tolerance of 1e-6. Note that this tolerance is absolute, not relative.
Presumably, the goal here is to produce a waveform for consumption by, e.g., LAL that is accurate to 1e-6. This is basically achieved with amp. The amp will typically be no larger than 0.1 (or orders of magnitude less for higher-order modes) except very near merger, meaning the relative tolerance is more like 1e-5 or less restrictive, but the absolute error will indeed be 1e-6.
Meanwhile, the phase will generally be tens to tens of thousands (depending on junk and/or precession), meaning that the relative tolerance will be 1e-7 or more restrictive. This is why the phase splines perform so much worse than amp.
The answer seems obvious: rather than requiring |δphase| < 1e-6, we need to be able to require amp*|δphase| < 1e-6. Better yet, to keep the results as accurate as before, we should require |δamp| < 5e-7 and amp*|δphase| < 5e-7, because now the latter will contribute roughly equally with the former, and L-infinity errors should add linearly.
But then, would LVC people accept such waveforms? I know everyone talks about how the phase really needs to be accurate, but I claim it will still be (at least nearly) as accurate as it was before. Specifically, this is not a cumulative phase error; it is an interpolation error. This means that there will still be many times throughout the output waveform that will have precisely zero phase error (relative to the input waveform), but there will be small fluctuations of the phase error between those points. These enter into the waveform in very much the same way as the amplitude errors, so they should be considered equivalent.
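The claim that amplitude-weighted phase errors enter h in the same way as amplitude errors can be checked directly, since h = amp * exp(i*phase); the numbers below are purely illustrative:

```python
import numpy as np

def phase_error_in_h(amp, dphase):
    # For small dphase, |amp*exp(i*(phase+dphase)) - amp*exp(i*phase)|
    # = amp * |exp(i*dphase) - 1| ≈ amp * |dphase|
    return amp * np.abs(np.exp(1j * dphase) - 1.0)

amp = 0.01        # typical mode amplitude well before merger
dphase = 1e-4     # phase interpolation error, in radians
err = phase_error_in_h(amp, dphase)
# err ≈ amp * dphase = 1e-6, i.e. within an absolute tolerance of 1e-6 on h,
# even though dphase alone would fail a 1e-6 absolute tolerance on the phase.
```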
Currently, pip install sxs does not install any of the requirements.
Currently, setup.py refers to modules sxs.data and sxs.data.api:
Lines 56 to 62 in 1691b00
However, there are no such modules. A pip install will therefore error (and this is blocking me from adding auto-generated documentation on RTD).
For example, if somebody calls
sxs.load('RIT:BBH:0123/rPsi4')
then, based on the string before the colon, look in the RIT catalog instead of the SXS catalog. Is this an appropriate interface? Should this be within the sxs package, even though it is others' data? Or should it instead be in a different package? Ping @moble, request for comment.
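One possible shape for such dispatch, sketched with placeholder loaders (the registry and its contents are hypothetical; nothing like this exists in sxs yet):

```python
# Hypothetical registry mapping ID prefixes to catalog-specific loaders.
CATALOG_LOADERS = {
    "SXS": lambda path: ("SXS catalog", path),
    "RIT": lambda path: ("RIT catalog", path),
}

def load(identifier):
    # Dispatch on the string before the first colon.
    prefix = identifier.split(":", 1)[0]
    try:
        loader = CATALOG_LOADERS[prefix]
    except KeyError:
        raise ValueError(f"No catalog registered for prefix {prefix!r}")
    return loader(identifier)
```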
In catalog.py, lines 66-104: suppose the user passes download=False and has no cache (so not cache_path.exists()). Then line 99 will be hit, testing the value of download_failed, which was never assigned, because the if block starting on line 66 was never entered.
I think the test for download is False just needs to come before the test for download_failed to resolve this.
Found by @akhairna.
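A stripped-down reproduction of the control flow (bodies elided; only the branch ordering matters):

```python
def load_buggy(download, cache_exists):
    if download:
        download_failed = False   # only assigned when download is truthy
    if not cache_exists:
        if download_failed:       # UnboundLocalError when download=False
            raise RuntimeError("download failed")
        if download is False:
            raise ValueError("not cached and download=False")

def load_fixed(download, cache_exists):
    if download:
        download_failed = False
    if not cache_exists:
        if download is False:     # test download first, as proposed
            raise ValueError("not cached and download=False")
        if download_failed:
            raise RuntimeError("download failed")
```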
[A user asked this question through a separate channel recently. It'll be easier to answer it here.]
I need to do an FFT to transform the time-domain waveform to the frequency domain, but after many tries I can't get the correct frequency-domain waveform. Here is the Python code I used for the FFT:
import numpy as np
import sxs
waveform = sxs.load("SXS:BBH:0123/Lev/rhOverM", extrapolation_order=2)
h = waveform
fs = 4096
datafreq = np.fft.fftfreq(h.size) * fs
h_f = (np.fft.fft(h)/fs)[0:int(len(h)/2)]
freqs = datafreq[0:int(len(h)/2)]
The FFT of the (2, 2) mode shows much stronger oscillation at high frequencies in a log-log plot of |h(f)|, and I think the trend of the waveform with increasing frequency is also incorrect.
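One likely culprit (an assumption about the report above, not a diagnosis from the actual data): SXS waveforms are not uniformly sampled in time, while np.fft.fft assumes uniform sampling, so a mode should first be interpolated onto a uniform grid. Sketched with a synthetic tone rather than a catalog waveform:

```python
import numpy as np

# Unevenly sampled complex "mode": a pure 3 Hz tone.
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0.0, 10.0, 4097))
h22 = np.exp(2j * np.pi * 3.0 * t)

# Interpolate onto a uniform grid before transforming.
dt = 1.0 / 4096
t_uniform = np.arange(t[0], t[-1], dt)
h_uniform = np.interp(t_uniform, t, h22.real) + 1j * np.interp(t_uniform, t, h22.imag)

h_f = np.fft.fft(h_uniform) * dt
freqs = np.fft.fftfreq(len(h_uniform), dt)
peak_freq = freqs[np.argmax(np.abs(h_f))]   # close to the input 3 Hz
```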
sxs.load('...') fails for 'ExtraWaveforms', but sxs.rpdmb.load('ExtraWaveforms.h5/rMPsi4_Asymptotic_GeometricUnits_CoM_Mem/Extrapolated_N2.dir') works.

Can we make it so that sxs.load('...') checks to see if there's string content after the '.h5', and if so, looks in the JSON file to check the sxs_format for the target waveform and then loads it via however that waveform is supposed to be loaded?
Hello!
When I try to load any metadata file or waveform from the catalog,
import sxs
sxs.load("SXS:BBH:0123/Lev/metadata.json")
I run into a 404 url not found error:
Found the following files to load from the SXS catalog:
SXS:BBH:0123v5/Lev5/metadata.json
An error occurred when trying to access <https://zenodo.org/api/files/260cc66c-43d6-4dc9-ba2b-39be9a7b28ee/SXS:BBH:0123/Lev5/metadata.json>.
{'message': 'The requested URL was not found on the server. If you entered the URL manually please check your spelling and try again.', 'status': 404}
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/markc/sxs_test/lib/python3.9/site-packages/sxs/handlers.py", line 230, in load
download_file(download_url, path, progress=progress)
File "/home/markc/sxs_test/lib/python3.9/site-packages/sxs/utilities/downloads.py", line 59, in download_file
r.raise_for_status()
File "/home/markc/sxs_test/lib/python3.9/site-packages/requests/models.py", line 1021, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: NOT FOUND for url: https://zenodo.org/api/files/260cc66c-43d6-4dc9-ba2b-39be9a7b28ee/SXS:BBH:0123/Lev5/metadata.json
Is this a problem with Zenodo? I think this error only started occurring this week.
Adapt sxs.load() so that it can be used for simulations formatted as SXS:BBH_ExtCCE:XXXX, and not just for extrapolated waveforms in the main catalog. As an example, the code below would be able to run:
import sxs
md = sxs.load("SXS:BBH_ExtCCE:0001/Lev3/metadata.json")
Many links in the README are broken. For example, waveform in the part below points to https://github.com/sxs-collaboration/sxs/blob/main/api/sxs.waveforms.waveform_modes/#sxs.waveforms.waveform_modes.WaveformModes. Similarly for catalog, metadata, etc., but I didn't check everything.
The errors produced by romspline and stored in the output LVC-format files do not seem useful for our purposes; they look more relevant to studying the greedy algorithm itself. Storing them increases the size of each LVC-format file by almost 50%.
Each time series in LVC format is currently stored as a group containing five datasets:
X, representing the sample times (aka knots of the spline)
Y, representing the data values at those times
deg, giving the degree of the spline used in the greedy algorithm
tol, giving the target tolerance allowed by the greedy algorithm
errors, described below

This last dataset represents the largest error that the greedy algorithm encountered at each step as it was progressively choosing more and more knots to use for the spline. For example, the first element in any given errors dataset is the largest error when approximating the entire data series in question by a spline with just 6 knots; the second element gives the largest error with 7 knots. I don't see any connection between these quantities and anything we actually care about with the final result, which will typically use hundreds to thousands of knots. In particular, the points with those largest errors are then added to the set of knots, so that the error for the final result at those points should actually be 0.0 in all cases.
The romspline.ReducedOrderSpline.write function that we use to write these files has a slim option that, if set to True, skips writing this errors dataset. Can we do so? Does anyone use those datasets? This would reduce our LVC file sizes by almost a third, and might simplify certain other things that I think might need to happen.
Please add ORCID iDs
Harald Pfeiffer -- https://orcid.org/0000-0001-9288-519X
There is a keywords field, which has short descriptions of systems. Proposed changes:
add deprecated as a keyword
warnings should be a new field
superseded_by should be a new field
if deprecated, refuse to load the file
if superseded_by exists, recommend the newer waveform
if there is no superseded_by, suggest a different waveform
if warnings exist, issue the warnings
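A sketch of what the proposed load-time logic could look like (field names follow the list above; the function and messages are invented):

```python
import warnings

def check_metadata(metadata):
    # Refuse deprecated waveforms, pointing at a replacement when one exists.
    if "deprecated" in metadata.get("keywords", []):
        superseded = metadata.get("superseded_by")
        if superseded:
            raise ValueError(f"Deprecated; superseded by {superseded}")
        raise ValueError("Deprecated; consider a different waveform")
    # Otherwise surface any recorded warnings to the user.
    for message in metadata.get("warnings", []):
        warnings.warn(message)
```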
It would be helpful to have an sxs.cache submodule with functions for tidying up the cache. Things like
A lot of our errors come from Zenodo responding to requests with 504 codes, which should just be retried after some delay. We also get a lot of what are apparently OpenSSL errors, saying OSError("(32, 'EPIPE')"). My best guess is that these are caused by file access errors on our end, and should also be retried. These can be handled in the code that calls the API functions, but it would be nice if the API code would handle retries on its own, especially if it could be done just for these cases, and not for status codes that should not be retried, like 403 errors, for example.
This page demonstrates how to make a Session object retry automatically. Here's some pseudocode showing how it would work for the Login object:
import requests
from requests.adapters import HTTPAdapter
from requests.packages.urllib3.util.retry import Retry

class Login:
    def __init__(self, retries=15, backoff_factor=0.05, status_forcelist=(504,), session=None):
        session = session or requests.Session()
        retry = Retry(
            total=retries,
            backoff_factor=backoff_factor,
            status_forcelist=status_forcelist,
        )
        adapter = HTTPAdapter(max_retries=retry)
        session.mount('https://', adapter)
        self.session = session
We now have spherical.Modes, scri.ModesTimeSeries, scri.AsymptoticBondiData, sxs.TimeSeries, and sxs.WaveformModes objects, with sporadic relationships between them. I'm thinking that scri.ModesTimeSeries should be moved to sxs, and inherit from both spherical.Modes and sxs.TimeSeries. Then, sxs.WaveformModes can inherit from the new sxs.ModesTimeSeries and WaveformMixin, while sxs.WaveformGrid can inherit from sxs.TimeSeries and WaveformMixin.
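In stand-in form, the proposed hierarchy would look like this (these are empty placeholders for the real spherical, scri, and sxs classes):

```python
class Modes: ...           # stands in for spherical.Modes
class TimeSeries: ...      # stands in for sxs.TimeSeries
class WaveformMixin: ...

class ModesTimeSeries(Modes, TimeSeries):        # moved from scri into sxs
    ...

class WaveformModes(ModesTimeSeries, WaveformMixin):
    ...

class WaveformGrid(TimeSeries, WaveformMixin):   # no mode structure
    ...
```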
Moving one part of sxs-collaboration/catalog_tools#3 to this repo, because the code has moved.
@SergeiOssokine said
The function that looks at comparable cases currently only considers the mass ratio and magnitudes of the spins and not their directions (it also does not take into account eccentricity). So a q=5, chi1=(0.9,0.0,0.0) would be considered close to q=5, chi1=(0.0,0.0,0.9).
@geoffrey4444 said
I agree it's not optimal, but it's not obvious to me how to properly fold in everything you might want to look at. Is eccentricity more or less important than mass ratios and spins for determining accuracy?
@aaronbzimmerman said
Eccentricity should probably not be very different for the "similar run" used in error estimates, since an eccentricity-0.1 run isn't really the same as a circular run. Add logic to check that the eccentricity is close enough that only quasi-circular runs are used to estimate errors for quasi-circular runs (maybe check that the difference in eccentricity is less than, say, 0.01 or 0.05?).
I don't see any problem adding such logic to this function, but I wouldn't feel comfortable making the decisions when I haven't used this format, and don't have any feeling for what LVC people would want. So this looks like a job for Geoffrey.
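A sketch of what folding in eccentricity could look like (all names and thresholds are illustrative; the 0.05 cut follows the range suggested above):

```python
def is_comparable(md_a, md_b, q_tol=0.1, chi_tol=0.1, ecc_tol=0.05):
    # Two runs are "comparable" only if mass ratio, spin magnitude, *and*
    # eccentricity are all close -- so quasi-circular runs are only
    # compared against other quasi-circular runs.
    return (
        abs(md_a["mass_ratio"] - md_b["mass_ratio"]) <= q_tol
        and abs(md_a["chi1_mag"] - md_b["chi1_mag"]) <= chi_tol
        and abs(md_a["eccentricity"] - md_b["eccentricity"]) <= ecc_tol
    )
```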
In the spherical package, 'bar' is a property, but it seems the implementation in sxs treats it as a function, which leads to the following error:
File "/home/hannes/.local/lib/python3.8/site-packages/sxs/waveforms/waveform_modes.py", line 288, in bar
return spherical.modes.algebra.bar(self)
TypeError: 'property' object is not callable
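The mismatch is easy to reproduce with a stand-in class; one possible fix (an assumption, not necessarily what sxs chose) is to call the property's getter explicitly:

```python
class algebra:  # stand-in for spherical.modes.algebra
    @property
    def bar(self):
        return "conjugated"

obj = algebra()
# algebra.bar(obj) raises TypeError: 'property' object is not callable,
# because algebra.bar is a property object, not a function.
value = algebra.bar.fget(obj)   # invoking the getter directly works
```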
Given a list of SXS IDs, return a list of associated INSPIRE keys which an end-user can cite
One problem a user ran into during the ICERM workshop was that, given w.norm and w[:, 2], he expected w[:, 2].norm to work. I suggested that this failure was a bug, but now I'm not convinced it should work at all, because it changes the meaning of "norm". Instead, w[:, 2] should return a basic TimeSeries object, or maybe some Waveform object that isn't WaveformModes, because it really doesn't contain the information I expect. Specifically, whenever the modes are sliced, I need to make sure that ell_min and ell_max are reset correctly, and if the slice does not respect mode boundaries, it should return a different type of object. I can imagine that norm might still make some sense, but only in the general sense of abs**2.
In any case, I have work to do to ensure that slicing modes deals with the mode metadata sensibly.
It's useful to have the metadata.json file loaded with an object, and really useful to have the <same_name>.json file loaded. But these may be cached in separate directories if one was updated but not the other in the latest Zenodo version. The catalog logic that could solve this problem is currently handled in Catalog and the highest-level load function, so that sub-loaders only need to solve the simpler problem of just understanding the data format itself.

Maybe sxs.load should take an include_json argument, and pass the file path to the loaders where relevant.
Moving another part of sxs-collaboration/catalog_tools#3 to this repo, because the code has moved.
@SergeiOssokine said
Currently when we call romspline to generate the spline we use the default value for the degree of the spline, which is 5. However, in LALSuite, when the data is being reconstructed the interpolation is done using cubic splines. Should we also just use cubic splines?
@geoffrey4444 said
I vote for using the same degree as before, not cubic splines, for consistency.
I vote for changing it to use cubic splines. I think there are two very good reasons (and one maybe-good reason) for this. Since the greedy algorithm chooses its knots to achieve the tolerance with deg=5, I see no reason to expect that to be true of a waveform reconstructed with deg=3. This may seem to conflict with the previous point, because some data series have more points when using deg=5; but some data series also have fewer points, so there are surely some series that will not be interpolated properly. It looks like points 1 and 2 are the same or stronger if I go to higher ell values, but I just don't have the time to wait around to completely answer that question.
I've added options to sxs.format.lvc.convert_simulation allowing the user to select the desired spline degree and tolerance. They're currently set to the romspline defaults, but I'm suggesting changing the former to 3, rather than 5.
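The knot-mismatch argument can be illustrated numerically: knots adequate for a quintic spline need not be adequate for a cubic one. A toy example with scipy, not the romspline greedy algorithm:

```python
import numpy as np
from scipy.interpolate import InterpolatedUnivariateSpline

t = np.linspace(0.0, 2.0 * np.pi, 2001)
y = np.sin(t)
knots = t[::100]            # sparse knots, as a greedy algorithm might select
yk = np.sin(knots)

# Reconstruct on the full grid with quintic (deg=5) and cubic (deg=3) splines.
err5 = np.max(np.abs(InterpolatedUnivariateSpline(knots, yk, k=5)(t) - y))
err3 = np.max(np.abs(InterpolatedUnivariateSpline(knots, yk, k=3)(t) - y))
# err3 > err5: the same knots reconstruct less accurately at lower degree.
```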