Comments (8)
- Do not copy metadata. Copying almost always results in incorrect metadata and makes it impossible to use metadata for anything useful. For instance, say we try to detect "data acquired from a camera" using metadata, which we do currently. If all data processed from the original carries a copy of the metadata, how are we to distinguish the processed data from the original?
- Put source metadata into a sub-tag. This is also not a good idea: it can propagate a lot of data quickly and still makes the metadata difficult to access.
My plan for this in the future (which may be now) is to provide more metadata functions that allow tracing the sources of the metadata. An operation can add metadata about its sources, and additional functions can then help to access a processed item's sources so that metadata can be grabbed from the source. The obvious downside is that a source may be missing.
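The tracing idea can be sketched in plain Python. The helper names, and the use of UUID strings as source references, are assumptions for illustration, not the niondata API:

```python
import copy

def record_sources(result_metadata, source_uuids):
    """Attach references to the source items instead of copying their metadata."""
    md = copy.deepcopy(result_metadata)
    md["sources"] = list(source_uuids)
    return md

def resolve_source_metadata(result_metadata, lookup):
    """Fetch each recorded source's metadata via `lookup` (uuid -> metadata dict).

    Returns None for a source that is missing -- the downside noted above.
    """
    return [lookup.get(u) for u in result_metadata.get("sources", [])]
```

With this shape, a processed item stays small (it stores references only), and a library of access functions can walk back to the originals when they still exist.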
Another approach would be to get rid of the concept of unstructured metadata completely and define subsets that are more specific and may be copied. We already do this to some extent with calibrations and data descriptions. In the near future we may add coordinate systems. The first example might be "description of processing". Another new example might be "original data acquisition information." This needs some thinking.
from niondata.
Before I close this issue, I'd like to understand your use case. What are you trying to accomplish by copying metadata?
from niondata.
I think there are a lot of processing routines where copying the metadata is valid. Think of things like a Gaussian blur (or any other filter). Even aligning a sequence does not invalidate metadata.
Right now, if you process your data in Swift, you always lose all metadata.
If you then export or even just snapshot the result, you have no idea how the data was acquired, because all the information about it is simply gone.
There may be operations that invalidate metadata, but defaulting to dropping everything is, in my opinion, the wrong approach.
from niondata.
I've considered the exporting case before, and my conclusion was that export is a special operation that should consolidate processing information and include metadata from sources. See nion-software/nionswift#397 and nion-software/nionswift#398.
If the user has a way to view sources and metadata of those sources directly in an expanded metadata editor and export has an option for including processing info and metadata from sources, would that satisfy your concerns?
from niondata.
It is better than nothing, but I would still prefer that functions which do not invalidate metadata simply copy it to their results.
from niondata.
New example where this behavior leads to an awkward implementation:
Consider a camera with the new acquire_synchronized capabilities. It returns Partial data whose xdata attribute holds the SI data, which usually still contains the flyback pixels. So the obvious thing to do is to crop the data with the following code:

```python
from nion.data import xdata_1_0 as xd

partial_data = camera.acquire_synchronized_continue(...)
xdata = xd.data_slice(partial_data.xdata, (slice(None), slice(-2), ...))
```
This is great because it keeps calibrations etc., and it is an API function.
But now we have to reach for a "private" function to get our metadata dict back:

```python
xdata._set_metadata(partial_data.xdata.metadata)
```
Why does data_slice strip the metadata dict from the data? This makes absolutely no sense to me...
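Until data_slice copies metadata itself, the workaround above can at least be confined to one place. This is a sketch with a stand-in argument for xd.data_slice; it assumes only the `metadata` property and the private `_set_metadata` setter shown above:

```python
import copy

def slice_keeping_metadata(xdata, slices, data_slice_fn):
    """Apply `data_slice_fn` (standing in for xd.data_slice), then restore
    the source's metadata dict on the result via the private setter."""
    result = data_slice_fn(xdata, slices)
    result._set_metadata(copy.deepcopy(xdata.metadata))
    return result
```

The deepcopy avoids the result and the source sharing one mutable dict, which is one of the pitfalls blind copying can introduce.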
from niondata.
An additional rationale for this issue is described in the similar-to-niondata xarray project, where metadata is copied only when explicitly requested or when it is unambiguous.
xarray: What is your approach to metadata?
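For operations with multiple inputs, the "unambiguous" part of that policy can be sketched as: keep a key only when every input agrees on its value. The rule below is an illustration of the idea, not xarray's actual implementation:

```python
def unambiguous_metadata(*metadata_dicts):
    """Keep only the keys on which all input metadata dicts agree."""
    if not metadata_dicts:
        return {}
    first, rest = metadata_dicts[0], metadata_dicts[1:]
    _missing = object()  # sentinel so a missing key never matches
    return {k: v for k, v in first.items()
            if all(md.get(k, _missing) == v for md in rest)}
```

A single-input operation trivially keeps everything under this rule, which matches the filter/align cases argued above.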
from niondata.
Additional notes 2021-07-21:
1:
- In general metadata describes what we are looking at, how it was collected, and its calibrations. Obviously none of these things change from an align process. An integrate will obviously affect the effective exposure time.
- Chris' comment that an item in the metadata specific to "data acquired from a camera" means we should discard all information is highly facetious. The proper way to handle this is a metadata tag "Origin" that, after the metadata is copied, is set to the origin of the data. I would also add a tag "Trail" that appends the current Origin to the Trail of the source, so we would see "Trail"="Camera ELA,Align,Integrate,Crop" for a basic workflow.
- Losing calibrations is nuts, and it is not helped by the fact that the inspector cannot be used to copy them, since it only shows a heavily rounded version of the calibration.
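The Origin/Trail proposal could look like the following. The tag names and comma-separated format come from the comment above; the helper itself is a sketch, not existing API:

```python
import copy

def derive_metadata(source_metadata, operation):
    """Copy the source metadata, set Origin to the current operation, and
    append that operation to the comma-separated Trail of the source."""
    md = copy.deepcopy(source_metadata)
    trail = md.get("Trail", "")
    md["Trail"] = f"{trail},{operation}" if trail else operation
    md["Origin"] = operation
    return md

# Chaining from a camera acquisition:
md = {"Origin": "Camera ELA", "Trail": "Camera ELA"}
for op in ("Align", "Integrate", "Crop"):
    md = derive_metadata(md, op)
# md["Trail"] is now "Camera ELA,Align,Integrate,Crop"
```

This keeps the "where did this come from" question answerable from the item itself, without needing the sources to still exist.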
2:
We would be hugely better off if we simply copied the metadata from the parent to the child. There is indeed a chance this could lead to some confusion, but it would still be an improvement on the current situation, where confusion is guaranteed.
Generally speaking, a more robust method for handling metadata is needed. We need options for handling complex cases, with sensible defaults for the most common operations.
Common use cases include:
- multi-acquire NxM, align, integrate -- please do not lose the calibration information (neither scale nor intensity), microscope kV, acquisition detector, etc. (this could be 10,000 spectra with 1 ms acquisition time, or 1,000 images with 1 µs/pixel, or other)
- acquire a spectrum, subtract dark image, multiply by gain image -- please do not lose the calibration information (neither scale nor intensity), microscope kV, acquisition detector, etc.
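One way to frame "options with sensible defaults" is a per-operation policy table. The policy names, the example defaults, and the operation names below are illustrative assumptions, not existing niondata behavior:

```python
import copy
from enum import Enum, auto

class MetadataPolicy(Enum):
    COPY = auto()         # metadata stays valid (blur, align, crop)
    COPY_ADJUST = auto()  # copy, then patch the fields the operation changes
    DROP = auto()         # the operation genuinely invalidates the metadata

# Hypothetical defaults; unknown operations default to COPY,
# matching the argument in this thread.
DEFAULT_POLICIES = {
    "gaussian_blur": MetadataPolicy.COPY,
    "sequence_align": MetadataPolicy.COPY,
    "sequence_integrate": MetadataPolicy.COPY_ADJUST,  # e.g. effective exposure
}

def result_metadata(source_metadata, operation, adjust=None):
    """Derive the result's metadata from the source according to policy."""
    policy = DEFAULT_POLICIES.get(operation, MetadataPolicy.COPY)
    if policy is MetadataPolicy.DROP:
        return {}
    md = copy.deepcopy(source_metadata)
    if policy is MetadataPolicy.COPY_ADJUST and adjust is not None:
        adjust(md)  # operation-specific patch, e.g. scale the exposure time
    return md
```

An integrate over N frames would pass an `adjust` callback that multiplies the effective exposure time, covering the caveat noted in the comments above.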
from niondata.
Related Issues (20)
- Add 1d and 2d cross correlation functions
- Data functions should all work sensibly on h5py datasets
- Use term 'rank' for dimension count to match NumPy
- Consider using 'navigation' and 'signal' to match HyperSpy nomenclature
- The resize method should work on RGB data
- function_concatenate should preserve data_descriptor of input data
- Allow specifying a maximum shift for align and sequence_align functions
- Allow choice of Fourier shift or regular shift in sequence_align functions
- Sequences of color images
- Sobel filter should be 2d for images
- Change default "sequence shift" to using linear spline shift
- Should register_template return offset relative to center pixel?
- Consider mechanism to track additional data properties such as whether complex is Hermitian
- Add mechanism to indicate which parts of the data are valid
- Intermittent test failures in new multi dimensional functions
- FFT result should have sensible intensity units; subsequent FFT should result in original intensity units
- Multiplying a calibrated image by an uncalibrated image should result in a calibrated image
- Processing should use the timestamp/timezone/timezone_offset from the source data
- Add function to perform cropping in either sequence/collection axes