Comments (7)
> To combine them, I first had to create a massive dict that contains all 1335 samples. I first tried to read in a csv that links the bin to a sample name, but struggled too much and in the end made a massive dict definition just using some search/replace on a csv export.
For the record, one simple way to do this kind of thing is:
```python
csv_contents = '''
sample1, bin1
sample2, bin1
sample3, bin1
sample4, bin2
sample5, bin2
sample6, bin2'''[1:]  # [1:] drops the leading newline

bindata = [l.split(', ') for l in csv_contents.split('\n')]
bins = {_[1] for _ in bindata}  # NB this is a set, not a dict
bindict = {b: [sample for sample, bin in bindata if bin == b] for b in bins}

# this results in:
#
# bindict = {
#     'bin2': ['sample4', 'sample5', 'sample6'],
#     'bin1': ['sample1', 'sample2', 'sample3'],
# }
```
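If the mapping lives in an actual file (as originally attempted), the same dict can be built with the standard library's csv module. A sketch, assuming two columns `sample, bin` with no header row; `read_bins` is a hypothetical helper, not part of D47crunch:

```python
import csv
from collections import defaultdict

def read_bins(lines):
    '''Build a {bin: [samples]} dict from csv rows of the form "sample, bin".
    `lines` may be an open file object or any iterable of strings.'''
    bindict = defaultdict(list)
    for sample, bin_id in csv.reader(lines, skipinitialspace=True):
        bindict[bin_id].append(sample)
    return dict(bindict)

# with a real file you would call: read_bins(open('bins.csv', newline=''))
bindict = read_bins(['sample1, bin1', 'sample2, bin1', 'sample3, bin2'])
```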
> Now I managed to use combine_samples() with this dict, and it returns, as it says on the tin:
>
> - the list of group names
> - an array of the corresponding Δ4x values
> - the corresponding (co)variance matrix
>
> but now I'm not sure how to combine these again so that I get a table of:
>
> | bin id | D47 average | D47 95% CI |
>
> Basically I don't know how to calculate the 95% CI from an N×N covariance matrix, or how to export the results to a csv.
By default I estimate 95% confidence limits by multiplying the standard error for each bin (from the diagonal elements of the covariance matrix) by the relevant Student's t factor for the number of degrees of freedom in the standardization model. This t factor is conveniently stored in the `D47data` object's `t95` property. So if your `D47data` instance is `mydata`, and `mydata.combine_samples()` returns `mybins`, `D47_avg`, `D47_CM`, you could do:
```python
with open('binned_output.csv', 'w') as fid:
    fid.write('bin,D47,D47_SE,D47_95CL')
    for k in range(len(mybins)):
        SE = D47_CM[k, k] ** 0.5  # standard error from the diagonal
        fid.write(f'\n{mybins[k]},{D47_avg[k]},{SE},{mydata.t95 * SE}')
```
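If numpy is at hand, the same standard errors and 95% limits can be computed in one vectorized step. A sketch with made-up numbers standing in for the `combine_samples()` output and for `mydata.t95`:

```python
import numpy as np

# hypothetical stand-ins for the combine_samples() output and for mydata.t95
D47_CM = np.array([[1.6e-5, 4.0e-6],
                   [4.0e-6, 2.5e-5]])   # covariance matrix of the binned values
t95 = 1.98                              # Student's t factor (normally mydata.t95)

SE = np.sqrt(np.diag(D47_CM))           # standard errors from the diagonal
CL95 = t95 * SE                         # 95% confidence half-widths
```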
> I also noticed that the averages are different from the ones that I get in my R script, but this may be because of the ETF pooling that you do, or the offset correction that we do. Will have to look into that part later.
Also, remember that binned samples are weighted by the number of replicate analyses, which is a reasonable default behavior but is not always optimal. If you want to use different weights, look at the code; it should be simple to add an option to weigh samples equally. If this is something you think is necessary, feel free to let me know / open an issue / submit a PR.
> Any help would be appreciated! :)
Hope this helps. I'm just happy someone is using this library apart from myself!
from d47crunch.
In that case a quick and dirty improvement would be to let users arbitrarily redefine the figure dimensions. Could you play with 6e71f67 and let me know if it helps?
That helps! And works.
But I think I misdiagnosed the problem: I was thinking of "Samples" as replicate measurements of the same exact sample (i.e. `IODP Leg 208 Site 1264 Hole A 28H2 96 cm–97.5 cm`), but for clumped we need to calculate averages over longer time scales to get good reproducibility, and thus a "sample" is now something else: a period of time over which you assume the material to be homogeneous enough to chuck it all together.
Obviously we're not at a stage yet where we need 100s of rows in this figure, because that would be a crazy well replicated study!
I guess I should have re-read your paper prior to messing around with the code and complaining that it wasn't working ;-)
FTR:
- "sample" = an amount of presumably homogeneous carbonate material. Each sample should be uniquely identified by a sample name (field `Sample` in the csv file).
- "analysis" or "replicate" = corresponds to a single acid reaction followed by purification of the evolved CO2 and by a series of dual-inlet IRMS measurements. Each analysis is identified by a unique identifier (field `UID` in the csv file; if it is missing, a default series of UIDs will be generated).
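To make the distinction concrete, here is a toy sketch grouping analyses (replicates) under their sample names; all values are made up, and only the `UID` and `Sample` fields matter here:

```python
import csv, io

# toy raw-data excerpt: each row is one analysis (replicate),
# identified by UID and tied to a sample name (all values made up)
raw = '''UID,Sample,d45,d46
A01,MYSAMPLE-1,5.79,34.82
A02,MYSAMPLE-1,5.81,34.79
A03,MYSAMPLE-2,-4.32,28.91
'''

# group the analyses by sample name
replicates = {}
for row in csv.DictReader(io.StringIO(raw)):
    replicates.setdefault(row['Sample'], []).append(row['UID'])
```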
Thanks, just to clarify further:
If I specify UID to mean one acid reaction and Sample to mean the things that I want to calculate averages for, is there a way to account for other "block" effects, in this case the unique label that was sieved/picked/cleaned separately but may have multiple analyses? Or, say, two benthic species picked from the same sample that we want to average together but might display some block effects, especially on d13C and d18O?
Just in case, beware that by defining a `Sample` (in `D47crunch` terminology), you postulate that it is isotopically homogeneous, and this assumption will play into your estimates of analytical repeatability. So you should not assign identical sample names to materials that you expect to have different isotopic compositions just for the sake of estimating their average Δ47 value.
Instead, even if you treat materials of different isotopic compositions as different samples, you may still average their Δ47 values after the standardization step, using `D4xdata.sample_average()`:
```python
# compute the unweighted average of Δ47(sample1), Δ47(sample2), and Δ47(sample3)
# and the corresponding standard error:
mydata.sample_average(samples = ['sample1', 'sample2', 'sample3'])

# compute the weighted average of Δ47(sample1), Δ47(sample2), and Δ47(sample3)
# with respective weights of 1, 2, and 3, and the corresponding SE:
mydata.sample_average(
    samples = ['sample1', 'sample2', 'sample3'],
    weights = [1, 2, 3])

# compute the difference between Δ47(sample2) and Δ47(sample1)
# and the corresponding SE:
mydata.sample_average(
    samples = ['sample1', 'sample2'],
    weights = [-1, 1])
```
Another option is to use `D4xdata.combine_samples()`, which is specifically designed to compute average Δ47 values for bins of samples. This function also returns the full covariance matrix for these binned Δ47 values. For example, if samples `foram_10`, `foram_11`, and `foram_12` are all Holocene and samples `foram_18`, `foram_19` correspond to the LGM, calling `combine_samples(dict(Holocene = ['foram_10', 'foram_11', 'foram_12'], LGM = ['foram_18', 'foram_19']))` will return the average Δ47 value for the `Holocene` group of samples, the average value for the `LGM` group, and the 2×2 covariance matrix of these two averages. This is how I would personally approach your benthic species example.
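Note that if you then compare the two bins (e.g. the Δ47 difference between LGM and Holocene), the off-diagonal term of that 2×2 covariance matrix should be included in the error estimate. A sketch with invented numbers:

```python
import numpy as np

# invented binned averages and 2x2 covariance matrix for (Holocene, LGM)
avg = np.array([0.608, 0.655])
CM = np.array([[2.0e-5, 5.0e-6],
               [5.0e-6, 3.0e-5]])

diff = avg[1] - avg[0]                                     # LGM minus Holocene
# SE of the difference includes the covariance term:
SE_diff = (CM[0, 0] + CM[1, 1] - 2 * CM[0, 1]) ** 0.5
```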
Hi again Mathieu!
After all this time I'm taking another look at what you proposed here. I've now managed to calculate clumped values for each 2-cm sample separately. To combine them, I first had to create a massive dict that contains all 1335 samples. I first tried to read in a csv that links the bin to a sample name, but struggled too much and in the end made a massive dict definition just using some search/replace on a csv export.
Now I managed to use combine_samples() with this dict, and it returns, as it says on the tin:
- the list of group names
- an array of the corresponding Δ4x values
- the corresponding (co)variance matrix
but now I'm not sure how to combine these again so that I get a table of:
| bin id | D47 average | D47 95% CI |
Basically I don't know how to calculate the 95% CI from an N×N covariance matrix, or how to export the results to a csv.
I also noticed that the averages are different from the ones that I get in my R script, but this may be because of the ETF pooling that you do, or the offset correction that we do. Will have to look into that part later.
Any help would be appreciated! :)