vrooje / gzcandels_datapaper
Collaborative paper writing. Data release paper for Galaxy Zoo: CANDELS.
License: MIT License
It feels a little glossy right now.
GZ references, of course, but also non-GZ references.
Hoping some of the CANDELS team will also point out where I've accidentally omitted seminal references to their papers.
There isn't one, and there should be. At what redshifts are we resolving what features? What constraints does a "smooth" classification put on feature sizes? For extended sources, what does a plot of size versus p_artifact look like?
I'm thinking we should start the "Use of classifications in practice" section with this, as in "this is overall, but it contains no appropriate selections on redshift, luminosity, mass, etc.; here's what you do if you want to do a specific study".
What do you think? @willettk
e.g. "clean" sample of disks selected via various disk features, but with clean-smooth B/Tot < 0.5 as well?
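A sketch of what that selection might look like in practice. The column names and thresholds below are placeholders for illustration, not the paper's actual clean-sample cuts:

```python
import numpy as np

# Hypothetical vote-fraction columns and thresholds -- the real "clean disk"
# cuts would come from the paper's clean-sample definitions, not this sketch.
def clean_disk_mask(p_features, p_not_edgeon, b_tot, f_thresh=0.7, bt_max=0.5):
    """Boolean mask for a 'clean' disk sample: high featured fraction,
    not edge-on, and bulge-to-total ratio below bt_max."""
    return (p_features >= f_thresh) & (p_not_edgeon >= f_thresh) & (b_tot < bt_max)

# Toy values for four galaxies:
p_features   = np.array([0.90, 0.80, 0.30, 0.95])
p_not_edgeon = np.array([0.80, 0.90, 0.90, 0.75])
b_tot        = np.array([0.20, 0.60, 0.10, 0.40])
mask = clean_disk_mask(p_features, p_not_edgeon, b_tot)
# -> only galaxies 0 and 3 pass all three cuts
```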
Check that we haven't used any of the following terms wrongly (with what we should use in parentheses):
Also check to make sure all these easily-confused terms are being used correctly:
In a way that still looks like a coherent design but preserves the icons.
Send emails asking people for their acknowledgments
I have an Einstein acknowledgment now...
At the end of each row of images showing various examples of p_values for different responses, we should show small histograms plotting the distribution of those responses. It wouldn't take up much more real estate and it would add a lot of value.
Something like @CKrawczyk's node tree but one diagram for the whole population.
Coleman, how hard would this be for you to produce?
it shouldn't be that many, but verify it.
pick one and stick with it.
Now that we have new weighted classifications to T00, check the definition of "clean" samples to see if they need to be re-defined.
Spearman's is more robust when the data are not well-behaved (e.g. not Gaussian-distributed). And since the values are nearly identical, report the one that's marginally more appropriate.
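For reference, the two statistics differ only in that Spearman's rho operates on ranks, which a few lines of numpy make concrete. Minimal sketch only; a real analysis would use scipy.stats, which also handles tied values:

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation coefficient."""
    return np.corrcoef(x, y)[0, 1]

def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the ranks.
    (Sketch assumes no tied values; scipy.stats.spearmanr uses
    average ranks to handle ties properly.)"""
    rank = lambda a: np.argsort(np.argsort(a))
    return pearson(rank(x), rank(y))

# A monotone but strongly non-linear relation: Spearman's sees perfect
# rank agreement, while Pearson's is penalised for the non-linearity.
x = np.arange(1.0, 11.0)
y = x ** 3
rho_p = pearson(x, y)   # < 1
rho_s = spearman(x, y)  # exactly 1
```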
Early on, Fig 4 is referenced as showing the colour images. It doesn't. Also, you can't point to Fig 4 before Figs 1, 2, and 3; MNRAS won't allow it.
I have spot-checked my way through several consistency calculations, but am I really sure of this? The consistency distribution has 5 people with consistency < 0.2 and otherwise cuts off really sharply just above 0.3. GZ2 didn't seem to do that. That's troubling.
Would it make sense to add a histogram of number of classifications per user in Section 3.3? I like to show those in talks....
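If we do add it, the counts are cheap to compute. A toy sketch (made-up user IDs; roughly log-spaced bins, as is usual since a few volunteers contribute orders of magnitude more than the median):

```python
import numpy as np

# Toy classification log: one row per classification, labelled by user ID.
# (The real input would be the GZ: CANDELS classification table.)
user_ids = np.array([3, 1, 2, 1, 1, 3, 2, 1, 3, 3, 3, 1])

# Number of classifications contributed by each distinct user...
_, counts = np.unique(user_ids, return_counts=True)

# ...binned for the histogram (roughly log-spaced bin edges):
bins = np.array([1, 2, 4, 8, 16])
hist, _ = np.histogram(counts, bins=bins)
```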
What's the ultimate release format of the data? The paper says it'll be on http://data.galaxyzoo.org, which we should do, but I want to include it in more formats for posterity's sake.
Possible options:
And K14 --> K15 in the text.
The current F6 is not successful at convincing people of the quality of fits to \Delta f_value as a function of f_value and surface brightness. A simple fix is to just plot the planes as two line plots for each response (and probably include fewer responses), but we could explore having interactive 3D plots (which apparently MNRAS supports). Does anyone know how to do this? @willettk @rjsmethurst @chrislintott @CKrawczyk etc?
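For the 2D fallback, the fitted plane just gets sliced at a few fixed surface brightnesses, turning the surface into overplottable curves. A sketch with entirely made-up coefficients (the real ones come from the F6 fits):

```python
import numpy as np

# Hypothetical plane fit Delta_f = a*f + b*mu + c for one response.
# Coefficients here are invented for illustration only.
a, b, c = -0.15, 0.02, 0.1
f = np.linspace(0.0, 1.0, 50)

# Slice the fitted plane at two representative surface brightnesses mu,
# reducing the 3D surface to two 2D line plots:
for mu in (22.0, 25.0):
    delta_f = a * f + b * mu + c
    # plt.plot(f, delta_f, label=f"mu = {mu}")  # one curve per slice
```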
If not I'm tempted to just do the 2 2D plots because I don't want to get bogged down in this.
I suspect the referee will dislike Table 5 as much as I do now that I'm in the post-submission clarity phase. Since we aren't publishing B/Tot ratios in this paper (those are for @BorisHaeussler to publish as he sees fit), those galaxies should just be identified via a flag in the main catalog.
Did we look at the CANDELS classifications for signs of errant/non-human behavior aside from the star/artifact question? If so, would it be quick to run? I think it'd be good if we had the ability to write a sentence in 3.4 akin to "We have also analyzed the percentages of the remaining top-level categories (smooth and features/disk) for all users and find no/some/lots of evidence for bot-based classifications".
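The check itself should be quick: compute each user's top-level response fractions and flag anyone answering the same way essentially every time. A toy sketch with a made-up flagging threshold:

```python
import numpy as np

# Toy per-classification log: user ID and top-level response
# (0 = smooth, 1 = features/disk). Real input: the GZ: CANDELS tables.
users     = np.array([1, 1, 1, 1, 2, 2, 2, 2])
responses = np.array([0, 1, 0, 1, 1, 1, 1, 1])

# Fraction of "features/disk" answers per user; a user giving the same
# answer ~100% of the time over many classifications is bot-like.
uids = np.unique(users)
frac_featured = np.array([responses[users == u].mean() for u in uids])
suspicious = (frac_featured > 0.95) | (frac_featured < 0.05)  # made-up cut
```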
\begin{table*} etc. doesn't help as the number of line breaks is set by the number of responses. It needs to be split across 2 columns.
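One possible layout (sketch only): split the responses into two halves and set them side by side as minipages inside the table* environment, repeating the header in each half:

```latex
% Sketch: long response table split into two parallel halves.
\begin{table*}
  \centering
  \begin{minipage}{0.48\textwidth}
    \begin{tabular}{lrr}
      \hline
      Response & $N$ & $f$ \\
      \hline
      % ... first half of the responses ...
      \hline
    \end{tabular}
  \end{minipage}\hfill
  \begin{minipage}{0.48\textwidth}
    \begin{tabular}{lrr}
      \hline
      Response & $N$ & $f$ \\
      \hline
      % ... second half of the responses ...
      \hline
    \end{tabular}
  \end{minipage}
  \caption{...}
\end{table*}
```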
Need to check that I've written up the correct details of how the subject images were created (linearity, stretch, etc.). Those are in Section 2.1.
Also check with Jeyhan the status of the UDS classification paper (listed as in-prep in S4).
I'd like to see more on what Wisnioski found about the dynamics of high z galaxies in the section on Smooth discs.
Pick ranges of dM* and compute smooth vs featured fractions to add to the text describing the sankey diagram... could even be a new plot. Depends what the referee says.
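The binned fractions are only a few lines of numpy. A sketch with made-up values, using the mean weighted vote fraction per bin as a stand-in for the smooth/featured split:

```python
import numpy as np

# Toy inputs (real ones: stellar-mass offsets and weighted vote fractions
# from the GZ: CANDELS catalogue).
dmstar   = np.array([-0.8, -0.3, 0.1, 0.4, 0.9, -0.1])
p_smooth = np.array([0.9, 0.7, 0.4, 0.2, 0.1, 0.6])

# Assign each galaxy to a dM* bin, then average p_smooth within each bin
# (the featured fraction follows as ~1 - smooth in this two-way split).
bins = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
idx = np.digitize(dmstar, bins) - 1
f_smooth = np.array([p_smooth[idx == i].mean() if np.any(idx == i) else np.nan
                     for i in range(len(bins) - 1)])
```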
Use Willett et al. (2013) and Lackner & Gunn (2012) to add the z=0 comparison to the figure.
I don't suppose @willettk might want to take this on?
(I have the Lackner & Gunn tables if they're not easily accessible online.)
As in Willett et al. (2013), we should show a couple of example subjects in a partial table. In addition to #30.
For ease of use of the classifications, some CANDELS team members have pointed out it would be helpful to add some discussion to S4 on how to translate between classification systems, and when this is and is not a good idea.
This might be a good opportunity to go into further detail on e.g. how to use both together to select merging systems, and how the differences in clumpy classifications might be used to do interesting science.
Relate the smooth disks to kinematic downsizing via S14?
Also there's a SINS ref I should include too.
(NT)
(now that we have new weighted classifications for task T00), re-check numbers, particularly in:
When compiling, I get this error:
LaTeX Warning: File `consistencies_iterations2.eps' not found on input line 516
now that we have a new set of weighted classifications for task T00, double-check (or just re-make) Figures:
We quote a lot of p-values in Section 4 for the CANDELS team-GZ comparisons. I think an upper limit (p < 2e-16) is more appropriate than putting p~0, since posterity won't necessarily know what our machine precision was.
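A tiny helper makes the convention mechanical. The 2e-16 floor sits just below double precision's machine epsilon (~2.2e-16), which is why values reported below it aren't meaningful:

```python
# float64 cannot resolve p-values much below machine epsilon (~2.2e-16),
# so a computed p ~ 0 really means "below what we can resolve".  Quoting
# an explicit floor keeps the statement meaningful for posterity:
P_FLOOR = 2e-16

def report_p(p):
    """Format a p-value for the text, capping at the precision floor."""
    return f"p < {P_FLOOR:.0e}" if p < P_FLOOR else f"p = {p:.2g}"
```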
I think we should label the rows in Fig 4 - perhaps in the white space in each histogram - with shorthand of the question being answered.
which... is a lot. Fun!
Suggestions for investigations e.g. mergers, clumpy galaxies, further investigation of smooth disks?
In Section 3.8 we say that we do not have wide-field depth classifications for 8130 subjects with deep exposures. We could collect them.... Should we?
Make space in the introduction to note the work GZ has done with Hubble data already. Melvin+, Simmons+, Cheung+.
Mostly this is covered by other items, but just double-check to make sure there's none left before submitting.
Not sure how that happened, but I've double-checked and it's fine in the data itself... just the table & figure are wrong.
(In addition to the full-size plot - zoom in somewhere to show convergence.)
Side note: I've always been sort of uncomfortable with the way the convergence here is quite sudden - the consistencies change a lot between the second and third iteration and then there is mostly no change between 3-4 and 4-5. I've double-checked it all and if something went wrong, I can't find it... so I think it looks real.
Just noting this, though, in case someone can a) find the trouble, or b) reassure me...
Dan McIntosh pointed out that there are now some parameters that add value to the K15 visual classification raw fractions, e.g. p_merger that combines the various merger votes into one value that goes from 0 to 1, a p_diskiness and also an artifact metric. These would be really interesting comparisons and could resolve some of the issues we had with combining parameters. It's worth exploring adding them as an additional plot. (Dan has sent me the info as I think some of these are currently unpublished).