Comments (23)
For a general multiple linear model with interactions, e.g. y ~ a1 + a2 + a1 * a2
do you want all variables transformed, or should the user be able to select which variables to transform?
For example, do we only allow:
y ~ a1^2 + a2^2 + a1^2 * a2^2
or do we also allow this:
y ~ a1^2 + a2 + a1^2 * a2
?
from agroft.
If I remember correctly, if you include quadratic effects in the model, you also need to include the corresponding non-quadratic (linear) effects as terms:
y ~ x1 + I(x1^2) + x2 + I(x2^2) + ...
Also, it is best practice to wrap the transformations in the I() function, so that ^ is evaluated as arithmetic rather than as a formula operator.
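To make the point about I() concrete, here is a minimal sketch on simulated data. The data frame d and its columns are made up for illustration and are not from the project's examples.

```r
# Simulated data (hypothetical): y depends on x1, x1 squared and x2
set.seed(1)
d <- data.frame(x1 = runif(50, 0, 10), x2 = runif(50, 0, 10))
d$y <- 2 + 1.5 * d$x1 - 0.3 * d$x1^2 + 0.8 * d$x2 + rnorm(50)

# Without I(): inside a formula, ^ is a crossing operator, so x1^2 collapses to x1
fit_plain <- lm(y ~ x1^2 + x2, data = d)
names(coef(fit_plain))

# With I(): the square is evaluated arithmetically and enters as its own term
fit_quad <- lm(y ~ x1 + I(x1^2) + x2, data = d)
names(coef(fit_quad))
```

The first model silently fits only the linear term for x1, which is why the quadratic term has to be written as I(x1^2).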
from agroft.
Thanks, Kyle, but my question was more about letting users pick individual variables to transform versus simply transforming all of the variables. I don't know what is common or necessary for agricultural data.
from agroft.
Yeah, I know. I just added my two cents because a statistician once gave me
a stern talking-to about that.
On Fri, Jul 17, 2015 at 8:04 PM, Jason K. Moore wrote:
Thanks, Kyle, but my question was more about letting users pick
individual variables to transform versus simply transforming all of the
variables. I don't know what is common or necessary for agricultural data.
from agroft.
I'll definitely use your two cents :)
I'm neither a statistician nor an expert at R, so any tips help.
from agroft.
@msimmond Just a reminder here, that I need some sort of example of what you want to do with transformations.
from agroft.
@msimmond This is still cloudy to me. Now that we are going to the 9 scenarios, where do transformations fit in? Do we need more scenarios or do you want this to work in a more general way, i.e. can be applied to any of the 9 scenarios?
from agroft.
I want the transformation options to be applicable to any of the 9 scenarios. The user needs to be involved in deciding which transformation (if any) is applied to the data. That decision is based on inspecting:
(1) the Shapiro-Wilk test for normality of residuals,
(2) Levene's test for homogeneity of variance (HOV),
(3) the 1-df Tukey test for non-additivity (although (3) is NOT run for CRD),
(4) the 'res x pred' plot, and
(5) the table of LS means, SDs, and variances.
NOTES:
(2) HOV is the most important assumption (confirmed by non-significant Levene's test). Thus, no conclusions should be drawn from the ANOVA if this assumption is violated.
If Levene’s test is significant and transformation of the dependent variable does not correct it, a variance-weighted ANOVA (e.g. Welch's) can be performed to test for differences among group means.
Here are some references: http://www.unh.edu/halelab/BIOL933/labs/lab6.pdf, http://www.unh.edu/halelab/BIOL933/lectures/lect_11_reading.pdf
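As a rough illustration of the five checks listed above, here is a sketch in base R on simulated RCBD data. The data, column names, and object names are all hypothetical. Levene's test is computed by hand as a one-way ANOVA on absolute deviations from the treatment medians (the same centering that car::leveneTest uses by default), and the Tukey 1-df test is done by adding the squared predicted values to the additive model.

```r
# Hypothetical RCBD data: 4 treatments x 5 blocks
set.seed(42)
d <- expand.grid(trt = factor(paste0("T", 1:4)), block = factor(1:5))
d$yield <- 10 + 2 * as.numeric(d$trt) + as.numeric(d$block) + rnorm(nrow(d))

fit <- lm(yield ~ trt + block, data = d)
d$resids <- residuals(fit)
d$preds <- fitted(fit)

# (1) Shapiro-Wilk test for normality of residuals
sw <- shapiro.test(d$resids)

# (2) Levene's test: one-way ANOVA on absolute deviations from treatment medians
d$abs_dev <- abs(d$yield - ave(d$yield, d$trt, FUN = median))
lev <- anova(lm(abs_dev ~ trt, data = d))

# (3) Tukey 1-df test for non-additivity: test the squared predicted values
d$preds_sq <- d$preds^2
tukey <- anova(lm(yield ~ trt + block + preds_sq, data = d))

# (4) Residuals vs. predicted values
plot(resids ~ preds, data = d)

# (5) Treatment means, SDs and variances
#     (plain means equal the LS means here only because the design is balanced)
desc <- data.frame(
  mean = tapply(d$yield, d$trt, mean),
  sd   = tapply(d$yield, d$trt, sd),
  var  = tapply(d$yield, d$trt, var)
)

sw; lev; tukey; desc
```

In the app the same objects would be recomputed on the transformed response, so the user can compare the output before and after a transformation.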
from agroft.
Ok, if we want to apply transformations to any of the 9 scenarios we'll need a few examples (at least) for me to get an idea of how to generalize it in the app. Maybe once you build the model formula, you can select the transformation. Will this transformation only ever happen on the dependent variable?
from agroft.
After the model formula is selected and lm(), anova(), shapiro.test(), leveneTest(), the Tukey 1-df test (on the squared predicted values), and plot(resids ~ preds) have been run, the user should be asked whether a transformation of the dependent variable (the only case) is needed. If so, the user selects a transformation and the same code is run again on the transformed variable, so they can check the output for improvement. There should be an option to try a different transformation, as well as to use Welch's variance-weighted ANOVA in case the Levene's test result cannot be corrected with transformations. I think the post-hoc LSD analyses should be on another tab to make it obvious that this is the next step once the model and data are all good.
from agroft.
The transformation selection could come before the calls to lm/aov, shapiro.test, leveneTest, Tukey, etc. If it does, we don't have to make the code run through a second time automatically; the user does it manually. For example, the user would first use the default transformation (None), see the output of all of the above functions, and notice that it doesn't look good. They'd then go back to the transformation input box, change the transformation, and press "run analysis" again to see the new results of all the function calls. Done this way, it is on the user to rerun the model with the transformation, instead of having code that reruns it automatically when the user selects a transformation at the end of the analysis.
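A minimal sketch of the "pick a transformation, then press run analysis" flow described above. The function run_analysis(), its arguments, and the simulated data are hypothetical and are not the app's actual code. The transformed response is added as a new column, and the original column is left untouched.

```r
# Hypothetical helper: add a transformed copy of the response, then fit the model to it
run_analysis <- function(data, response, rhs, transformation = "none", power = NULL) {
  if (transformation == "power" && is.null(power)) {
    stop("a power transformation needs an exponent")
  }
  y <- data[[response]]
  new_col <- paste0(response, ".", transformation)
  data[[new_col]] <- switch(transformation,
    none  = y,
    log10 = log10(y),
    sqrt  = sqrt(y),
    power = y^power,
    stop("unknown transformation: ", transformation)
  )
  fit <- lm(reformulate(rhs, response = new_col), data = data)
  list(data = data, fit = fit, anova = anova(fit),
       shapiro = shapiro.test(residuals(fit)))
}

# Simulated (hypothetical) data with a multiplicative error structure
set.seed(7)
d <- expand.grid(trt = factor(paste0("T", 1:4)), block = factor(1:5))
d$yield <- exp(1 + 0.4 * as.numeric(d$trt) + rnorm(nrow(d), sd = 0.3))

first  <- run_analysis(d, "yield", c("trt", "block"))           # default: None
second <- run_analysis(d, "yield", c("trt", "block"), "log10")  # user reruns with log10
```

Each call is one press of "run analysis", so trying a different transformation is just another call with a different transformation argument.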
from agroft.
Welch's variance-weighted ANOVA
This is something new, and outside of the scope of what we've already agreed on doing. Maybe put this as "icing on the cake".
I think the post-hoc LSD analyses should be on another tab to make it obvious that this is the next step once the model and data are all good.
Sounds good.
from agroft.
OK, icing, but this is a common thing you have to do with ag data. I can look into the code for it.
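For reference, base R already ships a Welch-type one-way test: oneway.test() in the stats package with var.equal = FALSE. The sketch below uses simulated data with unequal group variances. The data are hypothetical, and this only covers a single treatment factor (no blocks), so it may not map onto all 9 scenarios.

```r
# Hypothetical CRD-style data: three treatments with very different variances
set.seed(3)
d <- data.frame(
  trt   = factor(rep(c("A", "B", "C"), each = 8)),
  yield = c(rnorm(8, 10, 0.5), rnorm(8, 12, 2), rnorm(8, 15, 5))
)

# Welch's variance-weighted one-way ANOVA (does not assume equal variances)
welch <- oneway.test(yield ~ trt, data = d, var.equal = FALSE)
welch

# Classical one-way ANOVA for comparison (assumes equal variances)
classic <- oneway.test(yield ~ trt, data = d, var.equal = TRUE)
classic
```

The Welch version adjusts the denominator degrees of freedom downward when the group variances differ, which is what makes it usable when Levene's test stays significant.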
from agroft.
Also, we'll need an option to back-transform (de-transform) the data in case the user wants this in the post-hoc plots (e.g., a bar graph showing de-transformed treatment means).
from agroft.
So if you transform the data to create the model you may need to detransform it to do the post hoc tests?
Show me examples of the Welch's ANOVA and of some transformations, and I can then see how complex it will be to fit it all into the app.
And just to be a curious devil's advocate here...why not just teach the workshop attendees how to do this stuff in R instead of wrapping this app around limited functionality?
from agroft.
For reference, this is Maegen's sample code for finding an exponent for a power transformation (this works with the RCBD two var example):
# Combine the two treatment factors into a single merged treatment factor
my.data$merged_treatment <- as.factor(paste0(my.data$clone, my.data$nitrogen))
str(my.data)
# Mean and variance of yield for each merged treatment
means <- aggregate(my.data$yield, list(my.data$merged_treatment), mean)
vars <- aggregate(my.data$yield, list(my.data$merged_treatment), var)
logmeans <- log10(means$x)
logvars <- log10(vars$x)
# Regress log10(variance) on log10(mean)
power.mod <- lm(logvars ~ logmeans)
power.summary <- summary(power.mod)
# Identify the slope
power.summary$coefficients[2, 1]
# Calculate the appropriate power of the transformation, where power = 1 - (slope / 2)
power <- 1 - power.summary$coefficients[2, 1] / 2
power
# Create the power-transformed variable
my.data$yield <- my.data$yield^power
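The rule Power = 1 - (slope/2) in the code above can be motivated by a first-order (delta method) argument. This is a sketch under the assumption that the treatment variance is proportional to a power b of the treatment mean, where b is the slope of the log-log regression. The symbols c, b, and p are introduced here for illustration.

```latex
\begin{aligned}
\log_{10}\sigma^{2} &= \log_{10} c + b\,\log_{10}\mu
  \quad\Longrightarrow\quad \sigma^{2} = c\,\mu^{b},\\
\operatorname{Var}\!\left(Y^{p}\right) &\approx \left(p\,\mu^{p-1}\right)^{2}\sigma^{2}
  = c\,p^{2}\,\mu^{\,2p-2+b},\\
2p - 2 + b &= 0 \quad\Longrightarrow\quad p = 1 - \frac{b}{2}.
\end{aligned}
```

With this choice of p, the approximate variance of the transformed response no longer depends on the mean. When the slope is close to 2 the exponent is close to 0, which is the case where a log transformation is normally used instead.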
from agroft.
Transformations are now supported in the app. Closing.
from agroft.
Has the transformation back to the original units been added yet?
For a log10-transformed variable X ==> 10^X
For a power-transformed variable X with exponent 'a' ==> X^(1/a)
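A quick check of the two back-transformations in base R, on made-up numbers (the vector yield and the exponent a are hypothetical). Note that the inverse of raising to the power a is raising to the power 1/a.

```r
# Hypothetical positive response and power-transformation exponent
set.seed(11)
yield <- runif(20, 5, 50)
a <- 0.4

# Forward transformations, stored separately from the original values
yield.log10 <- log10(yield)
yield.pow <- yield^a

# Back-transformations to the original units
back_log10 <- 10^yield.log10
back_pow <- yield.pow^(1 / a)

all.equal(back_log10, yield)
all.equal(back_pow, yield)

# Back-transforming a mean computed on the log10 scale gives the geometric mean,
# which is never larger than the arithmetic mean of the original values
mean_log_back <- 10^mean(yield.log10)
c(back_transformed = mean_log_back, arithmetic = mean(yield))
```

So back-transformed treatment means shown in a table or bar plot will generally differ from the raw treatment means, even though individual values round-trip exactly.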
On Thu, Aug 13, 2015 at 6:14 PM, Jason K. Moore wrote:
Maegen Simmonds, Ph.D.
from agroft.
No, that is listed in a different issue #41 and I don't really understand what you want here. There is nothing to change back. There are columns in the data frame for the original variable and the transformed variables. I don't change the original column, I just add new columns.
from agroft.
We just need the LS means for the treatments (from the LSD table) to be
transformed back to the original units. These values should be made visible
to the users and should also be the means used in the bar plot. Does
that make sense?
On Thu, Aug 13, 2015 at 9:51 PM, Jason K. Moore wrote:
No, that is listed in a different issue #41 and I don't really
understand what you want here. There is nothing to change back. There are
columns in the data frame for the original variable and the transformed
variables. I don't change the original column, I just add new columns.
Maegen Simmonds, Ph.D.
from agroft.
Adding what you want to the examples we've been working on would make the clearest sense. If you can write the example code, I'll know exactly what to do.
from agroft.
Also, try out the app and let me know if it is doing transformations correctly. In both of the examples you added transformations to, I don't think they were correct, because you stacked the transformations on the same variable in this fashion: sqrt(log10(y))^power.
from agroft.
I started adding it to #34, but need your help.
from agroft.