The mgcv-esa-workshop from eric-pedersen

Error when compiling exponential distributions slides

@eric-pedersen, it seems there is an error when I get to line 315 of the "beyond the exponential family" slides.

Error: invalid subscript type 'double'

Any ideas? My pipe skills are not as good as yours it would seem!

Need feedback on extended example

Hi folks, @dill @gavinsimpson

I just pushed a commit with a new extended example for the second part of the day, focusing on non-linear time series data. The compiled script is "example-nonlinear-timeseries.html". It's partially based on the nonlinear trend analysis from Gavin's blog, and partially on the nonlinear analysis of the lynx data set, derived from Cosma Shalizi's blog. I still have to add a reference section to acknowledge both of those; that's on my todo list.

I was looking for feedback on it. Does this seem like a reasonable format for the extended examples? Does it need more explanatory text? Do the exercises make sense?
Also, putting this together took a fair bit of time, so we may need to trim the number of extended examples a bit.

Template for slides

Can one of you, perhaps @dill as you have used the HTML slide classes in RStudio more than me, push a file with a basic YAML header with everything needed for using the HTML slides class we want to use for the materials?

If we're not using the same slide classes then we can ignore this and I'll just use my metropolis-themed beamer slides template.

Example

@eric-pedersen here are some things that we talked about... on the examples:

Could we use the BBS data for the spatial and time series data, so that we did:

time series at one site
spatial snapshot at one time

then we don't have to introduce a bunch of different data sets? (Need to check with Dave Harris about modelling this -- ignore detectability.)

Do you have strong feelings about this?

Things to talk about, 3 August

Just wanted to put together an approximate agenda for this afternoon...

Please add any additional stuff you think we need to think about!!

This repo needs a LICENSE

I think GPL >=2 for code and CC-BY for the slides text? Maybe everyone else disagrees?

what do we actually have to do?

I guess it's important to also know what ESA want from us... From their call for proposals, they would like to know:

Title of the session
Description of the session (appears online only; 250 words max.)
Summary sentence (appears in print only; 50 word max.)
Name and contact information (affiliation, email) for the lead organizer and any co-organizers
Minimum and maximum number of participants (to assist in room assignment).
Requested scheduling
Any additional A/V equipment (standard A/V setup is a screen, LCD projector, and laptop)
Room set-up desired: theater, conference, hollow square, rounds, other
Food and beverage requests
Underwriting of workshop costs by a group or agency
Is the session intended to be linked to a scientific session?
Is the session intended to be linked to a business meeting or mixer?
Describe any known (workshop/event) scheduling conflicts (what should it follow/precede/not conflict with?)

Should we do a quick mailout

To, say, ECOLOG to advertise the course. Just saw one pop up and realised we've not done much advertising so far...

Happy to write some copy for it, suggestions welcome.

Ask for Simon's blessing

It would be polite to ask Simon Wood if he minds this happening and see if he would like to be involved.

Equipment

If there isn't wifi, should we plan for this problem and bring some USB sticks/drives with CRAN mirrors on or somesuch? Maybe packrat can help? I also have a spare wireless router I can bring.

Time schedule

According to https://github.com/eric-pedersen/mgcv-esa-workshop/blob/master/course_outline.md we think everything will happen in the morning... Obviously this is wrong...

Currently we have 2x 4 hour blocks: 8am-12pm and 1pm-5pm set up. I think @gavinsimpson and I implicitly thought of the following structure (tell me if I'm wrong Gavin!):

Morning

Intro (what is a GAM etc)
Model checking
Model selection
Beyond the exponential family

Afternoon

Extended examples/demos (all)
Smoother zoo

Does this make sense?

If so, we had previously allotted "1 session" to the intro and "1/2 session" to each of the other three sections in the morning. If we then split the 4 hours into 2.5 "sessions", we could get 1.5 hours for the intro, then 30-45 mins for each of the other three, and we'd have at least a 15 min coffee break. That being said, I think that leaves things a little tight, especially if we want to have some time for practical exercises. I'm going to try to put my intro slides together this weekend/early next week, so I'll have a better idea of how long that first part will be soon, then we can re-jig a bit from there.
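As a quick check of the arithmetic above, here is a minimal sketch in R. The numbers come from the paragraph; the variable names are just for illustration.

```r
# Morning timing sketch, all values in minutes
block  <- 4 * 60                        # one 4-hour block (8am-12pm)
intro  <- 90                            # 1.5 hours for the intro
others <- 3 * c(short = 30, long = 45)  # three further sections at 30 or 45 min each
slack  <- block - (intro + others)      # what is left for coffee and exercises
slack                                   # short = 60, long = 15
```

So with 45-minute sections only the 15-minute coffee break is left, which is where the "a little tight" worry comes from.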

Afternoon I think is a bit simpler, as I think the smoother zoo stuff is probably an hour tops, the rest takes as long as it takes but we need to ensure there is plenty of coffee!

Feedback on proposal

I've committed my changes. Sorry if it seems like I changed a lot -- I liked the structure, I just tried to tighten things up a little, given our limited word count. I hope this doesn't cause any offense! We now come in at 217 words, so you can add more details if you think I cut too much! I'll add other thoughts to #2 now...

website - add instructor info

At the moment I'm all alone on the instructor page.

Please add yourselves :)

Software requirements

Deadline lunch EDT 4 August

Up to date R
RStudio beneficial
latest mgcv
ggplot2
dplyr
tidyr
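One way participants could check their setup against this list is sketched below. The helper `missing_packages()` is hypothetical (not part of any package), and the install line is left commented out because it assumes a reachable CRAN mirror.

```r
# Hypothetical helper: which of the named packages cannot be loaded?
missing_packages <- function(pkgs) {
  ok <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!ok]
}

# Packages named in the requirements list
needed <- missing_packages(c("mgcv", "ggplot2", "dplyr", "tidyr"))

if (length(needed) > 0) {
  message("Still to install: ", paste(needed, collapse = ", "))
  # install.packages(needed)  # uncomment when a CRAN mirror is reachable
} else {
  message("All workshop packages are available.")
}

# R version, to compare against "up to date R"
R.version.string
```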

Workshop plan; topics, ordering, etc.

A rough outline and ordering of topics. This is just my basic off-the-top-of-my-head idea so I won't be offended if you don't agree.

I was thinking that the theory, with some small examples to illustrate and practice, would take the morning (given those are often shorter slots), say up to and including Model Checking. The rest is for the PM part with more practical/hands-on stuff.

Introduction
- what are (generalised) additive models?
- what are splines?
- what is penalised regression?
- Is this where we discuss basis-penalty setup?
Basic model fitting
- s() and its arguments
- (What the heck is k if you don't specify it?)
- Intro to some of the main functions/mgcv methods people will use later (gam(), anova(), summary(), plot(), ...)
Smooth toolbox
- Introduce the basic types and variations of smoothers
  - Thinplate splines
  - Cubic splines (& cyclic versions)
  - P-splines
- 2-d isotropic smoothing via s()
- tensor product smooths
- smooth-factor interaction (by terms)
- random effects splines
Model checking
- gam.check()
- concurvity
- randomised quantile residuals
- using gams for testing nonlinearity in other models
Model selection
- shrinkage via shrinkage smooths
- shrinkage via select = TRUE
- AIC corrected for estimated smoothness params
- approximate p values here?
Extended GLM models
- Beta regression (betar)
- Tweedie (esp for continuous data with non-constant variance and zeroes)
- Negative-binomial & ZIP
- cox.ph? I think a fair number of ecologists are dealing with this kind of data
Extended examples/demos
- Spatial modelling
- Time series modelling
- Spatio-temporal models
- ~~Location-scale models~~
Other stuff
- type = "lpmatrix"
- paraPen
- Markov random fields
- Functional data
- ~~Mixed effect models via gamm4~~
- ???
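
To make a few of the items above concrete, here is a minimal sketch of how they look in mgcv. It is illustrative only: it uses simulated data from `gamSim()` rather than any workshop data set, and the object names (`dat`, `m`, `Xp`) are made up for the example.

```r
library(mgcv)

set.seed(1)
# Simulated data from mgcv's built-in simulator (columns y, x0, ..., x3)
dat <- gamSim(eg = 1, n = 200, verbose = FALSE)

# Basic model fitting with s().
# If k is not given, s() uses k = -1 and lets the basis pick a default;
# for a 1-d thin plate spline that should be a basis dimension of 10.
# select = TRUE adds an extra penalty so whole terms can be shrunk out.
m <- gam(y ~ s(x0) + s(x1) + s(x2, k = 20) + s(x3, bs = "cr"),
         data = dat, method = "REML", select = TRUE)

summary(m)      # effective degrees of freedom and approximate p-values
gam.check(m)    # residual diagnostics and basis dimension checks
concurvity(m)   # concurvity measures for the smooth terms
AIC(m)

# "Other stuff": the linear predictor matrix for new data
Xp <- predict(m, newdata = dat[1:5, ], type = "lpmatrix")
dim(Xp)
```

The extended families in the list (`betar()`, `tw()`, `nb()`, `ziP()`, `cox.ph()`) would slot in through the `family` argument of `gam()`.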

Please comment. Also, if you want to suggest things for removal, edit the list directly if you can and use strikethrough (e.g. ~~text~~) so we can discuss if someone has a strong desire to include a topic. If you can't edit, @eric-pedersen, can you give David and me admin/commit rights to this repo only?
