Comments (3)
Here are some reflections on this:
Put milestone data files from GGIR part 1 inside the package, e.g. from a couple of brands and study designs.
Agree with that. But I would first test that reading data from different brands/formats results in the same milestone data in g.part1. So, we would test the reading and the generation of the milestone data in part 1 separately, in a test that includes all possible brands/formats (4-hour recordings may be enough for that). Once we have tested that all brands can be read and the output is the same, we can proceed with a single part 1 milestone dataset, for which I would use metadata from a GENEActiv file so that the output also includes lux variables, which facilitates later testing. The challenge is that we would then need to store such raw data somehow, which might be heavy for an R package.
In this separate test for part 1, we should make sure that we include:
- Different data formats
- Imputation of gaps
- Appending of recordings
- Part 1 output data is similar across brands/formats (e.g., metalong column names differ between Axivity (EN for Euclidean norm) and ActiGraph (en for Euclidean norm)).
- Short recordings and corrupted files are identified
- The calibration process works well (for this we might need a longer test file)
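A sketch of how such a separate part 1 test could be structured with testthat; the file names, the ~4-hour recordings, and the exact milestone file layout are assumptions, while `g.part1()` with `datadir`/`outputdir`/`f0`/`f1` is GGIR's actual part 1 entry point:

```r
library(testthat)
library(GGIR)

test_that("part 1 milestone data is equivalent across brands/formats", {
  # Hypothetical ~4-hour example recordings, one per brand/format
  files <- c(axivity = "ax3_4h.cwa",
             geneactiv = "ga_4h.bin",
             actigraph = "ag_4h.csv")
  meta <- list()
  for (brand in names(files)) {
    out <- file.path(tempdir(), brand)
    dir.create(out, showWarnings = FALSE)
    g.part1(datadir = files[[brand]], outputdir = out, f0 = 1, f1 = 1)
    # Part 1 writes a meta_*.RData milestone file per recording, which
    # contains (among others) the object M with metashort and metalong
    load(dir(out, pattern = "^meta_.*[.]RData$",
             recursive = TRUE, full.names = TRUE)[1])
    meta[[brand]] <- M
  }
  # Column names should match once case differences (EN vs en) are normalised
  expect_equal(tolower(colnames(meta$axivity$metalong)),
               tolower(colnames(meta$actigraph$metalong)))
})
```

The same loop could be extended with expectations for gap imputation, appended recordings, and the detection of short or corrupted files.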
Write separate unit tests to process those milestone data in the way we would for a project.
After thinking about this, I'm not sure this would improve the test coverage. In a project we usually select a specific GGIR configuration, but the GGIR pipeline is so flexible that using the scripts from different projects would mean reprocessing the part 1 milestone data over and over. A potential solution is to use the default GGIR configuration, looped over the 5 strategies in part 2, to test the interaction between the 5 parts of the package, and then test the extra functionalities separately.
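As a sketch of that suggestion (the data paths are hypothetical; `GGIR()` with `mode` and `strategy` are the real arguments of GGIR's main wrapper function):

```r
library(GGIR)

outdir <- tempdir()  # assumed to already hold the part 1 milestone data

# Default configuration, looped over the five part 2 strategies;
# parts 2-5 are re-run on the same part 1 milestone data each time
for (strat in 1:5) {
  GGIR(mode = 2:5, datadir = "tests/testdata", outputdir = outdir,
       strategy = strat)
  # ... inspect/compare the part 2-5 output for this strategy
  #     before the next run overwrites it
}
```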
from ggir.
I left out part 1 from my proposal because testing part 1 as a whole on real-life data does not seem feasible: we would have to add large files to the package or download them, and even if we had those large files it would take time to process them.
So, I think the part 1 functionalities are best tested with specific unit tests that cover each functionality separately. This is what we have been doing and will keep doing.
Instead, I would now like to focus on creating a higher-level unit test (integration test) that runs the other parts (2-5) with more realistic study data than our current synthetic data or tiny example files.
Even 10 MB of GGIR milestone data is too large for a package. So, a possible solution could be to include a numeric data.frame with 100 days' worth of real ENMO, MAD, nonwear, anglez, temperature, and LUX values, rounded to 1 decimal place. This would be based on real data from a variety of studies appended to each other. Next, as part of the test, we could write a function to convert this data.frame into semi-synthetic test milestone data files, for example split up as multiple recordings.
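A minimal, self-contained sketch of the idea; random numbers stand in here for the real appended study data, and the 60-second epochs, 7-day chunks, and column names are assumptions:

```r
# 100 days of 60-second epoch data, rounded to 1 decimal place
# (random placeholders here; the real version would hold appended study data)
n <- 100 * 24 * 60
set.seed(123)
refdata <- data.frame(
  ENMO        = round(abs(rnorm(n, mean = 0.03, sd = 0.05)), 1),
  MAD         = round(abs(rnorm(n, mean = 0.03, sd = 0.05)), 1),
  anglez      = round(runif(n, min = -90, max = 90), 1),
  nonwear     = sample(0:3, n, replace = TRUE),
  temperature = round(rnorm(n, mean = 26, sd = 3), 1),
  LUX         = round(abs(rnorm(n, mean = 100, sd = 200)), 1)
)

# Split the single data.frame into multiple semi-synthetic "recordings"
split_recordings <- function(df, days_per_recording = 7, epoch_min = 1) {
  rows <- days_per_recording * 24 * 60 / epoch_min
  split(df, ceiling(seq_len(nrow(df)) / rows))
}

recs <- split_recordings(refdata)
length(recs)  # 15 recordings of up to 7 days each
```

The test could then write each chunk out in the milestone data format before running parts 2-5 over it.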
Advantages I see:
- Hopefully this offers a quick route to improving the test coverage for parts 2-5 without having to dive into every single function to search for possible opportunities to improve coverage. Ideally, running a realistic study scenario should already help to improve coverage.
- We can use this to shorten test_chainof5parts.R afterwards, as some of it will no longer be needed, which will make test_chainof5parts.R tidier.
- These new unit tests could even act as showcases for new software contributors of how GGIR can be used.
from ggir.
After thinking about this, I'm not sure this would improve the test coverage.
If we create a unit test for, let's say, the Whitehall study, then it will at the very least provide an integrated test of all the functionalities they need, including LUX analysis, their specific sleeplog size, their specific ID format, and their specific way of dealing with missing files.
Here, I do not want to loop over all possible functionalities GGIR offers, as then we would just be doing the same as test_chainof5parts.R. I only want to focus for now on testing real-life study scenarios, which could boost the test coverage but, equally important, would help monitor that the specific approaches to the data remain reproducible.
from ggir.