amirmasoudabdol / sam

SAM is a modular, flexible, and extensible simulation framework for systematically studying researchers' questionable research practices as well as journals' questionable review practices.

Home Page: https://sam.amirmasoudabdol.name/

License: Apache License 2.0

Languages: CMake 0.23%, C++ 99.67%, C 0.10%
Topics: simulation-framework, monte-carlo-simulation, meta-analysis, questionable-research-practices, agent-based-simulation, simulation

sam's Introduction

SAM: Science Abstract Model Simulation Framework

SAM is a modular, flexible, and extensible simulation framework for systematically studying researchers' questionable research practices as well as journals' questionable review practices. SAM is accompanied by Frodo, a small project written in GNU Make that is designed to facilitate testing and managing SAM projects.

You can find out more about SAM on our website.

You can find our pre-print on PsyArXiv.

sam's People

Contributors: amirmasoudabdol

sam's Issues

Add true_nobs to the ExperimentSetup

Basically, I would like to treat nobs similarly to true_means. This would allow me to set up an experiment with a different number of observations per group.

  • I need to make sure that I'm using measurements.size() whenever I'm running any statistics.
    • It might be a good idea to keep a vector of nobs and make sure that it's updated.
  • One issue I faced in my first attempt at implementing this is that mvnorm returns a vector, and I usually draw nobs random numbers and assign them to their corresponding groups. If nobs differs between groups, this method will produce some thrown-away numbers; see the sketch after this list.
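A minimal sketch of the trimming approach, assuming Armadillo; the function and variable names here are illustrative, not SAM's actual API:

```cpp
#include <algorithm>
#include <armadillo>
#include <vector>

// Draw per-group observations when each group has its own nobs.
std::vector<arma::rowvec> sampleGroups(const arma::vec &means,
                                       const arma::mat &sigma,
                                       const std::vector<arma::uword> &nobs) {
  // Draw enough samples for the largest group in one shot; mvnrnd returns
  // one column per draw, one row per group.
  arma::uword n_max = *std::max_element(nobs.begin(), nobs.end());
  arma::mat draws = arma::mvnrnd(means, sigma, n_max);

  // Keep only the first nobs[g] values per group; the rest are exactly the
  // thrown-away numbers mentioned above.
  std::vector<arma::rowvec> groups;
  for (arma::uword g = 0; g < means.n_elem; ++g) {
    arma::rowvec row = draws.row(g);
    groups.push_back(row.head(nobs[g]));
  }
  return groups;
}
```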

Save the Config file alongside the simulation

I could even name the output file after its config file. Basically, after computing everything, I should generate a random name, save the JSON config under it, and then save the output file with the same prefix.
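A rough sketch of that naming scheme, assuming nlohmann::json for the config; the prefix generation and file layout are illustrative:

```cpp
#include <cstddef>
#include <fstream>
#include <random>
#include <string>
#include <nlohmann/json.hpp>

// Generate a short random prefix shared by the config and the output file.
std::string randomPrefix(std::size_t len = 8) {
  static const std::string chars = "abcdefghijklmnopqrstuvwxyz0123456789";
  std::mt19937 gen{std::random_device{}()};
  std::uniform_int_distribution<std::size_t> pick(0, chars.size() - 1);
  std::string s;
  for (std::size_t i = 0; i < len; ++i) s += chars[pick(gen)];
  return s;
}

void saveRun(const nlohmann::json &config, const std::string &csvOutput) {
  const std::string prefix = randomPrefix();
  std::ofstream(prefix + "_config.json") << config.dump(4);  // the config...
  std::ofstream(prefix + "_sim.csv") << csvOutput;           // ...and its output
}
```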

Refactor the LatentModel

The problem is that I'm using different names and indices for the means, sds, etc. when it comes to the Latent Model, while I should use the same variable names (true_means, ...). They will correspond to factors but carry the same names. The only extra thing I need is the covariance matrix of the error terms.

See also #36.

Improve the SelectionStrategy

The default selection strategy is somewhat hacky when it comes to considering the side of the effect, and I need a better representation. Currently, it can accept either Positive or Negative results, but I also want to allow a Natural selection mode, where it doesn't care about the side at all.

  • Perhaps I need an enum for the side; it would make things more readable (sketched below).
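Something along these lines; the enumerator and function names are tentative:

```cpp
enum class Side { Positive, Negative, Natural };

// Natural means the strategy ignores the direction of the effect.
bool passesSideCheck(Side side, double effect) {
  switch (side) {
    case Side::Positive: return effect > 0;
    case Side::Negative: return effect < 0;
    case Side::Natural:  return true;
  }
  return false;  // unreachable; silences compiler warnings
}
```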

Fix the Outcome Switching

Since I've changed the Hacking Mechanism, Outcome Switching is not working properly; technically, it does nothing. This is also a side effect of separating the decision process from the hacking routine. One solution would be to replace the decisionStrategy with outcomeSwitching, since outcome switching is technically a decision strategy.
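A rough sketch of that idea: outcome switching modeled as just another decision strategy. The interface here is hypothetical, not SAM's actual class layout:

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

class DecisionStrategy {
public:
  virtual ~DecisionStrategy() = default;
  // Pick the index of the outcome the Researcher will report.
  virtual int selectOutcome(const std::vector<double> &pvalues) = 0;
};

class OutcomeSwitching : public DecisionStrategy {
public:
  int selectOutcome(const std::vector<double> &pvalues) override {
    // One common formulation: switch to the most significant outcome.
    auto it = std::min_element(pvalues.begin(), pvalues.end());
    return static_cast<int>(std::distance(pvalues.begin(), it));
  }
};
```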

Unpredictable Outcome

I'm noticing a very strange bug where the outputs in the CSV files are always nan or undefined. It sometimes happens when I run the simulation for more than one iteration as well, but I don't know where it comes from. I have a few ideas:

  • Investigate the CSV output with one iteration
  • Investigate the construction and destruction processes in the main simulation loop

Fix the random output!

For some reason, I'm getting random garbage numbers on each run of LatentModel. I think I'm missing something around the GSL objects; there must be a memory leak somewhere! (A sketch of a possible fix follows the dump below.)

```
0, 0, 2.05456e+75, 5.52879e+77, 0,
0, 0, 1.23501e+76, 4.48031e+77, 0,
0, 0, -2.24039e+76, 4.04504e+77, 0,
0, 0, -1.45269e+76, 5.52966e+77, 0,
0, 0, -2.07961e+76, 4.01971e+77, 0,
0, 0, 5.43956e+75, 4.23671e+77, 0,
0, 0, -3.99033e+76, 5.56146e+77, 0,
0, 0, -3.47867e+76, 4.96207e+77, 0,
0, 0, 7.20501e+75, 3.80143e+77, 0,
0, 0, -1.10645e+75, 3.64027e+77, 0,
0, 0, -1.18544e+76, 6.17301e+77, 0,
0, 0, 1.46627e+76, 3.4118e+77, 0,
0, 0, 1.3695e+76, 4.96148e+77, 0,
0, 0, -3.14138e+76, 5.12844e+77, 0,
0, 0, 1.74546e+76, 3.35249e+77, 0,
0, 0, -4.6738e+74, 3.96898e+77, 0,
0, 0, -5.71553e+75, 4.80193e+77, 0,
0, 0, -3.9907e+75, 4.9927e+77, 0,
0, 0, 6.60881e+75, 3.43659e+77, 0,
0, 0, 1.84006e+74, 6.81138e+77, 0,
```
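If the culprit is a leaked or unseeded gsl_rng, one defensive option is an RAII wrapper, so the generator is always seeded and always freed. A minimal sketch:

```cpp
#include <memory>
#include <gsl/gsl_rng.h>

struct GslRngDeleter {
  void operator()(gsl_rng *r) const { gsl_rng_free(r); }
};
using GslRng = std::unique_ptr<gsl_rng, GslRngDeleter>;

GslRng makeRng(unsigned long seed) {
  GslRng rng{gsl_rng_alloc(gsl_rng_mt19937)};  // freed automatically
  gsl_rng_set(rng.get(), seed);                // always seed before first use
  return rng;
}
```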

Redesign the DecisionStrategy

The Decision Strategy is implemented in a hacky way, and the whole process of making a decision and the Researcher submitting the final decision is rather unnatural. I think I have a better idea of how to make this work. It will not be a final solution, but it'll be sufficient for now. See also #7.

Journal's Decision Making

There is an overlap between the decision making of the Journal and the Selection Strategy; the line between them is blurry, and it's not clear who does what. The selection strategy should be able to alter _alpha or _side, but it should mainly comply with the Journal's values. Currently, I'm mixing these up and need to change that.

I also noticed that pubbias is a parameter of the selection strategy and not of the journal. Again, the Selection Strategy can alter it, but it is the Journal's parameter, I think.

Smarter SD and COV input handling

If the user provides any of the covariance matrices, I should assume that they want to be specific and ignore the given --sds, for both the Fixed and the Latent Model.

Fix the Submission::nobs value

Currently, I'm using setup.nobs to set the value of submission.nobs, but that's not general, and it is in fact incorrect, since nobs might change in the case of Optional Stopping or Outlier Removal; see the sketch below.
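The fix amounts to deriving nobs from the measurements actually in hand; the container layout here is illustrative:

```cpp
#include <vector>

// Take nobs from the measurements themselves, so changes from Optional
// Stopping or Outlier Removal are reflected automatically.
int currentNobs(const std::vector<std::vector<double>> &measurements, int g) {
  return static_cast<int>(measurements[g].size());
}
```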

Redesign the way each class handles RandomNumberGenerators

I should generalize the handling of RNG engines. Passing the engine to a class is beneficial, but the downside is that I then need to take care of freeing it, etc. It would be nice if each class could generate its own engine just from the config. Maybe I can provide a second option as well: holding, or actually pointing to, an existing RNG engine. Both options are sketched below.
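A sketch of both constructors, using <random> for brevity; the class is an illustrative stand-in, not SAM's actual design:

```cpp
#include <memory>
#include <random>

class StrategyWithRng {
public:
  // Option 1: build an own engine from the config's seed.
  explicit StrategyWithRng(unsigned seed)
      : engine_(std::make_shared<std::mt19937>(seed)) {}

  // Option 2: share a caller-owned engine; shared_ptr takes care of the
  // freeing concern mentioned above.
  explicit StrategyWithRng(std::shared_ptr<std::mt19937> engine)
      : engine_(std::move(engine)) {}

private:
  std::shared_ptr<std::mt19937> engine_;
};
```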

Handle the multivariate case better

Currently, there is an if in OptionalStopping where it checks whether the study is multivariate or not. I think this can be refactored or moved to avoid performing the check every time. Basically, I already know what the setup will be, so I should set it before starting the simulation.

Create a Factory for DataStrategy

The tricky part of this in the current design is that I have to take care of the Random Number Engine. This is somewhat a good idea, but I need to be careful not to confuse the different engines with each other. I also need to make sure that two sufficiently distant random number generators are constructed in each data strategy. See also #42, and the sketch below.
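A minimal factory sketch along those lines; the class names mirror the models mentioned in these issues, but the constructor signatures and seed handling are assumptions:

```cpp
#include <memory>
#include <random>
#include <string>

struct DataStrategy {
  virtual ~DataStrategy() = default;
};

struct FixedModel : DataStrategy {
  explicit FixedModel(unsigned seed) : engine(seed) {}
  std::mt19937 engine;  // each strategy owns its engine
};

struct LatentModel : DataStrategy {
  // Two streams; offsetting the second seed is a crude way to keep them
  // apart, and a real implementation might use a proper seed sequence.
  explicit LatentModel(unsigned seed)
      : errors(seed), factors(seed + 0x9E3779B9u) {}
  std::mt19937 errors, factors;
};

std::unique_ptr<DataStrategy> makeDataStrategy(const std::string &name,
                                               unsigned seed) {
  if (name == "FixedModel")  return std::make_unique<FixedModel>(seed);
  if (name == "LatentModel") return std::make_unique<LatentModel>(seed);
  return nullptr;  // unknown strategy name
}
```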

Check for type instead of existence in the JSON

Currently, I have if statements like if (simConfig["--cov-matrix"].is_null()) that only check whether a variable exists. In the cases where I'm actually checking whether the value is a number or an array, I can use is_array() to see whether the whole matrix was provided; if not, and the value is a number, I can construct the matrix from it.
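A sketch of the type-based check, using nlohmann::json (which the is_null() call above suggests is already in use):

```cpp
#include <nlohmann/json.hpp>
using nlohmann::json;

// True if the whole covariance matrix was provided; if a single number was
// given instead, the caller can expand it into a matrix itself.
bool hasFullCovMatrix(const json &simConfig) {
  if (!simConfig.contains("--cov-matrix")) return false;  // not given at all
  const auto &cov = simConfig.at("--cov-matrix");
  return cov.is_array();  // array: full matrix; number: construct it
}
```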

Including true parameters in the Submission

It's starting to sound like a good idea to include some of the true parameters, like true_mean, ..., in the submission record. This is mainly important if I set up the experiment with a vector of means.

I think I actually have a good idea: I can update the Submission as I'm passing it around. For instance, during construction I can add the true_mean, because I have access to the Experiment there. Then, during the accepting process, I can add the pubbias and other journal-related values to it. The hacking status and history can be added during construction as well, or maybe some of it by the Researcher during the hacking.

Check the use of SD and VARS

I think I made a mistake and just used them interchangeably without checking! 🤦‍♂️ See the note after this list.

  • RandomNumberGenerator::normal()
  • ExperimentSetup()
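The mix-up matters because both <random> and GSL's gsl_ran_gaussian expect a standard deviation (sigma), not a variance. A minimal guard, with hypothetical naming:

```cpp
#include <cmath>
#include <random>

// std::normal_distribution takes a *standard deviation*; if the config
// value is a variance, convert it first.
double drawNormal(std::mt19937 &gen, double mean, double var) {
  std::normal_distribution<double> dist(mean, std::sqrt(var));
  return dist(gen);
}
```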

Redesign the config JSON

I can encapsulate each component's parameters into its own object, e.g.:

```json
{
  /*
    Simulator Parameters
  */

  "Journal Parameters": {

  },
  "Experiment Parameters": {

  },
  "Researcher Parameters": {

  }
}
```

Implement a proper nobsGenerator for the simulation

The current implementation is just problematic, I think; I removed it for now until I figure out a solution. Maybe I can have different strategies, or implement it like the random seed, where passing "random" enables a random mode or something like that. A rough sketch follows.
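A sketch of the "random mode" idea, mirroring how a "random" seed might be handled; the config key, range, and distribution are all assumptions:

```cpp
#include <random>
#include <string>
#include <vector>

// If the config says "random", draw each group's size around fixed_n;
// otherwise every group gets fixed_n.
std::vector<int> generateNobs(const std::string &mode, int fixed_n,
                              int n_groups, std::mt19937 &gen) {
  std::vector<int> nobs(n_groups, fixed_n);
  if (mode == "random") {
    std::uniform_int_distribution<int> d(fixed_n / 2, fixed_n * 2);
    for (auto &n : nobs) n = d(gen);
  }
  return nobs;
}
```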
