The website from aml4td

better blending of overlapping colors

We should use ggblend

python computing supplement

I'd be interested in helping with a Python computing supplement.

Did you have a format in mind? After the setup section, most sections could probably be tightly coupled between the R and Python versions, which suggests that two independent repositories might not be ideal. I think Quarto supports panelsets (as "tabsets"); that strikes me as a nice way to display the two, but it would also mean that both versions of the code need to be updated whenever a change is made.

One other thing that would be nice to decide on early: which Python plotting library to use? plotnine mimics ggplot, matplotlib is already used by sklearn+pandas, others are slicker...

bib entry for `tragrisso2023`

@kjell-stattenacity I don't have the bib entry for `tragrisso2023`.

Originally posted by @topepo in #32 (comment)

A small section on modeling philosophies

Slated to go here. It would contrast the full "let the machine work it out" viewpoint with curated model development.

Add basic regression tests

A few implementations, specifically torch and libsvm/kernlab, don't have good control over random number usage. Also, we have seen differences in results between Intel and Apple silicon chips (but that seems to be getting better).

We have some places where we programmatically write out results in-line. If the results change, our encoded conclusions might no longer be valid.

We can take a few key objects and save their results once their usage is finalized. Then we can use testthat to verify that those results are the same (or within some tolerance). Since the project is almost structured like an R package, this means that we can use devtools::test() to check for consistency of results.
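The issue proposes testthat and devtools::test() for this. As a language-neutral illustration of the same save-once, compare-within-tolerance pattern, here is a minimal Python sketch. The function names, the JSON snapshot file, the metric names, and the values are all hypothetical and only stand in for whatever key objects end up being tracked.

```python
import json
import math
import tempfile
from pathlib import Path


def save_reference(results: dict, path: Path) -> None:
    """Store finalized numeric results once, to serve as the reference snapshot."""
    path.write_text(json.dumps(results, indent=2, sort_keys=True))


def check_against_reference(results: dict, path: Path, rel_tol: float = 1e-6) -> list:
    """Return the names of results that are missing or differ beyond the tolerance."""
    reference = json.loads(path.read_text())
    failures = []
    for name, expected in reference.items():
        actual = results.get(name)
        if actual is None or not math.isclose(actual, expected, rel_tol=rel_tol):
            failures.append(name)
    return failures


# Hypothetical key results; the names and values are made up for illustration.
snapshot_path = Path(tempfile.mkdtemp()) / "key_results.json"
save_reference({"rmse": 0.4213, "r_squared": 0.8127}, snapshot_path)

# A rerun on another platform: one value drifts negligibly, one changes materially.
rerun = {"rmse": 0.4213000001, "r_squared": 0.79}
failed = check_against_reference(rerun, snapshot_path)
print(failed)
```

The tolerance is the main design choice: it has to be loose enough to absorb platform-level numerical noise but tight enough that a change large enough to invalidate an in-line conclusion is flagged.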

@kjell-stattenacity is `miller1984selection` the best reference for FSA and interactions?

Originally posted by @topepo in #32 (comment)

the overfitting chapter (e.g. as complexity ☝️, bias 👇)
resampling methods
ensembles
regression performance (MSE decomposition)

Should we have an initial section on it though?

Some questions/comments on Categorical Predictors

The example with each agent working with a single customer type introduced in 5.2:
1. I think the row-wise sum comment could use some clarification; it's the sum among agents with a given customer type, and the single customer type column?
2. Later, in 5.4.3, the example is reused, but I think the language is stronger: "agent was aliased with the customer type" to me means there's a one-to-one correspondence rather than the many-to-one relationship I think the original insinuated. And in a one-to-one relationship, the effect encodings will end up being identical, so the argument fails. Separately: can we add a ref-link?
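To make the one-to-one versus many-to-one distinction in point 2 concrete, here is a small Python sketch. It uses a plain mean-outcome encoding without any shrinkage as a stand-in for the effect encoding (an assumption; the chapter's encoding may pool estimates), and the data, agent labels, and function names are made up for illustration.

```python
from collections import defaultdict


def mean_encoding(categories, outcomes):
    """Map each category level to the mean outcome observed for that level."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for level, y in zip(categories, outcomes):
        totals[level] += y
        counts[level] += 1
    return {level: totals[level] / counts[level] for level in totals}


def encode(categories, outcomes):
    """Replace each category value with its level's mean outcome."""
    table = mean_encoding(categories, outcomes)
    return [table[level] for level in categories]


# Made-up data for illustration only.
outcome = [1.0, 0.0, 1.0, 1.0, 0.0, 0.0]
customer_type = ["retail", "retail", "corporate", "corporate", "public", "public"]

# One-to-one: each customer type is handled by exactly one agent.
agent_one_to_one = ["a1", "a1", "a2", "a2", "a3", "a3"]
# Many-to-one: several agents can share a customer type.
agent_many_to_one = ["a1", "a2", "a3", "a3", "a4", "a5"]

type_column = encode(customer_type, outcome)
same = encode(agent_one_to_one, outcome) == type_column
differs = encode(agent_many_to_one, outcome) != type_column
print(same, differs)
```

Under a one-to-one correspondence the two predictors induce the same grouping of rows, so any encoding that depends only on that grouping produces identical columns. Under the many-to-one reading, the agent column carries extra information.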
Figure 5.1 typo "distirbution"
In 5.4, I would expect to see some mention of coarsening the categories according to domain knowledge (e.g. states into regions). Maybe also model-based coarsening that uses other predictors?
The Cerda & Varoquaux citation seems to deal more with encodings that take the string nature of the predictor into account, with a hint of natural language processing to it.
In 5.4.2, I'm not sure whether adding a -1 to the hashing values leads to "fewer collisions"; it depends on what exactly you mean by a collision, and I'm not familiar enough with the cryptography literature to say. But in a parametric model, it's still enforcing some arbitrary constraint.
The intro to 5.3.2 says "different" supervised tool, but it's the only supervised tool in the chapter.
In 5.5, I'd like a small note about integer-encoding the values being reasonable for certain models. (Again, "will be discussed more later", but a preview would be nice.)
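For the 5.4.2 comment on signed hashing, a minimal Python sketch may help pin down what "collision" could mean. It assumes a simple scheme in which one hash picks the column and a second hash picks the sign. The helper names, the MD5-based hash, and the level labels are hypothetical and are not taken from the chapter or from any particular library.

```python
import hashlib


def stable_hash(value: str, salt: str) -> int:
    """Deterministic integer hash of a string (illustrative choice of MD5)."""
    digest = hashlib.md5((salt + value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def hash_encode(value: str, n_columns: int, signed: bool = False) -> list:
    """Encode one category level as a row with a single nonzero entry."""
    row = [0] * n_columns
    column = stable_hash(value, "column") % n_columns
    sign = 1
    if signed and stable_hash(value, "sign") % 2 == 1:
        sign = -1
    row[column] = sign
    return row


# Twenty made-up levels hashed into four columns, so columns must be shared.
levels = ["level_%02d" % i for i in range(20)]
n_columns = 4
unsigned_rows = {lvl: tuple(hash_encode(lvl, n_columns)) for lvl in levels}
signed_rows = {lvl: tuple(hash_encode(lvl, n_columns, signed=True)) for lvl in levels}

# Count how many distinct encoded rows each scheme can produce for these levels.
n_unsigned = len(set(unsigned_rows.values()))
n_signed = len(set(signed_rows.values()))
print(n_unsigned, n_signed)
```

If a collision means two levels receiving an identical encoded row, the sign can only reduce the count, since up to twice as many distinct rows are available. If it means two levels sharing a column, nothing changes. In a linear model, two levels in the same column with opposite signs are still tied to a single coefficient, contributing +β and -β, which is the arbitrary constraint the comment refers to.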
