website's People

Contributors

amy-palmer, coatless, kjell-stattenacity, krz, syclik, tomsing1, topepo

website's Issues

python computing supplement

I'd be interested in helping with a Python computing supplement.

Did you have a format in mind? After the setup section, most sections could probably be tightly coupled between the R and Python versions, which suggests that two independent repositories may not be ideal. I think Quarto supports panelsets (as "tabsets"); that strikes me as a nice way to display the two, but it would also mean that both versions of the code need updating whenever a change is made.

One other thing that would be nice to decide on early: which Python plotting library to use? plotnine mimics ggplot, matplotlib is already used by sklearn and pandas, and others are slicker...

Add basic regression tests

A few implementations, specifically torch and libsvm/kernlab, don't have good control over random number usage. Also, we have seen differences in results across Intel and Apple silicon chips (but that seems to be getting better).

We have some places where we programmatically write out results in-line. If the results change, our encoded conclusions might no longer be valid.

We can take a few key objects and save their results once their usage is finalized. Then we can use testthat to verify that those results are the same (or within some tolerance). Since the project is almost structured like an R package, this means that we can use devtools::test() to check for consistency of results.
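
The issue proposes testthat and devtools::test() for this. As a rough illustration of the same save-once, compare-with-tolerance idea, here is a minimal Python sketch (the language is a substitution, chosen with the proposed Python supplement in mind). The function names, the file name, and the metric values are all hypothetical.

```python
import json
import math
import tempfile
from pathlib import Path


def save_reference(path, results):
    """Store finalized results (a dict of name -> float) as the reference."""
    Path(path).write_text(json.dumps(results, indent=2, sort_keys=True))


def compare_to_reference(path, results, rel_tol=1e-6, abs_tol=1e-8):
    """Return the names whose current value is missing or outside tolerance."""
    reference = json.loads(Path(path).read_text())
    failures = []
    for name, expected in reference.items():
        current = results.get(name)
        if current is None or not math.isclose(
            current, expected, rel_tol=rel_tol, abs_tol=abs_tol
        ):
            failures.append(name)
    return failures


# Hypothetical values standing in for key objects computed in a chapter.
reference_file = Path(tempfile.mkdtemp()) / "chapter_results.json"
save_reference(reference_file, {"rmse": 0.4172, "r_squared": 0.8105})

# A later render: tiny numerical drift first, then a substantive change.
drifted = {"rmse": 0.4172 + 1e-9, "r_squared": 0.8105}
changed = {"rmse": 0.4523, "r_squared": 0.8105}

print(compare_to_reference(reference_file, drifted))  # nothing outside tolerance
print(compare_to_reference(reference_file, changed))  # flags the changed metric
```

The tolerance is what lets small hardware-related differences pass while a change large enough to invalidate an in-line conclusion is flagged.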

Dark mode theme

Although it won't affect images, we should have some CSS for this.

additional individual transformations

For the numeric predictors chapter, we had previously talked about transformations for percentages and proportions (like the arcsine; see this reference).

Also, describe some transformations based on conventions or scientific knowledge.
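
As a concrete starting point for the proportion case, the sketch below assumes the issue means the common arcsine square-root form, asin(sqrt(p)); the function names and the example proportions are illustrative only. Percentages would be divided by 100 first.

```python
import math


def arcsine_sqrt(p):
    """Arcsine square-root transform of a proportion in [0, 1]."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("proportion must be between 0 and 1")
    return math.asin(math.sqrt(p))


def inverse_arcsine_sqrt(z):
    """Map a transformed value in [0, pi/2] back to a proportion."""
    return math.sin(z) ** 2


# Illustrative proportions; the transform stretches values near 0 and 1.
proportions = [0.0, 0.01, 0.25, 0.5, 0.75, 0.99, 1.0]
for p in proportions:
    z = arcsine_sqrt(p)
    print(f"p = {p:.2f}  ->  asin(sqrt(p)) = {z:.4f}")
```

The transformed values lie in [0, pi/2], and the inverse recovers the original proportion.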

2023-12-19 release

  • review potential changes to the DESCRIPTION file
  • (optional) update R packages with DESCRIPTION file
  • (optional) renv::snapshot()
  • search for pre-processing and other formats
  • remake bibtex files (R/post-process-bib-file.R)
  • review potential changes to contributing/preface pages.
  • delete items in _cache and _freeze
  • quarto render
  • quarto preview
  • review results
  • bump the version number in DESCRIPTION
  • usethis::use_github_release()
  • quarto publish gh-pages --no-render

where to initially discuss the variance/bias tradeoff?

We'll talk a lot about model complexity with regard to

  • the overfitting chapter (e.g. as complexity ☝️, bias 👇)
  • resampling methods
  • ensembles
  • regression performance (MSE decomposition)

Should we have an initial section on it though?
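
For reference, the MSE decomposition named in the list above is usually written as below. The notation is an assumption rather than something taken from the book: y = f(x) + ε with mean-zero noise of variance σ², and a fitted model f̂ that is independent of the noise in the new observation.

```latex
% Assumed setup (not from the issue): y = f(x) + \varepsilon, with
% E[\varepsilon] = 0 and Var(\varepsilon) = \sigma^2; \hat{f} is the fitted model.
\mathrm{E}\left[\left(y - \hat{f}(x)\right)^2\right]
  = \underbrace{\left(\mathrm{E}[\hat{f}(x)] - f(x)\right)^2}_{\text{bias}^2}
  + \underbrace{\mathrm{E}\left[\left(\hat{f}(x) - \mathrm{E}[\hat{f}(x)]\right)^2\right]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible error}}
```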

Some questions/comments on Categorical Predictors

  1. The example with each agent working with a single customer type introduced in 5.2:
    1. I think the row-wise sum comment could use some clarification; it's the sum among agents with a given customer type, and the single customer type column?
    2. Later, in 5.4.3, the example is reused, but I think the language is stronger: "agent was aliased with the customer type" to me means there's a one-to-one correspondence rather than the many-to-one relationship I think the original implied. And in a one-to-one relationship, the effect encodings will end up being identical, so the argument fails. Separately: can we add a ref-link?
  2. Figure 5.1 typo "distirbution"
  3. In 5.4, I would expect to see some mention of coarsening the categories according to domain knowledge (e.g. states into regions). Maybe also model-based coarsening that uses other predictors?
  4. The Cerda & Varoquaux citation seems to deal more with encodings that take the string nature of the predictor into account, with a hint of natural language processing to it.
  5. In 5.4.2, I'm not sure whether adding a -1 to the hashing values leads to "fewer collisions"; it depends on what exactly you mean by a collision, and I'm not familiar enough with the cryptography literature to say. But in a parametric model, it's still enforcing some arbitrary constraint.
  6. The intro to 5.3.2 says "different" supervised tool, but it's the only supervised tool in the chapter.
  7. In 5.5, I'd like a small note about integer-encoding the values being reasonable for certain models. (Again, "will be discussed more later", but a preview would be nice.)
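
To make the point in item 1.2 concrete, here is a minimal sketch under simplifying assumptions: the effect encoding is taken to be the plain per-level mean of the outcome (no shrinkage or pooling, which the chapter's method may use), and the toy data and names are hypothetical.

```python
from collections import defaultdict


def effect_encoding(categories, outcomes):
    """Replace each category level with the mean outcome of its rows (no pooling)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for level, y in zip(categories, outcomes):
        totals[level] += y
        counts[level] += 1
    return [totals[level] / counts[level] for level in categories]


# Hypothetical toy data: one row per customer interaction.
outcomes = [1.0, 3.0, 5.0, 6.0, 4.0, 8.0]
customer_type = ["retail", "retail", "retail", "business", "business", "business"]

# One-to-one: each customer type is handled by exactly one agent.
agent_one_to_one = ["A", "A", "A", "B", "B", "B"]
# Many-to-one: several agents share a customer type.
agent_many_to_one = ["A", "A", "C", "B", "D", "D"]

type_encoded = effect_encoding(customer_type, outcomes)
print(effect_encoding(agent_one_to_one, outcomes) == type_encoded)   # True
print(effect_encoding(agent_many_to_one, outcomes) == type_encoded)  # False
```

With a one-to-one mapping the two predictors partition the rows identically, so their encoded columns coincide; only the many-to-one case gives the agent column information beyond customer type.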
