ezr's Introduction

Easier AI (just the important bits)

© 2024 Tim Menzies, [email protected]
BSD-2 license. Share and enjoy.


For over two decades, I have been mentoring people about SE and AI. When you do that, after a while, you realize:

  • When it is all said and done, you only need a dozen or so cool tricks;
  • Other people really only need a few dozen or so bits of AI theory;
  • Everyone could have more fun, and get more done, if we avoided the same dozen or so traps.

So I decided to write down that theory and those tricks and traps (see below). I took some XAI code (explainable AI) I'd written for semi-supervised multiple-objective optimization. Then I wrote notes on any part of the code where I had spent time helping people with those tricks, theory and traps.

Here is how the notes are labelled. For way-out ideas, read the 500+ ones. For good-old-fashioned command-line warrior stuff, see 100-200.

  • Odd-numbered items are about SE;
  • Even-numbered items are about AI.

Notes      Topic
00-99      Anti-patterns (things not to do)
100-199    SE system
200-299    SE coding
300-399    AI coding
400-499    AI theory (standard)
500-599    New AI ideas

One more thing. The SE and AI literature is full of bold experiments that try a range of new ideas. But some new ideas are better than others. With a little time, and lots of implementation experience, we can focus on which ideas offer the "most bang per buck".

Share and enjoy.

Setting Up

Get some example data

First get some test data:

git clone http://github.com/timm/data

Installation

Just grab the code:

git clone http://github.com/timm/ezr
cd ezr/src
python3 -B ezr.py -t path2data/misc/auto93.csv -e all

Or install from local code (if you edit the code, those changes are instantly accessible):

git clone http://github.com/timm/ezr
cd ezr
pip install -e . # -e is optional; with it, local edits take effect immediately
ezr -t path2data/misc/auto93.csv -e all # test the install

Or install from the web. This is best if you just want to import the code, then write your own extensions:

pip install ezr
ezr -t path2data/misc/auto93.csv -e all # test the install

Running the code

This code has lots of eg.xxx() functions. Each of these can be called on the command line using, say:

 python3 -B ezr.py -e klass      # calls the eg.klass() function
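
To make that calling convention concrete, here is a minimal sketch of how a `-e name` flag could be mapped onto an `eg.name()` function. This is not ezr's actual implementation: the `eg` class below, its two demo functions, and the `dispatch` helper are all hypothetical stand-ins.

```python
class eg:
    """Hypothetical stand-ins for ezr's eg.xxx() demo functions."""

    @staticmethod
    def klass():
        return "klass demo ran"

    @staticmethod
    def hello():
        return "hello demo ran"


def dispatch(argv):
    """Run eg.<name>() for every `-e <name>` pair found in argv."""
    results = []
    for flag, name in zip(argv, argv[1:]):
        if flag == "-e":
            # Ignore private/dunder names so only real demos can be called.
            fn = None if name.startswith("_") else getattr(eg, name, None)
            if not callable(fn):
                raise SystemExit(f"unknown example: {name}")
            results.append(fn())
    return results


# Other flags (such as -t and its argument) are simply skipped here.
print(dispatch(["-t", "auto93.csv", "-e", "klass"]))
```

Looking the function up by name keeps the command line and the code in sync: adding a new demo function makes a new `-e` option available with no extra wiring.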

ezr's People

Contributors

andre-motta, timm, timmenzies

ezr's Issues

quick tut on kmeans

should be 6 lines

use same hook thing as naive bayes. if done incrementally, could run $k \in (2,3,4,8)$ and kernel $\in (trig,uniform)$ all at the same time
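
As a rough illustration of how short such a tutorial could be, here is a minimal k-means sketch in plain Python. It is not the planned tutorial, and it does not use the naive bayes hook or the incremental multi-k idea mentioned above. The function name `kmeans`, and the choice to pass the starting centroids in explicitly (rather than pick them at random), are assumptions made to keep the example small and deterministic.

```python
def kmeans(rows, centroids, iterations=10):
    """Plain k-means: assign each row to its nearest centroid, then re-center."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for row in rows:
            # Index of the centroid with the smallest squared distance to row.
            nearest = min(range(len(centroids)),
                          key=lambda i: sum((a - b) ** 2
                                            for a, b in zip(row, centroids[i])))
            clusters[nearest].append(row)
        # New centroid = column-wise mean; an empty cluster keeps its old centroid.
        centroids = [tuple(sum(col) / len(c) for col in zip(*c)) if c else old
                     for c, old in zip(clusters, centroids)]
    return centroids


rows = [(0, 0), (0, 1), (10, 10), (10, 11)]
print(kmeans(rows, [(0, 0), (10, 10)]))  # [(0.0, 0.5), (10.0, 10.5)]
```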

need a core idea doc

surfing the long tail

where there is little data

compression is intelligence

  • a 1GB picture of a straight line can be condensed to the m,b of y=mx+b
  • better yet, condense to two end points
    • now we have an anomaly detector (anything off the line between them, anything away from our two poles)
    • now we have runtime certification: summarize the training data, complain when runtime data falls outside the space of things seen during training
    • and now we have a compression algorithm (anything new that ain't an anomaly can be ignored)
    • and now we have on-line learning. if anomalies, recluster that region of the data

of course, in practice, we'll need more than 2 points. care to guess how many? often less than 100 (to map out 50 lines)
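
The two-end-points idea can be sketched in a few lines. This is only an illustration under stated assumptions: the data is 2-D, the "summary" is just two poles, and the names `dist_to_segment`, `is_anomaly` and the `tolerance` value are hypothetical, not part of ezr.

```python
import math


def dist_to_segment(p, a, b):
    """Euclidean distance from point p to the segment joining poles a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0:  # both poles are the same point
        return math.hypot(px - ax, py - ay)
    # Project p onto the line, then clamp so we stay between the poles.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len2
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def is_anomaly(p, a, b, tolerance=0.5):
    """True if p is off the line between the poles, or beyond either pole."""
    return dist_to_segment(p, a, b) > tolerance


# Summarize y = 2x + 1 (for x from 0 to 10) by its two end points.
a, b = (0, 1), (10, 21)
print(is_anomaly((5, 11), a, b))   # on the line, between the poles
print(is_anomaly((5, 20), a, b))   # off the line
print(is_anomaly((20, 41), a, b))  # on the same line, but far past a pole
```

Clamping the projection to the segment is what makes "away from our two poles" count as anomalous, even for points that satisfy y=mx+b exactly.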

less is more

  • not the best thing
  • but things statistically indistinguishable from the best
  • e.g.
    • $N(\mu=0, \sigma=1)$ effectively runs from -3 to 3.
    • Cohen's rule says anything closer than $0.35*\sigma$ is different by a small effect or less
    • $0.35/(3 - (-3)) \approx 0.058$, or roughly 5 to 6%. so there are only about $6/0.35 \approx 17$ statistically distinguishable solutions
    • according to Hamlet, the number of random samples needed to be 95% certain of finding something with
      p=0.05 is
      • $n(C=0.95, p=0.05) = \log(1-C)/\log(1-p) \approx 58$
  • And if we had some smart heuristic to sort things from better to worse, we apply $\log_2$ to the above.
    • so, with some smarts, we can explore the world with $\log_2(58)\approx 6$ samples.
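
The arithmetic in this issue is easy to check by evaluating the formulas directly. The sketch below does that; the function names `distinguishable` and `samples_needed` are made up for this example, and the defaults are the values quoted in the bullets.

```python
import math


def distinguishable(lo=-3.0, hi=3.0, sd=1.0, cohen=0.35):
    """How many bins of width cohen*sd fit into the range lo..hi."""
    return (hi - lo) / (cohen * sd)


def samples_needed(confidence=0.95, p=0.05):
    """n such that 1 - (1 - p)**n reaches the given confidence."""
    return math.log(1 - confidence) / math.log(1 - p)


n = samples_needed()
print(distinguishable())  # number of distinguishable solutions
print(n)                  # random samples for 95% confidence at p=0.05
print(math.log2(n))       # samples if a heuristic lets us binary-chop
```

The formula for n comes from asking when the chance of missing a p-likely event n times in a row, $(1-p)^n$, drops below $1-C$.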

finish the repo

  • in /readme.md, the scripts need to be listed
  • in /docs, please delete all those *.html. and if those image files are not used, they can go too
  • please delete /erz
  • please get rid of /*.html
  • inside src, is there crap that can be pruned?
  • please add pdf of the emse paper to /docs

er... what else?

please port paper to arxiv

i found out last night that arxiv does not support minted. sigh. so it's back to lstlistings for the paper

so can u switch minted ==> lstlistings in the paper? the challenge will be keeping all the letters in circles appearing in the revised doc.
