pml-book's Introduction

"Probabilistic machine learning": a book series by Kevin Murphy

 

Book 0: "Machine Learning: A Probabilistic Perspective" (2012)

See this link

Book 1: "Probabilistic Machine Learning: An Introduction" (2022)

See this link

Book 2: "Probabilistic Machine Learning: Advanced Topics" (2023)

See this link

pml-book's People

Contributors

colcarroll, mjsml, murphyk, patel-zeel, ringger

pml-book's Issues

Figure 1.7 mismatch between images, captions, and section text

Edition: Dec 31, 2020
Print pages: 9-10
PDF pages: 39-40

§1.2.2.2 says:

In Fig. 1.7(b), we see that using D = 2 results in a much better fit.

However, the Fig. 1.7(b) image is a polynomial of degree 14, or at least that's what the title above it says. Assuming the figure images are arranged as intended, I think the section text meant to refer to 1.7(a), which is of degree 2, and is not currently referenced by the text.

Also, the caption under Fig. 1.7 reads, in part:

(a-b) Polynomial of degrees 1 and 14

The polynomial in 1.7(a) is of degree 2 (and is labeled as such above the graph). I assume the caption should instead read:

(a-b) Polynomial of degrees 2 and 14

Another reason why Bayes is relevant in the “big data era” and "deep learning era"

In section 2.4.6, page 40

There is another reason why Bayes is still relevant.

I encountered this in my professional career: if the amount of data is huge (and the features aren't very predictive), Bayes can beat other forms of ML because it's much quicker to calculate the probabilities during evaluation. (This surprised me!)

This is also true for Bayes vs. deep learning, which can be painfully slow even with GPUs.
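(A minimal sketch of why evaluation is so cheap, assuming "Bayes" above means a naive Bayes classifier: prediction is just a lookup-and-sum of precomputed log-probabilities.)

  # Minimal sketch, assuming "Bayes" above means a naive Bayes classifier:
  # prediction is a lookup-and-sum of precomputed log-probabilities,
  # which is why evaluation is so cheap compared to a large neural net.
  import numpy as np

  rng = np.random.default_rng(0)
  n_classes, n_features = 3, 1000

  log_prior = np.log(np.full(n_classes, 1.0 / n_classes))                    # log p(c)
  log_lik = np.log(rng.dirichlet(np.ones(2), size=(n_classes, n_features)))  # log p(x_j = v | c), binary features

  x = rng.integers(0, 2, size=n_features)                                    # one test example
  scores = log_prior + log_lik[:, np.arange(n_features), x].sum(axis=1)      # log p(c) + sum_j log p(x_j | c)
  print(scores.argmax())                                                     # MAP class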

Should attribute Kant quote to his actual work (maybe)

On page 7, last paragraph, there is a quote from Kant. It would be better to also mention which work it came from. It seems that various sources (articles on Poker) claim that this quote is from:

Critique of Pure Reason

But it doesn't appear to be here: https://www.gutenberg.org/files/4280/4280-h/4280-h.htm

So, I don't know where it really comes from or if it is indeed something that Kant said. I've searched these works (from their online PDFs) and couldn't find it:

  • Critique of Pure Reason
  • Critique of Practical Reason
  • Metaphysical Foundations of Natural Science
  • Critique of Judgement

From December 31, 2020 edition.

Numerical application - mistakes

p. 113, Fig. 5.5: mistakes in the matrix values.
b = [-14, -6] (in your formalism you have + b^T \theta)
k(A) = 30.234 for A = [20, 5; 5, 2], and k(A) = 1.8541 for A = [20, 5; 5, 16]
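A quick numerical check of those values (assuming A denotes the 2x2 symmetric matrices from the figure, written row-wise):

  # Quick check of the condition numbers quoted above, assuming A denotes the
  # 2x2 symmetric matrices [[20, 5], [5, 2]] and [[20, 5], [5, 16]] from Fig 5.5.
  import numpy as np

  A1 = np.array([[20.0, 5.0], [5.0, 2.0]])
  A2 = np.array([[20.0, 5.0], [5.0, 16.0]])
  print(np.linalg.cond(A1))   # ~30.234
  print(np.linalg.cond(A2))   # ~1.854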

broken cross-ref in 17.5.7

"A second problem is that the magnitude of the fc’s scores are not calibrated with each other (see ??), so it is hard to compare them."

Minor typo (Release December 31, 2020)

Hi, there is a minor typo on page 753 (line 4) in the release as of December 31, 2020:

A basis $B$ is a set of lienarly...
Should be, of course, linearly :)

Thank you for uploading the book! It's incredible. How often are you going to reupload the corrected version of the PDF?

Release 2021-01-06 Typos

  • Chapter 2, Fig. 2.4, likelihod -> likelihood
  • Chapter 3, Eq. (3.10), you could comment that the name of this function is softplus
  • Chapter 3, the text right after Eq. (3.54), f_{\sigma} \in R^* -> f_{\sigma} \in R^+ (or R_+, in notation chapter, you use R^+, but Z_+)
  • Chapter 3, Page 56, this makes a good default choice in many cases -> This
  • Chapter 3, Fig. 3.8a. in the legend, Student(dof 1) -> Student (dof 1)
  • Chapter 3, Eq. (3.89) and (3.90), replace x by y
  • Chapter 3, the paragraph right above Eq. (3.92), the duplicate of 'may'
  • Chapter 3, page 67, last paragraph, p(y|z) = N(y|Wz, \Sigma_v) -> p(y|z) = N(y|Wz, \Sigma_y)
  • Chapter 3, page 71, the paragraph right after Eq. (3.122), c^m -> c^
  • Chapter 3, page 76, Eq. (3.132), p(y_4| y_3, 2 , y_1) -> p(y_4|y_3, y_2, y_1)
  • Chapter 5, Fig. 5.1, Caption: a some saddle points -> some saddle points
  • Chapter 5, the paragraph right above Eq. (5.41): the current iterate -> the current iteration
  • Chapter 6, page 169, the paragraph right before Sec. 6.3.9: similarlt -> similarly
  • Chapter 7, the paragraph right above Eq. (7.4): regionm -> region
  • Chapter 7, the sentence right above Eq. (7.7): z = f(x) = y^2 -> z = f(y) = y^2
  • Chapter 7, Eq. (7.12) lacking '=' after Bin(y|N, \theta)
  • Chapter 7, "summarizing the posterior" section: everythiing -> everything
  • Chapter 7, Eq. (7.43), Beta(\theta| 30, 20) -> Beta(\theta | 50, 20)
  • Chapter 8, Eq. (8.31), lack open '(' in E[h-a)^2|x]
  • Chapter 9, Eq. (9.1), denominator: y = c -> y = c'
  • Appendix: B.4.2.2: the sentence right above Eq. (B.59) sith -> with
  • Appendix: C.5.2, Eq. (C.166), lacking '=' between V V
  • Appendix: C.6, the sentence right below Eq. (C.192), q_1 and q_2 span the space of { a_1, a_2} -> q_1 and q_2 span the space of a_2
  • Appendix: C.7.2, the first sentence, 'indepoendent' -> independent
  • Appendix: D.5.5, Eq. (D.71), p(x_1=j)p(x_2 = j-k) -> p(x_1=k)p(x_2 = j-k)
  • Appendix: E.2.3.2, Eq. (E.21), \partial l -> \partial l^2, Eq. (E.23), 1/(2v^2) -> -1/(2v^2)

Local minima problem isn't mentioned in Exploration-exploitation tradeoff

Page 14, section 1.4.1

The problem of local maxima isn't mentioned.

E.g., in chess, a piece sacrifice could be the right first step to a winning combination, but if the reward function is weighted too heavily toward material advantage, the sacrifice path might never get explored. On the other hand, if the reward function is weighted too lightly toward material advantage, the RL agent would waste too much time exploring blunders.

log(p=p/1 - p) is a bit unclear

P47, section 3.1
In the last sentence between equations 3.16 and 3.17

log(p=p/1 - p)

is a bit unclear.

It's obviously p/(1-p), but it did give me pause. (Easy fix in LaTeX.)
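For reference, I believe the intended expression is the log-odds (logit), which in LaTeX is simply:

  \log\left(\frac{p}{1-p}\right)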

Wish: table of notations and symbols

Your book does seem to use standard notation and symbols, so I'm not having trouble.

However, utter newbies will have a harder time without a table of the notation and symbols used in your book.

Bayesian stuff

A few points about the Bayes material in sections 2.2 and 2.2.1, pp. 21-23:

  1. I think you should mention the frequentist approach to probability. One aspect is that Bayes will hazard an estimate for an event that has no prior examples, where a frequentist wouldn't dare. This would be something like election results.

  2. Sometimes, the terms in the Bayes formulas are estimates, so there can be a lot of uncertainty in the answer. E.g. how do you get an accurate value for the P(H=1)? This can be hard.

  3. It would be nice to show the numeric values substituted in for the equations 2.6 and 2.9.

  4. The thing that trips people up about Bayes is that differences in the prior probability can radically change the result. It would be good to recalculate the COVID example in 2.2.1 for different values of p(H=1). For instance, early in the COVID pandemic, p(H=1) might have been 0.01 or 0.001; then the probability that a positive test is a false positive becomes very large. This unintuitive result catches a lot of people by surprise. [added] I do see you show this in Exercise 2.1.

I calculate for p(H=1)=0.01 that the probability of infection drops to 26%. If p(H=1)=0.001 (the disease is very rare), then probability of infection drops to 3%.

This is true even though the test is highly accurate.
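Here is a sketch of that calculation, assuming the test characteristics from the book's example (sensitivity 0.875 and specificity 0.975; those values reproduce the 26% and 3% figures above):

  # Sketch of the prior-sensitivity calculation above, assuming the test
  # characteristics from the book's COVID example (sensitivity 0.875,
  # specificity 0.975); these reproduce the 26% and 3% figures.
  def posterior_infected(prior, sensitivity=0.875, specificity=0.975):
      p_pos_h1 = sensitivity          # p(Y=1 | H=1)
      p_pos_h0 = 1.0 - specificity    # p(Y=1 | H=0), the false-positive rate
      evidence = p_pos_h1 * prior + p_pos_h0 * (1.0 - prior)   # p(Y=1)
      return p_pos_h1 * prior / evidence                       # p(H=1 | Y=1)

  for prior in [0.1, 0.01, 0.001]:
      print(prior, round(posterior_infected(prior), 3))   # 0.795, 0.261, 0.034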

Add other problem with labeling is inaccurate labeling

page 11, section 1.3, 3rd paragraph, the text says:

"need to collect large labeled datasets for training, which can often be time consuming and expensive".

From my experience, a third, underappreciated factor is the accuracy of the labels. This is especially true if managers cut corners by rushing the labeling and/or using cheap, inexperienced labor. Labeling inaccuracy is often not easily estimated. It is too easy to be blindsided by label inaccuracy and end up with a poor model without realizing it until it's too late.

This problem can be magnified if some classes are vastly underrepresented.

Release 2020-12-31 Typo

print page 392 pdf 422 "convolutional neural networks (CNN), which are designed to work with variable-sized and images;"

print 393 pdf 423 "However, suppose we replacing replace the Heaviside function"

print 400 pdf 430 "it can be used to model financial data, we as well as the global temperature of the earth"

Typo KL Divergence (Release 31-12-2020)

Page 161
Equation 6.42

The definition for the forwards KL divergence in equation 6.42 on page 161 shows the reverse KL divergence (which is shown in equation 6.43).
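For reference, with p the true distribution and q the approximation, the standard definitions (not quoting the book's notation) are:

  D_{KL}(p \| q) = \sum_x p(x) \log \frac{p(x)}{q(x)}   \quad \text{(forwards KL)}

  D_{KL}(q \| p) = \sum_x q(x) \log \frac{q(x)}{p(x)}   \quad \text{(reverse KL)}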

Thanks for releasing the updated book :)

Entropy is also related to information

Section 6.1, page 153 (183 of the PDF), Draft 2020-01-03

says:

The entropy of a probability distribution can be interpreted as a measure of uncertainty, or lack
of predictability, associated with a random variable drawn from a given distribution, as we discuss
below.

For myself, I am always amused by the irony that this lack of predictability (entropy) is also related to information. That is, something that is completely predictable has no information.

Since the chapter is entitled "Information Theory", I think how entropy is related to information deserves an explanation.

In fact, even though the chapter is about Information Theory, it doesn't say much about what information is. This will puzzle the newbie, who will go by the popular definition of "information" rather than the very precise one you intend.

(perhaps it's better described in your second volume?)
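A tiny illustration of the connection I mean (entropy as expected self-information, so fully predictable means zero bits):

  # Tiny illustration: entropy is the expected self-information -log2 p(x),
  # so a fully predictable (deterministic) distribution carries 0 bits and
  # a uniform distribution over 4 symbols carries 2 bits.
  import numpy as np

  def entropy_bits(p):
      p = np.asarray(p, dtype=float)
      p = p[p > 0]
      return -np.sum(p * np.log2(p))

  print(entropy_bits([1.0, 0.0, 0.0, 0.0]))      # 0.0 -> no surprise, no information
  print(entropy_bits([0.25, 0.25, 0.25, 0.25]))  # 2.0 -> maximally unpredictable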

In 1.3, (unsupervised) doesn't discuss the problem of guessing how many clusters there are

Section 1.3, pp. 11-13, does not discuss the problem of guessing how many clusters there are.

"then we might want to split the top right into (at least) two subclusters." (p 12, section 1.3.1).

This is (obviously) a hard problem. Maybe it's mentioned somewhere else in the book (so there should be a reference to that), but I haven't finished reading this very fine book yet!

12/30/2020 version.

Typo in Appendix

Book Name: Probabilistic Machine Learning: An Introduction
Book date stamp: 2020-12-28
pdf page number: 749
print page number: 719

In section A.2.1 first sentence, last word should be $\mathcal{X}$ instead of $\mathcal{Y}$.

"""
A.2.1 Functions
A function f: X->Y ... for each x \in Y.
"""

Add "feature engineering" to "feature preprocessing"

Page 9, section 1.2.2.2

"This is a simple example of feature preprocessing". I think it'w worthwhile mention that it's also called "feature engineering", which isn't mention anywhere in the book.

Also add the phrase "feature engineering" to the index.

Some typos in Chapter 1 of Book 1

  • Page 8: the line right above Eq. (1.12): If we have "multiple inputs" => "multiple features"
  • Page 9: Eq. (1.15): should not include w^T
  • Page 11: paragraph 2: "That is, we just get observed outputs D = { y_n: n = 1 : N } without any corresponding inputs x_n." => "That is, we just get observed inputs D = { x_n: n = 1 : N } without any corresponding outputs y_n."

Constrained optimization section

Not sure what version, but downloaded and printed around 1st Feb. Caveat: I didn't know much about this topic, so some suggestions are more to do with my understanding than anything being wrong. Some are really pedantic as well. Change or ignore as you see fit.

Section 5.5 boolean --> Boolean
Section 5.5.1.1 Here, you are minimizing \theta_1^2+\theta_2^2-1 but in the figure just \theta_1^2+\theta_2^2. I know the solution is the same, but it's nonetheless inconsistent.
Section 5.5.3 The description of how to convert to standard form could use some extra work. It's not made explicit where \mathbf{A} comes from (presumably an aggregation of equations 5.106 and 5.107, but it wouldn't do any harm to say that). Plus you do not explain how to make $\boldsymbol{\theta} \geq \mathbf{0}$ or why this is important.
Section 5.5.3.1 In the worse case --> In the worst-case scenario
Section 5.5.3.1 "There are various..." This sentence is a bit of a non-sequitur and should probably be connected to the previous sentence to identify that you are saying that there do exist methods that are more efficient than the Simplex method.
Section 5.5.4 "From the geometry of the problem..." Well, this is true, but what about when we are in 100 dimensional space and the geometry is not obvoius?
Section 5.5.6 It took me a while to figure out that NLL meant negative log likelihood having just dropped into the book here.
Section 5.6 It wasn't clear to me why we scale by \eta
Equation 5.120 The function f() is not defined. Possibly you mean \mathcal{L}? The $z$ that you are argmin-ing over is also not defined.
Figure 5.15 I might be mistaken, but I think you are maximizing the function in this figure vs. minimizing in Figure 5.14, which is a bit confusing.
Equation 5.131 I did not understand why there is a factor of a 1/2 on the RHS
Equation 5.132 I think the first case should be \theta-\lambda. Something is wrong anyway, as this is not symmetric (see the soft-thresholding sketch after this list).
Section 5.6.3 I thought the discussion of the straight-through estimator seemed a little out of place and broke the flow. Consider moving to end of section, shortening or dropping completely.
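Regarding the Eq. (5.132) item above: assuming that equation is the soft-thresholding operator (the proximal operator of \lambda|\theta|), here is a sketch of the symmetric form I would expect:

  # Sketch of the (symmetric) soft-thresholding operator, assuming Eq. (5.132)
  # is the proximal operator of lambda * |theta|; this is why I expect the
  # first case to be theta - lambda.
  import numpy as np

  def soft_threshold(theta, lam):
      # theta - lam  if theta >=  lam
      # 0            if |theta| <= lam
      # theta + lam  if theta <= -lam
      return np.sign(theta) * np.maximum(np.abs(theta) - lam, 0.0)

  print(soft_threshold(np.array([-3.0, 0.5, 3.0]), 1.0))   # [-2.  0.  2.]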

Realistically, I probably can't read the whole document with this level of detail, but if you know there are sections that are not well read (perhaps near the end of the book or new parts that you have added) then send me a message and I'll try to find time to focus on these.

Add ref to KL Divergence

p84, 114 of PDF, 2020-01-03 Draft, last sentence of the page:

minimizes the KL divergence

should have a reference to 8.1.6.1, p 241 (271 of PDF)

Also add the ref to KL divergence on p84 to the index

XGBoost may support Categorical variables directly some day

on page 575, it says: "XGBoost assumes the user has preprocessed them into one-hot vectors"

This may change by the time you go to print. Categorical variables are being experimented with:

https://github.com/dmlc/xgboost/releases/tag/v1.3.0

Experimental support for direct splits with categorical features
Currently, XGBoost requires users to one-hot-encode categorical variables. This has adverse performance implications, as the creation of many dummy variables results into higher memory consumption and may require fitting deeper trees to achieve equivalent model accuracy.
The 1.3.0 release of XGBoost contains an experimental support for direct handling of categorical variables in test nodes. Each test node will have the condition of form feature_value \in match_set, where the match_set on the right hand side contains one or more matching categories. The matching categories in match_set represent the condition for traversing to the right child node. Currently, XGBoost will only generate categorical splits with only a single matching category ("one-vs-rest split"). In a future release, we plan to remove this restriction and produce splits with multiple matching categories in match_set.
The categorical split requires the use of JSON model serialization. The legacy binary serialization method cannot be used to save (persist) models with categorical splits.
Note. This feature is currently highly experimental. Use it at your own risk. See the detailed list of limitations at #5949.

In addition, the user doesn't have to explicitly use one-hot encoding for XGBoost (at least with the H2O.ai version). It gets converted behind the scenes:

https://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science/xgboost.html
https://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science/algo-params/categorical_encoding.html

categorical_encoding: Specify one of the following encoding schemes for handling categorical features:

auto or AUTO: Allow the algorithm to decide. In XGBoost, the algorithm will automatically perform one_hot_internal encoding. (default)

one_hot_internal or OneHotInternal: On the fly N+1 new cols for categorical features with N levels

one_hot_explicit or OneHotExplicit: N+1 new columns for categorical features with N levels

binary or Binary: No more than 32 columns per categorical feature

label_encoder or LabelEncoder: Convert every enum into the integer of its index (for example, level 0 -> 0, level 1 -> 1, etc.)

sort_by_response or SortByResponse: Reorders the levels by the mean response (for example, the level with lowest response -> 0, the level with second-lowest response -> 1, etc.). This is useful, for example, when you have more levels than nbins_cats, and where the top level splits now have a chance at separating the data with a split.

enum_limited or EnumLimited: Automatically reduce categorical levels to the most prevalent ones during training and only keep the T (10) most frequent levels, and then internally do one hot encoding in the case of XGBoost.

I haven't gotten through your book yet, so I'm not sure if you mention one of the problems with one-hot encoding: it causes an explosion of features, which can effectively dilute the data, because the information that the one-hot encoded values are mutually exclusive is not "seen" by many (all?) algorithms.
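A small illustration of that explosion (hypothetical column names and data):

  # Small illustration of the feature explosion from one-hot encoding
  # (hypothetical column names and data). Note that the dummies within one
  # group sum to 1 -- a mutual-exclusivity constraint most algorithms ignore.
  import pandas as pd

  df = pd.DataFrame({
      "zip_code": ["94103", "10001", "60601", "94103"],     # high-cardinality categorical
      "browser":  ["chrome", "firefox", "safari", "chrome"],
      "clicks":   [3, 1, 4, 2],
  })
  encoded = pd.get_dummies(df, columns=["zip_code", "browser"])
  print(df.shape, "->", encoded.shape)   # (4, 3) -> (4, 7); real data can blow up far more
  print(list(encoded.columns))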

Confusing statement about posterior probability (release 2021-12-31)

Hello,

(Thankyou for the great resource)

I would like to point out the statement about the posterior distribution on page 22 of the print version (just before the COVID example). Since Bayes and the posterior are very important concepts for the whole book, this might be confusing for some people (like me, who am not so clever).

It says

Multiplying the prior by the likelihood for each value of H, and then normalizing so the result
sums to one, gives the posterior distribution p(H = h|Y = y); this represents our new belief
state about the possible values of H

I may be completely wrong, but I am not sure how this is true. We multiply the likelihood by the prior for each value of H to get the marginal likelihood (the denominator, p(Y=y)) instead, and then divide by this marginal likelihood to get the posterior p(H = h|Y = y). We don't "multiply by each value of H in the numerator (likelihood * prior) and then normalize".

Multiplying the prior by the "likelihood for each value of H", and then normalizing so the result
sums to one gives us

I mean, for some specific h, to get the posterior for this h, i.e. p(H = h|Y = y), we do not need to multiply over all values of H before normalising, am I right?
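For what it is worth, here is how I read the book's sentence (a sketch with made-up numbers): prior times likelihood is computed for every value of H, and normalizing that whole vector is the same thing as dividing each entry by the marginal likelihood.

  # Sketch of how I read the book's sentence (made-up numbers): compute
  # prior * likelihood for EVERY value of H, then normalize the whole vector.
  # Dividing by the sum is exactly dividing by the marginal likelihood p(Y=y).
  import numpy as np

  prior      = np.array([0.9, 0.1])      # p(H=h) for h = 0, 1
  likelihood = np.array([0.025, 0.875])  # p(Y=1 | H=h) for h = 0, 1

  unnorm = prior * likelihood            # one entry per value of H
  posterior = unnorm / unnorm.sum()      # unnorm.sum() is p(Y=1)
  print(posterior)                       # p(H=h | Y=1) -> [0.205 0.795]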

Release 2021-01-03 Typos

Print Page 589, PDF Page 619: "A natural approach to transfer learning meta-learning"

Print Page 590, PDF Page 620: "N-way K-shot classification, in which the system is expected to learn to classify K classes using just N training examples of each class."
Based on figure 19.7 and the next example, I think it should instead read "learn to classify N classes using just K training examples of each class".

Give a nod to Gauss in 1.2.2.1

It can be argued that Gauss was the first person to do machine learning, since he developed least squares to predict the orbit of the asteroid Ceres:

https://en.wikipedia.org/wiki/Least_squares#The_method

I think it would be nice to add a historical footnote about this achievement.

It may be useful to mention that LS has the advantage of a closed-form solution (because the quadratic objective is differentiable), whereas L1 does not, which is why LS has been favored for so long despite having trouble with outliers.
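For reference, the closed-form solution alluded to above is the usual normal-equations result (standard notation, not necessarily the book's):

  \hat{\mathbf{w}} = \arg\min_{\mathbf{w}} \|\mathbf{y} - \mathbf{X}\mathbf{w}\|_2^2
                   = (\mathbf{X}^{\top}\mathbf{X})^{-1}\mathbf{X}^{\top}\mathbf{y}

whereas the L1 objective \|\mathbf{y} - \mathbf{X}\mathbf{w}\|_1 has no comparable closed form.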

Pls clarify that the aligned DNA sequences are horizontal and other issues

Fig 6.2a p154, pdf p184 Draft 2020-01-03

People who don't know much about DNA may not realize that the bases are sequential horizontally and that the columns are meant to be compared for similarity. To clarify, I suggest changing (1st paragraph, 2nd sentence)

(e.g., from different species)

to

(e.g., each row is a sequence from a different species)

In addition, it would help if you had the position numbers at the bottom of the figure (but I'm not sure how to squeeze in 10, 11, ...)

a t a g c c g g t a c g g c a
t t a g c t g c a a c c g c a
t c a g c c a c t a g a g c a
a t a a c c g c g a c c g c a
t t a g c c g c t a a g g t a
t a a g c c t c g t a c g t a
t t a g c c g t t a c g g c c
a t a t c c g g t a c a g t a
a t a g c a g g t a c c g a a
a c a t c c g t g a c g g a a
1 2 3 4 5 6 7 8 9

You might even want to color the letters in 6.2a to match the colors in 6.2b

I confess I was confused by the text for Fig 6.2b:

The overall vertical axis represents the information content of that location measured in bits (i.e., using log base 2). Deterministic distributions (with an entropy of 0) have height 2, and uniform distributions (with an entropy of 2) have height 0

I thought that an entropy of 0 meant no information (completely deterministic), but you have the height (bits) being 2 (e.g. position 3 is always A). On the other hand, I do understand that a highly conserved position (i.e. one that doesn't change, so that the function of the motif is preserved through evolution) provides "information" about the importance of that position in the sequence. E.g. positions 3, 5, and 13 are critical (no variation), and positions 10 and 15 are highly important (mostly the same base). I do see that this is explained a bit more on the next page (155), first paragraph, where you mention that the height of the bar is 2 - H_t. Maybe the confusion arises because information content isn't really defined in this chapter (as far as I can see).
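Here is a sketch of the height calculation as I understand it (per-column 2 - H_t, using the alignment shown above):

  # Sketch of the per-column bar height 2 - H_t, as I understand Fig 6.2b,
  # using the 10-sequence alignment shown above.
  import numpy as np

  seqs = [
      "atagccggtacggca", "ttagctgcaaccgca", "tcagccactagagca",
      "ataaccgcgaccgca", "ttagccgctaaggta", "taagcctcgtacgta",
      "ttagccgttacggcc", "atatccggtacagta", "atagcaggtaccgaa",
      "acatccgtgacggaa",
  ]
  for t, col in enumerate(zip(*seqs), start=1):
      p = np.array([col.count(b) for b in "acgt"], dtype=float)
      p /= p.sum()
      H = -np.sum(p[p > 0] * np.log2(p[p > 0]))   # entropy of column t, in bits
      print(t, round(2 - H, 2))                   # conserved columns ~2, uniform columns ~0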

Finally, I note that fig 6.2b is not color blind friendly. Prof. Jerome Friedman redid the figures in later printings of Elements of Statistical Learning to account for that.
