Feb 28th version
This is real nitpicking, so ignore if you don't care about these things.
P857 "autoregressive models" Perhaps I'm just revealing my ignorance here (in which case this is going to be very embarrassing), but I thought that autoregressive models could represent an arbitrary joint probability distribution if each variable is conditioned on all of the previous ones (as might be possible in principle with, e.g., transformers). Either I am mistaken, or possibly you are referring to models that only condition on the previous few variables (I haven't read that chapter yet). It stuck out to me anyway.
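For concreteness, the factorization I have in mind is just the chain rule of probability. The notation below is mine and not taken from the book, so treat it as a sketch of the point rather than a reference to your equations:

```latex
% Full autoregressive factorization: exact for any joint distribution
% and any ordering of the variables x_1, ..., x_D.
p(x_1, \ldots, x_D) = \prod_{d=1}^{D} p(x_d \mid x_1, \ldots, x_{d-1})

% Order-k truncation (for d > k): each variable sees only the previous k,
% which is an approximation unless the corresponding conditional
% independence assumptions actually hold.
p(x_d \mid x_1, \ldots, x_{d-1}) \approx p(x_d \mid x_{d-k}, \ldots, x_{d-1})
```

The first line is an identity, so any loss of expressiveness would have to come from truncating the conditioning set as in the second line, or from restricting the form of each conditional.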
P852 L15. I don't think the word "datavector" exists. Perhaps change to "at one of the training data examples" or similar.
P852, 853 Algorithms 28 and 29 have inconsistent spacing before the semicolons at the ends of lines: sometimes there is a space, sometimes not. There is no semicolon at all at the end of Algorithm 28.
P866 Eq 25.40 has a full stop at the end, which is a little weird given that it's part of a continuing sentence. In general the punctuation of equations is inconsistent throughout, but I don't think anyone will notice except pedants like me. In this case though, it's actually wrong.
P872 L42 If this is camera-ready production, then you need to deal with the overrun of the inline equation here.
P875 L10 Similar overrun issue, this time with a reference.