Two philosophies: If you are taking lecture notes (or can borrow notes from someone who is), read the lecture notes again before reading the text; that will make the text easier to understand. If you are not taking lecture notes, skim the reading before lecture to maximize the chance that you grok (understand deeply) the lecture. Whichever way you approach the reading, the examples are your friends. Get to know them: many of them are fun, and they will help you understand the theory.
It goes without saying that the Exercises at the end of each section are important. This page lists what you should read before you try those exercises. You can skim at first but if you have trouble with the exercises then read the section carefully before trying the exercises again.
1.1: Pages 1-5.
1.2: Pages 11-13. It is important to understand the figure on page 13.
1.3: This is the basis of everything that follows. The table on Page 19 should be imprinted on your heart and you should thoroughly understand pages 20-25. Skip Example 4 for now. Learn to recognize the Bernoulli distribution and the uniform distribution on a finite set. In this course it is a very good idea to learn to recognize the standard distributions that crop up frequently.
1.4: Try following this path: start with Example 2 on page 34. This motivates the general formula at the top of page 36. Read that formula and Example 4. Then come back to Example 3 – it's the interesting one. The multiplication rule on page 37 will come as no surprise. At this point you should be able to simply read Examples 6 and 7. Compare the general multiplication rule on page 37 with the special case when you have independence, on page 42. Then skim Examples 8 and 9. See if you can draw a tree diagram that can be used in Example 9. You should now be able to go back and do Example 4 of 1.3 without reading the solution first.
1.5: Page 47 through the discussion that ends at the top of page 51.
1.6: This obvious extension of the familiar multiplication rule leads to a lot of interesting stuff. I suggest you start with Example 4. Then try Examples 2 and 3 (please note the infinite outcome space - everything's nice and convergent so don't worry). And then the birthday problem in Example 5. It is useful to note that independence can be slippery when you have more than two events: see Example 8, and also look carefully at the box on page 67 – it is not enough just to check that P(ABC) = P(A)P(B)P(C).
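To see why checking P(ABC) = P(A)P(B)P(C) alone is not enough, here is a small sketch. The outcome space and the events A, B, C are made up for illustration (they are not taken from the text): one draw, uniform on the numbers 1 through 8. The triple product rule holds, yet A and B are not independent.

```python
from fractions import Fraction

# Hypothetical outcome space: one draw, equally likely to be any of 1, ..., 8.
OUTCOMES = set(range(1, 9))

def prob(event):
    """Probability of an event (a subset of OUTCOMES) under equally likely outcomes."""
    return Fraction(len(event & OUTCOMES), len(OUTCOMES))

# Illustrative events, each with probability 1/2.
A = {1, 2, 3, 4}
B = {1, 2, 3, 5}
C = {1, 6, 7, 8}

# The triple product rule holds: P(ABC) = 1/8 = (1/2)^3.
triple_ok = prob(A & B & C) == prob(A) * prob(B) * prob(C)

# But A and B fail the pairwise check: P(AB) = 3/8, while P(A)P(B) = 1/4.
pair_ab_ok = prob(A & B) == prob(A) * prob(B)

print(triple_ok)   # True
print(pair_ab_ok)  # False
```

So mutual independence needs the product rule for every sub-collection of the events, not just for all three at once.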
2.1: Go through pages 79-83 very thoroughly. Then read all the contents of the box on Page 86; follow that by learning the derivation using consecutive odds. Look carefully at the arrays of figures on Pages 87-89. Go through the captions and make sure you're following the details of what changes as n gets large.
2.2: Everything up to and including the box on Page 101, as well as the discussion of “How good is the normal approximation?” on pages 103-104. I will sometimes use the terms "center" and "spread" for "mean" and "standard deviation" respectively. Also, more formally, "location parameter" and "scale parameter". The normal table is in Appendix 5.
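If you want a feel for how good the normal approximation is, you can compare it with the exact binomial probability on a computer. The sketch below is one way to do that, assuming the usual continuity correction (adding and subtracting 1/2 at the endpoints). The function names and the numbers n = 100, p = 0.5 are my choices for illustration, not anything from the text.

```python
from math import comb, erf, sqrt

def binom_prob(n, p, a, b):
    """Exact P(a <= X <= b) for X binomial (n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(a, b + 1))

def std_normal_cdf(z):
    """Standard normal c.d.f., written in terms of the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def normal_approx(n, p, a, b):
    """Normal approximation to P(a <= X <= b), with continuity correction."""
    mu = n * p                      # center
    sigma = sqrt(n * p * (1 - p))   # spread
    upper = (b + 0.5 - mu) / sigma
    lower = (a - 0.5 - mu) / sigma
    return std_normal_cdf(upper) - std_normal_cdf(lower)

# Illustrative case: 45 to 55 heads in 100 tosses of a fair coin.
exact = binom_prob(100, 0.5, 45, 55)
approx = normal_approx(100, 0.5, 45, 55)
print(round(exact, 4), round(approx, 4))
```

Try it again with p far from 1/2 and small n to see the approximation get worse.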
2.4: Nice short section, read all of it.
2.5: Start on Page 124 at Sampling Without Replacement and read to the end. Again a nice short section, but be careful when you do the problems - counting can be quite slippery.
3.1: Skim pages 140-152, through the box on page 152, then read carefully anything that seems unclear. In lecture we have done most of this material but without the context of the random variable. Skip the technical remark starting at the bottom of page 143. Next read the first paragraph under the heading Several Random Variables (page 153), and skim the definition and consequences of mutual independence of random variables on page 154. Verify that the messy formula for the multinomial distribution follows the same logic as for the binomial, and look over the discussion of symmetry on page 156.
3.2: This section is the key to much of the rest of the course. You must go through all of it, except the Gambling Interpretation on pages 165-166 and Expectation and Prediction on pages 178-179. The summary box on page 181 is crucial.
3.3: This is about the fundamental measure of dispersion, and like 3.2 it must be internalized deeply. Read everything except the Skewness section on page 198.
3.4: This formalizes some moves we've been making for a while, e.g. with the Poisson distribution. But the examples in this section are all in the context of the geometric distribution, which is the simplest of all the distributions on an infinite set. Skim the whole section.
3.5: The Poisson is familiar as an approximation to the binomial. Here it appears in its own right as a distribution. Read pages 222-224, then 226-227. (We will never cover the skew-normal approximation to the Poisson.) For the random scatter, read the assumptions in the boxes on page 229 and the statement of the theorem in the box on page 230. Then read Examples 2 and 3, and the note on Thinning. You don't have to read the proof of the Poisson Scatter Theorem on page 233.
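As a reminder of the Poisson's role as an approximation to the binomial, the sketch below compares the two probability functions term by term when n is large and p is small. The values n = 1000 and p = 0.002 and the function names are illustrative assumptions of mine.

```python
from math import comb, exp, factorial

def binom_pmf(n, p, k):
    """P(X = k) for X binomial (n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(mu, k):
    """P(N = k) for N Poisson with mean mu."""
    return exp(-mu) * mu**k / factorial(k)

# Illustrative case: many trials, small chance of success on each.
n, p = 1000, 0.002
mu = n * p  # the Poisson mean matches the binomial mean

# Largest pointwise gap between the two distributions over k = 0, ..., 20.
max_gap = max(abs(binom_pmf(n, p, k) - poisson_pmf(mu, k)) for k in range(21))

for k in range(6):
    print(k, round(binom_pmf(n, p, k), 5), round(poisson_pmf(mu, k), 5))
print("largest gap:", max_gap)
```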
3.6: This formalizes the symmetries that you saw in card shuffling earlier in the class. Go straight to Examples 1 and 2 on page 240 - you will find that you could have done them back in Chapter 2 after we talked about sampling without replacement. The main calculation is that of the mean and variance of the hypergeometric, pages 241-243.
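Once you have followed the calculation of the hypergeometric mean and variance, you can check the formulas numerically. The sketch below computes both straight from the distribution and compares them with the closed forms n(G/N) and n(G/N)(B/N)(N-n)/(N-1), where B = N - G. The symbols N, G, n and the card example are my own labels for illustration and may not match the notation in the text.

```python
from fractions import Fraction
from math import comb

def hypergeom_pmf(N, G, n, g):
    """P(g good in a sample of n drawn without replacement from N, of which G are good)."""
    return Fraction(comb(G, g) * comb(N - G, n - g), comb(N, n))

# Illustrative case: number of hearts in a 5-card hand from a standard deck.
N, G, n = 52, 13, 5
support = range(max(0, n - (N - G)), min(n, G) + 1)

# Mean and variance computed directly from the distribution.
mean = sum(g * hypergeom_pmf(N, G, n, g) for g in support)
second_moment = sum(g**2 * hypergeom_pmf(N, G, n, g) for g in support)
var = second_moment - mean**2

# Closed forms: same mean as the binomial; variance shrunk by (N-n)/(N-1).
mean_formula = Fraction(n * G, N)
var_formula = Fraction(n * G * (N - G) * (N - n), N * N * (N - 1))

print(mean == mean_formula, var == var_formula)
```

Exact fractions are used so that the comparison is an equality, not an approximation.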
4.1: Pages 260-271. Follow the examples closely. Spend some time comparing pages 262-263. Understanding the relationship between discrete distributions and continuous distributions (defined by a density) is crucial to building a bridge between what you've learned up to now and what we're doing in Ch 4.
4.2: Pages 278-290. Don't worry about Gamma Distribution for Non-Integer Shape Parameter or anything after that. Understanding the summary on pages 288-289 is important. Example 4 on page 290 is instructive because it shows how you can use gamma facts known from the Poisson process context in a setting where there appears to be no Poisson process.
4.4: Read the whole section (it's short) and follow the examples carefully. Try hard to wrap your head around the picture on page 305. Once you get straight what it means, the whole section makes sense.
4.5: We defined the c.d.f. very early, when we started 4.1, and used it for the standard normal distribution way before that. So skim pages 311-314, then go over Examples 1 and 2 and notice that we did Example 1 in class when we talked about the uniform density. The discussion of max and min on pages 316-318 will come as no surprise, given the similarity between the geometric and the exponential distributions. Pay close attention to the discussion of Simulation, starting at the bottom of page 320 and continuing to the end. It's not what I would call core material but it's pretty cool.
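The idea behind simulation via the c.d.f. can be tried out in a few lines. The sketch below is my own illustration, not the text's example: it turns uniform (0,1) random numbers into exponential ones by inverting the exponential c.d.f. F(x) = 1 - e^(-rate x). The rate 2, the seed, and the function name are assumptions chosen for the demonstration.

```python
import random
from math import log

def exponential_from_uniform(u, rate):
    """Invert F(x) = 1 - exp(-rate * x): solve u = F(x) for x."""
    return -log(1 - u) / rate

rng = random.Random(0)  # fixed seed so the run is reproducible
rate = 2.0

# rng.random() lies in [0, 1), so 1 - u is in (0, 1] and the log is defined.
samples = [exponential_from_uniform(rng.random(), rate) for _ in range(100_000)]

sample_mean = sum(samples) / len(samples)
print(sample_mean)  # should be close to 1/rate = 0.5
```

The same recipe works for any distribution whose c.d.f. you can invert.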
4.6: Nice short section, read it all. Remember that identifying a beta distribution is easy – the density has to look like x to a power times (1-x) to another power, for x between 0 and 1. The rest is just the constant that makes the density integrate to 1.
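To convince yourself that "the rest is just the constant", you can integrate x to a power times (1-x) to another power numerically and compare with the gamma-function formula for the constant. In the sketch below the parameter names r and s, the choice r = 2, s = 3, and the midpoint rule are my assumptions for illustration.

```python
from math import gamma

def beta_constant(r, s):
    """Integral of x^(r-1) (1-x)^(s-1) over (0,1), via the gamma function."""
    return gamma(r) * gamma(s) / gamma(r + s)

def integrate_kernel(r, s, steps=100_000):
    """Midpoint-rule approximation to the same integral."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += x ** (r - 1) * (1 - x) ** (s - 1)
    return total * h

r, s = 2, 3
numeric = integrate_kernel(r, s)
exact = beta_constant(r, s)  # Gamma(2)Gamma(3)/Gamma(5) = 2/24 = 1/12
print(numeric, exact)

# Dividing the kernel by this constant gives a density that integrates to 1.
```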
Chapter Summary: Everything on pages 332-333 except the sections on hazard rates.
Note: Gamma densities are sprinkled throughout the chapter, in particular in Sections 4.2 and 4.4; the gamma constant also appears in Section 4.6.
5.1: Easy but important; go through all the examples. This section gives you practice in representing events as regions in the plane.
5.2: This one has all the fundamentals, so read everything carefully. The three examples aren't overly hard, so I recommend attempting them on your own before reading the solution. It is important to compare the tables on pages 348-349 line by line. You will find that you already learned all the joint density facts in Chapter 3, provided you replace sums by integrals.
5.3: This is perhaps the most important joint distribution in statistics. Read pages 357-361 (we'll do most of it in lecture). Then read the result in the box on page 363. The result is crucial (sums of independent normals are normal) and simple to remember, even if you choose not to go through its derivation. Example 2 on page 364 involves an important technique.
5.4: We will cover distributions of sums, pages 371-382. The ratio example is great but everything that I ask you to do with ratios can be done without the density of the ratio, so I have omitted the density calculation.
Chapter Summary: Nice and short. At this point you should go through the Distribution Summaries (pages 476-478) and notice that you know all the distributions, apart from the bivariate normal which you will meet in Chapter 6. These summaries are a wonderful part of this text; you won't find this information so succinctly displayed elsewhere.
6.1: This is essentially just one example, to get you back into thinking about discrete joint distributions. Notice that, as with many examples in conditioning, it's easy to find conditional distributions if you go in "chronological order". For example, it's usually easy to find the conditional distribution of the number of heads (Y) given the number of coins (X). It takes more work to "go backwards in time", that is, to find the distribution of the number of coins given the number of heads. You need to renormalize by the probability of what's given; that is, you have to use the division rule. That's what this section is about.
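Here is a sketch of "going backwards in time" with the division rule. The setup is hypothetical and may differ from the example in the text: I assume the number of coins X is equally likely to be 1, 2, or 3, and that given X = x the number of heads Y is binomial (x, 1/2).

```python
from fractions import Fraction
from math import comb

# Assumed distribution of the number of coins X (hypothetical).
coins_dist = {x: Fraction(1, 3) for x in (1, 2, 3)}

def heads_given_coins(y, x):
    """Chronological order: P(Y = y | X = x) for x fair coins."""
    return Fraction(comb(x, y), 2**x)  # comb(x, y) is 0 when y > x

def coins_given_heads(y):
    """Backwards in time: P(X = x | Y = y), by the division rule."""
    # Multiplication rule gives the joint probabilities P(X = x, Y = y).
    joint = {x: coins_dist[x] * heads_given_coins(y, x) for x in coins_dist}
    # Renormalize by the probability of what's given, P(Y = y).
    total = sum(joint.values())
    return {x: joint[x] / total for x in joint}

for y in range(4):
    print(y, coins_given_heads(y))
```

For instance, seeing no heads makes one coin the most likely explanation, while seeing three heads forces X = 3.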
6.2: Conditional expectation is a powerful tool for finding expectations. The key is the box on Page 403. Skim pages 40-403, then read Examples 2 and 3. Then go to Page 406, which formalizes a natural idea.
6.3: Start with the box on page 417, in the context of coin-tossing. Read Example 3 and Problem 1 of Example 4. Now go over the boxes on pages 410 and 411, and then look at the calculations at the bottom of page 411 to reassure yourself that a conditional density is just an ordinary density, and can be used like any other density. The diagrams on page 412, and their companion text on page 413, are terrific for a geometric understanding of the division involved in the formula for the conditional density. Go on to Example 1 and follow it thoroughly. Then look at the box on page 416 and go over Example 2. Finally, compare pages 424 and 425. They should show you that everything you know about continuous conditioning is an extension of what you already knew about discrete conditioning.
6.4: We will first use covariance as a tool to find the variance of a sum. So start with pages 430-431, then jump to pages 441-444. Next comes correlation, which is the measure that gives some meaning to covariance. Go over the boxes on pages 432-433. Example 4 of the text is similar to one of the exercises on that page. Example 6 brings together all the techniques you have recently learned. It's well worth going through.
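The use of covariance to find the variance of a sum, Var(X+Y) = Var(X) + Var(Y) + 2Cov(X,Y), can be checked on a tiny example. The one below is my own illustration rather than one from the text: X and Y are the first and second of two draws without replacement from the tickets 1, 2, 3, so they are dependent and the covariance is negative.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical joint distribution: two draws without replacement from {1, 2, 3}.
pairs = list(permutations([1, 2, 3], 2))  # six equally likely (x, y) pairs
p = Fraction(1, len(pairs))

def expect(f):
    """Expectation of f(X, Y) under the joint distribution."""
    return sum(p * f(x, y) for x, y in pairs)

ex = expect(lambda x, y: x)
ey = expect(lambda x, y: y)
var_x = expect(lambda x, y: x * x) - ex**2
var_y = expect(lambda x, y: y * y) - ey**2
cov = expect(lambda x, y: x * y) - ex * ey

# Variance of the sum, computed two ways.
var_sum_direct = expect(lambda x, y: (x + y) ** 2) - expect(lambda x, y: x + y) ** 2
var_sum_formula = var_x + var_y + 2 * cov

print(cov, var_sum_direct, var_sum_formula)
```

If the draws were made with replacement, the covariance would be 0 and the variances would simply add.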
6.5: The bivariate (and multivariate) normal is the fundamental distribution of statistics. Read page 449, then the box on page 451. If you don't like the geometry of the construction of the bivariate normal, never mind (though pages 452-453 are among the best descriptions of the geometry at this level). But you must follow everything on pages 454-461. Much of it will be done in class exactly as in the text, but you must fill in the blanks.
Chapter Summary: This lists all the general formulas, but in my experience students understand these formulas much better in the context of specific examples. If a formula seems mysterious, an excellent exercise is to go through your notes and the text to find one specific example of the use of that formula.