8  Independence and Conditional Probability Practice

9 Practice Problems

9.1 Two Fair Coins

Alex and Bob each flip a fair coin twice. Use “1” to denote heads and “0” to denote tails. Let X be the maximum of the two numbers Alex gets, and let Y be the minimum of the two numbers Bob gets.

  1. Find and sketch the joint PMF \(p_{X,Y} (x, y)\).

  2. Find the marginal PMF \(p_X(x)\) and \(p_Y (y)\).

  3. Find the conditional PMF \(p_{X|Y}(x | y)\). Does \(p_{X|Y}(x | y) = p_X(x)\)? Why or why not?

x <- 0:1
px <- c(.25, .75)  # X = max of Alex's flips: P(X=0) = 1/4, P(X=1) = 3/4

y <- 0:1
py <- c(.75, .25)  # Y = min of Bob's flips: P(Y=0) = 3/4, P(Y=1) = 1/4

pmf <- px %*% t(py)  # X and Y are independent, so the joint PMF is the outer product
rownames(pmf) <- x
colnames(pmf) <- y
addmargins(pmf)
         0      1  Sum
0   0.1875 0.0625 0.25
1   0.5625 0.1875 0.75
Sum 0.7500 0.2500 1.00
prop.table(pmf, 2)
     0    1
0 0.25 0.25
1 0.75 0.75

Each column gives P(X=x|Y=y) for one value of y. The columns are identical and equal the marginal PMF of X because X and Y are independent: they are functions of different coins.
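As a check, we can simulate the flips directly (a quick sketch; the variable names are mine):

a1 <- rbinom(10000, 1, .5); a2 <- rbinom(10000, 1, .5)  # Alex's two flips
b1 <- rbinom(10000, 1, .5); b2 <- rbinom(10000, 1, .5)  # Bob's two flips
prop.table(table(pmax(a1, a2), pmin(b1, b2)))  # should approximate the joint PMF above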

9.2 Two Fair Dice

Two fair dice are rolled. Find the joint PMF of \(X\) and \(Y\) when

  1. \(X\) is the larger value rolled, and \(Y\) is the sum of the two values.
die1 <- rep(1:6, times = 6)
die2 <- rep(1:6, each = 6)  # all 36 equally likely outcomes
outcomes <- data.frame(die1, die2)
outcomes$x <- pmax(die1, die2)  # larger value rolled
outcomes$y <- die1 + die2       # sum of the two values
pmf <- prop.table(table(outcomes$x, outcomes$y))
pmf
   
             2          3          4          5          6          7
  1 0.02777778 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
  2 0.00000000 0.05555556 0.02777778 0.00000000 0.00000000 0.00000000
  3 0.00000000 0.00000000 0.05555556 0.05555556 0.02777778 0.00000000
  4 0.00000000 0.00000000 0.00000000 0.05555556 0.05555556 0.05555556
  5 0.00000000 0.00000000 0.00000000 0.00000000 0.05555556 0.05555556
  6 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.05555556
   
             8          9         10         11         12
  1 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
  2 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
  3 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
  4 0.02777778 0.00000000 0.00000000 0.00000000 0.00000000
  5 0.05555556 0.05555556 0.02777778 0.00000000 0.00000000
  6 0.05555556 0.05555556 0.05555556 0.05555556 0.02777778
  2. \(X\) is the smaller, and \(Y\) is the larger value rolled.
die1 <- rep(1:6, times = 6)
die2 <- rep(1:6, each = 6)
outcomes <- data.frame(die1, die2)
outcomes$x <- pmin(die1, die2)  # smaller value rolled
outcomes$y <- pmax(die1, die2)  # larger value rolled
pmf <- prop.table(table(outcomes$x, outcomes$y))
pmf
   
             1          2          3          4          5          6
  1 0.02777778 0.05555556 0.05555556 0.05555556 0.05555556 0.05555556
  2 0.00000000 0.02777778 0.05555556 0.05555556 0.05555556 0.05555556
  3 0.00000000 0.00000000 0.02777778 0.05555556 0.05555556 0.05555556
  4 0.00000000 0.00000000 0.00000000 0.02777778 0.05555556 0.05555556
  5 0.00000000 0.00000000 0.00000000 0.00000000 0.02777778 0.05555556
  6 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.02777778

9.3 Two Normal RVs

Let X and Y be zero-mean, unit-variance independent Gaussian random variables. Find the value of r for which the probability that \((X, Y )\) falls inside a circle of radius r is 1/2.

x <- rnorm(10000)
y <- rnorm(10000)

r <- seq(1.1, 1.2, .005)
p <- numeric(length(r))
for (i in seq_along(r)) {
  p[i] <- mean(sqrt(x^2 + y^2) <= r[i])
}
data.frame(r,p)
       r      p
1  1.100 0.4480
2  1.105 0.4521
3  1.110 0.4549
4  1.115 0.4568
5  1.120 0.4596
6  1.125 0.4619
7  1.130 0.4646
8  1.135 0.4678
9  1.140 0.4708
10 1.145 0.4736
11 1.150 0.4770
12 1.155 0.4798
13 1.160 0.4829
14 1.165 0.4861
15 1.170 0.4883
16 1.175 0.4905
17 1.180 0.4934
18 1.185 0.4975
19 1.190 0.5014
20 1.195 0.5052
21 1.200 0.5078
# X^2 + Y^2 ~ Chisq(2),
# so the square root of the median (50th percentile) of that distribution is the answer
sqrt(qchisq(.5, 2))
[1] 1.17741
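Analytically, \(X^2+Y^2 \sim \chi^2_2\) is the exponential distribution with mean 2, so \(P(\sqrt{X^2+Y^2} \le r) = 1-e^{-r^2/2}\). Setting this equal to \(1/2\) gives \(r=\sqrt{2\ln 2}\approx 1.1774\), consistent with both the simulation and qchisq.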

9.4 Uniform Random Angle

Let \(\Theta \sim \text{Uniform}[0, 2\pi]\).

  1. If \(X = \cos \Theta\) and \(Y = \sin \Theta\), are \(X\) and \(Y\) uncorrelated?

Yes, they are uncorrelated: \(E[X]=E[Y]=0\) and \(E[XY]=E[\cos\Theta\sin\Theta]=\tfrac12 E[\sin 2\Theta]=0\), so the covariance is zero. Geometrically, \((x,y)\) can be any point on the circumference of a circle of radius 1 with uniform likelihood. However, \(X\) and \(Y\) are not independent. If we know the value of \(Y\), for example, there are only 2 possible values of \(X\).

thetas <- runif(10000, 0, 2*pi)
cor(cos(thetas), sin(thetas))
[1] 0.001023
  2. If \(X = \cos(\Theta/4)\) and \(Y = \sin(\Theta/4)\), are \(X\) and \(Y\) uncorrelated?

In this case \(\Theta/4\) ranges over \([0, \pi/2]\), so \((x,y)\) can only be found on the portion of the unit circle in the first quadrant. \(X\) and \(Y\) are going to be negatively correlated, since that arc slopes downward: larger values of \(X\) pair with smaller values of \(Y\).

cor(cos(thetas/4), sin(thetas/4))
[1] -0.918208
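This matches the exact correlation. Writing \(U=\Theta/4 \sim \text{Uniform}[0,\pi/2]\), we have \(E[\cos U]=E[\sin U]=2/\pi\), \(E[\cos^2 U]=E[\sin^2 U]=1/2\), and \(E[XY]=\tfrac12 E[\sin 2U]=1/\pi\), so

\(\rho = \dfrac{1/\pi - 4/\pi^2}{1/2 - 4/\pi^2} \approx -0.9182\)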

9.5 Signal and Noise

Let \(Y = X+N\), where \(X\) is the input, \(N\) is the noise, and \(Y\) is the output of a system. Assume that \(X\) and \(N\) are independent random variables. It is given that \(E[X] = 0\), \(Var[X] = \sigma^2_X\), \(E[N] = 0\), and \(Var[N] = \sigma^2_N\).

  1. Find the correlation coefficient \(\rho\) between the input \(X\) and the output \(Y\) .

\(\rho(X,Y)= \dfrac{Cov(X,Y)}{SD(X)SD(Y)}=\dfrac{E(XY)-E(X)E(Y)}{\sigma_X \cdot \sqrt{\sigma^2_X+\sigma^2_N}}\)

\(E(XY)=E(X^2+XN)=E(X^2)+E(X)E(N)\), using the independence of \(X\) and \(N\). Because \(E(X)=E(N)=0\), this simplifies to

\(\rho(X,Y)=\dfrac{E(X^2)}{\sigma_X\sqrt{\sigma^2_X+\sigma^2_N}}\)

\(Var(X)=E(X^2)-E(X)^2 = E(X^2)\) so we can replace the numerator with \(\sigma^2_X\). So

\(\rho(X,Y) = \sqrt{\dfrac{\sigma^2_X}{\sigma^2_X+\sigma^2_N}}\)

#example: sigma_X = 5, sigma_N=2
X <- rnorm(10000, 0, 5)
N <- rnorm(10000, 0, 2)
cor(X, X+N)
[1] 0.9292393
sqrt(5^2/(5^2+2^2))
[1] 0.9284767
  2. Suppose we estimate the input \(X\) by a linear function \(g(Y) = aY\). Find the value of \(a\) that minimizes the mean squared error \(E[(X − aY)^2]\).

\(E[(X-aY)^2]=E[X^2-2aXY+a^2Y^2]=E[X^2]-2aE[XY]+a^2E[Y^2]\)

Because \(E(Y)=0\), \(E[Y^2]=Var(Y)=\sigma_X^2+\sigma_N^2\). We already have that \(E(X^2)=\sigma_X^2\) and \(E(XY)=\sigma_X^2\). Thus

\(E[(X-aY)^2]=(\sigma_X^2+\sigma_N^2)a^2- 2\sigma_X^2a+\sigma_X^2\)

This is a quadratic function in \(a\), and the vertex can be found at \(-\frac{B}{2A}\) or in this case \(a^*=\dfrac{\sigma_X^2}{\sigma_X^2+\sigma_N^2}\)

  3. Express the resulting mean squared error in terms of \(\eta = \sigma^2_X/\sigma^2_N\).

Plugging this in for \(a\) we get

\(E[(X-aY)^2]=(\sigma_X^2+\sigma_N^2)\dfrac{\sigma_X^4}{(\sigma_X^2+\sigma_N^2)^2}-2\dfrac{\sigma_X^4}{\sigma_X^2+\sigma_N^2}+\sigma_X^2=\sigma_X^2\left(1-\dfrac{\sigma_X^2}{\sigma_X^2+\sigma_N^2}\right)\)

\(=\sigma_X^2\left(\dfrac{\sigma_N^2}{\sigma_X^2+\sigma_N^2}\right)=\dfrac{\sigma_X^2}{\eta+1}\)
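As a numeric check, we can reuse the \(\sigma_X=5\), \(\sigma_N=2\) example from part 1, where \(a^*=25/29\approx 0.862\) and the minimal MSE is \(25/(25/4+1)\approx 3.448\) (a sketch; the no-intercept regression is just one way to estimate \(a^*\)):

X <- rnorm(10000, 0, 5)
N <- rnorm(10000, 0, 2)
Y <- X + N
coef(lm(X ~ Y - 1))      # least-squares slope, should be near a* = 25/29
mean((X - (25/29)*Y)^2)  # empirical MSE, should be near 3.448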

9.6 Cat Genetics

The gene that controls white coat color in cats, KIT, is known to be responsible for multiple phenotypes such as deafness and blue eye color. A dominant allele W at one location in the gene has complete penetrance for white coat color; all cats with the W allele have white coats. There is incomplete penetrance for blue eyes and deafness; not all white cats will have blue eyes and not all white cats will be deaf. However, deafness and blue eye color are strongly linked, such that white cats with blue eyes are much more likely to be deaf. The variation in penetrance for eye color and deafness may be due to other genes as well as environmental factors.

  • Suppose that 30% of white cats have one blue eye, while 10% of white cats have two blue eyes.

  • About 73% of white cats with two blue eyes are deaf.

  • 40% of white cats with one blue eye are deaf.

  • Only 19% of white cats with other eye colors are deaf.

  1. Calculate the prevalence of deafness among white cats.
n.blue <- c(0,1,2)
p.n.blue <- c(.60, .30, .10)

p.deaf.given.b <- c(.19, .40, .73) #for 0, 1 ,2 blue eyes

#P(Deafness) = P(0b)*P(deaf|0b) + P(1b)*P(deaf|1b) + P(2b)*P(deaf|2b)
(p.deaf <- sum(p.n.blue * p.deaf.given.b))
[1] 0.307
#Check
nCats <- 10000
nblue <- sample(n.blue, size=nCats, replace=TRUE, prob=p.n.blue)
isdeaf <- logical(nCats)
for(i in 1:nCats){
  isdeaf[i] <- runif(1) < p.deaf.given.b[nblue[i]+1]
}
mean(isdeaf)
[1] 0.3042
  2. Given that a white cat is deaf, what is the probability that it has two blue eyes?
#P(2b | deaf) = P(2b)*P(deaf|2b) / p(deaf)
p.n.blue[3] * p.deaf.given.b[3] / p.deaf
[1] 0.237785
#check
mean(nblue[isdeaf]==2)
[1] 0.2307692
  3. Suppose that deaf, white cats have an increased chance of being blind, but that the prevalence of blindness differs according to eye color. While deaf, white cats with two blue eyes or two non-blue eyes have probability 0.20 of developing blindness, deaf and white cats with one blue eye have probability 0.40 of developing blindness. White cats that are not deaf have probability 0.10 of developing blindness, regardless of their eye color. What is the prevalence of blindness among deaf, white cats?

p.blind.given.nodeaf <- 0.10
p.blind.given.deaf.and.nblue <- c(0.20, 0.4, 0.2) #for 0, 1, 2 blue eyes

#P(blind & deaf) = P(0b)*P(deaf|0b)*P(blind|deaf&0b)+
#  P(1b)*P(deaf|1b)*P(blind|deaf&1b)+
#  P(2b)*P(deaf|2b)*P(blind|deaf&2b)
p.blind.and.deaf <- sum(p.n.blue * p.deaf.given.b * p.blind.given.deaf.and.nblue)

#P(blind | deaf ) = P(blind & deaf) / P(deaf)
(p.blind.given.deaf = p.blind.and.deaf / p.deaf)
[1] 0.2781759
#check
isBlind <- logical(nCats)
for(i in 1:nCats){
  if(!isdeaf[i]){
    isBlind[i] <- runif(1) < p.blind.given.nodeaf
  } else {
    isBlind[i] <- runif(1) < p.blind.given.deaf.and.nblue[nblue[i]+1]
  }
}
mean(isBlind[isdeaf])
[1] 0.277449
  4. What is the prevalence of blindness among white cats?
#P(blind) = P(deaf & blind) + P(nondeaf & blind)

#P(nondeaf & blind ) = P(nondeaf) * P(blind | nondeaf)
p.blind.and.nondeaf <- (1-p.deaf)*p.blind.given.nodeaf

(p.blind <- p.blind.and.deaf + p.blind.and.nondeaf)
[1] 0.1547
#check
mean(isBlind)
[1] 0.1568
  5. Given that a cat is white and blind, what is the probability that it has two blue eyes?
#P(2b | blind) = P(2b & blind) / p(blind)
#numerator: P(2b & blind) = P(2b) * [P(deaf|2b) * p(blind|deaf & 2b) + P(nondeaf|2b) * p(blind|nondeaf & 2b)]
p.blind.and.2b <- p.n.blue[3] * (p.deaf.given.b[3]*p.blind.given.deaf.and.nblue[3] + 
 (1-p.deaf.given.b[3]) * p.blind.given.nodeaf)

(p.2b.given.blind = p.blind.and.2b / p.blind)
[1] 0.1118293
#check
mean(nblue[isBlind]==2)
[1] 0.1090561

9.7 GSS Survey I

Linda is 31 years old, single, outspoken, and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in anti-nuclear demonstrations. Which is more probable?

  1. Linda is a banker.

  2. Linda is a banker and considers herself a liberal Democrat.

To answer this question we will use data from the GSS survey found at https://github.com/AllenDowney/ThinkBayes2/raw/master/data/gss_bayes.csv.

The code for “Banking and related activities” in the indus10 variable is 6870. The values of the column sex are encoded like this:

1: Male, 2: Female

The values of polviews are on a seven-point scale:

1 Extremely liberal
2 Liberal
3 Slightly liberal
4 Moderate
5 Slightly conservative
6 Conservative
7 Extremely conservative

Define “liberal” as anyone whose political views are 3 or below. The values of partyid are encoded:

0 Strong democrat
1 Not strong democrat
2 Independent, near democrat
3 Independent
4 Independent, near republican
5 Not strong republican
6 Strong republican
7 Other party

You need to compute:

  1. The probability that Linda is a female banker,
  2. The probability that Linda is a liberal female banker, and
  3. The probability that Linda is a liberal female banker and a Democrat.
gss <- read.csv("https://github.com/AllenDowney/ThinkBayes2/raw/master/data/gss_bayes.csv")
#banker : indus10==6870
#female : sex == 2
#liberal: polviews <= 3
#democrat: partyid <= 1

#In my reading, 'Linda is female' is given from the context, so I condition on it throughout

#P(banker | female) 
mean(gss[gss$sex==2,]$indus10==6870)
[1] 0.02116103
#P(banker | female & liberal)
mean(gss[gss$sex==2 & gss$polviews <=3,]$indus10==6870)
[1] 0.01723195
#P(banker | female & liberal & Democrat)
mean(gss[gss$sex==2 & gss$polviews <=3 & gss$partyid <= 1,]$indus10==6870)
[1] 0.01507289
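Since each added condition can only shrink a joint probability, a banker who is also a liberal Democrat is necessarily less probable than a banker: option 1 is more probable, and ranking option 2 higher is the classic conjunction fallacy.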

9.8 GSS Survey II

Compute the following probabilities:

  1. What is the probability that a respondent is liberal, given that they are a Democrat?
mean(gss[gss$partyid<=1,]$polviews<=3)
[1] 0.389132
  2. What is the probability that a respondent is a Democrat, given that they are liberal?
mean(gss[gss$polviews<=3,]$partyid<=1)
[1] 0.5206403

9.9 GSS Survey III

There’s a famous quote about young people, old people, liberals, and conservatives that goes something like:

If you are not a liberal at 25, you have no heart. If you are not a conservative at 35, you have no brain.

Whether you agree with this proposition or not, it suggests some probabilities we can compute as an exercise. Rather than use the specific ages 25 and 35, let’s define young and old as under 30 or over 65. For these thresholds, I chose round numbers near the 20th and 80th percentiles. Depending on your age, you may or may not agree with these definitions of “young” and “old”.

I’ll define conservative as someone whose political views are “Conservative”, “Slightly Conservative”, or “Extremely Conservative”.

Compute the following probabilities. For each statement, think about whether it is expressing a conjunction, a conditional probability, or both. For the conditional probabilities, be careful about the order of the arguments. If your answer to the last question is greater than 30%, you have it backwards!

  1. What is the probability that a randomly chosen respondent is a young liberal?
mean(gss$age <30 & gss$polviews <=3)
[1] 0.06579428
  2. What is the probability that a young person is liberal?
mean(gss[gss$age <30,]$polviews <=3)
[1] 0.3385177
  3. What fraction of respondents are old conservatives?
mean(gss$age >65 & gss$polviews >= 5)
[1] 0.06226415
  4. What fraction of conservatives are old?
mean(gss[gss$polviews >=5,]$age >65)
[1] 0.1820933

9.10 Two Child Paradox

Suppose you meet someone and learn that they have two children. You ask if either child is a girl and they say yes. What is the probability that both children are girls? (Hint: Start with four equally likely hypotheses.)

Before we know anything about their two kids, the number of girls \(X\) they have could be 0, 1, or 2. The four equally likely hypotheses (BB, BG, GB, GG) amount to assuming each kid has a 50% chance of being a girl, independently. Thus the probabilities are 0.25, 0.5, and 0.25 respectively.

If we learn that at least one of the kids is a girl, the first possibility, \(X=0\), is ruled out. Thus \(P(X=2 | X>0) = \dfrac{.25}{.5+.25}= \frac{1}{3}\).
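A quick simulation check (a sketch, assuming each child is independently a girl with probability 1/2):

kids <- matrix(sample(c("G","B"), 2*10000, replace=TRUE), ncol=2)
ngirls <- rowSums(kids == "G")
mean(ngirls[ngirls > 0] == 2)  # about 1/3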

9.11 Monty Hall

There are many variations of the Monty Hall problem. For example, suppose Monty always chooses Door 2 if he can, and only chooses Door 3 if he has to (because the car is behind Door 2).

  1. If you choose Door 1 and Monty opens Door 2, what is the probability the car is behind Door 3?
#3 equally likely possibilities:
#C1  -> Monty chooses Door 2
#C2  -> Monty cannot choose Door 2, chooses Door 3
#C3  -> Monty chooses Door 2

#So if Monty chooses Door 2, either C1 or C3 must be the case, each equally likely. 
#Thus P(C3 | Monty chooses 2) = .50
  2. If you choose Door 1 and Monty opens Door 3, what is the probability the car is behind Door 2?
#If he chose Door 3, that means that it must be the case that the car is behind 
#door 2; he would *ONLY* choose door 3 in that case.

#Thus P(C2 | Monty chooses 3) = 1.0
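A simulation of this variant (a sketch; it assumes, as in the problem, that you always pick Door 1):

n <- 10000
car <- sample(1:3, n, replace=TRUE)
opens <- ifelse(car == 2, 3, 2)  # Monty opens Door 2 unless the car forces Door 3
mean(car[opens == 2] == 3)  # P(car behind 3 | Monty opens 2), about 0.5
mean(car[opens == 3] == 2)  # P(car behind 2 | Monty opens 3), exactly 1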

9.12 M&Ms

M&M’s are small candy-coated chocolates that come in a variety of colors. Mars, Inc., which makes M&M’s, changes the mixture of colors from time to time. In 1995, they introduced blue M&M’s.

  • In 1994, the color mix in a bag of plain M&M’s was 30% Brown, 20% Yellow, 20% Red, 10% Green, 10% Orange, 10% Tan.

  • In 1996, it was 24% Blue , 20% Green, 16% Orange, 14% Yellow, 13% Red, 13% Brown.

Suppose a friend of mine has two bags of M&M’s, and he tells me that one is from 1994 and one from 1996. He won’t tell me which is which, but he gives me one M&M from each bag. One is yellow and one is green. What is the probability that the yellow one came from the 1994 bag?

(Hint: The trick to this question is to define the hypotheses and the data carefully.)

\(P(G|94) = .10\), \(P(G|96) = .20\), \(P(Y|94) = .20\), \(P(Y|96) = .14\), and \(P(94)=P(96)=.50\), assuming the two bags are a priori equally likely. It’s important to realize only one of two situations could have occurred:

  • Situation 1: A G was chosen from the 94 bag and a Y was chosen from the 96 bag
  • Situation 2: A Y was chosen from the 94 bag and a G was chosen from the 96 bag.

The corresponding probabilities are \((.10)(.14)=.014\) and \((.20)(.20)=.04\). The question could be stated: what is the probability that Situation 2 occurred (the yellow one came from the 1994 bag), given that either Situation 1 or Situation 2 occurred?

The posterior probability is \(.04 / (.014+.04) = 20/27 \approx .7407\).
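A simulation check (a sketch; the color abbreviations are mine):

n <- 100000
bag94 <- sample(c("Br","Y","R","G","O","Tan"), n, replace=TRUE, prob=c(.30,.20,.20,.10,.10,.10))
bag96 <- sample(c("Bl","G","O","Y","R","Br"), n, replace=TRUE, prob=c(.24,.20,.16,.14,.13,.13))
keep <- (bag94=="Y" & bag96=="G") | (bag94=="G" & bag96=="Y")
mean(bag94[keep] == "Y")  # about 20/27 = 0.741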

9.13 Two Coins

Suppose you have two coins in a box. One is a normal coin with heads on one side and tails on the other, and one is a trick coin with heads on both sides. You choose a coin at random and see that one of the sides is heads. What is the probability that you chose the trick coin?

This is actually similar to the “family with two girls” problem. The equally likely coins are “HT” and “HH”.

\(P(Heads | HH) = 1\) and \(P(Heads | HT) = .5\).

Bayes’ Theorem tells us that \(P(HH | Heads) = \dfrac{P(HH)P(Heads|HH)}{P(HH)P(Heads|HH)+P(HT)P(Heads|HT)}=\dfrac{.5}{.5+.25}=\frac23\)

This seems counterintuitive! If you pick a random coin from the two, you already know that one of the sides is a head without looking at it. Somehow, when you look at just one side of the coin, seeing an H makes you 67% sure that it is the trick coin. The reason is that seeing the head side is like flipping the coin once and getting an H. That outcome is less likely with the fair coin, so it lends evidence to the “trick coin” hypothesis.
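A simulation check (a sketch):

n <- 100000
coin <- sample(c("HH","HT"), n, replace=TRUE)
side <- ifelse(coin == "HH", "H", sample(c("H","T"), n, replace=TRUE))  # the side we see
mean(coin[side == "H"] == "HH")  # about 2/3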

9.14 Online Sales I

(From Introduction to Probability and Statistics for Data Science) Customers can buy one of four products, each having its own web page with a “buy” link. When they click on the link, they are redirected to a common page containing a registration and payment form. Once there, the customer either buys the desired product (labeled 1, 2, 3 or 4) or fails to complete the purchase and the sale is lost. Let event \(A_i\) be that a customer is on product \(i\)’s web page, and let \(B\) be the event that the customer buys the product. For the purposes of this problem, assume that each potential customer visits at most one product page and so buys at most one product. For the probabilities shown in the table below, find the probability that a customer buys a product.

Product \((i)\)   \(Pr[B|A_i]\)   \(Pr[A_i]\)
1   \(0.72\)   \(0.14\)
2   \(0.90\)   \(0.04\)
3   \(0.51\)   \(0.11\)
4   \(0.97\)   \(0.02\)
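Since \(B\) can occur only together with one of the \(A_i\), the law of total probability gives \(Pr[B]=\sum_i Pr[A_i]\,Pr[B|A_i]\). A minimal R sketch using the table values:

p.A <- c(.14, .04, .11, .02)          # Pr[A_i]
p.B.given.A <- c(.72, .90, .51, .97)  # Pr[B|A_i]
(p.B <- sum(p.A * p.B.given.A))       # 0.2123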

9.15 Online Sales II

Continuing from the previous problem, if a random purchase is selected, find the probability that it was item 1 that was purchased. Do the same for items 2, 3 and 4.
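By Bayes’ theorem, \(Pr[A_i | B] = Pr[A_i]\,Pr[B|A_i]/Pr[B]\). A sketch, continuing the code from the previous problem:

p.A * p.B.given.A / p.B  # about 0.475, 0.170, 0.264, 0.091 for items 1-4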

9.16 UAV

Consider a new type of commercial unmanned aerial vehicle (UAV) that has been outfitted with a transponder so that if it crashes it can easily be found and reused. Other, older UAVs do not have transponders. Eighty percent of all UAVs are recovered and, of those recovered, 75% have a transponder. Further, of those not recovered, 90% do not have a transponder. Denote recovery as \(R+\) and failure to recover as \(R-\). Denote having a transponder as \(T+\) and not having a transponder as \(T-\). Find

  1. \(Pr[T+]\)
  2. The probability of not recovering a UAV given that it has a transponder.
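A sketch of the computation: part 1 is the law of total probability, and part 2 follows from Bayes’ theorem:

p.Rplus <- 0.80
p.T.given.Rplus <- 0.75
p.T.given.Rminus <- 1 - 0.90  # 90% of unrecovered UAVs lack a transponder
(p.T <- p.Rplus*p.T.given.Rplus + (1-p.Rplus)*p.T.given.Rminus)  # Pr[T+] = 0.62
(1-p.Rplus)*p.T.given.Rminus / p.T  # Pr[R- | T+], about 0.032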

9.17 Disease Screening

Suppose that the test for a disease has sensitivity 0.90 and specificity 0.999. The base rate for the disease is 0.002. Find

  1. The probability that someone selected at random from the population tests positive.

  2. The probability that the person has the disease given a positive test result (the positive predictive value).

  3. The probability that the person does not have the disease given a negative test result (the negative predictive value).
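A sketch of the standard Bayes computation for these numbers:

sens <- 0.90; spec <- 0.999; base <- 0.002
(p.pos <- base*sens + (1-base)*(1-spec))  # P(positive test) = 0.002798
base*sens / p.pos  # PPV, about 0.643
(1-base)*spec / (base*(1-sens) + (1-base)*spec)  # NPV, about 0.9998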

9.18 Two random variables

Suppose

10 Beyond STAT 340

These problems are excellent practice but they are beyond the material we cover in STAT 340.

10.1 Variance of a RV from the pmf

Let \(X\) be a random variable with pmf \(p_k = 1/2^k\) for \(k=1,2,\ldots\).

  1. Find \(\text{Var}X\).

The variance is \(E(X^2)-(EX)^2\); since \(EX=\sum_k k/2^k=2\), this is \(E(X^2)-4\). The expected value of \(X^2\) can be derived, though it’s not so fun…

Start by taking the identity \(\sum_k kp^k = \frac{p}{(1-p)^2}\) and differentiating again with respect to \(p\). We get \(\sum_k k^2 p^{k-1} = \frac{(1-p)^2+2p(1-p)}{(1-p)^4}\). Multiply through by \(p\) to get \(\sum_k k^2 p^k = \frac{p(1-p)^2+2p^2(1-p)}{(1-p)^4}\). If we let \(p=\frac12\) we have found \(E(X^2)=\sum_{k=1}^\infty k^2(\frac12)^k=\dfrac{\frac18+\frac{2}{8}}{\frac{1}{16}}=6\). Thus \(Var(X)=E(X^2)-E(X)^2 = 6-4=2\).

It should be noted that this random variable is actually a geometric random variable (according to the “number of trials up to and including the first success” definition). If we define \(Y\sim Geom(.5)\) using our definition of “number of failures before the first success”, then we can let \(X=Y+1\). Then \(E(X)=E(Y+1)=\frac{1-.5}{.5}+1=2\) and \(Var(X)=Var(Y)=\frac{1-p}{p^2}=\frac{.5}{.25}=2\).
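A quick simulation check, using the fact that R’s rgeom counts failures before the first success:

x <- rgeom(100000, 0.5) + 1  # X = number of trials including the first success
mean(x)  # about 2
var(x)   # about 2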

10.2 Variance of a Uniform RV

Calculate the variance of \(X \sim \text{Unif}(a,b)\). (Hint: First calculate \(\mathbb{E}X^2\))
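A sketch of the computation: \(\mathbb{E}X^2 = \int_a^b \frac{x^2}{b-a}\,dx = \frac{a^2+ab+b^2}{3}\), so \(\text{Var}(X) = \frac{a^2+ab+b^2}{3} - \left(\frac{a+b}{2}\right)^2 = \frac{(b-a)^2}{12}\). A numeric check with (arbitrary) \(a=2\), \(b=5\), where \((b-a)^2/12 = 0.75\):

var(runif(1000000, 2, 5))  # about 0.75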

10.3 Expectation and Binomial

If \(X \sim \text{Binom}(n,p)\) show that \(\mathbb{E}X(X-1)=n(n-1)p^2\).

We can expand the product as \(\mathbb{E}(X^2-X)\) and split it into two expected values: \(\mathbb{E}X^2 - \mathbb{E}X = \mathbb{E}X^2-\mu\). Recall that \(Var(X)=\mathbb{E}X^2-\mu^2\), so \(\mathbb{E}X^2=Var(X)+\mu^2\). For a binomial, \(Var(X)=np(1-p)\) and \(\mu=np\). Thus we have

\(\mathbb{E}X^2 - \mu=[np(1-p) + n^2p^2] - np = np\left(1-p+np-1\right)\)

Tidying up a little bit we get \(np(np-p)=np^2(n-1)\), and we’re done.
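A quick numeric check with (arbitrary) \(n=10\) and \(p=0.3\), where \(n(n-1)p^2 = 8.1\):

x <- rbinom(100000, 10, 0.3)
mean(x * (x - 1))  # about 8.1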

10.4 Variance Formula

Show that \(\mathbb{E}(X-\mu)^2 = \mathbb{E}X^2-\mu^2\). Hint: expand the quantity \((X-\mu)^2\) and distribute the expectation over the resulting terms.

The proof goes like this: We first FOIL \((X-\mu)^2\):

\(\mathbb{E}(X-\mu)^2 = \mathbb{E}(X^2 - 2\mu X + \mu^2)\)

We next split the expected value into 3 expected values using the fact that \(\mathbb{E}\) is a linear operator.

\(\mathbb{E}(X-\mu)^2 = \mathbb{E}X^2 -2\mu \mathbb{E}X + \mathbb{E}\mu^2\)

We next observe that \(\mu^2\) is constant and \(\mathbb{E}X=\mu\):

\(\mathbb{E}(X-\mu)^2 = \mathbb{E}X^2 -2\mu \mu + \mu^2\)

Simplifying, \(\mathbb{E}(X-\mu)^2 = \mathbb{E}X^2 - 2\mu^2 + \mu^2 = \mathbb{E}X^2 - \mu^2\), and we’re done!