1. Note that P(u_j ≤ X_j ≤ v_j) = e^{−λu_j} − e^{−λv_j}, and f_{X_j}(x | u_j ≤ X_j ≤ v_j) = λe^{−λx}/{e^{−λu_j} − e^{−λv_j}}. Hence
X̃_j(λ) ≡ E_λ(X_j | X_j ∈ [u_j, v_j]) = 1/λ + {u_j e^{−λu_j} − v_j e^{−λv_j}}/{e^{−λu_j} − e^{−λv_j}}.
The log-likelihood function based on the full sample is
l(λ) ≡ l(λ; X_1, · · · , X_n) = n log λ − λ Σ_{j=1}^n X_j,
which yields the MLE based on the full sample: λ̂(X_1, · · · , X_n) = n / Σ_{1≤j≤n} X_j.
Now the E-step is
Q(λ) = E_{λ_0}{l(λ) | X_j ∈ [u_j, v_j] for m < j ≤ n} = n log λ − λ Σ_{i=1}^m X_i − λ Σ_{j=m+1}^n X̃_j(λ_0),
and the M-step is simply
λ_1 = n / {Σ_{i=1}^m X_i + Σ_{j=m+1}^n X̃_j(λ_0)}.
The EM algorithm iterates the E-step and the M-step with, for example, the initial value λ_0 = m/(X_1 + · · · + X_m).
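To make the iteration concrete, here is a minimal sketch in Python/NumPy (an assumed choice of language; the names cond_mean and em_exponential are illustrative, not part of the exercise). It assumes the first m observations x_obs are fully observed and the remaining n − m are only known to lie in the intervals [u_j, v_j]:

    import numpy as np

    def cond_mean(lam, u, v):
        # E_lam(X | X in [u, v]) for X ~ Exp(rate=lam): the formula for X~_j(lam) above
        eu, ev = np.exp(-lam * u), np.exp(-lam * v)
        return 1.0 / lam + (u * eu - v * ev) / (eu - ev)

    def em_exponential(x_obs, u, v, n_iter=200, tol=1e-10):
        # x_obs: the m fully observed values; u, v: interval endpoints for the rest
        n = len(x_obs) + len(u)
        lam = len(x_obs) / np.sum(x_obs)  # initial value lam_0 = m/(X_1 + ... + X_m)
        for _ in range(n_iter):
            imputed = cond_mean(lam, u, v)                    # E-step
            lam_new = n / (np.sum(x_obs) + np.sum(imputed))   # M-step
            if abs(lam_new - lam) < tol:
                break
            lam = lam_new
        return lam

Each E-step replaces the censored observations by their conditional means under the current λ, and each M-step is the closed-form maximiser of Q(λ).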
2. (a) Note l(p) = X log p + (n − X) log(1 − p), s(p) = X/p − (n − X)/(1 − p), and ṡ(p) = −X/p² − (n − X)/(1 − p)², where X = Σ_{j=1}^n X_j. Hence the Fisher information is
I(p) = −E_p{ṡ(p)} = n/p + n/(1 − p) = n/{p(1 − p)}.
The C–R lower bound for the variance of an unbiased estimator of θ(= p²) is (dθ/dp)²/I(p) = 4p³(1 − p)/n.
(b) Note L(p) = Π_{j=1}^n p^{X_j}(1 − p)^{1−X_j}. This yields p̂ = X/n. Hence θ̂ = (p̂)² = X²/n².
(c) Note
E_p(X²) = Σ_{i=1}^n E_p(X_i²) + Σ_{1≤i≠j≤n} E_p(X_i X_j) = nE_p(X_1²) + (n² − n)E_p(X_1 X_2) = np + (n² − n)p².
Hence E_p(θ̂) = E_p(X²)/n² = p² + p(1 − p)/n ≠ p², i.e. θ̂ is a biased estimator for θ with bias p(1 − p)/n.
(d) We draw a bootstrap sample X_1^*, · · · , X_n^* from the Bernoulli distribution with probability p̂. Define the bootstrap estimator θ̂^* = (X_1^* + · · · + X_n^*)²/n². The bootstrap estimator for the bias of θ̂ is Bias^* ≡ E_{p̂}(θ̂^*) − θ̂. In practice, E_{p̂}(θ̂^*) may be estimated via repeated bootstrap sampling.
Note. For this simple example, the bias estimator admits a simple analytic formula, Bias^* = p̂(1 − p̂)/n, which is the plug-in estimator of the bias p(1 − p)/n.
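The bias estimation in (d) can be sketched in a few lines of Python/NumPy (hypothetical names; the sketch exploits the fact that X_1^* + · · · + X_n^* ~ Binomial(n, p̂)):

    import numpy as np

    rng = np.random.default_rng(0)

    def bootstrap_bias(x, n_boot=5000):
        # parametric bootstrap estimate of the bias of theta_hat = (X/n)^2
        n = len(x)
        p_hat = x.mean()
        p_stars = rng.binomial(n, p_hat, size=n_boot) / n  # bootstrap p-hats
        return np.mean(p_stars ** 2) - p_hat ** 2

    x = rng.binomial(1, 0.3, size=50)          # toy Bernoulli(0.3) sample
    print(bootstrap_bias(x))                   # Monte Carlo Bias*
    print(x.mean() * (1 - x.mean()) / len(x))  # analytic Bias* = p_hat(1 - p_hat)/n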
3. Let x^⋆ = F^{−1}(p). Then x^⋆ ≥ x for any x with F(x) = G({x − µ}/σ) ≤ p. Put y^⋆ = (x^⋆ − µ)/σ. Then y^⋆ ≥ (x − µ)/σ for any x with G({x − µ}/σ) ≤ p, i.e. y^⋆ ≥ y for any y with G(y) ≤ p. Hence y^⋆ = G^{−1}(p), i.e. {F^{−1}(p) − µ}/σ = G^{−1}(p).
4. Since − log x is convex, it follows from Jensen's inequality that
D(g, f) = Σ_i g_i log(g_i/f_i) = − Σ_i g_i log(f_i/g_i) ≥ − log{Σ_i g_i (f_i/g_i)} = − log Σ_i f_i = 0.
5. Since Σ_{i≥1} i g_i = µ,
D(g, f) = Σ_{i=1}^∞ g_i log g_i − Σ_{i=1}^∞ g_i log f_i = C − Σ_{i=1}^∞ g_i {log p + (i − 1) log(1 − p)} = C − log p − (µ − 1) log(1 − p),
which attains its minimum at p = 1/µ, as setting the derivative −1/p + (µ − 1)/(1 − p) to zero shows. Thus the geometric distribution which minimises D(g, f) is f_i = µ^{−1}(1 − µ^{−1})^{i−1}, i ≥ 1.
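A quick numerical check of p = 1/µ, as a sketch (the pmf g below is an arbitrary made-up example with finite support):

    import numpy as np

    g = np.array([0.4, 0.3, 0.2, 0.1])   # hypothetical pmf g_i on i = 1..4
    i = np.arange(1, len(g) + 1)
    mu = np.sum(i * g)                   # mean of g (here 2.0)

    def kl_to_geometric(p):
        # D(g, f) for the geometric pmf f_i = p(1-p)^(i-1) on g's support
        f = p * (1 - p) ** (i - 1)
        return np.sum(g * np.log(g / f))

    grid = np.linspace(0.01, 0.99, 981)
    p_best = grid[np.argmin([kl_to_geometric(p) for p in grid])]
    print(p_best, 1 / mu)                # agree up to the grid resolution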
6. From the coursework, the normal distribution f which minimises D(g, f) should have its mean and variance equal to the mean and variance of g. So the answers are (a) N(θ^{−1}, θ^{−2}), (b) N(r + c, r), and (c) N(r/β, r/β²).
7. The exponential distribution on [0, ∞) with mean µ > 0 has the density function f(x) = µ^{−1} e^{−x/µ} I(x > 0). Hence
D(g, f) = C − ∫ g(x) log f(x) dx = C + log µ + µ^{−1} ∫_0^∞ x g(x) dx = C + log µ + µ_0/µ,
where µ_0 is the mean of g. This attains its minimum at µ = µ_0.
8. The log quasi-likelihood under the exponential distribution is
l(µ; X_1, · · · , X_n) = −n log µ − (1/µ) Σ_{i=1}^n X_i.
Maximising it leads to the MQLE µ̂ = X̄ = n^{−1} Σ_i X_i. Write l(µ) = l(µ; X_1). Then
l̇(µ) = −1/µ + X_1/µ², l̈(µ) = 1/µ² − 2X_1/µ³.
Hence, using E_g X_1 = µ,
I = −E_g l̈(µ) = −1/µ² + 2E_g X_1/µ³ = 1/µ²,
J = E_g{l̇(µ)}² = 1/µ² − 2E_g X_1/µ³ + E_g X_1²/µ⁴ = E_g X_1²/µ⁴ − 1/µ².
Hence by the limit theorem for MQLEs, √n(X̄ − µ) converges in distribution to a normal distribution with mean 0 and variance
I^{−1} J I^{−1} = J/I² = E_g X_1² − µ² = Var_g(X_1),
which is the standard CLT for the sample mean. In fact, as long as an MQLE is the sample mean, including the MQLE under the normal distribution, its asymptotic distribution is effectively determined by the CLT.
Note. MQLEs for µ = E X_1 are not always equal to X̄. For example, the MQLE under the uniform distribution U(a, b) would be µ̂ = 0.5(â + b̂) = 0.5(min_i X_i + max_j X_j).
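The sandwich-variance conclusion can be checked by simulation. The following Python/NumPy sketch uses an arbitrarily chosen true distribution g, here Gamma(2, 1.5), which is deliberately not exponential, and compares the Monte Carlo variance of √n(X̄ − µ) with Var_g(X_1):

    import numpy as np

    rng = np.random.default_rng(1)
    n, n_rep = 400, 10000
    shape, scale = 2.0, 1.5
    mu = shape * scale                       # E_g X_1 = 3.0
    samples = rng.gamma(shape, scale, size=(n_rep, n))
    z = np.sqrt(n) * (samples.mean(axis=1) - mu)
    print(z.var())                           # Monte Carlo variance
    print(shape * scale ** 2)                # Var_g(X_1) = 4.5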
9. Let Y = (Y_1, · · · , Y_n)^τ, ε = (ε_1, · · · , ε_n)^τ, and X be the n × d matrix with x_ij as its (i, j)-th element. Then Y = Xβ + ε. The log-likelihood is
l(β, σ²) = −(n/2) log(2πσ²) − ||Y − Xβ||²/(2σ²).
Hence the MLE for β is the LSE β̂ = (X^τ X)^{−1} X^τ Y, and σ̂_d² = n^{−1}||Y − Xβ̂||², with l(β̂, σ̂_d²) = −(n/2) log(σ̂_d²) + C. Ignoring the constant, we may define
AIC(d) = n log(σ̂_d²) + 2d.
This is the AIC with all d explanatory variables x_{i1}, · · · , x_{id} included.
For model selection we may apply, for example, a back-deleting algorithm with AIC, as follows. Deleting one explanatory variable from the complete model gives a regression model with d − 1 explanatory variables, and there are d such models; we choose the one with the minimum AIC value, denoted AIC(d − 1). Starting from this new model, we may find the optimum model with d − 2 explanatory variables, with AIC value denoted AIC(d − 2). In the same manner we obtain the optimum model with k explanatory variables for k = d − 3, · · · , 2, 1. The overall optimum model is the one with the overall minimum AIC value.
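A minimal sketch of this back-deleting search in Python/NumPy (the function names are illustrative; aic refits by least squares and applies AIC(k) = n log(σ̂_k²) + 2k):

    import numpy as np

    def aic(y, X_sub):
        # AIC(k) = n log(sigma_hat^2) + 2k for the least-squares fit on X_sub
        n, k = X_sub.shape
        beta, *_ = np.linalg.lstsq(X_sub, y, rcond=None)
        sigma2 = np.sum((y - X_sub @ beta) ** 2) / n
        return n * np.log(sigma2) + 2 * k

    def backward_delete(y, X):
        # delete one variable at a time, keeping the deletion with minimum AIC
        active = list(range(X.shape[1]))
        best_active, best_aic = active[:], aic(y, X)
        while len(active) > 1:
            cands = [[j for j in active if j != k] for k in active]
            aics = [aic(y, X[:, c]) for c in cands]
            active = cands[int(np.argmin(aics))]
            if min(aics) < best_aic:
                best_aic, best_active = min(aics), active[:]
        return best_active, best_aic

The overall optimum is the active set with the smallest AIC value encountered along the deletion path.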