1. The vector of random variables $(X_1, X_2, X_3)^T$ follows a trivariate normal distribution with mean vector and covariance matrix

$$\mu = \begin{pmatrix}1\\-2\\0\end{pmatrix}, \qquad \Sigma = \begin{pmatrix}3 & 1 & -2\\1 & 2 & -1\\-2 & -1 & 1.5\end{pmatrix}.$$

(a) Find the joint distribution of $(X_1, X_3)$.

(b) Find the joint conditional distribution of $(X_1, X_3) \mid X_2 = 1$.

(c) Find the joint distribution of

$$\begin{pmatrix}2X_1 - X_3\\ X_2 + 4X_3 + 1\end{pmatrix}$$
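A minimal NumPy sketch for checking answers to (a) to (c) numerically. It assumes the standard Gaussian results: a marginal keeps the matching entries of $\mu$ and block of $\Sigma$; a conditional uses the regression/Schur-complement formulas; and an affine map $AX + b$ has mean $A\mu + b$ and covariance $A\Sigma A^T$. The helper `conditional_mvn` is a hypothetical name introduced only for this illustration.

```python
import numpy as np

mu = np.array([1.0, -2.0, 0.0])
Sigma = np.array([[3.0, 1.0, -2.0],
                  [1.0, 2.0, -1.0],
                  [-2.0, -1.0, 1.5]])

# (a) Marginal of (X1, X3): keep indices 0 and 2 (0-based)
idx = [0, 2]
mu_a = mu[idx]
Sigma_a = Sigma[np.ix_(idx, idx)]

def conditional_mvn(mu, Sigma, keep, given, value):
    """Mean and covariance of X[keep] | X[given] = value (hypothetical helper)."""
    S12 = Sigma[np.ix_(keep, given)]
    S22 = Sigma[np.ix_(given, given)]
    K = S12 @ np.linalg.inv(S22)                  # Sigma_12 Sigma_22^{-1}
    mean = mu[keep] + K @ (np.asarray(value) - mu[given])
    cov = Sigma[np.ix_(keep, keep)] - K @ S12.T   # Schur complement
    return mean, cov

# (b) (X1, X3) | X2 = 1
mu_b, Sigma_b = conditional_mvn(mu, Sigma, [0, 2], [1], [1.0])

# (c) A X + b with rows (2, 0, -1) and (0, 1, 4), offset b = (0, 1)
A = np.array([[2.0, 0.0, -1.0],
              [0.0, 1.0, 4.0]])
b = np.array([0.0, 1.0])
mu_c = A @ mu + b
Sigma_c = A @ Sigma @ A.T

for name, m, S in [("a", mu_a, Sigma_a), ("b", mu_b, Sigma_b), ("c", mu_c, Sigma_c)]:
    print(name, m, S.tolist())
```

Only part (b) needs a matrix inverse; (a) and (c) are both instances of the affine-map rule (selecting coordinates is an affine map too).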

2. Let $X \sim N_p(\mu, \Sigma)$. Show, using moment generating functions, that the quadratic form below has a central chi-square distribution with $p$ degrees of freedom:

$$(X - \mu)^T \Sigma^{-1} (X - \mu) \sim \chi^2_p$$

Recall that the moment generating function of a chi-square distribution with $p$ degrees of freedom is $M(t) = (1 - 2t)^{-p/2}$ for $t < 1/2$. A helpful property here is that for generic independent random variables $Y_1, \dots, Y_n$:

$$M_{Y_1 + \cdots + Y_n}(t) = E\left(e^{t\sum_{i=1}^n Y_i}\right) = \prod_{i=1}^n E\left(e^{tY_i}\right).$$
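One standard route, sketched here under the assumption that $\Sigma$ is positive definite so that a symmetric square root $\Sigma^{-1/2}$ exists: standardize, note that the components of $Z$ are independent because they are jointly normal and uncorrelated, and then apply the product property above.

```latex
Z = \Sigma^{-1/2}(X - \mu) \sim N_p(0, I_p), \qquad
(X - \mu)^T \Sigma^{-1} (X - \mu) = Z^T Z = \sum_{i=1}^{p} Z_i^2,
\qquad Z_1, \dots, Z_p \overset{iid}{\sim} N(0, 1)

E\left(e^{tZ_i^2}\right)
  = \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-(1 - 2t) z^2 / 2}\, dz
  = (1 - 2t)^{-1/2}, \qquad t < \tfrac{1}{2}

M_{Z^T Z}(t) = \prod_{i=1}^{p} E\left(e^{tZ_i^2}\right) = (1 - 2t)^{-p/2}
```

The last line is the $\chi^2_p$ moment generating function, so uniqueness of MGFs gives the result.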

3. Consider the regression model $Y \mid X = X\beta + R$, in which $R \sim N(0, \sigma^2 I)$, $X$ is an $n \times p$ matrix, $\beta_{p \times 1}$ is the parameter vector, and $Y_{n \times 1}$ is the response vector.

Show that

(a) $\hat{\beta}_{MLE} = (X^T X)^{-1} X^T Y$

(b) $\hat{\sigma}^2_{MLE} = \frac{1}{n}(Y - X\hat{\beta}_{MLE})^T (Y - X\hat{\beta}_{MLE})$
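A small numerical check of these closed forms, as a sketch only: it assumes $X$ has full column rank (so $X^T X$ is invertible) and uses a made-up simulated design, not data from the assignment. The names `gaussian_regression_mle` and `neg_log_lik` are hypothetical. Perturbing either estimate should not decrease the negative log-likelihood.

```python
import numpy as np

def gaussian_regression_mle(X, Y):
    """Closed-form MLEs for Y = X beta + R with R ~ N(0, sigma^2 I) (hypothetical helper)."""
    n = X.shape[0]
    # Solve the normal equations (X^T X) beta = X^T Y instead of forming the inverse
    beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
    resid = Y - X @ beta_hat
    sigma2_hat = resid @ resid / n      # the MLE divides by n, not n - p
    return beta_hat, sigma2_hat

def neg_log_lik(beta, sigma2, X, Y):
    """Gaussian negative log-likelihood of (beta, sigma2)."""
    n = X.shape[0]
    r = Y - X @ beta
    return 0.5 * n * np.log(2 * np.pi * sigma2) + r @ r / (2 * sigma2)

# Simulated example (synthetic design with an intercept column)
rng = np.random.default_rng(0)
n, p = 200, 3
X = np.column_stack([np.ones(n), rng.standard_normal((n, p - 1))])
Y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=1.5, size=n)
beta_hat, sigma2_hat = gaussian_regression_mle(X, Y)
```

Note that $\hat\sigma^2_{MLE}$ is biased; dividing by $n - p$ instead gives the usual unbiased estimator.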

4. We often say that n (the sample size) must be much larger than p (the dimension of each observation) for the Central Limit Theorem to give an accurate approximation, particularly when the data do not come from a normal distribution.

Recall that for the univariate t-distribution, the smaller the degrees of freedom, the larger the kurtosis. Similarly, in the multivariate case, the lower the degrees of freedom, the further the distribution deviates from normality (particularly in kurtosis). The following code simulates data from a p-variate t distribution with 6 degrees of freedom and a covariance matrix that was itself simulated from a Wishart distribution with p degrees of freedom:

$$\Sigma \sim \text{Wishart}(p, I_p), \qquad X_1, \dots, X_n \overset{iid}{\sim} t_p(\Sigma, \text{df} = 6)$$

Use the code below with at least three values of p, one low, one medium, and one high (e.g. 2, 5, 20), and assess the normality of the sample means for each value of p using n = 10, 100, 1000. Report the QQ-plots and formal test results for the normality of the sample means. Feel free to test more values of p and n, but you do not need to show QQ-plots and normality tests for the extra results. Provide a written summary of your findings.

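library(mvtnorm)   # provides rmvt()

p <- p0            # dimension: replace p0 with e.g. 2, 5 or 20
N <- 5000          # number of simulated sample means
means <- matrix(0, ncol = p, nrow = N)

# One Wishart(p, I_p) draw; rWishart() returns a p x p x 1 array
Sigma <- matrix(rWishart(1, df = p, Sigma = diag(p)), byrow = TRUE, ncol = p)

## Keep the same Sigma for fixed p and varying n
n <- n0            # sample size: replace n0 with 10, 100 or 1000

for (i in 1:N) {
  # n draws from t_p with scale matrix Sigma and 6 df (covariance = 6/4 * Sigma)
  x <- rmvt(n, sigma = Sigma, df = 6)
  means[i, ] <- colMeans(x)   # sample mean vector of this replicate
}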
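For readers working outside R, here is a Python sketch that mirrors the simulation, under stated assumptions: the Wishart(p, I_p) draw is built as $G^T G$ with $G$ a $p \times p$ standard normal matrix, and each t draw is generated as $Z/\sqrt{W/6}$ with $Z \sim N_p(0, \Sigma)$ and $W \sim \chi^2_6$, which is, as far as I understand, what `mvtnorm::rmvt` does with its default zero location. As the formal test it swaps in Mardia's multivariate kurtosis statistic; the assignment does not prescribe a particular test, so this is only one option. Function names are hypothetical.

```python
import numpy as np

def simulate_means(p, n, N=2000, df=6, seed=0):
    """N sample means, each from n iid p-variate t draws with a Wishart scale matrix."""
    rng = np.random.default_rng(seed)
    # Sigma ~ Wishart(p, I_p): rows g_k of G are N(0, I_p), Sigma = sum_k g_k g_k^T
    G = rng.standard_normal((p, p))
    Sigma = G.T @ G
    L = np.linalg.cholesky(Sigma)
    means = np.empty((N, p))
    for i in range(N):
        z = rng.standard_normal((n, p)) @ L.T         # rows ~ N_p(0, Sigma)
        w = rng.chisquare(df, size=(n, 1))             # one chi-square mixing draw per row
        means[i] = (z / np.sqrt(w / df)).mean(axis=0)  # rows ~ t_p(Sigma, df)
    return means

def mardia_kurtosis(Y):
    """Mardia's kurtosis b_{2,p}, its large-sample z statistic, and squared distances."""
    N, p = Y.shape
    C = Y - Y.mean(axis=0)
    S = C.T @ C / N                                        # MLE covariance
    d2 = np.einsum("ij,jk,ik->i", C, np.linalg.inv(S), C)  # squared Mahalanobis distances
    b2 = np.mean(d2 ** 2)
    # Under normality b2 is approximately N(p(p+2), 8p(p+2)/N)
    z = (b2 - p * (p + 2)) / np.sqrt(8 * p * (p + 2) / N)
    return b2, z, d2
```

For example, `mardia_kurtosis(simulate_means(20, 10))[1]` returns the kurtosis z statistic for p = 20 and n = 10. The returned `d2` values, sorted and plotted against $\chi^2_p$ quantiles, give the usual chi-square QQ-plot.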

5. Stiffness and bending strength are two variables of interest in the quality of lumber. A sample of 30 pieces of a particular type of wood is provided in the file lumber.txt.

(a) Construct and plot a 95% confidence ellipse for the pair $\mu = (\mu_1, \mu_2)$, where $\mu_1 = E(\text{Stiffness})$ and $\mu_2 = E(\text{Bending Strength})$.

(b) Suppose high-quality lumber has $\mu = (2000, 10000)^T$. Given the result in part (a), do the data in lumber.txt represent a sample of high-quality lumber? Explain.

(c) Given the data, do you think a bivariate normal distribution is a good model? Use a QQ-plot, as well as a formal test, to answer this question.
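A sketch for (a) and (b), assuming the usual Hotelling $T^2$ confidence region $n(\bar{x} - \mu)^T S^{-1}(\bar{x} - \mu) \le \frac{(n-1)p}{n-p} F_{p,n-p}(\alpha)$ with the unbiased sample covariance $S$. The boundary is traced along the eigenvectors of $S$, and a hypothesized $\mu_0$ lies inside exactly when its $T^2$ is at most the cutoff. Function names are hypothetical, and SciPy is assumed available for the F quantile.

```python
import numpy as np
from scipy import stats

def confidence_ellipse(X, alpha=0.05, n_points=200):
    """Boundary points of the 100(1 - alpha)% T^2 region for a bivariate mean."""
    n, p = X.shape                       # this boundary construction assumes p == 2
    xbar = X.mean(axis=0)
    S = np.cov(X, rowvar=False)          # unbiased sample covariance
    c2 = (n - 1) * p / (n - p) * stats.f.ppf(1 - alpha, p, n - p)
    lam, E = np.linalg.eigh(S)           # axes of the ellipse
    theta = np.linspace(0, 2 * np.pi, n_points)
    circle = np.vstack([np.cos(theta), np.sin(theta)])
    # Half-axis lengths are sqrt(lambda_i * c2 / n) along eigenvector e_i
    pts = xbar[:, None] + E @ (np.sqrt(lam * c2 / n)[:, None] * circle)
    return xbar, S, c2, pts.T

def inside_region(X, mu0, alpha=0.05):
    """T^2 statistic for mu0, the cutoff, and whether mu0 lies in the region."""
    n = X.shape[0]
    xbar, S, c2, _ = confidence_ellipse(X, alpha)
    d = xbar - np.asarray(mu0, dtype=float)
    T2 = n * d @ np.linalg.solve(S, d)
    return T2, c2, T2 <= c2
```

With the real data one would load the two columns (for instance with `np.loadtxt("lumber.txt")`, assuming a whitespace-separated file without a header), plot `pts`, and call `inside_region(X, [2000, 10000])`.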

6. Consider the random vector $X$ where

$$X \sim N_3\left(\begin{pmatrix}3\\-2\\1\end{pmatrix}, \begin{pmatrix}10 & 5 & 4\\5 & 18 & 7\\4 & 7 & 9\end{pmatrix}\right)$$

Below, you see 5 simulated samples from this distribution.

6.171516  4.605047  5.8303953
7.595643  1.754275  1.8826819
4.047683  1.791576  0.7613451
1.672295  3.434457  2.1768536
2.904052  3.906055  4.6161726

Of course, the choice of data is arbitrary. Here is how I generated the 5 observations above. Feel free to generate more observations, change the mean, covariance, etc.

library(mvtnorm)
mu <- c(3, -2, 1)
Sigma <- matrix(c(10, 5, 4, 5, 18, 7, 4, 7, 9), nrow = 3)
X <- rmvnorm(5, mu, Sigma)

Now suppose two of the observations in the data set above are missing at random: the one in the first row and first column, and the one in the third row and third column. The data set with the missing components is shown below.

NA        4.605047  5.8303953
7.595643  1.754275  1.8826819
4.047683  1.791576  NA
1.672295  3.434457  2.1768536
2.904052  3.906055  4.6161726

Use the EM algorithm described in your textbook to estimate the missing data, the MLE of the mean vector, and the MLE of the covariance matrix. Be sure to run the algorithm long enough to reach convergence, say within 1e-5. Also consider the algorithm in which we only update the missing $\tilde{x}_j^{(1)}$ for each subject/observation $j = 1, \dots, n$ and then recompute the MLEs directly from the updated data set. In other words, we skip (5-39) and update $\tilde{\Sigma}$ from the entire data set, as opposed to separately estimating each $\widetilde{x_j^{(1)} x_j^{(1)T}}$ (note that the estimates of $\widetilde{x_j^{(1)} x_j^{(2)T}}$ are the same under both algorithms). Discuss your thoughts on the implications of both EM methods. Do you prefer one over the other? Discuss any theoretical benefits or drawbacks that you see.
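A sketch of both variants in one function, assuming NaN-coded missing values and that every row has at least one observed entry. The E-step fills each missing block with its conditional mean $\tilde\mu^{(1)} + \tilde\Sigma_{12}\tilde\Sigma_{22}^{-1}(x_j^{(2)} - \tilde\mu^{(2)})$; with `impute_only=False` it also adds the conditional covariance of the missing block (the extra term in (5-39)), while `impute_only=True` skips that term and simply takes the complete-data MLEs of the filled-in data. The starting values (observed column means) and the function name `em_mvn` are my own choices, not taken from the textbook.

```python
import numpy as np

def em_mvn(X, tol=1e-5, max_iter=10_000, impute_only=False):
    """EM for N_p(mu, Sigma) with NaN-coded missing values (hypothetical helper).

    Returns (filled data, mu, Sigma, iterations used, converged flag).
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    miss = np.isnan(X)
    Xf = np.where(miss, np.nanmean(X, axis=0), X)   # start from observed column means
    mu = Xf.mean(axis=0)
    Sigma = (Xf - mu).T @ (Xf - mu) / n
    for it in range(1, max_iter + 1):
        T2 = np.zeros((p, p))                       # sum over j of predicted x_j x_j^T
        for j in range(n):
            m, o = miss[j], ~miss[j]
            if m.any():
                # B = Sigma_oo^{-1} Sigma_om, so B.T = Sigma_mo Sigma_oo^{-1}
                B = np.linalg.solve(Sigma[np.ix_(o, o)], Sigma[np.ix_(o, m)])
                Xf[j, m] = mu[m] + B.T @ (X[j, o] - mu[o])   # conditional mean
            xx = np.outer(Xf[j], Xf[j])
            if m.any() and not impute_only:
                # (5-39): add the conditional covariance of the missing block
                xx[np.ix_(m, m)] += Sigma[np.ix_(m, m)] - Sigma[np.ix_(m, o)] @ B
            T2 += xx
        mu_new = Xf.mean(axis=0)                    # M-step
        Sigma_new = T2 / n - np.outer(mu_new, mu_new)
        delta = max(np.abs(mu_new - mu).max(), np.abs(Sigma_new - Sigma).max())
        mu, Sigma = mu_new, Sigma_new
        if delta < tol:
            return Xf, mu, Sigma, it, True
    return Xf, mu, Sigma, max_iter, False

# The five-row data set with two missing entries, as given above
X_obs = np.array([
    [np.nan,   4.605047, 5.8303953],
    [7.595643, 1.754275, 1.8826819],
    [4.047683, 1.791576, np.nan],
    [1.672295, 3.434457, 2.1768536],
    [2.904052, 3.906055, 4.6161726],
])
fit_em = em_mvn(X_obs)
fit_imp = em_mvn(X_obs, impute_only=True)
```

Comparing the two returned covariance matrices shows the effect of dropping the conditional-covariance term. With only three complete rows in three dimensions, it may also be worth checking the converged flag and the smallest eigenvalue of $\tilde\Sigma$ in each fit.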

7. The bootstrap is an efficient method for calculating the p-value of a test when the theoretical distribution of the test statistic is not available and/or the sample size is too small for asymptotic approximations. The data file Test.txt contains 30 observations of 3 variables. Interest lies in testing the null hypothesis

$$H_0: \mu = \begin{pmatrix}4\\8\\-2\end{pmatrix} \quad \text{vs.} \quad H_a: \mu \neq \begin{pmatrix}4\\8\\-2\end{pmatrix}$$

To calculate the bootstrap p-value, generate 10,000 samples, each of size 30 (with replacement), from the original sample. For each sample, compute the test statistic:

$$W = -2\log\left(\frac{\max_{\Sigma} L(\mu_0, \Sigma)}{\max_{\mu, \Sigma} L(\mu, \Sigma)}\right).$$

Let $W_{obs}$ be the above statistic computed for the originally observed data set. Estimate the p-value $\Pr(W > W_{obs})$ using the bootstrap samples. Compare your answer to the p-value calculated from the asymptotic distribution of the test statistic (Result 5.2 in the book). Provide a plot of your choice to compare the asymptotic distribution of W to its empirical distribution estimated from the bootstrap samples.
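A sketch of the bootstrap and asymptotic p-values. It uses the closed form $W = n\log\left(|\hat\Sigma_0| / |\hat\Sigma|\right)$, where $\hat\Sigma$ is the unrestricted MLE of $\Sigma$ and $\hat\Sigma_0 = \frac{1}{n}\sum_j (x_j - \mu_0)(x_j - \mu_0)^T$ is the MLE under $H_0$, and the large-sample $\chi^2_p$ reference distribution (here $p = 3$, the number of mean parameters fixed by $H_0$). One assumption goes beyond the assignment text: the resampling is done from the data shifted so that their mean equals $\mu_0$, a common way to make the bootstrap distribution reflect $W$ under $H_0$. Function names are hypothetical and SciPy is assumed available.

```python
import numpy as np
from scipy import stats

def lrt_stat(X, mu0):
    """W = -2 log Lambda = n log(|Sigma0_hat| / |Sigma_hat|) for H0: mu = mu0."""
    n = X.shape[0]
    C = X - X.mean(axis=0)
    D = X - mu0
    _, logdet_full = np.linalg.slogdet(C.T @ C / n)   # unrestricted MLE of Sigma
    _, logdet_null = np.linalg.slogdet(D.T @ D / n)   # MLE of Sigma when mu = mu0
    return n * (logdet_null - logdet_full)

def bootstrap_lrt(X, mu0, B=10_000, seed=0):
    """Observed W, bootstrap p-value, asymptotic chi-square p-value, bootstrap W's."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    mu0 = np.asarray(mu0, dtype=float)
    n, p = X.shape
    w_obs = lrt_stat(X, mu0)
    # Assumption: resample from data recentred at mu0 so that H0 holds
    X0 = X - X.mean(axis=0) + mu0
    w_boot = np.array([lrt_stat(X0[rng.integers(0, n, size=n)], mu0)
                       for _ in range(B)])
    p_boot = np.mean(w_boot > w_obs)
    p_asym = stats.chi2.sf(w_obs, df=p)   # large-sample chi-square with p df
    return w_obs, p_boot, p_asym, w_boot
```

For the comparison plot, a density histogram of `w_boot` overlaid with `stats.chi2.pdf(x, 3)` is one simple option. Loading the data with `np.loadtxt("Test.txt")` assumes a whitespace-separated file without a header row.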
