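1. The vector of random variables (X₁, X₂, X₃)^T follows a trivariate normal distribution with mean vector and covariance matrix given by

$$
\mu = \begin{pmatrix} 1 \\ -2 \\ 0 \end{pmatrix}, \qquad
\Sigma = \begin{pmatrix} 3 & 1 & -2 \\ 1 & 2 & -1 \\ -2 & -1 & 1.5 \end{pmatrix}
$$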

(a) Find the joint distribution of (X₁, X₃).
(b) Find the joint conditional distribution of (X₁, X₃)|X₂ = 1.

$$
\begin{pmatrix} 2X_1 - X_3 \\ X_2 + 4X_3 + 1 \end{pmatrix}
$$
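For parts (a) and (b), the standard partitioned-normal formulas can be checked numerically. The following is a minimal sketch in base R, not a full solution. The index vectors `a` and `b` and the other variable names are introduced here for illustration only.

```r
# Mean vector and covariance matrix from the problem statement
mu    <- c(1, -2, 0)
Sigma <- matrix(c( 3,  1, -2,
                   1,  2, -1,
                  -2, -1, 1.5), nrow = 3, byrow = TRUE)

a <- c(1, 3)  # components of interest: (X1, X3)
b <- 2        # conditioning component: X2

# (a) Marginal of (X1, X3): pick out the matching entries of mu and Sigma
marg_mean <- mu[a]
marg_cov  <- Sigma[a, a]

# (b) Conditional of (X1, X3) given X2 = 1:
#     mean = mu_a + S_ab S_bb^{-1} (x_b - mu_b)
#     cov  = S_aa - S_ab S_bb^{-1} S_ba
x_b  <- 1
S_ab <- Sigma[a, b, drop = FALSE]
S_bb <- Sigma[b, b, drop = FALSE]
cond_mean <- as.vector(mu[a] + S_ab %*% solve(S_bb, x_b - mu[b]))
cond_cov  <- Sigma[a, a] - S_ab %*% solve(S_bb) %*% t(S_ab)

print(marg_mean); print(marg_cov)
print(cond_mean); print(cond_cov)
```

Both distributions are bivariate normal, so the mean vector and covariance matrix printed for each part determine it completely.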

2. Let X ~ N_p(µ, Σ). Show via moment generating functions that the quadratic form below follows a central chi-square distribution with p degrees of freedom:

$$
(X - \mu)^T \Sigma^{-1} (X - \mu) \sim \chi^2_p
$$

Recall that the moment generating function of a chi-square distribution with p degrees of freedom is M(t) = (1 - 2t)^{-p/2}. A helpful property is that, for independent random variables Y₁, ..., Y_n,

$$
M_{Y_1 + \cdots + Y_n}(t) = E\left(e^{t \sum_{i=1}^{n} Y_i}\right) = \prod_{i=1}^{n} E\left(e^{t Y_i}\right)
$$
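One possible outline of the argument is sketched below. It assumes Σ is positive definite, so that the symmetric inverse square root Σ^{-1/2} exists. The standardized vector Z is notation introduced here, not part of the problem statement.

```latex
\begin{aligned}
Z &= \Sigma^{-1/2}(X-\mu) \sim N_p(0, I_p)
  \quad\Longrightarrow\quad
  (X-\mu)^T \Sigma^{-1} (X-\mu) = Z^T Z = \sum_{i=1}^{p} Z_i^2, \\
E\left(e^{t Z_i^2}\right)
  &= \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-(1-2t) z^2 / 2}\, dz
   = (1-2t)^{-1/2}, \qquad t < \tfrac{1}{2}, \\
M_{Z^T Z}(t)
  &= \prod_{i=1}^{p} E\left(e^{t Z_i^2}\right) = (1-2t)^{-p/2}.
\end{aligned}
```

The product step uses the independence of Z₁, ..., Z_p, which holds because they are jointly normal with identity covariance. The final expression matches the chi-square moment generating function with p degrees of freedom.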

3. Consider the regression problem Y | X = Xβ + R, in which R ~ N(0, σ²I), X is an n × p matrix, β is the p × 1 parameter vector, and Y is the n × 1 response vector.

Show that

(a) $\hat{\beta}_{MLE} = (X^T X)^{-1} X^T Y$

(b) $\hat{\sigma}^2_{MLE} = \frac{1}{n} (Y - X\hat{\beta}_{MLE})^T (Y - X\hat{\beta}_{MLE})$
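A quick numerical sanity check can accompany the derivation. The sketch below uses simulated data (the design, coefficients, and noise level are arbitrary choices made here) and compares the closed-form estimator with `lm()`. It relies on the standard fact that the maximum likelihood estimate of σ² divides the residual sum of squares by n, not by n - p.

```r
set.seed(1)
n <- 200
p <- 3

# Simulated design matrix with an intercept column (arbitrary illustration)
X    <- cbind(1, matrix(rnorm(n * (p - 1)), nrow = n))
beta <- c(1, -2, 0.5)
Y    <- as.vector(X %*% beta + rnorm(n, sd = 2))

# Closed-form MLE of beta: solve (X'X) b = X'Y rather than inverting X'X
beta_hat <- solve(crossprod(X), crossprod(X, Y))

# MLE of sigma^2: residual sum of squares divided by n
resid      <- Y - X %*% beta_hat
sigma2_hat <- as.numeric(crossprod(resid)) / n

# Cross-check against lm(); "- 1" drops lm's own intercept because X has one
fit <- lm(Y ~ X - 1)
print(cbind(closed_form = as.vector(beta_hat), lm = unname(coef(fit))))
print(c(sigma2_mle = sigma2_hat, rss_over_n = sum(residuals(fit)^2) / n))
```

Note that `summary(fit)$sigma^2` would report the unbiased estimate with divisor n - p, so it differs slightly from the MLE.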

4. We often mention that n (the sample size) must be much larger than p (the dimension of each observation) for the Central Limit Theorem to give an accurate approximation, particularly when the data do not come from a normal distribution.

Recall that for the univariate t-distribution, the smaller the degrees of freedom, the larger the kurtosis. Similarly, in the multivariate case, the lower the degrees of freedom, the further the distribution deviates from normality (particularly via kurtosis). The following code simulates data from a p-variate t distribution with 6 degrees of freedom and a covariance matrix that was simulated from a Wishart distribution with p degrees of freedom:

$$
\Sigma \sim \text{Wishart}(p, I_p)
$$

$$
X_1, \ldots, X_n \overset{iid}{\sim} t_p(\Sigma, \text{df} = 6)
$$

Use the code below with at least three values of p, covering one low, one medium, and one high value (e.g. 2, 5, 20), and assess the normality of the sample means for each value of p using n = 10, 100, 1000. Report the QQ-plots and formal test results for the normality of the sample means. Feel free to test more values of p and n, but you do not need to show QQ-plots and normality tests for the extra results. Provide a written summary of your findings.

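library(mvtnorm)  # provides rmvt()

p <- p0    # dimension; replace p0 with a value such as 2, 5, or 20
N <- 5000  # number of simulated sample means
means <- matrix(0, ncol = p, nrow = N)

# rWishart() returns a p x p x 1 array; keep the single p x p slice
Sigma <- rWishart(1, df = p, Sigma = diag(p))[, , 1]

## Keep the same Sigma for fixed p and varying n

n <- n0    # sample size; replace n0 with 10, 100, or 1000

for (i in 1:N) {
  x <- rmvt(n, sigma = Sigma, df = 6)  # n draws from the p-variate t
  means[i, ] <- colMeans(x)            # store the i-th sample mean vector
}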
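One way to assess the normality of the simulated sample means is a chi-square QQ-plot of squared Mahalanobis distances together with formal tests. The sketch below uses base R only, with assumptions stated here. `rmvt_base` is a hypothetical stand-in for `mvtnorm::rmvt`. The values p = 2, n = 100, N = 500 are illustrative. The Kolmogorov-Smirnov test is only approximate, because the mean and covariance are estimated from the same data.

```r
set.seed(42)
p  <- 2     # dimension (illustrative value)
n  <- 100   # sample size (illustrative value)
N  <- 500   # number of simulated sample means (smaller than 5000 for speed)
df <- 6

Sigma <- rWishart(1, df = p, Sigma = diag(p))[, , 1]
R     <- chol(Sigma)  # upper-triangular factor with t(R) %*% R == Sigma

# Hypothetical base-R stand-in for mvtnorm::rmvt: a normal draw with scale
# matrix Sigma, with each row divided by sqrt(chi-square / df)
rmvt_base <- function(n, R, df) {
  z <- matrix(rnorm(n * ncol(R)), nrow = n) %*% R
  z / sqrt(rchisq(n, df) / df)  # recycling divides row i by the i-th scale
}

means <- t(replicate(N, colMeans(rmvt_base(n, R, df))))

# Squared Mahalanobis distances are roughly chi-square(p) under normality
d2 <- mahalanobis(means, colMeans(means), cov(means))
qqplot(qchisq(ppoints(N), df = p), d2,
       xlab = "Chi-square quantiles", ylab = "Squared Mahalanobis distance")
abline(0, 1)

# Formal checks: KS test of d2 against chi-square(p), and a Shapiro-Wilk
# test on each coordinate of the sample means
ks <- ks.test(d2, "pchisq", df = p)
sw <- apply(means, 2, function(m) shapiro.test(m)$p.value)
print(ks$p.value)
print(sw)
```

Marginal Shapiro-Wilk tests only check each coordinate separately. A dedicated multivariate normality test could replace them if a suitable package is available.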

5. Stiffness and bending strength are two variables of interest in the quality of lumber. A sample of 30 pieces of a particular type of wood is provided in the file lumber.txt.

(a) Construct and plot a 95% confidence ellipse for the pair µ = (µ₁, µ₂), where µ₁ = E(Stiffness) and µ₂= E(Bending Strength).

(b) Suppose high-quality lumber has µ = (2000, 10000)^T. Given the result in part (a), do the data in lumber.txt represent a sample of high-quality lumber? Explain.

(c) Given the data, do you think a bivariate normal distribution is a good model for the data? Use a QQ-plot, as well as a formal test, to answer this question.
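For parts (a) and (b), a confidence ellipse for a mean vector can be built from the sample mean, the sample covariance, and an F quantile. The sketch below is one way to do this in base R. `conf_ellipse` and `in_ellipse` are hypothetical helper names introduced here. Because lumber.txt is not reproduced in this document, the example runs on synthetic placeholder data, not the real measurements.

```r
# Boundary of the region
#   n (xbar - mu)' S^{-1} (xbar - mu) <= p (n - 1) / (n - p) * F_{p, n-p}(level)
conf_ellipse <- function(X, level = 0.95, npts = 200) {
  n <- nrow(X); p <- ncol(X)
  xbar <- colMeans(X)
  S    <- cov(X)
  c2   <- p * (n - 1) / (n * (n - p)) * qf(level, p, n - p)
  theta  <- seq(0, 2 * pi, length.out = npts)
  circle <- cbind(cos(theta), sin(theta))
  # Map the unit circle through chol(S), scale by sqrt(c2), shift to xbar
  pts <- sweep(sqrt(c2) * circle %*% chol(S), 2, xbar, "+")
  list(center = xbar, S = S, c2 = c2, points = pts)
}

# TRUE when mu0 lies inside (or on) the ellipse
in_ellipse <- function(mu0, ell) {
  d <- mu0 - ell$center
  as.numeric(t(d) %*% solve(ell$S, d)) <= ell$c2
}

# Synthetic placeholder data (NOT the lumber data). With the real file one
# might use: X <- as.matrix(read.table("lumber.txt"))
# which assumes a file layout of two numeric columns.
set.seed(5)
X <- cbind(stiffness = rnorm(30, mean = 1800, sd = 300),
           bending   = rnorm(30, mean = 8500, sd = 1500))

ell <- conf_ellipse(X)
plot(ell$points, type = "l", xlab = "Stiffness", ylab = "Bending strength")
points(ell$center[1], ell$center[2], pch = 19)
print(in_ellipse(c(2000, 10000), ell))
```

Checking whether the hypothesized mean falls inside the ellipse is equivalent to a Hotelling T² test at the matching level, so part (b) can be answered directly from the output of `in_ellipse`.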

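6. Consider the random vector X where

$$
X \sim N_3\left( \begin{pmatrix} 3 \\ 2 \\ 1 \end{pmatrix},\ \begin{pmatrix} 10 & 5 & 4 \\ 5 & 18 & 7 \\ 4 & 7 & 9 \end{pmatrix} \right)
$$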

Below, you see 5 simulated samples from this distribution.

6.171516    4.605047    5.8303953
7.595643    1.754275    1.8826819
4.047683    1.791576    0.7613451
1.672295    3.434457    2.1768536
2.904052    3.906055    4.6161726

Of course, the choice of data is arbitrary. Here is how I generated the 5 observations above. Feel free to generate more observations, change the mean, covariance, etc.

library(mvtnorm)
mu <- c(3, -2, 1)
Sigma <- matrix(c(10, 5, 4, 5, 18, 7, 4, 7, 9), nrow = 3)
X <- rmvnorm(5, mu, Sigma)

Now, suppose two of the entries in the dataset above are missing at random: the one in the first row and first column, and the one in the third row and third column. The dataset with the missing components is shown below.

NA          4.605047    5.8303953
7.595643    1.754275    1.8826819
4.047683    1.791576    NA
1.672295    3.434457    2.1768536
2.904052    3.906055    4.6161726

Use the EM algorithm described in your textbook to estimate the missing data, the MLE of the mean vector, and the MLE of the covariance matrix. Be sure to run the algorithm long enough to reach convergence, say to within 1e-5. Also, consider the algorithm in which we only update the missing components $\tilde{x}_j^{(1)}$ for each subject/observation j = 1, ..., n and then recompute the MLEs directly from the updated dataset. In other words, we skip (5-39) and update $\tilde{\Sigma}$ from the entire dataset, as opposed to trying to separately estimate each $\widetilde{x_j^{(1)} x_j^{(1)T}}$ (note that the estimates of $\widetilde{x_j^{(1)} x_j^{(2)T}}$ are the same under both algorithms). Discuss your thoughts on the implications of both EM methods. Do you prefer one over the other? Discuss any theoretical benefits or drawbacks that you see.
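Both variants described above can be sketched with a single function. This is a minimal illustration of the usual EM scheme for multivariate normal data with missing entries, not necessarily the exact form given in the textbook. The function name `em_mvn` and its argument `cond_cov` are hypothetical. Setting `cond_cov = FALSE` drops the conditional-covariance correction, which corresponds to imputing the missing values and recomputing the MLEs from the filled-in dataset. The code assumes every row has at least one observed entry.

```r
em_mvn <- function(X, cond_cov = TRUE, tol = 1e-5, max_iter = 1000) {
  n <- nrow(X); p <- ncol(X)
  miss <- is.na(X)

  # Starting values: fill missing entries with observed column means
  Xf <- X
  for (k in 1:p) Xf[miss[, k], k] <- mean(X[, k], na.rm = TRUE)
  mu    <- colMeans(Xf)
  Sigma <- crossprod(sweep(Xf, 2, mu)) / n

  for (iter in 1:max_iter) {
    T1 <- numeric(p)        # running sum of E[x_j | observed part]
    T2 <- matrix(0, p, p)   # running sum of E[x_j x_j' | observed part]
    for (j in 1:n) {
      m <- miss[j, ]
      x <- X[j, ]
      C <- matrix(0, p, p)  # conditional covariance of the missing block
      if (any(m)) {
        B <- Sigma[m, !m, drop = FALSE] %*% solve(Sigma[!m, !m, drop = FALSE])
        x[m] <- mu[m] + B %*% (x[!m] - mu[!m])  # conditional mean imputation
        if (cond_cov) {
          C[m, m] <- Sigma[m, m, drop = FALSE] - B %*% Sigma[!m, m, drop = FALSE]
        }
      }
      T1 <- T1 + x
      T2 <- T2 + tcrossprod(x) + C
      Xf[j, ] <- x
    }
    # M-step: MLEs computed from the expected sufficient statistics
    mu_new    <- T1 / n
    Sigma_new <- T2 / n - tcrossprod(mu_new)
    delta <- max(abs(mu_new - mu), abs(Sigma_new - Sigma))
    mu <- mu_new; Sigma <- Sigma_new
    if (delta < tol) break
  }
  list(mu = mu, Sigma = Sigma, imputed = Xf, iterations = iter)
}

# The five observations from the problem, with the two missing entries
X <- matrix(c(NA,       4.605047, 5.8303953,
              7.595643, 1.754275, 1.8826819,
              4.047683, 1.791576, NA,
              1.672295, 3.434457, 2.1768536,
              2.904052, 3.906055, 4.6161726), ncol = 3, byrow = TRUE)

full  <- em_mvn(X, cond_cov = TRUE)   # EM with the second-moment correction
naive <- em_mvn(X, cond_cov = FALSE)  # impute, then recompute the MLEs
print(full$mu);  print(full$Sigma)
print(naive$mu); print(naive$Sigma)
```

The `imputed` matrix holds the conditional means from the last E-step. Comparing `full$Sigma` with `naive$Sigma` is a natural starting point for the requested discussion, because the naive variant treats the imputed values as if they had been observed.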

7. The bootstrap is an efficient method for calculating the p-value of a test when the theoretical distribution of the test statistic is not available and/or the sample size is too small for asymptotic approximations. The data file Test.txt includes 30 observations of 3 variables. Interest lies in testing the null hypothesis

$$
H_0: \mu = \begin{pmatrix} 4 \\ 8 \\ -2 \end{pmatrix} \quad \text{vs.} \quad H_a: \mu \neq \begin{pmatrix} 4 \\ 8 \\ -2 \end{pmatrix}
$$

To calculate the bootstrap p-value, generate 10,000 samples, each of size 30 (with replacement), from the original sample. For each sample, compute the test statistic:

$$
W = -2 \log \left( \frac{\max_{\Sigma} L(\mu_0, \Sigma)}{\max_{\mu, \Sigma} L(\mu, \Sigma)} \right)
$$

Let W_obs be the above computation for the originally observed dataset. Estimate the p-value Pr(W > W_obs) using the bootstrap samples. Compare your answer to the p-value calculated from the asymptotic distribution of the test statistic (Result 5.2 in the book). Provide a plot of your choice to compare the asymptotic distribution of W to its empirical distribution estimated from the bootstrap samples.
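The procedure can be sketched in base R as follows. For multivariate normal likelihoods, the statistic reduces to W = n log(|Σ̂₀| / |Σ̂|), where Σ̂₀ and Σ̂ are the maximum likelihood covariance estimates centered at µ₀ and at the sample mean. Several things here are assumptions. `lrt_stat` and `boot_pvalue` are hypothetical names. The data are synthetic placeholders, because Test.txt is not reproduced in this document. The sketch also shifts the data so that its mean equals µ₀ before resampling, which makes the null hypothesis hold in the resampling population. That shift is a common convention, not something the assignment text specifies.

```r
# W = -2 log(likelihood ratio) = n * (log|Sigma0_hat| - log|Sigma_hat|)
lrt_stat <- function(X, mu0) {
  n <- nrow(X)
  S_hat  <- crossprod(sweep(X, 2, colMeans(X))) / n  # MLE, free mean
  S_null <- crossprod(sweep(X, 2, mu0)) / n          # MLE under H0
  n * (as.numeric(determinant(S_null)$modulus) -
       as.numeric(determinant(S_hat)$modulus))       # moduli are log-dets
}

boot_pvalue <- function(X, mu0, B = 10000) {
  n <- nrow(X)
  w_obs <- lrt_stat(X, mu0)
  # Assumption: shift the sample so that its mean is exactly mu0
  X0 <- sweep(X, 2, colMeans(X) - mu0)
  w_boot <- replicate(B, {
    idx <- sample.int(n, n, replace = TRUE)
    lrt_stat(X0[idx, , drop = FALSE], mu0)
  })
  list(w_obs  = w_obs,
       w_boot = w_boot,
       p_boot = mean(w_boot >= w_obs),
       p_asym = pchisq(w_obs, df = ncol(X), lower.tail = FALSE))
}

# Synthetic placeholder data (NOT the contents of Test.txt)
set.seed(7)
mu0 <- c(4, 8, -2)
X   <- sweep(matrix(rnorm(30 * 3), nrow = 30), 2, mu0, "+")

res <- boot_pvalue(X, mu0)
print(c(bootstrap = res$p_boot, asymptotic = res$p_asym))

# Bootstrap distribution of W against the chi-square(3) density
hist(res$w_boot, breaks = 50, freq = FALSE, main = "", xlab = "W")
curve(dchisq(x, df = 3), add = TRUE, lwd = 2)
```

Resampling the raw data without the shift would instead approximate the distribution of W around the sample mean rather than under the null hypothesis, so the choice should be stated explicitly in any write-up.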

Attachment:- Assignment.rar
