Assignment: Multivariate Data Analysis

Part A

Refer to the data in FoodConsumptionNutrients en.xls. It has information for about 175 countries. Choose 30 or so countries that interest you to work on. Be sure that you use countries from at least three different country groups from different regions (see the sheet CountryGroupComposition to get some ideas for groupings that you might use). Collect the information on energy consumption, fat consumption and protein consumption for your chosen countries onto a single sheet. Create a variable for the country group.

1. Choose two of the three original variables. Draw a scatterplot with the country group of each point indicated. Comment.

2. Generate classification rules using

• Linear discriminant analysis

• Quadratic discriminant analysis

• Multinomial logistic regression

• Classification trees

3. Using the confusion matrix and the apparent error rate, compare the effectiveness of each of the classification rules.
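
As a rough illustration of item 3, the sketch below tabulates a confusion matrix and computes the apparent error rate, i.e. the proportion of the observations used to build the rule that the rule itself misclassifies. It is written in Python purely for illustration; the same tabulation can be produced in R with `table()`. The group names and the predicted labels are invented for the example and are not taken from the food consumption data, and the function names are hypothetical.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Count (actual, predicted) pairs; rows are true groups, columns are predicted groups."""
    labels = sorted(set(actual) | set(predicted))
    counts = Counter(zip(actual, predicted))
    matrix = [[counts[(a, p)] for p in labels] for a in labels]
    return labels, matrix


def apparent_error_rate(actual, predicted):
    """Proportion of training observations that the rule misclassifies."""
    wrong = sum(a != p for a, p in zip(actual, predicted))
    return wrong / len(actual)


# Hypothetical labels for eight countries, invented for illustration only.
actual = ["Africa", "Africa", "Africa", "Asia", "Asia", "Europe", "Europe", "Europe"]
predicted = ["Africa", "Asia", "Africa", "Asia", "Asia", "Europe", "Europe", "Asia"]

labels, matrix = confusion_matrix(actual, predicted)
aer = apparent_error_rate(actual, predicted)

print("groups:", labels)
for label, row in zip(labels, matrix):
    print(label, row)
print("apparent error rate:", aer)
```

Because the apparent error rate is computed on the same observations that were used to fit the rule, it tends to be optimistic, which is worth keeping in mind when comparing a flexible rule such as a classification tree with a simpler one.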

4. Assume that you did not know which countries were in which groups. Use the following methods to group the observations.

• One hierarchical implementation of cluster analysis

• K-means cluster analysis

• Multidimensional scaling

Do any of these correctly divide all the observations into the original groups?
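
One way to answer this question for the two clustering methods is sketched below. Cluster numbers are arbitrary, so a clustering reproduces the original groups only if every cluster contains a single group and every group falls in a single cluster. The sketch is in Python for illustration only, the function name and the example labels are hypothetical, and it assumes you have already extracted one cluster label per country (for example by cutting the dendrogram at a chosen number of clusters). Multidimensional scaling yields coordinates rather than cluster labels, so it would normally be judged from a plot of the configuration instead.

```python
from collections import defaultdict


def clusters_match_groups(groups, clusters):
    """Return True if the clusters reproduce the groups exactly, up to relabelling."""
    groups_in_cluster = defaultdict(set)
    clusters_in_group = defaultdict(set)
    for g, c in zip(groups, clusters):
        groups_in_cluster[c].add(g)
        clusters_in_group[g].add(c)
    # Every cluster must hold one group only, and no group may be split across clusters.
    pure = all(len(s) == 1 for s in groups_in_cluster.values())
    unsplit = all(len(s) == 1 for s in clusters_in_group.values())
    return pure and unsplit


# Hypothetical group and cluster labels, invented for illustration only.
groups = ["Africa", "Africa", "Asia", "Asia", "Europe", "Europe"]
relabelled_but_perfect = [2, 2, 3, 3, 1, 1]
one_country_misplaced = [1, 1, 1, 2, 3, 3]

print(clusters_match_groups(groups, relabelled_but_perfect))
print(clusters_match_groups(groups, one_country_misplaced))
```

A cross-tabulation of cluster label against country group gives the same information in more detail, since it also shows which groups are being confused.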

Part B

Find two datasets using online sources that you can use to demonstrate the techniques that you have learned in this subject. Some good places to find interesting data are:

• http://blog.visual.ly/data-sources/
• http://blog.bigml.com/2013/02/28/data-data-data-thousands-of-public-data-sources/
• http://www.tableausoftware.com/public/community/sample-data-sets
• https://www.kaggle.com/
• http://lib.stat.cmu.edu/DASL/
• http://www.models.kvl.dk/datasets
• http://research.library.gsu.edu/c.php?g=115854&p=754836
• http://www.stat.ufl.edu/~winner/datasets.html
• http://www.statsci.org/data/

You must get approval from me for your datasets before you begin. I may not approve two students using the same dataset.

Some datasets are quite extensive and you may feel that you can illustrate a range of techniques with different subsets of the same dataset. If you think this applies to your chosen dataset talk to me about this when you are getting approval for your dataset.

If you are having trouble thinking about what you need to be able to do, think back over the broad areas that we have covered in class: inferences about mean vectors, MANOVA (one- and two-way), multivariate linear regression, PCA and factor analysis, canonical correlation, and discrimination and classification, including clustering. You don't need to show that you can do all of these, but I would hope (read: expect) to see at least 5 of these broad areas represented in your answer.

For each of your chosen datasets, you need to pose one or more questions that you believe you can (try to) address using the dataset. You then need to use appropriate techniques to analyse the data to address the research question(s) that you have posed. Finally, you will need to reflect on the adequacy of the dataset to address the questions that you have posed, and make suggestions about how you might collect the data differently to better address your question (consider what to collect or how to collect, for instance).

Your answer to this question should include (separately for each of the two datasets, if appropriate):

• A report that describes the data, poses the research question(s), analyses the research question(s) and reflects on the usefulness of the data to answer the question(s). This should be in a report format, with essential output in the report and any other output that you use in an appendix. You should also indicate where you obtained the data from (e.g. reference to a paper or URL).

• A .R file containing your code.

• A .csv file containing the dataset (if it is not already in your .R file).

Attachment: FoodConsumptionNutrients en.xlsx
