Statistical Models and Methods: Linear Models

Please hand in your work by 5.00pm on Thursday 5 May. Work should be submitted via the coursework post boxes in the School. Please remember to complete and attach a coursework submission sheet to your report. Your report should contain all relevant plots and R output needed to justify your answers/arguments, together with appropriate discussion, but please do not include pages of irrelevant plots/output which are not discussed in the report. The easiest way to include R output in your report is to use R Markdown to produce your report, but you do not have to do so. Your report does not need to contain your R code, though you can include it if you wish. If you are using R Markdown, and do not wish to include your R code, then you can suppress the R code using the echo = FALSE argument, i.e. enclose the code in an {r, echo=FALSE} chunk in the Markdown file.

There will be a Moodle forum specifically for answering queries about the coursework, so you may post questions and I will answer them there so that everyone receives the same assistance. Please be careful not to inadvertently give away parts of your answer if you do post a question.

Unauthorised late submission will be penalised by 5% of the full mark per day. Work submitted more than one week late will receive zero marks. You are reminded to familiarise yourself with the guidelines concerning plagiarism in assessed coursework (see the student handbook), and note that this applies equally to computer code as it does to written work.

The Data

Data are available on the recommended prices of used cars in the United States. All cars are the same age, but have done different mileages and have different specifications. You have recently been employed by a used car dealership to build models to describe the dependence of recommended prices on potential explanatory variables, in order to use these models to price your own used cars. The data, which come in two parts, are available on Moodle. They are

TrainData.txt: Training data, which will be used to build models.

TestData.txt: Test data, which will be used to assess predictions from the models built.

They can be read into R (after saving the file in your working directory) using

Train = read.table("TrainData.txt", header = TRUE)

Test = read.table("TestData.txt", header = TRUE)

A description of the variables can be found in the file description.txt.

After reading in the data, you can look at the structure of the data (number of observations, variable types etc.) using the str() command, e.g. str(Train). For both data sets, you should treat the covariates Cylinder, Doors, Cruise, Sound and Leather as factors (they are treated as integers by default). This can be done using, for example,

Train$Cylinder = factor(Train$Cylinder)
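
The same conversion can be applied to all five covariates in both data sets in one pass. The following is a minimal sketch and not part of the original handout: the small Train and Test data frames built here are made-up stand-ins so that the snippet runs without the Moodle files, and factor_cols is a name introduced purely for illustration.

```r
# Made-up stand-ins for the data frames returned by read.table();
# with the real files, skip these lines and use the real Train and Test.
Train <- data.frame(Price = c(17000, 21000, 30500),
                    Cylinder = c(4L, 6L, 8L),
                    Doors = c(4L, 2L, 4L),
                    Cruise = c(0L, 1L, 1L),
                    Sound = c(1L, 0L, 1L),
                    Leather = c(0L, 1L, 1L))
Test <- Train[c(1, 3), ]

# Columns the handout says to treat as factors (name is illustrative)
factor_cols <- c("Cylinder", "Doors", "Cruise", "Sound", "Leather")

# lapply() returns a list of factors, assigned back column by column
Train[factor_cols] <- lapply(Train[factor_cols], factor)
Test[factor_cols]  <- lapply(Test[factor_cols], factor)

# Check that the five columns are now factors and Price is still numeric
str(Train)
```

Running str() again afterwards is a quick way to confirm that the variable types are what the later model fitting expects.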

The Task

(a) Using the TRAINING data, investigate models to explain the relationship between Price and the other variables. That is, Price (or transformations of it) is to be the response variable, and all other variables are potential explanatory variables.
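
As an illustration of what "Price (or transformations of it)" can look like in R, here is a hedged sketch that fits the same explanatory variables to Price and to log(Price), then runs a backward AIC search. It is one possible starting point, not a recommended answer. The data are simulated (they are NOT the coursework data), only three of the explanatory variables are mimicked, and the object names fit_raw, fit_log and fit_step are illustrative.

```r
set.seed(1)
n <- 200

# Simulated stand-in for TrainData.txt (NOT the real data)
Train <- data.frame(
  Mileage = round(runif(n, 5000, 50000)),
  Litre   = sample(c(1.6, 2.2, 3.5, 4.6), n, replace = TRUE),
  Cruise  = factor(rbinom(n, 1, 0.7))
)
Train$Price <- exp(9.2 - 8e-6 * Train$Mileage + 0.25 * Train$Litre +
                   0.05 * (Train$Cruise == "1") + rnorm(n, sd = 0.1))

# Same explanatory variables, two candidate response scales
fit_raw <- lm(Price ~ Mileage + Litre + Cruise, data = Train)
fit_log <- lm(log(Price) ~ Mileage + Litre + Cruise, data = Train)

# Coefficient table (estimates, standard errors, t values, p values)
summary(fit_log)$coefficients

# Backward selection by AIC, starting from the log-scale model
fit_step <- step(fit_log, direction = "backward", trace = 0)
formula(fit_step)
```

Note that AIC and residual standard errors are not directly comparable between fit_raw and fit_log, because the responses are on different scales; the choice between them is better guided by diagnostic plots and by predictions mapped back to the price scale.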

(b) Use your fitted model(s) from (a) to predict the responses for the observations in the TEST data set. That is, for each of the observations in the Test data, use the values of the explanatory variables as input to your model(s) from (a) to obtain fitted/predicted responses for these observations. Compare your predicted responses with the known observed responses from the observations in the Test data, using suitable plots/numerical summaries.
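
The mechanics of part (b) can be sketched as follows, assuming a model fitted on the log scale. Again the data are simulated rather than the coursework data; make_cars is a hypothetical helper used only to generate them, and RMSE and MAE are just two of many possible numerical summaries.

```r
set.seed(2)

# Hypothetical helper producing simulated cars (NOT the real data)
make_cars <- function(n) {
  d <- data.frame(Mileage = round(runif(n, 5000, 50000)),
                  Litre   = sample(c(1.6, 2.2, 3.5, 4.6), n, replace = TRUE))
  d$Price <- exp(9.2 - 8e-6 * d$Mileage + 0.25 * d$Litre +
                 rnorm(n, sd = 0.1))
  d
}
Train <- make_cars(200)
Test  <- make_cars(80)

# Fit on the training data only
fit <- lm(log(Price) ~ Mileage + Litre, data = Train)

# predict() returns values on the log scale; exp() maps them back to the
# price scale (under normal errors this targets the median, not the mean)
pred <- exp(predict(fit, newdata = Test))

# Numerical summaries of prediction error on the original price scale
rmse <- sqrt(mean((Test$Price - pred)^2))
mae  <- mean(abs(Test$Price - pred))
c(RMSE = rmse, MAE = mae)

# Observed against predicted; points near the dashed line are well predicted
plot(pred, Test$Price, xlab = "Predicted price", ylab = "Observed price")
abline(0, 1, lty = 2)
```

Computing the errors on the original price scale keeps summaries comparable across models fitted to different transformations of Price.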

Notes

- As with any analysis, the first step should be to do some exploratory analysis using any relevant plots and numerical summaries.

- For the model fitting, you can/should use any of the techniques we have covered this semester to investigate potential models. The task is deliberately open-ended, as would be the case in real situations working with real data. As this is a realistic situation with real data, there is not necessarily one single correct answer. Your job is to investigate potential models, and provide a summary of what they tell us about the problem we are trying to solve. The important point is that you correctly use the relevant techniques in a logical and principled manner, and provide a concise but insightful summary of your findings and reasoning. (Note however that you do not have to produce a report in a formal "report" format.)

- You should pay attention to whether the model assumptions are being met, for example using suitable diagnostic plots, and consider any transformations of the numerical variables if appropriate. Also consider whether your conclusions depend on a few outlying or influential points.

- You should (briefly and concisely) interpret your model(s) and consider whether they make sense in the context of the problem, for example via interpreting the fitted parameters.

- You do not need to include all your R output, as you will generate lots of output when experimenting with the model fitting. However, you should include the output which is relevant to the arguments that you make when describing the logical developments of your model fitting, and any diagnostic plots which justify changes you make in order to meet the modelling assumptions. Finally, at all stages please remember to explain your reasoning and describe (concisely but accurately) the action you take and why, along with the relevant output.
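
The checks mentioned in these notes (diagnostic plots, influential points, interpreting fitted parameters) might look roughly like the sketch below. It uses simulated data with a single explanatory variable, not the coursework data; the 4/n cut-off for Cook's distance is a common rule of thumb rather than a formal test, and the names cd, flagged, keep and fit_drop are illustrative.

```r
set.seed(3)
n <- 150

# Simulated stand-in for the training data (NOT the real data)
Train <- data.frame(Mileage = round(runif(n, 5000, 50000)))
Train$Price <- exp(10 - 1e-5 * Train$Mileage + rnorm(n, sd = 0.15))

fit <- lm(log(Price) ~ Mileage, data = Train)

# The four standard diagnostic plots for an lm fit on one page:
# residuals vs fitted, normal Q-Q, scale-location, residuals vs leverage
par(mfrow = c(2, 2))
plot(fit)

# Cook's distance measures how much each observation affects the fit
cd <- cooks.distance(fit)
flagged <- which(cd > 4 / n)

# Refit without the flagged points and compare the coefficients
keep <- setdiff(seq_len(n), flagged)
fit_drop <- lm(log(Price) ~ Mileage, data = Train[keep, ])
cbind(all = coef(fit), dropped = coef(fit_drop))

# Interpretation on the log scale: estimated percentage change in price
# associated with an extra 10,000 miles
100 * (exp(10000 * coef(fit)["Mileage"]) - 1)
```

If the two columns of coefficients are close, the conclusions do not hinge on the flagged observations; if they differ materially, that is worth discussing in the report rather than silently dropping points.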

Price: recommended retail price of the car

Mileage: number of miles the car has been driven

Make: manufacturer of the car

Type: body type such as Hatchback, Coupe etc.

Litre: a measure of engine size

Cylinder: number of cylinders in the engine (4, 6 or 8)

Doors: number of doors (2 or 4)

Cruise: indicator variable representing whether the car has cruise control (1 = cruise)

Sound: indicator variable representing whether the car has upgraded speakers (1 = upgraded)

Leather: indicator variable representing whether the car has leather seats (1 = leather)

NOTE: You should change the variables "Cylinder", "Doors", "Cruise", "Sound" and "Leather" to factors, as described on the question sheet.
