BHV Estimation project proposal

Member 4 recently dropped, need one more member

B. Which predictors are important for estimating Boston housing values, and how do these predictors affect those values? We want to characterize the relationship between the predictor variables and the response variable (home value).

C. Description of the dataset:

There are 506 observations of 14 variables in this dataset. These 14 variables contain 13 continuous attributes and 1 binary-valued attribute.

1) CRIM refers to per capita crime rate by town
2) ZN refers to the proportion of residential land zoned for lots over 25,000 square feet
3) INDUS refers to the proportion of non-retail business acres per town
4) CHAS is a qualitative variable: the Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
5) NOX refers to the nitric oxides concentration (parts per 10 million)
6) RM refers to average number of rooms per dwelling
7) AGE refers to the proportion of owner-occupied units built prior to 1940
8) DIS refers to weighted distances to five Boston employment centres
9) RAD refers to the index of accessibility to radial highways
10) TAX refers to the full-value property-tax rate per $10,000
11) PTRATIO refers to pupil-teacher ratio by town
12) B refers to 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
13) LSTAT refers to % lower status of the population
14) MEDV refers to Median value of owner-occupied homes in $1000's

D. The techniques we think would be useful are cross-validation, PCA and LDA. For cross-validation, we first split the data into a training set and a test set in suitable proportions, calling set.seed() beforehand so that the random split is reproducible. We can then estimate the test error with K-fold cross-validation. Because MEDV is continuous, this error would be a mean squared error; it is a misclassification rate only if the response is first binned into classes. Comparing the estimated test errors across candidate values of K (the number of folds) lets us pick the value with the smallest error.

For PCA, we first draw histograms of the variables to check their skewness and normality. If a variable looks approximately normal, we scale it using its mean and standard deviation. If it is skewed, we scale it using its median and median absolute deviation.
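The cross-validation step in part D could look roughly like the sketch below. It is written in Python with NumPy rather than R, so a seeded random generator stands in for set.seed(). The function name `k_fold_mse`, the choice of ordinary least squares as the model, and the synthetic data are all assumptions made for illustration; none of them come from the proposal, and the data is not the Boston dataset.

```python
import numpy as np

def k_fold_mse(X, y, k, seed=0):
    """Estimate the test MSE of ordinary least squares by k-fold cross-validation."""
    rng = np.random.default_rng(seed)      # fixed seed makes the split reproducible
    idx = rng.permutation(len(y))          # shuffle the row indices once
    folds = np.array_split(idx, k)         # k nearly equal-sized folds
    errors = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        # Add an intercept column and fit by least squares on the training folds.
        A_train = np.column_stack([np.ones(len(train)), X[train]])
        A_test = np.column_stack([np.ones(len(test)), X[test]])
        beta, *_ = np.linalg.lstsq(A_train, y[train], rcond=None)
        # Mean squared error on the held-out fold.
        errors.append(np.mean((y[test] - A_test @ beta) ** 2))
    return float(np.mean(errors))          # average of the per-fold errors

# Synthetic stand-in data (hypothetical, NOT the Boston data): 506 rows, 3 predictors.
rng = np.random.default_rng(1)
X = rng.normal(size=(506, 3))
y = 2.0 + X @ np.array([1.5, -0.7, 0.3]) + rng.normal(scale=0.5, size=506)

for k in (5, 10):
    print(k, k_fold_mse(X, y, k))
```

The sketch reports a mean squared error because the response is continuous; a classification model would report a misclassification rate in the same loop.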

We can then look at the loadings of each principal component (PC) and identify which PCs best summarize our predictor variables. For further investigation, we can use a biplot to visualize the PCs and judge which ones work best. We can then apply linear discriminant analysis (LDA) to the data; LDA needs a categorical response, so MEDV would have to be binned into classes first. We can then compare the LDA and PCA results to find the best predictors of the response. These are the techniques we have learned in class so far. More useful techniques may come up later in the course, so our choice of techniques for this data might change.
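The scaling and loading steps described above can be sketched as follows, again in Python with NumPy. The helper names `scale` and `pca` are hypothetical, the PCA is computed through a singular value decomposition (one common way to do it, not necessarily the one used in class), and the data is synthetic rather than the Boston dataset.

```python
import numpy as np

def scale(X, robust=False):
    """Center and scale columns: mean/SD by default, median/MAD if robust."""
    if robust:
        center = np.median(X, axis=0)
        spread = np.median(np.abs(X - center), axis=0)   # median absolute deviation
    else:
        center = X.mean(axis=0)
        spread = X.std(axis=0, ddof=1)
    # Note: a column whose spread is zero cannot be scaled this way.
    return (X - center) / spread

def pca(Z):
    """PCA of an already-scaled matrix Z via the singular value decomposition."""
    Zc = Z - Z.mean(axis=0)                 # SVD-based PCA needs mean-centered columns
    U, s, Vt = np.linalg.svd(Zc, full_matrices=False)
    loadings = Vt.T                         # column j holds the loadings of PC j+1
    var_ratio = s**2 / np.sum(s**2)         # proportion of variance explained per PC
    scores = Zc @ loadings                  # observations projected onto the PCs
    return loadings, var_ratio, scores

# Synthetic stand-in data (hypothetical): two correlated columns and one independent column.
rng = np.random.default_rng(0)
latent = rng.normal(size=506)
X = np.column_stack([
    latent + 0.1 * rng.normal(size=506),
    latent + 0.1 * rng.normal(size=506),
    rng.normal(size=506),
])

loadings, var_ratio, scores = pca(scale(X))
print(np.round(var_ratio, 3))       # variance explained by each PC
print(np.round(loadings[:, 0], 3))  # loadings of the first PC
```

One caveat on the robust option: a binary dummy such as CHAS can have a median absolute deviation of zero, in which case median/MAD scaling is undefined for that column and it would need separate handling.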

E. How many PCs should we select or use?
What should we do if there is more than one indicator variable?
How should we treat the outliers?
Do we need to apply a Box-Cox transformation to the data?
