BHV Estimation project proposal

Member 4 recently dropped, so we need one more member.

B. Which predictors are most important for estimating Boston housing values, and how do these predictors affect those values? We want to characterize the relationship between the predictor variables and the response variable (home value).

C. Description of the dataset:

There are 506 observations of 14 variables in this dataset. These 14 variables contain 13 continuous attributes and 1 binary-valued attribute.

1) CRIM refers to per capita crime rate by town
2) ZN refers to the proportion of residential land zoned for lots over 25,000 square feet
3) INDUS refers to the proportion of non-retail business acres per town
4) CHAS is a qualitative variable: the Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
5) NOX refers to the nitric oxides concentration (parts per 10 million)
6) RM refers to average number of rooms per dwelling
7) AGE refers to the proportion of owner-occupied units built prior to 1940
8) DIS refers to weighted distances to five Boston employment centres
9) RAD refers to the index of accessibility to radial highways
10) TAX refers to the full-value property-tax rate per $10,000
11) PTRATIO refers to pupil-teacher ratio by town
12) B refers to 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
13) LSTAT refers to % lower status of the population
14) MEDV refers to Median value of owner-occupied homes in $1000's

D. The techniques we think would be useful are cross-validation, PCA and LDA. For cross-validation, we first split the data into a training set and a test set in suitable proportions, calling set.seed() beforehand so that the random split is reproducible. We then use the training and test sets to estimate the test error with K-fold cross-validation (a misclassification rate if MEDV is grouped into classes, or a prediction error such as the mean squared error if it is kept continuous). Comparing the test errors across values of K gives the minimal test error and the best K for the number of folds. For PCA, we draw histograms of the variables to check their skewness and normality. If a variable is approximately normal, we scale it using the mean and standard deviation. If it is skewed, we scale it using the median and the median absolute deviation.
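The K-fold procedure described in section D can be sketched as follows. This is only an illustration under stated assumptions: it is written in Python rather than R, it uses synthetic stand-in data rather than the Boston data, it fits a one-predictor least-squares line as a placeholder model, and it scores folds by mean squared error. The helper names (k_fold_indices, fit_line, cv_mse) are hypothetical.

```python
import random
from statistics import mean


def k_fold_indices(n, k, seed=1):
    """Shuffle 0..n-1 reproducibly and deal the indices into k folds."""
    rng = random.Random(seed)  # fixed seed, the role set.seed() plays in R
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x with a single predictor."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def cv_mse(xs, ys, k, seed=1):
    """Average held-out mean squared error over k folds."""
    fold_errors = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        fold_errors.append(
            mean((ys[i] - (a + b * xs[i])) ** 2 for i in held_out)
        )
    return mean(fold_errors)


# Synthetic stand-in data (NOT the Boston data): y is roughly linear in x.
rng = random.Random(0)
xs = [rng.uniform(4.0, 9.0) for _ in range(100)]
ys = [-30.0 + 9.0 * x + rng.gauss(0.0, 3.0) for x in xs]

# Compare the cross-validated error for a few choices of K.
for k in (5, 10):
    print(k, round(cv_mse(xs, ys, k), 2))
```

Each observation is held out exactly once, so every fold's error is computed on data the model did not see during fitting. If MEDV were grouped into classes instead, the squared error inside cv_mse would be replaced by a misclassification rate.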

We can then look at the loadings of each principal component and decide which PCs best summarize our predictor variables. For further investigation, we can use a biplot to visualize the PCs and determine which ones work best. We can then apply LDA (linear discriminant analysis) to the data; since LDA needs a categorical response, MEDV would first have to be grouped into classes. We can then compare the LDA and PCA results to find the best estimators for our predictor variables. These are the techniques we have learned in class so far. There may be more useful techniques to apply as the course progresses, so our choice of techniques for this data might change.
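The PCA preparation outlined in section D (check skewness, pick a scaling rule, then inspect loadings) might look like the sketch below. Several things here are assumptions rather than part of the proposal: the language (Python instead of R), the skewness cutoff of 1.0, the use of the raw median absolute deviation without the 1.4826 consistency constant that R's mad() applies by default, power iteration as the way to obtain the first component, and all function names.

```python
from statistics import mean, median, pstdev


def skewness(values):
    """Moment-based skewness (0 for a perfectly symmetric sample)."""
    m, s = mean(values), pstdev(values)
    return mean(((v - m) / s) ** 3 for v in values)


def standard_scale(values):
    """Center by the mean and divide by the standard deviation."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def robust_scale(values):
    """Center by the median and divide by the median absolute deviation."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    return [(v - med) / mad for v in values]


def scale_column(values, skew_cutoff=1.0):
    """Choose the scaling rule from the skewness (cutoff is an assumption)."""
    if abs(skewness(values)) > skew_cutoff:
        return robust_scale(values)
    return standard_scale(values)


def first_pc_loadings(columns, iterations=500):
    """Loadings of the first principal component via power iteration."""
    p, n = len(columns), len(columns[0])
    centered = []
    for col in columns:
        m = mean(col)
        centered.append([v - m for v in col])
    # Sample covariance matrix of the (already scaled) columns.
    cov = [
        [sum(a * b for a, b in zip(ci, cj)) / (n - 1) for cj in centered]
        for ci in centered
    ]
    w = [1.0] * p
    for _ in range(iterations):
        w = [sum(cov[i][j] * w[j] for j in range(p)) for i in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        w = [x / norm for x in w]
    return w


# Made-up illustrative columns (NOT taken from the Boston data).
symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]
skewed = [1.0, 2.0, 3.0, 4.0, 100.0]
scaled = [scale_column(symmetric), scale_column(skewed)]
print(first_pc_loadings(scaled))
```

The sketch does not handle a column whose median absolute deviation is zero, which happens when more than half of its values are identical; such a column would need a fallback rule before robust scaling. Power iteration also assumes the starting vector is not orthogonal to the first component.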

E. How many PCs should we select or use?
What should we do if there is more than one indicator variable?
How should we treat the outliers?
Do we need to apply a Box-Cox transformation to the data?
