Picking the Data Set
Browse the following sites for examples and select a data set that interests you.
1. http://www.dataminingconsultant.com/resources.htm
2. http://www.kdnuggets.com/datasets/index.html
3. https://www.kaggle.com/datasets
4. Any other source of your choice
Preparing the Data
- Import the data set into R. Save the data as an .RData file so it can be reloaded easily.
- Document the steps of the import process and any preprocessing that had to be done before or after the import.
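
The import, preprocessing, and save steps above might look like the following sketch. It is only an illustration under stated assumptions: the built-in mtcars data set stands in for a downloaded file, and the names csv_path, rdata_path, cars, and restored are placeholders that are not part of the assignment. Your own data set will likely need different preprocessing.

```r
# Stand-in for a downloaded file: write a built-in data set to a temporary CSV.
# (mtcars and all file names here are placeholders, not part of the assignment.)
csv_path <- tempfile(fileext = ".csv")
write.csv(mtcars, csv_path, row.names = TRUE)

# Import step: the first column holds the row names written above.
cars <- read.csv(csv_path, row.names = 1, stringsAsFactors = FALSE)

# Example preprocessing: treat coded columns as categorical and count missing values.
cars$cyl <- factor(cars$cyl)
cars$am  <- factor(cars$am, levels = c(0, 1), labels = c("automatic", "manual"))
n_missing <- sum(is.na(cars))

# Save as .RData, then load into a separate environment to confirm the round trip.
rdata_path <- tempfile(fileext = ".RData")
save(cars, file = rdata_path)
restored <- new.env()
load(rdata_path, envir = restored)
str(restored$cars)
```

Keeping these steps in a script, with comments like the ones above, also serves as the documentation of the import process.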
Analyzing the Data
- Perform the analysis as in Module 3 for categorical and numerical data. Show appropriate plots for your data.
- Pick one variable with numerical data and examine the distribution of the data.
- Draw various random samples of the data and show the applicability of the Central Limit Theorem for this variable.
- Show how various sampling methods can be used on your data.
- For confidence levels of 80% and 90%, show the confidence intervals for the mean of the numeric variable for various samples and compare them against the population mean.
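
The distribution, Central Limit Theorem, sampling, and confidence interval steps above could be sketched roughly as follows. Everything specific here is an assumption made for illustration: the built-in faithful data set stands in for your data and is treated as the whole population, the sample sizes and the number of repetitions are arbitrary, the strata are hypothetical, and the intervals use a normal approximation with the sample standard deviation (a t-based interval is a common alternative for small samples).

```r
set.seed(42)  # make the random draws reproducible

# Stand-in data: 'waiting' from the built-in faithful data set plays the role of
# the chosen numeric variable and is treated as the whole population.
pop <- faithful$waiting
pop_mean <- mean(pop)

# Distribution of the variable.
hist(pop, main = "Population", xlab = "waiting")

# Central Limit Theorem: means of repeated random samples for several sample sizes.
sample_means <- function(x, size, reps = 1000) {
  replicate(reps, mean(sample(x, size, replace = TRUE)))
}
sizes <- c(5, 20, 50)
means_by_size <- lapply(sizes, function(n) sample_means(pop, n))
names(means_by_size) <- paste0("n", sizes)
for (n in names(means_by_size)) {
  hist(means_by_size[[n]], main = paste("Sample means,", n), xlab = "mean")
}
# The spread of the sample means should shrink roughly like sd(pop) / sqrt(n).
sd_of_means <- sapply(means_by_size, sd)

# Sampling methods, each selecting about 30 row indices.
N <- nrow(faithful)
srs_idx <- sample(N, 30)                                # simple random, no replacement
k <- floor(N / 30)
sys_idx <- seq(sample(k, 1), by = k, length.out = 30)   # systematic: every k-th row
strata <- ifelse(faithful$eruptions < 3, "short", "long")  # hypothetical strata
strat_idx <- unlist(lapply(split(seq_len(N), strata), function(idx) {
  sample(idx, round(30 * length(idx) / N))              # proportional allocation
}))

# Confidence interval for the mean (normal approximation, sample sd).
ci_mean <- function(x, level) {
  z <- qnorm(1 - (1 - level) / 2)
  half <- z * sd(x) / sqrt(length(x))
  c(lower = mean(x) - half, upper = mean(x) + half)
}

# Intervals for five samples at each confidence level, compared with the population mean.
ci_rows <- list()
for (level in c(0.80, 0.90)) {
  for (i in 1:5) {
    s <- sample(pop, 30)
    ci <- ci_mean(s, level)
    ci_rows[[length(ci_rows) + 1]] <- data.frame(
      level = level, sample = i,
      lower = ci[["lower"]], upper = ci[["upper"]],
      covers_pop_mean = ci[["lower"]] <= pop_mean && pop_mean <= ci[["upper"]]
    )
  }
}
ci_table <- do.call(rbind, ci_rows)
print(ci_table)
```

In a table like this, the 90% intervals are wider than the 80% intervals for the same sample size, so over many samples they should contain the population mean more often.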
Presenting the Project
- You will schedule your project presentation with your Facilitator.
- Each presentation lasts at most 10 minutes.
- Your Facilitator will send a sign-up sheet with their available time slots. The presentation days are Dec 15th, 16th, and 17th.