
Machine Learning Assignment

1) Regression: Consider the Bike Share dataset from the UCI Machine Learning Repository. The dataset contains three files: day.csv, hour.csv, and Readme.txt. Both data files (day.csv and hour.csv) contain a mix of integer-valued features (e.g., season, whether or not the day is a weekday) and real-valued features (e.g., temperature, windspeed). The features are described in the Data Characteristics section of the README (Readme.txt). As in the previous assignment, spend some time understanding the structure of the dataset: how the instances are organized, how the features are organized, what the various features mean (see the README), which features are useful for the task at hand, and so on. Do not run any machine learning algorithm until you understand the structure of the dataset.

Note in particular the last three fields in the data: casual (the number of casual riders), registered (the number of registered riders), and cnt (the total ridership count).
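A quick structural check can help when studying these three fields. The sketch below makes a few assumptions. The CSV is read with pandas. The column names come from Readme.txt (casual, registered, cnt). describe_counts is a hypothetical helper, and the three-row frame in the usage lines is made up purely for illustration. The Readme describes cnt as the total of casual and registered rentals, so the check is expected to report True on the real files.

```python
import pandas as pd

TARGETS = ["casual", "registered", "cnt"]


def describe_counts(df: pd.DataFrame) -> dict:
    """Summarize the layout of a bike-share frame and check the count identity."""
    missing = [c for c in TARGETS if c not in df.columns]
    if missing:
        raise KeyError(f"missing target columns: {missing}")
    return {
        "n_rows": len(df),
        # Every column that is not one of the three counts is a candidate feature.
        "feature_columns": [c for c in df.columns if c not in TARGETS],
        # cnt should equal casual + registered on every row.
        "cnt_is_sum": bool((df["casual"] + df["registered"] == df["cnt"]).all()),
    }


if __name__ == "__main__":
    # Made-up rows for illustration; in practice use pd.read_csv("hour.csv").
    toy = pd.DataFrame({
        "season": [1, 1, 2],
        "temp": [0.24, 0.22, 0.30],
        "casual": [3, 8, 5],
        "registered": [13, 32, 27],
        "cnt": [16, 40, 32],
    })
    print(describe_counts(toy))
```

Calling describe_counts(pd.read_csv("day.csv")) gives the same summary for the daily file.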

1. Using only hour.csv, implement regression algorithms (both linear and k-nearest neighbors) to predict the hourly values for:

a. the number of casual riders

b. the number of registered riders

c. total ridership count

2. Using only day.csv, implement regression algorithms (both linear and k-nearest neighbors) to predict the daily values for:

a. the number of casual riders

b. the number of registered riders

c. total ridership count

Therefore, for each dataset, you will report three linear regression models and three KNN regression models.

Note: Using one of the target values as a feature when predicting any of the counts (casual, registered, or total) would defeat the purpose of the learning algorithm. It would make the problem trivially easy, since the total count is simply the sum of the casual and registered counts. That is, the number of casual riders cannot be used as a feature to predict the number of registered riders or the total ridership. Similarly, the number of registered riders cannot be used to predict the other two values, and so on. The learning algorithm should use only features such as season, temperature, and real-feel temperature.

For instance, suppose you are using the hour.csv dataset and want to predict the number of casual riders. You must first remove the columns for registered riders and total ridership, and only then train and test your model. Similarly, when building the prediction model for total ridership, you must first remove the columns for casual riders and registered riders.
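The column removal described above can be wrapped in a small helper. This is a minimal sketch, and prepare_xy is a hypothetical name. It assumes the Readme column names. It also drops instant (a record index) and dteday (the raw date string). That extra drop is an assumption, not part of the assignment, so adjust NON_FEATURES to match your own feature choices.

```python
import pandas as pd

TARGETS = ("casual", "registered", "cnt")
# Assumption: the record index and the raw date string are not used as features
# (dteday is largely redundant with yr/mnth/weekday). Edit this tuple as needed.
NON_FEATURES = ("instant", "dteday")


def prepare_xy(df: pd.DataFrame, target: str):
    """Return (X, y) for one target, with all three count columns removed from X."""
    if target not in TARGETS:
        raise ValueError(f"target must be one of {TARGETS}, got {target!r}")
    # Drop every count column (not just the other two) so the target cannot leak.
    drop = [c for c in TARGETS + NON_FEATURES if c in df.columns]
    X = df.drop(columns=drop)
    y = df[target]
    return X, y
```

For example, on a frame with columns instant, dteday, hr, temp, casual, registered, cnt, prepare_xy(frame, "casual") keeps only hr and temp in X and returns the casual column as y.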

As before, you will need to split the data into a training set and a test set (choose the split proportions yourself). Evaluate the performance of your regression models using suitable measures. Report the performance results, which model(s) worked best, and why you think so.
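A minimal sketch of the split-and-evaluate loop follows. It fits one linear and one KNN model for each of the three targets. Several things here are choices rather than requirements of the assignment: an 80/20 random split, k = 5 neighbors, standardized inputs for KNN, and MAE, RMSE and R² as the measures. evaluate_models is a hypothetical helper. Pass it only feature columns, never casual, registered or cnt.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

TARGETS = ["casual", "registered", "cnt"]


def evaluate_models(df, feature_cols, test_size=0.2, k=5, seed=42):
    """Fit linear and KNN regressors for each target and return a table of test metrics."""
    leaked = set(feature_cols) & set(TARGETS)
    if leaked:
        raise ValueError(f"target columns used as features: {sorted(leaked)}")
    X = df[feature_cols]
    rows = []
    for target in TARGETS:
        y = df[target]
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=test_size, random_state=seed
        )
        models = {
            "linear": LinearRegression(),
            # KNN is distance-based, so features are standardized before the neighbor search.
            f"knn(k={k})": make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=k)),
        }
        for name, model in models.items():
            pred = model.fit(X_tr, y_tr).predict(X_te)
            rows.append({
                "target": target,
                "model": name,
                "MAE": mean_absolute_error(y_te, pred),
                "RMSE": float(np.sqrt(mean_squared_error(y_te, pred))),
                "R2": r2_score(y_te, pred),
            })
    return pd.DataFrame(rows)
```

Calling the helper once on hour.csv and once on day.csv yields the six models per dataset. The rows of hour.csv are ordered in time, so a random split lets the model see hours adjacent to the test hours. A chronological split (train on earlier dates, test on later ones) is a stricter alternative. The value of k is also worth tuning rather than fixing at 5.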

2) Clustering: Consider the Seeds dataset from the UCI Machine Learning Repository. The dataset comprises features of kernels from three different varieties of wheat. Each data point is described by seven features: area, perimeter, compactness, length, width, asymmetry coefficient, and length of kernel groove. (Note that the dataset also has an eighth column holding the class labels 1, 2, and 3, which we will use as ground truth to verify our clustering results.)
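A sketch of loading the data with readable column names follows. It assumes the UCI file (commonly distributed as seeds_dataset.txt) has no header row and separates fields with whitespace. The column names are chosen here for convenience and are not part of the file. Splitting on runs of whitespace guards against irregular spacing between fields shifting values into the wrong column.

```python
import pandas as pd

# Hypothetical column names: the seven features listed above, then the class label.
SEED_COLUMNS = [
    "area", "perimeter", "compactness", "length",
    "width", "asymmetry", "groove_length", "label",
]


def load_seeds(path_or_buffer):
    """Read the whitespace-delimited Seeds file (no header row) into a DataFrame."""
    # sep=r"\s+" treats any run of tabs/spaces as a single separator.
    df = pd.read_csv(path_or_buffer, sep=r"\s+", header=None, names=SEED_COLUMNS)
    df["label"] = df["label"].astype(int)
    return df
```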

Using the k-means algorithm, cluster this dataset into three clusters based on the seven available features. Demonstrate the effectiveness of your implementation by comparing the results against the ground truth. Follow the steps in the k-means demo video from the lectures.
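A minimal k-means sketch using scikit-learn's KMeans is shown below. It may differ from the steps in the lecture demo. Standardizing the features first is a choice made here because the features are on very different scales. Setting n_init and random_state explicitly keeps runs reproducible across library versions. cluster_seeds is a hypothetical helper.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def cluster_seeds(X, n_clusters=3, standardize=True, seed=0):
    """Run k-means on a feature matrix and return (cluster labels, fitted model)."""
    X = np.asarray(X, dtype=float)
    if standardize:
        # z-score each feature so that large-valued features (e.g. area) do not
        # dominate the Euclidean distances over small ones (e.g. compactness).
        X = StandardScaler().fit_transform(X)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = km.fit_predict(X)  # cluster IDs are 0, 1, ..., n_clusters - 1
    return labels, km
```

Suppose df holds the seven features plus a label column. Then labels, km = cluster_seeds(df.drop(columns="label")) clusters on the features alone and leaves the ground truth out of the fit.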

Also note that scikit-learn's cluster labels start from 0 by default, whereas the labels in this dataset start from 1. Make sure to account for this discrepancy when evaluating the effectiveness of your implementation.
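Adding 1 to every cluster ID is usually not enough on its own. k-means numbers its clusters arbitrarily, so cluster 0 need not correspond to class 1. One common approach, used in the sketch below, is Hungarian matching with SciPy's linear_sum_assignment on the cluster-versus-class count table. It picks the one-to-one mapping that maximizes the number of agreeing points. align_clusters is a hypothetical helper, and it assumes as many clusters as classes.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def align_clusters(true_labels, cluster_labels):
    """Map 0-based k-means cluster IDs onto the dataset's 1-based class labels.

    The mapping is one-to-one and maximizes the number of points whose mapped
    cluster equals their true class.
    """
    true_labels = np.asarray(true_labels)
    cluster_labels = np.asarray(cluster_labels)
    classes = np.unique(true_labels)        # e.g. [1, 2, 3]
    clusters = np.unique(cluster_labels)    # e.g. [0, 1, 2]
    if len(classes) != len(clusters):
        raise ValueError("expects as many clusters as classes")
    # counts[i, j] = number of points in cluster i whose true class is j
    counts = np.array([[np.sum((cluster_labels == c) & (true_labels == t))
                        for t in classes] for c in clusters])
    # linear_sum_assignment minimizes total cost, so negate counts to maximize matches.
    rows, cols = linear_sum_assignment(-counts)
    mapping = {clusters[r]: classes[c] for r, c in zip(rows, cols)}
    return np.array([mapping[c] for c in cluster_labels])
```

After alignment, accuracy is simply the fraction of points where the aligned label equals the true label, e.g. (align_clusters(y, labels) == np.asarray(y)).mean().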

As a performance measure, compare the clusters identified by k-means with the ground-truth labels and record your observations.
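Besides accuracy after relabeling, permutation-invariant scores compare a clustering with the ground truth without any relabeling at all. The sketch below assumes the adjusted Rand index (ARI) and normalized mutual information (NMI) from scikit-learn as the measures; the assignment does not prescribe specific ones. It also builds a contingency table with pandas. compare_to_ground_truth is a hypothetical helper.

```python
import pandas as pd
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score


def compare_to_ground_truth(true_labels, cluster_labels):
    """Return a class-by-cluster contingency table and two agreement scores."""
    table = pd.crosstab(
        pd.Series(true_labels, name="true class"),
        pd.Series(cluster_labels, name="k-means cluster"),
    )
    scores = {
        # Both scores ignore how clusters are numbered: 1.0 means a perfect match.
        "ARI": adjusted_rand_score(true_labels, cluster_labels),
        "NMI": normalized_mutual_info_score(true_labels, cluster_labels),
    }
    return table, scores
```

In the table, each row is a true wheat class. Rows whose counts are split across several clusters show which varieties k-means confuses with one another.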

Attachment: Assignment Files.rar

  • Category: Applied Statistics
  • Reference No.: M92496492