
Some of the questions in this assignment require you to use the "BikeShare" dataset. This dataset is given as a text file, named "BikeShareTabSep.txt". You can download this from the Assignment folder in CloudDeakin. Below is the description of this dataset.

Bike sharing dataset (BikeShare)
This dataset gives the count of bikes rented between 11am - 12pm on different days and locations through the Capital Bikeshare System (operating in US cities) between 2011 and 2012. The variables include the following (9 variables):

Season: Categorical: 1 = Spring, 2 = Summer, 3 = Autumn (fall), 4 = Winter

Working day: 0 = Weekend, 1 = Workday.

Weather: Categorical variable
1: Clear, Few clouds, Partly cloudy
2: Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist
3: Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds
4: Heavy Rain + Ice Pellets + Thunderstorm + Mist, Snow + Fog

Temperature: Temperature in Celsius.

'Feeling' Temperature: 'Feels like' temperature, reported in Celsius.

Humidity: Humidity (given as a percentage).

Windspeed: Windspeed (measured in km/h).

Casual users: Count of casual users that used a bike at that time.

Registered users: Count of registered users that used a bike at that time.

Assignment tasks

Q1):

• Download the txt file "BikeShareTabSep.txt" and save it to your R working directory.
• Assign the data to a matrix, e.g. using

the.data<-as.matrix(read.table("BikeShareTabSep.txt"))

• Generate a sample of 400 data points using the following:

my.data <- the.data [sample(1:727,400),c(1:9)]

Save "my.data" to a text file titled "name-StudentID-BikeShareMyData.txt" using the following R code (NOTE: you must upload this text file with your submission).

write.table(my.data,"name-StudentID-BikeShareMyData.txt")

Use the sampled data ("my.data") to answer the following questions.

Draw histograms for the 'Registered users' and 'Temperature' values, and comment on them.

Give the five-number summary and the mean value for 'Casual users' and 'Registered users' separately.

Draw a parallel box plot using the two variables 'Casual users' and 'Registered users'. Use the answers to Q1.2 and the box plots to compare the two variables and comment on them.

Draw a scatterplot of 'Temperature' and 'Casual users' for the first 200 data vectors selected from "my.data" (name the axes) and comment on it.

Fit a linear regression model to 'Temperature' (as x) and 'Casual users' (as y) using the first 200 data vectors selected from "my.data". Write down the linear regression equation. Plot the line on the same scatterplot. Compute the correlation coefficient and the coefficient of determination. Explain what these results reveal.
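The Q1 steps can be sketched in R as follows. This is only an illustrative outline run on simulated data: the vectors `temperature`, `casual` and `registered` are hypothetical stand-ins for columns of "my.data" (assumed to be columns 4, 8 and 9 from the order of the variable list above), so the numbers it produces say nothing about the real dataset.

```r
# Simulated stand-ins for my.data[, 4], my.data[, 8] and my.data[, 9].
# The column positions are an assumption based on the variable list order.
set.seed(1)
n <- 400
temperature <- runif(n, 0, 35)
casual <- pmax(0, round(5 + 3 * temperature + rnorm(n, sd = 20)))
registered <- pmax(0, round(100 + 4 * temperature + rnorm(n, sd = 40)))

# Histograms
hist(registered, main = "Registered users", xlab = "Count")
hist(temperature, main = "Temperature", xlab = "Temperature (Celsius)")

# Five-number summary (min, lower hinge, median, upper hinge, max) and mean
five_casual <- fivenum(casual)
five_registered <- fivenum(registered)
mean_casual <- mean(casual)
mean_registered <- mean(registered)

# Parallel box plot of the two user counts
boxplot(casual, registered, names = c("Casual users", "Registered users"))

# Scatterplot and simple linear regression on the first 200 rows
x <- temperature[1:200]
y <- casual[1:200]
plot(x, y, xlab = "Temperature (Celsius)", ylab = "Casual users")
fit <- lm(y ~ x)
abline(fit, col = "red")      # fitted line y = intercept + slope * x
coefs <- coef(fit)            # intercept and slope of the regression equation

# Correlation coefficient and coefficient of determination
r <- cor(x, y)
r_squared <- summary(fit)$r.squared
```

For a simple regression with one predictor, `r_squared` equals `r^2`, which gives a quick consistency check on the two reported values.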

Q2)

The table shows the results of a survey about the type of vehicle people own (in thousands) in different states over a five-year period between 2011 and 2016.

Vehicle type         | New South Wales (N) | Victoria (V) | Queensland (Q) | Total
Passenger (P)        | 1360                | 1140         | 810            | 3310
Light commercial (C) | 260                 | 190          | 240            | 690
Total                | 1620                | 1330         | 1050           | 4000

Suppose we select a person at random.

What is the probability that the person is from Victoria (V)?

What is the probability that the person owns a light commercial vehicle (C)?

What is the probability that the person owns a passenger vehicle (P) and is from New South Wales (N)?

What is the probability that the person owns a light commercial vehicle (C) given that he/she is from Queensland (Q)?

What is the probability that a person who owns a passenger vehicle (P) is from Queensland (Q)?

What is the probability that the person is from Victoria (V) or owns a passenger vehicle (P)?

Find the marginal distribution of the vehicle type.

Find the marginal distribution of the state.

Find the conditional distribution of vehicle type within each state.
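One way to organise these calculations is to enter the survey counts as a matrix and let R compute the joint, marginal and conditional proportions. The sketch below uses only the counts from the table; the object names (`counts`, `joint`, and so on) are illustrative choices, not part of the assignment.

```r
# Survey counts (in thousands) from the table: rows = vehicle type, cols = state
counts <- matrix(c(1360, 1140, 810,
                    260,  190, 240),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(vehicle = c("P", "C"),
                                 state = c("N", "V", "Q")))

joint <- counts / sum(counts)        # joint distribution P(vehicle, state)
marg_vehicle <- rowSums(joint)       # marginal distribution of vehicle type
marg_state <- colSums(joint)         # marginal distribution of state

# Conditional distribution of vehicle type within each state (columns sum to 1)
cond_vehicle_given_state <- prop.table(counts, margin = 2)

p_V <- marg_state["V"]                                   # P(V)
p_C <- marg_vehicle["C"]                                 # P(C)
p_P_and_N <- joint["P", "N"]                             # P(P and N)
p_C_given_Q <- cond_vehicle_given_state["C", "Q"]        # P(C | Q)
p_Q_given_P <- counts["P", "Q"] / sum(counts["P", ])     # P(Q | P)
p_V_or_P <- p_V + marg_vehicle["P"] - joint["P", "V"]    # P(V or P), addition rule
```

The last line applies the addition rule P(V or P) = P(V) + P(P) - P(V and P), so that people counted in both groups are not counted twice.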

Q3)

Suppose that 20% of the adults smoke cigarettes. It is known that 60% of smokers and 15% of non-smokers develop a certain lung condition. What is the probability that someone with the lung condition was a smoker?
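This is a direct application of Bayes' theorem, P(S | L) = P(L | S) P(S) / P(L), with P(L) obtained from the law of total probability. A minimal R sketch using the three percentages stated in the question (variable names are illustrative):

```r
p_smoker <- 0.20                 # P(S): proportion of adults who smoke
p_cond_given_smoker <- 0.60      # P(L | S)
p_cond_given_nonsmoker <- 0.15   # P(L | not S)

# Law of total probability: P(L) = P(L | S) P(S) + P(L | not S) P(not S)
p_cond <- p_cond_given_smoker * p_smoker +
  p_cond_given_nonsmoker * (1 - p_smoker)

# Bayes' theorem: P(S | L)
p_smoker_given_cond <- p_cond_given_smoker * p_smoker / p_cond
p_smoker_given_cond   # 0.12 / 0.24 = 0.5
```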

Q4) Maximum Likelihood Estimation (MLE)

The number of cars $x_i$ arriving at a shopping centre on a given day $i$ is modelled by a Poisson distribution with unknown parameter θ, as given by the following equation.

$x_i \sim \mathrm{Poi}(\theta)$

$\mathrm{Poi}(\theta) = p(x_i \mid \theta) = \frac{\theta^{x_i} e^{-\theta}}{x_i!}$

Assume that we consider N consecutive days, and that the daily car arrivals at the shopping centre are independent and identically distributed (iid).

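a) Show that the expression for the likelihood (joint distribution) $p(X \mid \theta)$ of the arrival of cars for N days ($X = \{x_1, x_2, \ldots, x_N\}$) is given by

$p(X \mid \theta) = \frac{\theta^{N\bar{x}} e^{-N\theta}}{x_1!\, x_2!\, x_3! \cdots x_N!}$,

where $\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$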

b) Find an expression for the log-likelihood function $L(\theta) = \ln p(X \mid \theta)$

c) In order to find the maximum likelihood estimate (MLE) of the parameter θ, we need to maximise $L(\theta)$.

Find the value of θ that maximises $L(\theta)$ by differentiating the log-likelihood function $L(\theta)$ with respect to θ and equating the derivative to zero. Show that the maximum likelihood estimate $\hat{\theta}$ (MLE) of the parameter θ is given by:

$\hat{\theta} = \bar{x}$, where $\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$

d) Suppose that the numbers of cars observed arriving on three days are x1 = 100, x2 = 60 and x3 = 70.

What is the MLE given this data?
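The closed-form result in part (c) can be checked numerically. The sketch below maximises the Poisson log-likelihood for the three observations with `optimize` and compares the maximiser with the sample mean. The search interval `c(1, 200)` is an arbitrary assumption chosen to bracket the data, and the numerical check does not replace the derivation asked for in parts (a) to (c).

```r
x <- c(100, 60, 70)   # observed car counts from part (d)

# Poisson log-likelihood L(theta) = sum_i log p(x_i | theta)
log_lik <- function(theta) sum(dpois(x, lambda = theta, log = TRUE))

# Numerical maximisation over an assumed bracketing interval
opt <- optimize(log_lik, interval = c(1, 200), maximum = TRUE)
theta_hat_numeric <- opt$maximum

# Closed-form MLE from part (c): the sample mean, 230 / 3
theta_hat_closed <- mean(x)
```

The two estimates agree up to the tolerance of the optimiser, which is what the closed-form result in part (c) predicts.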

Q5) Bayesian inference for Gaussians (unknown mean and known variance)

What is the meaning of a conjugate prior?

Why are conjugate priors useful in Bayesian statistics?

Give three examples of conjugate pairs (i.e., give three pairs of distributions that can be used for the prior and the likelihood).

The annual rainfall received in the Murray basin is measured for n years. The average rainfall observed over the n years is 1100 mm. Assume that the annual rainfall is normally distributed with unknown mean θ and known standard deviation 200 mm. Suppose your prior distribution for θ is normal with mean 800 mm and standard deviation 100 mm.

a) State the posterior distribution for θ (this will be in terms of n. Do not derive the formulae)
b) For n=3, find the mean and the standard deviation of the posterior distribution. Comment on the posterior variance
c) For n=15, find the mean and the standard deviation of the posterior distribution. Compare with the results obtained for n=3 in the above question Q5.4(b) and comment.
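For a normal likelihood with known variance and a normal prior on the mean, the posterior is again normal, with precision equal to the sum of the prior precision and n times the data precision, and with a mean that is the precision-weighted average of the prior mean and the sample mean. A short R sketch of that standard result follows. The function name `normal_posterior` and its argument names are hypothetical, introduced only for illustration.

```r
# Posterior for a normal mean with known data sd and a normal prior.
# prior_mean, prior_sd: prior parameters; xbar: sample mean; sigma: known data sd.
normal_posterior <- function(n, xbar = 1100, sigma = 200,
                             prior_mean = 800, prior_sd = 100) {
  prior_prec <- 1 / prior_sd^2          # prior precision
  data_prec <- n / sigma^2              # precision contributed by n observations
  post_var <- 1 / (prior_prec + data_prec)
  post_mean <- post_var * (prior_prec * prior_mean + data_prec * xbar)
  c(mean = post_mean, sd = sqrt(post_var))
}

post_n3 <- normal_posterior(3)
post_n15 <- normal_posterior(15)
```

Comparing `post_n3` with `post_n15` shows the pattern the question asks about: as n grows, the posterior mean moves from the prior mean towards the sample mean and the posterior standard deviation shrinks.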

Q6) Dimensionality Reduction:

Use the "BikeShare" data for this question. Use the following code to load 200 (or 100) randomly selected data points. Note that only features 4 to 9 are used here.
the.data <- as.matrix(read.table("BikeShareTabSep.txt"))
selData <- the.data [sample(1:727,200),c(4:9)]

Save "selData" to a text file titled "name-StudentID-PCASelData.txt" using the following R code (NOTE you must upload this text file with your submission).

write.table(selData,"name-StudentID-PCASelData.txt")

Conduct a principal component analysis (PCA) on this data (selData). Use the "biplot" code below (in R) to produce a scatterplot using the first two principal components. Comment on the plot.
pZ <- prcomp(selData, tol = 0.01, scale = TRUE)
pZ
summary(pZ)
biplot(pZ)

Draw a graph of variance versus the principal components, and explain how this can be used to determine the correct number of principal components.
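A variance-versus-component graph (often called a scree plot) can be built from the `sdev` element returned by `prcomp`. The sketch below uses a simulated 200 x 6 matrix as a stand-in for selData, so the shape of the curve is illustrative only. It also omits the `tol` argument, so that all six components are retained for plotting.

```r
# Simulated stand-in for selData (200 rows, 6 features); not the real data
set.seed(2)
selData <- matrix(rnorm(200 * 6), ncol = 6)
selData[, 2] <- selData[, 1] + rnorm(200, sd = 0.3)   # make two features correlated

pZ <- prcomp(selData, scale. = TRUE)

variances <- pZ$sdev^2                    # variance explained by each component
prop_var <- variances / sum(variances)    # proportion of total variance

plot(seq_along(variances), variances, type = "b",
     xlab = "Principal component", ylab = "Variance")
```

A common reading of such a plot is to look for the "elbow" where the curve flattens, or to keep enough components for `cumsum(prop_var)` to pass a chosen threshold. Both are rules of thumb rather than exact criteria.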

For the same data above (selData), compute the Euclidean distance matrix. Use the distance matrix to perform a classical multidimensional scaling (classical MDS or metric MDS). You can use the following command:

mds <- cmdscale(selData.dist) # here selData.dist is the distance matrix

Plot the results and comment on them.

For the same data above (selData), perform a non-metric MDS, called 'isoMDS' in R, with the number of dimensions k set to 2. Use the following command to do this:

library(MASS)
fit<-isoMDS(selData.dist, k=2)

Plot the results of this isoMDS.

Draw the Shepard plot for these isoMDS results and comment on it.

For the same data above (selData), perform a non-metric MDS, called 'isoMDS' in R, with the number of dimensions k set to 4.
library(MASS)
fit<-isoMDS(selData.dist, k=4)

Draw the Shepard plot for these isoMDS results and compare it with the plot obtained for k=2 in Q6.6 above. Comment on them.
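The MDS steps above can be sketched end to end as follows. The data are simulated (100 rows, 6 features) as a stand-in for selData, and `dist` is used with its default Euclidean metric to build `selData.dist`. `isoMDS` and `Shepard` come from the MASS package. The object names `fit2`, `fit4`, `shep2` and `shep4` are illustrative.

```r
library(MASS)

# Simulated stand-in for selData; not the real BikeShare sample
set.seed(3)
selData <- matrix(rnorm(100 * 6), ncol = 6)

selData.dist <- dist(selData)             # Euclidean distance matrix

# Classical (metric) MDS in 2 dimensions
mds <- cmdscale(selData.dist)
plot(mds[, 1], mds[, 2], xlab = "Coordinate 1", ylab = "Coordinate 2",
     main = "Classical MDS")

# Non-metric MDS with k = 2, and its Shepard plot
fit2 <- isoMDS(selData.dist, k = 2, trace = FALSE)
plot(fit2$points, xlab = "Coordinate 1", ylab = "Coordinate 2",
     main = "isoMDS, k = 2")
shep2 <- Shepard(selData.dist, fit2$points)
plot(shep2, pch = ".", xlab = "Dissimilarity", ylab = "Distance",
     main = "Shepard plot, k = 2")
lines(shep2$x, shep2$yf, type = "S", col = "red")   # fitted monotone regression

# Non-metric MDS with k = 4, and its Shepard plot
fit4 <- isoMDS(selData.dist, k = 4, trace = FALSE)
shep4 <- Shepard(selData.dist, fit4$points)
plot(shep4, pch = ".", xlab = "Dissimilarity", ylab = "Distance",
     main = "Shepard plot, k = 4")
lines(shep4$x, shep4$yf, type = "S", col = "red")
```

In a Shepard plot, points lying close to the fitted step line indicate that the low-dimensional distances preserve the rank order of the original dissimilarities. The `stress` component of each fit gives a single-number summary that can be compared between k = 2 and k = 4.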

Q7) Clustering:

K-Means clustering: Use the data file "SITdata2018.txt" provided in CloudDeakin for this question. Load the file "SITdata2018.txt" using the following:
zz<-read.table("SITdata2018.txt")
zz<-as.matrix(zz)
a) Draw a scatter plot of the data.

b) State the number of classes/clusters that can be found in the "SITdata2018" (zz).

c) Use the above number of classes as the k value and perform the k-means clustering on that data. Show the results using a scatterplot. Comment on the clusters obtained.

d) Vary the number of clusters (k value) from 2 to 20 in increments of 1 and perform the k-means clustering for the above data. Record the total within sum of squares (TOTWSS) value for each k, and plot a graph of TOTWSS versus k. Explain how you can use this graph to find the correct number of classes/clusters in the data.

Spectral Clustering: Use the same dataset (zz) and run a spectral clustering (use the number of clusters/centers as 4) on it. Show the results on a scatter plot (with colour coding). Compare these clusters with the clusters obtained using the k-means above and comment on the results.
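The k-means parts of this question can be sketched with base R's `kmeans`. The matrix `zz` below is simulated with four well-separated groups purely for illustration; the real "SITdata2018" data may look quite different. The `nstart` and `iter.max` values are arbitrary choices to reduce the effect of random initialisation.

```r
# Simulated two-dimensional stand-in for zz with four groups; not the real data
set.seed(4)
centres <- matrix(c(0, 0,
                    5, 0,
                    0, 5,
                    5, 5), ncol = 2, byrow = TRUE)
zz <- do.call(rbind, lapply(1:4, function(i) {
  cbind(rnorm(50, centres[i, 1], 0.5), rnorm(50, centres[i, 2], 0.5))
}))

# k-means with a chosen k, shown with colour-coded clusters
km <- kmeans(zz, centers = 4, nstart = 10, iter.max = 50)
plot(zz, col = km$cluster, xlab = "x", ylab = "y", main = "k-means, k = 4")

# Total within-cluster sum of squares (TOTWSS) for k = 2, ..., 20
ks <- 2:20
totwss <- sapply(ks, function(k) {
  kmeans(zz, centers = k, nstart = 10, iter.max = 50)$tot.withinss
})
plot(ks, totwss, type = "b", xlab = "Number of clusters k", ylab = "TOTWSS")
```

TOTWSS generally decreases as k grows, so the useful feature of the graph is the "elbow" beyond which adding clusters yields only small reductions. For the spectral clustering part, one option is the `specc` function in the kernlab package, called for example as `specc(zz, centers = 4)`. The choice of package is an assumption here, since the assignment does not name one.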

Attachment:- SITdata.rar
