Some of the questions in this assignment require you to use the "BikeShare" dataset. This dataset is given as a text file, named "BikeShareTabSep.txt". You can download this from the Assignment folder in CloudDeakin. Below is the description of this dataset.

Bike sharing dataset (BikeShare)
This dataset gives the count of bikes rented between 11am and 12pm on different days and at different locations through the Capital Bikeshare System (operating in US cities) between 2011 and 2012. It contains the following 9 variables:

Season: Categorical: 1 = Spring, 2 = Summer, 3 = Autumn (fall), 4 = Winter

Working day: 0 = Weekend, 1 = Workday.

Weather: Categorical variable
1: Clear, Few clouds, Partly cloudy
2: Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist
3: Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds
4: Heavy Rain + Ice Pellets + Thunderstorm + Mist, Snow + Fog

Temperature: Temperature in Celsius.

‘Feeling’ Temperature: ‘Feels like’ temperature, reported in Celsius.

Humidity: Humidity (given as a percentage).

Windspeed: Windspeed (measured in km/h).

Casual users: Count of casual users that used a bike at that time.

Registered users: Count of registered users that used a bike at that time.

Assignment tasks

Q1):

• Download the txt file "BikeShareTabSep.txt" and save it to your R working directory.
• Assign the data to a matrix, e.g. using

the.data<-as.matrix(read.table("BikeShareTabSep.txt"))

• Generate a random sample of 400 data points (rows) using the following:

my.data <- the.data [sample(1:727,400),c(1:9)]

Save "my.data" to a text file titled "name-StudentID-BikeShareMyData.txt" using the following R code (NOTE: you must upload this text file with your submission).

write.table(my.data,"name-StudentID-BikeShareMyData.txt")

Use the sampled data ("my.data") to answer the following questions.

Q1.1 Draw histograms of the ‘Registered users’ and ‘Temperature’ values, and comment on them.

Q1.2 Give the five-number summary and the mean for ‘Casual users’ and for ‘Registered users’ separately.

Q1.3 Draw parallel box plots of the two variables ‘Casual users’ and ‘Registered users’. Use the answers to Q1.2 and the box plots to compare the two variables and comment on them.

Q1.4 Draw a scatterplot of ‘Temperature’ and ‘Casual users’ for the first 200 data vectors in "my.data" (name the axes) and comment on it.

Q1.5 Fit a linear regression model with ‘Temperature’ as x and ‘Casual users’ as y, using the first 200 data vectors in "my.data". Write down the linear regression equation. Plot the line on the same scatterplot. Compute the correlation coefficient and the coefficient of determination. Explain what these results reveal.
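The Q1 tasks can be sketched in base R roughly as follows. Because the BikeShare file is not reproduced here, the sketch builds a simulated stand-in for "my.data" (hypothetical values, not the real data). The column positions (4 = Temperature, 8 = Casual users, 9 = Registered users) are an assumption taken from the order of the variable list above, and `set.seed` is added only so the sketch is reproducible.

```r
# Simulated stand-in for my.data (hypothetical values, NOT the real BikeShare data).
# Column positions 4, 8 and 9 are assumed from the order of the variable list.
set.seed(1)
n <- 400
temp <- runif(n, 0, 35)
my.data <- matrix(0, nrow = n, ncol = 9)
my.data[, 4] <- temp
my.data[, 8] <- pmax(0, round(5 + 3 * temp + rnorm(n, sd = 20)))    # casual users
my.data[, 9] <- pmax(0, round(120 + 2 * temp + rnorm(n, sd = 40)))  # registered users

temperature <- my.data[, 4]
casual      <- my.data[, 8]
registered  <- my.data[, 9]

# Histograms
hist(registered, main = "Registered users", xlab = "Count")
hist(temperature, main = "Temperature", xlab = "Degrees Celsius")

# Five-number summary (min, lower hinge, median, upper hinge, max) and mean
five.casual     <- fivenum(casual)
five.registered <- fivenum(registered)
mean.casual     <- mean(casual)
mean.registered <- mean(registered)

# Parallel box plots of the two count variables
boxplot(casual, registered, names = c("Casual", "Registered"), ylab = "Count")

# Scatterplot and least-squares line for the first 200 rows
x <- temperature[1:200]
y <- casual[1:200]
plot(x, y, xlab = "Temperature (Celsius)", ylab = "Casual users")
fit <- lm(y ~ x)
abline(fit, col = "red")

coef(fit)                      # intercept and slope of the regression equation
r  <- cor(x, y)                # correlation coefficient
r2 <- summary(fit)$r.squared   # coefficient of determination
```

Note that `fivenum` reports hinges, whereas `summary` reports quartiles computed by a slightly different rule, so the two can differ a little. For a simple linear regression with an intercept, the coefficient of determination equals the squared correlation coefficient.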

Q2)

The table shows results of a survey conducted about the type of vehicle people own (in thousands) in different states over a five year period between 2011 and 2016.

| Vehicle type | New South Wales (N) | Victoria (V) | Queensland (Q) | Total |
|---|---|---|---|---|
| Passenger (P) | 1360 | 1140 | 810 | 3310 |
| Light commercial (C) | 260 | 190 | 240 | 690 |
| Total | 1620 | 1330 | 1050 | 4000 |

Suppose we select a person at random.

What is the probability that the person is from Victoria (V)?

What is the probability that the person owns a light commercial vehicle (C)?

What is the probability that the person owns a passenger vehicle (P) and is from New South Wales (N)?

What is the probability that the person owns a light commercial vehicle (C), given that he/she is from Queensland (Q)?

What is the probability that a person who owns a passenger vehicle (P) is from Queensland (Q)?

What is the probability that the person is from Victoria (V) or owns a passenger vehicle (P)?

Find the marginal distribution of the vehicle type.

Find the marginal distribution of the state.

Find the conditional distribution of vehicle type within each state.
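One way to organise the Q2 calculations is to hold the table's counts in a matrix and read every probability off as a ratio of counts. The sketch below assumes only the counts from the table. The variable names are illustrative.

```r
# Counts (in thousands) from the Q2 table: rows = vehicle type, columns = state
counts <- matrix(c(1360, 1140, 810,
                    260,  190, 240),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(type = c("P", "C"), state = c("N", "V", "Q")))
total <- sum(counts)

p.V         <- sum(counts[, "V"]) / total                  # P(V)
p.C         <- sum(counts["C", ]) / total                  # P(C)
p.P.and.N   <- counts["P", "N"] / total                    # P(P and N)
p.C.given.Q <- counts["C", "Q"] / sum(counts[, "Q"])       # P(C | Q)
p.Q.given.P <- counts["P", "Q"] / sum(counts["P", ])       # P(Q | P)
# Addition rule: P(V or P) = P(V) + P(P) - P(V and P)
p.V.or.P    <- (sum(counts[, "V"]) + sum(counts["P", ]) - counts["P", "V"]) / total

marginal.type      <- rowSums(counts) / total              # marginal over vehicle type
marginal.state     <- colSums(counts) / total              # marginal over state
cond.type.by.state <- prop.table(counts, margin = 2)       # each column sums to 1
```

Using `prop.table` with `margin = 2` divides each column by its state total, which is exactly the conditional distribution of vehicle type within each state.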

Q3)

Suppose that 20% of adults smoke cigarettes. It is known that 60% of smokers and 15% of non-smokers develop a certain lung condition. What is the probability that someone with the lung condition is a smoker?
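This is a direct application of Bayes' theorem, with the law of total probability supplying the denominator. A minimal sketch using only the three percentages stated in the question (the variable names are illustrative):

```r
p.smoker            <- 0.20   # P(S)
p.cond.given.smoker <- 0.60   # P(L | S)
p.cond.given.non    <- 0.15   # P(L | not S)

# Law of total probability: P(L) = P(L | S) P(S) + P(L | not S) P(not S)
p.cond <- p.cond.given.smoker * p.smoker + p.cond.given.non * (1 - p.smoker)

# Bayes' theorem: P(S | L) = P(L | S) P(S) / P(L)
p.smoker.given.cond <- p.cond.given.smoker * p.smoker / p.cond
```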

Q4) Maximum Likelihood Estimation (MLE)

The number of cars $x_i$ arriving at a shopping centre on a given day $i$ is modelled by a Poisson distribution with unknown parameter $\theta$, as given by the following equations.

$$x_i \sim \mathrm{Pois}(\theta)$$

$$\mathrm{Pois}(\theta) = p(x_i \mid \theta) = \frac{\theta^{x_i} e^{-\theta}}{x_i!}$$

Assume that we consider $N$ consecutive days and that the daily arrival counts are independent and identically distributed (iid).

a) Show that the expression for the likelihood (joint distribution) $p(X \mid \theta)$ of the arrival of cars over $N$ days ($X = \{x_1, x_2, \ldots, x_N\}$) is given by

$$p(X \mid \theta) = \frac{\theta^{N\bar{x}} e^{-N\theta}}{x_1!\, x_2!\, x_3! \cdots x_N!},$$

where $\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$.

b) Find an expression for the log-likelihood function $L(\theta) = \ln p(X \mid \theta)$.

c) To find the maximum likelihood estimate (MLE) of the parameter $\theta$, we need to maximise $L(\theta)$.

Find the value of $\theta$ that maximises $L(\theta)$ by differentiating the log-likelihood function $L(\theta)$ with respect to $\theta$ and equating the derivative to zero. Show that the maximum likelihood estimate $\hat{\theta}$ (MLE) of the parameter $\theta$ is given by:

$$\hat{\theta} = \bar{x}, \quad \text{where } \bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$$

d) Suppose that we observe the numbers of cars arriving on three days as $x_1 = 100$, $x_2 = 60$ and $x_3 = 70$.

What is the MLE given this data?
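The closed-form result in part (c) can be sanity-checked numerically. The sketch below maximises the Poisson log-likelihood for the three observations with `optimize` and compares the result with the sample mean. The search interval `c(1, 200)` is an arbitrary assumption chosen to bracket the data.

```r
x <- c(100, 60, 70)   # observed daily counts from part (d)

# Log-likelihood L(theta) = sum of log Poisson probabilities
loglik <- function(theta) sum(dpois(x, lambda = theta, log = TRUE))

# Numerical maximiser over an assumed bracketing interval
opt <- optimize(loglik, interval = c(1, 200), maximum = TRUE)
theta.hat.numeric <- opt$maximum

# Closed-form MLE: the sample mean
theta.hat.closed <- mean(x)
```

The two values should agree up to the optimiser's tolerance, which is a quick check that the derivative-based derivation was carried out correctly.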

Q5) Bayesian inference for Gaussians (unknown mean and known variance)

Q5.1 What is the meaning of a conjugate prior?

Q5.2 Why are conjugate priors useful in Bayesian statistics?

Q5.3 Give three examples of conjugate pairs (i.e., give three pairs of distributions that can be used for the prior and the likelihood).

Q5.4 The annual rainfall received in the Murray basin is measured for n years. The average rainfall observed over the n years is 1100 mm. Assume that the annual rainfall is normally distributed with unknown mean θ and known standard deviation 200 mm. Suppose your prior distribution for θ is normal with mean 800 mm and standard deviation 100 mm.

a) State the posterior distribution for θ (this will be in terms of n. Do not derive the formulae)
b) For n=3, find the mean and the standard deviation of the posterior distribution. Comment on the posterior variance
c) For n=15, find the mean and the standard deviation of the posterior distribution. Compare with the results obtained for n=3 in the above question Q5.4(b) and comment.
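For a normal likelihood with known variance and a normal prior on the mean, the standard conjugate update adds precisions (inverse variances) and takes a precision-weighted average of the prior mean and the sample mean. A minimal sketch of that update with the numbers from the question as defaults (the function name `posterior` and its argument names are illustrative):

```r
# Normal-normal conjugate update for an unknown mean with known variance.
# Posterior precision = prior precision + n * data precision.
posterior <- function(n, xbar = 1100, sigma = 200, mu0 = 800, tau0 = 100) {
  prec      <- 1 / tau0^2 + n / sigma^2
  post.mean <- (mu0 / tau0^2 + n * xbar / sigma^2) / prec
  c(mean = post.mean, sd = sqrt(1 / prec))
}

post3  <- posterior(3)
post15 <- posterior(15)
```

As n grows, the data term dominates the precision, so the posterior mean moves from the prior mean towards the sample mean and the posterior standard deviation shrinks.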

Q6) Dimensionality Reduction:

Use the "BikeShare" data for this question. Use the following code to load randomly selected 200 (or 100) data points. Note that only features from 4 to 9 are used here.
the.data <- as.matrix(read.table("BikeShareTabSep.txt"))
selData <- the.data [sample(1:727,200),c(4:9)]

Save "selData" to a text file titled "name-StudentID-PCASelData.txt" using the following R code (NOTE you must upload this text file with your submission).

write.table(selData,"name-StudentID-PCASelData.txt")

Conduct a principal component analysis (PCA) on this data (selData). Use the following "biplot" code (in R) to produce a scatterplot using the first two principal components. Comment on the plot.
pZ <- prcomp(selData, tol = 0.01, scale. = TRUE)
pZ
summary(pZ)
biplot(pZ)

Draw a graph of variance versus the principal components, and explain how this can be used to determine the correct number of principal components.
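A variance-versus-component graph (a scree plot) can be built from the `sdev` component that `prcomp` returns. The sketch below uses a simulated six-column stand-in for "selData" (hypothetical values, with column names assumed from the variable list) and omits the `tol` argument so that all six components are retained.

```r
# Simulated stand-in for selData (hypothetical values, NOT the real BikeShare data)
set.seed(2)
n <- 200
z <- rnorm(n)
selData <- cbind(
  temperature = 20 + 8 * z + rnorm(n),
  feeling     = 22 + 9 * z + rnorm(n),
  humidity    = runif(n, 20, 100),
  windspeed   = rexp(n, rate = 1 / 12),
  casual      = 40 + 15 * z + rnorm(n, sd = 10),
  registered  = 150 + 20 * z + rnorm(n, sd = 30)
)

pZ <- prcomp(selData, scale. = TRUE)

# Variance of each component and its share of the total
var.explained <- pZ$sdev^2 / sum(pZ$sdev^2)
cum.explained <- cumsum(var.explained)

plot(var.explained, type = "b",
     xlab = "Principal component", ylab = "Proportion of variance explained")
```

A common reading of such a plot is to keep the components before the curve flattens (the "elbow"), or enough components for `cum.explained` to pass a chosen threshold. Base R's `screeplot(pZ, type = "lines")` draws a similar graph directly.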

For the same data above (selData), compute the Euclidean distance matrix. Use the distance matrix to perform a classical multidimensional scaling (classical MDS or Metric MDS). You can use the following command

mds <- cmdscale(selData.dist) # here ‘selData.dist' is the distance matrix

Plot the results and comment on them
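A minimal sketch of the classical MDS step, using a random stand-in matrix in place of "selData" (hypothetical values). `dist` computes Euclidean distances by default. Whether to standardise the columns first is a modelling choice that the question leaves open, so the sketch uses the raw values.

```r
# Random stand-in for selData (hypothetical values)
set.seed(3)
selData <- matrix(rnorm(200 * 6), ncol = 6)

selData.dist <- dist(selData)          # Euclidean distance matrix (default method)
mds <- cmdscale(selData.dist, k = 2)   # classical (metric) MDS in 2 dimensions

plot(mds[, 1], mds[, 2],
     xlab = "Coordinate 1", ylab = "Coordinate 2", main = "Classical MDS")
```

When the input distances are Euclidean, the classical MDS coordinates coincide with the PCA scores of the unscaled data up to sign, which is a useful point to raise when commenting on the plot.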

For the same data (selData), perform a non-metric MDS using ‘isoMDS’ in R with the number of dimensions k set to 2. Use the following commands to do this:

library(MASS)
fit<-isoMDS(selData.dist, k=2)

Plot the results of this isoMDS

Draw the Shepard plot for these isoMDS results and comment on it.

For the same data (selData), perform a non-metric MDS using ‘isoMDS’ in R with the number of dimensions k set to 4.
library(MASS)
fit<-isoMDS(selData.dist, k=4)

Draw the Shepard plot for these isoMDS results and compare it with the plot obtained for k=2 in Q6.6 above. Comment on them.
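The non-metric MDS fits and a Shepard plot can be sketched with `isoMDS` and `Shepard` from the MASS package, which ships with standard R installations. The data below are a random stand-in for "selData" (hypothetical values), and `trace = FALSE` only silences the iteration log.

```r
library(MASS)

# Random stand-in for selData (hypothetical values)
set.seed(4)
selData <- matrix(rnorm(100 * 6), ncol = 6)
selData.dist <- dist(selData)

fit2 <- isoMDS(selData.dist, k = 2, trace = FALSE)
fit4 <- isoMDS(selData.dist, k = 4, trace = FALSE)

# Configuration found for k = 2
plot(fit2$points, xlab = "Coordinate 1", ylab = "Coordinate 2", main = "isoMDS, k = 2")

# Shepard plot: original dissimilarities against fitted configuration distances
shep2 <- Shepard(selData.dist, fit2$points)
plot(shep2, pch = ".", xlab = "Dissimilarity", ylab = "Distance",
     main = "Shepard plot, k = 2")
lines(shep2$x, shep2$yf, type = "S", col = "red")   # monotone regression fit

c(k2 = fit2$stress, k4 = fit4$stress)   # stress values, reported in percent
```

The same `Shepard` call with `fit4$points` gives the k = 4 plot. A tighter scatter around the step line, together with a lower `stress`, indicates that the configuration preserves the rank order of the dissimilarities better.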

Q7) Clustering:

K-Means clustering: Use the data file "SITdata2018.txt" provided in CloudDeakin for this question. Load the file "SITdata2018.txt" using the following:
zz<-read.table("SITdata2018.txt")
zz<-as.matrix(zz)
a) Draw a scatter plot of the data.

b) State the number of classes/clusters that can be found in the "SITdata2018" (zz).

c) Use the above number of classes as the k value and perform the k-means clustering on that data. Show the results using a scatterplot. Comment on the clusters obtained.

d) Vary the number of clusters (k value) from 2 to 20 in increments of 1 and perform the k-means clustering for the above data. Record the total within sum of squares (TOTWSS) value for each k, and plot a graph of TOTWSS versus k. Explain how you can use this graph to find the correct number of classes/clusters in the data.
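The TOTWSS-versus-k graph can be produced by looping `kmeans` over k and reading the `tot.withinss` component of each fit. The sketch below uses four simulated Gaussian blobs as a stand-in for "zz" (hypothetical data). The choice `nstart = 20` is an assumption made to reduce the effect of poor random starts.

```r
# Simulated stand-in for zz: four well-separated 2-D blobs (hypothetical data)
set.seed(5)
centres <- rbind(c(0, 0), c(5, 0), c(0, 5), c(5, 5))
zz <- do.call(rbind, lapply(1:4, function(i) {
  cbind(rnorm(50, centres[i, 1], 0.5), rnorm(50, centres[i, 2], 0.5))
}))

# Total within-cluster sum of squares for k = 2, ..., 20
ks <- 2:20
totwss <- sapply(ks, function(k) kmeans(zz, centers = k, nstart = 20)$tot.withinss)

plot(ks, totwss, type = "b",
     xlab = "Number of clusters k", ylab = "Total within sum of squares (TOTWSS)")
```

TOTWSS generally keeps falling as k grows, so the usual heuristic is to look for the "elbow": the k after which further increases give only small reductions.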

Spectral Clustering: Use the same dataset (zz) and run a spectral clustering (use the number of clusters/centers as 4) on it. Show the results on a scatter plot (with colour coding). Compare these clusters with the clusters obtained using the k-means above and comment on the results.
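Packaged implementations exist (for example `specc` in the kernlab package), but the idea can also be sketched in base R. The version below is a minimal normalised spectral clustering under stated assumptions: a Gaussian affinity with a hand-picked width `sigma = 1`, simulated blobs standing in for "zz" (hypothetical data), and k-means applied to the row-normalised leading eigenvectors. It is not necessarily the method the assignment intends.

```r
# Simulated stand-in for zz: four 2-D blobs (hypothetical data)
set.seed(6)
centres <- rbind(c(0, 0), c(5, 0), c(0, 5), c(5, 5))
truth <- rep(1:4, each = 25)
zz <- cbind(rnorm(100, centres[truth, 1], 0.3), rnorm(100, centres[truth, 2], 0.3))

k <- 4
sigma <- 1   # kernel width: an assumed, hand-picked value

# Gaussian affinity matrix with zero diagonal
A <- exp(-as.matrix(dist(zz))^2 / (2 * sigma^2))
diag(A) <- 0

# Symmetrically normalised affinity D^(-1/2) A D^(-1/2)
d.inv.sqrt <- 1 / sqrt(rowSums(A))
L <- A * outer(d.inv.sqrt, d.inv.sqrt)

# Leading k eigenvectors, rows rescaled to unit length, then k-means
U <- eigen(L, symmetric = TRUE)$vectors[, 1:k]
U <- U / sqrt(rowSums(U^2))
cl <- kmeans(U, centers = k, nstart = 20)$cluster

plot(zz, col = cl, pch = 19, xlab = "x", ylab = "y", main = "Spectral clustering")
```

On compact, well-separated blobs such as these, spectral clustering and k-means tend to agree. The two usually differ on non-convex shapes (rings, crescents), where k-means splits by distance to centroids and spectral clustering follows connectivity.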

Attachment:- SITdata.rar
