Data Cleaning

Problem 1:
The dataset is missing a lot of data. Suggest an explanation for:

(a) missing payment card numbers

(b) prices, and

(c) end addresses

Finally, (d) calculate what percentage of prices are missing and suggest a way to deal with both missing prices and other missing data before using the dataset for analysis.

Whatever you decide to do in (d), apply it to the given dataset, including sorting out data that is present but inconsistent as well as data that is missing, and save the result as stage1.csv in your final submission.
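The calculation in (d) can be sketched as follows. The assignment itself asks for R; this sketch uses Python only to illustrate the arithmetic, and the same steps carry over to R. The column names and the four sample rows are hypothetical, not taken from the attached dataset, and median imputation is just one possible treatment (dropping or flagging the rows are alternatives, depending on why the values are missing).

```python
import csv
import io
import statistics

# Hypothetical sample: column names and values are made up for illustration.
SAMPLE = """start_address,end_address,price,card
Home,Office,12.5,1234
Home,Mall,,
Office,Home,14.0,1234
Mall,,,
"""

def is_blank(value):
    """True when a CSV field is empty or whitespace only."""
    return not (value or "").strip()

def load_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))

def percent_missing(rows, column):
    """Share of rows (in percent) whose value in `column` is blank."""
    missing = sum(1 for row in rows if is_blank(row[column]))
    return 100.0 * missing / len(rows)

def impute_median(rows, column):
    """Return copies of the rows with blank `column` values set to the median."""
    observed = [float(row[column]) for row in rows if not is_blank(row[column])]
    fill = str(statistics.median(observed))
    return [
        dict(row, **{column: fill if is_blank(row[column]) else row[column]})
        for row in rows
    ]

rows = load_rows(SAMPLE)
missing_pct = percent_missing(rows, "price")   # 2 of 4 prices are blank
cleaned = impute_median(rows, "price")         # blanks become the median price
```

Writing `cleaned` back out with `csv.DictWriter` would produce a file in the spirit of stage1.csv; whichever treatment is chosen should be justified by the explanation given in (a) to (c).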

Descriptive Statistics
For this problem, include in your LaTeX file the R code you used to answer each part of the question.

Problem 2:

a) What is the average journey cost?

b) Which journey (between which addresses) is the most popular and what is the median time for this journey?

c) What are the average and median durations of the journeys? Hint: You may find it helpful to create an additional column in Excel to hold the calculated journey times, and read the resulting file into R, before doing the analysis.
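As an illustration of parts (b) and (c), journey times can be derived from start and end timestamps and then summarised. The assignment asks for R; this is a Python sketch of the same steps. The addresses, timestamps, and timestamp format below are hypothetical and would need to be adapted to the real dataset.

```python
from collections import Counter
from datetime import datetime
import statistics

FMT = "%Y-%m-%d %H:%M"  # assumed timestamp format, not confirmed by the dataset

# Hypothetical journeys: (start address, end address, start time, end time).
journeys = [
    ("Home", "Office", "2017-01-05 08:00", "2017-01-05 08:20"),
    ("Office", "Home", "2017-01-05 18:10", "2017-01-05 18:25"),
    ("Home", "Office", "2017-01-06 08:05", "2017-01-06 08:45"),
]

def duration_minutes(start, end):
    """Journey time in minutes between two timestamp strings."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return delta.total_seconds() / 60.0

# Part (c): average and median duration over all journeys.
durations = [duration_minutes(s, e) for _, _, s, e in journeys]
mean_duration = statistics.mean(durations)
median_duration = statistics.median(durations)

# Part (b): most frequent (start, end) pair and its median journey time.
route_counts = Counter((a, b) for a, b, _, _ in journeys)
top_route, top_count = route_counts.most_common(1)[0]
top_route_median = statistics.median(
    [duration_minutes(s, e) for a, b, s, e in journeys if (a, b) == top_route]
)
```

Whether a journey and its return leg count as the same route is a modelling choice; this sketch treats them as distinct.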

Modelling

For this problem, describe in your LaTeX file the R code you used to answer each part of the question.

Problem 3:

The prices are not missing due to the corresponding transaction being paid in cash.

a) Create a plot of the relationship between journey time and cost

b) Is there a linear relationship between these variables? Show your reasoning for this answer, mentioning the type of model you use to answer this question

c) Can you suggest a rough set of categories by which journeys can be clustered? Suggest the model that you can use to find this out, and the values representative of the clusters. Furthermore, explain what these clusters can be understood to represent conceptually with respect to the journeys

d) Run the appropriate validation test for your clustering model, and explain how this affects your certainty about your categories
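One way to approach parts (b) to (d) is sketched below: an ordinary least-squares line with its R-squared for the linearity question, k-means for the clustering, and the mean silhouette score as a validation measure. These are possible choices, not the ones the assignment prescribes, and the assignment expects R rather than Python. The (duration, cost) pairs are hypothetical, and the k-means initialisation is deliberately naive.

```python
import math
import statistics

# Hypothetical (journey time in minutes, cost) pairs.
data = [(10, 12.0), (12, 15.0), (15, 16.0), (40, 43.0), (45, 46.0), (50, 53.0)]

def fit_line(points):
    """Least-squares fit of y = intercept + slope * x, plus R-squared."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in points)
    ss_tot = sum((y - my) ** 2 for y in ys)
    return slope, intercept, 1.0 - ss_res / ss_tot

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iterations=20):
    """Lloyd's algorithm; the first k points serve as initial centroids."""
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = tuple(statistics.mean(c) for c in zip(*members))
    return labels, centroids

def mean_silhouette(points, labels):
    """Average silhouette score; assumes at least two clusters."""
    scores = []
    for i, p in enumerate(points):
        same = [
            math.dist(p, q)
            for j, q in enumerate(points)
            if labels[j] == labels[i] and j != i
        ]
        if not same:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        a = statistics.mean(same)
        b = min(
            statistics.mean(
                [math.dist(p, q) for q, lab in zip(points, labels) if lab == other]
            )
            for other in set(labels)
            if other != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return statistics.mean(scores)

slope, intercept, r_squared = fit_line(data)
labels, centroids = kmeans(data, 2)
silhouette = mean_silhouette(data, labels)
```

The centroids are the "values representative of the clusters"; with this toy data they separate short, cheap journeys from long, expensive ones. A silhouette close to 1 indicates well-separated clusters, while values near 0 suggest the categories are not reliable. In practice k-means is usually run with several random initialisations and with variables on comparable scales.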

Question Refinement and Hypothesis Testing

For this problem, describe in your LaTeX file the R code you used to answer each part of the question.

Problem 4:

As a data scientist hired by Uber, you have been asked to simply figure out ways to reduce costs. However, you only have the attached customer data as input. Uber has said that this customer's behaviour is representative of an important sector of the market in the Al Naseem area.

Your task is to figure out whether the question of 'how costs can be reduced' can be answered by the given data. Your initial consultation with someone in finance reveals that troublesome customers, defined as indecisive customers who keep cancelling their Ubers after ordering them, before 5 minutes have elapsed, are an increasing cost.

Your further discussion with the product engineering manager shows that there is an idea for creating a private rating of Uber users based on this troublesome behaviour: users who cancel a large percentage of their trips will be given low ratings, and users with low ratings will not be 'actually assigned Ubers' (even though the application may show otherwise) until a few minutes after they have ordered the Uber.

a) Explain briefly how costs can be reduced with such a rating system

b) Suggest a refined question about saving costs, and what you expect to benefit from answering this question

c) What would be a way to answer this question with the given data?

d) Suggest a hypothesis test, stating the null and alternative hypotheses. Assume here that if the user cancels 30 percent or more of their rides then they will get low ratings

e) Perform the test on the attached dataset. Are you inclined to accept the null or the alternative hypothesis? Explain your choice.

f) Given the user data you analysed is representative of 1000 users, and assuming that cancellations within 5 minutes cost on average 3 SAR, how much money do you think you can save and over how many months?
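Parts (d) to (f) can be illustrated with an exact one-sided binomial test and a simple savings projection. This is one possible formulation (null hypothesis: cancellation rate equal to 0.30; alternative: rate above 0.30), sketched in Python although the assignment expects R. The trip counts, the number of avoided cancellations per user, and the time horizon are hypothetical; only the 30 percent threshold, the 1000 users, and the 3 SAR cost come from the problem statement.

```python
from math import comb

def binom_tail_ge(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): exact upper-tail probability."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Hypothetical counts for the analysed user.
trips = 20
cancelled = 11
p0 = 0.30      # threshold from the problem statement
alpha = 0.05   # conventional significance level (an assumption)

# H0: cancellation rate = 0.30, H1: cancellation rate > 0.30.
p_value = binom_tail_ge(cancelled, trips, p0)
reject_null = p_value < alpha

def projected_savings(users, avoided_per_user_per_month, cost_each, months):
    """Total saving if each user avoids the given number of paid cancellations."""
    return users * avoided_per_user_per_month * cost_each * months

# 1000 users and 3 SAR are given; 2 avoided cancellations per user per
# month and a 6-month horizon are placeholders to be replaced by estimates
# from the real data.
savings_sar = projected_savings(1000, 2, 3, 6)
```

With the real data, `cancelled` and `trips` would count only cancellations made within the 5-minute window, and the monthly rate in the projection would be estimated from the observed period rather than assumed.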

Presentation

Problem 5:

Communicate your problem, question, refined question, statistical test results and overall conclusions from Problem 4 to your manager using the necessary visualisations. You should use your results from the prior problems to inspire or encourage your final argument.

Attachment:- Data.rar
