ASSIGNMENT

In this assignment we apply some of the tools to analyze data taken from the donor database of the Blood Transfusion Service Center in Hsin-Chu City, Taiwan. About every three months the center sends its blood transfusion service bus to a university in Hsin-Chu City to collect donated blood. The assignment uses data collected on a random sample of 748 donors. The data was obtained from the UCI Machine Learning Repository.

The file "transfusion.csv" contains the data. The file contains 5 variables:
- recency = The number of months since the last donation. (numeric)
- frequency = The total number of donations. (numeric)
- monetary = Total blood donated (in c.c.). (numeric)
- time = The number of months since the first donation. (numeric)
- march2007 = An indicator. Indicates those that donated blood in March, 2007. (factor)
In the assignment we consider the last four variables.
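
As a starting point, the data can be read into R roughly as follows. This is a minimal sketch, not part of the assignment text: the helper name load_transfusion is hypothetical, and it assumes the file has a header row whose column names match the variable list above.

```r
# Hypothetical helper: read the assignment file and store the indicator as a factor.
# ASSUMPTION: the CSV has a header row with the names recency, frequency,
# monetary, time and march2007, as listed in the assignment.
load_transfusion <- function(path = "transfusion.csv") {
  d <- read.csv(path)
  d$march2007 <- factor(d$march2007)  # indicator -> factor, as the variable list states
  d
}

# Only run when the file is actually present in the working directory.
if (file.exists("transfusion.csv")) {
  transfusion <- load_transfusion()
  str(transfusion)
  print(summary(transfusion[, c("frequency", "monetary", "time", "march2007")]))
}
```

The summary is restricted to the last four variables, since those are the ones used in the tasks.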

Comparing Two Samples
Consider "frequency" as a response and "march2007" as an explanatory variable. Plot the relation between the two variables, test the equality of the expectation in the two sub-samples and the equality of the variance. Repeat the same analysis for the case where the response "frequency" is replaced by the log-transformed response: "log(frequency)". In Tasks 1-3 you are asked to describe the results of the analysis.
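
The steps described above could be sketched in R as follows. The data frame demo is a simulated stand-in so that the snippet runs on its own; it is an assumption for illustration only and says nothing about the real results. For the assignment, the same calls would be applied to the data read from "transfusion.csv".

```r
# Simulated stand-in data (ASSUMPTION: illustration only, not the real sample).
set.seed(1)
demo <- data.frame(
  frequency = c(rpois(30, 4), rpois(20, 7)) + 1,   # +1 keeps log(frequency) finite
  march2007 = factor(rep(c(0, 1), times = c(30, 20)))
)

# With a factor on the right-hand side, plot() draws one box plot per level.
plot(frequency ~ march2007, data = demo)
plot(log(frequency) ~ march2007, data = demo)

# Equality of expectations: t.test() uses the Welch version by default.
t_raw <- t.test(frequency ~ march2007, data = demo)
t_log <- t.test(log(frequency) ~ march2007, data = demo)

# Equality of variances: F test for the ratio of the two variances.
v_raw <- var.test(frequency ~ march2007, data = demo)
v_log <- var.test(log(frequency) ~ march2007, data = demo)

# Collect the four p-values; each is compared with the 5% significance level.
p_values <- c(t_raw = t_raw$p.value, t_log = t_log$p.value,
              v_raw = v_raw$p.value, v_log = v_log$p.value)
print(p_values)
```

A null hypothesis is rejected at the 5% level when the corresponding p-value is below 0.05.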

Linear Regression
In Tasks 4-7 you are asked to conduct an analysis similar to the analysis of Tasks 1-3. The difference is that the numerical variable "time" is used as the explanatory variable. The model of linear regression assumes that the expectation of the response is a linear function of the explanatory variable. Another assumption of the model is that the variance of the response is constant for each value of the explanatory variable. Frequently, however, one may observe an increase in the variance for larger values of the explanatory variable. Replacing the response by the log-transformed response is a commonly used method to overcome this difficulty. The analysis that involves the log of the response can be carried out via the replacement of the response "frequency" in the formula by the transformed response "log(frequency)".
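
A regression analysis along these lines might look like the sketch below. Again, demo is simulated stand-in data (an assumption made so the code is runnable without the file), and the object names fit_raw and fit_log are hypothetical.

```r
# Simulated stand-in data (ASSUMPTION: illustration only, not the real sample).
set.seed(2)
time_months <- sample(2:98, 60, replace = TRUE)
demo <- data.frame(
  time = time_months,
  frequency = rpois(60, lambda = 1 + time_months / 10) + 1  # always >= 1
)

# Fit the regression for the raw and for the log-transformed response.
fit_raw <- lm(frequency ~ time, data = demo)
fit_log <- lm(log(frequency) ~ time, data = demo)

# Scatter plots with the fitted regression lines added.
plot(frequency ~ time, data = demo)
abline(fit_raw)
plot(log(frequency) ~ time, data = demo)
abline(fit_log)

# p-values for H0: slope of "time" equals zero.
p_raw <- summary(fit_raw)$coefficients["time", "Pr(>|t|)"]
p_log <- summary(fit_log)$coefficients["time", "Pr(>|t|)"]

# 95% confidence interval for the slope in the log-response model.
ci_log <- confint(fit_log, "time", level = 0.95)
print(ci_log)
```

The sign of the estimated slope, together with whether the confidence interval excludes zero, indicates whether the fitted line is increasing, decreasing, or not distinguishable from constant.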

The Relation Between Two Variables
The final Task 8 involves the investigation of the relation between the response "frequency" and the variable "monetary".
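
One way to examine such a relation is to plot it, add the fitted line, and look at the size of the residuals. The sketch below uses a tiny made-up data frame in which every donation contributes a fixed 250 c.c.; that fixed amount is an assumption of the stand-in and should be checked against the real file rather than taken for granted.

```r
# Made-up stand-in data (ASSUMPTION: each donation is exactly 250 c.c.).
demo <- data.frame(frequency = c(1, 2, 2, 5, 8, 13))
demo$monetary <- 250 * demo$frequency

# Scatter plot of the response against "monetary", with the regression line.
fit <- lm(frequency ~ monetary, data = demo)
plot(frequency ~ monetary, data = demo)
abline(fit)

# If all points lie on one line, every residual is zero up to rounding error.
max_abs_resid <- max(abs(resid(fit)))
print(max_abs_resid)
```

A largest absolute residual that is zero up to rounding error means the points fall on a single line; clearly non-zero residuals around a visible slope indicate a linear trend without an exact fit.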

Tasks

Comparing Two Samples:

1. Apply the function "plot" to the formula that relates the response "frequency" to the explanatory variable "march2007" in order to produce the two box-plots of the response. Redo the plotting with "frequency" replaced by "log(frequency)". The distribution of the variable "log(frequency)" is:

__ More symmetric, __ Less symmetric compared to the distribution of the variable "frequency".

Mark the most appropriate option and attach the R code that produces the two plots:

2. Mark the null hypotheses that you reject with a significance level of 5% and those that you do not reject:

(Reject/Don't Reject) H0: The expectation of "frequency" is the same in the two subsets,

(Reject/Don't Reject) H0: The expectation of "log(frequency)" is the same in the two subsets.

Explain your answer:

3. Mark the null hypotheses that you reject with a significance level of 5% and those that you do not reject:

(Reject/Don't Reject) H0: The variance of "frequency" is the same in the two subsets,

(Reject/Don't Reject) H0: The variance of "log(frequency)" is the same in the two subsets.

Explain your answer:

Linear Regression:

4. Apply the function "plot" to the formula that relates the response "frequency" to the explanatory variable "time" in order to produce the scatter plot. Add the regression line to the plot. The variability of the variable "frequency", for larger values of the explanatory variable, is:

__ Smaller, __ Larger, __ Constant.

Mark the most appropriate option and attach the R code that produces the two plots:

5. Mark the null hypotheses that you reject with a significance level of 5% and those that you do not reject:

(Reject/Don't Reject) H0: The slope of "time" in the regression line of the response "frequency" is equal to zero,

(Reject/Don't Reject) H0: The slope of "time" in the regression line of the response "log(frequency)" is equal to zero.

Explain your answer:

6. The 95%-confidence interval of the slope of "time" in the regression line of the response "log(frequency)" is:

Lower end = ____, Upper end = ____.

Attach the R code that produces the confidence interval:

7. The regression line between "time" as an explanatory variable and "log(frequency)" as a response is:
__ Increasing, __ Decreasing, __ Constant.

Mark the most appropriate option and explain your answer:

The Relation Between Two Variables:

8. Apply the function "plot" to the formula that relates the response "frequency" to the explanatory variable "monetary" in order to produce the scatter plot. Add the regression line to the plot. The points in the scatter plot are:

__ All on the same line, __ Show a linear trend but are not on the same line, __ Don't show a linear trend.

Mark the most appropriate option and attach the R code that produces the plot:

Attachment:- Data.rar
