Homework 6 - R assignment

1. Can we detect when a marketing campaign has been successful?

a. On homework 4, you simulated data from the TableFarm salad chain before and after the implementation of a new marketing campaign.  Read the combined data (both before and after) into R.  (You could do this by saving the data as a .csv file and using read.csv(), or by copying the data into a text file, separating the values by commas, and enclosing the data in c( ... ) to make a vector.)  The Homework 4 TableFarm problem is reproduced below.

Average monthly revenue at each store in the TableFarm salad chain is $100,000, with a standard deviation of $12,000. An advertising firm claims they can increase monthly revenue to $120,000, but the standard deviation will be increased as well, to $25,000.

Write Python code to generate three lists of random numbers which model potential revenue: one list with 12 months of revenue using the current mean and standard deviation, another list with 12 months of revenue using the predicted mean and standard deviation, and a third list combining your first two lists. You can assume a normal distribution. Round each number to the nearest $1,000.
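A minimal sketch of one way to do the Homework 4 simulation, using only Python's standard library. The function name simulate_revenue and the seed value are assumptions made for illustration; they are not part of the assignment.

```python
import random

def simulate_revenue(mean, sd, months=12, rng=random):
    """Draw `months` normal revenue values, each rounded to the nearest $1,000."""
    return [int(round(rng.gauss(mean, sd), -3)) for _ in range(months)]

rng = random.Random(4)  # arbitrary seed, only so that a rerun gives the same lists
before = simulate_revenue(100_000, 12_000, rng=rng)   # current mean and sd
after = simulate_revenue(120_000, 25_000, rng=rng)    # predicted mean and sd
combined = before + after                             # months 1-12, then 13-24

print(combined)
```

Writing the combined list to a .csv file, one value per line, would make it easy to load with read.csv() in part a.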

b. Make a scatterplot of the data.  Add a vertical line to mark the month in which the new marketing campaign began, and add a legend to your plot.

c. Make side-by-side boxplots of the revenue before and after implementing the marketing campaign.  Write a few sentences describing and comparing the boxplots, and relating them to the underlying model you used to simulate the data.

d. Based on the way you simulated the data, you know that the marketing campaign was successful; that is, the data after implementing the marketing campaign was simulated from an underlying model with a higher mean than before the marketing campaign.  However, in real life we probably wouldn't know this.  Based on the scatterplot and boxplots, would you be confident in claiming that the marketing campaign was successful?  Why or why not?

e. Write the null and alternative hypotheses for a test of whether the marketing campaign was successful.  (I.e., whether the mean revenue with the marketing campaign is higher than the mean revenue before the marketing campaign.)

f. In a few sentences, explain why a 2-sample, 1-sided t-test is appropriate for testing the hypotheses in part e.

g. Conduct a 2-sample, 1-sided t-test in R.  Include the R output and state your conclusion in the context of the problem.
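The arithmetic behind a 2-sample t-test with unequal variances (the Welch form) can be sketched in a few lines. This is only an illustration of the formula, written in Python to match the Homework 4 code; the assignment itself asks for the test to be run in R. The function name welch_t and the eight revenue values are made up for the example.

```python
import math
import statistics

def welch_t(sample_a, sample_b):
    """Return (t, df) for the difference mean(sample_b) - mean(sample_a)."""
    na, nb = len(sample_a), len(sample_b)
    va = statistics.variance(sample_a)  # sample variance (n - 1 denominator)
    vb = statistics.variance(sample_b)
    se2 = va / na + vb / nb             # squared standard error of the difference
    t = (statistics.mean(sample_b) - statistics.mean(sample_a)) / math.sqrt(se2)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df

# Hypothetical toy data, NOT the simulated homework data
toy_before = [98_000, 105_000, 91_000, 112_000]
toy_after = [121_000, 99_000, 150_000, 134_000]

t_stat, df = welch_t(toy_before, toy_after)
print(t_stat, df)
```

For a 1-sided test of "after is higher than before", a large positive t supports the alternative; the p-value is the upper-tail area of the t distribution with df degrees of freedom, which R reports directly.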

2. Can we detect an association between chocolate consumption and Nobel prizes?  The Homework 4 problems referred to are below:

Researchers have observed a (presumably spurious) correlation between per capita chocolate consumption and the rate of Nobel prize laureates: see Chocolate Consumption, Cognitive Function, and Nobel Laureates. In this problem, we will create some sample data to simulate this relationship.

Write Python code to produce a list of 50 ordered pairs (c, n), where c represents chocolate consumption in kg/year/person and n represents the number of Nobel laureates per 10 million population. The values for c should be random numbers (not necessarily integers!) between 0 and 15. You may assume that c and n are related by

n = 0.4 · c - 0.8.

However, it is not possible for a nation to have a negative number of Nobel laureates, so if your predicted value of n is less than 0, replace that value by 0.

Report your values of c and n to 2 decimal places. Print your list of ordered pairs.
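One possible sketch of this step, using only the standard library. The function name make_pairs and the seed are assumptions for illustration. Here c is rounded first and n is computed from the rounded value, which is one choice among several.

```python
import random

def make_pairs(count=50, rng=random):
    """Return `count` pairs (c, n) with n = 0.4*c - 0.8, floored at 0."""
    pairs = []
    for _ in range(count):
        c = round(rng.uniform(0, 15), 2)        # kg/year/person, not necessarily an integer
        n = round(max(0.0, 0.4 * c - 0.8), 2)   # no negative laureate counts
        pairs.append((c, n))
    return pairs

rng = random.Random(2015)  # arbitrary seed, only for repeatability
pairs = make_pairs(rng=rng)
print(pairs)
```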

Problem - Error Term

Our list of data from part (a) is not a good simulation of real-world data, because it is perfectly linear. Starting with the c and n values you generated in part (a), generate new n values, using the following formula:

ne = n + ε.

Here ε should be a random variable with normal distribution, mean 0, and standard deviation 1. Using the list of ordered pairs generated in 3(a), create a new list of 50 ordered pairs (c, ne).

Again, your simulated data should not predict negative numbers of Nobel laureates. Again, do not generate a new list; make sure to use the list of ordered pairs already generated in 3(a).

Print your new list of ordered pairs.
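A minimal sketch of adding the error term to an existing list of pairs rather than generating a new one. The function name add_noise and the three sample pairs are hypothetical; in the homework, the input would be the 50 pairs already produced.

```python
import random

def add_noise(pairs, rng=random):
    """Keep each c, add a Normal(0, 1) draw to n, and floor the result at 0."""
    noisy = []
    for c, n in pairs:
        ne = round(max(0.0, n + rng.gauss(0, 1)), 2)
        noisy.append((c, ne))
    return noisy

# Hypothetical stand-in for the list of 50 pairs generated earlier
sample_pairs = [(0.5, 0.0), (5.0, 1.2), (12.25, 4.1)]

rng = random.Random(7)  # arbitrary seed, only for repeatability
noisy_pairs = add_noise(sample_pairs, rng=rng)
print(noisy_pairs)
```

Saving the (c, ne) pairs as a two-column .csv file would give R a data frame ready for the scatterplot and the linear model in the parts that follow.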

a. On homework 4, you simulated data on countries' per-capita chocolate consumption and number of Nobel Prize winners, using an error term ε (representing random "noise").  Read these data into R and make a scatterplot of the number of Nobel Prize winners versus chocolate consumption.

b. Fit a linear model to the data.  What is the equation of the line of best fit?  How does it compare to the theoretical model you used to simulate the data?  Graph the line of best fit with the scatterplot.

c. State the null and alternative hypotheses for a test of whether the number of Nobel Prize winners (per 10 million population) is associated with per-capita chocolate consumption.

d. State your conclusion about the hypotheses in part c, in the context of the problem.

e. Graph the diagnostic plots for the regression. Explain what they tell us.

3. In homework 5, you counted the frequencies of letters in two encrypted texts.  In this problem, you will use statistical analysis to identify the language in which the text was written, and decrypt it.

a. Read the letter frequencies from encryptedA into R and attach the data.  Use the following code to make a barplot of the letter frequencies, with the letters listed in order of increasing frequency:  (Here I've assumed that your columns were named "key" and "count".)

encrypt_order = order(count)

barplot( count[encrypt_order], names.arg = key[encrypt_order] )

Be sure you understand what this code does.
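As an aid to reading the R code, here is a rough Python analogue of what order() does: it returns the positions that would sort the vector, and those positions are then used to reorder both the counts and the labels. The four letters and counts are invented for the example, and Python positions start at 0 whereas R's start at 1.

```python
key = ["a", "b", "c", "d"]   # hypothetical letters
count = [7, 2, 9, 4]         # hypothetical counts

# Positions of the counts from smallest to largest (the role of order(count) in R)
encrypt_order = sorted(range(len(count)), key=lambda i: count[i])

sorted_counts = [count[i] for i in encrypt_order]  # bar heights, increasing
sorted_keys = [key[i] for i in encrypt_order]      # labels, moved with their bars

print(encrypt_order)   # [1, 3, 0, 2]
print(sorted_counts)   # [2, 4, 7, 9]
print(sorted_keys)     # ['b', 'd', 'a', 'c']
```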

b. The file Letter Frequencies.csv contains data on the frequencies of letters in different languages.  (Source:  http://www.sttmedia.com/characterfrequency-english and http://www.sttmedia.com/characterfrequency-welsh, accessed 21 August 2015.  Used by permission of Stefan Trost.)  Read these data into R.

c. In a single graphing window, display two bar plots:  A plot on top showing the encrypted frequencies, and a plot below it showing the frequencies of letters in English.  Each plot should be sorted in order of increasing frequency.  Each plot should also have a title telling whether it is from the encrypted text or from plain English.

d. Based on the shape of the plots, do you think it is likely that the encrypted text came from English?  Explain.

e. We want to conduct a hypothesis test to be more precise about whether it is plausible that the text came from English.  To do this, we will pair up each letter in the encrypted text with a letter in English, based on the order of frequency.  So, encryptedA "r" is paired with English "e", encryptedA "c" is paired with English "t", etc.  Then we will test whether the resulting letter frequencies plausibly come from a random sample of English words.

To pair up the letters, sort the vector of counts from the encrypted text in order of increasing frequency, and store it as a new vector.  Then do the same thing with the vector of frequencies from English.

f. To pair up the letters, we need the data (the counts of letters from encryptedA.txt) and the probability model (the theoretical frequencies from Letter Frequencies.csv) to have the same number of letters.  Depending on how you formatted your output from Python, your letter counts may include 20 or 26 letters.  This is due to the fact that some letters did not appear in the encrypted text, so they appeared 0 times.  If necessary, prepend 6 zeroes to the count vector to make it the same length as the theoretical frequencies:

count = c( rep(0, 6), count )

g. State the null and alternative hypotheses for a chi-squared Goodness of Fit test of this question.

h. To satisfy the assumptions of a Goodness of Fit test, we need the expected counts of each category to be greater than or equal to 5.  Find the total number of letters in the encrypted text.  Then multiply this number by the probabilities from Letter Frequencies.csv to get the expected counts. 

i. Combine categories (letters) to get expected counts that are greater than or equal to 5.  For example, if you decided to combine the first two categories, you could use the code

sortEnglish_combined = c( sum(sortEnglish[1:2]), sortEnglish[3:26] )

Combine the same categories in the encrypted counts.
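The arithmetic in steps h and i, and the statistic computed in step j, can be sketched as follows. This is an illustration in Python only; the assignment asks for the work to be done in R. The function names, the seven-category probability model, and the observed counts are all invented for the example, and the greedy merging rule is just one possible way to choose which categories to combine.

```python
def expected_counts(total, probs):
    """Expected count per category = total letters * model probability."""
    return [total * p for p in probs]

def combine_small(expected, observed, minimum=5.0):
    """Merge neighbouring categories until every merged expected count >= minimum.

    Both lists must be in the same order, so the same categories are merged in each.
    """
    exp_out, obs_out = [], []
    exp_acc, obs_acc, pending = 0.0, 0, False
    for e, o in zip(expected, observed):
        exp_acc += e
        obs_acc += o
        pending = True
        if exp_acc >= minimum:
            exp_out.append(exp_acc)
            obs_out.append(obs_acc)
            exp_acc, obs_acc, pending = 0.0, 0, False
    if pending:
        if exp_out:   # fold any leftover into the last finished group
            exp_out[-1] += exp_acc
            obs_out[-1] += obs_acc
        else:         # everything together is still below the minimum
            exp_out.append(exp_acc)
            obs_out.append(obs_acc)
    return exp_out, obs_out

def chi_squared(observed, expected):
    """Goodness of Fit statistic: sum of (observed - expected)^2 / expected."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical toy model and counts, both sorted in increasing order
probs = [0.01, 0.02, 0.03, 0.04, 0.10, 0.30, 0.50]
observed = [0, 3, 2, 5, 12, 28, 50]

expected = expected_counts(sum(observed), probs)
exp_c, obs_c = combine_small(expected, observed)
print(exp_c, obs_c, chi_squared(obs_c, exp_c))
```

The degrees of freedom for the test are the number of combined categories minus 1, so combining categories changes the reference distribution as well as the statistic.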

j. Use R to conduct the chi-squared Goodness of Fit test. 

k. State your conclusion in the context of the problem.

l. Repeat steps h-k for Welsh, and then repeat for both languages for encryptedB.  Based on the hypothesis tests, which text do you think came from which language?  How confident are you in your assessment?

m. Optional:  Try to decrypt the English text.  Simon Singh's Black Chamber website (http://www.simonsingh.net/The_Black_Chamber/substitutioncrackingtool.html) will automatically substitute letters for you, so you can test different possibilities for what English plaintext letter is represented by each letter in the ciphertext.  Start by substituting the letter E for the most common letter in the ciphertext.  Then use frequencies of letters in the ciphertext, common patterns of letters, and experimentation to determine other substitutions.

Attachment:- Assignment Files.rar
