Please read these instructions carefully before you start. Your answers should be well structured, concise, and written in complete English sentences. Include your graphs and your code (in typewriter font), and comment the code. Number your answers so that they refer to the corresponding tasks below.
Create a data frame in R that will look like this:
The final dimensions of the data set should be 48 rows by 2 columns. Each factor level (A, B, C, D) is replicated 12 times. For variable 1, use the following code:
Give this data frame a name of your choice and show the code you used to create it. Explain as much as you can of the code used to populate variable 1 with numbers, and comment on its details.
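The code provided for variable 1 is not reproduced in this copy of the assignment, so the sketch below is only a hypothetical stand-in: it builds the 48 x 2 layout with a factor column and fills the numeric column with normal draws around invented group means (10, 10.5, 11, 12). The names `myData`, `group` and `variable1`, and the means themselves, are assumptions; replace the `rnorm()` line with the code you were given.

```r
# HYPOTHETICAL stand-in: the assignment's own code for variable 1 is not
# reproduced here, so normal draws with invented group means are used instead.
set.seed(42)  # reproducible random numbers

# Factor with levels A-D; each level repeated 12 times in a row (A x12, B x12, ...)
group <- factor(rep(c("A", "B", "C", "D"), each = 12))

# Invented group means (assumption), looked up for each of the 48 rows
groupMeans <- c(A = 10, B = 10.5, C = 11, D = 12)
variable1 <- rnorm(48, mean = groupMeans[as.character(group)], sd = 1)

# 'myData' is an arbitrary (hypothetical) name for the data frame
myData <- data.frame(group = group, variable1 = variable1)
str(myData)  # 48 obs. of 2 variables: a factor and a numeric column
```

Using `rep(..., times = 12)` instead of `each = 12` would interleave the levels (A, B, C, D, A, ...); either layout works for the analyses that follow, as long as each value stays paired with its group.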
Think of a situation (an experiment or observational study) within your (future) area of research that could produce such a data set. Describe (hypothetically) what was done and how the data were obtained. Throughout the rest of this assignment, pretend these are real data. However, in your interpretation, you may comment at any stage on how the data were created.
Explore your data set using at least two commands that summarise your data and at least two commands that depict it visually (e.g., plots, histograms). For each step and each plot you create, comment on what you see and what you learn from it.
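A minimal sketch of two numerical and two graphical summaries in base R, run on the same hypothetical placeholder data as above (normal draws standing in for the variable 1 code, which is not reproduced here). The choice of `aggregate()`, `boxplot()` and `hist()` is one option among many.

```r
# Hypothetical placeholder data (rnorm stand-in for the assignment's variable 1 code)
set.seed(42)
myData <- data.frame(
  group = factor(rep(c("A", "B", "C", "D"), each = 12)),
  variable1 = rnorm(48, mean = rep(c(10, 10.5, 11, 12), each = 12), sd = 1)
)

# Numerical summaries
summary(myData)  # counts per factor level; quartiles and mean of variable1
groupStats <- aggregate(variable1 ~ group, data = myData,
                        FUN = function(x) c(mean = mean(x), sd = sd(x)))
groupStats       # mean and standard deviation per group

# Graphical summaries
boxplot(variable1 ~ group, data = myData,
        xlab = "Group", ylab = "Variable 1")  # centre, spread and outliers per group
hist(myData$variable1, breaks = 10,
     main = "", xlab = "Variable 1")          # shape of the pooled distribution
```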
Compare the maximum and minimum values within each group (factor level) to their respective group means. What is the largest absolute difference between one of your values and its group mean? What is the probability of obtaining a value at least this far from its group mean, assuming the data are normally distributed around the respective group mean with a standard deviation of 1?
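Under the stated assumption (each value is normal with its group mean and a standard deviation of 1), the probability of a deviation at least as large as d in either direction is the two-sided tail area 2 * pnorm(-d). The sketch below computes this on hypothetical placeholder data; the last line, which assumes the 48 values are independent, shows how much more likely it is that at least one of them deviates that far.

```r
# Hypothetical placeholder data (rnorm stand-in for the assignment's variable 1 code)
set.seed(42)
myData <- data.frame(
  group = factor(rep(c("A", "B", "C", "D"), each = 12)),
  variable1 = rnorm(48, mean = rep(c(10, 10.5, 11, 12), each = 12), sd = 1)
)

# Minimum and maximum of each group expressed as distance from the group mean
aggregate(variable1 ~ group, data = myData,
          FUN = function(x) c(min_dev = min(x) - mean(x),
                              max_dev = max(x) - mean(x)))

# Deviation of every observation from its own group mean (ave() defaults to mean)
deviation <- myData$variable1 - ave(myData$variable1, myData$group)
largestDev <- max(abs(deviation))

# P(|X - mu| >= largestDev) for X ~ N(mu, 1): two-sided normal tail area
pSingle <- 2 * pnorm(-largestDev)

# Chance that at least one of 48 independent values deviates this much
pAny48 <- 1 - (1 - pSingle)^48

c(largestDev = largestDev, pSingle = pSingle, pAny48 = pAny48)
```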
Formulate a hypothesis based on task 1 that you can test using ANOVA. Perform the analysis. Briefly describe what the F-value tells you and interpret the p-value. Draw a conclusion.
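A one-way ANOVA sketch on hypothetical placeholder data. The null hypothesis is that all four group means are equal; the F-value is the ratio of the between-group mean square to the within-group (residual) mean square, and the p-value is the probability of an F at least this large if the null hypothesis were true.

```r
# Hypothetical placeholder data (rnorm stand-in for the assignment's variable 1 code)
set.seed(42)
myData <- data.frame(
  group = factor(rep(c("A", "B", "C", "D"), each = 12)),
  variable1 = rnorm(48, mean = rep(c(10, 10.5, 11, 12), each = 12), sd = 1)
)

# One-way ANOVA; H0: all four group means are equal
fit <- aov(variable1 ~ group, data = myData)
summary(fit)

# Pull F and p out of the ANOVA table (row 1 = group, row 2 = Residuals)
aovTable <- summary(fit)[[1]]
Fvalue <- aovTable[["F value"]][1]
pValue <- aovTable[["Pr(>F)"]][1]
c(F = Fvalue, p = pValue)
```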
Regardless of whether you found significant differences between groups, perform Tukey’s HSD test to determine which groups differ significantly from each other (even if none do). Briefly interpret this result in view of the nature of your data.
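A sketch of Tukey’s HSD with base R’s `TukeyHSD()`, applied to an `aov` fit of the hypothetical placeholder data. With four groups it reports all six pairwise differences, their family-wise confidence intervals, and adjusted p-values.

```r
# Hypothetical placeholder data (rnorm stand-in for the assignment's variable 1 code)
set.seed(42)
myData <- data.frame(
  group = factor(rep(c("A", "B", "C", "D"), each = 12)),
  variable1 = rnorm(48, mean = rep(c(10, 10.5, 11, 12), each = 12), sd = 1)
)

fit <- aov(variable1 ~ group, data = myData)
tukey <- TukeyHSD(fit, which = "group", conf.level = 0.95)
tukey                    # 6 pairwise differences with family-wise 95% CIs
tukey$group[, "p adj"]   # Tukey-adjusted p-values only
plot(tukey)              # intervals crossing 0 mark non-significant pairs
```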
Explain why the output contains an ‘adjusted p-value’. What does it adjust for? Can you outline a simulation experiment in R that illustrates the problem of multiple comparisons? (You do not need to program it, but you may show how you would.)
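One possible simulation, sketched below under the assumption that the null hypothesis is true by construction (all four groups share mean 10 and sd 1): repeat the experiment many times, run all six pairwise comparisons each time, and record how often at least one of them is "significant". Unadjusted pairwise t-tests should produce such a false positive in well over 5% of runs (the bound 1 - 0.95^6 ≈ 0.26 would apply if the six tests were independent), whereas Tukey’s adjustment should keep the family-wise rate near 5%.

```r
set.seed(1)
nSim  <- 1000
group <- factor(rep(c("A", "B", "C", "D"), each = 12))
rawHit   <- logical(nSim)  # any unadjusted pairwise p < 0.05 in this run?
tukeyHit <- logical(nSim)  # any Tukey-adjusted p < 0.05 in this run?

for (i in seq_len(nSim)) {
  # H0 is true: every group is drawn from the same N(10, 1) distribution
  y <- rnorm(48, mean = 10, sd = 1)

  # 6 unadjusted pairwise t-tests (pooled SD); upper triangle of the matrix is NA
  pRaw <- pairwise.t.test(y, group, p.adjust.method = "none")$p.value
  rawHit[i] <- any(pRaw < 0.05, na.rm = TRUE)

  # The same 6 comparisons with Tukey's adjustment
  pTukey <- TukeyHSD(aov(y ~ group))$group[, "p adj"]
  tukeyHit[i] <- any(pTukey < 0.05)
}

# Family-wise type I error rate: share of runs with at least one false positive
c(unadjusted = mean(rawHit), tukey = mean(tukeyHit))
```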
Compute the group means and then the variance of the group means for variable1 and assign this number to an object called ‘betweenVar’, meaning ‘between group variance’. Also compute the within-group variance of variable1 and assign this number to an object called ‘withinVar’ for ‘within group variance’.
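A sketch of both quantities on hypothetical placeholder data. Because every group has the same size, the pooled within-group variance is simply the mean of the four group variances, and it equals the residual mean square of the ANOVA.

```r
# Hypothetical placeholder data (rnorm stand-in for the assignment's variable 1 code)
set.seed(42)
myData <- data.frame(
  group = factor(rep(c("A", "B", "C", "D"), each = 12)),
  variable1 = rnorm(48, mean = rep(c(10, 10.5, 11, 12), each = 12), sd = 1)
)

# Variance of the four group means ("between group variance")
groupMeans <- tapply(myData$variable1, myData$group, mean)
betweenVar <- var(groupMeans)

# Pooled within-group variance; with equal group sizes this is the mean of the group variances
withinVar <- mean(tapply(myData$variable1, myData$group, var))

c(betweenVar = betweenVar, withinVar = withinVar)
```

With 12 replicates per group, 12 * betweenVar equals the between-group mean square, so the ANOVA F-value can be recovered as 12 * betweenVar / withinVar.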
Using the within- and between-group variances from the task above, conduct a power test for your ANOVA. Regardless of whether your ANOVA result was significant, discuss all aspects of the power test (sample size, type I and type II error, power, within- and between-group variance) in light of the outcome of your ANOVA (task 4) and the nature of your data as you described it in task 1. For example, if your ANOVA did not find a significant difference between groups, discuss how you could change your experimental set-up to detect an expected difference. If your ANOVA was significant, discuss, for example, the power of your test and your type I error probability, and whether and how you could reduce it. Describe what the design of a larger experiment could look like based on the results of your pilot study, for example by taking costs into account.
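A sketch using base R’s `power.anova.test()`, which assumes a balanced one-way design: leave exactly one of `n`, `power` (or the other parameters) unset and the function solves for it. The data are the hypothetical placeholder values, and `costPerSample` is an invented figure meant only to show how the required sample size could be turned into a budget.

```r
# Hypothetical placeholder data (rnorm stand-in for the assignment's variable 1 code)
set.seed(42)
myData <- data.frame(
  group = factor(rep(c("A", "B", "C", "D"), each = 12)),
  variable1 = rnorm(48, mean = rep(c(10, 10.5, 11, 12), each = 12), sd = 1)
)
betweenVar <- var(tapply(myData$variable1, myData$group, mean))
withinVar  <- mean(tapply(myData$variable1, myData$group, var))

# Power of the current design (4 groups x 12 replicates, alpha = 0.05)
pw <- power.anova.test(groups = 4, n = 12, between.var = betweenVar,
                       within.var = withinVar, sig.level = 0.05)
pw

# Replicates per group needed for 80% power at the same effect size
nNeeded <- ceiling(power.anova.test(groups = 4, between.var = betweenVar,
                                    within.var = withinVar, sig.level = 0.05,
                                    power = 0.80)$n)

# HYPOTHETICAL cost per sample, used only to translate n into a budget
costPerSample <- 20
c(power = pw$power, nPerGroup = nNeeded, totalCost = 4 * nNeeded * costPerSample)
```

Re-running with a stricter `sig.level` (e.g. 0.01) shows the trade-off between the type I error rate and power: at a fixed n, power drops, or a larger n is needed for the same power.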
Show the differences between groups in a bar plot with standard error bars. Mark groups that do not differ significantly with the same letter. Adjust the margins, the label and tick size, the tick position, etc. until you think the graph could be published in a journal. Draft a figure caption that contains all the necessary information: What is shown? Where? How was it measured? When? Which test do the letters refer to?
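A base-graphics sketch of such a figure, on hypothetical placeholder data. The letters vector is a PLACEHOLDER, not a result: it must be replaced by letters derived from your own Tukey HSD output, either by hand or, for example, with `multcompLetters()` from the multcompView package (not used here). Margins, font sizes and colours are one possible starting point.

```r
# Hypothetical placeholder data (rnorm stand-in for the assignment's variable 1 code)
set.seed(42)
myData <- data.frame(
  group = factor(rep(c("A", "B", "C", "D"), each = 12)),
  variable1 = rnorm(48, mean = rep(c(10, 10.5, 11, 12), each = 12), sd = 1)
)

means <- tapply(myData$variable1, myData$group, mean)
ses   <- tapply(myData$variable1, myData$group, function(x) sd(x) / sqrt(length(x)))

# PLACEHOLDER letters: replace with letters derived from your Tukey HSD result
tukeyLetters <- c(A = "a", B = "a", C = "b", D = "c")

# Tighter margins, horizontal tick labels, slightly larger text
op <- par(mar = c(4.5, 4.5, 1, 1), las = 1, cex.axis = 1.1, cex.lab = 1.2)
mids <- barplot(means, ylim = c(0, max(means + ses) * 1.15),
                xlab = "Group", ylab = "Variable 1 (mean +/- 1 SE)",
                col = "grey80", border = "black")
arrows(mids, means - ses, mids, means + ses,
       angle = 90, code = 3, length = 0.05)           # standard error bars
text(mids, means + ses, labels = tukeyLetters[names(means)], pos = 3)
box(bty = "l")                                        # L-shaped frame
par(op)                                               # restore graphics settings
```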