The dataset Vital Capacity contains 24 observations of workers in the cadmium industry. In the column exposure, the value 10 indicates exposure of more than 10 years and 0 indicates no exposure at all. The column age gives the age in years, and the column vital.capacity gives the vital capacity (a measure of lung volume) in litres.
a) Compare the vital capacity in the two exposure groups.
b) Why is it misleading to draw the conclusion from the test above that long-time exposure to cadmium reduces the vital capacity? Illustrate the doubts by two graphs (one scatter plot and one box plot).
c) One way to analyse the data would be to use multiple linear regression with vital capacity as the response variable and suitable predictors. Do that! What is your conclusion based on that analysis? (The true nature of cadmium risk is of course another story; this is just an exercise.)
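
As a starting point for c), the following is a minimal sketch of a multiple linear regression fitted by ordinary least squares with NumPy. The function name fit_ols, the 0/1 recoding of exposure, and all numbers in the demo are assumptions made for illustration. The demo values are made up and are not the Vital Capacity data. They are constructed so that vital capacity depends on age only, which shows how a raw difference between groups and an age-adjusted exposure coefficient can disagree.

```python
import numpy as np


def fit_ols(age, exposure, vital_capacity):
    """Least-squares fit of vital capacity on an intercept, age and exposure."""
    age = np.asarray(age, dtype=float)
    y = np.asarray(vital_capacity, dtype=float)
    # Recode exposure as a 0/1 indicator: 1 for the exposed group (coded 10), else 0.
    exposed = (np.asarray(exposure, dtype=float) > 0).astype(float)
    # Design matrix with columns: intercept, age, exposure indicator.
    X = np.column_stack([np.ones_like(age), age, exposed])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return dict(zip(["intercept", "age", "exposed"], coef))


# Made-up illustration values, NOT the Vital Capacity dataset.
demo_age = [30, 35, 40, 45, 50, 55, 38, 44, 52, 60]
demo_exposure = [0, 0, 0, 0, 0, 10, 10, 10, 10, 10]
# Constructed so that capacity depends on age only (no exposure effect).
demo_vital = [6.0 - 0.04 * a for a in demo_age]

coef = fit_ols(demo_age, demo_exposure, demo_vital)

# Unadjusted comparison: mean of the exposed group minus mean of the unexposed group.
vital = np.array(demo_vital)
is_exposed = np.array(demo_exposure) > 0
raw_diff = vital[is_exposed].mean() - vital[~is_exposed].mean()

print("raw group difference:", raw_diff)
print("fitted coefficients:", coef)
```

With the real data, the columns of the dataset would replace the demo lists. Standard errors and p-values are not computed here; a statistics package would be needed for those, and the conclusion has to come from the real data rather than from this constructed example.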