Linear and Logistic Regression Assignment

SECTION A - Case-control study

The dataset "data_assessment2_ccstudy.dta" provides data from 560 patients admitted to hospital (in a region with malaria) who are part of a hypothetical nested case-control study. There are 140 patients who died within 1 year of hospital admission (cases) and 420 controls. The cases and controls were selected from a larger cohort study of 13,000 patients, of whom 140 died within 1 year of follow-up. Sex and age were routinely recorded in the hospital admission records. For the case-control study, further information (haemoglobin level and malaria infection status on admission) was extracted from laboratory data records.

The variables in this dataset are:

Variable name   Description
id              Unique identifier
dead            Died within 1 year of hospital admission (0 = control, 1 = case)
age             Age at baseline (years)
haemoglobin     Haemoglobin level at baseline (g/dL)
malaria         Malaria at baseline (0 = no malaria, 1 = malaria)
male            Sex of patient (0 = female, 1 = male)

We will use multivariable logistic regression to investigate the evidence for an association between haemoglobin and death, controlling for the possibility that this association is confounded by other exposure variables that appear in the dataset.

1. Description of study sample

a) Present histograms for age and haemoglobin and describe the distribution of these variables in terms of approximate normality and appropriate measures of central tendency and spread.

b) Provide a table that summarises the distribution of age, sex, haemoglobin, and malaria with separate columns for those who died (cases) and the controls (remember this is a case-control study).

c) Using the number of patients in the cohort study who died during the 1-year follow-up period, estimate the odds of death for a patient in the cohort study.

d) Calculate the estimated odds of death in the case-control study. Why isn't this estimate equal to the odds of death in the cohort study (calculated in 1c)?
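A minimal Python sketch of the arithmetic for 1c) and 1d), using only the counts given in the study description; the variable names are illustrative. The sampling-fraction step assumes the 420 controls were drawn from the 12,860 cohort members who survived the year, and shows that the case-control odds are inflated by sampling every death but only a small fraction of survivors.

```python
# Odds of death: source cohort vs. case-control sample.
# All counts come from the study description.

def odds(events: int, non_events: int) -> float:
    """Odds = number with the event / number without the event."""
    return events / non_events

cohort_total = 13000
cohort_deaths = 140
cohort_survivors = cohort_total - cohort_deaths  # 12,860

cc_cases = 140
cc_controls = 420

cohort_odds = odds(cohort_deaths, cohort_survivors)  # 140 / 12860
cc_odds = odds(cc_cases, cc_controls)                # 140 / 420

# Sampling fractions, ASSUMING controls were drawn from the cohort survivors
case_fraction = cc_cases / cohort_deaths          # every death was sampled
control_fraction = cc_controls / cohort_survivors

# Case-control odds = cohort odds x (case fraction / control fraction)
reconstructed = cohort_odds * case_fraction / control_fraction

print(f"Cohort odds of death:       {cohort_odds:.4f}")
print(f"Case-control odds of death: {cc_odds:.4f}")
print(f"Reconstructed from sampling fractions: {reconstructed:.4f}")
```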

2. Univariable logistic regression models

Consider the two univariable logistic regression models of the outcome dead on the variables male and malaria.

a) Present in a table the estimated odds ratios, 95% confidence intervals for the population odds ratio and p-values for the two separate simple logistic regressions.

b) Interpret the odds ratio from the univariable logistic regression of death on malaria (an interpretation of the p-value and confidence interval is not required). Since the study population for this case-control study consists of hospital in-patients, what further information might you want about the patients without malaria infection on admission?
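For a single binary exposure, the odds ratio and Wald 95% confidence interval from a univariable logistic regression equal those computed directly from the 2×2 table of exposure by case status (log OR = log(ad/bc), SE = sqrt(1/a + 1/b + 1/c + 1/d)), which gives a quick cross-check on the output for 2a). The Python sketch below assumes that table layout; the function name is hypothetical and the counts in the usage lines are invented, not taken from data_assessment2_ccstudy.dta.

```python
import math

def odds_ratio_wald(a: int, b: int, c: int, d: int, z: float = 1.959964):
    """Odds ratio, Wald 95% CI and two-sided Wald p-value from a 2x2 table.

    a = exposed cases,   b = exposed controls
    c = unexposed cases, d = unexposed controls
    """
    if min(a, b, c, d) == 0:
        raise ValueError("a zero cell makes the log odds ratio undefined")
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(log_or - z * se)
    upper = math.exp(log_or + z * se)
    # Two-sided p-value for H0: log OR = 0, i.e. 2 * (1 - Phi(|z|))
    p_value = math.erfc(abs(log_or / se) / math.sqrt(2))
    return math.exp(log_or), lower, upper, p_value

# Hypothetical counts (malaria = exposure), chosen so the totals match
# the design: 140 cases and 420 controls.
or_, lo, hi, p = odds_ratio_wald(a=60, b=120, c=80, d=300)
print(f"OR = {or_:.3f} (95% CI {lo:.3f} to {hi:.3f}), p = {p:.4f}")
```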

3. Linear association between exposure & outcome

We must decide whether it is reasonable to assume a linear association between the numerical exposure variables, age and haemoglobin, and the log odds of death.

a) Create a new variable in the dataset containing quintiles of age using the xtile command:

xtile age_q5=age, nq(5)

Use Stata to plot the log odds of death versus age_q5.

Note: please use the following Stata options:

ciplot yscale(log) yscale(range(0.5 2)) ylabel(0.25 0.5 0.75 1 1.5 2)

[Note: for earlier versions of Stata you may need to replace "ciplot" above with "graph".]

Briefly summarise the plot, by describing whether the association looks linear.

b) Using the variable age_q5, fit separate simple logistic regression models with age_q5 as a categorical variable and as a continuously valued variable. Compare the models using the likelihood ratio test and comment on whether the association between log odds of death and age is linear.

c) Repeat parts 3a) & 3b) to investigate whether the association between haemoglobin and the log odds of death is linear. Briefly comment on whether the association is linear and state the null hypothesis being tested here.
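For 3b) and 3c), the categorical model (four indicator terms for quintiles 2 to 5) and the linear model (one slope for the quintile score) are nested, so the likelihood ratio statistic is 2 × (difference in log-likelihoods) on 4 − 1 = 3 degrees of freedom. The null hypothesis is that the log odds of death change linearly across the quintile groups, i.e. the departures from a straight line allowed by the categorical model are all zero. In Stata, `estimates store` followed by `lrtest` carries out this comparison; the Python sketch below only illustrates the arithmetic, with hypothetical function names and made-up log-likelihood values.

```python
import math

def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable, integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term = total = math.exp(-x / 2)
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return total
    # Odd df: the df = 1 tail plus a finite series of correction terms
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * j + 1)
    return total

def likelihood_ratio_test(ll_reduced: float, ll_full: float, df: int):
    """LR statistic and p-value for comparing two nested models."""
    stat = 2 * (ll_full - ll_reduced)
    return stat, chi2_sf(stat, df)

# Hypothetical log-likelihoods: linear (reduced) vs categorical (full) model
stat, p = likelihood_ratio_test(ll_reduced=-310.2, ll_full=-308.1, df=3)
print(f"LR chi2(3) = {stat:.2f}, p = {p:.3f}")
```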

4. Multivariable logistic regression models - Confounding

Now use univariable logistic regression to estimate the unadjusted odds ratios of death for haemoglobin and all three potential confounders (age, sex, and malaria). Then use multivariable logistic regression (including all four variables) to estimate the adjusted odds ratios. Include haemoglobin and age as categorical variables with the following groupings - age (< 3 & ≥ 3 years, with ≥ 3 years as the reference group) and haemoglobin (<9 (low), 9-14 (normal), >14 (high) g/dL, with the haemoglobin group 9-14 g/dL set as the reference group) [Hint - use 'gen' and 'replace' commands to create new variables].
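The recoding asked for in the hint can be sketched in Python as below, with each reference group coded 0 so that it is the lowest level (Stata's `i.` factor-variable prefix uses the lowest level as the base by default). The function names are hypothetical. One pitfall worth guarding against in the equivalent `gen`/`replace` commands: Stata treats a missing numeric value as larger than any number, so a condition such as `haemoglobin > 14` is also true for missing values unless `& !missing(haemoglobin)` is added.

```python
import math
from typing import Optional

def age_group(age: Optional[float]) -> Optional[int]:
    """0 = age >= 3 years (reference), 1 = age < 3 years; None if missing."""
    if age is None or math.isnan(age):
        return None
    return 1 if age < 3 else 0

def hb_group(hb: Optional[float]) -> Optional[int]:
    """0 = 9-14 g/dL (reference), 1 = < 9 g/dL (low), 2 = > 14 g/dL (high)."""
    if hb is None or math.isnan(hb):
        return None  # keep missing values missing
    if hb < 9:
        return 1
    if hb > 14:
        return 2
    return 0  # 9 <= hb <= 14

for value in (8.9, 9.0, 14.0, 14.1, float("nan")):
    print(value, hb_group(value))
```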

a) Present a table with two columns: the unadjusted odds ratios (95% confidence intervals) and the adjusted odds ratios (95% confidence intervals) for the associations of haemoglobin, age, sex, and malaria with the odds of death.

b) Comment on any confounding observed by considering any changes in the odds ratio of haemoglobin (categorical version) from the univariable to the multivariable logistic regression.

c) Investigate the confounding by exploring any univariable associations (in the controls only) between haemoglobin and the potential confounders.

d) Comment on the associations between the potential confounders and the outcome (after adjusting for the exposure of interest, haemoglobin). Together with what you found in 4c, comment on which variables are confounding the association between haemoglobin and death.
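Two small helpers for 4b) and 4c), sketched in Python with hypothetical names and invented data. The first computes the percentage change from the adjusted to the crude odds ratio; a change of more than about 10% is a common rule of thumb for flagging confounding, not a formal test. The second cross-tabulates haemoglobin group against a potential confounder among controls only, since in a case-control study the controls stand in for the exposure distribution of the population that gave rise to the cases.

```python
from collections import Counter

def percent_change_in_or(or_crude: float, or_adjusted: float) -> float:
    """Change from the adjusted to the crude OR, as a percentage of the adjusted OR."""
    return 100 * (or_crude - or_adjusted) / or_adjusted

def crosstab_controls(records, row_key: str, col_key: str) -> Counter:
    """Count (row, col) combinations among controls only (dead == 0)."""
    return Counter((r[row_key], r[col_key]) for r in records if r["dead"] == 0)

# Hypothetical values, for illustration only
print(f"{percent_change_in_or(2.4, 1.9):.1f}% change")

records = [
    {"dead": 0, "hb_group": 1, "malaria": 1},
    {"dead": 0, "hb_group": 1, "malaria": 1},
    {"dead": 0, "hb_group": 0, "malaria": 0},
    {"dead": 1, "hb_group": 1, "malaria": 1},  # a case: excluded from the table
]
print(crosstab_controls(records, "hb_group", "malaria"))
```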

5. Final presentation of results and Stata do file

a) Please write a summary (abstract) based on the analyses you performed in the previous questions to answer the research question "Is there an association between haemoglobin and death?" (maximum 200 words). Your summary should have the headings: Aim, Study Design, Statistical Methods, Results.

b) Please provide a copy of your Stata do-file for performing the statistical analyses required for questions 1 to 5. Do not upload a second file when submitting your assignment; instead, copy and paste the Stata do-file into your Word document.

SECTION B -

The dataset "data_assessment2_lupus.dta" provides cross-sectional data from 60 women who have Systemic Lupus Erythematosus (SLE), a chronic, multisystem autoimmune disease. The treatment for SLE often involves steroid therapy. The clinical researcher is particularly interested in bone loss in SLE and the impact of steroid usage. She is seeking your assistance in analysing a dataset she has compiled consisting of bone mineral density at one location (left hip), whether steroids had ever been prescribed or not, and the patient's age and smoking history (ever/never).

The variables in the dataset are:

patid       Patient identification number
hipbmd      Bone mineral density at the left hip (mg/cm²)
ster_evr    Steroid usage (1 = ever, 0 = never)
age         Age (years)
smoker      Smoking history (1 = ever smoker, 0 = never smoker)

The research question of interest is:-

Is the relationship between steroid usage and bone mineral density modified by smoking and age?

6. Linear association between age & hipbmd

a) Assess both visually and statistically (by including an additional squared term of age in the model) whether it is reasonable to assume a linear association between hipbmd and age.
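For the statistical check in 6a), fit hipbmd = b0 + b1·age + b2·age² + error and test H0: b2 = 0; a clearly non-zero squared term suggests curvature. The pure-Python least-squares sketch below (hypothetical function names, normal equations solved by Gaussian elimination) returns the coefficients and the t statistic for b2. Its p-value comes from a t distribution on n − 3 degrees of freedom, which Stata's `regress hipbmd c.age##c.age` reports directly. Centring age before squaring reduces the collinearity between age and age².

```python
import math

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def quadratic_fit(ages, y):
    """OLS fit of y = b0 + b1*age + b2*age^2.

    Returns the coefficients [b0, b1, b2] and the t statistic for H0: b2 = 0.
    """
    rows = [[1.0, a, a * a] for a in ages]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
    b = solve(xtx, xty)
    resid = [yi - (b[0] + b[1] * a + b[2] * a * a) for a, yi in zip(ages, y)]
    s2 = sum(e * e for e in resid) / (len(y) - 3)  # residual variance
    inv_col = solve(xtx, [0.0, 0.0, 1.0])  # last column of (X'X)^-1
    se_b2 = math.sqrt(s2 * inv_col[2])
    return b, b[2] / se_b2

# Toy data, not the lupus dataset: a known quadratic plus small fixed noise
ages = [0, 1, 2, 3, 4, 5, 6]
noise = [0.1, -0.2, 0.05, 0.0, -0.1, 0.2, -0.05]
y = [1 + 2 * a + 3 * a * a + e for a, e in zip(ages, noise)]
coefs, t_b2 = quadratic_fit(ages, y)
print([round(c, 3) for c in coefs], round(t_b2, 1))
```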

7. Univariable and multivariable linear regression

Perform univariable linear regression to obtain the unadjusted associations between the outcome hipbmd and steroid usage, age and smoking. Following this perform multivariable linear regression including all three covariates.

a) Present in a single table the estimates, 95% confidence intervals and p-values of the univariable and multivariable linear regression analyses with separate columns for the unadjusted and adjusted estimates.

b) Interpret the adjusted association between steroid usage and bone mineral density.

c) Investigate if the association between steroid usage and bone mineral density is modified by smoking, after controlling for age.

d) Investigate if the association between steroid usage and bone mineral density is modified by age, after controlling for smoking.
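Questions 7c) and 7d) can be approached by adding a product (interaction) term to the adjusted model. A sketch of the two models, assuming the variable coding listed above (ster_evr and smoker coded 0/1, age in years):

```latex
\begin{aligned}
\text{7c:}\quad \mathrm{hipbmd} &= \beta_0 + \beta_1\,\mathrm{ster\_evr} + \beta_2\,\mathrm{smoker}
  + \beta_3\,(\mathrm{ster\_evr}\times\mathrm{smoker}) + \beta_4\,\mathrm{age} + \varepsilon\\
\text{7d:}\quad \mathrm{hipbmd} &= \gamma_0 + \gamma_1\,\mathrm{ster\_evr} + \gamma_2\,\mathrm{age}
  + \gamma_3\,(\mathrm{ster\_evr}\times\mathrm{age}) + \gamma_4\,\mathrm{smoker} + \varepsilon
\end{aligned}
```

In the first model the steroid effect is β1 among never smokers and β1 + β3 among ever smokers, so effect modification by smoking corresponds to H0: β3 = 0. In the second, the steroid effect at age a is γ1 + γ3·a, so H0: γ3 = 0; because γ1 is then the effect at age 0, centring age makes it easier to interpret. In Stata these models can be written as `regress hipbmd i.ster_evr##i.smoker age` and `regress hipbmd i.ster_evr##c.age i.smoker`.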

8. (Concluding statement; 5 marks) Describe for the clinician in a single paragraph the results of your statistical analyses, in particular, addressing her research question (maximum 100 words).

Assignment link - https://www.dropbox.com/s/gs0ksj8lmmmgcpi/Assignment.zip?dl=0.
