MPH5041 Introduction to Biostatistics

Get Expert's Help on Formulation and Evaluation of Research Hypothesis

  • Alignment with Learning Outcomes (LO): Please see the Assessment 3 E-book
  • General Instructions: Please see the Assessment 3 E-book & Appendix 2
  • Instructions for Submission: Please see the Assessment 3 E-book
  • Marking Criteria: Please see the Assessment 3 E-book
  • Required Data: Please find the data file “QoL & Depression Data” in the “Assessment 3” section on the Moodle site of this unit.
  • Returning date: Please see the Assessment 3 E-book
  • Please don’t post any discussions related to the AT3 questions/answers on Discussion. Violation of this request may affect your grade in AT3.
  • If you need clarification on any question, please feel free to post your query on “Forum, Ask Your Instructor”.
  • Key for HD: Some of the questions in AT3 resemble of Formative Assessments in solution will be immensely useful.

Appendix 1: Useful TIPS

You will be given a prepared data set in which you will analyse data to evaluate research claims and statistical procedures used in the reporting of published work. This data may be different from the data in Assessment 1 and Assessment 2, as you will be using more advanced statistical methods for investigating a variety of hypotheses related to public health. This assignment covers Modules 1-10 of this unit for data analysis. All questions must be answered using the statistical methods from Modules 1-10.

The study description, questions and a dataset for this assessment will be made available one week (for a 6-week unit) or two weeks (for a 12-week unit) prior to the due date.

You can use IBM SPSS software for data analysis, or any other statistical software package that allows you to import and analyse the given .sav data files.

In a SINGLE Word document, your report must:

  • Have a title page, which includes your full name and student ID, and the title of the assessment;
  • Have a content page, which lists the content of your report;

  • Tips for some selected questions or parts of questions in AT3:

  • State the objectives and/or hypothesis of the study and justify the use of the selected statistical method/s to evaluate it;
  • Address all key issues related to the chosen statistical methods;
  • Analyse the given data using a statistical software package such as IBM SPSS;

  • Use the "Snipping Tool" to copy appropriate results from the SPSS output to the Word document;

  • Discuss the results in the context of the study question/hypothesis;
  • Provide a summary conclusion and impact of the findings in the context of the research question;
  • Evidence of originality and insight is expected throughout the presentation;
  • If you are analysing multiple variables for a question, please combine all these variables in each step/section of the presentation;
  • Have an appendix for all relevant data analysis
  • Present each question separately;
  • A clear and logical presentation is strongly

  • Please read the marking criteria/rubric.

Important information

  • The word limit does not include the content page, tables, graphs or
  • Your report must be formatted in A4 size (210 x 297 mm), with Times New Roman, size 12 font.

You must name your file with your name and the assessment number: lastname-firstname-A3.docx

  • Your report must not be longer than 30 pages. If you have any more than that, only the first 30 pages will be
  • You must combine all the information into a SINGLE Word file, as you will only be able to make one submission to the assessment dropbox. For full information about submission, see the "Submission details and feedback" page later in this
  • You must not copy and paste information from the internet. If you do use internet information, please put it in your own words. References may not be required, but if you do use them, you must cite all of your sources using the Vancouver referencing style.
  • This assessment is worth 45% (HURDLE). It will be marked as a percentage of

For full information about how this assessment will be marked, refer to the Assessment criteria and rubric later in this book.


Study Description: A cross-sectional study was conducted in 2017 to determine the factors related to Health Related Quality of Life (HrQoL) and depression among people with type 2 diabetes mellitus (T2DM) in Bangladesh. HrQoL was measured for each of the 1253 study participants on a continuous scale between 0 and 1, where “0” represents the worst HrQoL and “1” represents the best HrQoL. The descriptions of some selected variables are given in Table 1 below.

Table 1: Variable description with statistical code

Variable      Description                             Statistical code (if any)
Gender        Gender of patient                       0 for male and 1 for female
Age           Age of patient                          N/A
Location      Area of residence                       0 for rural and 1 for urban
Education     Education level                         0 for up to year 12 and 1 for graduate and above
Duration_dm   Duration of diabetes                    N/A
HbA1c         Glycaemic level (HbA1c)                 0 for controlled and 1 for uncontrolled
PA*           Physical activity                       0 for active and 1 for inactive
No_complic    Number of complications                 N/A
HTN           Hypertension                            0 for no and 1 for yes
HrQoL         Health Related Quality of Life score    N/A
Cog_func      Cognitive function                      0 for not-impaired and 1 for impaired
Anxiety       Presence of anxiety                     0 for no-anxiety and 1 for anxiety
Depression    Presence of depression                  0 for no-depression and 1 for depression

Macro- and micro-vascular complications (CAD, stroke, diabetic foot, retinopathy, nephropathy and neuropathy) are related to type 2 diabetes mellitus. International standard questionnaires were used to assess patients’ physical activity, HrQoL, cognitive function, anxiety and depression. Hypertension was defined as a known previous diagnosis, current use of anti-hypertensive medication, or a newly discovered blood pressure (BP) reading with systolic > 140 mmHg and diastolic > 90 mmHg. Glycaemic status was considered ‘well controlled’ for HbA1c < 7% and ‘uncontrolled’ for HbA1c >= 7%.

Using the data described in the above table, answer Questions 1-3 below.

Question 1 [5 marks]: Perform appropriate statistical analysis and evaluate the strength of relationship of HrQoL with patients’ age, duration of diabetes and number of complications.
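As an illustration only, the strength of a relationship between two continuous variables is commonly summarised with a correlation coefficient. The sketch below computes Pearson's r and Spearman's rho using the Python standard library. The values in `age` and `hrqol` are hypothetical toy numbers, not taken from the assessment data file, and the helper names are assumptions made for this example. Which coefficient is appropriate depends on the distribution of each variable.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman rank correlation: Pearson's r applied to the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical toy values, NOT taken from the assessment data file.
age = [35, 42, 50, 58, 63, 70]
hrqol = [0.91, 0.85, 0.80, 0.66, 0.70, 0.52]

r = pearson_r(age, hrqol)
rho = spearman_rho(age, hrqol)
print(f"Pearson r = {r:.3f}, Spearman rho = {rho:.3f}")
```

A coefficient near -1 or +1 indicates a strong linear (Pearson) or monotonic (Spearman) relationship, and a value near 0 indicates a weak one.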

Question 2 [30 marks]: Perform appropriate simple as well as multiple regression analyses with the stepwise variable selection method (use the backward elimination method) to find the variables (from the list in Table 1 above) that are significantly related to HrQoL. For this analysis, make an initial assumption that HrQoL approximately follows the normal distribution, i.e., you do not need to evaluate pre-analysis normality of HrQoL.

Present the above simple and stepwise multiple regression analyses results in a table (follow Module 8 Formative Assessment (FA 8.3) - Model Question) and interpret the beta coefficients and 95% CI of both simple and multiple regression for the variables duration of diabetes and depression only. Then provide a summary discussion of the results followed by a conclusion and implication. Address all other (if any) relevant issues/results in your presentation.

Note: (1) for presentation please follow all necessary steps discussed in lecture; (2) please do not repeat the steps for each variable – follow Question 2 in AT2; (3) evaluation of model adequacy is not required for simple regression.

Consider that you shared the above analysis results with a colleague who has expertise in clinical/public health study data analysis. Your colleague recommended adding the following variables to your multiple regression model: current hypertension status, systolic blood pressure, creatinine level (a measure of kidney function), and diastolic blood pressure. Assume that these variables are available in the database. Briefly discuss how you would address this recommendation.
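One issue that often arises when closely related predictors are added to the same model is multicollinearity. The sketch below is a simplified illustration, not a prescribed answer: for a model with exactly two predictors, the variance inflation factor (VIF) of each equals 1 / (1 - r²), where r is their Pearson correlation. The blood pressure readings are hypothetical values invented for this example, and the function names are assumptions. With more than two predictors, the VIF would instead be based on the R² from regressing each predictor on all the others.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def pairwise_vif(x, y):
    """VIF for a two-predictor model: 1 / (1 - r^2)."""
    r = pearson_r(x, y)
    return 1.0 / (1.0 - r ** 2)

# Hypothetical systolic and diastolic readings (mmHg), invented for illustration.
sbp = [118, 125, 132, 140, 151, 160, 172]
dbp = [76, 80, 84, 88, 95, 99, 108]

vif = pairwise_vif(sbp, dbp)
print(f"Pairwise VIF for SBP and DBP: {vif:.1f}")
```

A VIF far above commonly used rule-of-thumb thresholds (such as 5 or 10) suggests the two predictors carry largely overlapping information.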

Question 3 [15 marks]: Perform appropriate simple as well as multiple regression analyses with the stepwise variable selection method (use backward: Wald) to find the variables that are significantly related to depression. Exclude HrQoL from your analysis. Present the results in a table (follow the Formative Assessment in Module 10) and provide a summary discussion of the results followed by a conclusion. Address all other (if any) relevant issues/results in your presentation.

Using the above results, predict the risk of depression for a physically active patient who has completed a graduate degree, has five complications, and also has anxiety and impaired cognitive function.

Note: (1) for presentation please follow all necessary steps discussed in lecture; (2) please do not repeat the steps for each variable – follow Question 2 in AT2.
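The prediction step in Question 3 can be sketched as follows, assuming a fitted logistic regression model. The predicted risk is p = 1 / (1 + exp(-(b0 + b1*x1 + ... + bk*xk))). The intercept and coefficients below are hypothetical placeholders, not estimates from the assessment data, and the set of variables retained in the final model is also an assumption. The patient profile uses the coding in Table 1 (PA = 0 for active, Education = 1 for graduate and above, Anxiety = 1, Cog_func = 1).

```python
import math

def predicted_probability(intercept, coefficients, values):
    """Logistic model: p = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    linear_predictor = intercept + sum(
        coefficients[name] * values[name] for name in coefficients
    )
    return 1.0 / (1.0 + math.exp(-linear_predictor))

# Hypothetical coefficients on the log-odds scale, NOT estimated from the data.
intercept = -2.0
coefficients = {
    "PA": 0.4,
    "Education": -0.3,
    "No_complic": 0.2,
    "Anxiety": 1.5,
    "Cog_func": 0.6,
}

# Patient described in the question, coded as in Table 1.
patient = {
    "PA": 0,          # physically active
    "Education": 1,   # graduate and above
    "No_complic": 5,  # five complications
    "Anxiety": 1,     # anxiety present
    "Cog_func": 1,    # impaired cognitive function
}

risk = predicted_probability(intercept, coefficients, patient)
print(f"Predicted risk of depression: {risk:.3f}")
```

In practice the intercept and coefficients would be read from the final step of the backward (Wald) model output, and only variables retained in that step would enter the calculation.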

Note: For Questions 4 & 5 you do not need to follow the steps outlined in the lecture and/or tutorial.

Question 4 [25 marks]:                                                                                                               

Objectives: To examine the effect of different stages of chronic kidney disease (CKD) on patients’ risk of post-operative mortality and complications following isolated coronary artery bypass grafting (CABG) in a large cohort of patients who had cardiac surgery.

Description: All patients who underwent isolated CABG in the cohort were reviewed, and their preoperative glomerular filtration rates (eGFR) were estimated using the Chronic Kidney Disease Epidemiology Collaboration creatinine equation.

The CKD stages were classified as follows: normal: eGFR ≥ 90 ml/min/1.73m² and not on dialysis; mild: eGFR 60-89 ml/min/1.73m² and not on dialysis; moderate: eGFR 30-59 ml/min/1.73m² and not on dialysis; severe: eGFR < 30 ml/min/1.73m² and not on dialysis; and dialysis dependent.

Analysis Method: The descriptive statistics for various post-operative outcomes were reported as percentages (see Table 2). The effect of CKD stages on each of the outcomes following isolated CABG was examined using multiple logistic regression. In the multiple logistic regression analysis the CKD variable was adjusted for 12 other predictors (please see the list below Table 3), i.e., there were 13 predictors in each of the regression models including CKD stages. However, the OR, 95% CI and p-value were reported only for CKD stages (see Table 3). The normal CKD stage was considered the reference category in the multiple logistic regression analysis. Thus, the ORs in Table 3 quantify the odds of an outcome for each CKD stage (mild dysfunction to dialysis) as compared to the normal CKD stage. Please see Appendix 2 for a brief description of post-operative mortality and complications.
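To make the link between a logistic regression coefficient and the reported OR and 95% CI concrete, the sketch below uses the standard relationships OR = exp(beta) and CI = exp(beta ± 1.96 × SE). The input is the 30-day mortality entry for dialysis patients in Table 3 (OR 4.4, 95% CI 2.4 to 8.2). The standard error is back-calculated from the rounded interval, so it is only approximate, and the function names are assumptions made for this example.

```python
import math

def odds_ratio_with_ci(beta, se, z=1.96):
    """Return (OR, lower, upper) from a log-odds coefficient and its SE."""
    return (
        math.exp(beta),
        math.exp(beta - z * se),
        math.exp(beta + z * se),
    )

def log_scale_from_ci(odds_ratio, lower, upper, z=1.96):
    """Recover beta and an approximate SE from a reported OR and 95% CI."""
    beta = math.log(odds_ratio)
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return beta, se

# Reported in Table 3: 30-day mortality, dialysis vs normal kidney function.
beta, se = log_scale_from_ci(4.4, 2.4, 8.2)
or_, lower, upper = odds_ratio_with_ci(beta, se)

print(f"beta = {beta:.3f}, approximate SE = {se:.3f}")
print(f"OR = {or_:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

An interval that excludes 1 corresponds to a statistically significant difference from the reference category at the 5% level, which is consistent with the p-value reported alongside it.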

Discuss the results in Tables 2 and 3 and make a summary conclusion followed by the impact of the findings. Your answer must have only the following three separate sections:

  • Section 1: Summary (overall) discussion of descriptive statistics (presented in Table 2) of post-operative outcomes by CKD
  • Section 2: Summary (overall) discussion of multiple logistic regression analysis results presented in Table
  • Section 3: Make a brief summary conclusion about the effect of CKD on post-operative mortality and complications (see column 1 in Table 3 for the list of these variables) followed by the impact of the

Table 2: Descriptive statistics (%) of post-operative outcomes following CABG by CKD stages.

Kidney function (GFR ml/min/1.73m²)
Post-operative outcomes                    Normal function (≥ 90)  Mild dysfunction (60-89)  Moderate dysfunction (30-59)  Severe dysfunction (< 30)  Dialysis
30-day mortality                           0.5                     1.4                       2.8                           5.5                        4.3
Post-op Stroke                             0.7                     1.2                       2.3                           2.1                        1.9
Reoperation for bleeding                   1.8                     2.4                       2.9                           2.6                        1.5
MI within 21 days                          29.5                    26.3                      29.8                          37.1                       31.8
New renal failure                          1.2                     2.6                       7.1                           7.6                        3.8
Return to theatre                          3.6                     4.6                       6.9                           7.8                        8.0
Prolonged ventilation > 24 hrs             6.0                     7.2                       12.4                          16.1                       15.6
Septicemia                                 0.6                     0.7                       1.4                           2.2                        2.5
Post-operative stay > 14 days              18.9                    23.9                      36.3                          46.9                       50.8
Readmission within ≤ 30 days from surgery  8.4                     9.1                       11.3                          13.1                       25.2
New cardiac arrhythmia                     20.7                    29.9                      34.7                          32.7                       32.1
Deep sternal infection                     0.6                     0.5                       1.1                           1.3                        2.3
Reoperation for deep sternal infection     0.4                     0.4                       0.8                           0.8                        1.5
Red blood cells transfusion                28.7                    35.5                      52.7                          61.8                       65.3
Pneumonia                                  3.6                     3.7                       5.1                           5.2                        6.5

Note: the data in the above table show the % of outcome (yes) within each category of kidney function. For example, the percentage of deaths (30-day mortality) among patients with severe kidney dysfunction was 5.5% and that among the dialysis patients was 4.3%, etc. You don’t have to discuss every single percentage in the table.

Table 3: Multiple logistic regression analysis of various post-operative outcomes following CABG.

Kidney function (GFR ml/min/1.73m²); each cell shows OR (P-value, 95% CI)
Outcome / Complications                 Mild dysfunction (60-89)      Moderate dysfunction (30-59)  Severe dysfunction (< 30)     Dialysis
                                        n=15,626                      n=7,037                       n=720                         n=475
30-day mortality                        1.8 (0.002, 1.3 – 2.6)        2.2 (<0.001, 1.5 – 3.3)       3.6 (<0.001, 2.2 – 6.1)       4.4 (<0.001, 2.4 – 8.2)
Post-operative Stroke                   1.3 (0.094, 0.98 – 1.9)       2.0 (<0.001, 1.4 – 2.8)       1.6 (0.147, 0.8 – 3.0)        2.0 (0.061, 0.89 – 4.2)
Reoperation for bleeding                1.3 (0.011, 1.1 – 1.6)        1.6 (<0.001, 1.3 – 2.1)       1.5 (0.137, 0.9 – 2.5)        0.8 (0.669, 0.4 – 1.8)
New renal failure                       2.1 (<0.001, 1.7 – 2.7)       5.1 (<0.001, 4.0 – 6.6)       4.7 (<0.001, 3.2 – 6.8)       2.5 (0.001, 1.4 – 4.3)
MI within 21 days                       0.9 (0.016, 0.82 – 0.95)      0.9 (0.020, 0.8 – 0.99)       1.2 (0.121, 0.99 – 1.5)       1.0 (0.726, 0.8 – 1.4)
Return to theatre                       1.2 (0.018, 1.01 – 1.4)       1.6 (<0.001, 1.3 – 1.9)       1.6 (0.004, 1.2 – 2.3)        2.0 (<0.001, 1.4 – 2.9)
Prolonged ventilation                   1.1 (0.091, 0.96 – 1.3)       1.7 (<0.001, 1.4 – 1.9)       2.0 (<0.001, 1.6 – 2.6)       2.5 (<0.001, 1.8 – 3.3)
Post-operative stay > 14 days           1.1 (0.002, 1.01 – 1.2)       1.6 (<0.001, 1.4 – 1.7)       2.3 (<0.001, 1.9 – 2.8)       3.5 (<0.001, 2.9 – 4.3)
Red blood cells transfusion             1.1 (0.008, 1.01 – 1.2)       1.7 (<0.001, 1.5 – 1.8)       2.3 (<0.001, 1.9 – 2.7)       3.7 (<0.001, 3.0 – 4.5)
Pneumonia                               1.0 (0.630, 0.8 – 1.1)        1.2 (0.045, 1.01 – 1.5)       1.1 (0.608, 0.8 – 1.6)        1.5 (0.069, 0.92 – 2.2)
Deep sternal infection                  0.8 (0.322, 0.6 – 1.2)        1.6 (0.043, 1.01 – 2.4)       1.4 (0.412, 0.6 – 3.2)        3.4 (<0.001, 1.7 – 6.8)
Reoperation for deep sternal infection  0.9 (0.650, 0.5 – 1.5)        1.6 (0.106, 0.9 – 2.7)        1.5 (0.460, 0.5 – 4.0)        3.6 (0.004, 1.5 – 8.6)
Septicemia                              1.2 (0.455, 0.8 – 1.7)        1.9 (0.002, 1.3 – 2.8)        2.7 (0.002, 1.4 – 5.0)        3.2 (0.001, 1.6 – 6.3)

Note: Variables in the logistic model: CKD stages (reference category: normal function), age, gender, heart ejection fraction, previous heart surgery, urgency status, New York Heart Association class, previous MI, peripheral vascular disease, cardiogenic shock, inotropes at day of surgery, anticoagulation at day of surgery, IV nitrates at day of surgery. You do not have to discuss every single OR in the table.

Question 5 [25 marks]: Short Answer Questions     

  1. [5 marks] Consider that the creatinine level (measured on a continuous scale) of patients with type 2 diabetes mellitus follows the normal distribution. If you construct a sampling distribution of the sample mean for small samples, what would be its distribution? No data analysis required.
  2. [5 marks] Consider 4 groups (A, B, C and D) of diabetic patients who were treated by four different Their fasting HbA1c mmol/L levels were as follows.
Group A Group B Group C Group D
5.6 4.3 4.8 6.1
7.2 4.9 4.4 7.2
10.3 6.9 6.8 5.1
8.4 7.8 5.8 6.1
6.3 8.8 5.3
9.1 5.4 5.6
7.5 8.1 6.6
6.2 5.7 8.1
6.3 7.2
5.3 5.2

If the data in groups A and C are non-normal but normal in groups B and D, what are the statistical methods that could have been used to analyse the difference between these four treatment groups? Justify your answer. No data analysis required.

  3. [6 marks] A clinician is performing a multiple regression analysis to identify predictors of current hypertension status (classified as normotensive or no hypertension, pre-hypertensive and hypertensive) among people with type 2 diabetes mellitus in He is considering gender, age, body mass index, education level (up to year 11/above year 11), area of residence (urban/rural), duration of diabetes, adherence to treatment (yes/no), creatinine level, and kidney function (classified as normal or mild, moderate, severe or dialysis) as the potential predictors in the multiple regression model. What type of regression method would you recommend? Do you have any further feedback/comment on his data analysis plan? Discuss briefly. No data analysis required.
  4. [6 marks] The following graph shows the regression model “Birth-Weight = 6 + 0.596*Oestriol Level” where the data points A and B were excluded from the analysis. If you rerun the regression with all data points including A and B, what would be the possible effects of these two new data points (A and B) on the constant (baseline effect) and beta coefficient of the regression model? No data analysis required.
  5. [3 marks] The regression model “Birth-Weight = 21.6 + 0.596*Oestriol Level” shown in the following graph was obtained excluding data point A from the analysis. If the data point A is included in the analysis, how would you describe its effect on the constant (baseline effect) and beta coefficient of the regression model? No data analysis required.
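The last two questions concern how individual data points can change a fitted regression line. As a general illustration only, the sketch below fits a simple least-squares line with and without one extra point that lies far from the others in both x and y. All values are hypothetical, since the graphs referred to in the questions are not reproduced here, and the direction and size of the change in any real case depend on where the extra points sit relative to the rest of the data.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data, invented for illustration (not read from the graphs).
x_base = [5, 10, 15, 20, 25]
y_base = [24, 28, 30, 34, 36]

# One hypothetical extra point with a large x value and a low y value.
x_all = x_base + [40]
y_all = y_base + [20]

intercept_base, slope_base = fit_line(x_base, y_base)
intercept_all, slope_all = fit_line(x_all, y_all)

print(f"Without extra point: intercept = {intercept_base:.2f}, slope = {slope_base:.3f}")
print(f"With extra point:    intercept = {intercept_all:.2f}, slope = {slope_all:.3f}")
```

In this toy example the extra point pulls the slope down and pushes the intercept up, because a point with an extreme x value has high leverage on the fitted line. A point near the centre of the x range would mainly shift the intercept and leave the slope largely unchanged.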

Appendix 2: Definitions of common post-operative outcomes   

  • 30-day mortality is the death within 30 days after the
  • Stroke is defined as any new central neurological deficit whether permanent (> 72 hours) or transient (resolved within 72 hours).
  • New renal failure is defined as the occurrence of at least two of the following after the procedure; serum creatinine increase to > 2 mmol/l, doubling or greater increase in serum creatinine over pre-operative value, or a new requirement for dialysis/haemofiltration.
  • Return to theatre is defined as return to the operating theatre for the management of post-operative complications, and this includes procedures done in ICU that normally would be performed in the operating
  • Prolonged ventilation is defined as post-operative ventilator support for a total period of longer than 24 hours.
  • Prolonged post-operative stay is defined as discharge from hospital after 14 days of the
  • RBC transfusion is defined as Red Blood Cells transfused intra and/or post operatively.
  • Reoperation for bleeding is defined as operative re-intervention for bleeding/tamponade.
  • Septicaemia is defined as septicaemia proved by positive blood cultures supported by at least two of the following a) fever, b) elevated granulocyte cell counts, c) elevated and increasing CRP, d) elevated and increasing ESR, post-operatively.
  • Deep sternal infection is defined as infection involving muscle and bone, with or without mediastinal involvement, as demonstrated by surgical exploration. Must have: wound debridement and one of the following: a) positive culture, b) treatment with
  • Readmission within 30 days from surgery is defined as patient re-admission as in-patient within 30 days from the date of surgery for any reason (date of surgery counts as day 0).
  • New cardiac arrhythmia is defined as any new post-operative arrhythmia that required
  • Pneumonia is defined as pneumonia diagnosed by one of the following; a) positive sputum or trans-tracheal aspiration, b) clinical finding of pneumonia including radio-graphical
  • Peri-operative MI is defined as MI during the surgery diagnosed by; a) enzyme level elevation, b) new wall motion abnormalities, c) serial ECG (at least two) showing new Q
