# Biostatistics Analysis Using SPSS or MS Excel (Statistics Project Sample)

The task is a biostatistics assignment in which an undergraduate student was expected to read a case and analyse it in SPSS or Excel using various statistical measures. The analysis was carried out in SPSS, and the results were presented in Word. In addition to the analysis, the student was expected to give an explanation for each statistical finding.

* Can we predict patients’ baseline HbA1c (in mmol/mol) from their Total Cholesterol (in mmol/L)? Provide a complete statistical investigation using correlation and regression analyses (including the regression equation) with proper written interpretation and visual illustrations.

(7 marks)

* Drawing on your conclusion in the previous question, can we add other variables to the regression model to control for confounding effects? Provide at least two confounding variables with proper justification. (3 marks)

A patient’s baseline HbA1c (in mmol/mol) can be predicted from their Total Cholesterol (in mmol/L), provided the data fulfil the assumptions of the linear regression model. According to Casson & Farmer (2014), these assumptions are:

* The two variables must be continuous and measurable.

* The two variables must have a linear relationship.

* The variables must contain no significant outliers.

* For each value of X, the residual term e must be normally distributed with a mean of zero.

* For all values of X, the spread of the residual term e must be equal.

Using Spearman’s Rank Order Correlation test on the data, we find that the two variables are not normally distributed. This is attributed to the significant outliers in the data. However, after eliminating the outliers as recommended by Dhakal (2017), a Pearson correlation analysis shows that the two variables have a statistically significant, though weak, positive linear relationship (r = 0.11, p = 0.013). Since the variables now fulfil the assumptions of a linear regression model, the model can be applied to predict HbA1c from Total Cholesterol.
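The Pearson coefficient reported above can be sketched in a few lines of plain Python. This is only an illustration, not the SPSS procedure used in the assignment: the data values are made up, and the function names `pearson_r` and `t_statistic` are hypothetical helpers. The sample size n = 497 used in the last line is inferred from the total degrees of freedom (496) in the ANOVA table of this report.

```python
import math

def pearson_r(x, y):
    """Return Pearson's correlation coefficient for two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares around the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t statistic for testing H0: no linear correlation (n - 2 degrees of freedom)."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# Made-up illustrative values, NOT the study data.
cholesterol = [3.9, 4.4, 5.1, 5.6, 6.2, 4.8]       # mmol/L
hba1c = [58.0, 71.0, 66.0, 75.0, 69.0, 80.0]       # mmol/mol
r = pearson_r(cholesterol, hba1c)
print(f"illustrative r = {r:.3f}")

# With the reported r = .111 and n = 497, t is close to the 2.487 shown by SPSS.
print(f"t for reported r = {t_statistic(0.111, 497):.3f}")
```

The small gap between this t value and the SPSS figure comes from using r rounded to three decimals.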

The linear regression analysis was conducted in SPSS.

Dependent variable = baseline HbA1c

Independent variable = Total cholesterol

The following extracted tables present the results of the linear regression of baseline HbA1c against Total cholesterol.

Model Summary

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .111ᵃ | .012 | .010 | 17.08107 |

a. Predictors: (Constant), Total cholesterol (mmol/L) at baseline

Kasuya (2019) defines R squared as the proportion of the variance for a dependent variable that is explained by the variables in the linear model. Therefore, from the summary table above, we can conclude that 1.2% of the total variation in baseline HbA1c is explained by Total Cholesterol.
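As a cross-check, the Model Summary figures can be recomputed from the sums of squares and degrees of freedom that SPSS reports in the ANOVA table of this report. The sketch below uses the standard formulas for simple linear regression; the variable names are illustrative and are not SPSS output.

```python
import math

# Values copied from the SPSS ANOVA output in this report.
ss_regression = 1804.988
ss_residual = 144422.597
ss_total = 146227.585
df_residual = 495
df_total = 496

# R squared: share of the total variation explained by the model.
r_squared = ss_regression / ss_total

# Adjusted R squared: compares residual and total variance estimates.
adjusted_r_squared = 1 - (ss_residual / df_residual) / (ss_total / df_total)

# Standard error of the estimate: square root of the residual mean square.
std_error_estimate = math.sqrt(ss_residual / df_residual)

# In simple regression, R is the square root of R squared.
r = math.sqrt(r_squared)

print(f"R = {r:.3f}")
print(f"R Square = {r_squared:.3f}")
print(f"Adjusted R Square = {adjusted_r_squared:.3f}")
print(f"Std. Error of the Estimate = {std_error_estimate:.3f}")
```

The recomputed values agree with the Model Summary table to the precision shown there.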

ANOVAᵃ

| Model | | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|---|
| 1 | Regression | 1804.988 | 1 | 1804.988 | 6.186 | .013ᵇ |
| | Residual | 144422.597 | 495 | 291.763 | | |
| | Total | 146227.585 | 496 | | | |

a. Dependent Variable: HbA1c (mmol/mol) at baseline

b. Predictors: (Constant), Total cholesterol (mmol/L) at baseline

Based on the definition of the significance of the F-test in one-way ANOVA by Verma (2013), the results above show that the linear regression model is significant in predicting the dependent variable, in this case baseline HbA1c. The overall F-test gives F(1, 495) = 6.186 with a p-value of 0.013, which is below the significance level.
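The F statistic in the ANOVA table is the ratio of the regression mean square to the residual mean square. The short sketch below recomputes it from the reported sums of squares; the variable names are illustrative. It also checks a known property of regression with a single predictor: the squared t statistic of the slope (2.487 in the SPSS coefficients output) equals F, up to rounding.

```python
# Values copied from the SPSS ANOVA output in this report.
ss_regression, df_regression = 1804.988, 1
ss_residual, df_residual = 144422.597, 495

# Mean square = sum of squares divided by its degrees of freedom.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

# Overall F statistic for the model.
f_statistic = ms_regression / ms_residual
print(f"Mean Square (residual) = {ms_residual:.3f}")
print(f"F = {f_statistic:.3f}")

# With one predictor, the squared slope t statistic should match F.
t_slope = 2.487  # reported by SPSS for Total cholesterol
print(f"t squared = {t_slope ** 2:.3f}")
```

The p-value itself requires the F distribution with (1, 495) degrees of freedom, which SPSS evaluates internally; it is not recomputed here.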

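Coefficientsᵃ

| Model | | Unstandardized Coefficients: B | Unstandardized Coefficients: Std. Error | Standardized Coefficients: Beta | t | Sig. |
|---|---|---|---|---|---|---|
| 1 | (Constant) | 62.269 | 3.037 | | 20.501 | .000 |
| | Total cholesterol (mmol/L) at baseline | 1.619 | .651 | .111 | 2.487 | .013 |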

a. Dependent Variable
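Reading the B column of the coefficients output gives the fitted line: baseline HbA1c = 62.269 + 1.619 × Total cholesterol. The sketch below wraps that equation in a small function and checks that the slope's t statistic is its coefficient divided by its standard error. The function name `predict_hba1c` and the example input of 5.0 mmol/L are hypothetical, chosen only for illustration.

```python
# Coefficients copied from the SPSS output in this report.
INTERCEPT = 62.269   # (Constant), in mmol/mol
SLOPE = 1.619        # change in HbA1c (mmol/mol) per 1 mmol/L of total cholesterol
SLOPE_STD_ERROR = 0.651

def predict_hba1c(total_cholesterol_mmol_l):
    """Predicted baseline HbA1c (mmol/mol) from total cholesterol (mmol/L)."""
    return INTERCEPT + SLOPE * total_cholesterol_mmol_l

# Hypothetical patient with a total cholesterol of 5.0 mmol/L.
print(f"Predicted HbA1c = {predict_hba1c(5.0):.3f} mmol/mol")

# t statistic of the slope = coefficient / standard error.
t_slope = SLOPE / SLOPE_STD_ERROR
print(f"t (slope) = {t_slope:.3f}")
```

Because the model explains only about 1.2% of the variation in HbA1c, individual predictions from this equation carry a large error (the standard error of the estimate is about 17 mmol/mol).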

