Some Statistics Questions to be Answered: Regression Model (Coursework Sample)

Instructions:

Some statistics questions to be answered.

Content:

Statistics
Student's Name
Instructor's Name
Class
Date
Question 1
A. Null deviance
It measures how well the response variable is predicted by a model that contains only an intercept, and therefore represents the null model.
B. Residual deviance
It measures how well the response variable is predicted by a model that contains the intercept and the covariates.
C. Odds of an event
The odds are the ratio of the probability that an event occurs to the probability that it does not occur, p / (1 - p).
D. Odds ratio
The odds ratio compares the odds of an event in one group with the odds of the same event in another group.
E. Contingency table
A table in matrix form that displays the frequency distribution of two or more categorical variables.
F. Akaike information criterion
Given a collection of models for the data, the Akaike information criterion estimates the quality of each model relative to the other models.
G. Probit regression model
It is a regression model where the dependent variable can take only two values, for example, success or no success.
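The odds and the odds ratio can be illustrated with a short calculation. The sketch below is only an illustration: the function names and the 2x2 counts are hypothetical and do not come from the coursework data.

```python
def odds(p):
    """Odds of an event with probability p, i.e. p / (1 - p)."""
    if not 0 <= p < 1:
        raise ValueError("p must be in the interval [0, 1)")
    return p / (1 - p)


def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table of counts.

    Group 1: a events, b non-events. Group 2: c events, d non-events.
    The odds in each group are a/b and c/d, so the ratio is (a*d)/(b*c).
    """
    return (a * d) / (b * c)


# Hypothetical example: an event with probability 0.8 has odds of about 4 to 1.
print(odds(0.8))

# Hypothetical counts: 20 of 30 succeed in group 1, 5 of 20 succeed in group 2.
print(odds_ratio(20, 10, 5, 15))
```

An odds ratio above 1 means the event has higher odds in the first group than in the second.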
Question 2
A. Residuals and errors in a regression model
An error is the deviation of an observed value from the true (unobservable) value, while a residual is the difference between the observed value and the value estimated by the fitted model.
B. Linear and non-linear regression models
A linear model expresses the dependent variable as a linear combination of the parameters, while a non-linear model is a regression model in which observations are modelled by a function that is a non-linear combination of the parameters and depends on one or more independent variables.
C. R-squared and adjusted R-squared
R-squared, also known as the coefficient of determination, shows the proportion of the variation in the dependent variable that is explained by the regression, whereas adjusted R-squared adjusts R-squared for the number of independent variables in the model.
D. Omitted and irrelevant variables in a model
Omitted variables arise when a model incorrectly leaves out one or more important factors, which can bias the estimated coefficients, whereas irrelevant variables are unimportant factors that are included in the model and lead to a higher variance of the estimated coefficients.
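The difference between R-squared and adjusted R-squared can be sketched numerically. The data and fitted values below are hypothetical (they are not taken from any question in this paper), and the function names are chosen only for illustration; `p` is assumed to be the number of independent variables.

```python
def r_squared(y, fitted):
    """Coefficient of determination: 1 - SSE / SST."""
    mean_y = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # residual sum of squares
    sst = sum((yi - mean_y) ** 2 for yi in y)               # total sum of squares
    return 1 - sse / sst


def adjusted_r_squared(r2, n, p):
    """Adjust R-squared for p independent variables and n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Hypothetical observations and the fitted values of the line 2.2 + 0.6x.
y = [2, 4, 5, 4, 5]
fitted = [2.8, 3.4, 4.0, 4.6, 5.2]

r2 = r_squared(y, fitted)
adj = adjusted_r_squared(r2, n=len(y), p=1)
print(round(r2, 4), round(adj, 4))
```

Adjusted R-squared is never larger than R-squared, and the gap widens as more independent variables are added without improving the fit.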
Question 3
Include all variables that correlate: when deciding whether a variable should be included, I must first decide whether it correlates with the dependent variable and with the independent variables, using Spearman's correlation coefficient. The further the coefficient is from zero, whether positive or negative, the stronger the case for inclusion, since the variable then correlates with both the dependent and the independent variables.
Look at the purpose of the model: when deciding whether a variable should be included, I should look at the aim of the model, such as simplicity, determining the attitude of staff, parsimony, predictive power, or precision. For example, if the aim of the model is to determine the attitude of various employees and the variables I have are attitude and employee, then the variables should be included.
Decide based on subject matter: when deciding whether a variable should be included, I should look at the underlying model, the dependent variable, and the independent variable. If the dependent and independent variables form part of the underlying model, then the variables should be included.
Test the significance of parameters: to decide whether to include a variable, I should first test its parameter with a hypothesis test. If the null hypothesis (that the parameter is zero) is not rejected, the variable should not be included, but if the null hypothesis is rejected in favour of the alternative hypothesis, the variable should be included.
Question 4
Kengare  Wegesa  Rank(Kengare)  Rank(Wegesa)  D = R(Kengare) - R(Wegesa)  D^2
C        E       1              2             -1                          1
E        A       2              4             -2                          4
B        C       3              1             2                           4
A        F       4              5             -1                          1
F        D       5              6             -1                          1
D        B       6              3             3                           9
G        G       7              7             0                           0
Sum                                                                       20

$r = 1 - \frac{6 \sum d^2}{n^3 - n}$
$r = 1 - \frac{6 \times 20}{7^3 - 7}$
$r = 1 - \frac{120}{336} = 0.6429$
There is a moderately strong positive correlation between the two rankings.
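The rank correlation can be checked with a short script. This is a sketch under one assumption: that the Kengare and Wegesa columns list the same seven items in each person's order of preference, from first to last, with no ties. The function name is chosen for illustration only.

```python
# Orderings read from the Kengare and Wegesa columns (assumed first to last).
kengare = ["C", "E", "B", "A", "F", "D", "G"]
wegesa = ["E", "A", "C", "F", "D", "B", "G"]


def spearman_rho(order_a, order_b):
    """Spearman rank correlation for two tie-free orderings of the same items."""
    rank_a = {item: i + 1 for i, item in enumerate(order_a)}
    rank_b = {item: i + 1 for i, item in enumerate(order_b)}
    n = len(order_a)
    # Sum of squared rank differences over all items.
    sum_d2 = sum((rank_a[item] - rank_b[item]) ** 2 for item in order_a)
    rho = 1 - 6 * sum_d2 / (n ** 3 - n)
    return sum_d2, rho


sum_d2, rho = spearman_rho(kengare, wegesa)
print(sum_d2, round(rho, 4))  # 20 0.6429
```

The sum of squared differences and the coefficient agree with the hand calculation.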
Question 5
Part A
Number of levels = 5 × 3 = 15
Part B
Y (Hours)  X   XY   X^2  Ŷ = 14.13 + 0.4X  Y - Ŷ   Y - mean Y  (Y - Ŷ)^2  (Y - mean Y)^2
16         1   16   1    14.53             1.47    -1.33       2.16       1.77
19         2   38   4    14.93             4.07    1.67        16.57      2.79
14         3   42   9    15.33             -1.33   -3.33       1.77       11.09
13         4   52   16   15.73             -2.73   -4.33       7.45       18.75
18         5   90   25   16.13             1.87    0.67        3.5        0.45
16         6   96   36   16.53             -0.53   -1.33       0.28       1.77
17         7   119  49   16.93             0.07    -0.33       0.005      0.11
13         8   104  64   17.33             -4.33   -4.33       18.75      18.75
12         9   108  81   17.73             -5.73   -5.33       32.83      28.41
17         10  170  100  18.13             -1.13   -0.33       1.28       0.11
24         11  264  121  18.53             5.47    6.67        29.92      44.49
22         12  264  144  18.93             3.07    4.67        9.43       21.81
19         13  247  169  19.33             -0.33   1.67        0.11       2.79
18         14  252  196  19.73             -1.73   0.67        2.99       0.45
22         15  330  225  20.13             1.87    4.67        3.5        21.81
Totals: ΣY = 260, ΣX = 120, ΣXY = 2192, ΣX^2 = 1240, Σ(Y - Ŷ) = 0.05, Σ(Y - mean Y) = 0.05, Σ(Y - Ŷ)^2 = 130.545, Σ(Y - mean Y)^2 = 175.35

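$b = \frac{n \sum xy - \sum x \sum y}{n \sum x^2 - (\sum x)^2}$
$b = \frac{15 \times 2192 - 120 \times 260}{15 \times 1240 - 120^2} = 0.4$
$a = \frac{\sum y}{n} - b \frac{\sum x}{n} = \frac{260}{15} - 0.4 \times \frac{120}{15} = 14.13$
$\hat{Y} = 14.13 + 0.4X$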
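The slope and intercept can be reproduced from the Y (Hours) and X columns with the same sums-based formulas. The sketch below assumes the fifteen (X, Y) pairs are exactly those listed in the table; the function name is hypothetical.

```python
# Data from the Y (Hours) and X columns of the Part B table.
hours = [16, 19, 14, 13, 18, 16, 17, 13, 12, 17, 24, 22, 19, 18, 22]
x = list(range(1, 16))


def fit_line(x, y):
    """Least-squares intercept a and slope b computed from the raw sums."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = sum_y / n - b * sum_x / n
    return a, b


a, b = fit_line(x, hours)
print(round(a, 2), round(b, 2))  # 14.13 0.4
```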
Part C
$r^2 = 1 - \frac{\sum (Y - \hat{Y})^2}{\sum (Y - \bar{Y})^2} = 1 - \frac{130.545}{175.35} = 0.2555$
$r = \sqrt{r^2} = \sqrt{0.2555} = 0.5055$ (positive, since the slope is positive)
$F = \frac{r^2 / (k - 1)}{(1 - r^2) / (n - k)} = \frac{0.2555 / (2 - 1)}{(1 - 0.2555) / (15 - 2)} = 4.46$
$S_e = \sqrt{\frac{\sum (Y - \hat{Y})^2}{n - k}} = \sqrt{\frac{130.545}{15 - 2}} = 3.17$
$R^2_{adjusted} = 1 - \frac{(1 - r^2)(n - 1)}{n - k} = 1 - \frac{(1 - 0.2555)(15 - 1)}{15 - 2} = 0.1982$
Here n = 15 is the number of observations and k = 2 is the number of estimated parameters (intercept and slope).

Coefficient  Se    DF           R^2     Adjusted R^2  F     P
0.4          3.17  15 - 2 = 13  0.2555  0.1982        4.46  > 0.05

Part D
Estimates
A  0.025   1.1875  0.875   0.8125  1.125
B  1.067   1.133   0.867   0.8     1.133
C  1.1429  1.048   0.9048  0.8571  1.048

Part E
The coefficient of determination (r^2) is 0.2555, meaning that the model explains about 26 percent of the variation in hours. The correlation between the variables is positive but only moderate (r = 0.51). The standard error of the estimate is 3.17, which is the typical deviation of the observed values from the fitted line with slope coefficient 0.4. This is a large error relative to the slope, and it shows that the observations deviate considerably from the fitted values. The null hypothesis in this case states that the slope is zero, so the variable is insignificant, whereas the alternative hypothesis states that the slope differs from zero, so the variable is significant. The F-statistic is 4.46, which is below the 5 percent critical value of about 4.67 for (1, 13) degrees of freedom, so the p-value is slightly above 0.05. This tells us that we cannot reject the null hypothesis at the 5 percent level, although the result is close to the threshold. It also shows that the model gives only a modest prediction once the residual error is taken into account.
Part F
The null hypothesis is that the slope coefficient is zero (B = 0). The alternative hypothesis, on the other hand, is that B is less than or greater than zero. The test is based on F = 4.46 at the estimated coefficient B = 0.4.
System  Mean
A       16
B       15
C       21

Since F is lower than the 5 percent critical value of 4.67, we do not reject the null hypothesis.
Part G
$\alpha = 0.05$
The null hypothesis in this case is that the value equals 1 - 0.05 = 0.95, whereas the alternative hypothesis is that it is greater than or less than 0.95.
$C.I. = 0.95 \pm t \cdot S_e$
$t_{\alpha/2} = t_{0.05/2} = t_{0.025} = 2.13$ (from the tables)
$C.I. = 0.95 \pm 2.13 \times 3.17$
$C.I. = 0.95 \pm 6.7521$
We accept the alternative hypothesis since the confidence interval extends both above and below 0.95.
Part H
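At the 95 percent confidence level, the t value from the distribution tables equals 2.13.

System  Total  Mean
A       80     16
B       75     15

System A
$C.I. = \text{Mean} \pm t \cdot S_e$
$C.I. = 16 \pm 2.13 \times 3.17$
$C.I. = 16 \pm 6.7521$
Lower limit = 9.2479, upper limit = 22.7521
System B
$C.I. = \text{Mean} \pm t \cdot S_e$
$C.I. = 15 \pm 2.13 \times 3.17$
$C.I. = 15 \pm 6.7521$
Lower limit = 8.2479, upper limit = 21.7521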
Question 6
Part A
Attitude \ Employee  Stackers  Sales staff  Administration
Very favourable      1         3            16
Favourable           7         9            24
Unfavourable         24        34           22
Very unfavourable    8         10           2

Part B
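Since the distribution of attitudes differs markedly between the job categories (most administration staff are favourable or very favourable, while most stackers and sales staff are unfavourable or very unfavourable), there is a contingency between the two variables and hence they are dependent on each other.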
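One way to check this conclusion formally is a chi-square test of independence on the table from Part A. The original answer does not name a test, so the sketch below is an assumption about how the claim could be verified; the function name is hypothetical and the critical value is the standard table value for 6 degrees of freedom at the 5 percent level.

```python
# Observed counts from Part A: rows are attitudes, columns are
# Stackers, Sales staff, Administration.
observed = [
    [1, 3, 16],    # Very favourable
    [7, 9, 24],    # Favourable
    [24, 34, 22],  # Unfavourable
    [8, 10, 2],    # Very unfavourable
]


def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (obs - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return chi2, df


chi2, df = chi_square_independence(observed)
CRITICAL_5_PERCENT_DF6 = 12.592  # chi-square table value, df = 6, alpha = 0.05
print(round(chi2, 2), df, chi2 > CRITICAL_5_PERCENT_DF6)
```

The statistic is well above the critical value, which supports the conclusion that attitude and job category are dependent.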
Question 7
Part A

                           Estimate   Std. error  Actual
Coefficients (Intercepts)  1.5034017  0.0467911   1.4566106
X1                         2.5463082
