# Data Set Analysis (Statistics Project Sample)

Instructions:

This is a statistical data analysis paper on sample data collected on political oppression in different countries.

PART 1

Question 1

The unit of analysis in this dataset is sampled US counties.

Question 2

County

‘County’ is an independent nominal variable. It identifies and classifies the units in the study population.

Year

Year is a quantitative independent variable. It is an interval variable which is continuous.

Dem. Pct

This is a quantitative ratio variable.

Pres

‘Pres’ is a categorical dichotomous variable. It is an independent variable in the dataset.

Turnout

It is a quantitative ratio variable.

Arrests

Arrests is a quantitative and independent ratio variable.

Urban

This is an independent and categorical nominal variable.

Question 3

Reported arrests may be a valid measure of crime, but they are not a reliable one. The measure is valid because arrests arise from crimes, so an increase or decrease in crime within a given county should produce a roughly proportionate increase or decrease in reported arrests; the number of arrests can therefore track the trend in crime. It is unreliable for comparisons across counties because the number of arrests also depends on the efficiency of the security personnel in each county. County A can report more arrests than county B even though county B has more criminal activity. The number of reported arrests therefore may not give a clear comparison between two or more counties.
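The county A and county B comparison can be made concrete with a little arithmetic. The sketch below is illustrative only: the crime counts and arrest rates are hypothetical, and `reported_arrests` is a helper invented for this example, not something taken from the dataset.

```python
def reported_arrests(crimes, arrest_rate):
    """Arrests a county would report if a fixed share of crimes led to an arrest."""
    return round(crimes * arrest_rate)

# Hypothetical counties: B has twice the crime of A but less effective policing.
county_a = reported_arrests(crimes=1000, arrest_rate=0.50)
county_b = reported_arrests(crimes=2000, arrest_rate=0.20)

print("County A arrests:", county_a)
print("County B arrests:", county_b)
```

Under these assumed numbers county A reports more arrests than county B despite having half the crime, which is the cross-county comparison problem described in the answer.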

Question 4

The unit of analysis for this dataset is the sampled US registered voters.

Question 5

Participant

‘Participant’ is an independent and categorical nominal variable.

Party

‘Party’ is also an independent and categorical nominal variable.

Crime

‘Crime’ is a categorical ordinal variable.

Obama

This is a categorical ordinal variable.

Clinton

‘Clinton’ is a categorical ordinal variable.

Income

This is a quantitative ratio variable.

Gay

‘Gay’ is a categorical dichotomous variable.

Frack

‘Frack’ is a categorical dichotomous variable.

Question 6

The pollster needs to obtain a fair representation of the whole population. This can be done by drawing the sample of 600 individuals from across the population, which ensures that the population's heterogeneous characteristics are represented in the sample. To avoid bias, which may affect reliability, the pollster should randomly select participants from within the homogeneous subgroups of the population.

The data should be collected within roughly the same time period to reduce the chance of significant changes in variable values and attributes.

The sample of 600 participants must be a fair representation of the population. The sample must be large enough for its characteristics to be generalized to the whole population.
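Randomly selecting participants from within homogeneous subgroups is usually called stratified random sampling. A minimal sketch follows, assuming a hypothetical sampling frame of 6,000 voters split into urban and rural strata; the `stratified_sample` helper, the `area` field, and the 70/30 split are all invented for illustration.

```python
import random
from collections import defaultdict

def stratified_sample(population, stratum_of, total, seed=0):
    """Draw a random sample from each stratum, sized in proportion to the stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for person in population:
        strata[stratum_of(person)].append(person)
    sample = []
    for members in strata.values():
        # Proportional allocation; with awkward shares the rounded sizes
        # may not add up to exactly `total`.
        k = round(total * len(members) / len(population))
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical frame: 6,000 voters, 70% urban and 30% rural.
frame = [{"id": i, "area": "urban" if i % 10 < 7 else "rural"} for i in range(6000)]
poll = stratified_sample(frame, lambda p: p["area"], total=600)
urban_count = sum(p["area"] == "urban" for p in poll)

print(len(poll), "participants,", urban_count, "urban")
```

Because each stratum is sampled in proportion to its size, the sample mirrors the population's urban and rural shares while the choice of individuals within each stratum remains random.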

Question 7

Graph 1

Mean=0

Median=0

Mode=0

Question 8

Graph 2

Mean≈30

Median≈25

Mode=20

Question 9

Graph 3

Mean≈3

Median≈4

Mode=5

Question 10

Graph 4

Mean≈0

Median≈0

Mode=-1 and 1
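The readings for Graphs 1 to 4 follow the usual link between a distribution's shape and its mean, median, and mode. The sketch below uses small hypothetical data sets, chosen only to mimic a symmetric, a right-skewed, a left-skewed, and a bimodal shape; they are not the data behind the graphs.

```python
from statistics import mean, median, multimode

def summarize(values):
    """Return (mean, median, list of modes) for a list of numbers."""
    return mean(values), median(values), multimode(values)

# Hypothetical values, chosen only to mimic each graph's shape.
symmetric = [-1, 0, 0, 0, 1]                     # one central peak
right_skewed = [20, 20, 20, 25, 25, 30, 40, 60]  # long tail to the right
left_skewed = [0, 0, 1, 4, 4, 5, 5, 5]           # long tail to the left
bimodal = [-1, -1, -1, 0, 1, 1, 1]               # two equal peaks

for name, data in [("symmetric", symmetric), ("right-skewed", right_skewed),
                   ("left-skewed", left_skewed), ("bimodal", bimodal)]:
    m, med, modes = summarize(data)
    print(f"{name}: mean={m}, median={med}, modes={modes}")
```

In the right-skewed set the tail pulls the mean above the median and the mode, in the left-skewed set the order reverses, and the bimodal set has two modes placed symmetrically around a mean and median of zero.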

Question 11

Rank from strongest negative correlation to strongest positive correlation

1 Graph 8

2 Graph 6

3 Graph 5

4 Graph 7

Question 12

The correlation coefficient (Pearson’s r) for Graph 6 is zero. The graph is composed of two curves, one with a positive correlation coefficient and the other with a negative one. The slopes of the two curves are similar in magnitude, so the positive and negative associations are roughly equal in absolute value and cancel each other out over the whole graph.
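The cancellation can be checked numerically. The sketch below assumes a hypothetical V-shaped data set (a falling branch followed by a rising branch of equal steepness) as a stand-in for Graph 6, whose actual data are not given; `pearson_r` is written out by hand so the example needs only the standard library.

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation coefficient computed from its definition."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical V-shaped data: y falls as x rises to 0, then rises again.
xs = list(range(-5, 6))
ys = [abs(x) for x in xs]

r_left = pearson_r(xs[:6], ys[:6])   # falling branch only
r_right = pearson_r(xs[5:], ys[5:])  # rising branch only
r_all = pearson_r(xs, ys)            # both branches together

print(f"left={r_left:.2f}, right={r_right:.2f}, all={r_all:.2f}")
```

Each branch on its own is perfectly linear, yet over the whole data set the coefficient is essentially zero, because Pearson's r measures only the overall linear trend.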

Question 13

The difference between the average score of Penn students (+1) and the average score of Drexel students (-0.5) is substantively significant. From the information given, however, it is not possible to determine whether the difference is statistically significant. That would require the variance or standard deviation of the students' scores around each mean, as well as the sizes of the samples drawn from Penn and Drexel students. The distribution of the scores determines whether the difference between the two means is statistically significant.
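To see why the means alone are not enough, consider a two-sample t statistic. The sketch below uses Welch's form of the statistic as one common choice; the means (+1 and -0.5) come from the question, while the standard deviations and sample sizes are hypothetical values picked to show two contrasting cases.

```python
import math

def welch_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Welch's t statistic for the difference between two sample means."""
    standard_error = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_a - mean_b) / standard_error

# Same means in both cases; only the assumed spread and sample size differ.
t_small_noisy = welch_t(1.0, 3.0, 10, -0.5, 3.0, 10)    # small, noisy samples
t_large_tight = welch_t(1.0, 1.0, 200, -0.5, 1.0, 200)  # large, tight samples

print(f"small noisy samples: t = {t_small_noisy:.2f}")
print(f"large tight samples: t = {t_large_tight:.2f}")
```

With the same 1.5-point gap, the first case gives a t statistic of about 1.1 and the second about 15. As a rough guide, values beyond about 2 in absolute size are conventionally treated as significant at the 5% level for reasonably large samples, so only the second case would count as statistically significant.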

Question 14

The measures chosen by Professor X are neither reliable nor valid. The graph shows no relationship between ‘repress’ and ‘protests’. The analysis is invalid because the independent variable is not fully independent: the dependent variable has a causal effect on the independent variable. This two-way dependence between the dependent and independent variables introduces large errors, which makes the results unreliable and invalid.

On another note, the professor has drawn the wrong inference from the bivariate relationship graph, which in turn leads to an erroneous analysis of the data. The graph does not show a strong positive relationship between the variables. It shows no relationship, and therefore the professor should not have regressed variables that have no linear relationship.

To test the hypotheses postulated by the other professors and the student, Professor X needed to perform an independent analysis for each hypothesis in order to arrive at reliable conclusions about the hypothesis under test. This would have be...
