Pages:

4 pages/≈1100 words

Sources:

No Sources

Level:

Other

Subject:

Accounting, Finance, SPSS

Type:

Math Problem

Language:

English (U.S.)

Document:

MS Word

Topic:

# Data analysis & Business intelligence (Math Problem Sample)

Instructions:

Data analysis in R-Statistical program

Content:

STUDENT ID NUMBER:

ASSIGNMENT 2

WINTER TERM, 2015

DATA ANALYTICS & BUSINESS INTELLIGENCE

(8697)

DUE DATE: Friday 24 July by noon

WEIGHTING: 40%

PERMITTED MATERIALS: any materials

INSTRUCTIONS:

1. Complete the assignment individually.

2. Answer all questions.

3. Write your answers to the questions in this document, keeping your solution as a single Word document. You may cut and paste items from software into your document, or use Ctrl-Alt-PrtScn (on Windows) or Command+Shift+4 (on macOS) to create a screen shot for your document.

4. Submit as a single Word document. Late submissions will attract a penalty of 5% per day. Submissions will not be accepted after Friday 31 July by noon.

QUESTION 1 (45 marks)

Cluster analysis is a data mining technique used to divide data into meaningful groups.

* Describe (in your own words) what the k means algorithm does.

[4 marks]

K-means clustering is a data partitioning technique in which data points are grouped into a pre-determined number (k) of clusters. The objective of the k-means process is to minimize the average squared Euclidean distance between each data point and the centre of the cluster to which it is assigned. As such, the k-means algorithm partitions the data into k clusters, with the distance metric serving as the measure of similarity or dissimilarity between the data points.
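The objective described above can be written out as a formula. This is a sketch in notation that is assumed here and does not appear in the assignment: $n$ data points $x_i$, clusters $C_1, \dots, C_k$, and $\mu_j$ for the centre of cluster $C_j$.

```latex
% Assumed notation: x_i = data point, C_j = cluster j, mu_j = centre of C_j
J(C_1, \dots, C_k) = \frac{1}{n} \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^{2},
\qquad
\mu_j = \frac{1}{\lvert C_j \rvert} \sum_{x_i \in C_j} x_i
```

A smaller value of $J$ means the points sit closer, on average, to the centre of their own cluster.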

* Describe (in your own words) what hierarchical cluster analysis does.

[4 marks]

Clustering covers a number of algorithms used to group similar objects into categories. Once the data objects have been grouped, the distinct clusters can be arranged into a tree-like structure in which the placement of each item depends on its similarity or dissimilarity to the others. This structure is called a hierarchical cluster, and the analysis of clusters based on these similarities or dissimilarities is called hierarchical cluster analysis.
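The tree-building idea can be illustrated with a short Python sketch. It assumes Euclidean distance and single linkage (the distance between two clusters is the distance between their closest members); the answer above does not name a linkage rule, so both choices, along with the function name `agglomerative` and the sample points, are illustrative assumptions.

```python
def agglomerative(points):
    """Single-linkage agglomerative clustering; returns the merge history."""
    def dist(p, q):
        # Euclidean distance between two points (assumed metric).
        return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

    # Start with every observation in its own cluster (lists of row indices).
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        # Find the pair of clusters whose closest members are nearest.
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        # Record the merge: the two clusters joined and the height of the join.
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return merges

# Hypothetical sample data: two pairs of nearby points.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 7.0)]
for left, right, height in agglomerative(points):
    print(left, right, round(height, 3))
```

Each recorded merge corresponds to one join in the tree, and the height at which two groups join reflects how dissimilar they are.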

* Imagine you are performing a k means cluster analysis using Rattle. Describe the steps you would go through to determine the optimal number of clusters.

[4 marks]

To determine the optimal number of k-means clusters in Rattle, the following procedure is used. First, select k centroids by choosing k rows at random. Second, assign each data point to its closest centroid. Third, recalculate each centroid by averaging the data points within its cluster on each of the p variables. Fourth, reassign every data point to the centroid now closest to it. Repeat the third and fourth steps until no observations are reassigned or the maximum number of iterations is reached.
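The iterative steps listed above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the routine Rattle itself runs: the function name `kmeans`, the `seed` argument, and the sample data are hypothetical, and the value of k is supplied by the caller.

```python
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Plain k-means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    # Step 1: choose k rows at random as the initial centroids.
    centroids = [tuple(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Steps 2 and 4: assign each point to its closest centroid
        # (squared Euclidean distance).
        new_labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Stop when no observation changes cluster.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: recompute each centroid as the mean of its members
        # on each of the p variables.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    return labels, centroids

# Hypothetical sample data: two well-separated groups of three points.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
          (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
labels, centroids = kmeans(points, k=2)
print(labels)
print(centroids)
```

Which group receives label 0 depends on the random starting rows, so only the grouping itself is meaningful, not the label values.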

* Describe the measures/characteristics you would use to evaluate a k means cluster analysis you have created in Rattle.

[4 marks]

Euclidean distance: This is the geometric distance in multidimensional space. It is the most commonly used type of distance and is computed as $\text{distance}(x, y) = \left\{ \sum_i (x_i - y_i)^2 \right\}^{1/2}$. This measure is computed from the raw data.

Squared Euclidean distance: To place progressively greater weight on objects that are further apart, the standard Euclidean distance is squared. It is computed as $\text{distance}(x, y) = \sum_i (x_i - y_i)^2$.

Percentage disagreement: This measure is useful in determining if the dimensions captured in the analysis are categorical i...
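The two Euclidean measures defined above can be checked with a short Python sketch. The function names and the sample vectors are hypothetical and chosen only for illustration.

```python
import math

def euclidean(x, y):
    # {sum_i (x_i - y_i)^2}^(1/2)
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def squared_euclidean(x, y):
    # sum_i (x_i - y_i)^2
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))

x, y = (1.0, 2.0), (4.0, 6.0)
print(euclidean(x, y))          # 5.0
print(squared_euclidean(x, y))  # 25.0

# Doubling the separation doubles the Euclidean distance but
# quadruples the squared distance, which is the extra weight
# placed on objects that are further apart.
far = (7.0, 10.0)
print(euclidean(x, far))          # 10.0
print(squared_euclidean(x, far))  # 100.0
```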
