Data Warehouses and Data Mining
Name of the Student
Name of the Institution
Both data warehousing and data mining are used as business intelligence tools that transform raw data into actionable information (knowledge). However, they differ in the processes and methods each uses to achieve its objective. The aim of this paper is to establish the general relationship between data warehousing systems and data mining tools. The discussion also covers how data needs to be prepared in the data warehouse before it is used by a data mining tool. In addition, the paper considers how the results of data mining can be returned to the warehouse for broader access or reuse.
Data mining refers to a process of statistical analysis in which the analyst uses technical tools to query and classify terabytes of data in order to identify existing patterns. In this process, the analyst forms a hypothesis, for example, that a customer who purchases product Z normally purchases product T within sixteen weeks. Investigating the relevant data to confirm or reject such a hypothesis is what is referred to as data mining. Once a conclusion is reached, businesses use that information to make decisions based on a better understanding of their customers' behavior. Data warehousing, on the other hand, describes the process of designing how data is stored in order to improve analysis and reporting. Scholars have established that different stores of data are interrelated and interconnected both physically and conceptually (Suciu, Todoran, Ochian, Suciu, & Cropotova, 2014).
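The example hypothesis above can be checked with a few lines of code. The following is a minimal Python sketch under stated assumptions: the purchase records, the customer IDs, and the function name `share_buying_later` are hypothetical, and each customer is measured from their earliest purchase of Z. It only computes the share of Z buyers who went on to buy T within sixteen weeks; a real study would also compare this share with a baseline before confirming or rejecting the hypothesis.

```python
from datetime import date, timedelta

# Hypothetical purchase records: (customer_id, product, purchase_date)
purchases = [
    ("c1", "Z", date(2024, 1, 1)), ("c1", "T", date(2024, 2, 15)),
    ("c2", "Z", date(2024, 1, 10)), ("c2", "T", date(2024, 8, 1)),
    ("c3", "Z", date(2024, 3, 1)),
    ("c4", "T", date(2024, 3, 5)),
]

def share_buying_later(records, first="Z", later="T", window=timedelta(weeks=16)):
    """Share of customers who bought `first` and then bought `later`
    within `window` of their earliest purchase of `first`."""
    first_purchase = {}   # customer -> earliest date they bought `first`
    later_purchases = {}  # customer -> every date they bought `later`
    for customer, product, day in records:
        if product == first:
            if customer not in first_purchase or day < first_purchase[customer]:
                first_purchase[customer] = day
        elif product == later:
            later_purchases.setdefault(customer, []).append(day)
    if not first_purchase:
        return 0.0  # no Z buyers, so nothing to test
    hits = sum(
        1
        for customer, start in first_purchase.items()
        if any(start <= d <= start + window for d in later_purchases.get(customer, []))
    )
    return hits / len(first_purchase)

print(f"{share_buying_later(purchases):.2f}")  # prints 0.33
```

Only c1 bought T inside the window; c2 bought it too late, c3 never did, and c4 is ignored because they never bought Z.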
For safekeeping, business data is usually stored in several different databases. However, to ensure that a wide range of data is covered during analysis, these databases need to be interconnected, and the data they hold therefore needs to be related and relevant. The physical databases also need to be connected so that the analyst can examine the data together, especially for reporting purposes. The core of the connection between data warehousing and data mining is that well-warehoused data is easier to mine. For example, if a data warehouse specialist designs an information storage system that is well integrated with the databases containing relevant data, the data miner can run efficient queries that help improve business performance (Sadalage & Fowler, 2012).
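To illustrate why well-warehoused data is easier to mine, the sketch below loads extracts from two separate sources (a customer system and a sales system) into one warehouse and then answers a question that spans both with a single query. It uses Python's built-in `sqlite3` module with an in-memory database; the table names `dim_customer` and `fact_sales` follow a common star-schema naming style but are assumptions here, as is all of the sample data.

```python
import sqlite3

# Hypothetical extracts from two separate operational databases.
crm_customers = [(1, "North"), (2, "South"), (3, "North")]  # (customer_id, region)
sales_orders = [(1, "Z", 120.0), (2, "T", 80.0), (3, "Z", 60.0), (1, "T", 40.0)]  # (customer_id, product, amount)

def build_warehouse(customers, orders):
    """Load both extracts into one in-memory warehouse linked by customer_id."""
    wh = sqlite3.connect(":memory:")
    wh.execute("CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, region TEXT)")
    wh.execute(
        "CREATE TABLE fact_sales (customer_id INTEGER REFERENCES dim_customer(customer_id), "
        "product TEXT, amount REAL)"
    )
    wh.executemany("INSERT INTO dim_customer VALUES (?, ?)", customers)
    wh.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", orders)
    wh.commit()
    return wh

def revenue_by_region(wh):
    """A question spanning both sources, answered with a single join."""
    rows = wh.execute(
        "SELECT c.region, SUM(s.amount) "
        "FROM fact_sales AS s JOIN dim_customer AS c ON s.customer_id = c.customer_id "
        "GROUP BY c.region ORDER BY c.region"
    ).fetchall()
    return dict(rows)

warehouse = build_warehouse(crm_customers, sales_orders)
print(revenue_by_region(warehouse))  # {'North': 220.0, 'South': 80.0}
```

Because both sources share the `customer_id` key once they are in the warehouse, the cross-source question becomes one join rather than a manual reconciliation of two databases.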
Preparation of Data in the Data Warehouse Before Using a Mining Tool
Preparing data in the data warehouse has two main objectives: making the data easy for the mining tool to use and preventing problems as far as possible. Data preparation is time-consuming because of the clarity it demands, and it is always vulnerable to error. It involves several stages: logging the data, scrutinizing it for accuracy, keying it into the computer, converting it, and establishing and documenting a database system that can integrate different measures (Sudheer et al., 2013).
Research projects obtain their data from different sources and at different times. Sources include observations, interviews, and surveys. To ensure continuity, there must be a clear, established process for logging incoming data and tracking it until enough has been collected for a comprehensive analysis. One way to keep track of incoming data is to create a database, which makes it possible to assess the data already available and determine which data is appropriate. Such a database can be created with computerized database programs such as Claris FileMaker, Microsoft Access, or Excel, or with statistical packages such as Minitab, SAS, SPSS, and Data Desk. Creating and maintaining an original log of incoming data is an essential step in data preparation.
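A minimal sketch of an intake log, assuming a SQLite table rather than any of the programs named above: each incoming batch is recorded with its source, date received, and record count, so the analyst can see how much data has arrived and whether it is enough for analysis. The table `intake_log`, its columns, the batch IDs, and the target size are hypothetical.

```python
import sqlite3
from datetime import date

def create_log(conn):
    """Create the intake log; the CHECK limits sources to the three named in the text."""
    conn.execute("""
        CREATE TABLE IF NOT EXISTS intake_log (
            batch_id     TEXT PRIMARY KEY,
            source       TEXT NOT NULL
                         CHECK (source IN ('observation', 'interview', 'survey')),
            received_on  TEXT NOT NULL,
            record_count INTEGER NOT NULL CHECK (record_count >= 0)
        )""")

def log_batch(conn, batch_id, source, received_on, record_count):
    """Record one incoming batch; a repeated batch_id raises sqlite3.IntegrityError."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO intake_log VALUES (?, ?, ?, ?)",
            (batch_id, source, received_on.isoformat(), record_count),
        )

def enough_for_analysis(conn, target):
    """True once the logged record count reaches the (hypothetical) target size."""
    (total,) = conn.execute(
        "SELECT COALESCE(SUM(record_count), 0) FROM intake_log").fetchone()
    return total >= target

conn = sqlite3.connect(":memory:")
create_log(conn)
log_batch(conn, "B001", "survey", date(2024, 5, 2), 40)
log_batch(conn, "B002", "interview", date(2024, 5, 9), 12)
print(enough_for_analysis(conn, target=50))  # 52 records logged, so True
```

Letting the database reject duplicate batches and unknown sources keeps the original log trustworthy without relying on manual checks.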
Once data is received, it is scrutinized for accuracy. Several questions need to be answered at this stage: Are all responses complete? Have all important questions been answered? Is all relevant contextual information, such as the place, time, researcher, and date, included? Are all responses legible? Ensuring that the data is accurate enables the data miner to retrieve accurate information and therefore make appropriate decisions. After this scrutiny, establishing a database is another essential stage; it involves developing the store in which the study data will be kept. A strategy similar to the one used for logging data may be used (Sudheer, Reddy, & Sitaramu, 2013).
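The accuracy questions above can be turned into an automated first-pass check. The sketch below is illustrative only: the question IDs, the context fields, and the `?` marker for an illegible answer are hypothetical conventions, and a human reviewer would still need to resolve whatever it flags.

```python
# Hypothetical conventions, for illustration only.
REQUIRED_QUESTIONS = ["q1", "q2", "q3"]           # the "important" questions
CONTEXT_FIELDS = ["place", "date", "researcher"]  # contextual information to check
ILLEGIBLE_MARKER = "?"                            # what a coder types for an unreadable answer

def _text(record, key):
    """Field value as stripped text; a missing key becomes an empty string."""
    value = record.get(key)
    return "" if value is None else str(value).strip()

def scrutinize(record):
    """Return a list of problems found in one response record (a dict)."""
    problems = []
    for q in REQUIRED_QUESTIONS:
        answer = _text(record, q)
        if answer == "":
            problems.append(f"missing answer: {q}")
        elif answer == ILLEGIBLE_MARKER:
            problems.append(f"illegible answer: {q}")
    for field in CONTEXT_FIELDS:
        if _text(record, field) == "":
            problems.append(f"missing context: {field}")
    return problems

record = {"q1": "4", "q2": "", "q3": "?",
          "place": "Site A", "date": "2024-05-02", "researcher": ""}
for problem in scrutinize(record):
    print(problem)
```

Running it on the sample record reports the blank answer to q2, the illegible answer to q3, and the missing researcher name.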
Developing a database for a research project requires generating a codebook that describes the data and shows how and where it can be retrieved. The codebook should include the variable name, variable description, variable format, variable location in the database, the date the data was collected, the respondent, the data collection method, and any notes. Producing a codebook is essential at this stage, especially to help the analysis team work effectively (Ochian, Suciu, Fratu, & Suciu, 2014).
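A codebook can also be kept in machine-readable form so that analysts can look variables up programmatically. This Python sketch, assuming a simple in-memory structure, stores one entry per variable with the fields listed above; the class name, field names, and sample entry are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CodebookEntry:
    """One variable's entry; fields mirror the items listed in the text."""
    name: str          # variable name
    description: str   # variable description
    var_format: str    # variable format, e.g. "integer, 1-5"
    location: str      # where the variable is stored (table.column)
    collected_on: str  # date the data was collected
    respondent: str    # who the respondent was
    method: str        # data collection method
    notes: str = ""

def build_codebook(entries):
    """Index entries by name, rejecting duplicate variable names."""
    codebook = {}
    for entry in entries:
        if entry.name in codebook:
            raise ValueError(f"duplicate variable name: {entry.name}")
        codebook[entry.name] = entry
    return codebook

codebook = build_codebook([
    CodebookEntry("sat_q1", "Overall satisfaction with service", "integer, 1-5",
                  "tbl_survey.sat_q1", "2024-05-02", "store customer", "survey",
                  notes="-99 = missing"),
])
print(codebook["sat_q1"].location)  # tbl_survey.sat_q1
```

Rejecting duplicate names at build time means every variable the analysis team looks up resolves to exactly one documented location.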
The next step is entering the data into the computer for analysis. There are several ways to key in data: the most common is simply typing it, while the most accurate is double entry. In double entry, the data is entered once, and then a dedicated program is used to enter it a second time and check the second entry against the first. Double entry reduces the likelihood of error by a significant margin.
The next step is data conversion. Once the data has been entered and checked for errors, it is transformed from raw data into variables, which are essential to the analysis process. One common conversion concerns missing values: some analysis programs treat blank fields as missing, while others require a specific value to be entered. In either case, a key must document which value designates a missing value. Another conversion is item reversal. Scales and surveys sometimes include reverse-worded items to reduce the risk of a response set, and those items must be reversed before analysis so that all items in a scale point in the same direction: a high score should mean the same thing on every item, and so should a low score. Other conversions include creating categories and computing scale totals (Connolly & Begg, 2015).
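The entry and conversion steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions: `-99` as the documented missing-value code, a 1 to 5 rating scale, which item is reverse-worded, and the choice to leave a scale total missing when any item is missing are all hypothetical conventions that a real project would record in its codebook. The double-entry check only reports where the two keyings disagree; the disagreements would then be resolved against the original forms.

```python
from itertools import zip_longest
from typing import List, Optional

MISSING_CODE = -99           # hypothetical documented missing-value code
SCALE_MIN, SCALE_MAX = 1, 5  # hypothetical 1-5 rating scale

def double_entry_mismatches(first, second):
    """(row, column) positions where two independent keyings of the same forms differ."""
    mismatches = []
    for r, (row_a, row_b) in enumerate(zip_longest(first, second, fillvalue=[])):
        for c, (a, b) in enumerate(zip_longest(row_a, row_b)):
            # A cell present in only one keying counts as a mismatch too.
            if a is None or b is None or a.strip() != b.strip():
                mismatches.append((r, c))
    return mismatches

def to_variable(raw: str) -> Optional[int]:
    """Convert a keyed value to an int; a blank or the missing code becomes None.
    Non-numeric text raises ValueError so it can be checked by hand."""
    raw = raw.strip()
    if raw == "" or int(raw) == MISSING_CODE:
        return None
    return int(raw)

def reverse_item(score: Optional[int]) -> Optional[int]:
    """Reverse-score an item (1<->5, 2<->4) so high values point the same way as other items."""
    return None if score is None else SCALE_MIN + SCALE_MAX - score

def scale_total(scores: List[Optional[int]]) -> Optional[int]:
    """Sum the items; leave the total missing if any item is missing (one possible policy)."""
    return None if any(s is None for s in scores) else sum(scores)

first_entry = [["4", "2", ""], ["5", "1", "3"]]
second_entry = [["4", "2", ""], ["5", "4", "3"]]
print(double_entry_mismatches(first_entry, second_entry))  # [(1, 1)]

# Suppose the form confirms the first keying of row 1, and item 2 is reverse-worded.
items = [to_variable(v) for v in first_entry[1]]
items[1] = reverse_item(items[1])
print(scale_total(items))  # 5 + 5 + 3 = 13
```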
Establishing and documenting a database system requires implementing agreed guidelines, such as a convention for naming database objects. The analysts involved must agree on a naming convention; otherwise, it will be difficult to identify database objects correctly. Similarly, questions about database security, performance, responsibility for running the database, compatibility, and expected network connectivity must be addressed during the databas...
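A naming convention is easiest to enforce when it is checked automatically. The sketch below assumes a hypothetical convention (lowercase snake_case names with a `tbl_`, `vw_`, or `idx_` prefix for tables, views, and indexes) purely for illustration; the paper does not specify a particular convention, and the object names are made up.

```python
import re

# Hypothetical convention: lowercase snake_case, prefixed by object type.
PREFIXES = {"table": "tbl_", "view": "vw_", "index": "idx_"}
SNAKE_CASE = re.compile(r"[a-z][a-z0-9]*(?:_[a-z0-9]+)*")

def follows_convention(object_type, name):
    """True if `name` has the prefix for its type and is lowercase snake_case."""
    prefix = PREFIXES.get(object_type)
    if prefix is None:
        return False  # unknown object types are flagged for manual review
    return name.startswith(prefix) and SNAKE_CASE.fullmatch(name) is not None

def violations(objects):
    """Names from (object_type, name) pairs that break the convention."""
    return [name for object_type, name in objects
            if not follows_convention(object_type, name)]

catalog = [
    ("table", "tbl_survey_responses"),
    ("table", "SurveyData"),
    ("view", "vw_region_sales"),
    ("index", "idx_Customer"),
]
print(violations(catalog))  # ['SurveyData', 'idx_Customer']
```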