CRISP-DM: Modeling and Evaluation Analysis Essay Sample

Essay Instructions:

Must mention GE and CRISP-DM.

Overview: Continuing from your Milestone Two, you will move to the next phases of CRISP-DM: Modeling and Evaluation. These phases begin the iterative steps of CRISP-DM where you model, evaluate results, and then iterate on any of the prior phases of Business Understanding, Data Understanding, or Data Preparation to make decisions about how to adjust the model based on results. This is a continuous loop of iterations until a final design is determined.

Prompt: In this milestone, you begin with describing what model you will be using and what your expected results are prior to running the model. Run the model and capture your notes and outputs. In a systematic method, define the change and expected result and run the model. Continue this method until a final model has been determined. In this milestone, include your test iteration discussion and results, which support your final model design. Use charts and statistical model output to support your model.

Implementing your revised data analytic plan in the third milestone will prepare you for completing the first capstone component, the data analytic presentation.

In the third milestone, you will summarize the implementation of your revised plan.

The draft should include any artifacts (e.g. graphs and tables) that describe the results of the process: data quality information, description of data structure, statistics generated by model building steps, and a discussion of the final model. Submit this draft to your instructor for feedback.

If you have any questions after reading through the feedback on this milestone, reach out to your instructor. Remember that your instructor is a resource you should utilize throughout the course.

While you must reflect on your prior coursework, your submission must consist only of DAT 690 coursework to avoid self-plagiarism. Make sure to include the following critical elements in your paper.

Guidelines for Submission: Your paper must be submitted as a two- to three-page Microsoft Word document with double spacing, 12-point Times New Roman font and one-inch margins. Be sure to cite any sources in APA format.

Essay Sample Content Preview:

Milestone 3
DAT 690
Student Names
CRISP-DM: Modeling and Evaluation
The milestone describes the selected model and the expected results. Next, the model will be run, and the captured results will be documented. An iterative modeling process will be used to derive results that support the final model design. Lastly, charts and statistical model outputs will be captured and documented.
Project Overview
The Cross-Industry Standard Process for Data Mining (CRISP-DM) was followed throughout data preparation and transformation. The CRISP-DM activities applied were data description, preparation, cleaning and integration, and reformatting. The data were summarized using descriptive statistics and correlation analysis in RStudio, which helped to collect, collate, and analyze the GE employee dataset.
GE Dataset Evaluation Process
The dataset had to be cleaned and reformatted for ease of use in RStudio. Preparation of the employee dataset followed the iterative CRISP-DM process. The external dataset used was prone to errors that adversely affected its quality, accuracy, and reliability. First, to support modeling, the categories were converted into numeric values for ease of input into RStudio (Mohd Selamat et al., 2018). This step used the transform command in Rattle, while the second data cleaning step involved ridding the dataset of unnecessary details. The variables obtained were hires, layoffs_discharges, other_separations, and quit, all of which accept numeric input values. Based on these variables, linear modeling was used to determine attrition levels for GE employees. However, some variables, such as hires, were found to have no significant impact on the prediction of employee attrition.
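The category-to-numeric conversion described above can be illustrated with a short sketch. This is not the Rattle transform used in the essay; it is a plain Python illustration, and the function name encode_categories and the sample labels are hypothetical, chosen only to mirror the variable names in the text.

```python
def encode_categories(values):
    """Map each distinct category label to an integer code.

    Codes are assigned in order of first appearance, so the mapping
    is deterministic for a given input order.
    """
    codes = {}
    encoded = []
    for value in values:
        if value not in codes:
            codes[value] = len(codes)
        encoded.append(codes[value])
    return encoded, codes


# Hypothetical labels for illustration only; not rows from the GE dataset.
labels = ["quit", "layoffs_discharges", "quit", "other_separations"]
encoded, mapping = encode_categories(labels)
print(encoded)   # integer codes, one per input label
print(mapping)   # label -> code lookup table
```

Keeping the lookup table alongside the encoded column makes it possible to translate model output back into the original category names.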
The data cleaning phase was applied iteratively to the variables in the dataset. Changes to the data element column involved recoding the input values as Quit [1] and All other [0]. Data cleaning also entailed recoding other categories such as other_separations and layoffs_discharges. Next, actual data from the GE employee dataset were used as input to the linear model. This eliminated inadvertent gaps, thereby increasing the accuracy and reliability of the model (Exenberger & Bucko, 2020).
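As a rough illustration of the steps in this paragraph, the sketch below recodes a label column as Quit [1] and All other [0], fits a one-predictor least-squares line, and summarizes the predictions as an error matrix of proportions. It is a minimal Python sketch, not the Rattle workflow used in the essay. The predictor tenure_years, the 0.5 cutoff, and all data values are assumptions introduced purely for illustration.

```python
def recode_quit(labels):
    """Recode labels as 1 for "Quit" and 0 for everything else."""
    return [1 if label == "Quit" else 0 for label in labels]


def fit_simple_linear(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def error_matrix_proportions(actual, predicted):
    """Return {(actual, predicted): share of all rows} for binary labels."""
    n = len(actual)
    counts = {(a, p): 0 for a in (0, 1) for p in (0, 1)}
    for a, p in zip(actual, predicted):
        counts[(a, p)] += 1
    return {cell: count / n for cell, count in counts.items()}


# Invented example rows; "tenure_years" is a hypothetical predictor.
tenure_years = [1, 2, 3, 4, 5, 6, 7, 8]
labels = ["Quit", "Quit", "Quit", "All other",
          "Quit", "All other", "All other", "All other"]

actual = recode_quit(labels)
intercept, slope = fit_simple_linear(tenure_years, actual)

# Assumed cutoff: a fitted value of at least 0.5 is read as a predicted quit.
predicted = [1 if intercept + slope * x >= 0.5 else 0 for x in tenure_years]
matrix = error_matrix_proportions(actual, predicted)
print(matrix)
```

The diagonal cells (0, 0) and (1, 1) hold the share of correctly classified rows, and the off-diagonal cells hold the two kinds of error, which is how a proportions-style error matrix is read.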
Figure 1: Linear Model on Attrition [Validate/Proportions]
Based on the GE dataset, out of 164 hires, there were 133 total separations, including layoffs (24), other separations (77), and quits (48). The error matrix for the linear model on attrition was derived for variables: Hires, Layoff...