Modeling and Evaluation: GE Employee Attrition
Use Case: GE Employee Attrition
Dataset and rubric are attached
Critical Elements:
CRISP-DM Modeling Phase: Include artifacts (e.g., graphs and tables) explaining the preliminary results from the model.
CRISP-DM Modeling Phase: Assess data quality produced by implementation of analytic plan.
CRISP-DM Modeling Phase: Analyze data structure.
CRISP-DM Evaluation Phase: Evaluate how well the model worked.
CRISP-DM Evaluation Phase: Identify any areas of concern.
CRISP-DM Evaluation Phase: Describe statistics generated from the various stages of model building that describe the model’s fit and its ability to accurately depict the data.
CRISP-DM Evaluation Phase: Describe statistics generated from the various stages of model building that describe the model results.
Articulation of Response: Submission has no major errors related to citations, grammar, spelling, syntax, or organization.
Employee Attrition
Author
Affiliation
Course
Instructor
Due Date
Employee Attrition
Modeling
Model selection
According to Smart Vision Europe (n.d.), modeling is the fourth stage in the cross-industry standard process for data mining (CRISP-DM) methodology. For this case, we applied the random forest technique. A single decision tree tends to have high variance and low predictive accuracy; a random forest combines many such weak trees and aggregates their predictions into one overall prediction, resulting in a more robust and fairly accurate model. Random forests are a modification of bagging that produces a large collection of decorrelated trees, and they have become a widely used “out-of-the-box” learning algorithm with good predictive performance. Bagging adds a random element to the tree-building process by growing each tree on a bootstrap sample of the data, which lowers the variance of a single tree’s prediction and improves predictive performance. However, in bagging, the trees are not entirely independent of one another because all of the original predictors are considered at every split. As a result, trees grown from different bootstrap samples tend to have similar structures. Random forests reduce this correlation by considering only a random subset of the predictors at each split.
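The idea of combining bootstrap samples, random predictor subsets, and majority voting can be illustrated with a minimal sketch. It is not the model used in this analysis: the trees are one-split "stumps", the two predictors (years at company, overtime hours) and all data values are made up for illustration, and the function names are hypothetical.

```python
import random
from collections import Counter

def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement (the bagging step)."""
    return [rng.choice(rows) for _ in rows]

def fit_stump(rows, feature_indices):
    """Fit a one-split tree: keep the (feature, threshold) with the fewest errors."""
    best = None
    for f in feature_indices:
        for threshold in sorted({x[f] for x, _ in rows}):
            left = [y for x, y in rows if x[f] <= threshold]
            right = [y for x, y in rows if x[f] > threshold]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = (sum(y != left_label for y in left)
                      + sum(y != right_label for y in right))
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    _, f, threshold, left_label, right_label = best
    return lambda x: left_label if x[f] <= threshold else right_label

def fit_forest(rows, n_trees, n_features_per_split, seed=0):
    """Grow each stump on its own bootstrap sample and random predictor subset.

    A stump has a single split, so choosing the subset once per tree is the
    same as choosing it per split, which is what a random forest does.
    """
    rng = random.Random(seed)
    n_features = len(rows[0][0])
    forest = []
    for _ in range(n_trees):
        sample = bootstrap_sample(rows, rng)
        features = rng.sample(range(n_features), n_features_per_split)
        forest.append(fit_stump(sample, features))
    return forest

def predict(forest, x):
    """Aggregate the trees by majority vote."""
    votes = Counter(tree(x) for tree in forest)
    return votes.most_common(1)[0][0]

# Made-up rows: ((years_at_company, overtime_hours), attrition_label)
toy_rows = [
    ((1, 12), "Yes"), ((2, 10), "Yes"), ((1, 9), "Yes"), ((3, 11), "Yes"),
    ((8, 2), "No"), ((10, 1), "No"), ((7, 3), "No"), ((9, 0), "No"),
]
forest = fit_forest(toy_rows, n_trees=25, n_features_per_split=1)
print(predict(forest, (1, 12)), predict(forest, (10, 0)))
```

Each stump sees only one of the two predictors and a resampled copy of the data, so the individual trees differ from one another, and the vote across all 25 determines the final label.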
Before training the model, the data were prepared by one-hot encoding the categorical variables. One-hot encoding transforms each categorical variable into a set of numeric 0/1 indicator columns, one per category; as a result, the model can process the data and use it for prediction.
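A minimal sketch of one-hot encoding follows. The column names "Age" and "Department" and their values are assumptions chosen for illustration, not fields confirmed from the dataset, and the helper function is hypothetical.

```python
def one_hot_encode(records, column):
    """Replace `column` with one 0/1 indicator column per distinct category."""
    categories = sorted({r[column] for r in records})
    encoded = []
    for r in records:
        # Copy every other column unchanged.
        row = {k: v for k, v in r.items() if k != column}
        # Add one indicator per category; exactly one of them is 1.
        for c in categories:
            row[f"{column}_{c}"] = 1 if r[column] == c else 0
        encoded.append(row)
    return encoded

# Hypothetical records for illustration only
records = [
    {"Age": 41, "Department": "Sales"},
    {"Age": 49, "Department": "Research"},
    {"Age": 37, "Department": "Sales"},
]
encoded = one_hot_encode(records, "Department")
for row in encoded:
    print(row)
```

The single text column becomes two numeric columns, "Department_Research" and "Department_Sales", while the numeric "Age" column passes through untouched.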
In addition, the dataset was further pre-processed by splitting it into training and test sets. The model was trained on 80% of the data, while the remaining 20% was held out for testing.
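The 80/20 split can be sketched as below. The shuffle, the seed value, and the function name are assumptions; the essay does not state how rows were assigned to each set.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows, then hold out `test_fraction` for testing."""
    rng = random.Random(seed)      # fixed seed (an assumption) for reproducibility
    shuffled = rows[:]             # leave the caller's list untouched
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Stand-in data: 100 row identifiers instead of real employee records
rows = list(range(100))
train, test = train_test_split(rows)
print(len(train), len(test))
```

Shuffling before the cut avoids holding out a block of rows that happen to be adjacent in the file, and every row lands in exactly one of the two sets.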
In setting the parameters, we used the default values. According to the RDocumentation (n.d.) entry for Random Forest, the critical parameters include the number of trees, which defaults to 500. The target variable, as mentioned earlier, was “Attrition.” The target labels did not need en...