Passenger Prediction Survival in Titanic
Project: You can work with up to two other classmates on this project. Your immediate assignment is to decide the members of your group. You may also work alone on the project.
Note: Depending on class format changes, you may have to work alone on the project.
Due Date: April 17, 2020. Late submissions will be accepted until April 21.
- Only one member of the group should submit the project.
Basic description: Choose an analytical technique within the scope of this course. Demonstrate the technique using example or real data, and show how to perform it using R. Organize the project material along the lines of a 15-minute presentation covering the data analysis, an explanation of the methods used, and a discussion of the results. Summarize the material in the form of a handout, preferably a Word document or PowerPoint presentation. Projects will be posted on Canvas.
Use the following writing guide for help with the content of the minimum sections to include (abstract, methods, results, discussion, literature cited):
(Accessed on March 3, 2020)
http://abacus.bates.edu/~ganderso/biology/resources/writing/HTWsections.html
The sections appear in a journal-style paper in the following prescribed order:
Experimental process | Section of paper
What did I do in a nutshell? | Abstract (about 3 sentences)*
What is the problem? | Introduction
How did I solve the problem? | Materials and Methods*
What did I find out? | Results*
What does it mean? | Discussion*
Who helped me out? | Acknowledgments (optional)
Whose work did I refer to? | Literature Cited (give credit for anyone’s work you use)*
Extra information | Appendices (optional)
* Required section for the project. Also include a cover page or slide with the title, the members of the project team, and the date.
See the listing below for examples of projects completed in previous years.
Be creative. You can choose a topic we have covered, will cover, or will not cover. Search for “tutorial” on www.rseek.org for examples; there are over 226,000,000 hits.
COMPUTING & GRAPHICS IN APPLIED STATISTICS
Listing of Sample Student Projects Completed Previously
1. Analysis of Automobile Accident Rates in Minnesota
2. Robust Regression in R
3. Model Selection in R for the 2014 NFL Draft
4. Finding Influential Observations
5. Testing for Collinearity
6. Birthrate and Economic Development
7. Model Selection
8. Statistical Analysis of Prostate Data using R
9. Robust Regression
10. Diagnostics
11. Univariate Displays and Model Selection
12. Creating a Prediction of Time Series Data Using R
13. Univariate Displays of Data
14. Monte Carlo Simulation of Craps Using R
15. Linear Regression
16. Modeling Arsenic Level in Bangladesh’s Groundwater
17. Scatter, Stem-and-Leaf, Histogram, Box and Whisker, Ellipse, Residual, Quantile-Quantile Plots in R
18. Wine Quality
19. Univariate Displays of Data in both R
20. Scatter Plot Matrices in R
21. MPG of Cars in Response to Horsepower and Weight
22. Model Selection and Regression Diagnostics
23. Monte Carlo Simulation of Craps
24. Means-Based Permutation Test
25. Model Selection
26. t-test SAS procedures
27. Monte Carlo Simulations
28. Bootstrapping: An Introduction
29. Model Selection in R
30. Regression Model Building
31. Resampling Methods
32. Outlier Detection and its Statistical Considerations
33. What are the determinants of face-to-face medical visits?
34. Regression Analysis: Highway Accident Rate
35. A Study of Residual Analysis
36. Bootstrapping by Resampling Residuals
Passenger Prediction Survival in Titanic Using R
Student’s Name
Institutional Affiliation
Abstract
The main aim of this paper was to predict the survival of the people who boarded the Titanic, using Titanic data available on GitHub. A decision tree model was used to investigate whether a given passenger would have survived the sinking. The results are presented and discussed, and meaningful conclusions are drawn from them.
Keywords: results, Titanic, survival, passenger, data.
Passenger Prediction Survival in Titanic Using R
Introduction
On April 15, 1912, the RMS Titanic sank after colliding with an iceberg. Of the 2,224 people on board, 1,502 died. One reason for the heavy loss of life was that there were not enough lifeboats for everyone on board, and the disaster led to better safety regulations for ships. Although there was some element of luck involved in surviving the sinking, some groups of people were more likely to survive than others, for example women, children, and the upper class. This paper examines which passenger characteristics help predict survival.
Materials and Methods
The Titanic survival data were available for download from the Rdatasets collection (https://vincentarelbundock.github.io/Rdatasets/datasets.html). Using R, the data were numerically and graphically summarized, inspected, cleaned, and eventually analyzed. The dataset had 1,309 rows and five columns: the passenger’s name, survival status, sex, age, and passenger class (Agrawal, 2018). All variables except age were categorical. The age variable contained 263 missing values (NAs); as part of cleaning the data, these were replaced with the mean of the age variable. After the columns were renamed, the final dataset was ready for analysis.
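The cleaning step can be illustrated with a short sketch. The paper performed it in R; the block below is a minimal Python version of the same mean imputation, assuming missing ages are stored as None. The helper name impute_mean and the toy ages are hypothetical and are not taken from the Titanic data.

```python
from statistics import mean
from typing import List, Optional, Tuple


def impute_mean(values: List[Optional[float]]) -> Tuple[List[float], float]:
    """Replace missing entries (None) with the mean of the observed entries.

    Returns the filled-in list and the mean that was used as the fill value.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: every value is missing")
    fill = mean(observed)
    return [fill if v is None else v for v in values], fill


# Toy ages for illustration only (not the Titanic data); None marks an NA.
ages = [22.0, None, 38.0, 26.0, None, 35.0]
n_missing = sum(v is None for v in ages)
filled, fill_value = impute_mean(ages)
print(n_missing)   # 2
print(fill_value)  # 30.25
print(filled)      # [22.0, 30.25, 38.0, 26.0, 30.25, 35.0]
```

In R, the same step is commonly written as x[is.na(x)] <- mean(x, na.rm = TRUE), where x stands for the age column. Mean imputation is simple, but because every missing value receives the same number, it reduces the spread of the variable, which is worth keeping in mind when 263 of 1,309 ages are filled in this way.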
Results
First, the data were visualized using the plot() function in R. The plots below show age, sex, and passenger class against survival status. The passengers’ names column was not plotted because it was considered unimportant for further analysis.
As shown in the first plot (left), the proportion of female passengers who survived was higher than the proportion of male passengers who survived. In other words, women were more likely than men to survive (Agrawal, 2018). The middle plot shows that most survivors were aged between 22 and 37, while most of those who did not survive were aged between 22 and 35. Many first-class passengers survived, whereas few third-class passengers did. About half of the second-class passengers survived, while the rest died.
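The sex comparison above is a comparison of proportions rather than counts: the share of each group that survived. A minimal Python sketch of that calculation, assuming each record is a (group, survived) pair with survived coded 0 or 1; the function name survival_rate_by_group and the toy records are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def survival_rate_by_group(rows: Iterable[Tuple[str, int]]) -> Dict[str, float]:
    """Return the fraction of survivors (survived == 1) within each group."""
    totals: Dict[str, int] = defaultdict(int)
    survivors: Dict[str, int] = defaultdict(int)
    for group, survived in rows:
        totals[group] += 1
        survivors[group] += survived
    return {g: survivors[g] / totals[g] for g in totals}


# Toy (sex, survived) records for illustration only; not the Titanic data.
records = [("female", 1), ("female", 1), ("female", 0), ("female", 1),
           ("male", 0), ("male", 1), ("male", 0), ("male", 0), ("male", 0)]
rates = survival_rate_by_group(records)
print(rates)  # {'female': 0.75, 'male': 0.2}
```

The same function applies to passenger class by passing (class, survived) pairs. Comparing rates rather than raw counts matters whenever the groups differ in size.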
Before modeling, the data were partitioned into a training set of 891 rows and a test set of 418 rows. Then, using the rpart R package, a regression tree was fit to the training data, excluding the passengers’ names variable. Survival status was the response variable, and age, sex, and passenger class were the predictors. The fitted model was then applied to the test data to predict each passenger’s survival, where 0 meant did not survive and 1 meant survived. Each predicted value is the mean of the 0/1 response among the training passengers in the corresponding leaf of the tree, so it lies between 0 and 1. The first six predicted values were:
892 893 894 895 896 897
0.8885714 0.8885714 0.8885714 0.2592593 0.5000000 0.8885714
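The decimals above follow from fitting a regression tree to a 0/1 response: each leaf predicts the mean response of the training rows that fall into it, which here is the share of survivors. A minimal Python sketch of that idea, under simplifying assumptions: a single split (a "stump"), only "level versus the rest" binary splits on categorical features, and toy data. The names fit_stump, predict, and the toy rows are hypothetical; this is not rpart's implementation.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Row = Dict[str, str]
Stump = Tuple[str, str, float, float]  # (feature, level, left_mean, right_mean)


def leaf_mean(ys: List[int]) -> float:
    """Leaf prediction: the mean 0/1 response, i.e. the share of survivors."""
    return sum(ys) / len(ys)


def sse(ys: List[int]) -> float:
    """Sum of squared errors of the responses around their leaf mean."""
    m = leaf_mean(ys)
    return sum((y - m) ** 2 for y in ys)


def fit_stump(xs: Sequence[Row], ys: Sequence[int]) -> Stump:
    """Choose the one binary split "feature == level" versus everything else
    that minimizes the total SSE of the two resulting leaves."""
    best: Optional[Stump] = None
    best_sse = float("inf")
    for feature in xs[0]:
        for level in sorted({x[feature] for x in xs}):
            left = [y for x, y in zip(xs, ys) if x[feature] == level]
            right = [y for x, y in zip(xs, ys) if x[feature] != level]
            if not left or not right:
                continue  # a split must send rows to both leaves
            total = sse(left) + sse(right)
            if total < best_sse:  # strict: ties keep the first split found
                best_sse = total
                best = (feature, level, leaf_mean(left), leaf_mean(right))
    if best is None:
        raise ValueError("no feature has two or more levels to split on")
    return best


def predict(stump: Stump, x: Row) -> float:
    """Return the mean of whichever leaf the row falls into."""
    feature, level, left_mean, right_mean = stump
    return left_mean if x[feature] == level else right_mean


# Toy training rows for illustration only (not the Titanic data).
train_x = [
    {"sex": "female", "pclass": "1"}, {"sex": "female", "pclass": "2"},
    {"sex": "female", "pclass": "3"}, {"sex": "female", "pclass": "3"},
    {"sex": "male", "pclass": "1"}, {"sex": "male", "pclass": "2"},
    {"sex": "male", "pclass": "3"}, {"sex": "male", "pclass": "3"},
]
train_y = [1, 1, 1, 0, 0, 0, 0, 0]  # 1 = survived, 0 = did not survive

stump = fit_stump(train_x, train_y)
print(stump)  # ('sex', 'female', 0.75, 0.0)

test_x = [{"sex": "female", "pclass": "3"}, {"sex": "male", "pclass": "1"}]
print([predict(stump, x) for x in test_x])  # [0.75, 0.0]
```

The stump splits on sex here only because of how the toy rows were constructed. rpart makes the same kind of choice at every node, but it grows deeper trees, searches more ways of grouping categorical levels, and handles numeric predictors such as age with threshold splits.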
The predicted values were then rounded to the nearest whole number, so that a value of 0.5 or above became 1 (survived) and a value below 0.5 became 0 (did not survive). For instance, passenger 892 was predicted to survive because 0.8885714 rounds to 1. Passenger 895 was predicted not to survive because 0.2592593 rounds to 0, and for passenger 896, 0.5 was counted as 1. Lastly, the number of passengers predicted to survive in the test data was totaled, and the survivorship status of the passengers in the test data was used to esti...
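The rounding rule described above amounts to a 0.5 threshold. A minimal Python sketch, applied to the six predicted values reported in this section; the function name classify is hypothetical.

```python
from typing import List, Sequence


def classify(probabilities: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Map each predicted value to 1 (survived) if it is at least the
    threshold, else 0 (did not survive)."""
    return [1 if p >= threshold else 0 for p in probabilities]


# The first six predicted values reported in the paper (rows 892 to 897).
predicted = [0.8885714, 0.8885714, 0.8885714, 0.2592593, 0.5000000, 0.8885714]
labels = classify(predicted)
print(labels)       # [1, 1, 1, 0, 1, 1]
print(sum(labels))  # 5 predicted survivors among these six

# Python's built-in round() rounds halves to even, so 0.5 becomes 0, not 1.
print(round(0.5))   # 0
```

An explicit comparison is used instead of rounding because Python's round() follows the round-half-to-even convention, and R's round() documentation describes the same "go to the even digit" rule, so rounding 0.5 with either function gives 0. The text's rule that 0.5 becomes 1 corresponds to a ">= 0.5" comparison.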