STA437. Predicting the Happiness Score of a Country Using the Explanatory Variables.
Predicting the Happiness Score of a Country Using the Explanatory Variables
Student's Name
Institutional Affiliation
Abstract
This paper uses a set of explanatory variables to predict the happiness score of a country. A further purpose of the research was to assess how happiness is understood across different subjects. The analysis shows that the explanatory variables in the data set make it possible to obtain accurate predictions of the happiness scores of different nations. The methodology is correlational: it investigates how the other nine variables correlate with the happiness scores of the different countries. The data source is the World Happiness Report 2017. A random sample of size n = 100 was drawn from the original data set, and the sample is treated as normally (Gaussian) distributed. The main variables are as follows: Ladder (happiness score), LogGDP (the logarithm of the country's 2017 GDP), Social (national average of the binary responses on social support), HLE (healthy life expectancy), Freedom (national average of satisfaction with the freedom to choose what you do), Generosity (donation regression residual of national averages), Corruption (national average of perceived corruption), Positive (average of positive affect measures), Negative (average of negative affect measures), and Gini (Gini of household income).
Key words: Ladder, LogGDP, Social, HLE, Freedom, Generosity, Corruption, Positive, Negative, Gini, data, happiness score.
Research Topic: Predicting the Happiness Score of a Country Using the Explanatory Variables
Measuring and understanding the levels of happiness among people gives a direct indication of their quality of life. It is a scientific approach in which statistical models are used to measure how variables such as GDP and corruption influence people's way of life. By using data to analyze how people perceive life and whether they are enjoying its benefits, governments, policymakers, activists, and other stakeholders gain a basis for informed decisions on issues such as healthcare, freedom, and civil liberties. In this statistical project, the changes in and determinants of happiness were analyzed, and a model was developed based on people's evaluations of the variables under consideration. The measure presented in this report, however, is the average life evaluation for each nation.
When people are happy, they exhibit strong social relations, low levels of crime, freedom of speech, and respect for human rights, and they report higher levels of life satisfaction. In many developing countries, the level of GDP is directly associated with happiness, because people who experience less financial stress tend to be happier and more satisfied. Other, broader issues in society also influence the level of happiness. The data used in this research include LogGDP, life expectancy, freedom, generosity, government trust, and corruption. These variables are essential for developing a model that can be used to determine happiness levels.
By analyzing this information for each country, we are able to show the average happiness score for each nation. Growth or decline in happiness levels depends on people's satisfaction with core services such as healthcare. While the analysis does not construct the happiness measure for each nation, the model can be applied and generalized to specific cases, provided the variables stated here are taken into account. It is also essential to note that measures of wellbeing, particularly positive emotions, influence levels of happiness. In this regard, the topic matters to decision-makers because it provides a tool for understanding how specific issues can be addressed to enhance quality of life within a nation. The statistical measures outlined in the report are also validated through an analysis of the diverse factors influencing happiness.
The analysis focused on the variables available in the happiness.csv file, which are described in the next section. These factors determine the happiness rank of each country, although some had a much greater influence on happiness than others. Life expectancy and GDP contributed most to each country's happiness score and also had a high positive correlation with it. Other variables with a positive correlation included Generosity, Social, Gini, and Freedom, which indicates that these variables inform people's evaluation of their life satisfaction. The correlation coefficients themselves are reported in the results section, together with a detailed discussion of the countries that ranked highest in happiness.
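The correlation step described above can be sketched as follows. This is a minimal illustration on simulated data, not the report's own code: the data frame `toy`, its values, and the three explanatory columns chosen are assumptions made only to keep the example self-contained.

```r
# Simulated stand-in for the report's data; column names follow the report,
# but every value is random and carries no real-world meaning.
set.seed(1)
n <- 100
toy <- data.frame(
  LogGDP = rnorm(n, mean = 9, sd = 1),
  HLE    = rnorm(n, mean = 63, sd = 7),
  Gini   = runif(n, min = 0.25, max = 0.65)
)
toy$Ladder <- 0.7 * toy$LogGDP + 0.03 * toy$HLE + rnorm(n, sd = 0.5)

# Pearson correlation of each explanatory variable with the happiness score.
explanatory <- setdiff(names(toy), "Ladder")
ladder_cor <- sapply(explanatory, function(v) cor(toy[[v]], toy$Ladder))

# Sort from the strongest positive to the strongest negative correlation.
ladder_cor <- sort(ladder_cor, decreasing = TRUE)
print(round(ladder_cor, 3))
```

With the real sample in place of `toy`, the same calls would rank all nine explanatory variables by their correlation with Ladder.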
Data Description
This statistical project is based on the World Happiness Report 2017. The data were sourced from the appendix of the project and were downloaded from the World Happiness Report 2017. They include the variables Country, Ladder, LogGDP, Social, HLE, Freedom, Generosity, Corruption, Positive, Negative, and Gini. Each variable is defined below:
* Ladder is the happiness score or subjective wellbeing of a country, with values ranging from 0 to 10, where 10 represents the best possible life.
* LogGDP is the logarithm of the nation's Gross Domestic Product (GDP).
* Social represents the national average of the binary responses to the question "If you were in trouble, do you have relatives or friends you can count on to help you whenever you need them, or not?"
* HLE is the healthy life expectancy at birth, based on data reported by the World Health Organization (WHO).
* Freedom is the national average to the question "Are you satisfied or dissatisfied with your freedom to choose what you do with your life?"
* Generosity is the residual of regressing the national average of the response to the question "Have you donated money to charity in the past month?" on GDP per capita.
* Corruption is the national average to the two questions "Is corruption widespread throughout the government or not?" and "Is corruption widespread within businesses or not?"
* Positive is a measure of positive affect as the average of three positive affect measures - happiness, laugh and enjoyment in the Gallup World Poll.
* Negative is a measure of negative affect as the average of three negative affect measures - worry, sadness and anger in the Gallup World Poll.
* Gini is the Gini of household income in international dollars, as reported by Gallup.
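The abstract mentions a random sample of n = 100 drawn from the original data set. One way this could be done is sketched below. The helper `draw_sample`, the seed value, and the toy data frame `full` are hypothetical; the name `data2` only mirrors the object used in the report's `lm()` call, and whether the report built it this way is an assumption.

```r
# Hypothetical helper: draw a reproducible simple random sample of rows.
draw_sample <- function(df, n = 100, seed = 1) {
  stopifnot(n <= nrow(df))
  set.seed(seed)  # placeholder seed, fixed only so the sample is reproducible
  df[sample(nrow(df), size = n), , drop = FALSE]
}

# Toy stand-in for the full data set: 150 made-up rows.
full <- data.frame(
  Country = paste0("C", 1:150),
  Ladder  = runif(150, min = 0, max = 10)
)

# Sample without replacement, so no country appears twice.
data2 <- draw_sample(full, n = 100, seed = 1)
dim(data2)
```

In practice `full` would come from reading the happiness.csv file, for example with `read.csv("happiness.csv")`.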
Statistical Methods, Results, and Synopsis
Exploratory Data Analysis
Missing values in the data set were filled with the column means to avoid inconsistent data, using the dplyr package in R. Outliers and inliers were then checked with boxplots, which make it easier to visualize the data and spot inconsistencies. The sample of 100 observations had no outliers or inliers in any column. Multivariate normality was checked with plots from the MVN package in R, which compare the observed values with the values expected under normality. The plots indicate that the data are normally distributed, since most points fall along the fitted normal quantile line.
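The preprocessing steps above can be sketched in base R as follows. This is an illustration under stated assumptions, not the report's code: the report used dplyr and MVN, whereas this sketch uses only base R to stay dependency-free, the data frame `toy` is simulated, and the chi-square comparison of Mahalanobis distances is a common stand-in for a multivariate normality plot rather than the exact MVN output.

```r
# Toy numeric data with a few missing values (simulated, not the report's data).
set.seed(2)
toy <- data.frame(
  Ladder = rnorm(100, mean = 5.5, sd = 1),
  LogGDP = rnorm(100, mean = 9, sd = 1),
  HLE    = rnorm(100, mean = 63, sd = 7)
)
toy$LogGDP[c(3, 17)] <- NA
toy$HLE[40] <- NA

# Replace each missing value with the mean of its column.
impute_mean <- function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}
imputed <- as.data.frame(lapply(toy, impute_mean))

# Values beyond the boxplot whiskers (1.5 * IQR rule), listed per column.
outliers <- lapply(imputed, function(x) boxplot.stats(x)$out)

# Squared Mahalanobis distances and matching chi-square quantiles;
# under multivariate normality the sorted distances track the quantiles.
m <- as.matrix(imputed)
d2 <- mahalanobis(m, center = colMeans(m), cov = cov(m))
theo <- qchisq(ppoints(nrow(m)), df = ncol(m))
```

Calling `plot(theo, sort(d2))` followed by `abline(0, 1)` draws the quantile comparison, and `boxplot(imputed)` draws the per-column boxplots.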
Figure 1: Distribution of the data against the normal quantiles
The distribution of each of the ten variables was displayed in a bar graph to aid in examining the data. The data were then explored further to see how well the other variables predict the happiness score. The bar graphs can be viewed by running the data distribution code in R or RStudio. Box plots were used to examine the spread of the data, to identify and manage outliers, and to determine the maximum and minimum values.
Figure 2: A histogram showing the distribution of happiness scores
The histogram shows that many nations recorded a happiness score between 5 and 6, while very few countries had a score of 7.5. Other variations across the data can be seen by running the code attached in the appendix.
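The binning behind a histogram like Figure 2 can be sketched as follows. The scores in `ladder` are simulated and the unit-wide bins are an assumption, so the counts say nothing about the real data.

```r
# Simulated happiness scores on the 0-10 ladder scale (not the report's data).
set.seed(3)
ladder <- pmin(pmax(rnorm(100, mean = 5.5, sd = 1.1), 0), 10)

# Bin the scores into unit-wide intervals without drawing the plot.
h <- hist(ladder, breaks = 0:10, plot = FALSE)

# Label each bin by its end points and find the one holding the most countries.
bins <- paste(head(h$breaks, -1), tail(h$breaks, -1), sep = "-")
counts <- setNames(h$counts, bins)
print(counts)
names(which.max(counts))
```

Dropping `plot = FALSE` draws the histogram itself instead of only returning the counts.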
Multiple Linear Regression
A multiple linear regression model was used to show the relationship between the happiness score (Ladder) and the nine explanatory variables. The fitted model shows how the happiness score of each country can be estimated from the explanatory variables, each of which is related to the happiness score. Only Corruption and Gini had a negative relationship with the happiness score, with coefficients of -0.004427 and -0.068568, respectively. The variable with the largest coefficient is Social, closely followed by Freedom. The R code used to fit the regression model is shown below:
lm(formula = Ladder ~ LogGDP + Social + HLE + Freedom + Generosity +
Corruption + Positive + Negative + gini, data = data2)
Table 1 shows the coefficients of the different variables.
Intercept
LogGDP
Social
HLE
Freedom
Generosity
Corruption
Positive
Negative
gini
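A coefficient table of the kind listed above can be extracted from a fitted model as sketched below. The data are simulated, only four of the nine explanatory variables are included to keep the example short, and the "true" coefficients used to generate the data are made up, so none of the printed numbers correspond to the report's results.

```r
# Simulated stand-in for data2; the generating coefficients are invented.
set.seed(4)
n <- 100
data2 <- data.frame(
  LogGDP  = rnorm(n, mean = 9, sd = 1),
  Social  = runif(n, min = 0.5, max = 1),
  Freedom = runif(n, min = 0.4, max = 1),
  gini    = runif(n, min = 0.25, max = 0.65)
)
data2$Ladder <- 0.4 * data2$LogGDP + 2 * data2$Social + 1.5 * data2$Freedom -
  0.5 * data2$gini + rnorm(n, sd = 0.4)

# Fit the multiple linear regression.
fit <- lm(Ladder ~ LogGDP + Social + Freedom + gini, data = data2)

# Coefficient table: estimate, standard error, t value and p value per term.
coef_table <- coef(summary(fit))
print(round(coef_table, 4))

# Predicted happiness score for a new, hypothetical country.
new_country <- data.frame(LogGDP = 9.5, Social = 0.85, Freedom = 0.8, gini = 0.4)
pred <- predict(fit, newdata = new_country)
print(pred)
```

The rows of `coef_table` follow the order of the terms in the formula, starting with the intercept, which matches the row order of Table 1.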