Pages:
4 pages/≈1100 words
Sources:
1
Style:
APA
Subject:
Mathematics & Economics
Type:
Statistics Project
Language:
English (U.S.)
Document:
MS Word
Topic:

Multiple regression project. Mathematics & Economics Statistics Project

Statistics Project Instructions:

Do #6 and #7. This is a team work, my work is 6 and 7. 


Statistics Project Sample Content Preview:

Multiple Regression Project
Student’s Name
Institutional Affiliation
Multiple Regression Project
Data
Hypothesis
ANOVA Table
At a significance level of 0.05, we wish to determine whether the model is useful for predicting the response. The test of hypothesis is:
H0: β1=β2=β3=β4=β5=0 (the model does not fit the data significantly)
H1: at least one βi ≠ 0 (at least one independent variable is useful for predicting the response)
The decision is to reject the null hypothesis if the p-value is less than 0.05.
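As a rough illustration of the overall test, the F statistic in an ANOVA table can be recovered from the model's R-squared, the sample size and the number of predictors. The sketch below uses the sample size (100) and the five predictors described in this project; the R-squared value of 0.5 is purely hypothetical, since the actual value is not reported here.

```python
def overall_f_statistic(r_squared, n_obs, n_predictors):
    """F statistic for H0: all slope coefficients equal zero."""
    df_model = n_predictors                # numerator degrees of freedom
    df_resid = n_obs - n_predictors - 1    # denominator degrees of freedom
    f_value = (r_squared / df_model) / ((1.0 - r_squared) / df_resid)
    return f_value, df_model, df_resid


# n_obs and n_predictors follow the project; r_squared is a made-up placeholder.
f_value, df_model, df_resid = overall_f_statistic(
    r_squared=0.5, n_obs=100, n_predictors=5
)
print(f"F({df_model}, {df_resid}) = {f_value:.2f}")
```

The resulting F value would then be compared with the F distribution on those degrees of freedom to obtain the p-value used in the decision rule.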
When determining whether each independent variable is significant in predicting the dependent variable, the hypotheses are stated as follows:
H0: βi = 0 (the ith independent variable is not significant or useful for predicting the 2014 retail sales)
H1: βi ≠ 0 (the ith independent variable is significant for predicting the 2014 retail sales)
The decision is to reject the null hypothesis whenever the p-value is less than the significance level 0.05.
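A minimal sketch of this decision rule, applied to the p-values that the coefficients output reports in the "Significance of independent variables" section of this project. The dictionary keys and the function name `decide` are illustrative choices, not part of the original analysis.

```python
ALPHA = 0.05  # significance level used throughout the project

# p-values as reported for each independent variable in the coefficients output
reported_p_values = {
    "USA sales growth %": 0.685,
    "Worldwide retail sales": 0.0005,
    "USA % of worldwide sales": 0.0005,
    "# 2014 stores": 0.351,
    "Worldwide growth %": 0.931,
}


def decide(p_value, alpha=ALPHA):
    """Reject H0 only when the p-value is below the significance level."""
    return "reject H0" if p_value < alpha else "fail to reject H0"


decisions = {name: decide(p) for name, p in reported_p_values.items()}
for name, decision in decisions.items():
    print(f"{name}: p = {reported_p_values[name]} -> {decision}")
```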
Data Collection
Variables
The dependent variable is the 2014 retail sales, Y'. The independent variables are USA sales growth % ('14 v '13), worldwide retail sales ($ million), USA % of worldwide sales, number of 2014 stores and worldwide growth % ('14 v '13).
Metadata
The dataset comes from the US Commerce Department and was compiled by the National Retail Federation. There is no known information on why and how the data was collected. The number of stores sampled was 100. The data is from 2015 because it was used to predict 2014 sales.
Regression Model
First analysis
The general form of a multiple regression equation is:
Y’=β0+β1X1+β2X2+β3X3+β4X4+β5X5 ; where X1 is USA sales growth %, X2 is worldwide retail sales, X3 is USA % of worldwide sales, X4 is #2014 stores and X5 is worldwide growth %.
The first regression equation for the data as per the coefficients’ output table is:
2014 sales = -28595.576 + 26.644(USA sales growth %) + 0.706(worldwide retail sales) + 336.192(USA% of worldwide sales) + 0.154(#2014stores) – 5.903(worldwide growth %)
β0 = -28595.576 is the constant for the equation. The unstandardized coefficients from the coefficients table show how much the 2014 sales variable changes with a one-unit change in a given independent variable when all other independent variables are held constant. For instance, consider the effect of USA sales growth %. The unstandardized coefficient B1 is 26.644. This means that each one-point increase in USA sales growth % is associated with an increase of 26.644 in 2014 sales, with the other variables held constant. Similarly, each one-point increase in worldwide growth % is associated with a decrease of 5.903 in 2014 sales.
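The fitted equation and the "held constant" interpretation can be illustrated with a short sketch. The coefficients are those reported above; the input values for the example retailer are hypothetical, and it is an assumption that the percentage variables are entered in percentage points (for example, 80 rather than 0.80).

```python
# Coefficients taken from the first regression equation in this project
COEFFICIENTS = {
    "intercept": -28595.576,
    "usa_sales_growth_pct": 26.644,
    "worldwide_retail_sales": 0.706,
    "usa_pct_of_worldwide_sales": 336.192,
    "stores_2014": 0.154,
    "worldwide_growth_pct": -5.903,
}


def predict_2014_sales(inputs):
    """Evaluate the fitted equation for one set of predictor values."""
    total = COEFFICIENTS["intercept"]
    for name, value in inputs.items():
        total += COEFFICIENTS[name] * value
    return total


# Hypothetical retailer (values invented for illustration only)
retailer = {
    "usa_sales_growth_pct": 2.0,
    "worldwide_retail_sales": 100000.0,
    "usa_pct_of_worldwide_sales": 80.0,
    "stores_2014": 1000.0,
    "worldwide_growth_pct": 3.0,
}
predicted = predict_2014_sales(retailer)

# Raise USA sales growth by one point while holding everything else constant
bumped = dict(retailer, usa_sales_growth_pct=retailer["usa_sales_growth_pct"] + 1.0)
change = predict_2014_sales(bumped) - predicted

print(f"Predicted 2014 sales: {predicted:.3f}")
print(f"Change for +1 point of USA sales growth: {change:.3f}")
```

The change in the prediction equals the coefficient on USA sales growth %, which is what the "all other variables held constant" reading of an unstandardized coefficient means.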
Significance of independent variables
Some of the independent variables are not significant for predicting 2014 retail sales. To test if all independent variables are necessary in predicting 2014 retail sales or if some can be removed from the model, a test of hypothesis is necessary at α = 0.05.
H0: βi = 0 (the ith independent variable selected is not significant or useful for predicting the 2014 retail sales)
H1: βi ≠ 0 (otherwise or anything different)
As before, the decision is to reject the null hypothesis if the p-value is less than 0.05. For USA sales growth %, # 2014 stores and worldwide growth %, the t-values are 0.406, 0.936 and -0.086, with p-values of 0.685, 0.351 and 0.931 respectively. Since all of these p-values are greater than 0.05, we fail to reject the null hypothesis and conclude that these variables are not useful in the model and can be removed. For worldwide retail sales and USA % of worldwide sales, the t-values are 58.196 and 10.255, each with a p-value of 0.0005. Here, 0.0005 < 0.05, so we reject the null hypothesis and conclude that worldwide retail sales and USA % of worldwide sales are significant in predicting 2014 retail sales. Every regression relies on the following assumptions:
* Linearity: Since this assumption cannot be tested using SPSS, it is determined by the choice of variables. In this case, all variables were measured at the continuous level. Therefore, the assumption is met.
* Variability: To check for homogeneous variance (homoscedasticity), the studentized residuals are plotted against the unstandardized predicted values. The graph shows a random scatter of points with constant variability.
* Normality: The normal P-P plot of the residuals shows the majority of points close to the diagonal. The data points are connected and...