COVID-19 Confirmed Cases, Hospitalization, and Deaths in Chicago
There will be a mini-project that you are required to work on individually. The project is designed to teach students how to work with data, use commercially available statistical software (I ordered Statistix for the class, but you can use any other package, such as R or SAS), and develop statistical writing skills. The deliverable is a short report describing:
1) Data
2) Statistical tools you used to analyze the data
3) Results
4) Conclusion
You need to find or collect data on at least 80 objects/subjects/entities with at least 7 variables, including at least one nominal, one categorical, and three numerical variables. You need to analyze your data using at least two of the following combinations: ANOVA and Multiple Regression; ANOVA and Categorical Data Analysis; or Multiple Regression and Time Series.
Here are links to thousands of datasets:
http://users(dot)stat(dot)ufl(dot)edu/~winner/datasets.html
https://www(dot)sheffield(dot)ac(dot)uk/mash/statistics/datasets
https://people(dot)sc(dot)fsu(dot)edu/~jburkardt/datasets/stats/stats.html
http://statweb(dot)stanford(dot)edu/~sabatti/data.html
https://guides(dot)lib(dot)berkeley(dot)edu/publichealth/healthstatistics/rawdata
https://data(dot)world/datasets/csv
https://knowledge(dot)domo(dot)com/Training/Self-Service_Training/Onboarding_Resources/Fun_Sample_Datasets
https://libraryguides(dot)missouri(dot)edu/c.php?g=213300&p=1407295
https://libguides(dot)lib(dot)rochester(dot)edu/data-stats
https://shsulibraryguides(dot)org/stats
COVID-19 Confirmed Cases, Hospitalization, and Deaths by Race, Age, and Gender from March 1 to September 17, 2020 in the City of Chicago
Your Name
Department, University
Course number: Course name
Professor’s Name
Date
* DATA
The acquired data summarize the number of COVID-19 cases in the City of Chicago from March 1 to September 17, 2020. Seven variables were taken from the data set: race (categorical/nominal), gender (categorical/nominal), age group (categorical/ordinal), number of confirmed cases (numerical), number of hospitalizations (numerical), number of deaths (numerical), and date (numerical).
The data show the number of cases per day in each category, along with the overall daily totals of cases, hospitalizations, and deaths. According to the data set provided by the City of Chicago (2020), daily cases are tallied on the day the test specimen was collected from the individual. Hospitalizations are tallied on the first day the patient was admitted, so each case contributes at most one hospitalization. Finally, deaths are tallied on the date of death of a patient with confirmed COVID-19.
The data set contains several categorical variables describing demographics: race, gender, and age group. For race, the data set tallies daily cases, hospitalizations, and deaths among Latin, non-Latin Asian, non-Latin Black, non-Latin White, other non-Latin, and unknown-race individuals. For gender, it tallies daily cases, hospitalizations, and deaths among males, females, and individuals of unknown gender. For age group, it tallies daily cases, hospitalizations, and deaths for ages 0-17, 18-29, 30-39, 40-49, 50-59, 60-69, 70-79, and 80 and older (City of Chicago, 2020).
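Because the data set stores one column per demographic category for each day, the daily counts have to be regrouped into one list per category before a one-way ANOVA can be run. A minimal stdlib sketch, assuming hypothetical column names such as `Cases - Age 0-17` (the actual City of Chicago export may label its columns differently):

```python
from collections import defaultdict

# Hypothetical column names: the real City of Chicago export may use other labels.
AGE_COLUMNS = {
    "Cases - Age 0-17": "0-17",
    "Cases - Age 18-29": "18-29",
    "Cases - Age 80+": "80+",
}


def group_daily_counts(rows, column_map):
    """Collect each category's daily counts into one list per group.

    rows: iterable of dicts, one dict per date (wide format).
    column_map: maps a source column name to a group label.
    A blank or missing value is skipped for that group on that day.
    """
    groups = defaultdict(list)
    for row in rows:
        for column, label in column_map.items():
            value = row.get(column)
            if value is None or value == "":
                continue
            groups[label].append(int(value))
    return dict(groups)


rows = [
    {"Date": "03/01/2020", "Cases - Age 0-17": "0", "Cases - Age 18-29": "2", "Cases - Age 80+": ""},
    {"Date": "03/02/2020", "Cases - Age 0-17": "1", "Cases - Age 18-29": "3", "Cases - Age 80+": "1"},
]
print(group_daily_counts(rows, AGE_COLUMNS))
# {'0-17': [0, 1], '18-29': [2, 3], '80+': [1]}
```

The two rows are made-up illustrations, not values from the data set; the same mapping idea applies to the race and gender columns.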
* STATISTICAL TOOLS TO ANALYZE DATA
The statistical tools used to analyze the COVID-19 data from the City of Chicago are one-way ANOVA and categorical data analysis (chi-squared goodness-of-fit test). The data consist of daily frequencies of confirmed cases, hospitalizations, and deaths. Each independent variable (gender, race, and age group) is analyzed separately against each dependent variable (cases, hospitalizations, and deaths). Interactions among the independent variables cannot be tested, because the data set reports daily frequencies for each demographic variable separately rather than cross-classified counts (for example, cases by age group within each race).
The paper asks whether the mean daily frequencies of cases, hospitalizations, and deaths differ significantly across the levels of each independent variable (gender, race, and age group). It also asks whether the observed frequencies for each independent variable differ significantly from the expected frequencies. Lastly, it aims to forecast COVID-19 cases, hospitalizations, and deaths two weeks ahead using time series analysis.
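The report does not state which time series model produces the two-week forecast. As one illustrative baseline (an assumption, not the report's method), simple exponential smoothing updates a level as `alpha * y + (1 - alpha) * level` for each day and then projects the final level flat across the forecast horizon; the smoothing weight `alpha` below is assumed rather than fitted.

```python
def ses_forecast(series, alpha=0.3, horizon=14):
    """Simple exponential smoothing with a flat forecast of the final level.

    Assumption: alpha is a chosen smoothing weight, not estimated from the data.
    horizon=14 corresponds to the report's two-week forecast window.
    """
    if not series:
        raise ValueError("series must be non-empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]
    for y in series[1:]:
        # Blend each new day's count with the running level.
        level = alpha * y + (1 - alpha) * level
    return [level] * horizon


print(ses_forecast([10, 12, 11, 13], alpha=0.5, horizon=3))  # [12.0, 12.0, 12.0]
```

A flat forecast ignores trend and weekday reporting patterns, so a model with a trend term (such as Holt's method or ARIMA) may fit daily COVID-19 counts better.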
* Daily COVID-19 Cases, Hospitalizations, and Deaths (ANOVA)
1 Ho: For each independent variable (gender, race, and age group), the mean number of daily cases is the same across all of its levels.
Ha: For that independent variable, the mean number of daily cases differs for at least one level.
2 Ho: For each independent variable (gender, race, and age group), the mean number of daily hospitalizations is the same across all of its levels.
Ha: For that independent variable, the mean number of daily hospitalizations differs for at least one level.
3 Ho: For each independent variable (gender, race, and age group), the mean number of daily deaths is the same across all of its levels.
Ha: For that independent variable, the mean number of daily deaths differs for at least one level.
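The hypotheses above are tested with the one-way ANOVA F statistic, the ratio of the between-groups mean square to the within-groups mean square. A minimal stdlib sketch of that computation follows; the toy numbers are invented for illustration and are not Chicago data.

```python
def one_way_anova(groups):
    """One-way ANOVA sums of squares, degrees of freedom, and F statistic.

    groups: list of lists of daily counts, one list per level of the
    independent variable (e.g., one list per age group).
    """
    if len(groups) < 2 or any(len(g) == 0 for g in groups):
        raise ValueError("need at least two non-empty groups")
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if n_total <= k:
        raise ValueError("need more observations than groups")
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Between-groups SS: spread of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-groups SS: spread of each day's count around its own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, df_between, ss_within, df_within, f_stat


print(one_way_anova([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
# (54.0, 2, 6.0, 6, 27.0)
```

If SciPy is available, `scipy.stats.f_oneway(*groups)` returns the same F statistic together with its p-value.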
* Observed vs. Expected Frequencies (Chi-Squared Goodness of Fit)
1 Ho: The observed frequencies of the dependent variables (cases, hospitalizations, and deaths) by gender do not differ significantly from the expected frequencies.
Ha: The observed frequencies of the dependent variables (cases, hospitalizations, and deaths) by gender differ significantly from the expected frequencies.
2 Ho: The observed frequencies of the dependent variables (cases, hospitalizations, and deaths) by race do not differ significantly from the expected frequencies.
Ha: The observed frequencies of the dependent variables (cases, hospitalizations, and deaths) by race differ significantly from the expected frequencies.
3 Ho: The observed frequencies of the dependent variables (cases, hospitalizations, and deaths) by age group do not differ significantly from the expected frequencies.
Ha: The observed frequencies of the dependent variables (cases, hospitalizations, and deaths) by age group differ significantly from the expected frequencies.
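The goodness-of-fit statistic sums (O - E)^2 / E over the categories and is compared with a chi-square distribution with k - 1 degrees of freedom. The report does not say how the expected frequencies were set, so the sketch below assumes equal shares by default and accepts other shares (for example, population proportions) as an optional argument; the counts in the example are made up.

```python
import math


def chi_square_gof(observed, expected_props=None):
    """Chi-squared goodness-of-fit statistic and its degrees of freedom.

    observed: total counts per category (e.g., cases by gender).
    expected_props: assumed share of each category under Ho; equal shares
    are used when omitted. Either choice is an assumption to be reported.
    """
    k = len(observed)
    n = sum(observed)
    if expected_props is None:
        expected_props = [1 / k] * k
    if len(expected_props) != k or not math.isclose(sum(expected_props), 1.0):
        raise ValueError("expected_props needs one share per category, summing to 1")
    expected = [n * p for p in expected_props]
    # Sum of (O - E)^2 / E over categories; compare with chi-square(k - 1).
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, k - 1


stat, df = chi_square_gof([30, 20, 10])
print(round(stat, 6), df)  # 10.0 2
```

With 2 degrees of freedom the 5% critical value is 5.991, so a statistic of 10.0 would lead to rejecting Ho. `scipy.stats.chisquare(f_obs, f_exp)` returns the statistic and its p-value directly. The approximation is usually trusted only when every expected count is at least about 5.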
* RESULTS
* Age Group (ANOVA)
As shown in Table 1, the p-value is less than the alpha level of 0.05 (F(7, 1600) = 67.45, p = 1.97e-85). The null hypothesis is therefore rejected in favor of the alternative: mean daily confirmed cases differ significantly across age groups. Post hoc analysis using Scheffe’s test shows that the 0-17 (M = 26.36, SD = 21.21), 70-79 (M = 21.22, SD = 21.48), and 80+ (M = 15.75, SD = 21.03) age groups have lower mean daily confirmed COVID-19 cases than the other age groups. The 18-29 age group has the highest mean daily confirmed cases (M = 83.39, SD = 56.98).
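Scheffe's test declares two group means different when their absolute difference exceeds sqrt((k - 1) * F_crit) * sqrt(MS_within * (1/n_i + 1/n_j)). The sketch below applies that rule to two of the age-group comparisons, using MS_within from Table 1. Two inputs are assumptions not stated in the report: equal group sizes of 201 days (N = 1608 from the total df of 1607, split over 8 age groups, which matches the 201 days from March 1 to September 17) and a critical value of about 2.01 for F(7, 1600) at alpha = .05, read from an F table rather than computed.

```python
import math


def scheffe_pair(mean_i, mean_j, n_i, n_j, ms_within, k, f_crit):
    """Scheffe comparison of two group means after a one-way ANOVA.

    f_crit: upper-alpha critical value of F(k - 1, N - k), supplied by the caller.
    Returns (absolute difference, critical difference, significant?).
    """
    diff = abs(mean_i - mean_j)
    std_err = math.sqrt(ms_within * (1 / n_i + 1 / n_j))
    critical = math.sqrt((k - 1) * f_crit) * std_err
    return diff, critical, diff > critical


MS_WITHIN = 1873.894  # Table 1, Within Groups mean square
N_PER_GROUP = 201     # assumption: balanced groups (1608 / 8)
F_CRIT = 2.01         # assumption: approximate table value of F(7, 1600) at alpha = .05

# 18-29 vs 80+ age groups
print(scheffe_pair(83.39, 15.75, N_PER_GROUP, N_PER_GROUP, MS_WITHIN, 8, F_CRIT))
# 0-17 vs 70-79 age groups
print(scheffe_pair(26.36, 21.22, N_PER_GROUP, N_PER_GROUP, MS_WITHIN, 8, F_CRIT))
```

Under these assumptions the critical difference is roughly 16.2 cases per day, so the 18-29 vs 80+ difference (about 67.6) is significant while the 0-17 vs 70-79 difference (about 5.1) is not.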
Table 1
ANOVA Results of COVID-19 Cases by Age Group

| Source | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Between Groups | 884771.510 | 7 | 126395.930 | 67.451 | 1.9732e-85* |
| Within Groups | 2998229.771 | 1600 | 1873.894 | | |
| Total | 3883001.281 | 1607 | | | |

*p < .05.
As shown in Table 2, the p-value is less than the alpha level of 0.05 (F(7, 1568) = 36.68, p = 9.22e-48). The null hypothesis is therefore rejected in favor of the alternative: mean daily hospitalizations differ significantly across age groups. Post hoc analysis using Scheffe’s test shows that the 50-59 (M = 10.71, SD = 12.22), 60-69 (M = 11.88, SD = 13.24), and 70-79 (M = 9.79, SD = 10.63) age groups have the highest mean daily hospitalizations. The 0-17 age group has the lowest mean daily hospitalizations (M = 0.18, SD = 1.05).
Table 2
ANOVA Results of COVID-19 Related Hospitalizations by Age Group

| Source | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Between Groups | 19757.710 | 7 | 2822.530 | 36.678 | 9.2211e-48* |
| Within Groups | 120665.168 | 1568 | 76.955 | | |
| Total | 140422.878 | 1575 | | | |

*p < .05.
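Because each mean square is SS/df and F is MS_between/MS_within, the F values in Tables 1 and 2 can be recomputed from the reported sums of squares. A small stdlib sketch of that arithmetic check:

```python
def anova_table(ss_between, df_between, ss_within, df_within):
    """Recompute mean squares, F, and totals from reported SS and df (3 d.p.)."""
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "MS_between": round(ms_between, 3),
        "MS_within": round(ms_within, 3),
        "F": round(ms_between / ms_within, 3),
        "SS_total": round(ss_between + ss_within, 3),
        "df_total": df_between + df_within,
    }


table1 = anova_table(884771.510, 7, 2998229.771, 1600)  # cases by age group
table2 = anova_table(19757.710, 7, 120665.168, 1568)    # hospitalizations by age group
print(table1["F"], table2["F"])  # 67.451 36.678
```

Both recomputed F values, mean squares, and totals match the tables, which confirms that the F statistic for cases by age group is 67.451 rather than a p-value-sized number.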
As shown in Table 3, the p-value is less than the alpha level of 0.05 (F test with df = 7 and 1600, p = 2.4763e-92). The null hypothesis is therefore rejected in favor of the alternative: mean daily deaths differ significantly across age groups. Post hoc analysis using Scheffe’s test shows that the age group of 80+ (M=4....