Use of AI Modeling in Determination of Premium Prices for the Insurance Industry
Assignment Summary: This assignment is to select a dataset and analyze it (through the lens of answering a business question) with the appropriate AI techniques in Python that you learned during the semester in Chapters 12, 13, 15, or 16. You may NOT use any of the datasets used as examples in the book for those chapters. You may NOT use any of the datasets used in homework exercises assigned this semester. If you use one of these datasets, your grade on the paper will be zero. This semester we learned several AI techniques in Python to analyze data. You should think about data that interests you and how it could be analyzed with AI to inform a business problem/issue/question. You should identify an existing dataset that you can use (there is no requirement to collect your own data). There are many websites that provide datasets; you'll need to spend some time looking for a dataset that you are interested in. The dataset must have at least 10,000 cases/items and at least 5 features per item. The dataset must be publicly available and in English. You might try (but there are many other sites available): https://aihub.cloud.google.com or https://www.tensorflow.org/datasets/catalog/overview

Assignment Details: This assignment is submitted in 2 parts, both due on the same day/time as noted in CourseSite. Part 1: Word document submitted via the TurnItIn.com link. Part 2: Jupyter notebook with code containing all relevant analyses that you used to write the paper. The notebook should include Markdown cells or comments in the code cells giving details on what each code block is doing. These can be short but must clearly explain in English what the code is doing (example: The following code implements the KNeighborsClassifier and trains the model).

Word Document Deliverable: The focus of this paper is to describe the data you will use for the final project, what analyses you chose to do on the data, and what you concluded from the analyses. The paper must include the following sections, numbered with these exact headings:

1. Introduction: What is the dataset you are using and where did you find it? Give the exact URL in the reference section at the end of the paper. Describe the data in full sentences, including how the data was collected, when it was collected, how many data points are in the dataset, and how many variables (columns) are in the dataset. What does each data point (meaning each row) represent? For example, in the California Housing dataset, each row is a census block. In the movie data, each row is a movie. Include why you think this data is interesting to study in the context of AI. What possible problems could exist with the data (such as issues with the data collection that would make the data false or biased)? If you find more information about the data online (such as what others have analyzed), give all relevant URLs in the reference section.

2. Detailed Description of Data: For each variable in the dataset, give the descriptors from the describe() command, including the count of examples, mean, standard deviation, max, min, and various quantiles. Note which columns are numerical and which are categorical. If there are many columns, you only need to include the columns you think you would use for your specific business questions. Detail any interesting observations or discrepancies you see for any columns, such as the data being skewed in a certain direction, having a low or high standard deviation, or a substantial amount of missing data. You must write about any possible bias you see in the data and how to correct it. You should use tables or other figures here to help understand the descriptive data. Do not provide the Python code, but include graphs and tables from the Python output here.

3. Three AI Business Questions: What business questions might be posed for this data that you could consider exploring using an AI model in the final project? You must pose at least three questions in three separate paragraphs. Examples from the tutorials would be "Can we predict median housing price in a city block given the total rooms in properties in that city block?" or "Given census data about a person such as age, gender, education, and occupation, can we predict whether or not the person earns more than $50,000/year?" For EACH AI business question, give details about which variables (columns) you would use, including which variables are the independent variables and which are the dependent variables. Include any transformations you may need to make to any variables (such as transforming a numerical variable into a binary variable). The business questions must be substantially different from each other. For example, you can't take the census data income question above and change $50,000 to $100,000 and call that another question. You don't have to use these questions for the final project, but they are a good starting point. You must include for each question why it is interesting: would it affect public policy or business decisions of a particular company, or something else? This must be fairly detailed and could include references to articles about current business issues.

4. AI Model Analysis: This section has the following subsections: a. Business Question and AI Technique: Select ONE of the business questions and ONE AI technique learned in class. You must use a technique from the textbook chapters we covered, for example, classification, sentiment analysis, multiple linear regression, unsupervised machine learning, or convolutional neural networks (this is not a complete list). Write a short paragraph on which business problem you selected and why you selected a particular AI technique to explore the business question. b. Data Visualization: For the question you are analyzing, provide appropriate visualization. This could be word clouds, scatterplots, or bar graphs. The code to produce these should be in the Jupyter Notebook, but you must copy and paste the visualization into your paper. You must number and give each visualization a title (an example could be Figure 1: Scatter Plot of Median House Price by House Age). Write about each visualization you chose in your paper, detailing what it means. c. AI Model Results: Using the relevant output from the Jupyter Notebook, write about the results of your model and any tuning you did (for example, trying different hyperparameters). You should interpret the results, not just report them. What do they mean in the context of your business problem?

5. Conclusion: Write a short concluding paragraph about your chosen dataset and the business question you analyzed, summing up what you learned by exploring the data and running the AI model.
Use of AI Modeling in Determination of Premium Prices for the Insurance Industry
Your Name
Subject and Section
Professor’s Name
December 1, 2022
Introduction

Understanding the importance of technological innovation and data analysis is essential for today's organizations. It allows for a more competitive approach to real-life problems and for the use of data in strategic business decisions. Accordingly, the dataset I am using in this paper is entitled Non-communicable Diseases (NCDs) and comes from the World Health Organization (WHO).
Accordingly, this dataset provides a comprehensive collection of data gathered from various countries for the year 2019. It covers the most common lifestyle-related and other illnesses, including (1) alcohol, (2) cancer, (3) CRDs, (4) CVDs, (5) diabetes, (6) obesity, (7) physical inactivity, and (8) tobacco. However, the data is not downloadable as a whole but per NCD type. Once downloaded, each column represents a measurable unit for 2019, while each row represents a country/region. For example, in the case of the Cancer NCD file, the columns represent various units or metrics, including the region, sex, and numeric value. However, units such as upper and lower confidence limits were not indicated.
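Because the WHO data must be downloaded one NCD type at a time, the separate files have to be combined before any analysis. A minimal sketch of this collation step with pandas (the tiny in-memory tables and column names here are illustrative assumptions standing in for the actual per-NCD downloads, which would be read with pd.read_csv):

```python
import pandas as pd

# Illustrative stand-ins for two per-NCD downloads; the real files
# would each be read from their own export, e.g. pd.read_csv("cancer.csv").
cancer = pd.DataFrame({
    "Country": ["Ghana", "Peru"],
    "Region": ["Africa", "Americas"],
    "Sex": ["Both sexes", "Both sexes"],
    "Numeric": [95.1, 88.3],
})
diabetes = pd.DataFrame({
    "Country": ["Ghana", "Peru"],
    "Region": ["Africa", "Americas"],
    "Sex": ["Both sexes", "Both sexes"],
    "Numeric": [30.2, 41.7],
})

# Tag each sheet with its NCD type, then stack them into one table.
cancer["NCD"] = "Cancer"
diabetes["NCD"] = "Diabetes"
combined = pd.concat([cancer, diabetes], ignore_index=True)
print(combined.shape)  # (4, 5)
```

The same pattern extends to all eight per-NCD files: tag each one with its type, then concatenate once into a single long-format table.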
In line with this research, the author believes that understanding these NCDs is essential for businesses, especially healthcare insurance providers. Note that the main business of insurance providers is the assumption and diversification of risk among a large group of individuals. In turn, the insurer's premium depends on various factors, including the number of individuals covered, inflation rates, socio-political circumstances, and other relevant factors.
Given this business model, it is clear that the profit-generation model of insurance companies may be severely affected by the occurrence of risk. Their profit would be lower when the number of insured payers is low while the occurrence of risk is high (O'Connell, 2019). Thus, for insurance companies to continue providing service to their consumers while maintaining profitability, Artificial Intelligence (AI) for probabilistic modeling could be used to determine the premium that the insured will pay relative to the amount of covered peril (i.e., the combined risk ratio).
However, despite the importance of this dataset for answering the question at hand, some problems that may arise include (1) the broad scope of the dataset, (2) the representativeness of the data, and (3) biases in data collection.
First, it is clear that the data covers nation-states on a large scale, meaning it was collected through national censuses and surveys. However, not all insurance companies operate on the global scale for which this dataset would be most helpful.
The second issue is the representativeness of the data collected. Even though most of it comes from censuses, hospital records, and community-based data, some of the numeric information may represent only part of the population, especially in low-GDP countries with less capacity for a nationwide census. In one study, Skinner (2018) noted the sheer difficulty smaller developing countries face in census taking, with some reportedly taking shortcuts to make the census feasible. For example, some may take small random samples rather than deriving the data from the whole population.
Finally, there is also the possibility of bias in the data collection process itself. Note that, compared to broad community surveys, censuses are usually conducted by nation-states for policy-making purposes, and reports may be changed or altered depending on a country's specific goals, whether domestic or international.
Detailed Description of the Data
The dataset includes separate worksheets for each NCD type and its associated risk factors, so the worksheets must be collated before a complete analysis is possible. The risk-factor datasets give a general overview of each key NCD-related risk factor for 2019, divided between (1) males, (2) females, and (3) total. Given that eight (8) risk factors are provided, each with 1,173 rows across five variables, the risk-related datasheets contain 8 × 1,173 = 9,384 rows, or about 46,920 data points. In contrast, the NCD datasheet provides a single general figure of 2,280 rows divided across the same five variables, or 11,400 data points. All in all, the total number of unique data points spread across the five variables (indicator name, country name, region, sex, and numeric data) is 58,320.
Table 1
Detailed Description of Data and Unique Variables

                              No. of Unique Variables
NCD
  Rows                        2,280
  Categories                  1
  Sub-total                   2,280
Risk Factors
  Rows                        1,173
  Categories                  8
  Sub-total                   9,384
Total Unique Data             11,664
Independent Categories        5
TOTAL                         58,320
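The totals quoted above follow from simple arithmetic on the row counts given in the text, which can be reproduced directly:

```python
# Row counts as stated in the text.
ncd_rows = 2_280              # rows in the NCD datasheet
risk_rows_per_factor = 1_173  # rows per risk-factor sheet
risk_factors = 8
variables = 5                 # indicator, country, region, sex, numeric

risk_rows = risk_rows_per_factor * risk_factors  # 9,384 rows
risk_cells = risk_rows * variables               # 46,920 data points
ncd_cells = ncd_rows * variables                 # 11,400 data points
total_cells = risk_cells + ncd_cells             # 58,320 data points in total
print(risk_cells, ncd_cells, total_cells)        # 46920 11400 58320
```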
Accordingly, among the significant variables included in the NCD data are the various probability estimates, including (1) the probability of premature mortality, (2) the percentage of total deaths, and (3) the NCD age-standardized death rate, among others. Since there are only a few columns in this dataset, all of these units would be necessary for the probabilistic determination of NCD mortality rates relative to country, age, or gender, among other factors.
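These estimates would be summarized with pandas' describe(), which yields the count, mean, standard deviation, quartiles, and extremes required in this section. A minimal sketch on illustrative values (the column name "Numeric" is an assumption about the WHO export, and the five values are made up for demonstration):

```python
import pandas as pd

# Illustrative probability-of-premature-mortality values for a few countries.
ncd = pd.DataFrame({"Numeric": [12.4, 18.9, 25.3, 30.1, 9.8]})

# describe() returns count, mean, std, min, the quartiles, and max.
summary = ncd["Numeric"].describe()
print(summary)
```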
However, one issue with the datasets is missing information for a few of the countries. Fortunately, these instances are very few, with almost all countries having data for each NCD differentiated by age and gender. Nevertheless, one bias (or error) that the author believes is risky is the use of substitutes for missing data. In the Tobacco sheet, multiple rows are missing "tobacco use estimates" and are replaced with "tobacco smoking estimates"; other instances may also appear in other worksheets. In one study, Beurs et al. (2019) found that data imputation, while effective and beneficial in most cases, can lead to bias, unrepresentativeness, and other issues, especially when left unchecked. Thus, post-processing and cross-referencing may be needed to polish the worksheets that relied on imputed data.
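A simple way to audit the missing-data and substitution issue described above is to count blanks per column and flag the rows where a substitute value would be imputed, so they can be cross-referenced later rather than silently accepted. A sketch under assumed, illustrative column names (not the actual WHO headers):

```python
import numpy as np
import pandas as pd

# Illustrative slice of a "Tobacco" worksheet with gaps in one estimate.
tobacco = pd.DataFrame({
    "Country": ["Chad", "Fiji", "Laos"],
    "Tobacco use estimate": [np.nan, 24.0, np.nan],
    "Tobacco smoking estimate": [13.5, 22.1, 27.9],
})

# Count missing values per column before any imputation.
missing = tobacco.isna().sum()
print(missing["Tobacco use estimate"])  # 2

# Flag rows where the substitute column would be used, for later review.
tobacco["imputed"] = tobacco["Tobacco use estimate"].isna()
```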
Three AI Business Questions
In line with the primary goal and the problems established in the preceding sections, the author believes that the three essential questions for this AI project are:
1. What is the probability of greater premature NCD death across external factors (i.e., country groupings based on the United Nations Country Classification Method)?
2. Is there any specific trend for each variable (NCD and risk types) across the countries' classifications (as determined from the first question)?
3. What is the proper premium price adjustment relative to the combined risk ratio of each country?
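As an illustration of the kind of calculation the third question points toward, a premium can be scaled so that expected claims plus expenses hit a target combined ratio. This is a simplified sketch under assumed figures, not the paper's actual model; a full treatment would estimate expected claims from the NCD mortality probabilities themselves:

```python
def adjusted_premium(expected_claims, expenses, target_combined_ratio):
    """Premium needed so (claims + expenses) / premium equals the target ratio."""
    return (expected_claims + expenses) / target_combined_ratio

# Assumed figures: higher NCD mortality raises expected claims, which pushes
# the required premium up for the same target combined ratio.
low_risk = adjusted_premium(expected_claims=600.0, expenses=200.0,
                            target_combined_ratio=0.95)
high_risk = adjusted_premium(expected_claims=900.0, expenses=200.0,
                             target_combined_ratio=0.95)
print(round(low_risk, 2), round(high_risk, 2))  # 842.11 1157.89
```

The design choice here is deliberate: keeping the combined ratio fixed across country groups makes the premium difference attributable entirely to the difference in expected claims.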