
US Health Insurance Dataset

Insurance Premium Charges in US with important details for risk underwriting. (teertha/ushealthinsurancedataset)

Context

The venerable insurance industry is no stranger to data-driven decision making. Yet in today's rapidly transforming digital landscape, insurance is struggling to adapt to and benefit from new technologies compared with other industries, even within the BFSI sphere (the banking sector, for example). Some of the unique challenges the insurance business faces are: extremely complex underwriting rule sets that differ radically between product lines; many non-KYC environments that lack a centralized customer information base; a complex relationship with consumers in traditional risk underwriting, where customer centricity sometimes runs counter to business profit; and the inertia of regulatory compliance.

Despite this, emergent technologies like AI and blockchain have brought radical change to insurance, and data analytics sits at the core of this transformation. We can identify four key factors behind the emergence of analytics as a crucial part of InsurTech:

  • Big Data: The explosion of unstructured data in the form of images, videos, text, emails, social media
  • AI: Recent advances in Machine Learning and Deep Learning that enable businesses to gain insight, perform predictive analytics and build cost- and time-efficient innovative solutions
  • Real-time Processing: The ability to process information in real time from various data feeds (e.g. social media, news)
  • Increased Computing Power: A complex ecosystem of new analytics vendors and solutions that enables carriers to combine data sources, external insights and advanced modeling techniques to glean insights that were not possible before

This dataset can support a simple yet illuminating study of risk underwriting in health insurance: how the various attributes of the insured interact and how they affect the insurance premium.

Content

This dataset contains 1338 rows of insured data. Each row gives the insurance charges together with the following attributes of the insured: Age, Sex, BMI, Number of Children, Smoker and Region. There are no missing or undefined values in the dataset.

Inspiration

This relatively simple dataset should be an excellent starting point for exploratory data analysis (EDA), statistical analysis, hypothesis testing, and training linear regression models to predict insurance premium charges.

Proposed Tasks:

  • Exploratory Data Analytics
  • Statistical hypothesis testing
  • Statistical Modeling
  • Linear Regression
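
As a rough illustration of the hypothesis-testing and linear-regression tasks, here is a minimal sketch that uses only the Python standard library. The helper names `welch_t` and `fit_line` are hypothetical, and the sample numbers are made up for illustration; they are not taken from the dataset. Comparing smokers with non-smokers via Welch's t statistic and regressing charges on age alone are assumptions about what one might explore first, not methods prescribed by the dataset description.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    na, nb = len(sample_a), len(sample_b)
    # Squared standard error of each sample mean.
    va = variance(sample_a) / na
    vb = variance(sample_b) / nb
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


def fit_line(xs, ys):
    """Ordinary least squares fit of ys = intercept + slope * xs."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Made-up illustrative values, NOT taken from the dataset.
smoker_charges = [30000.0, 35000.0, 40000.0]
non_smoker_charges = [5000.0, 7000.0, 9000.0]
t_stat, dof = welch_t(smoker_charges, non_smoker_charges)

ages = [20, 30, 40, 50]
charges = [3000.0, 5000.0, 7000.0, 9000.0]
slope, intercept = fit_line(ages, charges)
```

With the real data, the two samples would be the 'charges' values split by the 'smoker' column, and the regression inputs would be the 'age' and 'charges' columns. A fuller model would include the remaining attributes, which needs a multiple-regression routine beyond this one-predictor sketch.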

Data summary

  • File 'insurance.csv'

    • Table ‘insurance’ consists of 1338 data rows along seven dimensions: ‘age’, ‘sex’, ‘bmi’, ‘children’, ‘smoker’, ‘region’ and ‘charges’
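
The description above (seven columns, no missing values) can be checked mechanically. The following is a minimal sketch using Python's standard `csv` module; the function name `summarize` is hypothetical, and the two-row sample, including its category values, is made up for illustration rather than copied from the file.

```python
import csv
import io

# Column names as listed in the data summary.
EXPECTED_COLUMNS = ["age", "sex", "bmi", "children", "smoker", "region", "charges"]


def summarize(fileobj):
    """Return (row_count, missing_cells) for a CSV with the expected columns."""
    reader = csv.DictReader(fileobj)
    if set(reader.fieldnames or []) != set(EXPECTED_COLUMNS):
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    row_count = 0
    missing = 0
    for row in reader:
        row_count += 1
        # A cell counts as missing if it is absent, empty, or whitespace only.
        missing += sum(1 for col in EXPECTED_COLUMNS if not (row[col] or "").strip())
    return row_count, missing


# Made-up sample with one empty 'charges' cell, NOT taken from the dataset.
sample = io.StringIO(
    "age,sex,bmi,children,smoker,region,charges\n"
    "30,female,25.0,1,no,northeast,4000.00\n"
    "45,male,31.5,2,yes,southwest,\n"
)
sample_rows, sample_missing = summarize(sample)
```

Passing an open handle to 'insurance.csv' (opened with `newline=""`) to `summarize` should, if the description holds, report 1338 rows and 0 missing cells.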

Size: 16.0 KB | Source: Kaggle | Last updated: 2022-01-28 14:40

Analyst-2 explores entire data repositories and data lakes, autonomously analyzing each dataset using the Inspirient Automated Analytics Engine.


Creative Commons License

These analysis results by Inspirient GmbH are licensed under a Creative Commons Attribution 4.0 International License in conjunction with the license of the source dataset.