
Bridging the gap: constructing mortality tables using data science

By Mayur Shah, FCIA

For life insurance and reinsurance companies, mortality tables are foundational in the pricing and valuation of products. Actuaries have historically relied on traditional methods to build these tables, using data points such as gender, smoking status, age and duration.

However, as data science techniques have advanced, there is a significant opportunity to re-imagine the task of building these tables.

Building mortality tables

Actuaries have historically constructed mortality tables using a process called graduation, which applies statistical techniques to smooth observed mortality rates, removing fluctuations and creating a more regular pattern. With increases in computing power, we can now use different techniques to produce mortality tables.
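To make the traditional approach concrete, here is a minimal sketch of one classical graduation method, Whittaker–Henderson smoothing (the article does not specify which graduation method any given table used, and the rates and exposures below are invented for illustration):

```python
import numpy as np

# Whittaker-Henderson graduation: choose smoothed rates that balance
# closeness to the observed rates (weighted by exposure) against a
# roughness penalty on third differences.
def whittaker_henderson(raw_rates, weights, smoothness=100.0, order=3):
    n = len(raw_rates)
    W = np.diag(weights)
    D = np.diff(np.eye(n), n=order, axis=0)  # finite-difference operator
    return np.linalg.solve(W + smoothness * D.T @ D, W @ raw_rates)

# Invented observed rates and exposures for eight consecutive ages.
raw = np.array([0.010, 0.013, 0.011, 0.016, 0.018, 0.024, 0.023, 0.031])
exposure = np.array([500.0, 480.0, 450.0, 400.0, 350.0, 300.0, 250.0, 200.0])
print(whittaker_henderson(raw, exposure))
```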

One could call these “newer” statistical techniques, but some of these methods have been around longer than traditional actuarial graduation; they are modern mainly in their application to actuarial science. These techniques include:

  • generalized linear models (GLMs)
  • neural networks
  • generalized additive models
  • tree-based methods

The GLM is the simplest for most to understand, but all the techniques are variations on the same principle – take a block of data and fit a model that can be used for prediction.
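As a toy illustration of that principle, the sketch below fits a Poisson GLM to a few invented cells of grouped mortality experience; the data, column names and variables are all hypothetical:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Invented grouped mortality experience: one row per cell, with deaths
# observed against life-years of exposure.
data = pd.DataFrame({
    "age":      [40, 40, 50, 50, 60, 60],
    "smoker":   [0, 1, 0, 1, 0, 1],
    "deaths":   [12, 30, 45, 98, 160, 310],
    "exposure": [10000, 8000, 9000, 7000, 8000, 6000],
})

# Poisson GLM with a log link and log(exposure) offset: the linear
# predictor models the log of the central mortality rate.
model = smf.glm(
    "deaths ~ age + smoker",
    data=data,
    family=sm.families.Poisson(),
    offset=np.log(data["exposure"]),
).fit()

# A zero offset corresponds to one life-year of exposure, so this
# prediction is a mortality rate for a hypothetical 45-year-old smoker.
new = pd.DataFrame({"age": [45], "smoker": [1]})
print(model.predict(new, offset=np.zeros(1)))
```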

The role of data scientists in actuarial work

Data scientists can contribute to actuarial tasks by applying their expertise in these advanced methods. Their skills in data handling, model creation, and visualization are valuable for modernizing mortality tables. However, it’s important for data scientists and actuaries to work closely together, as the process requires careful consideration of model assumptions, potential overfitting, and interpretability.

Know your data

Most of the work in any modelling exercise has to do with understanding the data.

The relevance of the data must be considered, as mortality rates can vary significantly over time due to factors such as medical advances, lifestyle changes and underwriting requirements.

Here’s a snapshot of how underwriting and data collection have evolved in Canada:

  • Pre-1980: Most business was whole life. The only premium distinction was between males and females.
  • Early 1980s: Smoker distinction was introduced but was self-reported.
  • Mid-1980s: Blood testing began with the AIDS epidemic. Cotinine testing was a by-product, allowing insurers to verify smoking status.
  • Early 1990s: Introduction of oral fluid collection for lower face amounts to test for HIV/cotinine/cocaine, with blood testing for higher face amounts.
  • Late 1990s: Preferred classes were introduced. More data points were collected via fluid testing (such as lipids), giving underwriters more information.
  • Early 2000s: Companies began to perform cost/benefit analyses of the fluids collected, leading some to start removing these requirements at lower ages and amounts.
  • Post-2015: Accelerated underwriting gained popularity as a way to speed up policy issuance. With the onset of the COVID-19 pandemic, accelerated underwriting became more of a necessity as labs were closed and in-home visits were not permitted. Companies increased the use of predictive models and random testing to offset the negative impact on mortality of removing fluid testing.

Developing a model

We had heard the buzzwords around machine learning and predictive analytics but wanted a real-world solution. PartnerRe’s Canadian actuarial pricing team and data scientists worked together on a framework to create a different kind of mortality table.

We started with the Canadian Institute of Actuaries (CIA) intercompany mortality study data. This is a comprehensive dataset covering most underwritten individual life business in Canada issued as far back as a century ago.

There were some challenges in modelling as identified by the data scientists:

  • Data grouping: The data is not true seriatim but is grouped at a fine enough level to model.
  • Sparse data: There is limited data at higher ages and durations.
  • Extreme values: Wide range of claim amounts (from zero to millions), with a high proportion of records with zero claims.
  • Preferred class reporting: Inconsistencies were reported between companies.

The first step was to summarize the data and review its applicability. In performing this review, we noticed some oddities (a sketch of the kind of summary involved follows the list below):

  • For a given attained age, durations past 40 seemed to show lower mortality than durations just before 40. There may be reasons for this, but intuitively it didn’t make sense.
  • There was a surprising amount of data on business issued to non-juveniles with blended smoking status in recent study years. Our knowledge of the market indicated this should not be the case. The CIA investigated and found an error in the original data which was subsequently corrected.
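As a hedged sketch of the kind of summary that surfaces such oddities (the file name and column names below are hypothetical):

```python
import pandas as pd

# Hypothetical extract of the grouped study data.
study = pd.read_csv("cia_study_extract.csv")

# Raw amount-based mortality rates by duration at a fixed attained age:
# rates that drop after duration 40 are the kind of oddity worth flagging.
summary = (
    study[study["attained_age"] == 75]
    .groupby("duration")[["death_amount", "exposure_amount"]]
    .sum()
)
summary["raw_rate"] = summary["death_amount"] / summary["exposure_amount"]
print(summary.sort_index())
```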

Actuarial judgment was also required to recognize that certain data, such as post-renewal term and conversions, should not be included in the dataset, as these are accounted for separately when setting a mortality assumption.

In the end we limited the data by duration and attained age to ensure applicability to the current market and because older age data can be volatile. Expertise was required to ensure the extrapolation of the model to older ages.

One of the key learnings is that the developers of the model should work together with the end users; constant back-and-forth is required to keep the project on track.

Building the generalized linear model

We worked backwards from the premise that our goal was to have a GLM formula that could be implemented in Moody’s AXIS™ Actuarial Modeling System, which is widely used in the industry for modelling cash flows and reserves. This informed our view on model and variable selection.

Model selection and approach

After our initial data review, we considered what model to select as the GLM. We looked at various options including the binomial, Poisson, negative binomial and others.

One question was whether to model mortality rates based on claim counts or claim amounts. The industry standard in North America is to use amounts. Discussion between the actuarial and data science teams confirmed that modelling mortality rates by amount was the appropriate approach, as amounts are the economic driver of the business, and differences between count and amount can be handled within the model.
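One common way to express “modelling by amount” in GLM terms (not necessarily the team’s exact specification) is to treat claim amounts as the response with the log of amount-exposure as the offset, letting a quasi-likelihood scale absorb the extra variance introduced by large policies; the cells below are invented:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Invented grouped cells: claim amounts paid and dollar-years exposed.
cells = pd.DataFrame({
    "age":             [40, 45, 50, 55, 60, 65],
    "duration":        [5, 8, 10, 12, 15, 20],
    "claim_amount":    [1.2e6, 2.1e6, 4.5e6, 6.3e6, 9.8e6, 14.0e6],
    "exposure_amount": [2.0e9, 1.8e9, 1.5e9, 1.2e9, 0.9e9, 0.7e9],
})

# Amount-based rate = claims paid per dollar-year exposed. The Poisson
# family with a log(exposure_amount) offset models the log of that rate;
# scale="X2" estimates a Pearson-based dispersion (quasi-Poisson).
fit = smf.glm(
    "claim_amount ~ age + duration",
    data=cells,
    family=sm.families.Poisson(),
    offset=np.log(cells["exposure_amount"]),
).fit(scale="X2")
print(fit.params)
```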

Data science tools also aid in selecting which variables to include. We “scored” the available variables to confirm the importance of traditional variables – age, duration, gender, smoking status – and further expanded to include variables such as face band, insurance type, and study year.
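As one hedged illustration of such scoring (the file and column names are hypothetical, and the team may have used a different scoring method), variables can be ranked by how much the model’s deviance worsens when each is dropped:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

study = pd.read_csv("study_data.csv")  # hypothetical grouped study extract

candidates = ["age", "duration", "gender", "smoker",
              "face_band", "insurance_type", "study_year"]

def fit_glm(terms):
    """Poisson GLM on death counts with a log-exposure offset."""
    return smf.glm(
        "deaths ~ " + " + ".join(terms),
        data=study,
        family=sm.families.Poisson(),
        offset=np.log(study["exposure"]),
    ).fit()

full = fit_glm(candidates)

# Score each variable by the deviance increase when it is dropped:
# a bigger increase means the variable carries more explanatory weight.
scores = {
    var: fit_glm([v for v in candidates if v != var]).deviance - full.deviance
    for var in candidates
}
for var, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{var:15s} {score:12.1f}")
```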

Further to this, some variables interact with each other (meaning there are dependencies between variables). We identified the most significant interactions and included them in the model, as sketched below. Traditional mortality table construction techniques ignore these interactions and effectively assume the variables are independent.
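Continuing the hypothetical sketch above, interaction terms enter a GLM formula directly; for example, “age:smoker” lets the smoker loading vary with age rather than assuming the two act independently:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

study = pd.read_csv("study_data.csv")  # hypothetical extract, as above

# Interaction terms relax the independence assumption baked into
# separate one-way factors.
fit = smf.glm(
    "deaths ~ age + duration + gender + smoker + age:smoker + duration:smoker",
    data=study,
    family=sm.families.Poisson(),
    offset=np.log(study["exposure"]),
).fit()
print(fit.params)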

Handling select periods and variable interactions

Another interesting finding relates to the select period. Traditionally, the select period has been fixed at 15 or 20 durations for all issue ages, and moving from select to ultimate mortality results in a “jump” in the rate. Our analysis showed that the select effect can last longer and can vary by issue age. This was accounted for in the GLM by transforming duration so that the rate asymptotically approaches a maximum (the ultimate rate).
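The article does not give the exact transform; one plausible form, sketched below with invented numbers, is an exponential decay of the select discount toward zero, so the rate grades smoothly into the ultimate rate with no jump:

```python
import numpy as np

# Hypothetical asymptotic grading: the select discount decays with
# duration at a speed tau that could itself vary by issue age.
def mortality_rate(q_ultimate, select_discount, duration, tau=8.0):
    remaining = np.exp(-duration / tau)  # share of the select effect left
    return q_ultimate * (1.0 - select_discount * remaining)

# The rate approaches q_ultimate = 1% smoothly rather than jumping at
# a fixed select period of 15 or 20 durations.
for d in [1, 5, 10, 20, 40]:
    print(d, round(mortality_rate(0.010, 0.60, d), 5))
```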

Results

Through this process, we were able to develop a GLM formula that can be implemented as a formula table in AXIS™.
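To illustrate why a fitted GLM ports naturally into a formula table: with a log link, the entire model collapses to a single closed-form expression in the rating variables. The coefficients below are invented, and a real model would carry many more terms:

```python
import numpy as np

# Invented coefficients standing in for a fitted log-link GLM.
coef = {"intercept": -9.2, "age": 0.085, "smoker": 0.65, "log_duration": -0.30}

def glm_rate(age, smoker, duration):
    """Mortality rate as a single formula: exp of the linear predictor."""
    eta = (coef["intercept"]
           + coef["age"] * age
           + coef["smoker"] * smoker
           + coef["log_duration"] * np.log(duration))
    return np.exp(eta)

print(glm_rate(age=45, smoker=1, duration=5))
```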

Benefits and considerations

Using a GLM in place of traditional mortality tables is a novel approach in modelling. There are several benefits, including the following:

  • Broader factor consideration: Predictive models can include a wider range of factors impacting mortality.
  • Valuable interactions: Captures interactions between variables.
  • Efficiency: Many tables can be replaced with a single formula.
  • Flexibility: There can be a smoother transition from the select to the ultimate period.
  • Timeliness: Building mortality tables using graduation can be time intensive. Predictive models can be “refreshed” with new data, ensuring current relevance.

However, there are also potential considerations:

  • Software limitations: Constraints in modelling software – both in variables and functionality.
  • Complexity: Unlike a two-dimensional table, a GLM can’t simply be “seen”; it requires understanding and training.
  • Data credibility: Overrides are required at older attained ages where data is not credible (however, this is also the case with traditional techniques).
  • Co-ordination: Requires internal and external buy-in to ensure success and co-ordination between teams (management, pricing, valuation, Appointed Actuary, external auditors, peer reviewers, etc.).

Conclusion

The unique skill sets of actuaries can be greatly enhanced by incorporating data science techniques – and collaborating closely with data scientists – as demonstrated in the construction of mortality tables. This example illustrates how integrating modern statistical methods with traditional actuarial expertise can lead to more accurate and flexible models. Success on any similar project also requires constant collaboration between the individuals and teams involved and buy-in at all levels of an organization.

By leveraging predictive models, the accuracy and flexibility of mortality tables can be improved, ultimately benefiting life insurers, reinsurers, and other organizations that rely on them.

Mayur Shah is Senior Vice President and Chief Pricing Officer at PartnerRe Life & Health Canada.

The opinions expressed herein are solely those of the author. This article is for general information, education and discussion purposes only. It does not constitute legal or professional advice and does not necessarily reflect, in whole or in part, any corporate position, opinion, or view of the Canadian Institute of Actuaries, PartnerRe, or any affiliates.
