Research and data

Big data analytics to tackle substance use disorders


Substance use disorders are a major cause of death and illness in the United States, with more than 70,000 deaths due to drug overdose in 2017 alone (1). Collecting accurate and comprehensive public health data on people with substance use disorders and associated health issues promises to help assess the effectiveness of treatment and may reveal new methods of prevention. By leveraging the data that do exist and creating new tools to analyze them, we are beginning to see new patterns in substance use disorders that we hope will help us develop more effective interventions.

Substance use and the challenges of underreporting

Substance use disorders are characterized by taking a substance, such as alcohol or an illicit drug, in larger amounts or for longer than intended, for example using more of a medication than was prescribed. This pattern leads to increased use and impairs a person's ability to function at work, home, or school, potentially causing detrimental effects across their life (2). Individuals with substance use disorders face barriers to healthcare and are often underrepresented in traditional surveillance data sources (3). Underreporting of substance use disorders and subsequent overdoses can fuel broader public health problems, such as increased infectious disease transmission among people who use substances (4, 5) and social instability (6).

Public health surveillance is a cornerstone of public health practice. Collecting data in an ongoing, systematic way improves our understanding of population health. Through surveillance, we gain greater insight into the health of populations and become aware of emerging health problems. Surveillance also lets us determine whether treatments and interventions are working.

Currently there are limited tools available to perform surveillance on substance use disorders. This is not surprising since individuals living with substance use disorders often struggle with issues such as stigma, unstable housing, and poor access to healthcare, making them almost invisible to many of our traditional systems for surveillance.

Given that substance use disorders are a major public health problem, there is a tremendous need for better surveillance. Understanding the dynamics of substance use disorders in near real time could provide early warning of overdose outbreaks and of surges in substance use disorder-associated infectious diseases. Surveillance of substance use can also provide valuable feedback on the efficacy of public health interventions. However, there are currently no established approaches for leveraging the growing collection of data to address this public health challenge.

Using big data to identify trends

Fortunately, state-level integrated databases for studying substance use disorders have been created in states such as Massachusetts, Pennsylvania, and Maryland, and there have been calls for more states to develop similar databases (7). Massachusetts currently leads the nation with its Public Health Data Warehouse (8), an extensive integrated database housed at the Massachusetts Department of Public Health (9). The Public Health Data Warehouse links individual-level data from the Massachusetts All-Payer Claims Database to more than 15 databases across government agencies.

These sources link individuals between multiple administrative data systems, creating the potential to understand how they interact with healthcare and other governmental systems. These data systems are complex, but rich with information. They have tremendous potential to give us a better understanding of substance use disorders and monitor patterns over time.
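As a minimal sketch of what linkage makes possible, and not a description of how the Public Health Data Warehouse actually works, assume each source's records already carry a shared, de-identified person key. The field names (`source`, `person_id`), the source labels, and the records themselves are hypothetical. Grouping by that key yields each person's pattern of contact across systems:

```python
from collections import defaultdict

# Hypothetical, already de-identified records from three administrative sources.
# Each record is assumed to carry a shared linkage key ("person_id").
records = [
    {"source": "claims", "person_id": "A"},
    {"source": "claims", "person_id": "B"},
    {"source": "ems", "person_id": "A"},
    {"source": "ems", "person_id": "C"},
    {"source": "death_records", "person_id": "C"},
]

def link_by_person(records):
    """Map each person_id to the sorted list of sources the person appears in."""
    profile = defaultdict(set)
    for rec in records:
        profile[rec["person_id"]].add(rec["source"])
    return {pid: sorted(srcs) for pid, srcs in profile.items()}

linked = link_by_person(records)
for pid, sources in sorted(linked.items()):
    print(pid, sources)
# A ['claims', 'ems']
# B ['claims']
# C ['death_records', 'ems']
```

Real linkage is much harder than this, since records often have to be matched probabilistically on imperfect identifiers, but a per-person source profile like this is the basic input that analyses such as capture-recapture build on.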

Researchers at the Boston University School of Public Health have begun to use these data to explore patterns and trends in substance use disorders. We previously used the Public Health Data Warehouse to estimate the prevalence of Opioid Use Disorder in Massachusetts using capture-recapture methods (10). The capture-recapture approach lets us estimate the size of the “hidden” population of individuals living with Opioid Use Disorder by leveraging information on those who are observed at some point in the system. We are now building on this work and the Public Health Data Warehouse to develop a model for identifying cost-effective treatment strategies for Opioid Use Disorder, in a project led by Dr. Benjamin Linas at Boston Medical Center.
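To illustrate the intuition behind capture-recapture, here is the simplest two-source version, shown only as a sketch and much simpler than the analysis in (10). If two independent lists capture n1 and n2 people, and m people appear on both, the overlap suggests how much of the population each list misses. The Lincoln-Petersen estimate of the total is n1·n2/m, and Chapman's variant, (n1+1)(n2+1)/(m+1) - 1, reduces small-sample bias. The counts below are made up and are not Massachusetts data.

```python
def lincoln_petersen(n1: int, n2: int, m: int) -> float:
    """Classic two-source estimate of total population size.

    n1: people observed in source 1
    n2: people observed in source 2
    m:  people observed in both (requires linked records)
    """
    if m == 0:
        raise ValueError("no overlap between sources; estimate is undefined")
    return n1 * n2 / m

def chapman(n1: int, n2: int, m: int) -> float:
    """Chapman's bias-corrected variant, which is defined even when m == 0."""
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1

# Hypothetical counts for illustration only.
n1, n2, m = 400, 300, 60
observed = n1 + n2 - m  # people seen in at least one source
est = chapman(n1, n2, m)
print(f"observed: {observed}, estimated total: {est:.0f}, hidden: {est - observed:.0f}")
# observed: 640, estimated total: 1978, hidden: 1338
```

The estimate relies on assumptions (a closed population, independent sources, equal chances of being observed) that rarely hold exactly for administrative data, which is one reason methods that combine more than two sources are used in practice.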

New systems to further understand substance use disorders

As part of this work, we are building systems and creating analytic tools to make better use of the rich but noisy information in the Public Health Data Warehouse, with the goals of:

  1. better understanding the current patterns and burden of substance use disorders,
  2. determining factors that are associated with worse community-level outcomes for substance use disorders (such as overdose and related diseases, including HIV and Hepatitis C), and
  3. creating monitoring systems for early detection of outbreaks of overdose or other substance use disorder-associated infectious diseases.
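For the third goal, one simple and purely illustrative way to flag a possible surge is to compare each day's count with the mean and standard deviation of a trailing baseline window, raising an alert when the count exceeds the baseline by more than a chosen number of standard deviations. The function name, window length (7 days), threshold (3), and daily counts below are assumptions for illustration, not parameters of the project's systems.

```python
import statistics

def baseline_alerts(counts, window=7, threshold=3.0):
    """Return indices of days whose count exceeds the trailing-window mean
    by more than `threshold` standard deviations.

    counts: daily event counts (e.g., overdose-related EMS calls), oldest first.
    """
    alerts = []
    for t in range(window, len(counts)):
        baseline = counts[t - window:t]
        mean = statistics.mean(baseline)
        # Floor the SD so a very flat baseline does not trigger on tiny changes.
        sd = max(statistics.stdev(baseline), 1.0)
        if (counts[t] - mean) / sd > threshold:
            alerts.append(t)
    return alerts

# Hypothetical daily counts with a spike on day 9.
daily = [4, 5, 3, 6, 4, 5, 4, 5, 4, 16, 6, 5]
print(baseline_alerts(daily))  # [9]
```

Operational systems usually go further, for example by leaving a gap between the baseline and the day being tested, adjusting for day-of-week effects, or using cumulative-sum (CUSUM) methods, so this is a starting point only.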

The work we are doing in Massachusetts can be translated to other states as they develop their own systems.

There are exciting opportunities to use machine learning tools to find signals in these data. For instance, research has shown that overdoses tend to cluster geographically (11). This lets us look more closely at features of particular communities that could make them more susceptible to these outcomes. Machine learning could be useful in defining community-level signatures that might be associated with poor outcomes and that can be monitored over time.
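Whether outcomes cluster geographically can be quantified with a spatial autocorrelation statistic. Below is a minimal sketch of global Moran's I, a standard spatial-statistics measure (not necessarily the method used in (11)), assuming four hypothetical communities arranged in a row with binary adjacency weights and made-up rates.

```python
def morans_i(values, weights):
    """Global Moran's I for values on n areas with an n x n spatial weight matrix.

    I = (n / W) * sum_ij w_ij * z_i * z_j / sum_i z_i^2, where z_i = x_i - mean
    and W is the sum of all weights.
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    den = sum(zi * zi for zi in z)
    if den == 0:
        raise ValueError("all values are equal; Moran's I is undefined")
    w_total = sum(sum(row) for row in weights)
    num = sum(weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / w_total) * (num / den)

# Four hypothetical communities in a row: 0-1-2-3 (neighbours share an edge).
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
clustered = [10.0, 9.0, 2.0, 1.0]     # high rates next to high rates
alternating = [10.0, 1.0, 10.0, 1.0]  # high rates next to low rates
print(round(morans_i(clustered, adjacency), 3))    # 0.395
print(round(morans_i(alternating, adjacency), 3))  # -1.0
```

For n areas, Moran's I has an expected value of -1/(n-1) when there is no spatial pattern (here -1/3), so the clustered example scores well above that and the alternating one well below.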

Ultimately, in partnership with public health departments, we are working to develop computational tools that provide greater insight into substance use disorders and allow us to monitor them more effectively, with the goal of developing and testing interventions that reduce their adverse impact on society.

If you would like to learn more about the work being done at BUSPH and collaboration opportunities at idea hub, please reach out, or visit

Laura Forsberg White, PhD, is an Associate Professor of Biostatistics at Boston University. She is interested in developing statistical tools to better understand the transmission dynamics and epidemiology of infectious diseases, including Hepatitis C, HIV, TB, and COVID-19, and of substance use disorders. She has developed methods to estimate transmission parameters. Her work involves the development of novel methodologies for the detection of disease outbreaks, as well as the application of statistical methods to public health surveillance systems, including spatial statistics and time series.


1) Scholl, L., Seth, P., Kariisa, M., Wilson, N., Baldwin, G., 2018. Drug and Opioid-Involved Overdose Deaths — United States, 2013–2017. MMWR. Morb. Mortal. Wkly. Rep. 67, 1419–1427.

2) DSM 5 Criteria for Substance Use Disorders [WWW Document], n.d. URL (accessed 10.19.20).

3) Wu, L.T., Zhu, H., Swartz, M.S., 2016. Treatment utilization among persons with opioid use disorder in the United States. Drug Alcohol Depend. 169, 117–127.

4) Dutta, R., Roy, S., 2012. Mechanism(s) Involved in Opioid Drug Abuse Modulation of HAND. Curr. HIV Res. 10, 469–477.

5) Lambdin, B.H., Bluthenthal, R.N., Zibbell, J.E., Wenger, L., Simpson, K., Kral, A.H., 2019. Associations between perceived illicit fentanyl use and infectious disease risks among people who inject drugs. Int. J. Drug Policy 74, 299–304.

6) Daley, D.C., 2013. Family and social aspects of substance use disorders and treatment, in: Journal of Food and Drug Analysis. Elsevier Taiwan LLC, pp. S73–S76.

7) Stein, B., 2019. Addressing the U.S. Opioid Crisis: Using an Integrated Systems-Based Approach. RAND Corporation.

8) Chapter 55 of the Acts of 2015: An Act Requiring Certain Reports for Opiate Overdoses, 2015. The 190th General Court of the Commonwealth of Massachusetts.

9) Bharel, M., Bernson, D., Averbach, A., 2020. Using Data to Guide Action in Response to the Public Health Crisis of Opioid Overdoses. NEJM Catal. 1.

10) Barocas, J.A., White, L.F., Wang, J., Walley, A.Y., LaRochelle, M.R., Bernson, D., Land, T., Morgan, J.R., Samet, J.H., Linas, B.P., 2018. Estimated Prevalence of Opioid Use Disorder in Massachusetts, 2011-2015: A Capture-Recapture Analysis. Am. J. Public Health 108, 1675–1681.

11) Nesoff, E.D., Branas, C.C., Martins, S.S., 2020. The Geographic Distribution of Fentanyl-Involved Overdose Deaths in Cook County, Illinois. Am. J. Public Health 110, 98–105.
