
Big data analytics to tackle substance use disorders


Substance use disorders are a major cause of death and illness in the United States, with more than 70,000 overdose deaths in 2017 alone (1). Collecting accurate and comprehensive public health data on people with substance use disorders and associated health issues promises to help assess the effectiveness of treatment and possibly reveal new methods of prevention. By leveraging the data that does exist and creating new tools to analyze it, we are beginning to see new patterns in substance use disorders that we hope will help develop more effective interventions.

Substance use and the challenges of underreporting

Substance use disorders are characterized by taking a substance, such as illicit drugs or alcohol, in larger amounts or for longer than intended, as in using more of a medication than is prescribed. This pattern leads to increases in use and impairs an individual’s ability to manage at work, home or school, with potentially detrimental effects on a person’s life (2). Individuals with substance use disorder face barriers to healthcare and are often underrepresented in traditional surveillance data sources (3). The underreporting of substance use disorder and any subsequent overdoses can fuel broader public health issues, such as increased infectious disease transmission among those using substances (4, 5) and social instability (6).

Public health surveillance is a cornerstone of public health practice. Collecting data in an ongoing and systematic way better informs our understanding of public health. Through surveillance, we gain greater insight into the health of populations and become aware of potential health problems. We also have the ability to determine if different forms of treatment and interventions are working.

Currently there are limited tools available to perform surveillance on substance use disorders. This is not surprising since individuals living with substance use disorders often struggle with issues such as stigma, unstable housing, and poor access to healthcare, making them almost invisible to many of our traditional systems for surveillance.

Given that substance use disorders are a major public health problem, there is a tremendous need for better surveillance. Understanding the dynamics of substance use disorders in near real time could provide early warning of overdose outbreaks and surges in substance use disorder-associated infectious diseases. Surveillance of substance use can also provide valuable feedback on the efficacy of public health interventions. However, there are currently no established approaches for leveraging the growing collection of data to better address this public health challenge.

Using big data to identify trends

Fortunately, state-level integrated databases to study substance use disorder behaviors have been created in states such as Massachusetts, Pennsylvania, and Maryland, and there is a call for more states to develop similar databases (7). Massachusetts currently leads the nation with its Public Health Data Warehouse (8), an extensive integrated database housed at the Massachusetts Department of Public Health (9). The Public Health Data Warehouse links individual-level data from the Massachusetts All-Payer Claims Database to more than 15 databases across government agencies.

These sources link individuals between multiple administrative data systems, creating the potential to understand how they interact with healthcare and other governmental systems. These data systems are complex, but rich with information. They have tremendous potential to give us a better understanding of substance use disorders and monitor patterns over time.
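As a rough illustration of what linking individuals across systems means in practice, the sketch below groups records from several sources by a shared identifier. It assumes every source already carries the same `person_id` field, which is a simplification for illustration; the source names, field names, and the `link_records` helper are hypothetical and do not describe how the Public Health Data Warehouse performs its linkage.

```python
from collections import defaultdict


def link_records(sources):
    """Group records from several data systems by person.

    `sources` maps a source name (hypothetical, e.g. "claims") to a list of
    record dicts. Each record is assumed to carry a shared "person_id" key.
    Returns {person_id: {source_name: [records, ...]}}.
    """
    linked = defaultdict(dict)
    for source_name, records in sources.items():
        for record in records:
            person = linked[record["person_id"]]
            person.setdefault(source_name, []).append(record)
    return dict(linked)


# Toy, made-up records for two hypothetical sources.
example_sources = {
    "claims": [{"person_id": "A", "event": "clinic visit"}],
    "ems": [
        {"person_id": "A", "event": "overdose response"},
        {"person_id": "B", "event": "overdose response"},
    ],
}
example_linked = link_records(example_sources)
```

In this toy example, person "A" appears in both systems while person "B" appears only in the second, which is the kind of overlap information that the estimation methods described in this article rely on.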

Researchers at the Boston University School of Public Health have begun to use these data to explore and understand patterns and trends in substance use disorders. We previously used the Public Health Data Warehouse to estimate the prevalence of Opioid Use Disorder in Massachusetts using capture-recapture methods (10). This capture-recapture approach allows us to estimate the “hidden” population of individuals living with Opioid Use Disorder by leveraging information from those who are observed at some point in the system. We are now building on this work and the Public Health Data Warehouse to develop a model for identifying cost-effective treatment strategies for Opioid Use Disorder, in a project led by Dr. Benjamin Linas at Boston Medical Center.
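To make the capture-recapture idea concrete, here is a minimal sketch of its simplest form: a two-source estimate using Chapman's bias-corrected version of the Lincoln-Petersen estimator. It assumes the two sources capture people independently and that the population is closed, assumptions that rarely hold exactly for administrative data. The published analysis (10) is more involved than this; the function names and counts below are hypothetical and for illustration only.

```python
def chapman_estimate(n1: int, n2: int, m: int) -> float:
    """Estimate total population size from two overlapping sources.

    n1: people observed in source 1
    n2: people observed in source 2
    m:  people observed in both sources
    Uses Chapman's estimator: (n1 + 1)(n2 + 1) / (m + 1) - 1.
    """
    if m < 0 or m > min(n1, n2):
        raise ValueError("overlap must be between 0 and min(n1, n2)")
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1


def hidden_population(n1: int, n2: int, m: int) -> float:
    """Estimated number of people not seen in either source."""
    observed = n1 + n2 - m  # distinct people seen at least once
    return chapman_estimate(n1, n2, m) - observed


# Toy numbers: 9 people in one source, 4 in another, 1 in both.
total = chapman_estimate(9, 4, 1)    # 24.0
unseen = hidden_population(9, 4, 1)  # 12.0
```

The intuition is that a small overlap between sources suggests a large population that neither source captures well, while a large overlap suggests most people have already been observed.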

New systems to further understand substance use disorders

As part of this work, we are building systems and creating analytic tools to better utilize the rich, but noisy, information in the Public Health Data Warehouse with the goals of:

  1. better understanding the current patterns and burden of substance use disorders,
  2. determining factors that are associated with worse community level outcomes for substance use disorders (such as overdose and development of related diseases, such as HIV or Hepatitis C), and
  3. creating monitoring systems for early detection of outbreaks of overdose or other substance use disorder-associated infectious diseases.
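To give a sense of what the third goal could look like at its simplest, the sketch below flags weeks in which a count exceeds a threshold derived from the preceding weeks. This is one basic aberration-detection rule chosen for illustration, not the monitoring system we are building; the function name, window length, threshold, and counts are all hypothetical assumptions.

```python
from statistics import mean, stdev


def flag_unusual_weeks(counts, baseline_weeks=8, threshold_sd=2.0):
    """Return indices of weeks whose count is unusually high.

    Each week is compared with the mean of the preceding `baseline_weeks`
    weeks plus `threshold_sd` standard deviations. The standard deviation
    is floored at 1.0 so that a very flat baseline does not make every
    small increase look like an outbreak (an illustrative choice).
    """
    flagged = []
    for i in range(baseline_weeks, len(counts)):
        baseline = counts[i - baseline_weeks:i]
        limit = mean(baseline) + threshold_sd * max(stdev(baseline), 1.0)
        if counts[i] > limit:
            flagged.append(i)
    return flagged


# Made-up weekly overdose counts for one community, with a spike in week 8.
weekly_counts = [3, 4, 2, 5, 3, 4, 3, 4, 12, 4]
alerts = flag_unusual_weeks(weekly_counts)  # [8]
```

A rule this simple ignores seasonality, reporting delays, and population size, which is part of why more careful statistical tools are needed for noisy administrative data.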

The work we are doing in Massachusetts can be translated to other states as they develop their systems.

There are exciting opportunities to use machine learning tools to determine the best ways to find signals in these data. For instance, research has shown that overdoses tend to cluster geographically (11). This allows us to look more closely at features of communities that could make them more susceptible to these outcomes. Machine learning could be useful in defining community-level signatures that might be associated with poor outcomes and can be monitored over time.

Ultimately, in partnership with public health departments, we are working to develop computational tools. These tools will be used to gain greater insight into substance use disorders and to monitor them more effectively, with the goal of developing and testing interventions to reduce their adverse impact on society.

If you would like to learn more about the work being done at BUSPH and collaboration opportunities at idea hub, please reach out, or visit

Laura Forsberg White, PhD, is an Associate Professor of Biostatistics at Boston University. She is interested in developing statistical tools to better understand the transmission dynamics and epidemiology of infectious diseases and substance use disorders, including Hepatitis C, HIV, TB, and COVID-19. She has developed methods to estimate transmission parameters. Her work involves the development of novel methodologies for the detection of disease outbreaks, as well as the application of statistical methods, including spatial statistics and time series, to public health surveillance systems.


1) Scholl, L., Seth, P., Kariisa, M., Wilson, N., Baldwin, G., 2018. Drug and Opioid-Involved Overdose Deaths — United States, 2013–2017. MMWR. Morb. Mortal. Wkly. Rep. 67, 1419–1427.

2) DSM 5 Criteria for Substance Use Disorders [WWW Document], n.d. URL (accessed 10.19.20).

3) Wu, L.T., Zhu, H., Swartz, M.S., 2016. Treatment utilization among persons with opioid use disorder in the United States. Drug Alcohol Depend. 169, 117–127.

4) Dutta, R., Roy, S., 2012. Mechanism(s) Involved in Opioid Drug Abuse Modulation of HAND. Curr. HIV Res. 10, 469–477.

5) Lambdin, B.H., Bluthenthal, R.N., Zibbell, J.E., Wenger, L., Simpson, K., Kral, A.H., 2019. Associations between perceived illicit fentanyl use and infectious disease risks among people who inject drugs. Int. J. Drug Policy 74, 299–304.

6) Daley, D.C., 2013. Family and social aspects of substance use disorders and treatment, in: Journal of Food and Drug Analysis. Elsevier Taiwan LLC, pp. S73–S76.

7) Stein, B., 2019. Addressing the U.S. Opioid Crisis: Using an Integrated Systems-Based Approach. RAND Corporation.

8) Chapter 55 of the Acts of 2015: An Act Requiring Certain Reports for Opiate Overdoses, 2015. The 190th General Court of the Commonwealth of Massachusetts.

9) Bharel, M., Bernson, D., Averbach, A., 2020. Using Data to Guide Action in Response to the Public Health Crisis of Opioid Overdoses. NEJM Catal. 1.

10) Barocas, J.A., White, L.F., Wang, J., Walley, A.Y., LaRochelle, M.R., Bernson, D., Land, T., Morgan, J.R., Samet, J.H., Linas, B.P., 2018. Estimated Prevalence of Opioid Use Disorder in Massachusetts, 2011-2015: A Capture-Recapture Analysis. Am. J. Public Health 108, 1675–1681.

11) Nesoff, E.D., Branas, C.C., Martins, S.S., 2020. The Geographic Distribution of Fentanyl-Involved Overdose Deaths in Cook County, Illinois. Am. J. Public Health 110, 98–105.
