Probabilistic Disease Surveillance (PDS) encompasses the systems, methods and research we have created to address uncertainty in the process of detecting and characterizing disease outbreaks.
Automated Surveillance of Overlapping Outbreaks and New Outbreak Diseases
Outbreaks of novel or re-emergent infectious diseases continue to occur at a concerning rate; examples include SARS, virulent strains of influenza such as H1N1, and COVID-19. In light of this history and the likely future occurrence of outbreaks of novel diseases, we created and evaluated new methods for detecting and characterizing such outbreaks.
We developed and implemented a computer system that monitors for disease outbreaks using clinical data from patients who visit emergency departments (EDs). The data include patient findings from ED reports and laboratory results.
Our basic approach was to first create computational models of known infectious diseases that cause outbreaks, as well as a model that represents non-outbreak diseases. A computer system, named ILI Tracker, uses those models to monitor for outbreaks of the known diseases. In particular, for each patient case ILI Tracker generates a probability for each of the modeled diseases. Summing these probabilities over all patients on a given day provides an estimate of the number of cases of each modeled disease. If none of the models predicts a subgroup of current patient cases well, this suggests that a novel outbreak disease may be present.
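The summation step described above can be illustrated with a minimal Python sketch. It is not the ILI Tracker code: the function name, the input layout (one date plus a dictionary of per-disease probabilities for each patient case) and the example values are assumptions made only for illustration.

```python
from collections import defaultdict


def expected_daily_counts(case_probabilities):
    """Sum per-patient disease probabilities into expected case counts per day.

    case_probabilities: iterable of (day, {disease: probability}) pairs,
    one pair per patient case (hypothetical layout, assumed for this sketch).
    """
    totals = defaultdict(lambda: defaultdict(float))
    for day, disease_probs in case_probabilities:
        for disease, probability in disease_probs.items():
            totals[day][disease] += probability
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {day: dict(by_disease) for day, by_disease in totals.items()}


# Made-up probabilities for three patient cases on two days.
cases = [
    ("2014-09-01", {"influenza": 0.10, "rsv": 0.05, "other": 0.85}),
    ("2014-09-01", {"influenza": 0.70, "rsv": 0.10, "other": 0.20}),
    ("2014-09-02", {"influenza": 0.20, "rsv": 0.60, "other": 0.20}),
]
counts = expected_daily_counts(cases)
print(counts["2014-09-01"])
```

With these made-up inputs, the two cases seen on 2014-09-01 contribute about 0.8 expected influenza cases in total, even though neither case is assigned to influenza with certainty.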
We evaluated the ILI Tracker outbreak-detection system using real ED data from a large healthcare system in Allegheny County, Pennsylvania, for the years 2012-2021. We evaluated how well the system tracks the presence of modeled infectious diseases when compared to known disease levels that were determined using laboratory testing. We found that ILI Tracker performed well for more prevalent diseases, such as influenza and respiratory syncytial virus (RSV). We also evaluated ILI Tracker’s ability to detect novel diseases. The results support that it achieved early detection of a historical outbreak of Enterovirus D68 in 2014 and of the outbreak of COVID-19 in early 2020 in Allegheny County. Further testing with simulated outbreaks of novel diseases was also positive. The outbreak-detection system is freely available and can be readily applied to monitor for regional outbreaks of disease from clinical data.
People
John M. Aronis, PhD
Research Scientist
Department of Biomedical Informatics
University of Pittsburgh
Gregory F. Cooper, MD, PhD
Professor of Biomedical Informatics
Department of Biomedical Informatics
University of Pittsburgh
Jessi Espino, MD, MS
Senior Research Scientist
Department of Biomedical Informatics
University of Pittsburgh
Harry Hochheiser, PhD
Associate Professor of Biomedical Informatics
Department of Biomedical Informatics
University of Pittsburgh
Marian G. Michaels, MD, MPH
Professor of Pediatrics and Surgery
Department of Pediatrics
University of Pittsburgh School of Medicine
Ye Ye, MBBS, MS, MSPH, PhD
Associate Professor
School of Public Health and Emergency Management
Southern University of Science and Technology
Shenzhen, China
Software and Models
Influenza Like Illness (ILI) Tracker
The tracker takes the output of the case detection system (CDS) and produces expected case counts for each modeled disease and the probability that a novel disease is circulating in the population. Source code and documentation for the ILI Tracker software are available from github.com/rodslaboratory/pds
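As a rough illustration of how a probability that a novel disease is circulating could be formed, the sketch below applies Bayes' rule to two competing hypotheses. It is a generic two-hypothesis calculation, not the algorithm implemented in ILI Tracker: the function name, the two log-likelihood inputs and the prior value are all assumptions made for this example.

```python
import math


def posterior_novel(log_lik_known_only, log_lik_with_novel, prior_novel=0.01):
    """Posterior probability that a novel disease is present, via Bayes' rule.

    log_lik_known_only: log-likelihood of the day's data under the modeled
        diseases alone (hypothetical input).
    log_lik_with_novel: log-likelihood when an additional, unmodeled disease
        is allowed (hypothetical input).
    prior_novel: assumed prior probability of a novel disease.
    """
    a = math.log(prior_novel) + log_lik_with_novel
    b = math.log(1.0 - prior_novel) + log_lik_known_only
    # Subtract the larger term before exponentiating to avoid underflow.
    m = max(a, b)
    return math.exp(a - m) / (math.exp(a - m) + math.exp(b - m))


# If both hypotheses explain the data equally well, the posterior equals the prior.
no_evidence = posterior_novel(-1000.0, -1000.0)
# If allowing a novel disease explains the data much better, the posterior rises.
strong_evidence = posterior_novel(-1000.0, -980.0)
print(no_evidence, strong_evidence)
```

Working in log space matters here because likelihoods over many patient cases are typically far too small to represent directly as floating-point numbers.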
Docker Container

Those familiar with Docker can use our Docker build file to build a container that functions as a PDS system. The pipeline in this container includes CDS, the ILI Tracker, Metamap Lite and the programs that link the components together. Source code, build files and documentation for the Docker build are available from github.com/rodslaboratory/pds-docker
Models
As a result of our work on this project, we trained 36 different models covering multiple years from 2012-2020 for influenza, respiratory syncytial virus, adenovirus, parainfluenza, human metapneumovirus, enterovirus and COVID-19. The input to each model consists of up to 696 discrete-valued variables derived from medical records, and the output is the presence or absence of a disease. These models can be used with the Weka desktop software or from Java. A Java program demonstrating usage is in the CDS GitHub project. The models are available at https://github.com/RodsLaboratory/CDS/tree/main/models
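For readers preparing input for the Weka desktop software, the sketch below shows one way a patient's discrete-valued findings might be written as a data row in Weka's ARFF format, where "?" marks a missing value. The variable names and values are hypothetical and are not the variables used by the published models; the attribute declarations expected by each model should be taken from the CDS project itself.

```python
def to_arff_row(variable_names, findings):
    """Render one patient's findings as a comma-separated ARFF data row.

    Variables with no recorded value are written as '?', the ARFF
    missing-value marker. Values are assumed to contain no spaces or
    commas, so no quoting is applied in this sketch.
    """
    return ",".join(str(findings.get(name, "?")) for name in variable_names)


# Hypothetical variables and one hypothetical patient record.
variables = ["fever", "cough", "wheezing", "influenza_lab_test"]
patient = {"fever": "T", "cough": "T", "influenza_lab_test": "positive"}
row = to_arff_row(variables, patient)
print(row)
```

Keeping the variable order fixed is the important design point: each value in a data row is matched by position to the attribute declarations in the ARFF header.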
Third Party Software and Libraries
We use Weka, Metamap Lite and the UMLS Metathesaurus in the pipeline. Weka is licensed under the GNU GPL. Metamap Lite is courtesy of the U.S. National Library of Medicine and is licensed under the BSD license. The UMLS Metathesaurus License requires you to respect the copyrights of the constituent vocabularies and to file a brief annual report on your use of the UMLS.
Software License and Attribution
The software and models for this work are licensed under the MIT license. The license requires only that the copyright notice and the MIT permission notice be included in all copies or substantial portions of the Software.
Copyright 2025 University of Pittsburgh
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Publications
Aronis JM, Ye Y, Espino J, Hochheiser H, Michaels MG, Cooper GF. A Bayesian system to detect and track outbreaks of influenza-like illnesses including novel diseases: Algorithm development and validation. JMIR Public Health Surveillance, 10 (2024) PMID: 38805611 PMCID: PMC11350309 DOI: 10.2196/57349.
Available at: https://doi.org/10.2196/57349
Aronis JM, Ye Y, Espino J, Michaels MG, Hochheiser H, Cooper GF. An evaluation of a Bayesian method to track outbreaks of known and novel influenza-like illnesses (under review) PMID: 40909849 PMCID: PMC12407607 DOI: 10.1101/2025.08.22.25334257.
Available at: https://medrxiv.org/cgi/content/short/2025.08.22.25334257v1
Sun Y, Gao Y, Bao R, Cooper GF, Espino J, Hochheiser H, Michaels MG, Aronis JM, Song C, Ye Y. Online transfer learning for RSV case detection. In: IEEE International Conference on Healthcare Informatics (ICHI) (2024) 512-521. PMID: 40391117 PMCID: PMC12086431 DOI: 10.1109/ichi61247.2024.00074. Available at: https://pmc.ncbi.nlm.nih.gov/articles/PMC12086431/
Gao Y, Bao R, Ji Y, Sun Y, Song C, Ferraro JP, Ye Y. Transfer Learning with clinical concept embeddings from large language models. In: AMIA Summits on Translational Science (2025) PMID: 40502269 PMCID: PMC12150738. Available at: https://pubmed.ncbi.nlm.nih.gov/40502269/
Song C, Gao Y, Bao R, Sun Y, Alicea JT, Ye Y. Probabilistic disease surveillance using large language models. Poster and abstract presented at the AMIA Informatics Summit (2025). Available at: https://amia.secure-platform.com/summit/solicitations/102005/sessiongallery/94376
Contact
Jessi Espino – juest4 at pitt dot edu
Acknowledgements
This work was supported by NLM grant R01LM013509,
“Automated Surveillance of Overlapping Outbreaks and New Outbreak Diseases”.
Harry Hochheiser and Jessi Espino also received support from:
MIDAS grant U24GM132013 and NIGMS grant R24GM153920
“MIDAS Coordination Center”.
Jessi Espino also received support from CDC grant 5U01IP001184
“Evaluating respiratory virus vaccine effectiveness in a large, diverse healthcare system”.
Ye Ye also received support from NLM grant R00LM013383
“Transfer Learning to Improve the Re-usability of Computable Biomedical Knowledge”.
Marian Michaels also received support from CDC grant U01IP001152
“New Vaccine Surveillance Network”.