March 2024
Bayesian multiple instance classification based on hierarchical probit regression
Danyi Xiong, Seongoh Park, Johan Lim, Tao Wang, Xinlei Wang
Ann. Appl. Stat. 18(1): 80-99 (March 2024). DOI: 10.1214/23-AOAS1780


In multiple instance learning (MIL), the response variable is predicted by features (or covariates) of one or more instances, which are collectively denoted as a bag. Learning the relationship between bags and instances is challenging because the data-generating mechanism by which instances contribute to the bag label is unknown and possibly complicated. MIL has been applied to a variety of real-world problems, mostly supervised tasks such as molecule activity prediction, protein binding affinity prediction, object detection, and computer-aided diagnosis. However, to date, most off-the-shelf MIL methods have been developed in the computer science domain; they focus on improving prediction performance and spend little effort on the explainability of the algorithm. In this article, a Bayesian multiple instance learning model based on probit regression (MICProB) is proposed, which contributes substantially to the suite of statistical methodologies for MIL. MICProB is composed of two nested probit regression models: the inner model predicts primary instances, which are considered the “important” ones that determine the bag label, and the outer model predicts bag-level responses based on the primary instances estimated by the inner model. The posterior distribution of MICProB can be conveniently approximated using a Gibbs sampler, and prediction for new bags can be performed in a fully integrated Bayesian way. We evaluate the performance of MICProB against 15 benchmark methods and demonstrate its competitiveness in simulation and real-data examples.
Beyond its ability to identify primary instances, MICProB offers clear advantages over existing optimization-based approaches: a transparent model structure, straightforward statistical inference on quantities related to model parameters, and favorable interpretability of covariate effects on the bag-level response.
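
To make the nested structure concrete, the following is a minimal generative sketch of a two-level probit model for a single bag. It is not the authors' specification, which the abstract does not give in detail. The function name `simulate_bag`, the coefficient names `beta_inner` and `beta_outer`, the use of the mean of the primary instances' features as the bag-level covariate, and the zero-vector fallback when no instance is primary are all assumptions made for illustration.

```python
import numpy as np


def simulate_bag(X, beta_inner, beta_outer, rng):
    """Draw primary-instance indicators and a bag label for one bag.

    X is an (m, d) array holding d features for each of m instances.
    All modelling choices here are illustrative assumptions, not MICProB itself.
    """
    m, d = X.shape

    # Inner probit: a latent normal utility per instance; an instance is
    # treated as "primary" when its utility is positive.
    u = X @ beta_inner + rng.standard_normal(m)
    delta = (u > 0).astype(int)

    # Assumed aggregation: average the features of the primary instances.
    # If no instance is primary, fall back to a zero vector (also an assumption).
    if delta.sum() > 0:
        agg = X[delta == 1].mean(axis=0)
    else:
        agg = np.zeros(d)

    # Outer probit: the bag label depends only on the primary instances.
    z = agg @ beta_outer + rng.standard_normal()
    y = int(z > 0)
    return delta, y


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(5, 2))  # one bag with 5 instances and 2 features
    delta, y = simulate_bag(X, np.array([1.0, -0.5]), np.array([0.8, 0.3]), rng)
    print("primary indicators:", delta, "bag label:", y)
```

In a sampler, the latent utilities `u` and `z` are the quantities that a data-augmentation Gibbs scheme for probit regression would typically draw alongside the coefficients; the sketch only simulates forward from fixed coefficients and does not perform inference.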

Funding Statement

This work was supported by NIH grants R01CA258584 (PIs: T. Wang and X. Wang), R15GM131390 (PI: X. Wang), Cancer Prevention and Research Institute of Texas (CPRIT) grant RP190208 (PI: T. Wang; subcontract PI: X. Wang), and National Research Foundation of Korea (NRF) grant 2021R1G1A1005641 funded by the Korean Ministry of Science and ICT (PI: S. Park).


The corresponding author, Dr. Xinlei Wang, acknowledges that this work was partially done at Southern Methodist University. Dr. Xinlei Wang is also affiliated with the Center for Data Science Research and Education, College of Science, University of Texas at Arlington. In loving memory of Ze Zhang, who was a great friend and colleague, all authors would like to express their deepest gratitude for her kindness and support.


Citation

Danyi Xiong, Seongoh Park, Johan Lim, Tao Wang, Xinlei Wang. "Bayesian multiple instance classification based on hierarchical probit regression." Ann. Appl. Stat. 18 (1) 80-99, March 2024.


Received: 1 November 2021; Revised: 1 December 2022; Published: March 2024
First available in Project Euclid: 31 January 2024

Digital Object Identifier: 10.1214/23-AOAS1780

Keywords: Bayesian inference, Binary classification, Gibbs sampling, primary instance, Weakly supervised learning

Rights: Copyright © 2024 Institute of Mathematical Statistics


