March 2024 Selective inference for sparse multitask regression with applications in neuroimaging
Snigdha Panigrahi, Natasha Stewart, Chandra Sripada, Elizaveta Levina
Ann. Appl. Stat. 18(1): 445-467 (March 2024). DOI: 10.1214/23-AOAS1796

Abstract

Multitask learning is frequently used to model a set of related response variables from the same set of features, improving predictive performance and modeling accuracy relative to methods that handle each response variable separately. Despite the potential of multitask learning to yield more powerful inference than single-task alternatives, prior work in this area has largely omitted uncertainty quantification. Our focus in this paper is a common multitask problem in neuroimaging, where the goal is to understand the relationship between multiple cognitive task scores (or other subject-level assessments) and brain connectome data collected from imaging. We propose a framework for selective inference to address this problem, with the flexibility to: (i) jointly identify the relevant predictors for each task through a sparsity-inducing penalty and (ii) conduct valid inference in a model based on the estimated sparsity structure. Our framework offers a new conditional procedure for inference, based on a refinement of the selection event that yields a tractable selection-adjusted likelihood. This gives an approximate system of estimating equations for maximum likelihood inference, solvable via a single convex optimization problem, and enables us to efficiently form confidence intervals with approximately the correct coverage. Applied to both simulated data and data from the Adolescent Brain Cognitive Development (ABCD) study, our selective inference methods yield tighter confidence intervals than commonly used alternatives, such as data splitting. We also demonstrate through simulations that multitask learning with selective inference can more accurately recover true signals than single-task methods.
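For orientation, the data-splitting baseline mentioned in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' conditional procedure: it selects predictors on one half of the data with a plain row-wise group-lasso penalty (a simpler stand-in for the paper's sparsity-inducing penalty), then forms ordinary least-squares intervals on the held-out half using a normal approximation. The function names `row_soft_threshold`, `joint_sparse_fit`, and `split_and_infer`, and the tuning parameter `lam`, are hypothetical.

```python
import numpy as np
from statistics import NormalDist


def row_soft_threshold(B, t):
    """Shrink each row of B toward zero by t in Euclidean norm (group-lasso prox)."""
    norms = np.linalg.norm(B, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - t / np.maximum(norms, 1e-12))
    return B * scale


def joint_sparse_fit(X, Y, lam, n_iter=500):
    """Proximal gradient for (1/2n)||Y - XB||_F^2 + lam * sum_j ||B_j||_2.

    Assumption: a row-wise penalty, so a feature is kept or dropped for all tasks.
    """
    n, p = X.shape
    B = np.zeros((p, Y.shape[1]))
    # Step size = 1 / Lipschitz constant of the gradient of the smooth part.
    step = n / np.linalg.norm(X, 2) ** 2
    for _ in range(n_iter):
        grad = -X.T @ (Y - X @ B) / n
        B = row_soft_threshold(B - step * grad, step * lam)
    return B


def split_and_infer(X, Y, lam, level=0.95, seed=0):
    """Select features on one random half, then build OLS intervals on the other."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    perm = rng.permutation(n)
    sel_idx, inf_idx = perm[: n // 2], perm[n // 2:]

    # Selection step: uses only the first half of the samples.
    B = joint_sparse_fit(X[sel_idx], Y[sel_idx], lam)
    support = np.flatnonzero(np.linalg.norm(B, axis=1) > 0)
    if support.size == 0:
        return support, None, None

    # Inference step: uses only the held-out half, so the selection
    # does not bias the intervals.
    Xs = X[inf_idx][:, support]
    G = np.linalg.inv(Xs.T @ Xs)
    coef = G @ Xs.T @ Y[inf_idx]
    resid = Y[inf_idx] - Xs @ coef
    dof = len(inf_idx) - support.size
    sigma2 = (resid ** 2).sum(axis=0) / dof  # one noise variance per task
    se = np.sqrt(np.outer(np.diag(G), sigma2))
    z = NormalDist().inv_cdf(0.5 + level / 2)  # normal approximation
    return support, coef - z * se, coef + z * se
```

Because only half of the samples inform each step, intervals from this baseline tend to be wider than those from a procedure that reuses all of the data, which is the comparison the abstract draws.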

Funding Statement

S. Panigrahi’s research is supported in part by NSF grants 1951980 and 2113342.
N. Stewart is supported in part by NSF RTG grant 1646108 and a Rackham Science Award from the University of Michigan.
E. Levina’s research is supported in part by NSF grants 1916222, 2052918, and 2210439 and NIH grant R01MH123458.

Acknowledgments

We would like to thank Ji Zhu and Daniel Kessler for their helpful feedback throughout the project, Qianhua Shan for guidance in accessing the ABCD dataset, Aman Taxali for generating the cartographic maps and helping to run code in parallel on the University of Michigan’s high-performance computing cluster, Michael Angstadt for creating the NDA study associated with this paper, and Tian Xie and Qiang Chen for testing different simulations as part of their undergraduate research project.

Data used in the preparation of this article were obtained from the Adolescent Brain Cognitive Development (ABCD) Study (https://abcdstudy.org), held in the NIMH Data Archive (NDA). This is a multisite, longitudinal study designed to recruit more than 10,000 children, aged 9–10, and follow them over 10 years into early adulthood. The ABCD Study is supported by the National Institutes of Health and additional federal partners under award numbers U01DA041022, U01DA041028, U01DA041048, U01DA041089, U01DA041106, U01DA041117, U01DA041120, U01DA041134, U01DA041148, U01DA041156, U01DA041174, U24DA041123, and U24DA041147. A full list of supporters is available at https://abcdstudy.org/nih-collaborators. A listing of participating sites and a complete listing of the study investigators can be found at https://abcdstudy.org/consortium_members/. ABCD consortium investigators designed and implemented the study and/or provided data but did not necessarily participate in analysis or writing of this report. This manuscript reflects the views of the authors and may not reflect the opinions or views of the NIH or ABCD consortium investigators. The ABCD data repository grows and changes over time. The ABCD data used in this report came from NDA Study 721, 10.15154/1504041, which can be found at https://nda.nih.gov/study.html?id=721. The specific NDA study associated with this report is NDA Study 1689, 10.15154/1527789.

Citation

Snigdha Panigrahi, Natasha Stewart, Chandra Sripada, Elizaveta Levina. "Selective inference for sparse multitask regression with applications in neuroimaging." Ann. Appl. Stat. 18(1): 445-467, March 2024. https://doi.org/10.1214/23-AOAS1796

Information

Received: 1 June 2022; Revised: 1 April 2023; Published: March 2024
First available in Project Euclid: 31 January 2024

Digital Object Identifier: 10.1214/23-AOAS1796

Keywords: fMRI data, joint sparsity, multilevel lasso, multitask learning, neuroimaging, postselection inference, selective inference

Rights: Copyright © 2024 Institute of Mathematical Statistics

JOURNAL ARTICLE
23 PAGES
