September 2022
Co-clustering of multivariate functional data for the analysis of air pollution in the South of France
Charles Bouveyron, Julien Jacques, Amandine Schmutz, Fanny Simões, Silvia Bottini
Ann. Appl. Stat. 16(3): 1400-1422 (September 2022). DOI: 10.1214/21-AOAS1547

Abstract

Nowadays, air pollution is a major threat to public health, with clear links to many diseases, especially cardiovascular ones. The spatiotemporal study of pollution is of great interest for governments and local authorities when deciding on public alerts or new city policies against pollution increase. The aim of this work is to study spatiotemporal profiles of environmental data collected in the south of France (Région Sud) by the public agency AtmoSud. The idea is to better understand the exposure of inhabitants to pollutants over a large territory with important differences in terms of geography and urbanism. The data consist of daily measurements of five environmental variables, namely three pollutants (PM10, NO2, O3) and two meteorological factors (pressure and temperature), recorded over six years. These data can be seen as multivariate functional data: quantitative entities evolving along time, for which there is a growing need for methods to summarize and understand them. For this purpose, a novel co-clustering model for multivariate functional data is defined. The model is based on a functional latent block model which assumes, for each co-cluster, a probabilistic distribution for multivariate functional principal component scores. A stochastic EM algorithm, embedding a Gibbs sampler, is proposed for model inference, as well as a model selection criterion for choosing the number of co-clusters. The application of the proposed co-clustering algorithm to the environmental data of the Région Sud made it possible to divide the region, composed of 357 zones, into six macroareas with common exposure to pollution. We showed that pollution profiles vary according to the seasons, and that the patterns are similar across the six years studied.
These results can be used by local authorities to develop specific programs to reduce pollution at the macroarea level, and to identify periods of the year with high pollution peaks in order to set up targeted health prevention programs. Overall, the proposed co-clustering approach is a powerful resource for analysing multivariate functional data, identifying intrinsic data structure and summarizing variable profiles over long periods of time.
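The abstract outlines a pipeline: summarize each cell's multivariate curves by functional principal component scores, model those scores with a latent block model, fit it with a stochastic EM algorithm that embeds a Gibbs sampler, and pick the numbers of co-clusters with a selection criterion. The Python sketch below illustrates that pipeline under simplifying assumptions that are not taken from the paper: the FPCA is computed once on the standardized, concatenated discretized curves (rather than per co-cluster); each block (k, l) is an isotropic Gaussian on the scores; the final partition is the most frequent sampled label after burn-in; and ICL-BIC, a standard criterion for latent block models, stands in for the paper's criterion. The data layout (rows as zones, columns as time periods, each cell holding several variables' curves) and all function names (`mfpca_scores`, `sem_gibbs_lbm`, `icl_bic`, ...) are hypothetical; this is not the authors' implementation.

```python
import numpy as np


def mfpca_scores(curves, n_comp):
    """Simplified global multivariate FPCA (assumption, not the paper's per-block FPCA).

    curves: (n_rows, n_cols, n_vars, n_times), one set of discretized curves per cell.
    Returns principal component scores of shape (n_rows, n_cols, n_comp).
    """
    n, p, v, t = curves.shape
    flat = curves.reshape(n * p, v, t)
    # Standardize each variable so pollutants and weather variables share a scale
    flat = (flat - flat.mean(axis=(0, 2), keepdims=True)) / flat.std(axis=(0, 2), keepdims=True)
    flat = flat.reshape(n * p, v * t)
    flat = flat - flat.mean(axis=0)
    # Principal directions of the concatenated curves via SVD
    _, _, vt = np.linalg.svd(flat, full_matrices=False)
    return (flat @ vt[:n_comp].T).reshape(n, p, n_comp)


def cell_logdens(X, mu, var):
    """log N(x_ij; mu_kl, var_kl * I) for every cell (i, j) and block (k, l)."""
    d = X.shape[2]
    sq = ((X[:, :, None, None, :] - mu[None, None]) ** 2).sum(-1)  # (n, p, K, L)
    return -0.5 * (sq / var + d * np.log(2 * np.pi * var))


def m_step(X, z, w, K, L):
    """M step: proportions and isotropic Gaussian parameters of each block."""
    d = X.shape[2]
    Z, W = np.eye(K)[z], np.eye(L)[w]  # one-hot row / column memberships
    counts = np.maximum(np.einsum("ik,jl->kl", Z, W), 1)
    mu = np.einsum("ik,jl,ijd->kld", Z, W, X) / counts[..., None]
    resid = ((X - mu[z[:, None], w[None, :]]) ** 2).sum(-1)  # (n, p)
    var = np.einsum("ik,jl,ij->kl", Z, W, resid) / (d * counts)
    return Z.mean(0), W.mean(0), mu, np.maximum(var, 1e-6)


def sample_categorical(logp, rng):
    """Draw one label per row from unnormalized log-probabilities."""
    prob = np.exp(logp - logp.max(axis=1, keepdims=True))
    prob /= prob.sum(axis=1, keepdims=True)
    u = rng.random((prob.shape[0], 1))
    return np.minimum((prob.cumsum(axis=1) < u).sum(axis=1), prob.shape[1] - 1)


def sem_gibbs_lbm(X, K, L, n_iter=60, burn_in=30, seed=0):
    """SEM-Gibbs for a Gaussian latent block model on a score tensor X (n, p, d)."""
    rng = np.random.default_rng(seed)
    n, p, _ = X.shape
    z, w = rng.integers(K, size=n), rng.integers(L, size=p)
    pi, rho, mu, var = m_step(X, z, w, K, L)
    z_hist, w_hist = [], []
    for it in range(n_iter):
        ld = cell_logdens(X, mu, var)
        # SE step: Gibbs draws of row labels given columns, then columns given rows
        z = sample_categorical(np.einsum("ijkl,jl->ik", ld, np.eye(L)[w]) + np.log(pi + 1e-12), rng)
        w = sample_categorical(np.einsum("ijkl,ik->jl", ld, np.eye(K)[z]) + np.log(rho + 1e-12), rng)
        pi, rho, mu, var = m_step(X, z, w, K, L)  # M step on the completed data
        if it >= burn_in:
            z_hist.append(z)
            w_hist.append(w)
    # Final partitions: most frequent sampled label after burn-in
    z_hat = np.array([np.bincount(c, minlength=K).argmax() for c in np.array(z_hist).T])
    w_hat = np.array([np.bincount(c, minlength=L).argmax() for c in np.array(w_hist).T])
    return z_hat, w_hat


def icl_bic(X, z, w, K, L):
    """ICL-BIC of a partition, with nu = d + 1 free parameters per block."""
    n, p, d = X.shape
    pi, rho, mu, var = m_step(X, z, w, K, L)
    ld = cell_logdens(X, mu, var)[np.arange(n)[:, None], np.arange(p), z[:, None], w]
    loglik = ld.sum() + np.log(pi[z]).sum() + np.log(rho[w]).sum()
    penalty = (K - 1) * np.log(n) + (L - 1) * np.log(p) + K * L * (d + 1) * np.log(n * p)
    return loglik - penalty / 2


# Usage on synthetic data: 2 row groups x 2 column groups, 2 variables per cell
rng = np.random.default_rng(1)
grid = np.linspace(0, 1, 30)
amp = (1 + 2 * np.repeat([0, 1], 10))[:, None, None, None]  # row group sets amplitude
phase = (0.25 * np.repeat([0, 1], 6))[None, :, None, None]  # column group sets phase
curves = amp * np.sin(2 * np.pi * (grid + phase)) + 0.1 * rng.standard_normal((20, 12, 2, 30))
scores = mfpca_scores(curves, n_comp=3)
z_hat, w_hat = sem_gibbs_lbm(scores, K=2, L=2)
icl = icl_bic(scores, z_hat, w_hat, K=2, L=2)
```

Looping over candidate (K, L) pairs and keeping the pair with the largest `icl_bic` value mirrors the model-selection step described in the abstract. Because SEM-Gibbs starts from random labels, running several seeds per pair and keeping the best criterion value is a cautious choice.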

Funding Statement

This research has benefited from the support of the “FMJH Research Initiative Data Science for Industry.” This work has also been supported by the French government, through the 3IA Côte d’Azur and UCAJEDI Investments in the Future project managed by the National Research Agency (ANR) with the reference numbers ANR-19-P3IA-0002 and ANR-15-IDEX-01.

Acknowledgments

The authors would like to extend special thanks to the AtmoSud institute (http://atmosud.org) for providing the data.

Citation


Charles Bouveyron. Julien Jacques. Amandine Schmutz. Fanny Simões. Silvia Bottini. "Co-clustering of multivariate functional data for the analysis of air pollution in the South of France." Ann. Appl. Stat. 16 (3) 1400 - 1422, September 2022. https://doi.org/10.1214/21-AOAS1547

Information

Received: 1 October 2020; Revised: 1 July 2021; Published: September 2022
First available in Project Euclid: 19 July 2022

MathSciNet: MR4455886
zbMATH: 1498.62282
Digital Object Identifier: 10.1214/21-AOAS1547

Keywords: co-clustering, Latent Block Model, multivariate functional data, Pollution, SEM-Gibbs algorithm

Rights: Copyright © 2022 Institute of Mathematical Statistics

JOURNAL ARTICLE
23 PAGES
