The Annals of Applied Statistics

Clustering Chlorophyll-a satellite data using quantiles

Carlo Gaetan, Paolo Girardi, Roberto Pastres, and Antoine Mangin

Full-text: Open access


Water quality indicators are crucial for identifying risks to the environment, society and human health. In particular, Chlorophyll type A (Chl-a) is a shared indicator of trophic status, and in monitoring activities it can help reveal locally dangerous behaviours (for example, anoxic events). In this paper we consider a comprehensive data set, covering the whole Adriatic Sea and derived from Ocean Colour satellite data for the period 2002–2012, with the aim of identifying homogeneous areas. Such zonation is becoming extremely relevant for the implementation of European policies, such as the Marine Strategy Framework Directive. As an alternative to clustering based on an “average” value over the whole period, we propose a new clustering procedure for the time series. The procedure shares some similarities with functional data clustering and combines nonparametric quantile regression with an agglomerative clustering algorithm. This approach makes it possible to account for features of the time series such as nonstationarity in the marginal distribution and the presence of missing data. A small simulation study is also presented to illustrate the relative merits of the procedure.
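The general idea of clustering time series by their quantiles rather than by an average can be sketched as follows. This is only an illustrative toy, not the authors' procedure: it swaps in empirical quantiles within seasonal bins for the nonparametric quantile regression of the paper, and the Euclidean distance, average linkage, simulated lognormal series and all function names (`simulate`, `binned_quantiles`, `agglomerate`) are assumptions made for the example.

```python
import math
import random

random.seed(0)

def simulate(sigma, n_days=3650, p_missing=0.3):
    """Hypothetical daily series: same seasonal median, spread set by sigma."""
    times, values = [], []
    for t in range(n_days):
        season = 0.5 * math.sin(2 * math.pi * t / 365.0)
        v = math.exp(season + sigma * random.gauss(0, 1))
        times.append(t)
        # None marks a missing observation (e.g. cloud cover)
        values.append(None if random.random() < p_missing else v)
    return times, values

def binned_quantiles(times, values, taus, n_bins=12, period=365.0):
    """Crude nonparametric quantile estimate: empirical quantiles of the
    observed values inside each seasonal bin (assumes no bin is empty)."""
    bins = [[] for _ in range(n_bins)]
    for t, v in zip(times, values):
        if v is not None:  # missing values are simply skipped
            bins[int((t % period) / period * n_bins)].append(v)
    features = []
    for obs in bins:
        obs.sort()
        for tau in taus:
            # nearest-rank empirical quantile
            features.append(obs[min(len(obs) - 1, int(tau * len(obs)))])
    return features

def distance(a, b):
    """Euclidean distance between two vectors of estimated quantiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def agglomerate(features, n_clusters):
    """Naive average-linkage agglomerative clustering."""
    clusters = [[i] for i in range(len(features))]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = sum(distance(features[p], features[q])
                        for p in clusters[i] for q in clusters[j])
                d /= len(clusters[i]) * len(clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters.pop(j)  # merge the closest pair
    return clusters

# Series 0-2 have low variability, series 3-5 high variability;
# all six share the same seasonal median.
series = [simulate(0.2) for _ in range(3)] + [simulate(0.8) for _ in range(3)]
taus = [0.1, 0.5, 0.9]
features = [binned_quantiles(t, v, taus) for t, v in series]
clusters = agglomerate(features, n_clusters=2)
print(sorted(sorted(c) for c in clusters))
```

In this toy setting the two groups differ mainly in their upper and lower quantiles rather than in their median, which is the kind of feature a summary based on a single central value would tend to miss.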

Article information

Ann. Appl. Stat., Volume 10, Number 2 (2016), 964-988.

Received: February 2015
Revised: March 2016
First available in Project Euclid: 22 July 2016

Keywords: Functional data clustering; quantile sheet; nonparametric regression; clustering methods; surface water classification; satellite data


Gaetan, Carlo; Girardi, Paolo; Pastres, Roberto; Mangin, Antoine. Clustering Chlorophyll-a satellite data using quantiles. Ann. Appl. Stat. 10 (2016), no. 2, 964–988. doi:10.1214/16-AOAS923.


