Statistics Surveys

Scalar-on-function regression for predicting distal outcomes from intensively gathered longitudinal data: Interpretability for applied scientists

John J. Dziak, Donna L. Coffman, Matthew Reimherr, Justin Petrovich, Runze Li, Saul Shiffman, and Mariya P. Shiyko

Full-text: Open access


Researchers are sometimes interested in predicting a distal or external outcome (such as smoking cessation at follow-up) from the trajectory of an intensively recorded longitudinal variable (such as urge to smoke). This can be done in a semiparametric way via scalar-on-function regression. However, the resulting fitted coefficient regression function requires special care for correct interpretation, as it represents the joint relationship of time points to the outcome, rather than a marginal or cross-sectional relationship. We provide practical guidelines, based on experience with scientific applications, to help practitioners interpret their results, and we illustrate these ideas using data from a smoking cessation study.
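
The difference between a joint and a marginal relationship can be illustrated with a small simulation. What follows is a minimal sketch, not the authors' analysis. It assumes a continuous outcome (rather than a binary one such as cessation), densely and regularly observed noise-free trajectories, and a three-term cosine basis. All variable names and parameter values are hypothetical. The fitted model is y_i = alpha + ∫ x_i(t) beta(t) dt + e_i, with the integral approximated by the trapezoid rule.

```python
import numpy as np

rng = np.random.default_rng(2019)
n, m, K = 1000, 51, 3            # subjects, grid points, basis functions (all assumed)
t = np.linspace(0.0, 1.0, m)

# Trapezoid quadrature weights for approximating integrals over [0, 1].
w = np.full(m, t[1] - t[0])
w[[0, -1]] *= 0.5

# Cosine basis: 1, sqrt(2) cos(pi t), sqrt(2) cos(2 pi t).
Phi = np.column_stack(
    [np.ones(m)] + [np.sqrt(2) * np.cos(k * np.pi * t) for k in range(1, K)]
)

# Simulated trajectories: random basis scores, so values at different
# time points are strongly correlated within a subject.
scores = rng.normal(size=(n, K)) * np.array([2.0, 1.0, 0.5])
X = scores @ Phi.T               # n x m matrix of x_i(t_j)

# True coefficient function: zero at t = 0, rising to 2 at t = 1.
beta_true = 1.0 - np.cos(np.pi * t)
y = 0.5 + (X * w) @ beta_true + rng.normal(scale=0.2, size=n)

# Scalar-on-function fit: expand beta(t) in the basis and regress y on
# the integrated products of each trajectory with each basis function.
Z = (X * w) @ Phi
design = np.column_stack([np.ones(n), Z])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
beta_hat = Phi @ coef[1:]        # estimated joint coefficient function

# Marginal (cross-sectional) slopes: a separate simple regression of y
# on x(t_j) at each time point, ignoring all other time points.
Xc = X - X.mean(axis=0)
marginal_slope = Xc.T @ (y - y.mean()) / (Xc ** 2).sum(axis=0)

print("beta_hat at t=0:      ", round(float(beta_hat[0]), 3))
print("marginal slope at t=0:", round(float(marginal_slope[0]), 3))
```

In this simulated setting the true coefficient function is zero at t = 0, and the fitted joint coefficient should be close to zero there, yet the pointwise slope of y on x(0) is clearly positive because early and late values of each trajectory are correlated. This is the sense in which the fitted coefficient function should not be read as a set of cross-sectional associations.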

Article information

Statist. Surv., Volume 13 (2019), 150-180.

Received: November 2018
First available in Project Euclid: 6 November 2019

Digital Object Identifier: doi:10.1214/19-SS126

Primary: 62-02: Research exposition (monographs, survey articles)
Secondary: 62M10: Time series, auto-correlation, regression, etc. [See also 91B84]; 62G08: Nonparametric regression

Keywords: Distal outcomes; functional regression; intensive longitudinal data; scalar-on-function regression; trajectories

Creative Commons Attribution 4.0 International License.


Dziak, John J.; Coffman, Donna L.; Reimherr, Matthew; Petrovich, Justin; Li, Runze; Shiffman, Saul; Shiyko, Mariya P. Scalar-on-function regression for predicting distal outcomes from intensively gathered longitudinal data: Interpretability for applied scientists. Statist. Surv. 13 (2019), 150–180. doi:10.1214/19-SS126.

