

Full article title A new numerical method for processing longitudinal data: Clinical applications
Journal Epidemiology Biostatistics and Public Health
Author(s) Stura, Ilaria; Perracchione, Emma; Migliaretti, Giuseppe; Cavallo, Franco
Author affiliation(s) Università di Torino, Università di Padova
Primary contact Email: Ilaria dot stura at unito dot it
Year published 2018
Volume and issue 15(2)
Page(s) e12881
DOI 10.2427/12881
ISSN 2282-0930
Distribution license Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International
Website https://ebph.it/index.php/ebph/article/view/12881
Download https://ebph.it/article/view/12881/11630 (PDF)

Abstract

Background: Processing longitudinal data is a computational issue that arises in many applications, such as aircraft design, medicine, optimal control, and weather forecasting. Given some longitudinal data, i.e., scattered measurements, the aim is to approximate the parameters involved in the dynamics of the considered process. For this problem, a large variety of well-known methods have already been developed.

Results: Here, we propose an alternative approach to be used as an effective and accurate tool for parameter fitting and for the prediction of individual trajectories from sparse longitudinal data. In particular, our mixed model, which combines radial basis functions (RBFs) with stochastic optimization algorithms (SOMs), is presented and tested on clinical data. We also carry out comparisons with other methods that are widely used in this framework.

Conclusion: The main advantages of the proposed method are the flexibility with respect to the datasets, meaning that it is effective also for truly irregularly distributed data, and its ability to extract reliable information on the evolution of the dynamics.

Keywords: statistical method, radial basis function, stochastic optimization algorithm, longitudinal data

Introduction

Longitudinal data are the object of study in many fields, e.g., sociology, meteorology, and medicine. In medicine, repeated measurements are used to monitor patients’ behaviors and to adjust therapies accordingly. However, many problems occur when these data are analyzed. Indeed, each time series could have a different number of observations, and the observations may not be equally spaced. In addition, the sampling period could vary from patient to patient, and measurement errors and missing data often occur. Since common methods such as linear regression usually fail in these cases, recent research is directed towards more robust statistical methods. For instance, longitudinal data are commonly analyzed using parametric models such as Bayesian ones[1], as well as functional data analysis (FDA).[2][3] In both cases, a large amount of data is required in order to model the behavior of the studied variable(s). These methods, in fact, try to find an "average curve" using all the data, including truncated series and observations with missing information.

However, in clinical applications, an estimate of the future dynamics of a single series, given a few previous values, may be needed; think, for instance, of tumor volumes during a treatment, the height or weight of children during growth, and the concentration of some substance in the body. Each patient is different and could have different growth behavior and different growth parameters, so an "average curve" may not be sufficient. An important piece of information could be, for example, the possible future development of the subject, given his/her previous growth and the clinical background (e.g., treatments). These predictions could be compared with the real dynamics in order to see whether the response of the patient to the treatment is stable (the parameters do not vary in the future) or not (the parameters change).

The aim of our work is to propose a numerical tool that can provide information on the future dynamics given a few follow-up data points. To this end, we first model longitudinal data via mathematical models that are widely used in population dynamics. On the one hand, we aim at validating such a model by approximating the parameters involved in the dynamics. On the other hand, we are also interested in giving reliable information on the future dynamics of the curves.

In order to achieve our goal, we propose a numerical tool based on optimization methods coupled with interpolation techniques. Specifically, we approximate the parameters involved in the dynamics by means of stochastic optimization algorithms (SOMs).[4][5][6][7] Moreover, for each data series, we improve the performance of the optimization tools by means of radial basis function (RBF) interpolation; see Fasshauer and McCourt[8] for a general overview and Cavoretto et al.[9][10] for particular instances on the topic and applications. In the interpolation process, we also take into account the critical computational issue of carrying out stable computations. For this reason, and since data are subject to noise, we adopt a kind of Tikhonov regularization.[11]

The method, namely RBF-SOM, is here tested on two different datasets:

  • height measurements of children with a diagnosis of growth hormone deficiency (GHD) during treatment, and
  • prostate-specific antigen (PSA) values of prostatectomized patients with a recurrence of prostate cancer.

In the next section of this paper, the RBF-SOM technique is described. Afterwards, the two datasets used for the validation are presented. The "Results" section is devoted to the numerical results and it is divided into two subsections: in the first one, all the data of each series are considered in order to reconstruct the curves and approximate the parameters, while, in the second one, only a few initial data of each series are used to predict the curve behavior. The last two sections offer a discussion and conclusions.

Methods

This section is devoted to describing the method used to fit a given data series and to approximate the parameters involved in the dynamics.

Given several scattered measurements sampled at different times $t_1, \ldots, t_N$, the basic idea of the RBF-SOM proposed here consists in considering the theoretical function f, depending on the time t and on several parameters λ = (λ1,..., λp), and approximating such parameters in order to obtain reliable information on the biological or physical phenomenon.


In the proposed examples, we use, as theoretical growth curve f, the so-called Gompertzian function:

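$$f(t, \lambda) = \lambda_2 \exp\left( \ln\left( \frac{f_0}{\lambda_2} \right) e^{-\lambda_1 (t - t_0)} \right),$$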

where f0 is the measurement at time t0 (i.e., the first measurement), λ1 is the growth rate, and λ2 is the carrying capacity, i.e., the maximum value that can be asymptotically achieved by f.

The Gompertzian function is characterized by a fast-growing initial period followed by a progressive slowdown, reaching a carrying capacity after a certain time. Depending on the values of its parameters, this curve is able to model a variety of types of growth, from human growth to cancer cell growth; see [12-16] for details. For this reason, we will use in Section 5 the same function for both datasets. Moreover, its form is particularly suitable for this study because the parameters cannot be estimated via simple methods such as least squares approximation.
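As a minimal sketch, the Gompertzian curve described above can be evaluated in a few lines of Python. The closed form used here, f(t) = λ2 exp(ln(f0/λ2) exp(-λ1 (t - t0))), is one common parameterization that equals f0 at t0 and tends to the carrying capacity λ2; it is an assumption of this sketch rather than a formula taken from the text, and the function name and numerical values are hypothetical.

```python
import math

def gompertz(t, f0, t0, growth_rate, capacity):
    """Gompertz curve through (t0, f0) with asymptote `capacity`.

    Assumed form: f(t) = K * exp(ln(f0 / K) * exp(-r * (t - t0))),
    with r = growth_rate (lambda_1) and K = capacity (lambda_2).
    """
    return capacity * math.exp(
        math.log(f0 / capacity) * math.exp(-growth_rate * (t - t0))
    )

# Hypothetical values chosen only for illustration (not patient data).
f0, t0, rate, cap = 100.0, 5.0, 0.3, 175.0
print(gompertz(t0, f0, t0, rate, cap))          # equals f0 at t = t0
print(gompertz(t0 + 100.0, f0, t0, rate, cap))  # approaches the carrying capacity
```

With this form, the growth is fastest near t0 and slows down as the curve approaches the carrying capacity, which matches the qualitative behavior described above.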

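Trivially, the parameters are approximated by finding

$$\lambda^{*} = \arg\min_{\lambda} \sum_{i=1}^{N} \left( f(t_i, \lambda) - f_i \right)^2,$$

where $f_i$ denotes the measurement taken at time $t_i$, $i = 1, \ldots, N$.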
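To make the fitting problem concrete, the following sketch writes a misfit function for the Gompertzian curve. It assumes that the misfit is the sum of squared residuals between the curve and the measurements, which is a common choice but is not confirmed by the text; the Gompertz form, the function names, and the synthetic data are likewise illustrative assumptions.

```python
import math

def gompertz(t, f0, t0, lam1, lam2):
    """Assumed Gompertz form: equals f0 at t0 and tends to lam2."""
    return lam2 * math.exp(math.log(f0 / lam2) * math.exp(-lam1 * (t - t0)))

def sum_squared_residuals(lam, times, values):
    """Misfit between the curve with parameters lam = (lam1, lam2) and the data."""
    lam1, lam2 = lam
    t0, f0 = times[0], values[0]  # the first measurement anchors the curve
    return sum((gompertz(t, f0, t0, lam1, lam2) - v) ** 2
               for t, v in zip(times, values))

# Synthetic, irregularly spaced series generated from known parameters.
times = [0.0, 0.5, 1.3, 2.0, 3.7, 5.1]
values = [gompertz(t, 110.0, 0.0, 0.4, 170.0) for t in times]

print(sum_squared_residuals((0.4, 170.0), times, values))  # zero up to rounding
print(sum_squared_residuals((0.2, 150.0), times, values))  # clearly larger
```

Because the parameters enter through nested exponentials, this misfit is non-linear in λ1 and λ2, which is why a general-purpose optimizer is needed to minimize it.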

Note that we need optimization methods that can be used when f is non-linear, as in the considered cases. In particular, we direct our research towards stochastic methods, which have been designed by considering analogies with natural phenomena. The most popular are evolution strategies and genetic algorithms, both based on competition among individuals. By contrast, other methods proposed in recent decades mainly focus on cooperation. Among them, particle swarm optimization (PSO), cuckoo search (CS), and ant colony optimization are widely used techniques, based on mutual interaction and exchange of information between individuals. Here we will consider PSO and CS, briefly described in what follows.

PSO was first introduced by Kennedy (a social psychologist) and Eberhart (an electrical engineer)[4] and was further developed by other researchers.[6][7][12] In order to describe it, let us consider a group of particles, or birds, which are represented as points in space. First, we need to model their way of flying. Then, taking into account that the target of the birds is the maximum availability of food, which corresponds to the minimum of the objective function, we can find that minimum.

The main objective consists in simulating the trajectories of the individual birds by considering their selfish behavior (the ability of a bird to fly randomly away from the flock) and their social behavior (the ability of a bird to stay in the group). With these simple considerations, it is possible to simulate how a group of birds moves, also taking into account that particles avoid collisions.

To explain how we can find the minimum of the objective function interpreting the latter as food, let us first suppose that a bird discovers some food. Then, the other birds have two alternatives: get out of the flock and reach the food (selfish behavior) or stay in the flock (social behavior).

If a good trade-off between the two behaviors is allowed, then the flock can reach the minimum. Indeed, if a bird can move towards some food, then other birds can change their directions towards the same place. Acting in this way, the flock gradually changes its direction until the best place, i.e., the minimum, is reached.
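The flock picture can be sketched with a standard inertia-weight PSO, in which each particle is pulled both towards its own best position and towards the best position found by the whole swarm. This is a textbook variant written under stated assumptions, not necessarily the configuration used by the authors: the coefficients `w`, `c1`, `c2`, the bounds, the sum-of-squares cost, the Gompertz form, and the synthetic data are all illustrative choices.

```python
import math
import random

def gompertz(t, f0, t0, lam1, lam2):
    """Assumed Gompertz form: equals f0 at t0 and tends to lam2."""
    return lam2 * math.exp(math.log(f0 / lam2) * math.exp(-lam1 * (t - t0)))

def pso_minimize(cost, bounds, n_particles=30, n_iter=200,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best PSO with inertia weight; returns (best position, best cost)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    own_best = [p[:] for p in pos]
    own_val = [cost(p) for p in pos]
    g = min(range(n_particles), key=own_val.__getitem__)
    swarm_best, swarm_val = own_best[g][:], own_val[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (own_best[i][d] - pos[i][d])   # individual pull
                             + c2 * r2 * (swarm_best[d] - pos[i][d]))   # flock pull
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)     # stay in bounds
            val = cost(pos[i])
            if val < own_val[i]:
                own_val[i], own_best[i] = val, pos[i][:]
                if val < swarm_val:
                    swarm_val, swarm_best = val, pos[i][:]
    return swarm_best, swarm_val

# Synthetic series generated with lam1 = 0.4 and lam2 = 170 (illustrative only).
times = [0.0, 0.5, 1.3, 2.0, 3.7, 5.1]
values = [gompertz(t, 110.0, 0.0, 0.4, 170.0) for t in times]

def cost(lam):
    return sum((gompertz(t, values[0], times[0], lam[0], lam[1]) - v) ** 2
               for t, v in zip(times, values))

bounds = [(0.01, 2.0), (120.0, 250.0)]  # hypothetical search box for (lam1, lam2)
best, best_cost = pso_minimize(cost, bounds)
print(best, best_cost)
```

The random factors `r1` and `r2` control the trade-off between the two pulls at every step, and a fixed seed is used here only to make the run repeatable.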

CS was developed by Yang[13] and simulates the behavior of the cuckoo, a bird that does not incubate its eggs but tries to lay them in the nests of other species. The problem with this conduct is that, in some cases, the egg is removed by the nest’s owner. The cuckoo therefore searches for a nest in which its egg can be "confused" with the others. In this algorithm, the minimum of the function is thus the nest in which the most cuckoos can lay their eggs without being discovered.

As with PSO, the user needs to provide a set of possible initial solutions, which are usually randomly initialized. Indeed, if the initial solutions are chosen so that they are feasible, the stochastic methods do not fall into local minima, and thus the methods are not truly sensitive to the initial conditions. The main difference with respect to the PSO approach is that, at each iteration, a fraction of the nests that are far from the minimum are abandoned and new ones close to the minimum are built.
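A simplified cuckoo search along the lines described above might look as follows. This is a sketch under assumptions, not the authors' implementation: new candidate solutions are generated with Lévy-distributed steps (via Mantegna's algorithm, as in the cuckoo search literature), and the worst fraction `pa` of nests is rebuilt by Gaussian perturbation around the current best nest. The step scale `alpha`, the rebuild scale `spread`, the bounds, and the test function are hypothetical choices.

```python
import math
import random

def levy_step(rng, beta=1.5):
    """Heavy-tailed random step drawn with Mantegna's algorithm."""
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
             / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.gauss(0.0, sigma)
    v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1 / beta)

def clip(x, bounds):
    return [min(max(xi, lo), hi) for xi, (lo, hi) in zip(x, bounds)]

def cuckoo_search(cost, bounds, n_nests=20, n_iter=300,
                  pa=0.25, alpha=0.01, spread=0.05, seed=0):
    """Simplified cuckoo search; returns (best nest, best cost)."""
    rng = random.Random(seed)
    nests = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_nests)]
    vals = [cost(n) for n in nests]
    n_abandon = int(pa * n_nests)
    for _ in range(n_iter):
        # Each cuckoo lays an egg via a Levy flight; it replaces a randomly
        # chosen nest only if the new solution is better.
        for i in range(n_nests):
            trial = clip([x + alpha * (hi - lo) * levy_step(rng)
                          for x, (lo, hi) in zip(nests[i], bounds)], bounds)
            trial_val = cost(trial)
            j = rng.randrange(n_nests)
            if trial_val < vals[j]:
                nests[j], vals[j] = trial, trial_val
        # Abandon the worst nests and rebuild them close to the current best.
        order = sorted(range(n_nests), key=vals.__getitem__)
        best = nests[order[0]]
        for i in order[n_nests - n_abandon:]:
            nests[i] = clip([b + spread * (hi - lo) * rng.gauss(0.0, 1.0)
                             for b, (lo, hi) in zip(best, bounds)], bounds)
            vals[i] = cost(nests[i])
    k = min(range(n_nests), key=vals.__getitem__)
    return nests[k], vals[k]

# Illustrative smooth test function with its minimum at (0.4, 170).
def cost(lam):
    return 100.0 * (lam[0] - 0.4) ** 2 + 0.01 * (lam[1] - 170.0) ** 2

bounds = [(0.01, 2.0), (120.0, 250.0)]
best, best_cost = cuckoo_search(cost, bounds)
print(best, best_cost)
```

Since the best nest is never abandoned and a nest is only overwritten by a better solution during the Lévy phase, the best cost cannot increase from one iteration to the next in this variant.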

Note that both the PSO and CS approaches can be used to minimize the target function, but unfortunately the number of samples in concrete applications is usually very small. Thus, in order to improve the performance of the optimization methods, we first reconstruct the growth curves by means of an RBF-based interpolation scheme.[8][9][14] In doing so, we also take into account the instability problems arising in applications. An example of RBF reconstruction can be seen in Figure 1a-b (big colored dots).



Figure 1. One unknown parameter: curve reconstruction of a) the height of a GH patient; b) the PSA values of a prostatectomized patient

Moreover, the so-reconstructed function can be used to estimate λl, l > N, i.e., the evolution of the considered quantity.
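The reconstruction step can be illustrated with a small regularized RBF interpolant. The sketch below assumes a Gaussian kernel with shape parameter `eps` and a ridge-type Tikhonov term `mu` added to the diagonal of the kernel matrix; the text only states that "a kind of Tikhonov regularization" is adopted, so the kernel, the parameter values, the centering of the data about their mean, and the sample measurements are all hypothetical choices.

```python
import math

def gaussian(r, eps):
    """Gaussian radial basis function."""
    return math.exp(-(eps * r) ** 2)

def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            factor = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= factor * M[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit_rbf(times, values, eps=1.0, mu=1e-10):
    """Return s(t) solving (A + mu I) c = y - mean(y), with A_ij = phi(|t_i - t_j|)."""
    mean = sum(values) / len(values)
    A = [[gaussian(abs(ti - tj), eps) + (mu if i == j else 0.0)
          for j, tj in enumerate(times)]
         for i, ti in enumerate(times)]
    coeffs = solve_linear(A, [v - mean for v in values])

    def interpolant(t):
        return mean + sum(c * gaussian(abs(t - tj), eps)
                          for c, tj in zip(coeffs, times))
    return interpolant

# Hypothetical, irregularly spaced measurements (not patient data).
times = [0.0, 0.5, 1.3, 2.0, 3.7, 5.1]
values = [110.0, 118.5, 130.2, 139.0, 153.1, 160.4]

s = fit_rbf(times, values)
dense_times = [0.1 * k for k in range(52)]
dense_values = [s(t) for t in dense_times]  # resampled curve for the optimizer
print(len(dense_values), s(2.0))
```

A very small `mu` nearly interpolates the data, while a larger value trades exactness at the nodes for stability in the presence of noise. The densely resampled curve can then be passed to the optimizer in place of the few original samples.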

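As an accuracy indicator, we use the following root mean square error:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{k=1}^{n} \left( \hat{f}_k - f_k \right)^2},$$

where $n$ denotes the number of patients, and $\hat{f}_k$ and $f_k$ are the predicted and observed values for the $k$-th patient. This indicator represents the standard deviation of the differences between predicted and observed values.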
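For reference, the root mean square error between predicted and observed values can be computed as in the short sketch below. It assumes one predicted and one observed value per patient, and the numbers are made up for illustration.

```python
import math

def rmse(predicted, observed):
    """Root mean square error between two equally long sequences."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(
        sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)
    )

# Hypothetical predicted and observed final values for three patients.
predicted = [172.0, 165.5, 180.0]
observed = [170.0, 167.5, 179.0]
print(rmse(predicted, observed))  # differences 2, -2, 1 give sqrt(9 / 3) = sqrt(3)
```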

Data

In order to assess the strengths and weaknesses of the discussed methods, we use the two datasets presented below. Both datasets consist of real patient data. Each patient has a different number of irregularly spaced measurements over a different time interval.

Children data

Different problems can occur during growth. Here we consider pediatric patients with a diagnosis of idiopathic growth hormone deficiency (IGHD), i.e., a low or absent production of GH for unknown causes. These children are treated with rhGH (a synthetic GH) and monitored during growth.

Our dataset is composed of 121 male IGHD patients, treated at the “Ospedale Infantile Regina Margherita (OIRM)” in Turin between January 2000 and January 2016.[15][16] Few studies analyze the effect of GH therapy on height, preferring a more indirect approach, where factors influencing the total pubertal and pre-pubertal growth in GH-deficient patients are evaluated and subsequently used to estimate the overall effect at the end of the therapy. Unfortunately, this approach does not quantify the real growth gain in treated patients. Using a non-parametric empirical Bayes approach, our study analyzes the growth response to GH treatment in a homogeneous cohort of 317 patients with pituitary GH deficiency who were enrolled during their pre-pubertal stage in the GH Piedmont Registry (Italy). The measurements were collected every four to six months from the beginning of the therapy (age between 3 and 14 years) to adulthood (18–20 years old). As explained by Gliozzi et al.[17], each period of human growth can be modeled with a Gompertzian law. We therefore use, as a theoretical function, the Gompertz curve described in the previous section.

Note that each series is monotonic, strictly growing, not (or only slightly) affected by measurement errors, and has very few missing data. Therefore, we expect robust performance of the RBF-SOM method for this clinical test.

Prostate cancer data

Data provided by clinicians about the PSA value (which reflects the mass of the prostate cancer) are needed in order to have a reliable estimate of the cancer’s evolution. After a radical prostatectomy, the PSA turns out to be a good biomarker and could be used to monitor a possible relapse. In fact, only prostate cells produce PSA, and, after surgery, there should be no prostate cells in the body. Hence, the PSA value should be very small, close to zero. If its value is greater than 0.2 ng/mL, then PSA-producing cells are present, i.e., a relapse (a local or distal metastasis) has occurred.

Here we use a subset of the Eureka1 study collection.[18] Eureka1 is a retrospective study on Italian patients who had a prostatectomy in the last 15 years. Our subset contains follow-up data (PSA values series) of relapsed patients who did not undergo an adjuvant therapy.

In general, cancer growth is very fast and is modeled with an exponential function. However, prostate cancer is a very slow-growing tumor, so Gompertzian[19][20][21] and West[22][23][24] laws are often used. In the following section, we will use the Gompertzian one.

References

  1. Rao, C.R. (1987). "Prediction of Future Observations in Growth Curve Models". Statistical Science 2 (4): 434–47. doi:10.1214/ss/1177013119. 
  2. Ji, H.; Müller, H.-G. (2017). "Optimal designs for longitudinal and functional data". Journal of the Royal Statistical Society: Series B (Statistical Methodology) 79 (3): 859–76. doi:10.1111/rssb.12192.
  3. Ramsay, J.; Silverman, B.W. (2005). Functional Data Analysis. Springer-Verlag. pp. 428. ISBN 9780387400808. 
  4. Kennedy, J.; Eberhart, R. (1995). "Particle swarm optimization". Proceedings of ICNN'95 - International Conference on Neural Networks 4: 1942–8. doi:10.1109/ICNN.1995.488968.
  5. Parsopoulos, K.; Vrahatis, M. (2002). "Particle swarm optimization method for constrained optimization problems". In Sincák, P.; Kvasnicka, V.; Vascák, J.; Pospíchal, J.. Intelligent Technologies: from Theory to Applications. Frontiers in Artificial Intelligence and Applications. 76. IOS Press. pp. 214–20. ISBN 9781586032562. 
  6. Pedersen, M.E.H.; Chipperfield, A.J. (2010). "Simplifying Particle Swarm Optimization". Applied Soft Computing 10 (2): 618–28. doi:10.1016/j.asoc.2009.08.029.
  7. Shi, Y.; Eberhart, R. (1998). "A modified particle swarm optimizer". 1998 IEEE International Conference on Evolutionary Computation Proceedings: 69–73. doi:10.1109/ICEC.1998.699146.
  8. Fasshauer, G.; McCourt, M. (2015). Kernel-based Approximation Methods using MATLAB. Interdisciplinary Mathematical Sciences. 19. World Scientific. pp. 536. doi:10.1142/9335. ISBN 9789814630139.
  9. Cavoretto, R.; De Rossi, A.; Perracchione, E. (2018). "Optimal Selection of Local Approximants in RBF-PU Interpolation". Journal of Scientific Computing 74 (1): 1–22. doi:10.1007/s10915-017-0418-7.
  10. Cavoretto, R.; De Rossi, A.; Qiao, H. (2018). "Topology analysis of global and local RBF transformations for image registration". Mathematics and Computers in Simulation 147 (5): 52–72. doi:10.1016/j.matcom.2017.10.010. 
  11. Cancelliere, R.; Gai, M.; Gallinari, P.; Rubini, L. (2015). "OCReP: An Optimally Conditioned Regularization for pseudoinversion based neural training". Neural Networks 71 (11): 76–87. doi:10.1016/j.neunet.2015.07.015. 
  12. Qasem, S.N.; Shamsuddin, S.M. (2011). "Radial basis function network based on time variant multi-objective particle swarm optimization for medical diseases diagnosis". Applied Soft Computing 11 (1): 1427–38. doi:10.1016/j.asoc.2010.04.014. 
  13. Yang, X.-S.; Deb, S. (2009). "Cuckoo Search via Lévy flights". Proceedings from the 2009 World Congress on Nature & Biologically Inspired Computing: 210–4. doi:10.1109/NABIC.2009.5393690.
  14. Wendland, H. (2005). Scattered Data Approximation. Cambridge Monographs on Applied and Computational Mathematics. 17. Cambridge University Press. ISBN 9781139456654. 
  15. Migliaretti, G.; Ditaranto, S.; Guiot, C. et al. (2018). "Long-term response to recombinant human growth hormone treatment: A new predictive mathematical method". Journal of Endocrinological Investigation 41 (7): 839–48. doi:10.1007/s40618-017-0816-6. PMID 29318462. 
  16. Migliaretti, G.; Berchialla, P.; Borraccino, A. et al. (2012). "A mathematical model in the analysis of the response to growth hormone treatment in pediatric patients with diagnosis of growth hormone deficiency". Journal of Endocrinological Investigation 35 (2): 209–14. PMID 22490990. 
  17. Gliozzi, A.S.; Guiot, C.; Delsanto, P.P.; Iordache, D.A. (2012). "A novel approach to the analysis of human growth". Theoretical Biology and Medical Modelling 9: 17. doi:10.1186/1742-4682-9-17. PMC PMC3439303. PMID 22594680. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3439303.
  18. Gabriele, D.; Porpiglia, F.; Muto, G. et al. (2015). "Eureka-1 database: An epidemiological analysis". Minerva Urologica e Nefrologica 67 (Suppl. 1): 9–15. 
  19. Perracchione, E.; Stura, I. (2016). "An RBF-PSO based approach for modeling prostate cancer". AIP Conference Proceedings 1738 (1): 390008. doi:10.1063/1.4952182. 
  20. Guiot, C.; Degiorgis, P.G.; Delsanto, P.P. et al. (2003). "Does tumor growth follow a "universal law"?". Journal of Theoretical Biology 225 (2): 147–51. PMID 14575649. 
  21. Retsky, M.W.; Swartzendruber, D.E.; Wardwell, R.H.; Bame, P.D. (1990). "Is Gompertzian or exponential kinetics a valid description of individual human cancer growth?". Medical Hypotheses 33 (2): 95–106. PMID 2259298. 
  22. von Bertalanffy, L. (1957). "Quantitative Laws in Metabolism and Growth". The Quarterly Review of Biology 32 (3): 217–31. doi:10.1086/401873. PMID 13485376. 
  23. West, G.B.; Brown, J.H.; Enquist, B.J. (2001). "A general model for ontogenetic growth". Nature 413 (6856): 628–31. doi:10.1038/35098076. PMID 11675785. 
  24. Deisboeck, T.S.; Mansury, Y.; Guiot, C. et al. (2005). "Insights from a novel tumor model: Indications for a quantitative link between tumor growth and invasion". Medical Hypotheses 65 (4): 785–90. doi:10.1016/j.mehy.2005.04.014. PMID 15961253. 

Notes

This presentation is faithful to the original, with only a few minor changes to presentation, spelling, and grammar. We also added PMCID and DOI when they were missing from the original reference. The original article's inline citations are not in numerical order (after citation 11); due to the nature of this wiki, citations are numbered in order automatically, and therefore the numbering differs from the original after citation 11. No other modifications were made in accordance with the "no derivatives" portion of the distribution license.