Difference between revisions of "User:Shawndouglas/sandbox/sublevel6"



==Sandbox begins below==
{{Infobox journal article
|name        =
|image        =
|alt          = <!-- Alternative text for images -->
|caption      =
|title_full  = A new numerical method for processing longitudinal data: Clinical applications
|journal      = ''Epidemiology Biostatistics and Public Health''
|authors      = Stura, Ilaria; Perracchione, Emma; Migliaretti, Giuseppe; Cavallo, Franco
|affiliations = Università di Torino, Università di Padova
|contact      = Email: Ilaria dot stura at unito dot it
|editors      =
|pub_year    = 2018
|vol_iss      = '''15'''(2)
|pages        = e12881
|doi          = [https://doi.org/10.2427/12881 10.2427/12881]
|issn        = 2282-0930
|license      = [https://creativecommons.org/licenses/by-nc-nd/4.0/ Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International]
|website      = [https://ebph.it/index.php/ebph/article/view/12881 https://ebph.it/index.php/ebph/article/view/12881]
|download    = [https://ebph.it/article/view/12881/11630 https://ebph.it/article/view/12881/11630] (PDF)
}}
{{ombox
| type      = content
| style    =
| text      = This article contains rendered mathematical formulae. You ''may'' require the [https://chrome.google.com/webstore/detail/math-anywhere/gebhifiddmaaeecbaiemfpejghjdjmhc Math Anywhere] plugin for Chrome or the [https://addons.mozilla.org/en-US/firefox/addon/native-mathml/ Native MathML] add-on and [https://developer.mozilla.org/en-US/docs/Mozilla/MathML_Project/Fonts fonts] for Firefox if they don't render properly for you.
}}
{{ombox
| type      = content
| style    = width: 500px;
| text      = This article should not be considered complete until this message box has been removed. This is a work in progress.
}}
==Abstract==
'''Background''': Processing longitudinal data is a computational issue that arises in many applications, such as in aircraft design, medicine, optimal control, and weather forecasting. Given some longitudinal data, i.e., scattered measurements, the aim consists in approximating the parameters involved in the dynamics of the considered process. For this problem, a large variety of well-known methods have already been developed.
'''Results''': Here, we propose an alternative approach to be used as an effective and accurate tool for the parameters fitting and prediction of individual trajectories from sparse longitudinal data. In particular, our mixed model, that uses radial basis functions (RBFs) combined with stochastic optimization algorithms (SOMs), is here presented and tested on clinical data. Further, we also carry out comparisons with other methods that are widely used in this framework.
'''Conclusion''': The main advantages of the proposed method are the flexibility with respect to the datasets, meaning that it is effective also for truly irregularly distributed data, and its ability to extract reliable [[information]] on the evolution of the dynamics.
'''Keywords''': statistical method, radial basis function, stochastic optimization algorithm, longitudinal data
==Introduction==
Longitudinal data are often the object of study in many fields, e.g., sociology, meteorology, and medicine. In medicine, repeated measurements are used to monitor patients’ behaviors and to adjust therapies accordingly. However, many problems occur when these data are analyzed. Each time series may have a different number of observations, and the observations may not be equally spaced. In addition, the sampling period can vary from patient to patient, and measurement errors and missing data often occur. Since common methods such as linear regression usually fail in these cases, recent research is directed towards more robust statistical methods. For instance, longitudinal data are commonly analyzed using parametric models such as Bayesian ones<ref name="RaoPrediction87">{{cite journal |title=Prediction of Future Observations in Growth Curve Models |journal=Statistical Science |author=Rao, C.R. |volume=2 |issue=4 |pages=434–47 |year=1987 |doi=10.1214/ss/1177013119}}</ref>, as well as functional data analysis (FDA).<ref name="JiOptimal17">{{cite journal |title=Optimal designs for longitudinal and functional data |journal=Statistical Methodology Series B |author=Ji, H.; Müller, H.-G. |volume=79 |issue=3 |pages=859–76 |year=2017 |doi=10.1111/rssb.12192}}</ref><ref name="RamsayFunctional05">{{cite book |title=Functional Data Analysis |author=Ramsay, J.; Silverman, B.W. |publisher=Springer-Verlag |pages=428 |year=2005 |isbn=9780387400808}}</ref> In both cases, a large amount of data is required in order to model the behavior of the studied variable(s). These methods, in fact, try to find an "average curve" using all the data, including truncated series and observations with missing information.
However, in clinical applications, an estimate of the future dynamics of a single series, given a few previous values, may be needed; think, for instance, of tumor volumes during a treatment, the height or weight of children during growth, or the concentration of some substance in the body. Each patient is different and may have a different growth behavior and different growth parameters, so an "average curve" may not be sufficient. An important piece of information could be, for example, the possible future development of the subject, given his or her previous growth and clinical background (e.g., treatments). These estimates can be compared with the real dynamics in order to see whether the response of the patient to the treatment is stable (the parameters do not vary in the future) or not (the parameters change).
The aim of our work is to propose a numerical tool that can provide information on the future dynamics given only a few follow-up data. To this end, we first model longitudinal data via mathematical models that are widely used in population dynamics. On the one hand, we aim at validating such a model by approximating the parameters involved in the dynamics. On the other hand, we are also interested in giving reliable information on the future dynamics of the curves.
In order to achieve our goal, we propose a numerical tool based on optimization methods coupled with interpolation techniques. Specifically, we approximate the parameters involved in the dynamics by means of stochastic optimization algorithms (SOMs).<ref name="KennedyParticle95">{{cite journal |title=Particle swarm optimization |journal=Proceedings of ICNN'95 - International Conference on Neural Networks |author=Kennedy, J.; Eberhart, R. |volume=4 |pages=1942–8 |year=1995 |doi=10.1109/ICNN.1995.488968}}</ref><ref name="ParsopoulosParticle02">{{cite book |chapter=Particle swarm optimization method for constrained optimization problems |title=Intelligent Technologies: from Theory to Applications |author=Parsopoulos, K.; Vrahatis, M. |editor=Sincák, P.; Kvasnicka, V.; Vascák, J.; Pospíchal, J. |publisher=IOS Press |volume=76 |series=Frontiers in Artificial Intelligence and Applications |pages=214–20 |year=2002 |isbn=9781586032562}}</ref><ref name="PedersenSimp10">{{cite journal |title=Simplifying Particle Swarm Optimization |journal=Applied Soft Computing |author=Pedersen, M.E.H.; Chipperfield, A.J. |volume=10 |issue=2 |pages=618–28 |year=2010 |doi=10.1016/j.asoc.2009.08.029}}</ref><ref name="ShiAMod98">{{cite journal |title=A modified particle swarm optimizer |journal=1998 IEEE International Conference on Evolutionary Computation Proceedings |author=Shi, Y.; Eberhart, R. |pages=69–73 |year=1998 |doi=10.1109/ICEC.1998.699146}}</ref> Moreover, for each data series, we improve the performance of the optimization tools by means of radial basis function (RBF) interpolation; see Fasshauer and McCourt<ref name="FasshauerKernel15">{{cite book |title=Kernel-based Approximation Methods using MATLAB |author=Fasshauer, G.; McCourt, M. |publisher=World Scientific |series=Interdisciplinary Mathematical Sciences |volume=19 |pages=536 |year=2015 |isbn=9789814630139 |doi=10.1142/9335}}</ref> for a general overview and Cavoretto ''et al.''<ref name="CavorettoOptimal18">{{cite journal |title=Optimal Selection of Local Approximants in RBF-PU Interpolation |journal=Journal of Scientific Computing |author=Cavoretto, R.; De Rossi, A.; Perracchione, E. |volume=74 |issue=1 |pages=1–22 |year=2018 |doi=10.1007/s10915-017-0418-7}}</ref><ref name="CavorettoTopology18">{{cite journal |title=Topology analysis of global and local RBF transformations for image registration |journal=Mathematics and Computers in Simulation |author=Cavoretto, R.; De Rossi, A.; Qiao, H. |volume=147 |issue=5 |pages=52–72 |year=2018 |doi=10.1016/j.matcom.2017.10.010}}</ref> for particular instances on the topic and applications. In the interpolation process, we also take into account the critical computational issue of carrying out stable computations. For this reason, and since data are subject to noise, we adopt a kind of Tikhonov regularization.<ref name="CancelliereOCReP15">{{cite journal |title=OCReP: An Optimally Conditioned Regularization for pseudoinversion based neural training |journal=Neural Networks |author=Cancelliere, R.; Gai, M.; Gallinari, P.; Rubini, L. |volume=71 |issue=11 |pages=76–87 |year=2015 |doi=10.1016/j.neunet.2015.07.015}}</ref>
The method, namely RBF-SOM, is here tested on two different datasets:
* height measurements of children with a diagnosis of growth hormone deficiency (GHD) during treatment, and
* prostate-specific antigen (PSA) values of prostatectomized patients with a recurrence of prostate cancer.
In the next section of this paper, the RBF-SOM technique is described. Afterwards, the two datasets used for the validation are presented. The "Results" section is devoted to the numerical results and it is divided into two subsections: in the first one, all the data of each series are considered in order to reconstruct the curves and approximate the parameters, while, in the second one, only a few initial data of each series are used to predict the curve behavior. The last two sections offer a discussion and conclusions.
==Methods==
This section describes the method used to fit a given data series and to approximate the parameters involved in the dynamics.
Given several scattered measurements <math> \{ y_{i} \}_{i=1}^N</math> sampled at different times <math> \{ t_{i} \}_{i=1}^N</math>, the basic idea of the RBF-SOM proposed here is to consider a theoretical function ''f'', depending on the time ''t'' and on several parameters λ = (λ<sub>1</sub>,..., λ<sub>p</sub>), and to approximate such parameters in order to obtain reliable information on the biological or physical phenomenon.
In the proposed examples, we use, as theoretical growth curve ''f'', the so-called Gompertzian function:
:<math> f(t,\lambda_{1}, \lambda_{2}) = \lambda_{2} \exp\left( -\log\left( \frac{\lambda_{2}}{f_{0}} \right) \exp\left(\lambda_{1} (t-t_{0}) \right) \right)</math>,
where ''f<sub>0</sub>'' is the measurement at time ''t<sub>0</sub>'' (i.e., the first measurement), λ<sub>1</sub> is the growth rate, and λ<sub>2</sub> is the carrying capacity, i.e., the maximum value that can be asymptotically achieved by ''f''.
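The Gompertzian function can be evaluated with a few lines of code. The following is a minimal sketch, not the authors' implementation. The function name <code>gompertz</code> and the numerical values are hypothetical. One assumption is made explicitly: the inner exponential is written with a decaying sign convention, exp(-λ<sub>1</sub>(''t'' - ''t<sub>0</sub>'')), so that a positive growth rate drives the curve from ''f<sub>0</sub>'' towards the carrying capacity λ<sub>2</sub>; with the sign as displayed in the formula above, the same behavior corresponds to a negative λ<sub>1</sub>.

```python
import math


def gompertz(t, lam1, lam2, f0, t0=0.0):
    """Gompertz curve passing through (t0, f0) with carrying capacity lam2.

    Assumption: the inner exponential decays, exp(-lam1 * (t - t0)), so a
    positive growth rate lam1 moves the curve from f0 towards lam2.
    """
    return lam2 * math.exp(-math.log(lam2 / f0) * math.exp(-lam1 * (t - t0)))


# Illustrative (made-up) values: height in cm, time in years.
f0, t0, lam1, lam2 = 100.0, 5.0, 0.15, 200.0
for t in (5.0, 10.0, 20.0, 60.0):
    print(t, round(gompertz(t, lam1, lam2, f0, t0), 2))
```

At ''t'' = ''t<sub>0</sub>'' the sketch returns ''f<sub>0</sub>'', and for large ''t'' it approaches λ<sub>2</sub>, which matches the roles of the two parameters described above.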
The Gompertzian function is characterized by a fast-growing initial period and by a progressive slowdown, reaching a carrying capacity after a certain time. This curve, depending on the values of the parameters, is able to model a variety of types of growth, from human to cancer cells.<ref name="GliozziANovel12">{{cite journal |title=A novel approach to the analysis of human growth |journal=Theoretical Biology and Medical Modeling |author=Gliozzi, A.S.; Guiot, C.; Delsanto, P.P.; Iordache, D.A. |volume=9 |page=17 |year=2012 |doi=10.1186/1742-4682-9-17 |pmid=22594680 |pmc=PMC3439303}}</ref><ref name="GompertzOnThe1825">{{cite journal |title=On the Nature of the Function Expressive of the Law of Human Mortality, and on a New Mode of Determining the Value of Life Contingencies |journal=Philosophical Transactions of the Royal Society of London |author=Gompertz, B. |volume=115 |pages=513–83 |year=1825}}</ref><ref name="SturaATwo16">{{cite journal |title=A two-clones tumor model: Spontaneous growth and response to treatment |journal=Mathematical Biosciences |author=Stura, I.; Venturino, E.; Guiot, C. |volume=271 |issue=1 |pages=19–28 |year=2016 |doi=10.1016/j.mbs.2015.10.014}}</ref><ref name="SturaATwo14">{{cite journal |title=A two population model of cancer growth with fixed capacity |journal=Proceedings of the 2014 6th International Advanced Research Workshop on In Silico Oncology and Cancer Investigation |author=Stura, I.; Gabriele, D.; Guiot, C. |pages=1–4 |year=2014 |doi=10.1109/IARWISOCI.2014.7034636}}</ref><ref name="MigliarettiLong18">{{cite journal |title=Long-term response to recombinant human growth hormone treatment: A new predictive mathematical method |journal=Journal of Endocrinological Investigation |author=Migliaretti, G.; Ditaranto, S.; Guiot, C. et al. |volume=41 |issue=7 |pages=839–48 |year=2018 |doi=10.1007/s40618-017-0816-6 |pmid=29318462}}</ref> For this reason, we will use in Section 5 the same function for both datasets. 
Moreover, its form is particularly suitable for this study because its parameters cannot be estimated via simple methods such as least squares approximation.
The parameters are approximated by finding
:<math> \tilde{\lambda} = \arg\min_{\lambda} \left( \sum_{i=1}^{N} \left(y_{i} - f(t_{i},\lambda_{1}, \lambda_{2}) \right)^2 \right)</math>.
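The quantity being minimized is the sum of squared residuals between the measurements and the theoretical curve. A minimal sketch of this objective is given below, under the assumptions that ''f<sub>0</sub>'' and ''t<sub>0</sub>'' are taken from the first measurement and that the Gompertz curve uses a decaying inner exponential (so that a positive λ<sub>1</sub> saturates at λ<sub>2</sub>). The names <code>gompertz</code> and <code>sse</code> and the synthetic data are hypothetical.

```python
import math


def gompertz(t, lam1, lam2, f0, t0):
    # Assumed sign convention: positive lam1 saturates at lam2.
    return lam2 * math.exp(-math.log(lam2 / f0) * math.exp(-lam1 * (t - t0)))


def sse(lam, times, values):
    """Sum of squared residuals for the parameter pair lam = (lam1, lam2)."""
    lam1, lam2 = lam
    f0, t0 = values[0], times[0]  # first measurement fixes (t0, f0)
    return sum((y - gompertz(t, lam1, lam2, f0, t0)) ** 2
               for t, y in zip(times, values))


# Synthetic, irregularly spaced series generated from known parameters.
times = [6.0, 7.0, 8.5, 10.0, 13.0]
values = [gompertz(t, 0.25, 200.0, 110.0, 6.0) for t in times]

print(sse((0.25, 200.0), times, values))  # essentially zero at the true parameters
print(sse((0.40, 200.0), times, values))  # larger for a wrong growth rate
```

Any optimizer that accepts a function of the parameter vector can then be applied to <code>sse</code>.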
Note that we need optimization methods that can be used when ''f'' is non-linear, as in the considered cases. In particular, we direct our research towards stochastic methods, which have been designed by considering analogies with natural phenomena. The most popular are evolution strategies and genetic algorithms, both based on competition among individuals. In contrast, other methods proposed in the last decades mainly focus on cooperation. Among them, particle swarm optimization (PSO), cuckoo search (CS), and ant colony optimization are widely used techniques based on the mutual interaction and exchange of information between individuals. Here we consider PSO and CS, briefly described in what follows.
PSO was first introduced by Kennedy (a social psychologist) and Eberhart (an electrical engineer)<ref name="KennedyParticle95" /> and was further developed by other researchers.<ref name="PedersenSimp10" /><ref name="ShiAMod98" /><ref name="QasemRadial11">{{cite journal |title=Radial basis function network based on time variant multi-objective particle swarm optimization for medical diseases diagnosis |journal=Applied Soft Computing |author=Qasem, S.N.; Shamsuddin, S.M. |volume=11 |issue=1 |pages=1427–38 |year=2011 |doi=10.1016/j.asoc.2010.04.014}}</ref> In order to describe it, let us consider a group of particles, or birds, represented as points in space. First, we need to model their way of flying. Then, since the target of the birds is the place with the maximum availability of food, which plays the role of the minimum of the objective function, following the flock leads us to that minimum.
The main objective consists in simulating the trajectories of the single birds by considering their selfish behavior (the tendency of a bird to fly randomly away from the flock) and their social behavior (the tendency of a bird to stay in the group). With these simple considerations, it is possible to simulate the way a group of birds moves, also taking into account that particles avoid collisions.
To explain how we can find the minimum of the objective function by interpreting it as food, let us first suppose that a bird discovers some food. Then, the other birds have two alternatives: leave the flock and reach the food (selfish behavior) or stay in the flock (social behavior).
If a good trade-off between the two behaviors is allowed, then the flock can reach the minimum. Indeed, if a bird can move towards some food, then other birds can change their directions towards the same place. Acting in this way, the flock gradually changes its direction until the best place, i.e., the minimum, is reached.
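The flock dynamics described above can be sketched as a basic PSO loop. This is a generic textbook-style variant with an inertia weight, not the authors' implementation: the function name <code>pso</code>, the parameter values (<code>w</code>, <code>c1</code>, <code>c2</code>, swarm size, iteration count), and the test objective are all assumptions. The personal-best term loosely plays the role of individual behavior and the global-best term that of social behavior; collision avoidance is not modeled.

```python
import random


def pso(objective, bounds, n_particles=30, n_iter=200,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize objective over a box; returns (best position, best value)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]               # personal bests
    best_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g][:], best_val[g]   # global best
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])  # individual pull
                             + c2 * r2 * (g_pos[d] - pos[i][d]))       # social pull
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)    # stay in the box
            val = objective(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < g_val:
                    g_pos, g_val = pos[i][:], val
    return g_pos, g_val


def shifted_sphere(x):
    # Hypothetical smooth test objective with its minimum at (1, -2).
    return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2


print(pso(shifted_sphere, [(-5.0, 5.0), (-5.0, 5.0)]))
```

In the setting of this article, the objective would be the sum of squared residuals of the Gompertzian fit, and the box would delimit feasible values of λ<sub>1</sub> and λ<sub>2</sub>.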
CS was developed by Yang and Deb<ref name="YangCuckoo09">{{cite journal |title=Cuckoo Search via Lévy flights |journal=Proceedings from the 2009 World Congress on Nature & Biologically Inspired Computing |author=Yang, X.-S.; Deb, S. |pages=210-214 |year=2009 |doi=10.1109/NABIC.2009.5393690}}</ref>, and it simulates the behavior of the cuckoo, a bird that does not incubate its eggs but tries to lay them in the nests of other species. The problem with this strategy is that, in some cases, the egg is removed by the nest’s owner. The cuckoo, then, searches for a nest in which its egg can be "confused" with the others. Therefore, in this algorithm the minimum of the function is the nest in which the most cuckoos can lay their eggs without being discovered.
As with PSO, the user needs to provide a set of possible initial solutions, which are usually randomly initialized. If the initial solutions are chosen so that they are feasible, the stochastic methods do not fall into local minima, and thus they are not very sensitive to the initial conditions. The main difference with respect to the PSO approach is that, at each iteration, a fraction of the nests that are far from the minimum are abandoned and new ones close to the minimum are built.
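A compact cuckoo search loop is sketched below under stated assumptions; it is not the authors' implementation. Lévy-distributed steps are drawn with Mantegna's algorithm, and the abandonment step uses one common formulation (a random walk built from the difference of two randomly chosen nests, applied per coordinate with probability <code>pa</code>) rather than explicitly rebuilding nests near the minimum as described above. The names <code>levy_step</code>, <code>clip</code>, and <code>cuckoo_search</code>, as well as all parameter values, are hypothetical.

```python
import math
import random


def levy_step(rng, beta=1.5):
    """Draw one heavy-tailed step using Mantegna's algorithm."""
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    sigma = (num / den) ** (1 / beta)
    u = rng.gauss(0.0, sigma)
    v = max(abs(rng.gauss(0.0, 1.0)), 1e-300)  # guard against division by zero
    return u / v ** (1 / beta)


def clip(x, bounds):
    return [min(max(v, lo), hi) for v, (lo, hi) in zip(x, bounds)]


def cuckoo_search(objective, bounds, n_nests=25, n_iter=300,
                  pa=0.25, alpha=0.01, seed=0):
    """Minimize objective over a box; returns (best nest, best value)."""
    rng = random.Random(seed)
    nests = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_nests)]
    vals = [objective(x) for x in nests]
    for _ in range(n_iter):
        best = nests[min(range(n_nests), key=vals.__getitem__)][:]
        # Levy flights: steps scale with the distance from the current best nest.
        for i in range(n_nests):
            cand = clip([x + alpha * levy_step(rng) * (x - b)
                         for x, b in zip(nests[i], best)], bounds)
            val = objective(cand)
            if val < vals[i]:
                nests[i], vals[i] = cand, val
        # Abandonment: with probability pa per coordinate, try a new nest.
        for i in range(n_nests):
            a, b = rng.sample(nests, 2)
            step = rng.random()
            cand = clip([x + step * (xa - xb) if rng.random() < pa else x
                         for x, xa, xb in zip(nests[i], a, b)], bounds)
            val = objective(cand)
            if val < vals[i]:
                nests[i], vals[i] = cand, val
    k = min(range(n_nests), key=vals.__getitem__)
    return nests[k], vals[k]


def shifted_sphere(x):
    # Hypothetical smooth test objective with its minimum at (1, -2).
    return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2


print(cuckoo_search(shifted_sphere, [(-5.0, 5.0), (-5.0, 5.0)]))
```

Every replacement is greedy, so the value stored for each nest never increases from one iteration to the next.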
Note that both PSO and CS approaches can be performed in order to minimize the target function, but unfortunately the cardinality of the samples in concrete applications is really small. Thus, in order to improve the performance of the optimization methods, we first reconstruct the growth curves by means of an RBF-based interpolation scheme.<ref name="FasshauerKernel15" /><ref name="CavorettoOptimal18" /><ref name="WendlandScat05">{{cite book |title=Scattered Data Approximation |author=Wendland, H. |publisher=Cambridge University Press |series=Cambridge Monographs on Applied and Computational Mathematics |volume=17 |year=2005 |isbn=9781139456654}}</ref> In doing so, we also take into account the instability problems arising in applications. An example of RBF reconstruction can be seen in Fig.1a-b (big colored dots).
[[File:Fig1 Stura EpidemBiostatPubHealth2018 15-2.png|800px]]
{{clear}}
{|
| STYLE="vertical-align:top;"|
{| border="0" cellpadding="5" cellspacing="0" width="800px"
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"| <blockquote>'''Figure 1.'''  One unknown parameter: curve reconstruction of a) the height of a GH patient; b) the PSA values of a prostatectomized patient</blockquote>
|-
|}
|}
Moreover, the reconstructed function can be used to estimate ''y<sub>l</sub>'', ''l'' > ''N'', i.e., the evolution of the considered quantity.
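To illustrate the reconstruction step, the sketch below builds a regularized RBF interpolant from a handful of irregularly spaced samples. It is a minimal example under stated assumptions, not the authors' scheme: the Gaussian kernel, the shape parameter, the ridge-type regularization (adding <code>mu</code> to the diagonal of the interpolation matrix as a simple Tikhonov-style term), and the names <code>gaussian_rbf</code>, <code>solve</code>, and <code>rbf_fit</code> are all hypothetical choices, and the data are made up.

```python
import math


def gaussian_rbf(r, shape=1.0):
    return math.exp(-(shape * r) ** 2)


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def rbf_fit(times, values, shape=1.0, mu=1e-10):
    """Return a callable approximant; mu > 0 regularizes the linear system."""
    n = len(times)
    a = [[gaussian_rbf(abs(times[i] - times[j]), shape) + (mu if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    coeffs = solve(a, values)

    def interpolant(t):
        return sum(c * gaussian_rbf(abs(t - tj), shape)
                   for c, tj in zip(coeffs, times))

    return interpolant


# Made-up, irregularly spaced samples (e.g., years since surgery, ng/mL).
times = [0.0, 0.7, 1.5, 3.1, 4.0]
values = [0.2, 0.3, 0.5, 1.1, 1.6]
s = rbf_fit(times, values)
print([round(s(t), 4) for t in (0.0, 1.0, 2.0, 3.1)])
```

With a very small <code>mu</code> the approximant essentially passes through the data; a larger <code>mu</code> trades exact interpolation for stability in the presence of noise. The reconstructed curve can then be sampled on a denser grid before running PSO or CS.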
As an accuracy indicator, we use the following root mean square error:
:<math> \epsilon = \sqrt{ \frac {\sum_{i=1}^m \left( f (t, \tilde{\lambda}_{1}, \tilde{\lambda}_{2}) - y_{i} \right)^2 }{m}} </math>,
where ''m'' denotes the number of patients. This indicator represents the standard deviation of the differences between predicted and observed values.
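The accuracy indicator is straightforward to compute once predicted and observed values are paired. A minimal sketch follows; the function name <code>rmse</code> and the three sample values are hypothetical.

```python
import math


def rmse(predicted, observed):
    """Root mean square error between paired predictions and observations."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return math.sqrt(total / len(observed))


# Made-up final heights in cm: differences are -1, 2, and 0.
print(rmse([170.0, 182.0, 165.0], [171.0, 180.0, 165.0]))
```

For these sample values the squared differences sum to 5, so the result is the square root of 5/3, expressed in the same unit as the data.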
==Data==
In order to assess the strengths and weaknesses of the discussed methods, we use the two datasets presented below. Both datasets contain real patients’ data. Each patient has a different number of irregularly spaced measurements over a different time interval.
===Children data===
Different problems can occur during growth. Here we consider pediatric patients with a diagnosis of idiopathic growth hormone deficiency (IGHD), i.e., a low or absent production of GH for unknown causes. These children are treated with rhGH (a synthetic GH) and monitored during growth.
Our dataset is composed of 121 male IGHD patients, treated in “Ospedale Infantile Regina Margherita (OIRM)” in Turin between January 2000 and January 2016.<ref name="MigliarettiLong18" /><ref name="MigliarettiAMath12">{{cite journal |title=A mathematical model in the analysis of the response to growth hormone treatment in pediatric patients with diagnosis of growth hormone deficiency |journal=Journal of Endocrinological Investigation |author=Migliaretti, G.; Berchialla, P.; Borraccino, A. et al. |volume=35 |issue=2 |pages=209–14 |year=2012 |pmid=22490990}}</ref> Few studies analyze the effect of GH therapy on height, preferring a more indirect approach, where factors influencing the total pubertal and pre-pubertal growth in GH-deficient patients are evaluated and subsequently used to estimate the overall effect at the end of the therapy. Unfortunately, this approach does not quantify the real growth gain in treated patients. Using a non-parametric empirical Bayes approach, our study analyzes the growth response to GH treatment in a homogeneous cohort of 317 patients with pituitary GH deficiency who were enrolled during their pre-pubertal stage in the GH Piedmont Registry (Italy). The measurements are collected every four to six months from the beginning of the therapy (age between 3 and 14 years) to adulthood (18-20 years old). As explained by Gliozzi ''et al.''<ref name="GliozziANovel12" />, each period of human growth can be modeled with a Gompertzian law. We therefore use, as a theoretical function, the Gompertz curve explained in the previous section.
Note that each series is monotonic and strictly increasing, is only slightly (if at all) affected by measurement errors, and has very few missing data. Therefore, we expect robust performance of the RBF-SOM method for this clinical test.
===Prostate cancer data===
Data released by clinicians about the PSA value (which mirrors the mass of the prostate cancer) are needed in order to have a reliable estimate of the cancer’s evolution. After a radical prostatectomy, the PSA turns out to be a good biomarker and can be used to monitor a possible relapse. In fact, only prostate cells produce PSA, and, after surgery, there should be no prostate cells in the body. Hence, the PSA value should be very small, close to zero. If its value is greater than 0.2 ng/mL, then PSA-producing cells are present, i.e., a relapse (a local or distal metastasis) has occurred.
Here we use a subset of the Eureka1 study collection.<ref name="GabrieleEureka15">{{cite journal |title=Eureka-1 database: An epidemiological analysis |journal=Minerva Urologica e Nefrologica |author=Gabriele, D.; Porpiglia, F.; Muto, G. et al. |volume=67 |issue=Suppl. 1 |pages=9–15 |year=2015}}</ref> Eureka1 is a retrospective study on Italian patients who had a prostatectomy in the last 15 years. Our subset contains follow-up data (PSA values series) of relapsed patients who did not undergo an adjuvant therapy.
In general, cancer growth is very fast and is modeled with an exponential function. However, prostate cancer is a very slow-growing tumor, so Gompertzian<ref name="PerracchioneAnRFB16">{{cite journal |title=An RBF-PSO based approach for modeling prostate cancer |journal=AIP Conference Proceedings |author=Perracchione, E.; Stura, I. |volume=1738 |issue=1 |page=390008 |year=2016 |doi=10.1063/1.4952182}}</ref><ref name="GuiotDoes03">{{cite journal |title=Does tumor growth follow a "universal law"? |journal=Journal of Theoretical Biology |author=Guiot, C.; Degiorgis, P.G.; Delsanto, P.P. et al. |volume=225 |issue=2 |pages=147–51 |year=2003 |pmid=14575649}}</ref><ref name="RetskyIsGomp90">{{cite journal |title=Is Gompertzian or exponential kinetics a valid description of individual human cancer growth? |journal=Medical Hypotheses |author=Retsky, M.W.; Swartzendruber, D.E.; Wardwell, R.H.; Bame, P.D. |volume=33 |issue=2 |pages=95–106 |year=1990 |pmid=2259298}}</ref> and West<ref name="vonBertalanffyQuant57">{{cite journal |title=Quantitative Laws in Metabolism and Growth |journal=The Quarterly Review of Biology |author=von Bertalanffy, L. |volume=32 |issue=3 |pages=217–31 |year=1957 |doi=10.1086/401873 |pmid=13485376}}</ref><ref name="WestAGeneral01">{{cite journal |title=A general model for ontogenetic growth |journal=Nature |author=West, G.B.; Brown, J.H.; Enquist, B.J. |volume=413 |issue=6856 |pages=628–31 |year=2001 |doi=10.1038/35098076 |pmid=11675785}}</ref><ref name="DeisboeckInsights05">{{cite journal |title=Insights from a novel tumor model: Indications for a quantitative link between tumor growth and invasion |journal=Medical Hypotheses |author=Deisboeck, T.S.; Mansury, Y.; Guiot, C. et al. |volume=65 |issue=4 |pages=785–90 |year=2005 |doi=10.1016/j.mehy.2005.04.014 |pmid=15961253}}</ref> laws are often used. In the following section, we will use the Gompertzian one.
Note that these series are not monotonic: PSA values can grow, be stable for months, or decrease. PSA values should be sampled every four to six months, but in some series only one or two values are reported over one to two years. The PSA value strictly depends on the accuracy of the instruments used in the [[laboratory]]: some machines have a precision of 0.1 ng/mL, while others have a precision of 0.01 ng/mL. Moreover, the precision can change from value to value within the same series. Therefore, for the RBF-SOM method, this dataset is more challenging than the previous one.
==Results==
===Curve reconstruction===
The tests carried out in this subsection are devoted to assessing the robustness of the RBF-SOM method as a descriptive tool. We consider each growth curve and approximate the two parameters involved in the dynamics, i.e., the growth rate and the carrying capacity. We remark that we first reconstruct the curve by means of RBF interpolants and then apply the PSO or CS method. The aim of these experiments is to obtain feedback about the accuracy of the proposed approach in approximating the parameters.
Figure 1 (previous) shows two examples of the methods in the case of one unknown parameter. In particular, Fig.1a shows the reconstruction of a growth curve of one child (GH Database), while Fig.1b shows the reconstruction of a PSA growth curve. The circles represent real data, the big colored dots are the RBF interpolation, and the straight and dotted lines are the PSO and CS reconstructions respectively.
For both datasets, we consider as theoretical growth curve the Gompertzian function. Note that, differently from the growth rate, the carrying capacity (the maximum value that can be achieved asymptotically) is supposed to be fixed and known for all patients in both datasets, i.e., only one parameter should be estimated. In what follows, we test the methods with both one and two unknown parameters. As expected, when only the growth rate needs to be estimated, we obtain better results.
More precisely, in the case of only one unknown parameter λ<sub>1</sub> (with λ<sub>2</sub> fixed at 200 cm in the GH database and 200 ng/mL in the PSA database), both RBF-PSO and RBF-CS perform well and show the same accuracy (<math>\epsilon</math> = 1.2 cm in the GH database and <math>\epsilon</math> = 0.9 ng/mL in the PSA database). Results are shown in Table 1. This means that the data follow the theoretical behavior predicted by the Gompertzian function and that the parameter is accurately estimated.
In the case of two unknown parameters, RBF-CS is more effective than RBF-PSO. In fact, for the former, <math>\epsilon</math> = 0.80 cm and <math>\epsilon</math> = 0.89 ng/mL in the GH and PSA databases, respectively, while for the latter, <math>\epsilon</math> = 2.66 cm and <math>\epsilon</math> = 0.91 ng/mL. Results are shown in Table 1. Note that this also holds when enlarging the basin of possible solutions in the CS algorithm.
{|
| STYLE="vertical-align:top;"|
{| class="wikitable" border="1" cellpadding="5" cellspacing="0" width="100%"
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;" colspan="5"|'''Table 1.'''  Summary of the accuracy of the methods on test data
|-
|-
  ! style="background-color:#e2e2e2; padding-left:10px; padding-right:10px;"|Model
  ! style="background-color:#e2e2e2; padding-left:10px; padding-right:10px;"|Number of unknown parameters
  ! style="background-color:#e2e2e2; padding-left:10px; padding-right:10px;"|Database
  ! style="background-color:#e2e2e2; padding-left:10px; padding-right:10px;"|RBF-PSO (<math>\epsilon</math>)
  ! style="background-color:#e2e2e2; padding-left:10px; padding-right:10px;"|RBF-CS (<math>\epsilon</math>)
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;" rowspan="4"|Curve reconstruction
  | style="background-color:white; padding-left:10px; padding-right:10px;"|1
  | style="background-color:white; padding-left:10px; padding-right:10px;"|GH
  | style="background-color:white; padding-left:10px; padding-right:10px;"|1.2 cm
  | style="background-color:white; padding-left:10px; padding-right:10px;"|1.2 cm
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|1
  | style="background-color:white; padding-left:10px; padding-right:10px;"|PSA
  | style="background-color:white; padding-left:10px; padding-right:10px;"|0.9 ng/mL
  | style="background-color:white; padding-left:10px; padding-right:10px;"|0.9 ng/mL
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|2
  | style="background-color:white; padding-left:10px; padding-right:10px;"|GH
  | style="background-color:white; padding-left:10px; padding-right:10px;"|2.66 cm
  | style="background-color:white; padding-left:10px; padding-right:10px;"|0.80 cm
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|2
  | style="background-color:white; padding-left:10px; padding-right:10px;"|PSA
  | style="background-color:white; padding-left:10px; padding-right:10px;"|0.91 ng/mL
  | style="background-color:white; padding-left:10px; padding-right:10px;"|0.89 ng/mL
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;" rowspan="2"|Dynamics evolution
  | style="background-color:white; padding-left:10px; padding-right:10px;"|1
  | style="background-color:white; padding-left:10px; padding-right:10px;"|GH
  | style="background-color:white; padding-left:10px; padding-right:10px;"|3.01 cm
  | style="background-color:white; padding-left:10px; padding-right:10px;"|3.01 cm
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|1
  | style="background-color:white; padding-left:10px; padding-right:10px;"|PSA
  | style="background-color:white; padding-left:10px; padding-right:10px;"|3.20 ng/mL
  | style="background-color:white; padding-left:10px; padding-right:10px;"|3.20 ng/mL
|-
|}
|}
===Dynamics evolution===
This subsection tests the method for modeling the evolution of the dynamics. Thus, we apply the RBF-SOM using only the first four values of each series. We select the first four values for several reasons. First of all, we do so because of data availability: only a few patients (in both databases) have more than six values in their series. Moreover, four data points in these databases represent about two years of follow-up, which is a reasonable period to understand the future dynamics.
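The idea of estimating the growth rate from the first four values and then extrapolating can be sketched on synthetic data. This is only an illustration under stated assumptions: a plain grid scan over λ<sub>1</sub> stands in for PSO or CS, the RBF reconstruction step is omitted, the carrying capacity is assumed known, the Gompertz curve uses a decaying inner exponential, and the names <code>gompertz</code> and <code>fit_growth_rate</code> and all numerical values are hypothetical.

```python
import math


def gompertz(t, lam1, lam2, f0, t0):
    # Assumed sign convention: positive lam1 saturates at lam2.
    return lam2 * math.exp(-math.log(lam2 / f0) * math.exp(-lam1 * (t - t0)))


def fit_growth_rate(times, values, lam2, lo=0.0, hi=1.0, steps=10000):
    """Estimate lam1 by a grid scan (a stand-in for PSO or CS)."""
    f0, t0 = values[0], times[0]

    def sse(lam1):
        return sum((y - gompertz(t, lam1, lam2, f0, t0)) ** 2
                   for t, y in zip(times, values))

    grid = [lo + (hi - lo) * k / steps for k in range(steps + 1)]
    return min(grid, key=sse)


# Synthetic noiseless series: age in years, height in cm.
true_lam1, lam2 = 0.25, 200.0
times = [6.0, 6.5, 7.0, 7.4, 9.0, 12.0, 18.0]
values = [gompertz(t, true_lam1, lam2, 110.0, 6.0) for t in times]

# Fit on the first four values only, then predict the last one.
lam1_hat = fit_growth_rate(times[:4], values[:4], lam2)
predicted = gompertz(times[-1], lam1_hat, lam2, values[0], times[0])
print(lam1_hat, predicted, values[-1])
```

With noiseless synthetic data the prediction is nearly exact; with real, noisy measurements the error is expected to be larger.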
Again, by fixing the carrying capacity, both methods give the same results (<math>\epsilon</math> = 3.01 cm in the GH database and <math>\epsilon</math> = 3.20 ng/mL in the PSA database). Results are shown in Table 1 (previous). Of course, the error is larger than the one shown in the previous subsection. However, since here we only take four values, such an error indicates a meaningful accuracy of the estimation, especially in the GH case. In fact, it means that, after four measurements, we are able to estimate the final height with a precision of about 3 cm. Looking at the other dataset, the error seems truly large, with <math>\epsilon</math> = 3.20 ng/mL. However, the majority (73.46%) of estimates differ by less than 1 ng/mL from the real value. Unfortunately, the irregularity of the measurements and the errors due to the machine precision might cause problems for several series. Indeed, in a few cases (6.12%), results differ by more than 3 ng/mL from the real value.
Figure 2 shows some examples of the output of the method, in case of one unknown parameter: the circles are the real data (heights or PSA during visits), dots are the RBF interpolation on the first four values, and PSO and CS provisions are the straight and dotted lines. Fig.2a-b concern the GH database, while Fig.2c-d the PSA database. In particular, Fig.2a shows only one growth period, while in Fig.2b there is the combination of two growth periods. Note in Fig.2d that the estimation error grows during time because of the irregularity of the data.
[[File:Fig2 Stura EpidemBiostatPubHealth2018 15-2.png|800px]]
{{clear}}
{|
| STYLE="vertical-align:top;"|
{| border="0" cellpadding="5" cellspacing="0" width="800px"
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"| <blockquote>'''Figure 2.''' One unknown parameter: dynamics evolution of a-b) height of GH patients; c-d) PSA values of prostatectomized patients</blockquote>
|-
|}
|}
As for the two-parameter case (carrying capacity and growth rate), the final height cannot be estimated with only four values. Indeed, it is well known that there are no optimization tools robust enough to accurately approximate two parameters given only four samples. However, we remark once more that more measurements can be added to the model; in this way, once the carrying capacity is also approximated, the estimate of the final value can be appreciably improved.
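One way to see why four early samples may not pin down both parameters is sketched below in Python. This is an illustration under stated assumptions, not an argument taken from the article: the Gompertz parametrization, the hypothetical noise-free series (generated with a carrying capacity of 170 cm and a growth rate of 0.3 per year), the alternative carrying capacity of 180 cm, and the coarse grid search over the growth rate are all choices made for the example.

```python
import math

def gompertz(t, y0, K, r):
    """Gompertz curve (assumed form): starts at y0, saturates at K, rate r."""
    return K * math.exp(math.log(y0 / K) * math.exp(-r * t))

# Hypothetical, noise-free series: four visits over about two years.
times = [0.0, 0.5, 1.0, 1.5]
y0 = 120.0
values = [gompertz(t, y0, 170.0, 0.3) for t in times]

def best_rate_for(K):
    """Grid search over r in (0, 2] for a given carrying capacity K.

    Returns the best growth rate and the largest absolute residual
    of the corresponding curve on the four observed values.
    """
    best_r, best_err = None, None
    for i in range(1, 2001):
        r = i / 1000.0
        err = sum((gompertz(t, y0, K, r) - v) ** 2
                  for t, v in zip(times, values))
        if best_err is None or err < best_err:
            best_r, best_err = r, err
    max_res = max(abs(gompertz(t, y0, K, best_r) - v)
                  for t, v in zip(times, values))
    return best_r, max_res

# Carrying capacity used to generate the data versus a different one.
r_true_fit, max_res_true = best_rate_for(170.0)
r_alt, max_res_alt = best_rate_for(180.0)

print(f"K = 170: r = {r_true_fit:.3f}, max residual = {max_res_true:.4f} cm")
print(f"K = 180: r = {r_alt:.3f}, max residual = {max_res_alt:.4f} cm")
```

In this toy setting, the curve with a carrying capacity of 180 cm and a refitted growth rate reproduces the four observed values to within a fraction of a centimeter, even though the two curves differ by 10 cm in their final height. Later measurements, closer to the plateau, are what separate the two candidates.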
==Discussion==
From the numerical experiments, we note that both RBF-CS and RBF-PSO provide reliable approximations, although CS seems to be less affected by the quality of the input data.
The novelty of the presented RBF-SOM method is the type of output. Indeed, it provides a (continuous) growth curve, allowing the analysis of each growth function independently of the others. This can be an advantage for doctors: for example, they can determine whether the treatment is effective and, if not, change the therapy accordingly. In this sense, the approach contrasts with the most widely used techniques, such as Bayesian methods or FDA. Those techniques analyze a (hopefully) large amount of similar data from different patients and try to find a common behavior (or groups of behaviors) in order to make predictions about new "standard" patients. RBF-SOM, instead, starts from very few data points from a single patient and estimates the curve, with a "personalized medicine" approach.
RBF-SOM and FDA have the same starting point: each series is considered as a curve<ref name="RamsayFunctional05" /><ref name="PerracchioneAnRFB16" /><ref name="RamsayApplied02">{{cite book |title=Applied Functional Data Analysis: Methods and Case Studies |author=Ramsay, J.O.; Silverman, B.W. |publisher=Springer-Verlag |pages=191 |year=2002 |isbn=9780387224657}}</ref>, while Bayesian methods take into account data points. RBF-SOM and FDA find coefficients that describe the growth, while Bayesian methods are non-parametric. FDA gives a polynomial reconstruction, i.e., the coefficients represent velocity, acceleration, and other derivatives, while RBF-SOM finds the biological parameters of the chosen theoretical function. In some cases, a simple description of the growth velocity could be sufficient. RBF-SOM is more problem-specific, but this means that a theoretical function, such as the Gompertzian one, needs to be known.
As concerns the reconstruction, FDA is similar to RBF. Our method introduces additional errors due to the parameter approximation and the adherence of the data to the theoretical function. Bayesian methods are flexible in clustering and curve reconstruction, primarily because they are non-parametric techniques and do not need theoretical functions like the Gompertzian one. Hence, in order to reconstruct the whole curve and/or form clusters, FDA and Bayesian methods are preferable to RBF-SOM.
Concerning the estimate of the future dynamics of the curve, RBF-SOM proves to be reliable and robust even with few initial data. Moreover, the result is personalized to the single patient, which is not possible using Bayesian methods. FDA cannot be used in this way, although it can compare different versions of the same phenomenon and distinguish phase and amplitude variations (see for example Ramsay and Silverman<ref name="RamsayApplied02" />).
==Conclusions==
Longitudinal data can be analyzed in different ways. In the majority of cases, the goal of the analysis is to find a trend, or cluster, that a new series would follow, given many similar series and some additional parameters.
In this work, the goal is slightly different: to find the exact future dynamics of the single curve, given the previous data (of the curve) and the theoretical function that it should follow.
This approach could be useful both for attaching biological meaning to the parameters of the curve and for predicting the future of the series without considering a large amount of similar data.
Moreover, RBF-SOM is a good tool for the reconstruction of curves, given their expected shape. It could be used in a large variety of cases, both for analyzing preexisting data and for producing estimates of future scenarios, in a personalized medicine framework.
==Acknowledgements==
===Ethics approval and consent to participate===
Informed consent and ethical approvals were obtained from the participants of both studies. Data are anonymous and were used in accordance with the Declaration of Helsinki.
===Consent for publication===
Not applicable.
===Availability of data and material===
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.
===Competing interests===
The authors declare that they have no competing interests.
===Funding===
Not applicable.
===Authors’ contributions===
IS and EP created the mathematical method, tested it on data, and were involved in drafting the manuscript.
GM and FC were involved in drafting the manuscript and revising it critically for important intellectual content.
All authors read and approved the final manuscript.
==References==
{{Reflist|colwidth=30em}}
==Notes==
This presentation is faithful to the original, with only a few minor changes to presentation, spelling, and grammar. We also added PMCID and DOI when they were missing from the original reference. The original article's inline citations are not in numerical order (after citation 11); due to the nature of this wiki, citations are numbered in order automatically, and therefore the numbering differs from the original after citation 11. No other modifications were made in accordance with the "no derivatives" portion of the distribution license.
<!--Place all category tags here-->
[[Category:LIMSwiki journal articles (added in 2018)]]
[[Category:LIMSwiki journal articles (all)]]
[[Category:LIMSwiki journal articles on public health informatics]]

Revision as of 15:44, 14 October 2018

Sandbox begins below