Difference between revisions of "User:Shawndouglas/sandbox/sublevel8"

From LIMSWiki
==Sandbox begins below==
{{Infobox journal article
|name        =
|image        =
|alt          = <!-- Alternative text for images -->
|caption      =
|title_full  = Approaches to Medical Decision-Making Based on Big Clinical Data
|journal      = ''Journal of Healthcare Engineering''
|authors      = Malykh, V.L.; Rudetskiy, S.V.
|affiliations = Ailamazyan Program Systems Institute of RAS
|contact      = Email: mvl at interin dot ru
|editors      =
|pub_year    = 2018
|vol_iss      = '''2018'''
|pages        = 3917659
|doi          = [https://doi.org/10.1155/2018/3917659 10.1155/2018/3917659]
|issn        = 2040-2309
|license      = [http://creativecommons.org/licenses/by/4.0/ Creative Commons Attribution 4.0 International]
|website      = [https://www.hindawi.com/journals/jhe/2018/3917659/ https://www.hindawi.com/journals/jhe/2018/3917659/]
|download    = [http://downloads.hindawi.com/journals/jhe/2018/3917659.pdf http://downloads.hindawi.com/journals/jhe/2018/3917659.pdf] (PDF)
}}
{{ombox
| type      = content
| style    = width: 500px;
| text      = This article should not be considered complete until this message box has been removed. This is a work in progress.
}}
==Abstract==
The paper discusses different approaches to building a [[clinical decision support system]] based on big data. The authors sought to avoid any data reduction and to apply universal teaching and big data processing methods that are independent of disease classification standards. The paper assesses and compares the accuracy of recommendations for three options: case-based reasoning, a simple single-layer neural network, and a probabilistic neural network. Further, the paper substantiates an assumption regarding the most efficient approach to solving this problem.
==Introduction==
Supporting clinical decision-making is one of the most pressing issues in healthcare automation. Articles, reports, and forum discussions<ref name="Medsoft2016">{{cite web |url=http://www.armit.ru/medsoft/2016/conference/prog/ |title=Presentations of the 12th International Forum "MedSoft-2016" |publisher=Association for the Development of Medical Information Technologies |date=2016}}</ref> both in Russia and abroad have repeatedly noted that introducing a medical information system (MIS) demands considerable extra effort from its users, primarily doctors, who must enter primary data into the system. Naturally, doctors expect practical, intelligent output in return from the big clinical data accumulated by modern MISs. Handler ''et al.''<ref name="HandlerGartner07">{{cite web |url=https://www.gartner.com/doc/508592/gartners--criteria-enterprise-cpr |title=Gartner's 2007 Criteria for the Enterprise CPR |author=Handler, T.J.; Hieb, B.R. |publisher=Gartner, Inc |date=2007}}</ref> present the operating paradigm of fifth-generation MISs, referred to as “MIS as Mentor.” Malykh ''et al.''<ref name="MalykhActive16">{{cite journal |title=Active MIS |journal=Information Technologies for the Physician |author=Malykh, V.L.; Rudetskiy, S.V.; Hatkevich, M.I. |volume=2016 |issue=6 |year=2016}}</ref> add one more qualitative characteristic to this paradigm: “MIS as automated mentor.”
<blockquote>It is advisable to abandon the practice of active user dialogs typical of expert systems, involving requests for data that the system considers missing from the user, and substitute the dialog with an automated nonintrusive algorithm that draws its own logical conclusions and generates recommendations in a completely automated manner based on available data, without involving the user in the process. The user may either accept or ignore the system’s prompts and recommendations; however, they will not provoke rejection in users if delivered automatically without requiring a dialog with the system.<ref name="MalykhActive16" /></blockquote>
To provide a brief qualitative description of this increasing subjectivity of MISs, we have proposed the new term “active MIS,” which emphasizes a certain degree of independence of the cyber system from its users, that is, its subjectivity. Kohane<ref name="KohaneTheTwin09">{{cite journal |title=The twin questions of personalized medicine: who are you and whom do you most resemble? |journal=Genome Medicine |author=Kohane, I.S. |volume=1 |issue=1 |page=4 |year=2009 |doi=10.1186/gm4 |pmid=19348691 |pmc=PMC2651581}}</ref> presents the most “balanced” definition of personalized medicine: “personalized medicine is the practice of clinical decision-making such that the decisions made maximize the outcomes that the patient most cares about and minimize those that the patient fears the most, on the basis of as much knowledge about the individual’s state as is available.” This view of personalized medicine is focused on clinical decision-making and once again underscores the urgency and importance of scientific research in this area. Therefore, building an automated, active, mentor-type system that provides the doctor with recommendations regarding treatment and diagnostic activities is an urgent practical task.
Butko and Olshansky<ref name="ButkoNew90">{{cite journal |title=New Decision Support Systems in Foreign Healthcare |journal=Automation and Remote Control |author=Butko, S.N.; Olshansky, V.K. |volume=51 |year=1990}}</ref> and Kotov<ref name="KotovNew04">{{cite book |title=New Mathematical Approaches to Medical Diagnostics |publisher=Editorial URSS |author=Kotov, Y.B. |year=2004}}</ref> provide a retrospective overview of approaches to building clinical decision support systems. Those approaches were restricted in many respects by the computing capabilities of their time, so processing big medical data was not yet an issue. Technologies have since evolved to the point where collecting and accumulating big medical data, on individuals and on the population as a whole, is finally feasible. Big data processing and machine learning methods for intelligent systems have evolved in parallel. Along with “deep learning,” the term “deep patient”<ref name="MiottoDeep16">{{cite journal |title=Deep Patient: An Unsupervised Representation to Predict the Future of Patients from the Electronic Health Records |journal=Scientific Reports |author=Miotto, R.; Li, L.; Kidd, B.A. et al. |volume=6 |page=26094 |year=2016 |doi=10.1038/srep26094}}</ref> was coined, referring to the possibility of extracting increasingly complete, deep, and valuable [[information]] about patients from big clinical data using deep learning methods.
Malykh ''et al.''<ref name="MalykhCase15">{{cite journal |title=Case-based reasoning in clinical processes using clinical data banks |journal=Proceedings from the 2015 International Conference on Biomedical Engineering and Computational Technologies |author=Malykh, V.L.; Belyshev, D.V. |pages=211-216 |year=2015 |doi=10.1109/SIBIRCON.2015.7361885}}</ref> mention the possibility of creating national-scale clinical data banks. Herrett ''et al.''<ref name="HerrettData15">{{cite journal |title=Data Resource Profile: Clinical Practice Research Datalink (CPRD) |journal=International Journal of Epidemiology |author=Herrett, E.; Gallagher, A.M.; Bhaskaran, K. et al. |volume=44 |issue=3 |pages=827-36 |year=2015 |doi=10.1093/ije/dyv098 |pmid=26050254 |pmc=PMC4521131}}</ref> provide an example of such a database (DB), containing anonymized medical records of primary healthcare services. This DB was created through the joint effort of 674 general practitioners and covers over 11.3 million patients in Great Britain.
Decision-making in [[hospital]]s has evolved from being opinion-based to being based on sound scientific evidence, an approach known as evidence-based practice. The constant publication of new evidence, combined with the demands of everyday practice, makes it difficult for health professionals to stay up to date.<ref name="RotterClinical10">{{cite journal |title=Clinical pathways: Effects on professional practice, patient outcomes, length of stay and hospital costs |journal=Cochrane Database of Systematic Reviews |author=Rotter, T.; Kinsman, L.; James, E. et al. |issue=3 |page=CD006632 |year=2010 |doi=10.1002/14651858.CD006632.pub2 |pmid=20238347}}</ref>
A large number of publications, including papers in specialized scientific journals (''Artificial Intelligence in Medicine'', ''BMC Medical Informatics and Decision Making'', ''International Journal of Medical Informatics'', ''Medical Decision Making'', etc.), are devoted to clinical decision support systems (DSSs). This paper does not aim to review the various approaches to building decision support systems; readers are referred to the original reviews.<ref name="BernerClinical09">{{cite web |url=https://healthit.ahrq.gov/health-it-tools-and-resources/health-it-bibliography/clinical-decision-support-systems-cdss/clinic-0 |title=Clinical Decision Support Systems: State of the Art |author=Berner, E.S. |publisher=Agency for Healthcare Research and Quality |date=June 2009}}</ref><ref name="EfimenkoIntellingent17">{{cite journal |title=Intelligent decision support systems in medicine: State of the art and beyond |journal=Proceedings from Open Semantic Technologies for Intelligent Systems OSTIS-2017 |author=Efimenko, I.V.; Khoroshevsky, V.F. |pages=251-260 |year=2017 |url=https://libeldoc.bsuir.by/handle/123456789/12259}}</ref><ref name="CDSSWikipedia">{{cite web |url=https://en.wikipedia.org/wiki/Clinical_decision_support_system |title=Clinical decision support system |work=Wikipedia}}</ref> Wikipedia offers several definitions of a clinical decision support system: “Clinical Decision Support systems link health observations with health knowledge to influence health choices by clinicians for improved health care” and “active knowledge systems, which use two or more items of patient data to generate case-specific advice.” The feasibility of such systems, and their positive impact on professional practice, patient outcomes, length of hospital stay, and hospital costs, are widely accepted. The main problem is finding effective approaches to building them.
Malykh ''et al.''<ref name="MalykhEstimation16">{{cite journal |title=Estimation of accuracy of recommended diagnostic and treatment actions based on precedent approach |journal=Proceedings of the IADIS International Conference e-Health 2016 |author=Malykh, V.L.; Kononenko, I.N.; Rudetskiy, S.V. |pages=52-8 |year=2016 |url=http://www.iadisportal.org/digital-library/estimation-of-accuracy-of-recommended-diagnostic-and-treatment-actions-based-on-precedent-approach}}</ref> list a number of contemporary approaches to clinical decision support system development. The first approach provides doctors with relevant data sources, helping them make decisions independently. The system does not recommend any final solution; instead, it suggests data sources in which to look for answers to current questions (e.g., [http://www.uptodate.com/home UpToDate]).
The second approach is to use clinical pathways. Clinical pathways are prescriptive models of the standard healthcare procedures to be undertaken for a specific patient population. Instances of a clinical pathway (also known as cases) describe the actual diagnostic-therapeutic cycle of an individual patient.<ref name="CaronHealth13">{{cite journal |title=Healthcare Analytics: Examining the Diagnosis–treatment Cycle |journal=Procedia Technology |author=Caron, F.; Vanthienen, J.; Baesens, B. |volume=9 |pages=996-1004 |year=2013 |doi=10.1016/j.protcy.2013.12.111}}</ref> Even when clinical pathways are used, however, clinical decision-making remains highly complex. While the medical knowledge used in the decision process comes partially from published research and widespread medical guidelines (with various levels of evidence), it is generally accepted that the decision process is profoundly influenced by the expertise and experience of the medical experts involved.<ref name="CaronHealth13" />
The third approach involves developing a large number of individual, narrowly focused decision support systems. This approach achieves high quality on isolated problems<ref name="KotovNew04" /><ref name="EfimenkoIntellingent17" />; however, it is almost impossible to apply it to big clinical data.
The fourth approach, which claims a global scope of application, focuses on building a cognitive system capable of self-learning and of absorbing knowledge directly from nonformalized text sources (e.g., [http://www.ibm.com/smarterplanet/us/en/ibmwatson/ IBM Watson]).
None of the reviewed approaches is flawless. All of them require expert effort and regular updates of knowledge bases. Moreover, each approach is in fact tailored to specific purposes.
The latest Russian-language review<ref name="EfimenkoIntellingent17" /> noted that clinical decision support systems have not become widespread in Russia. This is due to the complexity of developing such systems, the specialized nature of the systems already developed, and the need to involve highly qualified experts in their development.
In this paper, we review general approaches to decision support system development based on nonreduced big clinical data. The main expectations for such general approaches stem from the case-based nature of decision-making in healthcare and from the assumption that big clinical data already contain enough knowledge for efficient decision-making.
Two further factors draw attention to systems based on machine learning or on the precedent approach.
The first is a set of trends in the development of our civilization: the explosive growth of information technologies (among them machine to machine (M2M), big data, and the [[internet of things]] (IoT)), their strong need for formalized knowledge, and the near absence of qualified experts who could formalize that knowledge. The chief editor of the ''Rational Enterprise Management'' (REM) magazine (Russia) holds regular discussions on a wide range of problems, including those mentioned above, and the results are published in the REM editor’s column. The guests of a recent discussion<ref name="VasilyevaIndustrial15">{{cite journal |title=Industrial Internet of Things (IoT) |journal=Rational Enterprise Management |author=Vasilyeva, E. |year=2015}}</ref> included Igor Rudym (Intel), Dmitriy Tameev (PTC), Alexander Belotserkovskiy (Microsoft), Igor Girkin (Cisco), and Igor Kulinitchev (IBM). All the participants agreed that, nowadays, the key challenge of IT development lies not in hardware or software but in the need for breakthrough approaches to [[data analysis]].
As for the second factor, there are currently practically no qualified experts in knowledge formalization, even in key fields. The actual situation is even more critical, as the experts who could solve at least part of these problems cannot cope with the ever-increasing flow of information. From this point of view, precedent-based DSSs have the advantage of practically not needing experts; experts may be needed only to enhance or optimize existing medical databases and knowledge bases.<ref name="MalykhEstimation16" />
==Models and methods==
We regard the diagnostic and treatment process (DTP) as a discrete controlled process with memory. The model was first introduced by Malykh ''et al.''<ref name="MalykhControlled14">{{cite journal |title=Controlled stochastic precedent process with memory as a mathematical model of the diagnostic and treatment process |journal=Information Technologies and Computational Systems |author=Malykh, V.L.; Guliev, Y.I. |volume=2 |pages=62-72 |year=2014}}</ref><ref name="MalykhMan14">{{cite journal |title=Management and decision making in clinical processes |journal=Proceedings of XII All-Russian Conference on Problems of Management of VSPU-2014 |author=Malykh, V.L.; Guliev, Y.I.; Eremin, A.V. et al. |pages=6518–6528 |year=2014}}</ref> in Russian and later described by Malykh ''et al.'' in English.<ref name="MalykhCase15" /><ref name="MalykhPrecedent15">{{cite journal |title=Precedent Approach to Decision Making in Clinical Processes |journal=Studies in Health Technology and Informatics |author=Malykh, V.L.; Guliev, Y.I. |volume=2016 |page=957 |pmid=26262259}}</ref> To clarify the essence of the problem, we reproduce an extract from that source.
Modern medical information systems store [[electronic medical record]]s and contain descriptions of millions of various clinical cases. The degree of formalization of clinical data stored in MISs varies. MISs model the diagnostic and treatment process as a sequence of controlling events reflecting diagnostic and treatment activities, and a sequence of monitoring events describing the condition of the patient. Controlling events are well formalized; medical organizations keep statistical and business records of such events, plan them, and allocate required resources. Medical data related to monitoring of patients’ condition are less formalized and may be partly available in the form of plain text medical documents.
Previous studies provide evidence that it is possible to model the DTP using controlled stochastic Markov processes.<ref name="MalykhMan14" /> The model is based on the assumption that the DTP is a discrete controlled process, and it introduces the notions of control U and state X. Controls are the diagnostic and treatment decisions that are made and then executed: the various diagnostic and treatment activities prescribed by doctors, including diagnostic tests, medicines, surgical interventions, various procedures, and manipulations. The choice of diagnostic and treatment activities is based on accumulated medical knowledge and the doctor’s individual experience. The scope of potential diagnostic and treatment activities comprises previously applied measures with proven efficiency. Controls are therefore essentially precedent dependent.
The choice of control U<sub>''i''</sub> depends not only on the current state X<sub>''i''</sub> but also on the overall history of the process, including the controls applied at earlier DTP stages {''i'' − 1, ''i'' − 2, …}. This follows from the specific nature of the treatment process. To take this process memory effect into account, it is proposed to include the integral property of each relevant control in the extended state of the discrete process. Each control in the DTP can be associated with such an integral property: for example, the total dose of a medicine taken by the patient up to the current stage of the DTP, or the total dose of radiation the patient has received in the course of radiotherapy. The frequency (number) of applications of a given control element is also regarded as an integral property (e.g., the number of ECGs ordered).
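The integral-property idea can be illustrated with a minimal Python sketch, assuming that each DTP step is recorded as a mapping from a control name to the amount applied at that step (a dose, or 1 for each ordered test). The function name <code>integral_controls</code> and the control names in the example are hypothetical; the sketch only shows how per-step events could accumulate into the control part of the extended state.

```python
from typing import Dict, List


def integral_controls(events: List[Dict[str, float]]) -> List[Dict[str, float]]:
    """Turn per-step control events into cumulative (integral) control values.

    Hypothetical sketch: events[i] maps a control name to the amount applied
    at DTP step i (a dose, or 1 per ordered test). The snapshot for step i
    holds, for every control seen so far, the total applied from the start
    of the process up to and including step i.
    """
    running: Dict[str, float] = {}
    history: List[Dict[str, float]] = []
    for step in events:
        for control, amount in step.items():
            running[control] = running.get(control, 0.0) + amount
        history.append(dict(running))  # copy, so later steps don't alter it
    return history


steps = [{"drug_a_mg": 50}, {"ecg": 1}, {"drug_a_mg": 50, "ecg": 1}]
print(integral_controls(steps)[-1])  # {'drug_a_mg': 100.0, 'ecg': 2.0}
```

The last snapshot holds both kinds of integral property mentioned above: the total dose given so far and the number of ECGs ordered so far.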
DTP modeling based on the Markov process appears well substantiated<ref name="MalykhControlled14" /><ref name="MalykhMan14" /><ref name="BennettArtificial13">{{cite journal |title=Artificial intelligence framework for simulating clinical decision-making: A Markov decision process approach |journal=Artificial Intelligence in Medicine |author=Bennett, C.C.; Hauser, K. |volume=57 |issue=1 |pages=9–19 |year=2013 |doi=10.1016/j.artmed.2012.12.003 |pmid=23287490}}</ref>, especially for describing the DTP of inpatients, whose monitoring and medical decision-making follow a strictly regular schedule.
Thus, in the model, the DTP is represented by a sequence of vectors V of equal length and structure, each split into two components: control U and monitored properties X. Control components have non-negative numerical values. A zero value of a control component at a given step means that this kind of control has never been applied from the beginning of the process up to and including that step. Components of monitored properties vary in nature: they can be dimensional physical quantities or non-numerical values, for example, the assignment of a property to a specific class. Since it is almost impossible to monitor all properties at the same time, some property components may be unknown. When applying different methods to the model, we may need to digitize non-numerical component values and fill in missing values of monitored properties.
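One way to digitize such a state is sketched below in Python, assuming a fixed glossary of control names and monitored properties for the nosology. The fill value with a 0/1 "missing" flag and the one-hot encoding of class-valued properties are our illustrative choices, not necessarily the encoding used by the authors; all names are hypothetical.

```python
from typing import Dict, List, Sequence, Union

Value = Union[float, str, None]


def encode_state(
    controls: Dict[str, float],
    observed: Dict[str, Value],
    control_names: Sequence[str],
    numeric_props: Sequence[str],
    categorical_props: Dict[str, Sequence[str]],
    fill_value: float = 0.0,
) -> List[float]:
    """Build one fixed-length state vector V = (U, X) from a glossary (hypothetical)."""
    # U part: integral control values; a missing entry means "never applied" (0)
    vec = [float(controls.get(name, 0.0)) for name in control_names]
    # X part, numeric properties: an unknown value is replaced by fill_value
    # (an assumed imputation) and a 0/1 flag records that it was missing
    for name in numeric_props:
        value = observed.get(name)
        if value is None:
            vec.extend([fill_value, 1.0])
        else:
            vec.extend([float(value), 0.0])
    # X part, categorical properties: one-hot over the known classes;
    # an unknown value yields all zeros
    for name, classes in categorical_props.items():
        value = observed.get(name)
        vec.extend(1.0 if value == cls else 0.0 for cls in classes)
    return vec


v = encode_state({"ecg": 2}, {"hr": 80, "rhythm": "af"},
                 ["ecg", "drug_a_mg"], ["hr", "temp"], {"rhythm": ["sinus", "af"]})
print(v)  # [2.0, 0.0, 80.0, 0.0, 0.0, 1.0, 0.0, 1.0]
```

Because every vector is built from the same glossary, all states of a nosology share one length and component order, as the model requires.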
===Definition of the objective===
We will review several methods that can be applied to build a trainable cybernetic system. The input to the system is a sequence of vectors describing a discrete DTP in accordance with the model presented above. The output consists of recommendations proposing diagnostic and treatment options (a choice of controls) for the current state of the process. A diagram of the system is presented in Figure 1.
[[File:Fig1 Malykh JofHealthEng2018 2018.png|600px]]
{{clear}}
{|
| STYLE="vertical-align:top;"|
{| border="0" cellpadding="5" cellspacing="0" width="600px"
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"| <blockquote>'''Figure 1.''' Recommender system</blockquote>
|-
|}
|}
Let us define the objective more precisely and assume that each DTP model is considered in the context of an already established predominant diagnosis. For each model, we have an array of previously observed DTP implementations. These implementations are sources of knowledge about the treatment of a particular nosology, and they are used to teach a cybernetic recommender system to operate in the given context. Based on the available DTP implementations, we defined a glossary of controls and monitored properties for each model. Issues related to normalization of primary data, outlier testing and exclusion, and approaches to data generalization based on assignment of monitored properties to generic classes are beyond the scope of this paper.<ref name="MalykhMan14" /> It might also be necessary to extract data directly from the text of medical documents. Once this substantial effort is completed, we will have a bank of clinical data containing sets of homogeneously described DTPs for each nosology present in the bank. We emphasize that no primary data reduction is envisaged, such as focusing solely on properties meaningful in the context of the relevant nosology. Data are extracted from the MIS “as is,” exactly as they were entered by doctors, on the assumption that such data will most likely contain significant and meaningful information for the relevant nosology.
Finally, let us provide examples of typical properties of nonreduced primary data. We believe that a process ensemble in a data bank may reach 10<sup>3</sup> to 10<sup>6</sup> processes for an individual nosology. The dimension of a vector describing one step of a discrete DTP exceeds 10<sup>3</sup>. The dimension of a control (output of the cybernetic system) may also exceed 10<sup>3</sup>.
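To make these magnitudes concrete, the back-of-the-envelope Python sketch below estimates the raw size of a dense bank of state vectors for one nosology. It assumes about ten steps per process (the average DTP duration used in the case-based example later in this section) and 8-byte floating-point components with no sparse storage or compression; both are assumptions for illustration only.

```python
def bank_size_bytes(n_processes: int, steps_per_process: int,
                    dim: int, bytes_per_value: int = 8) -> int:
    """Raw size of a dense bank of DTP state vectors (assumed: no compression)."""
    return n_processes * steps_per_process * dim * bytes_per_value


# Lower and upper ends of the ensemble size quoted in the text
small = bank_size_bytes(10**3, 10, 10**3)
large = bank_size_bytes(10**6, 10, 10**3)
print(f"{small / 1e6:.0f} MB to {large / 1e9:.0f} GB")  # 80 MB to 80 GB
```

Under these assumptions, a single nosology needs from about 80 MB to about 80 GB of raw vector storage, before any indexing structures are added.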
The case-based approach, including its application to medical decision support, has been described in sufficient detail in multiple sources.<ref name="KotovNew04" /><ref name="MalykhEstimation16" /><ref name="MalykhPrecedent15" /> The main idea of the case-based approach is quite simple: find a clinical case in the DB similar to the one in focus and use it for medical decision support. Additionally, clinical cases used as precedents during the search can be filtered, taking into account such factors as the reputation of the medical organizations such cases originate from, the reputation of the doctors who created them, or the relevance of the cases in view of contemporary medical technologies. To apply the case-based approach successfully, representative DBs of clinical cases are necessary.
Malykh ''et al.''<ref name="MalykhEstimation16" /> present assessment results with respect to the accuracy of diagnostic and treatment activities recommended using case-based reasoning. The structure of the cybernetic system chosen for the approach in focus is presented in Figure 2.
[[File:Fig2 Malykh JofHealthEng2018 2018.png|600px]]
{{clear}}
{|
| STYLE="vertical-align:top;"|
{| border="0" cellpadding="5" cellspacing="0" width="600px"
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"| <blockquote>'''Figure 2.''' Structure of a case-based system</blockquote>
|-
|}
|}
We have a network in which each node represents a single DTP state. Each individual DTP is a specific route through the network (routes are marked in Figure 2 by orange arrows). In the model, each state is represented by vector V, and a metric or distance ''d''(''X'', ''Y'') is defined between states. Based on this metric or distance, a small-world graph is constructed.<ref name="MalkovApprox14">{{cite journal |title=Approximate nearest neighbor algorithm based on navigable small world graphs |journal=Information Systems |author=Malkov, Y.; Ponomarenko, A.; Logvinov, A. et al. |volume=45 |pages=61–8 |year=2014 |doi=10.1016/j.is.2013.10.006}}</ref> For each node in the small-world graph, its ''n'' (a graph parameter) closest neighbors are identified. In Figure 2, closest neighbors are marked with blue arrows; four closest neighbors are shown for node ''t'': N1, N2, N3, and N''i''2.
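A simplified Python sketch of the graph construction follows. It assumes a Euclidean distance as the metric ''d'' (the text leaves the metric open) and, unlike the navigable small-world construction of Malkov ''et al.'', finds each newly inserted node's ''n'' closest predecessors by brute force instead of by approximate greedy search. The function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Set


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    # Assumed metric d(X, Y); the paper does not fix a particular metric
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_nsw_graph(states: List[Sequence[float]], n: int) -> Dict[int, Set[int]]:
    """Insert states one by one, linking each to its n closest predecessors.

    Simplification: neighbors are found by brute force rather than by the
    approximate greedy search used in the original NSW construction.
    """
    graph: Dict[int, Set[int]] = {}
    for i, v in enumerate(states):
        graph[i] = set()
        closest = sorted(range(i), key=lambda j: euclidean(v, states[j]))[:n]
        for j in closest:
            graph[i].add(j)  # links are stored in both directions
            graph[j].add(i)
    return graph


g = build_nsw_graph([[0.0, 0.0], [1.0, 0.0], [5.0, 0.0], [6.0, 0.0]], n=1)
print(g)  # {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
```

Early nodes end up with long-range links and later nodes with short-range ones, which is what gives this kind of graph its small-world character.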
Here is how the recommender system operates. The input to the system is the current state of the DTP (the case in which the input contains the entire sequence of process states realized so far is beyond the scope of this paper). Several nodes are randomly selected on the small-world graph (R1 in the example in Figure 2). Starting from each of these nodes and moving to their closest neighbors, we descend through the graph to the node that locally minimizes the distance to the input state (R1 → N''i''1 → N''i''2 → ''t'' in Figure 2). The best of all the local minima found is selected and regarded as the closest neighbor of input state In. The recommended control can then be calculated as the difference between the integral control components of two vectors, in Figure 2 the state vectors (''t'' + 1) and (''t''): U = U(''t'' + 1) − U(''t'').
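The search and recommendation steps can be sketched in Python as follows, again assuming a Euclidean metric and hypothetical names. The sketch further assumes that the first <code>u_dim</code> components of each state vector are the integral control values U, that <code>next_state</code> maps a node to the following state of the same process, and that no recommendation is returned when the closest neighbor is the final state of its process; the text does not cover that last case, so this rule is our assumption.

```python
import math
import random
from typing import Dict, List, Optional, Sequence, Set


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    # Assumed metric d(X, Y)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def greedy_descent(graph: Dict[int, Set[int]], states: List[Sequence[float]],
                   query: Sequence[float], start: int) -> int:
    """Move to the closest neighbor while that reduces the distance to query."""
    current = start
    current_d = euclidean(query, states[current])
    while True:
        best, best_d = current, current_d
        for nb in graph[current]:
            d = euclidean(query, states[nb])
            if d < best_d:
                best, best_d = nb, d
        if best == current:  # no neighbor is closer: local minimum
            return current
        current, current_d = best, best_d


def recommend(graph: Dict[int, Set[int]], states: List[Sequence[float]],
              next_state: Dict[int, int], query: Sequence[float],
              n_starts: int, u_dim: int,
              rng: random.Random) -> Optional[List[float]]:
    """Return U(t + 1) - U(t) for the closest stored state t found by descent."""
    starts = rng.sample(sorted(graph), min(n_starts, len(graph)))
    minima = [greedy_descent(graph, states, query, s) for s in starts]
    t = min(minima, key=lambda i: euclidean(query, states[i]))  # best local minimum
    if t not in next_state:  # assumed rule: t ends its process, nothing to recommend
        return None
    nxt = next_state[t]
    return [states[nxt][k] - states[t][k] for k in range(u_dim)]


# Toy network: one process of three states; component 0 is U, component 1 is X
states = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
graph = {0: {1}, 1: {0, 2}, 2: {1}}
print(recommend(graph, states, {0: 1, 1: 2}, [1.0, 1.1], 2, 1, random.Random(0)))  # [1.0]
```

Several random starts are used because a single greedy descent can stop at a local minimum that is not the global nearest neighbor.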
The scale of the network is easy to estimate. In an example with 1,000 processes for one main nosology and an average process duration of ten days, we need 10,000 network nodes, each storing a vector of dimension 1,000 or higher. Computational experiments show that using 0.5–1% of the total number of nodes as random initial nodes is sufficient; with 10,000 nodes, this gives 50–100 initial nodes. The descent along the small-world graph was quick, with routes not exceeding 10 steps on average, and the number of edges originating from each node of the small-world graph was eight. A rough upper-bound estimate of the number of metric calculations in this case is therefore 100∗10∗8 = 8,000. The calculations can be accelerated by splitting the small-world graph into layers corresponding to specific DTP lengths and searching for closest neighbors only within the layer that corresponds to the input state. In the above example, each layer would consist of 1,000 states, and the search would start from five to ten randomly selected nodes. This is fully acceptable in terms of computational requirements: computational experiments show that, in this case, computations can be performed in near real time.
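The estimate above can be reproduced with a few lines of Python. The sketch uses the upper end of the 0.5–1% range for random start nodes; for the layered variant it assumes, for illustration only, the same average route length of 10 steps, which the text does not state for layers. The function name is hypothetical.

```python
def search_cost(n_nodes: int, start_fraction: float,
                avg_route: int, out_degree: int) -> int:
    """Rough upper bound on metric evaluations for one query:
    (number of random starts) x (steps per descent) x (edges checked per step)."""
    n_starts = round(n_nodes * start_fraction)
    return n_starts * avg_route * out_degree


whole_graph = search_cost(10_000, 0.01, 10, 8)  # 100 starts
one_layer = search_cost(1_000, 0.01, 10, 8)     # 10 starts, assumed same route length
print(whole_graph, one_layer, "vs. exhaustive scan:", 10_000)  # 8000 800 vs. exhaustive scan: 10000
```

Under these assumptions, a query on the whole graph costs roughly 8,000 distance evaluations versus 10,000 for an exhaustive scan, and searching a single layer cuts this to about 800.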
Let us review the network teaching process. Teaching means adding new DTP implementations to the network. Adding ''k'' states of a new process to a network containing ''m'' states requires ''k''∗''m'' calculations of metric ''d'', which is entirely acceptable in terms of computational requirements. As a result, new knowledge is added to the network, which is extended by ''k'' new nodes and (''k'' − 1 + ''k''∗''n'') edges. It is essential to emphasize the network’s sensitivity to new knowledge: any newly added DTP implementation may have a significant impact on the recommended decision if the closest neighbor is selected from that implementation. In other words, the network digests new knowledge and starts applying it immediately. The approaches described below lack this property.
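The teaching step can be sketched in Python as below. The sketch represents the graph as a dictionary of neighbor sets and process routes as a <code>next_state</code> map, uses a Euclidean metric, and links each new state only to the ''n'' closest previously stored states; these representation choices and all names are our assumptions. The returned counters reproduce the ''k''∗''m'' metric evaluations and (''k'' − 1 + ''k''∗''n'') new edges stated above, provided ''m'' ≥ ''n''.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    # Assumed metric d(X, Y)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def add_process(graph: Dict[int, Set[int]], states: List[Sequence[float]],
                next_state: Dict[int, int], new_process: List[Sequence[float]],
                n: int) -> Tuple[int, int]:
    """Teach the network one new DTP; return (metric calls, edges added)."""
    m = len(states)  # states already in the network
    metric_calls = 0
    neighbor_edges = 0
    for offset, v in enumerate(new_process):
        idx = m + offset
        # Compare the new state with each of the m previously stored states
        dists = []
        for j in range(m):
            dists.append((euclidean(v, states[j]), j))
            metric_calls += 1
        states.append(v)
        graph[idx] = set()
        for _, j in sorted(dists)[:n]:  # link to the n closest old states
            graph[idx].add(j)
            graph[j].add(idx)
            neighbor_edges += 1
        if offset > 0:
            next_state[idx - 1] = idx  # route edge (t) -> (t + 1)
    route_edges = max(len(new_process) - 1, 0)
    return metric_calls, route_edges + neighbor_edges


graph = {0: set(), 1: set(), 2: set()}
states = [[0.0], [1.0], [2.0]]
print(add_process(graph, states, {}, [[0.5], [1.5], [2.5], [3.0]], n=2))  # (12, 11)
```

With ''k'' = 4 and ''m'' = 3 this gives 12 metric evaluations, and with ''n'' = 2 it adds 3 route edges plus 8 neighbor edges; the new states are searchable as soon as the function returns.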
As an alternative approach, let us consider a basic neural network with a single layer. The structure of the network is outlined in Figure 3.
[[File:Fig3 Malykh JofHealthEng2018 2018.png|600px]]
{{clear}}
{|
| STYLE="vertical-align:top;"|
{| border="0" cellpadding="5" cellspacing="0" width="600px"
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"| <blockquote>'''Figure 3.''' Neural network</blockquote>
|-
|}
|}
The current DTP state is the input to a basic one-layer neural network. The network contains ''m'' adders and ''m'' neurons, matching the dimension ''m'' of control component U. Each neuron outputs one of the values {0, 1}. Output 1 of neuron ''i'' means the system recommends control U<sub>''i''</sub> for this state; output 0 means the system does not recommend control U<sub>''i''</sub> for this state.
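A minimal Python sketch of such a network follows. The threshold activation matches the {0, 1} outputs described above; the classic perceptron learning rule used for teaching is our assumption, since the text does not state how the weights were computed. Function names and the toy data are hypothetical.

```python
from typing import List, Sequence, Tuple


def predict(weights: List[List[float]], bias: List[float],
            v: Sequence[float]) -> List[int]:
    """One adder and one threshold neuron per control component U_i."""
    out = []
    for w_row, b in zip(weights, bias):
        s = sum(w * x for w, x in zip(w_row, v)) + b  # adder i
        out.append(1 if s > 0 else 0)                 # neuron i: 1 = recommend U_i
    return out


def perceptron_epoch(weights: List[List[float]], bias: List[float],
                     samples: List[Tuple[Sequence[float], Sequence[int]]],
                     lr: float = 0.1) -> int:
    """One pass of the perceptron rule (assumed teaching method); returns error count."""
    errors = 0
    for v, target in samples:
        y = predict(weights, bias, v)
        for i, (yi, ti) in enumerate(zip(y, target)):
            delta = ti - yi
            if delta != 0:  # update only the neurons that were wrong
                errors += 1
                weights[i] = [w + lr * delta * x for w, x in zip(weights[i], v)]
                bias[i] += lr * delta
    return errors


# Toy task: control 0 is prescribed when property 0 is present, control 1 likewise
samples = [([1.0, 0.0], [1, 0]), ([0.0, 1.0], [0, 1]),
           ([1.0, 1.0], [1, 1]), ([0.0, 0.0], [0, 0])]
w, b = [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0]
while perceptron_epoch(w, b, samples) > 0:
    pass
print(predict(w, b, [1.0, 0.0]))  # [1, 0]
```

For the scale discussed next (an input dimension of 1,000 and 500 controls), <code>weights</code> would be a 500 × 1,000 matrix, and every weight is shared by all training samples, which is why a few new cases shift it only slightly.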
Let us consider the network scale in an example. Let the dimension of the input vector be 1,000 and that of the control component 500. In this case, teaching involves determining 1,000∗500 = 500,000 weights. Note that the network cannot be substantially reduced for this problem, because the dimension of the control component is the number of diagnostic and treatment activities that can be prescribed for this nosology, including for coexisting illnesses, and this number is enormous. Adding new layers to the neural network would only make matters worse by increasing the number of parameters to be taught.
Let us examine the network teaching process. Initially, a certain set of DTPs is selected and used to teach the network, that is, to calculate its weights. Then new DTP implementations emerge. How should this new knowledge be used? If a sufficiently large volume of DTP implementations (1,000 to 10,000) was used to teach the network and the new implementations constitute an insignificant share of the teaching sample (e.g., 100 new implementations versus 10,000, merely 1%), re-teaching the network can be expected to produce no noticeable change in the taught parameters and, consequently, no major variation in the network’s output. This kind of network is rough and conservative; it can “digest” new knowledge only once a sufficient volume of it has accumulated. In this respect, neural networks are inferior to networks applying the case-based approach.
As another alternative approach, let us consider a probabilistic neural network, whose structure is outlined in Figure 4. Each state (state vector V) is associated with a kernel function f(V) whose form is common to all states. In our case, we used a multivariate Gaussian distribution function with a diagonal covariance matrix. The kernel function includes parameter ''σ'', which controls the function’s width. The states are assigned to 2''m'' classes, where ''m'' is the dimension of the control component: if a doctor applies control L to state ''t'', then ''t'' belongs to class K<sub>L1</sub>; otherwise, it belongs to class K<sub>L0</sub>.
[[File:Fig4 Malykh JofHealthEng2018 2018.png|600px]]
{{clear}}
{|
| STYLE="vertical-align:top;"|
{| border="0" cellpadding="5" cellspacing="0" width="600px"
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"| <blockquote>'''Figure 4.''' Probabilistic neural network</blockquote>
|-
|}
|}
Figure 5 shows the impact of control parameter ''σ'' on the type of distribution.
[[File:Fig5 Malykh JofHealthEng2018 2018.png|500px]]
{{clear}}
{|
| STYLE="vertical-align:top;"|
{| border="0" cellpadding="5" cellspacing="0" width="500px"
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"| <blockquote>'''Figure 5.''' Impact of control parameter ''σ'' on kernel functions and type of distribution</blockquote>
|-
|}
|}
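The kernel and class assignment described above can be sketched in a few lines of Python. The paper does not give the exact parameterization, so this is an illustration under stated assumptions. Here ''σ'' is assumed to multiply a diagonal covariance built from per-component scales (`scales`). The names `gaussian_kernel` and `class_labels` are hypothetical. The printed peak heights show the same trend as Figure 5: a larger ''σ'' gives a lower, wider kernel.

```python
import math


def gaussian_kernel(x, center, scales, sigma):
    """Multivariate Gaussian kernel with a diagonal covariance matrix.

    Assumption: sigma multiplies the covariance, so component i has variance
    sigma * scales[i] ** 2. Small sigma gives sharp kernels, large sigma wide ones.
    """
    log_density = 0.0
    for xi, ci, si in zip(x, center, scales):
        var = sigma * si * si
        log_density += -0.5 * (xi - ci) ** 2 / var - 0.5 * math.log(2 * math.pi * var)
    return math.exp(log_density)


def class_labels(applied_controls, m):
    """Return (L, bit) for each control L in range(m).

    bit = 1 if the doctor applied control L in this state (class K_L1),
    bit = 0 otherwise (class K_L0); one state thus falls into m of the 2m classes.
    """
    return [(L, 1 if L in applied_controls else 0) for L in range(m)]


# Peak height of a 2-D kernel for the sigma values tried in the paper:
# the peak drops as sigma grows, i.e. the kernel becomes wider and flatter.
center = [0.0, 0.0]
for sigma in (0.1, 0.5, 1, 2.5):
    print(sigma, gaussian_kernel(center, center, [1.0, 1.0], sigma))

print(class_labels({0, 2}, 4))   # [(0, 1), (1, 0), (2, 1), (3, 0)]
```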
Now a probability density function can be “restored” for each class. For an input vector ''In'', we apply Bayes’ formula to calculate the posterior probability that it belongs to each class. We then generate recommendations regarding the choice of diagnostic and treatment activities for this state.
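For a single control ''L'', restoring the class densities and applying Bayes’ formula can be sketched as follows. The sketch rests on several assumptions. Class priors are taken to equal class frequencies in the teaching sample, so prior times density reduces to a sum of kernels over each class and the common 1/''N'' factor cancels. The kernel widths `sd` are assumed to already include ''σ''. The names `kernel` and `posterior_applied` are hypothetical. Repeating the computation for each of the ''m'' controls gives posteriors for all 2''m'' classes.

```python
import math


def kernel(x, center, sd):
    """Gaussian kernel with diagonal covariance; sd[i] is the width of component i."""
    z = sum(((xi - ci) / si) ** 2 for xi, ci, si in zip(x, center, sd))
    norm = math.prod(si * math.sqrt(2 * math.pi) for si in sd)
    return math.exp(-0.5 * z) / norm


def posterior_applied(x, states, applied, sd):
    """Posterior probability of class K_L1 (control L applied) for input state x.

    states:  teaching-sample state vectors, used as kernel centers
    applied: 0/1 flags, 1 if the doctor applied L in that state
    With priors equal to class frequencies, prior * restored density is
    (1/N) * sum of the class's kernels, and the 1/N cancels in Bayes' formula.
    """
    score = {0: 0.0, 1: 0.0}
    for center, flag in zip(states, applied):
        score[flag] += kernel(x, center, sd)
    total = score[0] + score[1]
    if total == 0.0:
        return None  # x lies too far from every stored state to decide
    return score[1] / total


# Toy teaching sample: L was applied in the first two states only.
states = [[0.0, 0.0], [0.2, 0.1], [3.0, 3.0], [3.1, 2.9]]
applied = [1, 1, 0, 0]
print(posterior_applied([0.1, 0.0], states, applied, [1.0, 1.0]))  # close to 1
print(posterior_applied([3.0, 2.9], states, applied, [1.0, 1.0]))  # close to 0
```

A decision rule, for example recommending ''L'' when the posterior exceeds 0.5, would be a further assumption; the text does not state the threshold used for the probabilistic network.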
Let us again take the network scale as an example. Let the dimension of the input vector be 1,000, the dimension of the control component be 500, and the teaching sample contain 1,000 processes with 10 states each. We would need to evaluate 10,000 kernel functions. We would then calculate 1,000 posterior probabilities, one for each of the 500∗2 classes, each class having its own distribution of kernel function supports.
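The counts in this example follow directly from the stated sizes. The hypothetical helper below reproduces them and adds one more product. In a straightforward implementation, each kernel evaluation touches every component of the 1,000-dimensional vector, so one query involves about 10,000,000 component-wise terms before the kernel values are summed per class.

```python
def pnn_query_size(processes, states_per_process, controls, dim):
    """Rough size of one probabilistic-neural-network query (hypothetical helper).

    Every stored state contributes one kernel; every control L yields two
    classes, K_L1 and K_L0; each kernel evaluation touches all dim components.
    """
    kernels = processes * states_per_process
    classes = 2 * controls
    component_terms = kernels * dim
    return kernels, classes, component_terms


print(pnn_query_size(1000, 10, 500, 1000))   # (10000, 1000, 10000000)
```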
Let us examine the network teaching process. Teaching consists of adding new DTP implementations to the network, including assigning their states to the appropriate classes. If the new implementations are a small share of the previously used teaching sample, adding them will have no major impact on the network’s output. The probabilistic neural network thus also proves to be rough and conservative: it can “digest” new knowledge only once enough of it has accumulated. In this respect, probabilistic neural networks are also inferior to networks applying the case-based approach.
==Results==
We performed computational experiments for a network built using the case-based approach in 2015–2016, and the results were published by Malykh ''et al.''<ref name="MalykhEstimation16" /> To compare the different approaches to the problem, we present the results of that paper<ref name="MalykhEstimation16" /> in a slightly modified format.
Table 1 shows that the share of correct recommendations (TP, true positive) varies from 58.7% to 94.9% depending on the nosology. For every nosology, the majority of recommendations match the doctor’s actions.
{|
| STYLE="vertical-align:top;"|
{| class="wikitable" border="1" cellpadding="5" cellspacing="0" width="80%"
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;" colspan="5"|'''Table 1.''' Accuracy assessment of recommended diagnostic and treatment activities for seven nosologies using the case-based approach
|-
|-
  ! style="background-color:#e2e2e2; padding-left:10px; padding-right:10px;" rowspan="2"|MKB-10 code / nosology
  ! style="background-color:#e2e2e2; padding-left:10px; padding-right:10px;"|Total number of clinical precedents / number of control precedents
  ! style="background-color:#e2e2e2; padding-left:10px; padding-right:10px;" rowspan="2"|Number of correct recommendations among control precedents<br />/<br />absolute value or share in the total number of diagnostic and treatment activities
  ! style="background-color:#e2e2e2; padding-left:10px; padding-right:10px;" rowspan="2"|Number of recommendations with a different control level among control precedents<br />/<br />absolute value or share in the total number of diagnostic and treatment activities
  ! style="background-color:#e2e2e2; padding-left:10px; padding-right:10px;" rowspan="2"|Number of diagnostic and treatment activities the decision support system was unable to provide recommendations for among control precedents<br />/<br />absolute value or share in the total number of diagnostic and treatment activities
|-
  ! style="background-color:#e2e2e2; padding-left:10px; padding-right:10px;"|Number of states / number of controlled variables
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;" rowspan="2"|J13 / pneumonia due to ''Streptococcus pneumoniae''
  | style="background-color:white; padding-left:10px; padding-right:10px;"|166 / 11
  | style="background-color:white; padding-left:10px; padding-right:10px;" rowspan="2"|6788 / 81.6%
  | style="background-color:white; padding-left:10px; padding-right:10px;" rowspan="2"|3923 / 47.2%
  | style="background-color:white; padding-left:10px; padding-right:10px;" rowspan="2"|1530 / 18.4%
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|2938 / 118
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;" rowspan="2"|K80.1 / calculus of gallbladder with other cholecystitis
  | style="background-color:white; padding-left:10px; padding-right:10px;"|1018 / 128
  | style="background-color:white; padding-left:10px; padding-right:10px;" rowspan="2"|34468 / 76.7%
  | style="background-color:white; padding-left:10px; padding-right:10px;" rowspan="2"|18390 / 40.9%
  | style="background-color:white; padding-left:10px; padding-right:10px;" rowspan="2"|10490 / 23.3%
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|12853 / 931
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;" rowspan="2"|H25.1 / age-related nuclear cataract
  | style="background-color:white; padding-left:10px; padding-right:10px;"|1205 / 121
  | style="background-color:white; padding-left:10px; padding-right:10px;" rowspan="2"|3522 / 94.9%
  | style="background-color:white; padding-left:10px; padding-right:10px;" rowspan="2"|539 / 14.5%
  | style="background-color:white; padding-left:10px; padding-right:10px;" rowspan="2"|189 / 5.1%
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|5509 / 293
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;" rowspan="2"|H26.2 / complicated cataract
  | style="background-color:white; padding-left:10px; padding-right:10px;"|1255 / 126
  | style="background-color:white; padding-left:10px; padding-right:10px;" rowspan="2"|4362 / 91.4%
  | style="background-color:white; padding-left:10px; padding-right:10px;" rowspan="2"|1617 / 33.9%
  | style="background-color:white; padding-left:10px; padding-right:10px;" rowspan="2"|408 / 8.6%
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|5778 / 249
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;" rowspan="2"|I67.4 / hypertensive encephalopathy
  | style="background-color:white; padding-left:10px; padding-right:10px;"|1336 / 134
  | style="background-color:white; padding-left:10px; padding-right:10px;" rowspan="2"|65678 / 72.4%
  | style="background-color:white; padding-left:10px; padding-right:10px;" rowspan="2"|37563 / 41.4%
  | style="background-color:white; padding-left:10px; padding-right:10px;" rowspan="2"|25060 / 27.6%
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|23165 / 1431
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;" rowspan="2"|I67.9 / cerebrovascular disease, unspecified
  | style="background-color:white; padding-left:10px; padding-right:10px;"|1403 / 141
  | style="background-color:white; padding-left:10px; padding-right:10px;" rowspan="2"|58649 / 75.4%
  | style="background-color:white; padding-left:10px; padding-right:10px;" rowspan="2"|32447 / 41.7%
  | style="background-color:white; padding-left:10px; padding-right:10px;" rowspan="2"|19117 / 24.6%
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|24875 / 1518
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;" rowspan="2"|N20.1 / calculus of ureter
  | style="background-color:white; padding-left:10px; padding-right:10px;"|1632 / 164
  | style="background-color:white; padding-left:10px; padding-right:10px;" rowspan="2"|17489 / 58.7%
  | style="background-color:white; padding-left:10px; padding-right:10px;" rowspan="2"|9948 / 33.4%
  | style="background-color:white; padding-left:10px; padding-right:10px;" rowspan="2"|12291 / 41.3%
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|15922 / 205
|-
|}
|}
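Judging from the numbers, the percentages in each row of Table 1 appear to be computed against the sum of correct recommendations and activities left without a recommendation. For J13, 6,788 + 1,530 = 8,318, which gives 81.6% and 18.4%. This suggests that recommendations with a different control level are counted within the correct ones. That reading is an inference, not something the text states. The sketch below recomputes all three shares from the absolute values so that any row can be checked; `TABLE_1` and `table_1_shares` are hypothetical names.

```python
# Absolute values from Table 1: (correct, different control level, no recommendation)
TABLE_1 = {
    "J13": (6788, 3923, 1530),
    "K80.1": (34468, 18390, 10490),
    "H25.1": (3522, 539, 189),
    "H26.2": (4362, 1617, 408),
    "I67.4": (65678, 37563, 25060),
    "I67.9": (58649, 32447, 19117),
    "N20.1": (17489, 9948, 12291),
}


def table_1_shares(correct, different_level, unanswered):
    """Percentages relative to correct + unanswered activities (inferred denominator)."""
    total = correct + unanswered
    return tuple(round(100 * n / total, 1) for n in (correct, different_level, unanswered))


for code, counts in TABLE_1.items():
    print(code, table_1_shares(*counts))
```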
For the neural networks, running computational experiments for all nosologies listed in Table 1 would have required considerable time and computing power, and the practical value of such full-scale experiments was unclear. Therefore, the computational experiments were limited to estimations for nosology J13. Table 2 contains general information about the experiment with a single-layer neural network.
{|
| STYLE="vertical-align:top;"|
{| class="wikitable" border="1" cellpadding="5" cellspacing="0" width="80%"
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;" colspan="5"|'''Table 2.''' Accuracy assessment of recommended diagnostic and treatment activities for nosology J13 based on a single-layer neural network
|-
|-
  ! style="background-color:#e2e2e2; padding-left:10px; padding-right:10px;" rowspan="2"|MKB-10 code / nosology
  ! style="background-color:#e2e2e2; padding-left:10px; padding-right:10px;"|Total number of clinical precedents / number of control precedents
  ! style="background-color:#e2e2e2; padding-left:10px; padding-right:10px;" rowspan="2"|Number of correct positive recommendations among control precedents<br />/<br />absolute value or share in the total number of positive recommendations
  ! style="background-color:#e2e2e2; padding-left:10px; padding-right:10px;" rowspan="2"|Number of incorrect positive recommendations among control precedents<br />/<br />absolute value or share in the total number of positive recommendations
  ! style="background-color:#e2e2e2; padding-left:10px; padding-right:10px;"|Total number of negative recommendations / total number of positive recommendations
|-
  ! style="background-color:#e2e2e2; padding-left:10px; padding-right:10px;"|Number of neural network inputs / number of neural network outputs (number of controlled variables)
  ! style="background-color:#e2e2e2; padding-left:10px; padding-right:10px;"|Share of correct negative recommendations / share of correct positive recommendations
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;" rowspan="2"|J13 / pneumonia due to ''Streptococcus pneumoniae''
  | style="background-color:white; padding-left:10px; padding-right:10px;"|266 / 11
  | style="background-color:white; padding-left:10px; padding-right:10px;" rowspan="2"|339 / 40.31%
  | style="background-color:white; padding-left:10px; padding-right:10px;" rowspan="2"|502 / 59.69%
  | style="background-color:white; padding-left:10px; padding-right:10px;"|35567 / 841
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|224 / 222
  | style="background-color:white; padding-left:10px; padding-right:10px;"|98.55% / 40.31%
|-
|}
|}
Note that the volume of statistics on this illness stored in the database had increased since an earlier experiment involving the same nosology, from 166 to 266 completed clinical processes. The controls included all types of drug prescriptions (222 different pharmaceutical products in our case). Data normalization involved converting the prescribed dosages of pharmaceutical products to unified dose units. The only monitored variable was “inpatient days.” The inputs also included a bias term, so 49,728 weights (224 inputs × 222 outputs) had to be determined. The optimized target function was the quadratic residual between the neural network output and the control components observed in the control samples, normalized to (0, 1).
We used a nonstandard neuron activation function, a bell curve (Gaussian function). This choice was based on the fact that the integral values of many controls have apparent limits stipulated by Russian federal healthcare standards (standards of the Russian Ministry of Health). Different insurance programs also limit the integral values of controls, and healthcare providers will not exceed these limits unless they find it necessary. Formally, with respect to the model, this means that once an integral property of a control reaches a certain limit, it stops growing, or further growth is highly unlikely. The gradient of the target function with respect to the weights was calculated explicitly, and the steepest descent method was applied. Teaching included 1,006 descent steps. Criteria reflecting the accuracy of the neural network are presented in Table 3.
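The training setup described above (Gaussian activation, quadratic residual, explicit gradient, steepest descent, and the activation threshold of 0.1 used in Table 3) can be sketched in plain Python. This is a toy illustration, not the authors’ code. The activation form exp(−''z''²), the fixed step size, and the random data are assumptions, and the sizes are tiny rather than 224 inputs and 222 outputs.

```python
import math
import random


def gaussian(z):
    """Bell-shaped activation; the exact form exp(-z**2) is an assumption."""
    return math.exp(-z * z)


def forward(W, x):
    """One output per control; W[j] holds output j's weights, x ends with a bias input of 1.0."""
    return [gaussian(sum(w * xi for w, xi in zip(row, x))) for row in W]


def loss(W, X, T):
    """Quadratic residual E = 0.5 * sum (y - t)^2 over samples and outputs; targets lie in (0, 1)."""
    total = 0.0
    for x, targets in zip(X, T):
        for y, t in zip(forward(W, x), targets):
            total += 0.5 * (y - t) ** 2
    return total


def gradient(W, X, T):
    """Explicit dE/dW, using g'(z) = -2 z g(z) for g(z) = exp(-z**2)."""
    G = [[0.0] * len(row) for row in W]
    for x, targets in zip(X, T):
        for j, row in enumerate(W):
            z = sum(w * xi for w, xi in zip(row, x))
            y = gaussian(z)
            delta = (y - targets[j]) * (-2.0 * z * y)  # dE/dz for output j
            for i, xi in enumerate(x):
                G[j][i] += delta * xi
    return G


def steepest_descent(W, X, T, rate=0.01, steps=1006):
    """Fixed-step steepest descent; 1,006 steps as in the text, the step size is an assumption."""
    for _ in range(steps):
        G = gradient(W, X, T)
        W = [[w - rate * g for w, g in zip(row, grad_row)] for row, grad_row in zip(W, G)]
    return W


def recommend(W, x, threshold=0.1):
    """1 (positive recommendation) where an output exceeds the activation threshold."""
    return [1 if y > threshold else 0 for y in forward(W, x)]


# Toy data only: 2 state components + bias, 2 controls, 20 samples.
random.seed(0)
X = [[random.random(), random.random(), 1.0] for _ in range(20)]
T = [[random.random(), random.random()] for _ in range(20)]
W0 = [[random.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(2)]
W_trained = steepest_descent(W0, X, T, steps=200)
print(loss(W0, X, T), "->", loss(W_trained, X, T))
print(recommend(W_trained, X[0]))
```

Because every output has its own weight row, the residual separates by output. A step size small relative to the data’s curvature therefore keeps each descent step from increasing the loss. On real data, the step size would have to be tuned.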
{|
| STYLE="vertical-align:top;"|
{| class="wikitable" border="1" cellpadding="5" cellspacing="0" width="80%"
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;" colspan="4"|'''Table 3.''' Accuracy of recommended diagnostic and treatment activities for nosology J13 based on a single-layer neural network with an activation threshold equal to 0.1<br />&nbsp;<br />TP, true positive; FP, false positive; TN, true negative; FN, false negative
|-
|-
  ! style="background-color:#e2e2e2; padding-left:10px; padding-right:10px;" colspan="4"|Absolute values (neuron activation threshold equal to 0.1)
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|TP
  | style="background-color:white; padding-left:10px; padding-right:10px;"|339
  | style="background-color:white; padding-left:10px; padding-right:10px;"|502
  | style="background-color:white; padding-left:10px; padding-right:10px;"|FP
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|TN
  | style="background-color:white; padding-left:10px; padding-right:10px;"|35,052
  | style="background-color:white; padding-left:10px; padding-right:10px;"|515
  | style="background-color:white; padding-left:10px; padding-right:10px;"|FN
|-
  ! style="background-color:#e2e2e2; padding-left:10px; padding-right:10px;" colspan="4"|Percent (neuron activation threshold equal to 0.1)
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|TP
  | style="background-color:white; padding-left:10px; padding-right:10px;"|40.31%
  | style="background-color:white; padding-left:10px; padding-right:10px;"|59.69%
  | style="background-color:white; padding-left:10px; padding-right:10px;"|FP
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|TN
  | style="background-color:white; padding-left:10px; padding-right:10px;"|98.55%
  | style="background-color:white; padding-left:10px; padding-right:10px;"|1.45%
  | style="background-color:white; padding-left:10px; padding-right:10px;"|FN
|-
|}
|}
The relevant receiver operating characteristic (ROC) error curve is shown in Figure 6.
[[File:Fig6 Malykh JofHealthEng2018 2018.png|492px]]
{{clear}}
{|
| STYLE="vertical-align:top;"|
{| border="0" cellpadding="5" cellspacing="0" width="492px"
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"| <blockquote>'''Figure 6.''' ROC error curve</blockquote>
|-
|}
|}
The results of the experiment based on a probabilistic neural network are presented in Table 4. The state vector dimension was 639. The control component included 125 diagnostic tests, 200 laboratory tests (of different kinds), 222 different pharmaceuticals, 87 medical treatments, and four controls classified as “others.” The only monitored property was “inpatient days.” The teaching sample of 266 processes contained 4,361 kernels (states). The dimension of the state vector in the probabilistic neural network was almost three times that in the single-layer neural network (639 versus 223). To make the results of the two networks comparable, the output of the probabilistic neural network was restricted to that of the single-layer network: a vector of dimension 222 related to the prescription of different pharmaceuticals. Both neural networks generated 36,408 positive and negative recommendations for the control sample. The experiment involved a single control parameter ''σ'', used as a multiplier for the diagonal covariance matrix of the kernel function (a multivariate Gaussian distribution of independent random variables). A value grid was predetermined for ''σ'', and the best value was chosen based on the results of experimental calculations.<ref name="KvetniyProb10">{{cite book |chapter=Probabilistic Neural Networks in Time Series Identification |title=Information Technologies and Computers |author=Kvetniy, R.N.; Kabachiy, V.V.; Chumachenko, O.O. |publisher=Vinnytsia National Technical University |year=2010}}</ref> Calculations were performed for ''σ'' = 0.1, 0.5, 1, and 2.5. The best results were obtained for ''σ'' = 2.5 and are presented in Table 4. Note that the standard deviations of the state vector components calculated for the teaching sample were large and often exceeded the mean values.
A multiplier of 2.5 yields “wide” kernel functions (see the rightmost distribution in Figure 5). With “sharp” kernel functions (''σ'' = 0.1), the results were clearly worse.
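Choosing ''σ'' from a predetermined grid can be sketched as follows. Several assumptions apply. The covariance is taken to be ''σ'' times the diagonal matrix of teaching-sample variances. The selection criterion is leave-one-out classification accuracy for a single control, whereas the text only says that the best value was chosen from experimental calculations. The toy data and all function names are hypothetical.

```python
import math
import statistics

SIGMA_GRID = (0.1, 0.5, 1, 2.5)   # values tried in the text


def component_variances(states):
    """Per-component variance of the states; zero spread is replaced by 1.0 (assumption)."""
    out = []
    for column in zip(*states):
        v = statistics.pvariance(column)
        out.append(v if v > 0 else 1.0)
    return out


def log_kernel(x, center, cov_diag):
    """Log of a Gaussian kernel with diagonal covariance cov_diag."""
    return sum(-0.5 * (xi - ci) ** 2 / v - 0.5 * math.log(2 * math.pi * v)
               for xi, ci, v in zip(x, center, cov_diag))


def loo_accuracy(states, labels, sigma):
    """Leave-one-out accuracy of a two-class PNN with covariance sigma * diag(variances).

    For simplicity, the variances are computed once on the full sample.
    """
    cov = [sigma * v for v in component_variances(states)]
    hits = 0
    for i, (x, label) in enumerate(zip(states, labels)):
        score = {0: 0.0, 1: 0.0}
        for j, (s, lab) in enumerate(zip(states, labels)):
            if j != i:
                score[lab] += math.exp(log_kernel(x, s, cov))
        predicted = 1 if score[1] > score[0] else 0
        hits += predicted == label
    return hits / len(states)


def select_sigma(states, labels, grid=SIGMA_GRID):
    """Grid value with the best leave-one-out accuracy (ties go to the first)."""
    return max(grid, key=lambda s: loo_accuracy(states, labels, s))


# Toy states for one control: label 1 means the control was applied.
states = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [1.0, 1.1], [1.2, 0.9], [0.9, 1.0]]
labels = [1, 1, 1, 0, 0, 0]
for sigma in SIGMA_GRID:
    print(sigma, loo_accuracy(states, labels, sigma))
print("selected:", select_sigma(states, labels))
```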
{|
| STYLE="vertical-align:top;"|
{| class="wikitable" border="1" cellpadding="5" cellspacing="0" width="80%"
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;" colspan="4"|'''Table 4.''' Accuracy of recommended diagnostic and treatment activities for nosology J13 based on a probabilistic neural network with ''σ'' = 2.5<br />&nbsp;<br />TP, true positive; FP, false positive; TN, true negative; FN, false negative
|-
|-
  ! style="background-color:#e2e2e2; padding-left:10px; padding-right:10px;" colspan="4"|Absolute values (''σ'' = 2.5)
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|TP
  | style="background-color:white; padding-left:10px; padding-right:10px;"|233
  | style="background-color:white; padding-left:10px; padding-right:10px;"|191
  | style="background-color:white; padding-left:10px; padding-right:10px;"|FP
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|TN
  | style="background-color:white; padding-left:10px; padding-right:10px;"|35,376
  | style="background-color:white; padding-left:10px; padding-right:10px;"|608
  | style="background-color:white; padding-left:10px; padding-right:10px;"|FN
|-
  ! style="background-color:#e2e2e2; padding-left:10px; padding-right:10px;" colspan="4"|Percent (''σ'' = 2.5)
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|TP
  | style="background-color:white; padding-left:10px; padding-right:10px;"|55.0%
  | style="background-color:white; padding-left:10px; padding-right:10px;"|45.0%
  | style="background-color:white; padding-left:10px; padding-right:10px;"|FP
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|TN
  | style="background-color:white; padding-left:10px; padding-right:10px;"|98.31%
  | style="background-color:white; padding-left:10px; padding-right:10px;"|1.69%
  | style="background-color:white; padding-left:10px; padding-right:10px;"|FN
|-
|}
|}
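The percentages in Tables 3 and 4 can be reproduced from the absolute counts. The positive share is TP/(TP + FP), the fraction of positive recommendations that were correct (usually called precision or positive predictive value). The negative share is TN/(TN + FN), the negative predictive value. The helper name below is hypothetical; the counts are those in the tables.

```python
def recommendation_accuracy(tp, fp, tn, fn):
    """Shares reported in Tables 3 and 4.

    positive: TP / (TP + FP), correct share among positive recommendations
    negative: TN / (TN + FN), correct share among negative recommendations
    """
    return {
        "positive": 100 * tp / (tp + fp),
        "negative": 100 * tn / (tn + fn),
        "total": tp + fp + tn + fn,
    }


single_layer = recommendation_accuracy(tp=339, fp=502, tn=35052, fn=515)
probabilistic = recommendation_accuracy(tp=233, fp=191, tn=35376, fn=608)
for name, r in (("single-layer", single_layer), ("probabilistic", probabilistic)):
    print(f"{name}: positive {r['positive']:.2f}%, negative {r['negative']:.2f}%, n = {r['total']}")
```

Both networks give the same total of 36,408 recommendations, matching the control-sample figure stated in the text.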
==Summary==
The focus of this paper was how to build a clinical decision support system based on big clinical data. The authors reviewed general approaches to the problem that neither involve individual models for specific nosologies nor require subject-matter experts for such modeling or for knowledge extraction from data. Data are extracted from the MIS without reduction, “as is,” on the assumption that they contain significant information reflecting medical knowledge and contemporary medical treatment technologies. Three different approaches to big clinical data processing were examined: (1) case-based reasoning for decision-making; (2) decision-making based on a single-layer neural network; and (3) decision-making based on a probabilistic neural network. Experimental calculations were performed to assess the accuracy of the recommendations generated by each approach.
Drawbacks of the above neural networks with respect to the given problem were identified. The overall accuracy of the recommendations they provided was rather high, and the accuracy of the negative recommendations that the neural networks learned to provide was very high (98–99%). However, the accuracy of their positive recommendations was much lower (40–55%), which is clearly insufficient for successful practical application. Another disadvantage of neural networks is their rough and conservative nature, particularly when digesting isolated portions of new data whose volume is insignificant compared to the previously available data.
The case-based approach to decision-making yielded more accurate recommendations (59–95%), which is sufficient for successful practical application. Another advantage of the case-based approach is its sensitivity to new data. Computationally, the case-based approach is also more efficient than the other options considered, as it ensures a high operating speed of the decision support system, making it acceptable for practical application. These are the key findings of this study.
This offers encouraging prospects for designing and developing decision support systems for physicians based on empirical components of medical knowledge. The approach also matches the existing case-based character of management and decision-making in medical practice. So far, the results indicate that the precedent-based approach is highly effective and could naturally enhance other approaches to supporting physicians’ decision-making, particularly knowledge-based ones. Its obvious practical value lies in the fact that it can complement other knowledge-based approaches (clinical pathways, evidence-based clinical decision support, expert systems, Watson, etc.). The doctor will be able to make decisions based on the best examples of medical practice by finding precedents of clinical cases close to the given case.
The constraints of a precedent-based approach include the need for a representative database of verified precedents that excludes medical errors. From another perspective, precedents with corrected errors are of particular interest for physician training and for preventing such errors in the future. Information about the consequences of these errors and possible ways of correcting them is also valuable. Thus, the precedent-based approach could be widely used as an educational tool. On the other hand, the precedent-based approach does not involve formalization of medical knowledge, which results in poor cognitive justification of the generated recommendations: justifications only describe how other patients were treated in similar clinical cases. There are also problems with optimizing the metrics used, compressing state descriptions, and constructing training procedures. These problems stem from the high dimensionality of the space of state characteristics and of the samples of clinical precedents. However, discussion of these issues and possible ways of addressing them is outside the scope of this research.<ref name="MalykhEstimation16" />
In further studies, we are going to focus on detailed application of the case-based approach; analyze metrics and distances not only for pairs of vectors but also for pairs of vector sequences; and examine issues concerned with intelligent normalization of primary data and data extraction from the plain text of medical documents.
==Acknowledgements==
The authors would like to thank Professor V. M. Khachumov, a doctor of engineering sciences, and V. P. Fralenko, a candidate of engineering sciences, for the consulting support on neural networks and teaching methods, as well as Professor N. N. Nepevoda, a doctor of physical and mathematical sciences, for the discussion and assessment of the outcomes. Some of the outcomes presented in the paper were achieved earlier under the support of the Ministry of Education and Science of the Russian Federation (Project RFMEFI60714X0089) and in the context of Grant 13-07-12012 provided by the Russian Foundation for Basic Research.
===Disclosures===
UDC 007.52 (Automatically operated systems without any humans among system links, robots, and automated machines).
===Conflicts of interest===
The authors declare that they have no conflicts of interest.
==References==
{{Reflist|colwidth=30em}}
==Notes==
This presentation is faithful to the original, with only a few minor changes to presentation. In some cases important information was missing from the references, and that information was added.
<!--Place all category tags here-->
[[Category:LIMSwiki journal articles (added in 2018)‎]]
[[Category:LIMSwiki journal articles (all)‎]]
[[Category:LIMSwiki journal articles on big data]]
[[Category:LIMSwiki journal articles on health informatics‎‎]]

Revision as of 03:38, 24 October 2018
