Difference between revisions of "User:Shawndouglas/sandbox/sublevel13"

From LIMSWiki
<div class="nonumtoc">__TOC__</div>
{{ombox
| type     = notice
| style    = width: 960px;
| text     = This is sublevel13 of my sandbox, where I play with features and test MediaWiki code. If you wish to leave a comment for me, please see [[User_talk:Shawndouglas|my discussion page]] instead.<p></p>
}}
{{Infobox journal article
|name         = 
|image        = 
|alt          = <!-- Alternative text for images -->
|caption      = 
|title_full   = DataCare: Big data analytics solution for intelligent healthcare management
|journal      = ''International Journal of Interactive Multimedia and Artificial Intelligence''
|authors      = Baldominos, Alejandro; de Rada, Fernando; Saez, Yago
|affiliations = Universidad Carlos III de Madrid, Camilo José Cela University
|contact      = Email: abaldomi at inf dot uc3m dot es
|editors      = 
|pub_year     = 2018
|vol_iss      = '''4'''(7)
|pages        = 13–20
|doi          = [http://10.9781/ijimai.2017.03.002 10.9781/ijimai.2017.03.002]
|issn         = 1989-1660
|license      = [https://creativecommons.org/licenses/by/3.0/ Creative Commons Attribution 3.0 Unported]
|website      = [http://www.ijimai.org/journal/node/1621 http://www.ijimai.org/journal/node/1621]
|download     = [http://www.ijimai.org/journal/sites/default/files/files/2017/03/ijimai_4_7_2_pdf_16566.pdf http://www.ijimai.org/journal/sites/default/files/files/2017/03/ijimai_4_7_2_pdf_16566.pdf] (PDF)
}}
{{ombox
| type      = content
| style    = width: 500px;
| text      = This article should not be considered complete until this message box has been removed. This is a work in progress.
}}
==Abstract==
This paper presents DataCare, a solution for intelligent healthcare management. This product is able not only to retrieve and aggregate data from different key performance indicators in healthcare centers, but also to estimate future values for these key performance indicators and, as a result, fire early alerts when undesirable values are about to occur or provide recommendations to improve the quality of service. DataCare’s core processes are built over a free and open-source cross-platform document-oriented database (MongoDB), and Apache Spark, an open-source cluster computing framework. This architecture ensures high scalability capable of processing very high data volumes coming at rapid speeds from a large set of sources. This article describes the architecture designed for this project and the results obtained after conducting a pilot in a healthcare center. Useful conclusions have been drawn regarding how key performance indicators change based on different situations, and how they affect patients’ satisfaction.
'''Keywords''': Architecture, artificial intelligence, big data, healthcare, management
==Introduction==
When managing a healthcare center, there are many key performance indicators (KPIs) that can be measured, such as the number of events, the waiting time, the number of planned tours, etc. Often, keeping these KPIs within the expected limits is vital to achieving high user satisfaction.
In this paper we present DataCare, a solution for intelligent healthcare management. DataCare provides a complete architecture to retrieve data from sensors installed in the healthcare center, process and analyze it, and finally obtain relevant information, which is displayed in a user-friendly dashboard.
The advantages of DataCare are twofold: first, it is intelligent. Besides retrieving and aggregating data, the system is able to predict future behavior based on past events. This means that the system can fire early alerts when a KPI is expected to have a future value that falls outside the expected boundaries, and it can provide recommendations for improving the behavior and the metrics, or prevent future problems with attending events.
Second, the core system module is built on top of a big data platform. Processing and analysis are run over Apache Spark, and data are stored in MongoDB, thus enabling a highly scalable system that can process large volumes of data coming in at very high speeds.
This article will discuss many aspects of DataCare. The next section will present context for this research by analyzing the state of the art and related work. After that an overview of DataCare’s architecture will be presented, including the three main modules responsible for retrieving data, processing and analyzing it, and displaying the resulting valuable information.
After the architecture has been explained, the subsequent three sections will describe the preprocessing, processing, and analytics engines in further detail. The design of these systems is crucial to providing a scalable solution with an intelligent behavior. After discussing those engines in detail, the article will then describe the visual analytics engine and the different dashboards that are presented to users.
Finally, the penultimate section will describe how the solution has been validated, and the last section will provide some conclusive remarks, along with potential future work.
==State of the art==
Because healthcare services are very complex and life-critical, many works have tackled the design of healthcare management systems aimed at monitoring metrics in order to detect undesirable behaviors that decrease patients’ satisfaction or even threaten their safety.
Discussion of the design and implementation of healthcare management systems is not new. In 2000, Curtright ''et al.''<ref name="CurtrightStat00">{{cite journal |title=Strategic performance management: Development of a performance measurement system at the Mayo Clinic |journal=Journal of Healthcare Management |author=Curtright, J.W.; Stolp-Smith, S.C.; Edell, E.S. |volume=45 |issue=1 |pages=58–68 |year=2000 |pmid=11066953}}</ref> described a system to monitor KPIs, summarizing them in a dashboard report, with a real-world application in the Mayo Clinic. Also, Griffith and King<ref name="GriffithChampion00">{{cite journal |title=Championship management for healthcare organizations |journal=Journal of Healthcare Management |author=Griffith, J.R. |volume=45 |issue=1 |pages=17–30 |year=2000 |pmid=11066948}}</ref> proposed establishing a “championship” in which those healthcare systems with consistently good metrics would help improve decision-making processes.
Some of these works explore the sensing technologies that enable such proposals. For instance, Ngai ''et al.''<ref name="NgaiDesign09">{{cite journal |title=Design of an RFID-based Healthcare Management System using an Information System Design Theory |journal=Information Systems Frontiers |author=Ngai, E.W.T.; Poon, J.K.L.; Suk, F.F.C.; Ng, C.C. |volume=11 |issue=4 |pages=405–417 |year=2009 |doi=10.1007/s10796-009-9154-3}}</ref> focus on how RFID technology can be applied for building a healthcare management system, yet it is only implemented in a quasi real-world setting. Ting ''et al.''<ref name="TingCritical11">{{cite journal |title=Critical elements and lessons learnt from the implementation of an RFID-enabled healthcare management system in a medical organization |journal=Journal of Medical Systems |author=Ting, S.L.; Kwok, S.K.; Tsang, A.H.; Lee, W.B. |volume=35 |issue=4 |pages=657–69 |year=2011 |doi=10.1007/s10916-009-9403-5}}</ref> also focus on the application of RFID technology to such a project, from the perspective of its preparation, implementation, and maintenance.
Some previous works have also tackled the design of intelligent healthcare management systems. Recently Jalal ''et al.''<ref name="JalalADepth17">{{cite journal |title=A Depth Video-based Human Detection and Activity Recognition using Multi-features and Embedded Hidden Markov Models for Health Care Monitoring Systems |journal=International Journal of Interactive Multimedia and Artificial Intelligence |author=Jalal, A.; Kamal, S.; Kim, D. |volume=4 |issue=4 |pages=54–62 |year=2017 |doi=10.9781/ijimai.2017.447}}</ref> have proposed an intelligent, depth video-based human activity recognition system to track elderly patients that could be used as part of a healthcare management and monitoring system. However, the paper does not explore this integration. Also, Ghamdi ''et al.''<ref name="GhamdiAnOnt16">{{cite journal |title=An ontology-based system to predict hospital readmission within 30 days |journal=International Journal of Healthcare Management |author=Ghamdi, H.A.; Alshammari, R.; Razzak, M.I. |volume=9 |issue=4 |pages=236–244 |year=2016 |doi=10.1080/20479700.2016.1139768}}</ref> have proposed an ontology-based system for prediction of patients’ readmission within 30 days so that those readmissions can be prevented.
Regarding the impact of data in a healthcare management system, the importance of data-driven approaches has been addressed by Bossen ''et al.''.<ref name="BossenChallenges16">{{cite journal |title=Challenges of Data-driven Healthcare Management: New Skills and Work |journal=19th ACM Conference on Computer-Supported Cooperative Work and Social Computing |author=Bossen, C.; Danholt, P.; Ubbesen, M.B. et al. |pages=5 |year=2016 |url=http://pure.au.dk/portal/da/publications/challenges-of-datadriven-healthcare-management-new-skills-and-work(fd56833b-db7b-44ed-b4fd-15882b382271).html}}</ref> Roberts ''et al.''<ref name="RobertsADesign16">{{cite journal |title=A design thinking framework for healthcare management and innovation |journal=Healthcare |author=Roberts, J.P.; Fisher, T.R.; Trowbridge, M.J.; Bent, C. |volume=4 |issue=1 |pages=11–14 |year=2016 |doi=10.1016/j.hjdsi.2015.12.002 |pmid=27001093}}</ref> have explored how to design healthcare management systems using a design thinking framework. Basole ''et al.''<ref name="BasoleHealthcare13">{{cite journal |title=Healthcare management through organizational simulation |journal=Decision Support Systems |author=Basole, R.C.; Bodner, D.A.; Rouse, W.B. |volume=55 |issue=2 |pages=552–563 |year=2013 |doi=10.1016/j.dss.2012.10.012}}</ref> propose a web-based game using organizational simulation for healthcare management. Zeng ''et al.''<ref name="ZengVIKOR13">{{cite journal |title=VIKOR method with enhanced accuracy for multiple criteria decision making in healthcare management |journal=Journal of Medical Systems |author=Zeng, Q.L.; Li, D.D.; Yang, Y.B. |volume=37 |issue=2 |pages=9908 |year=2013 |doi=10.1007/s10916-012-9908-1 |pmid=23377778}}</ref> have proposed an enhanced VIKOR method that can be used as a decision support tool in healthcare management contexts. 
A relevant work from Mohapatra<ref name="MohapatraUsing15">{{cite journal |title=Using integrated information system for patient benefits: A case study in India |journal=International Journal of Healthcare Management |author=Mohapatra, S. |volume=8 |issue=4 |pages=262–71 |year=2015 |doi=10.1179/2047971915Y.0000000007}}</ref> explores how a [[hospital information system]] is used for healthcare management, improving the KPIs; and a pilot has been conducted in Kalinga hospital (India), turning out to be beneficial for all stakeholders.
Some works have also explored how to increase patients’ satisfaction. For example, Fortenberry and McGoldrick<ref name="FortenberryInternal15">{{cite journal |title=Internal marketing: A pathway for healthcare facilities to improve the patient experience |journal=International Journal of Healthcare Management |author=Fortenberry Jr., J.L. |volume=9 |issue=1 |pages=28–33 |year=2015 |doi=10.1179/2047971915Y.0000000014}}</ref> suggest improving the patient experience via internal marketing efforts, while Minniti ''et al.''<ref name="MinnitiPatient16">{{cite book |chapter=Patient-Interactive Healthcare Management, a Model for Achieving Patient Experience Excellence |title=Healthcare Information Management Systems |author=Minniti, M.J.; Blue, T.R.; Freed, D.; Ballen, S. |publisher=Springer |pages=257–281 |year=2016 |isbn=9783319207650 |doi=10.1007/978-3-319-20765-0_16}}</ref> propose a model in which patient feedback is processed in real time, driving rapid cycle improvement.
To place this work into its context, what we have developed is a data-driven intelligent healthcare management system. Because of the volume and velocity of big data, we have used a big data architecture based on the one proposed by Baldominos ''et al.''<ref name="BaldominosAScal14">{{cite journal |title=A scalable machine learning online service for big data real-time analysis |journal=2014 IEEE Symposium on Computational Intelligence in Big Data |author=Baldominos, A.; Albacete, E.; Saez, Y.; Isasi, P. |year=2014 |doi=10.1109/CIBD.2014.7011537}}</ref>, but updating the tools to use Apache Spark for the sake of efficiency. Also, a pilot has been conducted to evaluate the performance of the proposed system.
==Overview of the architecture==
DataCare’s architecture comprises three main modules: the first oversees retrieving and aggregating the information generated in the health center or [[hospital]], the second processes and analyzes the data, and the third displays the valuable information in a dashboard, allowing the [[System integration|integration]] with external information systems.
Figure 1 depicts a broad overview of this architecture, while the following describes each of the modules in further detail.
[[File:Fig1 BaldominosIntJOfIMAI2018 4-7.png|800px]]
{{clear}}
{|
| STYLE="vertical-align:top;"|
{| border="0" cellpadding="5" cellspacing="0" width="800px"
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"| <blockquote>'''Figure 1.''' DataCare’s architecture. The first column lists the data sources, which are retrieved and aggregated by AdvantCare software (second column). The last column shows the big data platform, which contains engines for the data processing and analytics module (yellow) and the data visualization module (purple).</blockquote>
|-
|}
|}
===Data retrieval and aggregation module===
Data retrieval is carried out by AdvantCare software, developed by Itas Solutions S.L. AdvantCare is a set of hardware and software tools designed to manage communications between patients and healthcare staff. Its core comprises three main systems: 1) Buslogic manages and aggregates the information of actions carried out by nondoctor personnel (nurses and nursing assistants), 2) AdvantControl monitors and controls the infrastructure, and 3) EasyConf manages voice communication.
In hospital rooms, different data acquisition systems are placed, which often consist of hardware devices connected to an IP network and include one of the following elements:
* sensors such as thermometers or noise or light sensors measuring some current value or status either in a continuous or periodic fashion and sending it to Buslogic or AdvantControl servers;
* assistance devices such as buttons or pull handles that are activated by patients and transmit the assistance call to the Buslogic server;
* voice and video communication systems that send and receive information from other devices or from Jitsi (SIP Communicator), which are handled by EasyConf; or
* data acquisition systems operated by means of graphical user interfaces in devices such as tablets, e.g., surveys or other information systems.
In general terms, the information retrieved by AdvantCare belongs to one of the following:
* Planned tours: Healthcare personnel will periodically visit certain rooms or patients as a part of a pre-established plan. Data about how shifts are carried out is essential to evaluate assistance quality and the efficiency of nurses and nursing assistants.
* Assistance tasks: Nurses and nursing assistants must perform certain tasks as a response to an assistance call. It would be valuable to know these tasks in advance so that they can be monitored properly.
* Patient satisfaction: The most important subjective metric of service quality is the patient's satisfaction, which is obtained by means of surveys.
As said before, AdvantCare software comprises three systems, as well as communication/integration interfaces.
====Buslogic====
This software oversees communication with the assistance call systems. It also handles GestCare and MediaCare, which are the systems used for tasks planning, personnel work schedules, patient information, satisfaction surveys, and entertainment. Buslogic retrieves core business information about the assistance process, including alerts, waiting times to assist patients, and achieved assistance objectives.
====AdvantControl====
This software controls and monitors the infrastructure and automation functionalities, including the status of lights, doors, or the DataCare infrastructure itself. It provides real-time alerts about possible quality of service issues.
====EasyConf====
This software manages SIP Communicator and provides data about calls such as the origin, the destination, and the total call duration.
====Communication/Integration APIs====
Data can be retrieved from AdvantCare servers by means of stateless SOAP web services, which are used for requests that require high processing capacity. The information can also be accessed via a REST [[application programming interface]] (API), where calls are performed through HTTP requests and data is exchanged in JSON-serialized format. REST servers are hosted on the software servers themselves (Buslogic, AdvantControl, or EasyConf), allowing real-time queries as well as parameter modifications. Finally, a TELNET channel allows asynchronous communication to broadcast events from the servers to the connected clients.
===Data processing and analysis module===
The Data Processing and Analysis module is part of a big data platform based on Apache Spark<ref name="ZahariaSpark10">{{cite journal |title=Spark: Cluster computing with working sets |journal=Proceedings of the 2nd USENIX Conference on Hot Topics in Cloud Computing |author=Zaharia, M.; Chowdhury, M.; Franklin, M.J. et al. |page=10 |year=2010}}</ref>, which allows an integrated environment for the development and exploitation of real time massive data analysis, outperforming other solutions such as Hadoop MapReduce or Storm, scaling out up to 10,000 nodes, providing fault tolerance<ref name="ZahariaResilient12">{{cite journal |title=Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing |journal=Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation |author=Zaharia, M.; Chowdhury, M.; Das, T. et al. |page=2 |year=2012}}</ref> and allowing queries using a [[SQL]]-like language.
As shown in Figure 1, this module comprises four different systems: Preprocessing Engine, Processing Engine, Big Data and Historic Data Warehouses, and Analytics Engine.
====Preprocessing Engine====
This system performs the ETL (extract-transform-load) processes for the AdvantCare data. It first communicates with AdvantCare using the available APIs to retrieve the data, which later is transformed into a suitable format to be introduced to the Processing Engine. Because of the metadata provided by AdvantCare, the information can be classified to ease its analysis. Normalized and consolidated data gets stored in MongoDB, the leading free and open-source document-oriented database, where collections store both data for real time analysis as well as historic data to support batch analysis to compute the evolution of different metrics in time.
====Processing Engine====
This system runs over the Spark computing cluster and oversees data consolidation processes for periodically aggregating data, also supporting the alert and recommendation subsystems.
====Data Warehouses====
Data filtered by the Preprocessing Engine and enriched by the Processing Engine gets stored in the Big Data Warehouse, responsible for storing real-time information. Additionally, the Historic Data Warehouse stores aggregated historic data, which gets used by the Analytics Engine to identify new trends or trend shifts for the different quality metrics.
====Analytics Engine====
This system runs the batch processes that will apply the statistical analysis methods, as well as machine learning algorithms over real-time big data. Along with the historic data, time series and ARIMA (autoregressive integrated moving average) techniques provide diagnosis of the temporal behavior of the model. This engine also implements a Bayes-based early alerts system (EAS) able to detect and predict a decrease in the service quality or efficiency metrics under a preset threshold, sending alerts in the form of push or email notifications.
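The threshold check that underlies such an alert can be sketched in Python as follows. This is only a simplified illustration and not the Bayes-based EAS itself; the function name <code>find_alerts</code>, the threshold, and the sample scores are hypothetical.

```python
from typing import Dict, List


def find_alerts(predictions: Dict[int, float], threshold: float) -> List[int]:
    """Return the hours whose predicted metric falls under the threshold."""
    return sorted(hour for hour, value in predictions.items() if value < threshold)


# Hypothetical predicted quality scores per hour (not real data).
predicted = {0: 4.1, 1: 3.2, 2: 2.4, 3: 3.9}
alert_hours = find_alerts(predicted, threshold=3.5)
print(alert_hours)
```

In a deployment, each flagged hour would presumably trigger one of the push or email notifications mentioned above.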
===Data visualization module===
This module provides a reporting dashboard that receives information from the big data platform in real time and displays two panels. The first panel shows the main quality and efficiency metrics in real time, along with their evolution over time and the quality thresholds. The second panel provides the diagnoses computed by the Analytics Engine, as well as intelligent recommendations to prevent reaching undesired situations, such as metrics falling below acceptable thresholds.
The dashboard is implemented using the D3.js library, providing intuitive visualizations.
==Preprocessing Engine==
The Preprocessing Engine performs the ETL process over the data, and this section describes how different data are extracted from the various sources, transformed and loaded as a part of this process.
===Extraction===
This engine extracts the assistance call data by polling the AdvantCare module every five minutes, retrieving all data generated by all the rooms. Data from planned tours are retrieved daily also by polling the REST API, while patients’ satisfaction surveys are loaded as CSV files.
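The five-minute polling cycle could look roughly like the following Python sketch. The function <code>poll_events</code> and its parameters are hypothetical; the HTTP call to the REST API is replaced by an injected <code>fetch</code> function, because the actual AdvantCare endpoints are not described here.

```python
import json
from typing import Callable, List


def poll_events(fetch: Callable[[], str],
                cycles: int,
                sleep: Callable[[float], None],
                interval_s: float = 300.0) -> List[dict]:
    """Call fetch() once per cycle and collect the decoded JSON events."""
    events: List[dict] = []
    for cycle in range(cycles):
        events.extend(json.loads(fetch()))
        if cycle < cycles - 1:
            sleep(interval_s)  # wait five minutes between polls
    return events


# Stand-ins for the real HTTP call and for time.sleep (both hypothetical).
waits: List[float] = []
fake_fetch = lambda: '[{"room_number": "126", "ev": "EV PERA"}]'
collected = poll_events(fake_fetch, cycles=3, sleep=waits.append)
```

Injecting <code>fetch</code> and <code>sleep</code> keeps the loop testable without a network connection or real waiting time.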
===Transformation===
The Preprocessing Engine performs several transformation tasks so that data is in a suitable format to be handled by the Processing Engine and the Analytics Engine.
====Assistance task events====
Assistance task events get transformed into MongoDB documents, where each event is stored in a different document, and all of them belong to the events collection. When the status of an event changes (e.g., from “activated” to “notified”), the document is updated to reflect the change.
Figure 2 shows a sample document representing an event.
<pre>{
"_id": ObjectId("565c234f152aee26874d7a18"),
"full_event": true,
"presence": {
      "ev": "EV PRES",
      "ts": ISODate("2015-10-02T01:35:36.384Z")
},
"area": "Madrid",
"notification" : {
      "ev": "EV NOTIF",
      "ts": ISODate("2015-10-02T01:32:21.984Z")
},
"room_number": "126",
"location": "PERA",
"activation" : {
      "week": 40,
      "weekday": 5,
      "user": "Anonimo",
      "hour": 1,
      "minute": 31,
      "year": 2015,
      "month": 10,
      "day": 2,
      "ev": "EV PERA",
      "ts": ISODate("2015-10-02T01:31:45.696Z")
},
"room_letter": "-",
"center": "Aravaca",
"day_properties": {
      "holiday_or_sunday": true,
      "social_events": true,
      "rain": true,
      "extreme_heat": true,
      "summer_vacation": true,
      "holiday": true,
      "weekend": true,
      "friday_or_eve": true
},
"floor": "1",
"times": {
      "cancellation_notification": 195,
      "used": 194,
      "idle": 36,
      "cancellation_activation": 231,
      "total": 230,
      "cancellation_presence": 1
},
"hour_properties": {
      "shift_change": true,
      "shift": "TARDE",
      "sleeptime": true,
      "nurse_count": "8",
      "dinnertime": true,
      "lunchtime": true
},
"cancellation": {
      "ev": "EV CPRES",
      "remote": true,
      "ts": ISODate("2015-10-02T01:35:37.248Z")
}
}</pre>
{|
| STYLE="vertical-align:top;"|
{| border="0" cellpadding="5" cellspacing="0" width="400px"
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"| <blockquote>'''Figure 2.''' Sample JSON document representing an assistance task event in the MongoDB events collection</blockquote>
|-
|}
|}
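The article does not state how the values in the <code>times</code> sub-document are derived. The following Python sketch shows one interpretation that reproduces the numbers in Figure 2: every duration is measured in whole seconds from the activation timestamp, and the remaining fields are differences between those durations. The function names and this reading of the fields are assumptions.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%dT%H:%M:%S.%fZ"
ONE_SECOND = timedelta(seconds=1)


def whole_seconds(start: str, end: str) -> int:
    """Whole seconds elapsed between two ISO timestamps (rounded down)."""
    return (datetime.strptime(end, FMT) - datetime.strptime(start, FMT)) // ONE_SECOND


def compute_times(activation: str, notification: str,
                  presence: str, cancellation: str) -> dict:
    """Derive the duration fields of an event (assumed semantics)."""
    to_notification = whole_seconds(activation, notification)
    to_presence = whole_seconds(activation, presence)
    to_cancellation = whole_seconds(activation, cancellation)
    return {
        "idle": to_notification,
        "used": to_presence - to_notification,
        "total": to_presence,
        "cancellation_activation": to_cancellation,
        "cancellation_notification": to_cancellation - to_notification,
        "cancellation_presence": to_cancellation - to_presence,
    }


# Timestamps taken from the sample document in Figure 2.
times = compute_times(
    activation="2015-10-02T01:31:45.696Z",
    notification="2015-10-02T01:32:21.984Z",
    presence="2015-10-02T01:35:36.384Z",
    cancellation="2015-10-02T01:35:37.248Z",
)
```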
====Planned tours====
Data from planned tours are retrieved daily from AdvantCare using the REST API and are transformed to a MongoDB document in the ''shifts'' collection. A sample document is shown in Figure 3.
<pre>{
      "_id": ObjectId("569e50b1aa40450a027eb4ec"),
      "floor": 3,
      "room": 326,
      "date": "1/10/15",
      "hour": "9:00:45",
      "center_name": "Aravaca",
      "ts": ISODate("2015-10-01T09:00:45.000Z"),
      "shift_type": "MAÑANA"
}
</pre>
{|
| STYLE="vertical-align:top;"|
{| border="0" cellpadding="5" cellspacing="0" width="400px"
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"| <blockquote>'''Figure 3.''' Sample JSON document representing a shift in the MongoDB ''shifts'' collection</blockquote>
|-
|}
|}
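As an illustration of this transformation, the Python sketch below combines the separate <code>date</code> and <code>hour</code> strings of a planned-tour record into the single <code>ts</code> timestamp seen in Figure 3. The function name <code>build_shift_document</code> is hypothetical, and reading the date as day/month/two-digit year is an assumption based on the sample values.

```python
from datetime import datetime


def build_shift_document(row: dict) -> dict:
    """Add a combined timestamp to a raw planned-tour record."""
    # The sample date "1/10/15" is read as day/month/two-digit year.
    ts = datetime.strptime(f'{row["date"]} {row["hour"]}', "%d/%m/%y %H:%M:%S")
    return {**row, "ts": ts}


# Field values copied from the sample document in Figure 3.
raw = {"floor": 3, "room": 326, "date": "1/10/15", "hour": "9:00:45",
       "center_name": "Aravaca", "shift_type": "MAÑANA"}
shift = build_shift_document(raw)
```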
====Satisfaction surveys====
As stated before, satisfaction data are loaded as CSV files. The Preprocessing Engine transforms each survey into a MongoDB document, which gets stored in the ''surveys'' collection. Figure 4 shows the structure of a sample document representing a satisfaction survey.
<pre>{
      "_id" : ObjectId("569e483daa404509a9796754"),
      "care_punctuation": 2,
      "center": "Aravaca",
      "area": "Madrid",
      "floor": 2,
      "night_punctuation": 5,
      "morning_punctuation": 4,
      "speed_punctuation": 2,
      "price_quality_punctuation": 2,
      "afternoon_punctuation": 4,
      "year": 2015,
      "month": 11,
      "day": 27,
      "date": ISODate("2015-11-27T00:00:00.000Z"),
      "global_punctuation": 2,
      "id": "Anonimo",
      "room": 221
}
</pre>
{|
| STYLE="vertical-align:top;"|
{| border="0" cellpadding="5" cellspacing="0" width="400px"
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"| <blockquote>'''Figure 4.''' Sample JSON document representing a satisfaction survey in the MongoDB ''surveys'' collection</blockquote>
|-
|}
|}
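The CSV-to-document step can be sketched in Python as shown below. The column layout of the CSV file is not given in the article, so the header used here is hypothetical and only mirrors a subset of the fields in Figure 4; <code>survey_rows_to_documents</code> is likewise an illustrative name.

```python
import csv
import io
from datetime import datetime

INT_FIELDS = ("floor", "room", "year", "month", "day", "global_punctuation")


def survey_rows_to_documents(csv_text: str) -> list:
    """Turn each CSV row into a dict shaped like the sample survey document."""
    documents = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        doc = dict(row)
        for field in INT_FIELDS:
            doc[field] = int(doc[field])  # CSV values arrive as strings
        doc["date"] = datetime(doc["year"], doc["month"], doc["day"])
        documents.append(doc)
    return documents


# Hypothetical CSV layout; the real column order is not given in the article.
sample_csv = (
    "id,center,area,floor,room,year,month,day,global_punctuation\n"
    "Anonimo,Aravaca,Madrid,2,221,2015,11,27,2\n"
)
surveys = survey_rows_to_documents(sample_csv)
```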
===Load===
Once the data have been transformed into MongoDB documents (BSON format), the documents are loaded into the corresponding MongoDB collection.
==Processing Engine==
The Processing Engine runs batch processes to consolidate data previously transformed by the Preprocessing Engine. This consolidation aggregates data to be handled by the Analytics Engine.
===Periodic data consolidation===
As the Processing Engine consolidates data periodically, two new collections are created, namely ''hourly'' and ''daily'', depending on the periodicity of the aggregated data. A sample document in the ''hourly'' collection is shown in Figure 5.
<pre>{
"_id": ObjectId("5665a51f0b1d4cf6f9728ae4"),
"center": "Aravaca",
"date": {
      "week": 40,
      "weekday": 4,
      "hour": 4,
      "ts": ISODate("2015-10-01T04:00:00.000Z"),
      "year": 2015,
      "month": 10,
      "day": 1
},
"idle_time": 67,
"wait_time": {
  "floors": {
      "1": 0.6363636363636364,
      "2": 29.5,
      "3": 120,
      "4": 0.5
  },
  "shifts": {
      "NOCHE": 23.72222222222222
  },
  "total": 427,
  "types": {
      "EV HABA": 4,
      "EV PERA": 359
  }
},
"used_time": 344,
"activity": {
  "floors": {
      "1": 11,
      "2": 2,
      "3": 3,
      "4": 2
  },
  "shifts": {
      "NOCHE": 18
  },
  "total": 18,
  "types": {
      "EV HABA": 17,
      "EV PERA": 1
  }
}
}
</pre>
{|
| STYLE="vertical-align:top;"|
{| border="0" cellpadding="5" cellspacing="0" width="400px"
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"| <blockquote>'''Figure 5.''' Sample JSON document representing consolidated data in the ''hourly'' collection</blockquote>
|-
|}
|}
This aggregation enables fast visualization of aggregated data, and it is key for the Analytics Engine to detect strange behaviors, fire alerts, or make recommendations. Both the ''hourly'' and ''daily'' collections are indexed by timestamp to enable fast filtering on consolidated data based on temporal queries.
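A minimal Python sketch of this kind of hourly consolidation is given below. It only counts events per floor and per event type, in the spirit of the <code>activity</code> sub-document of Figure 5, and does not attempt to reproduce the wait-time statistics. The flattened event layout and the function name <code>consolidate_hourly</code> are assumptions.

```python
from collections import defaultdict


def consolidate_hourly(events: list) -> dict:
    """Group events by (center, hour) and count them per floor and event type."""
    buckets = {}
    for event in events:
        key = (event["center"], event["hour"])
        bucket = buckets.setdefault(
            key,
            {"total": 0, "floors": defaultdict(int), "types": defaultdict(int)},
        )
        bucket["total"] += 1
        bucket["floors"][event["floor"]] += 1
        bucket["types"][event["type"]] += 1
    return buckets


# Hypothetical, already-flattened events (field names are assumptions).
events = [
    {"center": "Aravaca", "hour": 4, "floor": "1", "type": "EV HABA"},
    {"center": "Aravaca", "hour": 4, "floor": "1", "type": "EV PERA"},
    {"center": "Aravaca", "hour": 4, "floor": "2", "type": "EV HABA"},
    {"center": "Aravaca", "hour": 5, "floor": "3", "type": "EV HABA"},
]
hourly = consolidate_hourly(events)
```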
===Real-time data processing===
To support the real-time dashboard, a process takes the data from the ''hourly'' collection and computes the average value for each KPI for different time periods: last day, last week, last month, and since the beginning. This allows comparison of the current value for a KPI with the average of past periods of time. A small fragment of a sample document in the ''realtime'' collection showing the aggregated data for the “activity” (number of events) KPI is shown in Figure 6.
<pre>{
      "_id" : ObjectId("56850cb00b1d4cf6f9b4f2da"),
      "center": "Aravaca", "activity": {
      "total": [
      {"type": "yesterday", "hour": 0, "value": 106},
      {"type": "lastweek", "hour": 0, "value": 58},
      {"type": "lastmonth", "hour": 0, "value": 52},
      {"type": "alltime", "hour": 0, "value": 51.1489},
      {"type": "yesterday", "hour": 1, "value": 20},
      {"type": "lastweek", "hour": 1, "value": 33.571},
      ...
}
</pre>
{|
| STYLE="vertical-align:top;"|
{| border="0" cellpadding="5" cellspacing="0" width="400px"
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"| <blockquote>'''Figure 6.''' Sample JSON document representing a fragment of the real-time information for the KPI “activity” in the ''realtime'' collection</blockquote>
|-
|}
|}
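The per-period averaging can be illustrated with the Python sketch below, which produces entries shaped like those in Figure 6 for a single hour of the day. The window lengths (1, 7, and 30 days), the input record layout, and the function name <code>period_averages</code> are assumptions rather than details taken from the article.

```python
from datetime import date, timedelta
from statistics import mean


def period_averages(records: list, today: date, hour: int) -> list:
    """Average a KPI at a given hour over several look-back windows."""
    windows = {"yesterday": 1, "lastweek": 7, "lastmonth": 30, "alltime": None}
    result = []
    for name, days in windows.items():
        values = [
            r["value"] for r in records
            if r["hour"] == hour and r["day"] < today
            and (days is None or r["day"] >= today - timedelta(days=days))
        ]
        if values:  # skip windows that contain no data
            result.append({"type": name, "hour": hour, "value": mean(values)})
    return result


# Hypothetical hourly activity counts for hour 0 on three past days.
today = date(2015, 10, 10)
records = [
    {"day": date(2015, 10, 9), "hour": 0, "value": 106},
    {"day": date(2015, 10, 5), "hour": 0, "value": 10},
    {"day": date(2015, 9, 1), "hour": 0, "value": 40},
]
averages = period_averages(records, today, hour=0)
```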
==Analytics Engine==
The Analytics Engine is responsible for performing an intelligent analysis of the data to compute daily predictions, firing alerts when an undesired condition is detected (e.g., a certain metric falls under a specified threshold), and suggesting recommendations. This section describes these processes.
===Prediction system===
The prediction system takes the data contained in the events collection along with contextual data (e.g., weather, holidays, or working days) and predicts the value of each KPI for every hour of the next day. This batch process is executed daily. The predicted values are stored in the ''predictions'' collection in MongoDB, one document per KPI. A sample document is shown in Figure 7.
<pre>{
      "_id": ObjectId("5683f978e4b0d671e427e1db"),
      "center": "Aravaca",
      "name": "wait_time.total",
      "date": "1/10/15",
      "predictions": {
      "0": 5637,
      "1": 28557,
      "2": 15711,
      "3": 4133,
      ...
}
</pre>
{|
| STYLE="vertical-align:top;"|
{| border="0" cellpadding="5" cellspacing="0" width="400px"
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"| <blockquote>'''Figure 7.''' Sample JSON document representing a fragment of the predictions for the “wait time” KPI in the ''predictions'' collection</blockquote>
|-
|}
|}
The prediction algorithm analyzes behavioral patterns in the events data and applies these patterns to simulate future behavior. The algorithm proceeds as follows for each KPI:
Given ''N'' clusters, the algorithm computes a matrix ''M'' where each row is a cluster and each column is an hour, thus resulting in an ''N''×24 matrix. The value in the position ''M<sub>i,j</sub>'' contains the average value of the KPI for events happening in the cluster ''i'' and in the ''j''<sup>th</sup> hour of the day:
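Following the verbal definition above, the entries of ''M'' can be written as an average over events. The notation is an assumption introduced here for illustration: ''E<sub>i,j</sub>'' denotes the set of events that fall in cluster ''i'' during hour ''j'', and ''v''(''e'') denotes the KPI value of event ''e''.

<math>M_{i,j} = \frac{1}{|E_{i,j}|} \sum_{e \in E_{i,j}} v(e), \qquad 1 \le i \le N, \quad 0 \le j \le 23</math>

Hours are indexed from 0 to 23 here to match the keys of the ''predictions'' documents.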
==References==
{{Reflist|colwidth=30em}}
==Notes==
This presentation is faithful to the original, with only a few minor changes to presentation. Grammar has been updated for clarity. In some cases important information was missing from the references, and that information was added. The original article lists references alphabetically, but this version — by design — lists them in order of appearance.


<!--Place all category tags here-->
==Sandbox begins below==
[[Category:LIMSwiki journal articles (added in 2018)‎]]
[[Category:LIMSwiki journal articles (all)‎]]
[[Category:LIMSwiki journal articles on big data‎‎]]
[[Category:LIMSwiki journal articles on health informatics‎‎]]

Latest revision as of 21:57, 15 June 2024
