Journal:Secure data outsourcing in presence of the inference problem: Issues and directions

From LIMSWiki
Revision as of 00:54, 6 April 2021 by Shawndouglas (talk | contribs)
Jump to navigationJump to search
Full article title Secure data outsourcing in presence of the inference problem: Issues and directions
Journal Journal of Information and Telecommunication
Author(s) Jebali, Adel; Sassi, Salma; Jemai, Akderrazak
Author affiliation(s) Tunis El Manar University, Jendouba University, Carthage University
Primary contact Email: adel dot jbali at fst dot utm dot tn
Year published 2020
Volume and issue 5(1)
Article # 16–34
DOI 10.1080/24751839.2020.1819633
ISSN 2475-1847
Distribution license Creative Commons Attribution 4.0 International
Website https://www.tandfonline.com/doi/full/10.1080/24751839.2020.1819633
Download https://www.tandfonline.com/doi/pdf/10.1080/24751839.2020.1819633 (PDF)

Abstract

With the emergence of the cloud computing paradigms, secure data outsourcing—moving some or most data to a third-party provider of secure data management services—has become one of the crucial challenges of modern computing. Data owners place their data among cloud service providers (CSPs) in order to increase flexibility, optimize storage, enhance data manipulation, and decrease processing time. Nevertheless, from a security point of view, access control proves to be a major concern in this situation seeing that the security policy of the data owner must be preserved when data is moved to the cloud. The lack of a comprehensive and systematic review on this topic in the available literature motivated us to review this research problem. Here, we discuss current and emerging research on privacy and confidentiality concerns in cloud-based data outsourcing and pinpoint potential issues that are still unresolved.

Keywords: cloud computing, data outsourcing, access control, inference leakage, secrecy and privacy

Introduction

In light of the increasing volume and variety of data from diverse sources—e.g., from health systems, social insurance systems, scientific and academic data systems, smart cities, and social networks—in-house storage and processing of large collections of data has becoming very costly. Hence, modern database systems have evolved from a centralized storage architecture to a distributed one, and with it the database- as-a-service paradigm has emerged. Data owners are increasingly moving their data to cloud service providers (CSPs) in order to increase flexibility, optimize storage, enhance data manipulation, and decrease processing times. Nonetheless, security concerns are widely recognized as a major barrier to cloud computing and other data outsourcing or database-as-a-service arrangements. Users remain reluctant to place their sensitive data in the cloud due to concerns about data disclosure to potentially untrusted external parties and other malicious parties.[1] Being processed and stored externally, data owners feel they have little control over their sensitive data, consequently putting data privacy at risk. From this perspective, access control is a major challenge seeing that the security policy of a data owner must be preserved when data is moved to the cloud. Access control policies are enforced by CSPs by keeping some sensitive data separated from each other.[2] However, some techniques like encryption are helpful to better guarantee the confidentiality of data.[3][4][5] The intent of encryption is to break sensitive associations among outsourced data by encrypting some attributes of that data. However, other data security concerns exist as well. Security breaches in distributed cloud databases could be exacerbated due to inference leakage, which occurs when a malicious actor uses information from a legitimate public response to discover more sensitive information, often from metadata. During the last two decades, researchers have devoted significant effort to enforcing access control policies and privacy protection requirements externally while maintaining a balance with data utility.[6][7][8][9][10][11][12]

In this paper, we review the current and emerging research on privacy and confidentiality concerns in data outsourcing and highlight research directions in this field. In summary, our systematic review addresses security concerns in cloud database systems for both communicating and non-communicating servers. We also survey this research field in relation to the inference problem and the unresolved problems that are introduced. Recognizing these challenges, this paper provides an overview of our proposed (because this is an ongoing work) solution. The crux of that solution is to firstly optimize data distribution without the need to query the workload, then partition the database in the cloud by taking into consideration access control policies and data utility, before finally running a query evaluation model on a big data framework to securely process distributed queries while retaining access control.

The reminder of this paper is organized as follows. The next section describes the literature review methodology adopted in this paper. After that, we review emerging research on data outsourcing in the context of privacy concerns and data utility. Then we discuss data outsourcing in relation to the inference problem. Afterwards, we introduce our proposed solution to implement a secure distributed cloud database on a big data framework (Apache Spark). We close with future research directions and challenges, as well as our final conclusions.

Literature review methodology

The methodology for literature review adopted in this paper follows the checklist proposed by the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement.[13] It includes, as shown in Figure 1, three steps: input literature, processing steps, and review output.


Fig1 Jebali JofInfoTelec2020 5-1.jpg

Figure 1. Three stages of the literature review process.

Input literature

In this section we describe selected literature and their selection process. Firstly, our advance keyword research was conducted on the Google Scholar search engine, with a time filter from January 1, 1990 to December 31, 2019. Table 1 lists keywords used in different Google Scholar queries.

Table 1. Keywords used in our literature review search.
Keyword Number of viewed papers
Access control, Data outsourcing 43
Cloud computing, Authorization policies 78
Database, inference leakage 33
Confidentiality constraints, Cloud database 41
Secure data integration 11
Big data, Distributed query processing 39
Privacy, data publishing 24

The logical operator used between keywords during search was the "And" operator. Finally, from the 269 viewed papers, 43 articles were retained for review. Figure 2 shows their distribution by publication year.


Fig2 Jebali JofInfoTelec2020 5-1.jpg

Figure 2. Distribution of the 43 articles retained for review, by publication year.

Processing steps

During the review, papers were processed by identifying the problem, understanding the proposed solution process, and listing the important findings. We summarized and compared each paper with the papers associated with the similar problem. Then for each processed paper, three or four critical sentences was introduced to highlight the limits and specify potential directions that may be followed to enhance the proposed approaches. Based on our literature review, we classified the data outsourcing and access control papers into three categories, as shown in Figure 3. The first category of papers addresses the problem of secure data outsourcing when the servers in the cloud are unaware of each other. The second category addresses secure data outsourcing where interaction between servers exists and how this later can aggravate the situation. In the last category, we address data outsourcing in relation to the inference problem, as this later can exploit semantic constraints to bypass authorization policies at the cloud level.


Fig3 Jebali JofInfoTelec2020 5-1.jpg

Figure 3. Classification scheme of the literature retained for review.

Review output

The outcome of the methodological review process is presented later in this paper as our proposed solution. In the "Proposed solution" section, we present an incremental approach composed of three steps, each step treating one of the three problems mentioned in the previous subsection. We believe that our proposed solution is capable of providing good results compared to other reviewed approaches. Afterwards, we report other potential future research areas and challenges.

Preserving confidentiality in data outsourcing scenarios

There is a consensus in the security research community about the efficiency of data outsourcing for solving data management problems.[2] This consists of moving data from in-house storage to cloud databases, while also maintaining a balance between data confidentiality and utility (Figure 4).


Fig4 Jebali JofInfoTelec2020 5-1.jpg

Figure 4. A representation of secure data outsourcing.

CSPs are considered honest-but-curious: the database servers answer user queries correctly and do not manage stored data, but they attempt to intelligently analyze data and queries in order to learn as much information as possible from them.

Two powerful techniques have been proposed to enforce access control in cloud databases. The first technique exploits vertical database fragmentation to keep some sensitive data separated from each other. The second technique resorts to encryption to make a single attribute invisible to unauthorized users. These two techniques can be implemented using the following approaches:

  • Full outsourcing: The entire in-house database is moved to the cloud. It considers vertical database fragmentation to enforce confidentiality constraints with more than two attributes by keeping them separated from each other among distributed servers. Moreover, it resorts to encryption in order to hide confidentiality constraints with a single attribute.[6]
  • "Keep a few": This approach departs from encryption by involving the owner side. The attributes to be encrypted are stored in plain text on the owner side since this later is considered a trusted part. The rest of the database is distributed among servers while maintaining data confidentiality through vertical fragmentation.[11]

Aside from the fact that encrypting data for storing them externally carries a considerable cost[2], previous studies have primarily concentrated on non-communicating cloud servers.[6][10][11][12][14] In this situation, servers are unaware of each other and do not exchange any information. When a master node receives a query, it decomposes it and processes it locally without the need to perform a join query. In recent years, researchers have studied the effect of communication between servers on query execution, and secure query evaluation strategies have been elaborated.[4][15][16]


(Bkakria et al., 2013a; De Capitani di Vimercati et al., 2016; di Vimercati et al., 2013) In the rest of this section, we discuss current and emerging research efforts in each of the mentioned architectures.



References

  1. Xu, X.; Xiong, L.; Liu, J. (2015). "Database Fragmentation with Confidentiality Constraints: A Graph Search Approach". Proceedings of the 5th ACM Conference on Data and Application Security and Privacy: 263–70. doi:10.1145/2699026.2699121. 
  2. 2.0 2.1 2.2 Samarati, P.; di Vimarcati, S.D.C. (2010). "Data protection in outsourcing scenarios: Issues and directions". Proceedings of the 5th ACM Symposium on Information, Computer and Communications Security: 1–14. doi:10.1145/1755688.1755690. 
  3. Biskup, J.; Preuß, M. (2013). "Database Fragmentation with Encryption: Under Which Semantic Constraints and A Priori Knowledge Can Two Keep a Secret?". Data and Applications Security and Privacy XXVIII: 17–32. doi:10.1007/978-3-642-39256-6_2. 
  4. 4.0 4.1 Bkakria, A.; Cuppens, F.; Cuppens-Boulahia, N. et al. (2013). "Preserving Multi-relational Outsourced Databases Confidentiality using Fragmentation and Encryption". JoWUA 4 (2): 39–62. doi:10.22667/JOWUA.2013.06.31.039. 
  5. Ciriani, V.; di Vimaercati, S.D.C.; Foresti, S. et al. (2007). "Fragmentation and Encryption to Enforce Privacy in Data Storage". Computer Security - ESORICS 2007: 171–86. doi:10.1007/978-3-540-74835-9_12. 
  6. 6.0 6.1 6.2 Aggarwal, G.; Bawa, M.; Ganesan, P. et al. (2005). "Two Can Keep a Secret: A Distributed Architecture for Secure Database Services". Second Biennial Conference on Innovative Data Systems Research: 1–14. http://ilpubs.stanford.edu:8090/659/. 
  7. Alsirhani, A.; Bodorik, P. Sampalli, S. (2017). "Improving Database Security in Cloud Computing by Fragmentation of Data". Proceedings of the 2017 International Conference on Computer and Applications: 43–49. doi:10.1109/COMAPP.2017.8079737. 
  8. Bollwein, F.; Wiese, L. (2017). "Separation of Duties for Multiple Relations in Cloud Databases as an Optimization Problem". Proceedings of the 21st International Database Engineering & Applications Symposium: 98–107. doi:10.1145/3105831.3105873. 
  9. Ciriani, V.; di Vimercati, S.D.C.; Foresti, S. et al. (2009). "Fragmentation Design for Efficient Query Execution over Sensitive Distributed Databases". Proceedings of the 29th IEEE International Conference on Distributed Computing Systems: 32–39. doi:10.1109/ICDCS.2009.52. 
  10. 10.0 10.1 Ciriani, V.; di Vimercati, S.D.C.; Foresti, S. et al. (2009). "Fragmentation Design for Efficient Query Execution over Sensitive Distributed Databases". Proceedings of the 29th IEEE International Conference on Distributed Computing Systems: 32–39. doi:10.1109/ICDCS.2009.52. 
  11. 11.0 11.1 11.2 Ciriani, V.; di Vimercati, S.D.C.; Foresti, S. et al. (2009). "Keep a Few: Outsourcing Data While Maintaining Confidentiality". Computing Security - ESORICS 2009: 440–55. doi:10.1007/978-3-642-04444-1_27. 
  12. 12.0 12.1 di Vimercati, S.D.C.; Foresti, S.; Jajodia, S. et al. (2014). "Fragmentation in Presence of Data Dependencies". IEEE Transactions on Dependable and Secure Computing 11 (6): 510–23. doi:10.1109/TDSC.2013.2295798. 
  13. Moher, D.; Liberti, A.; Tetzlaff, J. et al. (2009). "Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement". PLoS One 6 (7): e1000097. doi:10.1371/journal.pmed.1000097. PMC PMC2707599. PMID 19621072. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2707599. 
  14. di Vimercati, S.D.C.; Foresti, S.; Jajodia, S. et al. (2010). "Fragments and loose associations: respecting privacy in data publishing". Proceedings of the VLDB Endowment 3 (1–2). doi:10.14778/1920841.1921009. 
  15. Bkakria, A.; Cuppens, F.; Cuppens-Boulahia, N. et al. (2013). "Confidentiality-Preserving Query Execution of Fragmented Outsourced Data". Proceeding of ICT-EurAsia 2013: Information and Communication Technology 4 (2): 426-440. doi:10.1007/978-3-642-36818-9_47. 
  16. Bkakria, A.; Cuppens, F.; Cuppens-Boulahia, N. et al. (2013). "Confidentiality-Preserving Query Execution of Fragmented Outsourced Data". Proceeding of ICT-EurAsia 2013: Information and Communication Technology 4 (2): 426-440. doi:10.1007/978-3-642-36818-9_47. 

Notes

This presentation is faithful to the original, with only a few minor changes to presentation, though grammar and word usage was substantially updated for improved readability. In some cases important information was missing from the references, and that information was added. The original paper listed references alphabetically; this wiki lists them by order of appearance, by design.