Journal:Security architecture and protocol for trust verifications regarding the integrity of files stored in cloud services
Full article title | Security architecture and protocol for trust verifications regarding the integrity of files stored in cloud services |
---|---|
Journal | Sensors |
Author(s) |
Pinheiro, Alexandre; Canedo, Edna Dias; De Sousa Junior, Rafael Timoteo;, De Oliveira Albuquerque, Robson; Villalba, Luis Javier Garcia; Kim, Tai-Hoon |
Author affiliation(s) | University of Brasília, Universidad Complutense de Madrid, Sungshin Women’s University |
Primary contact | Email: javiergv at fdi dot ucm dot es |
Year published | 2018 |
Volume and issue | 18(3) |
Page(s) | 753 |
DOI | 10.3390/s18030753 |
ISSN | 1999-5903 |
Distribution license | Creative Commons Attribution 4.0 International |
Website | https://www.mdpi.com/1424-8220/18/3/753/htm |
Download | https://www.mdpi.com/1424-8220/18/3/753/pdf (PDF) |
Abstract
Cloud computing is considered an interesting paradigm due to its scalability, availability, and virtually unlimited storage capacity. However, it is challenging to organize a cloud storage service (CSS) that is safe from the client point-of-view and to implement this CSS in public clouds since it is not advisable to blindly consider this configuration as fully trustworthy. Ideally, owners of large amounts of data should trust their data to be in the cloud for a long period of time, without the burden of keeping copies of the original data, nor of accessing the whole content for verification regarding data preservation. Due to these requirements, integrity, availability, privacy, and trust are still challenging issues for the adoption of cloud storage services, especially when losing or leaking information can bring significant damage, be it legal or business-related. With such concerns in mind, this paper proposes an architecture for periodically monitoring both the information stored in the cloud and the service provider behavior. The architecture operates with a proposed protocol based on trust and encryption concepts to ensure cloud data integrity without compromising confidentiality and without overloading storage services. Extensive tests and simulations of the proposed architecture and protocol validate their functional behavior and performance.
Keywords: cloud computing; cloud data storage; proof of integrity; services monitoring; trust
Introduction
Companies, institutions, and government agencies generate large amounts of digital information every day, such as documents, projects, and transaction records. For legal or business reasons, this information needs to remain stored for long periods of time.
Due to the popularization of cloud computing (CC), its cost reduction, and an ever-growing supply of cloud storage services (CSS), many companies are choosing these services to store their sensitive information. Cloud computing’s advantages include scalability, availability, and virtually unlimited storage capacity. However, it is a challenge to build safe storage services, mainly when these services run in public cloud infrastructures and are managed by service providers under conditions that are not fully trustworthy.
Data owners often need to keep their stored data for a long time, though it is possible that they rarely will have to access it. Furthermore, some data could be stored in a CSS without its owner having to keep the original copy. However, in these situations, the storage service reliability must be considered, because even the best services sometimes fail[1], and since the loss of these data or their leakage can bring significant business or legal damage, the issues of integrity, availability, privacy, and trust need to be answered before the adoption of the CSS.
Data integrity is defined as the accuracy and consistency of stored data. These two properties indicate that the data have not changed and have not been broken.[2] Moreover, besides data integrity, a considerable number of organizations consider both confidentiality and privacy requirements as the main obstacles to the acceptance of public cloud services.[2] Hence, to fulfill these requirements, a CSS should provide mechanisms to confirm data integrity, while still ensuring user privacy and data confidentiality.
Considering these requirements, this paper proposes an architecture for periodically monitoring both the information stored in the cloud infrastructure and the contracted storage service behavior. The architecture is based on the operation of a proposed protocol that uses a third party and applies trust and encryption means to verify both the existence and the integrity of data stored in the cloud infrastructure without compromising these data’s confidentiality. Furthermore, the protocol was designed to minimize the overload that it imposes on the cloud storage service.
To validate the proposed architecture and its supporting protocol, a corresponding prototype was developed and implemented. Then, this prototype was submitted to testing and simulations by means of which we verified its functional characteristics and its performance.
This paper addresses all of this and is structured as follows. The "Background" section reviews the concepts and definitions of cloud computing, encryption, and trust, then we present works related to data integrity in the cloud. Then we describe the proposed architecture, while its implementation is discussed in the following section. Afterwards, the "Experimental validation" section is devoted to the experiments and respective results, while the main differences between related works and the proposed architecture follow it. The paper ends with our conclusions and outlines future works.
Background
Cloud computing (CC) is a model that allows convenient and on-demand network access to a shared set of configurable computational resources. These resources can be quickly provisioned with minimal management effort and without the service provider’s intervention.[3] Since it constitutes a flexible and reliable computing environment, CC is being gradually adopted in different business scenarios using several available supporting solutions.
Relying on different technologies (e.g., virtualization, utility computing, grid computing, and service-oriented architecture) and proposing a new computational services paradigm, CC requires high-level management activities, which include: (a) selection of the service provider, (b) selection of virtualization technology, (c) virtual resources’ allocation, and (d) monitoring and auditing procedures to comply with service level agreements (SLAs).[4]
A particular CC solution comprises several components such as client modules, data centers, and distributed servers. These elements form the three parts of the cloud solution[4][5], each one with a specific purpose and specific role in delivering working applications based on the cloud.
The CC architecture is basically structured into two main layers: a lower and a higher resource layer, each one dealing with a particular aspect of making application resources available. The lower layer comprises the physical infrastructure, and it is responsible for the virtualization of storage and computational resources. The higher layer provides specific services, such as software as a service (SaaS), platform as a service (PaaS), and infrastructure as a service (IaaS). Each of these layers may have its own management and monitoring systems, independent of one another, thus improving flexibility, reuse, and scalability.[6][7]
Since CC provides access to a shared pool of configurable computing resources, its provisioning mode can be classified by the intended access methods and coverage of services’ availability, which yields different models of CC services’ deployment, ranging from private clouds, in which resources are shared within an owner organization, to public clouds, in which cloud providers possess the resources that are consumed by other organizations based on contracts, but also including hybrid cloud environments and community clouds.[8]
The central concept of this paper’s proposal is the verification by the cloud service user that a particular property, in our case the integrity of files, is fulfilled by the cloud service provider, regardless of the mode of a service's provision and deployment, either in the form of private, public, or hybrid clouds.
The verification of file integrity is performed by means of a protocol that uses contemporaneous computational encryption, specifically public key encryption and hashes, which together provide authentication of messages and compact integrity verification sequences that are unequivocally bound to each verified file (signed file hashes).
This proposed protocol is conceived to allow the user of cloud services to check whether the services provider is indeed acting as expected in regard to maintaining the integrity of the user files, which corresponds to the idea of the user monitoring the provider to acquire and maintain trust in the provider behavior in this circumstance.
Some specific aspects of trust, encryption, and hashes that are considered as useful for this paper’s comprehension are briefly reviewed in the subsections below.
Trust
Trust is a common reasoning process for humans to face the world’s complexities and to think sensibly about everyday life possibilities. Trust is strongly linked to expectations about something, which implies a degree of uncertainty and optimism. It is the choice of putting something in another’s hands, considering the other’s behavior to determine how to act in a given situation.[9]
Trust can be considered as a particular level of subjective probability in which an agent believes that another agent will perform a certain action, which is subject to monitoring.[10] Furthermore, trust can be represented as an opinion so that situations involving trust and trust relationships can be modeled. Thus, positive and negative feedback on a specific entity can be accumulated and used to calculate its future behavior.[11] This opinion may result from direct experience or may come from a recommendation from another entity.[12]
According to Adnane et al.[13] and De Sousa, Jr. and Puttini[14], trust, trust models, and trust management have been the subject of various research works demonstrating that the conceptualization of computational trust allows a computing entity to reason with and about trust, and to make decisions regarding other entities. Indeed, since the initial works on the subject by the likes of Marsh[9] and Yahalom et al.[15], computational trust is recognized as an important aspect for decision-making in distributed and auto-organized applications, and its expression allows formalizing and clarifying trust aspects in communication protocols.
Yahalom et al.[15], for instance, find the notion of "trust" to mean that if an entity A trusts an entity B in some respect, this means that A believes that B will behave in a certain way and will perform some action under certain specific circumstances. This leads to the possibility of conducting a protocol operation (action) that is evaluated by the entity A on the basis of what A knows about the entity B and the circumstances of the operation. This accurately corresponds to the protocol relationship established between a CC service consumer and a CC service provider, which is the focus of the present paper.
Thus, in our proposal, trust is used in the context of a cloud computing service as a means to verify specific actions performed by the participating entities in this context. Using the definitions by Yahalom et al.[15] and Grandison and Sloman[16], we can state that in a CC service, one entity, the CC service consumer, may trust another one, the CC service provider, for actions such as providing identification to the other entity, not interfering in the other entity sessions, neither passively by reading secret messages, nor actively by impersonating other parties. Furthermore, the CC service provider will grant access to resources or services, as well as make decisions on behalf of the other entity, with respect to a resource or service that this entity owns or controls.
In these trust verifications, it is required to ensure some properties such as the secrecy and integrity of stored files, authentication of message sources, and the freshness of the presented proofs, avoiding proof replays. It is required as well to present reduced overhead in cloud computing protocol operations and services. In our proposal, these requirements are fulfilled with modern robust public key encryption involving hashes, as discussed hereafter, considering that these means are adequately and easily deployed in current CC service provider and consumer situations.
Encryption
Encryption is a process of converting (or ciphering) a plaintext message into a ciphertext that can be deciphered back to the original message. An encryption algorithm, along with one or more keys, is used either in the encryption or the decryption operation.
The number, type, and length of the keys used depend on the encryption algorithm, the choice of which is a consequence of the security level needed. In conventional symmetric encryption, a single key is used, and with this key the sender can encrypt a message, and a recipient can decrypt the ciphered message. However, key security becomes an issue since at least two copies of the key exist, one at the sender and another at the recipient.
Oppositely, in asymmetric encryption, the encryption key and the decryption key are correlated, but different, one being a public key of the recipient that can be used by the sender to encrypt the message, while the other related key is a recipient private key allowing the recipient to decrypt the message.[17] The private key can be used by its owner to send messages that are considered signed by the owner since every entity can use the corresponding public key to verify if a message comes from the owner of the private key.
These properties of asymmetric encryption are useful for the trust verifications that in our proposal are designed for checking the integrity of files stored in cloud services. Indeed, our proposal uses encryption of hashes as the principal means to fulfill the trust requirements in these operations.
Hashes
A hash value, hash code, or simply hash is the result of applying a mathematical one-way function that takes a string of any size as the data source and returns a relatively small and fixed-length string. A modification of any bit in the source string dramatically alters the resulting hash code after executing the hash function.[18] These one-way functions are designed to make it very difficult to deduce from a hash value the source string that was used to calculate this hash. Furthermore, it is required that it should be extremely difficult to find two source strings whose hash codes are the same, i.e., a hash collision.
Over the years, many cryptographic algorithms have been developed for hashes, for which the Message-Digest algorithm 5 (MD5) and Secure Hash Algorithm (SHA) family of algorithms can be highlighted, due to the wide use of these algorithms in the most diverse information security software packages. MD5 is a very fast cryptographic algorithm that receives as input a random-sized message and produces as output a fixed length hash with 128 bits.[19]
The SHA family is composed of algorithms named as SHA-1, SHA-256, and SHA-512, which differ regarding the respective security level and the output hash length, that can vary from 160 to 512 bits. The SHA-3 algorithm was chosen by the National Institute of Standards and Technology (NIST) in an international competition that aimed to replace all of the SHA family of algorithms.[20]
The Blake2 algorithm is an improved version of the hash cryptographic algorithm called “Blake,” a finalist of the SHA-3 selection competition that is optimized for software applications. Blake2 can generate hash values from eight to 512 bits. The main Blake2 characteristics are: the memory consumption reduction by 32% compared to other SHA algorithms, the processing speed being greater than that of MD5 on 64-bit platforms, direct parallelism support without overhead, and faster hash generation on multicore processors.[21] In our proposed validation prototype, the Blake2 algorithm was considered as a good choice due to its combined characteristics of speed, security, and simplicity.
Related work
This section presents a brief review of papers regarding the themes of computational trust applications, privacy guarantees, data integrity verification, services management, and monitoring, all of them applicable to cloud computing environments.
Computational trust applications
Depending on the used approach, trust can either be directly measured by one entity based on its own experiences or can be evaluated through the use of third-party opinions and recommendations.
Tahta et al.[22] propose a trust model for peer-to-peer (P2P) systems called “GenTrust,” in which genetic algorithms are used to recognize several types of attacks and to help a well-behaved node find other trusted nodes. GenTrust uses extracted features (number of interactions, number of successful interactions, the average size of downloaded files, the average time between two interactions, etc.) that result from a node’s own interactions. However, when there is not enough information for a node to consider, recommendations from other nodes are used. Then, the genetic algorithm selects which characteristics, when evaluated together and in a given context, present the best result to identify the most trustful nodes.
Another approach is presented by Gholami and Arani[23], proposing a trust model named “Turnaround_Trust” aimed at helping clients to find cloud services that can serve them based on service quality requirements. The Turnaround_Trust model considers service quality criteria such as cost, response time, bandwidth, and processor speed, to select the most trustful service among those available in the cloud.
Our approach in this paper differs from these related works since we use trust metrics that are directly related to the stored files in CC and that are paired to the cryptographic proof of these files' integrity.
Canedo[24] bases the proposed trust model on concepts such as direct trust, trust recommendation, indirect trust, situational trust, and reputation to allow a node selection for trustful file exchange in a private cloud. For the sake of trust calculation, the processing capacity of a node, its storage capacity, and operating system—as well as the link capacity—are adopted as trust metrics that compose a set representative of the node availability. Concerning reputation, the calculation considers the satisfactory and unsatisfactory experiences with the referred node informed by other nodes. The proposed model calculates trust and reputation scores for a node based on previously-collected information, i.e., either information requested from other nodes in the network or information that is directly collected from interactions with the node being evaluated.
In the present paper, our approach is applied to both private and public CC services, with the development of the necessary architecture and secure protocol for trust verification regarding the integrity of files in CC services.
Integrity verification and privacy guarantee
In their effort to guarantee the integrity of data stored in cloud services, many research works present proposals in the domain analyzed in this paper.
A protocol is proposed by Juels and Kaliski, Jr.[25] to enable a cloud storage service to prove that a file subjected to verification is not corrupted. To that end, a formal and secure definition of proof of retrievability is presented, and the paper introduces the use of sentinels, which are special blocks hidden in the original file prior to encryption to be afterward used to challenge the cloud service. Based on Juels and Kaliski, Jr.'s work[25], Kumar and Saxena[26] present another scheme where one does not need to encrypt all the data, but only a few bits per data block.
George and Sabitha[27] propose a bipartite solution to improve privacy and integrity. The first part, called “anonymization,” initially recognizes fields in records that could identify their owners and then uses techniques such as generalization, suppression, obfuscation, and the addition of anonymous records to enhance data privacy. The second part, called “integrity checking,” uses public and private key encryption techniques to generate a tag for each record on a table. Both parts are executed with the help of a trusted third party called the “enclave” that saves all generated data that will be used by the de-anonymization and integrity verification processes.
An encryption-based integrity verification method is proposed by Kavuri et al.[28] The proposed method uses a new hash algorithm, the dynamic user policy-based hash algorithm, to calculate hashes of data for each authorized cloud user. For data encryption, an improved attribute-based encryption algorithm is used. The encrypted data and corresponding hash value are saved separately in cloud storage. Data integrity can be verified only by an authorized user and requires the retrieval of all the encrypted data and corresponding hash.
Al-Jaberi and Zainal[29] provide another proposal to simultaneously achieve data integrity verification and privacy-preserving, which proposes the use of two encryption algorithms for every data upload or download transaction. The Advanced Encryption Standard (AES) algorithm is used to encrypt client data, which will be saved in a CSS, and an RSA-based partial homomorphic encryption technique is used to encrypt AES encryption keys that will be saved in a third-party entity together with a hash of the file. Data integrity is verified only when a client downloads one file.
Kai et al.[30] propose a data integrity auditing protocol to allow the fast identification of corrupted data using homomorphic cipher-text verification and a recoverable coding methodology. Checking the integrity of outsourced data is done periodically by either a trusted or untrusted entity. The adopted methodology aims at reducing the total auditing time and the communication cost.
References
- ↑ Tandel, S.T.; Shah, V.K.; Hiranwal, S. (2013). "An implementation of effective XML based dynamic data integrity audit service in cloud". International Journal of Societal Applications of Computer Science 2 (8): 449–553. https://web.archive.org/web/20150118081656/http://ijsacs.org/previous.html.
- ↑ 2.0 2.1 Dabas, P.; Wadhwa, D. (2014). "A Recapitulation of Data Auditing Approaches for Cloud Data". International Journal of Computer Applications Technology and Research 3 (6): 329–32. doi:10.7753/IJCATR0306.1002. https://ijcat.com/archieve/volume3/issue6/ijcatr03061002.
- ↑ Mell, P.; Grance, T. (September 2011). "The NIST Definition of Cloud Computing". Computer Security Resource Center. https://csrc.nist.gov/publications/detail/sp/800-145/final.
- ↑ 4.0 4.1 Miller, M. (2008). Cloud Computing: Web-Based Applications That Change the Way You Work and Collaborate Online. Que Publishing. ISBN 9780789738035.
- ↑ Velte, T.; Velte, A.; Elsenpeter, R.C. (2009). Cloud Computing: A Practical Approach. McGraw-Hill Education. ISBN 9780071626941.
- ↑ Zhou, M.; Zhang, R.; Zeng, D.; Qian, W. (2010). "Services in the Cloud Computing era: A survey". Proceedings from the 4th International Universal Communication Symposium: 40–46. doi:10.1109/IUCS.2010.5666772.
- ↑ Jing, X.; Jian-Jun, Z. (2010). "A Brief Survey on the Security Model of Cloud Computing". Proceedings from the Ninth International Symposium on Distributed Computing and Applications to Business, Engineering and Science: 475–8. doi:10.1109/DCABES.2010.103.
- ↑ Mell, P.; Grance, T. (October 2009). "The NIST Definition of Cloud Computing" (PDF). https://www.nist.gov/sites/default/files/documents/itl/cloud/cloud-def-v15.pdf.
- ↑ 9.0 9.1 Marsh, S.P. (April 1994). "Formalising Trust as a Computational Concept" (PDF). University of Stirling. http://stephenmarsh.wdfiles.com/local--files/start/TrustThesis.pdf.
- ↑ Gambetta, D. (1990). "Can We Trust Trust?". In Gambetta, D. (PDF). Trust: Making and Breaking Cooperative Relations (2008 Scanned Digital Copy). ISBN 0631155066. https://www.nuffield.ox.ac.uk/media/1779/gambetta-trust_making-and-breaking-cooperative-relations.pdf.
- ↑ Jøsang, A.; Knapskog, S.J. (2011). "A Metric for Trusted Systems" (PDF). Proceedings from the 21st National Information Systems Security Conference. https://csrc.nist.gov/csrc/media/publications/conference-paper/1998/10/08/proceedings-of-the-21st-nissc-1998/documents/papera2.pdf.
- ↑ Victor, P.; De Cock, M.; Cornelis, C. (2011). "Trust and Recommendations". In Ricci, F.; Rokach, L.; Shapira, B.; Kantor, P.B.. Recommender Systems Handbook. Springer. pp. 645–75. ISBN 9780387858197.
- ↑ Adnane, A.; Bidan, C.; de Sousa Júnior, R.T. (2013). "Trust-based security for the OLSR routing protocol". Computer Communications 36 (10–11): 1159-71. doi:10.1016/j.comcom.2013.04.003.
- ↑ De Sousa Jr., R.T.; Puttini, R.S. (2010). "Trust Management in Ad Hoc Networks". In Yan, Z.. Trust Modeling and Management in Digital Environments: From Social Concept to System Development. IGI Global. pp. 224–49. ISBN 9781615206827.
- ↑ 15.0 15.1 15.2 Yahalom, R.; Klein, B.; Beth, T. (1993). "Trust relationships in secure systems-a distributed authentication perspective". Proceedings from the 1993 IEEE Computer Society Symposium on Research in Security and Privacy: 150–64. doi:10.1109/RISP.1993.287635.
- ↑ Grandison, T.; Sloman, M. (2000). "A survey of trust in internet applications". IEEE Communications Surveys & Tutorials 3 (4): 2–16. doi:10.1109/COMST.2000.5340804.
- ↑ Bellare, M.; Boldyreva, A.; Micali, S. (2000). "Public-Key Encryption in a Multi-user Setting: Security Proofs and Improvements". Proceedings from Advances in Cryptology — EUROCRYPT 2000: 259–74. doi:10.1109/COMST.2000.5340804.
- ↑ Bose, R. (2008). Information Theory, Coding and Cryptography (2nd ed.). Mcgraw Hill Education. pp. 297–8. ISBN 9780070669017.
- ↑ Rivest, R. (April 1992). "The MD5 Message-Digest Algorithm". ietf.org. https://tools.ietf.org/html/rfc1321. Retrieved 25 June 2016.
- ↑ Dworkin, M.J. (4 August 2015). "SHA-3 Standard: Permutation-Based Hash and Extendable-Output Functions". NIST. https://www.nist.gov/publications/sha-3-standard-permutation-based-hash-and-extendable-output-functions.
- ↑ Aumasson, J.-P.; Neves, S.; W.-O.; Winnerlein, C. (2013). "BLAKE2: Simpler, Smaller, Fast as MD5". Proceedings from the 2013 International Conference on Applied Cryptography and Network Security: 119–35. doi:10.1007/978-3-642-38980-1_8.
- ↑ Tahta, U.E.; Sen, S.; Can, A.B. (2015). "GenTrust: A genetic trust management model for peer-to-peer systems". Applied Soft Computing 34: 693–704. doi:10.1016/j.asoc.2015.04.053.
- ↑ Gholami, A.; Arani, M.G. (2015). "A Trust Model Based on Quality of Service in Cloud Computing Environment". International Journal of Database Theory and Application 8 (5): 161–70. https://pdfs.semanticscholar.org/487e/11b3605276b5ff66de363d4e735bcdd740c3.pdf?_ga=2.219348827.37751313.1553622532-1472248397.1551840079.
- ↑ Canedo, E.D. (30 January 2013). "Modelo de confiança para a troca de arquivos em uma nuvem privada - Tese (Doutorado em Engenharia Elétrica)". Universidade de Brasília. http://repositorio.unb.br/handle/10482/11987.
- ↑ 25.0 25.1 Juels, A.; Kaliski, Jr., B.S. (2007). "PORs: Proofs of retrievability for large files". Proceedings of the 14th ACM Conference on Computer and Communications Security: 584–97. doi:10.1145/1315245.1315317.
- ↑ Kumar, R.S.; Saxena, A. (2011). "Data integrity proofs in cloud storage". Proceedings of the Third International Conference on Communication Systems and Networks: 1–4. doi:10.1109/COMSNETS.2011.5716422.
- ↑ George, R.S.; Sabitha, S. (2013). "Data anonymization and integrity checking in cloud computing". Proceedings of the Fourth International Conference on Computing, Communications and Networking Technologies: 1–5. doi:10.1109/ICCCNT.2013.6726813.
- ↑ Kavuri, S.K.S.V.A.; Kancherla, G.R.; Bobba, B.R. (2014). "Data authentication and integrity verification techniques for trusted/untrusted cloud servers". Proceedings of the 2014 International Conference on Advances in Computing, Communications and Informatics: 2590-2596. doi:10.1109/ICACCI.2014.6968657.
- ↑ Al-Jaberi, M.F.; Zainal, A. (2014). "Data integrity and privacy model in cloud computing". Proceedings of the 2014 International Symposium on Biometrics and Security Technologies: 280-284. doi:10.1109/ISBAST.2014.7013135.
- ↑ Kai, H.; Chuanhe, H.; Jinhai, W. et al. (2013). "An Efficient Public Batch Auditing Protocol for Data Security in Multi-cloud Storage". Proceedings of the 8th ChinaGrid Annual Conference: 51-56. doi:10.1109/ChinaGrid.2013.13.
Notes
This presentation is faithful to the original, with only a few minor changes to presentation, grammar, and punctuation. In some cases important information was missing from the references, and that information was added.