A CAE model-based secure deduplication method

Bibliographic Details
Published in	Scientific Reports, Vol. 15, no. 1, article 24605 (11 pp.)
Main Authors	Wang, Chunbo, Zhang, Guoying, Qi, Hui, Chen, Bin
Format	Journal Article
Language	English
Published	London: Nature Publishing Group UK, 09.07.2025; Nature Publishing Group; Nature Portfolio
Subjects	639/705/117; 639/705/258; Access control; Algorithms; Artificial intelligence; Big Data; Cloud computing; Computer applications; Efficiency; Humanities and Social Sciences; multidisciplinary; Neural networks; Science; Science (multidisciplinary); Semantics; Storage requirements
Summary:	Cloud storage services are widely used for their convenience and flexibility. However, the large amount of duplicate data in the cloud imposes a significant storage burden and increases the risk of privacy breaches. Random Message Locked Encryption (R-MLE) is an effective tool for secure deduplication of cloud data. However, because it is based on bilinear mapping, comparing fingerprint tags during deduplication incurs substantial computational overhead. To address this issue, we propose a secure deduplication method based on an Autoencoder model. The summary tags generated by the model are used to reduce the number of fingerprint tag comparisons, thereby improving deduplication efficiency. Building on this, we further introduce a secure deduplication method based on a Convolutional Autoencoder (CAE) model, which uses convolution and pooling operations to reduce the number of model parameters, thereby decreasing computational and storage overhead and mitigating overfitting. Experiments conducted on a source code dataset indicate that the proposed approach yields superior deduplication efficiency, reduced model storage requirements, and a more uniform distribution.
ISSN:	2045-2322
DOI:	10.1038/s41598-025-09788-0
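
Illustrative sketch (not taken from the article): the summary describes using a short summary tag to narrow down which stored fingerprint tags need the expensive comparison. The Python below shows only that general prefiltering idea under loud assumptions. A truncated hash stands in for the model-generated summary tag, a plain SHA-256 digest stands in for the R-MLE fingerprint tag, and byte equality stands in for the pairing-based tag comparison. The names summary_tag, fingerprint_tag, DedupIndex and NUM_BUCKETS are hypothetical and do not come from the paper.

```python
import hashlib
from collections import defaultdict

NUM_BUCKETS = 16  # hypothetical size of the summary-tag space


def summary_tag(data: bytes) -> int:
    """Stand-in for the model-generated summary tag (a hash, not a CAE)."""
    digest = hashlib.sha256(b"summary:" + data).digest()
    return int.from_bytes(digest[:2], "big") % NUM_BUCKETS


def fingerprint_tag(data: bytes) -> bytes:
    """Stand-in for the R-MLE fingerprint tag (a plain hash, no pairing)."""
    return hashlib.sha256(b"fingerprint:" + data).digest()


class DedupIndex:
    """Groups stored fingerprint tags by summary tag to limit comparisons."""

    def __init__(self) -> None:
        self.buckets = defaultdict(list)
        self.comparisons = 0  # counts the "expensive" tag comparisons

    def _tags_match(self, a: bytes, b: bytes) -> bool:
        # Assumption: in R-MLE this would be the costly pairing-based test.
        self.comparisons += 1
        return a == b

    def upload(self, data: bytes) -> bool:
        """Return True if data duplicates a stored item, else store its tag."""
        candidates = self.buckets[summary_tag(data)]
        tag = fingerprint_tag(data)
        # Only tags sharing the same summary tag are compared.
        if any(self._tags_match(stored, tag) for stored in candidates):
            return True
        candidates.append(tag)
        return False


index = DedupIndex()
files = [f"file-{i}".encode() for i in range(200)]
# Upload 200 distinct items, then re-upload the first 10 as duplicates.
duplicates = sum(index.upload(f) for f in files + files[:10])
print(duplicates, index.comparisons)
```

Without the summary-tag buckets, storing 200 distinct items by linear scan would take 200 * 199 / 2 = 19900 comparisons. With bucketing, each upload is compared only against the entries in its own bucket, so the count depends on how evenly the summary tags spread the items. That is one plausible reading of why the summary stresses a more uniform distribution.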