Big Data Security Solutions
In recent years, the "digital economy" has become a typical demonstration of the flexible use of data and a driving force of economic development. In September 2015, the State Council of the People's Republic of China issued the "Action Plan on Promoting Big Data Development", a guiding document for promoting the application and development of big data and empowering industry in China. However, big data faces many security challenges arising from data leakage and the exposure of personal privacy.
As one of the most important technologies in cyberspace security, cryptography can be applied effectively to achieve data authenticity, integrity, confidentiality, and non-repudiation. It serves as infrastructure for cyberspace security protection and is the most effective, reliable, and economical way to maintain cyber security. The "Cryptography Law of the People's Republic of China" took effect on January 1, 2020, providing a legal basis for the comprehensive promotion of cryptography applications.
Data inside the big data platform exists in three states: transmission, usage, and storage. Without security protection, plaintext data is exposed to risk throughout its life cycle, and a data leak can have serious consequences.
The security risks to data across its whole life cycle include:
Lack of Security Mechanism in Big Data Platform
The early design of the Hadoop ecosystem gave little weight to security, so its schemes for user authentication, access control, key management, and security auditing are inadequate.
Severe Risks of Private Data Leakage
The big data platform stores a large amount of data, often hundreds of terabytes. At that scale, sensitive information must be proactively protected.
Insufficient Traditional Security Protection Methods
Traditional cryptographic methods, such as TLS (Transport Layer Security) and TDE (Transparent Data Encryption), address only the encryption requirements of data transmission and storage. They leave security gaps such as a lack of application-level encryption, uncontrollable permissions, and the use of insecure cryptographic algorithms.
Lack of Independent Data Security Authority System
The data security authority system in the big data platform relies on the platform's own authority control over users and administrators. Without independent control over sensitive data, high-privilege accounts can easily be abused, and breaking the single layer of control exposes the data.
In response to these data security risks, Sansec has built a full life cycle security system for big data based on cryptographic technology, drawing on many years of research, trials, and experience.
This security system:
- Thoroughly extends cryptographic protection of sensitive data to the application layer
- Overcomes the security drawbacks of "transmitting with TLS and storing with TDE"
- Effectively solves the inconvenience of retrieving and computing on ciphertext data
- Builds an independent third-party data security authority system
Figure 1. Big Data Encryption Scheme
This solution combines a security platform with cryptographic middleware to provide encryption functions for multiple application components and databases in the big data platform. The architecture is shown in the following figure:
Figure 2. Technical Architecture of Big Data Encryption Solution
Security platform: Located at the core of the entire cryptographic system, it is responsible for:
- Providing hardware-level security protection (HSM) for the entire cryptographic framework
- Managing keys securely based on KMIP (Key Management Interoperability Protocol)
- Providing identity authentication and access management for the ciphertext search engine and the cryptographic middleware
Cryptographic middleware: The cryptographic middleware is installed as an application-side software agent. It is embedded and deployed transparently inside the application, and it encrypts sensitive data and manages keys securely by working together with the security platform. It offers several kinds of cryptographic algorithms, including FPE (format-preserving encryption) and homomorphic encryption algorithms.
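To make the idea of homomorphic encryption concrete, the following is a minimal sketch of the Paillier scheme, one of the homomorphic algorithms this document names. It is an illustration only and not Sansec's engine: the function names are hypothetical, and the primes are toy-sized, so it offers no real security. It shows that multiplying two ciphertexts yields a ciphertext of the sum of the plaintexts.

```python
import math
import secrets


def keygen(p: int, q: int):
    """Build a Paillier key pair from two distinct primes (toy sizes only)."""
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    g = n + 1                 # common simplification of the generator
    mu = pow(lam, -1, n)      # with g = n + 1, mu is the inverse of lam mod n
    return (n, g), (lam, mu)


def encrypt(pub, m: int) -> int:
    """Encrypt an integer 0 <= m < n with fresh randomness r."""
    n, g = pub
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n2) * pow(r, n, n2)) % n2


def decrypt(pub, priv, c: int) -> int:
    """Recover m using L(x) = (x - 1) // n."""
    n, _ = pub
    lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n


def add_encrypted(pub, c1: int, c2: int) -> int:
    """Homomorphic addition: the product of ciphertexts encrypts the sum."""
    n, _ = pub
    return (c1 * c2) % (n * n)


# Toy demonstration: sum two values without decrypting either one.
pub, priv = keygen(1009, 1013)
c_sum = add_encrypted(pub, encrypt(pub, 1200), encrypt(pub, 345))
print(decrypt(pub, priv, c_sum))  # 1545
```

A production deployment would use primes of at least 1024 bits each and a vetted library, and the private key would stay on the security platform rather than in application code.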
Application layer: Big data platforms mostly use various data processing components based on the Hadoop ecosystem, mainly including three parts:
- Data cleaning and message distribution (ETL, Kafka, etc.)
- Data storage and processing (HDFS, Hive, HBase, Spark, Flink, etc.)
- Analysis and presentation (BI)
The solution can also support other types of application components, with quick customization and adaptation to practical requirements.
The security platform includes the HSM, the key management system, the ciphertext search engine, and the management terminal. These servers are deployed independently in a secure subnet and connected to the big data production cluster over Ethernet. The cryptographic middleware is deployed in the various clusters.
Figure 3. Big Data Encryption Product Deployment
1. High-performance Data Encryption
Self-developed cryptographic algorithm engines and optimized algorithm applications deliver high-performance data encryption.
2. Key Security Management
The root key is stored securely in the HSM, and the KMIP protocol is used to manage the keys of multiple clients centrally.
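One way to picture a centrally managed key hierarchy is to derive a separate data key for each client from a single root key. The sketch below does this with HKDF (RFC 5869) built from Python's standard hmac and hashlib modules. It is an assumption for illustration, not the product's documented design: the source does not say that client keys are derived this way, the class and method names are hypothetical, and the in-memory root key merely stands in for a key that would never leave the HSM.

```python
import hashlib
import hmac
import secrets


def hkdf_sha256(root_key: bytes, info: bytes, length: int = 32,
                salt: bytes = b"") -> bytes:
    """HKDF with HMAC-SHA-256 (RFC 5869): extract, then expand."""
    hash_len = hashlib.sha256().digest_size
    if length > 255 * hash_len:
        raise ValueError("requested key is too long for HKDF")
    if not salt:
        salt = b"\x00" * hash_len
    prk = hmac.new(salt, root_key, hashlib.sha256).digest()   # extract
    okm, block, counter = b"", b"", 1
    while len(okm) < length:                                   # expand
        block = hmac.new(prk, block + info + bytes([counter]),
                         hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


class ToyKeyManager:
    """Hypothetical stand-in for the key management system plus HSM."""

    def __init__(self) -> None:
        # Assumption: in a real deployment this key lives inside the HSM.
        self._root_key = secrets.token_bytes(32)

    def data_key_for(self, client_id: str, purpose: str) -> bytes:
        """Derive a per-client, per-purpose key; the root key is never returned."""
        info = f"{client_id}|{purpose}".encode("utf-8")
        return hkdf_sha256(self._root_key, info)


kms = ToyKeyManager()
hive_key = kms.data_key_for("hive-cluster-01", "column-encryption")
kafka_key = kms.data_key_for("kafka-cluster-01", "message-encryption")
```

Because each key depends on the client identifier and purpose, compromising one client's key does not reveal the root key or any other client's key.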
3. Independent Access Control for Sensitive Data
The cryptographic system provides its own authority and access control function. On top of the native Hadoop permission framework, it adds independent third-party permission control over sensitive data.
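The two-layer idea can be sketched as follows: passing the native platform check is not enough to see plaintext, because a separate policy held by the security platform decides who may decrypt. Everything here is hypothetical and simplified for illustration, including the class name, the function name, and the placeholder decrypt callable. It is not the product's API.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SecurityPlatformPolicy:
    """Hypothetical decrypt policy kept outside the big data platform."""
    grants: set = field(default_factory=set)

    def allow(self, user: str, column: str) -> None:
        self.grants.add((user, column))

    def may_decrypt(self, user: str, column: str) -> bool:
        return (user, column) in self.grants


def read_column(user: str, column: str, ciphertext: str,
                hadoop_can_read: bool, policy: SecurityPlatformPolicy,
                decrypt: Callable[[str], str]) -> str:
    """Return plaintext only when both layers agree, else the ciphertext."""
    if not hadoop_can_read:
        # First layer: the platform's native permission check.
        raise PermissionError(f"{user} cannot read {column}")
    if policy.may_decrypt(user, column):
        # Second layer: the independent policy grants decryption.
        return decrypt(ciphertext)
    return ciphertext


policy = SecurityPlatformPolicy()
policy.allow("analyst", "id_number")
fake_decrypt = lambda c: c.replace("enc:", "")  # placeholder, not real crypto

print(read_column("analyst", "id_number", "enc:1234", True, policy, fake_decrypt))
print(read_column("hadoop_admin", "id_number", "enc:1234", True, policy, fake_decrypt))
```

In this sketch a platform administrator with full native permissions still receives only ciphertext unless the independent policy grants decryption.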
4. Ciphertext Retrieval and Ciphertext Calculation
- Supports exact, fuzzy, and high-frequency queries over ciphertext.
- Overcomes the bottleneck that ciphertext can otherwise be searched only by exact conditions and restricted fuzzy conditions.
- Provides ciphertext retrieval in all scenarios, with a query experience indistinguishable from plaintext. The cryptographic engine also provides homomorphic algorithms such as Paillier and ElGamal.
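One common technique for searching over ciphertext is a keyed "blind index": store an HMAC token next to each encrypted value and compare tokens instead of plaintext. The sketch below shows exact matching plus a simple trigram variant for substring (fuzzy) matching. It illustrates the general idea under stated assumptions and is not a description of Sansec's ciphertext search engine. All names are hypothetical, and deterministic tokens reveal which rows share a value, so the index key must be protected like any other key.

```python
import hashlib
import hmac
from collections import defaultdict


def token(key: bytes, text: str) -> str:
    """Keyed, deterministic token for a normalized string."""
    return hmac.new(key, text.lower().encode("utf-8"), hashlib.sha256).hexdigest()


def ngram_tokens(key: bytes, text: str, n: int = 3) -> set:
    """Tokens for every n-character window of the text."""
    return {token(key, text[i:i + n]) for i in range(len(text) - n + 1)}


class BlindIndex:
    """Hypothetical search index that never stores plaintext."""

    def __init__(self, key: bytes, n: int = 3) -> None:
        self.key, self.n = key, n
        self.exact = defaultdict(set)   # token -> row ids
        self.grams = {}                 # row id -> set of n-gram tokens

    def add(self, row_id: int, value: str) -> None:
        self.exact[token(self.key, value)].add(row_id)
        self.grams[row_id] = ngram_tokens(self.key, value, self.n)

    def find_exact(self, value: str) -> set:
        return set(self.exact.get(token(self.key, value), set()))

    def find_substring(self, fragment: str) -> set:
        """Candidate rows containing every n-gram of the fragment."""
        if len(fragment) < self.n:
            raise ValueError(f"fragment must have at least {self.n} characters")
        wanted = ngram_tokens(self.key, fragment, self.n)
        return {rid for rid, grams in self.grams.items() if wanted <= grams}


index = BlindIndex(key=b"demo-index-key-not-for-production")
index.add(1, "Zhang Wei")
index.add(2, "Zhang Min")
index.add(3, "Li Wei")
print(index.find_exact("zhang wei"))    # {1}
print(index.find_substring("Zhang"))    # {1, 2}
```

Substring results are candidates: a row can contain all the n-grams without containing the fragment contiguously, so a caller would normally confirm matches after decryption.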
Application Layer Data Encryption
Data on the data platform is stored, processed, and transported as ciphertext, and is decrypted to plaintext only when required. This eliminates the insufficient protection that results from relying on lower-level encryption alone.
Centralized Management and Distributed Encryption
Keys and authorities are managed centrally, while sensitive data is encrypted in a distributed way across the big data cluster, drawing on the cluster's high-performance computing capability.
Support Multi-platform Applications
The solution supports CDH, Apache Hadoop, Huawei FusionInsight, H3C Dataengine and other big data platforms.
The solution has obtained the Commercial Cryptography Certification and meets the requirements of relevant policies and regulations by using Chinese cryptographic algorithms.
This solution is suitable for the finance, government affairs, public security, energy, education, healthcare, and enterprise sectors.