Architecting privacy-centric data pipelines with generative AI

Praveen Kodakandla *

Independent Researcher, Hyderabad, Telangana, India.
 
Review
International Journal of Science and Research Archive, 2024, 13(02), 1502-1512.
Article DOI: 10.30574/ijsra.2024.13.2.2591
Publication history: 
Received on 16 November 2024; revised on 24 December 2024; accepted on 29 December 2024
 
Abstract: 
As data becomes central to progress in AI and analytics, concerns about privacy, compliance, and misuse have become paramount. This paper examines a new architectural approach that uses Generative AI to build data pipelines that preserve privacy throughout their use. By incorporating differential privacy, federated learning, and synthetic data generation, the system avoids exposing unnecessary detail of sensitive information. The framework enforces privacy at every stage, including ingestion, transformation, storage, and model training, in a manner consistent with regulations such as GDPR and HIPAA. Evaluations in healthcare and finance show that synthetic data can closely resemble real data without disclosing individuals' actual identities. Challenges such as the tension between a model's generative fidelity and its guarantees of privacy and explainability are discussed to aid ethical implementation. The work argues for a proactive, privacy-by-design approach to data engineering, supporting the growth of AI that is both safe and trustworthy for future users.
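As a point of reference for the differential privacy mentioned above: the standard way to satisfy ε-differential privacy for a numeric query is to add noise calibrated to the query's sensitivity, e.g. via the Laplace mechanism. The sketch below is illustrative only (the function name and parameters are this editor's, not the paper's):

```python
import numpy as np

def laplace_mechanism(true_value: float, sensitivity: float, epsilon: float) -> float:
    """Release a noisy value satisfying epsilon-differential privacy.

    Noise is drawn from Laplace(0, sensitivity / epsilon): smaller epsilon
    (stronger privacy) means a larger noise scale.
    """
    scale = sensitivity / epsilon
    return true_value + np.random.laplace(loc=0.0, scale=scale)

# Hypothetical example: privately release a count of 120 patient records.
# Counting queries have sensitivity 1, since adding or removing one
# individual changes the count by at most 1.
noisy_count = laplace_mechanism(120.0, sensitivity=1.0, epsilon=0.5)
```

Smaller values of epsilon yield stronger privacy at the cost of noisier released statistics, which is the fidelity-versus-privacy tension the abstract refers to.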
 
Keywords: 
Generative AI; Privacy-Centric Architecture; Synthetic Data; Federated Learning; Differential Privacy; Data Compliance; Secure Data Pipelines
 