Architecting data lake-houses in the cloud: Best practices and future directions

Aravind Nuthalapati *

Microsoft, USA.
 
Review
International Journal of Science and Research Archive, 2024, 12(02), 1902–1909.
Article DOI: 10.30574/ijsra.2024.12.2.1466
Publication history: 
Received on 30 June 2024; revised on 08 August 2024; accepted on 10 August 2024
 
Abstract: 
As data volumes have grown exponentially, organisations face both significant opportunities and significant challenges. Traditional data warehouses struggle with the three defining dimensions of big data: volume, velocity and variety. Data lakes were an evolutionary step towards resolving these challenges, but traditional data lake solutions have often fallen short of expectations on data quality and governance. Data lakehouses offer a hybrid approach that combines the flexibility of cloud data lakes with the rigorous management of warehouses, consolidating operational and analytics workloads on one platform. This study discusses the design and implementation of modern cloud-based data lakehouses, elaborating on vital aspects such as the storage layer, the metadata management system and access control policy enforcement. Data lakehouses take advantage of cloud technologies to provide scalable, cost-effective ways to improve decision-making based on sound data. We discuss best practices and look to future directions, addressing open issues in data governance and integration in order to optimise organisational data strategies.
 
Keywords: 
Data Lakehouse; Cloud Computing; Data Management; Big Data Analytics; Data Governance
 