Invisibl Cloud helps WEGoT Utility Solutions to build a Serverless Data Lake on AWS
WEGoT Utility Solutions provides smart water meters for residential and commercial
buildings. These smart devices help property owners conserve water by regulating its
flow, detecting leaks through real-time usage monitoring, and sending users instant
alerts about water wastage.
The proposed solution should meet the following key requirements:
- Scalability: As the number of devices grows, the Data Lake storage and processing infrastructure should scale automatically to meet the demand.
- Serverless: The Data Lake infrastructure should use serverless technologies as much as possible to reduce infrastructure management overhead.
- Extensibility: The architecture should be set up so that further ingestion for downstream systems can be orchestrated easily.
- Repeatability: After the Data Lake is implemented, the data engineering team should be able to add new data pipeline jobs without significant code changes.
- Security: The Data Lake should remain secure through policies that control data access.
- Cost Effectiveness: The Data Lake infrastructure should be cost effective overall.
The Serverless Data Lake deployed for WEGoT comprises the following:
- Real-time sensor data ingested through Amazon Kinesis Data Firehose
- Transactional data ingested from Amazon Aurora MySQL
- Different Data Lake Zones created in Amazon S3
- AWS Glue based data engineering pipelines to process and transform data
- Ad hoc querying and analytics through Amazon Athena
- Amazon QuickSight for dashboards
- Implemented various data lake best practices, such as optimal file formats, appropriate lifecycle policies to expire data, and limits to control Athena query costs.
- Developed a custom framework that allowed WEGoT to easily ingest data from third-party sources such as Google Analytics and Freshworks CRM.
- Built extensible data pipelines that enabled third-party integration by feeding data to vendors who provide extended analytics (such as forecasting).
- One of the main scalability requirements was to run intensive reconciliation jobs at scale and cost effectively. Rolling-window practices were therefore deployed to limit processing to the latest 6 months of data when data baselines were created.
- Developed a DynamoDB-based custom scheduling framework that helped WEGoT add new jobs and alter existing jobs and their dependencies easily through simple configuration changes.
- Another key architectural requirement was to be able to manage the Data Lake after implementation without changes to the framework. Configurability was essential because the team was ramping up its sensors and wanted to manage scale and business logic independently, through configuration alone. Invisibl Cloud developed templatised jobs that were easy to create and schedule. New jobs could be created by reusing these templates and adjusting the relevant configuration.
- Implemented fine-grained access control policies using AWS Lake Formation to control access to data available in the Data Lake.
- Perform ad hoc, interactive queries on the data to derive insights about smart water meter usage.
- Drill-down dashboards for self-service analytics.
- Seamless scalability of the data lake as more devices get connected to the platform.
- Remain cost effective at all times through cost controls implemented at various layers of the data lake.
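
The configuration-driven scheduling mentioned above, where jobs and their dependencies are changed through configuration rather than code, can be illustrated with a minimal sketch. This is not WEGoT's or Invisibl Cloud's actual framework: the job names, template names, and fields (`job_id`, `template`, `depends_on`, `params`) are hypothetical, and the job definitions are held in a plain Python list standing in for items that might be read from a DynamoDB table. The sketch only shows how a run order could be derived from such configuration.

```python
from graphlib import TopologicalSorter

# Hypothetical job definitions, shaped like items that could be stored in a
# DynamoDB table. Every name and field below is an illustrative assumption.
JOB_CONFIGS = [
    {"job_id": "ingest_sensor_data", "template": "stream_to_raw", "depends_on": []},
    {"job_id": "ingest_transactions", "template": "database_to_raw", "depends_on": []},
    {
        "job_id": "reconcile_usage",
        "template": "reconciliation",
        "depends_on": ["ingest_sensor_data", "ingest_transactions"],
        # Rolling window: only the latest 6 months of data are processed.
        "params": {"window_months": 6},
    },
    {
        "job_id": "publish_dashboard_tables",
        "template": "curated_export",
        "depends_on": ["reconcile_usage"],
    },
]


def resolve_run_order(job_configs):
    """Return job ids in an order that respects every depends_on entry."""
    known = {job["job_id"] for job in job_configs}
    graph = {}
    for job in job_configs:
        deps = job.get("depends_on", [])
        missing = [dep for dep in deps if dep not in known]
        if missing:
            raise ValueError(f"{job['job_id']} depends on unknown jobs: {missing}")
        graph[job["job_id"]] = set(deps)
    # static_order() raises graphlib.CycleError (a ValueError subclass)
    # if the configured dependencies form a cycle.
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    for job_id in resolve_run_order(JOB_CONFIGS):
        print(job_id)
```

Under these assumptions, adding a job or changing a dependency means editing one configuration entry, and the run order is recomputed without touching the scheduling code.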