This TechNet Wiki post provides an overview of how Lambda Architecture can be implemented using Microsoft Azure platform capabilities. It is subject to further community refinement and updates as new features and capabilities become available in Microsoft Azure.
Gone are the days when enterprises would wait hours or days to look at dashboards built on old, stale data. In a fast-moving world of BYOD, fitness gear and a flood of other devices, it is becoming critical to derive "actionable" information from the huge volume of data and noise generated by these devices and other data sources, and to act on it proactively in real time in order to stay competitive.
At the same time, the need for dashboards and other analytical capabilities based on quality, cleansed, processed data still very much exists.
With the emergence of more data types and the need to handle huge volumes, a shift is happening from conventional data warehouse practice to cloud-based data processing and management capabilities, where high-volume batch processing is possible at an optimized cost. Business scenarios that require data to be processed in real time for subsequent action make things more complex.
Among the various architectural patterns available for data processing and management, Lambda Architecture stands out: it aims to address business scenarios that require processing huge volumes of data both in batch and in real time.
The objective of Lambda Architecture is to combine the power of batch and real-time processing to address business scenarios that require both a historical view of the data and insight into the data in real time, as business happens.
The logical layers of the Lambda Architecture include:
The batch layer precomputes results using a distributed processing system that can handle very large quantities of data. The batch layer aims at perfect accuracy by being able to process all available data when generating views.
The speed layer processes data streams in real time and without the requirements of fix-ups or completeness. This layer sacrifices throughput as it aims to minimize latency by providing real-time views into the most recent data. Essentially, the speed layer is responsible for filling the "gap" caused by the batch layer's lag in providing views based on the most recent data.
Output from the batch and speed layers is stored in the serving layer, which responds to ad-hoc queries by returning precomputed views or building views from the processed data.
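The interplay of the three layers described above can be illustrated with a minimal sketch. It is not tied to any Azure service: the `Event` type, its field names, the `cutoff` value and the sample device IDs are all assumptions made for illustration. The batch layer recomputes a view from all data up to its last run, the speed layer covers only what arrived afterwards, and the serving layer answers a query by combining the two precomputed views.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    device_id: str   # hypothetical field names, for illustration only
    timestamp: int   # seconds since an arbitrary epoch


def batch_view(master_dataset, cutoff):
    """Batch layer: recompute counts from ALL events up to the cutoff."""
    return Counter(e.device_id for e in master_dataset if e.timestamp <= cutoff)


def realtime_view(stream, cutoff):
    """Speed layer: count only events the last batch run has not covered."""
    view = Counter()
    for e in stream:  # handled one event at a time, as they arrive
        if e.timestamp > cutoff:
            view[e.device_id] += 1
    return view


def query(device_id, batch, realtime):
    """Serving layer: answer from the precomputed views, not the raw data."""
    return batch[device_id] + realtime[device_id]


# Sample data (made up): the last batch run covered everything up to t=50.
events = [
    Event("band-1", 10), Event("band-2", 20), Event("band-1", 30),
    Event("band-1", 95), Event("band-2", 99),
]
cutoff = 50
batch = batch_view(events, cutoff)
realtime = realtime_view(events, cutoff)
print(query("band-1", batch, realtime))  # 3
```

In this sketch the two views never overlap, because the same cutoff splits the data between them. A real system would need an equivalent rule so that an event is not counted by both layers.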
Lambda Architecture is envisioned to provide the following business benefits:
When you come across scenarios similar to those listed below, Lambda Architecture can be considered to address them:
When it comes to realizing Lambda Architecture on a public cloud, Microsoft Azure provides various capabilities that can be used together for the implementation.
The picture above provides a high-level mapping between some of the Azure capabilities and the various layers of Lambda Architecture.
The table below maps the logical layers of Lambda Architecture to Azure capabilities:
Batch layer: data is appended and stored (batch view)
Speed layer: data is processed in real time and stored for both read and write operations (real-time view)
Serving layer: indexes batch views / out-of-date results
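The storage behaviour noted above (append-only batch data, a read/write real-time view, and an index over batch views) can be sketched as follows. This is one possible interpretation and not an Azure API: the class names, the `discard_up_to` method and the sample data are hypothetical, and the batch view is computed inline only to keep the example short. It assumes that once a new batch view is published, real-time results already covered by it are out of date and can be dropped.

```python
class MasterDataset:
    """Batch-layer storage: records are only ever appended, never updated."""

    def __init__(self):
        self._records = []

    def append(self, key, ts):
        self._records.append((key, ts))

    def records(self):
        return tuple(self._records)  # read-only snapshot


class RealTimeView:
    """Speed-layer storage: supports both reads and writes."""

    def __init__(self):
        self._counts = {}

    def record(self, key, ts):
        self._counts[(key, ts)] = self._counts.get((key, ts), 0) + 1

    def count(self, key):
        return sum(n for (k, _), n in self._counts.items() if k == key)

    def discard_up_to(self, cutoff):
        # Drop results the batch view now covers (assumed expiry rule).
        self._counts = {kt: n for kt, n in self._counts.items() if kt[1] > cutoff}


class ServingLayer:
    """Indexes the latest batch view and retires out-of-date real-time results."""

    def __init__(self, realtime):
        self._index = {}
        self._realtime = realtime

    def publish_batch_view(self, master, cutoff):
        index = {}
        for key, ts in master.records():
            if ts <= cutoff:
                index[key] = index.get(key, 0) + 1
        self._index = index
        self._realtime.discard_up_to(cutoff)

    def count(self, key):
        return self._index.get(key, 0) + self._realtime.count(key)


master, rt = MasterDataset(), RealTimeView()
serving = ServingLayer(rt)
for key, ts in [("band-1", 10), ("band-1", 30), ("band-2", 40)]:
    master.append(key, ts)  # every event lands in the master dataset
    rt.record(key, ts)      # and is also reflected immediately in the real-time view

before = serving.count("band-1")          # served from the real-time view only
serving.publish_batch_view(master, cutoff=30)
after = serving.count("band-1")           # now served from the batch index
print(before, after)  # 2 2
```

The answer for a key stays the same before and after the batch view is published; only the layer that supplies it changes.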