Imagine a customer entering a retail shop, and we want all the crucial data points about him or her (viz. buying behavior, areas of interest, personality traits, recent purchases, etc.) available instantly so that the salesperson can make an appropriate sales pitch. Or imagine we have to decide how to redistribute funds for an effective marketing campaign, or that a bad guy has broken into the system and is attempting to make unauthorized online purchases. There are numerous such real-world scenarios in which we want our business to be enabled with a real-time analytical response system.
Computing processes have traditionally fallen into two categories: Online (Request & Response) and Batch processing. A third category, “Stream Processing”, has become very relevant because of the need for quick time to value in areas such as fraud detection, campaign effectiveness, crisis response, etc. Let’s dive into the Real Time or Streaming analytics element in the context of the digital transformation era and the organizational data strategy roadmap.
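To make the batch vs. stream distinction concrete, here is a minimal Python sketch, assuming a simple feed of purchase amounts. The batch function waits for the full dataset and computes once, whereas the streaming class updates its answer as each event arrives, so an up-to-date result is always available. The names `batch_average` and `StreamingAverage` are hypothetical, chosen only for illustration.

```python
from typing import List


def batch_average(amounts: List[float]) -> float:
    """Batch style: wait until all data is collected, then compute once."""
    return sum(amounts) / len(amounts) if amounts else 0.0


class StreamingAverage:
    """Stream style: update the result incrementally as each event arrives."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0

    def update(self, amount: float) -> float:
        # Incremental mean update: past events do not need to be kept in memory.
        self.count += 1
        self.mean += (amount - self.mean) / self.count
        return self.mean


if __name__ == "__main__":
    events = [120.0, 80.0, 40.0]
    stream = StreamingAverage()
    for amount in events:
        # A fresh answer is available after every single event.
        print(f"after {amount}: running mean = {stream.update(amount):.2f}")
    print(f"batch mean = {batch_average(events):.2f}")
```

Both approaches end at the same value (80.00 here); the difference is that the streaming version had a usable answer after the first event instead of only at the end.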
In the current era of digital disruption, the crux of business transformation largely lies in the ability to derive insights, establish patterns and bring out hidden perspectives, and then to feed such valuable insights into the business transformation strategy. There are two types of insight:
A. Insight we are deliberately trying to derive to answer business questions, such as: what is the customers’ buying behavior, how do people aged 30-35 drive their vehicles in a particular area, or which genre of music has received the most likes within a particular group.
B. Insight we don’t know about until we scratch the surface of the data. Such insight might suddenly pop up and add a new perspective. One example would be a finding that people who pay no more than the minimum amount due on their credit card are more prone to fraudulent activity. Insights of this second type often hinge on correlation versus causation. There are numerous such hidden patterns, and data mining is the subject area that deals extensively with such scenarios.
I have picked the topic of “Real Time Analytics” here and would like to delve into its various sub-areas and attempt to connect the dots without going off on a tangent. We have been observing the focus shift from the conventional multidimensional, pre-aggregated OLAP (Online Analytical Processing) analytics setup built on a traditional Data Warehouse to a modern advanced analytics setup encompassing machine learning algorithms, real-time queuing mechanisms like Kafka, and Big Data processing engines such as Hadoop and/or Spark equipped with in-memory capability.
Can we afford to shift our focus away from conventional analytics entirely, towards Advanced Analytics only? I guess not. Conventional or operational analytics is the heartbeat of the business process. Our BAU (business-as-usual) online systems, such as ERP, SCM, CRM, Pricing, Sales & Distribution, Marketing & Campaign management and many more, function efficiently by taking cues from operational/BAU analytics and feeding them back into the IT systems that support the fundamental business processes.
Data Maturity: Data maturity is essential for any organization. BFSI industries across the globe have been mandated to address regulatory and compliance requirements such as Basel II/III, Solvency II, the Dodd-Frank Act & the Volcker Rule, IFRS, AML and many more. Such regional and global regulatory compliance, as a measure of risk mitigation, is a great deal of work for any organization. Implementing such policies and adhering to the processes essentially leads to lower profit margins in the existing business. For example, the Basel Committee’s BCBS 239 principles for risk data aggregation and risk reporting alone comprise 14 principles, spanning data architecture through reporting requirements, while Basel III itself requires banks to stay within thresholds such as minimum capital and leverage ratios.
In the midst of plummeting profit ratios, business conglomerates have to find newer business opportunities and optimize their bottom line by leveraging disruptive forces such as SMAC (Social media and networking, Mobility access points, Advanced analytics & visualization, Cloud setup), IoT, cybersecurity, gamification, etc., and go digital.
All this becomes possible once we recognize the need for a robust data management solution. We can then form a data strategy and execution plan if one is not in place already. The enterprise architect needs to come up with a long-term strategic data solution that integrates with core business functions. Business remains the key driver for such digital data transformation. However, we don’t have to hit the reset button if we already have some MDM (Master Data Management) in place (why waste prior investment?). We may instead augment the existing data ecosystem by infusing external data sources (blogs, social media, journals, B2B sites, etc.), apart from intensifying the effort to standardize internal data by creating.
The first step that comes to my mind when putting a data strategy together is to conduct a gap analysis relative to data maturity. The data maturity curve can be envisioned as passing through Explorative, Diagnostic, Predictive and Prescriptive stages.
What is Real Time Analytics? The impact of digital disruption is not limited to business; it extends to bad guys who indulge in hacking, DDoS attacks, fraudulent activities, etc., increasing data theft, market and credit risks, and so on. The need is to capture incidents in real time and remediate them quickly. Real time analytics is an evolving area that can be thought of as sitting between the 3rd and 4th stages (Predictive and Prescriptive) of the data maturity capability curve explained above.
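As an illustration of “real time capture of incidents”, the following minimal Python sketch applies a hypothetical velocity rule to a stream of card transactions: a card is flagged if it makes more than `max_txns` transactions within `window_seconds`. The class name, the thresholds and the assumption that events arrive in timestamp order are all illustrative; this is not a production fraud model.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class VelocityCheck:
    """Flags a card that makes too many transactions within a sliding time window.

    Hypothetical rule for illustration: more than `max_txns` events within
    `window_seconds` is treated as suspicious. Assumes events arrive in time order.
    """

    def __init__(self, max_txns: int = 3, window_seconds: float = 60.0) -> None:
        self.max_txns = max_txns
        self.window_seconds = window_seconds
        # Per-card queue of recent transaction timestamps (in seconds).
        self.recent: Dict[str, Deque[float]] = defaultdict(deque)

    def on_event(self, card_id: str, timestamp: float) -> bool:
        """Process one event as it arrives; return True if it should raise an alert."""
        window = self.recent[card_id]
        window.append(timestamp)
        # Drop timestamps that have slid out of the window.
        while window and timestamp - window[0] > self.window_seconds:
            window.popleft()
        return len(window) > self.max_txns


if __name__ == "__main__":
    check = VelocityCheck(max_txns=3, window_seconds=60.0)
    events = [("card-1", 0), ("card-1", 10), ("card-1", 20), ("card-1", 30), ("card-2", 30)]
    for card, ts in events:
        if check.on_event(card, ts):
            print(f"ALERT: {card} at t={ts}s")  # fires once, for card-1 at t=30s
```

The decision is made per event, at the moment the event arrives, which is what distinguishes this from running the same rule in a nightly batch job.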
Real time analytics can be enabled by using an SPS (Stream Processing System). The SPS architecture is conceptualized as having:
- Data Producer: the system that sends data to the SPS.
- Data Receiver partition(s): SPS partition(s) configured to receive data.
- TTL: a configured “Time to Live” parameter; the SPS expires (flushes out) streamed/stored data once the TTL has elapsed.
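The three SPS building blocks can be modeled in a toy, single-process Python sketch. It rests on stated assumptions: records are routed to a receiver partition by hashing a key (a common approach, assumed here), and each partition flushes records older than the TTL when it is read. `ToySPS`, `put_record` and `read` are hypothetical names, not the API of any particular stream processing product.

```python
import time
import zlib
from collections import deque
from typing import Callable, Deque, List, Tuple

# (arrival time, partition key, payload)
Record = Tuple[float, str, str]


class ToySPS:
    """Toy stream processing system: data receiver partitions plus a TTL."""

    def __init__(self, num_partitions: int, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.partitions: List[Deque[Record]] = [deque() for _ in range(num_partitions)]
        self.ttl_seconds = ttl_seconds
        # Injectable clock so that expiry can be demonstrated deterministically.
        self.clock = clock

    def _partition_for(self, key: str) -> int:
        # Stable hash (CRC-32) so the same key always lands on the same partition.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def _expire(self, partition: Deque[Record]) -> None:
        # Flush records older than the TTL; the oldest record sits at the left end.
        now = self.clock()
        while partition and now - partition[0][0] > self.ttl_seconds:
            partition.popleft()

    def put_record(self, key: str, payload: str) -> int:
        """Called by a data producer; returns the index of the receiving partition."""
        index = self._partition_for(key)
        self.partitions[index].append((self.clock(), key, payload))
        return index

    def read(self, index: int) -> List[str]:
        """Return the payloads in one partition that have not yet expired."""
        partition = self.partitions[index]
        self._expire(partition)
        return [payload for _, _, payload in partition]


if __name__ == "__main__":
    fake_now = [0.0]
    sps = ToySPS(num_partitions=2, ttl_seconds=60.0, clock=lambda: fake_now[0])
    p = sps.put_record("customer-42", "entered store")
    fake_now[0] = 30.0
    sps.put_record("customer-42", "viewed shoes")
    print(sps.read(p))  # ['entered store', 'viewed shoes']
    fake_now[0] = 75.0
    print(sps.read(p))  # ['viewed shoes']
```

Keying by customer keeps all of one customer’s events in the same partition and in arrival order, which is what a per-customer view (like the retail example at the start) would need.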
Technologies that support Real Time Analytics:
- In-Memory analytics: an approach to querying data while it resides in random access memory (RAM), as opposed to querying data stored on physical disks. SAP HANA DB is one example of an in-memory database.
- Processing in Memory (PIM): a hardware technology in which the processor is integrated with the memory chip, reducing latency and speeding up response time.
- In-database analytics: eliminates the need for frequent I/O by building the analytical logic into the database itself.
- Data Warehouse Appliances: combining software and hardware designed specifically for fast analytical data processing, off-the-shelf DWH appliances can be deployed in the data infrastructure.
- Massively Parallel Processing (MPP): employing multiple processors, each working on a different part of the same program in a well-coordinated and controlled manner, can dramatically improve analytical response times.
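To illustrate the in-database analytics idea from the list above, the sketch below uses Python’s built-in `sqlite3` module as a stand-in for an analytical database. That is an assumption for illustration only; SAP HANA and DWH appliances have their own interfaces. The aggregation runs inside the database engine via SQL `GROUP BY`, so only one summary row per segment crosses the I/O boundary instead of every raw purchase row. The table, columns and sample rows are made up.

```python
import sqlite3
from typing import List, Tuple


def spend_by_segment(conn: sqlite3.Connection) -> List[Tuple[str, int, float]]:
    """In-database analytics: the GROUP BY runs inside the database engine,
    so only one summary row per segment is returned to the application."""
    query = """
        SELECT segment, COUNT(*) AS purchases, SUM(amount) AS total_spend
        FROM purchases
        GROUP BY segment
        ORDER BY segment
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    # In-memory SQLite database used purely as a stand-in; the rows are made-up sample data.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE purchases (customer TEXT, segment TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO purchases VALUES (?, ?, ?)",
        [("c1", "retail", 20.0), ("c2", "retail", 30.0), ("c3", "online", 50.0)],
    )
    for segment, purchases, total in spend_by_segment(conn):
        print(segment, purchases, total)
```

The alternative, fetching every row and summing in application code, produces the same numbers but moves far more data; on large tables that transfer is exactly the I/O cost in-database analytics aims to avoid.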
Kafka: One such distributed stream processing platform is Kafka, which acts like an enterprise messaging system and decouples the conventional mode of message processing into producers, brokers, streams and consumers. Primarily, it is used to publish and subscribe to streams of event data, store those streams durably, and process them as they occur. Source: http://kafka.apache.org/intro
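Much of Kafka’s decoupling rests on an append-only log: producers append records, records are retained after being read, and each consumer group tracks its own read offset, so consumers can read independently and at their own pace. The sketch below models that idea in plain Python with a single partition. It is not the Kafka client API (real applications would use a Kafka client library and a running broker), and `ToyTopic`, `produce` and `poll` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, List


class ToyTopic:
    """Append-only log with per-consumer-group offsets (a Kafka-like idea, simplified)."""

    def __init__(self) -> None:
        self.log: List[str] = []  # records, in arrival order; never deleted on read
        self.offsets: Dict[str, int] = defaultdict(int)  # consumer group -> next offset

    def produce(self, record: str) -> int:
        """Producer side: append a record and return its offset."""
        self.log.append(record)
        return len(self.log) - 1

    def poll(self, group: str, max_records: int = 10) -> List[str]:
        """Consumer side: read from this group's offset onward, then advance it."""
        start = self.offsets[group]
        batch = self.log[start:start + max_records]
        self.offsets[group] = start + len(batch)
        return batch


if __name__ == "__main__":
    topic = ToyTopic()
    for event in ["login", "add_to_cart", "payment"]:
        topic.produce(event)
    # Two independent consumers: a fraud checker and a campaign dashboard.
    print(topic.poll("fraud-check"))               # ['login', 'add_to_cart', 'payment']
    print(topic.poll("dashboard", max_records=2))  # ['login', 'add_to_cart']
    print(topic.poll("dashboard"))                 # ['payment']
```

Because the producer never needs to know who is consuming, a new consumer (say, a recommender) can be added later and replay the retained log from offset 0 without touching the producing system.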
Summary: Real time stream processing has become an essential element of data strategy implementation for better customer relationship management & segmentation, customer engagement, risk mitigation and regulatory compliance. With proper planning and the right technical solution, we can augment predictive, classifier or recommender systems with a real time streaming solution to deliver instant information to decision support systems. Such a combination helps deliver personalized solutions to customers, which is required in almost all business spheres, ranging from finance, manufacturing, insurance and health care to retail or education, in the midst of plummeting profit margins and increasing customer demands. I hope the above article helps executives in both the business and IT worlds give special focus to REAL TIME ANALYTICS in their overall organizational Data Strategy.