Business and technology departments intersect at IT production operations. IT operations reflect the health and maturity of both the business and the underlying IT solution. Analytics and Big Data are surfacing in almost every segment of business, products, technology and IT applications, creating a canvas on which to design the future IT landscape vis-à-vis the business landscape. That landscape will keep changing as user behavior and business requirements change.
Where does Infrastructure Operational Analytics (IOA) sit in the wider world of analytics? Do we have a well-defined, unified architecture for this intersection of ITO (IT Operations) and IOA? Where are the data sources for IOA? Should the data model follow the big data methodology for structured and unstructured sources, or the traditional ODS (Operational Data Store), DWH (Data Warehouse), dimensional/snowflake data model and data mart concepts, or something different?
I think most organizations in the RIM space don’t seem to have a strategy for answering some of these questions. I feel the IOA approach can be a hybrid, involving both the traditional approach and the typical big data map-reduce approach. We may not have a whole lot of unstructured data types (video, image, audio, SMS etc.), but volume will continue to grow, so a hybrid model is a sensible starting point for generating the analytics that make for a robust and stable IT Infrastructure Management System (IMS). Irrespective of the approach, the goal has to be to host and store data in a format that supports exploration as well as multi-dimensional analysis, so that new, valuable relationships and insights can be generated from the data sources.
In a typical IT establishment, we emphasize MDM (Master Data Management) to manage the life cycle of information, from controlling input data through cleansing, digesting, processing and disseminating to archiving or deleting it. We probably don’t need that extensive an MDM solution in the IOA space, since infrastructure data objects are created inside the infrastructure ecosystem. However, we do need to lay out a discipline for structuring, storing, processing and analyzing infrastructure objects/data, addressing the need for an infrastructure information life cycle. Infrastructure measurements, viz. I/O, memory utilization, heartbeat, page faults, CPU utilization, traffic density, latency etc., taken relative to components such as IP and MAC addresses, machines, servers, applications etc., are the key ingredients that go as input to the analytics processing engine to derive actionable and meaningful insight for evolving the infrastructure.
In my last blog on IOA and its importance (http://www.techmanthan.com/index.php/2015/08/09/infrastructure-operation-analytics-healthy-infrastructure-keeps-business-healthy/), I emphasized the need for IOA for a better future of the IT platform vis-à-vis the business ecosystem. In this blog I attempt to draw a design and architecture approach for both IT and business departments.
Challenge
Amid Big Data and the solutions around it, ranging from the Hadoop framework and map-reduce methodology to node cluster solutions, we should never lose focus on the core business requirement and underlying benefits. Unless we have a strong use case and a solution approach that can be successfully tested with some dry-run test cases, there will always be a gap between theory and practical solution, and, like any other BI (Business Intelligence) requirement, the IOA requirement will remain unaddressed. As in any project, we need a clear business need and underlying benefits. Business benefit is the common thread that connects all stakeholders: business, IT, providers, users, suppliers, sales and marketing teams etc. If we have to run a few POCs to obtain buy-in and build confidence, so be it; it is a win-win for all involved to run POCs and test the water before starting a full-blown project and encountering tons of surprises later.
In the beginning we apparently had no strategy and inadequate focus on IT infrastructure evolution, and we ended up piling up monitoring and administrative tools to gain visibility and control over all infrastructure components. For example, the network group needed visibility into and control over network devices such as switches, routers and repeaters, and over signal strength, so it started banking on products specialized in managing network components only. Similarly, the DBA, storage, application and middleware teams started banking on products specialized for their respective areas, and so on. This continued until the infrastructure landscape was bloated with a multitude of monitoring tools, none connecting or correlating data from the others, giving disparate views with little focus and little actionable information.
Our IOA design methodology should provide a framework that addresses these challenges and breaks down the isolation while giving a holistic view of the whole infrastructure stack.
Design Approach
Let’s dive into the architectural and design framework for IOA. In my opinion, an IOA solution framework should start by identifying the sources of data along with their attributes: data flowing in the network (packets, headers and protocols from the data link layer to the application layer, L2 to L7), data generated on machines, data collected through agents (checking on application/code performance), data coming out of synthetic transactions (simulating “n” users and gauging response and the number of processing threads for a realistic business environment) etc., along with each source’s strengths, limitations and current usage within the IT ecosystem. Let’s look a little closer at the types of data source.
Computer Data: Data from sources such as desktops, laptops, servers, network devices (routers, switches), sensors, applications and security appliances can be self-generated and reported into log files (application logs, call records, system audit files etc.) or collected using polling software, viz. MS WMI, or a trapping solution, viz. SNMP. Beyond that, the data covers make, model, number of processors and cores, memory, VM-to-host ratio, exception counters, CPU, I/O, memory and disk utilization, firewall and security events etc. This source gives a wealth of infrastructure information that can be mined and prove useful for both IT and business. It needs to be further qualified with related context from other data sources to amplify the information and insight. This data can be used to analyze the load on any system (server, user client etc.), plan capacity, carry out forensic analysis of past events etc. One big challenge is that dissimilar data attributes, coming from different types of machine source (different OS, different vendor tools etc.), carry varied meaning and context. We can look for an ETL (Extract, Transform and Load) type of mapping solution to bring uniformity across diverse machine sets.
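As a rough illustration of the ETL-style mapping mentioned above, the sketch below translates source-specific metric names into one canonical schema. The field names, the `FIELD_MAP` and `UNIT_SCALE` tables and the `normalize` helper are all assumptions made for this example; they are not taken from any particular WMI or SNMP deployment.

```python
# Illustrative mapping from source-specific field names to a canonical schema.
# Every name below is a placeholder, not a verified vendor attribute.
FIELD_MAP = {
    "wmi": {"PercentProcessorTime": "cpu_pct", "AvailableMBytes": "mem_free_mb"},
    "snmp": {"cpuLoad": "cpu_pct", "memAvailKb": "mem_free_mb"},
}

# Unit conversions needed to land every source on the canonical unit.
UNIT_SCALE = {("snmp", "mem_free_mb"): 1 / 1024}  # kilobytes -> megabytes


def normalize(source, host, raw):
    """Map one raw reading from `source` onto the canonical record layout."""
    mapping = FIELD_MAP[source]
    record = {"host": host, "source": source}
    for name, value in raw.items():
        canonical = mapping.get(name)
        if canonical is None:
            continue  # attribute not modelled yet; skip rather than guess
        scale = UNIT_SCALE.get((source, canonical), 1)
        record[canonical] = float(value) * scale
    return record


windows_row = normalize("wmi", "desk01", {"PercentProcessorTime": 35, "AvailableMBytes": 512})
linux_row = normalize("snmp", "srv01", {"cpuLoad": 40, "memAvailKb": 2048})
```

Both rows end up with the same `cpu_pct` and `mem_free_mb` keys, which is what lets downstream source and target tables treat diverse machine sets uniformly.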
If the VM/host ratio is high, indicating a highly virtualized environment, the relationship between application performance and underlying resource utilization will be less correlated, so a weighting factor should be introduced when calculating the underlying cause of application performance. Overall, this source type yields a mammoth amount of data in the IOA space, and the above factors must be considered while creating source as well as target/ODS/OLAP tables.
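One possible way to express such a weighting factor is sketched below. The formula, the `damping` parameter and both function names are assumptions chosen purely for illustration; a real weighting would have to be calibrated against actual measurements.

```python
def host_attribution_weight(vm_host_ratio, damping=0.25):
    """Hypothetical weight in (0, 1]: the more VMs share a host, the less a
    host-level utilization figure says about one application."""
    if vm_host_ratio < 1:
        raise ValueError("vm_host_ratio must be at least 1")
    return 1.0 / (1.0 + damping * (vm_host_ratio - 1.0))


def weighted_cause_score(raw_correlation, vm_host_ratio, damping=0.25):
    """Discount an observed app-vs-host correlation by the consolidation ratio."""
    return raw_correlation * host_attribution_weight(vm_host_ratio, damping)


# A dedicated host keeps the full correlation; a 5:1 host halves it
# under the assumed damping of 0.25.
dedicated = weighted_cause_score(0.8, vm_host_ratio=1)
consolidated = weighted_cause_score(0.8, vm_host_ratio=5)
```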
Network Data: Traditional network monitoring tools have been used to monitor data between L2 and L4 (data link to transport layer), reflecting TCP flags, Type of Service and bytes by port, while packet analyzers provide deep packet insight, enabling staff to view payloads from individual to contiguous packets and carry out investigation on a need basis. Newer network monitoring technology has been making strides in this space, making real-time L2-L7 analysis easy. Data flowing over wire or wireless is always real, and it connects all infrastructure components, reflecting the status and health of servers, network, middleware, applications etc. in real time. With growing link capacity (going beyond 10 Gbps), a link can carry more than 10 TB of data in a single day, which indicates the volume of data that can be harnessed to analyze the health of all such components in greater detail. Even if a server has crashed for some reason, the data in the network contains potential information that allows analysts to explore and gain insight. However, data flowing in the network may not carry information about resource utilization, performance, changes in policy and security etc. To leverage such a wealth of network data, the data model should be built to factor in other sources of information that supply the context network data alone cannot.
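To put the link volume in perspective, here is a back-of-the-envelope calculation. It assumes decimal units (1 TB = 10^12 bytes) and a constant average utilization; `daily_volume_tb` is just a helper name for this example.

```python
SECONDS_PER_DAY = 86_400


def daily_volume_tb(link_gbps, utilization=1.0):
    """Terabytes carried in one day by a link running at `utilization`."""
    bits_per_day = link_gbps * 1e9 * utilization * SECONDS_PER_DAY
    return bits_per_day / 8 / 1e12  # bits -> bytes -> terabytes


full_rate = daily_volume_tb(10)          # 10 Gbps saturated: 108 TB per day
ten_percent = daily_volume_tb(10, 0.1)   # 10% average utilization: about 10.8 TB
```

So a 10 Gbps link passes the 10 TB per day mark at roughly 10% average utilization, which is why the storage and processing side of the design cannot be an afterthought.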
Agents Data: Agents are pieces of code that hook into run-time environments such as the JRE or .NET and help generate performance data about the code. Agents can produce a performance profile by traversing the set of applications involved in a particular business transaction. They help locate a performance bottleneck or failure with the help of the stack they build from method calls at “entry” and “exit” points, memory allocation and free events etc., giving end-to-end transactional visibility. These data points are valuable in optimizing application performance, which helps IOA meet its objectives. Overuse of agents can be detrimental to application performance in production, so the IOA design must weigh the pros and cons of including agents in the solution framework. The design should also provide for monitoring the usage of agents. For off-the-shelf applications, viz. SAP, MS SharePoint etc., we can look at traps or polling solutions rather than hooking in agents.
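The entry and exit bookkeeping that an agent performs can be sketched with a simple decorator. This is only a toy stand-in for a real JRE or .NET agent; the `Tracer` class and the two sample functions are hypothetical.

```python
import functools
import time


class Tracer:
    """Toy in-process 'agent' that records entry and exit events."""

    def __init__(self):
        self.events = []  # (event, function name, call depth, elapsed seconds)
        self.depth = 0

    def traced(self, func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            self.events.append(("entry", func.__name__, self.depth, None))
            self.depth += 1
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Record the exit even when the wrapped call raises.
                elapsed = time.perf_counter() - start
                self.depth -= 1
                self.events.append(("exit", func.__name__, self.depth, elapsed))
        return wrapper


tracer = Tracer()


@tracer.traced
def query_inventory(item):
    return {"item": item, "in_stock": True}


@tracer.traced
def place_order(item):
    # One business transaction that spans two instrumented calls.
    return query_inventory(item)["in_stock"]


place_order("book")
```

After the call, `tracer.events` holds the nested entry/exit sequence with per-call timings. Every wrapped call pays for two list appends and two clock reads, a small-scale reminder of why overuse of agents in production needs watching.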
Synthetic Data: Synthetic data can be characterized as scheduled, scripted, time-series, transaction-level data that indicates performance by simulating a production scenario involving a number of users, locations etc. This helps test applications that are expected to be widely available and to perform well, such as Amazon, eBay etc. The IOA design strategy should factor in a synthetic solution to keep a close tab on critical applications.
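A synthetic probe of this kind might look like the sketch below. The `checkout_transaction` function is a stand-in that only sleeps; in practice it would be replaced by a scripted call against the real application, and `run_probe` would be invoked on a schedule. Both names are assumptions for this example.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def checkout_transaction(user_id):
    """Stand-in for a scripted business transaction (no real system is called)."""
    time.sleep(0.001 * (user_id % 3))
    return True


def run_probe(transaction, n_users):
    """Run `transaction` for n_users concurrent simulated users and
    return one time-stamped sample of the results."""

    def timed(user_id):
        start = time.perf_counter()
        ok = transaction(user_id)
        return ok, time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=n_users) as pool:
        results = list(pool.map(timed, range(n_users)))

    latencies = [elapsed for _, elapsed in results]
    return {
        "timestamp": time.time(),
        "users": n_users,
        "success_rate": sum(ok for ok, _ in results) / n_users,
        "mean_latency_s": mean(latencies),
        "max_latency_s": max(latencies),
    }


sample = run_probe(checkout_transaction, n_users=5)
```

Collecting one such sample per scheduled run yields the time series that the IOA store can trend and alert on.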
One of the essential driving forces behind the design strategy is to stay focused on reducing the spread of tools (the monitoring and admin tools used to manage and monitor infrastructure components). The smaller the spread of tools, the easier the configuration needed to integrate with a multitude of infrastructure components and, consequently, the less costly the design and implementation, while at the same time increasing operational visibility and IT productivity, improving MTTR, and increasing cross-team efficiency through fewer ad hoc and war-room scenarios.
The design strategy should primarily address two points.
(1) Integrating the context: integration and collaboration amongst the multitude of tools. This may involve creating a neutral ODS (Operational Data Store), free of influence from any vendor or supplier, that can store data from multiple data sources and keep it integrated and/or related. This obviously depends on the data model that an IOA data modeler will have to create. It can also be seen as conceptualizing a MOM (Manager of Managers) solution for IOA. The different suppliers of data from different infrastructure components should be equipped with appropriate APIs so that the context of data from each source can be derived, maintained and managed in real time.
(2) Scalability: I am not very sure about unstructured data types, viz. video, image, SMS, sensors etc., in the infrastructure domain, but the volume of data will undoubtedly continue to increase, requiring us to ensure our database is robust enough to sustain the growth, and that may require the design to be based upon NoSQL or a similar data structure. This would be a departure from the traditional RDBMS style towards a Big Data model to address scalability.
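To make the first point a little more concrete, here is a minimal sketch of a vendor-neutral store in which readings from different tools share one layout and can be correlated by host and time. It uses Python's built-in `sqlite3` purely to stay self-contained; as noted under scalability, a production design may well sit on a NoSQL store instead. The table layout, the tool names `netmon` and `sysmon`, and the helper functions are assumptions for illustration.

```python
import sqlite3


def build_ods():
    """Create an in-memory store with one tool-agnostic metric table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE metric ("
        "source TEXT, host TEXT, minute TEXT, name TEXT, value REAL)"
    )
    return conn


def load(conn, rows):
    """Insert (source, host, minute, name, value) rows from any tool."""
    conn.executemany("INSERT INTO metric VALUES (?, ?, ?, ?, ?)", rows)


def correlate(conn, left_name, right_name):
    """Pair two metrics that were observed on the same host in the same minute."""
    return conn.execute(
        "SELECT a.host, a.minute, a.value, b.value "
        "FROM metric a JOIN metric b "
        "ON a.host = b.host AND a.minute = b.minute "
        "WHERE a.name = ? AND b.name = ? "
        "ORDER BY a.host, a.minute",
        (left_name, right_name),
    ).fetchall()


ods = build_ods()
load(ods, [
    ("netmon", "web01", "2015-09-01T10:00", "latency_ms", 180.0),
    ("sysmon", "web01", "2015-09-01T10:00", "cpu_pct", 95.0),
    ("sysmon", "db01", "2015-09-01T10:00", "cpu_pct", 20.0),
])
pairs = correlate(ods, "latency_ms", "cpu_pct")
```

Here `pairs` contains only the `web01` row, because that is the only host for which both the network tool and the system tool reported in the same minute. That shared host-and-time key is the "context" the ODS has to preserve.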
These two points could be the prime determinants in laying the foundation for the design approach, while the remaining essential requirements such as security, performance, availability, business insight etc. are addressed alongside.
Benefit
An IOA solution should enable IT staff to perform real-time analysis of what is happening in the infrastructure at any given time, run ad hoc queries, carry out analytics using drill-down and slice-and-dice approaches, conduct exploratory analysis of any situation, predict and forecast resource consumption and follow up with “what-if” scenarios, develop trends etc. As with any other analytics model, the goal of IOA is to keep the business IT environment healthy, up and running at any moment in the most cost-effective manner.
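As a small example of forecasting followed by a “what-if” scenario, the sketch below fits a straight-line trend to daily disk usage and estimates the days left before a capacity limit is reached. The linear model, the sample figures and the `growth_factor` knob are simplifying assumptions; real consumption is rarely this well behaved.

```python
def fit_trend(values):
    """Ordinary least-squares slope and intercept for evenly spaced samples."""
    n = len(values)
    xs = range(n)
    x_mean = sum(xs) / n
    y_mean = sum(values) / n
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, values)) / sum(
        (x - x_mean) ** 2 for x in xs
    )
    return slope, y_mean - slope * x_mean


def days_until_full(values, capacity, growth_factor=1.0):
    """Days from the last sample until `capacity` is reached, or None if
    usage is flat or shrinking. `growth_factor` scales the fitted growth
    rate to play out a what-if scenario."""
    slope, _ = fit_trend(values)
    slope *= growth_factor
    if slope <= 0:
        return None
    return max(0.0, (capacity - values[-1]) / slope)


disk_used_gb = [50, 52, 54, 56, 58]              # made-up daily samples
baseline = days_until_full(disk_used_gb, 100)    # 21.0 days at 2 GB/day
what_if = days_until_full(disk_used_gb, 100, 2)  # 10.5 days if growth doubles
```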
Summary
The IOA design and architecture framework should help IT staff proactively identify and resolve performance and security issues, correlate transactions and events across every tier of infrastructure, intelligently plan investment and optimize infrastructure resources, facilitate war-room scenarios by enabling cross-team collaboration while addressing a problem, increase staff productivity and process efficiency, work with a smaller spread of tools (admin and monitoring related) etc. IOA should be able to enrich BI (Business Intelligence) tools for business users by supplementing their data feeds on real-time sales, customer purchase behavior, concurrent users etc. with the behavior of the underlying infrastructure components. IOA and BI need to complement each other, each giving the other an opportunity to improve. I don’t think one single tool will be able to address this complete analytics circle; rather, a solution needs to be built within a framework that factors in a variety of tools, business dimensions (multiple stakeholders, multiple products and varied usage of business applications across different lines of customers), multi-tier architecture, dispersed geographic locations (latency) etc. In a nutshell, IOA design strategy runs from identification and categorization of data sources, through cleaning, loading and storing, to presenting the processed or massaged data (maybe using a visualization tool such as Tableau, QlikView etc.) to both IT and business departments. I hope this article helps both technocrats and business users with key IOA design considerations and brings both departments close enough to put together a design strategy through which IT provides a hassle-free and disruption-free environment for business applications to proliferate.