Here is the acid test: our customers often know that there is a service problem before we do. When that happens, as far as our customers are concerned, we suck, we are incompetent and they are smarter than us. Our IT Operations team may be publishing loads of great Key Performance Indicators (KPIs) showing improvements in average fix times, improvements in system availability, reductions in incidents and so on, but all of that is worthless if our customers think we suck.
So why do customers get the jump on our IT Operations Centre? Actually, it is quite simple. Our customers experience the full end-to-end service as users of our systems, while our IT Operations people see only a partial picture: the individual components of the service, viewed through instrumentation that checks whether each one is available and functional. Our customers and IT Operations have different views of the world, and the customer view is more powerful, more meaningful and more immediate. Modern distributed systems are so complex that, from component-level instrumentation alone, it is practically impossible for IT Operations to work out the impact of a component failure on the end-to-end service.
Customers think about services, not bits of technology. They see things end-to-end, not in isolation, and even when all of the key components are available and functional, sometimes the end-to-end service just does not work. I am sure we have all had the experience of calling a helpdesk and being told that everything is working for everyone else and the analyst can't see your problem. The supplier has probably spent millions instrumenting the service you are complaining about; you have spent nothing, and your instrumentation is better! Yours is better because yours is the Customer Experience View.
I wrote about some of the history, and the reasons why we monitor and manage IT in technical silos, in an earlier article on this website, "Why do IT Operations suck?" It's worth a re-read.
So how do we get to a Customer Experience View? Actually, it is not that difficult; it just takes a little extra effort. The first stage is to map out the end-to-end service that we are offering our customer (I call this process Tube Mapping, after the map of London's Underground network). The diagram below shows an example for a Broadband Consumer service, annotated with Cycle Time and Right First Time metrics.
Tube Maps are built in workshops between IT and the Business Units, and they crystallize the dependencies between IT systems and applications and the business processes they support. It is quite common in companies that no one individual understands the whole end-to-end service and that, although the service is documented, there is no one place where it all comes together. Tube Mapping can be very enlightening, and Tube Maps are valuable in themselves for driving a clear understanding of what we are delivering to our customers.
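A Tube Map is, in effect, a chain of stages, each carrying its own metrics. The Python sketch below is a minimal illustration of rolling those metrics up end-to-end, assuming a strictly sequential map (so stage cycle times add up) and independent failures at each stage (so Right First Time rates multiply). The stage names and figures are hypothetical, not taken from the Broadband example.

```python
from dataclasses import dataclass
from math import prod


@dataclass
class Stage:
    name: str
    cycle_time_hours: float  # average time the stage takes
    right_first_time: float  # fraction of orders passing without rework (0 to 1)


def end_to_end_metrics(stages: list[Stage]) -> tuple[float, float]:
    """Roll per-stage metrics up into end-to-end figures.

    Assumes stages run strictly in sequence (cycle times add) and that
    failures at each stage are independent (Right First Time rates multiply).
    """
    total_cycle = sum(s.cycle_time_hours for s in stages)
    total_rft = prod(s.right_first_time for s in stages)
    return total_cycle, total_rft


# Hypothetical broadband order journey; names and numbers are illustrative only.
broadband_order = [
    Stage("Take order", 0.5, 0.98),
    Stage("Credit check", 2.0, 0.95),
    Stage("Line test", 24.0, 0.90),
    Stage("Activate service", 48.0, 0.97),
]

cycle, rft = end_to_end_metrics(broadband_order)
print(f"End-to-end cycle time: {cycle:.1f} h, Right First Time: {rft:.1%}")
# prints: End-to-end cycle time: 74.5 h, Right First Time: 81.3%
```

With every stage at 90% or better, the order still goes through right first time only about 81% of the time. That is exactly the kind of end-to-end effect a customer notices and a component-by-component view hides.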
The next stage is to start building end-to-end instrumentation for Customer Experience Monitoring. There are three types that we can use:
- Realtime Active Dashboards
- Synthetic Transactions
- Built-in End-to-End Application Instrumentation
The first type, Realtime Active Dashboards, can be built using the Netcool toolset that IBM purchased. This toolset supports a federated approach to Customer Experience Monitoring: alerts and instrumentation from many separate sources are combined into a single end-to-end view of the service offered to customers. A set of RAG (Red, Amber, Green) indicators shows how a failure low down in the service hierarchy affects the customer-facing service components above it. For example, a failed WebSphere server would turn the service view Red if there were no built-in resilience (because the failure causes a service outage), or Amber if the server were part of a protected cluster (because service protection is compromised but the customer experience is not affected).
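The Red/Amber/Green roll-up described above can be sketched as a recursive walk over the service hierarchy. This is a minimal Python model of the logic, not the Netcool API; the `Node` class, the `clustered` flag and the component names are hypothetical. It assumes two kinds of parent: one where every child is required (the worst child state wins), and a resilient cluster, which goes Red only when every member has failed and Amber when any member is unhealthy.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class RAG(IntEnum):
    GREEN = 0
    AMBER = 1
    RED = 2


@dataclass
class Node:
    name: str
    status: RAG = RAG.GREEN  # only used for leaf components
    children: list["Node"] = field(default_factory=list)
    clustered: bool = False  # hypothetical flag: children are redundant cluster members


def rollup(node: Node) -> RAG:
    """Derive a node's RAG state from its children, recursively."""
    if not node.children:
        return node.status
    states = [rollup(child) for child in node.children]
    if node.clustered:
        if all(s == RAG.RED for s in states):
            return RAG.RED    # every member down: customer-facing outage
        if any(s != RAG.GREEN for s in states):
            return RAG.AMBER  # still serving, but resilience is compromised
        return RAG.GREEN
    return max(states)        # every child is required: worst state wins


# The same failed WebSphere server, with and without a protecting cluster.
sell_clustered = Node("Sell service", children=[
    Node("WebSphere cluster", clustered=True,
         children=[Node("was01"), Node("was02", status=RAG.RED)]),
    Node("Order database"),
])
sell_single = Node("Sell service", children=[
    Node("was02", status=RAG.RED),
    Node("Order database"),
])

print(rollup(sell_clustered).name)  # AMBER
print(rollup(sell_single).name)     # RED
```

Modelling the resilience explicitly is what lets the same server failure be reported as "protection lost" rather than "outage" when the design allows it.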
These Service Hierarchies are built from the Tube Maps, so they translate directly into a meaningful description of business impact. For example, a hardware failure might take down the sell service, which in turn impacts the call centers because they can no longer take orders. With this approach the Command Center begins to understand the impact of outages on the business and our customers, so it can respond faster and better determine which components are most important to recover.
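Answering "what does this failure hit?" is an upward walk through the hierarchy. A minimal sketch, assuming the Tube Map has been captured as a hypothetical `DEPENDS_ON` dictionary listing each element's direct dependencies; it deliberately ignores redundancy, so it lists everything that could be affected rather than what is certainly down.

```python
from collections import deque

# Hypothetical service hierarchy captured from a Tube Map:
# each element lists the elements it depends on directly.
DEPENDS_ON = {
    "Call center order taking": ["Sell service"],
    "Online ordering": ["Sell service"],
    "Sell service": ["Order application", "Customer database"],
    "Billing service": ["Customer database"],
    "Order application": ["App server host"],
    "Customer database": ["Database host"],
}


def impacted_by(failed: str, depends_on: dict[str, list[str]]) -> set[str]:
    """Return every element that depends, directly or indirectly, on `failed`."""
    # Invert the map so we can walk upwards from the failed component.
    dependents: dict[str, list[str]] = {}
    for element, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(element)

    # Breadth-first search over the inverted map.
    impacted: set[str] = set()
    queue = deque([failed])
    while queue:
        for element in dependents.get(queue.popleft(), []):
            if element not in impacted:
                impacted.add(element)
                queue.append(element)
    return impacted


print(sorted(impacted_by("App server host", DEPENDS_ON)))
# ['Call center order taking', 'Online ordering', 'Order application', 'Sell service']
```

A hardware fault at the bottom of the map is immediately expressed in business terms: the call centers and online customers can no longer order.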
Actually, these Customer Experience Views are incredibly useful and can be shared with customers. Helpdesk call volumes tumble when customers can see that there is a problem, what its impact is and that you already know about it. The tables are turned: we tell the customer about the problem, and we are the competent ones.
The second type, Synthetic Transactions, borrows an idea from end-to-end systems testing. Here we take the test harness that is applied to our applications before deployment into production and run it continuously against the live service to provide in-life testing. Synthetic transactions offer a unique view of the end-to-end service because they come as close as possible to the real Customer Experience that we need to achieve. HP have a set of tools called Business Availability Center that work well in this space. The objective is to create software robots that generate pseudo-transactions and monitor the progress of those transactions through the end-to-end service. Slow performance can be detected and highlighted in the same way as a complete failure. Synthetic transactions are highly flexible and produce a rich set of instrumentation output.
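At its core, a synthetic transaction robot is a loop that runs scripted steps in order, times each one and classifies the result. The Python sketch below assumes each step is a plain function (in practice it would drive a browser or call an API); the step names, the `slow_threshold_s` parameter and the OK/SLOW/FAILED labels are hypothetical and are not part of HP Business Availability Center.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class ProbeResult:
    step: str
    status: str  # "OK", "SLOW" or "FAILED"
    seconds: float


def run_synthetic_transaction(
    steps: list[tuple[str, Callable[[], None]]], slow_threshold_s: float
) -> list[ProbeResult]:
    """Run the scripted steps of one pseudo-transaction in order, timing each.

    A step that raises is FAILED and ends the transaction, since later steps
    depend on earlier ones; a step slower than the threshold is SLOW.
    """
    results: list[ProbeResult] = []
    for name, action in steps:
        start = time.perf_counter()
        try:
            action()
        except Exception:
            results.append(ProbeResult(name, "FAILED", time.perf_counter() - start))
            break
        elapsed = time.perf_counter() - start
        status = "SLOW" if elapsed > slow_threshold_s else "OK"
        results.append(ProbeResult(name, status, elapsed))
    return results


# Hypothetical stand-ins for the real customer journey.
def log_in() -> None:
    pass


def search_catalogue() -> None:
    time.sleep(0.2)  # simulate a sluggish back end


def place_order() -> None:
    raise ConnectionError("order service unreachable")


def check_confirmation() -> None:
    pass


results = run_synthetic_transaction(
    [("log in", log_in), ("search", search_catalogue),
     ("place order", place_order), ("confirm", check_confirmation)],
    slow_threshold_s=0.1,
)
for r in results:
    print(f"{r.step:<12} {r.status:<6} {r.seconds:.2f}s")
```

Here the probe reports the search step as SLOW and the order step as FAILED, and the confirmation step never runs, much as a real customer would give up at that point. Stopping at the first failure is a design choice: later steps usually depend on earlier ones, so carrying on would only add noise.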
Here is what HP say about the product set:
HP Business Availability Center helps your organization:
- Make ITSM incident and problem management processes more efficient and business aligned
- Measure business impact and risk from the end-user perspective
- Manage business and operational service levels proactively
- Accelerate problem isolation by automating standard operational processes
- Manage complex business transactions across heterogeneous environments
- Manage the complexity of composite applications and SOA
We focus on service by organizing around key services and mapping those services with instrumentation that follows the Tube Maps. We delight customers by caring about the same things they do: Service Protection and Service Recovery. Service Recovery works better because initial triage, where we work out where to focus our attention, is guided by knowing which issues are actually impacting the customer experience.
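Customer-impact-driven triage can be as simple as ordering the incident queue by what the customer is experiencing. A minimal sketch, assuming each incident has already been tagged with a hypothetical `customer_impact` level (2 for a customer-visible outage, 1 for lost resilience, 0 for no visible impact) and a rough count of customers affected; the incident references are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    ref: str
    component: str
    customer_impact: int     # hypothetical scale: 2 = outage, 1 = resilience lost, 0 = none visible
    customers_affected: int  # rough estimate


def triage_order(incidents: list[Incident]) -> list[Incident]:
    """Customer-visible outages first, then lost resilience, then the rest;
    within each band, the incident touching the most customers comes first."""
    return sorted(incidents, key=lambda i: (-i.customer_impact, -i.customers_affected))


queue = [
    Incident("INC-1", "backup server disk warning", 0, 0),
    Incident("INC-2", "one WebSphere cluster member down", 1, 0),
    Incident("INC-3", "order application down", 2, 1200),
    Incident("INC-4", "broadband provisioning failing", 2, 300),
]
print([i.ref for i in triage_order(queue)])
# ['INC-3', 'INC-4', 'INC-2', 'INC-1']
```

A noisy but invisible problem (INC-1) waits its turn, while the failures customers are actually feeling go straight to the top of the queue.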