One of the strangest things about IT people is that they often miss the obvious until it is explained to them (sometimes with the aid of a hammer). We are all so used to following the rules and doing what everyone else does. This particularly applies to IT Operations, also known as the Command Centre or the Operations Bridge. The traditional Command Centre has a very simple function: to “log and flog”, that is, to watch for traps coming from instrumentation on the managed estate and log them into the Incident Management workflow system.
It is done like this because highly skilled technicians hate that kind of work and are more expensive than typical operators. The skilled guys take the incidents from the workflow system and deal with them in order: fixing a piece of storage, restoring a server, fixing a network connection. Typically the work is organized into first-, second- and third-line support to protect the really good (third-line) guys from having to fix systems. The same good guys who design the systems that break! (Is there a lesson here?)
The operator's job in the command centre reads like a particularly low-skilled occupation, low status and low value. Guess what? It is, and more and more companies are outsourcing or off-shoring this activity. Actually, I believe it is just wrong-headed to do operations like this, because running a command centre that way is too slow and too expensive for complex distributed systems.
You see, the command centre or operations bridge concept was designed in the days of the mainframe, when there was lots of batch processing and lots of batch jobs to fix because the input data was wrong. There were lots of simple things that operators could do at low cost, 24×7. When a problem got too hard, it was escalated via the incident management workflow to the smarter guys. That worked just fine. Today systems are so complex and convoluted that this approach breaks down and becomes slow and cumbersome. If we look at trouble-to-resolve cycle times, we find that 85% of the time is taken up in handoffs as the incident is passed around resolver groups looking for a home.
Resolver groups are also a leftover from the past, when systems were simple and an incident could generally be resolved within one technical silo, because it was a mainframe issue or a network issue and it was obvious what the problem was. Today, determining the problem and allocating an incident to the right resolver group on the first attempt is a matter of chance. I have seen an average of three to five handoffs per incident treated as normal in today's IT Operations.
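To make the handoff figures concrete, here is a minimal Python sketch of how one might measure them from an incident's assignment history. The record layout (`Assignment`, `queued_at`, `picked_up_at`, `released_at`) is invented for illustration and does not come from any particular workflow tool. It also assumes that "handoff time" means time spent waiting in a resolver group's queue. The sample timestamps are made up, chosen so that the result mirrors the 85% figure quoted above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Assignment:
    """One stay of an incident with a resolver group (hypothetical record layout)."""
    group: str
    queued_at: datetime     # incident lands in the group's queue
    picked_up_at: datetime  # someone in the group starts working on it
    released_at: datetime   # group resolves it or passes it on


def handoff_count(history: list[Assignment]) -> int:
    """Number of times the incident moved from one group to another."""
    return max(len(history) - 1, 0)


def handoff_share(history: list[Assignment]) -> float:
    """Fraction of the trouble-to-resolve time spent waiting in queues."""
    if not history:
        return 0.0
    total = history[-1].released_at - history[0].queued_at
    if total <= timedelta(0):
        return 0.0
    waiting = sum((a.picked_up_at - a.queued_at for a in history), timedelta())
    return waiting / total


def minutes(n: int) -> timedelta:
    return timedelta(minutes=n)


# Made-up incident: opened at 09:00, bounced across three groups.
t0 = datetime(2024, 1, 1, 9, 0)
history = [
    Assignment("service desk", t0, t0 + minutes(10), t0 + minutes(20)),
    Assignment("network", t0 + minutes(20), t0 + minutes(190), t0 + minutes(210)),
    Assignment("storage", t0 + minutes(210), t0 + minutes(370), t0 + minutes(400)),
]

print(handoff_count(history))            # 2 handoffs
print(round(handoff_share(history), 2))  # 0.85 of the cycle spent waiting
```

Run against real workflow data, a measure like this shows whether the handoff problem described here applies to your own operation.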
The other thing that 20th Century Operations misses out is the Customer. Read any marketing book from the last 50 years and the first rule is to put the customer at the centre of your business; that is how high-growth, successful companies work. We must ask ourselves what the customer wants from IT Operations, and funnily enough it is very simple: they want IT Operations to protect and recover the services that we manage for them. So what do we do? Typically we don’t even understand the end-to-end service that the customer has been sold. We focus on fixing boxes and recovering networks without any clear understanding of how the customer is affected.
If you believe what I am saying, then it is clear that 20th Century IT Operations is fatally flawed and needs a complete rethink. Multiple levels of support don’t work, log and flog adds no value, and where is the Customer?
So this is where Customer Experience Management comes in. CEM is where IT Operations throws out the rule book and starts again, with a total focus on Customer Experience and Service. OK, so it sounds a bit trite and Business School speak, but I promise you it works. The boring picture above is of my old boss Al-noor Ramji, the CIO of BT, visiting the Customer Experience Management Centre (CEMC) in Sheffield, UK. He came to see what had happened to radically change incident handling, customer satisfaction and trouble-to-resolve cycle times. He came to see how BT fixed its Broadband Lead to Provision service, which led directly to BT’s massive growth in broadband subscribers. He came to see how we put customers at the centre of IT Operations.
So what is Customer Experience Management all about and how does one implement CEM in a real business? Actually it is quite simple:
- First we work out what services we want to support.
- Then we create tube maps of each end-to-end service (as experienced by the customer)
- Then we appoint a Service Operations Manager to own Service Protection and Recovery
- Then we organize around service:
  - By up-skilling the operators to understand the service
  - By ensuring that every part of our business that is involved in delivering the service is integrated
  - By co-locating the people with the right level of skill to fix service in the same room
- Then we build instrumentation that tells us if the customer experience of the service is good
- We put tools exploitation engineers in the CEMC to reduce Problem Management cycle time
- We also need to drive standardization of the IT Infrastructure (less urgent)
- We need to drive autonomics to improve compliance, build standards and cycle time
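The tube map and instrumentation steps in the list above can be sketched as a small data structure. This is only an illustration under stated assumptions. The class names (`Station`, `ServiceMap`), the station names and the component names are all hypothetical. They are not a description of how the CEMC actually modelled the Broadband Lead to Provision service.

```python
from dataclasses import dataclass, field


@dataclass
class Station:
    """One step of the service as the customer experiences it."""
    name: str
    depends_on: list[str] = field(default_factory=list)  # components behind this step


@dataclass
class ServiceMap:
    """A 'tube map': the ordered customer journey through one end-to-end service."""
    service: str
    stations: list[Station]

    def broken_stations(self, failing_components: set[str]) -> list[str]:
        """Customer-facing steps hit by the failing components, in journey order."""
        return [s.name for s in self.stations
                if failing_components & set(s.depends_on)]

    def customer_experience_ok(self, failing_components: set[str]) -> bool:
        """True when no step of the customer journey is affected."""
        return not self.broken_stations(failing_components)


# Hypothetical map; every name below is invented for the example.
lead_to_provision = ServiceMap(
    service="Broadband Lead to Provision",
    stations=[
        Station("Place order", ["web-portal", "order-db"]),
        Station("Check line", ["line-test-gateway"]),
        Station("Provision line", ["provisioning-engine", "order-db"]),
        Station("Activate service", ["activation-queue"]),
    ],
)

print(lead_to_provision.broken_stations({"order-db"}))
print(lead_to_provision.customer_experience_ok({"reporting-server"}))
```

The point of the structure is that a component alarm is translated into the customer-facing steps it breaks, so the operator sees the service and not just the box.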
By taking this approach we kill off the problems of 20th Century IT Operations. We reduce trouble-to-resolve cycle times by co-locating the necessary skills in one place and taking out the handoff time, which is where 85% of the cycle went. We focus on service by organizing around key services and mapping those services using instrumentation working along the tube maps. We delight customers by caring about the same things they do: Service Protection and Service Recovery. Service Recovery works better because initial triage, where we work out where to focus our attention, is enabled by understanding which issues are actually impacting customer experience.
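As a rough illustration of triage driven by customer impact, the Python sketch below ranks alerts by how many customers sit behind the affected services, and only then by technical severity. Everything in it is an assumption made for the example: the `Alert` record, the two lookup tables, the component names and the customer counts are invented. A real centre would derive them from its own tube maps and instrumentation.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    component: str
    severity: int  # technical severity from the monitoring tool, 1 = most severe


# Hypothetical lookups; in practice these would come from the tube maps.
SERVICES_BY_COMPONENT = {
    "order-db": ["broadband ordering"],
    "edge-router-7": ["broadband ordering", "email"],
    "test-lab-server": [],  # no customer-facing service depends on it
}
CUSTOMERS_BY_SERVICE = {
    "broadband ordering": 5000,
    "email": 1200,
}


def customer_impact(alert: Alert) -> int:
    """Total customers on the services that depend on the alerting component."""
    services = SERVICES_BY_COMPONENT.get(alert.component, [])
    return sum(CUSTOMERS_BY_SERVICE.get(s, 0) for s in services)


def triage(alerts: list[Alert]) -> list[Alert]:
    """Order alerts by customer impact first, technical severity second."""
    return sorted(alerts, key=lambda a: (-customer_impact(a), a.severity))


alerts = [
    Alert("test-lab-server", severity=1),
    Alert("order-db", severity=3),
    Alert("edge-router-7", severity=2),
]

for alert in triage(alerts):
    print(alert.component, customer_impact(alert))
```

Note that the severity-1 alert lands last here because no customer is affected by it, which is exactly the reordering that box-focused operations cannot do.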