The Hot Aisle
Fresh Thinking on IT Operations for 100,000 Industry Executives

One of the strangest things about IT people is that they often miss the obvious until it is explained to them (sometimes with the aid of a hammer). We are all so used to following the rules and doing what everyone else does. This particularly applies to IT Operations, also known as the Command Centre or the Operations Bridge. The traditional Command Centre has a very simple function: to “log and flog”, that is, to watch for traps coming from instrumentation on the managed estate and log them into the Incident Management workflow system.

It is done like this because highly skilled technicians hate that kind of work and are more expensive than typical operators. The skilled guys take the incidents from the workflow system and deal with them in order: fixing a piece of storage, restoring a server, fixing a network connection. Typically the work is organized into first, second and third line support to protect the really good (third line) guys from having to fix systems. These are the same good guys who design the systems that break! (Is there a lesson here?)

The command centre operator's job reads like a particularly low-skilled, low-status, low-value occupation. Guess what: it is, and more and more companies are outsourcing or off-shoring it. Actually, I believe it is just wrong-headed to run operations like this, because with complex distributed systems this kind of command centre is too slow and too expensive.

You see, the command centre (or operations bridge) concept was designed in the days of the mainframe: lots of batch processing, and lots of batch jobs to fix because the input data was wrong. Lots of simple things that operators could do at low cost, 24×7. When a problem got too hard, it was escalated via the incident management workflow to the smarter guys. That worked just fine. Today's systems are so complex and convoluted that this approach breaks down; it is slow and cumbersome. If we look at trouble-to-resolve cycle times, we find that 85% of the time is taken up by handoffs as the incident is passed around resolver groups looking for a home.

Resolver groups are also a leftover from the past, when systems were simple and a problem could generally be resolved within a technical silo, because it was obviously a mainframe issue or a network issue. Today, determining the problem and allocating an incident to the right resolver group on the first attempt is a matter of chance. In today's IT Operations I have seen an average of three to five handoffs per incident as typical.
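To make the handoff arithmetic concrete, here is a minimal Python sketch of how one might measure it from ticket history. It assumes a hypothetical model in which each stop on an incident's route records how long the ticket queued at a resolver group and how long that group actually worked on it. The `Assignment` type, the group names and the minute counts are all made up for illustration; they are not BT data.

```python
from typing import NamedTuple


class Assignment(NamedTuple):
    """One stop on an incident's route between resolver groups (hypothetical model)."""
    group: str
    queued_minutes: float   # time the ticket waited in this group's queue before pickup
    working_minutes: float  # time someone in this group actively worked on it


def handoff_summary(route: list[Assignment]) -> tuple[int, float]:
    """Return (number of handoffs, share of trouble-to-resolve time spent queued)."""
    handoffs = len(route) - 1  # every move to a new group after the first is a handoff
    queued = sum(a.queued_minutes for a in route)
    total = queued + sum(a.working_minutes for a in route)
    return handoffs, (queued / total if total else 0.0)


# Illustrative, invented route: the ticket is bounced around before the right group fixes it.
route = [
    Assignment("Service Desk", 10, 5),
    Assignment("Network", 90, 15),
    Assignment("Storage", 120, 10),
    Assignment("Application", 60, 30),
]

if __name__ == "__main__":
    handoffs, share = handoff_summary(route)
    print(f"{handoffs} handoffs, {share:.0%} of elapsed time spent in queues")
```

The sketch counts every minute a ticket waits at a new group as handoff cost. That is a simplification: some of that wait is ordinary queueing that would exist even with a single co-located team.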

The other thing that 20th Century Operations leaves out is the Customer. Read any marketing book from the last 50 years and the first rule is to put the customer at the centre of your business; that is how high-growth, successful companies work. We must ask ourselves what the customer wants from IT Operations, and funnily enough it is very simple: they want IT Operations to protect and recover the services that we manage for them. So what do we do? Typically we don’t even understand the end-to-end service that the customer has been sold. We focus on fixing boxes and recovering networks without any clear understanding of how the customer is affected.

If you believe what I am saying, then it is clear that 20th Century IT Operations is fatally flawed and needs a complete rethink. Multiple levels of support don’t work, log and flog adds no value, and where is the Customer?

BT Sheffield CEMC

So this is where Customer Experience Management comes in. CEM is where IT Operations throws out the rule book and starts again, with a total focus on Customer Experience and Service. OK, so it sounds a bit trite, like Business School speak, but I promise you it works. The boring picture above is of my old boss Al-noor Ramji, the CIO of BT, visiting the Customer Experience Management Centre in Sheffield, UK. He came to see what had happened to radically change incident handling, customer satisfaction and trouble-to-resolve cycle times. He came to see how BT fixed its Broadband Lead to Provision service, which led directly to BT’s massive growth in broadband subscribers. He came to see how we put customers at the centre of IT Operations.

So what is Customer Experience Management all about and how does one implement CEM in a real business? Actually it is quite simple:

  • First we work out what services we want to support.
  • Then we create tube maps of each end-to-end service (as experienced by the customer).
  • Then we appoint a Service Operations Manager to own Service Protection and Recovery.
  • Then we organize around service:
      1. By up-skilling the operators to understand the service
      2. By ensuring that every part of our business that is involved in delivering the service is integrated
      3. By co-locating the people with the right level of skill to fix the service in the same room
  • Then we build instrumentation that tells us whether the customer experience of the service is good.
  • We put tools exploitation engineers in the CEMC to reduce Problem Management cycle time.
  • We also need to drive standardization of the IT Infrastructure (less urgent).
  • We need to drive autonomics to improve compliance, build standards and cycle time.

By taking this approach we kill off the problems of 20th Century IT Operations. We reduce trouble-to-resolve cycle times by co-locating the necessary skills in one place and taking out the handoff time (an 85% reduction). We focus on service by organizing around key services and mapping those services, using instrumentation that works along the tube maps. We delight customers by caring about the same thing they do: Service Protection and Service Recovery. Service Recovery works better because initial triage, where we work out where to focus our attention, is guided by an understanding of which issues are actually impacting customer experience.
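The triage idea can be sketched in a few lines of Python. This is a hypothetical illustration, not the CEMC's actual tooling. It models each tube map as an ordered list of components (the "stations" a customer transaction passes through) and ranks incoming alerts by how many mapped customer services include the alerting component. The service and component names are invented; a real implementation would probably also weight services by customer volume or revenue.

```python
# Hypothetical tube maps: each end-to-end service is the ordered list of
# components a customer transaction passes through. All names are invented.
TUBE_MAPS: dict[str, list[str]] = {
    "Broadband Order": ["web-portal", "order-api", "crm-db", "provisioning", "network-inventory"],
    "Fault Reporting": ["web-portal", "fault-api", "crm-db"],
    "Billing Enquiry": ["ivr", "billing-api", "billing-db"],
}


def impacted_services(component: str, tube_maps: dict[str, list[str]]) -> list[str]:
    """Return the services whose tube map includes the alerting component."""
    return [service for service, stations in tube_maps.items() if component in stations]


def triage(alerts: list[str], tube_maps: dict[str, list[str]]) -> list[tuple[str, list[str]]]:
    """Order alerts so those touching the most customer services come first.

    Alerts on components that sit on no tube map sort last. sorted() is stable
    even with reverse=True, so alerts with equal impact keep their arrival order.
    """
    scored = [(alert, impacted_services(alert, tube_maps)) for alert in alerts]
    return sorted(scored, key=lambda pair: len(pair[1]), reverse=True)


if __name__ == "__main__":
    alerts = ["backup-server", "billing-db", "crm-db"]
    for component, services in triage(alerts, TUBE_MAPS):
        print(component, "->", services or "no mapped customer service")
```

Here the `crm-db` alert jumps to the front because it sits on two customer journeys, while the `backup-server` alert, which no tube map touches, drops to the back of the queue even though it arrived first.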

There Are 4 Responses So Far.

  1. Hi Steve,

    Firstly, welcome to the blogging world. I sincerely hope you keep up this sort of output. It's superb.

    One thing I wanted to share with you was the story of a helpdesk implementation we rolled out for a UK hosting company that was outsourcing the bulk of its operation to a contact center in Serbia.

    We conducted a Customer Experience audit and found that a significant number of people contacting the helpdesk were web developers and freelance technologists who were both reselling and acting as mavens for a critical new service (at the time one of the first hosted Exchange Services in the UK).

    We profiled them and found that they were both very technically knowledgeable and typically contacted the helpdesk only after expending considerable effort trying to solve their problems themselves.

    These vital customers hated getting script-reading support agents on the phone. They were being forced to describe a complex problem to people who did not understand the issues and whose main job was to register the issue and add it to the list of issues to be escalated “up” to the sort of people who were the developers’ technical peers.

    Our solution to this proved extremely successful. We simply inverted the classic support pyramid. Instead of having our least knowledgeable people in the first line, we put our system administrators on the phones. They triaged incoming issues, passing down low-level support tasks (like mailbox configuration) to juniors whilst giving immediate attention to serious issues.

    Where necessary, the issue was passed to the NOC engineering team, but it had been captured and described by an administrator and, most importantly, the customer knew they were dealing with highly skilled and knowledgeable “agents” who immediately understood their issues as peers. We also delighted customers by caring about the same thing they did and proving it by providing them with technical equals to support them.

    Of course, part of the reason we were able to do this was that this was an outsourced implementation. We were able to hire truly brilliant Exchange administrators in Serbia for the price of a trainee secretary in the UK. I believe that this is one of the main customer experience benefits of correctly implemented outsourcing: Operational Excellence delivered live. Customers love it.

  2. […] reasons that we monitor and manage IT in technical silos earlier in the article on this Website:- Why do IT Operations suck? It’s worth a read […]

  3. […] This post was mentioned on Twitter by K Kopec, Israel Mendoza, Steve O'Donnell, Caleb Bontrager, Preston de Guise and others. Preston de Guise said: RT @stephenodonnell [Blog] Why do IT Operations Suck? <– Great analysis of ops issues! […]

  4. […] Why do IT operations suck? An insightful article by Steve O’Donnell. Steve asks why our staff who have primary involvement with systems 24×7 (operators) are often the least skilled, least trained and least paid. (As a consultant, I’ve frequently experienced companies who consider it a waste of time to properly train operators, and as a result their systems usually suffer for it.) […]
