Throughout my 20-plus-year career in IT consulting, I have noticed that the most successful businesses often have something in common: they run their IT like a business, treating it as a key part of the business rather than an add-on function. I have been in Qatar for only just over a month, but I have already seen encouraging signs that businesses here are starting to recognize that IT can be a key enabler for achieving strategic objectives. The next step for many companies in Qatar is to run their IT like a business, with the same deliverables, service levels and outcomes that are expected from any other part of the business.
Business decisions are driven by three constraints: affordability, risk and time to deliver. Generally, suppliers to business leverage this understanding to deliver something that their customer needs yet is constrained from doing themselves. For example, if you need a 24x7 security patrol for your company premises, you can either employ and manage a team of security officers or outsource the service to a professional security firm.
The outsourced security firm can be an effective business choice because it reduces risk, and you can determine what you need by defining a service level (such as a full perimeter patrol every two hours and a check that all doors and windows are secure at 7PM) rather than working out how to employ and manage a security team yourself. Service levels are the key business driver here because they force us to think about the problem before we think about the solution.
Much the same is true of Information Technology. Companies can choose to think of IT as hardware, software and people that somehow come together to help the business execute or, alternatively, as a set of underpinning business services with service level agreements and requirements. Companies that start off thinking about the problem – what they are trying to deliver – generally do a better job than those who leave it all to chance.
By thinking of IT as a set of services that underpin your core business processes (selling cars or homes, banking, insurance, liquefying gas) you can start aligning your business and IT requirements and make significantly better investment decisions. Research shows that the most successful and profitable businesses have mature business processes underpinned by mature IT processes. No surprise then that here in Qatar, IT Infrastructure Library (ITIL) training is extremely popular as fast growing businesses look to grow their IT and business maturity.
The basis of ITIL is that IT becomes a set of services delivered as standard processes, with service level agreements, in a structured and repeatable way. Businesses are looking to make IT repeatable, standard and reliable, with defined costs and reduced risk.
So in the same way that security, cleaning, and facilities management have long been recognized as being suitable for outsourcing as a managed service, many parts of IT delivery are equally suitable. Managed storage, managed network, managed email and managed data center services are common across the world. These reflect the IT outsourcers’ ability to build repeatable capability at low cost by leveraging scale and investment in process and technology.
The characteristics of a service that is suitable for outsourcing are:
- Definable by a service level
- Requirement to scale up and down depending on demand
- Benefits from delivery by a mature specialist organization with defined processes
- Benefits from volumes of scale above your own requirements
Reliable IT delivery is becoming business critical with outages often meaning that customers take their business elsewhere or employees cannot work. IT outages cost money and damage brand reputation so careful management and delivery of IT is critical. Service levels align business needs to IT delivery ensuring that the right levels of service design and service operation are put in place to avoid problems.
For businesses to truly reap the advantages that IT can provide, there needs to be this focus on service levels, outcomes and deliverables. Running IT like a business will enable IT to help businesses prosper and grow.
I don’t normally plug press releases straight from vendors but today I received an email from Emily Wood at Google with a message that I agree 100% with. Cooling data centers is not just about refrigeration – there are lots of options – many of which we have written about here on The Hot Aisle – Fresh Air cooling, Liquid Cooling, Spray cooling, and others we haven’t even thought about yet (there are tons of smart engineers out there doing great work).
I guess it is unsurprising that ASHRAE, the American Society of Heating, Refrigerating and Air-Conditioning Engineers, writes standards that are about refrigeration – after all, turkeys don’t vote for Thanksgiving.
Here is the article in its entirety:
Setting efficiency goals for data centers
For the past decade, we have been working to make our data centers as efficient as possible; we now use less than half the energy to run our data centers than the industry average. In the open letter below, I am very happy to welcome a group of industry leaders who collectively represent most of the world’s most advanced data center operators. -Urs Hoelzle, SVP, Operations and Google Fellow
Recently, the American Society of Heating, Refrigerating and Air-Conditioning Engineers (ASHRAE) added data centers to their building efficiency standard, ASHRAE Standard 90.1. This standard defines the energy efficiency for most types of buildings in America and is often incorporated into building codes across the country.
Data centers are among the fastest-growing users of energy, according to an EPA report, and most data centers have historically been designed and operated without regard to energy efficiency (for details, see this 2009 EPA Energy Star survey). Thus, setting efficiency standards for data centers is important, and we welcome this step.
We believe that for data centers, where the energy used to perform a function (e.g., cooling) is easily measured, efficiency standards should be performance-based, not prescriptive. In other words, the standard should set the required efficiency without prescribing the specific technologies to accomplish that goal. That’s how many efficiency standards work; for example, fuel efficiency standards for cars specify how much gas a car can consume per mile of driving but not what engine to use. A performance-based standard for data centers can achieve the desired energy saving results while still enabling our industry to innovate and find new ways to improve our products.
Unfortunately, the proposed ASHRAE standard is far too prescriptive. Instead of setting a required level of efficiency for the cooling system as a whole, the standard dictates which types of cooling methods must be used. For example, the standard requires data centers to use economizers — systems that use ambient air for cooling. In many cases, economizers are a great way to cool a data center (in fact, many of our companies’ data centers use them extensively), but simply requiring their use doesn’t guarantee an efficient system, and they may not be the best choice. Future cooling methods may achieve the same or better results without the use of economizers altogether. An efficiency standard should not prohibit such innovation.
Thus, we believe that an overall data center-level cooling system efficiency standard needs to replace the proposed prescriptive approach to allow data center innovation to continue. The standard should set an aggressive target for the maximum amount of energy used by a data center for overhead functions like cooling. In fact, a similar approach is already being adopted in the industry. In a recent statement, data center industry leaders agreed that Power Usage Effectiveness (PUE) is the preferred metric for measuring data center efficiency. And the EPA Energy Star program already uses this method for data centers. As leaders in the data center industry, we are committed to aggressive energy efficiency improvements, but we need standards that let us continue to innovate while meeting (and, hopefully, exceeding) a baseline efficiency requirement set by the ASHRAE standard.
Chris Crosby, Senior Vice President, Digital Realty Trust
Hossein Fateh, President and Chief Executive Officer, Dupont Fabros Technology
James Hamilton, Vice President and Distinguished Engineer, Amazon
Urs Hoelzle, Senior Vice President, Operations and Google Fellow, Google
Mike Manos, Vice President, Service Operations, Nokia
Kevin Timmons, General Manager, Datacenter Services, Microsoft
I have spent the last 12 months working with some of the smartest and best funded marketing people on the planet – the really big, household-name IT infrastructure vendors. I learned lots: lots about marketing, lots about honest analysis and lots about human nature. I learned that marketing isn’t that tough to do, but it can be terribly hard to do well and be constantly effective.
Boiled down to its basics, marketing is about understanding a business problem really well whilst, at the exact same time, being paranoid that you are completely wrong about that business problem in every respect. After you get the paranoia right, everything else is just process. Sure, there is a ton of creative work that needs to get done to do it well, but if you have smart people and a pile of cash that is never a problem.
So step one in our marketing 101 lesson is understanding the problem. Marketing people sometimes call this Getting to the Insight, and you have to get it right. Get it wrong and everything else you do is likely to be completely useless and may even be counterproductive.
Once you get the Insight you need to work out who has the problem. Is it SMEs, Enterprises, Startups, the Medical Profession? Marketing people call that the Segment and the process is called segmentation.
Now we have the Insight and the Segment, we need to work out a set of solutions that solve the problem for each segment. Notice I said solutions, not solution. We need solutions because each segment may need a different solution to the same problem.
We are almost there. Now we need a set of Messages that explain and position the Solutions to the Segments we identified earlier.
Insight -> Segment -> Solutions -> Messages
So this is what I have been doing at ESG: helping marketing departments understand the problem, that is, get to the Insight that matters and that will make the vendor successful. I have also been trying my hardest to disprove insights, often with help from research and polls. Generally, as long as you can’t disprove an insight, it’s OK. Strangely enough, it is usually almost impossible to prove insights, because no one ever has complete market visibility.
These insights need to be backed up with research and customer validation; we need to be paranoid. Insights are nebulous and time bound. What might have been an amazing insight at one time won’t stay that way – once the problem is solved for the relevant segments, the insight goes away. Challenging the insights that vendors rely on to make major product investments is important, and it needs to be done constantly. IT insights can decay or morph very quickly – much more quickly than fast-moving consumer goods, though not quite as quickly as fashion and apparel.
Understanding what customers think today about IT products and services is crucial, but even more important is being right about how they will be thinking next quarter, next year, next decade. If we understand how insights change over time, we can adjust our segmentation, alter our solutions and correct our messages. If we fail to foresee the changes, we fail to correct our messages, and we get a misalignment between customer brand perception and marketing department messaging.
The IT business moves so fast that this is often a fatal mistake.
Picture Copyright (c) toothpastefordinner.com
The problem with computing is everyone wants to make it uniform – fit it into a neat box, categorise it as ‘all the same’, make it autonomic, self-managing and move on. In fact, IT is anything but uniform, so these simplistic approaches fall at the first hurdle.
Smart CIOs understand applications need to be treated differently depending on their value to the business. There are four key types – invest, operate, contain and kill.
‘Invest’ applications generally make up around 10-15% of the full estate. These are the applications the CEO knows about – the ones that when they get better, faster or more functional have a direct impact on business value. Examples are key CRM systems or MRP platforms, applications that underpin vital business processes and touch customers. These applications are not terribly cost sensitive, so when CTOs look to virtualise to take out cost, CIOs resist. Virtualisation is useful for invest applications but only if it improves agility, speed of deployment, adds functionality or reduces risk.
‘Operate’ applications represent 40-80% of the estate. No matter how much better we make them, they don’t improve business performance. Examples might be email or document management, internal HR systems or archiving systems. They need to be reliable and cheap. Virtualisation works here as a method of taking out costs. So does outsourcing and software as a service (SaaS) delivery.
‘Contain’ applications are those we wish we didn’t have – old stuff that’s expensive to run and difficult to change or manage. We get the amount of these we deserve: under-invest and the category grows. They have one other characteristic: they are difficult to re-platform and change to ‘operate’ status. Although they’re important to the organisation, they don’t typically make the business run any better if we improve them. We just want them to run silently for as long and as cheaply as possible. ‘Operate’ applications that have not had proper investment, love and attention will eventually move into this category.
‘Kill’ applications are always a nightmare. These are the ones that are impossibly expensive to run and maintain. By definition, they only represent a tiny handful of the estate (perhaps 1-5%). They are impossibly difficult to change. Often the guys who wrote and maintained the code are retired (or dead). No one else you know still has the hardware, except the Natural History Museum, and the vendor no longer supports the operating system. These might have been ‘contain’ applications that just wouldn’t stay contained, or ‘invest’ applications where you didn’t invest (silly you). There is only one thing to do with a ‘kill’ application – bin it. You know it’ll cause pain and disruption, as well as costing a lot of money, but it has to be done.
Smart CIOs know this already and take a pragmatic approach to their applications, understanding instinctively where to spend money and where to bleed a previous investment. And really smart CIOs never reach the point where they need a kill category.
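The decision logic above is simple enough to sketch in code. Here is a minimal, hypothetical Python classifier for an application estate – the attribute names and the decision rules are my own illustrative assumptions, not a formal method:

```python
# Illustrative sketch of the invest/operate/contain/kill portfolio model.
# The attributes below (drives_business_value, changeable, run_cost) are
# invented for illustration; a real portfolio review uses richer criteria.

def classify(app):
    """Return one of 'invest', 'operate', 'contain', 'kill'."""
    # Kill: impossibly expensive AND impossible to change.
    if app["run_cost"] == "extreme" and not app["changeable"]:
        return "kill"
    # Invest: improving it directly improves business performance.
    if app["drives_business_value"]:
        return "invest"
    # Operate: commodity services that just need to be cheap and reliable.
    if app["changeable"]:
        return "operate"
    # Contain: everything left -- run it silently, as cheaply as possible.
    return "contain"

estate = [
    {"name": "CRM", "drives_business_value": True, "changeable": True, "run_cost": "normal"},
    {"name": "Email", "drives_business_value": False, "changeable": True, "run_cost": "normal"},
    {"name": "Legacy billing", "drives_business_value": False, "changeable": False, "run_cost": "high"},
    {"name": "COBOL ledger", "drives_business_value": False, "changeable": False, "run_cost": "extreme"},
]

for app in estate:
    print(app["name"], "->", classify(app))
```

The ordering of the rules matters: the kill test comes first because, as noted above, a kill application stays a kill application no matter how valuable it once was.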
I wanted to share with you my latest news. A little over a year ago Steve Duplessie and I created ESG EMEA to help reach out and serve new European clients as well as provide local support for many of our ESG US based relationships. We have been immensely successful in building this business, working with clients at senior levels to help them better understand the market and target their products and services into the right segments armed with the right message and solutions. Late last year, ESG EMEA and I had the most unexpected honor of being named as one of the top 50 most influential IT analysts in the world.
I have been offered a fantastic opportunity to lead an enormous IT undertaking with the Qatar Foundation in Doha to create a visionary 21st Century cloud IT operation for the Middle East and North Africa. I will be leading that effort directly, and as a result I will be unavailable to take briefings or to provide direct services to clients for the foreseeable future. Please continue to leverage the expertise and talents of the other fine ESG analysts in this regard.
I will continue to keep you updated on our progress, challenges, and thoughts on data centre and IT operations via the Hot Aisle and on the ESG site.
There are a lot of them around, data centres. A few of them are designed and operated very well and deliver great Power Usage Effectiveness (PUE). Some could do a bit better – perhaps an airside economiser or two, some hot or cold aisle containment, or maybe some DC power. Some are just a nightmare and could benefit from the administration of a wrecking ball. For some data centres, it seems that no amount of fixing them up, improving plant and applying best practice will make any measurable difference. Let’s call them clunker data centres! (Maybe we can get the Government to do a cash-for-clunkers program for data centres?)
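For readers new to the metric, PUE is just the ratio of total facility power to the power that actually reaches the IT equipment. A quick sketch, with sample figures invented purely for illustration:

```python
# Power Usage Effectiveness: total facility power divided by IT equipment
# power. A perfect facility scores 1.0; everything above that is overhead
# (cooling, power distribution losses, lighting).

def pue(total_facility_kw, it_equipment_kw):
    """PUE = total facility power / IT equipment power."""
    return total_facility_kw / it_equipment_kw

# A clunker: as much power goes on cooling and losses as on the servers.
print(pue(2000, 1000))  # -> 2.0

# A well-run, economised facility (illustrative figure).
print(pue(1150, 1000))  # -> 1.15
```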
A clunker starts off with a ceiling height that is too low for hot air to separate out and migrate towards the CRAC units without too much mixing. The plenum under the raised floor is shallow and clogged with cables and other detritus, choking off airflow from the CRACs. The floor tiles are perforated but have low airflow characteristics. The cabinets are all lined up like a schoolroom, front to back to front to back…. You could cook turkeys in the back row. The CRAC units are low capacity, and that capacity is exhausted. Naturally, the boss wants you to install some 10kW racks in a hurry for a critical business project.
What can you do? Say “no way”? Offer a co-location option in a commercial facility as an option? Start looking for a new job?
I bumped into a possible solution a few days back on Twitter when I connected with Mary Hecht-Kissell (@PR_Strategies), who looks after Coolcentric. The problem set defined above, the one that makes a clunker data centre, is all about getting enough cold air into servers to remove the excess heat. Every element in the clunker conspires to make delivering more cold air virtually impossible. That’s where the Coolcentric solution makes a difference. It delivers cold water right up against the servers, adding cooling capacity that enables that set of additional 10kW (or more) racks to be installed in a data centre that seemed like a lost cause. It’s a fairly simple piece of technology that has been well engineered to retrofit to most types of existing cabinets. It’s a water cooled door.
The water cooled door is fitted onto the back of the rack so that the hot air exhausting out of the cabinet gets chilled immediately and very efficiently. Liquids are about 4000 times more efficient at removing heat from a server than air, so these water cooled doors can remove significantly more heat with very low pumping energy.
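As a rough sanity check on that multiple: compare the volumetric heat capacity of water and air. The exact figure quoted varies with the assumptions (temperature, pressure, whether you count pumping versus fan energy), but a back-of-the-envelope calculation with approximate textbook property values lands in the same few-thousand-fold ballpark:

```python
# Back-of-the-envelope comparison: heat carried away per cubic metre of
# coolant per kelvin of temperature rise. Property values are approximate
# (sea-level air, water at room temperature).

def volumetric_heat_capacity(density_kg_m3, specific_heat_j_kg_k):
    # J/(m^3 * K): heat absorbed per cubic metre per kelvin.
    return density_kg_m3 * specific_heat_j_kg_k

water = volumetric_heat_capacity(1000.0, 4186.0)  # ~4.19 MJ/(m^3*K)
air = volumetric_heat_capacity(1.2, 1005.0)       # ~1.2 kJ/(m^3*K)

print(f"water carries ~{water / air:,.0f}x more heat per unit volume than air")
```

That is why the pumping energy for a water cooled door is so low: moving a litre of water does the cooling work of several cubic metres of air.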
One smart way to think about it is that the water cooled door acts like a mini, contained hot aisle for environments (like our clunker data centre) where cabinet alignment, roof height and plenum problems make hot aisle containment impossible.
Sounds like a pretty decent alternative to Semtex!
Cloudera struck lucky in getting a $5M A-round away just before the markets shut down in response to the collapse of the global financial system. Backed by Accel Partners and more recently Greylock Partners they are making a bet that Hadoop with a smart scale out approach to managing large amounts of data is a winning strategy.
Mike Olson, Cloudera’s CEO, is an industry veteran, having been through the usual form, build and sell cycle a number of times – most notably with Illustra into Informix and Sleepycat Software into Oracle. During the three quarters between the A round and a subsequent B round, Mike has been able to build a credible and valuable company that adds significantly to the Apache Hadoop distribution, without funded competition.
In that time Cloudera has built a 30-person firm, created the Cloudera Distribution for Hadoop, built a support and professional services capability and created a training and certification business. Mike also made the very smart move of recruiting Doug Cutting, the original author of Hadoop, out of Yahoo.
Hadoop’s initial use cases have been in managing and processing Internet-scale web data at Yahoo and Facebook, but it is now seeing significant interest in other markets where processing large-scale data quickly is a competitive advantage. These include financial services, government security, credit card fraud detection, genomics, digital media (3D) and national-scale telecommunications firms.
Mike sees the development of Hadoop as a platform as the key area for now, with enhancements to add enterprise level management capabilities, “to avoid the need to have a team of Stanford graduates to hand crank the system” he told me. To address just this issue, they have developed the Cloudera Desktop that enables the centralised management of internal and public Hadoop clusters.
Later, Mike expects to see ISV’s delivering enterprise solutions based on a Hadoop platform that enable hitherto impossible feats of analytic prowess. Early enterprise adopters in this space are likely to be able to leapfrog the competition with smarter decisions and more insightful products and services that serve customers better.
Nice guy, smart company, killer product.
Recently I spoke to David Emery, a friend and colleague from my time at Coopers & Lybrand. He is now working on a major social media initiative for a global mobile telco. I was interested in David’s perspective because he has been working on a set of solutions to process log files at enormous scale. You might think this is a somewhat trivial use case, but many modern business processes at scale generate impossibly large quantities of data that need to be turned into information.
David and his colleagues have been using a number of open source components to attack the problem that scale-up won’t scale far enough, leveraging cheap compute and storage plus smart software and algorithms to deliver a solution. I think David makes a number of important points that vendors would be well advised to heed:
- Massive Internet scale problems are now solvable and enterprises want to mine the data to generate business information
- The value of the whole solution is enormous but the sheer scale can make it unaffordable
- Open source software and scale out commodity hardware are one possible solution to scale and affordability
- Smart techniques or approaches like Hadoop and MapReduce are now becoming commonly used tools
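For readers who haven’t met MapReduce, the pattern behind Hadoop can be sketched in plain Python on a handful of fake syslog-style lines (the log format here is invented for illustration). Hadoop distributes exactly this map → shuffle → reduce flow across many machines:

```python
# MapReduce in miniature: count log lines by severity.
from collections import defaultdict

log_lines = [
    "2010-03-01 web01 ERROR timeout",
    "2010-03-01 web02 INFO request ok",
    "2010-03-01 web01 ERROR timeout",
    "2010-03-02 web03 WARN slow query",
]

def map_phase(line):
    # Map: emit a (severity, 1) pair for each log line.
    severity = line.split()[2]
    yield (severity, 1)

# Shuffle: group the intermediate pairs by key.
groups = defaultdict(list)
for line in log_lines:
    for key, value in map_phase(line):
        groups[key].append(value)

# Reduce: sum the counts for each severity.
counts = {key: sum(values) for key, values in groups.items()}
print(counts)  # {'ERROR': 2, 'INFO': 1, 'WARN': 1}
```

The point of the pattern is that the map and reduce steps are independent per line and per key, so they parallelise trivially across cheap commodity nodes – which is exactly the scale-out argument David makes below.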
Here is David’s story:
“Demand for storage capacity continues unabated, rising along an exponential growth curve (Kryder’s Law) that has challenged vendors to squeeze more bang per buck into SAN, NAS and a whole array (pun intended) of predominantly vertically scaled enterprise-class storage solutions.
Improvements and innovations over the years – in the form of cramming more bits per inch onto a hard disk (magnetic bit density), RAID configurations and fibre optic connectivity – have given us ever faster, larger and more resilient storage solutions that we quickly fill and consume. This demand is unlikely to diminish as applications and datasets become ever more enormous and sophisticated.
It’s not only supercomputing applications driving huge demand. Whilst the data generated by the Large Hadron Collider may be an extreme example (it currently generates 2GB per 10 seconds of use), there are many less esoteric applications demanding huge volumes of storage: think genome, DNA and RNA analysis, pharmaceutical research, financial modelling, Internet search, email and Web 2.0 social networking sites.
The latter examples seem less obvious until you consider the sheer number of users: Facebook recently surpassed 400m customer accounts. It’s no surprise, then, that the leading internet companies have taken a different approach to meeting storage demand rather than relying solely on the bottom-up vertical approach of the traditional storage vendors.
Google and Yahoo have been key players in the development of distributed storage and analysis efforts (where there is data there is a demand to analyse and report on that data) that have yielded amongst others Hadoop, MapReduce, HDFS (Hadoop Distributed File System ) and GFS (Google File System).
In the massive scale out architectures required to drive the Google Search and Facebook web applications of the world, horizontal scale-out is king. Tiered architectures remain valid, but they are increasingly underpinned by free open source software.
It’s not only web start-ups grown from small beginnings into large corporates that have embraced the free software stack (Apache, Linux, Squid, MySQL, Perl, Python, Nagios etc.) to support expansion whilst avoiding crippling licensing costs; small and large enterprises alike have joined the bandwagon as many of the barriers to entry have become irrelevant.
Product stability, maturity, widespread adoption and readily available support have mitigated many of the perceived risks. The architecture scales, the software works, and it can all be built on a foundation of cheap commodity servers. Virtualization and cloud computing have only reinforced this trend, and infrastructure is increasingly provided as a service (IaaS), where the bare metal platform is entirely abstracted and increasingly irrelevant.
The distributed architecture and horizontal scale-out approach is now beginning to shake up the storage and database tiers and, therefore, the storage marketplace. Customers want massive capacity, reliability and good performance, but they also want to avoid vendor lock-in and large upfront investment costs. They also want more effective ways to process such huge volumes of data.
Distributed file systems and distributed compute processing make all of this possible. An emerging sector with players such as GlusterFS, Lustre and Ibrix has grown, and the traditional storage vendors are shoring up their product ranges with similar solutions. HP bought Ibrix, whilst Gluster is going down the monetized-services open source route.
Logfile collection and processing provide a highly relevant, if more mundane, example of how these building blocks can be pulled together to form an innovative and cost effective solution that grows as customer demand increases. In an infrastructure supporting a web-based service with just under two million users, I’ve recently seen systems generate over 100GB of log file data per day.
Historically, collecting and storing such data is often overlooked or poorly implemented, if done at all. It is often seen as a costly process of limited use (typically because the value in the data is widely spread out and cannot easily be retrieved in a meaningful way) and ultimately becomes little more than a burdensome retention and compliance requirement for many organisations.
Much of the data that is kept ends up on tape gathering dust. How can a customer expect to grow their service from two million users, to five and then twenty and beyond without crippling storage costs, let alone handle such large volumes of log file data and do something useful with it?
A storage platform fronted by a Distributed File System provides one possible answer. The DFS can be built upon multiple nodes running on cheap commodity hardware. More nodes can be added as required, the underlying hardware can be changed, and the cluster can comprise many different nodes running on different platforms. The DFS provides the clustering, reliability and scale-out storage architecture under a single namespace, accessible via any number of standard protocols, e.g. CIFS, NFS, HTTP, iSCSI. What’s more, a multiple-node system provides readily available processing power, suitable for MapReduce-type applications. Of course, an alternative is to stick with large-scale vendor-specific storage platforms, where cost is reduced through economies of scale and risk is somewhat mitigated, at the expense of lock-in.
A similar DFS approach has been successfully implemented by MailTrust (Rackspace’s mail division) to capture, collate and process huge volumes of daily log files using Syslog, Hadoop and MySQL. This may be ‘just’ log files, but the power of the data can be harnessed to improve support operations and identify trends.
Of course this is possible with traditional tools and storage, but the key here is scale and affordability. I’ve recently seen other companies looking to build similar distributed storage platforms that will also form the backbone of a private storage cloud, fronted by Eucalyptus software. Again, the whole architecture can be composed of open source software running on cheap commodity hardware.
It is software and open standards that are increasingly enabling organisations to build massive internet web services requiring massive storage. The database and storage layers remain the last vertical bottleneck, but this is changing. SAN and NAS technology will not disappear – consumption will probably continue to grow (in line with Kryder’s Law) – but DFS and greater flexibility are here to stay.
The success of companies such as Gluster and the wider adoption of HDFS and Google FS will remain key to how many customers, and by how much, move from hardware-specific storage platforms provided by the likes of HP, IBM and NetApp to more open-standards-based solutions not requiring proprietary hardware. The same vendors will be providing much of the commodity storage anyway, but it’ll make interesting viewing watching the larger vendors respond.”
A while back I met Kathrin Winkler, Chief Sustainability Officer at EMC. She was delivering a briefing about EMC’s Corporate Social Responsibility (CSR) activities to a group of industry analysts. Most CSR briefings are as dull as ditchwater and devoid of anything remotely innovative or challenging. CSR is for some just going through the motions rather than an integral part of the brand, culture and values of a company. CSR needs to enhance brand equity or else it becomes an irrelevance that has no place at the boardroom table.
Kathrin broke the mould presenting a structured program of activities that crosses every part of EMC from sourcing, manufacturing, logistics through to disposal of equipment. Kathrin demonstrated to me that CSR is deeply embedded into EMC’s DNA, part of every business process, integrated into EMC’s brand and sponsored at board level.
It is no surprise, then, to learn that Kathrin was today (Tuesday 23rd February 2010) asked to present to Senator John Kerry’s (D-Mass.) US Senate Commerce Subcommittee on Communications, Technology, and the Internet on the relationship between energy efficiency and technological innovation.
The hearing explored how expanding broadband, strengthening smart grid technologies, and improving consumer understanding of their energy usage can lead to dramatic energy savings and reductions in greenhouse gas emissions. It also addressed how firms in the information and communications sectors are driving change and how government, as consumer and regulator, can help drive incentives to innovate.
Here are the insights that Kathrin put to the hearing as actions that Congress can take to help reduce the impact of ICT on the environment:
1. Demand the Federal Government lead by example to drive energy-efficiency throughout its ICT enterprise by aggressively pursuing virtualization, and ICT/data center consolidation. Congress, through its various Committees, has oversight responsibility for the largest ICT infrastructure in the world; the President’s FY 2011 budget requests $79.3 Billion for information technology. OMB included in the FY 2011 budget a plan to drive ICT consolidation: “OMB will work with agencies to develop a Government-wide strategy and agency plans to reduce the number and cost of Federal data centers. This will reduce energy consumption, space usage and environmental impacts, while increasing the utilization and efficiency of IT assets…” Congress should request and review these strategic plans as part of the annual appropriation process and provide the resources necessary to accelerate OMB’s ICT consolidation plans.
2. Bridge split financial incentives in federal data centers – In many government data centers, those responsible for purchasing and operating the ICT equipment report to the CIO, while those responsible for the power and cooling infrastructure typically pay the utility bills. This leads to a split incentive, in which those who are most able to control the energy use of the ICT equipment (and therefore the data center) have little incentive to do so, or even insight into their own usage. This could be remedied by Congress requiring that agency CIOs report on data center energy consumption and provide a baseline to Congress for future comparison.
3. Continue investment in cloud computing and next-generation ICT research at NIST – Government has become an early adopter of cloud computing. As with the deployment of other promising technologies like smart grid and electronic health records, cloud computing will not be fully realized without open interoperability, data portability, and security standards. Congress should fully fund NIST’s Cloud Computing Standards Effort.
4. Collaborate with industry to promote the development of measurement tools for government and private sector data center operators – Industry continues to struggle to develop acceptable models to measure data center efficiency. Without reliable efficiency methodologies on which to base rebate programs, it is difficult and expensive for utilities to conduct tests themselves and many simply forego rebate programs. With an estimated 1,200 regulated utility service areas in the United States, there is tremendous potential for replication of successful programs. With Energy Efficiency Resource Standards mandates in more than 19 states, Congress should assist in providing useful measurement tools for the state PUCs to incentivize energy conservation in data centers.
Kathrin is 100% right: the key is to end the artificial economics of Government that hide the cost of power from the cost of IT, and replace them with realistic economics based on the full end-to-end lifetime cost, including operation and disposal. Private business could learn a lot from the same approach and put an end to the crazy policy where facilities pay the utility bill and IT buys the equipment.
Data Centre design is an evolutionary process and we can see the first signs of significant change in the latest sites. Co-generation, liquid cooling, cloud computing, high density are all likely to feature in the 2020 Data Centre. How are you placed with your existing Data Centre investments to take advantage of these changes? Will 20th Century Data Centres have to close because they just can’t deliver the level of efficiency that government legislation and economics demands?
Join Steve O’Donnell for a live data centre summit on the 21st April 2010 at 13:00 GMT, 8:00 EST.
Most people who read this won’t have a clue what a Hollerith punched card is. I only just caught the end of the era at University where I learned to program in FORTRAN coding one punched card at a time. Once the stack of cards was complete, I delivered it to the computer operator for scheduling and execution.
Jobs were scheduled one at a time because that is how the primitive Burroughs scheduler and operating system was designed. Running more than one program at a time was still a pipe dream in those days, so hardware engineers focused on making programs run faster by scaling up the hardware: faster processors, faster I/O, more main memory, and more powerful instruction sets that did more in fewer clock cycles.
This propensity to scale up, to make computers more and more powerful and I/O faster and faster, has been at the center of the whole industry for decades, an arms race for more clock cycles. In fact the term Moore’s Law, named after Intel co-founder Gordon Moore, describes the rapid and continuous improvement in processor performance we have seen over the last 40 years.
The same is true for networking. Token Ring networks ran at 4Mb/s and Ethernet at 10Mb/s in the early days of the LAN; now 10Gb/s is the norm for new installations, a three orders of magnitude improvement in 20 years.
Storage systems have also shown massive performance improvements, with systems like Oracle’s Exadata offering 1M IOPS performance levels. Database technology has also seen massive performance improvements, driven in part by smart data design and great database technology. The performance levels we see today in these scale-up systems were unimaginable only a few decades ago.
I remember, in 1975, Donald Michie, Professor of the Machine Intelligence and Perception unit at Edinburgh University, proving mathematically that we would never see a computer beat a grand master at chess within our lifetimes. The problem was too big to solve with the technology of the day, and the rate of performance growth required to beat a grand master was, he told us, just unbelievable.
Yet the unbelievable levels of performance we see today are still not enough for the largest Internet-scale tasks, such as hosting Twitter, Facebook or LinkedIn, or managing the search indexes at Yahoo or Google. Scale up just doesn’t scale up enough. None of these Internet-scale enterprises uses scale-up technology any more. They scale out at every level: compute, storage, network, application architecture and even the database.
Scale out applications are becoming more common with developers adopting a MapReduce style approach to coding, where a master process splits the problem into a number of smaller parts and then farms them out to a large number of processes that derive the answer. The master process then combines the answers to deliver a single consolidated output. For the largest scale computational problems this is often the only way to get to the answer in a meaningful timescale.
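The master/worker pattern described above can be sketched in a few lines of Python; here a toy word counter stands in for a real distributed job. The chunks run sequentially in this sketch, whereas a real scale-out system would farm them out to many machines:

```python
from collections import Counter
from functools import reduce

def map_chunk(lines):
    """Map step: each worker counts words in its own chunk."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts

def count_words(lines, workers=4):
    # The master splits the problem into a number of smaller parts...
    chunks = [lines[i::workers] for i in range(workers)]
    # ...farms them out (sequentially here; a real system runs these in
    # parallel across many processes or machines)...
    partials = [map_chunk(chunk) for chunk in chunks]
    # ...then combines the partial answers into one consolidated output.
    return reduce(lambda a, b: a + b, partials, Counter())
```

For the largest problems the value is not in the code's simplicity but in the fact that each chunk can run anywhere, so the work scales with the number of machines rather than the speed of one.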
Scale out compute is now commonplace, with any number of hypervisor technologies (VMware, Xen, KVM, Hyper-V) supported by a cloud operating system to handle virtualisation and load balancing.
Scale out storage is also a growth industry, with products like HP’s X9000 (IBRIX) and IBM’s XIV gaining traction in the market. Object storage is also gaining popularity, with URI or HTTP protocols becoming commonplace on any number of offerings such as Amazon’s S3. Open source distributed file systems such as Apache Hadoop’s HDFS add the additional feature of understanding where the data is located, so that compute and storage elements can be closely co-located to reduce network latency and end-to-end bandwidth demands.
Scale out networking follows the logic that most network traffic in a scale out world is edge to edge so why bother with a core network? Converge on 10G lossless Ethernet using top of rack switches supporting iSCSI, NAS and HTTP protocols to converge the SAN and LAN into a common routable IP system.
Scale out databases are now commonly referred to as NoSQL databases. They go back in time to pre-relational designs that do not provide ACID consistency guarantees (atomicity, consistency, isolation, durability), but allow sharding to split data sets over multiple systems and so improve the parallelism of the overall system.
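A minimal sketch of the sharding idea, with plain dicts standing in for independent database nodes. The class and method names here are illustrative, not any particular NoSQL product's API:

```python
import hashlib

class ShardedStore:
    """Toy key-value store split over several independent shards."""

    def __init__(self, num_shards=4):
        # Each dict stands in for a separate database node
        self.shards = [{} for _ in range(num_shards)]

    def _shard_for(self, key):
        # A stable hash routes the same key to the same shard every time
        digest = hashlib.md5(key.encode()).hexdigest()
        return self.shards[int(digest, 16) % len(self.shards)]

    def put(self, key, value):
        # Each write touches exactly one shard; there is no cross-shard
        # transaction, which is precisely the ACID guarantee given up
        self._shard_for(key)[key] = value

    def get(self, key, default=None):
        return self._shard_for(key).get(key, default)
```

Because every operation is confined to one shard, reads and writes for different keys can proceed in parallel on different machines; the price is that queries spanning many keys must fan out across all shards.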
The legacy of the punched card is still with us because Information Technology is an evolutionary process. Scale up approaches continue to support the evolution, but one day the dinosaurs will die out.
If you have been following the storage business for a while, you will have noticed a few changes:
- Introduction of Flash Memory components as Solid State Disks
- Serial ATA (SATA) disks becoming popular and growing in capacity (2TB soon)
There are lots of other disk technologies around, like Fibre Channel and SAS, but SSD and SATA are getting the big press and are taking market share. You might ask why? Disks in a data centre use power day and night, 365 days a year. A typical disk (Seagate Cheetah 15K.4 147GB SCSI) uses about 18W. In a data centre, that means its lifetime (5 years) power consumption, including cooling and power protection (PUE 1.6), is likely to be 1.26 MWh. At 10c per kWh that equates to $126 per disk. So for 1PB of storage the lifetime cost of power will be around $860,000, not including capital plant.
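The arithmetic behind these figures can be checked with a few lines of Python; all the inputs (18W disk, 5-year life, PUE 1.6, 10c/kWh, 147GB capacity) come straight from the text above:

```python
# Back-of-envelope check of the lifetime power cost of a data centre disk
DISK_WATTS = 18.0        # Seagate Cheetah 15K.4 147GB SCSI
PUE = 1.6                # cooling and power protection overhead
DOLLARS_PER_KWH = 0.10
YEARS = 5
DISK_GB = 147.0

hours = YEARS * 365 * 24                                # 43,800 hours
kwh_per_disk = DISK_WATTS / 1000 * hours * PUE          # ~1,261 kWh = 1.26 MWh
cost_per_disk = kwh_per_disk * DOLLARS_PER_KWH          # ~$126 per disk
disks_per_petabyte = 1_000_000 / DISK_GB                # ~6,800 disks for 1PB
petabyte_cost = cost_per_disk * disks_per_petabyte      # ~$860,000 per PB

print(f"${cost_per_disk:.0f} per disk, ${petabyte_cost:,.0f} per PB")
```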
So getting the power that disks use down to a reasonable level is important. The formula that engineers quote for power consumption is:
Power ∝ Diameter^4.6 × RPM^2.8
So if we use large physical disks like in the old days, when 8″ and 14″ were common formats, the penalty is huge: a 14″ disk needs about 7,717 times more power than a 2″ one.
So the world is moving to smaller and smaller disks to reduce power demand, reduce heat output and deliver increased densities.
Spin speed has a similar impact so low spin speed disks use a lot less power than their high speed equivalents.
Slow spin speed, small disks use less power than larger high spin speed disks.
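The scaling law above can be turned into a quick calculator; the 2″ and 5,400 RPM baselines are chosen here as reference points to reproduce the ratios quoted in the text:

```python
# Power ∝ Diameter^4.6 × RPM^2.8, expressed as a ratio against a reference disk
def relative_power(diameter_inches, rpm, ref_diameter=2.0, ref_rpm=5400.0):
    """Power of a disk relative to a 2-inch, 5,400 RPM reference."""
    return (diameter_inches / ref_diameter) ** 4.6 * (rpm / ref_rpm) ** 2.8

# A 14-inch disk at the same spin speed needs ~7,717x the power of a 2-inch one
print(round(relative_power(14, 5400)))
# A 15,000 RPM disk at the same diameter needs ~17x the power of a 5,400 RPM one
print(round(relative_power(2, 15000)))
```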
As the price of SSD continues to drop, the high spin speed disks that we use for high IOPS solutions will increasingly be replaced by SSD, whilst capacity will be served by low spin speed SATA, migrating the storage world to Flash and Trash.
Inevitable and proved by the maths.
My old CIO at BT, Al-noor Ramji had a most delightful and endearing way of describing just how unimportant and disconnected IT Infrastructure is from reality by describing us as “The toilet cleaner’s toilet cleaners”. Like other successful CIOs Al-noor had the ability to cut through the noise and explain things as they are.
In every business, conversations start at CEO level, where the focus is on understanding customers’ needs and executing on a plan to serve them better than the competition. This is where business value is created and strategic visions are formed. It is here that CEOs deploy capital to create business value, revenues and EBIT that eventually translate into shareholder returns. This is the engine room of capitalism and it is here that many businesses win or lose.
The strategy and vision typically filters down, layer by layer in the organisation, through marketing or product management, who work out how to combine products and services together to deliver a compelling answer for the customer. It is also here that the first glimmerings of an idea for supporting IT services are formed, and it is here that the first washroom attendant, the CIO, is called in. Typically the CIO gets a briefing that she needs to deliver some changes to the CRM system and some new workflow for the call centre.
Already the layers of filtration have dulled the vision and strategy formed in the CEO’s office suite.
Usually, the first that IT Infrastructure gets to hear about this new initiative is when it is a few days away from deployment, too late even to introduce it into service properly. The washroom attendant’s washroom attendant is used in a purely reactive way, responding in real time to seemingly disconnected and random acts of violence to the IT estate.
We can all recognise this behaviour, repeated time and again in the largest enterprises and always leading to a sub-optimal outcome. As consumers, we have also experienced business initiatives that had the CIO and IT Infrastructure fully integrated into the initial conversations, enabling a competition-crushing solution to be deployed. Google has wiped out the 20th century advertising industry and outperformed its digital competition by being joined up. Apple took on the music industry with iTunes and now owns the space. There are many other examples, but unfortunately they are swamped by the normal, broken approach that delivers these frustratingly sub-optimal outcomes.
To engage in the initial conversation that forms strategy and vision, IT must become a trusted advisor that stops talking about IT return on investment and starts talking about business return on investment. Only when IT can’t help the CEO kill the competition does it need to be cheap and silent. Otherwise we serve our companies badly if we don’t speak up and become part of the initial conversation.
Being part of that conversation is all about delivering business agility, reduced cycle times to deliver products and services, reduced business risk and more certain outcomes. All deliverables that IT was set up for in the past.
We forget this lesson at our peril and will certainly be consigned to being cheap and silent forever.
Both Seagate and Western Digital announced Q2 results last week perhaps signalling a return of confidence in the disk drive channel. Component manufacturers in the enterprise IT channel are an interesting bellwether of market confidence as orders need to be placed in advance of shipments of finished goods. There is a significant delay in revenue recognition from disk drive component to populated storage controller. On this basis we should see some interesting positive results for the desktop, server and storage controller vendors in Q1 2010.
Revenue was up at both Western Digital (44% y/y) and Seagate (33.4% y/y), as were shipments: Seagate reported 36% growth whilst Western Digital reported 29%.
Around this time, analysts at ESG pull together a ten point list of predictions for the coming year. One of my areas of coverage and of expertise is in the Data Center around power, cooling, reliability and economics. So what’s different this year from prior years?
Strengthening fundamental drivers will likely make 2010 materially different from previous years for data centers. These drivers include continued increases in the cost of power, lack of investment in new general-purpose facilities during the recent economic crisis, and the continued drive for higher-density implementations. Poor-quality data centers will become increasingly uncompetitive and costly to run. This development, combined with the lack of new capacity ready to come on-stream, will drive up costs significantly. That, together with an accelerating economic recovery, is going to make 2010 interesting.
The fixed (tiers) definition of what a data center should be has been becoming less relevant for some time. Over the course of 2010 it will become apparent that there are many valid alternate designs that can deliver service whilst continuing to be reliable, but improving on operational and capital costs. A number of newer (or reintroduced) approaches will start to become important and gain market share. Among the trends I will be following in 2010:
- The gradual migration towards liquid cooling will commence with strong leadership from IBM with the launch of the Z11 mainframe with water cooled options. The massive efficiency benefit of liquids – being some 4,000 times more efficient than air at removing heat – will drive adoption for the highest density deployments such as HPC (high-performance computing) and mainframe first, followed by general purpose computing later.
- Conventional lead acid battery strings combined with UPS (uninterruptible power supply) will give way to flywheels for AC power protection implementations. Sustainability and efficiency gains make this inevitable in the developed world with increasing government regulation around the proper disposal of heavy metals.
- The raised floor will begin to become unnecessary as cooling, power and data feeds start to be supplied from above for most new installations. Raised floors have always been problematic, especially in the area of maximum static and rolling load. Furthermore, pushing cold air from below the floor has always been a sub-optimal design. Cables and power feeds are much easier maintained if delivered from above.
- DC power options will start to become more common on IT equipment with many forward-thinking data centers offering optional AC or DC power feeds. This will leverage the higher efficiency of DC power delivery and inherent reliability.
- Converged edge networks with smart switching driven by FCoE will reduce the need for manual patch configurations and change the layout of the data center. The edge will be located in-row and at the top of cabinets. The number of cables will reduce dramatically but the criticality of connectivity will increase.
- Increasing levels of server, storage and network virtualization will continue, mopping up what remains of the development and test platforms and gradually moving into the critical production application space driven by tight integration between the application and hypervisor. Operational flexibility rather than efficiency will be the main driver for change in the critical application space, overcoming the inertia of risk-averse CIOs.
- Reliability will continue to migrate towards the application layer, reducing the dependency on data center infrastructure. Critical prerequisites will be high-performance networks and de-duplication technology that enable rapid migration of data between sites.
- Data centers will move to different locales. Choosing a data center site because it is close to corporate headquarters will no longer be viable, as real-estate and power cost constraints will restrict city-center data centers to latency-sensitive applications only.
- Combined Heat and Power (CHP) plant will replace backup generators (engines) in many city center locations as the global lack of investment in electrical power grids continues to hamper growth of latency-sensitive application hosting.
- Demand for co-location data centers will begin to tail off, replaced by demand for data centers hosting IT-as-a-service offerings as businesses migrate to cloud computing models.
Data center investments are long term bets and, as a result, change can appear to take a long time to materialize. Data center capacity of the right type is becoming scarce as demand continues to increase at the exact same time as an industry-wide lack of investment in new capacity due to the economic downturn. As the macroeconomic recovery continues to accelerate, the latency caused by lengthy data center building and fit out will exacerbate scarcity.
The outcomes are likely to be:
- Much more aggressive take-up of alternative and more power-efficient technologies at the mechanical & electrical layer in a desperate attempt to control costs at existing facilities
- Customers demanding increasingly tight integration between applications and virtualization to improve agility
- Older data center sites becoming increasingly uncompetitive – forcing reductions in depreciation cycles – as refresh becomes critical to remaining in business
Today Microsoft and HP announced an expanded partnership in order to deliver fully integrated application to hardware stacks. It’s a brilliant move, absolutely stunningly smart and spot on for HP.
I wrote about Oracle VM and the fully integrated stack that Larry Ellison has been promoting to his customers. Superficially it might seem like a piece of vendor lock-in, but actually it is a very powerful and compelling solution for risk-averse enterprise customers. Do not underestimate how CIOs treat risk where their critical applications are concerned. It is undoubtedly the number one driver and motivator.
By offering a single, fully supported (at the source code level) stack I believe Oracle got it right. For a while they have been the only player on the field to be in a position to make that offer.
Today HP and Microsoft became the second fully integrated stack in play.
I had better explain what I mean by a fully integrated stack; it includes the following layers:
- Middleware and Database
- Programming language and framework
- Operating System
VCE (VMware, Cisco and EMC) and Citrix play in this space too but have too many missing parts to be fully vertically integrated.
| | Oracle / Sun | HP/MSFT | VCE | Citrix |
|---|---|---|---|---|
| Middleware and Database | Y | Y | N | N |
| Programming language and framework | Y | Y | N | N |
So VMware has the largest market share and a ton of trained VMware engineers in the field, but there are more Microsoft MCSE guys out there and everyone understands ProLiant.
The biggest point is who wins the ISV and developer mindshare? Microsoft has .NET, Oracle has Java. Microsoft plays better with developers than Oracle, but maybe Larry can learn a bit from Sun about winning Java mindshare?
If I were Larry, I would borrow some Iranian nukes and bomb the EU; if I were VMware or Citrix, I would be sucking up to developers like crazy. Mark Hurd is definitely feeling smug, very smug today.
Back in 2008, Steve O’Donnell wrote an article here on The Hot Aisle explaining one of the challenges he set his team during his time at BT, the difficult task of getting Asset Management right.
To summarise, Steve kicked off an audit of the whole estate, and where owners couldn’t be found for kit on the floor, the hard line was taken of switching it off. In some cases developers and engineers got annoyed when their precious server was threatened with shutdown, and when it was explained why it was being turned off, there was a surge of people updating the CMDB and making sure that nothing was left unaccounted for where it was required.
It soon became apparent that much of this kit was no longer in use and it enabled BT to switch off 10% of its server estate with a cost saving of roughly $7M in electricity costs alone.
Job done? Absolutely not.
Over the coming days, I will be blogging about the power that real knowledge of your Data Centre estate can bring, the issues it will help eliminate, and tools that I have developed to harness this data and provide automated management reports to data centre managers, strategic data centre planners and space management boards alike.
First up, Power Outages and Load Balancing.
As the demand for Data Centre space increased, BT faced the difficult issue of power outages. PDUs were regularly tripping, causing failover to other PDUs. It became apparent that we faced the risk of cascade failure, where a single PDU tripping out could swamp others and cause a data centre to fail.
However, we realised it wasn’t simply a case of “That’s it, we’ve used all our PDU capacity, we need to invest in new ones!”
Over the years, the loading of PDUs hadn’t always been done methodically or fully thought through. It was guessed that PDU1 was perhaps running at about 30%, so let’s attach this new server to it and, to be on the safe side, dual feed it to PDU3. Often the PDU attachment was never recorded for a server.
When power demand started getting high, problems were encountered. There wasn’t even load balancing across PDUs and, what’s more, there were no records to identify where this load balancing needed addressing.
The simple question of “Where are my business critical apps?” could easily be answered following the clean-up and continued management of the CMDB, but the question of “are these apps running on equipment which is resilient to power failures, dual fed, on evenly loaded PDUs?” could not!
There was a gap in our knowledge, and reliable power feeds were at risk because of it!
I spoke to Steve about this and said that if we have a record of all PDUs within a site, their capacities, and the kit that they are feeding, I can provide you with the following reports to quickly tackle this problem and set the guys targets of where to begin.
1) The load on each PDU within a data centre, including kW used, kW remaining, % loaded, % free
2) A list of all single, dual and triple fed equipment and the load of each PDU feeding that equipment
3) A list of all single fed equipment holding business critical applications, and all dual and triple fed equipment hosting non-critical, development environments
How were we able to answer these questions? Most of the data was already there! An audit of the estate provided the location of kit, the kit models, the applications that ran on them. Knowing the kit model meant that we could integrate 3rd party data telling us the theoretical power utilisation of that kit (which could be factored down to provide more accurate, real world figures).
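As an illustration, report (1) above can be sketched as a simple function over CMDB-style records. The field names and figures here are made up for the example; they are not the actual BT tool, which was web based:

```python
def pdu_load_report(pdus, kit_feeds):
    """pdus: {pdu_name: capacity_kw}; kit_feeds: list of (pdu_name, draw_kw)
    pairs, one per feed from the CMDB. Returns per-PDU loading figures."""
    used = {name: 0.0 for name in pdus}
    for pdu_name, draw_kw in kit_feeds:
        used[pdu_name] += draw_kw          # sum theoretical draw per PDU
    report = {}
    for name, capacity in pdus.items():
        load = used[name]
        report[name] = {
            "kw_used": load,
            "kw_remaining": capacity - load,
            "pct_loaded": round(100 * load / capacity, 1),
            "pct_free": round(100 * (capacity - load) / capacity, 1),
        }
    return report

# Example: one 100 kW PDU carrying 50 kW of kit, one 80 kW PDU carrying 60 kW
example = pdu_load_report(
    {"PDU1": 100.0, "PDU2": 80.0},
    [("PDU1", 30.0), ("PDU1", 20.0), ("PDU2", 60.0)],
)
print(example["PDU1"]["pct_loaded"], example["PDU2"]["kw_remaining"])
```

Reports (2) and (3) follow the same shape: group the feeds by server rather than by PDU, then filter on feed count and application criticality.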
Once these reports were available, we could then go about phase 1: resolving these load balancing issues and deciding where we might need to invest in added PDU capacity.
So the audit began, I ran a report with a list of equipment, rack by rack, and the M&E guys went about collecting the data and feeding it back to me.
In the meantime, I developed the tool which would digest this data and return the reports as promised. The tool was web based and was securely accessed over the Intranet, access was managed and given to those who needed the information, and phase 1 of the Data Centre Power Tool was complete.
This phase of the tool was knocked up practically overnight. We had the correct processes in place for gathering the data; I had the skills to manage the team and communicate exactly what was needed and, more importantly, why I was asking for it; and I used my knowledge of data centre infrastructure alongside development skills to begin building a powerful and invaluable management tool. And of course, I had Steve to call upon should we run into any stumbling blocks!
The load balancing was soon addressed, some new PDUs were purchased and installed, and the whole operation began to run a lot smoother. Processes were put in place to record and maintain the PDU linkages to kit inside the CMDB and the Load Balancing tool was left within BT for continued use.
Getting the data right within the CMDB, our collective understanding of data centre infrastructure and the development skills on hand helped solve, within weeks, a problem that could have been very expensive and very embarrassing. Integrating this tool with the client CMDB meant that this sort of issue should never arise again within BT.
Later, I will blog about how I developed this tool into a powerful strategic planning system which was used by both the M&E Infrastructure Team and the Space Management Board to aid the process of planning “where to place equipment” in our data centres.
My good friend and ESG colleague Terri McClure @esganalysttmac recently blogged about a thought leadership piece I had presented at an analyst call last week. She made a very good job of explaining it and so I thought that I would write a little more about it here:
I call the concept the “Golden Triangle” and it represents the three key influences that C-level enterprise IT buyers have when they come to make a large scale IT procurement decision.
Quite often the buyer does not consciously consider each of the three influences but nevertheless they play a significant part in the decision process. Let’s look at them in more detail:
Cost is always an influence but is (perhaps surprisingly) rarely the most significant. Let’s look at the evidence: it is rare for a market-leading product to be the least expensive (ask any EMC, VMware or Oracle salesperson). They are market leading because they sell more than everyone else – not because they are cheaper – I assure you they are not, nor would they want to be. Point made?
So actually the most compelling influences are Risk and Cycle Time. These influences can unseat an incumbent supplier or glue him firmly in place.
Cycle Time is all about BUSINESS agility – not about being able to stand up a server or roll out a new LUN faster (although they may in themselves have a positive influence on a business process). The question is, does this purchase decision help the business people to kill off their competition, serve customers better, fight off a strong competitor or be able to deliver new products faster before the competition does? If it does that is a MUCH stronger influence to buy than just being cheaper!
Risk refers to Business Risk – not much to do with ensuring that backups are taken regularly, or with equipment reliability of itself (although again these may have a bearing on a business risk point). It is much more about business certainty: ensuring that customer service agents are able to deal with customer orders in a timely fashion, that invoices are sent out on time, or even that the ambulance gets sent to the right address. Again this is a very strong influence on a buying decision.
So selling conversations that focus on technical features – mine is bigger / faster / more reliable than the alternative under consideration – won’t play well in a world over-supplied with product size, capacity, reliability and speed.
Here is the vendor lesson for the day – if you can’t define a clear BUSINESS advantage in terms of cycle time and risk reduction, you end up on a downward price spiral that only firms with deep pockets and efficient manufacturing capability can survive.
Incumbent vendors (unwittingly) leverage risk and cycle time to be sticky and maintain their customer base. Why change – is it worth the risk? Why change – when it is much easier and faster to stay with your current technology, processes and services?
Competitors can overcome these objections if they are able to demonstrate business influencing cycle time and risk advantages.
(Let’s have another look at cost. Cost is made up of a number of elements: the capital cost of acquisition, the operational cost of running the product or service, and the write-off cost of any asset displaced before it is fully depreciated are all well understood, but the main cost is often forgotten: the cost of doing nothing. The cost of doing nothing about replacing old equipment can be greater than all of the other costs combined. Higher energy efficiency and lower support costs can dwarf the replacement costs.)
My colleague at ESG John McKnight, just briefed some summary output from our IT Spending survey. The results are presented below hot off the press. Security, Storage and Network see the biggest increases in growth but Virtualization software continues to lead in absolute terms.
It would be interesting to know how many Enterprises are looking to rectify their poor licence compliance in the virtualized world?
Here is a video from my friend Professor Massoud Amin @Massoud_Amin, who is the world authority on Smart Grid. If we want to save the planet from global warming, prevent terrorists from shutting down our economy and avoid catastrophic failure of our power distribution systems, this is the template of what we need to do and why:
Note the need to move to DC Power – I’ve been saying this for years.
For years I have been a prolific photographer of Data Centres all over the world. I have hundreds of images, some brilliant, some out of focus and under-exposed, (I am a data centre guy, not a professional photographer). I thought that I might publish some of the better ones with a bit of narrative about where they are and what they show.
This photograph was taken at the BT Reuters Data Centre in Nutley New Jersey USA. It shows a physical security device that checks the retina of the individual trying to enter against the security database to enable or deny access. Actually this is much more practical than a fingerprint scanner as:
- A surprisingly large number of people have no or poor fingerprints
- It is contactless and hygienic
- It can be used when hands are full of kit or manuals
This photograph (taken in the middle of winter) shows a large bank of Air Side Economizers in the process of being expanded. See the empty support brackets in the photograph foreground. Economizers work best in winter, as the delta between the hot water in the radiators and the cold air (it was sub-zero) means that we don’t need to run the fans much. In warmer times the fans run to blow tepid air over the radiator coils and consume electricity. In hot external weather, the economizers are no longer effective and the site refrigeration plant must take over. This consumes very large amounts of electricity.
This photograph shows a VESDA system which is designed to detect small traces of smoke early and set off the alarms or trigger a gas discharge.
My friend and colleague Steve Duplessie at ESG just blogged about an interesting court case happening in the USA on The Bigger Truth.
“An e-mail archiving company, ZL Technologies, Inc., has sued, been dismissed, and re-sued Gartner – basically claiming that ZL’s placement in Gartner’s “Magic Quadrant” has caused the company damage – namely, that since Gartner places ZL in the “niche” spot, large customers don’t consider them, although the company contends their offerings are superior to those listed in a more prominent spot. Gartner counter claims that the suit is without merit because the MQ represents opinion, and therefore there is no legal leg for ZL to stand on.”
Here’s my take: nobody who knows anything about IT would pay any attention to Gartner anyway. The Magic Quadrant is uni-dimensional, focussed on solving a particular silo of technology problems. Enterprise IT guys have a completely different focus – on business issues and business problems. CIOs look for three key measures when considering technology: Business Risk, Business Agility (Cycle Time) and Business Cost.
In my career I probably spent a few billion dollars more on IT than the average buyer, and I can honestly say that I never paid any attention to them at all. Magic Quadrants just don’t help with any of that, so they are more important to vendors as bragging rights than useful to buyers.
I attended a very interesting dinner event recently at The Boxwood Cafe in London hosted by Andrew Barnes from Neverfail. The objective was to have a discussion about Disaster Recovery and the implications for IT and business. It was a very well attended event with some great input from the CIO and IT Director attendees. Sarah Hoyle took some notes of the discussion that really captured the main points and I reproduce the output here:
Many organisations see business continuity as just an IT issue and not a concern for the wider business. Trying to change this view is proving challenging. Generally, except in organisations where there are regulatory imperatives (Financial Services and Sarbanes-Oxley) to deliver a structured Business Continuity capability, there is little interest from business people.
There was a strong emphasis on the need for IT executives to get the wider business to understand the value of business continuity and either positively accept the risks or invest time, money and resources into it.
Analysis – this is symptomatic of IT being seen as a cost rather than a business driver
Some companies started out wanting to protect every server and every application, but as the cost of doing so was prohibitive they were forced to identify their real business needs and define SLAs and recovery times so that sensible decisions about business continuity could be made. The point was emphasised several times that business need should be clarified before even considering a business continuity solution, as it is the only way to ensure the solution will address the need.
Analysis - this is generally true of all business initiatives – if you don’t start with a vision of what is needed, it is certain you won’t achieve it.
Many companies don’t even consider disaster recovery until after they have suffered some kind of disaster or outage – maybe because after experiencing significant downtime they’re able to put a price on what this has cost the business and are then able to justify investing in business continuity. A better option, though, would be for IT to quantify the anticipated cost of downtime before a disaster occurs.
Analysis – many businesses fail to understand the implications and costs of interruptions to business process, whether caused by IT outages or otherwise. CFOs often look at the cost of protection rather than the value of investment in business continuity.
There is often a perception at board level that business continuity/disaster recovery is expensive but this doesn’t have to be the case.
IT is often seen, incorrectly, as creating a need for business continuity rather than identifying the risk. IT systems are often seen as part of the ‘plumbing’ and the assumption is made that it will always be available, come what may, without any need for investment in business continuity.
Analysis – Sometimes business continuity planning is about documenting processes and providing simple work arounds.
Many companies are concerned about the legal implications of risk, particularly in the legal, finance and insurance sectors.
The consensus was that the business itself should write the business continuity plan from a broader perspective and that IT would write the technical recovery plan to support the business need.
It was generally regarded as key for IT to drive business continuity/disaster recovery and manage the business’s expectations accordingly.
Analysis – this is absolutely spot on.
The conversation then covered some specific examples:
- One organisation insists that each department has its own business recovery plan, which is reviewed every 6 months.
- For some industries the opportunities around HA/DR offered by Cloud computing are not appropriate, as there is a legal requirement for the data to be stored in the same country, which may not be possible.
- Some organisations see no immediate need for disaster recovery and are much more concerned with resilience.
- The implications of downtime vary, examples given included:
- A Sheraton hotel website was not available on a Saturday morning because of planned maintenance, so the customer booked a room with a different hotel chain.
- Whilst some industries, for instance estate agency, may be able to work reasonably efficiently without access to IT systems for a short while, customer satisfaction and retention is impacted if the agency’s websites are unavailable to house hunters – the house hunters won’t go elsewhere, but the sellers get very concerned if their home isn’t available to view online. However, when a deal is about to close, or sealed bids are used, access to email and other IT systems becomes critical.
- The question was asked whether lack of availability affects staff morale and retention, particularly with sales staff who view having the right tools to do their job as essential to their ability to earn money.
- The point was made that using backup tapes and then transporting them to another location was often impractical, as it would take too long to get them back and then recover from them.
- Although back up and recovery are ingrained in people’s minds when considering business continuity, recovery actually means you’ve failed.
- Simple issues like building access control can be critical if they fail as staff cannot get in or out of the building.
- Protecting data is essential (particularly personal data) as people get fired for losing it.
- 99.99% uptime is seen as the standard, but this still means 0.01% of downtime a year (nearly an hour), which may not be acceptable – and availability often isn’t appreciated until it’s lost.
- Many people believe that eventually HA/DR will automatically be built in as an integral part of IT systems at point of purchase.
- Many companies focus on the most recent big event (bombs, floods, swine flu) forgetting that a plan needs to be in place to address the consequences of downtime, not a specific scenario.
- Although data may be secure, applications and configurations may not be so easily recoverable.
- Storage, bandwidth etc. are becoming commoditised but demand is also much greater; however, the bigger concern is the staff resource required to manage the IT infrastructure.
- Sometimes legacy environments are built in such a way, and amended over such a long period of time, that it is impossible to rebuild them.
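The uptime percentages mentioned in the list above translate into concrete downtime budgets. A quick sketch of the arithmetic:

```python
def annual_downtime_minutes(availability_pct):
    """Minutes of downtime per year implied by an availability percentage."""
    minutes_per_year = 365 * 24 * 60  # 525,600 minutes in a non-leap year
    return minutes_per_year * (1 - availability_pct / 100)

for pct in (99.0, 99.9, 99.99, 99.999):
    print(f"{pct}% uptime allows about "
          f"{annual_downtime_minutes(pct):.0f} minutes/year of downtime")
```

"Four nines" works out to roughly 53 minutes a year – which sounds small until that hour lands during sealed bids or a Saturday-morning booking rush.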
Some companies are looking at virtualisation in order to enable maintenance windows and business continuity, and to break the dependence on specific hardware. Reducing costs is also a driver, although a secondary one.
Virtualisation is seen as a catalyst for resiliency as:
- It reduces reliance on hardware
- Allows increased flexibility
- Allows an environment to be replicated and run on any infrastructure
Last week I met with my friend Steve Sole of Nubis who wanted to tell me about the work they have been doing in data centers around improving energy efficiency. Nubis make demountable Aisle Containment systems – they don’t care if it is Hot or Cold Aisle – the objective is to stop the air mixing.
Aisle containment has a number of well known issues, mainly around safety and fire suppression. Steve has a design that is optimised to get out of the way of fire detection and fire suppression systems, with two drop-down roof panel options: Fusible Link and Water Soluble.
It is often the case that existing water-based fire suppression systems must be extended through the roof panels into the aisle to be effective, creating additional work, cost and disruption within a highly sensitive area of the business. Nubis recognises that in existing data centres enterprises want to preserve this investment.
In the event of a fire in the data centre, Nubis’s Fusible Link roof panels fall away at 58C to allow the fire suppression system to be effective (fire suppression systems typically kick in at 70C). If Water Soluble roof panels are deployed, then the water sprinklers simply dissolve them.
Gas fire suppression is usually deployed through the floor tiles and in these cases the Nubis Gas Roof Panel provides an air tight seal to contain the gas and extinguish the fire.
All the panels are quick and easy to remove by one person and allow full access to room facilities above the cabinets.
Nubis further minimises disruption during ACS installation, since it is non-invasive when attaching to sensitive equipment such as storage arrays, tape backup systems and PABXs.
Air Containment Solutions (ACS) are recognised as a simple yet highly effective means of reducing the power required to cool the data centre, resulting in fewer equipment outages and so improving business continuity. ACS reduces the data centre’s PUE rating, thereby minimising its carbon footprint, with savings of up to 20% on cooling power costs.
This solution is neat because it is designed for existing data centres. The only constraint is that you have to have hot and cold aisles in place already.
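To see why containment moves the PUE needle, here is a back-of-envelope sketch. PUE is total facility power divided by IT equipment power; the kW figures below are made up purely for illustration:

```python
def pue(total_facility_kw, it_load_kw):
    """Power Usage Effectiveness: total facility power / IT equipment power."""
    return total_facility_kw / it_load_kw

# Hypothetical site: 500 kW of IT load plus 250 kW of cooling overhead.
before = pue(500 + 250, 500)        # PUE 1.5
# If containment cuts cooling power by 20% (250 kW down to 200 kW):
after = pue(500 + 250 * 0.8, 500)   # PUE 1.4
print(f"PUE before: {before:.2f}, after: {after:.2f}")
```

A perfect PUE of 1.0 would mean every watt goes to IT kit; shaving cooling overhead is the main practical lever for getting closer to it.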
There is cloud, and then there is cloud: cloud with take-it-or-leave-it service levels, or cloud with service availability that supports the UK’s emergency services (999, 112). I know about this life-or-death service level because a few years back I actually ran the BT IT operational department that supported the emergency operators team at BT.
It’s one thing dropping an investment bank’s trading systems (for that you just get fired), but dropping the emergency services systems so that the Police, Fire Brigade and Ambulance don’t come out any more and someone’s Granny dies… that is a completely different level of responsibility, believe me. Firms that are focussed on search or selling books have one kind of culture – perhaps the wrong kind if you need availability and promises that will be kept.
So when I heard that BT Global Services had launched their Virtual Data Centre (VDC) service I was very interested. Here is a service that promises the flexibility of a cloud service – easy in, easy out – with professional, enterprise support levels. It won’t ever be the cheapest, but BT Global Services customers won’t buy on price. Now BT needs to leverage its brand and the skills of its people to execute on delivering Enterprise Cloud to the corporate market.
Here is the press release:
BT’s VDC offers a new concept in service delivery where customers are able to create and manage their own infrastructure service through a secure online portal. The service includes virtualised security, servers, storage and networks orchestrated and automated through the portal.
Here is what my friend and former colleague, Steve Holt, general manager IT services in BT Global Services, said: “Initial feedback from our early adopter customers has been really positive. Customers are really impressed with the simplicity and the agility of the service. We have four pilot customers already signed up for the service, with another five in the pipeline.”
“The availability of service is a significant achievement for BT. It really pushes the market boundaries to demonstrate clear commitment through our investment to bring innovative enterprise class services to our customers.”
BT’s virtual data centre means customers can enjoy significant benefits over traditional approaches. For example, it makes it far more cost-effective, simpler and quicker for customers to deploy and run their data centre infrastructure – while at the same time offering higher levels of service assurance.
BT also offers a range of consultancy services to ensure that migrations happen quickly and effectively.
Plans are well underway to further enhance the core functionality and launch the VDC service across EMEA over the next quarter.