I have been thinking recently about data storage and protection, and about how our behaviour is driving massive growth in cost and complexity. The issue seems to emanate from the fact that we focus on solving backup when the real problem is restoring data:
- We need to be able to deal with data corruption and incorrectly added transactions (rollback)
- We need to be able to deal with partial or complete data loss caused by hardware malfunction, malicious acts or human error (data recovery)
- We need to be able to deal with total data centre loss resulting in loss of data (site recovery)
- We need to provide archive data to comply with regulatory and legal requirements (compliance)
- We need to be able to take snapshots of point in time data for testing (copy data)
Each of these requirements tends to drive a separate solution involving multiple copies of data in separate towers. Complexity increases, volumes grow out of hand and we singularly fail to achieve the key objectives:
- Rapid data recovery with engineered recovery point objective
- Safe and secure business continuity protection
- Compliance with regulations and laws
- Clear understanding of where our data is and what version it is at
If we were setting out to solve these problems and meet these requirements, the tape backup, deduplication and archiving procedures that we commonly use today would not be where to start. Storage, Server and Network Virtualisation has freed us from the tyranny of physical connections between hardware and application, yet data protection pulls us back again. We need a new protection and availability architecture.
I thought that you might be interested in a networking event being run in London in parallel with IP Expo on the 19th and 20th of October 2011.
What is ExecEvent?
ExecEvent is a highly successful global event run by Greg Duplessie, brother of Steve Duplessie of ESG fame. The ExecEvent is an exclusive networking event for virtualization, cloud, storage and security industry executives, plus those folks who want to interact with these executives (service providers, recruiters, technology law/tax firms, etc.). It is different from any other networking event you’ve ever been to. Our mission is to create a compelling event for industry insiders: one that focuses on networking and building relationships and does not require exhibiting or catering to end-users. This unique networking event will provide educational speaking topics and a spotlight for emerging products or companies (where appropriate), as well as plenty of time for your own meetings as you see fit.
When and Where?
Our next event is called the ExecEvent London 2011. It is scheduled for October 19-20th at the Earls Court Conference Centre in London, with a pre-event cocktail reception on the evening of October 18th. We are working with IP EXPO, and the conference centre is a very short walk from their exhibition hall. The ExecEvent is specifically for senior executives in the virtualization, cloud, storage and data center space, as well as for press, analysts, consultants and financial professionals.
Why Should I Attend?
If you want end-users, then IP EXPO is the perfect forum and show. But what about your business partners, resellers, OEMs, VCs and investment bankers, consultants, etc.? These shows are too crazy to focus on the business behind your business. That’s where the ExecEvent comes in. We bring together industry executives in a very meaningful way.
If you make one or two solid connections at this event, it is worth your time and effort. “If you are a vendor in the cloud, virtualization or storage space and you are not here (at an ExecEvent), slap yourself and get signed up for the next one.” So says George Crump, Senior Analyst for Storage Switzerland.
“I think it’s a great idea to separate business development and networking events like this with other events geared towards end-users and outbound marketing. With this event we can have the right people in the right meetings, without having to bring the whole company. Focus is key.” – Ed Walsh, former CEO of Storwize, Avamar and Virtual Iron, now an executive at IBM
How Much Does It Cost To Register?
The registration fee is £375. No VAT required.
What Goes On?
More information can be found at http://theexecevent.com/steve-odonnell-keynote-speaker/
Who Should Attend?
We specifically created this event for executives, regional and country managers and interested senior technology vendors in the following areas:
• High Performance Computing
• Hardware and Software providers
• Cloud Enablers
• Storage and networking equipment, software and services
• Consulting, training and development services
• System Integration / Consulting
• Virtualization solutions
• Data Centre technology
• Cloud Infrastructure
• SaaS, IaaS, PaaS, BPaaS providers
• Data security
• Research organisations
Google has the most amazingly clear terms of service for their cloud products. The terms are take it or leave it, no promises, no commitments and no come-back.
I love the statement that there is no warranty that the quality of the services will meet your expectations. It’s a gem.
Thanks to Harqs Singh for pointing it out.
Yesterday Chris Leahy (my Technical Facilities Manager) and I were agonising over why we had low plenum pressure in our Data Center and why we were seeing symptoms of hot air trapped in the roof void. We looked at all the normal stuff:
- Leaks in the plenum space
- Badly sealed floor
- Cable access holes improperly sealed
- Blockages in the plenum
- Bad seals between the plenum and the CRAC units
In the end we worked out what the problem was. We have one of our CRAC units switched off as it is under maintenance. CRACs are pretty simple devices; they typically have dust filters, cold coils (water or DX) and an axial fan blower to drive the air down into the raised floor. In our case, because one CRAC was off and its fan idle, it was acting like an open chimney, letting huge quantities of cold air flow back out of the plenum and high into the data center.
In fact it was behaving just like three fully open floor tiles. No wonder the plenum pressure was very low. We asked Stuart Hall at ARUP for his take:
A standby CRAC unit without non-return dampers will allow cold air to back-flow into the hot-zone.
We generally specify CRAC units in our designs with non-return dampers for this reason. The CFD software which I use includes them on all CRAC units by default (though they can be removed).
The amount of air returning would depend on:
- The amount of floor grilles and other openings within the raised floor
- The flow rate of air delivered to the floor by the operational CRAC units
- The resistance to airflow caused by the idle-fan, cooling coil and other geometry within the CRAC
I have witnessed this behaviour whilst surveying data centres. It is also common to see a small quantity of back-flow through CRACs with non-return dampers as they don’t make a perfect seal.
The only arguments that I can think of against installing non-return dampers would be space or cost. Perhaps a business decision was made to accept reduced flow performance in the event of a failure in exchange for cheaper units or physically smaller units.
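To see why an idle CRAC can behave “just like three fully open floor tiles”, here is a minimal sketch using the standard orifice equation, Q = Cd·A·√(2ΔP/ρ). It assumes every opening sees the same plenum pressure and discharge coefficient, which is a simplification (in reality the leak itself drags plenum pressure down). The tile free area, the 20 Pa plenum pressure and the idle unit’s effective opening are hypothetical numbers chosen for illustration, not measurements from our data center.

```python
import math
from typing import List

AIR_DENSITY = 1.2  # kg/m^3, rough value for cool supply air (assumed)


def orifice_flow(area_m2: float, delta_p_pa: float, cd: float = 0.6) -> float:
    """Airflow (m^3/s) through an opening: Q = Cd * A * sqrt(2 * dP / rho)."""
    return cd * area_m2 * math.sqrt(2.0 * delta_p_pa / AIR_DENSITY)


def leak_fraction(idle_area_m2: float, tile_areas_m2: List[float]) -> float:
    """Share of plenum air lost through an idle CRAC, assuming every opening sees
    the same plenum pressure and discharge coefficient (so flow scales with area)."""
    return idle_area_m2 / (idle_area_m2 + sum(tile_areas_m2))


# Hypothetical figures: 0.09 m^2 free area per perforated tile, 20 Pa of plenum
# pressure, and an idle CRAC whose open path equals three tiles' free area.
TILE_AREA = 0.09
IDLE_CRAC_AREA = 3 * TILE_AREA
PLENUM_PA = 20.0

tile_q = orifice_flow(TILE_AREA, PLENUM_PA)
crac_q = orifice_flow(IDLE_CRAC_AREA, PLENUM_PA)
print(f"one open tile: {tile_q:.2f} m^3/s, idle CRAC: {crac_q:.2f} m^3/s")
print(f"share lost with 30 tiles open: {leak_fraction(IDLE_CRAC_AREA, [TILE_AREA] * 30):.0%}")
```

Because flow is linear in open area at a given pressure, an idle unit with three tiles’ worth of free area leaks exactly three tiles’ worth of air, which is why a non-return damper that closes that path matters.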
So here is the lesson of the day – don’t assume that switched-off kit is neutral! If you have CRACs that are switched off, they may be leaking your precious cold air into the roof void.
Worldwide, there has been a lot of focus in recent years on reducing the environmental impact of Data Centers. Green always comes at a cost, but once it is viewed as a long-term investment rather than as a quick return on investment (ROI), it can be a viable cost-cutting option. Data center investments are enormous; they are built with a long-term vision of 10 to 15 years and, therefore, any means of reducing the amount of capital required needs to be seriously considered. By improving the power efficiency of our Data Centers, we can simultaneously reduce the capital equipment required and the cost of the power and maintenance needed to operate it. This dramatically improves the ROI on our data center investments.
Energy costs in Qatar are exceptionally low (less than 2c per kWh), and this can contribute to the perception that green, energy-efficient investments offer a weak ROI. However, this perception fails to take into account the lost opportunity of re-selling valuable energy resources abroad rather than consuming them wastefully on the domestic market.
So, we have had enough of the whys; now how do we go green? The simple magic words are Energy Efficiency and High Utilization. We could choose to have 1,000 servers operating at 5% average utilization or 100 servers operating at 50% average utilization. Many organizations choose to operate one application per server, so 1,000 applications means 1,000 servers. By introducing virtualization we can run multiple applications on shared servers, reducing costs, improving efficiency and being green all at the same time.
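The arithmetic behind the 1,000-versus-100 comparison is simple: the total work stays the same, so the server count scales inversely with average utilization. A minimal sketch, assuming workloads can be packed freely onto any server; the function name is hypothetical, and the model ignores peak loads, memory limits and failover headroom, which in practice push the count up.

```python
import math


def consolidated_servers(current_servers: int, current_util: float, target_util: float) -> int:
    """Servers needed to carry the same total work at a higher average utilization.

    Work is counted in 'fully busy server' units (servers x utilization); the
    round() call strips floating-point noise before rounding up to whole servers.
    """
    if not 0 < target_util <= 1:
        raise ValueError("target_util must be in (0, 1]")
    busy_equivalents = current_servers * current_util
    return math.ceil(round(busy_equivalents / target_util, 9))


print(consolidated_servers(1000, 0.05, 0.50))  # the 1,000-server example from the text
print(consolidated_servers(1000, 0.05, 0.60))  # a slightly more aggressive target
```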
Virtualization can enable a perfect storm of efficiency; we reduce the number of servers, thereby reducing both the capital cost and operation costs of those servers. Virtualization enables reductions in both data center capital costs and the operational cost of supplying electricity and cooling. One often forgotten additional advantage is the reduction in software licensing and maintenance costs because we are managing a smaller IT estate.
Storage is an important issue to focus on to improve efficiency; one enterprise disk can consume as much as 1 MWh (megawatt-hour) over its useful life. CIOs and COOs often find it very difficult to delete data that is no longer useful. Studies show that the chances of needing a document or a spreadsheet decrease exponentially over time, so the chances of you needing that 7-year-old spreadsheet ever again are close to zero. Best practice is to set a data retention policy and ruthlessly delete data that is older than the retention period.
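A retention policy only saves energy if something enforces it. The sketch below is a dry run that lists files older than a retention period, assuming last-modified time is a fair proxy for a file’s age; the function names are hypothetical, and real policies usually key off metadata, classification or legal holds rather than timestamps alone.

```python
import time
from pathlib import Path
from typing import List, Optional

SECONDS_PER_DAY = 86_400


def is_expired(mtime: float, retention_days: int, now: float) -> bool:
    """True when something last modified at `mtime` is older than the retention period."""
    return now - mtime > retention_days * SECONDS_PER_DAY


def expired_files(root: Path, retention_days: int, now: Optional[float] = None) -> List[Path]:
    """Dry run: list files under `root` that the policy would delete, without deleting."""
    now = time.time() if now is None else now
    return [
        path
        for path in root.rglob("*")
        if path.is_file() and is_expired(path.stat().st_mtime, retention_days, now)
    ]
```

For example, `expired_files(Path("/srv/shares"), retention_days=7 * 365)` (the path is hypothetical) would list candidates older than roughly seven years; reviewing that list before calling `Path.unlink()` on each entry keeps the ruthlessness deliberate.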
Data deduplication can be an additional useful tool for reducing data storage costs and increasing storage efficiency. Deduplication is a specific form of compression in which redundant copies of data are eliminated, typically to improve storage utilization. Because only the unique data is stored, the required storage capacity falls, which in turn can reduce the overall footprint inside the data center. Deduplication can squeeze out between 10 and 20 percent more storage space just by getting rid of duplicated data.
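Here is a minimal sketch of the idea, assuming fixed-size chunks identified by their SHA-256 digest (many commercial products use variable-size, content-defined chunking instead, and the function names here are illustrative). Each unique chunk is stored once; a file becomes a list of digests that can be replayed to rebuild it.

```python
import hashlib
from typing import Dict, List

CHUNK_SIZE = 4096  # bytes; fixed-size chunking keeps the sketch simple


def deduplicate(data: bytes, store: Dict[str, bytes], chunk_size: int = CHUNK_SIZE) -> List[str]:
    """Store each unique chunk of `data` once in `store`, keyed by its SHA-256
    digest, and return the ordered list of digests needed to rebuild `data`."""
    recipe = []
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # duplicates are referenced, not stored again
        recipe.append(digest)
    return recipe


def restore(recipe: List[str], store: Dict[str, bytes]) -> bytes:
    """Rebuild the original bytes by concatenating the referenced chunks in order."""
    return b"".join(store[digest] for digest in recipe)


def stored_bytes(store: Dict[str, bytes]) -> int:
    """Physical capacity used by the unique chunks."""
    return sum(len(chunk) for chunk in store.values())


chunk_store: Dict[str, bytes] = {}
monday = b"A" * 8192 + b"B" * 4096   # 12 KiB containing two identical 4 KiB chunks
tuesday = monday + b"C" * 4096       # yesterday's data plus one new chunk
recipes = [deduplicate(monday, chunk_store), deduplicate(tuesday, chunk_store)]
logical = len(monday) + len(tuesday)
print(f"logical {logical} bytes, stored {stored_bytes(chunk_store)} bytes")
```

Backing up the same data twice adds digests to the recipe but no new chunks to the store, which is where the space saving comes from.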
Another valuable solution is to replace older Data Center equipment with modern, higher-capacity equipment; for example, multiple old servers can be replaced by one modern server, combining the benefits of energy efficiency and smaller space requirements. Upgrading old Data Center equipment can increase energy efficiency by roughly 3-4 times.
Cooling problems are clearly a major and growing concern for Data Center Managers. According to Gartner Research, data centers typically waste more than 60% of their energy just in cooling their equipment. Traditional cooling techniques are inadequate both economically and operationally. Newer technologies offer solutions such as District Cooling and Hot Aisle Containment.
MEEZA data centers at QSTP leverage the massive investment made by the Qatar Foundation in district cooling, offering extremely efficient large scale plants that could not be replicated elsewhere in the country. Expert MEEZA engineers routinely virtualize customer applications, reducing the need for large numbers of servers and dramatically reducing overall energy consumption. MEEZA leads the way in IT sustainability by demonstrating best practice in Green IT.
Throughout my 20-plus-year career in IT consulting, I have noticed that the most successful businesses often have something in common – they run their IT like a business and treat IT as a key part of the business, not as an add-on function. I have been in Qatar for just over one month now, but I have already seen encouraging signs that businesses here are starting to recognize that IT can be a key enabler for achieving strategic objectives. The next step for many companies in Qatar is to run their IT like a business, with the same deliverables, service levels and outcomes that are expected of any other business.
Business decisions are driven by three constraints: affordability, risk and time to deliver. Generally, suppliers leverage this understanding to deliver something that their business customers need yet are constrained from doing themselves. So, for example, if you need a 24 x 7 security patrol for your company premises, you can either choose to employ and manage a team of security officers or you can outsource the service to a professional security firm.
The outsourced security firm can be an effective business choice because it reduces risk, and you can determine what you need by defining a service level (such as a full perimeter patrol every 2 hours and a check at 7PM that all doors and windows are secure) rather than by hiring a team of employees to deliver security. Service levels are the key business driver here because they force us to think about the problem before we think about the solution.
Much the same is true of Information Technology: companies can choose to think of IT either as hardware, software and people that somehow come together to help the business execute or, alternatively, as a set of underpinning business services with service level agreements and requirements. Companies that start off thinking about the problem – what they are trying to deliver – generally do a better job than those who leave it all to chance.
By thinking of IT as a set of services that underpin your core business processes (selling cars or homes, banking, insurance, liquefying gas) you can start aligning your business and IT requirements and make significantly better investment decisions. Research shows that the most successful and profitable businesses have mature business processes underpinned by mature IT processes. No surprise then that here in Qatar, IT Infrastructure Library (ITIL) training is extremely popular as fast growing businesses look to grow their IT and business maturity.
The basis of ITIL is that IT becomes a set of services delivered as standard processes, with service level agreements, in a structured and repeatable way. Businesses are looking to make IT repeatable, standard and reliable, with defined costs and reduced risk.
So in the same way that security, cleaning, and facilities management have long been recognized as being suitable for outsourcing as a managed service, many parts of IT delivery are equally suitable. Managed storage, managed network, managed email and managed data center services are common across the world. These reflect the IT outsourcers’ ability to build repeatable capability at low cost by leveraging scale and investment in process and technology.
The characteristics of a service that is suitable for outsourcing are:
- Definable by a service level
- Requirement to scale up and down depending on demand
- Benefits from delivery by a mature specialist organization with defined processes
- Benefits from volumes of scale above your own requirements
Reliable IT delivery is becoming business critical with outages often meaning that customers take their business elsewhere or employees cannot work. IT outages cost money and damage brand reputation so careful management and delivery of IT is critical. Service levels align business needs to IT delivery ensuring that the right levels of service design and service operation are put in place to avoid problems.
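Service levels are easiest to reason about when they are turned into numbers. As a hedged illustration (the availability targets below are common example values, not figures from this article, and the function name is hypothetical), this sketch converts an availability percentage into the downtime budget it allows per year:

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years ignored


def allowed_downtime_hours(availability_pct: float, period_hours: float = HOURS_PER_YEAR) -> float:
    """Maximum downtime an availability service level permits over a period."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability_pct must be between 0 and 100")
    return period_hours * (1 - availability_pct / 100)


for target in (99.0, 99.9, 99.99):
    print(f"{target}% available -> {allowed_downtime_hours(target):.2f} hours of downtime per year")
```

Each extra “nine” cuts the downtime budget by a factor of ten, which is why the cost of service design and operation rises so steeply with the service level.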
For businesses to truly reap the advantages that IT can provide, there needs to be this focus on service levels, outcomes and deliverables. Running IT like a business will enable IT to help businesses prosper and grow.
I don’t normally plug press releases straight from vendors but today I received an email from Emily Wood at Google with a message that I agree 100% with. Cooling data centers is not just about refrigeration – there are lots of options – many of which we have written about here on The Hot Aisle – Fresh Air cooling, Liquid Cooling, Spray cooling, and others we haven’t even thought about yet (there are tons of smart engineers out there doing great work).
I guess it is unsurprising that ASHRAE, the American Society of Heating, Refrigerating and Air-Conditioning Engineers, writes standards that are about refrigeration; after all, turkeys don’t vote for Thanksgiving.
Here is the article in its entirety:
Setting efficiency goals for data centers
For the past decade, we have been working to make our data centers as efficient as possible; we now use less than half the energy to run our data centers than the industry average. In the open letter below, I am very happy to welcome a group of industry leaders who collectively represent most of the world’s most advanced data center operators. -Urs Hoelzle, SVP, Operations and Google Fellow
Recently, the American Society of Heating, Refrigerating and Air-Conditioning Engineers (ASHRAE) added data centers to their building efficiency standard, ASHRAE Standard 90.1. This standard defines the energy efficiency for most types of buildings in America and is often incorporated into building codes across the country.
Data centers are among the fastest-growing users of energy, according to an EPA report, and most data centers have historically been designed and operated without regard to energy efficiency (for details, see this 2009 EPA Energy Star survey). Thus, setting efficiency standards for data centers is important, and we welcome this step.
We believe that for data centers, where the energy used to perform a function (e.g., cooling) is easily measured, efficiency standards should be performance-based, not prescriptive. In other words, the standard should set the required efficiency without prescribing the specific technologies to accomplish that goal. That’s how many efficiency standards work; for example, fuel efficiency standards for cars specify how much gas a car can consume per mile of driving but not what engine to use. A performance-based standard for data centers can achieve the desired energy saving results while still enabling our industry to innovate and find new ways to improve our products.
Unfortunately, the proposed ASHRAE standard is far too prescriptive. Instead of setting a required level of efficiency for the cooling system as a whole, the standard dictates which types of cooling methods must be used. For example, the standard requires data centers to use economizers — systems that use ambient air for cooling. In many cases, economizers are a great way to cool a data center (in fact, many of our companies’ data centers use them extensively), but simply requiring their use doesn’t guarantee an efficient system, and they may not be the best choice. Future cooling methods may achieve the same or better results without the use of economizers altogether. An efficiency standard should not prohibit such innovation.
Thus, we believe that an overall data center-level cooling system efficiency standard needs to replace the proposed prescriptive approach to allow data center innovation to continue. The standard should set an aggressive target for the maximum amount of energy used by a data center for overhead functions like cooling. In fact, a similar approach is already being adopted in the industry. In a recent statement, data center industry leaders agreed that Power Usage Effectiveness (PUE) is the preferred metric for measuring data center efficiency. And the EPA Energy Star program already uses this method for data centers. As leaders in the data center industry, we are committed to aggressive energy efficiency improvements, but we need standards that let us continue to innovate while meeting (and, hopefully, exceeding) a baseline efficiency requirement set by the ASHRAE standard.
Chris Crosby, Senior Vice President, Digital Realty Trust
Hossein Fateh, President and Chief Executive Officer, Dupont Fabros Technology
James Hamilton, Vice President and Distinguished Engineer, Amazon
Urs Hoelzle, Senior Vice President, Operations and Google Fellow, Google
Mike Manos, Vice President, Service Operations, Nokia
Kevin Timmons, General Manager, Datacenter Services, Microsoft
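PUE, the metric the letter endorses, is the ratio of total facility energy to the energy delivered to IT equipment, so a value of 1.0 would mean zero overhead for cooling, power distribution and lighting. A minimal sketch of the calculation; the function names and the meter readings are hypothetical.

```python
def pue(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Power Usage Effectiveness: total facility energy / IT equipment energy."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    if total_facility_kwh < it_equipment_kwh:
        raise ValueError("total facility energy includes the IT load, so it cannot be smaller")
    return total_facility_kwh / it_equipment_kwh


def overhead_kwh(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Energy spent on cooling, power distribution losses, lighting and other overheads."""
    return total_facility_kwh - it_equipment_kwh


# Hypothetical annual meter readings for one facility.
total, it_load = 1_800_000.0, 1_000_000.0
print(f"PUE = {pue(total, it_load):.2f}, overhead = {overhead_kwh(total, it_load):,.0f} kWh")
```

Because PUE only constrains the ratio, it says nothing about how the overhead is reduced, which is exactly the performance-based, technology-neutral property the letter argues for.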
I have spent the last 12 months working with some of the smartest and best-funded marketing people on the planet, at the really big, household-name IT infrastructure vendors. I learned a lot: a lot about marketing, a lot about honest analysis and a lot about human nature. I learned that marketing isn’t that tough to do, but it can be terribly hard to do it well and be consistently effective.
Boiled down to its basics, marketing is about understanding a business problem really well whilst at the same time being paranoid that you are completely wrong about that business problem in every respect. After you get the paranoia right, everything else is just process. Sure, there is a ton of creative work that needs to get done to do it well, but if you have smart people and a pile of cash that is never a problem.
So step one in our marketing 101 lesson is understanding the problem. Marketing people sometimes call this Getting to the Insight, and you have to get it right. Get it wrong and everything else you do is likely to be completely useless and may even be counterproductive.
Once you get the Insight you need to work out who has the problem. Is it SMEs, Enterprises, Startups, the Medical Profession? Marketing people call that the Segment and the process is called segmentation.
Now that we have the Insight and the Segment, we need to work out a set of solutions that solve the problem for each segment. Notice I said solutions, not solution. We need solutions because each segment may need a different solution to the same problem.
We are almost there. Now we need a set of Messages that explain and position the Solutions to the Segments we identified earlier.
Insight -> Segment -> Solutions -> Messages
So this is what I have been doing at ESG: helping marketing departments understand the problem, that is, get to the Insight that will make the vendor successful. I have also been trying my hardest to disprove insights, often with help from research and polls. Generally, as long as you can’t disprove an insight, it’s OK. Strangely enough, it is usually almost impossible to prove insights, because no one ever has complete market visibility.
These insights need to be backed up with research and customer validation; we need to be paranoid. Insights are nebulous and time-bound. What might have been an amazing insight at one time won’t stay that way forever – once the problem is solved for the relevant segments, the insight goes away. Challenging the insights that vendors rely on to make major product investments is important, and they need to be verified constantly. IT insights can decay or morph very quickly, much more quickly than those for fast-moving consumer goods, though not quite as quickly as those for fashion and apparel.
Understanding what customers think today about IT products and services is crucial, but even more important is being right about how they will be thinking next quarter, next year, next decade. If we understand how insights change over time, we can adjust our segmentation, alter our solutions and correct our messages. If we fail to foresee the changes, we fail to correct our messages, and we end up with a misalignment between customer brand perception and marketing department messaging.
The IT business moves so fast that this is often a fatal mistake.
Picture Copyright (c) toothpastefordinner.com
The problem with computing is everyone wants to make it uniform – fit it into a neat box, categorise it as ‘all the same’, make it autonomic, self-managing and move on. In fact, IT is anything but uniform, so these simplistic approaches fall at the first hurdle.
Smart CIOs understand applications need to be treated differently depending on their value to the business. There are four key types – invest, operate, contain and kill.
‘Invest’ applications generally make up around 10-15% of the full estate. These are the applications the CEO knows about – the ones that when they get better, faster or more functional have a direct impact on business value. Examples are key CRM systems or MRP platforms, applications that underpin vital business processes and touch customers. These applications are not terribly cost sensitive, so when CTOs look to virtualise to take out cost, CIOs resist. Virtualisation is useful for invest applications but only if it improves agility, speed of deployment, adds functionality or reduces risk.
‘Operate’ applications represent 40-80% of the estate. No matter how much better we make them, they don’t improve business performance. Examples might be email or document management, internal HR systems or archiving systems. They need to be reliable and cheap. Virtualisation works here as a method of taking out costs. So does outsourcing and software as a service (SaaS) delivery.
‘Contain’ applications are those we wish we didn’t have – old stuff that’s expensive to run and difficult to change or manage. We get the amount of these we deserve: under-invest and the category grows. They have one other characteristic: they are difficult to re-platform and change to ‘operate’ status. Although they’re important to the organisation, they don’t typically make the business run any better if we improve them. We just want them to run silently for as long and as cheaply as possible. ‘Operate’ applications that have not had proper investment, love and attention will eventually move into this category.
‘Kill’ applications are always a nightmare. These are the ones that are impossibly expensive to run and maintain. By definition, they only represent a tiny handful of the estate (perhaps 1-5%). They are impossibly difficult to change. Often the guys who wrote and maintained the code are retired (or dead). No one else you know still has the hardware, except the Natural History Museum, and the vendor no longer supports the operating system. These might have been ‘contain’ applications that just wouldn’t stay contained, or ‘invest’ applications where you didn’t invest (silly you). There is only one thing to do with a ‘kill’ application – bin it. You know it’ll cause pain and disruption, as well as costing a lot of money, but it has to be done.
Smart CIOs know this already and take a pragmatic approach to their applications, understanding instinctively where to spend money and where to bleed a previous investment. And really smart CIOs never reach the point where they need a kill category.
I wanted to share with you my latest news. A little over a year ago Steve Duplessie and I created ESG EMEA to help reach out and serve new European clients as well as provide local support for many of our ESG US based relationships. We have been immensely successful in building this business, working with clients at senior levels to help them better understand the market and target their products and services into the right segments armed with the right message and solutions. Late last year, ESG EMEA and I had the most unexpected honor of being named as one of the top 50 most influential IT analysts in the world.
I have been offered a fantastic opportunity to lead an enormous IT undertaking with the Qatar Foundation in Doha to create a visionary 21st Century cloud IT operation for the Middle East and North Africa. I will be leading that effort directly, and as a result I will be unavailable to take briefings or to provide direct services to clients for the foreseeable future. Please continue to leverage the expertise and talents of the other fine ESG analysts in this regard.
I will continue to keep you updated on our progress, challenges, and thoughts on data centre and IT operations via the Hot Aisle and on the ESG site.
There are a lot of them around: Data Centres. A few of them are designed and operated very well and deliver a great Power Usage Effectiveness (PUE). Some could do a bit better, perhaps with an airside economiser or two, some hot or cold aisle containment, or maybe some DC power. Some are just a nightmare and could benefit from the administration of a wrecking ball. For some data centres, it seems that no amount of fixing them up, improving plant and applying best practice will make any measurable difference. Let’s call them clunker data centres! (Maybe we can get the Government to do a cash for clunkers program for data centres?)
A clunker starts off with a ceiling height that is too low for hot air to separate out and migrate towards the CRAC units without too much mixing. The plenum under the raised floor is shallow and clogged up with cables and other detritus, choking off airflow from the CRACs. The perforated floor tiles have low airflow characteristics. The cabinets are all lined up like desks in a schoolroom: front to back, front to back… You could cook turkeys in the back row. The CRAC units are low capacity and that capacity is exhausted. Naturally the boss wants you to install some 10kW racks in a hurry for a critical business project.
What can you do? Say “no way”? Offer a co-location option in a commercial facility as an option? Start looking for a new job?
I bumped into a possible solution a few days back on Twitter when I connected with Mary Hecht-Kissell (@PR_Strategies), who looks after Coolcentric. The problem set defined above, which makes a clunker data centre, is all about getting enough cold air into the servers to remove the excess heat. Every element in the clunker conspires to make delivering more cold air virtually impossible. That’s where the Coolcentric solution makes a difference: it delivers cold water right up against the servers. It adds cooling capacity that enables that set of additional 10kW (or more) racks to be installed in a data centre that seemed like a lost cause. It’s a fairly simple piece of technology that has been well engineered to be retrofitted to most types of existing cabinets. It’s a water cooled door.
The water cooled door is fitted onto the back of the rack so that the hot air exhausting out of the cabinet is chilled immediately and very efficiently. Per unit volume, water can absorb roughly 4,000 times as much heat as air, so these water cooled doors can remove significantly more heat with very low pumping energy.
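The 4,000-fold figure is consistent with comparing how much heat each fluid absorbs per unit volume per degree of temperature rise. A back-of-envelope sketch, assuming rounded textbook properties near room temperature and hypothetical temperature rises of 10 K for air and 5 K for water across a 10kW rack; with these values the ratio comes out at roughly 3,500, the same order as the quoted figure.

```python
# Rounded textbook properties near room temperature (assumed values).
WATER_DENSITY, WATER_CP = 1000.0, 4.18   # kg/m^3, kJ/(kg*K)
AIR_DENSITY, AIR_CP = 1.2, 1.005         # kg/m^3, kJ/(kg*K)


def volumetric_heat_capacity(density: float, cp: float) -> float:
    """Heat absorbed per cubic metre per kelvin of temperature rise, in kJ/(m^3*K)."""
    return density * cp


def flow_needed(load_kw: float, delta_t_k: float, density: float, cp: float) -> float:
    """Volumetric flow (m^3/s) that removes `load_kw` (kJ/s) with a `delta_t_k` rise."""
    return load_kw / (volumetric_heat_capacity(density, cp) * delta_t_k)


ratio = volumetric_heat_capacity(WATER_DENSITY, WATER_CP) / volumetric_heat_capacity(AIR_DENSITY, AIR_CP)
air_flow = flow_needed(10.0, 10.0, AIR_DENSITY, AIR_CP)        # hypothetical 10 K air rise
water_flow = flow_needed(10.0, 5.0, WATER_DENSITY, WATER_CP)   # hypothetical 5 K water rise
print(f"water/air heat capacity per volume: {ratio:,.0f}x")
print(f"10 kW rack: {air_flow:.2f} m^3/s of air vs {water_flow * 1000:.2f} litres/s of water")
```

Moving well under a litre of water per second instead of most of a cubic metre of air per second is where the low pumping energy comes from.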
One smart way to think about it is that the water cooled door acts like a mini, contained hot aisle for environments (like our clunker data centre) where cabinet alignment, roof height and plenum problems make hot aisle containment impossible.
Sounds like a pretty decent alternative to Semtex!
Cloudera struck lucky in getting a $5M A round away just before the markets shut down in response to the collapse of the global financial system. Backed by Accel Partners and, more recently, Greylock Partners, the company is betting that Hadoop, with its smart scale-out approach to managing large amounts of data, is a winning strategy.
Mike Olson, Cloudera’s CEO, is an industry veteran, having been through the normal form, build and sell cycle a number of times, most notably with Illustra into Informix and Sleepycat Software into Oracle Corp. During the three quarters between the A round and a subsequent B round, Mike has been able to build a credible and valuable company that adds significantly to the Apache Hadoop distribution, without funded competition.
In that time Cloudera has grown into a 30-person firm, built the Cloudera Distribution for Hadoop, established a support and professional services capability and created a training and certification business. Mike also made the very smart move of recruiting Doug Cutting, the original author of Hadoop, from Yahoo.
Hadoop’s initial use case has been in managing and processing Internet-scale web data at Yahoo and Facebook, but it is now seeing significant levels of interest in other markets where processing large-scale data in real time is a competitive advantage. These include financial services, government security, credit card fraud detection, genomics, digital media (3D) and national-scale telecommunications firms.
Mike sees the development of Hadoop as a platform as the key area for now, with enhancements to add enterprise level management capabilities, “to avoid the need to have a team of Stanford graduates to hand crank the system” he told me. To address just this issue, they have developed the Cloudera Desktop that enables the centralised management of internal and public Hadoop clusters.
Later, Mike expects to see ISV’s delivering enterprise solutions based on a Hadoop platform that enable hitherto impossible feats of analytic prowess. Early enterprise adopters in this space are likely to be able to leapfrog the competition with smarter decisions and more insightful products and services that serve customers better.
Nice guy, smart company, killer product.
Recently I spoke to David Emery, a friend and colleague from my time at Coopers & Lybrand. He is now working on a major social media initiative for a global mobile telco. I was interested in David’s perspective as he has been working on a set of solutions to process log files at enormous scale. You might think this is a somewhat trivial use case, but many modern business processes running at scale generate impossibly large quantities of data that need to be turned into information.
David and his colleagues have been using a number of open source components to tackle the problem that scale-up won’t scale far enough, leveraging cheap compute and storage plus smart software and algorithms to deliver a solution. I think David makes a number of important points that vendors would be well advised to heed:
- Massive Internet scale problems are now solvable and enterprises want to mine the data to generate business information
- The value of the whole solution is enormous but the sheer scale can make it unaffordable
- Open source software and scale out commodity hardware are one possible solution to scale and affordability
- Smart techniques or approaches like Hadoop and MapReduce are now becoming commonly used tools
Here is David’s story:
“Demand for storage capacity continues unabated, rising along an exponential growth curve (Kryder’s Law) that has challenged vendors to squeeze more bang per buck into SAN, NAS and a whole array (pun intended) of predominantly vertically scaled enterprise-class storage solutions.
Improvements and innovations over the years, in the form of cramming more ‘bits’ per inch onto a hard disk (magnetic bit density), RAID configurations and fibre optic connectivity, have given us ever faster, larger and more resilient storage solutions that we quickly fill and consume. This demand is unlikely to diminish as applications and datasets become ever more enormous and sophisticated.
It’s not only supercomputing applications that are driving huge demand. The Large Hadron Collider may be an extreme example, currently generating 2GB of data per 10 seconds of use, but there are many less esoteric applications demanding huge volumes of storage: think genome, DNA and RNA analysis, pharmaceutical research, financial modelling, Internet search, email and Web 2.0 social networking sites.
The latter examples seem less obvious until you consider the sheer number of users: Facebook recently surpassed 400m customer accounts. It’s no surprise then, that the leading internet companies have taken a different approach to increasing storage demands rather than solely relying on the bottom up vertical approach of the traditional storage vendors.
Google and Yahoo have been key players in the development of distributed storage and analysis efforts (where there is data there is a demand to analyse and report on that data) that have yielded, amongst others, Hadoop, MapReduce, HDFS (Hadoop Distributed File System) and GFS (Google File System).
In the massive scale out architectures required to drive the Google Search and Facebook web applications of the world, horizontal scale-out is king. Tiered architectures remain valid, but they are increasingly underpinned by free open source software.
It’s not only web start-ups that grew from small beginnings into large corporates that have embraced the free software stack (Apache, Linux, Squid, MySQL, Perl, Python, Nagios etc.) to support expansion whilst avoiding crippling licensing costs; small and large enterprises alike have joined the bandwagon as many of the barriers to entry have become irrelevant.
Product stability, maturity, widespread adoption and readily available support have mitigated many of the perceived risks. The architecture scales, the software works, and it can all be built on a foundation of cheap commodity servers. Virtualization and cloud computing have only reinforced this trend, and infrastructure is increasingly provided as a service (IaaS), where the bare metal platform is entirely abstracted and increasingly irrelevant.
The distributed architecture and horizontal scale-out approach is now beginning to shake up the storage and database tier and, therefore, the storage marketplace. Customers want massive capacity, reliability and good performance, but they also want to avoid vendor lock-in and large upfront investment costs. They also want more effective ways to process such huge volumes of data.
Distributed file systems and distributed compute processing make all of this possible. An emerging sector with players such as GlusterFS, Lustre and Ibrix has grown up, and the traditional storage vendors are shoring up their product ranges with similar solutions. HP bought Ibrix, whilst Gluster is taking the open source route, monetised through services.
Log file collection and processing provide a highly relevant, if more mundane, example of how these building blocks can be pulled together to form an innovative and cost-effective solution that grows as customer demands increase. In the infrastructure behind a web-based service with just under two million users, I’ve recently seen systems generate over 100GB of log file data per day.
Historically, collecting and storing such data has often been overlooked or poorly implemented, if done at all. It is often seen as a costly process of limited use (typically because the value in the data is widely spread out and cannot easily be retrieved in a meaningful way), and it ultimately becomes little more than a burdensome risk, retention and compliance requirement for many organisations.
Much of the data that is kept ends up on tape gathering dust. How can a customer expect to grow their service from two million users, to five and then twenty and beyond without crippling storage costs, let alone handle such large volumes of log file data and do something useful with it?
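A back-of-the-envelope projection helps frame that question. The sketch below assumes, purely for illustration, that log volume grows linearly with the user count, anchored on the roughly 100GB/day seen at just under two million users (rounded to two million here). Real services may well grow faster or slower than linearly.

```python
# Sketch: project log volume as the user base grows.
# Assumption (not from the original text): volume scales linearly with users,
# anchored on ~100 GB/day at ~2 million users. Decimal units: 1 TB = 1000 GB.

BASELINE_USERS = 2_000_000
BASELINE_GB_PER_DAY = 100.0


def daily_log_gb(users: int) -> float:
    """Estimated log volume per day in GB under the linear-scaling assumption."""
    return BASELINE_GB_PER_DAY * users / BASELINE_USERS


def yearly_log_tb(users: int) -> float:
    """Estimated log volume per year in TB."""
    return daily_log_gb(users) * 365 / 1000


if __name__ == "__main__":
    for users in (2_000_000, 5_000_000, 20_000_000):
        print(f"{users:>11,} users: {daily_log_gb(users):7.1f} GB/day, "
              f"{yearly_log_tb(users):6.1f} TB/year")
```

Even under this simple assumption, twenty million users would mean about 1TB of logs a day and 365TB a year, which is the scale at which tape-and-forget stops being a plan.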
A storage platform fronted by a Distributed File System provides one possible answer. The DFS can be built upon multiple nodes running on cheap commodity hardware. More nodes can be added as required, the underlying hardware can be changed, and the system can comprise many different nodes running on different platforms. The DFS provides the clustering, reliability and scale-out storage architecture under a single namespace, accessible via any number of standard protocols, e.g. CIFS, NFS, HTTP, iSCSI. What’s more, a multiple-node system can provide readily available processing power, suitable for MapReduce-type applications. Of course, an alternative is to stick with large-scale vendor-specific storage platforms, where cost is reduced through economies of scale and risk is somewhat mitigated at the expense of lock-in.
A similar DFS approach has been successfully implemented by MailTrust (Rackspace’s mail division) to capture, collate and process huge volumes of daily log files using Syslog, Hadoop and MySQL. This may be ‘just’ log files, but the power of the data can be harnessed to improve support operations and identify trends.
Of course, this is possible with traditional tools and storage, but the key here is scale and affordability. I’ve recently seen other companies looking to build similar distributed storage platforms that will also form the backbone of a private storage cloud, fronted by Eucalyptus software. Again, the whole architecture can be built from open source software running on cheap commodity hardware.
It is software and open standards that are increasingly enabling organisations to build massive Internet web services requiring massive storage. The database and storage layers remain the last vertical bottleneck, but this is changing. SAN and NAS technology will not disappear; consumption will probably continue to grow (in line with Kryder’s Law), but DFS and greater flexibility are here to stay.
The success of companies such as Gluster, and the wider adoption of HDFS and Google FS, will be key to how many customers move, and to what extent, from hardware-specific storage platforms provided by the likes of HP, IBM and NetApp to more open-standards-based solutions that do not require proprietary hardware. The same vendors will be providing much of the commodity storage anyway, but it will be interesting to watch how the larger vendors respond.”
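The MailTrust jobs themselves aren’t described here, so the following is only an illustration of the MapReduce shape such a log job takes, written in plain Python rather than against Hadoop’s real APIs. The log line format (`timestamp host level message`) and the choice to count ERROR lines per host are assumptions made for the example.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, Iterator

# Hypothetical log format (an assumption): "<timestamp> <host> <level> <message...>"


def map_line(line: str) -> Iterator[tuple[str, int]]:
    """Map phase: emit (host, 1) for every ERROR-level line."""
    parts = line.split(maxsplit=3)
    if len(parts) >= 3 and parts[2] == "ERROR":
        yield parts[1], 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle phase: group intermediate values by key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(groups: dict[str, list[int]]) -> dict[str, int]:
    """Reduce phase: sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def errors_per_host(lines: Iterable[str]) -> dict[str, int]:
    pairs = (pair for line in lines for pair in map_line(line))
    return reduce_counts(shuffle(pairs))


sample = [
    "2010-03-01T10:00:01 mx01 INFO delivered message",
    "2010-03-01T10:00:02 mx02 ERROR mailbox over quota",
    "2010-03-01T10:00:03 mx01 ERROR connection refused",
    "2010-03-01T10:00:04 mx02 ERROR mailbox over quota",
]
print(errors_per_host(sample))  # {'mx02': 2, 'mx01': 1}
```

In a real cluster the map calls would run on the nodes holding each chunk of log data and the shuffle would move intermediate pairs across the network; the per-function structure stays the same.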
A while back I met Kathrin Winkler, Chief Sustainability Officer at EMC. She was delivering a briefing about EMC’s Corporate Social Responsibility (CSR) activities to a group of industry analysts. Most CSR briefings are as dull as ditchwater and devoid of anything remotely innovative or challenging. CSR is for some just going through the motions rather than an integral part of the brand, culture and values of a company. CSR needs to enhance brand equity or else it becomes an irrelevance that has no place at the boardroom table.
Kathrin broke the mould, presenting a structured program of activities that spans every part of EMC, from sourcing, manufacturing and logistics through to disposal of equipment. She demonstrated to me that CSR is deeply embedded in EMC’s DNA, part of every business process, integrated into EMC’s brand and sponsored at board level.
It is no surprise to learn that today (Tuesday 23rd February 2010) Kathrin was asked to present to Senator John Kerry’s (D-Mass.) US Senate Commerce Subcommittee on Communications, Technology, and the Internet, on the relationship between energy efficiency and technological innovation.
The hearing explored how expanding broadband, strengthening smart grid technologies, and improving consumer understanding of their energy usage can lead to dramatic energy savings and reductions in greenhouse gas emissions. It also addressed how firms in the information and communications sectors are driving change and how government as consumer and regulator can help drive incentives to innovate.
Here are the insights that Kathrin put to the hearing as actions that Congress can take to help reduce the impact of ICT on the environment:
1. Demand the Federal Government lead by example to drive energy-efficiency throughout its ICT enterprise by aggressively pursuing virtualization, and ICT/data center consolidation. Congress, through its various Committees, has oversight responsibility for the largest ICT infrastructure in the world; the President’s FY 2011 budget requests $79.3 Billion for information technology. OMB included in the FY 2011 budget a plan to drive ICT consolidation: “OMB will work with agencies to develop a Government-wide strategy and agency plans to reduce the number and cost of Federal data centers. This will reduce energy consumption, space usage and environmental impacts, while increasing the utilization and efficiency of IT assets…” Congress should request and review these strategic plans as part of the annual appropriation process and provide the resources necessary to accelerate OMB’s ICT consolidation plans.
2. Bridge split financial incentives in federal data centers – In many government data centers, those responsible for purchasing and operating the ICT equipment report to the CIO while those responsible for the power and cooling infrastructure typically pay the utility bills. This leads to a split incentive, in which those who are most able to control the energy use of the ICT equipment (and therefore the data center) have little incentive to do so or even insight into their own usage. This could be remedied by Congress requiring that agency CIOs report on data center energy consumption and provide a baseline to Congress for future comparison.
3. Continued investment in cloud computing and next-generation ICT research at NIST – Government has become an early adopter of cloud computing. As with the deployment of other promising technologies like smart grid and electronic health records, cloud computing will not be fully realized without open interoperability, data portability, and security standards. Congress should fully fund NIST’s Cloud Computing Standards Effort.
4. Collaborate with industry to promote the development of measurement tools for government and private sector data center operators – Industry continues to struggle to develop acceptable models to measure data center efficiency. Without reliable efficiency methodologies on which to base rebate programs, it is difficult and expensive for utilities to conduct tests themselves and many simply forgo rebate programs. With an estimated 1200 regulated utility service areas in the United States, there is tremendous potential for replication of successful programs. With Energy Efficiency Resource Standards mandates in more than 19 states, Congress should assist in providing useful measurement tools for the state PUCs to incentivize energy conservation in data centers.
Kathrin is 100% right: the key is ensuring that the artificial economics of government, which hide the cost of power from the cost of IT, are ended and replaced with the realistic economics of full end-to-end lifetime costs, including operation and disposal. Private business could learn a lot from the same approach and put an end to the crazy policy where facilities pay the utility bill and IT buys the equipment.
Data Centre design is an evolutionary process and we can see the first signs of significant change in the latest sites. Co-generation, liquid cooling, cloud computing and high density are all likely to feature in the 2020 Data Centre. How are you placed with your existing Data Centre investments to take advantage of these changes? Will 20th Century Data Centres have to close because they just can’t deliver the level of efficiency that government legislation and economics demand?
Join Steve O’Donnell for a live data centre summit on the 21st April 2010 at 13:00 GMT, 8:00 EST.
Most people who read this won’t have a clue what a Hollerith punched card is. I only just caught the end of the era at University where I learned to program in FORTRAN coding one punched card at a time. Once the stack of cards was complete, I delivered it to the computer operator for scheduling and execution.
Jobs were scheduled one at a time because that is how the primitive Burroughs scheduler and operating system was designed. Running more than one program at a time was still a pipe dream in those days, so hardware engineers focused on making programs run faster by scaling up the hardware: faster processors, faster IO, more main memory, and more powerful instruction sets that did more in fewer clock cycles.
This propensity to scale up, making computers more and more powerful and IO faster and faster, has been at the center of the whole industry for decades: an arms race for more clock cycles. Intel co-founder Gordon Moore’s observation that transistor counts double roughly every two years, now known as Moore’s Law, has underpinned the rapid and continuous improvements in processor performance we have seen over the last 40 years.
The same is true for networking. Token Ring networks ran at 4Mb/s and Ethernet at 10Mb/s in the early days of the LAN; now 10Gb/s is the norm for new installations, a three orders of magnitude improvement in 20 years.
Storage systems have also shown massive performance improvements, with systems like Oracle’s Exadata offering 1M IOPS. Database technology has also seen massive performance improvements, driven in part by smart data design and great database technology. The performance levels we see today in these scale-up systems would have been unimaginable only a few decades ago.
I remember, in 1975, Donald Michie, Professor of the Machine Intelligence and Perception unit at Edinburgh University, proving mathematically that we would never see a computer beat a grand master at chess within our lifetimes. The problem was too big to solve with current technology, and the rate of performance growth required to beat a grand master was, he told us, just unbelievable.
Yet the unbelievable levels of performance we see today are still not enough for the largest Internet-scale tasks, such as hosting Twitter, Facebook or LinkedIn or managing the search indexes at Yahoo or Google. Scale up just doesn’t scale up enough. None of these Internet-scale enterprises use scale-up technology any more. They scale out at every level: compute, storage, network, application architecture and even the database.
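To put the three-orders-of-magnitude claim into annual terms, a short compound-growth calculation helps. The 10Mb/s to 10Gb/s jump over 20 years comes from the text; the helper names are just for illustration.

```python
import math


def compound_annual_growth(factor: float, years: float) -> float:
    """Annual growth rate r such that (1 + r) ** years == factor."""
    return factor ** (1 / years) - 1


def doubling_time(annual_rate: float) -> float:
    """Years needed to double at a constant compound annual growth rate."""
    return math.log(2) / math.log(1 + annual_rate)


# Ethernet: 10 Mb/s -> 10 Gb/s, a 1000x improvement over roughly 20 years.
rate = compound_annual_growth(10_000 / 10, 20)
print(f"{rate:.1%} per year, doubling every {doubling_time(rate):.1f} years")
```

A thousandfold gain in 20 years works out at roughly 41% a year, or a doubling about every two years: the same cadence usually quoted for Moore’s Law.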
Scale out applications are becoming more common with developers adopting a MapReduce style approach to coding, where a master process splits the problem into a number of smaller parts and then farms them out to a large number of processes that derive the answer. The master process then combines the answers to deliver a single consolidated output. For the largest scale computational problems this is often the only way to get to the answer in a meaningful timescale.
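A minimal sketch of the master/worker pattern just described: the master splits the input, farms the chunks out, then combines the partial answers. The sum-of-squares task and the function names are illustrative assumptions. Threads stand in for separate machines here; in CPython they will not speed up CPU-bound work, whereas frameworks such as Hadoop spread the work across processes on many nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data: list[int], parts: int) -> list[list[int]]:
    """Master: split the input into roughly equal chunks."""
    size = max(1, -(-len(data) // parts))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def worker(chunk: list[int]) -> int:
    """Worker: derive a partial answer for one chunk (here, a sum of squares)."""
    return sum(x * x for x in chunk)


def master(data: list[int], workers: int = 4) -> int:
    """Split, farm out to workers, then combine the partial answers."""
    chunks = split(data, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(worker, chunks))
    return sum(partials)  # combine into a single consolidated output


print(master(list(range(1, 101))))  # 338350
```

The combine step only works this simply because addition is associative; problems whose partial answers can’t be merged cheaply are much harder to scale out.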
Scale out compute is now commonplace, with any number of hypervisor technologies (VMware, Xen, KVM, Hyper-V) supported by a cloud operating system to handle virtualisation and load balancing.
Scale out storage is also a growth industry, with products like HP’s X9000 (IBRIX) and IBM’s XIV gaining traction in the market. Object storage is also gaining popularity, with URI or HTTP protocols becoming commonplace on any number of offerings such as Amazon’s S3. Open source file systems such as Apache Hadoop’s HDFS add the further feature of understanding the location of the data, so that compute and storage elements can be closely co-located to reduce network latency and end-to-end bandwidth demands.
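The idea of co-locating compute with data can be shown with a toy scheduler that prefers to run each task on a node already holding a replica of its data block. This is loosely modelled on data-local scheduling in Hadoop but greatly simplified; the function, the block map and the slot counts are all hypothetical.

```python
from __future__ import annotations


def schedule(
    block_locations: dict[str, list[str]],
    free_slots: dict[str, int],
) -> dict[str, tuple[str, bool]]:
    """Assign one task per block, preferring a node that already holds a replica.

    Returns {block: (node, is_data_local)}.
    """
    slots = dict(free_slots)  # copy so the caller's dict is not modified
    assignment: dict[str, tuple[str, bool]] = {}
    for block, replicas in block_locations.items():
        local = [node for node in replicas if slots.get(node, 0) > 0]
        if local:
            node, is_local = local[0], True
        else:
            # No replica holder has capacity: run elsewhere and pull the
            # block over the network instead.
            remote = [node for node, free in slots.items() if free > 0]
            if not remote:
                raise RuntimeError(f"no free slot for block {block}")
            node, is_local = remote[0], False
        slots[node] -= 1
        assignment[block] = (node, is_local)
    return assignment


blocks = {"b1": ["n1", "n2"], "b2": ["n2", "n3"], "b3": ["n1", "n3"]}
free = {"n1": 1, "n2": 1, "n3": 0, "n4": 1}
print(schedule(blocks, free))
```

Here b1 and b2 run where their data already lives, while b3 has to fall back to n4 and read its block remotely, which is exactly the network traffic locality-aware placement tries to avoid.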
Scale out networking follows the logic that most network traffic in a scale out world is edge to edge so why bother with a core network? Converge on 10G lossless Ethernet using top of rack switches supporting iSCSI, NAS and HTTP protocols to converge the SAN and LAN into a common routable IP system.
Scale out databases are now commonly referred to as NoSQL databases. They go back in time to pre-relational designs that do not provide ACID guarantees (atomicity, consistency, isolation, durability), but they allow sharding to split data sets over multiple systems and so improve the parallelism of the overall system.
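Sharding can be illustrated with a minimal hash-based sketch. It is one common way to split a data set across systems, not how any particular NoSQL product works, and the class and function names are hypothetical. Note that simple modulo placement moves most keys when the shard count changes, which is why many real systems use consistent hashing instead.

```python
from __future__ import annotations

import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Pick a shard from a stable hash of the key.

    hashlib is used rather than the built-in hash(), which is salted per
    process for strings and would route keys differently after a restart.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedStore:
    """Toy key-value store whose data is split across independent shards."""

    def __init__(self, num_shards: int) -> None:
        self.shards: list[dict[str, object]] = [{} for _ in range(num_shards)]

    def put(self, key: str, value: object) -> None:
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key: str) -> object | None:
        return self.shards[shard_for(key, len(self.shards))].get(key)


store = ShardedStore(num_shards=4)
for i in range(1000):
    store.put(f"user:{i}", {"id": i})
print([len(shard) for shard in store.shards])
print(store.get("user:42"))
```

The per-shard counts should come out roughly even, and because each shard is independent, reads and writes for different keys can be served by different machines in parallel.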
The legacy of the punched card is still with us because Information Technology is an evolutionary process. Scale up approaches continue to support the evolution, but one day the dinosaurs will die out.
If you have been following the storage business for a while, you will have noticed a few changes:
- Introduction of Flash Memory components as Solid State Disks
- Serial ATA (SATA) disks becoming popular and growing in capacity (2TB soon)
There are lots of other disk technologies around, like Fibre Channel and SAS, but SSD and SATA are getting the big press and are taking market share. You might ask why. Disks in a data centre use power day and night, 365 days a year. A typical disk (Seagate Cheetah 15K.4 147GB SCSI) uses about 18W. In a data centre that means its lifetime (5 years) power consumption, including cooling and power protection (PUE 1.6), is likely to be 1.26 MWh. At 10c per kWh that equates to $126 per disk. So for 1PB of storage the lifetime cost of power will be about $860,000, not including capital plant.
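The arithmetic above can be checked with a few lines of Python. The 18W draw, five-year life, PUE of 1.6 and 10c/kWh tariff come from the text; treating 1PB as 10^15 bytes and ignoring RAID overhead and spares are my assumptions.

```python
import math

HOURS_PER_YEAR = 24 * 365


def lifetime_energy_kwh(watts: float, years: float = 5, pue: float = 1.6) -> float:
    """Facility energy for one device: IT draw scaled by PUE to cover cooling and power protection."""
    return watts * HOURS_PER_YEAR * years * pue / 1000


def lifetime_power_cost(watts: float, years: float = 5, pue: float = 1.6,
                        usd_per_kwh: float = 0.10) -> float:
    """Lifetime electricity cost in USD for one device."""
    return lifetime_energy_kwh(watts, years, pue) * usd_per_kwh


per_disk_kwh = lifetime_energy_kwh(18)   # 18 W Cheetah 15K.4
per_disk_usd = lifetime_power_cost(18)
disks_per_pb = math.ceil(1e15 / 147e9)   # decimal units, no RAID or spares
print(f"{per_disk_kwh / 1000:.2f} MWh and ${per_disk_usd:.0f} per disk")
print(f"{disks_per_pb} disks, ${disks_per_pb * per_disk_usd:,.0f} per PB")
```

That gives 6,803 disks and about $858,000 per petabyte, consistent with the rounded $860,000 figure; any RAID or hot-spare overhead would push it higher.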
So getting the power that disks use down to a reasonable level is important. The formula that engineers quote for power consumption is:
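$$\text{Power} \propto \text{Diameter}^{4.6} \times \text{RPM}^{2.8}$$
So if we used large physical disks, as in the old days when 8″ and 14″ were common disk formats, a 14″ disk would need 7717 times more power than a smaller 2″ one spinning at the same speed, since (14/2)^4.6 ≈ 7717.
[Table: diameter to the power 4.6 and ratio to a 2″ disk, for each disk size]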
So the world is moving to smaller and smaller disks to reduce power demand, reduce heat output and deliver increased densities.
Spin speed has a similar impact, so low spin speed disks use a lot less power than their high-speed equivalents.
[Table: spin speed to the power 2.8 and ratio to a 5400 RPM disk, for each spin speed]
Slow spin speed, small disks use less power than larger high spin speed disks.
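The proportionality rule can be turned into a small calculator. The exponents 4.6 and 2.8 come from the formula quoted above; the function itself is an illustrative helper, and like the rule it only gives ratios between disks, not absolute wattages.

```python
def relative_power(diameter_a: float, rpm_a: float,
                   diameter_b: float, rpm_b: float,
                   d_exp: float = 4.6, rpm_exp: float = 2.8) -> float:
    """How many times more power disk A needs than disk B under P ∝ D^4.6 × RPM^2.8."""
    return (diameter_a / diameter_b) ** d_exp * (rpm_a / rpm_b) ** rpm_exp


# Same spin speed, 14" versus 2" platters: about 7717x.
print(round(relative_power(14, 5400, 2, 5400)))
# Same diameter, 15,000 RPM versus 5,400 RPM: roughly 17.5x.
print(round(relative_power(2.5, 15000, 2.5, 5400), 1))
```

Because both terms multiply, shrinking the platter and slowing the spindle compound each other, which is the whole argument for small, slow capacity disks.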
As the price of SSD continues to drop, the high spin speed disks that we use for high IOPS solutions will increasingly be replaced with SSD, whilst capacity will be served by low spin speed SATA, migrating the storage world to Flash and Trash.
Inevitable and proved by the maths.
My old CIO at BT, Al-noor Ramji had a most delightful and endearing way of describing just how unimportant and disconnected IT Infrastructure is from reality by describing us as “The toilet cleaner’s toilet cleaners”. Like other successful CIOs Al-noor had the ability to cut through the noise and explain things as they are.
In every business, conversations start with the CEO, who is focussed on understanding customers’ needs and executing a plan to serve them better than the competition. This is where business value is created and strategic visions are formed. It is here that CEOs deploy capital to create business value, revenues and EBIT that eventually translate into shareholder returns. This is the engine room of capitalism and it is here that many businesses win or lose.
The strategy and vision typically filter down, layer by layer in the organisation, through marketing or product management, who work out how to combine products and services to deliver a compelling answer for the customer. It is also here that the first glimmerings of an idea for supporting IT services form, and it is here that the first washroom attendant, the CIO, is called in. Typically the CIO gets a briefing that she needs to deliver some changes to the CRM system and some new workflow for the call centre.
Already the layers of filtration have dulled the vision and strategy formed in the CEO’s office suite.
Usually the first that IT Infrastructure hears about this new initiative is when it is a few days away from deployment, too late even to introduce it into service properly. The washroom attendant’s washroom attendant is used in a purely reactive way, responding in real time to seemingly disconnected and random acts of violence to the IT estate.
We can all recognise this behaviour, repeated time and again in the largest enterprises and always leading to a sub-optimal outcome. As consumers, we have also experienced business initiatives where the CIO and IT Infrastructure were fully integrated into the initial conversations, enabling a competition-crushing solution to be deployed. Google has wiped out the 20th century advertising industry and outperformed its digital competition by being joined up. Apple took on the music industry with iTunes and now owns the space. There are many other examples, but unfortunately they are swamped by the normal, broken approach that delivers these frustratingly sub-optimal outcomes.
To engage in the initial conversation that forms strategy and vision, IT must become a trusted advisor that stops talking about IT return on investment and starts talking about business return on investment. Only when IT can’t help the CEO kill the competition does it need to be cheap and silent. Otherwise we serve our companies badly if we don’t speak up and become part of the initial conversation.
Being part of that conversation is all about delivering business agility, reduced cycle times for products and services, reduced business risk and more certain outcomes. These are all deliverables that IT was originally set up to provide.
We forget this lesson at our peril and will certainly be consigned to being cheap and silent forever.
Both Seagate and Western Digital announced Q2 results last week perhaps signalling a return of confidence in the disk drive channel. Component manufacturers in the enterprise IT channel are an interesting bellwether of market confidence as orders need to be placed in advance of shipments of finished goods. There is a significant delay in revenue recognition from disk drive component to populated storage controller. On this basis we should see some interesting positive results for the desktop, server and storage controller vendors in Q1 2010.
Revenue was up at both Western Digital (44% y/y) and Seagate (33.4% y/y), as were shipments: Seagate reported 36% growth and Western Digital 29%.
Around this time, analysts at ESG pull together a ten point list of predictions for the coming year. One of my areas of coverage and of expertise is in the Data Center around power, cooling, reliability and economics. So what’s different this year from prior years?
Strengthening fundamental drivers will likely make 2010 materially different from previous years for data centers. These drivers include continued increases in the cost of power, lack of investment in new general-purpose facilities during the recent economic crisis, and the continued drive for higher density implementations. Poor-quality data centers will become increasingly uncompetitive and costly to run. This development, combined with a lack of new capacity ready to come on-stream, will drive up costs significantly. Together with an accelerating economic recovery, these factors are going to make 2010 interesting.
The fixed (tiers) definition of what a data center should be has been losing relevance for some time. Over the course of 2010 it will become apparent that there are many valid alternative designs that can deliver service whilst remaining reliable and improving on operational and capital costs. A number of newer (or reintroduced) approaches will start to become important and gain market share. Among the trends I will be following in 2010:
- The gradual migration towards liquid cooling will commence with strong leadership from IBM with the launch of the Z11 mainframe with water cooled options. The massive efficiency benefit of liquids – being some 4,000 times more efficient than air at removing heat – will drive adoption for the highest density deployments such as HPC (high-performance computing) and mainframe first, followed by general purpose computing later.
- Conventional lead acid battery strings combined with UPS (uninterruptible power supply) will give way to flywheels for AC power protection implementations. Sustainability and efficiency gains make this inevitable in the developed world with increasing government regulation around the proper disposal of heavy metals.
- The raised floor will begin to become unnecessary as cooling, power and data feeds start to be supplied from above for most new installations. Raised floors have always been problematic, especially in the area of maximum static and rolling load. Furthermore, pushing cold air from below the floor has always been a sub-optimal design. Cables and power feeds are much easier to maintain if delivered from above.
- DC power options will start to become more common on IT equipment, with many forward-thinking data centers offering optional AC or DC power feeds. This will leverage the higher efficiency and inherent reliability of DC power delivery.
- Converged edge networks with smart switching driven by FCoE will reduce the need for manual patch configurations and change the layout of the data center. The edge will be located in-row and at the top of cabinets. The number of cables will reduce dramatically but the criticality of connectivity will increase.
- Increasing levels of server, storage and network virtualization will continue, mopping up what remains of the development and test platforms and gradually moving into the critical production application space driven by tight integration between the application and hypervisor. Operational flexibility rather than efficiency will be the main driver for change in the critical application space, overcoming the inertia of risk-averse CIOs.
- Reliability will continue to migrate towards the application layer, reducing the dependency on data center infrastructure. Critical prerequisites will be high-performance networks and de-duplication technology that enable rapid migration of data between sites.
- Data centers move to different locales. Choosing a data center site because it is close to corporate headquarters will no longer be viable as real-estate and power cost constraints will restrict city-center data centers to latency sensitive applications only.
- Combined Heat and Power (CHP) plant will replace backup generators (engines) in many city center locations as the global lack of investment in electrical power grids continues to hamper growth of latency-sensitive application hosting.
- Demand for co-location data centers will begin to tail off, replaced by data centers hosting IT-as-a-service offerings as businesses migrate to cloud computing models.
Data center investments are long-term bets and, as a result, change can appear to take a long time to materialize. Data center capacity of the right type is becoming scarce as demand continues to increase at the same time as an industry-wide lack of investment in new capacity due to the economic downturn. As the macroeconomic recovery continues to accelerate, the latency caused by lengthy data center building and fit-out will exacerbate scarcity.
The outcomes are likely to be:
- Much more aggressive take-up of alternative and more power-efficient technologies at the mechanical & electrical layer in a desperate attempt to control costs at existing facilities
- Customers demanding increasingly tight integration between applications and virtualization to improve agility
- Older data center sites becoming increasingly uncompetitive – forcing reductions in depreciation cycles – as refresh becomes critical to remaining in business
Today Microsoft and HP announced an expanded partnership in order to deliver fully integrated application to hardware stacks. It’s a brilliant move, absolutely stunningly smart and spot on for HP.
I wrote about Oracle VM and the fully integrated stack that Larry Ellison has been promoting to his customers. Superficially it might seem like a piece of vendor lock-in, but actually it is a very powerful and compelling solution for risk-averse enterprise customers. Do not underestimate how CIOs treat risk where their critical applications are concerned. It is undoubtedly the number one driver and motivator.
By offering a single, fully supported (at the source code level) stack I believe Oracle got it right. For a while they have been the only player on the field to be in a position to make that offer.
Today HP and Microsoft became the second fully integrated stack in play.
I had better explain what I mean by a fully integrated stack. It includes the following layers:
- Middleware and Database
- Programming language and framework
- Operating System
VCE (VMware, Cisco and EMC) and Citrix play in this space too, but have too many missing parts to be fully vertically integrated.

|Layer|Oracle / Sun|HP/MSFT|VCE|Citrix|
|---|---|---|---|---|
|Middleware and Database|Y|Y|N|N|
|Programming language and framework|Y|Y|N|N|

So VMware have the largest market share and a ton of trained VMware engineers in the field, but there are more Microsoft MCSE guys out there and everyone understands ProLiant.
The biggest point is who wins the ISV and developer mindshare? Microsoft have .net, Oracle have Java. Microsoft play better with developers than Oracle but maybe Larry can learn a bit from Sun about winning Java mindshare?
If I was Larry, I would borrow some Iranian nukes and bomb the EU, if I was VMware or Citrix I would be sucking up to developers like crazy. Mark Hurd is definitely feeling smug, very smug today.
Back in 2008, Steve O’Donnell wrote an article here on The Hot Aisle explaining one of the challenges he set his team during his time at BT, the difficult task of getting Asset Management right.
To summarise, Steve kicked off an audit of the whole estate, and where no owner could be found for kit on the floor, he took the hard line of switching it off. Some developers and engineers got annoyed when their precious servers were threatened with shutdown, but once the reason was explained, there was a surge of people updating the CMDB to make sure that nothing they needed was left unaccounted for.
It soon became apparent that much of this kit was no longer in use, which enabled BT to switch off 10% of its server estate, saving roughly $7M in electricity costs alone.
Job done? Absolutely not.
Over the coming days, I will be blogging about the power that real knowledge of your Data Centre estate can bring, the issues it can help eliminate, and the tools I have developed to harness this data and provide automated management reports to data centre managers, strategic data centre planners and space management boards alike.
First up, Power Outages and Load Balancing.
As demand for Data Centre space increased, BT faced the difficult issue of power outages. PDUs were regularly tripping, causing a failover to other PDUs. It became apparent that we faced the risk of cascade failure, where a single PDU tripping out could swamp others and cause a data centre to fail.
However, we realised that it wasn’t simply a case of “That’s it, we’ve used all our PDU capacity, we need to invest in new ones!”
Over the years, the loading of PDUs hadn’t always been done methodically or thought through. Someone would guess that PDU1 was running at about 30%, so “let’s attach this new server to that and, to be on the safe side, let’s dual feed it to PDU3.” Often the PDU attachment was never even recorded for the server.
When power demand started getting high, problems appeared. There was no load balancing across PDUs and, what’s more, no records to identify where the balancing needed addressing.
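The cascade risk can be made concrete with a small simulation. This is a minimal sketch, not the BT tool: it assumes dual-fed kit splits its draw evenly across its live feeds and shifts that draw to the surviving feed when one PDU trips. The PDU names, capacities and loads are hypothetical. With these numbers no PDU is above about 63% loaded in normal running, yet a trip of PDU1 or PDU3 would push the other past its capacity.

```python
from collections import defaultdict

# Hypothetical PDU capacities in kW (illustrative only).
PDU_CAPACITY_KW = {"PDU1": 16.0, "PDU2": 16.0, "PDU3": 16.0}

# Hypothetical equipment: (name, draw in kW, PDUs feeding it).
EQUIPMENT = [
    ("db01", 6.0, ["PDU1", "PDU3"]),
    ("app01", 4.0, ["PDU1", "PDU2"]),
    ("web01", 3.0, ["PDU2"]),
    ("san01", 10.0, ["PDU1", "PDU3"]),
    ("dev01", 2.0, ["PDU3"]),
]

def pdu_loads(equipment, failed=frozenset()):
    """Sum load per PDU, assuming each device splits its draw evenly
    across whichever of its feeds are still up."""
    loads = defaultdict(float)
    for _name, kw, feeds in equipment:
        live = [p for p in feeds if p not in failed]
        if not live:
            continue  # kit that has lost all its feeds simply goes dark
        for pdu in live:
            loads[pdu] += kw / len(live)
    return dict(loads)

def cascade_risks(equipment, capacity):
    """For each single PDU trip, list the surviving PDUs that would be overloaded."""
    risks = {}
    for tripped in capacity:
        loads = pdu_loads(equipment, failed={tripped})
        over = [p for p, kw in loads.items() if kw > capacity[p]]
        if over:
            risks[tripped] = sorted(over)
    return risks

if __name__ == "__main__":
    for tripped, overloaded in cascade_risks(EQUIPMENT, PDU_CAPACITY_KW).items():
        print(f"If {tripped} trips, {', '.join(overloaded)} would be overloaded")
```

The point of simulating each trip in turn is that the normal-running percentage alone says nothing about whether a PDU can absorb its partner's share of the dual-fed load.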
The simple question “Where are my business critical apps?” could easily be answered following the clean-up and continued management of the CMDB, but the question “Are these apps running on equipment which is resilient to power failures, dual fed, on evenly loaded PDUs?” could not!
There was a gap in our knowledge, and reliable power feeds were at risk because of it!
I spoke to Steve about this and said that if we had a record of all PDUs within a site, their capacities and the kit they were feeding, I could provide the following reports to quickly tackle this problem and give the guys targets for where to begin.
1) The load on each PDU within a data centre, including kW used, kW remaining, % loaded and % free
2) A list of all single, dual and triple fed equipment and the load of each PDU feeding that equipment
3) A list of all single fed equipment hosting business critical applications, and of all dual and triple fed equipment hosting non-critical development environments
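Given a record of PDUs, their capacities and the kit each one feeds, the three reports reduce to simple aggregation. A minimal sketch follows; the `Pdu` and `Device` record shapes, the even split of load across feeds and the example data are all assumptions for illustration, not the schema of the BT CMDB.

```python
from dataclasses import dataclass

@dataclass
class Pdu:
    name: str
    capacity_kw: float

@dataclass
class Device:
    name: str
    load_kw: float
    feeds: list[str]          # names of the PDUs feeding this device
    business_critical: bool

def pdu_load_report(pdus, devices):
    """Report 1: kW used, kW remaining, % loaded and % free per PDU.
    Assumes each device's draw is split evenly across its feeds."""
    used = {p.name: 0.0 for p in pdus}
    for d in devices:
        for feed in d.feeds:
            used[feed] += d.load_kw / len(d.feeds)
    report = []
    for p in pdus:
        pct = 100.0 * used[p.name] / p.capacity_kw
        report.append({
            "pdu": p.name,
            "kw_used": used[p.name],
            "kw_remaining": p.capacity_kw - used[p.name],
            "pct_loaded": pct,
            "pct_free": 100.0 - pct,
        })
    return report

def feed_report(devices, report):
    """Report 2: each device, its number of feeds and the % load of each feeding PDU."""
    pct = {r["pdu"]: r["pct_loaded"] for r in report}
    return [(d.name, len(d.feeds), {f: pct[f] for f in d.feeds}) for d in devices]

def mismatch_report(devices):
    """Report 3: critical kit on a single feed, and non-critical kit on two or more feeds."""
    at_risk = [d.name for d in devices if d.business_critical and len(d.feeds) == 1]
    over_protected = [d.name for d in devices if not d.business_critical and len(d.feeds) > 1]
    return at_risk, over_protected

# Hypothetical example data.
EXAMPLE_PDUS = [Pdu("PDU1", 20.0), Pdu("PDU2", 20.0)]
EXAMPLE_DEVICES = [
    Device("billing-db", 8.0, ["PDU1"], True),
    Device("dev-build", 4.0, ["PDU1", "PDU2"], False),
    Device("crm-app", 6.0, ["PDU1", "PDU2"], True),
]

if __name__ == "__main__":
    report = pdu_load_report(EXAMPLE_PDUS, EXAMPLE_DEVICES)
    for row in report:
        print(row)
    print(feed_report(EXAMPLE_DEVICES, report))
    print(mismatch_report(EXAMPLE_DEVICES))
```

Here `billing-db` is flagged as critical but single fed, while `dev-build` is using two feeds it arguably does not need, which is exactly the mismatch report 3 is meant to surface.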
How were we able to answer these questions? Most of the data was already there! An audit of the estate provided the location of kit, the kit models and the applications that ran on them. Knowing the kit model meant that we could integrate third-party data giving the theoretical power utilisation of that kit (which could be factored down to provide more accurate, real-world figures).
Once these reports were available, we could go about phase 1: resolving these load balancing issues and deciding where we might need to invest in additional PDU capacity.
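Estimating draw from the kit model might look like the sketch below. The catalogue entries and the 0.6 derating factor are hypothetical placeholders, not the figures used at BT; a real factor would need to be checked against metered readings.

```python
# Hypothetical third-party catalogue: kit model -> nameplate (maximum) power in watts.
NAMEPLATE_W = {"ModelA-2U": 750, "ModelB-1U": 450}

# Hypothetical derating factor: nameplate figures are worst-case ratings,
# so they are scaled down towards typical real-world draw.
DEFAULT_DERATE = 0.6

def estimated_draw_kw(model, derate=DEFAULT_DERATE, catalogue=NAMEPLATE_W):
    """Estimate the real-world draw of one device in kW from its nameplate rating."""
    if model not in catalogue:
        raise KeyError(f"no nameplate data for model {model!r}")
    return catalogue[model] * derate / 1000.0

def rack_estimate_kw(models, derate=DEFAULT_DERATE):
    """Sum the estimated draw of every device in a rack."""
    return sum(estimated_draw_kw(m, derate) for m in models)

if __name__ == "__main__":
    rack = ["ModelA-2U", "ModelA-2U", "ModelB-1U"]
    print(f"Estimated rack draw: {rack_estimate_kw(rack):.2f} kW")
```

Keeping the derating factor as a parameter means it can be tuned per model or per site once real measurements are available, without touching the catalogue data.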
So the audit began: I ran a report listing the equipment rack by rack, and the M&E guys went about collecting the data and feeding it back to me.
In the meantime, I developed the tool that would digest this data and return the reports as promised. The tool was web based and securely accessed over the intranet, with access managed and granted to those who needed the information. With that, phase 1 of the Data Centre Power Tool was complete.
This phase of the tool was literally knocked up overnight. We had the correct processes in place for gathering the data, I had the skills to manage the team and communicate exactly what was needed and, more importantly, why I was asking for it, and I used my knowledge of data centre infrastructure alongside development skills to begin developing a powerful and invaluable management tool. And of course, I had Steve to call upon should we run into any stumbling blocks!
The load balancing was soon addressed, some new PDUs were purchased and installed, and the whole operation began to run a lot smoother. Processes were put in place to record and maintain the PDU linkages to kit inside the CMDB and the Load Balancing tool was left within BT for continued use.
Getting the data right within the CMDB, our collective understanding of data centre infrastructure and the development skills on hand helped solve a problem that could have become very expensive and very embarrassing within weeks. Integrating this tool with the client CMDB meant that this sort of issue should never arise again within BT.
Later, I will blog about how I developed this tool into a powerful strategic planning system which was used by both the M&E Infrastructure Team and the Space Management Board to aid the process of planning “where to place equipment” in our data centres.
My good friend and ESG colleague Terri McClure @esganalysttmac recently blogged about a thought leadership piece I presented at an analyst call last week. She did a very good job of explaining it, so I thought I would write a little more about it here:
I call the concept the “Golden Triangle” and it represents the three key influences that C-level enterprise IT buyers have when they come to make a large scale IT procurement decision.
Quite often the buyer does not consciously consider each of the three influences but nevertheless they play a significant part in the decision process. Let’s look at them in more detail:
Cost is always an influence but is (perhaps surprisingly) rarely the most significant. Let’s look at the evidence: it is rare for a market-leading product to be the least expensive (ask any EMC, VMware or Oracle salesperson). They are market leading because they sell more than everyone else – not because they are cheaper – I assure you they are not, nor would they want to be. Point made?
So actually the most compelling influences are Risk and Cycle Time. These influences can unseat an incumbent supplier or glue him firmly in place.
Cycle Time is all about BUSINESS agility – not about being able to stand up a server or roll out a new LUN faster (although they may in themselves have a positive influence on a business process). The question is: does this purchase decision help the business people to kill off their competition, serve customers better, fight off a strong competitor or deliver new products before the competition does? If it does, that is a MUCH stronger influence to buy than just being cheaper!
Risk refers to Business Risk – not much to do with ensuring that backups are taken regularly or with equipment reliability in itself (although again these may have a bearing on a business risk point). It is much more about business certainty: ensuring that customer service agents are able to deal with customer orders in a timely fashion, that invoices are sent out on time, or even that the ambulance gets sent to the right address. Again, this is a very strong influence on a buying decision.
So sales conversations that focus on technical features (mine is bigger / faster / more reliable than the other product under consideration) won’t play well in a world oversupplied with product size, capacity, reliability and speed.
Here is the vendor lesson for the day – if you can’t define a clear BUSINESS advantage in terms of cycle time and risk reduction, you end up on a downward price spiral that only firms with deep pockets and efficient manufacturing capability can survive.
Incumbent vendors (unwittingly) leverage risk and cycle time to be sticky and maintain their customer base – why change – is it worth the risk? Why change – it is much easier and faster to stay with your current technology, process and services?
Competitors can overcome these objections if they are able to demonstrate cycle time and risk advantages that make a difference to the business.
(Let’s have another look at cost. Cost can be made up of a number of elements: the capital cost of acquisition, the operational cost of running the product or service, and the write-off cost of any asset displaced before it is fully depreciated. These are all well understood, but the main cost is often forgotten: the cost of doing nothing. The cost of doing nothing instead of replacing old equipment can be greater than all of the other costs combined, because higher energy efficiency and lower support costs can dwarf the replacement costs.)
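A rough worked comparison shows how the cost of doing nothing can outweigh the others. This is a minimal sketch with hypothetical figures; it deliberately ignores discounting and residual value, and simply totals the cost of keeping old kit running against the cost of replacing it now (capital cost plus write-off plus the new kit's running costs) over the same horizon.

```python
def compare_keep_vs_replace(years, old_energy_pa, old_support_pa,
                            new_capex, write_off, new_energy_pa, new_support_pa):
    """Return (cost_of_keeping, cost_of_replacing, saving_from_replacing).

    Simple undiscounted totals over `years`; all inputs share one currency.
    """
    keep = years * (old_energy_pa + old_support_pa)
    replace = new_capex + write_off + years * (new_energy_pa + new_support_pa)
    return keep, replace, keep - replace

if __name__ == "__main__":
    # Hypothetical figures for illustration only.
    keep, replace, saving = compare_keep_vs_replace(
        years=3,
        old_energy_pa=120_000, old_support_pa=80_000,
        new_capex=250_000, write_off=50_000,
        new_energy_pa=40_000, new_support_pa=20_000,
    )
    print(f"Keep: {keep:,}  Replace: {replace:,}  Saving from replacing: {saving:,}")
```

With these made-up numbers, doing nothing costs 600,000 over three years against 480,000 for replacing, so replacement comes out 120,000 ahead even after a 50,000 write-off.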
My ESG colleague John McKnight has just briefed some summary output from our IT Spending survey. The results are presented below, hot off the press. Security, Storage and Network see the biggest increases in growth, but Virtualization software continues to lead in absolute terms.
It would be interesting to know how many enterprises are looking to rectify their poor licence compliance in the virtualized world.
Here is a video from my friend Professor Massoud Amin @Massoud_Amin, who is the world authority on Smart Grid. If we want to save the planet from global warming, prevent terrorists shutting down our economy and prevent catastrophic failure of our power distribution systems, this is the template for what we need to do and why:
Note the need to move to DC Power – I’ve been saying this for years.