The Hot Aisle
Fresh Thinking on IT Operations for 100,000 Industry Executives

Recently Published

Enterprise Flash Storage is not all the same

There are three generalised approaches to using flash memory in storage architectures, categorised in the diagram below:

Three Categories of Solution Design

The Performance Optimised category takes a no-compromise-on-cost approach to designing for performance: a centralised storage controller leverages the characteristics of flash memory to dramatically reduce IO latency and increase IOPS. Significant new engineering work is required to optimise read performance (striping across multiple memory components), to optimise write performance (wear levelling, garbage collection management and striping across multiple memory components) and to minimise latency (choice of interface, such as InfiniBand, PCI, Fibre Channel or 10GbE; internal data path optimisation; no compression or deduplication). Violin Memory adopts this approach in the 6000 Flash Memory Array, as do WhipTail Technologies with their XLR8r Storage Array and Texas Memory Systems with their RamSan-820.

The Caching or Tiered category stores most of the data on conventional magnetic media, with flash memory used either as a PCI-based card in one or more of the connected servers or as an extra storage tier inside the central controller alongside conventional magnetic disk. The engineering for this approach only becomes complex if both read and write caching are configured, because write caching creates a requirement to manage cache coherency across PCI cards in multiple servers. Flash as a storage tier is generally available from most controller manufacturers, although automated migration of the hottest data between tiers is still a fairly new technology. Fusion-io offers products in this space with its ioDrive and ioCACHE combination, as does EMC with VFCache; both are currently limited to read caching.

The Capacity Optimised category, by far the most popular architecture, gives up some of the latency and IOPS gains of the Performance Optimised approach in exchange for greater capacity. It still improves IOPS and latency over conventional disk arrays, at a per-gigabyte price point that is more generally affordable and can address price-sensitive markets and use cases. Compressing and deduplicating data in-line as it is written increases the controller's effective storage capacity. Because flash memory is so fast, the latency and IOPS penalty of this extra processing is small enough that performance remains better than an all-magnetic solution. Most of the new breed of flash storage vendors have entered this space, including Pure Storage, SolidFire, Tintri, Nimbus Data Systems, GreenBytes and others.
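As a rough illustration of how in-line compression and deduplication stretch capacity, the toy store below splits each write into fixed 4 KB chunks, fingerprints them with SHA-256 and compresses only chunks it has not seen before. The chunk size, the hash, zlib compression and the `DedupStore` class are all assumptions made for this sketch; none of the vendors named above is claimed to work this way.

```python
import hashlib
import zlib

CHUNK_SIZE = 4096  # assumed fixed chunk size; real arrays use various schemes


class DedupStore:
    """Toy in-line data reduction: each write is split into fixed-size chunks,
    each chunk is fingerprinted, and only chunks not already held are
    compressed and stored."""

    def __init__(self):
        self.chunks = {}         # fingerprint -> compressed chunk
        self.logical_bytes = 0   # bytes the hosts think they have written

    def write(self, data: bytes) -> list:
        """Store data and return its 'recipe': the ordered chunk fingerprints."""
        recipe = []
        for offset in range(0, len(data), CHUNK_SIZE):
            chunk = data[offset:offset + CHUNK_SIZE]
            fp = hashlib.sha256(chunk).hexdigest()
            if fp not in self.chunks:            # deduplicate: keep one copy only
                self.chunks[fp] = zlib.compress(chunk)
            recipe.append(fp)
        self.logical_bytes += len(data)
        return recipe

    def read(self, recipe: list) -> bytes:
        """Rebuild the original data from its chunk fingerprints."""
        return b"".join(zlib.decompress(self.chunks[fp]) for fp in recipe)

    def physical_bytes(self) -> int:
        return sum(len(c) for c in self.chunks.values())

    def reduction_ratio(self) -> float:
        return self.logical_bytes / max(1, self.physical_bytes())


store = DedupStore()
vm_image = b"OS block " * 5000    # repetitive data, loosely like cloned VM images
r1 = store.write(vm_image)
r2 = store.write(vm_image)        # an identical second copy adds no new chunks
print(f"reduction ratio: {store.reduction_ratio():.1f}:1")
```

The hashing and compression work sits directly in the write path, which is where the latency penalty mentioned above comes from; flash is fast enough to absorb it.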

The introduction of flash components into a well-designed storage controller has a number of effects: the price per raw (uncompressed, non-deduplicated) gigabyte increases, the average latency falls and the average number of IOPS increases.
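Because capacity-optimised designs sell usable rather than raw gigabytes, the comparison that matters is raw price divided by the data reduction ratio. The helper and the prices below are hypothetical, chosen only to show the arithmetic, and assume the disk array applies no data reduction of its own:

```python
def effective_price_per_gb(raw_price_per_gb: float, data_reduction_ratio: float) -> float:
    """Price per usable (logical) GB once compression and dedup are applied."""
    if data_reduction_ratio <= 0:
        raise ValueError("data reduction ratio must be positive")
    return raw_price_per_gb / data_reduction_ratio


# Hypothetical prices, for illustration only.
FLASH_RAW = 10.0   # $ per raw GB of flash (assumed)
DISK_RAW = 3.0     # $ per raw GB of disk (assumed, no data reduction applied)

for ratio in (1.0, 3.0, 5.0):
    flash = effective_price_per_gb(FLASH_RAW, ratio)
    print(f"{ratio:.0f}:1 reduction: flash ${flash:.2f}/GB vs disk ${DISK_RAW:.2f}/GB")
```

On these made-up numbers, a 5:1 reduction ratio takes flash below the disk price per usable gigabyte, which is the bet the Capacity Optimised vendors are making.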

In a magnetic-only design, increases in IOPS can only be delivered by worse-than-linear price increases. This has been the single most significant driver of the rapid migration to flash-based storage.

Performance Optimised designs address use cases where increased IOPS and reduced latency, rather than capacity, are the problem, and usually deliver better price-performance than other designs.

Caching designs take an evolutionary approach for problems where capacity demands are still significant but a performance boost is required.

Capacity Optimised approaches try to solve both capacity and performance problems in one design, with the bold aspiration of replacing magnetic-only solutions.

Not even remotely to scale

Solid State Storage changes everything

I interviewed Ron Bianchini, CEO of Avere Systems, about their new third-generation FXT appliance. (Ron was the CEO of Spinnaker Networks, which NetApp acquired in November 2003 for $300M.)

We started the conversation by looking at the key values of Spinnaker: its ability to scale and cluster NAS filers, a technology long since incorporated into the NetApp ONTAP environment. Ron told me, rather wistfully, that when the acquisition completed Spinnaker hadn't done as much work on optimising performance as he had wanted. Avere has been busily addressing this over the last four years, resulting in the AOS 3.0 FXT Edge Filer.

Ron has been able to use flash technology in his new product, with integrated SSDs as well as a load of DRAM and some 600GB SAS drives. Back in the Spinnaker Networks days, if you wanted performance you integrated Fibre Channel disks and striped across lots of them. Starting afresh has allowed the low-latency, high-read-rate characteristics of flash to be built in as a standard feature.

There is nothing unique about the hardware: it is a SuperMicro server with a good mix of RAM, SSD and SAS disk. As one would expect, it's the software that makes the Avere FXT Edge Filer interesting. Ron told me that the FXT is the first device that delivers both read and write caching. Not only that, but the FXT can act either as a stand-alone NAS appliance, if the working set fits within its own capacity, or as a cache if the working set overflows it. This is very interesting because some of the FXT's other features make it an ideal front end for a cloud storage back end, eliminating the latency and performance issues of cloud.

Ron explained that the block management algorithms in the FXT are designed both to deliver high performance and to manage the write issues often seen with SSD-based storage controllers. Writes are staged to battery-backed RAM, then written in large chunks to the SAS disks, and later moved in bulk, as necessary, to the SSD middle tier. He gave the example of a frequently accessed, large PowerPoint file whose first few blocks are held on SSD, with the next blocks on internal SAS and the remainder of the file on the back-end cloud or legacy SATA-based NAS filer. By carefully managing the streaming of the file to the consumer, and issuing requests for subsequent blocks ahead of the actual SMB or NFS request, the FXT filer can deliver SSD-level performance whilst using only a small portion of its SSD capacity.
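The read-ahead idea Ron describes can be illustrated with a small simulation: the head of a file sits on SSD, the middle on SAS and the tail on the back-end filer, and the client reads sequentially. The tier latencies, block counts and function names are assumptions for this sketch, not Avere's algorithm; the point is only that issuing fetches ahead of the client's requests hides the slower tiers.

```python
# Assumed per-tier fetch latencies in milliseconds (illustrative only).
TIER_LATENCY_MS = {"ssd": 0.1, "sas": 5.0, "core": 50.0}


def plan_placement(num_blocks: int, ssd_blocks: int, sas_blocks: int) -> list:
    """Head of the file on SSD, next run on SAS, remainder on the core filer."""
    placement = []
    for i in range(num_blocks):
        if i < ssd_blocks:
            placement.append("ssd")
        elif i < ssd_blocks + sas_blocks:
            placement.append("sas")
        else:
            placement.append("core")
    return placement


def stream(placement: list, read_ahead: int, consume_ms: float = 10.0) -> float:
    """Simulate a client reading blocks in order; return total client stall (ms).

    When the client asks for block i, fetches are issued for blocks
    i .. i + read_ahead that have not already been requested.
    """
    ready_at = {}   # block index -> time the block lands in DRAM
    clock = 0.0
    stall = 0.0
    for i in range(len(placement)):
        for j in range(i, min(i + read_ahead + 1, len(placement))):
            if j not in ready_at:
                ready_at[j] = clock + TIER_LATENCY_MS[placement[j]]
        wait = max(0.0, ready_at[i] - clock)   # client blocks until data arrives
        stall += wait
        clock += wait + consume_ms             # client then spends time using the block
    return stall


layout = plan_placement(num_blocks=12, ssd_blocks=2, sas_blocks=4)
print("no read-ahead:", round(stream(layout, read_ahead=0), 1), "ms stalled")
print("read-ahead 8 :", round(stream(layout, read_ahead=8), 1), "ms stalled")
```

With no read-ahead the client stalls on every SAS and back-end fetch (about 320 ms in this model); with a read-ahead of eight blocks the only visible wait is the first SSD read, even though just two of the twelve blocks live on SSD.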

Large NAS filer environments are extremely difficult to manage operationally, and Ron's experience of these massive-scale operational problems at Spinnaker has led to some of the really smart features of the FXT:

  • The FXT can run in full caching or synchronous write-through mode. This enables servers to be migrated to the Edge Filer one by one without a major (and impossible to schedule) planned outage, similar to the block storage virtualisation functionality seen in IBM's XIV
  • The FXT is just as easy to take out as it is to put in, using the same principles: switch to write-through mode, flush the cache and start moving servers one by one
  • The FXT works in a cluster (a minimum of two nodes, ideally three or more), which enables non-disruptive firmware and software updates
  • FlashMove capability enables large mounted filesystems to be migrated from an old back-end NAS to the Cloud or to a SATA based Filer without interruption of service
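The migration and removal procedures above rest on the difference between write-back caching, where a write is acknowledged once the cache holds it, and synchronous write-through, where the back end is updated before the acknowledgement. A minimal sketch, using a hypothetical `EdgeCache` class with a dict standing in for the back-end filer (not Avere's implementation):

```python
class EdgeCache:
    """Toy front-end cache in front of a back-end filer (a dict here).

    write_through=True : every write reaches the back end before it is
                         acknowledged, so the cache can be removed at any time.
    write_through=False: writes are acknowledged once held in the cache
                         (write-back) and reach the back end only when flushed.
    """

    def __init__(self, backend: dict, write_through: bool = False):
        self.backend = backend
        self.write_through = write_through
        self.cache = {}
        self.dirty = set()

    def write(self, path: str, data: bytes) -> None:
        self.cache[path] = data
        if self.write_through:
            self.backend[path] = data
        else:
            self.dirty.add(path)

    def read(self, path: str) -> bytes:
        if path not in self.cache:
            self.cache[path] = self.backend[path]   # miss: fetch from back end
        return self.cache[path]

    def flush(self) -> None:
        """Push every dirty entry to the back end."""
        for path in sorted(self.dirty):
            self.backend[path] = self.cache[path]
        self.dirty.clear()

    def set_write_through(self) -> None:
        """Switch modes safely: flush outstanding writes first."""
        self.flush()
        self.write_through = True


filer = {}
edge = EdgeCache(filer)                 # start in write-back (caching) mode
edge.write("/vol/a.txt", b"v1")
print("/vol/a.txt" in filer)            # False: still only in the cache
edge.set_write_through()                # e.g. before taking the appliance out
print(filer["/vol/a.txt"])              # b'v1'
edge.write("/vol/b.txt", b"v2")
print(filer["/vol/b.txt"])              # b'v2': written straight through
```

Once the cache is in write-through mode and has no dirty entries, the back end holds everything, which is what makes server-by-server migration in either direction safe.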
All in all, this is an extremely interesting product in a hot space: it delivers a large performance uplift on a legacy NAS estate at a relatively modest price, as well as offering some significant operational advantages.

Here is a copy of the Avere Press Release:

Avere Unveils Edge Filer and New Architecture for NAS


AOS 3.0 transforms FXT appliance into Edge filer optimized for today’s storage challenges


PITTSBURGH – March 28, 2012 – Avere Systems today unveiled a new NAS architecture that will ensure enterprise IT is best positioned to leverage the performance benefits of Flash, the consolidation benefits of virtualization and the collaborative and economic benefits of the cloud. The new architecture for NAS puts the fastest media and the intelligence to manage it closest to the user, boosting performance and removing storage bottlenecks created by legacy NAS architectures.

Avere’s Edge Filer

As part of today’s announcement, Avere introduces its first NAS filer, the Edge filer, which operates in concert with legacy or Core filers, to implement the new architecture for NAS. The Edge filer has all of the data handling capabilities of legacy, or Core filers, but differs in data management.  Edge filers manage the global user namespace across multiple filers and remote facilities.  The new data management capabilities available with AOS 3.0 allow customers to easily move, synchronize and replicate data between storage devices, from data center to data center or remote office, and from data center to cloud.

Avere’s Edge filer overcomes the traditional and costly approach of adding larger and more expensive controllers and over provisioning all types of high-speed storage media to boost performance, which adds cost and complexity to the data center and does nothing to solve the latency problem inherent across long geographical distances that has relegated cloud storage to backup and archival use only.  Additionally, it provides a model to ensure that the gains in efficiency realized by virtualization aren’t negated by losses in storage performance due to changes in I/O profiles. Avere has created a new NAS architecture that eliminates trade-offs between performance and cost and enables primary storage to be located where it makes the most economic sense for business.

“Since its inception, Avere has been challenging the concept of using a traditional, monolithic NAS filer as a single tier of storage,” said Terri McClure, ESG Senior Analyst.  “With the introduction of its first full-fledged filer, the Edge filer, Avere is well positioned to offer an alternative model to solve some of the biggest technological and business challenges of NAS deployments, including the latency associated with remote storage deployments.”

FlashMove™ and FlashMirror™

FlashMove: Avere FlashMove takes the pain out of data migrations. With FlashMove, there is no need to halt applications or suspend access to data during migrations. FXT Edge filers serve active data to application servers and users while behind the scenes FlashMove software moves data transparently between Core filers. FlashMove dramatically simplifies the management of NAS environments; it enables live data to be load-balanced across existing systems, data to be archived transparently to secondary storage, new storage and new vendors to be added to the NAS environment, and old storage that is past its useful life to be decommissioned.

FlashMirror: Avere FlashMirror dramatically simplifies the implementation of a disaster recovery practice on a NAS infrastructure.  FlashMirror replicates data on primary and secondary Core filers and keeps them closely in sync by sending updates directly and in parallel to both filers.  FlashMirror offloads the replication-processing load from the storage and supports clustering to scale replication performance to any level required.  FlashMirror is simple to install in existing environments and is the only storage-side replication solution that works with all NAS vendors’ products.
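The FlashMirror description (updates sent directly and in parallel to both filers) can be sketched with Python's `concurrent.futures`. The `Filer` class is an in-memory stand-in and `mirrored_write` is a hypothetical helper; failure handling, write ordering and resynchronisation, which a real product must address, are not modelled.

```python
from concurrent.futures import ThreadPoolExecutor


class Filer:
    """Stand-in for a Core filer; stores files in a dict."""

    def __init__(self, name: str):
        self.name = name
        self.files = {}

    def write(self, path: str, data: bytes) -> str:
        self.files[path] = data
        return self.name   # acknowledgement


def mirrored_write(pool, primary, secondary, path, data):
    """Send the same update to both filers in parallel and return only when
    both have acknowledged; result() re-raises any failed write."""
    futures = [pool.submit(f.write, path, data) for f in (primary, secondary)]
    return [f.result() for f in futures]


primary, secondary = Filer("primary"), Filer("secondary")
with ThreadPoolExecutor(max_workers=2) as pool:
    acks = mirrored_write(pool, primary, secondary, "/proj/plan.ppt", b"slide deck")
print(acks)                                  # ['primary', 'secondary']
print(primary.files == secondary.files)      # True
```

Because the replication traffic is generated at the front end rather than by the filers themselves, the same approach works regardless of which vendor built each Core filer.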

“Within the last few years, we’ve seen the potential benefits of powerful new technologies – Flash, virtualization and cloud – be addressed in a piecemeal fashion by incumbent storage vendors that still rely on product architectures built for an older generation of technology and a twentieth-century data center silo. It’s time for a fundamental change in NAS design — one that opens up the network across the globe, makes it much easier to manage, and delivers efficiency without sacrificing performance,” said Ron Bianchini, President and CEO of Avere Systems. “With the introduction of AOS 3.0 and the Avere Edge filer, we are poised to be a catalyst in enabling customers to reap the maximum benefits from the latest technologies.”

Pricing and Availability

AOS 3.0 will be generally available within the next 30 days. The 3.0 software release is a free upgrade for existing customers. For new and existing customers FlashMove and FlashMirror require separate licenses.

About Avere Systems

Avere Systems brings to the market NAS Optimization solutions designed specifically to scale performance and capacity separately and take advantage of new storage media using real-time tiering. Avere’s FXT Series Edge filers allow organizations to achieve unlimited application performance scaling, free applications from the confines of the data center by eliminating latency and cut storage costs by more than half.


EMC Validates Solid State Storage as best for UCS Platform

I just finished reading EMC blogger Chuck Hollis's article about EMC VFCache, a server-side flash storage technology that competes head-on with the Fusion-io ioDrive. Most interestingly, the product name has been chosen carefully so that customers don't mistake VFCache for a replacement for EMC's conventional disk-based storage array controllers. In fact, VFCache acts as a caching layer between a back-end storage controller and a front-end server demanding massive IOPS. VFCache is integrated via a filter driver that reduces the load on the back-end disk array whilst maintaining cache coherency between VFCache and the back end.

EMC call VFCache “Server-side flash storage cards that integrate with the rest of the extended storage environment”, neatly covering off one of the major objections to PCI card flash storage for mission-critical applications: what do we do about resilience, hardware failure and disaster recovery? According to Chris Mellor at The Register, “The EMC cache increases 4KB – 64KB block random read I/O speed but not write I/O speed. VFCache will not cache read I/Os larger than 64KB. There is no write caching.” So we can expect VFCache to cope poorly with intense write activity where the back-end storage is not able to keep up with the front-end cache. I sense a risk of imbalanced systems.
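The caching behaviour Chris Mellor describes can be sketched as a filter in the server's I/O path: small reads are cached, reads over 64KB bypass the cache, and every write goes straight to the array. The 4 KB block size, the invalidate-on-write policy and the `ReadCacheFilter` class are assumptions for illustration, not EMC's driver:

```python
BLOCK = 4 * 1024            # assumed block size
MAX_CACHED_IO = 64 * 1024   # reads larger than this bypass the cache


class ReadCacheFilter:
    """Server-side read cache in front of a back-end array (a dict of blocks)."""

    def __init__(self, array: dict):
        self.array = array        # block number -> BLOCK bytes
        self.cache = {}
        self.array_reads = 0      # blocks fetched from the array

    def read(self, start: int, count: int) -> bytes:
        cacheable = count * BLOCK <= MAX_CACHED_IO   # large reads bypass entirely
        out = []
        for blk in range(start, start + count):
            if cacheable and blk in self.cache:
                out.append(self.cache[blk])
                continue
            self.array_reads += 1
            data = self.array[blk]
            if cacheable:
                self.cache[blk] = data
            out.append(data)
        return b"".join(out)

    def write(self, start: int, blocks: list) -> None:
        """No write caching: every block goes to the array and any cached copy
        is invalidated so later reads cannot return stale data."""
        for i, data in enumerate(blocks):
            self.array[start + i] = data
            self.cache.pop(start + i, None)


array = {n: bytes([n % 256]) * BLOCK for n in range(64)}
f = ReadCacheFilter(array)
f.read(0, 4)                   # 16 KB: fetched from the array, now cached
f.read(0, 4)                   # served from cache
f.read(0, 32)                  # 128 KB: bypasses the cache
print(f.array_reads)           # 36
f.write(1, [b"\xff" * BLOCK])
print(f.read(1, 1)[:1])        # b'\xff': fresh data after invalidation
```

Because every write still lands on the array, the back end has to absorb the full write load, which is exactly the imbalance risk noted above.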

Dedicated, solid-state-only storage offers significant advantages for consumers of high-IOPS applications such as SAP, Exchange Server, SQL Server and Oracle. Dramatic increases in read and write performance, combined with extremely low latency, drive application performance to new levels whilst reducing the quantity of hardware (and therefore application licences) needed to run them. Good to see that EMC are waking up and recognising the value of engineered solid state storage products, even if they are (unsurprisingly) still strongly wedded to their core disk business. A much better approach is to recognise that the storage world is moving inexorably towards flash and trash: solid state for performance and SATA disk for capacity. Prices are falling rapidly as flash fabrication plants come online, and it won't be long before high-spin-speed disks are dead, replaced by much better-performing SSDs.

Technology Predictions for 2012

Around this time, I usually pull together a ten-point list of predictions for the coming year. I am interested in the Data Center (around power, cooling, reliability and economics), IT Security (particularly Database and Mobile), Big Data and Storage (particularly Solid State). So what's different this year from prior years?

Strengthening fundamental drivers are likely to make 2012 materially different from previous years in these ways:

  1. Mid-sized businesses will continue to accelerate their uptake of cloud services (mainly SaaS) rather than invest in co-location data centre services and DIY computing. Demand for retail co-location data centers will begin to tail off, replaced by data centers hosting IT-as-a-service offerings as businesses migrate to cloud computing models. Email, CRM, accounting and other back-office systems have already migrated very rapidly.
  2. For data centres, larger businesses will continue to host their own systems only where there is a competitive advantage in doing so. CIOs are under increasing pressure to push commodity IT services out into the cloud or to large outsourcers who can leverage scale, automation and labor arbitrage to drive out costs.
  3. For IT Security, a fundamental move towards Bring Your Own Computer (BYOC) and mobility, combined with increasingly sophisticated attacks from professional hackers (like the state-sponsored Chinese Internet Army), will make much of the conventional security toolkit ineffective and irrelevant. There will be significant upside for vendors who protect corporate systems hosted on an employee's mobile device, such as Mobile Active Defence.
  4. Data protection strategies built on network access rules and firewalls, backed up by signature-based virus and malware protection, will die out as it becomes apparent that professional hackers can defeat these measures. Data protection will move inwards, around the database, the email folders, the document repository and the unstructured file system. Automatic security metadata and information asset management approaches such as those from Chalet Tech will become critically important.
  5. The gradual migration towards liquid cooling will continue, with strong leadership from IBM, whose Z11 mainframe and pSeries machines have water-cooled options. The massive efficiency benefit of liquids, some 4,000 times more effective than air at removing heat, will drive adoption first in the highest-density deployments such as HPC (high-performance computing) and mainframes, followed later by general-purpose computing. Organisations such as Iceotope, which have developed and delivered commodity liquid cooling, will prosper.
  6. Converged edge networks with smart switching driven by 10G Ethernet will continue to reduce the need for manual patch configurations and change the layout of the data center. Edge switches are being located in-row and at the top of cabinets. The number of cables will reduce dramatically, but the criticality of connectivity will increase. IBM's purchase of Blade Network Technologies and HP's continued drive into the network space are proving a significant challenge to Cisco, which relies on profit from core switch technology.
  7. Engineered database and application appliances such as Exadata and Exalogic from Oracle will continue to gain share at the top end of the market, delivering massive performance improvements by leveraging solid state disks, low-latency networks (10G and InfiniBand) and huge DRAM footprints (96GB plus). Cisco UCS, HP ProLiant and IBM pSeries servers teamed with Violin Memory solid state storage controllers will offer very stiff competition to Oracle.
  8. Solid state storage will really start to eat into performance disk market share, with 15,000 RPM disks dying off first, leading to a storage ecosystem of flash and trash: solid state for performance and SATA for capacity. The price of solid state components will continue to drop as manufacturing capacity at Samsung and Toshiba comes online.
  9. Tablets are now 100% flash; laptops will follow shortly. Enterprise data centre adoption will be driven by low latency rather than cost. Low-latency storage dramatically improves CPU utilisation, driving down per-core software licensing costs.
  10. Smaller and less sophisticated companies will start to demand access to the benefits of Big Data. This space is currently occupied by PhD geeks, needed not only to configure the hardware and storage systems but also for the data mining and business analysis. Prepackaged Cassandra and Hadoop solutions, such as the Acunu Data Platform and Greenplum from EMC, and others, will start to make an impact. Global consultancies will start to make major investments in training to keep up with demand.
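Figures like the "some 4,000 times" in prediction 5 are commonly derived by comparing how much heat a cubic metre of water and of air absorbs per degree of temperature rise. Using approximate textbook properties near room temperature (assumed values, not taken from the original post):

```python
# Approximate properties near room temperature (assumed textbook values).
WATER_DENSITY = 1000.0   # kg/m^3
WATER_CP = 4186.0        # J/(kg*K), specific heat capacity
AIR_DENSITY = 1.2        # kg/m^3
AIR_CP = 1005.0          # J/(kg*K)


def volumetric_heat_capacity(density: float, cp: float) -> float:
    """Heat absorbed per cubic metre per kelvin of temperature rise, J/(m^3*K)."""
    return density * cp


ratio = (volumetric_heat_capacity(WATER_DENSITY, WATER_CP)
         / volumetric_heat_capacity(AIR_DENSITY, AIR_CP))
print(f"water carries roughly {ratio:,.0f}x more heat per unit volume than air")
```

This gives roughly 3,500 times, the same order of magnitude as the figure quoted; the exact ratio depends on the temperatures and pressures assumed.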


Cloud adoption is moving much faster in small businesses than in corporates

Globally, the annual mid-market (companies with fewer than 1,000 staff) spend on IT hardware, software and services exceeds $250B, according to Andy Monshaw, IBM's General Manager for Mid-Market. Andy lives and breathes mid-market: he personally leads as many as 15 end-user and IBM partner round tables a month, spending his time listening to customer problems and getting intimately familiar with the output of IBM's comprehensive smart analytics reports and dashboards.

Mid-market is a highly valuable segment that is served almost exclusively by the channel, predominantly Independent Software Vendors (ISVs), Managed Service Providers (MSPs), Systems Integrators (SIs) and the other independent partners of the major players (IBM, Dell, HP, SAP, Microsoft, etc.). The majors support this route to market because of the costs and complexities of engaging smaller businesses with limited IT capabilities and modest annual budgets in line with their size. Interestingly, this is a market in which Oracle does not figure strongly at all!

Andy also claims that 20% of the total addressable market (TAM) from this segment is driven by direct hardware and software sales, so only one fifth of expenditure is do-it-yourself, supported by in-house IT staff, leaving a massive $200B of revenue coming from solutions consisting of integrated hardware, software and services. Competition for this solutions market is tough, with tens of thousands of integrators in play and no single vendor owning even 1% of the market. IBM research shows that mid-market end customers select products and integrators using the Internet, with 85% of new business coming from search and social media.

IBM have recognised the importance of social media, search, blogs, Twitter and other Internet-based influences in the IT mid-market, which are changing it in the same way that LinkedIn changed the job market and TripAdvisor changed the hotel business. A $100M per annum marketing budget is in place to support IBM's mid-market presence in the social media ecosystem.

My own research shows that small and medium enterprise customers are adopting cloud services much more strongly than their corporate equivalents. This massive uptake of cloud is supported (although not scientifically) by a recent IBM roundtable straw poll in which 80% of attendees indicated that they had already adopted SaaS for both messaging and collaboration and CRM.

IBM’s approach has been to develop proven, cloud-based solutions and market them as branded IBM products or as white-label services delivered by their partners. IBM research shows that one of the major impediments to cloud adoption, concern about security, is actually a market advantage in the mid-market. Mid-market clients are certain that IBM’s investments in security and attention to detail in the IBM Cloud outclass anything that they might do internally. The IBM brand stands for reliability, security and availability much more than Amazon, Google or Microsoft, who also play in this space. This is a powerful insight, showing that different market segments react to risk quite differently.

The IBM product development areas that Andy has focussed on for the mid-market are:

  • Business Analytics (Cognos)
  • Cloud Integration (Cast Iron)
  • CRM (Sugar)
  • Back Office (MS Dynamics, SAP)
  • Messaging & Collaboration (Lotus, Sametime)
  • Security (Tivoli BigFix)

Partners add value by integrating these products with ISV offerings, local and domain expertise to serve the mid-market clients better.

IBM really seem to get the mid-market segment, with an understanding of the new dynamics driven by cloud services and social media. Andy Monshaw is driving a very significant investment program in product development, social media and channel support, and IBM have been smart enough to put a very senior and experienced General Manager in charge. Andy is a seasoned change agent who stood IBM’s storage business on its head some years back with the introduction of XIV, along with a new brand and a sales force that went on to win significant market share against the incumbents. Watch this space.


Data Protection Strategies

Last night a few friends (IT Operations geeks like me) and I had a great “Brains Trust” event at the OXO Tower in London. The topic was a continuation of the last Hot Aisle blog entry, “Why do we try to solve backup when restore is the problem?” Apart from the food, view and wine being quite exceptional, we had a great conversation.

The first key insight is that everyone, without exception, thought that data protection was far too complex and too risky, and that restores were always touch and go (heart in mouth). Everyone hated what they currently did. Everyone was clear that the tools and processes used to manage data protection and copy data were flawed at best and, in the main, not fit for purpose. This was a big insight for most of the brains trust, as they all operated data protection the way they always had and were just too busy to think about how dumb it was.

We discussed Big Data, as this was universally seen as a huge exacerbation of the data protection issue, and a few of us felt that Big Data was going to bust data protection, making it unsustainable and useless. We all saw that the importance of Big Data is being driven by increased business demand to measure business performance and customer behaviour. Everyone felt Big Data was inevitable in their organisation.

The topic got really interesting quite quickly when it became evident that restore wasn’t the only problem that needed to be addressed. As IT Operations guys, we create copy data of production data for a number of key reasons:

  1. Data Protection (software error, user mistake, sabotage, hardware failure, loss of data centre)
  2. Development Support (Snapshots to help in development, UAT and non-functional tests)
  3. Regulatory Compliance (Maintaining data to abide by national and international rules and laws, contractual commitments)
  4. Performance Enhancement (Creating point-in-time copies to run Business Intelligence reporting against because the production system can’t manage the extra IO load)

We then considered how often data got deleted as a matter of operational best practice, and all but one of us agreed that the pain and risk of deleting data was so great that we didn’t do it any more. The single exception was our operations guy from a very large global legal firm, who said his firm took proactive data deletion very seriously and that controlling data retention and immutability was a critical issue for him.

Everyone was emphatic that production data and copy data MUST be kept separate, but few could put hand on heart and claim 100% compliance. Everyone found managing copy data complex and inefficient: when can we delete copy data? There is a huge fear of messing up and causing a problem, so we keep everything.

We then thought about the ratio of production data to copy data (for all of the reasons above) and concluded that in real environments we could easily have somewhere between five and fifteen copies of each piece of production data, depending mainly on how smart we were about copying the copy data (e.g. backing up a development snapshot or the BI data). The unanimous conclusion was that production data volumes are growing exponentially (maybe 40% CAGR) and copy data exacerbates the issue because we keep each of the four types of copy data in different silos.
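To see how quickly this compounds, here is a back-of-envelope sketch. It assumes every copy is a full copy of production data and uses a hypothetical 100 TB starting point with the 40% CAGR and five-to-fifteen copy range discussed above:

```python
def projected_footprint_tb(production_tb: float, cagr: float, copies: int, years: int) -> float:
    """Total stored volume: production data grown at a compound annual rate,
    plus `copies` full copies of it (assumes every copy is a full copy)."""
    production = production_tb * (1 + cagr) ** years
    return production * (1 + copies)


START_TB = 100.0   # hypothetical starting production volume
for copies in (5, 15):
    for years in (0, 3, 5):
        total = projected_footprint_tb(START_TB, cagr=0.40, copies=copies, years=years)
        print(f"{copies:>2} copies, year {years}: {total:,.0f} TB")
```

At 40% CAGR production data roughly doubles every two years, and every siloed copy doubles with it.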

We then asked ourselves why we do things this way, and why we keep these four different siloed approaches going. The unanimous conclusion was that we do it this way because we always have, and no one has stopped for long enough to think about a better approach.


Why do we try to solve backup when restore is the problem?

I have been thinking about data storage and protection recently, and about how our behaviour is driving massive growth in cost and complexity. The issue seems to emanate from the fact that we focus on solving backup when the real problem is restoring data:

  • We need to be able to deal with data corruption and incorrectly added transactions (rollback)
  • We need to be able to deal with partial or complete data loss caused by hardware malfunction, malicious acts or human error (data recovery)
  • We need to be able to deal with total data centre loss resulting in loss of data (site recovery)
  • We need to provide archive data to comply with regulatory and legal requirements (compliance)
  • We need to be able to take snapshots of point in time data for testing (copy data)

Each of these requirements tends to drive a separate solution involving multiple copies of data in separate towers. Complexity increases, volumes grow out of hand and we singularly fail to achieve the key objectives:

  • Rapid data recovery with engineered recovery point objective
  • Safe and secure business continuity protection
  • Compliance with regulations and laws
  • Clear understanding of where our data is and what version it is at

If we were setting out to solve these problems and meet these requirements, then the tape backup, deduplication and archiving procedures that we commonly use today would not be where we would start. Storage, server and network virtualisation has freed us from the tyranny of physical connections between hardware and application, yet data protection pulls us back again. We need a new protection and availability architecture.
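An engineered recovery point objective starts from the restore question rather than the backup schedule: given when recoverable copies exist, how much data would a restore after a failure at a given moment lose? A minimal sketch with a hypothetical four-hourly snapshot schedule; the function names are illustrative:

```python
from bisect import bisect_right
from datetime import datetime, timedelta


def latest_restore_point(snapshots, failure_time):
    """Newest snapshot taken at or before the failure, or None if there is none."""
    ordered = sorted(snapshots)
    idx = bisect_right(ordered, failure_time)
    return ordered[idx - 1] if idx else None


def data_loss_window(snapshots, failure_time):
    """Time between the restore point and the failure: the data actually lost."""
    point = latest_restore_point(snapshots, failure_time)
    return None if point is None else failure_time - point


def meets_rpo(snapshots, failure_time, rpo):
    """True if restoring from the latest snapshot loses no more than `rpo`."""
    loss = data_loss_window(snapshots, failure_time)
    return loss is not None and loss <= rpo


day = datetime(2012, 1, 1)
snaps = [day + timedelta(hours=h) for h in range(0, 24, 4)]   # every 4 hours
failure = day + timedelta(hours=10, minutes=30)
print(latest_restore_point(snaps, failure))              # 2012-01-01 08:00:00
print(data_loss_window(snaps, failure))                  # 2:30:00
print(meets_rpo(snaps, failure, timedelta(hours=4)))     # True
print(meets_rpo(snaps, failure, timedelta(hours=1)))     # False
```

The worst-case loss equals the gap between recoverable copies, so the RPO is set by the copy schedule, whatever the backup tooling.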



Come to ExecEvent at IP Expo in London

Dear Colleague,

I thought that you might be interested in a networking event being run in London in parallel with IP Expo on the 19th and 20th of October 2011.

What is ExecEvent?

ExecEvent is a highly successful global event run by Greg Duplessie, brother of Steve Duplessie of ESG fame. The ExecEvent is an exclusive networking event for virtualization, cloud, storage and security industry executives, plus those folks who want to interact with these executives (service providers, recruiters, technology law/tax firms, etc.). It is different from any other networking event you’ve ever been to. Our mission is to create a compelling event for industry insiders: one that focuses on networking and building relationships and does not require exhibiting or catering to end-users. This unique networking event will provide educational speaking topics and a spotlight for emerging products or companies (where appropriate), as well as plenty of time for your own meetings as you see fit.

When and Where?

Our next event is called the ExecEvent London 2011. It is scheduled for October 19-20th at the Earls Court Conference Centre in London, with a pre-event cocktail reception on the evening of October 18th. We are working with IP EXPO, and the conference centre is a very short walk from their exhibition hall. The ExecEvent is specifically for senior executives in the virtualization, cloud, storage and data center space, as well as for press, analysts, consultants and financial professionals.

Why Should I Attend?

If you want end-users, then IP EXPO is the perfect forum and show. But what about your business partners, resellers, OEMs, VCs and investment bankers, consultants, etc.? These shows are too crazy to focus on the business behind your business. That’s where the ExecEvent comes in. We bring together industry executives in a very meaningful way.

If you make one or two solid connections at this event, it is worth your time and effort. “If you are a vendor in the cloud, virtualization or storage space and you are not here (at an ExecEvent), slap yourself and get signed up for the next one.” So says George Crump, Senior Analyst for Storage Switzerland.

“I think it’s a great idea to separate business development and networking events like this with other events geared towards end-users and outbound marketing. With this event we can have the right people in the right meetings, without having to bring the whole company. Focus is key.” – Ed Walsh, former CEO of Storwize, Avamar and Virtual Iron, now an executive at IBM

How Much Does It Cost To Register?

The registration fee is £375 GBP. No VAT required.

What Goes On?

More information can be found at

Who Should Attend?

We specifically created this event for executives, regional and country managers and interested senior technology vendors in the following areas:

• High Performance Computing
• Hardware and Software providers
• Cloud Enablers
• Storage and networking equipment, software and services
• Consulting, training and development services
• System Integration / Consulting
• Virtualization solutions
• Data Centre technology
• Cloud Infrastructure
• SaaS, IaaS, PaaS, BPaaS providers
• Data security
• Research organisations


There is no such thing as a free lunch

Google has the most amazingly clear terms of service for their cloud products.  The terms are take it or leave it, no promises, no commitments and no come-back.

I love the statement that Google does not warrant that the quality of the services will meet your expectations. It's a gem.


Thanks to Harqs Singh for pointing it out.

So you think you understand your data center?

Yesterday Chris Leahy (my Technical Facilities Manager) and I were agonising over why we had low plenum pressure in our Data Center and why we were seeing symptoms of hot air trapped in the roof void. We looked at all the normal stuff:

  • Leaks in the plenum space
  • Badly sealed floor
  • Cable access holes improperly sealed
  • Blockages in the plenum
  • Bad seals between the plenum and the CRAC units

In the end we worked out what the problem was. We have one of our CRAC units switched off as it is under maintenance. CRACs are pretty simple devices: they typically have dust filters, cooling coils (chilled water or DX) and an axial fan blower to drive the air down into the raised floor. In our case, because one CRAC was off, it was acting like an open chimney, letting huge quantities of cold air escape unrestricted from the plenum and up high into the data center.

In fact it was behaving just like three fully open floor tiles. No wonder the plenum pressure was very low. We asked Stuart Hall at ARUP for his take:

A standby CRAC unit without non-return dampers will allow cold air to back-flow into the hot-zone.

We generally specify CRAC units in our designs with non-return dampers for this reason. The CFD software which I use includes them on all CRAC units by default (though they can be removed).

The amount of air returning would depend on:

  • The amount of floor grilles and other openings within the raised floor
  • The flow rate of air delivered to the floor by the operational CRAC units
  • The resistance to airflow caused by the idle-fan, cooling coil and other geometry within the CRAC

I have witnessed this behaviour whilst surveying data centres. It is also common to see a small quantity of back-flow through CRACs with non-return dampers as they don’t make a perfect seal.

The only arguments that I can think of against installing non-return dampers would be space or cost. Perhaps a business decision was made to accept reduced flow performance in the event of a failure in exchange for cheaper units or physically smaller units.

So here is the lesson of the day: don't assume that switched-off kit is neutral! If you have CRACs that are not switched on, they may be leaking your precious cold air into the roof void.
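To see why one idle CRAC "behaving just like three fully open floor tiles" hurts so much, here is a rough sketch that treats every opening in the floor as a simple orifice venting the same plenum. The discharge coefficient, air density, tile free area and supply flow are all assumed, hypothetical values, and a real CFD model (like the one Stuart Hall mentions) accounts for far more geometry.

```python
# Orifice model of a raised-floor plenum: every opening (perforated tile,
# cable cut-out, idle CRAC) vents the same plenum pressure in parallel.
AIR_DENSITY = 1.2      # kg/m^3, assumed
DISCHARGE_COEFF = 0.6  # assumed, typical sharp-edged orifice value


def plenum_pressure(supply_m3s, opening_areas_m2, cd=DISCHARGE_COEFF, rho=AIR_DENSITY):
    """Plenum over-pressure (Pa) needed to push supply_m3s out through all openings.

    Each opening obeys Q_i = cd * A_i * sqrt(2 * dP / rho). Summing the flows
    and solving for dP gives dP = (rho / 2) * (Q / (cd * sum(A)))**2.
    """
    total_area = sum(opening_areas_m2)
    return 0.5 * rho * (supply_m3s / (cd * total_area)) ** 2


def flow_shares(opening_areas_m2):
    """Fraction of the supply escaping through each opening (same cd for all)."""
    total_area = sum(opening_areas_m2)
    return [area / total_area for area in opening_areas_m2]


# Hypothetical room: 20 open tiles of 0.1 m^2 free area, 6 m^3/s from the running CRACs.
tiles = [0.1] * 20
idle_crac = 0.3  # behaves like three more fully open tiles

before = plenum_pressure(6.0, tiles)
after = plenum_pressure(6.0, tiles + [idle_crac])
print(f"plenum pressure: {before:.1f} Pa -> {after:.1f} Pa")
print(f"share of cold air lost through idle CRAC: {flow_shares(tiles + [idle_crac])[-1]:.0%}")
```

On these made-up numbers the plenum pressure falls from about 15 Pa to about 11 Pa (it scales with the square of the ratio of open areas, here (20/23)²), and roughly 13% of the cold air leaves through the idle unit instead of the tiles.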

Greening the Data Center in Qatar is not a quick win

Worldwide, there has been a lot of focus in recent years on reducing the environmental impact of Data Centers. Green always comes at a cost, but once it is viewed as a long-term investment rather than as a quick return on investment (ROI), it can be a viable cost cutting option. Data center investments are enormous; they are built with a long term vision of 10 to 15 years and, therefore, any means of reducing the amount of capital required needs to be seriously considered. By improving the power efficiency of our Data Centers, we can simultaneously reduce the capital equipment required and the cost of power and maintenance to operate. This dramatically improves the ROI on our data center investments.

Energy costs in Qatar are exceptionally low (less than 2c per kWh) and this can contribute to the perception that there is a weak ROI on Green Energy Efficient investments. However, this perception fails to take into account the lost opportunity of re-selling valuable energy resources abroad rather than consuming them wastefully on the domestic market.

So, enough of the whys; how do we go green? The simple magic words are Energy Efficiency and High Utilization. We could choose to have 1,000 servers operating at 5% average utilization or 100 servers operating at 50% average utilization. Many organizations choose to operate one application on one server, so 1,000 applications means 1,000 servers. By introducing virtualization we can run multiple applications on shared servers, reducing costs, improving efficiency and being green all at the same time.
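The 1,000-to-100 arithmetic can be written down as a small sketch: total work is expressed in "fully busy server equivalents" and divided by the target utilization. It assumes identical hardware and plans on average load only; real consolidation planning also has to leave headroom for peaks, and the function name is hypothetical.

```python
import math


def servers_needed(current_servers, current_util, target_util):
    """Hosts required to carry the same total work at a higher average utilization.

    Work is measured in fully busy server equivalents (servers * utilization),
    then divided by the target utilization and rounded up. Assumes identical
    hardware and average (not peak) load.
    """
    if not (0 < current_util <= 1 and 0 < target_util <= 1):
        raise ValueError("utilization must be in (0, 1]")
    work = current_servers * current_util
    # Round away floating point fuzz before taking the ceiling.
    return math.ceil(round(work / target_util, 9))


print(servers_needed(1000, 0.05, 0.50))  # 100
print(servers_needed(1000, 0.05, 0.60))  # 84
```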

Virtualization can enable a perfect storm of efficiency; we reduce the number of servers, thereby reducing both the capital cost and operation costs of those servers. Virtualization enables reductions in both data center capital costs and the operational cost of supplying electricity and cooling. One often forgotten additional advantage is the reduction in software licensing and maintenance costs because we are managing a smaller IT estate.

Storage is an important area to focus on when improving efficiency; one enterprise disk can consume as much as 1MWh (megawatt-hour) over its useful life. CIOs and COOs often find it very difficult to delete data that is no longer useful. Studies show that the chances of needing a document or a spreadsheet decrease exponentially over time, so the chances of you needing that 7-year-old spreadsheet ever again are close to zero. Best practice is to set a data retention policy and ruthlessly delete data that is older than the retention period.
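Both points in the paragraph above lend themselves to a quick sketch. The first function reproduces the order of magnitude of the 1MWh figure; the 12 W drive draw, 5-year life and facility overhead multiplier (PUE) of 2.0 are assumptions, not measured values. The second shows the simplest possible retention rule, flagging files whose last-modified date falls before a cut-off; the file names and dates are hypothetical.

```python
from datetime import datetime, timedelta

HOURS_PER_YEAR = 24 * 365


def lifetime_energy_mwh(disk_watts, years, pue):
    """Energy attributable to one disk over its life, including facility overhead.

    The pue multiplier scales the disk's own draw up to cover cooling and
    power distribution losses.
    """
    return disk_watts * HOURS_PER_YEAR * years * pue / 1_000_000


def expired(files, now, retention_years):
    """Names of files last modified before the retention cut-off.

    files: iterable of (name, last_modified) pairs.
    """
    cutoff = now - timedelta(days=365 * retention_years)
    return [name for name, modified in files if modified < cutoff]


print(round(lifetime_energy_mwh(12, 5, 2.0), 2))  # 1.05

files = [
    ("budget_2004.xls", datetime(2004, 3, 1)),
    ("plan_2011.doc", datetime(2011, 1, 15)),
]
print(expired(files, now=datetime(2011, 6, 1), retention_years=7))  # ['budget_2004.xls']
```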

Data deduplication can be an additional useful tool toward reducing data storage costs and increasing storage efficiency. Data deduplication is a specific form of compression where redundant data is eliminated, typically to improve storage utilization. Deduplication is able to reduce the required storage capacity since only the unique data is stored. This in turn can reduce the overall footprint inside the data center. Deduplication can help by squeezing out between 10 and 20 percent more storage space just by getting rid of duplicated data.
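The "only the unique data is stored" idea can be sketched with fixed-size chunking and content hashes. This is the simplest variant, chosen for illustration; many commercial products use variable-size (content-defined) chunking instead, and the 4 KB chunk size here is an assumption.

```python
import hashlib


def dedupe(data: bytes, chunk_size: int = 4096):
    """Fixed-size chunk deduplication.

    Each distinct chunk is stored once, keyed by its SHA-256 digest; the
    ordered list of digests (the 'recipe') is enough to rebuild the data.
    """
    store = {}
    recipe = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # keep only the first copy
        recipe.append(digest)
    return store, recipe


def rebuild(store, recipe):
    """Reassemble the original bytes from the unique chunks."""
    return b"".join(store[digest] for digest in recipe)


def savings(data: bytes, store) -> float:
    """Fraction of raw capacity saved by storing unique chunks only."""
    stored = sum(len(chunk) for chunk in store.values())
    return 1 - stored / len(data) if data else 0.0


# Three identical 4 KB chunks followed by one different chunk.
data = b"A" * 4096 * 3 + b"B" * 4096
store, recipe = dedupe(data)
print(len(recipe), len(store), f"{savings(data, store):.0%}")  # 4 2 50%
```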

Another valuable option is to replace old Data Center equipment with modern, higher-capacity equipment; for example, multiple old servers can be replaced by one modern server, combining the benefits of energy efficiency and smaller space requirements. Upgrading old Data Center equipment can increase energy efficiency by roughly 3-4 times.

Cooling problems are clearly a major growing concern for Data Center Managers. As per Gartner Research, it is estimated that Data Centers typically waste more than 60% of their energy just in cooling their equipment. Traditional cooling techniques are inadequate both economically and operationally. Newer technologies offer two solutions: District Cooling and Hot Aisle Containment.

MEEZA data centers at QSTP leverage the massive investment made by the Qatar Foundation in district cooling, offering extremely efficient large scale plants that could not be replicated elsewhere in the country. Expert MEEZA engineers routinely virtualize customer applications, reducing the need for large numbers of servers and dramatically reducing overall energy consumption. MEEZA leads the way in IT sustainability by demonstrating best practice in Green IT.

Running IT as a Business

Throughout my 20-plus-year career in IT consulting, I have noticed that the most successful businesses often have something in common: they run their IT like a business and they treat IT like a key part of the business, not like an add-on function. I have been in Qatar for just over a month now, but I have already seen encouraging signs that businesses here are starting to recognize that IT can be a key enabler for achieving strategic objectives. The next step for many companies in Qatar is to run their IT like a business, with the same deliverables, service levels and outcomes that are expected from any other business.

Business decisions are driven by three constraints: affordability, risk and time to deliver. Generally, suppliers to business leverage this understanding to deliver something that their customers need yet are constrained from doing themselves. So, for example, if you need a 24 x 7 security patrol for your company premises, you can either choose to employ and manage a team of security officers or you can outsource the service to a professional security firm.

The outsourced security firm can be an effective business choice because it reduces risk and you can determine what you need by defining a service level (such as a full perimeter patrol every 2 hours and a check that all doors and windows are secure at 7PM) rather than deciding to hire a team of employees to deliver security. Service levels are the key business driver here because they force us to think about the problem before we think about the solution.
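What makes a service level useful is that it can be checked mechanically. As a minimal sketch, assuming the supplier provides a log of patrol start times (the log below is hypothetical), the "full perimeter patrol every 2 hours" clause becomes a test for gaps longer than two hours between consecutive patrols.

```python
from datetime import datetime, timedelta


def sla_breaches(patrol_times, max_gap=timedelta(hours=2)):
    """Return (earlier, later) pairs of patrols that were more than max_gap apart."""
    times = sorted(patrol_times)
    return [
        (earlier, later)
        for earlier, later in zip(times, times[1:])
        if later - earlier > max_gap
    ]


# Hypothetical overnight patrol log.
night = datetime(2011, 5, 1)
log = [night + timedelta(hours=h, minutes=m) for h, m in [(0, 0), (1, 55), (4, 10), (6, 0)]]

for start, end in sla_breaches(log):
    print(f"breach: {start:%H:%M} -> {end:%H:%M}")  # breach: 01:55 -> 04:10
```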

Much the same is true of Information Technology: companies can choose to think of IT as hardware, software and people that somehow come together to help the business execute or, alternatively, as a set of underpinning business services with service level agreements and requirements. Companies that start off thinking about the problem (what they are trying to deliver) generally do a better job than those who leave it all to chance.

By thinking of IT as a set of services that underpin your core business processes (selling cars or homes, banking, insurance, liquefying gas) you can start aligning your business and IT requirements and make significantly better investment decisions. Research shows that the most successful and profitable businesses have mature business processes underpinned by mature IT processes. No surprise then that here in Qatar, IT Infrastructure Library (ITIL) training is extremely popular as fast growing businesses look to grow their IT and business maturity.

The basis of ITIL is that IT becomes a set of services delivered as standard processes with service level agreements in a structured and repeatable way. Businesses are looking to make IT repeatable, standard and reliable with defined costs and reduced risk.

So in the same way that security, cleaning, and facilities management have long been recognized as being suitable for outsourcing as a managed service, many parts of IT delivery are equally suitable. Managed storage, managed network, managed email and managed data center services are common across the world. These reflect the IT outsourcers’ ability to build repeatable capability at low cost by leveraging scale and investment in process and technology.

The characteristics of a service that is suitable for outsourcing are:

  • Definable by a service level
  • Requirement to scale up and down depending on demand
  • Benefits from delivery by a mature specialist organization with defined processes
  • Benefits from economies of scale beyond your own requirements

Reliable IT delivery is becoming business critical with outages often meaning that customers take their business elsewhere or employees cannot work. IT outages cost money and damage brand reputation so careful management and delivery of IT is critical. Service levels align business needs to IT delivery ensuring that the right levels of service design and service operation are put in place to avoid problems.
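An availability service level only means something to the business once it is turned into a downtime budget. Here is a small sketch of that conversion; the 30-day period and the example targets are assumptions chosen for illustration.

```python
def downtime_budget_minutes(availability_pct, period_days=30):
    """Maximum downtime allowed in a period for a given availability target."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - availability_pct / 100)


for target in (99.0, 99.9, 99.99):
    print(f"{target}% over 30 days allows {downtime_budget_minutes(target):.1f} minutes of downtime")
```

Each extra "nine" cuts the budget by a factor of ten (about 432, 43.2 and 4.3 minutes respectively), which is why the service level has to be agreed with the business before the service design is chosen.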

For businesses to truly reap the advantages that IT can provide, there needs to be this focus on service levels, outcomes and deliverables. Running IT like a business will enable IT to help businesses prosper and grow.

ASHRAE need to join the 21st Century

I don’t normally plug press releases straight from vendors but today I received an email from Emily Wood at Google with a message that I agree 100% with. Cooling data centers is not just about refrigeration – there are lots of options – many of which we have written about here on The Hot Aisle – Fresh Air cooling, Liquid Cooling, Spray cooling, and others we haven’t even thought about yet (there are tons of smart engineers out there doing great work).

I guess it is unsurprising that ASHRAE, the American Society of Heating, Refrigerating and Air-Conditioning Engineers, writes standards that are about refrigeration; after all, turkeys don't vote for Thanksgiving.

Here is the article in its entirety:

Setting efficiency goals for data centers

For the past decade, we have been working to make our data centers as efficient as possible; we now use less than half the energy to run our data centers than the industry average. In the open letter below, I am very happy to welcome a group of industry leaders who collectively represent most of the world’s most advanced data center operators. -Urs Hoelzle, SVP, Operations and Google Fellow

Recently, the American Society of Heating, Refrigerating and Air-Conditioning Engineers (ASHRAE) added data centers to their building efficiency standard, ASHRAE Standard 90.1. This standard defines the energy efficiency for most types of buildings in America and is often incorporated into building codes across the country.

Data centers are among the fastest-growing users of energy, according to an EPA report, and most data centers have historically been designed and operated without regard to energy efficiency (for details, see this 2009 EPA Energy Star survey). Thus, setting efficiency standards for data centers is important, and we welcome this step.

We believe that for data centers, where the energy used to perform a function (e.g., cooling) is easily measured, efficiency standards should be performance-based, not prescriptive. In other words, the standard should set the required efficiency without prescribing the specific technologies to accomplish that goal. That’s how many efficiency standards work; for example, fuel efficiency standards for cars specify how much gas a car can consume per mile of driving but not what engine to use. A performance-based standard for data centers can achieve the desired energy saving results while still enabling our industry to innovate and find new ways to improve our products.

Unfortunately, the proposed ASHRAE standard is far too prescriptive. Instead of setting a required level of efficiency for the cooling system as a whole, the standard dictates which types of cooling methods must be used. For example, the standard requires data centers to use economizers — systems that use ambient air for cooling. In many cases, economizers are a great way to cool a data center (in fact, many of our companies’ data centers use them extensively), but simply requiring their use doesn’t guarantee an efficient system, and they may not be the best choice. Future cooling methods may achieve the same or better results without the use of economizers altogether. An efficiency standard should not prohibit such innovation.

Thus, we believe that an overall data center-level cooling system efficiency standard needs to replace the proposed prescriptive approach to allow data center innovation to continue. The standard should set an aggressive target for the maximum amount of energy used by a data center for overhead functions like cooling. In fact, a similar approach is already being adopted in the industry. In a recent statement, data center industry leaders agreed that Power Usage Effectiveness (PUE) is the preferred metric for measuring data center efficiency. And the EPA Energy Star program already uses this method for data centers. As leaders in the data center industry, we are committed to aggressive energy efficiency improvements, but we need standards that let us continue to innovate while meeting (and, hopefully, exceeding) a baseline efficiency requirement set by the ASHRAE standard.
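PUE, the metric the letter recommends, is total facility energy divided by the energy delivered to IT equipment, so a perfect facility scores 1.0. A performance-based standard would cap this ratio rather than dictate how it is achieved. The meter readings below are hypothetical.

```python
def pue(total_facility_kwh, it_equipment_kwh):
    """Power Usage Effectiveness: total facility energy / IT equipment energy."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    return total_facility_kwh / it_equipment_kwh


def overhead_share(pue_value):
    """Fraction of total facility energy spent on non-IT overhead (cooling, power losses, lighting)."""
    return 1 - 1 / pue_value


# Hypothetical annual meter readings.
value = pue(total_facility_kwh=1_800_000, it_equipment_kwh=1_000_000)
print(f"PUE {value:.2f}, overhead {overhead_share(value):.0%} of facility energy")  # PUE 1.80, overhead 44% ...
```

Note that the metric says nothing about whether the overhead comes from economizers, chillers or something not yet invented, which is exactly the property the signatories are asking for.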

Chris Crosby, Senior Vice President, Digital Realty Trust
Hossein Fateh, President and Chief Executive Officer, Dupont Fabros Technology
James Hamilton, Vice President and Distinguished Engineer, Amazon
Urs Hoelzle, Senior Vice President, Operations and Google Fellow, Google
Mike Manos, Vice President, Service Operations, Nokia
Kevin Timmons, General Manager, Datacenter Services, Microsoft

Your brand is not what YOU say it is, it’s what YOUR CUSTOMERS say it is.


I have spent the last 12 months working with some of the smartest and best funded marketing people on the planet, at the really big, household-name IT Infrastructure vendors. I learned lots: lots about marketing, lots about honest analysis and lots about human nature. I learned that marketing isn't that tough to do, but it can be terribly hard to do it well and be consistently effective.

Boiled down to its basics, marketing is about understanding a business problem really well whilst at the exact same time being paranoid that you are completely wrong about that business problem in every respect. After you get the paranoia right, everything else is just process. Sure, there is a ton of creative stuff that needs to get done to do it well, but if you have smart people and a pile of cash that is never a problem.

So step one in our marketing 101 lesson is understanding the problem. Marketing people sometimes call this Getting to the Insight and you have to get it right. Get it wrong and everything else you do is likely to be completely useless and may even be counterproductive.

Once you get the Insight you need to work out who has the problem. Is it SMEs, Enterprises, Startups, the Medical Profession? Marketing people call that the Segment and the process is called segmentation.

Now that we have the Insight and the Segment, we need to work out a set of solutions that solve the problem for each segment. Notice I said solutions, not solution. We need solutions because each segment may need a different solution to the same problem.

We are almost there. Now we need a set of Messages that explain and position the Solutions to the Segments we identified earlier.

Insight -> Segment -> Solutions -> Messages

So this is what I have been doing at ESG: helping marketing departments understand the problem, that is, get to the Insight that matters and that will make the vendor successful. I have also been trying my hardest to disprove insights, often with help from research and polls. Generally, as long as you can't disprove an insight, it's OK. Strangely enough, it is usually almost impossible to prove insights, because no one ever has complete market visibility.

These insights need to be backed up with research and customer validation; we need to be paranoid. Insights are nebulous and time bound. What might have been an amazing insight at one time won't always be that way: once the problem is solved for the relevant segments, the insight goes away. Challenging the insights that vendors rely on to make major product investments is important, and they need to be verified constantly. IT insights can decay or morph very quickly, much more quickly than those for fast moving consumer goods, but not quite as quickly as those for fashion and apparel.

Understanding what customers think today about IT products and services is crucial, but even more important is being right about how they will be thinking next quarter, next year, next decade. If we understand how insights change over time, we can adjust our segmentation, alter our solutions and correct our messages. If we fail to foresee the changes, we fail to correct our messages and we get a misalignment between customer brand perception and marketing department messaging.

The IT business moves so fast that this is often a fatal mistake.

Picture Copyright (c)

Why Virtualisation isn’t a universal panacea

The problem with computing is everyone wants to make it uniform – fit it into a neat box, categorise it as ‘all the same’, make it autonomic, self-managing and move on. In fact, IT is anything but uniform, so these simplistic approaches fall at the first hurdle.

Smart CIOs understand applications need to be treated differently depending on their value to the business. There are four key types – invest, operate, contain and kill.

‘Invest’ applications generally make up around 10-15% of the full estate. These are the applications the CEO knows about – the ones that when they get better, faster or more functional have a direct impact on business value. Examples are key CRM systems or MRP platforms, applications that underpin vital business processes and touch customers. These applications are not terribly cost sensitive, so when CTOs look to virtualise to take out cost, CIOs resist. Virtualisation is useful for invest applications but only if it improves agility, speed of deployment, adds functionality or reduces risk.

‘Operate’ applications represent 40-80% of the estate. No matter how much better we make them, they don’t improve business performance. Examples might be email or document management, internal HR systems or archiving systems. They need to be reliable and cheap. Virtualisation works here as a method of taking out costs. So does outsourcing and software as a service (SaaS) delivery.

‘Contain’ applications are those we wish we didn’t have – old stuff that’s expensive to run and difficult to change or manage. We get the amount of these we deserve: under-invest and the category grows. They have one other characteristic: they are difficult to re-platform and change to ‘operate’ status. Although they’re important to the organisation, they don’t typically make the business run any better if we improve them. We just want them to run silently for as long and as cheaply as possible. ‘Operate’ applications that have not had proper investment, love and attention will eventually move into this category.

‘Kill’ applications are always a nightmare. These are the ones that are impossibly expensive to run and maintain. By definition, they only represent a tiny handful of the estate (perhaps 1-5%). They are impossibly difficult to change. Often the guys who wrote and maintained the code are retired (or dead). No one else you know still has the hardware, except the Natural History Museum, and the vendor no longer supports the operating system. These might have been ‘contain’ applications that just wouldn’t stay contained, or ‘invest’ applications where you didn’t invest (silly you). There is only one thing to do with a ‘kill’ application – bin it. You know it’ll cause pain and disruption, as well as costing a lot of money, but it has to be done.

Smart CIOs know this already and take a pragmatic approach to their applications, understanding instinctively where to spend money and where to bleed a previous investment. And really smart CIOs never reach the point where they need a kill category.
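One way to keep this discipline honest is to tag every application in the estate with its category and check the mix against the rough shares quoted above. The sketch below encodes only what the text says (the share ranges and the treatment for each category); the estate itself is hypothetical, and no range is given for 'contain', so that category is never flagged.

```python
from collections import Counter
from enum import Enum


class Category(Enum):
    INVEST = "invest"
    OPERATE = "operate"
    CONTAIN = "contain"
    KILL = "kill"


# Share-of-estate ranges as quoted in the text; none is given for 'contain'.
TYPICAL_SHARE = {
    Category.INVEST: (0.10, 0.15),
    Category.OPERATE: (0.40, 0.80),
    Category.KILL: (0.01, 0.05),
}

# Treatment for each category, paraphrased from the text.
TREATMENT = {
    Category.INVEST: "virtualise only for agility, speed of deployment, functionality or lower risk",
    Category.OPERATE: "take out cost: virtualise, outsource or move to SaaS",
    Category.CONTAIN: "keep running quietly, for as long and as cheaply as possible",
    Category.KILL: "bin it",
}


def estate_report(apps):
    """apps maps application name -> Category.

    Returns {Category: (share_of_estate, outside_typical_range)}.
    """
    counts = Counter(apps.values())
    total = len(apps)
    report = {}
    for category in Category:
        share = counts[category] / total
        low, high = TYPICAL_SHARE.get(category, (0.0, 1.0))
        report[category] = (share, not (low <= share <= high))
    return report


# Hypothetical 25-application estate.
estate = {f"app{i:02d}": Category.OPERATE for i in range(15)}
estate.update({f"crm{i}": Category.INVEST for i in range(3)})
estate.update({f"legacy{i}": Category.CONTAIN for i in range(6)})
estate["mainframe_payroll"] = Category.KILL

for category, (share, flagged) in estate_report(estate).items():
    note = "  <- outside typical range" if flagged else ""
    print(f"{category.value:8} {share:5.0%}  {TREATMENT[category]}{note}")
```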

Building a world leading cloud in the Middle East

I wanted to share with you my latest news.  A little over a year ago Steve Duplessie and I created ESG EMEA to help reach out and serve new European clients as well as provide local support for many of our ESG US based relationships.  We have been immensely successful in building this business, working with clients at senior levels to help them better understand the market and target their products and services into the right segments armed with the right message and solutions. Late last year, ESG EMEA and I had the most unexpected honor of being named as one of the top 50 most influential IT analysts in the world.

I have been offered a fantastic opportunity to lead an enormous IT undertaking with the Qatar Foundation in Doha to create a visionary 21st Century cloud IT operation for the Middle East and North Africa.  I will be leading that effort directly, and as a result I will be unavailable to take briefings or to provide direct services to clients for the foreseeable future.  Please continue to leverage the expertise and talents of the other fine ESG analysts in this regard.

I will continue to keep you updated on our progress, challenges, and thoughts on data centre and IT operations via the Hot Aisle and on the ESG site.

Is there an option other than Semtex to fix my Data Centre?

There are a lot of them around, Data Centres. A few of them are designed and operated very well and deliver a great Power Usage Effectiveness (PUE). Some could do a bit better, perhaps with an airside economiser or two, or some hot or cold aisle containment, or maybe some DC power. Some are just a nightmare and could benefit from the administration of a wrecking ball. For some data centres, it seems that no amount of fixing them up, improving plant and applying best practice will make any measurable difference. Let's call them clunker data centres! (Maybe we can get the Government to do a cash for clunkers program for data centres?)

A clunker starts off with a ceiling height that is too low for hot air to separate out and migrate towards the CRAC units without too much mixing. The plenum under the raised floor is shallow and clogged up with cables and other detritus, choking off airflow from the CRACs. The floor tiles are perforated and have low airflow characteristics. The cabinets are all lined up like a schoolroom: front to back, front to back... You could cook turkeys in the back row. The CRAC units are low capacity and that capacity is exhausted. Naturally the boss wants you to install some 10kW racks in a hurry for a critical business project.

What can you do? Say "no way"? Offer co-location in a commercial facility as an option? Start looking for a new job?

I bumped into a possible solution a few days back on Twitter when I connected with Mary Hecht-Kissell (@PR_Strategies) who looks after Coolcentric. The problem set defined above, the one that makes a clunker data centre, is all about getting enough cold air to the servers to remove the excess heat. Every element in the clunker conspires to make delivering more cold air virtually impossible. That's where the Coolcentric solution makes a difference. It delivers cold water right up against the servers. It adds cooling capacity that enables that set of additional 10kW (or more) racks to be installed in a data centre that seemed like a lost cause. It's a fairly simple piece of technology that has been well engineered to be retrofitted to most types of existing cabinets. It's a water cooled door.

The water cooled door is fitted onto the back of the rack so that the hot air exhausting out of the cabinet is chilled immediately and very efficiently. Volume for volume, water can carry roughly 4,000 times as much heat as air, so these water cooled doors can remove significantly more heat with very low pumping energy.
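The advantage of water comes from its volumetric heat capacity (density times specific heat). A quick sketch using assumed room-temperature property values puts the ratio nearer 3,500 than 4,000, which is the same order of magnitude, and shows the flow each medium needs to carry away the heat of one hypothetical 10kW rack. The temperature rises of 12 K for air and 6 K for water are also assumptions.

```python
# Assumed property values near room temperature.
AIR_DENSITY = 1.2       # kg/m^3
AIR_CP = 1005.0         # J/(kg*K)
WATER_DENSITY = 998.0   # kg/m^3
WATER_CP = 4182.0       # J/(kg*K)


def volumetric_heat_capacity(density, cp):
    """Heat stored per cubic metre per kelvin, J/(m^3*K)."""
    return density * cp


def flow_to_remove(heat_watts, density, cp, delta_t_k):
    """Volume flow (m^3/s) needed to carry heat_watts with a temperature rise of delta_t_k."""
    return heat_watts / (density * cp * delta_t_k)


ratio = volumetric_heat_capacity(WATER_DENSITY, WATER_CP) / volumetric_heat_capacity(AIR_DENSITY, AIR_CP)
air_flow = flow_to_remove(10_000, AIR_DENSITY, AIR_CP, delta_t_k=12)
water_flow = flow_to_remove(10_000, WATER_DENSITY, WATER_CP, delta_t_k=6)

print(f"water/air volumetric heat capacity: {ratio:,.0f}x")
print(f"10 kW rack: {air_flow:.2f} m^3/s of air vs {water_flow * 1000:.2f} litres/s of water")
```

Roughly 0.7 cubic metres of air per second against well under half a litre of water per second is why a door full of water can rescue a rack the clunker's floor plenum could never feed.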

One smart way to think about it is that the water cooled door acts like a mini, contained hot aisle for environments (like our clunker data centre) where cabinet alignment, roof height and plenum problems make hot aisle containment impossible.

Sounds like a pretty decent alternative to Semtex!

Interview with Mike Olson CEO of Cloudera

Cloudera struck lucky in getting a $5M A-round away just before the markets shut down in response to the collapse of the global financial system. Backed by Accel Partners and, more recently, Greylock Partners, they are making a bet that Hadoop, with a smart scale-out approach to managing large amounts of data, is a winning strategy.

Mike is an industry veteran, having been through the normal form, build and sell cycle a number of times, most notably with Illustra into Informix and SleepyCat Software into Oracle Corp. During the 3 quarters between the A round and a subsequent B round, Mike has been able to build a credible and valuable company that adds significantly to the Apache Hadoop distribution, without funded competition.

In that time Cloudera have built a 30-person firm, released the Cloudera Distribution for Hadoop, built a support and professional services capability and created a training and certification business. Mike also made the very smart move of recruiting Doug Cutting, the original author of the Hadoop system, out of Yahoo.

Hadoop's initial use case has been managing and processing Internet-scale web data at Yahoo and Facebook, but it is now seeing significant levels of interest in other markets where processing large scale data in real time is a competitive advantage. These include financial services, government security, credit card fraud, genomics, digital media (3D) and national scale telecommunications firms.

Mike sees the development of Hadoop as a platform as the key area for now, with enhancements to add enterprise level management capabilities, “to avoid the need to have a team of Stanford graduates to hand crank the system” he told me. To address just this issue, they have developed the Cloudera Desktop that enables the centralised management of internal and public Hadoop clusters.

Later, Mike expects to see ISVs delivering enterprise solutions based on a Hadoop platform that enable hitherto impossible feats of analytic prowess. Early enterprise adopters in this space are likely to be able to leapfrog the competition with smarter decisions and more insightful products and services that serve customers better.

Nice guy, smart company, killer product.

Internet scale log files break scale up architectures

Recently I spoke to David Emery, a friend and colleague from my time at Coopers & Lybrand. He is now working on a major social media initiative for a global mobile telco. I was interested in David's perspective as he has been working on a set of solutions to process log files at enormous scale. You might think this is a somewhat trivial use case, but many modern business processes at scale generate impossibly large quantities of data that need to be turned into information.

David and his colleagues have been combining a number of open source components to get around the problem that scale-up architectures won't scale far enough, leveraging cheap compute and storage plus smart software and algorithms to deliver a solution. I think David makes a number of important comments that vendors would be well advised to heed:

  • Massive Internet scale problems are now solvable and enterprises want to mine the data to generate business information
  • The value of the whole solution is enormous but the sheer scale can make it unaffordable
  • Open source software and scale out commodity hardware are one possible solution to scale and affordability
  • Smart techniques or approaches like Hadoop and MapReduce are now becoming commonly used tools
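For readers who haven't met MapReduce, here is a single-process sketch of the model applied to the log-file use case: the map step emits a (key, 1) pair per log line, a shuffle groups the pairs by key, and the reduce step sums each group. It is not Hadoop's API; in Hadoop the framework distributes the map and reduce tasks across many commodity nodes and performs the shuffle itself. The log lines are hypothetical and assume Apache Common Log Format, where the HTTP status is the ninth whitespace-separated field.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Emit (status_code, 1) for one Common Log Format line; skip malformed lines."""
    fields = line.split()
    if len(fields) > 8:
        yield fields[8], 1


def shuffle(pairs):
    """Group values by key, as the framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Sum the counts for one key."""
    return key, sum(values)


def run(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())


logs = [
    '10.0.0.1 - - [10/Oct/2010:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '10.0.0.2 - - [10/Oct/2010:13:55:37 +0000] "GET /missing HTTP/1.1" 404 512',
    '10.0.0.1 - - [10/Oct/2010:13:55:38 +0000] "GET /about.html HTTP/1.1" 200 1024',
]
print(run(logs))  # {'200': 2, '404': 1}
```

Because each map call looks at one line and each reduce call at one key, both steps can be spread across as many cheap machines as the data demands, which is the whole point of the scale-out approach.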

Here is David’s story:

"Demand for storage capacity continues unabated, rising upwards along an exponential growth curve (Kryder's Law) that has challenged vendors to squeeze more bang per buck into SAN, NAS and a whole array (pun intended) of predominantly vertically scaled enterprise class storage solutions.

Improvements and innovations over the years in the form of cramming more 'bits' per inch onto a hard disk (magnetic bit density), RAID configurations and fibre optic connectivity have given us ever faster, larger and more resilient storage solutions that we quickly fill and consume. This demand is unlikely to diminish as applications and datasets become ever more enormous and sophisticated.

It's not only supercomputing projects driving huge demand. Whilst the Large Hadron Collider may be an extreme example (it currently generates 2GB per 10 seconds of use), there are many less esoteric applications demanding huge volumes of storage: think genome, DNA and RNA analysis, pharmaceutical research, financial modelling, Internet Search, Email and Web 2.0 social networking sites.

The latter examples seem less obvious until you consider the sheer number of users: Facebook recently surpassed 400m customer accounts. It’s no surprise then, that the leading internet companies have taken a different approach to increasing storage demands rather than solely relying on the bottom up vertical approach of the traditional storage vendors.

Google and Yahoo have been key players in the development of distributed storage and analysis efforts (where there is data, there is a demand to analyse and report on that data) that have yielded, amongst others, Hadoop, MapReduce, HDFS (Hadoop Distributed File System) and GFS (Google File System).

In the massive scale out architectures required to drive the Google Search and Facebook web applications of the world, horizontal scale-out is king. Tiered architectures remain valid, but they are increasingly underpinned by free open source software.

It’s not only web start-ups that grew from small beginnings into large corporates that have embraced the free software stack (Apache, Linux, Squid, MySQL, Perl, Python, Nagios, etc.) to support expansion whilst avoiding crippling licensing costs. Both small and large enterprises have joined the bandwagon as many of the barriers to entry have become irrelevant.

Product stability, maturity, widespread adoption and readily available support have mitigated many of the perceived risks. The architecture scales, the software works, and it can all be built on a foundation of cheap commodity servers. Virtualization and Cloud Computing have only reinforced this trend, and infrastructure is increasingly provided as a service (IaaS), where the bare metal platform is entirely abstracted and increasingly irrelevant.

The distributed architecture and horizontal scale-out approach is now beginning to shake up the storage and database tier and, therefore, the storage marketplace. Customers want massive capacity, reliability and good performance, but they also want to avoid vendor lock-in and large upfront investment costs. They also want more effective ways to process such huge volumes of data.

Distributed file systems and distributed compute processing make all of this possible. An emerging sector with players such as GlusterFS, Lustre and Ibrix has grown up, and the traditional storage vendors are shoring up their product ranges with similar solutions. HP bought Ibrix, whilst Gluster is following the open source route and monetizing it through services.

Logfile collection and processing provide a highly relevant, if more mundane, example of how these building blocks can be pulled together to form an innovative and cost-effective solution that grows as customer demand increases. In an infrastructure supporting a web-based service with just under two million users, I’ve recently seen systems generate over 100GB of log file data per day.

Historically, collecting and storing such data has often been overlooked or poorly implemented, if done at all. It is often seen as a costly process of limited use (typically because the value in the data is widely spread out and cannot easily be retrieved in a meaningful way), and for many organisations it ultimately becomes little more than a burdensome risk, retention and compliance requirement.

Much of the data that is kept ends up on tape gathering dust. How can a customer expect to grow their service from two million users to five million, then twenty million and beyond, without crippling storage costs, let alone handle such large volumes of log file data and do something useful with it?

A storage platform fronted by a Distributed File System (DFS) provides one possible answer. The DFS can be built upon multiple nodes running on cheap commodity hardware. More nodes can be added as required, the underlying hardware can be changed, and the system can comprise many different nodes running on different platforms. The DFS provides the clustering, reliability and scale-out storage architecture under a single namespace, accessible by any number of standard protocols, e.g. CIFS, NFS, HTTP and iSCSI. What’s more, a multiple node system provides readily available processing power, suitable for MapReduce type applications. Of course, an alternative is to stick with large-scale vendor-specific storage platforms, where cost is reduced through economies of scale and risk is somewhat mitigated at the expense of lock-in.

A similar DFS approach has been successfully implemented by MailTrust (Rackspace’s mail division) to capture, collate and process huge volumes of daily log files using Syslog, Hadoop and MySQL. This may be ‘just’ log files, but the power of the data can be harnessed to improve support operations and identify trends.

Of course, this is possible with traditional tools and storage, but the key here is scale and affordability. I’ve recently seen other companies looking to build similar distributed storage platforms that will also form the backbone of a private storage cloud, fronted by Eucalyptus software. Again, the whole architecture can be built from open source software running on cheap commodity hardware.

It is the software and open standards that are increasingly enabling organisations to build massive internet web services requiring massive storage. The database and storage layers remain the last vertical bottleneck, but this is changing. SAN and NAS technology will not disappear; rather, consumption will probably continue to grow (in line with Kryder’s Law), but DFS and greater flexibility are here to stay.

The success of companies such as Gluster, and the more widespread adoption of HDFS and Google FS, will determine how many customers move, and how far, from hardware-specific storage platforms provided by the likes of HP, IBM and NetApp to open standards based solutions that do not require proprietary hardware. The same vendors will be providing much of the commodity storage anyway, but it’ll make interesting viewing watching the larger vendors respond.”
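
The scaling question David raises about log data can be made concrete with some simple arithmetic. The sketch below is a minimal illustration that assumes log volume grows linearly with the number of users. This assumption is not stated in the original. The baseline is his observation of roughly 100GB per day at just under two million users, and the function names are hypothetical.

```python
# Hypothetical sketch: project log volume as a service grows.
# ASSUMPTION (not from the original article): log volume scales linearly with
# the number of users, from a baseline of ~100GB/day at ~2 million users.

BASELINE_USERS = 2_000_000      # "just under two million users", rounded
BASELINE_GB_PER_DAY = 100.0     # "over 100GB of log file data per day"


def daily_log_gb(users: int) -> float:
    """Estimated GB of log data per day, assuming linear scaling with users."""
    return BASELINE_GB_PER_DAY * users / BASELINE_USERS


def yearly_log_tb(users: int) -> float:
    """Estimated TB of log data per year (decimal units: 1TB = 1000GB)."""
    return daily_log_gb(users) * 365 / 1000


if __name__ == "__main__":
    for users in (2_000_000, 5_000_000, 20_000_000):
        print(f"{users:>11,} users: {daily_log_gb(users):6.0f} GB/day, "
              f"{yearly_log_tb(users):6.1f} TB/year")
```

Under that linear assumption, twenty million users would generate about 1TB of logs per day, or roughly 365TB a year. This is before any replication the DFS adds for resilience.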

Has the Government a place in driving IT energy efficiency?

A while back I met Kathrin Winkler, Chief Sustainability Officer at EMC. She was delivering a briefing about EMC’s Corporate Social Responsibility (CSR) activities to a group of industry analysts. Most CSR briefings are as dull as ditchwater and devoid of anything remotely innovative or challenging. For some, CSR is just going through the motions rather than an integral part of the brand, culture and values of a company. CSR needs to enhance brand equity, or else it becomes an irrelevance that has no place at the boardroom table.

Kathrin broke the mould, presenting a structured program of activities that crosses every part of EMC, from sourcing, manufacturing and logistics through to disposal of equipment. Kathrin demonstrated to me that CSR is deeply embedded in EMC’s DNA: part of every business process, integrated into EMC’s brand and sponsored at board level.

It was no surprise to learn that today (Tuesday 23rd February 2010) Kathrin was asked to present to Senator John Kerry’s (D-Mass.) US Senate Commerce Subcommittee on Communications, Technology, and the Internet, on the relationship between energy efficiency and technological innovation.

The hearing explored how expanding broadband, strengthening smart grid technologies, and improving consumer understanding of their energy usage can lead to dramatic energy savings and reductions in greenhouse gas emissions. It also addressed how firms in the information and communications sectors are driving change and how government, as consumer and regulator, can help drive incentives to innovate.

Here are the insights that Kathrin put to the hearing as actions that Congress can take to help reduce the impact of ICT on the environment:

1. Demand the Federal Government lead by example to drive energy-efficiency throughout its ICT enterprise by aggressively pursuing virtualization, and ICT/data center consolidation. Congress, through its various Committees, has oversight responsibility for the largest ICT infrastructure in the world; the President’s FY 2011 budget requests $79.3 Billion for information technology. OMB included in the FY 2011 budget a plan to drive ICT consolidation: “OMB will work with agencies to develop a Government-wide strategy and agency plans to reduce the number and cost of Federal data centers. This will reduce energy consumption, space usage and environmental impacts, while increasing the utilization and efficiency of IT assets…” Congress should request and review these strategic plans as part of the annual appropriation process and provide the resources necessary to accelerate OMB’s ICT consolidation plans.

2. Bridge split financial incentives in federal data centers – In many government data centers, those responsible for purchasing and operating the ICT equipment report to the CIO, while those responsible for the power and cooling infrastructure typically pay the utility bills. This leads to a split incentive, in which those who are most able to control the energy use of the ICT equipment (and therefore the data center) have little incentive to do so, or even insight into their own usage. This could be remedied by Congress requiring that agency CIOs report on data center energy consumption and provide a baseline to Congress for future comparison.

3. Continued investment in cloud computing and next-generation ICT research at NIST – Government has become an early adopter of cloud computing. As with the deployment of other promising technologies like smart grid and electronic health records, cloud computing will not be fully realized without open interoperability, data portability, and security standards. Congress should fully fund NIST’s Cloud Computing Standards Effort.

4. Collaborate with industry to promote the development of measurement tools for government and private sector data center operators – Industry continues to struggle to develop acceptable models to measure data center efficiency. Without reliable efficiency methodologies on which to base rebate programs, it is difficult and expensive for utilities to conduct tests themselves, and many simply forego rebate programs. With an estimated 1200 regulated utility service areas in the United States, there is tremendous potential for replication of successful programs. With Energy Efficiency Resource Standards mandates in more than 19 states, Congress should assist in providing useful measurement tools for the state PUCs to incentivize energy conservation in data centers.

Kathrin is 100% right: the key is ensuring that the artificial economics of Government, which hide the costs of power from the costs of IT, are ended and replaced with the realistic economics of full end-to-end total lifetime costs, including operation and disposal. Private business could learn a lot from this same approach and put an end to the crazy policy where facilities pay the utility bill and IT buys the equipment.

Live Efficient Data Centre Summit Webcast 21st April

Data Centre design is an evolutionary process, and we can see the first signs of significant change in the latest sites. Co-generation, liquid cooling, cloud computing and high density are all likely to feature in the 2020 Data Centre. How are you placed with your existing Data Centre investments to take advantage of these changes? Will 20th Century Data Centres have to close because they just can’t deliver the level of efficiency that government legislation and economics demand?

Join Steve O’Donnell for a live data centre summit on the 21st April 2010 at 13:00 GMT, 8:00 EST.

Computing has never properly recovered from punched cards

Most people who read this won’t have a clue what a Hollerith punched card is. I only just caught the end of the era at University where I learned to program in FORTRAN coding one punched card at a time.  Once the stack of cards was complete, I delivered it to the computer operator for scheduling and execution.

Jobs were scheduled one at a time because that is how the primitive Burroughs scheduler and operating system were designed. Running more than one program at a time was still a pipe dream in those days, so hardware engineers focused on making programs run faster by scaling up the hardware: faster processors, faster IO, more main memory, and more powerful instruction sets that did more in fewer clock cycles.


This propensity to scale up, making computers more and more powerful and IO faster and faster, has been at the center of the whole industry for decades: an arms race for more clock cycles. In fact, Moore’s Law, named after Intel co-founder Gordon Moore’s observation that the number of transistors on a chip doubles roughly every two years, has underpinned the rapid and continuous improvements in processor performance we have seen over the last 40 years.

The same is true for networking. In the early days of the LAN, Token Ring networks ran at 4Mb/s and Ethernet at 10Mb/s; now 10Gb/s is the norm for new installations, a three orders of magnitude improvement in 20 years.
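
The growth figures in the last two paragraphs can be turned into rates with a few lines of arithmetic. This sketch assumes smooth compound growth, which real product generations only approximate. The 1000x figure is the 10Mb/s to 10Gb/s Ethernet step quoted above. The two-year doubling period is the commonly cited form of Moore’s Law.

```python
import math


def implied_doubling_time(factor: float, years: float) -> float:
    """Years per doubling implied by an overall improvement `factor` over `years`."""
    return years / math.log2(factor)


def implied_annual_growth(factor: float, years: float) -> float:
    """Compound annual growth rate, e.g. 0.41 means 41% per year."""
    return factor ** (1 / years) - 1


# Ethernet: 10Mb/s to 10Gb/s (both expressed in Mb/s) over roughly 20 years.
lan_factor = 10_000 / 10
print(f"LAN: doubling every {implied_doubling_time(lan_factor, 20):.2f} years, "
      f"{implied_annual_growth(lan_factor, 20):.0%} per year")

# Moore's Law, commonly stated as transistor counts doubling every ~2 years,
# sustained over the 40 years mentioned in the text.
print(f"40 years of two-year doublings: {2 ** (40 / 2):,.0f}x")
```

A 1000x improvement over 20 years works out at a doubling roughly every two years, or about 41% a year. LAN speeds have therefore tracked the Moore’s Law cadence quite closely over that period.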

Storage systems have also shown massive performance improvements, with systems like Oracle’s Exadata offering 1M IOPS performance levels. Database technology has also seen massive performance improvements, driven in part by smart data design and great database technology. The performance levels we see today in these scale up systems would have been unimaginable only a few decades ago.


I remember, in 1975, Donald Michie, Professor of the Machine Intelligence and Perception unit at Edinburgh University, proving mathematically that we would never see a computer beat a grand master at chess within our lifetimes. The problem was too big to solve with current technology, and the rate of performance growth required to beat a grand master was, he told us, just unbelievable.

Yet the unbelievable levels of performance we see today are still not enough for the largest Internet scale tasks, such as hosting Twitter, Facebook or LinkedIn, or managing the search indexes at Yahoo or Google. Scale up just doesn’t scale up enough. None of these Internet scale enterprises use scale up technology any more. They scale out at every level: compute, storage, network, application architecture and even the database.

Scale out applications are becoming more common with developers adopting a MapReduce style approach to coding, where a master process splits the problem into a number of smaller parts and then farms them out to a large number of processes that derive the answer. The master process then combines the answers to deliver a single consolidated output. For the largest scale computational problems this is often the only way to get to the answer in a meaningful timescale.
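
The master/worker pattern described above can be sketched in a few lines of Python. This is a simplified, single-machine illustration, not Hadoop’s implementation.

Threads stand in for the separate processes or machines a real cluster would use. CPython threads do not give CPU parallelism, so `ProcessPoolExecutor` or a framework such as Hadoop would be used in practice. The input is assumed to be hypothetical log lines whose first field is a host name. Each worker counts lines per host in its chunk (the map step), and the master merges the partial counts (the reduce step).

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split(lines, parts):
    """Master step: split the input into at most `parts` roughly equal chunks."""
    size = max(1, -(-len(lines) // parts))  # ceiling division
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def map_chunk(chunk):
    """Worker step: count log lines per host within one chunk.

    Assumes a simplified, hypothetical log format whose first
    whitespace-separated field is the host name; blank lines are skipped.
    """
    return Counter(line.split()[0] for line in chunk if line.strip())


def mapreduce_host_counts(lines, workers=4):
    """Farm chunks out to workers, then merge their partial answers."""
    chunks = split(lines, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    total = Counter()
    for partial in partials:  # master step: combine into one output
        total.update(partial)  # Counter.update adds counts together
    return total


if __name__ == "__main__":
    sample = [
        "web01 sshd: session opened",
        "web02 httpd: GET /index.html",
        "web01 httpd: GET /login",
        "db01 mysqld: slow query",
    ]
    print(mapreduce_host_counts(sample, workers=2))
```

Full MapReduce frameworks add a shuffle phase that routes each key to a dedicated reducer. This lets the merge itself be spread across machines, rather than being done by a single master as here.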

Scale out compute is now commonplace, with any number of hypervisor technologies (VMware, Xen, KVM, Hyper-V) supported by a cloud operating system to handle virtualisation and load balancing.

Scale out storage is also a growth industry, with products like HP’s X9000 (IBRIX) and IBM’s XIV gaining traction in the market. Object storage is also gaining popularity, with URI-addressed HTTP interfaces becoming commonplace on any number of offerings such as Amazon’s S3. Open source file systems such as Apache Hadoop’s HDFS add the further feature of exposing the location of the data, so that compute and storage elements can be closely co-located to reduce network latency and end-to-end bandwidth demands.
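
Data locality can be illustrated with a greedy scheduler that prefers to run each task on a node already holding a replica of its data block. This hypothetical sketch makes simplified assumptions: each node has a fixed number of task slots, and there is no rack awareness. The function and parameter names are illustrative, and real schedulers such as Hadoop’s are considerably more sophisticated.

```python
def schedule_local(block_locations, free_slots):
    """Greedy locality-aware assignment (hypothetical sketch).

    block_locations: dict of block id -> list of nodes holding a replica
    free_slots: dict of node -> number of free task slots
    Returns a dict of block id -> (node, is_local), or None if no slot is free.
    """
    slots = dict(free_slots)  # copy so the caller's dict is not modified
    plan = {}
    for block, replicas in block_locations.items():
        # Prefer a node that already stores this block (no network transfer).
        local = next((n for n in replicas if slots.get(n, 0) > 0), None)
        if local is not None:
            slots[local] -= 1
            plan[block] = (local, True)
            continue
        # Otherwise fall back to any node with a free slot (a remote read).
        remote = next((n for n, s in slots.items() if s > 0), None)
        if remote is not None:
            slots[remote] -= 1
            plan[block] = (remote, False)
        else:
            plan[block] = None
    return plan


if __name__ == "__main__":
    blocks = {"b1": ["n1", "n2"], "b2": ["n2", "n3"], "b3": ["n1", "n3"]}
    print(schedule_local(blocks, {"n1": 1, "n2": 1, "n3": 0, "n4": 1}))
```

Any task that lands on a node without a local replica has to pull its block across the network. That is exactly the traffic locality-aware placement tries to avoid.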

Scale out networking follows the logic that most network traffic in a scale out world is edge to edge so why bother with a core network? Converge on 10G lossless Ethernet using top of rack switches supporting iSCSI, NAS and HTTP protocols to converge the SAN and LAN into a common routable IP system.

Scale out databases, now commonly referred to as NoSQL databases, go back in time to pre-relational designs. They do not provide full ACID consistency guarantees (atomicity, consistency, isolation, durability), but they do allow sharding, which splits the data set over multiple systems to improve the parallelism of the overall system.
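
Sharding can be sketched as a router that hashes each key to pick one of N shards. This is a minimal illustration that assumes simple modulo-N placement, with Python dicts standing in for separate database servers. The class is hypothetical and is not any particular NoSQL product’s API.

```python
import hashlib


class ShardRouter:
    """Hypothetical modulo-N shard router.

    Each shard is a plain dict standing in for a separate database server.
    """

    def __init__(self, num_shards: int):
        self.shards = [dict() for _ in range(num_shards)]

    def shard_for(self, key: str) -> int:
        # Stable hash: Python's built-in hash() of a str is randomised per process.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def put(self, key: str, value) -> None:
        self.shards[self.shard_for(key)][key] = value

    def get(self, key: str, default=None):
        return self.shards[self.shard_for(key)].get(key, default)


if __name__ == "__main__":
    router = ShardRouter(num_shards=4)
    for i in range(1000):
        router.put(f"user:{i}", {"id": i})
    print([len(shard) for shard in router.shards])  # keys spread across 4 shards
    print(router.get("user:42"))
```

A stable hash matters because every client must route a given key to the same shard. Modulo-N placement also remaps most keys whenever N changes, which is why many systems use consistent hashing instead.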

The legacy of the punched card is still with us because Information Technology is an evolutionary process. Scale up approaches continue to support the evolution, but one day the dinosaurs will die out.

Why storage will inevitably migrate to flash and trash

If you have been following the storage business for a while, you will have noticed a few changes:

  • Introduction of Flash Memory components as Solid State Disks
  • Serial ATA (SATA) disks becoming popular and growing in capacity (2TB soon)

There are lots of other disk technologies around, like Fibre Channel and SAS, but SSD and SATA are getting the big press and are taking market share. You might ask why. Disks in a data centre use power day and night, 365 days a year. A typical disk (Seagate Cheetah 15K.4 147GB SCSI) uses about 18W. In a data centre, that means its lifetime (5 years) power consumption, including cooling and power protection (PUE 1.6), is likely to be 1.26 MWh. At 10c per kWh that equates to $126 per disk. So for 1PB of storage the lifetime cost of power will be about $860,000, not including capital plant.
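
The arithmetic in the previous paragraph can be reproduced directly. The sketch makes four assumptions:

- the disk runs 24x7 for five years;
- PUE (total facility energy divided by IT equipment energy) is applied as a simple multiplier;
- 1PB means 1,000,000GB of raw capacity;
- there is no allowance for RAID or spares.

The function names are illustrative.

```python
HOURS_PER_YEAR = 24 * 365


def lifetime_power_kwh(watts: float, years: float = 5, pue: float = 1.6) -> float:
    """Lifetime energy in kWh for a device running 24x7, multiplied by PUE
    to include cooling and power-protection overhead."""
    return watts * HOURS_PER_YEAR * years * pue / 1000


def lifetime_power_cost(watts: float, price_per_kwh: float = 0.10,
                        years: float = 5, pue: float = 1.6) -> float:
    """Lifetime electricity cost in dollars."""
    return lifetime_power_kwh(watts, years, pue) * price_per_kwh


per_disk_kwh = lifetime_power_kwh(18)    # 18W disk: ~1,261 kWh (~1.26 MWh)
per_disk_cost = lifetime_power_cost(18)  # ~$126 at 10c per kWh
disks_per_pb = 1_000_000 / 147           # 147GB disks in 1PB (decimal, raw capacity)
print(f"{per_disk_kwh:,.0f} kWh and ${per_disk_cost:,.2f} per disk; "
      f"${per_disk_cost * disks_per_pb:,.0f} per PB")
```

The result, about $858,000, matches the roughly $860,000 quoted above. RAID overhead and hot spares would push the real figure higher.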

So getting the power that disks use down to a reasonable level is important. The formula that engineers quote for power consumption is:

$$\text{Power} \propto \text{Diameter}^{4.6} \times \text{RPM}^{2.8}$$

So if we use large physical disks, as in the old days when 8″ and 14″ were common disk formats, a 14″ disk needs 7717 times more power to drive than a smaller 2″ one at the same spin speed.


[Table: disk diameter, diameter raised to the power 4.6, and power ratio relative to a 2″ disk]

So the world is moving to smaller and smaller disks to reduce power demand, reduce heat output and deliver increased densities.
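
The diameter ratios can be recomputed from the rule of thumb above. In this sketch, the list of diameters is chosen for illustration and may not match the rows of the original table. The spin speed is held constant, so only the diameter term matters.

```python
def relative_power(diameter: float, rpm: float,
                   ref_diameter: float, ref_rpm: float) -> float:
    """Power relative to a reference disk, using Power ∝ Diameter^4.6 × RPM^2.8."""
    return (diameter / ref_diameter) ** 4.6 * (rpm / ref_rpm) ** 2.8


# Illustrative diameters in inches (chosen for this sketch, not taken from the
# original table), each compared with a 2-inch disk at the same spin speed.
for d in (14, 8, 5.25, 3.5, 2.5, 2):
    ratio = relative_power(d, 7200, 2, 7200)
    print(f'{d:>5}": diameter^4.6 = {d ** 4.6:>12,.1f}, ratio to 2" = {ratio:>8,.1f}')
```

The 14″ row reproduces the 7717x figure quoted above.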

Spin speed has a similar impact, so low spin speed disks use a lot less power than their high spin speed equivalents.


[Table: spin speed, RPM raised to the power 2.8, and power ratio relative to a 5400 RPM disk]

Slow spin speed, small disks use less power than larger high spin speed disks.
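
The spin-speed ratios can be recomputed the same way. Diameter is held constant, so only the RPM term of the rule of thumb applies. The list of speeds is chosen here for illustration and may not match the original table.

```python
def rpm_power_ratio(rpm: float, ref_rpm: float = 5400) -> float:
    """Power relative to a reference spin speed at the same diameter,
    using the RPM^2.8 term of the rule of thumb."""
    return (rpm / ref_rpm) ** 2.8


# Illustrative spin speeds (chosen for this sketch, not taken from the original table).
for rpm in (5400, 7200, 10_000, 15_000):
    print(f"{rpm:>6} RPM: {rpm_power_ratio(rpm):5.2f}x the power of a 5400 RPM disk")
```

By this rule of thumb, a 15,000 RPM disk needs roughly 17 times the power of a 5400 RPM disk of the same diameter.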

As the price of SSD continues to drop, the high spin speed disks that we use for high IOPS solutions will increasingly be replaced with SSD, whilst capacity will be served by low spin speed SATA, migrating the storage world to Flash and Trash.

Inevitable and proved by the maths.

The washroom attendant’s washroom attendants

My old CIO at BT, Al-noor Ramji, had a most delightful and endearing way of conveying just how unimportant and disconnected from reality IT Infrastructure is: he described us as “The toilet cleaner’s toilet cleaners”. Like other successful CIOs, Al-noor had the ability to cut through the noise and explain things as they are.


In every business, conversations start at the CEO level, with a CEO focussed on understanding customers’ needs and executing a plan to serve them better than the competition. This is where business value is created and strategic visions are formed. It is here that CEOs deploy capital to create business value, revenues and EBIT that eventually translate into shareholder returns. This is the engine room of capitalism, and it is here that many businesses win or lose.

The strategy and vision typically filter down, layer by layer, through the organisation via marketing or product management, who work out how to combine products and services to deliver a compelling answer for the customer. It is also here that the first glimmering of an idea for supporting IT Services is formed, and here that the first washroom attendant, the CIO, is called in. Typically the CIO gets a briefing that she needs to deliver some changes to the CRM system and some new workflow for the call centre.

Already the layers of filtration have dulled the vision and strategy formed in the CEO’s office suite.

Usually the first IT Infrastructure hears about this new initiative is when it is a few days away from deployment, too late even to introduce it into service properly. The washroom attendant’s washroom attendant is used in a purely reactive way, responding in real time to seemingly disconnected and random acts of violence to the IT estate.

We can all recognise this behaviour, repeated time and again in the largest enterprises and always leading to a sub-optimal outcome. As consumers, we have also experienced business initiatives in which the CIO and IT Infrastructure were fully integrated into the initial conversations, enabling a competition-crushing solution to be deployed. Google has wiped out the 20th century advertising industry and outperformed its digital competition by being joined up. Apple took on the music industry with iTunes and now owns the space. There are many other examples, but unfortunately they are swamped by the normal, broken approach that delivers these frustratingly sub-optimal outcomes.

To engage in the initial conversation that forms strategy and vision, IT must become a trusted advisor that stops talking about IT return on investment and starts talking about business return on investment. Only when IT can’t help the CEO kill the competition does it need to be cheap and silent. Otherwise we serve our companies badly if we don’t speak up and become part of the initial conversation.

Being part of that conversation is all about delivering business agility, reduced cycle times for products and services, reduced business risk and more certain outcomes: all deliverables that IT was set up to provide in the past.

We forget this lesson at our peril and will certainly be consigned to being cheap and silent forever.

Strong results from Seagate and Western Digital signal an uptick in confidence

Both Seagate and Western Digital announced Q2 results last week, perhaps signalling a return of confidence in the disk drive channel. Component manufacturers in the enterprise IT channel are an interesting bellwether of market confidence, because orders need to be placed in advance of shipments of finished goods. As a result, there is a significant delay in revenue recognition from disk drive component to populated storage controller. On this basis, we should see some interesting positive results for the desktop, server and storage controller vendors in Q1 2010.

Revenue was up at both Western Digital (44% y/y) and Seagate (33.4% y/y), as were shipments: Seagate reported 36% growth and Western Digital 29%.