The Storage Area Network (SAN) is now a key component of most reasonably sized IT infrastructures. The SAN we rely on today evolved from very humble beginnings into a mission-critical enterprise IT component. The first SANs were introduced to handle small numbers of enterprise-scale systems (zSeries and iSeries computers from IBM) being connected to storage arrays a few cabinets away.
Note: SANs support block-level access to data; support for filesystems and metadata is provided at a higher level, in operating system code or sometimes within the database or Java appliance. There are other types of shared storage that are file oriented (as opposed to block oriented); these are known as Network Attached Storage (NAS) or Object Stores (ObS).
NAS storage often supports CIFS (Common Internet File System) and NFS (Network File System). CIFS is a Windows-centric filesystem, whilst NFS is usually used on UNIX-based platforms.
Both NAS and ObS use TCP/IP to communicate between the host computers and the shared storage controllers. This IP traffic is usually carried over the Local Area Network (LAN) rather than a dedicated SAN.
SAN traffic between the application and the storage array is significantly more sensitive to delayed or lost data than similar LAN traffic. To prevent customer-affecting events, the SAN is typically designed to avoid overloading. One common solution is to construct the SAN so that it can accommodate every port sending and receiving at full speed. Unfortunately, SAN connections are expensive and SANs are cost-constrained. SAN architects must balance cost and risk, recognizing that a mistake can result in customer-impacting outages.
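As a rough illustration of this cost/risk trade-off, the oversubscription of an edge switch can be estimated by comparing aggregate host-port bandwidth with the bandwidth of the inter-switch links (ISLs) carrying that traffic onward. This is a minimal sketch; the port counts and link speeds below are hypothetical and not taken from the text:

```python
def oversubscription(host_ports: int, port_gbps: float,
                     isls: int, isl_gbps: float) -> float:
    """Ratio of aggregate edge-port bandwidth to ISL bandwidth.

    A ratio of 1.0 means the fabric can carry every port sending
    at full speed; anything above 1.0 is oversubscribed.
    """
    return (host_ports * port_gbps) / (isls * isl_gbps)

# Hypothetical edge switch: 48 x 8 Gb/s host ports, 8 x 8 Gb/s ISLs.
ratio = oversubscription(48, 8, 8, 8)
print(f"oversubscription = {ratio:.0f}:1")  # prints "oversubscription = 6:1"
```

Designing for "every port at full speed" means driving this ratio down to 1:1, which is exactly where the connection cost mounts up.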
The protocols of the SAN are:
- FC (Fibre Channel)
- FICON (Fibre Connectivity) – a version of FC for IBM zSeries computers
- FCoE (Fibre Channel over Ethernet)
- FCIP (Fibre Channel over IP)
- iSCSI (IP Small Computer Systems Interface)
All are enterprise-scale derivatives of the peer-to-peer SCSI (Small Computer Systems Interface) of the 1980s. The latter two (FCIP and iSCSI) are IP-encapsulated protocols and are carried over a LAN.
FC is implemented at the same protocol level as Ethernet but, unlike Ethernet, is designed as a lossless protocol with very predictable latency characteristics. FC was designed to support the sensitive application-to-storage interface, maximising throughput and reliability.
Note: FCIP and iSCSI have no mechanism to deal with LAN overloading and as a result are prone to delays and frame loss. This makes them unsuitable in larger environments where engineered levels of performance and resilience are required. Their main benefit is that they can be implemented at low cost on an existing LAN.
Physical-layer over-provisioning is compounded when implementing complex, multi-tier, edge-core-edge switch designs in very large scale environments. Although intended to provide physical reconfiguration flexibility to accommodate growth, these designs actually introduce substantial additional cost, complexity, and resultant risk.
SANs are typically designed around a fan-out ratio between host connections and storage array ports of between 6:1 and 12:1, depending on the estimated intensity of the storage activity. Low-utilization hosts can, in theory, be supported at the upper fan-out ratios, whilst highly utilized hosts need much lower fan-out ratios.
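To make the fan-out arithmetic concrete (a minimal sketch; the host-connection count below is invented for illustration), the number of storage array ports implied by a chosen fan-out ratio follows directly:

```python
import math

def storage_ports_needed(host_connections: int, fan_out: int) -> int:
    """Minimum storage array ports for a given host-to-storage fan-out ratio."""
    return math.ceil(host_connections / fan_out)

# Hypothetical fabric with 240 host connections:
print(storage_ports_needed(240, 12))  # lightly loaded hosts at 12:1 -> 20 ports
print(storage_ports_needed(240, 6))   # busy hosts at 6:1 -> 40 ports
```

The same host population therefore needs twice the storage ports when its workload forces the design from 12:1 down to 6:1, which is why the activity estimate matters.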
In a virtualized world, applications are untethered from the underlying physical hardware, the very hardware that has the physical network connections to the SAN. For example, VMware VMotion (deployed in production at 60-70% of VMware customers, according to VMware) enables the migration of an entire running virtual machine, application and all, from one physical host and physical storage system to another, seamlessly and without user interruption.
Cloud computing offers many benefits to the Enterprise, obscuring the physical complexity of servers, storage and networks from applications, enabling rapid deployment and enhanced availability during outages. Cloud does not, however, offer a free lunch; an inadequate underlying SAN infrastructure will be exposed very rapidly as more and more workload is virtualized and automated tools move and migrate critical business applications between nodes on ever larger clusters of servers and storage.
Cloud not only insulates and obscures the physical complexity of the underlying hardware from applications; it also has the potential to obscure the cause of an outage from IT Operations, increasing time to resolve and reducing right-first-time diagnostics. It is not unusual for a fairly simple fault, one that can be repaired in a few minutes, to take many hours of root-cause diagnosis, with much technical hand-off between support teams along the way.
The choice is stark: over-engineer, pay the price and hope for the best, or adopt a more scientific approach and manage the SAN proactively.