<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-35811827</id><updated>2011-04-21T17:39:39.643-07:00</updated><title type='text'>Think Tank for Hire</title><subtitle type='html'>Executive manager with experience as CEO, CTO, EVP of Technology, VP of Engineering, and founder of eight previous companies. While in these positions my respective teams and I have delivered unique solutions that transformed corporate goals into reality, built effective business solutions, and produced rapid and sustained business growth.</subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://kris-land.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/35811827/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://kris-land.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>Kris Land</name><uri>http://www.blogger.com/profile/14262331306208973883</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>8</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-35811827.post-116053279156001862</id><published>2006-10-10T19:08:00.000-07:00</published><updated>2006-10-10T19:13:11.696-07:00</updated><title type='text'></title><content type='html'>From: &lt;a 
id="_ctl0_ContentPlaceHolder1_ArticleMain_AFromLinkLib" title="See more articles from Computer Technology Review" href="http://www.highbeam.com/Search.aspx?q=%22kris+land%22" rel="nofollow"&gt;Computer Technology Review&lt;/a&gt;   Date: &lt;a id="_ctl0_ContentPlaceHolder1_ArticleMain_ADateLink" title="See more articles from a few days before and after June 1, 2003" href="http://www.highbeam.com/Search.aspx?q=%22kris+land%22" rel="nofollow"&gt;June 1, 2003&lt;/a&gt;   Written by: &lt;a title="See more results for '&amp;quot;kris land&amp;quot;'" href="http://www.highbeam.com/Search.aspx?q=%22kris+land%22" rel="nofollow"&gt;Kris Land&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;The emergence of ATA drives as a serious alternative for enterprise storage holds the promise of significantly reducing storage acquisition costs. This is amplified by the advent of Serial ATA, which brings features like hot-pluggability, CRC for all communications (including data, commands and status), and thin flexible cabling to further narrow the gap between ATA and more expensive "server" class drives. However, in order to fully realize the advantages of the ATA platform for enterprise storage, new software technologies are required to guarantee the reliability and maximize the performance of the platform.&lt;br /&gt;&lt;br /&gt;Specifically, RAID technologies currently used with SCSI and Fibre Channel storage implementations are ill-suited for use in the ATA arena. 
The pervasive use of write-back caching and the high cost of NVRAM-based board solutions negatively impact the reliability and price advantages of the ATA platform, introducing the possibility of corruption and data loss and negating much of the cost benefit for the enterprise user. Similarly, the clear attractiveness of RAID Level 5 for large capacity storage is all but eliminated because existing methods for implementing low-cost RAID-5 systems have severe limitations in performance or reliability. On the ATA platform, this results in the undesirable flight to RAID-10 for most types of workloads and directly reduces the cost benefit of the platform.&lt;br /&gt;&lt;br /&gt;In order for ATA-based storage to achieve its full potential in the enterprise, it is necessary to understand the limitations of today's hardware-assisted RAID solutions as these attempt, imperfectly, to address the unique characteristics of the ATA drive platform. Particular attention is placed on RAID Level 5, which is the most promising RAID type given its natural application to the larger capacity storage applications that will dominate networked ATA adoption.&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;ATA Characteristics&lt;/strong&gt;&lt;br /&gt;ATA disk drives emerged in the late 1980s, as desktop computers began their ascent into the mainstream of the IT universe. ATA is an acronym that stands for "AT Attachment," a reference to the IBM PC/AT that served as the de facto reference specification for the desktop since its introduction in the early 1980s. Though synonymous with IDE (Integrated Drive Electronics), the ATA designation is the subject of various ANSI specifications that have evolved the platform over time and is generic to the category.&lt;br /&gt;&lt;br /&gt;Since their initial shipments in 1986, ATA drives have grown substantially in volume. Today, ATA drive shipments outnumber SCSI drive shipments by a factor of 6 to 1. 
And they outnumber Fibre Channel drive shipments by a factor of 10 to 1. The volume difference is accounted for by the continuing centrality of ATA's role in the highest volume segment of the PC universe, the desktop computer. Because of their substantial volume advantages, they are subject to far more significant price competition than higher end drive platforms, and on average cost roughly one-third to one-quarter as much as SCSI or FC drives. The result has been an increased desire by IT end users to employ ATA drives in enterprise data settings as opposed to using them exclusively in desktop PC devices and workstations.&lt;br /&gt;&lt;br /&gt;As engineered products, ATA magnetic disk drives harness the same basic technologies found in higher-end drives that employ different interfaces, the most common of which are SCSI and Fibre Channel. They employ platters, actuators and a variety of micromotors. As such, ATA drives take advantage of the rapid advances in these component technologies that all disk drive manufacturers are continuously exploiting. From greater volumetric densities to enhancements in seek performance, ATA drives leverage the same basic technologies as SCSI and FC drives.&lt;br /&gt;&lt;br /&gt;However, ATA drives do have significant differences from higher end drive platforms, and these differences must be addressed if the ATA platform is to emerge as an enterprise-class storage platform. The first major difference is that ATA drives are subject to different sorting criteria than higher end platforms. Quality control is relaxed because of the relative tradeoff in profitability and defect rates. Instead of 1 percent component rejection levels as seen in SCSI drives, ATA drives are typically subject to a less demanding 5 percent rejection rate. The other differences between ATA and SCSI flow from their different end use targets. 
Because they are intended for desktop computers, ATA drives use different motors that generate less heat and ambient noise than SCSI. They are also slower than their SCSI counterparts on an RPM basis, given similar design goals to minimize desktop heat and noise but also to maintain SCSI performance advantages at similar capacity levels. That is, drive manufacturers frequently release similar capacity SCSI and ATA drives with higher RPMs available first in the SCSI device.&lt;br /&gt;&lt;br /&gt;To compensate for decreased performance, ATA drive manufacturers have employed a variety of techniques to enhance the ATA platform. The most important of these techniques is called Write Back Caching. Write Back Caching involves the use of small memory chips contained in the drive electronics that buffer data transfers to the ATA disk. By using these memory modules, which are typically deployed in 2MB to 8MB configurations, the ATA drive can signal the completion of writes more quickly than if it had to wait until that data was completely transferred to the disk media. However, even as write back caching provides a performance boost, it introduces a series of reliability concerns that contribute to the failure of the drive platform to achieve enterprise-class acceptance. These and other obstacles to reliability in the ATA drive platform will be discussed in detail later.&lt;br /&gt;&lt;br /&gt;One of the most significant developments in the ATA world has been the evolution of the platform from a parallel bus architecture to a serial one. This evolution was undertaken to accelerate the use of ATA in networked storage environments and it has proven to be a crucial step in raising the awareness of the platform in multi-drive configurations. Technically, the Serial ATA drive is a seven-wire replacement for the physical ribbon of parallel ATA with a variety of benefits for denser storage implementations. 
The most important of these include the cabling change (which facilitates better airflow and easier assembly) as well as the addition of capabilities like hot-pluggability and a point-to-point topology that enables full data-path switching. The first Serial ATA specification was completed in 2000 and drives supporting Serial ATA began initial production runs in the second half of 2002. Major research houses like IDC predict that Serial ATA will dominate the ATA platform within three years, rising to a 95-99 percent share of new drive shipments by the mid-2000s. In the area of networked storage, IDC further predicts the possibility of Serial ATA commanding at least 20 percent of entry-level servers by 2004.&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;Serial ATA Characteristics&lt;/strong&gt;&lt;br /&gt;Several years ago, the ANSI Parallel ATA specification was amended with the Ultra DMA protocol, which brought advanced CRC algorithms into the ATA world. These have been carried into the Serial product. While this inclusion has addressed low-level data transfer integrity issues, a new series of problems has surfaced that stands to pose the largest obstacle to ATA acceptance in the enterprise storage world. These problems center around the use of RAID technologies that have been largely tailored and refined through their application to multi-drive SCSI and Fibre Channel storage. As ATA begins to enter the multi-drive network storage world, enterprising vendors are attempting to apply legacy RAID strategies to multi-drive ATA installations but are achieving mixed results. Today, all hardware-assisted RAID technologies native to the ATA platform--as well as ascendant software RAID packages--fail to address key performance and reliability concerns that are unique to the ATA market. 
Unless these problems are addressed, it is unlikely that the ATA platform will break beyond the entry-level category that IDC and others envision for it.&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;RAID-5&lt;/strong&gt;&lt;br /&gt;RAID-5 is one of the methods for achieving higher performance and greater resilience to drive component failure that were originally developed by the U.C. Berkeley RAID team in the late 1980s and early 1990s under the auspices of principal investigators David Patterson, Randy Katz and their students. RAID is an acronym that refers to Redundant Array of Inexpensive Disks, and the original RAID project was conceived as a way to exploit the benefits of high-volume magnetic disk drives by using strings of lower cost drives together, in order to achieve the same benefits as more expensive storage configurations popular in the high-end systems of the day. The groundbreaking work of the RAID team and the industry acceptance that shortly followed have made RAID strategies and resultant technologies the ascendant paradigm for dealing with magnetic disk storage today.&lt;br /&gt;&lt;br /&gt;RAID-5 specifically is a methodology for achieving redundancy of data on a group of drives without sacrificing half of the available capacity as mirroring (RAID-1) and its variations (e.g., RAID-10) do. RAID-5 achieves this storage efficiency by performing a parity calculation on the data written to disk and storing this parity information in space equivalent to one additional drive, distributed across the drives in the group. Should a disk drive fail, the data can be recovered by computing the missing data using the parity and data blocks in the remaining drives. RAID-5 is an especially popular methodology for achieving redundancy because it is more economical than RAID-1 insofar as more disk-drive capacity can be rendered usable from a group of active drives. 
It has been estimated that RAID-5 accounts for 70 percent of all drive volumes shipped into RAID configurations (the actual percentage of RAID-5 per discrete RAID configuration is lower, given the popularity of striping and mirroring with OLTP). This would be sensible given that RAID-5 is typically associated with file serving and similar workloads, which account for significantly more capacity usage on a global basis than higher intensity OLTP workloads, for which RAID-5 is rarely used.&lt;br /&gt;&lt;br /&gt;The attractiveness of RAID-5 to the ATA storage opportunity is even more pronounced. Given the great volumetric density advantages of the ATA platform versus SCSI and Fibre Channel, ATA is ideally suited for larger capacity storage installations. The capacity efficient RAID Level 5 is functionally allied with this focus on maximum capacity per dollar of storage cost. Though some have speculated that the high density advantage of the ATA platform will result in a willingness of end users to employ mirroring given a surplus of raw capacity, the fundamental laws of technology would seem to argue against this. The sharp and continuous rise in the processing power of the Intel chip, for instance, has not been accompanied by an increase in the sales of 4-way or 8-way servers--quite the reverse is true, with one- and two-way processor servers today dominating most application usages on the market. In the storage market, given its long evidenced storage elasticity, greater volumetric densities will be accompanied by a growth in the desire to maximize capacity as well as prevent disruption from drive failure. 
In this view, data protection based on parity strategies, as opposed to redundancy ones, will be maximally appealing--provided that they pose no crippling obstacles in their implementation.&lt;br /&gt;Today, even for expensive solutions on SCSI and Fibre Channel platforms, there are obstacles to the universal ascendance of RAID Level 5 and the foremost among these is speed. For instance, one reason that RAID-5 is rarely used for OLTP application storage is its low performance for such workloads. As a tradeoff to its storage efficiency benefits, RAID-5 imposes additional computational as well as I/O burdens on the underlying magnetic disk storage. These additional burdens in many cases result in the general characterization that RAID-5 is slower than other types of RAID. And, in fact, with many commercial RAID controller technologies--both hardware and software--RAID-5 is often the slowest performing configuration, especially when compared to straight striping (RAID-0), mirroring (RAID-1) or striping + mirroring (RAID-10). In some cases--for instance, software RAID from vendors like VERITAS--the difference in performance between RAID-5 and RAID-0 is as much as 10X.&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;Conventional RAID-5 Performance Penalties&lt;/strong&gt;&lt;br /&gt;RAID-5 imposes performance penalties when compared to other methods of RAID because of two principal and related requirements. The first is the calculation of the parity itself, which requires computational resources and takes place in real time. This calculation can be accelerated by the use of specialized hardware such as an XOR engine, and most hardware RAID controllers employ this type of component to assist performance. The second performance cost, by far the most extensive, is due to the way that RAID-5 typically conducts its writes. 
This process is called Read-Modify-Write.&lt;br /&gt;&lt;br /&gt;During the process of a sequential write, the RAID-5 implementation will attempt to write data in full stripes corresponding to the number of drives in the RAID group. However, at the end of any sequential write process and during any modification of data in place, it is not possible to write a complete stripe and the technique of Read-Modify-Write must be employed. The Read-Modify-Write process is the prototypical RAID-5 process and it is responsible for many of the performance limitations seen in most implementations of RAID-5.&lt;br /&gt;&lt;br /&gt;In a typical Read-Modify-Write operation, multiple I/Os must be executed for each logical write request. The first I/O involves reading an existing block or sequence of blocks on the disk. The second I/O involves reading the parity associated with the block(s) that will be modified. The third I/O involves writing the new data blocks, and the fourth I/O involves updating the parity associated with the relevant block(s) corresponding to the new data that is being written. No matter how small the set of drives that comprise the RAID group, the minimum number of I/Os required in a single write operation that involves the standard Read-Modify-Write approach is four, with an even greater number of I/Os associated with multiple data block writes in larger RAID sets. Furthermore, certain approaches to ensuring reliability in RAID-5 implementations (see section below) involve additional I/O activity, such as logging atomic parity updates separately, which increases the minimum number of Read-Modify-Write I/Os to six or higher. As an example, suppose block D2 is to be updated with D2'. It is also necessary to update the parity P to P'. Two reads are needed to obtain block D2 and P. D2' is supplied by the write request, and P' is then computed as P XOR D2 XOR D2'. 
Finally, two writes are performed to write D2' and P' to disks.&lt;br /&gt;&lt;br /&gt;Because of the multiple I/Os required in existing RAID-5 implementations, write performance is characteristically poor, often 5X-10X slower than mirroring or striping alternatives. There are hardware limits to the performance that is achievable given the amount of I/O activity that is generated upon each write.&lt;br /&gt;&lt;br /&gt;In addition to low write performance, conventional RAID-5 implementations have other performance limitations that are unique to this RAID flavor. Two of the most common are RAID group initialization and RAID group rebuilding. In RAID-5 group initialization, the RAID solution needs to perform a scan of every data sector on each disk in the RAID set and initialize the corresponding parity. This initialization process is time consuming, and its duration is directly related to the size of the RAID set and the capacity of each drive in the group.&lt;br /&gt;RAID-5 rebuilding is a process that must occur after a RAID-5 set experiences a disk failure. When a disk fails in a RAID-5 set, the missing data and parity contained on the failed drive must be regenerated on a replacement drive once the new working drive is inserted into the set or an existing hot spare is activated as the replacement drive target. Similar to initialization, the process of rebuilding requires that each data block on the system is read and the XOR computations are performed in order to obtain the absent data and parity blocks, which are then written onto the new disk. If bad sectors are encountered while reading all data from the surviving disks to recompute the missing data and parity, it may no longer be possible to rebuild the array. Depending on the size of the RAID group and the capacity of each drive, the rebuilding process is time consuming and may degrade the use of the drives in the RAID-5 set for normal activity. 
Both the initialization and the rebuild processes are additional performance and reliability penalties of conventional RAID-5 implementations that will occur as a matter of normal operation.&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;Conventional RAID-5 Reliability Penalties&lt;/strong&gt;&lt;br /&gt;Based on the dominant approach to implementing RAID-5 at present, there are several discrete reliability problems that arise in common implementations. Many of these reliability concerns are generated by events like power failure, which can often set in motion a cascade of correlated failures. For instance, a power failure not only interrupts active writes, which can invalidate any parity that is in the process of being updated, but can also burn out disks with aging components. As a result, power failures can often cause data loss in many types of RAID implementations by destroying both the parity and data associated with a "parity stripe." Part of this is due to characteristics of the ATA platform itself, such as differences in assembly line quality control processes that have more tolerance for production variability. However, a large part of the quality differential is due to ineffective strategies employed by the ATA RAID community using legacy RAID methodologies.&lt;br /&gt;&lt;br /&gt;The most salient reliability problem in the ATA RAID arena is the nearly universal use of write back caching in all ATA implementations, even those driven by hardware RAID solutions. Write-back caching is a function that is enabled by the inclusion of small cache memory components within the disk drive electronics. By providing this additional memory, the drive is able to commit to write commands by buffering bursts of data in memory prior to the full completion of writing data onto the disk platter. When the drive signals that a write has been completed, the application moves on to its subsequent operation even if the data in question remains in the drive's write-back cache. 
Quicker completion of writes leads to faster application performance when disk latency is the primary performance limitation. Because of this, the logic behind making write-back caching a default strategy is straightforward: to increase the performance of the disk platform.&lt;br /&gt;&lt;br /&gt;This performance enhancement is understandable given ATA's traditional role as a desktop device with most target implementations limited to one or two drives. Drive manufacturers have sought to differentiate the high-volume ATA offering from the higher margin SCSI and Fibre Channel drive business by limiting rotational speed thresholds on the platform. This gives pressure to optimize for performance gains like those presented by write back caching, and for the most part the industry benchmarks the ATA platform with write back caching enabled. It is possible that this will change in the future, but at the present moment this strategy is so pervasive that drive manufacturers presume write-back caching to be enabled when certifying their ATA products.&lt;br /&gt;&lt;br /&gt;Though performance enhancement is helpful, the use of write-back caching in ATA RAID implementations presents at least two severe reliability drawbacks. The first involves the integrity of the data in the write-back cache during a power failure event. When power is suddenly lost in the drive bays, the data located in the cache memories of the drives is also lost. In fact, in addition to data loss, the drive may also have reordered any pending writes in its write-back cache. Because this data has been already committed as a write from the standpoint of the application, this may make it impossible for the application to perform consistent crash recovery. 
When this type of corruption occurs, it not only causes data loss to specific applications at specific places on the drive but can frequently corrupt file systems and effectively cause the loss of all data on the "damaged" disk.&lt;br /&gt;&lt;br /&gt;This more global type of corruption occurs because of another problem with using a write-back cache. This second problem involves the sequencing of data that enters and exits the write-back cache. That is, ATA drives are free to reorder any pending writes in their write-back caches. This allows the write-back cache to obtain additional performance improvements. Instead of issuing sector commitments and then initiating rotational seeks for each sector in the exact sequence that commits were made, the drive places data on sectors that it encounters as platters rotate through an increasing or decreasing sector path. This reduces seek times and speeds up cache throughput. However, if a power or component failure occurs during a write process, the identity of sectors that make it to disk will not correspond to the sequence in which they were written. This causes corruption as applications are unable to recover from drive failures because they have no way of resolving the order in which data made it to the disk media versus which data was lost in cache. Even if individual drives did not reorder writes, there is no convenient way of preventing the reordering of writes that are striped across multiple drives that use write-back caching, since any individual drive is unaware of the writes being serviced by another drive.&lt;br /&gt;&lt;br /&gt;These write-back cache problems are a common cause of data corruption. In fact, the weakness of the write-back cache is a relatively well understood problem, and on higher end drive platforms, RAID devices and sophisticated storage administrators will default to a policy of prohibiting the use of the SCSI write back cache. 
However, in the ATA RAID arena, the write-back cache is usually enabled by default, and performance measurement is conducted with the caching enabled, which is misleading given that the reliability implicit in RAID is compromised by the use of write-back caching.&lt;br /&gt;&lt;br /&gt;Deactivation of write-back caching prevents the most severe of the ATA RAID corruption problems. The tradeoff for RAID-5, however, involves even lower performance. As discussed in the previous section, the legacy methodologies for RAID-5 impose a significant performance limitation on this type of RAID, one that is partially addressed by vendors through the default use of write-back caching. Unfortunately, deactivating write-back caching usually has a dire effect on performance.&lt;br /&gt;&lt;br /&gt;And yet, there is a further dilemma. Since ATA vendors are not currently certifying the recovery of drives that deactivate write-back caching, it is possible that drives operating without this function will have greater failure rates. So, while vendors do achieve the goal of preventing an obvious source of data corruption, they run the risk of increasing drive failure.&lt;br /&gt;The other showstopper problem posed by disk failure in ATA RAID-5 solutions is the parity recalculation problem. If the system crashes during the middle of a write process, the parity calculation that applied to the active data write may be inconsistent. As a result, when the system is powered back on, it is necessary to regenerate this parity and write it to disk. Since the system will not be able to determine where the last active write was in progress, one solution is to recalculate all of the parity on the RAID-5 group. This recalculation process takes time and every sector of each participating RAID group must be scanned. 
Based on various leading system implementations currently available, the parity recalculation process can take between 45 minutes for a standard RAID-5 group of five or six drives to several hours for larger sets.&lt;br /&gt;Currently, the parity recalculation problem is a significant drawback of software RAID-5 solutions. There is no easy way to avoid this penalty when using the traditional read-modify-write approach to RAID-5. Some RAID-5 solutions in the ATA universe do avoid this limitation, however, through the use of "pointers" that record the positions of the in-place updates. These pointers are stored either on another disk or within a small NVRAM component. This technique is called "dirty region logging." If the pointer is stored on another disk, it generates an additional I/O step that will further degrade performance. Nonetheless, it will deliver a performance benefit by avoiding the need to recalculate all parity upon power failure; however, it does not eliminate the associated reliability problem since, in the event of a crash, some parity will still be left in an inconsistent state until recovery can be performed. If dirty region logging is combined with write-back-caching, the original reliability problem caused by a power failure or power spike event will result in inconsistent or corrupt data. Another solution is to log the data and parity to a separate portion of the disks before responding to the write request; the logged data and parity are then copied to the actual RAID stripe. In the event of a failure, the data and parity can be copied back to the RAID stripe. This approach, while much more reliable than dirty region logging, imposes additional disk latency and makes RAID-5 writes significantly slower.&lt;br /&gt;A complete, high-performance way around these parity update problems in RAID-5 is to use significant quantities of NVRAM with reliable battery backup. 
Unfortunately, the use of NVRAM will tend to degrade RAID-5 performance for streaming where throughput rather than latency is important. NVRAM is often employed in higher-end SCSI and Fibre Channel RAID controllers because it improves performance for many applications and confers reliability benefits in the face of power failure. Nevertheless, it is undesirable for the ATA world to move to this type of solution. One of the most important aspects of the ATA storage opportunity involves its cost savings over alternative drive platforms. Given this, vendors do not have the luxury of equipping ATA RAID solutions with a lot of expensive hardware components. Moreover, there is some expectation within the ATA community that the widespread adoption of Serial ATA will result in an increase of drive counts within standard rack-mount servers. In many of these scenarios, the real estate required for additional board-level components will not be readily available on motherboards or easily addressable through the use of expansion boards. This means that the ATA world will continue to have relatively few options available for addressing reliability concerns associated with RAID-5 implementations simply by applying more hardware.&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;Conclusion&lt;br /&gt;&lt;/strong&gt;The advent of Serial ATA drive technology holds the promise of radically altering the economics of networked storage. However, the ATA drive platform is largely unsuitable for enterprise class storage because of severe reliability problems in RAID solutions addressing the ATA universe. These reliability problems are exacerbated in the case of RAID Level 5, which amplifies susceptibility to drive failures and imposes crippling performance limitations. 
While RAID Level 5 has great popularity and should top demand for the overwhelming bulk of drive shipments that address mass storage, it fails to confer these advantages in the ATA world where expensive NVRAM-based hardware is economically infeasible and performance limitations make it impractical. As a result, data protection must be achieved through mirroring rather than parity, which is wasteful for many applications and reduces the cost savings advantage of the ATA platform.&lt;br /&gt;&lt;br /&gt;A new methodology to conduct RAID-5 is required if its promise in an era of low-cost drive platforms is to be realized. Such a methodology would provide enterprise-class reliability without NVRAM and would deliver near-wire-speed write performance within existing ATA rotational speed frameworks. If this type of solution were available, RAID Level 5 ATA-based storage would achieve rapid and ready acceptance throughout the enterprise-class universe.&lt;br /&gt;Boon Storage Technologies, Inc. has a breakthrough RAID-5 technology called SR5 that overcomes the limitations of existing ATA RAID-5 solutions. 
SR5 truly makes ATA drives enterprise quality; it delivers the ultimate cost benefit to ATA drives while delivering high reliability and high performance to ATA RAID-5.&lt;br /&gt;&lt;a href="http://www.sr5tech.com"&gt;www.sr5tech.com&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;Serial ATA Characteristics&lt;/strong&gt;&lt;br /&gt;&lt;br /&gt;Narrower Cabling&lt;br /&gt;Supports Lower Power Requirements&lt;br /&gt;Lower Pin Counts&lt;br /&gt;10-year Roadmap&lt;br /&gt;Higher Performance&lt;br /&gt;Improved Connectivity (No Master-Slave)&lt;br /&gt;Longer Cabling&lt;br /&gt;PC Economies of Scale&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;RAID-5 Performance Limitations&lt;/strong&gt;&lt;br /&gt;&lt;br /&gt;Multiple I/Os in Read-Modify-Write&lt;br /&gt;Parity Calculation Overhead&lt;br /&gt;Fixed Stripe Size&lt;br /&gt;RAID Group Initialization&lt;br /&gt;RAID Group Rebuilding&lt;br /&gt;&lt;br /&gt;Table 3&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;ATA RAID-5 Reliability Penalties&lt;/strong&gt;&lt;br /&gt;&lt;br /&gt;Platform requires write-back-caching&lt;br /&gt;Data loss on power failure&lt;br /&gt;Data out-of-sequence failure&lt;br /&gt;Parity recalculation failure&lt;br /&gt;File system corruption&lt;br /&gt;NVRAM is not an economic answer&lt;br /&gt;Single drive failure problem&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;RELATED ARTICLE:&lt;/strong&gt;&lt;br /&gt;RAID (n) Implements MAID Technology.&lt;br /&gt;Nobody talks about RAID storage anymore, yet virtually every computer system with two or more hard drives is running some form of RAID. The basic technology has not changed in over 10 years, but many versions of RAID have emerged, each with a myriad of options. What does this mean for the end user? As drive capacities double and systems grow in complexity, it is increasingly difficult for users to balance cost with usability, capacity, performance, and reliability.&lt;br /&gt;&lt;br /&gt;RAID (n) offers a solution. 
This new, patented technology out-performs RAID-5 and RAID-10 systems by providing cost-efficient high performance across all drives while allowing any number of random drive failures with no data loss. In addition to enhanced performance and fail-safe reliability, RAID (n) offers superior usability with the option to dynamically adjust usable capacity and/or insurance against data loss while the system is running.&lt;br /&gt;&lt;br /&gt;Roughly 80% of the RAID-5 arrays used in today's data centers have one or more "Hot Spares." These drives run all day while providing no direct benefit to the drive's associated array. The result is a potentially dangerous false sense of security. If a second drive fails, or if the wrong drive is removed accidentally while the array is rebuilding, all of the data contained on the system is lost. RAID (n) technology allows the user to set the insurance to 1 + (Hot Spare/s) without the need to purchase additional drives. As a result, the RAID (n) array can tolerate a higher total number of random simultaneous drive failures without data loss. As a further benefit, "Hot Spares" used in conjunction with RAID (n) add directly to the overall system performance.&lt;br /&gt;&lt;br /&gt;Users of RAID-10 (or RAID 1+0) systems, where performance and safety are of foremost concern, know that these systems have one serious drawback: the cost of inverse capacity. Any system that uses mirroring, such as this one, requires twice as many hard drives as a RAID-5 array of the same capacity needs for data, that is, 2(n-1) drives against the n drives of RAID-5. So, as an example, a 2-Terabyte (TB) RAID-5 system using 181GB drives would cost $10,128, assuming each drive was $844.00, today's price for a drive of this size. 
That same 2TB system using RAID-10 would cost $18,568, but it would not provide any write performance benefits with the extra ten drives needed under RAID-10.&lt;br /&gt;&lt;br /&gt;With RAID (n), a 2TB array with one drive redundancy would cost the same as the RAID-5 system; however, if the user wanted the two-drive failure benefit of RAID-10, the cost would only increase by $844.00, for a total of $10,972 instead of $18,568. Similarly, three-drive redundancy would only cost $11,816. If, on the other hand, the RAID-10 system for $18,568 was already in place, the RAID (n) system could be implemented with three drives of insurance while providing 3.4TB of usable capacity.&lt;br /&gt;&lt;br /&gt;For companies looking for ways to decrease costs and increase usability, capacity, performance, and reliability, RAID (n) provides a cost-efficient solution. RAID (n) offers a means to improve the overall capacity of current RAID arrays while at the same time enhancing the overall performance and reliability of the total system.&lt;br /&gt;&lt;br /&gt;Kris Land is chief technology officer at InoStor Corp. 
(Poway, Calif.)&lt;br /&gt;www.inostor.com&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/35811827-116053279156001862?l=kris-land.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kris-land.blogspot.com/feeds/116053279156001862/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=35811827&amp;postID=116053279156001862' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/35811827/posts/default/116053279156001862'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/35811827/posts/default/116053279156001862'/><link rel='alternate' type='text/html' href='http://kris-land.blogspot.com/2006/10/from-computer-technology-review-date.html' title=''/><author><name>Kris Land</name><uri>http://www.blogger.com/profile/14262331306208973883</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-35811827.post-116053105773686213</id><published>2006-10-10T18:44:00.000-07:00</published><updated>2006-10-10T18:56:40.063-07:00</updated><title type='text'>Kris Land, Recent patents listed: 20050182992 - Method and apparatus for raid conversion</title><content type='html'>&lt;strong&gt;A general RAID conversion method&lt;/strong&gt; is described for converting between different RAID configurations. 
The method includes reading a unit of user data from the source devices according to the source RAID algorithm, writing the user data together with redundant data (if any) to the target devices according to the target RAID algorithm, and from time to time releasing portions of the source devices containing data that has been converted. The conversion may be used to expand or contract the array, to increase or decrease usable capacity, and to increase or decrease the device-loss insurance level. Conversion may be performed on line (dynamically) or off line. The flexibility of the method allows the implementation of manual and/or rule-based RAID reconfiguration that automatically adjusts system parameters based on user request and/or a set of rules and conditions respectively. It may also be used to perform self-healing after one or more devices in the array have failed.&lt;br /&gt;[0001] This application is related to U.S. Pat. No. 6,557,123, issued Apr. 29, 2003 and U.S. patent application Ser. No. 10/371,628, filed Feb. 20, 2003, both of which are incorporated by reference herein in their entirety.&lt;br /&gt;BACKGROUND OF THE INVENTION [0002] 1. Field of the Invention&lt;br /&gt;[0003] This invention relates to RAID (Redundant Array of Inexpensive (or Independent) Disks (or Devices)) systems, and in particular, to method and apparatus for converting between different species of RAID's and rule-based RAID reconfiguration.&lt;br /&gt;[0004] 2. Description of the Related Art&lt;br /&gt;[0005] RAID is a data storage system that provides a certain level of redundancy so that a certain number of disks (devices) of the disk (device) array may be lost without any loss of user data stored thereon. Various species of RAID systems are known, including RAID0, RAID1, RAID3 and RAID5 (known as standard RAID), and RAID2, RAID4 and RAID6 (known as non-standard RAID). Methods and apparatus that provide conversion or migration between different conventional RAID species have been described. For example, U.S. Pat. No. 
6,275,898 describes converting from RAID5 to RAID1 (a contraction, or reduction of the usable capacity of the system, referred to as "promotion" in that patent) and converting from RAID1 to RAID5 (an expansion, or increase of the usable capacity of the system, referred to as "demotion" in that patent). The conversion must be done off line, i.e. the system cannot take user requests while performing the conversion. In the context of this patent "RAID1" includes the compound RAID, which we call "RAID10". U.S. Pat. No. 6,154,853 describes a special case of an "even" conversion (where the usable capacity in the system is unchanged), by converting an n-disk RAID5 to a 2(n-1) disk RAID10 and back. U.S. Pat. No. 5,524,204 and U.S. Pat. No. 5,615,352 describe a method for expanding a RAID5 to a bigger RAID5 with a larger number of disks. The conversion may be accomplished without interrupting service, i.e. while the system is online. These two patents do not describe an array contraction.&lt;br /&gt;SUMMARY OF THE INVENTION [0006] Accordingly, the present invention is directed to a method and apparatus for RAID conversion that substantially obviates one or more of the problems due to limitations and disadvantages of the related art.&lt;br /&gt;[0007] An object of the present invention is to provide a flexible approach to RAID conversion and reconfiguration.&lt;br /&gt;[0008] Additional features and advantages of the invention will be set forth in the descriptions that follow and in part will be apparent from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims thereof as well as the appended drawings. 
&lt;br /&gt;[0009] To achieve these and other advantages and in accordance with the purpose of the present invention, as embodied and broadly described, the present invention provides a method for RAID conversion in a redundant array of inexpensive devices (RAID) comprising a controller and a plurality of storage devices for storing user data, the controller storing a plurality of RAID algorithms to be implemented for writing data to and reading data from the storage devices. The method includes storing in the controller one or more rules for selecting a desired one of the plurality of RAID algorithms based on one or more conditions of the array; detecting the one or more conditions of the array; selecting the desired RAID algorithm based on the detected conditions and the stored rules; and when the desired RAID algorithm is different from the RAID algorithm currently implemented in the array, automatically converting the array from the currently implemented RAID algorithm to the desired RAID algorithm.&lt;br /&gt;[0010] In another aspect, the present invention provides a RAID system configured to carry out the above method steps. In yet another aspect, the invention provides a computer software product for implementing the above method steps in a RAID system.&lt;br /&gt;[0011] It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are intended to provide further explanation of the invention as claimed.&lt;br /&gt;BRIEF DESCRIPTION OF THE DRAWINGS [0012] FIGS. 1(a) and 1(b) are schematic diagrams showing a RAID system before and after a RAID conversion.&lt;br /&gt;[0013] FIG. 2 is a flow chart illustrating a method for RAID conversion.&lt;br /&gt;[0014] FIGS. 3(a) and 3(b) are a flow chart illustrating a method for off-line replication.&lt;br /&gt;[0015] FIGS. 4(a) and 4(b) are a flow chart illustrating a method for on-line conversion.&lt;br /&gt;[0016] FIG. 
5 is a flow chart illustrating a rule-based RAID conversion method.&lt;br /&gt;DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS [0017] A new species of RAID, hereinafter referred to as "RAIDn", is described in commonly assigned U.S. Pat. No. 6,557,123, entitled "Data redundancy methods and apparatus", issued Apr. 29, 2003. U.S. Pat. No. 6,557,123 describes a data storage apparatus having a plurality of n disks, where data comprising a plurality of data groupings are stored respectively across the plurality of n disks. Each one of the n data groupings comprises a data portion and a data redundancy portion. Advantageously, the n data portions are recoverable from any and all combinations of n-m data grouping(s) on n-m disk(s) when the other m data grouping(s) are unavailable, where 1 ≤ m &amp;lt; n. The disk storage apparatus may be configured for a parameter m which is selectable. In other words, the RAIDn method allows a user to select the level of redundancy (or "device-loss insurance") in the disk array. (For convenience, a notation "n:m" or "(n,m)" is used hereinafter to denote the parameters n and m in a RAIDn.) In particular, U.S. Pat. No. 6,557,123 describes a new family of codes, referred to as "Wiencko codes" (pronounced "WEN-SCO" codes), which also enables the RAIDn algorithms. A related method is described in U.S. patent application Ser. No. 10/371,628, filed Feb. 20, 2003, which is a continuation-in-part of U.S. Pat. No. 6,557,123. Application Ser. No. 10/371,628 describes method and apparatus for providing data recovery in a one or multiple disk loss situation using a set of codes similar to but different from the Wiencko codes. Further, an implementation method for RAIDn is described in U.S. patent application Ser. No. 10/361,446, filed Feb. 10, 2003. The disclosures of the above three U.S. 
patents and patent applications are herein incorporated by reference in their entirety.&lt;br /&gt;[0018] As used in the present application, "RAIDn" is a RAID system according to the principles described in U.S. Pat. No. 6,557,123 and/or U.S. patent application Ser. No. 10/371,628, i.e., a RAID system where the level of redundancy is selectable or adjustable. "Conventional RAID", on the other hand, is used in the present application to refer to conventionally known RAID species such as RAID0, RAID1, RAID3, RAID5, RAID6, RAID2 and RAID4, and/or compound RAID's where any of the above RAID types are combined. "RAID" is used to generally refer to any RAID systems, including conventional RAID and RAIDn systems.&lt;br /&gt;[0019] Although the term disk is used in the present application, the method and apparatus are not limited to disks, but the RAID may comprise any type of suitable devices for data storage, including but not limited to magnetic disks, magnetic tapes, optical discs, memory, any block devices, servers, NAS (network attached storage) systems, JBOD's (Just a Bunch of Disks), clustered servers, etc. This application uses the terms "disk", "drive" and "device" interchangeably, unless otherwise specified, without affecting the scope of the description. At least the term "device" should be understood to encompass all suitable storage devices including but not limited to those listed above.&lt;br /&gt;[0020] Embodiments of the present invention provide RAID conversion methods and apparatus for converting (or migrating) between a conventional RAID and a RAIDn system, and/or converting between two RAIDn systems. Other aspects of the invention include applications of rule-based RAID conversion where both RAID systems may be either a RAIDn or a conventional RAID. 
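&lt;br /&gt;&lt;br /&gt;As an illustration only, the n:m notation used in this application can be modeled in a few lines of Python. The class and function names below are hypothetical and are not part of the described apparatus, and the sketch assumes equal-size devices, so that an n:m array holds n-m devices' worth of user data.&lt;br /&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RaidConfig:
    """Hypothetical n:m description of an array (illustrative names only)."""
    n: int  # total number of devices in the array
    m: int  # device-loss insurance: devices that may fail without data loss

    def __post_init__(self):
        # The text constrains a RAIDn to m of at least 1 and below n.
        if self.m not in range(1, self.n):
            raise ValueError("m must lie between 1 and n-1")

    @property
    def data_devices(self):
        # Assumption: equal-size devices, so usable capacity is n - m devices.
        return self.n - self.m


def describe_conversion(source, target):
    """Report how a source-to-target conversion changes the array."""
    return {
        "devices": target.n - source.n,
        "usable_capacity": target.data_devices - source.data_devices,
        "insurance": target.m - source.m,
    }
```

For instance, describe_conversion(RaidConfig(9, 2), RaidConfig(9, 3)) reports one more device of insurance and one less device of usable capacity on the same nine devices.&lt;br /&gt;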
For convenience, the RAID system before a RAID conversion is referred to as the source RAID and the RAID system after the conversion is referred to as the target RAID.&lt;br /&gt;[0021] According to embodiments of the present invention, the RAID conversion may be an expansion where the number of disks in the array increases, or a contraction where the number of disks in the array decreases. The conversion may either increase or decrease usable capacity, which is defined as the total capacity of the system usable for storing user data. The conversion may either increase or decrease the number of total disks in the array. The conversion may either increase or decrease device-loss insurance, which is defined as the maximum number of disks that may fail without the loss of user data. The conversion may translate between two different RAID/RAIDn species whose physical characteristics (i.e. number of devices, device-loss insurance and/or usable capacity) remain the same; this flexibility of the system allows implementation of rule-based RAID reconfiguration that automatically adjusts one or more system parameters based on a prescribed set of rules and conditions. In particular, RAID conversion may be used to perform self-healing after one or more devices in the array have failed, in which situation the source array will be the remaining devices of the original RAID (from which all user data can be reconstructed), and the target array will be either a reconfigured RAID on the same remaining devices or an array that includes replacement devices for the failed devices. In addition, the conversion may be performed either in an on line fashion (i.e. dynamically), where the system will accept and process user I/O requests while performing the conversion, or in an off line fashion, where the system will not accept and process user I/O requests while performing the conversion.&lt;br /&gt;[0022] Referring now to FIG. 1(a), a RAID system includes an array of n1 storage devices 14-1, 14-2, . . . 
14-i, . . . 14-n1 connected to a controller 12. A controller useful in embodiments of this invention can be either a physical hardware device or a virtual software loadable module managing the RAID functions. FIG. 1(b) shows the system after a RAID conversion, where the array now comprises an array of n2 devices 16-1, 16-2, . . . 16-j, . . . 16-n2. The controller 12, which preferably includes a processor or logic circuits, implements a plurality of RAID algorithms, controls the read and write operations of the devices 14-i or 16-j, and carries out the RAID conversion. The controller 12 is also connected to a host device via any suitable interface device (not shown), for receiving read and write requests from the host, and transmitting or receiving user data to or from the host. The invention does not impose any requirement on the physical identity of the source devices 14-i and target devices 16-j. When the source array and the target array share some of the same physical devices, RAID conversion involves reading data from portions of some devices (as source devices) and writing data to unused portions of the same physical devices (as target devices). When the source array and the target array are separate and distinct physical devices, the RAID conversion may be referred to as replication, and involves copying of user data from the source array to the target array which may be configured as a different RAID.&lt;br /&gt;[0023] RAID conversion methods according to embodiments of the present invention generally involve the following steps (FIG. 
2): (1) reading a predefined amount of user data from the source devices according to the RAID algorithm implemented in the source RAID; (2) writing the user data together with redundant data (if any) to the target devices according to the RAID algorithm implemented in the target RAID; and (3) releasing portions of the source devices containing data that has been converted and making such portions available for use as target devices. The read step (1) includes, when appropriate, decoding the received data according to the source RAID algorithm to obtain user data. The write step (2) includes, when appropriate, calculating redundancy data from the user data according to the target RAID algorithm. The write step may include a step of verifying the data written onto the target RAID. During conversion, a watermark is maintained for the source array to indicate the conversion progress. This allows the read and write steps to be carried out for a unit of data at a time, so that user I/O requests can be handled during conversion. The read and write steps are repeated until all data is converted. The capacity release step (3) may be carried out from time to time or when necessary, depending on the amount of unused capacity in the physical device.&lt;br /&gt;[0024] In the write step, the data may be optionally written to a scratch area to avoid "write holes". A known problem in RAID systems, "write holes" refers to possible interruptions of multi-step sequences that may cause data integrity problems. For example, during writing of a data stripe across a RAID5 array, data may be lost if a power failure occurs before sufficient data has been written to enable recovery of the entire stripe. Writing updates to a scratch area substantially eliminates the write hole problem.&lt;br /&gt;[0025] FIGS. 3(a) and 3(b) illustrate an off-line replication method, and FIGS. 4(a) and 4(b) illustrate an on-line conversion method. 
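&lt;br /&gt;&lt;br /&gt;The three steps of paragraph [0023] might be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the callbacks read_unit, write_unit and release and the release_every parameter are hypothetical names, and decoding, redundancy calculation and verification are left to those callbacks.&lt;br /&gt;

```python
def convert(read_unit, write_unit, total_units, release=None, release_every=4):
    """Convert an array one unit at a time, tracking progress with a watermark.

    read_unit(i)     -- hypothetical callback: decode unit i from the source RAID
    write_unit(i, d) -- hypothetical callback: encode and store unit i on the target
    release(lo, hi)  -- hypothetical callback: free source units lo .. hi-1
    """
    watermark = 0  # units below the watermark already exist on the target
    released = 0   # units below this index have been freed on the source
    while watermark != total_units:
        data = read_unit(watermark)   # step (1): read from the source RAID
        write_unit(watermark, data)   # step (2): write to the target RAID
        watermark += 1
        # Step (3): release converted source space from time to time,
        # and always once the last unit has been converted.
        due = watermark - released >= release_every or watermark == total_units
        if release is not None and due:
            release(released, watermark)
            released = watermark
        yield watermark  # hand control back so user I/O can be serviced
```

Because the function is a generator, a caller can service user I/O between units and use the yielded watermark to decide whether a given unit should be read from the source or from the target.&lt;br /&gt;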
Both methods are specific examples of the more general method described in FIG. 2.&lt;br /&gt;[0026] The conversion method according to embodiments of the present invention is described in more detail below using a specific example. In this example, it is assumed that the number of bytes in any data chunk is a power of 2. (Generally, the data chunks, chunk sizes, chunk boundaries and byte offsets may be of any defined values and the present invention is not limited to the specifics of the example given here.) When chunk sizes are not fixed, it is assumed that a larger size chunk always starts on a chunk boundary of any smaller size chunk. In fact, the absolute byte offset of a chunk start is a multiple of the chunk size. It is also assumed that virtual stripes start at a multiple of their size in absolute byte offset.&lt;br /&gt;[0027] Any virtual stripe size is an integer multiple of a chunk size, and therefore any two abstract RAID's (conventional RAID or RAIDn) have a least common multiple which is an exact integer multiple of both their chunk sizes. Watermarks at absolute byte offsets equal to integer multiples of this least common multiple are used as virtual stripe boundaries for both abstract RAID's. These are referred to herein as "shared stripe boundaries". For example, a virtual stripe on a 9:2 RAIDn is 63 chunks, while a virtual stripe on a 9-disk RAID5 is 8 chunks. The least common multiple will be 504 chunks, or about 2 megabytes with 4 Kbyte chunks. Conversion is preferably carried out in units of virtual stripes, as follows.&lt;br /&gt;[0028] First, a subset of possible shared stripe boundaries is defined as "step watermarks". The step watermarks should be spaced so that full conversion between neighboring step watermarks takes a desired amount of time, such as on the order of 1/10 second, or less. The controller 12 alternates (e.g. on the order of once a second) between a converting state and a user I/O state. 
When entering the converting state, the controller flushes all pending user requests to the array, with the cooperation of the upper level driver connected to the controller 12, so that no I/O to this array is issued while the state remains converting. Preferably, the upper level driver either sends a pause, which will not return until the entire conversion to the next step watermark is completed, or queues user requests until then. The controller then converts the data from the source array to the next step watermark. The new watermark is stored in the controller, the controller flushes watermark data and the controller enters the user I/O state. During user I/O state, normal user I/O takes place to the array with the watermark fixed at its new location. Since the watermark location indicates which portions of the data have been converted and hence exist on the target RAID, and which portions of the data have not yet been converted and hence exist on the source RAID, user read requests can be handled appropriately by reading data from either the source RAID or the target RAID. User write requests are preferably handled by writing data onto the target array with an appropriate watermark indicating the boundary of such data. The above steps are repeated until all the data is converted.&lt;br /&gt;[0029] The above-described method may involve small pauses in data availability to the users, but is relatively easy to implement. Alternatively, if smoother data availability is to be maintained during conversion, a moving RAID0, RAID1, RAID10, or some other RAID section embracing at least two steps may be implemented, preferably on a separate storage device such as a solid-state disk or battery backed memory. Placing an intermediary RAID device and/or cache between new user I/O and the target array during the step watermark I/O operation substantially eliminates all potential user I/O pauses. 
Additionally, this would eliminate "write holes" even if there are pauses.&lt;br /&gt;[0030] The RAID conversion method described above may be applied where the source and target RAID's may be any species of RAID, including conventional RAID's and RAIDn with any desirable n:m parameters. As a result, the RAID conversion method is flexible and general in that it can implement a contraction as well as an expansion, with increased or decreased usable capacity and increased or decreased device-loss insurance. Further, conversion may be carried out either on-line (dynamically) or off-line. This flexibility allows practical applications for reconfiguring RAID systems not offered by conventional conversion methods. One category of such applications is rule-based RAID reconfiguration. Rule-based reconfiguration may be implemented by storing a set of rules in the controller (or in an upper level user application), which causes automatic conversion (reconfiguration) of the RAID system when certain conditions are met (FIG. 5). Some examples of rule-based RAID conversion include:&lt;br /&gt;[0031] Capacity utilization-based rules. Device-loss insurance level may be automatically adjusted, between a minimum and a maximum level set by the user, based on capacity utilization (i.e. amount of total device capacity that is utilized by user data). For example, a 20-drive array may be set to have a maximum insurance level of 5 disks and a minimum insurance level of 2 disks. If the utilization of available capacity of the array is at or below 50%, the RAID is configured as 20:5; if the capacity utilization is between 50% and 60%, the RAID is configured as 20:4; etc. Additionally, idle drives can be added to maintain both capacity and insurance by using a predetermined number of idle drives and/or idle drives known as Global spares.&lt;br /&gt;[0032] Performance requirement-based rules. Different species of RAID's have different performance in terms of read and write speeds. 
For example, RAID0 has the fastest performance for both reads and writes but no safety. The level of device-loss insurance in RAIDn affects write performance to a certain degree and affects read performance to a lesser degree. A rule may be defined to increase or decrease the insurance level based on performance requirements. If, for example, from RAID0 each one disk of insurance increase results in a write penalty of 10%, and if a performance level of 60% of the maximum performance is acceptable, then the device-loss insurance may be set as high as 4. The RAID may be automatically reconfigured when the performance requirement changes.&lt;br /&gt;[0033] Self-healing fixed insurance. Rules may be set up so that the RAID will automatically add devices and/or borrow usable capacity from the array to maintain a certain level of device-loss insurance. For example, if an insurance level of 3 is always to be maintained, and one device in a 9-device array fails, the remaining 8 devices may be reconfigured into an 8:3 RAIDn (assuming total capacity is adequate). Alternatively, if a spare device is available, it may be added to the 8 remaining devices and reconfigured into a 9:3 RAID.&lt;br /&gt;[0034] Self-healing minimal insurance. A RAID system may be supplied by a supplier and set to an initial high level of insurance. As devices fail, self-healing is performed to reconfigure the remaining devices, until a minimal insurance threshold is reached which triggers a maintenance call. This may be especially useful when a preventive maintenance contract is in place as it reduces the number of maintenance calls to the user site, and/or allows maintenance to be performed at a desired time during a window instead of at each device failure.&lt;br /&gt;[0035] Data criticality-based rules. Device-loss insurance level may be automatically adjusted, between a minimum and a maximum level set by the user, based on the importance of the user data. 
Such rule-based settings will dynamically change from higher insurance (for more important data) to lower insurance (for less important data) and vice versa. Data criticality may be measured or defined by any suitable methods such as the class of user, the use of directories that are designated at higher insurance levels, files marked with higher priorities etc.&lt;br /&gt;[0036] Data recency and repetition-based rules. Device-loss insurance level may be automatically adjusted, between a minimum and a maximum level set by the user, based on recency and repetition (R&amp;amp;R) of the user data. Such rule-based settings will dynamically change from higher insurance (for higher R&amp;amp;R) to lower insurance (for lower R&amp;amp;R) and vice versa. R&amp;amp;R may be measured or defined by any suitable methods such as the number of files read or written over a period of time and/or the number of accesses of one or more files over a period of time.&lt;br /&gt;[0037] Device vulnerability-based rules. Device-loss insurance level may be automatically adjusted, between a minimum and a maximum level set by the user, based on the device type, vulnerability of the type of device, and/or location of the user data (for example, the location of user data may be in remote locations such as mobile offices, home offices, remote offices etc., or a managed data center). Such rule-based settings will dynamically change from higher insurance (for more vulnerable devices) to lower insurance (for less vulnerable devices) and vice versa.&lt;br /&gt;[0038] In the above rule-based RAID conversion methods, each of the source and target RAIDs may be a conventional RAID or a RAIDn.&lt;br /&gt;[0039] It will be apparent to those skilled in the art that various modifications and variations can be made in the RAID conversion methods and apparatus of the present invention without departing from the spirit or scope of the invention. 
For example, although a set of possible rules is described, the invention is not limited to these rules and any suitable rules may be used. Thus, it is intended that the present invention cover modifications and variations that come within the scope of the appended claims and their equivalents.&lt;br /&gt;What is claimed is:&lt;br /&gt;1. In a redundant array of inexpensive devices (RAID) comprising a controller and a plurality of storage devices for storing user data, the controller storing a plurality of RAID algorithms to be implemented for writing data to and reading data from the storage devices, a method for RAID conversion comprising: storing in the controller one or more rules for selecting a desired one of the plurality of RAID algorithms based on one or more conditions of the array; detecting the one or more conditions of the array; selecting the desired RAID algorithm based on the detected conditions and the stored rules; and when the desired RAID algorithm is different from the RAID algorithm currently implemented in the array, converting the array from the currently implemented RAID algorithm to the desired RAID algorithm.&lt;br /&gt;2. The method of claim 1, wherein the converting step comprises: (a) reading a unit of user data from the storage devices according to the currently implemented RAID algorithm; (b) defining a watermark indicating the position where the data is read from the current RAID; and (c) writing user data on appropriate storage devices according to the desired RAID algorithm.&lt;br /&gt;3. The method of claim 2, further comprising: alternating between performing steps (a), (b) and (c), and processing user I/O requests.&lt;br /&gt;4. The method of claim 1, wherein the desired RAID has fewer storage devices storing user data than the currently implemented RAID.&lt;br /&gt;5. The method of claim 1, wherein the desired RAID has more storage devices storing user data than the currently implemented RAID.&lt;br /&gt;6. 
The method of claim 1, wherein the conditions include the current capacity utilization of the array.&lt;br /&gt;7. The method of claim 1, wherein the conditions include a performance requirement.&lt;br /&gt;8. The method of claim 1, wherein the conditions include a change in the number of available storage devices in the array.&lt;br /&gt;9. The method of claim 1, wherein the conditions include a decrease in the number of available storage devices in the array.&lt;br /&gt;10. The method of claim 1, wherein the conditions include an increase in the number of available storage devices in the array.&lt;br /&gt;11. The method of claim 1, wherein the conditions include a measure of data criticality of the user data.&lt;br /&gt;12. The method of claim 1, wherein the conditions include a measure of recency and repetition of the user data.&lt;br /&gt;13. The method of claim 1, wherein the conditions include a measure of vulnerability of the storage devices.&lt;br /&gt;14. The method of claim 1, wherein the converting step is performed on line.&lt;br /&gt;15. The method of claim 1, wherein the converting step is performed off line.&lt;br /&gt;16. The method of claim 1, wherein at least some of the RAID algorithms stored in the controller are characterized by a number of storage devices in the array (n), and a device-loss insurance level (m) such that when up to m devices of the array are unavailable, user data is fully recoverable from the remaining n-m devices, where 1 ≤ m &amp;lt; n, and wherein the selecting step determines desired n and m values based on the detected conditions and the stored rules.&lt;br /&gt;17. The method of claim 16, wherein the device-loss insurance level of the desired RAID is greater than the device-loss insurance level of the currently implemented RAID.&lt;br /&gt;18. The method of claim 16, wherein the device-loss insurance level of the desired RAID is less than the device-loss insurance level of the currently implemented RAID.&lt;br /&gt;19. 
The method of claim 16, wherein the condition is a decrease in the number of available storage devices in the array and the desired RAID after conversion has the same device-loss insurance level as the currently implemented RAID.&lt;br /&gt;20. The method of claim 16, wherein the condition is a decrease in the number of available storage devices in the array and the desired RAID after conversion has a lower device-loss insurance level than the currently implemented RAID.&lt;br /&gt;21. The method of claim 16, wherein the rules define a maximum device-loss insurance level and a minimum device-loss insurance level for a given n value, and one or more conditions based on which a desired device-loss insurance level is determined, the desired device-loss insurance level falling between the maximum and minimum device-loss insurance levels.&lt;br /&gt;22. In a redundant array of inexpensive devices (RAID) comprising a controller and a plurality of storage devices for storing user data, the controller storing a plurality of RAID algorithms to be implemented for writing data to and reading data from the storage devices, wherein at least some of the RAID algorithms are characterized by a number of storage devices in the array (n), and a device-loss insurance level (m) such that when up to m devices of the array are unavailable, user data is fully recoverable from the remaining n-m devices, where 1 ≤ m &amp;lt; n, a RAID conversion method comprising: implementing a first RAID algorithm on the array; selecting a second RAID algorithm characterized by a number of storage devices n2 and a device-loss insurance level m2, n2 and m2 being selectable; and converting the array from the first RAID algorithm to the second RAID algorithm, the converting step comprising: (a) reading a unit of user data from the storage devices according to the first RAID algorithm; (b) defining a watermark indicating the position where the data is read from the first RAID; and (c) writing user data on 
appropriate storage devices according to the second RAID algorithm.&lt;br /&gt;23. The method of claim 22, wherein the writing step includes writing updates to a semi-permanent cache.&lt;br /&gt;24. A redundant array of inexpensive devices (RAID) system comprising: a plurality of n storage devices for storing user data thereon; and a controller connected to the storage devices for controlling writing and reading data to and from the storage devices according to a RAID algorithm, the controller storing a plurality of RAID algorithms to be implemented for writing data to and reading data from the storage devices, the controller further storing one or more rules for selecting a desired one of the plurality of RAID algorithms based on one or more conditions of the array, the controller having stored program instructions or a logic circuit operable to detect the one or more conditions of the array, to select the desired RAID algorithm based on the detected conditions and the stored rules, and when the desired RAID algorithm is different from the RAID algorithm currently implemented in the array, to convert the array from the currently implemented RAID algorithm to the desired RAID algorithm.&lt;br /&gt;25. The system of claim 24, wherein the controller has stored program instructions or a logic circuit operable to convert the array by: (a) reading a unit of user data from the storage devices according to the currently implemented RAID algorithm, (b) defining a watermark indicating the position where the data is read from the current RAID, and (c) writing user data on appropriate storage devices according to the desired RAID algorithm.&lt;br /&gt;26. The system of claim 25, wherein the controller has stored program instructions or a logic circuit operable to alternate between performing steps (a), (b) and (c), and processing user I/O requests.&lt;br /&gt;27. 
The system of claim 24, wherein the desired RAID has fewer storage devices storing user data than the currently implemented RAID.&lt;br /&gt;28. The system of claim 24, wherein the desired RAID has more storage devices storing user data than the currently implemented RAID.&lt;br /&gt;29. The system of claim 24, wherein the conditions include the current capacity utilization of the array.&lt;br /&gt;30. The system of claim 24, wherein the conditions include a performance requirement.&lt;br /&gt;31. The system of claim 24, wherein the conditions include a change in the number of available storage devices in the array.&lt;br /&gt;32. The system of claim 24, wherein the conditions include a decrease in the number of available storage devices in the array.&lt;br /&gt;33. The system of claim 24, wherein the conditions include an increase in the number of available storage devices in the array.&lt;br /&gt;34. The system of claim 24, wherein the conditions include a measure of data criticality of the user data.&lt;br /&gt;35. The system of claim 24, wherein the conditions include a measure of recency and repetition of the user data.&lt;br /&gt;36. The system of claim 24, wherein the conditions include a measure of vulnerability of the storage devices.&lt;br /&gt;37. The system of claim 24, wherein the converting step is performed on line.&lt;br /&gt;38. The system of claim 24, wherein the converting step is performed off line.&lt;br /&gt;39. The system of claim 24, wherein at least some of the RAID algorithms stored in the controller are characterized by a number of storage devices in the array (n), and a device-loss insurance level (m) such that when up to m devices of the array are unavailable, user data is fully recoverable from the remaining n-m devices, where 1 ≤ m &amp;lt; n, and wherein the selecting step determines desired n and m values based on the detected conditions and the stored rules.&lt;br /&gt;40. 
The system of claim 39, wherein the device-loss insurance level of the desired RAID is greater than the device-loss insurance level of the currently implemented RAID.&lt;br /&gt;41. The system of claim 39, wherein the device-loss insurance level of the desired RAID is less than the device-loss insurance level of the currently implemented RAID.&lt;br /&gt;42. The system of claim 39, wherein the condition is a decrease in the number of available storage devices in the array and the desired RAID after conversion has the same device-loss insurance level as the currently implemented RAID.&lt;br /&gt;43. The system of claim 39, wherein the condition is a decrease in the number of available storage devices in the array and the desired RAID after conversion has a lower device-loss insurance level than the currently implemented RAID.&lt;br /&gt;44. The system of claim 39, wherein the rules define a maximum device-loss insurance level and a minimum device-loss insurance level for a given n value, and one or more conditions based on which a desired device-loss insurance level is determined, the desired device-loss insurance level falling between the maximum and minimum device-loss insurance levels.&lt;br /&gt;45. 
A computer program product comprising a computer usable medium having a computer readable code embodied therein for controlling a redundant array of inexpensive devices (RAID), the RAID comprising a controller and a plurality of storage devices for storing user data, the controller storing a plurality of RAID algorithms to be implemented for writing data to and reading data from the storage devices, the computer program product comprising: first computer readable program code configured to cause the controller to store one or more rules for selecting a desired one of the plurality of RAID algorithms based on one or more conditions of the array; second computer readable program code configured to cause the controller to detect the one or more conditions of the array; third computer readable program code configured to cause the controller to select the desired RAID algorithm based on the detected conditions and the stored rules; and fourth computer readable program code configured to cause the controller to, when the desired RAID algorithm is different from the RAID algorithm currently implemented in the array, convert the array from the currently implemented RAID algorithm to the desired RAID algorithm.&lt;br /&gt;46. The computer program product of claim 45, wherein the fourth computer readable program code comprises: fifth computer readable program code configured to cause the controller to read a unit of user data from the storage devices according to the currently implemented RAID algorithm; sixth computer readable program code configured to cause the controller to define a watermark indicating the position where the data is read from the current RAID; and seventh computer readable program code configured to cause the controller to write user data on appropriate storage devices according to the desired RAID algorithm.&lt;br /&gt;47. 
The computer program product of claim 46, further comprising seventh computer readable program code configured to cause the controller to process user I/O requests; and eighth computer readable program code configured to cause the controller to alternate between executing the fifth, sixth and seventh program codes and executing the seventh program code.&lt;br /&gt;48. The computer program product of claim 45, wherein the desired RAID has fewer storage devices storing user data than the currently implemented RAID.&lt;br /&gt;49. The computer program product of claim 45, wherein the desired RAID has more storage devices storing user data than the currently implemented RAID.&lt;br /&gt;50. The computer program product of claim 45, wherein the conditions include the current capacity utilization of the array.&lt;br /&gt;51. The computer program product of claim 45, wherein the conditions include a performance requirement.&lt;br /&gt;52. The computer program product of claim 45, wherein the conditions include a change in the number of available storage devices in the array.&lt;br /&gt;53. The computer program product of claim 45, wherein the conditions include a decrease in the number of available storage devices in the array.&lt;br /&gt;54. The computer program product of claim 45, wherein the conditions include an increase in the number of available storage devices in the array.&lt;br /&gt;55. The computer program product of claim 45, wherein the conditions include a measure of data criticality of the user data.&lt;br /&gt;56. The computer program product of claim 45, wherein the conditions include a measure of recency and repetition of the user data.&lt;br /&gt;57. The computer program product of claim 45, wherein the conditions include a measure of vulnerability of the storage devices.&lt;br /&gt;58. The computer program product of claim 45, wherein the converting step is performed on line.&lt;br /&gt;59. 
The computer program product of claim 45, wherein the converting step is performed off line.&lt;br /&gt;60. The computer program product of claim 45, wherein at least some of the RAID algorithms stored in the controller are characterized by a number of storage devices in the array (n), and a device-loss insurance level (m) such that when up to m devices of the array are unavailable, user data is fully recoverable from the remaining n-m devices, where 1 ≤ m &amp;lt; n, and wherein the selecting step determines desired n and m values based on the detected conditions and the stored rules.&lt;br /&gt;61. The computer program product of claim 60, wherein the device-loss insurance level of the desired RAID is greater than the device-loss insurance level of the currently implemented RAID.&lt;br /&gt;62. The computer program product of claim 60, wherein the device-loss insurance level of the desired RAID is less than the device-loss insurance level of the currently implemented RAID.&lt;br /&gt;63. The computer program product of claim 60, wherein the condition is a decrease in the number of available storage devices in the array and the desired RAID after conversion has the same device-loss insurance level as the currently implemented RAID.&lt;br /&gt;64. The computer program product of claim 60, wherein the condition is a decrease in the number of available storage devices in the array and the desired RAID after conversion has a lower device-loss insurance level than the currently implemented RAID.&lt;br /&gt;65. 
The computer program product of claim 60, wherein the rules define a maximum device-loss insurance level and a minimum device-loss insurance level for a given n value, and one or more conditions based on which a desired device-loss insurance level is determined, the desired device-loss insurance level falling between the maximum and minimum device-loss insurance levels.&lt;br /&gt;&lt;br /&gt;Browse Industry: &lt;a href="http://www.freshpatents.com/Error-detection-correction-and-fault-detection-recovery-dtnewntc714.php"&gt;USPTO Class 714&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/35811827-116053105773686213?l=kris-land.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kris-land.blogspot.com/feeds/116053105773686213/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=35811827&amp;postID=116053105773686213' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/35811827/posts/default/116053105773686213'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/35811827/posts/default/116053105773686213'/><link rel='alternate' type='text/html' href='http://kris-land.blogspot.com/2006/10/kris-land-recent-patents-listed_10.html' title='Kris Land, Recent patents listed: 20050182992 - Method and apparatus for raid conversion'/><author><name>Kris Land</name><uri>http://www.blogger.com/profile/14262331306208973883</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-35811827.post-116053025350687966</id><published>2006-10-10T18:20:00.000-07:00</published><updated>2006-10-10T18:30:53.596-07:00</updated><title type='text'>Kris Land, Recent patents listed: 20050097388 - Data distributor</title><content type='html'>&lt;strong&gt;A data distribution system&lt;/strong&gt; that includes a plurality of access ports (APs) connected by one or more crossbar switches. Each crossbar switch has a plurality of serial connections, and is dynamically configurable to form connection joins between serial connections. Each AP has one or more serial connections, a processor, memory, and a bus. A first subset of the APs are host and/or peripheral device APs which further include host and/or peripheral device adapters for connecting to hosts and/or peripheral devices. A second subset of the APs are CPU-only APs that are not connected to a host or peripheral device but perform data processing functions. The data distributor system accomplishes efficient data distribution by eliminating a central CPU that essentially processes every byte of data passing through the system. The data distributor system can be implemented in a storage system such as a RAID system.&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;BACKGROUND OF THE INVENTION&lt;/strong&gt;&lt;br /&gt;[0001] 1. Field of the Invention&lt;br /&gt;[0002] This invention relates to data distribution, and in particular, to data distribution in a data storage system or other distributed data handling systems.&lt;br /&gt;[0003] 2. Description of the Related Art&lt;br /&gt;[0004] In peer-to-peer and mass storage development, data throughput has been a limiting factor, especially in applications such as movie downloads, pre and post film editing, virtualization of streaming media and other applications where large amounts of data must be moved on and off storage systems. 
One cause of this limitation is that in current systems, every byte of data passing through is handled by a central CPU, internal system buses and the associated main memory. In the following description, a RAID (Redundant Array of Inexpensive Disks) is used as an example of a data storage system, but the analysis is applicable to other systems. A general description of RAID and a description of specific species of RAID, referred to as RAIDn here, may be found in U.S. Pat. No. 6,557,123, issued Apr. 29, 2003 and assigned to the assignee of the present application.&lt;br /&gt;[0005] FIG. 5 is an exemplary configuration for a conventional hardware RAID system. One or more hosts 502 and one or more disks 504 are connected to an EPCI bus or buses 508 (or other suitable bus or buses) via host bus adapters (HBAs) 506. The hosts may be CPUs or other computer systems. The EPCI bus 508 is connected to an EPCI bridge 510 (or other suitable bridge). A CPU 514 and a RAM 516 are connected via a front side bus 512 to the EPCI bridge 510. In a RAID system, the CPU 514 and the RAM 516 are typically local to the RAID hardware. In a RAID write operation, data flows from a host 502 to the CPU 514 via the HBA 506, the EPCI bus or buses 508, the EPCI bridge 510, and the front side bus 512; and flows back from the CPU to a disk 504 via the front side bus 512, the EPCI bridge 510, the EPCI bus 508, and the HBA 506. Data flows in the opposite direction in a read operation. All data processing, including RAID encoding and decoding, is handled by the CPU 514. 
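&lt;br /&gt;&lt;br /&gt;As an illustration only (it is not part of the patent application), the conventional write path described in paragraph [0005] can be sketched in Python. The hop labels reuse the reference numerals from the text; the function names conventional_write_path and shared_hop_traversals are hypothetical. The sketch only counts how often each component is crossed by one write, which suggests why the shared buses and the single CPU 514 limit throughput.&lt;br /&gt;&lt;pre&gt;
```python
from collections import Counter

# Hops taken by one RAID write in the conventional design of FIG. 5,
# as described in the text: host to CPU, then CPU back out to a disk.
# All names here are illustrative assumptions, not a real API.
# Note: "HBA 506" labels both the host-side and the disk-side adapter,
# because the text uses the same reference numeral for both.
INBOUND = ["host 502", "HBA 506", "EPCI bus 508", "EPCI bridge 510",
           "front side bus 512", "CPU 514"]
OUTBOUND = ["front side bus 512", "EPCI bridge 510", "EPCI bus 508",
            "HBA 506", "disk 504"]


def conventional_write_path():
    """Return the full ordered hop list for a single write."""
    return INBOUND + OUTBOUND


def shared_hop_traversals(path):
    """Count how often each hop is crossed by one write."""
    return Counter(path)


if __name__ == "__main__":
    counts = shared_hop_traversals(conventional_write_path())
    for hop, crossings in counts.items():
        print(hop, crossings)
```
&lt;/pre&gt;In this sketch the EPCI bus, the EPCI bridge and the front side bus are each crossed twice per write, and every write passes through the one CPU.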
&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;SUMMARY OF THE INVENTION&lt;/strong&gt;&lt;br /&gt;[0006] The present invention is directed to a data distribution system and method that substantially obviates one or more of the problems due to limitations and disadvantages of the related art.&lt;br /&gt;[0007] An object of the present invention is to provide a data distribution system that is capable of moving large amounts of data among multiple hosts and devices efficiently by using a scheme of destination control and calculation.&lt;br /&gt;[0008] Additional features and advantages of the invention will be set forth in the descriptions that follow and in part will be apparent from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.&lt;br /&gt;[0009] To achieve these and other advantages and in accordance with the purpose of the present invention, as embodied and broadly described, the present invention provides a data distribution system, which includes one or more crossbar switches and a plurality of access ports. Each crossbar switch has a plurality of serial connections, and is dynamically configurable to form connection joins between serial connections to direct serial transmissions from one or more incoming serial connections to one or more outgoing serial connections. Each access port has one or more serial connections for connecting to one or more crossbar switches, a processor, memory, and an internal bus. Each of a first subset of the plurality of access ports further includes one or more host adapters and/or peripheral device adapters for connecting to one or more hosts and/or peripheral devices, and each of the first subset of access ports is connected to at least one crossbar switch. 
Each of a second subset of the plurality of access ports has one or more input serial connections and one or more output serial connections connected to one or more crossbar switches, and is adapted to perform data processing functions.&lt;br /&gt;[0010] Optionally, one of the crossbar switches is a control crossbar switch connected to all of the plurality of access ports for transmitting control signals among the plurality of access ports, and one of the plurality of access ports is an allocator CPU access port which is connected to the control crossbar switch via a serial connection, the allocator CPU access port being operable to control the other access ports to direct data transmissions between the other access ports connected via crossbar switches.&lt;br /&gt;[0011] It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are intended to provide further explanation of the invention as claimed.&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;BRIEF DESCRIPTION OF THE DRAWINGS&lt;/strong&gt;&lt;br /&gt;[0012] FIG. 1 is a schematic diagram of the basic configuration of a data distribution system according to an embodiment of the present invention.&lt;br /&gt;[0013] FIGS. 2(a)-2(f) schematically illustrate the structure of a data distributor according to embodiments of the present invention.&lt;br /&gt;[0014] FIG. 3 shows an access port.&lt;br /&gt;[0015] FIGS. 4(a) and 4(b) show a data crossbar with connections and join patterns for data write.&lt;br /&gt;[0016] FIG. 5 shows a system configuration for a conventional hardware RAID. 
&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS&lt;/strong&gt;&lt;br /&gt;[0017] In the following description, a RAID (Redundant Array of Inexpensive Disks) system is used as an example of a data storage system, but the invention may be applied to other systems, such as storage networking, storage pooling, storage virtualization and management, distributed storage, data pathways, data switches and other applications where the use of multicast and broadcast with this invention allows for a highly efficient method of moving data.&lt;br /&gt;[0018] FIG. 1 is a schematic diagram illustrating an overview of the basic configuration of a data distribution system according to an embodiment of the present invention. The basic design of the data distribution system includes four sets of parallel components interconnected to one another. Specifically, the system includes a plurality of crossbar switches (XBAR) 102 which connect a plurality of hosts 104, a plurality of peripheral devices 106, and a plurality of processors 108 together. When two or more crossbars 102 are present, each of the crossbars is preferably connected to each of the hosts 104, to each of the data storage devices 106, and to each of the processors 108. The arrows of the connection lines in FIG. 1 indicate the direction of data movement for a data write operation of a storage device (such as RAID); arrows in opposite directions would apply to a data read operation.&lt;br /&gt;[0019] A host 104 is typically a local or remote computer capable of independent action as a master, and may include system threads, higher nested RAID or network components, etc. The plurality of hosts 104 make their demands in parallel and the timing of their demands is an external input to the data distribution system, not controlled by the rest of the system. A queuing mechanism is operated by a processor which may be a specialized one of the processors 108. Such queuing does not involve mass data passing, but only request passing. 
A peripheral device 106 is typically a local or remote device capable of receiving or sending data under external control, and may be a data storage device (disk, tape, network node) or any other device depending on the specific system. The processors 108 may be microprocessors or standard CPUs, or specialized hardware such as wide XOR hardware. They perform required data processing functions for the data distribution system, including RAID encoding and decoding, data encryption and decryption or any related compression and decompression or redundancy algorithms that may relate to mass storage or distributed networks, etc. As described earlier, optionally, one or more processors 108 may be specialized in control functions and control the data flow and the operation of the entire system including other processors. Control of data flow will be described in more detail later with reference to FIGS. 2-4. The arrows in FIG. 1 indicate the basic way data would move for a write operation. Data movement would be different for a read or other operations, as will be described in more detail with reference to FIGS. 2-4.&lt;br /&gt;[0020] By using the crossbars 102 to connect the other components 104, 106 and 108, each peripheral device 106 may serve any math processor 108 and any host 104, and each data math processor 108 may serve any host 104 and any data storage devices 106. In addition, the multiple processors 108 may share among themselves the tasks required by heavy demand from the hosts. Data may flow directly between the peripheral devices 106 and the hosts 104, or through the processors 108, depending on the need of the data distribution scheme.&lt;br /&gt;[0021] FIG. 2(a) shows a more specific example of a data distributor according to an embodiment of the present invention. As shown in FIG. 
2(a), one or more hosts 202 are connected to one or more host access ports (APs) 206 via host connections 204, which may be SCSI, Fibre Channel, HIPPI, Ethernet, one or more T1 or faster lines, or other suitable types of connections. One or more peripheral devices (such as storage disks, other storage devices or block devices) 208 are connected to one or more peripheral device APs 212 via standard drive buses 210, such as SCSI buses, Fibre Channel, ATA, SATA, Ethernet, HIPPI, Serial SCSI and any other physical transport layer, or other suitable types of connections. The host APs 206, the peripheral device APs 212, and one or more CPU-only APs 220 are connected to crossbar switches (data XBARs) 214. The APs 206, 212 and 220 may optionally be connected to a crossbar switch (control XBAR) 218, which is also connected to an allocator CPU AP 222. All connections between a crossbar and an AP are fast serial connections 216. The host APs 206, peripheral device APs 212 and CPU-only APs 220 are connected to the allocator CPU AP 222 by interrupt lines 224.&lt;br /&gt;[0022] To avoid overcrowding the drawings, only one component of each kind is shown in FIG. 2(a), and the labels "*1", "*2", etc. designate the number of the corresponding component present in the system (when not indicated, the number of components is one). In addition, each illustrated connection line (e.g. the host connections 204, the standard drive buses 210, the serial connections 216 and the interrupt lines 224) represents a group of connections, each connecting one device at one end of the line with one device at the other end of the line. For example, six (*6) peripheral devices 208 are connected to three (*3) peripheral device APs 212 via six (*6) standard drive buses 210. As another example, two (*2) data crossbars 214 are present, and each data crossbar is connected to each of the host APs 206, each of the peripheral device APs 212 and each of the CPU-only APs 220. 
Accordingly, six (*6) serial connections 216 are present between three (*3) peripheral device APs 212 and two (*2) data crossbars 214. Of course, the numbers of components shown in FIG. 2(a) are merely illustrative, and other numbers of components may be used in a data distributor system. For example, multiple (two) data crossbars 214 are shown in FIG. 2(a), but the data distributor may be implemented with a single crossbar (so long as the total number of required connection joins does not exceed the maximum for a crossbar). The system shown in FIG. 2(a) is a complex example of a data distributor, and not all components shown here are necessarily present in a data distributor, as will become clear later.&lt;br /&gt;[0023] In the data distributor of FIG. 2(a), both data and control signals are transmitted through the nodes (the host APs 206, peripheral device APs 212, and CPU-only APs 220). Typically, control signals or commands refer to signals that reprogram the access ports or affect their actions through program branching other than by transmission monitoring, transmission volume counting or transmission error detection. Data typically refers to signals, often clocked in large blocks, that are transmitted and received under the control of the programs in the hosts, peripheral devices, and access ports, without changing those programs or affecting them except through transmission monitoring, transmission volume counting and transmission error detection.&lt;br /&gt;[0024] In operation, data flows between nodes as directed by the paths in data crossbars 214. The allocator CPU AP 222, which is the master of the data distributor 200, controls the APs 206, 212 and 220 by transmitting control commands to these APs through the control crossbar 218, and receiving interrupt signals from the APs via the interrupt lines 224. 
The allocator CPU AP 222, under boot load or program control, transmits commands to other APs, receives interrupt or control signals from other APs as well as from hosts, peripheral devices, or other components (not shown) of the network system such as clocks, and synchronizes and controls the actions of all of these devices. The data crossbar 214 is controlled in-band by any sending AP, which is accomplished by preloading the data stream from the sending AP with crossbar in-path commands. For example, a data stream sent from the host AP 206 to the data XBAR 214 may contain a command header that instructs the data XBAR 214 to "multi-cast" the data stream to a plurality of peripheral device APs 212. The host AP may receive its instructions from the allocator CPU AP 222. The receiving peripheral device APs 212 may receive instructions from the allocator CPU AP on what to do with the data received from the data XBAR 214.&lt;br /&gt;[0025] The structure of an access port (AP) is schematically illustrated in FIG. 3. The AP 300 typically includes a bus 302, one or more CPUs 304, CPU RAM 314 (such as 128 MB of 70 ns DRAM), RAM 306 (such as 256K to 512K of fast column RAM), one or more interrupt lines 308, and one or more serial connections 310. The bus 302 may have a typical speed of 533 MB/sec or higher. The serial connections may be fast serial connections matched to the crossbar to which the AP is connected, and capable of communicating data and/or control commands in either direction. The AP 300 optionally contains a ROM (not shown) for an allocator or other coding applications. The AP 300 may also contain an IO adapter 312 capable of connecting to one or more hosts or peripheral devices, via SCSI, ATA, Ethernet, HIPPI, Fibre Channel, or any other useful hardware protocol. The IO adapter provides both physical transmission exchanges and transmission protocol management and translation. 
Because of the presence of the CPU RAM 314, the AP 300 is capable of running a program, accepting commands via a serial connection 310, sending notification via an interrupt line 308, etc. By using RAM 306, the AP 300 is capable of buffering data over delays caused by task granularity, switch delays, and irregular burdening. Of course, a single RAM or other RAM combination may be used in lieu of the CPU RAM 314 and the RAM 306 shown in FIG. 3 as the processor's internal memory; preferably, such RAM or RAM combination should allow the AP to perform the above described functions at a satisfactory speed. The AP 300, under program control, is capable of creating, altering, combining, deleting or otherwise processing data by programmed computation, in addition to transmitting or receiving data to or from hosts and peripheral devices. The programming on the APs is preferably multitasked to run control code in parallel with data transmissions.&lt;br /&gt;[0026] Depending on the presence or absence of the adapter and, if present, the type of the adapter, an AP may be (1) a peripheral device AP (such as devices 212 in FIG. 2(a)) adapted for connection to one or more peripheral devices, (2) a host AP (such as devices 206 in FIG. 2(a)) adapted for connection to one or more hosts, (3) a CPU-only AP (such as device 220 or 222 shown in FIG. 2(a)) lacking an adapter, or other suitable types of APs. For example, the adapter in a peripheral device AP 212 may be a SCSI HBA, and the adapter in a host AP 206 may be an HIPPI adapter. Alternatively, a single adapter suitable for connection to either hosts or peripheral devices may be used, and the AP may thus be a host/peripheral device AP. A host AP and peripheral device AP may be one-sided at any given time from the adapter and data serial line point of view. 
"One-sided" refers to certain interface types where the interface requires an initiator and a target that behave differently from each other; "one-sided" does not mean that data flows in one direction. However both the host AP and the peripheral device AP is preferably capable of transmission in both directions over their lifetime. Preferably, the programming for a host AP 206 causes it to look like a slave (such as a SCSI "target") to the host(s) 202 on its adapter, while the programming for a peripheral device AP 212 causes it to look like a master (such as a SCSI "initiator") to the peripheral device(s) 208 on its adapter. A host/peripheral device AP switches between these two rolls. The slave/master distinction is independent of which direction the data flows.&lt;br /&gt;[0027] A CPU-only AP lacks a host or peripheral device adapter, and is typically used for heavy computational tasks such as those imposed by data compression and decompression, encryption and decryption, and RAID encoding and decoding. A CPU-only AP typically requires two serial connections 310, i.e., both input and output serial data connections simultaneously.&lt;br /&gt;[0028] A special case of a CPU-only AP is the Allocator CPU AP (device 222 in FIG. 2(a)). Unlike other APs, which each has an output interrupt line, the Allocator CPU AP has several input interrupt lines. Also unlike other APs, it does not require serial data connections for transmitting data; it requires only serial control connections for transmitting control signals. It is typically supplied with a larger CPU RAM 314 to run the master control program, which may be placed on an onboard ROM, or transmitted in through an optional boot load connection.&lt;br /&gt;[0029] As is clear from the above description, not all components shown in FIG. 3 are required for an AP. The minimum requirement for an AP is an internal bus 302, a CPU 304, a RAM 306 or 314, and a serial connection 310. 
As will be described later with reference to FIGS. 2(b)-2(e), the interrupt lines 308 (224 in FIG. 2(a)) may be omitted and their function may be performed by a serial connection 310.&lt;br /&gt;[0030] A crossbar switch (XBAR) is a switching device that has N serial connections, and up to N(N-1)/2 possible connection joins each formed between two serial connections. A typical crossbar may have N=32 serial connections. It is understood that "serial connections" here refer to the ports or terminals in the crossbar that are adapted for fast serial connections, which ports or terminals may or may not be actually connected to other system components at any given time. In use, a subset of the N(N-1)/2 possible connection joins may be activated and connected to other system components, so long as the following conditions are satisfied. First, at a minimum, each activated connection join connects one device that transmits data and one device that receives data. Second, no two connection joins share a data receiving device. The access ports connected to the crossbars, under program control, control the crossbar switches by rearranging the serial transmission connections to be point to point (uni-cast), one to many (multi-cast) or one to all (broadcast). Preferably, rearrangement occurs when the previous transmissions through the switch are complete and new transmissions are ready. Thus, the crossbar can be configured dynamically, allowing the crossbar configurations to change whenever necessary as required by the data distribution scheme.&lt;br /&gt;[0031] FIGS. 4(a) and 4(b) illustrate two examples of connection join patterns of a data crossbar in normal host (uni-cast or point to point) and rapid host (Multi-cast) setups, respectively, for data write. The configurations for data read may be suitably derived; for example, in the case of FIG. 4(a), data read may be illustrated by reversing the direction of the arrows. FIG. 
4(a) is an example for a RAID5 or RAIDn write, at a time when the parity calculation for a previous stripe is completed, and the parity calculation for the next stripe is just starting. In this exemplary system, each of two data crossbars 404a and 404b is connected to a host AP 402, to each of two CPU-only APs 406a and 406b, and to each of three peripheral device APs 408a, 408b and 408c, via fast serial connections. In particular, the connections between each data crossbar and each CPU-only AP include a pair of fast serial connections as shown in the figure, for example, connection 410a from the first data crossbar 404a to the first CPU-only AP 406a, and connection 410b in the reverse direction. The two data crossbars 404a and 404b have identical configurations, and each may receive every second bit, byte, block or unit of the data, for instance, for increased throughput.&lt;br /&gt;[0032] The dotted lines 412a, 412b and 412c shown within the data crossbars represent connection joins, i.e. the path of data movement between connections. In this particular example, data moves in a direction from left to right for data write (and reversed for data read, not shown). Specifically, at this stage of a RAID5 or RAIDn write, data is moving from the host AP 402 to the CPU-only AP 406a (via path 412a) to start the new parity calculation for the next stripe, as well as to the peripheral device AP 408c (via path 412b) for storage. The parity data calculated by the CPU-only AP 406b for the previous stripe is moving from that AP to another peripheral device AP 408a for storage. In the illustrated example, two CPU-only APs are employed, but other configurations are also possible.&lt;br /&gt;[0033] The crossbar configuration in FIG. 4(b) is an example for a RAID0 write, or a RAID5 or RAIDn write of a non-parity block. This configuration is similar to that of FIG. 4(a), but only one data crossbar 404 is involved in the operation. 
Data moves from the host 402 to each of the three peripheral device APs 408a, 408b and 408c via paths 412d, 412e and 412f. This configuration is advantageous in a situation where the host AP 402 has approximately the same connection speed as the total throughput of all connected peripheral device APs. Using this configuration, each data packet from the host AP is broadcast to all peripheral device APs, and data received may be selectively used or thrown away by the peripheral devices, without sacrificing system speed.&lt;br /&gt;[0034] In general, the APs, under program control, are capable of accumulating data in their RAMs and buffering the data as appropriate for the efficient interleaving and superimposing of transmissions through crossbar switches.&lt;br /&gt;[0035] One specific application of the data distributors according to embodiments of the present invention is a RAID data storage system, where a plurality of disks are connected to the data crossbar via disk APs. Various RAID configurations include RAID0, RAID1, RAID10 (also referred to as combined RAID), RAID5, RAIDn (which is ideally tuned for this invention), etc. In a RAID0 configuration, each bit, byte or block of data is written once in one of the disks. In a RAID0 write operation in the conventional system (FIG. 5), the data goes from one host 502 sequentially to all of the disks 504. Each disk 504 receives a number of blocks (including one block) and then the next disk becomes active. In the conventional system (FIG. 5), the EPCI bus 508 is traversed seven times assuming a write to all of six Disks 504. Thus, assuming a bus speed of 266 MB/Sec, the maximum transfer rate would be 38 MB/Sec (266 MB/Sec divided by seven). Using the data distributor according to an embodiment of the present invention (FIG. 2(a)), data can be broadcast from the host APs to the disk APs simultaneously, and the disk APs selectively use or throw away the received data. 
Using the data distributor according to another embodiment (FIG. 4(b)), when the Host 402 and Disk 408 busses are identical in speed to the conventional system example above (ULTRA SCSI 320), by using the data distributor design, the maximum transfer rate will be 320 MB/Sec., i.e., limited by the Host bus speed only. Further, by using a subtractive logic approach, the Disk APs would simply ignore or delete the received data that would not be sent to their respective Disks.&lt;br /&gt;[0036] In a RAID10 configuration (using a six-disk RAID as an example), a RAID0 of three disks is mirrored by an identical RAID0 of three disks. The read of a RAID10 is equivalent to a RAID0 by alternating mirror selection stripe by stripe in the standard way. For RAID10 writes, two writes (to two disks) are performed for every read (from a host). In the conventional system (FIG. 5), each of the two HBAs gets its copy from the RAM 516. In the data distributor (FIG. 2), data flows once from the host 202 via the host AP 206 to the data crossbar 214, and two copies are sent by the crossbar to two disks 210 via two disk APs 212.&lt;br /&gt;[0037] In a RAID5 storage system, parts of the data written to the disks are data provided by the user (user data) and parts of the data are redundancy data (parity) calculated from the user data. For example, a six-disk array may be configured so that six blocks of data are written for every five blocks of user data, with one block being parity. Data read for RAID5 is similar to the RAID0 and the RAID10 read in efficiency. Data write for RAID5 involves the steps of fetching five blocks of user data, calculating one block of parity, and storing the parity block, as follows:&lt;br /&gt;a&gt;fetch Block0&lt;br /&gt;b&gt;fetch/XOR Block1&lt;br /&gt;c&gt;fetch/XOR Block2&lt;br /&gt;d&gt;fetch/XOR Block3&lt;br /&gt;e&gt;fetch/XOR Block4&lt;br /&gt;f&gt;store Block5&lt;br /&gt;[0038] In the conventional system (FIG. 
5), for every five blocks of user data, five blocks of data would move from the host 502 to the RAM 516, five from the RAM to the CPU 514 through the front end bus 512 in the fetching steps, one from the CPU to the RAM through the front end bus in the storing step, and six from the RAM to the disk 504. In the data distributor according to an embodiment of the present invention (FIG. 2(a)), the data crossbar 214 may be used to distribute data efficiently. For example, in the five fetching steps, the user data blocks may be directed simultaneously through the data crossbar to two destinations, the disk APs 212 and the CPU-only APs 220 (fetch). In addition, the block of parity data from the previous calculation may be directed to its parity disk for store at the same time the current first fetching step a&gt; is being performed, by an independent transmission through the data crossbar. It can be seen that the data distributor according to embodiments of the present invention increases the efficiency of data distribution in the RAID systems.&lt;br /&gt;[0039] Referring back to FIG. 2(a), in the data distributor system shown therein, data flow of the entire system is controlled by the Allocator CPU AP 222. The Allocator AP 222 communicates with the nodes (the host APs 206, the peripheral device APs 212 and the CPU-only APs 220) via serial connections 216 (through the control XBAR 218) and the interrupt lines 224. In addition, by setup transmissions to the nodes, the Allocator CPU AP 222 notifies the nodes to perform particular actions according to the desired data distribution scheme. The command transmissions and setup transmissions are typically small, infrequent and quick. The nodes then control the XBARs by transmitting commands to the XBAR in-band. 
The data processing is primarily handled by the CPU-only APs 220; the Allocator CPU AP 222 merely directs large amounts of data to individual math processors after notifying them of their task, but does not actually process each byte of data. Thus, this data distribution system reduces the amount of data processing performed by any central processor, and increases the speed and efficiency of data distribution. For example, if data transmission is handled in packets of at least 512 bytes, and control by the CPU is managed by a single 4-byte word, an improvement of more than 100-fold may be achieved over a conventional system in which data processing (e.g. encoding and decoding) is performed by a central CPU.&lt;br /&gt;[0040] FIGS. 2(b)-2(e) illustrate alternative structures of a data distributor according to other embodiments of the present invention. Like components in FIGS. 2(b)-2(e) are designated by like or identical reference symbols as in FIG. 2(a). In the structure of FIG. 2(b), the interrupt lines are eliminated, and the Allocator CPU AP 222 communicates with the nodes 206, 212 and 220 in both directions via serial connections 216 and the control XBAR 218.&lt;br /&gt;[0041] In the structure of FIG. 2(b), the Allocator CPU AP 222 may be eliminated and its functions performed by one of the CPU-only APs 220, and/or the control XBAR 218 may be eliminated and its functions performed by one of the data XBARs 214. FIG. 2(c) illustrates a structure where the Allocator CPU AP 222 is eliminated and its functions performed by one of the CPU-only APs 220. This CPU-only AP 220 is connected to the host APs 206 and the peripheral device APs 212 via serial connections 216 and the control XBAR 218. FIG. 2(d) illustrates a structure where the control XBAR 218 is eliminated and its functions performed by one of the data XBARs 214. 
The Allocator CPU AP 222 is connected to this data XBAR 214 via a serial connection 216 and communicates with the nodes 206, 212 and 220 via this data XBAR 214. This is referred to as in-path communication. FIG. 2(e) illustrates a structure where the Allocator CPU AP 222 is eliminated and its functions performed by one of the CPU-only APs 220, and the control XBAR 218 is eliminated and its functions performed by one of the data XBARs 214. Comparing FIG. 2(e) with the overview illustration of FIG. 1, it may be seen that components 202 and 206 correspond to component 104, components 212 and 208 correspond to component 106, component 220 corresponds to component 108, and component 214 corresponds to component 102.&lt;br /&gt;[0042] In the structures of FIGS. 2(b)-2(e), the APs have structures similar to that shown in FIG. 3 but without the interrupt line(s) 308. Other aspects of the alternative structures of FIGS. 2(b)-2(e) are identical or similar to those described above for the structure of FIG. 2(a).&lt;br /&gt;[0043] It will be apparent to those skilled in the art that various modifications and variations can be made in a data distribution system and method of the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention cover modifications and variations that come within the scope of the appended claims and their equivalents.&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;What is claimed is:&lt;/strong&gt;&lt;br /&gt;1. 
A data distribution system for distributing data among components of a data processing system including hosts and peripheral devices, the system comprising: one or more crossbar switches each having a plurality of serial connections, each crossbar switch being dynamically configurable to form connection joins between serial connections to direct serial transmissions from one or more incoming serial connections to one or more outgoing serial connections; and a plurality of access ports each having one or more serial connections for connecting to one or more crossbar switches, a processor, a memory, and an internal bus, wherein each of a first subset of the plurality of access ports further includes one or more host adapters and/or peripheral device adapters for connecting to one or more hosts and/or peripheral devices, and each is connected to at least one crossbar switch, and wherein each of a second subset of the plurality of access ports includes one or more input serial connections and one or more output serial connections connected to one or more crossbar switches, and is adapted to perform data processing functions.&lt;br /&gt;2. The data distribution system of claim 1, wherein at least one of the plurality of access ports is an allocator CPU access port which is connected to at least one crossbar switch via a serial connection, the allocator CPU access port being operable to control the other access ports to direct data transmissions between the other access ports connected via the crossbar switches.&lt;br /&gt;3. The data distribution system of claim 2, further comprising interrupt lines connected between the allocator CPU access port and the other access ports.&lt;br /&gt;4. The data distribution system of claim 1, wherein one of the crossbar switches is a control crossbar switch connected to all of the plurality of access ports for transmitting control signals among the plurality of access ports.&lt;br /&gt;5. 
The data distribution system of claim 1, wherein the crossbar switches are dynamically configured in-band by one or more access ports connected thereto.&lt;br /&gt;6. The data distribution system of claim 1, wherein each of the first subset of access ports is operable to transmit or receive data to or from hosts or peripheral devices.&lt;br /&gt;7. The data distribution system of claim 1, wherein at least some of the access ports are operable to buffer data within the access ports and to transmit buffered data in an interleaving or superimposing manner through crossbar switches.&lt;br /&gt;8. The data distribution system of claim 1, wherein a crossbar switch is configured to simultaneously direct an incoming serial transmission from one sending access port to a plurality of receiving access ports, each receiving access port either discards the transmission, or utilizes the transmission for further processing or transmission.&lt;br /&gt;9. The data distribution system of claim 1, wherein the second subset of access ports operate to perform parallel computations and are connected with a plurality of crossbar switches.&lt;br /&gt;10. The data distribution system of claim 1, wherein the second subset of access ports operate individually or in parallel to compute RAID and/or RAIDn parity encoding and decoding.&lt;br /&gt;11. The data distribution system of claim 1, wherein the second subset of access ports operate individually or in parallel to compute data encryption and decryption.&lt;br /&gt;12. The data distribution system of claim 1, wherein any of the first subset of access ports operate in a uni-cast mode, a multicast mode, and/or a broadcast mode.&lt;br /&gt;13. The data distribution system of claim 1, wherein the host adapters and/or peripheral device adapters provide both physical transmission exchange and transmission protocol management and translation.&lt;br /&gt;14. 
A data distribution system for distributing data among components of a data processing system including hosts and peripheral devices, the system comprising: one or more crossbar switches each having a plurality of serial connections, each crossbar switch being dynamically configurable to form connection joins between serial connections to direct serial transmissions from one or more incoming serial connections to one or more outgoing serial connections; and a plurality of access ports each having one or more serial connections for connecting to one or more crossbar switches, a processor, a memory, and an internal bus, wherein each of a first subset of the plurality of access ports further includes one or more host adapters and/or peripheral device adapters for connecting to one or more hosts and/or peripheral devices, and each is connected to at least one crossbar switch, wherein each of a second subset of the plurality of access ports includes one or more input serial connections and one or more output serial connections connected to one or more crossbar switches, and is adapted to perform data processing functions, wherein one of the crossbar switches is a control crossbar switch connected to all of the plurality of access ports for transmitting control signals among the plurality of access ports, and wherein at least one of the plurality of access ports is an allocator CPU access port which is connected to the control crossbar switch via a serial connection, the allocator CPU access port being operable to control the other access ports to direct data transmissions between the other access ports connected via crossbar switches.&lt;br /&gt;15. 
The data distributor system of claim 14, further comprising interrupt lines connected between the allocator CPU access port and the other access ports.&lt;br /&gt;&lt;br /&gt;Browse Industry: &lt;a href="http://www.freshpatents.com/Error-detection-correction-and-fault-detection-recovery-dtnewntc714.php"&gt;USPTO Class 714&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/35811827-116053025350687966?l=kris-land.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kris-land.blogspot.com/feeds/116053025350687966/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=35811827&amp;postID=116053025350687966' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/35811827/posts/default/116053025350687966'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/35811827/posts/default/116053025350687966'/><link rel='alternate' type='text/html' href='http://kris-land.blogspot.com/2006/10/kris-land-recent-patents-listed.html' title='Kris Land, Recent patents listed: 20050097388 - Data distributor'/><author><name>Kris Land</name><uri>http://www.blogger.com/profile/14262331306208973883</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-35811827.post-116051241391282975</id><published>2006-10-10T13:28:00.000-07:00</published><updated>2006-10-10T13:33:34.023-07:00</updated><title type='text'>Technology thwarts RAID data loss</title><content type='html'>&lt;a href="http://www.findarticles.com/p/articles/mi_zdewk"&gt;eWEEK&lt;/a&gt;,  &lt;a href="http://www.findarticles.com/p/articles/mi_zdewk/is_200012"&gt;December, 
2000&lt;/a&gt; , by &lt;a href="http://www.findarticles.com/p/search?tb=art&amp;qt=%22Kris+Land%22"&gt;Kris Land&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Land-5 Corp. has developed a software-based technology that gives RAID storage systems the ability to sustain multiple drive failures without risking data loss.&lt;br /&gt;&lt;br /&gt;Based on 10 years of R&amp;D, Land-5's RAIDn technology lets IT managers select how many disk drive failures a storage subsystem can sustain without risking data loss.&lt;br /&gt;&lt;br /&gt;"The last enhancement to RAID was 15 years ago," said Kris Land, founder and chief technology officer of Land-5, in San Diego. "Yet every single storage system on the planet runs on 15-year-old technology. It is about time to update it."&lt;br /&gt;&lt;br /&gt;Thus far, RAID Level 1+5, which offers both mirroring and parity, can handle up to three disk drive failures.&lt;br /&gt;&lt;br /&gt;"For the first time, you have the ability on a drive-per-drive basis to select the level of insurance you want on your RAID box," Land said. "You get to choose the level you want by simply choosing that number."&lt;br /&gt;&lt;br /&gt;One systems integrator believes Land-5 is raising the bar for RAID technology.&lt;br /&gt;"This is going to change the way the world does [RAID]," said Lee Elizer, president and CEO of DataThink Inc., in Boulder, Colo. "RAIDn needs some evangelizing, but this is the RAID of the future."&lt;br /&gt;&lt;br /&gt;Land-5 officials said they expect to begin beta tests on RAIDn within two weeks.&lt;br /&gt;Currently, RAIDn technology has been implemented in Linux operating system software. Land said it will also be available in firmware for hardware controllers in 30 to 60 days, and it will be implemented in an application-specific integrated circuit in about 90 days.&lt;br /&gt;&lt;br /&gt;One of the biggest challenges customers face with RAID has been deciding which level to choose. 
If customers wanted higher reliability, they sacrificed cost, capacity and performance. If customers wanted higher performance, the trade-off was lower reliability.&lt;br /&gt;&lt;br /&gt;For instance, RAID 0, which stripes data across drives, offers maximum performance capacity, but reliability is sacrificed because if one drive is lost, data is lost.&lt;br /&gt;&lt;br /&gt;RAID 1, which mirrors data across drives, offers higher reliability because it can handle at least one drive loss, but it gives lower performance.&lt;br /&gt;&lt;br /&gt;"This is going to change the way people think," Land said. "Now you only have to ask two questions: What are the total number of drives you want to use, and what is the total level of redundancy you want on them?"&lt;br /&gt;&lt;br /&gt;Kris Land is the chief technical officer of LAND-5 Corporation (San Diego, CA).&lt;br /&gt;Copyright © 2004 Ziff Davis Media Inc. All Rights Reserved. Originally appearing in eWEEK&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/35811827-116051241391282975?l=kris-land.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kris-land.blogspot.com/feeds/116051241391282975/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=35811827&amp;postID=116051241391282975' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/35811827/posts/default/116051241391282975'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/35811827/posts/default/116051241391282975'/><link rel='alternate' type='text/html' href='http://kris-land.blogspot.com/2006/10/technology-thwarts-raid-data-loss.html' title='Technology thwarts RAID data loss'/><author><name>Kris 
Land</name><uri>http://www.blogger.com/profile/14262331306208973883</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-35811827.post-116050980259972370</id><published>2006-10-10T12:49:00.000-07:00</published><updated>2006-10-10T12:50:02.863-07:00</updated><title type='text'>The user experience</title><content type='html'>&lt;a href="http://www.findarticles.com/p/articles/mi_m0REL"&gt;RELease 1.0&lt;/a&gt;, &lt;a href="http://www.findarticles.com/p/articles/mi_m0REL/is_n4_v93"&gt;April 23, 1993&lt;/a&gt;, by &lt;a href="http://www.findarticles.com/p/search?tb=art&amp;qt=%22Kris+Land%22"&gt;Kris Land&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Most of what we have described so far just helps users collect even more names that need to be replicated everywhere. Luckily, next-generation OSes will support API-addressable directory services and address-exchange protocols. That means a user need maintain only one address book locally. The OS will make that list available to all a user's applications, replacing her address books with the central book.&lt;br /&gt;&lt;br /&gt;The X.500 spec says nothing about the user interface. That's where vendors can differentiate themselves. MAPI, OCt, PenPoint, Telescript, MIME (the Multimedia Internet Message Exchange) and other such systems will act as the glue that allows users to choose the front-end they prefer, yet still be good citizens in the distributed systems architecture. 
To succeed, the companies leading these efforts have to agree on interchange standards.&lt;br /&gt;&lt;br /&gt;Wouldn't you like to be able to have the Resources &amp;amp; Phone Numbers at the end of each issue of Release 1.0 put automatically in your PIM?&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;Super PIM?&lt;br /&gt;&lt;/strong&gt;PIMs will be effective tools only when they are aware of the surrounding infrastructure and can collaborate with it. The ideal PIM is a smart shell that owns and stores as little information locally as possible. Where it can, it funnels queries to other applications or data stores. The address-book facet of a PIM helps users set up ad-hoc conference calls or create and disband workgroups. Users shouldn't have to worry about the intricacies of addressing schemes and source routing; PIMs should hand addresses to other applications in the proper transmission format.&lt;br /&gt;&lt;br /&gt;"If users have to spend more than one minute a day maintaining their addresses, it won't work."&lt;br /&gt;-- Mark Jackson, AT&amp;T EasyLink&lt;br /&gt;&lt;br /&gt;As we mentioned already (page 3), some applications and operating systems are moving in this direction, such as GO's PenPoint, which includes a Dialing Location Sheet. Today, every time the user changes status by leaving the office or traveling to a different area code, she has to find the Sheet and manually change the settings. A future version will sport an easy-to-change menu item. Long term, portable devices will monitor their own status and switch settings automatically. They will also communicate their status to message hubs via wireless links. That should iron the kinks out of many offerings that are just not convenient enough today.&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;OCTUS: PLAY PONG WITH YOUR CALLERS&lt;/strong&gt;&lt;br /&gt;As important as an individual's PIM is the interface that facilitates interaction between individuals. 
Octus has done a nice job of combining GUI ease of use with call-management functions usually missing from pc products.&lt;br /&gt;&lt;br /&gt;Nolan Bushnell, creator of the Pong videogame that helped launch this industry, and founder of Atari and the Chuck E. Cheese Pizza Parlors, is at it again. Nolan, Kris Land [12] and a team of 30 engineers are working to resolve the problems we encounter as we wrestle with our phones, e-mail, faxes, etc. Their approach starts locally, with address books and small workgroups; it's the other end of the spectrum from the Brobdingnagian, network-based services we described earlier.&lt;br /&gt;&lt;br /&gt;With Subway, Octus' product suite, users can place calls, swap business-card data with other Subway users during (or before) a conversation and decide what to do with incoming calls -- answer, take voicemail, play a message ("hold on, I'll be right with you") or forward them elsewhere -- automatically. Subway holds contact information about others you communicate with. It also monitors the recency and frequency of calls, and tries to make those numbers handiest.&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;Make the possible, visible&lt;br /&gt;&lt;/strong&gt;When you try to communicate, Subway's interface will make visible what is possible at any given time. Want to send e-mail to Zoe? She has no e-mail account listed, but we can turn your note into a fax and send it on its way. Subway, currently in beta testing, will eventually feature voice recording and playback, and will act as a simple voicemail or voice-response system, or record and play back clips of audio, including your phone conversations. Subway requires the installation of an OctoBox at each desk, between the workstation and phone.&lt;br /&gt;&lt;br /&gt;More interesting, though, are the ways Subway helps manage communication in a small workgroup. 
For example, the first time Zoe calls, a receptionist can log her into Octus's address book, which is no more effort than typing the information into e-mail phone messages. The next time she calls, the receptionist can pick her name from the existing list (if Caller ID doesn't do it first), which brings up a flash note with all the right information (as long as she isn't calling from a friend's office). The receptionist sends it to the person being called, who can associate different rings with different callers and manage the call with the options described above. Octus eventually plans to add more powerful rule-building capabilities, and perhaps support for wireless phones. Subway is for office workers who must receive phone calls and faxes often, and who have been losing the assistants they once took for granted.&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;APPENDIX: OUR OWN LITTLE PROBLEMS&lt;br /&gt;&lt;/strong&gt;Our messaging adventure starts with our telephone numbers -- one for Connecticut, one for Manhattan, plus a fax in each place. (Lucky we don't use a cellular phone or pager yet.) Our 700 number (0-700-CURIOUS) is handy for people who need to call us often, but not simple enough to give to just anyone (see page 10). We never (well, except when talking to AT&amp;amp;T or NCR people) leave the 700 number in phone messages.&lt;br /&gt;&lt;br /&gt;When we first subscribed, we forwarded the number often, especially because it was easy: The system would pick up our inbound number, if possible, and use that as a default assignment. But AT&amp;amp;T removed the Caller ID feature because it could be misused. 
Now we usually leave the 700 number pointing to our own voicemail number (we're not comfortable relying on hotels to collect and forward all the messages), and use the 700 number to check our own message bin; the 700 number is less expensive than calling cards.&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;Telecommuting is no picnic&lt;/strong&gt;&lt;br /&gt;When messages arrive for us in the Manhattan office, we have no automatic way to receive them in Connecticut, since the in-office message system is Notework (from the company of the same name), a TSR that's handy on DOS machines, but weak on external connectivity. Urgent callers are asked to call Connecticut; other messages await our trips to Manhattan.&lt;br /&gt;&lt;br /&gt;The EDventure office uses BeyondMail to connect to the rest of the world, via an MHS gateway to MCI Mail and CompuServe. BeyondMail and Notework don't communicate. We have installed BeyondMail on our Connecticut pc, as well as on our notebook pc (a Safari on loan from NCR). We would like to have this environment available on the Safari, but we're not that far yet. Instead, we log into the WELL through the CompuServe packet network using Software Ventures' MicroPhone Pro and check our mail.&lt;br /&gt;&lt;br /&gt;The MHS gateway works nicely with the EDventure office in Manhattan (after much frustration with the notoriously troublesome Telepath modem from Gateway), but is not working with CompuServe. (BeyondMail for Windows sends attachments as a default with each message; CompuServe's MHS gateway coughs them back to us; a patch is on the way.) We have to exit BeyondMail to run the MHS gateway. In fact, we can generally only have one communication program open at a time, or the modem locks up. 
We also use Delrina's WinFax, though we suspect the Telepath gremlins have been at work again, because our fax/modem transmissions aren't reliable.&lt;br /&gt;&lt;br /&gt;Our Gateway 486 is armed with AG Communication Systems' WindowPhone system, a card and software that enables a pc to detect Caller ID information, then looks up the number (if you've entered it, and if the call is from a direct line). Unfortunately, it doesn't have much opportunity, since most of our calling is interstate and no agreements on the handoffs of Caller ID data between carriers have been struck. Also, only numbers served locally by modern switches can transmit Caller ID, further reducing coverage. WindowPhone does let us autodial, which is convenient; however, it's another list of names and numbers to maintain.&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;A messaging maelstrom&lt;/strong&gt;&lt;br /&gt;We have a series of online accounts and e-mail addresses. From an Internet perspective, these are: jmichalski@RadioMail.net (for a Viking Express mobile radio messaging loaner we're about to surrender -- thanks, RAM Mobile); spiff@WELL.sf.ca.us (the WELL, our principal e-mail bin and online haunt); 76367.2432@CompuServe.com; Rheo@AOL.com (America Online), and Jerry. BMail@Rheomode.MHS.CompuServe.com (a BeyondMail-MHS-CompuServe triple play). All have different passwords and need to be connected to separately (not all support forwarding of messages). Ironically enough, that last, most complex address is where we'd like mail to go, because it is the only way we can read separate messages, rather than work in a terminal session (we tried several front-end programs; TapCIS's interface was too rough, and Sweeper crashed our machine, despite two downloads from the WELL).&lt;br /&gt;&lt;br /&gt;We don't mess with this many services because we want to be difficult or clever or even for the opportunity to test each one; we do it because each network has different resources. 
AOL's subscribers are different from CompuServe's and from the WELL's, for instance. Unless these services adopt some common storage and interchange formats, as well as common offline readers, we'll be doing this dance forever.&lt;br /&gt;&lt;br /&gt;We're about to return the Viking Express kit we've had on loan from RAM. It's been trusty, even impressive (send a friend e-mail while riding an airport jitney), but carrying it plus the NCR Safari 3170 notebook plus associated recharging subsystems is a bit much. Because we don't have the HP 95LX Connectivity Pack, which would let us upload files directly, we forward e-mail that we want to keep to our WELL account.&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;What about groupware? File synchronization?&lt;br /&gt;&lt;/strong&gt;It gets worse. In addition to paper, we publish our newsletter via Amix, which has a less-than-ideal user interface. We have Lotus Notes installed in Connecticut, but it apparently hates the Telepath modem, so we have not used it yet to synchronize with servers at New Science Associates, the ITAA or Lotus. We loaded Microcom's Carbon Copy, but it doesn't support our monitor's resolution, so we decided to do without rather than change permanently to a lower graphics setting. Lotus Organizer is an elegant PIM, but its address book isn't very flexible, and synchronizing between Organizer on the Gateway and the Safari is difficult, so we've postponed using it. To transfer bigger files between the two, we use Traveling Software's LapLink (for which we have to unplug the mouse, change the autoexec file and reboot the Gateway several times). Needless to say, we don't do that very often.&lt;br /&gt;&lt;br /&gt;Practically every one of the systems we've just described has an address book. None of them talk to each other. Worse still, our old contacts are in text files, exported from Apple's HyperCard and not imported into a database or address book because we just can't figure out which one to use. 
Besides, importing and exporting functions are never smart enough to sort out field separators and address formats. There may well be ways to work around some of the things we've just described (we certainly hope so), but we haven't found them. So far, entropy and chaos rule.&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;The next time someone gives you a business card, it may be electronic&lt;br /&gt;&lt;/strong&gt;[12] Land's company, Paradox Development (now merged with Octus), built a networked fax server called FaxConductor, which captures, OCRs and routes inbound faxes automatically (if senders put double parentheses around an inbound routing number). Octus is merging FaxConductor with Subway.&lt;br /&gt;&lt;br /&gt;Kris Land is the chief technical officer of OCTUS Inc. (San Diego, CA).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/35811827-116050980259972370?l=kris-land.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kris-land.blogspot.com/feeds/116050980259972370/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=35811827&amp;postID=116050980259972370' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/35811827/posts/default/116050980259972370'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/35811827/posts/default/116050980259972370'/><link rel='alternate' type='text/html' href='http://kris-land.blogspot.com/2006/10/user-experience.html' title='The user experience'/><author><name>Kris Land</name><uri>http://www.blogger.com/profile/14262331306208973883</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-35811827.post-116050858375389248</id><published>2006-10-10T12:20:00.000-07:00</published><updated>2006-10-10T12:29:43.766-07:00</updated><title type='text'>Insuring The Reliability Of Fibre Channel RAID Storage - Industry Trend or Event</title><content type='html'>&lt;a href="http://www.findarticles.com/p/articles/mi_m0BRZ"&gt;Computer Technology Review&lt;/a&gt;,  &lt;a href="http://www.findarticles.com/p/articles/mi_m0BRZ/is_1_20"&gt;Jan, 2000&lt;/a&gt;  by &lt;a href="http://www.findarticles.com/p/search?tb=art&amp;qt=%22Kris+Land%22"&gt;Kris Land&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;A major benefit of Storage Area Networks is fast "any to any" server or client access to RAID storage. In a mission-critical environment, this places emphasis on ensuring high availability of not only the data access paths, but also the RAID storage system itself.&lt;br /&gt;&lt;br /&gt;Fortunately, standardized Fibre Channel layers define media and interface characteristics, as well as specifying highly reliable transmission protocols with low bit-error rates. SAN fabrics have evolved to include redundancies among switches and access paths, providing failover insurance against hardware problems.&lt;br /&gt;&lt;br /&gt;From a hardware perspective, RAID systems typically include such high-availability features as redundancies, hot-swappability, and thermal management to dissipate heat build-up. Fibre Channel RAID systems with dual-loop architectures even provide protection against internal disk channel failures. Alarm systems and remote management capabilities further contribute to the reliability of today's RAID storage systems.&lt;br /&gt;&lt;br /&gt;The storage industry has embraced traditional RAID levels (1, 3, 5) and variations thereof (0+1, 1+5, 6, etc.) as means of protecting critical information against the likelihood of disk drive failures. 
Typically, however, this protection is limited to a single drive failure (RAID 3 or 5). At most, protection against three concurrent inoperable drives is achieved, but at the cost of expensive mirroring. Even exotic arrays of this nature have limitations on the conditions under which drive failures can be sustained.&lt;br /&gt;&lt;br /&gt;LAND-5 has developed patented algorithms that allow a disk RAID array consisting of "N" drives to sustain operations even in the event of "M" drive failures, where 1 ≤ M &amp;lt; N. Called "eRAID," this breakthrough technology can be implemented with far fewer disk drives than mirroring while also yielding higher performance and enhanced reliability.&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;INTRODUCTION&lt;/strong&gt;&lt;br /&gt;With the growth of mission-critical information requiring twenty-four hour access, the reliability of storage systems is paramount. Downtime is extremely costly. Customers, vendors, employees, and prospects can no longer conduct essential business or critical operations. There is a "lost opportunity" cost to storage failures, as well, in terms of business lost to competitors. Well-documented studies place the cost of downtime in the tens of thousands (or even millions) of dollars per hour.&lt;br /&gt;&lt;br /&gt;Consider the recent problems with eBay, a major online auction Website with 2 million customers that suffered extended equipment crashes. The company, which saw its stock value slide by almost 20 percent, lost significant revenue over the three-day period--eBay warned that the latest 22-hour outage would knock between $3 million and $5 million off Q2 sales. However, the greater damage could be to eBay's reputation, especially if it continues to be plagued by outages. 
In a recent survey of consumers, Jupiter Communications found that 46 percent of online consumers leave a preferred site if they experience technical or performance problems.&lt;br /&gt;&lt;br /&gt;The need for large amounts of reliable online storage is fueling demand for fault-tolerant technology. According to International Data Corporation, the market for disk storage systems last year grew by 12 percent, topping $27 billion. More telling than that figure, however, is the growth in capacity being shipped, which grew 103 percent in 1998. Much of this explosive growth can be attributed to the space-eating demands of endeavors such as year 2000 testing, installation of data-heavy enterprise resource planning applications, and the deployment of widespread Internet access.&lt;br /&gt;&lt;br /&gt;The rising tide of Storage Area Networks (SAN) is fueled by the prospect of providing "any to any" high-performance access by networked servers and clients to critical information on a continuous basis. RAID storage is the underlying foundation of SAN technology, necessary to insure that mission-critical data is available when needed. Access to online storage on a 24x7 basis is essential to most SAN configurations. Thus, the reliability of Fibre Channel RAID storage is, in a sense, the Achilles heel of a SAN fabric.&lt;br /&gt;&lt;br /&gt;In examining SAN storage, attention is quickly focused on three elements that are essential to reliability:&lt;br /&gt;* The error checking scheme inherent in the transmission protocol&lt;br /&gt;* The reliability of the RAID storage unit itself&lt;br /&gt;* The ability of the RAID storage system to withstand multiple drive failures&lt;br /&gt;&lt;br /&gt;This article discusses each topic in turn. 
Industry answers exist for the first two subjects, but the storage community is still applying expensive "band-aids" in an attempt to overcome the inevitability of disk drive failures in large storage arrays.&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;TRANSMISSION PROTOCOLS&lt;br /&gt;&lt;/strong&gt;Fibre Channel has five layers: FC-0 through FC-4. The FC-0 layer defines the media and interface characteristics of full-duplex serial links between points. It lets Fibre Channel scale its signaling rates and define conforming cabling and connectors without affecting upper level protocols. As such, the FC-0 layer facilitates high-performance availability to Fibre Channel storage systems.&lt;br /&gt;&lt;br /&gt;The FC-1 layer defines transmission protocols. It defines how FC-0 signals are patterned to carry data and how port-to-port links are initialized and, if necessary, recovered from error conditions. Within a Fibre Channel network, the transmitter keeps track of the number of binary 0s and 1s. Likewise, the receiver also tracks the running disparity of 0s and 1s to detect any errors. Fibre Channel also uses a control character to synchronize word boundaries. With a specified bit-error rate of less than one bit error in 10&lt;sup&gt;12&lt;/sup&gt; bits, the FC-1 layer provides low-cost, reliable transmit-and-receive circuits and a transmission protocol that is independent of media, distance, or data rate.&lt;br /&gt;&lt;br /&gt;Together, the FC-0 and FC-1 layers provide a solid foundation for reliable, high-performance access to Fibre Channel storage systems configured in a switched fabric network. Along with the other layers, they also present a standard, open architecture for interfacing Fibre Channel storage systems, supporting a competitive atmosphere that benefits the consumer.&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;RAID SYSTEM RELIABILITY&lt;/strong&gt;&lt;br /&gt;Most enterprise-level storage is "mission critical" these days. 
Corporate Intranets are the lifeblood of employees, vendors, and contractors. Presenting an appealing Web site to customers and prospects on a 24x7 basis is essential to competitive survival. Online databases and consumer activities require storage systems that are impervious to normal fatigue or thermal failures.&lt;br /&gt;&lt;br /&gt;Reliable storage systems are crucial for SANs. "High availability" is implemented through redundancy of critical components, hot-swappability in the event of component failure, and management of heat build-up.&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;Thermal Management&lt;br /&gt;&lt;/strong&gt;Heat, or thermal energy, is transferred from one body to another by virtue of a temperature differential. In short, heat flows from a high-temperature area to a lower-temperature area. If there is no means of removing heat, then a steady state condition will eventually be reached wherein the internal temperature of a system enclosure equals that of its hottest element.&lt;br /&gt;In general, there are three methods, or modes, of heat transfer: conduction (transfer of heat through a solid caused by molecular oscillations), convection (transfer of heat from the surface of a solid to the surrounding air), and radiation. LAND-5's PolAIRis, a thermal management system, focuses on removing this heat by using strategically placed conduits and fans to direct an optimized volume of airflow through its system enclosures.&lt;br /&gt;&lt;br /&gt;Fast and highly integrated circuits generate large amounts of heat. Although a typical ECL gate dissipates less than 10 milliwatts, 10,000 of these gates integrated onto a chip can bring total power consumption easily up to 20-30W. At high temperatures, corrosion mechanisms accelerate and stresses are generated at the material interfaces because of different expansion coefficients. As a result, solder and wire bonds fail. In addition, CMOS switching speed degrades as the temperature increases. 
To eliminate negative temperature effects, heat must be removed rapidly from semiconductor devices.&lt;br /&gt;&lt;br /&gt;In computer equipment, disk drives, processors, ASICs, and power supplies tend to be the hottest components. Disk drives operate at high Revolutions Per Minute (RPM) and quickly begin to generate considerable heat, the leading cause of disk drive failure. High-performance CPUs typically have a dedicated fan and a heat sink to dissipate heat build-up. However, most dedicated I/O subsystems now contain powerful processors, as well (such as Intel's i960), and these generate considerable heat that must be discharged by the enclosure's thermal management. Likewise, most ASICs become local hot spots within an enclosure, endangering surrounding components unless their heat is rapidly dissipated. Power supplies, critical to continuous system operation, also quickly fail without adequate cooling.&lt;br /&gt;&lt;br /&gt;Thus, it is clear that disk drive failures are often related to heat build-up within an enclosure. Generally speaking, disk drive reliability drops sharply as internal enclosure temperature rises above 45°C (113°F). A reduction in temperature of five degrees Centigrade can significantly improve disk drive reliability, by 15% to 40%, depending on the actual inside cabinet air temperatures.&lt;br /&gt;&lt;br /&gt;To address the requirement for extreme system uptime, better RAID storage systems implement sophisticated thermal management using a reverse air cooling process, multiple fans, and a chassis design that creates a "wind tunnel" effect, drawing cool air across heat-generating components. Airflow is controlled to conform to one direction, thereby maximizing the cooling effect of multiple fans. Heat dissipation is further aided by designing the system to reduce heat sources. 
As a final measure, temperature monitoring, along with visual and audio alarms, is required.&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;System Redundancies&lt;br /&gt;&lt;/strong&gt;Even with excellent thermal management, hardware failures are inevitable in any storage system. Hence, it is essential to design a high-availability RAID system with redundancies in order to ensure that storage access is not interrupted whenever a failure occurs.&lt;br /&gt;The most common redundancy is dual power supplies. If one power supply fails, the remaining power supply should be sufficient to allow continued system operation for an indefinite period. Added safety is achieved by designing the power system to include automatic load balancing, thereby prolonging the life cycle of each power supply. Having separate power cords allows each power supply to be plugged into a separate circuit, enhancing protection against the failure of an electrical system within the building. Adding a UPS buys time in the event of a complete power outage.&lt;br /&gt;&lt;br /&gt;As discussed, aggressive system cooling is essential to continuous operation. Hence, redundant fans are critical. Many RAID systems have unfortunately not learned this lesson and their users suffer accordingly.&lt;br /&gt;&lt;br /&gt;Redundant RAID controllers have two benefits. They provide a fail-over capability in case one fails. Moreover, in an "active-active" mode, the controllers can share the workload, thereby enhancing system performance.&lt;br /&gt;&lt;br /&gt;Channel failures do occur. A truly mission-critical RAID system compensates for this possibility by having built-in redundancies in the form of A-B loops for each internal channel. If one loop fails, the remaining loop kicks in to ensure continued operations. In the future, RAID controllers will be able to take advantage of dual loop architectures to significantly increase transfer rates through "active-active" operation. 
For example, the new LAND-5 ICEbox FC 2500 RAID storage system has three disk channels, each supported by independent A-B loop access. Now providing up to an aggregate 300MB/sec transfer rate, the potential exists to double performance to 600MB/sec when controllers that support simultaneous dual-loop data access become available.&lt;br /&gt;&lt;br /&gt;Having at least a global hot-spare disk drive is a universal requirement for a mission-critical RAID system. More sophisticated systems also support local hot spares.&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;Hot Swappability for Critical Components&lt;/strong&gt;&lt;br /&gt;Mission-critical storage systems demand the ability to perform repairs without interrupting operations. Thus, major system components that are the most likely to fail over time must be "hot swappable." Local personnel must be able to access and swap out a failed disk drive, fan, or power supply with minimal effort. Better systems with redundant RAID controllers also support replacement of a failed controller "on the fly."&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;RAID System Architecture Considerations&lt;/strong&gt;&lt;br /&gt;Two backplane architectures are available in commercial RAID systems--active and passive. Both support an A-B loop architecture. A passive backplane allows hot-swappability of controller and channel interface boards. However, its architecture increases the design complexity and cost. Active backplanes allow channel segmentation, a performance boost. They are also less costly to design and build. The downside is that if a channel fails, then the entire backplane must be replaced.&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;&lt;em&gt;PROTECTION AGAINST MULTIPLE DISK DRIVE FAILURE&lt;/em&gt;&lt;/strong&gt;&lt;br /&gt;RAID storage configurations have proven to be the best hedge against the possibility of a single drive failure within an array. 
Each RAID level, however, has its pluses and minuses:&lt;br /&gt;&lt;br /&gt;* While RAID 0 delivers high performance, it cannot sustain even a single drive failure because there is no parity information or data redundancy.&lt;br /&gt;&lt;br /&gt;* Although the most costly, mirroring data on separate drives (RAID 1) means that if one drive fails, critical information can still be accessed from the mirrored drive. Typically, RAID 1 involves replicating all data on two separate "stacks" of disk drives on separate SCSI channels, incurring the cost of twice as many disk drives. There is a performance impact, as well, since data must be written twice, consuming both RAID system and possibly server resources.&lt;br /&gt;&lt;br /&gt;* RAID 3 and RAID 5 allow continued (albeit, degraded) operation by reconstructing lost information "on the fly" through parity checksum calculations. Adding a global hot spare provides the ability to perform a background rebuild of lost data.&lt;br /&gt;&lt;br /&gt;With the exception of costly RAID 1 (or combinations of RAID 1 with RAID 0 or RAID 5) configurations, there has been no solution for recovering from a multiple drive failure within a RAID storage system. Even the exceptions sustain multiple drive failures only under very limited circumstances. For example, a RAID 1 configuration can obviously lose multiple (or all) drives in one mirrored stack, as long as not more than one disk fails in its mirrored partner. Combining striping and parity within mirrored stacks buys some additional capabilities, but is still subject to these drive-failure limitations.&lt;br /&gt;&lt;br /&gt;Why would a system need protection against more than one drive failure at a time? Isn't the reliability of today's disk drives so high that the chances of a multiple drive failure are remote?&lt;br /&gt;Disk drive manufacturers publish Mean Time Between Failure (MTBF) figures as high as 800,000 hours (91 years). 
Yet, as one examines these claims, disk drive manufacturers readily admit that such claims are unrealistic. In fact, the practical life of a disk drive is five to seven years of continuous use. Information Technology managers can painfully testify that disk drives fail with great frequency. That's why all companies place emphasis on storage backup and there is such a large market for tape systems.&lt;br /&gt;&lt;br /&gt;It is clear that the likelihood of a drive failure increases as more drives are added to a disk RAID storage system. For example, a terabyte of RAID 5 storage consisting of fifty-eight 18GB disk drives can expect a drive to fail every 44 days! Moreover, when one drive fails, the statistical odds of a second drive failing increase dramatically and if two drives fail, the odds of a third failure jump again. In short, the more drives configured in a RAID storage system, the greater is its potential for suffering multiple drive failures.&lt;br /&gt;&lt;br /&gt;Also, disk drives configured within a RAID storage system can be of different ages, including a mixture of new and older drives. This profile increases the odds of a multiple drive failure.&lt;br /&gt;The consequences of a multiple-drive failure can be devastating. Typically, if more than one drive fails, or a service person accidentally removes the wrong drive when attempting to replace a failed drive, the entire RAID storage system is out of commission. Access to critical information is not possible until the RAID system is re-configured, tested, and a backup copy restored. Transactions and information written since the last backup may be lost forever.&lt;br /&gt;&lt;br /&gt;Extensive research and development by LAND-5 has resulted in a set of software and hardware algorithms that augments RAID storage by performing automatic, transparent recovery from multiple drive failures without interrupting ongoing operations. 
Called "eRAID," these patented algorithms allow users to select the degree of disk-loss insurance desired. Continued operations are possible even in the event of N-1 drive failures. Moreover, because these algorithms have exceptionally fast computational speeds, storage transfer rate performance actually increases under eRAID while adding virtually unlimited data protection.&lt;br /&gt;&lt;br /&gt;eRAID consists of a series of software matrix array formulas. It involves breakthrough algorithms for accomplishing XOR calculations (which are the basis of RAID 5). eRAID dramatically alters the reliability of RAID storage by circumventing previous limitations on the number of permissible drive failures. With eRAID, all but one drive can fail (assuming sufficient capacity) and users will still have access to critical information.&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;HOW DOES ERAID DIFFER FROM TRADITIONAL RAID?&lt;br /&gt;&lt;/strong&gt;Today, the ultimate protection for critical information is accomplished through RAID 1 (mirroring), overlaying RAID 5 (striping with parity), and then adding a global hot spare. For example, if user data consumes four disk drives, then reliability is improved by replicating this data on a second "stack" of four drives. Within each stack, however, losing just one drive would make the whole database useless. To further enhance reliability, each mirrored stack can be configured as an individual RAID 5 system. Since implementing parity requires an additional drive, user data and parity information are now striped across five drives within each stack. This provides protection against the loss of a single drive within each stack. 
So, from an original database that required just four drives, this RAID configuration has grown to include:&lt;br /&gt;* Four drives for the original data&lt;br /&gt;* Four drives for the mirrored data&lt;br /&gt;* One parity-drive (equivalent) for each stack (Two total)&lt;br /&gt;* One global hot spare (standby drive on which data can be rebuilt if a drive fails)&lt;br /&gt;&lt;br /&gt;This architecture now requires a total of eleven disk drives (Fig 1). Thus, seven drives have been added to protect data on the four (original) drives. This configuration can recover from a failed drive in either stack. Even if all the drives in one stack failed, the remaining drives in the surviving stack would still provide access to critical data. However, in this case, only one drive failure in the remaining stack could be tolerated. Overall, if multiple drive failures occur within each stack, access to the database is lost. Barring a total stack failure, its maximum protection is against the failure of three drives, but in a limited fashion (maximum of two failures in any one stack).&lt;br /&gt;&lt;br /&gt;Looking at the same example using eRAID to achieve equal protection against multiple drive failure (Fig 2), protection against three-drive failure is achieved at less cost and overhead:&lt;br /&gt;* Requires only eight disk drives compared to 11 for traditional RAID&lt;br /&gt;* Requires less administrative overhead&lt;br /&gt;&lt;br /&gt;Hence, if these disk drives cost $1,000 each, the eRAID solution saves $3,000 while providing better insurance, since any three random drives can fail and the system will continue to properly function. Many databases rely strictly upon RAID 5 with striping and parity for protection against drive failure because RAID 1 solutions are so costly. However, RAID 5 supports continued operation only in the event of a single inoperable drive at any one moment. Losing two or more drives under RAID 5 brings operations quickly to a halt. 
For the cost of adding just one more drive, eRAID mitigates the risk of data loss by providing the means to sustain up to two drive failures.&lt;br /&gt;&lt;br /&gt;LAND-5 eRAID, however, can support continuous operation even in the event several drives fail. Thus far, LAND-5 has successfully tested recovery when 50 percent of the disk drives fail. With eRAID, network administrators can manually assign the level of desired drive-failure protection. In short, eRAID allows the user the flexibility of selecting the level of drive-failure protection to fit specific needs.&lt;br /&gt;&lt;br /&gt;The tangible cost of eRAID is that an additional parity drive equivalent is consumed for each incremental protection level. For instance, if a user desires to protect a 100-drive storage system against the possibility of two concurrent drive failures, then the equivalent of two disk drive capacities will be allocated for eRAID parity-related data. Thus, while users can still read from 100 drives, they can write to only 98 drives, reducing usable storage capacity by two percent. Hence, protection from (say) five concurrent drive failures reduces data storage capacity by only five percent. As any Information Technology Manager will testify, this is a small price to pay for dramatically enhanced storage reliability.&lt;br /&gt;&lt;br /&gt;Aside from protection against multiple drive failures, some significant benefits of eRAID are:&lt;br /&gt;* eRAID supports continued operations even in the event of a total SCSI channel failure, whereas this would be catastrophic under traditional RAID 3 or 5.&lt;br /&gt;&lt;br /&gt;* In a traditional RAID 1 (or 0+1, 5+1, RAID 6, etc.) storage configuration, with (say) data mirrored on two independent SCSI channels, all data could be lost in one channel and operation would continue. However, if more than one drive failure concurrently occurs in both mirrored channels, then the entire storage system becomes inoperable. 
With eRAID, on the other hand, random multiple drive failures are sustainable.&lt;br /&gt;&lt;br /&gt;Kris Land is the chief technical officer of LAND-5 Corporation (San Diego, CA).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/35811827-116050858375389248?l=kris-land.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kris-land.blogspot.com/feeds/116050858375389248/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=35811827&amp;postID=116050858375389248' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/35811827/posts/default/116050858375389248'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/35811827/posts/default/116050858375389248'/><link rel='alternate' type='text/html' href='http://kris-land.blogspot.com/2006/10/insuring-reliability-of-fibre-channel.html' title='Insuring The Reliability Of Fibre Channel RAID Storage - Industry Trend or Event'/><author><name>Kris Land</name><uri>http://www.blogger.com/profile/14262331306208973883</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-35811827.post-116050798372748397</id><published>2006-10-10T12:16:00.000-07:00</published><updated>2006-10-10T12:19:43.730-07:00</updated><title type='text'>The Mythical Hot-Spare - Tape/Disk/Optical Storage</title><content type='html'>&lt;a href="http://www.findarticles.com/p/articles/mi_m0BRZ"&gt;Computer Technology Review&lt;/a&gt;,  &lt;a href="http://www.findarticles.com/p/articles/mi_m0BRZ/is_1_22"&gt;Jan, 2002&lt;/a&gt;  by &lt;a 
href="http://www.findarticles.com/p/search?tb=art&amp;qt=%22Kris+Land%22"&gt;Kris Land&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Hot Spare is a term given to a device that can be added to computer storage systems while the system is running, without shutting the system down or interrupting service. In this article we'll be discussing hard drives and RAID storage and, more importantly, the belief that Hot Spares keep your data safe.&lt;br /&gt;&lt;br /&gt;A brief primer on RAID storage is needed to make sure the concept of a Hot Spare is understood, and to show why RAID storage may not be as safe as you may have once thought!&lt;br /&gt;&lt;br /&gt;RAID was originally defined as "Redundant Array of Inexpensive Disks" and is now more commonly expanded as "Redundant Array of Independent Disks". In either case, RAID is a combination of software algorithms and hardware devices that typically lets companies join multiple hard disk drives in order to gain capacity, performance, and safety. This is done by selecting one of the RAID levels, some typical cases of which are defined in Table 1.&lt;br /&gt;&lt;br /&gt;I'm sure many of you have already been through this dizzying matrix of choices and have had to settle on one of these levels to manage your company's data. According to Salomon Smith Barney and Dataquest, 70% of the total RAID storage market is running on RAID 5. This is not surprising, since RAID 5 is the most cost-efficient, largest-capacity, reasonably safe RAID level available today.&lt;br /&gt;&lt;br /&gt;Now the question: "Where does this Hot Spare thing fit into all of this?" Hot Spares are combined with RAID systems to increase overall system reliability. This is done by adding one or more hard drives to an already existing RAID system. The added drive, however, is never used until one of the existing RAID drives fails. Of course, if we only have to purchase one more hard drive and we get double the safety, that's a great insurance policy, right? 
And if by purchasing a couple of drives this safety margin goes up even more ... that's great, right?&lt;br /&gt;&lt;br /&gt;Statistically, most companies that store their data on RAID 5 systems agree with this idea: 75% of the RAID 5 storage systems running today have one or more Hot Spares providing this insurance ...&lt;br /&gt;&lt;br /&gt;The Myth: Hot Spares do not provide instant insurance! If a hard drive fails and the Hot Spare comes into action, there is a rebuild time. And with today's drives, this rebuild time represents a significant opportunity for disaster!&lt;br /&gt;&lt;br /&gt;This has never been a problem in the past; why now? In the past, hard drives were much smaller, and it didn't take very long to copy the "safety data" back to a new drive (the Hot Spare). But over the last six years, drives have doubled in capacity every year while their transfer rates have stayed comparatively flat. This means that each time a drive doubled in capacity, the time it takes to rewrite the entire drive almost doubled with it.&lt;br /&gt;&lt;br /&gt;In Table 2, hard drive capacity versus performance is shown, along with the average time it would take to rebuild the larger drives with new data. The performance data was retrieved from an excellent source at &lt;a href="http://www.storagereview.com"&gt;www.storagereview.com&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;What Table 2 shows is that the rebuild time for a Hot Spare on the earliest drives listed was between two and eight hours, which was by no means perfect, but a company's data was only at risk for up to one day. Now, with today's drives, the same company's data would be at risk for up to 13 days, just short of two full weeks. 
In addition, the total amount of data at risk has also doubled every year, now that the RAID 5 array may actually contain every piece of data the company owns.&lt;br /&gt;&lt;br /&gt;Imagine all of your company's data on 14 hard drives: 12 for actual storage, plus one each for parity and the Mythical Hot Spare. Using today's drives, that represents approximately two terabytes of capacity. This seems like a great system; most of the storage industry says this is the way to go, and because you bought that Hot Spare you have that extra-safe insurance, right? Well, not quite. If one morning at 10:00 a.m. you lost one hard drive under RAID 5, your data would still be intact and your employees would still be able to use the storage system.&lt;br /&gt;&lt;br /&gt;However, the storage system is now running in degraded mode, meaning that if you lose any other drive before your Hot Spare rebuilds, you will lose the entire two terabytes of data. Worse yet, the system will be running in degraded mode for up to thirteen days, depending on how much new data and system use you need during the rebuild time.&lt;br /&gt;&lt;br /&gt;Hot Spares do not protect against more than one drive failing at the same time or within a short period of each other, nor do they protect against someone accidentally removing the wrong drive when they really meant to remove the already dead drive.&lt;br /&gt;&lt;br /&gt;Table 1&lt;br /&gt;&lt;table&gt;&lt;tr&gt;&lt;th&gt;RAID Level&lt;/th&gt;&lt;th&gt;Basic Description&lt;/th&gt;&lt;th&gt;Total Drives&lt;/th&gt;&lt;th&gt;Drive Redundancy (1)&lt;/th&gt;&lt;th&gt;Relative Performance (2)&lt;/th&gt;&lt;th&gt;Drives Available&lt;/th&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;RAID 0&lt;/td&gt;&lt;td&gt;Striping&lt;/td&gt;&lt;td&gt;18&lt;/td&gt;&lt;td&gt;0&lt;/td&gt;&lt;td&gt;8&lt;/td&gt;&lt;td&gt;18&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;RAID 1&lt;/td&gt;&lt;td&gt;Mirroring&lt;/td&gt;&lt;td&gt;2&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;1.5&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;RAID 5&lt;/td&gt;&lt;td&gt;Parity&lt;/td&gt;&lt;td&gt;18&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;16&lt;/td&gt;&lt;td&gt;17&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;RAID 10&lt;/td&gt;&lt;td&gt;RAID 1 + 0&lt;/td&gt;&lt;td&gt;18&lt;/td&gt;&lt;td&gt;1 *&lt;/td&gt;&lt;td&gt;13.5&lt;/td&gt;&lt;td&gt;9&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;RAID 15&lt;/td&gt;&lt;td&gt;RAID 1 + 5&lt;/td&gt;&lt;td&gt;18&lt;/td&gt;&lt;td&gt;3 *&lt;/td&gt;&lt;td&gt;11.5&lt;/td&gt;&lt;td&gt;8&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;(1) Drive Redundancy is the maximum number of random drive failures before catastrophic data loss; mirroring 
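combinations can lose more as long as the failed drives are not a mirrored set.&lt;br /&gt;(2) Relative Performance is the average read/write contribution of all drives minus read/write/verify penalties.&lt;br /&gt;&lt;br /&gt;Table 2&lt;br /&gt;&lt;table&gt;&lt;tr&gt;&lt;th&gt;Year Introduced&lt;/th&gt;&lt;th&gt;Drive Capacity&lt;/th&gt;&lt;th&gt;Average Speed (MB/s)&lt;/th&gt;&lt;th&gt;Rebuild Time, No Overhead (1)&lt;/th&gt;&lt;th&gt;Rebuild Time, With Overhead (2)&lt;/th&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;1996&lt;/td&gt;&lt;td&gt;1 GB&lt;/td&gt;&lt;td&gt;6.7&lt;/td&gt;&lt;td&gt;2.11 hrs&lt;/td&gt;&lt;td&gt;0.36 days&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;1997&lt;/td&gt;&lt;td&gt;4 GB&lt;/td&gt;&lt;td&gt;8.7&lt;/td&gt;&lt;td&gt;6.51 hrs&lt;/td&gt;&lt;td&gt;1.12 days&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;1998&lt;/td&gt;&lt;td&gt;9 GB&lt;/td&gt;&lt;td&gt;12&lt;/td&gt;&lt;td&gt;10.63 hrs&lt;/td&gt;&lt;td&gt;1.82 days&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;1999&lt;/td&gt;&lt;td&gt;18 GB&lt;/td&gt;&lt;td&gt;22&lt;/td&gt;&lt;td&gt;11.59 hrs&lt;/td&gt;&lt;td&gt;1.99 days&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;2000&lt;/td&gt;&lt;td&gt;36 GB&lt;/td&gt;&lt;td&gt;30&lt;/td&gt;&lt;td&gt;17 hrs&lt;/td&gt;&lt;td&gt;2.92 days&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;2001&lt;/td&gt;&lt;td&gt;73 GB&lt;/td&gt;&lt;td&gt;44&lt;/td&gt;&lt;td&gt;23.50 hrs&lt;/td&gt;&lt;td&gt;4.03 days&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;2002&lt;/td&gt;&lt;td&gt;180 GB&lt;/td&gt;&lt;td&gt;33&lt;/td&gt;&lt;td&gt;77.27 hrs&lt;/td&gt;&lt;td&gt;13.26 days&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;(1) Rebuild time with no overhead assumes the RAID controller is doing nothing else but rebuilding the data to the Hot Spare.&lt;br /&gt;(2) Rebuild time with overhead assumes the RAID controller is handling moderate to heavy user traffic while rebuilding.&lt;br /&gt;&lt;br /&gt;Kris Land is the CTO at Land-5 Corp. 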
(San Diego, CA).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/35811827-116050798372748397?l=kris-land.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kris-land.blogspot.com/feeds/116050798372748397/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=35811827&amp;postID=116050798372748397' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/35811827/posts/default/116050798372748397'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/35811827/posts/default/116050798372748397'/><link rel='alternate' type='text/html' href='http://kris-land.blogspot.com/2006/10/mythical-hot-spare-tapediskoptical.html' title='The Mythical Hot-Spare - Tape/Disk/Optical Storage'/><author><name>Kris Land</name><uri>http://www.blogger.com/profile/14262331306208973883</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-35811827.post-116050753700980173</id><published>2006-10-10T11:55:00.000-07:00</published><updated>2006-10-10T12:12:17.026-07:00</updated><title type='text'>New 15K drives provide more than speed: they offer increased safety in multi-TB storage - High Availability</title><content type='html'>&lt;a href="http://www.findarticles.com/p/articles/mi_m0BRZ"&gt;Computer Technology Review&lt;/a&gt;,  &lt;a href="http://www.findarticles.com/p/articles/mi_m0BRZ/is_7_22"&gt;July, 2002&lt;/a&gt;  by &lt;a href="http://www.findarticles.com/p/search?tb=art&amp;qt=%22Kris+Land%22"&gt;Kris Land&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;There is no issue more critical to large storage centers than the preservation and integrity of 
their data. That said, a very close second is having the access and performance needed to read and write all of the information that these centers keep online. To meet these needs, hard drive manufacturers have provided, once again, a new breed of device that offers even faster access and greater reliability for storage systems.&lt;br /&gt;&lt;br /&gt;A year ago, storage performance seemed as good as it was going to get, but new and improved 73GB, 15,000-RPM models are now available, along with today's SCSI hard drives that are up to 146GB at 10,000 RPM. So the big questions follow: How does this affect my data and my users, and what does 15,000 RPM, versus the current crop of 10,000-RPM drives, really buy me?&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;Accessing Data&lt;/strong&gt;&lt;br /&gt;In the storage world, there are basically two types of applications or methods for accessing data on a set or array of hard drives:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;The first method involves long, sequential reads and writes that normally do not require the hard drive heads to jump around a lot, usually giving the best overall performance numbers for transfer rates measured in megabytes per second. Typical applications include pre- and post-production film editing, streaming video and/or multimedia, archiving of large storage repositories to and from other storage media, and seismic data-gathering devices, to name a few.&lt;/li&gt;&lt;li&gt;The second method for accessing data is measured in I/Os per second and primarily deals with multiple users and/or applications asking for many small pieces of data from virtually anywhere on the disk or disk array. This requires the hard drive head to move constantly to different locations on the platter, incurring the highest cost penalty in terms of getting and storing data on the hard drive. 
Examples of these applications include Web servers, SQL database applications, transaction-based systems, and ad-hoc report-building systems and generators.&lt;/li&gt;&lt;/ul&gt;The impact on RAID systems can be very interesting. RAID is a combination of multiple drives to provide safety, capacity, and performance above and beyond the capability of a single hard drive. One of the largest issues facing today's RAID installations is the number of hard drives combined to provide the needed performance and capacity, because every added drive statistically raises the probability of multiple drive failures. As shown in Figure 1 of Compaq's "RAID Advanced Data Guarding: A Cost-Effective, Fault-Tolerant Solution" white paper, RAID 0 with 56 hard drives has a much higher probability of data loss than RAID 5, which can tolerate one drive failure with no data loss. This is, of course, expected, since any single-drive loss under RAID 0 will cause total data loss.&lt;br /&gt;&lt;br /&gt;According to David Szabados of Seagate, "the 15,000-RPM drives are approximately 40% faster than the 10,000-RPM drives for random I/O access times." So, for database applications--which do literally thousands of I/O requests per second--this extra margin of speed is a big deal.&lt;br /&gt;But what most people do not think about is this: What happens if two drives fail at nearly the same time? With RAID ADG, any two drives can fail and, according to Compaq's white paper, the array would still be OK. Again, as seen in Figure 1, there is a big jump in reliability between RAID 5 and RAID ADG. Now imagine the ability to increase the insurance to any number between 0 and 16 random drive failures without any data loss, and you would have RAID&lt;sup&gt;n&lt;/sup&gt; by Inostor Corporation. 
RAID&lt;sup&gt;n&lt;/sup&gt; is the only technology available worldwide that allows for scalable insurance across large numbers of hard drives.&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;Increased Safety in Multi-Terabyte Storage&lt;/strong&gt;&lt;br /&gt;So where do the 15,000-RPM drives fit in all of this? The largest performance overhead in parity-based RAID systems comes down to drive seek time, or the ability to access data from different locations on the platter. Because the new 15,000-RPM hard drives are 40% faster in overall I/Os or seek time, larger redundancy settings are possible with minimal overall performance impact. So, from a user perspective, very large multi-terabyte database storage systems can be built with a very high degree of safety using RAID&lt;sup&gt;n&lt;/sup&gt;.&lt;br /&gt;&lt;br /&gt;Here is an example of a system that could be built using 60 15,000-RPM (76GB) hard drives in a RAID&lt;sup&gt;n&lt;/sup&gt; array. With an insurance level of seven, the system could tolerate any seven random drive failures without any data loss while, at the same time, preserving just over 4TB of usable capacity. And with the 40% faster I/O times, the system would be able to achieve faster I/Os than a solution built from 60 10,000-RPM (76GB) hard drives using standard RAID 5.&lt;br /&gt;&lt;br /&gt;So, in summary, the 15,000-RPM drives are a significant move towards the continued increase of performance across individual drives, as well as the overall performance increase of today's advanced RAID subsystems, for both large sequential data transfers and I/O-intensive database transactions.&lt;br /&gt;&lt;br /&gt;Kris Land is the founder of Inostor Corp. (Poway, CA).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/35811827-116050753700980173?l=kris-land.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kris-land.blogspot.com/feeds/116050753700980173/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=35811827&amp;postID=116050753700980173' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/35811827/posts/default/116050753700980173'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/35811827/posts/default/116050753700980173'/><link rel='alternate' type='text/html' href='http://kris-land.blogspot.com/2006/10/new-15k-drives-provide-more-than-speed.html' title='New 15K drives provide more than speed: they offer increased safety in multi-TB storage - High Availability'/><author><name>Kris Land</name><uri>http://www.blogger.com/profile/14262331306208973883</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry></feed>
