Tuesday, October 10, 2006

New 15K drives provide more than speed: they offer increased safety in multi-TB storage - High Availability

Computer Technology Review, July, 2002 by Kris Land

There is no issue more critical to large storage centers than the preservation and integrity of their data. A very close second is having the access and performance needed to read and write all of the information that these centers keep online. To meet these needs, hard drive manufacturers have once again delivered a new breed of device that offers faster access and greater reliability for storage systems.

A year ago, storage performance seemed as good as it was going to get. Now new 73GB, 15,000-RPM models are available, alongside SCSI hard drives of up to 146GB at 10,000 RPM. So the big questions follow: How does this affect my data and my users, and what does 15,000 RPM, versus the current crop of 10,000-RPM drives, really buy me?

Accessing Data
In the storage world, there are basically two types of applications or methods for accessing data on a set or array of hard drives:
  • The first method involves long, sequential reads and writes that normally do not require the hard drive heads to jump around a lot, usually giving the best overall performance numbers for transfer rates measured in megabytes per second. Typical applications include pre- and post-production film editing, streaming video and/or multimedia, archiving of large storage repositories to and from other storage mediums and seismic data gathering devices, to name a few.
  • The second method for accessing data is measured in I/Os per second and primarily deals with multiple users and/or applications asking for many small pieces of data from virtually anywhere on the disk or disk array. This requires the hard drive head to move constantly to different locations on the platter, incurring the highest cost penalty in terms of getting and storing data on the hard drive. Examples of these applications include Web servers, SQL database applications, transaction-based systems, and ad-hoc report-building systems and generators.
The impact on RAID systems can be very interesting. RAID is a combination of multiple drives to provide safety, capacity, and performance above and beyond the capability of a single hard drive. One of the largest issues facing today's RAID installations is the number of hard drives combined to provide the needed performance and capacity: the more drives an array holds, the higher the statistical probability of multiple drive failures. As shown in Figure 1 of Compaq's "RAID Advanced Data Guarding: A Cost-Effective, Fault-Tolerant Solution" white paper, RAID 0 with 56 hard drives has a much higher probability of data loss than RAID 5, which can tolerate one drive failure with no data loss. This is, of course, expected, since any single-drive loss under RAID 0 will cause total data loss.

According to David Szabados of Seagate, "the 15,000-RPM drives are approximately 40% faster than the 10,000-RPM drives for random I/O access times." So, for database applications--which do literally thousands of I/O requests per second--this extra margin of speed is a big deal.
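To see roughly where a figure like 40% could come from, here is a minimal sketch of a single-drive service-time model. The rotational latency part is plain arithmetic (on average the platter must turn half a revolution, so 3 ms at 10,000 RPM and 2 ms at 15,000 RPM). The average seek times are assumed, hypothetical values chosen only for illustration; they are not taken from the article or from any Seagate data sheet.

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    """Average rotational latency in ms: the time for half a revolution."""
    ms_per_revolution = 60_000.0 / rpm
    return 0.5 * ms_per_revolution


def estimated_iops(rpm: float, avg_seek_ms: float) -> float:
    """Rough random IOPS for one drive: 1 / (average seek + rotational latency).

    Ignores transfer time, caching, and command queuing.
    """
    service_ms = avg_seek_ms + avg_rotational_latency_ms(rpm)
    return 1000.0 / service_ms


# Assumed average seek times (hypothetical, for illustration only).
SEEK_MS_10K = 5.0
SEEK_MS_15K = 3.8

if __name__ == "__main__":
    iops_10k = estimated_iops(10_000, SEEK_MS_10K)
    iops_15k = estimated_iops(15_000, SEEK_MS_15K)
    print(f"10,000 RPM: {iops_10k:.0f} IOPS (estimated)")
    print(f"15,000 RPM: {iops_15k:.0f} IOPS (estimated)")
    print(f"Speedup: {100.0 * (iops_15k / iops_10k - 1.0):.0f}%")
```

With these assumed seek times the model lands in the same neighborhood as the quoted figure, but the result depends entirely on the seek values plugged in.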
But what most people do not think about is this: What happens if two drives fail at nearly the same time? With RAID ADG, any two drives can fail, and according to Compaq's white paper, the array would still be OK. Again, as Figure 1 shows, there is a big jump in reliability between RAID 5 and RAID ADG. Now imagine the ability to raise the insurance level to tolerate anywhere from 0 to 16 random drive failures without any data loss, and you would have RAIDⁿ by Inostor Corporation. RAIDⁿ is the only technology available worldwide that allows for scalable insurance across large numbers of hard drives.
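The effect of tolerating more failures can be sketched with a simple binomial model. This is not the model behind Compaq's Figure 1; it is an illustration that assumes every drive fails independently with the same probability during some window (for example, before a failed drive can be rebuilt) and that the whole array forms one redundancy group. The per-drive probability `P_FAIL` is a made-up value.

```python
from math import comb


def prob_data_loss(n_drives: int, tolerated: int, p_fail: float) -> float:
    """Probability that more than `tolerated` of `n_drives` fail.

    Assumes independent failures with identical probability `p_fail`.
    """
    p_survive = sum(
        comb(n_drives, k) * p_fail**k * (1.0 - p_fail) ** (n_drives - k)
        for k in range(tolerated + 1)
    )
    # Clamp to guard against tiny negative values from floating-point rounding.
    return max(0.0, 1.0 - p_survive)


P_FAIL = 0.03   # assumed per-drive failure probability in the window
N_DRIVES = 56   # array size used in the white paper's comparison

if __name__ == "__main__":
    schemes = [
        ("RAID 0", 0),             # no drive may fail
        ("RAID 5", 1),             # one drive may fail
        ("RAID ADG", 2),           # two drives may fail
        ("insurance level 7", 7),  # seven drives may fail
    ]
    for label, tolerated in schemes:
        loss = prob_data_loss(N_DRIVES, tolerated, P_FAIL)
        print(f"{label:18s} P(data loss) = {loss:.6f}")
```

Under these assumptions each additional tolerated failure cuts the probability of data loss, which is the qualitative jump the text describes between RAID 0, RAID 5, and RAID ADG.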

Increased Safety in Multi-Terabyte Storage
So where do the 15,000-RPM drives fit in all of this? The largest performance overhead in parity-based RAID systems comes down to drive seek time, that is, how quickly the drive can access data at different locations on the platter. Because the new 15,000-RPM hard drives are 40% faster in overall I/Os or seek time, larger redundancy settings become possible with minimal overall performance impact. So, from a user perspective, very large multi-terabyte database storage systems can be built with a very high degree of safety using RAIDⁿ.

Here is an example of a system that could be built using 60 15,000-RPM (76GB) hard drives in a RAIDⁿ array. With an insurance level of seven, the system could tolerate any seven random drive failures without any data loss while preserving just over 4TB of usable capacity. And with the 40% faster I/O times, the system would be able to achieve faster I/Os than a solution built from 60 10,000-RPM (76GB) hard drives using standard RAID 5.
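The capacity figure can be checked with simple arithmetic. The sketch below assumes that each insurance level costs the capacity of one drive, the same way RAID 5 gives up one drive's worth of space for parity. The article does not state this rule, but it is consistent with the "just over 4TB" figure. The function name is hypothetical.

```python
def usable_capacity_gb(n_drives: int, drive_gb: int, insurance: int) -> int:
    """Usable capacity, assuming each insurance level costs one drive of space."""
    if not 0 <= insurance < n_drives:
        raise ValueError("insurance must be >= 0 and less than the drive count")
    return (n_drives - insurance) * drive_gb


if __name__ == "__main__":
    # 60 drives of 76GB each, insurance level 7 (the article's example)
    print(usable_capacity_gb(60, 76, 7), "GB usable at insurance level 7")
    # Same drives with a single drive's worth of redundancy, as in RAID 5
    print(usable_capacity_gb(60, 76, 1), "GB usable with one parity drive")
```

Under that assumption, the seven-failure configuration leaves 53 drives' worth of space, or 4,028GB, compared with 4,484GB when only one drive's worth is reserved.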

So, in summary, the 15,000-RPM drives are a significant step toward continued performance gains for individual drives, and they raise the overall performance of today's advanced RAID subsystems for both large sequential data transfers and I/O-intensive database transactions.

Kris Land is the founder of Inostor Corp. (Poway, CA).
