RAID - Redundant Array of Independent Disks


This overview explains the basic function and data layout of the different levels of RAID. Each level, or type of RAID data layout, has specific performance and redundancy characteristics. These characteristics allow you to configure RAID sets to match your performance criteria. Understanding the basics of RAID data layout is the first step toward purchasing a RAID subsystem that meets today's and tomorrow's storage requirements.

Let's get some terminology out of the way first.

ACCESS TIME
The time it takes to find, retrieve and start delivering data. For a single disk, "access time" is the time the disk takes to receive the command, move the heads to the correct position, wait for the disk to rotate to the correct location and start to read. Access time is particularly important in multi-user and database environments, which generate many requests for the same data. These requests do not necessarily involve long data transfers. "Access Time" is important when deciding what level of RAID you will implement.

ARRAY
An "array" is a grouping of disk drives. There could be two disk drives in the "array" or there could be 100 disk drives in the array.

CHUNK
The breaking up of data so that it can be stored across multiple disk drives. Imagine breaking up an 80 KB file into five 16 KB pieces. These 16 KB pieces would be referred to as "chunks". Another commonly used term is STRIPE.
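As a rough illustration of the 80 KB example, the following Python sketch splits a buffer into fixed-size chunks. The function name split_into_chunks and the use of an all-zero buffer as a stand-in for the file are assumptions made for this example only; a real controller works on disk blocks, not Python byte strings.

```python
KB = 1024  # bytes per kilobyte, as assumed in this sketch


def split_into_chunks(data: bytes, chunk_size: int) -> list[bytes]:
    """Split data into consecutive chunks of at most chunk_size bytes."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


file_data = bytes(80 * KB)                      # stand-in for an 80 KB file
chunks = split_into_chunks(file_data, 16 * KB)  # five 16 KB pieces

print(len(chunks))                    # 5
print([len(c) // KB for c in chunks])  # [16, 16, 16, 16, 16]
```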

COST PER MEGABYTE PENALTY
Term used to indicate what percentage of total storage is dedicated to providing data redundancy.


DISK SET
A "disk set" is a specific number of drives grouped together with a single characteristic (e.g., RAID 0, RAID 5). A "disk set" can encompass a whole "array" or be a subset of the "array". In other words, there can be multiple "disk sets" in an array. A "disk set" will present itself to an operating system as an individual disk drive. This will be important later when you are actually buying and configuring a RAID subsystem. "Disk Set" is also referred to as RAID SET.

DUPLEXING
"Duplexing" is when members of a "disk set" are spread across different SCSI buses. This is important for two reasons: it removes the dependence on a single SCSI bus in the event of failure, and it increases performance by moving data across two different buses simultaneously.

MIRRORING
Data is stored twice (or more) on two or more different disk drives. "Shadowing" is a term commonly used interchangeably with "mirroring". Easy definition: what is written on one disk is also written on another disk.

PARITY
Parity is a mathematical calculation that allows data to be checked for integrity. Note that the term "parity" is used loosely; the net effect is that extra data is generated that allows the stored data to be checked for integrity.
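One common way to generate such check data is a byte-wise XOR across the data chunks. The sketch below assumes XOR parity (the text above does not name a specific calculation), and the names xor_parity and is_consistent are made up for this example. If the stored chunks and the parity chunk are consistent, XOR-ing all of them together yields only zero bytes.

```python
from functools import reduce
from operator import xor


def xor_parity(chunks: list[bytes]) -> bytes:
    """Return the byte-wise XOR of a list of equal-length chunks."""
    if len({len(c) for c in chunks}) != 1:
        raise ValueError("all chunks must have the same length")
    return bytes(reduce(xor, column) for column in zip(*chunks))


def is_consistent(chunks: list[bytes], parity: bytes) -> bool:
    """True if the data chunks still match the stored parity chunk."""
    return not any(xor_parity(chunks + [parity]))


data_chunks = [b"\x0f\xf0", b"\x33\x55", b"\xaa\x01"]
parity = xor_parity(data_chunks)
print(parity.hex())                        # 96a4
print(is_consistent(data_chunks, parity))  # True

# Flip one bit in the second chunk: the check now fails.
damaged = [data_chunks[0], b"\x32\x55", data_chunks[2]]
print(is_consistent(damaged, parity))      # False
```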

PARITY DISK
The disk drive (or disk space) used for storing parity information in RAID levels 3, 4 and 5. In RAID 3 and RAID 4 the parity stays on one dedicated disk; in RAID 5 it moves between disks as parity chunks.

PARTITION
The breaking up of a "disk set" into smaller segments. The smaller segments will appear as individual disk drives to the host, while still maintaining the RAID properties of the "disk set".

RAID SET
A "RAID Set" is a specific number of drives grouped together with a single characteristic (e.g., RAID 0, RAID 5). A "RAID Set" can encompass a whole "array" or be a subset of the "array". In other words, there can be multiple "RAID Sets" in an array. A "RAID Set" will present itself to an operating system as an individual disk drive. This will be important later when you are actually buying and configuring a RAID subsystem. "RAID Set" is also referred to as DISK SET.

STRIPING
The breaking up of data so that it can be stored across multiple disk drives. Imagine breaking up an 80 KB file into five 16 KB pieces. These 16 KB pieces would be referred to as "stripes". Another commonly used term is CHUNKS.

TRANSFER RATE
How fast the RAID set or subsystem can transfer the data to the host. "Transfer Rate" is important when large contiguous blocks of data are being used. Video and image files are examples of large contiguous file transfers that occur in streaming mode. "Transfer Rate" is important when deciding what level of RAID you will implement.

RAID 0  -  Data Striping


RAID 0 allows a number of disk drives to be combined and presented as one large disk. RAID 0 does not provide any data redundancy - if one drive fails, all data is lost.
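To make the striped layout concrete, here is a minimal Python sketch that maps a logical chunk number to a disk and a position on that disk. It assumes a simple round-robin layout; the function name locate_chunk is hypothetical, and actual controllers may order chunks differently.

```python
def locate_chunk(chunk_index: int, num_disks: int) -> tuple[int, int]:
    """Map a logical chunk number to (disk number, chunk slot on that disk).

    Assumes chunks are dealt out round-robin across the disks.
    """
    disk = chunk_index % num_disks
    slot = chunk_index // num_disks
    return disk, slot


# Six consecutive chunks spread over a three-disk RAID 0 set.
layout = [locate_chunk(i, 3) for i in range(6)]
print(layout)  # [(0, 0), (1, 0), (2, 0), (0, 1), (1, 1), (2, 1)]
```

Because consecutive chunks land on different disks, no chunk exists in more than one place, which is why the loss of any one drive loses the whole set.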

Access Time                  Very Good
Transfer Rate                Good
Redundancy                   None
Cost Per Megabyte Penalty    None
Applications                 Large disk requirements, high performance databases


RAID 1 -  Disk Mirroring/Disk Duplexing

RAID 1 mirrors (shadows) one disk drive to another. All data is stored twice on two or more identical disk drives. When one disk drive fails, all data is immediately available on the other without any impact on data integrity, although performance is reduced while the set runs in degraded mode. Performance is gained by splitting functions. If multiple read requests are pending, the RAID controller will allow reads from different disk drives. If one disk is busy writing, the other disk drive can supply read data; at a later time, the RAID controller will update the read drive with data from the already written disk drive. If each disk drive is connected through a separate SCSI channel, this is called "Disk Duplexing" (additional security and performance). RAID 1 represents a simple and highly efficient solution for data security and system availability. Use RAID 1 when large volumes of data are not required.
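The behavior described above can be sketched with a toy in-memory model. The MirrorSet class below is purely illustrative: each "disk" is a Python dict, writes go to every working member, and reads rotate across working members. It does not model deferred updates, rebuilds or any real controller interface.

```python
class MirrorSet:
    """Toy RAID 1 set: each 'disk' is a dict mapping block number to data."""

    def __init__(self, num_disks: int = 2):
        self.disks = [dict() for _ in range(num_disks)]
        self.failed = set()
        self._next_read = 0

    def write(self, block: int, data: bytes) -> None:
        # Every write goes to every working member of the set.
        for index, disk in enumerate(self.disks):
            if index not in self.failed:
                disk[block] = data

    def read(self, block: int) -> bytes:
        # Rotate across working members so pending reads are spread out.
        working = [i for i in range(len(self.disks)) if i not in self.failed]
        if not working:
            raise IOError("all members of the mirror set have failed")
        choice = working[self._next_read % len(working)]
        self._next_read += 1
        return self.disks[choice][block]

    def fail(self, disk_index: int) -> None:
        # Simulate a drive failure: its contents are no longer reachable.
        self.failed.add(disk_index)
        self.disks[disk_index] = {}


mirror = MirrorSet()
mirror.write(0, b"payroll")
mirror.fail(0)           # lose the first drive
print(mirror.read(0))    # b'payroll', served by the surviving drive
```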

Access Time                  Very Good
Transfer Rate                Good
Redundancy                   Yes
Cost Per Megabyte Penalty    100% or more
Applications                 Small disk capacities that require redundancy


RAID 0 + 1  -  Combination of RAID 1 and RAID 0
 

The idea behind RAID 0+1 is simply the combination of RAID 0 (performance) and RAID 1 (data security). RAID 0+1 disk sets offer good performance and data security. As with RAID 0, optimum performance is achieved in highly sequential load situations. The major drawback is a 100% "Cost Per Megabyte Penalty".

Access Time                  Very Good
Transfer Rate                Good
Redundancy                   Yes
Cost Per Megabyte Penalty    100%
Applications                 Multiuser environments, database servers, file serving, web site hosting


RAID 3 - Data Bit Striping With a Dedicated Parity Drive
 

The data is striped at a byte/bit level across the disk drives. Additionally, the controller calculates parity information which is stored on a separate disk drive (aP, bP, ...). Even when one disk drive fails, all data is fully available. The missing data can be recalculated from the data still available and the parity information. This data calculation can also be used to restore data to a replaced defective disk. Because the data must be presented at the same time, the disk drive spindles must be synchronized for RAID 3 to be effective. This represents a practical implementation problem for RAID 3. Many RAID controller manufacturers are moving to a RAID 4 solution or using the term RAID 3 merely as a recognized marketing term for high data transfer capability.
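The recalculation of missing data can be sketched as follows, assuming the parity is a byte-wise XOR of the data on the other drives (a common scheme, though the text does not specify the calculation). The helper name xor_blocks and the sample data are invented for this example.

```python
from functools import reduce
from operator import xor


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(xor, column) for column in zip(*blocks))


# Data held on three data drives, plus the block on the parity drive.
stripe = [b"DATA", b"ON A", b"RAID"]
parity = xor_blocks(stripe)

# Drive 1 fails: XOR the surviving data blocks with the parity block.
lost = 1
survivors = [block for i, block in enumerate(stripe) if i != lost]
rebuilt = xor_blocks(survivors + [parity])

print(rebuilt)  # b'ON A'
```

The same calculation serves both purposes mentioned above: answering reads while a drive is missing, and refilling a replacement drive.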

Access Time                  Good
Transfer Rate                Very Good
Redundancy                   Yes
Cost Per Megabyte Penalty    Varies. 5 drive set = 20%, 6 drive set = 17%, 10 drive set = 10%
Applications                 Imaging, geological, seismological, video
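The penalty figures quoted for parity-based sets are consistent with one drive's worth of capacity out of the whole set being given over to parity. A small sketch of that arithmetic, assuming equal-size drives and a hypothetical helper named parity_penalty_percent:

```python
def parity_penalty_percent(num_drives: int) -> float:
    """Share of total capacity used for parity, as a percentage.

    Assumes equal-size drives and one drive's worth of parity per set.
    """
    if num_drives < 3:
        raise ValueError("a parity set needs at least three drives")
    return 100.0 / num_drives


for drives in (5, 6, 10):
    print(drives, round(parity_penalty_percent(drives)))
# 5 20
# 6 17
# 10 10
```

The larger the set, the smaller the share of capacity spent on redundancy.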


RAID 4 - Data Striping With a Dedicated Parity Drive


RAID 4 stripes data across disk drives just as RAID 0 does. Additionally, the controller calculates parity information, which is stored on a separate disk drive (P1, P2, ...). Even when one disk drive fails, all data is fully available. The missing data can be recalculated from the data still available and the parity information. This data calculation can also be used to restore data to a replaced defective disk. RAID 4 offers excellent transfer rates when used with large contiguous blocks of data. When used with many small data blocks, the parity disk drive becomes a throughput bottleneck because of its fixed position. A RAID 4 disk set can only lose one disk from its RAID set. Losing another disk drive before a replacement is restored will lose all data in the RAID set.

Access Time                  Good
Transfer Rate                Very Good
Redundancy                   Yes
Cost Per Megabyte Penalty    Varies. 5 drive set = 20%, 6 drive set = 17%, 10 drive set = 10%
Applications                 Imaging, geological, seismological, video


RAID 5 -  Data Striping with Striped Parity  

 
The data is striped across disk drives. Unlike RAID 4, the parity data in a RAID 5 set is striped across all disk drives. RAID 5 is designed to handle small data blocks. This makes RAID 5 the level of choice for multitasking, multiuser and database environments. RAID 5 offers the same level of security as RAID 4: when one disk drive fails, all data is fully available, and the missing data is recalculated from the data still available and the parity information. This data calculation can also be used to restore data to replaced defective disks. RAID 5 is particularly suited for systems with medium to large capacity requirements, because its "Cost Per Megabyte Penalty" is relatively low. A RAID 5 disk set can only lose one disk from its RAID set. Losing another disk drive before a replacement is restored will lose all data in the RAID set.
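The difference between a fixed parity drive and striped parity can be shown by counting which drive receives the parity update for each stripe. The sketch below is an assumption-laden model: the rotation order in parity_disk is only one possible layout (controllers differ), and the function names are invented for this example.

```python
from collections import Counter


def parity_disk(stripe: int, num_disks: int, rotate: bool) -> int:
    """Return the disk that holds parity for a given stripe.

    rotate=False models a fixed parity drive (RAID 4 style);
    rotate=True moves parity to a different drive on each stripe (RAID 5 style).
    """
    if rotate:
        return (num_disks - 1 - stripe) % num_disks
    return num_disks - 1


def parity_writes(num_stripes: int, num_disks: int, rotate: bool) -> Counter:
    """Count parity updates per disk when every stripe is written once."""
    return Counter(parity_disk(s, num_disks, rotate) for s in range(num_stripes))


fixed = parity_writes(100, 5, rotate=False)
rotating = parity_writes(100, 5, rotate=True)

print(sorted(fixed.items()))     # [(4, 100)]
print(sorted(rotating.items()))  # [(0, 20), (1, 20), (2, 20), (3, 20), (4, 20)]
```

With a fixed parity drive, every small write queues up on the same disk; with rotation, the parity load is shared evenly, which is the reason RAID 5 copes better with many small data blocks.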

Access Time                  Very Good
Transfer Rate                Good
Redundancy                   Yes
Cost Per Megabyte Penalty    Varies. 5 drive set = 20%, 6 drive set = 17%, 10 drive set = 10%
Applications                 Multiuser environments, database servers, file serving, web site hosting


JBOD -  Just a Bunch Of Disks


Virtually all RAID controller manufacturers allow a single disk to be added under the RAID controller without making it part of any RAID Set. A "JBOD" disk drive appears to the host as an add-on disk drive. Using JBODs is a convenient way of adding quick storage. If a JBOD disk drive breaks, all data on it is lost.

Access Time                  Good
Transfer Rate                Good
Redundancy                   No
Cost Per Megabyte Penalty    None
Applications                 Quick increase in capacity


Mixing RAID Sets in a Disk Array
The following diagram illustrates that many different RAID levels and capacities can be configured within a Disk Array. This allows a RAID user to match RAID level characteristics to many different performance and capacity requirements.

 
