Hitachi's Drive Temperature Indicator Processor (Drive-TIP) helps ensure
high drive reliability
by Gary Herbst
Operating electronic components such as disk drives at high temperatures can dramatically reduce their reliability. In many
computer systems, failures in cooling components (such as clogged filters on fans) can go undetected for an extended time.
The resulting stress can lead to unexpected failures and even data loss. To prevent this from happening, Hitachi Global Storage Technologies has integrated
temperature sensors into its Ultrastar server disk drives. High temperature conditions are reported to
the host system using the Self-Monitoring Analysis and Reporting Technology (S.M.A.R.T.) standard. Once the computer system
is alerted to any temperature problems, the user or system administrator can take action.
This white paper describes how a new Ultrastar feature, the Drive Temperature Indicator Processor (Drive-TIP), works and how
it benefits users of data-intensive applications.
Today's applications require outstanding drive reliability
Network computing has elevated the role of servers from supporting small departmental workgroups to providing essential information
and services for the world's largest enterprises. Today, servers and workstations are being called upon to deliver mission-critical
applications to more people than ever before. From collaborative workgroup applications to image processing, video editing to data
mining, OnLine Transaction Processing (OLTP) to OnLine Analytical Processing (OLAP), today's data-intensive applications are
placing much higher demands on disk storage devices. In turn, these devices must provide more reliable access to much more data
faster than ever before.
Figure 1: Together, PFA and Drive-TIP work to provide the industry's best information for preventing drive failures
When it comes to capacity, performance, and reliability, one name stands above the rest: Hitachi's Ultrastar family of high-capacity,
high-performance disk drives. Hitachi's Ultrastar was the first drive family to implement the
features now defined in the S.M.A.R.T. standard.
Predictive Failure Analysis (PFA) monitors parameters such as head flying height, noise and signal amplitude,
signal coherence, and writing parameters. PFA predicts impending drive failures using algorithms that are robust enough to
help avoid failing good drives.
Likewise, Hitachi is first to market with temperature-sensing drives. Following on to PFA, the Drive-TIP feature is also
expected to find widespread use as an aid to improving data availability.
Heat has a major effect on drive reliability
Disk drives are complex electro-mechanical devices that can suffer performance degradation or failures due to a single
event or a combination of events occurring over time. Environmental conditions that affect drive reliability include ambient
temperature, cooling air flow rate, voltage, duty cycle, shock/vibration, and relative humidity. Fortunately, it is possible to
predict certain types of failures by measuring environmental conditions. One of the worst enemies of hard disk drives is heat.
Within a drive, the reliability of both the electronics and the mechanics (such as the spindle motor and actuator bearings)
degrades as temperature rises. Running any disk drive at extreme temperatures for long periods of time is detrimental and can
eventually lead to permanent data loss.

Figure 2: Drive reliability decreases significantly as temperature rises above recommended levels
Figure 2 shows the dramatic effect that temperature has on the overall reliability of a hard disk drive. Deviations from
a nominal operating temperature (assumed to be maintained over the life of a drive) can result in a deviation from the
nominal failure rate. As the temperature exceeds the recommended level, the failure rate increases two to three percent for
every one-degree rise above it. For example, a hard disk drive running for an extended period of time at five degrees above the
recommended temperature can experience an increase in failure rate of 10 to 15 percent. Conversely, operating a drive below the
recommended temperature can extend drive life.
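The rule of thumb above can be expressed as a short calculation. The following is a minimal sketch, assuming the increase scales linearly with the number of degrees above the recommended temperature (which matches the five-degree example). The function name and the choice to report no increase at or below the recommended temperature are illustrative assumptions, not part of any Hitachi specification.

```python
def failure_rate_increase(degrees_above, low_pct=2.0, high_pct=3.0):
    """Return the (low, high) percent increase in failure rate.

    Assumes a linear increase of two to three percent per degree
    above the recommended temperature, as in the example in the text.
    """
    # Assumption: at or below the recommended temperature, report no increase.
    excess = max(0.0, degrees_above)
    return (excess * low_pct, excess * high_pct)


low, high = failure_rate_increase(5)
print(f"{low:.0f} to {high:.0f} percent")  # prints "10 to 15 percent"
```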
Several failure modes within a disk drive are exacerbated by temperature. Thermal tilt of the disk stack and actuator
arms can occur very quickly and cause off-track writes, corrupting data on adjacent cylinders. Outgassing of the lubricants
in the spindle motor and voice coil motor occurs at high temperatures sustained over a relatively short period (30 to 60 days),
which can lead to stiction failures or a possible head crash. Over an extended period of time, the bearings can wear out and
cause mechanical failures.
Heat can build up within computer systems due to a clogged fan, failure of air conditioning in a room, operating more
drives than the cooling system can handle, and so on. Unfortunately, these conditions can go completely unnoticed until a
failure occurs. Because of the essential nature of today's workstations and servers, such risks are unacceptable for many users.
What is needed is a way to identify high-temperature situations before they affect data integrity.
Drive-TIP helps warn of extreme temperatures
Because disk drives are the most critical component for retaining vital information, Hitachi created Drive-TIP (see Figure 3)
specifically to protect its drives from the long-term effects of excessive temperature. Drive-TIP
automatically monitors the temperature within the drive and alerts the drive controller when the drive exceeds its maximum
allowable temperature. This is accomplished through an electronic temperature sensor mounted on the back side of the electronics
card close to the base casting and the spindle motor hub. The output of the temperature sensor is continuously monitored by the
drive's microprocessor.
Two temperature trip points have been preprogrammed into Ultrastar drives. The first trip point is defined by the system
provider (or in some cases the system administrator) in the Vendor Unique Parameter Mode page (00h) in the drive. Typically,
this is set to the expected nominal temperature. The default is 50 degrees Celsius. The second trip point is 65 degrees
Celsius, the maximum allowable temperature of the base casting.
If the first temperature trip point is exceeded, Drive-TIP sets an internal flag in the drive. A warning is sent to the
drive controller when the PFA interval timer expires. The Information Exception Control (IEC) mode page (1Ch) controls
the interval for posting the PFA errors and warnings.

Figure 3: Back side of card showing the thermal sensor
The drive microprocessor reads the temperature when the drive is powered on and every 25 minutes thereafter, as part of the
Drive-TIP algorithm. The temperature warning is generated in compliance with the SCSI-3 standard as defined in Figure
4, which is a portion of Table 66 (ASC and ASCQ Assignments) in the SCSI-3 Primary Commands (SPC) document. A unique Unit
Error Code (UEC) of 22F is also returned on a subsequent Request Sense command.
Figure 4: SCSI-3 Definition of temperature warnings
ASC and ASCQ Assignments
| ASC | ASCQ | DTLPWRSOMCAE | DESCRIPTION |
| 61h | 00h | S | Video acquisition error |
| 65h | 00h | DTLPWRSOMCAE | Voltage fault |
| 0Bh | 00h | DTLPWRSOMCAE | Warning |
| 0Bh | 02h | DTLPWRSOMCAE | Warning - enclosure degraded |
| 0Bh | 01h | DTLPWRSOMCAE | Warning - specified temperature exceeded |
| 50h | 00h | T | Write append error |
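Host software that has already extracted the ASC and ASCQ values from returned sense data could recognize the temperature warning with a simple lookup. The sketch below is illustrative only: the dictionary holds just the rows shown in Figure 4, and the function names are hypothetical rather than part of any SCSI library.

```python
# ASC/ASCQ pairs taken from the excerpt in Figure 4.
ASC_ASCQ_DESCRIPTIONS = {
    (0x61, 0x00): "Video acquisition error",
    (0x65, 0x00): "Voltage fault",
    (0x0B, 0x00): "Warning",
    (0x0B, 0x02): "Warning - enclosure degraded",
    (0x0B, 0x01): "Warning - specified temperature exceeded",
    (0x50, 0x00): "Write append error",
}

TEMPERATURE_WARNING = (0x0B, 0x01)


def describe_sense(asc, ascq):
    """Return the description for an ASC/ASCQ pair, or a generic fallback."""
    fallback = f"Unknown (ASC {asc:02X}h, ASCQ {ascq:02X}h)"
    return ASC_ASCQ_DESCRIPTIONS.get((asc, ascq), fallback)


def is_temperature_warning(asc, ascq):
    """True when the pair is 'Warning - specified temperature exceeded'."""
    return (asc, ascq) == TEMPERATURE_WARNING


print(describe_sense(0x0B, 0x01))
```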
When the first temperature trip point is exceeded, the sampling period changes from 25 minutes to 15 minutes. Also, a
log entry is made in the permanent drive error log that includes the temperature and Power-On Hours (POH) when it occurred.
As long as the temperature remains above the first trip point, the drive continues to create log entries. If the temperature exceeds
the 65-degree trip point, the sampling period changes from 15 minutes to 10 minutes. Log entries in the permanent drive error
log continue at the 10-minute interval.
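The sampling schedule described above can be summarized in a few lines. This is a minimal sketch, not the drive firmware: it assumes the interval depends only on the most recent reading and that "exceeds" means strictly greater than the trip point. The function and constant names are hypothetical.

```python
FIRST_TRIP_DEFAULT_C = 50  # default first trip point, per the text
SECOND_TRIP_C = 65         # maximum allowable base casting temperature


def sampling_interval_minutes(temp_c, first_trip_c=FIRST_TRIP_DEFAULT_C):
    """Return the minutes until the next temperature sample."""
    if temp_c > SECOND_TRIP_C:
        return 10  # above the second trip point
    if temp_c > first_trip_c:
        return 15  # above the first trip point only
    return 25      # normal operation


def should_log(temp_c, first_trip_c=FIRST_TRIP_DEFAULT_C):
    """A log entry is made while the reading is above the first trip point."""
    return temp_c > first_trip_c


for reading in (45, 55, 70):
    print(reading, sampling_interval_minutes(reading), should_log(reading))
```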
All log entries in the media error or hardware error logs also include the temperature at the time of the error. All unit
starts and unit stops also include the temperature. In addition, the disk drive records the accumulated power-on hours that
the temperature is above each trip point and the maximum temperature experienced during the life of the drive. This information
is stored in the non-customer data cylinders on the drive.
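The lifetime statistics described above amount to a small set of running totals. The sketch below shows one way such bookkeeping could work. It assumes each reading is attributed to the whole interval since the previous sample, and the class and field names are hypothetical; the actual on-drive format is not described in this paper.

```python
from dataclasses import dataclass


@dataclass
class ThermalHistory:
    """Running totals of time spent above each trip point (illustrative)."""
    first_trip_c: float = 50.0
    second_trip_c: float = 65.0
    hours_above_first: float = 0.0
    hours_above_second: float = 0.0
    max_temp_c: float = float("-inf")

    def record(self, temp_c, elapsed_hours):
        """Fold one temperature sample into the lifetime totals."""
        self.max_temp_c = max(self.max_temp_c, temp_c)
        if temp_c > self.first_trip_c:
            self.hours_above_first += elapsed_hours
        if temp_c > self.second_trip_c:
            self.hours_above_second += elapsed_hours


history = ThermalHistory()
history.record(45, 0.5)   # below both trip points
history.record(55, 0.25)  # above the first trip point
history.record(70, 0.25)  # above both trip points
print(history.hours_above_first, history.hours_above_second, history.max_temp_c)
```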
Applications of Drive-TIP
Drive-TIP works with the industry-wide S.M.A.R.T. standard developed to monitor and predict device performance and reliability.
Today, S.M.A.R.T. capable workstations and servers can warn users of some pending device failures so that action can be taken
before data is lost or operations impacted. With sufficient warning, users can back up vital data and replace suspect devices.
Now, with the addition of Drive-TIP, Hitachi's S.M.A.R.T. capability is enhanced to provide new levels of data integrity and
availability. Figure 5 summarizes how Drive-TIP works with S.M.A.R.T.

Figure 5: Summary of the Drive-TIP function
If the warning is recognized by system users or administrators, corrective action can actually save data. Systems now have the
information to vary cooling capacity based on component needs. For example, fan speed can be controlled based on temperature
within the system, producing better reliability for customer data.
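As an illustration of temperature-based fan control, a system might ramp fan speed linearly as the reported drive temperature approaches the first trip point. The sketch below is purely hypothetical: the temperature thresholds, duty-cycle limits, and function name are assumptions chosen for the example, not values recommended by Hitachi.

```python
def fan_duty_percent(temp_c, low_c=40.0, high_c=50.0,
                     min_duty=30.0, max_duty=100.0):
    """Map a drive temperature to a fan duty cycle with a linear ramp.

    All thresholds and duty limits here are illustrative assumptions.
    """
    if temp_c <= low_c:
        return min_duty
    if temp_c >= high_c:
        return max_duty
    fraction = (temp_c - low_c) / (high_c - low_c)
    return min_duty + fraction * (max_duty - min_duty)


print(fan_duty_percent(45))  # halfway up the ramp
```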
The Ultrastar Family: Storage Solutions for Data-Intensive Applications
The Ultrastar family of server drives provides two ways of predicting future drive
failures: PFA and Drive-TIP. End users can use this information to correct problems before they result in data
loss, and systems providers can use it to optimize the design of their computer systems.
Product description data represents design objectives and is provided for comparative purposes; actual results may vary
depending on a variety of factors. Product claims are true as of the date of the first printing. This product data does
not constitute a warranty. Questions regarding Hitachi warranty terms or the methodology used to derive this data should be
referred to a Hitachi representative. Data subject to change without notice.
All Rights Reserved
The following are trademarks or registered trademarks of Hitachi in the
United States, other countries, or both: Hitachi, Ultrastar, Drive-TIP, No-ID,
and Predictive Failure Analysis.
Other product names are trademarks or registered trademarks of their respective
companies.
References in this publication to Hitachi products, programs, or services
do not imply that Hitachi intends to make them available in all countries
in which Hitachi operates.