Is my hard drive healthy?

The hard drive is rarely the first suspect in a bottleneck case; we usually blame the applications installed on the server.

People often think that disk-related performance issues come down to either disk corruption or insufficient disk space, but Physical Disk: % Disk Time and Physical Disk: Current Disk Queue Length are equally important metrics, and they need to be read together. There are a few other ways to detect hard drive problems using other metrics, but for now I will focus only on these two performance counters.

Physical Disk: % Disk Time measures the percentage of time that the disk is in use. If it stays above 90%, the system is struggling.

Physical Disk: Current Disk Queue Length counts both the requests being served and the requests currently waiting for disk access. This number should fluctuate, and it should not exceed 1.5 to 2 times the number of spindles¹ that make up the physical disk.
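To look at these two counters outside of the Performance Monitor graph, one option is to log them with the built-in typeperf tool and read the resulting CSV. The sketch below is a minimal Python example under stated assumptions: the file was produced by a command like the one in the comment, with % Disk Time requested first and Current Disk Queue Length second. The function name parse_typeperf_csv, the host name SERVER01, and the sample values are made up for illustration.

```python
import csv
import io

# Assumed collection command (run on the Windows server):
#   typeperf "\PhysicalDisk(_Total)\% Disk Time" "\PhysicalDisk(_Total)\Current Disk Queue Length" -si 5 -sc 60 -f CSV -o disk.csv


def parse_typeperf_csv(text):
    """Return a list of (disk_time_percent, queue_length) tuples.

    Assumes a header row followed by one row per sample, with the
    timestamp in the first column and the two counters after it, in
    the order they were requested.
    """
    rows = csv.reader(io.StringIO(text))
    next(rows, None)  # skip the header row that lists the counter paths
    samples = []
    for row in rows:
        if len(row) < 3:
            continue  # blank or truncated line
        try:
            samples.append((float(row[1]), float(row[2])))
        except ValueError:
            continue  # skip samples where a counter value is missing
    return samples


# Made-up sample data, for illustration only.
SAMPLE = r'''"(PDH-CSV 4.0)","\\SERVER01\PhysicalDisk(_Total)\% Disk Time","\\SERVER01\PhysicalDisk(_Total)\Current Disk Queue Length"
"11/18/2010 07:21:00.000","35.2","1.000000"
"11/18/2010 07:21:05.000"," ","0.000000"
"11/18/2010 07:21:10.000","93.7","5.000000"
'''

if __name__ == "__main__":
    for busy, queue in parse_typeperf_csv(SAMPLE):
        print(f"% Disk Time = {busy:5.1f}   queue length = {queue:.0f}")
```

Rows with a missing value are dropped rather than treated as zero, so a gap in the log does not look like an idle disk.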

Figure 1: A healthy hard drive. Notice that the Current Disk Queue Length (green line) is sometimes high, but this is not an indication of a bottleneck, since % Disk Time (red line) stays below 90%.

Figure 2: When % Disk Time stalls at a high value, e.g., above 90% (the vertical red line), you must check the Current Disk Queue Length (red circle). If the queue length exceeds 2 or 4 (depending on the number of spindles), this is a good indication of a bottleneck.
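The rule illustrated by the two figures can be sketched in a few lines of Python. This is only one way to encode it, under assumptions that are not in the text above: a "stall" is taken to mean at least min_run consecutive samples at or above 90%, and the queue limit is taken as 2 times the number of spindles (the upper end of the 1.5 to 2 range). The function name find_bottlenecks and the sample values are hypothetical.

```python
def find_bottlenecks(samples, spindles, busy_threshold=90.0, min_run=3):
    """Return (start, end) index pairs of sustained busy runs with a long queue.

    samples: sequence of (disk_time_percent, queue_length) tuples.
    A run is flagged when % Disk Time stays at or above busy_threshold
    for at least min_run consecutive samples (an assumed definition of
    a stall) and the queue length exceeds 2 * spindles at least once
    inside that run.
    """
    samples = list(samples)
    queue_limit = 2 * spindles
    flagged = []
    run_start = None
    # The extra idle sample at the end closes a run that reaches the last sample.
    for i, (busy, _queue) in enumerate(samples + [(0.0, 0.0)]):
        if busy >= busy_threshold:
            if run_start is None:
                run_start = i
            continue
        if run_start is not None:
            run = samples[run_start:i]
            if len(run) >= min_run and max(q for _, q in run) > queue_limit:
                flagged.append((run_start, i - 1))
            run_start = None
    return flagged


# Made-up samples, for illustration only.
# Like Figure 1: the queue spikes, but % Disk Time stays below 90%.
healthy = [(40, 5), (55, 1), (30, 6), (20, 0)]
# Like Figure 2: % Disk Time stalls above 90% while the queue is long.
stalled = [(50, 1), (95, 3), (97, 5), (99, 6), (96, 4), (60, 1)]

if __name__ == "__main__":
    print(find_bottlenecks(healthy, spindles=2))
    print(find_bottlenecks(stalled, spindles=2))
```

With the same stalled samples and 4 spindles, the limit rises to 8 and nothing is flagged, which matches the point that the acceptable queue length depends on the number of spindles.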

Solution:

If you confirm that the hard drive is having issues, here are some steps to follow:

1- Defragment the server's disks. It is strongly recommended to do this during off hours.

2- Move some heavily used files and folders, such as log files and the mail queue (if possible), to another disk (not a partition) or another server

3- Run the command CHKDSK (without /F, so that it only reports and does not change anything) to see if you have any disk problems

4- If you are using RAID, make sure that you are using write-back mode

5- Or, as a last resort, get a new hard drive with a higher rotational speed (10,000 or 15,000 rpm).

¹The spindle is a shaft that holds the hard disk assembly and rotates the platter(s) at a speed that ranges from 5400 to 15000 rpm

1 comment

Mike Petsalis November 18, 2010 at 7:21 am

Indeed, Al. How many times do Support teams spend hours debugging a case, running through all applications, only to find the issue is a faulty drive? And worse, the faults occur unpredictably and are very hard to reproduce. Prevention is the key, along with maintenance of drive health.

How about a post on those cases?
