Thursday, February 18, 2016

Checking hard disk status

A hard disk is a consumable part of a system: after some time in use it gradually wears out and may eventually fail.

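In general, unless a disk is badly broken, it is hard to notice that something is wrong with it. The commands below check a disk's status and show related information about it. The commands used are: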

  • smartctl -H /dev/sda
  • smartctl -a /dev/sda
  • hdparm -I /dev/sda
  • hdparm -Tt /dev/sda

If the system has many disks, or you do not know how many disks are installed, the following commands can be used:
  • for i in sd{a..c}; do dev=/dev/$i; echo $dev; smartctl -H $dev | grep -e 'Health Status' -e 'overall-health'; done
  • for i in `lsblk -io NAME,TYPE | grep disk | cut -d' ' -f1 | sort`; do dev=/dev/$i; echo $dev; smartctl -H $dev | grep overall-health; done
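The first line checks the health of three disks, sda, sdb, and sdc. It requires the smartmontools package.
The second line uses lsblk (part of util-linux; install it if it is missing) to find the disks on the system by itself.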
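
The two one-liners above can also be split into small reusable pieces. The following is only a sketch: the function names disk_names, health_verdict, and check_all_disks are made up for this example, and the driver assumes root privileges plus an installed smartctl, so it is defined but not called here.

```shell
#!/usr/bin/env bash
# Sketch: list whole disks and summarize each disk's SMART health line.

# Read `lsblk -dn -o NAME,TYPE` output on stdin and print the names whose
# TYPE is exactly "disk" (-d hides partitions, -n hides the header line).
disk_names() {
    awk '$2 == "disk" { print $1 }'
}

# Read `smartctl -H` output on stdin and print only the verdict.
# ATA disks report "overall-health ... result: PASSED";
# SCSI/SAS disks report "SMART Health Status: OK".
health_verdict() {
    awk -F': *' '/overall-health|Health Status/ { print $2; found = 1 }
                 END { if (!found) print "UNKNOWN" }'
}

# Hypothetical driver: needs root and smartmontools, so it is only defined.
check_all_disks() {
    local name
    for name in $(lsblk -dn -o NAME,TYPE | disk_names); do
        printf '/dev/%s: %s\n' "$name" \
            "$(smartctl -H "/dev/$name" | health_verdict)"
    done
}
```

Running check_all_disks as root would print one line per disk, for example the device path followed by PASSED. Matching the TYPE column exactly avoids picking up a device whose name merely contains the word "disk".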

Viewing a disk's S.M.A.R.T. information with smartctl

# smartctl -H /dev/sda
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
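PASSED here means the disk's overall self-assessment found no problem, so it can be used with confidence. The -a option shows more detail.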
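
Besides the printed verdict, smartctl also reports through its exit status, which its man page describes as a bit mask. A minimal sketch for decoding it in a script follows; the function name decode_smartctl_status is made up for this example, and the messages are paraphrased from the EXIT STATUS section of the man page rather than quoted.

```shell
#!/usr/bin/env bash
# Sketch: decode the exit status of smartctl, which is a bit mask.
# Bit meanings are paraphrased from the smartctl man page (EXIT STATUS).

decode_smartctl_status() {
    local status=$1 bit
    local -a meaning=(
        "command line did not parse"
        "device open failed or device is in low-power mode"
        "a SMART or ATA command to the disk failed"
        "SMART status check returned DISK FAILING"
        "prefail attributes found at or below threshold"
        "attributes were at or below threshold in the past"
        "device error log contains errors"
        "self-test log contains errors"
    )
    if (( status == 0 )); then
        echo "no problems reported"
        return
    fi
    # Report every bit that is set in the status value.
    for bit in 0 1 2 3 4 5 6 7; do
        if (( status & (1 << bit) )); then
            echo "bit $bit: ${meaning[bit]}"
        fi
    done
}
```

A typical use would be to run smartctl -H /dev/sda and then call decode_smartctl_status "$?" on the next line. Because several bits can be set at once, comparing the status against a single number is not enough.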

# smartctl -a /dev/sda
=== START OF INFORMATION SECTION === (information about the disk hardware)
Model Family:     Hitachi Deskstar 7K3000 (brand and model)
Device Model:     Hitachi HDS723020BLA642
Serial Number:    (disk serial number)
LU WWN Device Id: 5 000cca 369d132fc
Firmware Version: MN6OA580
User Capacity:    2,000,398,934,016 bytes [2.00 TB] (disk capacity)
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Wed Feb 17 19:39:18 2016 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION === (SMART information for the disk)
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84) Offline data collection activity
                                        was suspended by an interrupting command from host.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (19665) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 255) minutes.
SCT capabilities:              (0x003d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

The detailed data starts here. Pay closer attention to the attributes that carry an annotation.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0 (raw read error rate; lower is better)
  2 Throughput_Performance  0x0005   135   135   054    Pre-fail  Offline      -       86
  3 Spin_Up_Time            0x0007   150   150   024    Pre-fail  Always       -       410 (Average 356)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       63
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0 (reallocated sector count; so-called bad sectors should show up here; lower is better)
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   133   133   020    Pre-fail  Offline      -       27
  9 Power_On_Hours          0x0012   095   095   000    Old_age   Always       -       37459
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0 (spin retry count: how many times the spindle motor had to retry spinning up; lower is better)
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       63
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       116
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       116
194 Temperature_Celsius     0x0002   200   200   000    Old_age   Always       -       30 (Min/Max 23/47) (disk temperature)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0 (reallocation event count; lower is better)
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0 (sectors waiting to be reallocated, i.e. the number of unstable sectors; lower is better)
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0 (uncorrectable sector count, i.e. the number of bad sectors; lower is better)
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

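Note that Pre-fail in the TYPE column does not mean the disk is about to fail. TYPE only classifies the attribute: a Pre-fail attribute is one that signals imminent failure if its normalized VALUE drops to or below THRESH, while an Old_age attribute reflects normal wear. Whether an attribute has actually failed is shown in the WHEN_FAILED column, which is "-" for every row here. On the disk above, attributes 1, 5, and 10 all have a raw value of 0 and are still labelled Pre-fail.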
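For more detailed descriptions of the S.M.A.R.T. attributes, see the Chinese Wikipedia article. Note the attributes shown there on a pink background: values outside the safe range for those have a serious impact.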
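
The reading of the attribute table described above can be automated. The sketch below assumes the ten-column layout shown in this post (WHEN_FAILED in column 9, RAW_VALUE in column 10); the function name flag_attributes and the choice to watch attributes 5, 197, and 198 are assumptions of this example, not something smartctl prescribes.

```shell
#!/usr/bin/env bash
# Sketch: flag attributes in `smartctl -A` output that deserve a closer look.

# Reads the attribute table on stdin. A row is reported when
#   - WHEN_FAILED (column 9) is not "-", or
#   - the attribute is 5, 197 or 198 and its RAW_VALUE (column 10) is non-zero.
# Watching 5/197/198 is this sketch's assumption, based on the annotated
# sector-related attributes in the table above.
flag_attributes() {
    awk '
        $1 ~ /^[0-9]+$/ && NF >= 10 {
            if ($9 != "-")
                printf "%s %s: WHEN_FAILED=%s\n", $1, $2, $9
            else if (($1 == 5 || $1 == 197 || $1 == 198) && $10 + 0 > 0)
                printf "%s %s: raw value %s\n", $1, $2, $10
        }
    '
}
```

It would be used as smartctl -A /dev/sda | flag_attributes. For the disk shown in this post it prints nothing, since every WHEN_FAILED entry is "-" and the sector counters are all 0.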

Viewing disk information with hdparm

# hdparm -I /dev/sda
/dev/sda:

ATA device, with non-removable media
Model Number:       Hitachi HDS723020BLA642 (brand and model)
Serial Number:      (disk serial number)
Firmware Revision:  MN6OA580
Transport:          Serial, ATA8-AST, SATA 1.0a, SATA II Extensions, SATA Rev 2.5, SATA Rev 2.6; Revision: ATA8-AST T13 Project D1697 Revision 0b
Standards:
Used: unknown (minor revision code 0x0029) 
Supported: 8 7 6 5 
Likely used: 8
Configuration:
Logical         max     current
cylinders       16383   16383
heads           16      16
sectors/track   63      63
--
CHS current addressable sectors:   16514064
LBA    user addressable sectors:  268435455
LBA48  user addressable sectors: 3907029168
Logical  Sector size:                   512 bytes
Physical Sector size:                   512 bytes
device size with M = 1024*1024:     1907729 MBytes
device size with M = 1000*1000:     2000398 MBytes (2000 GB) (capacity)
cache/buffer size  = unknown
Form Factor: 3.5 inch (physical size of the disk)
Nominal Media Rotation Rate: 7200 (rotation speed in RPM)
Capabilities:
LBA, IORDY(can be disabled)
Queue depth: 32
Standby timer values: spec'd by Standard, no device specific minimum
R/W multiple sector transfer: Max = 16 Current = 0
Advanced power management level: disabled
DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6 
    Cycle time: min=120ns recommended=120ns
PIO: pio0 pio1 pio2 pio3 pio4 
    Cycle time: no flow control=120ns  IORDY flow control=120ns
Commands/features:
Enabled Supported: (features the disk supports; a * marks the ones that are enabled)
  * SMART feature set
    Security Mode feature set
  * Power Management feature set
  * Write cache
  * Look-ahead
  * Host Protected Area feature set
  * WRITE_BUFFER command
  * READ_BUFFER command
  * NOP cmd
  * DOWNLOAD_MICROCODE
    Advanced Power Management feature set
    Power-Up In Standby feature set
  * SET_FEATURES required to spinup after power up
    SET_MAX security extension
  * 48-bit Address feature set
  * Device Configuration Overlay feature set
  * Mandatory FLUSH_CACHE
  * FLUSH_CACHE_EXT
  * SMART error logging
  * SMART self-test
    Media Card Pass-Through
  * General Purpose Logging feature set
  * WRITE_{DMA|MULTIPLE}_FUA_EXT
  * 64-bit World wide name
  * URG for READ_STREAM[_DMA]_EXT
  * URG for WRITE_STREAM[_DMA]_EXT
  * WRITE_UNCORRECTABLE_EXT command
  * {READ,WRITE}_DMA_EXT_GPL commands
  * Segmented DOWNLOAD_MICROCODE
    unknown 119[7]
  * Gen1 signaling speed (1.5Gb/s)
  * Gen2 signaling speed (3.0Gb/s)
  * Gen3 signaling speed (6.0Gb/s)
  * Native Command Queueing (NCQ)
  * Host-initiated interface power management
  * Phy event counters
  * NCQ priority information
    Non-Zero buffer offsets in DMA Setup FIS
  * DMA Setup Auto-Activate optimization
    Device-initiated interface power management
    In-order data delivery
  * Software settings preservation
  * SMART Command Transport (SCT) feature set
  * SCT LBA Segment Access (AC2)
  * SCT Error Recovery Control (AC3)
  * SCT Features Control (AC4)
  * SCT Data Tables (AC5)
Security: 
Master password revision code = 65534
supported
not enabled
not locked
not frozen
not expired: security count
not supported: enhanced erase
444min for SECURITY ERASE UNIT. 
Logical Unit WWN Device Identifier: 5000cca369d132fc
NAA : 5
IEEE OUI : 000cca
Unique ID : 369d132fc
Checksum: correct
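
The capacity figures in the hdparm -I output above can be cross-checked with a little arithmetic: the LBA48 sector count multiplied by the logical sector size gives the size in bytes, which matches the User Capacity that smartctl reported. A small sketch, with the two input numbers copied from the output above:

```shell
#!/usr/bin/env bash
# Sketch: derive the capacity figures from the LBA48 sector count.
sectors=3907029168      # "LBA48 user addressable sectors" from the output above
sector_size=512         # "Logical Sector size" in bytes

bytes=$(( sectors * sector_size ))
echo "bytes: $bytes"                               # 2000398934016
echo "M = 1024*1024: $(( bytes / 1048576 )) MBytes"  # 1907729
echo "M = 1000*1000: $(( bytes / 1000000 )) MBytes"  # 2000398
```

The integer divisions reproduce the two "device size" lines, which shows that the difference between 1907729 and 2000398 is only a matter of which definition of a megabyte is used.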

# hdparm -Tt /dev/sda
/dev/sda:
 Timing cached reads:   6552 MB in  2.00 seconds = 3277.54 MB/sec
 Timing buffered disk reads:  446 MB in  3.01 seconds = 148.40 MB/sec
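The first figure (-T) does not involve the disk. It reads from the Linux buffer cache, so it is essentially a measure of the throughput of the processor, cache, and memory of the system under test. In other words, it is roughly the fastest this system could read data in theory. Here the system reaches about 3278 MB/s.
The second figure (-t) is the disk's sequential read speed. This measurement is an indication of how fast the drive can sustain sequential data reads under Linux, without any filesystem overhead. Here the disk reads sequentially at about 148 MB/s.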
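
To compare disks or to track one disk over time, the two speeds can be pulled out of the hdparm -Tt output. This is a rough sketch that assumes the output format shown above; the function name read_speeds is made up for this example.

```shell
#!/usr/bin/env bash
# Sketch: pull the MB/sec figures out of `hdparm -Tt` output.
# Reads the output on stdin and prints "cached <n>" and "buffered <n>".
# The speed is the second-to-last field on each timing line.
read_speeds() {
    awk '
        /Timing cached reads/        { print "cached",   $(NF - 1) }
        /Timing buffered disk reads/ { print "buffered", $(NF - 1) }
    '
}
```

It would be used as hdparm -Tt /dev/sda | read_speeds. The hdparm man page suggests repeating the measurement two or three times on an otherwise idle system, so a single run should not be over-interpreted.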
