Emask 0x409 media error - Fixing errors and finding the best solutions to problems

So, the gist of this sad story: my hard drive SUDDENLY started tormenting me while I was trying to install LibreOffice. After the system remounted the partition read-only twice, I began to suspect something was wrong. I looked at dmesg, and there it was. Good grief!

The faint of heart had better not look!

[ 858.617479] ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0
[ 858.617489] ata1.00: irq_stat 0x40000008
[ 858.617497] ata1.00: failed command: READ FPDMA QUEUED
[ 858.617512] ata1.00: cmd 60/00:00:30:e9:78/02:00:2a:00:00/40 tag 0 ncq 262144 in
[ 858.617514] res 41/40:00:89:e9:78/b1:00:2a:00:00/40 Emask 0x409 (media error) <F>
[ 858.617521] ata1.00: status: { DRDY ERR }
[ 858.617526] ata1.00: error: { UNC }
[ 858.621932] ata1.00: configured for UDMA/133
[ 858.621952] ata1: EH complete
[ 861.617427] ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0
[ 861.617438] ata1.00: irq_stat 0x40000008
[ 861.617446] ata1.00: failed command: READ FPDMA QUEUED
[ 861.617461] ata1.00: cmd 60/00:00:30:e9:78/02:00:2a:00:00/40 tag 0 ncq 262144 in
[ 861.617464] res 41/40:00:89:e9:78/b1:00:2a:00:00/40 Emask 0x409 (media error) <F>
[ 861.617471] ata1.00: status: { DRDY ERR }
[ 861.617476] ata1.00: error: { UNC }
[ 861.621883] ata1.00: configured for UDMA/133
[ 861.621902] ata1: EH complete
[ 864.276812] ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0
[ 864.276819] ata1.00: irq_stat 0x40000008
[ 864.276826] ata1.00: failed command: READ FPDMA QUEUED
[ 864.276840] ata1.00: cmd 60/00:00:30:e9:78/02:00:2a:00:00/40 tag 0 ncq 262144 in
[ 864.276843] res 41/40:00:89:e9:78/b1:00:2a:00:00/40 Emask 0x409 (media error) <F>
[ 864.276849] ata1.00: status: { DRDY ERR }
[ 864.276854] ata1.00: error: { UNC }
[ 864.280789] ata1.00: configured for UDMA/133
[ 864.280801] ata1: EH complete
[ 866.967174] ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0
[ 866.967185] ata1.00: irq_stat 0x40000008
[ 866.967193] ata1.00: failed command: READ FPDMA QUEUED
[ 866.967208] ata1.00: cmd 60/00:00:30:e9:78/02:00:2a:00:00/40 tag 0 ncq 262144 in
[ 866.967211] res 41/40:00:89:e9:78/b1:00:2a:00:00/40 Emask 0x409 (media error) <F>
[ 866.967217] ata1.00: status: { DRDY ERR }
[ 866.967222] ata1.00: error: { UNC }
[ 866.971168] ata1.00: configured for UDMA/133
[ 866.971186] ata1: EH complete
[ 870.317106] ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0
[ 870.317116] ata1.00: irq_stat 0x40000008
[ 870.317124] ata1.00: failed command: READ FPDMA QUEUED
[ 870.317139] ata1.00: cmd 60/00:00:30:e9:78/02:00:2a:00:00/40 tag 0 ncq 262144 in
[ 870.317142] res 41/40:00:89:e9:78/b1:00:2a:00:00/40 Emask 0x409 (media error) <F>
[ 870.317149] ata1.00: status: { DRDY ERR }
[ 870.317154] ata1.00: error: { UNC }
[ 870.320871] ata1.00: configured for UDMA/133
[ 870.320889] ata1: EH complete
[ 873.325328] ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0
[ 873.325337] ata1.00: irq_stat 0x40000008
[ 873.325346] ata1.00: failed command: READ FPDMA QUEUED
[ 873.325361] ata1.00: cmd 60/00:00:30:e9:78/02:00:2a:00:00/40 tag 0 ncq 262144 in
[ 873.325364] res 41/40:00:89:e9:78/b1:00:2a:00:00/40 Emask 0x409 (media error) <F>
[ 873.325371] ata1.00: status: { DRDY ERR }
[ 873.325376] ata1.00: error: { UNC }
[ 873.328743] ata1.00: configured for UDMA/133
[ 873.328799] sd 0:0:0:0: [sda] Unhandled sense code
[ 873.328805] sd 0:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 873.328814] sd 0:0:0:0: [sda] Sense Key : Medium Error [current] [descriptor]
[ 873.328825] Descriptor sense data with sense descriptors (in hex):
[ 873.328831] 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
[ 873.328851] 2a 78 e9 89
[ 873.328860] sd 0:0:0:0: [sda] Add. Sense: Unrecovered read error - auto reallocate failed
[ 873.328872] sd 0:0:0:0: [sda] CDB: Read(10): 28 00 2a 78 e9 30 00 02 00 00
[ 873.328891] end_request: I/O error, dev sda, sector 712567177
[ 873.328901] Buffer I/O error on device sda, logical block 89070897
[ 873.328909] Buffer I/O error on device sda, logical block 89070898
[ 873.328917] Buffer I/O error on device sda, logical block 89070899
[ 873.328924] Buffer I/O error on device sda, logical block 89070900
[ 873.328932] Buffer I/O error on device sda, logical block 89070901
[ 873.328940] Buffer I/O error on device sda, logical block 89070902
[ 873.328947] Buffer I/O error on device sda, logical block 89070903
[ 873.328955] Buffer I/O error on device sda, logical block 89070904
[ 873.328963] Buffer I/O error on device sda, logical block 89070905
[ 873.328970] Buffer I/O error on device sda, logical block 89070906
[ 873.329075] ata1: EH complete
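The numbers in this log are tied together, and a small sketch can show how. The last four bytes of the sense data (2a 78 e9 89) are the failing LBA in hexadecimal, which is the same sector that end_request reports in decimal. The helper names below are hypothetical, and treating the "logical block" numbers as 4096-byte blocks is an assumption inferred from the figures in the log (712567177 / 8 = 89070897), not something the log states.

```shell
#!/bin/sh
# Join four hex bytes (most significant first) into a decimal LBA.
# Hypothetical helper; the bytes used below come from the sense data in the log.
lba_from_bytes() {
    printf '%d\n' "0x$1$2$3$4"
}

# Map a 512-byte sector number to the 4096-byte block that contains it.
# Assumption: the "logical block" numbers in the log count 4 KiB blocks.
block_4k_of_sector() {
    echo $(( $1 / 8 ))
}

lba=$(lba_from_bytes 2a 78 e9 89)
echo "failing LBA: $lba"
echo "4 KiB block: $(block_4k_of_sector "$lba")"
```

Run as is, this prints 712567177 and 89070897, the sector and the first logical block named in the log.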

Well, I think you get the general idea: buy a new hard drive, copy the partitions over from the old one.
But! The other problem was that some partitions ALREADY refused to mount because of ~~death~~ corruption of the file system. /me thought it over and decided to try to patch things up somehow, especially since a new drive is not on my list of top-priority purchases yet.
Google suggested a couple of good solutions, which I promptly used.

Bad block HOWTO for smartmontools

A wonderful article that helped me a great deal. For those who are not on good terms with the language of international communication, I can adapt the article; if you want that, leave a comment.
So, the debrief, or what I did.
Luckily, I had a copy of the disk's partition table lying around, made with fdisk -ul. I actually had 10 partitions, but the three not listed here mattered less than sda3+sda4 (reserved for ~~FreeBSD~~ DragonFlyBSD), sda1 (the boot partition, obviously!), sda7+sda5 (Linux partitions), and sda6 (anime, music, other junk).
First I tested which partitions had died and which were still alive. To my joy, sda1 and sda6 survived, but more on them later. All the others REFUSED to mount, and fsck exited with an error.
I ran smartctl -t long /dev/sda and left for two hours. When I came back, this is what I saw:
smartctl -l selftest /dev/sda
smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/


=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%      8962         712567177
# 2  Short offline       Completed without error       00%      5188         -
# 3  Extended offline    Aborted by host               90%      5188         -

As you can see, the first error showed up at block 712567177.
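Reading the failing LBA off the self-test log by eye works for one row; for a longer log it can be filtered out. This is a minimal sketch that assumes the column layout shown above, where the LBA is the last field of each row; the function name first_error_lbas is made up for the example.

```shell
#!/bin/sh
# Print the LBA_of_first_error field for every self-test row that reports
# a read failure. Expects the text of `smartctl -l selftest` on stdin.
first_error_lbas() {
    awk '/^# *[0-9]+ / && /read failure/ { print $NF }'
}

# Sample input: the rows quoted in the log above.
first_error_lbas <<'EOF'
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%      8962         712567177
# 2  Short offline       Completed without error       00%      5188         -
# 3  Extended offline    Aborted by host               90%      5188         -
EOF
```

On a live system the same function could be fed from smartctl -l selftest /dev/sda instead of the here-document.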
Run badblocks:
badblocks -s -v -b 512 /dev/sda 712567277 712567077
Here -s shows progress, -v makes the output more verbose, and -b 512 sets the block size, 512 bytes in this case. After the device come the LAST block and only then the FIRST block; I picked a window of ±100 blocks around the failing one.
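The reversed argument order is easy to get wrong, so the window can be computed instead of typed. The sketch below only prints a read-only badblocks command and does not run it; the function name badblocks_cmd and its three parameters are assumptions made for the illustration.

```shell
#!/bin/sh
# Print (do not run) a read-only badblocks command that covers LBA +/- margin.
# badblocks expects the LAST block before the FIRST block on its command line.
badblocks_cmd() {
    dev=$1; lba=$2; margin=$3
    first=$(( lba - margin ))
    # Clamp at zero so a bad block near the start of the disk stays in range.
    if [ "$first" -lt 0 ]; then
        first=0
    fi
    last=$(( lba + margin ))
    echo "badblocks -s -v -b 512 $dev $last $first"
}

badblocks_cmd /dev/sda 712567177 100
```

With the LBA from the self-test log and a margin of 100 this reproduces the command used above, ending in 712567277 712567077.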
And indeed, bad block numbers pop up. I ran the non-destructive read-write test (option -n), and bad blocks kept appearing. Then I did something I do NOT RECOMMEND to anyone who does not fully understand WHAT they are doing:
badblocks -s -w -v -b 512 /dev/sda 712567277 712567077
Here -w is write mode: it overwrites the range with test patterns and DESTROYS whatever data was there. Surprisingly, the bad blocks disappeared after that, presumably because a write gives the drive a chance to rewrite or remap an unreadable sector. After several more smartctl tests, now with -t short, I found the remaining bad blocks and treated them the same way. Now:
smartctl -l selftest -d ata /dev/sda
smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/


=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      8972         -
# 2  Extended offline    Completed: read failure       90%      8962         712567177
# 3  Short offline       Completed without error       00%      5188         -
# 4  Extended offline    Aborted by host               90%      5188         -

As you can see, there are no more errors, but once they have appeared, expect new ones. I am already saving up for a new drive.

Now for the downsides of this approach. After all these manipulations, fdisk -ul /dev/sda showed me a COMPLETELY bare disk. Run testdisk and find the partitions. Unfortunately, sda6 was not found. Oh well, that does not scare us. Run fdisk -u /dev/sda, press n (new partition), choose the partition type, logical or primary, then enter the start and end sectors (it wasn't for nothing that I had kept the partition table, was it?), and write the partition table with w, which also exits fdisk (q would quit without saving).
Run partprobe so the kernel learns about the new partition, and voilà: the partition is back and fully alive.
So, I will be glad to read any questions, suggestions and remarks in the comments.

Source

The HDD has 2 encrypted partitions: boot on luks and root on luks2+btrfs. After an emergency power-off (the plug was pulled from the socket) the system stopped booting. The problem is in the HDD, probably at the software level.
The first partition, boot, decrypts and works fine. The root partition decrypts but immediately gives an error:

ata1.00: exception Emask 0x0 SAct 0x1000 SErr 0x40000 action 0x0
ata1.00: irq_stat 0x40000008
ata1: SError: { CommWake }
ata1.00: failed command: READ FPDMA QUEUED
ata1.00: cmd 60/08:60:70:cb:46/00:00:23:00:00/40 tag 12 ncq 4096 in
         res 51/40:08:70:cb:46/00:00:23:00:00/40 Emask 0x409 (media error) <F>
ata1.00: status: { DRDY ERR }
ata1.00: error: { UNC }
blk_update_request: I/O error, dev sda, sector 6512424 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
device-mapper: integrity: Error on reading tags: -5
Buffer I/O error on dev dm-1, logical block 229560224, async page read
...

At first I suspected a damaged superblock, but recovery with btrfs-tools failed (btrfs rescue super-recover <name of the btrfs partition>):

Buffer I/O error on dev dm-1; logical block 16, async page read
Buffer I/O error on dev dm-1; logical block 16, async page read
No valid Btrfs found on /dev/mapper/btrfs
Usage or syntax errors

I suspect an error at the btrfs or luks level. But in either case a recovery attempt threatens total data loss. Can anyone tell me what exactly is wrong? Is it really the disk itself?

mount -t btrfs /dev/mapper/btrfs /mnt gives:

Buffer I/O error on dev dm-1; logical block 16, async page read
mount: /mnt: can't read superblock on /dev/mapper/btrfs

Source

The program that "caused" it (really, it's caused by bad hardware; it'd be more appropriate to say "the program that was the victim of it") may not even exist anymore.

E.g., send off a write, and then exit. The write will sit in the kernel buffers until the kernel performs writeback. At which point an I/O error may occur.

When the program does still exist, it will already be told of the error. For example, read will set errno to EIO. (This error may also come back from write, fsync, fdatasync, or even close.)

The reason it takes forever has nothing to do with the kernel; it's entirely the drive. The drive spends a while retrying the read to see if it can make sense of the corrupted sector. If you don't want this (e.g., because you're running on RAID, and will just reschedule the sector to the disk's mirror) you can try changing the SCT Error Recovery Control settings using smartctl. Beware that many non-enterprise disks do not support this.
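As a rough illustration of the smartctl side of this: the scterc log option takes the read and write recovery limits in units of 100 ms, so a limit in seconds has to be multiplied by ten. The sketch below only prints the command; the helper name scterc_cmd, the placeholder device /dev/sdX and the 7-second value are assumptions for the example, not settings recommended by the answer, and a drive may simply reject the request.

```shell
#!/bin/sh
# Print (do not run) a smartctl command that sets the SCT Error Recovery
# Control read and write limits. smartctl takes them in units of 100 ms.
scterc_cmd() {
    dev=$1; read_s=$2; write_s=$3
    echo "smartctl -l scterc,$(( read_s * 10 )),$(( write_s * 10 )) $dev"
}

# 7 seconds and /dev/sdX are illustrative placeholders.
scterc_cmd /dev/sdX 7 7
```

Running smartctl -l scterc on the device without values reports the current setting, or reports that the drive does not support the feature.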

Except in the case of RAID (or similar), there is no way to automatically fix it. The data has been lost. The kernel can’t fix that.

If you’re running Linux software RAID (mdraid), with even a half-recent kernel, mdraid will automatically fix it by reading the errored sector from the mirror, then writing the correct sector back to the drive with a read error.

If you’re getting this on a write instead of a read, then replace the drive.

(BTW: READ FPDMA QUEUED is not an error. It's just the (S)ATA command that failed. "Medium Error" is the error.)

Source

# 9 years ago
Topics: 24 Posts: 189 Member since: 6 April 2013

Good day. Today I found this in the logs:

 TuxAdmin kernel: ata5.00: exception Emask 0x0 SAct 0x3 SErr 0x0 action 0x0
 TuxAdmin kernel: ata5.00: irq_stat 0x40000001
 TuxAdmin kernel: ata5.00: failed command: READ FPDMA QUEUED
 TuxAdmin kernel: ata5.00: cmd 60/00:00:67:8e:68/01:00:73:00:00/40 tag 0 ncq 131072 in
                                             res 41/00:1b:4c:8e:68/00:00:73:00:00/40 Emask 0x1 (device error)
 TuxAdmin kernel: ata5.00: status: { DRDY ERR }
 TuxAdmin kernel: ata5.00: failed command: READ FPDMA QUEUED
 TuxAdmin kernel: ata5.00: cmd 60/00:08:67:8d:68/01:00:73:00:00/40 tag 1 ncq 131072 in
                                             res 41/40:00:4c:8e:68/00:00:73:00:00/40 Emask 0x409 (media error) <F>
 TuxAdmin kernel: ata5.00: status: { DRDY ERR }
 TuxAdmin kernel: ata5.00: error: { UNC }
 TuxAdmin kernel: ata5.00: configured for UDMA/133
 TuxAdmin kernel: ata5: exception Emask 0x1 SAct 0x0 SErr 0x0 action 0x0 t4
 TuxAdmin kernel: ata5: irq_stat 0x40000008
 TuxAdmin kernel: sd 4:0:0:0: [sdc] Unhandled sense code
 TuxAdmin kernel: sd 4:0:0:0: [sdc]
 TuxAdmin kernel: Result: hostbyte=0x00 driverbyte=0x08
 TuxAdmin kernel: sd 4:0:0:0: [sdc]
 TuxAdmin kernel: Sense Key : 0x3 [current] [descriptor]
 TuxAdmin kernel: Descriptor sense data with sense descriptors (in hex):
 TuxAdmin kernel:         72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
 TuxAdmin kernel:         73 68 8e 4c
 TuxAdmin kernel: sd 4:0:0:0: [sdc]
 TuxAdmin kernel: ASC=0x11 ASCQ=0x4
 TuxAdmin kernel: sd 4:0:0:0: [sdc] CDB:
 TuxAdmin kernel: cdb[0]=0x28: 28 00 73 68 8d 67 00 01 00 00
 TuxAdmin kernel: end_request: I/O error, dev sdc, sector 1936232012
 TuxAdmin kernel: ata5: EH complete
 TuxAdmin kernel: ata5.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0
 TuxAdmin kernel: ata5.00: irq_stat 0x40000008
 TuxAdmin kernel: ata5.00: failed command: READ FPDMA QUEUED
 TuxAdmin kernel: ata5.00: cmd 60/08:00:47:8e:68/00:00:73:00:00/40 tag 0 ncq 4096 in
                                             res 41/40:00:4c:8e:68/00:00:73:00:00/40 Emask 0x409 (media error) <F>
 TuxAdmin kernel: ata5.00: status: { DRDY ERR }
TuxAdmin kernel: ata5.00: error: { UNC }

So I went to look at S.M.A.R.T.

$ sudo smartctl -A /dev/sdc
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.12.6-1-ARCH] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   100   051    Pre-fail  Always       -       120
  2 Throughput_Performance  0x0026   252   252   000    Old_age   Always       -       0
  3 Spin_Up_Time            0x0023   073   071   025    Pre-fail  Always       -       8369
  4 Start_Stop_Count        0x0032   099   099   000    Old_age   Always       -       1286
  5 Reallocated_Sector_Ct   0x0033   252   252   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   252   252   051    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0024   252   252   015    Old_age   Offline      -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       22145
 10 Spin_Retry_Count        0x0032   252   252   051    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   252   252   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       1260
191 G-Sense_Error_Rate      0x0022   100   100   000    Old_age   Always       -       75
192 Power-Off_Retract_Count 0x0022   252   252   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0002   058   052   000    Old_age   Always       -       42 (Min/Max 14/48)
195 Hardware_ECC_Recovered  0x003a   100   100   000    Old_age   Always       -       0
196 Reallocated_Event_Count 0x0032   252   252   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       3
198 Offline_Uncorrectable   0x0030   252   252   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0036   100   100   000    Old_age   Always       -       153
200 Multi_Zone_Error_Rate   0x002a   100   100   000    Old_age   Always       -       4
223 Load_Retry_Count        0x0032   252   252   000    Old_age   Always       -       0
225 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       1369

I suppose it is worth thinking about replacing it with something healthier?
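For a question like this, three rows of the table usually matter most: Reallocated_Sector_Ct, Current_Pending_Sector and UDMA_CRC_Error_Count. As commonly interpreted, pending sectors are ones the drive could not read and is waiting to rewrite or remap, while CRC errors tend to point at the cable or interface rather than the platters. The sketch below pulls the raw values out of smartctl -A output; the function name smart_raw is hypothetical, and it assumes the ten-column layout shown above with the raw value in the tenth field.

```shell
#!/bin/sh
# Print "name raw_value" for the attributes named on the command line.
# Expects the text of `smartctl -A` on stdin; the raw value is field 10.
smart_raw() {
    awk -v want="$*" '
        BEGIN { n = split(want, names, " "); for (i = 1; i <= n; i++) pick[names[i]] = 1 }
        ($2 in pick) { print $2, $10 }
    '
}

# Sample input: rows copied from the table above.
smart_raw Reallocated_Sector_Ct Current_Pending_Sector UDMA_CRC_Error_Count <<'EOF'
  5 Reallocated_Sector_Ct   0x0033   252   252   010    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       3
199 UDMA_CRC_Error_Count    0x0036   100   100   000    Old_age   Always       -       153
200 Multi_Zone_Error_Rate   0x002a   100   100   000    Old_age   Always       -       4
EOF
```

For this drive it reports 0 reallocated sectors, 3 pending sectors and 153 CRC errors.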

kdeneur: https://github.com/brestows/kdeNeur
awesome WM 3.5

dartsergius # 9 years ago
Topics: 18 Posts: 238 Member since: 15 December 2011

Well, with the same symptoms I could not even create a file system. Is the voltage on the 12 V rail normal? Maybe there is not enough power?

brestows # 9 years ago
Topics: 24 Posts: 189 Member since: 6 April 2013

This is not the only drive; there are 3 more on top of it, and they work fine. The PSU is a good 550 W unit and should be enough. The computer is rarely under load, at most rehashing a torrent or compiling my software, nothing more.

kurych # 9 years ago
Topics: 0 Posts: 1394 Member since: 6 November 2011

I would start by replacing the cable. If the picture stays the same, then think about replacing the drive.
In any case, a timely backup of important data will not hurt.

brestows # 9 years ago
Topics: 24 Posts: 189 Member since: 6 April 2013

OK, thanks, I will try things, see what is what, and report back.

vasek # 9 years ago
Topics: 47 Posts: 11417 Member since: 17 February 2013

This might come in handy. I would still run the full test: $ sudo smartctl --test=long /dev/sd….

Errors don't go away with experience; they just get smarter

brestows # 9 years ago
Topics: 24 Posts: 189 Member since: 6 April 2013

Thanks for the link, I will read it…..

And how do I view the test results?

lampslave # 9 years ago
Topics: 32 Posts: 4800 Member since: 5 July 2011

The results are shown in the same smartctl -a output (near the end).

vasek # 9 years ago
Topics: 47 Posts: 11417 Member since: 17 February 2013

Or you can print the individual logs like this:
attributes only: $ sudo smartctl --attributes /dev/sda
self-test log only: $ sudo smartctl --log=selftest /dev/sda
error log only: $ sudo smartctl --log=error /dev/sda

brestows

9 years ago

Here is what it showed me:

=== START OF INFORMATION SECTION ===
Model Family:     SAMSUNG SpinPoint F3
Device Model:     SAMSUNG HD103SJ
Serial Number:    S246JDWSC34540
LU WWN Device Id: 5 0024e9 002a0580e
Firmware Version: 1AJ100E4
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    7200 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 6
SATA Version is:  SATA 2.6, 3.0 Gb/s
Local Time is:    Sat Jan 18 19:05:36 2014 FET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status:  (0x00)	Offline data collection activity
					was never started.
					Auto Offline Data Collection: Disabled.
Self-test execution status:      (  40)	The self-test routine was interrupted
					by the host with a hard or soft reset.
Total time to complete Offline
data collection: 		( 9300) seconds.
Offline data collection
capabilities: 			 (0x5b) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					No Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 ( 155) minutes.
SCT capabilities: 	       (0x003f)	SCT Status supported.
					SCT Error Recovery Control supported.
					SCT Feature Control supported.
					SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   100   051    Pre-fail  Always       -       120
  2 Throughput_Performance  0x0026   252   252   000    Old_age   Always       -       0
  3 Spin_Up_Time            0x0023   073   071   025    Pre-fail  Always       -       8369
  4 Start_Stop_Count        0x0032   099   099   000    Old_age   Always       -       1286
  5 Reallocated_Sector_Ct   0x0033   252   252   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   252   252   051    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0024   252   252   015    Old_age   Offline      -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       22191
 10 Spin_Retry_Count        0x0032   252   252   051    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   252   252   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       1260
191 G-Sense_Error_Rate      0x0022   100   100   000    Old_age   Always       -       75
192 Power-Off_Retract_Count 0x0022   252   252   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0002   058   052   000    Old_age   Always       -       42 (Min/Max 14/48)
195 Hardware_ECC_Recovered  0x003a   100   100   000    Old_age   Always       -       0
196 Reallocated_Event_Count 0x0032   252   252   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       3
198 Offline_Uncorrectable   0x0030   252   252   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0036   100   100   000    Old_age   Always       -       153
200 Multi_Zone_Error_Rate   0x002a   100   100   000    Old_age   Always       -       4
223 Load_Retry_Count        0x0032   252   252   000    Old_age   Always       -       0
225 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       1369
SMART Error Log Version: 1
ATA Error Count: 23 (device log contains only the most recent five errors)
	CR = Command Register [HEX]
	FR = Features Register [HEX]
	SC = Sector Count Register [HEX]
	SN = Sector Number Register [HEX]
	CL = Cylinder Low Register [HEX]
	CH = Cylinder High Register [HEX]
	DH = Device/Head Register [HEX]
	DC = Device Command Register [HEX]
	ER = Error register [HEX]
	ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 23 occurred at disk power-on lifetime: 10766 hours (448 days + 14 hours)
  When the command that caused the error occurred, the device was active or idle.
  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 00 00 00 00 a0
  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ec 00 00 00 00 00 a0 00      00:00:00.951  IDENTIFY DEVICE
  ef 03 42 00 00 00 a0 00      00:00:00.951  SET FEATURES [Set transfer mode]
  ef 10 02 00 00 00 a0 00      00:00:00.951  SET FEATURES [Enable SATA feature]
  27 00 00 00 00 00 e0 00      00:00:00.951  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  ec 00 00 00 00 00 a0 00      00:00:00.951  IDENTIFY DEVICE
Error 22 occurred at disk power-on lifetime: 10766 hours (448 days + 14 hours)
  When the command that caused the error occurred, the device was active or idle.
  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 00 00 00 00 a0
  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ec 00 00 00 00 00 a0 00      00:00:00.938  IDENTIFY DEVICE
  00 10 f0 1a 91 19 40 00      00:00:00.938  NOP [Reserved subcommand] [OBS-ACS-2]
  60 10 00 0a 91 19 40 00      00:00:00.938  READ FPDMA QUEUED
  ef 10 02 00 00 00 a0 00      00:00:00.938  SET FEATURES [Enable SATA feature]
  27 00 00 00 00 00 e0 00      00:00:00.938  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
Error 21 occurred at disk power-on lifetime: 10766 hours (448 days + 14 hours)
  When the command that caused the error occurred, the device was active or idle.
  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 00 00 00 00 a0
  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ec 00 00 00 00 00 a0 00      00:00:00.587  IDENTIFY DEVICE
  00 10 00 12 53 ec 40 00      00:00:00.587  NOP [Reserved subcommand] [OBS-ACS-2]
  00 10 00 12 52 ec 40 00      00:00:00.587  NOP [Reserved subcommand] [OBS-ACS-2]
  ec 00 00 00 00 00 a0 00      00:00:00.582  IDENTIFY DEVICE
  00 10 00 fa e4 ea 40 00      00:00:00.582  NOP [Reserved subcommand] [OBS-ACS-2]
Error 20 occurred at disk power-on lifetime: 10766 hours (448 days + 14 hours)
  When the command that caused the error occurred, the device was active or idle.
  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 00 00 00 00 a0
  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ec 00 00 00 00 00 a0 00      00:00:00.421  IDENTIFY DEVICE
  ef 03 42 00 00 00 a0 00      00:00:00.421  SET FEATURES [Set transfer mode]
  ef 10 02 00 00 00 a0 00      00:00:00.421  SET FEATURES [Enable SATA feature]
  27 00 00 00 00 00 e0 00      00:00:00.421  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  ec 00 00 00 00 00 a0 00      00:00:00.421  IDENTIFY DEVICE
Error 19 occurred at disk power-on lifetime: 10766 hours (448 days + 14 hours)
  When the command that caused the error occurred, the device was active or idle.
  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 00 00 00 00 a0
  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ec 00 00 00 00 00 a0 00      00:00:00.406  IDENTIFY DEVICE
  ef 03 42 00 00 00 a0 00      00:00:00.406  SET FEATURES [Set transfer mode]
  ef 10 02 00 00 00 a0 00      00:00:00.406  SET FEATURES [Enable SATA feature]
  27 00 00 00 00 00 e0 00      00:00:00.406  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  ec 00 00 00 00 00 a0 00      00:00:00.406  IDENTIFY DEVICE
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Interrupted (host reset)      80%     22171         -
# 2  Short offline       Completed without error       00%     10892         -
SMART Selective self-test log data structure revision number 0
Note: revision number not 1 implies that no selective self-test has ever been run
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Interrupted [80% left] (0-65535)
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

Source

Topic: What can SMART HDD errors mean?

0x10c

Good day, everyone! Here is the problem: the disk works as it should and has always worked without errors, but dmesg and smartctl report some kind of problems.
dmesg

smartctl

Can anyone tell me what the cause might be: is the HDD dying, or is the controller on the motherboard glitching? The disk comes from an external Seagate Expansion Desk; I took it out of the enclosure to connect it as an internal HDD.

scsiman

  5 Reallocated_Sector_Ct   0x0033   092   092   010    Pre-fail  Always       -       9240
187 Reported_Uncorrect      0x0032   001   001   000    Old_age   Always       -       2797
197 Current_Pending_Sector  0x0012   095   001   000    Old_age   Always       -       888
198 Offline_Uncorrectable   0x0010   095   001   000    Old_age   Offline      -       888
It's dead. Throw it in the bin.


Sly_tom_cat

Yes, the disk is done for.

If the dead sectors are concentrated in one place (their numbers are close together), you can of course repartition and leave the faulty region unallocated. But that rarely happens, and more importantly it is quite hard to find out, because sectors that have already been reallocated will not show up in a bad-sector report (they have moved elsewhere).

But 888 pending sectors is already really bad... It is better to buy a new drive and copy the useful data off this goner.

0x10c

Ah, got it, thanks for the answers!

Source

Some kind of bad streak with hardware: first the dishwasher died, then the dashcam, and today the home network server decided that the whole world could wait.

Nagios cleared up the picture with two alerts

I look into the server's dmesg, and there is a continuous stream of errors related to one of the disks.

[1670400.363465] ata3.00: exception Emask 0x0 SAct 0x80000c00 SErr 0x0 action 0x0
[1670400.449986] ata3.00: irq_stat 0x40000008
[1670400.499057] ata3.00: failed command: READ FPDMA QUEUED
[1670400.562599] ata3.00: cmd 60/80:50:88:bd:7e/00:00:bb:00:00/40 tag 10 ncq dma 65536 in
                          res 51/40:30:d8:bd:7e/00:00:bb:00:00/40 Emask 0x409 (media error) <F>
[1670400.758250] ata3.00: status: { DRDY ERR }
[1670400.808368] ata3.00: error: { UNC }
[1670400.873536] ata3.00: configured for UDMA/133
[1670400.926758] sd 2:0:0:0: [sda] tag#10 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=7s
[1670401.040311] sd 2:0:0:0: [sda] tag#10 Sense Key : Medium Error [current] 
[1670401.122676] sd 2:0:0:0: [sda] tag#10 Add. Sense: Unrecovered read error - auto reallocate failed
[1670401.229917] sd 2:0:0:0: [sda] tag#10 CDB: Read(16) 88 00 00 00 00 00 bb 7e bd 88 00 00 00 80 00 00
[1670401.339226] blk_update_request: I/O error, dev sda, sector 3145645528 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
[1670401.472476] ata3: EH complete
[1670426.313174] ata3.00: exception Emask 0x0 SAct 0x104 SErr 0x0 action 0x0
[1670426.394513] ata3.00: irq_stat 0x40000008
[1670426.443575] ata3.00: failed command: READ FPDMA QUEUED
[1670426.507225] ata3.00: cmd 60/20:40:80:09:b2/00:00:00:00:00/40 tag 8 ncq dma 16384 in
                          res 51/40:20:80:09:b2/00:00:00:00:00/40 Emask 0x409 (media error) <F>
[1670426.701929] ata3.00: status: { DRDY ERR }
[1670426.752052] ata3.00: error: { UNC }
[1670426.799092] ata3.00: configured for UDMA/133
[1670426.852370] sd 2:0:0:0: [sda] tag#8 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=7s
[1670426.964863] sd 2:0:0:0: [sda] tag#8 Sense Key : Medium Error [current] 
[1670427.046201] sd 2:0:0:0: [sda] tag#8 Add. Sense: Unrecovered read error - auto reallocate failed
[1670427.152496] sd 2:0:0:0: [sda] tag#8 CDB: Read(16) 88 00 00 00 00 00 00 b2 09 80 00 00 00 20 00 00
[1670427.260856] blk_update_request: I/O error, dev sda, sector 11667840 op 0x0:(READ) flags 0x0 phys_seg 4 prio class 0
[1670427.387855] md/raid1:md1: sda3: rescheduling sector 10774912
[1670427.457745] md/raid1:md1: sda3: rescheduling sector 10774920
[1670427.527534] md/raid1:md1: sda3: rescheduling sector 10774928
[1670427.597320] md/raid1:md1: sda3: rescheduling sector 10774936
[1670427.667116] ata3: EH complete
[1670429.070818] md/raid1:md1: redirecting sector 10774912 to other mirror: sdb3
[1670429.229305] md/raid1:md1: redirecting sector 10774920 to other mirror: sdb3
[1670430.301795] md/raid1:md1: redirecting sector 10774928 to other mirror: sdb3
[1670432.945317] md/raid1:md1: redirecting sector 10774936 to other mirror: sdb3
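
The failing sector can be read straight out of the res line of such a message. Below is a minimal Python sketch, assuming the libata field order status/error:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device. The names decode_res and ERROR_BITS are made up for this illustration.

```python
import re

# Bits of the ATA error register that libata names in "error: { ... }".
ERROR_BITS = {7: "ICRC", 6: "UNC", 4: "IDNF", 2: "ABRT", 0: "AMNF"}

# Twelve two-digit hex fields separated by "/" or ":" after the word "res".
RES_RE = re.compile(r"res ((?:[0-9a-f]{2}[/:]){11}[0-9a-f]{2})")


def decode_res(line):
    """Decode the result taskfile of a libata error line (hypothetical helper)."""
    match = RES_RE.search(line)
    if match is None:
        raise ValueError("no libata 'res' taskfile found in line")
    fields = [int(part, 16) for part in re.split(r"[/:]", match.group(1))]
    status, error = fields[0], fields[1]
    lbal, lbam, lbah = fields[3], fields[4], fields[5]
    hob_lbal, hob_lbam, hob_lbah = fields[8], fields[9], fields[10]
    # 48-bit LBA: the "hob" (high order byte) fields carry bits 24..47.
    lba = (hob_lbah << 40 | hob_lbam << 32 | hob_lbal << 24
           | lbah << 16 | lbam << 8 | lbal)
    flags = [name for bit, name in ERROR_BITS.items() if error & (1 << bit)]
    return {"status": status, "error": error, "flags": flags, "lba": lba}


line = "res 51/40:30:d8:bd:7e/00:00:bb:00:00/40 Emask 0x409 (media error) <F>"
print(decode_res(line))
```

For the two errors in the log above this gives LBA 3145645528 and 11667840, the same sectors that blk_update_request reports.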

I check the details in S.M.A.R.T.

smartctl 6.6 2017-11-05 r4594 [x86_64-linux-5.7.0-0.bpo.2-amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Toshiba 3.5" DT01ACA... Desktop HDD
Device Model:     TOSHIBA DT01ACA300
Serial Number:    Z3GHLUVGS
LU WWN Device Id: 5 000039 ff4d52fc5
Firmware Version: MX6OABB0
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Sun Sep  6 15:56:18 2020 +03
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   139   139   054    Pre-fail  Offline      -       70
  3 Spin_Up_Time            0x0007   155   155   024    Pre-fail  Always       -       322 (Average 416)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       40
  5 Reallocated_Sector_Ct   0x0033   089   089   005    Pre-fail  Always       -       359
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   126   126   020    Pre-fail  Offline      -       32
  9 Power_On_Hours          0x0012   092   092   000    Old_age   Always       -       56068
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       40
192 Power-Off_Retract_Count 0x0032   099   099   000    Old_age   Always       -       1745
193 Load_Cycle_Count        0x0012   099   099   000    Old_age   Always       -       1745
194 Temperature_Celsius     0x0002   139   139   000    Old_age   Always       -       43 (Min/Max 22/52)
196 Reallocated_Event_Count 0x0032   087   087   000    Old_age   Always       -       409
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0
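
A rough way to pick the worrying rows out of such a table is sketched below. Watching attributes 5, 196, 197 and 198 is a common rule of thumb and an assumption here, not something smartctl itself defines; the function names are hypothetical.

```python
def parse_smart_attributes(text):
    """Return {id: (name, raw_value)} for the rows of a smartctl attribute table."""
    attributes = {}
    for line in text.splitlines():
        parts = line.split()
        # A data row has at least ten columns and starts with a numeric ID;
        # the tenth column is the raw value (trailing notes are ignored).
        if len(parts) >= 10 and parts[0].isdigit() and parts[9].isdigit():
            attributes[int(parts[0])] = (parts[1], int(parts[9]))
    return attributes


def suspicious(attributes, watched=(5, 196, 197, 198)):
    """Return the watched attributes whose raw value is non-zero."""
    return {attributes[i][0]: attributes[i][1]
            for i in watched if i in attributes and attributes[i][1] > 0}


sample = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   089   089   005    Pre-fail  Always       -       359
196 Reallocated_Event_Count 0x0032   087   087   000    Old_age   Always       -       409
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
"""
print(suspicious(parse_smart_attributes(sample)))
```

The sample rows are copied from the table above; on a live system the text would come from the output of smartctl -A for the disk in question.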

Given the hours logged (56068, or more than six years of continuous operation), it would be silly to blame a consumer-grade device. That is exactly why there are two of them in a mirror, plus backups go to an external disk that is stored separately.

But now I had to pick a replacement, and that turned out to be hard: looking at what is sold in Minsk today with a capacity of 3 TB or 4 TB, a spindle speed of 7200 RPM, and a humane price (it is for home use, after all), the choice is not very large.

I do not want 5400 RPM or 5900 RPM disks because latency matters: one platter revolution (the worst-case rotational delay) theoretically takes 8.3 ms at 7200 RPM, versus 11.1 ms and 10.2 ms at 5400 RPM and 5900 RPM respectively.
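
The latency figures quoted here equal the time of one full platter revolution, 60000 / RPM milliseconds; the average rotational latency usually quoted in datasheets is half of that. A small sketch of the arithmetic (the function names are illustrative):

```python
def full_rotation_ms(rpm):
    """Time of one full platter revolution in milliseconds."""
    return 60_000 / rpm


def average_rotational_latency_ms(rpm):
    """On average the head waits half a revolution for the target sector."""
    return full_rotation_ms(rpm) / 2


for rpm in (7200, 5900, 5400):
    print(f"{rpm} RPM: {full_rotation_ms(rpm):.1f} ms per revolution, "
          f"{average_rotational_latency_ms(rpm):.1f} ms average latency")
```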

Advanced Format no longer surprises anyone; you just need to align the partitions correctly. Shingled Magnetic Recording (SMR), however, is a relatively new trend and can cause trouble if you write to the disk a lot and often, which is exactly my case.

Some manufacturers hide the fact that a disk uses SMR. Toshiba recently published information about its consumer HDDs that use SMR. There is also a list on Habr of SMR disks from various manufacturers.

In the end I ordered the most budget-friendly option (Toshiba HDWD130UZSVA). This disk is also the quietest and supports SCT Error Recovery Control, which is very important for disks in a RAID.

Источник

Hi everyone!
It has been a long time since I wrote here. I wonder, does anyone still read my hollow notes? Hello?

The topic of today's lesson is restoring a mirror on the software RAID of a remote computer.
Rule one: if it works, don't touch it.
Rule two: the best is the enemy of the good.
Rule three: whoever does not want to work with their head will work with their hands.

On a server located in the German data center of Hetzner, the root file system suddenly decided to become read-only. The server had two SATA disks and three partitions (swap, /boot and root), each mirrored separately through software RAID.
The system is CentOS 5.
A closer look revealed a pile of errors in dmesg:

 SCSI error: return code = 0x08000002
Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
sda: Current [descriptor]: sense key: Medium Error
    Add. Sense: Unrecovered read error - auto reallocate failed

Descriptor sense data with sense descriptors (in hex):
        72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
        01 02 26 46
ata2: EH complete
SCSI device sda: 2930277168 512-byte hdwr sectors (1500302 MB)
sda: Write Protect is off
sda: Mode Sense: 00 3a 00 00
SCSI device sda: drive cache: write back
ata2.00: exception Emask 0x0 SAct 0x2 SErr 0x0 action 0x0
ata2.00: irq_stat 0x40000008
ata2.00: cmd 60/08:08:3f:26:02/00:00:01:00:00/40 tag 1 ncq 4096 in
         res 41/40:08:46:26:02/00:00:01:00:00/00 Emask 0x409 (media error) <F>
ata2.00: status: { DRDY ERR }
ata2.00: error: { UNC }
ata2.00: configured for UDMA/133
ata2: EH complete
SCSI device sda: 2930277168 512-byte hdwr sectors (1500302 MB)
sda: Write Protect is off
sda: Mode Sense: 00 3a 00 00
SCSI device sda: drive cache: write back
ata2.00: exception Emask 0x0 SAct 0x3 SErr 0x0 action 0x0
ata2.00: irq_stat 0x40000008
ata2.00: cmd 60/08:00:3f:26:02/00:00:01:00:00/40 tag 0 ncq 4096 in
         res 41/40:08:46:26:02/8f:00:01:00:00/00 Emask 0x409 (media error) <F>
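
The hex dump under "Descriptor sense data" in this log carries the same diagnosis in binary form. The sketch below decodes it under the assumption that it is SCSI descriptor-format sense data (response code 0x72) followed by an Information descriptor; the function name is hypothetical and only this one descriptor type is handled.

```python
def decode_descriptor_sense(hex_dump):
    """Decode sense key, ASC/ASCQ and the reported LBA from a hex dump."""
    data = bytes.fromhex("".join(hex_dump.split()))
    if data[0] & 0x7F not in (0x72, 0x73):
        raise ValueError("not descriptor-format sense data")
    result = {"sense_key": data[1] & 0x0F, "asc": data[2], "ascq": data[3], "lba": None}
    end = min(8 + data[7], len(data))  # byte 7 is the additional sense length
    pos = 8
    while pos + 2 <= end:
        desc_type, desc_len = data[pos], data[pos + 1]
        # Information descriptor: type 0x00, length 0x0a, VALID bit set,
        # then an 8-byte big-endian value that holds the failing LBA.
        if desc_type == 0x00 and desc_len == 0x0A and data[pos + 2] & 0x80:
            result["lba"] = int.from_bytes(data[pos + 4:pos + 12], "big")
        pos += 2 + desc_len
    return result


dump = """72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
          01 02 26 46"""
print(decode_descriptor_sense(dump))
```

For the dump above this gives sense key 3 (Medium Error), ASC/ASCQ 0x11/0x04 (unrecovered read error, auto reallocate failed) and LBA 16918086 (0x01022646), which is the same address as in the LBA fields of the res lines.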

Preliminary diagnosis: death of the second hard disk.
Support confirmed this theory but warned that the first disk was not well either. Back up all the data, they said, and we will replace the disks and reinstall the system. A full reinstall did not appeal to me much, because there was almost a terabyte of data that would first have to be moved somewhere and then restored.
I suggested first replacing the completely dead disk (sdb), rebuilding the arrays onto it, and then repeating the procedure with disk sda. The Germans agreed to try but warned that problems were possible because of the failures on disk sda.
And so it turned out. The two small partitions rebuilt without argument, while the large one dropped out without a declaration of war after three percent of the rebuild.
Anyway, the procedure is as follows:
View the configuration of the RAID arrays:

cat /proc/mdstat 
Personalities : [raid1] [raid0] [raid6] [raid5] [raid4] [raid10] 
md1 : active raid1  sda2[0]
      264960 blocks [1/2] [U_]
      
md0 : active raid1 sda1[0] 
      2102464 blocks [1/2] [U_]
      
md2 : active raid1 sda3[0]
      1462635200 blocks [1/2] [U_]
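
Spotting degraded arrays in this output can be automated. The sketch below looks only at the [U_] member map, where each underscore stands for a missing member; the function name is made up, and it assumes the array line ("mdN : ...") precedes its status line, as in the listing above.

```python
import re


def degraded_arrays(mdstat_text):
    """Return {array_name: member_map} for arrays with at least one missing member."""
    degraded = {}
    current = None
    for line in mdstat_text.splitlines():
        head = re.match(r"(md\d+)\s*:", line)
        if head:
            current = head.group(1)
            continue
        state = re.search(r"\[([U_]+)\]\s*$", line)
        if state and current and "_" in state.group(1):
            degraded[current] = state.group(1)
    return degraded


sample = """\
Personalities : [raid1] [raid0] [raid6] [raid5] [raid4] [raid10]
md1 : active raid1  sda2[0]
      264960 blocks [1/2] [U_]

md0 : active raid1 sda1[0]
      2102464 blocks [1/2] [U_]
"""
print(degraded_arrays(sample))
```

On a live system the text would be read from /proc/mdstat instead of the sample string.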

View the partition table on disk /dev/sda:

fdisk -l /dev/sda

Disk /dev/sda: 1500.3 GB, 1500301910016 bytes
255 heads, 63 sectors/track, 182401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1               1         262     2102562   fd  Linux raid autodetect
/dev/sda2             263         295      265072+  fd  Linux raid autodetect
/dev/sda3             296      182401  1462766445   fd  Linux raid autodetect

And duplicate it on disk /dev/sdb. Either recreate an identical one with fdisk or, more simply, copy it with dd:

dd if=/dev/sda of=/dev/sdb bs=512 count=1
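
Copying the first 512 bytes works because a classic MBR keeps its four primary partition entries at byte offset 446 of sector 0, followed by the 0x55 0xAA signature. The same copy also carries over the boot code and the disk signature, and it does not cover logical partitions or GPT. A sketch of how those entries are laid out (parse_mbr is a hypothetical helper, not part of any tool used here):

```python
import struct


def parse_mbr(sector):
    """Decode the primary partition entries of a 512-byte MBR sector."""
    if len(sector) != 512 or bytes(sector[510:512]) != b"\x55\xaa":
        raise ValueError("not a valid MBR sector")
    partitions = []
    for index in range(4):
        entry = sector[446 + 16 * index:446 + 16 * (index + 1)]
        # boot flag, 3 bytes CHS start (skipped), type, 3 bytes CHS end (skipped),
        # first sector (LBA), number of sectors; all little-endian.
        boot, ptype, lba_start, sectors = struct.unpack("<B3xB3xII", entry)
        if ptype != 0:  # type 0 marks an unused slot
            partitions.append({"number": index + 1, "bootable": boot == 0x80,
                               "type": ptype, "start": lba_start, "sectors": sectors})
    return partitions


# Synthetic sector with one "Linux raid autodetect" (0xfd) entry, for illustration.
example = bytearray(512)
struct.pack_into("<B3xB3xII", example, 446, 0x00, 0xFD, 63, 4205124)
example[510:512] = b"\x55\xaa"
print(parse_mbr(example))
```

To inspect a real disk, the sector would be read with open("/dev/sda", "rb").read(512), which needs root privileges.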

And add the new partitions to the arrays:

mdadm /dev/md0 --add /dev/sdb1
mdadm /dev/md1 --add /dev/sdb2
mdadm /dev/md2 --add /dev/sdb3

As expected, the last command had no effect because of the errors on disk sda.
No matter: change the type of partition /dev/sdb3 to 83 (Linux) and create a file system there.
This is where I made my first mistake. For the sake of copy speed I formatted the partition as ext4. On lots of small files it works noticeably faster than ext3. To preserve all the metadata, I copied with rsync:

mkdir /olddisk; mount /dev/md2 /olddisk
mkdir /newdisk; mount /dev/sdb3 /newdisk
rsync -av /olddisk/* /newdisk

About 15 percent got copied, after which rsync reported an error on some file. There is the bad sector. I added the failing file to the exclude list and continued copying. By morning everything had been transferred, and I wrote to support that the second disk could be replaced as well.
It should be noted that the Germans' tech support works with the precision of a well-oiled K98. Regardless of the time of day, a meaningful and competent answer in clear English arrives within 5-10 minutes.

And yes, naturally all of these actions were performed in rescue mode, in which the machine boots from a network image. This mode turned out to hide one big, non-obvious pitfall, but more on that later.

After the second disk (sda) was replaced, I recreated the partition table on it and added the two small partitions to the arrays.
Everything worked, and I was shown this picture:

cat /proc/mdstat 
Personalities : [raid1] [raid0] [raid6] [raid5] [raid4] [raid10] 
md1 : active raid1 sdb2[1] sda2[0]
      264960 blocks [2/2] [UU]
      
md0 : active raid1 sda1[0] sdb1[1]
      2102464 blocks [2/2] [UU]

Now the large root partition had to be dealt with.
First I changed the type of partition /dev/sda3 back to fd (Linux raid). Then I created a mirror array on it out of a single disk :)) (Strangely enough, even such a perversion is possible in Linux)

mdadm --create /dev/md2 --level=1 --raid-disk=2 missing /dev/sda3

At this point mdadm muttered something about old versions of grub, but I let it pass, deciding that everything I had was new and all would be well.
Create a file system in the array and copy all the data from /dev/sdb3 into it. Here I blundered completely: since I was copying the /dev/sdb3 partition, I created the same file system as on it, ext4.

mkdir /olddisk; mount /dev/sdb3 /olddisk
mkdir /newdisk; mount /dev/md2 /newdisk
rsync -av /olddisk/* /newdisk

Half a day for copying the data, after which we can see what came out. The mount point is now called /newdisk.
Visually everything seems fine. Fix the root file system type in /newdisk/etc/fstab to ext4 and reboot. And... silence. I give it half an hour for the file system check at startup, but no, something is clearly wrong... Time to bother support again. For such unfortunates, the kind Hetzner support provides a Java application for accessing a KVM switch, which lets you see what is happening on the computer's screen starting from the BIOS itself. And there we see that when it tries to mount the root partition, the kernel throws an error and crashes to a blue screen.
I will skip two days of agonizing investigation and go straight to the overall conclusions:
There were two mistakes; I was directly to blame for one and indirectly for the other.
The first mistake is simple: the initrd had to be rebuilt with ext4 support

mkinitrd --with=ext4 --with=ext3 /boot/initrd-2.6.18-128.el5.img 2.6.18-128.el5

The second mistake was considerably trickier. The thing is that the network-booted rescue image is not CentOS but Debian 7, with a substantially more modern kernel and many newer userland packages. In particular, its much fresher version of mdadm creates arrays with a version 1.2 superblock by default, whereas the ancient 2.6.18 kernel used in the conservative CentOS 5 can auto-detect only 0.90 superblocks at boot. A way to roll the superblock back to 0.90 without destroying the array is unknown to science.
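
Which superblock format an array member carries is normally shown by mdadm --examine. To make the difference concrete, here is a rough sketch that looks for the md magic number at the two relevant locations. The layout is an assumption taken from the md documentation: a 1.2 superblock sits 4 KiB from the start of the member, a 0.90 superblock in the last 64 KiB-aligned 64 KiB block. A little-endian host is also assumed, since 0.90 superblocks are stored in host byte order. Formats 1.0 and 1.1 are not handled, and the function name is hypothetical.

```python
import io
import os
import struct

MD_MAGIC = 0xA92B4EFC      # magic number shared by 0.90 and 1.x superblocks
RESERVED = 64 * 1024       # 0.90: superblock occupies the last aligned 64 KiB block


def md_superblock_version(dev):
    """Return "1.2", "0.90" or None for a seekable binary stream of a RAID member."""
    size = dev.seek(0, os.SEEK_END)

    # Version 1.2: magic and major version at byte offset 4096.
    dev.seek(4096)
    head = dev.read(8)
    if len(head) == 8:
        magic, major = struct.unpack("<II", head)
        if magic == MD_MAGIC and major == 1:
            return "1.2"

    # Version 0.90: magic, major and minor version near the end of the device.
    offset = (size & ~(RESERVED - 1)) - RESERVED
    if offset >= 0:
        dev.seek(offset)
        tail = dev.read(12)
        if len(tail) == 12:
            magic, major, minor = struct.unpack("<III", tail)
            if magic == MD_MAGIC and major == 0 and minor == 90:
                return "0.90"
    return None


# Synthetic 1 MiB "device" carrying only a 1.2 superblock header, for a quick check.
image = bytearray(1024 * 1024)
struct.pack_into("<II", image, 4096, MD_MAGIC, 1)
print(md_superblock_version(io.BytesIO(image)))
```

On the real machine the stream would be a member partition opened in binary mode, for example open("/dev/sda3", "rb"), which requires root.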
The solution is in fact described on the CentOS site, provided you understand what the problem is and manage to phrase the query correctly. http://wiki.centos.org/HowTos/Install_On_Partitionable_RAID1 In essence, we need to hand the kernel a config file for the arrays so that it does not have to detect them itself. To do this, first create such a config on the disk:

mdadm --detail --scan > /newdisk/etc/mdadm.conf

Now download the patch for the mkinitrd script and apply it to the script. After that...

# Mount all partitions and enter the chroot
mount /dev/md2 /newdisk
mount /dev/md1 /newdisk/boot
mount -o bind /proc /newdisk/proc
mount -o bind /dev /newdisk/dev
mount -o bind /dev/shm /newdisk/dev/shm
mount -o bind /sys /newdisk/sys
chroot /newdisk
cd /tmp
wget -O mkinitrd-md_d0.patch 'http://wiki.centos.org/HowTos/Install_On_Partitionable_RAID1?action=AttachFile&do=get&target=mkinitrd-md_d0.patch'
cd /sbin
cp mkinitrd mkinitrd.dist
patch -p0 < /tmp/mkinitrd-md_d0.patch
cd /boot
mv initrd-2.6.18-128.el5.img initrd-2.6.18-128.el5.img.bak
mkinitrd /boot/initrd-2.6.18-128.el5.img 2.6.18-128.el5

Now, for a clear conscience, forbid updates of mkinitrd by adding this to /etc/yum.conf

exclude=mkinitrd*

Done! We now have a restored array with a new superblock and a more modern file system. Keep in mind that after a kernel update the initrd may need to be rebuilt again.

Source

Hi everybody,

After a reboot, this error appeared on my /boot (/dev/sda1) partition. I tried to solve this, but after some commands and a reboot, the partitions on the /dev/sda disk were not listed anymore. I get this error (close to the first one):

ata6.00: exception Emask 0x0 SAct 0x100000 SErr 0x0 action 0x0
ata6.00: irq_stat 0x40000008
ata6.00: failed command: READ FPDMA QUEUED
ata6.00: cmd 60/20:a0:00:00:00/00:00:00:00:00/40 tag 20 ncq dma 16384 in
              res 41/40:20:00:00:00/00:00:00:00:00/40 Emask 0x409 (media error) <F>
ata6.00: status: { DRDY ERR }
ata6.00: error: { UNC }
ata6.00: configured for UDMA/100
sd 5:0:0:0: [sda] tag#20 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sd 5:0:0:0: [sda] tag#20 Sense Key : Medium Error [current] 
sd 5:0:0:0: [sda] tag#20 Add. Sense: Unrecovered read error - auto reallocate failed
sd 5:0:0:0: [sda] tag#20 CDB: Read(10) 28 00 00 00 00 00 00 00 20 00
print_req_error: I/O error, dev sda, sector 0
ata6: EH complete

Here is the smartctl -a /dev/sda:

smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.13.0-21-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Toshiba 2.5" HDD MQ01ABD...
Device Model:     TOSHIBA MQ01ABD075
Serial Number:    346PTQUUT
LU WWN Device Id: 5 000039 561c02030
Firmware Version: AX0A4M
User Capacity:    750 156 374 016 bytes [750 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    5400 rpm
Form Factor:      2.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 2.6, 3.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Fri Feb  9 14:39:04 2018 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)	Offline data collection activity
					was never started.
					Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever 
					been run.
Total time to complete Offline 
data collection: 		(  120) seconds.
Offline data collection
capabilities: 			 (0x5b) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					No Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 ( 194) minutes.
SCT capabilities: 	       (0x003d)	SCT Status supported.
					SCT Error Recovery Control supported.
					SCT Feature Control supported.
					SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   050    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   100   100   050    Pre-fail  Offline      -       0
  3 Spin_Up_Time            0x0027   100   100   001    Pre-fail  Always       -       2356
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       3654
  5 Reallocated_Sector_Ct   0x0033   100   100   050    Pre-fail  Always       -       152
  7 Seek_Error_Rate         0x000b   100   100   050    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   100   100   050    Pre-fail  Offline      -       0
  9 Power_On_Hours          0x0032   066   066   000    Old_age   Always       -       13819
 10 Spin_Retry_Count        0x0033   172   100   030    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       3648
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       163
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       89
193 Load_Cycle_Count        0x0032   081   081   000    Old_age   Always       -       197659
194 Temperature_Celsius     0x0022   100   100   000    Old_age   Always       -       31 (Min/Max 12/48)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       13
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       80
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   253   000    Old_age   Always       -       0
220 Disk_Shift              0x0002   100   100   000    Old_age   Always       -       0
222 Loaded_Hours            0x0032   071   071   000    Old_age   Always       -       11977
223 Load_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
224 Load_Friction           0x0022   100   100   000    Old_age   Always       -       0
226 Load-in_Time            0x0026   100   100   000    Old_age   Always       -       269
240 Head_Flying_Hours       0x0001   100   100   001    Pre-fail  Offline      -       0

SMART Error Log Version: 1
ATA Error Count: 568 (device log contains only the most recent five errors)
	CR = Command Register [HEX]
	FR = Features Register [HEX]
	SC = Sector Count Register [HEX]
	SN = Sector Number Register [HEX]
	CL = Cylinder Low Register [HEX]
	CH = Cylinder High Register [HEX]
	DH = Device/Head Register [HEX]
	DC = Device Command Register [HEX]
	ER = Error register [HEX]
	ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 568 occurred at disk power-on lifetime: 13819 hours (575 days + 19 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 41 60 00 00 00 40  Error: UNC at LBA = 0x00000000 = 0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 08 60 00 00 00 40 00      00:31:52.720  READ FPDMA QUEUED
  ef 10 02 00 00 00 a0 00      00:31:52.720  SET FEATURES [Enable SATA feature]
  27 00 00 00 00 00 e0 00      00:31:52.719  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  ec 00 00 00 00 00 a0 00      00:31:52.718  IDENTIFY DEVICE
  ef 03 45 00 00 00 a0 00      00:31:52.718  SET FEATURES [Set transfer mode]

Error 567 occurred at disk power-on lifetime: 13819 hours (575 days + 19 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 41 d0 00 00 00 40  Error: UNC at LBA = 0x00000000 = 0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 08 d0 00 00 00 40 00      00:31:48.918  READ FPDMA QUEUED
  60 08 c8 e0 66 54 40 00      00:31:48.916  READ FPDMA QUEUED
  60 08 b8 00 66 54 40 00      00:31:48.894  READ FPDMA QUEUED
  ec 00 01 00 00 00 00 00      00:31:48.893  IDENTIFY DEVICE
  ec 00 01 00 00 00 00 00      00:31:48.889  IDENTIFY DEVICE

Error 566 occurred at disk power-on lifetime: 13819 hours (575 days + 19 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 41 e0 00 00 00 40  Error: UNC at LBA = 0x00000000 = 0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 08 e0 00 00 00 40 00      00:31:45.086  READ FPDMA QUEUED
  ef 10 02 00 00 00 a0 00      00:31:45.085  SET FEATURES [Enable SATA feature]
  27 00 00 00 00 00 e0 00      00:31:45.085  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  ec 00 00 00 00 00 a0 00      00:31:45.084  IDENTIFY DEVICE
  ef 03 45 00 00 00 a0 00      00:31:45.084  SET FEATURES [Set transfer mode]

Error 565 occurred at disk power-on lifetime: 13819 hours (575 days + 19 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 41 60 00 00 00 40  Error: UNC at LBA = 0x00000000 = 0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 08 60 00 00 00 40 00      00:31:41.284  READ FPDMA QUEUED
  60 08 58 e0 66 54 40 00      00:31:41.283  READ FPDMA QUEUED
  60 08 48 00 66 54 40 00      00:31:41.259  READ FPDMA QUEUED
  ec 00 01 00 00 00 00 00      00:31:41.257  IDENTIFY DEVICE
  ec 00 01 00 00 00 00 00      00:31:41.254  IDENTIFY DEVICE

Error 564 occurred at disk power-on lifetime: 13819 hours (575 days + 19 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 41 78 00 00 00 40  Error: UNC at LBA = 0x00000000 = 0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 08 78 00 00 00 40 00      00:31:37.451  READ FPDMA QUEUED
  60 08 70 e0 66 54 40 00      00:31:37.449  READ FPDMA QUEUED
  60 08 f0 00 66 54 40 00      00:31:37.420  READ FPDMA QUEUED
  ec 00 01 00 00 00 00 00      00:31:37.419  IDENTIFY DEVICE
  ef 10 02 00 00 00 a0 00      00:31:37.418  SET FEATURES [Enable SATA feature]

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     13815         -
# 2  Short offline       Completed: read failure       00%     13815         2048

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
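
In the attribute table above, the raw values of Reallocated_Sector_Ct (5), Current_Pending_Sector (197) and Offline_Uncorrectable (198) are the ones usually read as direct signs of media damage. A minimal sketch for pulling them out, assuming the `smartctl` column layout shown above (ID in column 1, name in column 2, raw value in column 10). The function name `flag_media_attrs` is hypothetical, and the sample rows are copied from this output rather than read from a live disk:

```shell
# Print ID, name and raw value for media-health attributes with a non-zero raw value.
# Expects smartctl attribute rows on stdin.
flag_media_attrs() {
    awk '($1 == 5 || $1 == 197 || $1 == 198) && $10 + 0 > 0 { print $1, $2, $10 }'
}

# Sample rows copied from the attribute table above.
sample='  5 Reallocated_Sector_Ct   0x0033   100   100   050    Pre-fail  Always       -       152
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       80
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0'

printf '%s\n' "$sample" | flag_media_attrs
```

On a live system the same filter could be fed from `smartctl -A /dev/sda`. For this disk it reports 152 reallocated and 80 pending sectors.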

The partitions were laid out like this:
- /boot partition of ~125M
- root partition of ~20G
- /home partition taking up the rest of the disk

I don’t want to lose the data on the /dev/sda3 (/home) partition, but the partitions can no longer be listed, so I wonder whether there is any hope and whether the disk can still be used. Maybe somebody here can help me solve this.

Last edited by Delyas (2018-02-12 17:23:33)

EDIT: The members of this forum just helped me repair a nasty hard disk error. I had run file system checks before, but I never knew that the default check does not update the bad block inode list.

p.H wrote: e2fsck detects and marks bad blocks only when run with the -c option.

With that one sentence, p.H saved my computer. And the advice that he and L_V gave me in this thread was priceless.

What ultimately worked for me was checking both my / (root) and /home partitions from a Live CD with the non-destructive read-write option, -cc:

Code:

e2fsck -f -y -cc -C0 /dev/sda5
e2fsck -f -y -cc -C0 /dev/sda7

That check identified and repaired the affected inodes. It also wrote over the damaged files. Keep a list of those files. You will have to replace them (as explained below).

Next, I ran the checks again with the read-only option -c:

Code:

e2fsck -f -y -c -C0 /dev/sda5
e2fsck -f -y -c -C0 /dev/sda7

Running the check a second time was an important step because it added a few more blocks to the bad blocks list.

Having repaired the file system, the next step was to repair the affected files:

p.H wrote: Note that e2fsck can remap bad blocks but cannot restore the unreadable contents of the affected files, so these files must be reinstalled from their respective packages.

In my case, I had a fresh install of Debian Buster and a Debian Buster Live CD, so I just copied the affected files over from the Live CD:

Code:

mkdir /media/inspiron
mount /dev/sda5 /media/inspiron
cp /usr/bin/$FILE01  /media/inspiron/usr/bin/$FILE01
cp /usr/bin/$FILE02  /media/inspiron/usr/bin/$FILE02
...
umount /dev/sda5
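
The `cp` lines above repeat one pattern per damaged file, so with a longer list a loop may be easier. A minimal sketch, assuming the damaged paths were saved one per line in a text file. The names `restore_files` and `list.txt` and the demo directories are hypothetical, and the demo runs against temporary directories rather than a real mount such as `/media/inspiron`:

```shell
# Copy each path listed in file $1 from source root $2 into target root $3.
# cp -p keeps mode and timestamps; paths missing from the source are reported and skipped.
restore_files() {
    list=$1 src_root=$2 dst_root=$3
    while IFS= read -r path; do
        [ -n "$path" ] || continue
        if [ -f "$src_root$path" ]; then
            mkdir -p "$(dirname "$dst_root$path")"
            cp -p "$src_root$path" "$dst_root$path"
        else
            echo "missing in source: $path" >&2
        fi
    done < "$list"
}

# Demo on throwaway directories standing in for the Live CD root and the mounted disk.
demo=$(mktemp -d)
mkdir -p "$demo/live/usr/bin" "$demo/disk"
echo "good copy" > "$demo/live/usr/bin/tool"
printf '%s\n' /usr/bin/tool /usr/bin/absent > "$demo/list.txt"
restore_files "$demo/list.txt" "$demo/live" "$demo/disk"
cat "$demo/disk/usr/bin/tool"
```

When run from the Live CD itself, the source root would be the empty string and the target root the mount point, for example `restore_files list.txt "" /media/inspiron`.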

After that, the computer booted like a charm. Importantly, it shut down like a charm too. There were no priority 0 or 1 messages in the journalctl output.

Thank you to p.H and L_V for helping me rescue this old machine!

—————————————-

ORIGINAL POST:

After a fresh installation of Debian Buster on an old machine, the partition that holds /home does not unmount at shutdown. The problem seems to be caused by an I/O error. At first glance, smartctl does not show any errors, but a deeper look shows that the disk experienced a few errors on the / (root) partition a few years ago.

If I followed Linux Admins’ “Fixing disk problems” guide, would that resolve the issue?

Thanks in advance,
— Soul

Code:

$ journalctl -r -b -1 -p3

-- Logs begin at Sun 2019-05-19 13:22:05 EDT, end at Sun 2019-05-19 15:26:53 EDT. --
May 19 14:51:02 inspiron systemd[1]: Failed unmounting /home.
May 19 14:51:02 inspiron kernel: print_req_error: I/O error, dev sda, sector 162964427
May 19 14:51:02 inspiron kernel: ata1.00: error: { UNC }
May 19 14:51:02 inspiron kernel: ata1.00: status: { DRDY ERR }
May 19 14:51:02 inspiron kernel: ata1.00: cmd 60/08:88:c8:a3:b6/00:00:09:00:00/40 tag 17 ncq dma 4096 in
                                          res 41/40:08:cb:a3:b6/00:00:09:00:00/00 Emask 0x409 (media error) <F>
May 19 14:51:02 inspiron kernel: ata1.00: failed command: READ FPDMA QUEUED
May 19 14:51:02 inspiron kernel: ata1.00: irq_stat 0x40000008
May 19 14:51:02 inspiron kernel: ata1.00: exception Emask 0x0 SAct 0x20000 SErr 0x0 action 0x0
May 19 14:50:59 inspiron kernel: print_req_error: I/O error, dev sda, sector 162964427
May 19 14:50:59 inspiron kernel: ata1.00: error: { UNC }
May 19 14:50:59 inspiron kernel: ata1.00: status: { DRDY ERR }
May 19 14:50:59 inspiron kernel: ata1.00: cmd 60/20:a8:c0:a3:b6/00:00:09:00:00/40 tag 21 ncq dma 16384 in
                                          res 41/40:20:cb:a3:b6/00:00:09:00:00/00 Emask 0x409 (media error) <F>
May 19 14:50:59 inspiron kernel: ata1.00: failed command: READ FPDMA QUEUED
May 19 14:50:59 inspiron kernel: ata1.00: irq_stat 0x40000008
May 19 14:50:59 inspiron kernel: ata1.00: exception Emask 0x0 SAct 0x200000 SErr 0x0 action 0x0
May 19 14:50:43 inspiron wpa_supplicant[509]: dbus: wpa_dbus_property_changed: no property SessionLength in object /fi/w1/wpa_supplicant1/Interfaces/1
May 19 14:47:06 inspiron root[7585]: /etc/dhcp/dhclient-exit-hooks.d/zzz_avahi-autoipd returned non-zero exit status 1
May 19 14:40:19 inspiron root[7277]: /etc/dhcp/dhclient-exit-hooks.d/zzz_avahi-autoipd returned non-zero exit status 1
May 19 14:34:55 inspiron root[7129]: /etc/dhcp/dhclient-exit-hooks.d/zzz_avahi-autoipd returned non-zero exit status 1
May 19 14:27:20 inspiron root[6970]: /etc/dhcp/dhclient-exit-hooks.d/zzz_avahi-autoipd returned non-zero exit status 1
May 19 14:19:26 inspiron root[6425]: /etc/dhcp/dhclient-exit-hooks.d/zzz_avahi-autoipd returned non-zero exit status 1
May 19 14:13:30 inspiron root[6164]: /etc/dhcp/dhclient-exit-hooks.d/zzz_avahi-autoipd returned non-zero exit status 1
May 19 14:07:00 inspiron root[4631]: /etc/dhcp/dhclient-exit-hooks.d/zzz_avahi-autoipd returned non-zero exit status 1
May 19 14:01:11 inspiron root[3004]: /etc/dhcp/dhclient-exit-hooks.d/zzz_avahi-autoipd returned non-zero exit status 1
May 19 13:53:40 inspiron root[2451]: /etc/dhcp/dhclient-exit-hooks.d/zzz_avahi-autoipd returned non-zero exit status 1
May 19 13:46:29 inspiron root[2355]: /etc/dhcp/dhclient-exit-hooks.d/zzz_avahi-autoipd returned non-zero exit status 1
May 19 13:41:26 inspiron root[2260]: /etc/dhcp/dhclient-exit-hooks.d/zzz_avahi-autoipd returned non-zero exit status 1
May 19 13:33:31 inspiron root[1803]: /etc/dhcp/dhclient-exit-hooks.d/zzz_avahi-autoipd returned non-zero exit status 1
May 19 13:28:13 inspiron root[1633]: /etc/dhcp/dhclient-exit-hooks.d/zzz_avahi-autoipd returned non-zero exit status 1
May 19 13:26:48 inspiron kernel: print_req_error: I/O error, dev sda, sector 201851126
May 19 13:26:48 inspiron kernel: ata1.00: error: { UNC }
May 19 13:26:48 inspiron kernel: ata1.00: status: { DRDY ERR }
May 19 13:26:48 inspiron kernel: ata1.00: cmd 60/08:c8:f0:00:08/00:00:0c:00:00/40 tag 25 ncq dma 4096 in
                                          res 41/40:08:f6:00:08/00:00:0c:00:00/00 Emask 0x409 (media error) <F>
May 19 13:26:48 inspiron kernel: ata1.00: failed command: READ FPDMA QUEUED
May 19 13:26:48 inspiron kernel: ata1.00: error: { ABRT }
May 19 13:26:48 inspiron kernel: ata1.00: status: { DRDY ERR }
May 19 13:26:48 inspiron kernel: ata1.00: cmd 60/08:98:90:f6:3c/00:00:0a:00:00/40 tag 19 ncq dma 4096 in
                                          res 41/04:00:f6:00:08/00:00:0c:00:00/00 Emask 0x1 (device error)
May 19 13:26:48 inspiron kernel: ata1.00: failed command: READ FPDMA QUEUED
May 19 13:26:48 inspiron kernel: ata1.00: error: { ABRT }
May 19 13:26:48 inspiron kernel: ata1.00: status: { DRDY ERR }
May 19 13:26:48 inspiron kernel: ata1.00: cmd 60/08:90:08:3e:d1/00:00:30:00:00/40 tag 18 ncq dma 4096 in
                                          res 41/04:00:f6:00:08/00:00:0c:00:00/00 Emask 0x1 (device error)
May 19 13:26:48 inspiron kernel: ata1.00: failed command: READ FPDMA QUEUED
May 19 13:26:48 inspiron kernel: ata1.00: error: { ABRT }
May 19 13:26:48 inspiron kernel: ata1.00: status: { DRDY ERR }
May 19 13:26:48 inspiron kernel: ata1.00: cmd 60/08:88:08:2d:8d/00:00:15:00:00/40 tag 17 ncq dma 4096 in
                                          res 41/04:00:f6:00:08/00:00:0c:00:00/00 Emask 0x1 (device error)
May 19 13:26:48 inspiron kernel: ata1.00: failed command: READ FPDMA QUEUED
May 19 13:26:48 inspiron kernel: ata1.00: error: { ABRT }
May 19 13:26:48 inspiron kernel: ata1.00: status: { DRDY ERR }
May 19 13:26:48 inspiron kernel: ata1.00: cmd 60/08:80:c0:3e:59/00:00:09:00:00/40 tag 16 ncq dma 4096 in
                                          res 41/04:00:f6:00:08/00:00:0c:00:00/00 Emask 0x1 (device error)
May 19 13:26:48 inspiron kernel: ata1.00: failed command: READ FPDMA QUEUED
May 19 13:26:48 inspiron kernel: ata1.00: error: { ABRT }
May 19 13:26:48 inspiron kernel: ata1.00: status: { DRDY ERR }
May 19 13:26:48 inspiron kernel: ata1.00: cmd 61/18:30:18:7f:c5/00:00:2f:00:00/40 tag 6 ncq dma 12288 out
                                          res 41/04:00:f6:00:08/00:00:0c:00:00/00 Emask 0x1 (device error)
May 19 13:26:48 inspiron kernel: ata1.00: failed command: WRITE FPDMA QUEUED
May 19 13:26:48 inspiron kernel: ata1.00: error: { ABRT }
May 19 13:26:48 inspiron kernel: ata1.00: status: { DRDY ERR }
May 19 13:26:48 inspiron kernel: ata1.00: cmd 60/08:28:c8:6a:71/00:00:15:00:00/40 tag 5 ncq dma 4096 in
                                          res 41/04:00:f6:00:08/00:00:0c:00:00/00 Emask 0x1 (device error)
May 19 13:26:48 inspiron kernel: ata1.00: failed command: READ FPDMA QUEUED
May 19 13:26:48 inspiron kernel: ata1.00: irq_stat 0x40000001
May 19 13:26:48 inspiron kernel: ata1.00: exception Emask 0x0 SAct 0x20f0060 SErr 0x0 action 0x0
May 19 13:26:45 inspiron kernel: print_req_error: I/O error, dev sda, sector 361573512
May 19 13:26:45 inspiron kernel: print_req_error: I/O error, dev sda, sector 156843584
May 19 13:26:45 inspiron kernel: print_req_error: I/O error, dev sda, sector 201851126
May 19 13:26:45 inspiron kernel: print_req_error: I/O error, dev sda, sector 359754432
May 19 13:26:45 inspiron kernel: ata1.00: error: { ABRT }
May 19 13:26:45 inspiron kernel: ata1.00: status: { DRDY ERR }
May 19 13:26:45 inspiron kernel: ata1.00: cmd 60/d0:f0:88:2c:8d/00:00:15:00:00/40 tag 30 ncq dma 106496 in
                                          res 41/04:00:f6:00:08/00:00:0c:00:00/00 Emask 0x1 (device error)
May 19 13:26:45 inspiron kernel: ata1.00: failed command: READ FPDMA QUEUED
May 19 13:26:45 inspiron kernel: ata1.00: error: { ABRT }
May 19 13:26:45 inspiron kernel: ata1.00: status: { DRDY ERR }
May 19 13:26:45 inspiron kernel: ata1.00: cmd 60/00:e8:40:3e:59/01:00:09:00:00/40 tag 29 ncq dma 131072 in
                                          res 41/04:00:f6:00:08/00:00:0c:00:00/00 Emask 0x1 (device error)
May 19 13:26:45 inspiron kernel: ata1.00: failed command: READ FPDMA QUEUED
May 19 13:26:45 inspiron kernel: ata1.00: error: { ABRT }
May 19 13:26:45 inspiron kernel: ata1.00: status: { DRDY ERR }
May 19 13:26:45 inspiron kernel: ata1.00: cmd 60/08:58:08:3e:d1/00:00:30:00:00/40 tag 11 ncq dma 4096 in
                                          res 41/04:00:f6:00:08/00:00:0c:00:00/00 Emask 0x1 (device error)
May 19 13:26:45 inspiron kernel: ata1.00: failed command: READ FPDMA QUEUED
May 19 13:26:45 inspiron kernel: ata1.00: error: { UNC }
May 19 13:26:45 inspiron kernel: ata1.00: status: { DRDY ERR }
May 19 13:26:45 inspiron kernel: ata1.00: cmd 60/08:48:f0:00:08/00:00:0c:00:00/40 tag 9 ncq dma 4096 in
                                          res 41/40:08:f6:00:08/00:00:0c:00:00/00 Emask 0x409 (media error) <F>
May 19 13:26:45 inspiron kernel: ata1.00: failed command: READ FPDMA QUEUED
May 19 13:26:45 inspiron kernel: ata1.00: error: { ABRT }
May 19 13:26:45 inspiron kernel: ata1.00: status: { DRDY ERR }
May 19 13:26:45 inspiron kernel: ata1.00: cmd 60/08:18:90:f6:3c/00:00:0a:00:00/40 tag 3 ncq dma 4096 in
                                          res 41/04:00:f6:00:08/00:00:0c:00:00/00 Emask 0x1 (device error)
May 19 13:26:45 inspiron kernel: ata1.00: failed command: READ FPDMA QUEUED
May 19 13:26:45 inspiron kernel: ata1.00: error: { ABRT }
May 19 13:26:45 inspiron kernel: ata1.00: status: { DRDY ERR }
May 19 13:26:45 inspiron kernel: ata1.00: cmd 60/40:10:c0:6a:71/00:00:15:00:00/40 tag 2 ncq dma 32768 in
                                          res 41/04:00:f6:00:08/00:00:0c:00:00/00 Emask 0x1 (device error)
May 19 13:26:45 inspiron kernel: ata1.00: failed command: READ FPDMA QUEUED
May 19 13:26:45 inspiron kernel: ata1.00: error: { ABRT }
May 19 13:26:45 inspiron kernel: ata1.00: status: { DRDY ERR }
May 19 13:26:45 inspiron kernel: ata1.00: cmd 61/08:00:00:70:cc/00:00:31:00:00/40 tag 0 ncq dma 4096 out
                                          res 41/04:00:f6:00:08/00:00:0c:00:00/00 Emask 0x1 (device error)
May 19 13:26:45 inspiron kernel: ata1.00: failed command: WRITE FPDMA QUEUED
May 19 13:26:45 inspiron kernel: ata1.00: irq_stat 0x40000001
May 19 13:26:45 inspiron kernel: ata1.00: exception Emask 0x0 SAct 0x60000a0d SErr 0x0 action 0x0
May 19 13:26:40 inspiron kernel: print_req_error: I/O error, dev sda, sector 804169080
May 19 13:26:40 inspiron kernel: print_req_error: I/O error, dev sda, sector 361572440
May 19 13:26:40 inspiron kernel: print_req_error: I/O error, dev sda, sector 201851904
May 19 13:26:40 inspiron kernel: print_req_error: I/O error, dev sda, sector 201851126
May 19 13:26:40 inspiron kernel: print_req_error: I/O error, dev sda, sector 813441024
May 19 13:26:40 inspiron kernel: print_req_error: I/O error, dev sda, sector 201848320
May 19 13:26:39 inspiron kernel: ata1.00: error: { ABRT }
May 19 13:26:39 inspiron kernel: ata1.00: status: { DRDY ERR }
May 19 13:26:39 inspiron kernel: ata1.00: cmd 60/40:e0:78:a5:ee/00:00:2f:00:00/40 tag 28 ncq dma 32768 in
                                          res 41/04:00:f6:00:08/00:00:0c:00:00/00 Emask 0x1 (device error)
May 19 13:26:39 inspiron kernel: ata1.00: failed command: READ FPDMA QUEUED
May 19 13:26:39 inspiron kernel: ata1.00: error: { ABRT }
May 19 13:26:39 inspiron kernel: ata1.00: status: { DRDY ERR }
May 19 13:26:39 inspiron kernel: ata1.00: cmd 60/08:c8:30:59:d1/00:00:30:00:00/40 tag 25 ncq dma 4096 in
                                          res 41/04:00:f6:00:08/00:00:0c:00:00/00 Emask 0x1 (device error)
May 19 13:26:39 inspiron kernel: ata1.00: failed command: READ FPDMA QUEUED
May 19 13:26:39 inspiron kernel: ata1.00: error: { ABRT }
May 19 13:26:39 inspiron kernel: ata1.00: status: { DRDY ERR }
May 19 13:26:39 inspiron kernel: ata1.00: cmd 60/08:c0:58:60:92/00:00:31:00:00/40 tag 24 ncq dma 4096 in
                                          res 41/04:00:f6:00:08/00:00:0c:00:00/00 Emask 0x1 (device error)
May 19 13:26:39 inspiron kernel: ata1.00: failed command: READ FPDMA QUEUED
May 19 13:26:39 inspiron kernel: ata1.00: error: { ABRT }
May 19 13:26:39 inspiron kernel: ata1.00: status: { DRDY ERR }
May 19 13:26:39 inspiron kernel: ata1.00: cmd 60/58:60:58:28:8d/00:00:15:00:00/40 tag 12 ncq dma 45056 in
                                          res 41/04:00:f6:00:08/00:00:0c:00:00/00 Emask 0x1 (device error)
May 19 13:26:39 inspiron kernel: ata1.00: failed command: READ FPDMA QUEUED
May 19 13:26:39 inspiron kernel: ata1.00: error: { ABRT }
May 19 13:26:39 inspiron kernel: ata1.00: status: { DRDY ERR }
May 19 13:26:39 inspiron kernel: ata1.00: cmd 60/d8:58:00:04:08/06:00:0c:00:00/40 tag 11 ncq dma 897024 in
                                          res 41/04:00:f6:00:08/00:00:0c:00:00/00 Emask 0x1 (device error)
May 19 13:26:39 inspiron kernel: ata1.00: failed command: READ FPDMA QUEUED
May 19 13:26:39 inspiron kernel: ata1.00: error: { UNC }
May 19 13:26:39 inspiron kernel: ata1.00: status: { DRDY ERR }
May 19 13:26:39 inspiron kernel: ata1.00: cmd 60/00:50:00:00:08/04:00:0c:00:00/40 tag 10 ncq dma 524288 in
                                          res 41/40:00:f6:00:08/00:04:0c:00:00/00 Emask 0x409 (media error) <F>
May 19 13:26:39 inspiron kernel: ata1.00: failed command: READ FPDMA QUEUED
May 19 13:26:39 inspiron kernel: ata1.00: error: { ABRT }
May 19 13:26:39 inspiron kernel: ata1.00: status: { DRDY ERR }
May 19 13:26:39 inspiron kernel: ata1.00: cmd 60/40:48:00:20:7c/00:00:30:00:00/40 tag 9 ncq dma 32768 in
                                          res 41/04:00:f6:00:08/00:00:0c:00:00/00 Emask 0x1 (device error)
May 19 13:26:39 inspiron kernel: ata1.00: failed command: READ FPDMA QUEUED
May 19 13:26:39 inspiron kernel: ata1.00: error: { ABRT }
May 19 13:26:39 inspiron kernel: ata1.00: status: { DRDY ERR }
May 19 13:26:39 inspiron kernel: ata1.00: cmd 60/00:40:00:f6:07/06:00:0c:00:00/40 tag 8 ncq dma 786432 in
                                          res 41/04:00:f6:00:08/00:00:0c:00:00/00 Emask 0x1 (device error)
May 19 13:26:39 inspiron kernel: ata1.00: failed command: READ FPDMA QUEUED
May 19 13:26:39 inspiron kernel: ata1.00: irq_stat 0x40000001
May 19 13:26:39 inspiron kernel: ata1.00: exception Emask 0x0 SAct 0x13001f00 SErr 0x0 action 0x0
May 19 13:22:09 inspiron kernel: mei mei::55213584-9a29-4916-badf-0fb7ed682aeb:01: FW version command failed -5
May 19 13:22:09 inspiron kernel: mei mei::55213584-9a29-4916-badf-0fb7ed682aeb:01: Could not read FW version
May 19 13:22:05 inspiron kernel: ACPI: SPCR: Unexpected SPCR Access Width.  Defaulting to byte size
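
The `res` field in the kernel lines above carries the address of the failing sector, and decoding it shows how it relates to the `sector` number printed by `print_req_error`. A minimal sketch, assuming the usual libata field order (status/error, count, three low LBA bytes, then the high-order bytes, then device). The function name `res_to_lba` is hypothetical:

```shell
# Decode the failing LBA from a libata "res" field such as
#   41/40:08:cb:a3:b6/00:00:09:00:00/00
# Assumed field order:
#   status/error:count:lbal:lbam:lbah/hob_feature:hob_count:hob_lbal:hob_lbam:hob_lbah/device
res_to_lba() {
    old_ifs=$IFS
    IFS='/:'
    set -- $1            # split the field on "/" and ":" into $1..$12
    IFS=$old_ifs
    # 48-bit LBA, most significant byte first: hob_lbah hob_lbam hob_lbal lbah lbam lbal
    printf '%d\n' "0x${11}${10}${9}${6}${5}${4}"
}

res_to_lba '41/40:08:cb:a3:b6/00:00:09:00:00/00'
res_to_lba '41/40:08:f6:00:08/00:00:0c:00:00/00'
```

The two calls use the `res` values from the media error entries at 14:51:02 and 13:26:48. Their results can be checked against the sector numbers logged next to those entries.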

Code:

# fdisk -l

Disk /dev/sda: 465.8 GiB, 500107862016 bytes, 976773168 sectors
Disk model: ST9500325AS     
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x07f2837e

Device     Boot     Start       End   Sectors   Size Id Type
/dev/sda1              63    208844    208782   102M de Dell Utility
/dev/sda2  *       208845  30928844  30720000  14.7G  7 HPFS/NTFS/exFAT
/dev/sda3        30928845 155775023 124846179  59.5G  7 HPFS/NTFS/exFAT
/dev/sda4       155782305 976768064 820985760 391.5G  5 Extended
/dev/sda5  *    155782368 177305599  21523232  10.3G 83 Linux
/dev/sda6       177307648 199903231  22595584  10.8G 82 Linux swap / Solaris
/dev/sda7       199905280 976766975 776861696 370.4G 83 Linux
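
Comparing the failing sectors from the journal with this table, sector 162964427 lies inside /dev/sda5 (155782368 to 177305599) and sector 201851126 inside /dev/sda7 (199905280 to 976766975). The arithmetic for turning such a sector into a filesystem block number can be sketched as below. The 4096-byte block size is an assumption, not something stated in the post (`tune2fs -l` reports the real value), and `sector_to_fs_block` is a hypothetical name:

```shell
# Print the filesystem block that holds an absolute disk sector.
# $1 = failing sector, $2 = partition start sector (both in 512-byte units),
# $3 = filesystem block size in bytes (defaults to 4096, which is an assumption).
sector_to_fs_block() {
    echo $(( ($1 - $2) * 512 / ${3:-4096} ))
}

sector_to_fs_block 162964427 155782368   # sector from the journal, start of /dev/sda5
sector_to_fs_block 201851126 199905280   # sector from the journal, start of /dev/sda7
```

With a block number in hand, the `icheck` and `ncheck` requests of `debugfs` can map it to an inode and then to a file name, which is one way to build the list of damaged files.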

Code:

# smartctl -l selftest /dev/sda

smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.19.0-5-amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%         0         -

Code:

# smartctl -a /dev/sda

smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.19.0-5-amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Momentus 5400.6
Device Model:     ST9500325AS
Serial Number:    6VEGMVRP
LU WWN Device Id: 5 000c50 03067dd6f
Firmware Version: D005DEM1
User Capacity:    500,107,862,016 bytes [500 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    5400 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 2.6, 3.0 Gb/s
Local Time is:    Sun May 19 15:05:07 2019 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.

General SMART Values:
Offline data collection status:  (0x82)	Offline data collection activity
					was completed without error.
					Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever 
					been run.
Total time to complete Offline 
data collection: 		(    0) seconds.
Offline data collection
capabilities: 			 (0x7b) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 ( 139) minutes.
Conveyance self-test routine
recommended polling time: 	 (   3) minutes.
SCT capabilities: 	       (0x103f)	SCT Status supported.
					SCT Error Recovery Control supported.
					SCT Feature Control supported.
					SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   101   089   006    Pre-fail  Always       -       29958806
  3 Spin_Up_Time            0x0003   099   099   085    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   091   091   020    Old_age   Always       -       9917
  5 Reallocated_Sector_Ct   0x0033   088   088   036    Pre-fail  Always       -       246
  7 Seek_Error_Rate         0x000f   083   060   030    Pre-fail  Always       -       207791365
  9 Power_On_Hours          0x0032   073   073   000    Old_age   Always       -       23876
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   094   094   020    Old_age   Always       -       6861
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   001   001   000    Old_age   Always       -       1097
188 Command_Timeout         0x0032   100   096   000    Old_age   Always       -       3759
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   051   036   045    Old_age   Always   In_the_past 49 (Min/Max 49/49 #998)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       20
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       78
193 Load_Cycle_Count        0x0032   001   001   000    Old_age   Always       -       578157
194 Temperature_Celsius     0x0022   049   064   000    Old_age   Always       -       49 (0 18 0 0 0)
195 Hardware_ECC_Recovered  0x001a   053   045   000    Old_age   Always       -       29958806
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       4
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       4
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       22868 (153 213 0)
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       3790333358
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       1937597633
254 Free_Fall_Sensor        0x0032   100   100   000    Old_age   Always       -       0

SMART Error Log Version: 1
ATA Error Count: 987 (device log contains only the most recent five errors)
	CR = Command Register [HEX]
	FR = Features Register [HEX]
	SC = Sector Count Register [HEX]
	SN = Sector Number Register [HEX]
	CL = Cylinder Low Register [HEX]
	CH = Cylinder High Register [HEX]
	DH = Device/Head Register [HEX]
	DC = Device Command Register [HEX]
	ER = Error register [HEX]
	ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 987 occurred at disk power-on lifetime: 23876 hours (994 days + 20 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 cb a3 b6 09  Error: UNC at LBA = 0x09b6a3cb = 162964427

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 08 c8 a3 b6 49 00      05:50:04.649  READ FPDMA QUEUED
  60 00 28 e0 a3 b6 49 00      05:50:04.617  READ FPDMA QUEUED
  60 00 08 c0 a3 b6 49 00      05:50:04.515  READ FPDMA QUEUED
  27 00 00 00 00 00 e0 00      05:50:04.513  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  ec 00 00 00 00 00 a0 00      05:50:04.512  IDENTIFY DEVICE

Error 986 occurred at disk power-on lifetime: 23876 hours (994 days + 20 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 cb a3 b6 09  Error: UNC at LBA = 0x09b6a3cb = 162964427

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 20 c0 a3 b6 49 00      05:50:02.009  READ FPDMA QUEUED
  60 00 08 10 50 bb 49 00      05:50:01.961  READ FPDMA QUEUED
  ea 00 00 00 00 00 a0 00      05:49:55.547  FLUSH CACHE EXT
  61 00 08 a0 33 4a 49 00      05:49:55.547  WRITE FPDMA QUEUED
  ea 00 00 00 00 00 a0 00      05:49:55.538  FLUSH CACHE EXT

Error 985 occurred at disk power-on lifetime: 23874 hours (994 days + 18 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 f6 00 08 0c  Error: UNC at LBA = 0x0c0800f6 = 201851126

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 08 ff ff ff 4f 00      04:25:50.279  READ FPDMA QUEUED
  60 00 08 f0 00 08 4c 00      04:25:50.257  READ FPDMA QUEUED
  61 00 08 ff ff ff 4f 00      04:25:50.256  WRITE FPDMA QUEUED
  60 00 08 90 f6 3c 4a 00      04:25:50.256  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00      04:25:50.256  READ FPDMA QUEUED

Error 984 occurred at disk power-on lifetime: 23874 hours (994 days + 18 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 f6 00 08 0c  Error: UNC at LBA = 0x0c0800f6 = 201851126

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 40 ff ff ff 4f 00      04:25:47.981  READ FPDMA QUEUED
  60 00 80 28 9e 57 49 00      04:25:47.954  READ FPDMA QUEUED
  60 00 40 ff ff ff 4f 00      04:25:47.953  READ FPDMA QUEUED
  60 00 40 ff ff ff 4f 00      04:25:47.945  READ FPDMA QUEUED
  60 00 40 ff ff ff 4f 00      04:25:47.941  READ FPDMA QUEUED

Error 983 occurred at disk power-on lifetime: 23874 hours (994 days + 18 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 f6 00 08 0c  Error: UNC at LBA = 0x0c0800f6 = 201851126

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 08 ff ff ff 4f 00      04:25:42.214  READ FPDMA QUEUED
  60 00 d8 00 04 08 4c 00      04:25:42.209  READ FPDMA QUEUED
  60 00 00 00 00 08 4c 00      04:25:42.207  READ FPDMA QUEUED
  60 00 00 00 f6 07 4c 00      04:25:42.207  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00      04:25:42.163  READ FPDMA QUEUED

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%         0         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
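
The "Error: UNC at LBA = ..." lines in the error log above can be reproduced by hand from the register dump. The following is a minimal sketch, assuming the usual 28-bit layout in which SN, CL and CH hold LBA bits 0-23 and the low nibble of DH holds bits 24-27; the function names are hypothetical and exist only for this illustration.

```python
def lba28_from_registers(sn: int, cl: int, ch: int, dh: int) -> int:
    """Combine the SN/CL/CH/DH register bytes into a 28-bit LBA.

    Assumed layout: SN = bits 0-7, CL = bits 8-15, CH = bits 16-23,
    low nibble of DH = bits 24-27.
    """
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


def parse_register_row(row: str) -> int:
    """Parse an 'ER ST SC SN CL CH DH' row as printed in the log above."""
    _er, _st, _sc, sn, cl, ch, dh = (int(b, 16) for b in row.split()[:7])
    return lba28_from_registers(sn, cl, ch, dh)


if __name__ == "__main__":
    # Register rows copied from errors 987/986 and 985/984/983 above.
    for row in ("40 51 00 cb a3 b6 09", "40 51 00 f6 00 08 0c"):
        lba = parse_register_row(row)
        print(f"{row} -> LBA 0x{lba:08x} = {lba}")
```

For the two register rows in the log this yields 162964427 and 201851126, the same values smartctl printed. Note that only two distinct LBAs account for all five logged errors: the repeated entries are retries against the same unreadable sectors.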

Source

My utility (it verifies file hashes) runs fine on various Linux servers, but on one of them it regularly crashes with "Input/output error".

Output of dmesg --level=err:

[    1.329324] ACPI BIOS Error (bug): Could not resolve symbol [_SB.PCI0.SAT0.SPT0._GTF.DSSP], AE_NOT_FOUND (20200925/psargs-330)
[    1.329375] ACPI Error: Aborting method _SB.PCI0.SAT0.SPT0._GTF due to previous error (AE_NOT_FOUND) (20200925/psparse-529)
[    1.336931] ACPI BIOS Error (bug): Could not resolve symbol [_SB.PCI0.SAT0.SPT0._GTF.DSSP], AE_NOT_FOUND (20200925/psargs-330)
[    1.336980] ACPI Error: Aborting method _SB.PCI0.SAT0.SPT0._GTF due to previous error (AE_NOT_FOUND) (20200925/psparse-529)
[ 3013.266044] ata1.00: exception Emask 0x0 SAct 0x18 SErr 0x0 action 0x0
[ 3013.266054] ata1.00: irq_stat 0x40000008
[ 3013.266059] ata1.00: failed command: READ FPDMA QUEUED
[ 3013.266066] ata1.00: cmd 60/00:18:98:db:b8/01:00:44:00:00/40 tag 3 ncq dma 131072 in
                        res 41/40:00:60:dc:b8/00:00:44:00:00/40 Emask 0x409 (media error) <F>
[ 3013.266072] ata1.00: status: { DRDY ERR }
[ 3013.266075] ata1.00: error: { UNC }
[ 3013.281961] blk_update_request: I/O error, dev sda, sector 1152965728 op 0x0:(READ) flags 0x80700 phys_seg 2 prio class 0
[ 3956.517696] ata1.00: exception Emask 0x0 SAct 0x18 SErr 0x0 action 0x0
[ 3956.517706] ata1.00: irq_stat 0x40000008
[ 3956.517710] ata1.00: failed command: READ FPDMA QUEUED
[ 3956.517717] ata1.00: cmd 60/00:18:90:99:f6/01:00:4e:00:00/40 tag 3 ncq dma 131072 in
                        res 41/40:00:78:9a:f6/00:00:4e:00:00/40 Emask 0x409 (media error) <F>
[ 3956.517723] ata1.00: status: { DRDY ERR }
[ 3956.517726] ata1.00: error: { UNC }
[ 3956.533817] blk_update_request: I/O error, dev sda, sector 1324784248 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
[ 3961.281636] ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0
[ 3961.281645] ata1.00: irq_stat 0x40000008
[ 3961.281649] ata1.00: failed command: READ FPDMA QUEUED
[ 3961.281657] ata1.00: cmd 60/08:00:78:9a:f6/00:00:4e:00:00/40 tag 0 ncq dma 4096 in
                        res 41/40:00:78:9a:f6/00:00:4e:00:00/40 Emask 0x409 (media error) <F>
[ 3961.281664] ata1.00: status: { DRDY ERR }
[ 3961.281667] ata1.00: error: { UNC }
[ 3961.297276] blk_update_request: I/O error, dev sda, sector 1324784248 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0

What could be the cause?
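
One way to read the log above: the `res` line of each record carries the LBA at which the drive gave up, and it can be decoded to confirm that it is the same sector the block layer reports in `blk_update_request`. The sketch below assumes the libata field order `status/error:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device`; the helper name `lba48_from_res` is hypothetical.

```python
import re

# Taskfile printed after "res" (field order assumed as described above):
# status / error:nsect:lbal:lbam:lbah / hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah / device
RES_RE = re.compile(
    r"res\s+([0-9a-f]{2})"
    r"/([0-9a-f]{2}(?::[0-9a-f]{2}){4})"
    r"/([0-9a-f]{2}(?::[0-9a-f]{2}){4})"
    r"/([0-9a-f]{2})"
)


def lba48_from_res(line: str) -> int:
    """Extract the 48-bit LBA from a libata 'res ...' log line."""
    m = RES_RE.search(line)
    if m is None:
        raise ValueError("no 'res' taskfile found in line")
    # Low-order bytes come from the first group, high-order (HOB) bytes from the second.
    _error, _nsect, lbal, lbam, lbah = (int(x, 16) for x in m.group(2).split(":"))
    _hob_feat, _hob_nsect, hob_lbal, hob_lbam, hob_lbah = (
        int(x, 16) for x in m.group(3).split(":")
    )
    return (
        (hob_lbah << 40) | (hob_lbam << 32) | (hob_lbal << 24)
        | (lbah << 16) | (lbam << 8) | lbal
    )


if __name__ == "__main__":
    # 'res' lines copied from the dmesg output above.
    for line in (
        "res 41/40:00:60:dc:b8/00:00:44:00:00/40 Emask 0x409 (media error) <F>",
        "res 41/40:00:78:9a:f6/00:00:4e:00:00/40 Emask 0x409 (media error) <F>",
    ):
        print(lba48_from_res(line))
```

For the two `res` lines in this log the decoded values are 1152965728 and 1324784248, which agree with the sectors in the `blk_update_request` lines. The status and error bytes (`41/40`) correspond to the `DRDY ERR` and `UNC` flags printed next to them, so the drive itself is reporting the sectors as uncorrectable on read.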

Source

Bad block HOWTO for smartmontools
