Fatal firmware error driver detected possible fw hang halting fw

Good afternoon.

I have an Intel server:
Intel SC5400, 5U Tower, 670W (SC5400BASE)
MB S5000PSLROMB (RAID board SROMBSAS18E)
Activation key: INTEL SAS RAID AXXRAK18E
4-drive cage AXX4DRV3GEXP
6-drive cage AXX6DRV3GEXP

The 4-drive cage holds a RAID 10 array (VD0).
The 6-drive cage holds VD2 and VD3 (each a single-drive RAID 0) and VD1 (a RAID 1 of two drives plus a hot spare).
I replaced the BBU with a fresh one (the old one had asked to be replaced because of its age), and the problems began.

At the first power-on the RAID BIOS reported that VD1, VD2 and VD3 (everything in the 6-drive cage) were missing: "Continue and I will forget them, or power off the server, check, and power on again."
I power-cycled the server and the controller saw them normally.
I then rebooted several times; in about 50% of cases it failed to see them again.

Meanwhile the RAID 10 in the 4-drive cage is detected and works without problems.
The problems affect only the drives in the 6-drive cage.

Next I powered on the server (with VD2 and VD3 working; I pulled VD1 because it is not needed right now), and everything worked.
A few hours later VD2 (single drive, RAID 0) dropped out. I tried to "force online" it, but it refused.

Then VD3 (single drive, RAID 0) dropped out as well.
After a reboot it did not see VD2 and VD3. Fine, let it forget them (work has to go on, and the main data is on VD0 in the 4-drive cage, which does not drop out).
As a result, the Web RAID Console shows only the 4-drive cage with its RAID array, plus one of the drives from the second cage in the Unconfigured Bad state.

At the same time I have a pile of warnings like:
Controller ID: 0 PD Reset: PD = :17, Error = 3, Path = 50:01:e6:71:46:db:a0:0b
Controller ID: 0 PD Reset: PD = : :0, Error = 3, Path = 50:01:e6:71:46:db:a0:01
Controller ID: 0 Error: : :0 (Error 240)
Controller ID: 0 Command timeout on PD: PD = : :0 — No addtional sense information, CDB = 0x28 0x00 0x02 0x89 0x50 0x00 0x00 0x08 0x00 0x00 , Sense = 50:01:e6:71:46:db:a0:01, Path =

And a few Fatal events:
Controller ID: 0 VD is now OFFLINE VD 2
Controller ID: 0 VD is now OFFLINE VD 3
Controller ID: 0 Fatal firmware error: Line 205 in ../../raid/mfihw.c
Controller ID: 0 Fatal firmware error: Driver detected possible FW hang, halting FW.
and a couple of Critical ones:
Controller ID: 0 SAS topology error: SMP timeout

What is the most likely cause? What would you advise?
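One way to check whether the warnings point at a single piece of hardware is to group them by the SAS address in the Path field. The sketch below is only an illustration: the function name and the choice to compare the first seven octets are assumptions, and the sample lines are copied from the log above. A shared prefix is consistent with the drives sitting behind one expander, but it does not prove it.

```python
import re
from collections import Counter

# Matches an 8-octet SAS address after "Path = ", as seen in the warnings above.
SAS_PATH = re.compile(r"Path = ((?:[0-9a-fA-F]{2}:){7}[0-9a-fA-F]{2})")


def count_path_prefixes(lines, prefix_octets=7):
    """Count events per SAS address prefix (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = SAS_PATH.search(line)
        if match:
            octets = match.group(1).lower().split(":")
            counts[":".join(octets[:prefix_octets])] += 1
    return counts


# Sample lines taken from the post; the third has no Path and is skipped.
SAMPLE = [
    "Controller ID: 0 PD Reset: PD = :17, Error = 3, Path = 50:01:e6:71:46:db:a0:0b",
    "Controller ID: 0 PD Reset: PD = : :0, Error = 3, Path = 50:01:e6:71:46:db:a0:01",
    "Controller ID: 0 Error: : :0 (Error 240)",
]

if __name__ == "__main__":
    for prefix, events in count_path_prefixes(SAMPLE).most_common():
        print(f"{prefix}:xx  {events} event(s)")
```

If every reset and timeout lands under the same prefix while the 4-drive cage stays quiet, the second cage, its cabling and its power are reasonable things to inspect first.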

Problem

Users running the ServeRAID M and MR10 Series SAS Controller Driver v6.702.07.00 for Microsoft Windows Server 2012R2/2012/2008R2/2008 releases may experience an unexpected controller reset or system hang while under moderate to heavy I/O. In some circumstances, the controller restart can occur during Windows boot-up, causing Windows to hang at the loading screen. In rare occasions, the controller may not reset correctly and will be unavailable until the system has been restarted.

Resolving The Problem

Source

RETAIN tip: H213051

Symptom

Users running the ServeRAID M and MR10 Series SAS Controller
Driver v6.702.07.00 for Microsoft Windows Server
2012R2/2012/2008R2/2008 releases may experience an unexpected
controller reset or system hang while under moderate to heavy I/O.
In some circumstances, the controller restart can occur during
Windows boot-up, causing Windows to hang at the loading screen. In
rare occasions, the controller may not reset correctly and will be
unavailable until the system has been restarted.

If the MegaRAID Storage Manager (MSM) application is running,
the following message may be logged in its event logs as well as
the Windows Application Event log: 

                
Fatal firmware error: Driver detected possible firmware hang,
halting firmware
Driver detected possible firmware hang, halting firmware
Fatal firmware error: Line 1305 in ../../raid/1078dma.c
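When collecting these messages from several machines, it can help to pull out the firmware source file and line number so that crashes at the same site can be counted together. The following is a minimal sketch; the function name is hypothetical and the pattern is inferred only from the message format shown above, not from any vendor specification.

```python
import posixpath
import re

# "\s+" also tolerates logs that wrap the message after "error:".
FATAL_SITE = re.compile(r"Fatal firmware error:\s+Line (\d+) in (\S+)")


def fatal_sites(log_text):
    """Return (source file, line number) pairs for each fatal firmware error."""
    return [
        (posixpath.basename(path), int(line))
        for line, path in FATAL_SITE.findall(log_text)
    ]


SAMPLE = """\
Fatal firmware error: Driver detected possible firmware hang,
halting firmware
Fatal firmware error: Line 1305 in ../../raid/1078dma.c
"""

if __name__ == "__main__":
    for source_file, line_number in fatal_sites(SAMPLE):
        print(source_file, line_number)
```

The "Driver detected possible firmware hang" message carries no file or line, so it is deliberately not matched.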

Affected configurations

The system can be any of the following IBM servers:

  • System x3300 M4, type 7382, any model
  • System x3500 M4, type 7383, any model
  • System x3500 M4, type 7383 E5-xxxxV2, any model
  • System x3530 M4, type 7160, any model
  • System x3530 M4, type 7160 E5-xxxxV2, any model
  • System x3550 M4, type 5459, any model
  • System x3550 M4, type 7914, any model
  • System x3550 M4, type 7914 E5-xxxxV2, any model
  • System x3630 M4, type 7158, any model
  • System x3630 M4, type 7158 E5-xxxxV2, any model
  • System x3650 M4 BD, type 5466, any model
  • System x3650 M4 HD, type 5460, any model
  • System x3650 M4, type 7915, any model
  • System x3650 M4, type 7915 E5-xxxxV2, any model
  • System x3750 M4, type 8722, any model
  • System x3850 X5, type 7143, any model
  • System x3850 X5, type 7145, any model
  • System x3850 X5, type 7146, any model
  • System x3850 X5, type 7191, any model
  • iDataPlex dx360 M4 server, type 7912, any model
  • iDataPlex dx360 M4 server, type 7912 E5-xxxxV2, any model

The system is configured with at least one of the following
operating systems:

  • Microsoft Windows Server 2008 R2 Datacenter Edition, any service pack
  • Microsoft Windows Server 2008 R2 Enterprise Edition, any service pack
  • Microsoft Windows Server 2008 R2 Standard, any service pack
  • Microsoft Windows Server 2008, Datacenter Edition 32-bit, any service pack
  • Microsoft Windows Server 2008, Datacenter Edition 64-bit, any service pack
  • Microsoft Windows Server 2008, Enterprise Edition 32-bit, any service pack
  • Microsoft Windows Server 2008, Enterprise Edition 64-bit, any service pack
  • Microsoft Windows Server 2008, Standard Edition 32-bit, any service pack
  • Microsoft Windows Server 2008, Standard Edition 64-bit, any service pack
  • Microsoft Windows Server 2012, any service pack
  • Microsoft Windows Server 2012 R2, any service pack

The system is configured with one or more of the following IBM
Option part numbers:

  • ServeRAID M1015 SAS/SATA Controller, Option part number 46M0831,
    any replacement part number
  • ServeRAID M1210e, any model
  • ServeRAID M1215 SAS/SATA Controller for IBM System x, Option
    part number 46C9114, any model
  • ServeRAID M5014 SAS/SATA Controller, Option part number
    46M0916, any replacement part number
  • ServeRAID M5015 SAS/SATA Controller, Option part number
    46M0829, any replacement part number
  • ServeRAID M5016 SAS/SATA Controller for IBM System x, Option
    part number 90Y4304, any replacement part number
  • ServeRAID M5025 SAS/SATA Controller, Option part number
    46M0830, any replacement part number
  • ServeRAID M5110 SAS/SATA Controller Card, Option part number
    81Y4481, any replacement part number
  • ServeRAID M5110 SAS/SATA Controller for IBM System x (CTO), any
    FRU
  • ServeRAID M5110e SAS/SATA Controller for IBM System x, onboard,
    any embedded
  • ServeRAID M5115 SAS/SATA Controller, Option part number
    90Y4390, any replacement part number
  • ServeRAID M5120 SAS/SATA Controller for IBM System x, Option
    part number 81Y4478, any replacement part number
  • ServeRAID M5210 SAS/SATA Controller for IBM System x, Option
    part number 46C9110, any any
  • ServeRAID M5210e SAS/SATA Controller for IBM System x, Option
    part number 46C9117 CTO, any any

The version 6.702.07.00 device driver for the ServeRAID M Series
and MR10 SAS Controllers for Microsoft Windows is affected.

Note: This does not imply that the network operating
system will work under all combinations of hardware and
software.

Please see the compatibility page for more information:

http://www.ibm.com/systems/info/x86servers/serverproven/compat/us/

Solution

This behavior will be corrected in a future release of IBM
ServeRAID MR Device driver for Windows Server later than version
6.702.07.00.

The target date for this release is scheduled for first quarter
2015.

The file is or will be available by selecting the appropriate
Product Group, type of System, Product name, Product machine type,
and operating system on IBM Support’s Fix Central web page, at the
following URL:

http://www.ibm.com/support/fixcentral/

Workaround

This behavior has been corrected in version 6.704.12.00 of the
ServeRAID MR Device driver for Windows Server.

The file is or will be available by selecting the appropriate
Product Group, type of System, Product name, Product machine type,
and operating system on IBM Support’s Fix Central web page, at the
following URL:

     http://www.ibm.com/support/fixcentral/

To work around the issue, roll back the Windows driver to version
6.600.25.00, or update to the latest driver available on Fix
Central.

     http://www.ibm.com/support/fixcentral/

Additional information

System x has confirmed the cause of the fatal firmware errors
are unique to certain software disk I/O loads. Downgrading the
Windows Server device driver to version 6.600.25.00, or updating to
the latest version, will prevent the fatal firmware issue from
occurring. The ServeRAID SAS/SATA controller firmware code level is
not applicable.
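The version guidance above can be expressed as a small check. This is a sketch under stated assumptions: the function names and the "unknown" bucket are illustrative, and only the three versions named in this tip (6.702.07.00 affected, 6.704.12.00 corrected, 6.600.25.00 as the rollback target) are treated as known; the tip says nothing about other releases.

```python
AFFECTED = (6, 702, 7, 0)    # 6.702.07.00, named as affected in this tip
FIXED = (6, 704, 12, 0)      # 6.704.12.00, named as corrected in this tip
ROLLBACK = (6, 600, 25, 0)   # 6.600.25.00, the suggested rollback version


def parse_version(text):
    """Turn '6.702.07.00' into a tuple of integers for comparison."""
    return tuple(int(part) for part in text.strip().split("."))


def classify_driver(text):
    """Classify a driver version string using only the versions named above."""
    version = parse_version(text)
    if version == AFFECTED:
        return "affected"
    if version >= FIXED:
        return "fixed"
    if version == ROLLBACK:
        return "rollback"
    return "unknown"  # the tip does not cover this release


if __name__ == "__main__":
    for candidate in ("6.702.07.00", "6.704.12.00", "6.600.25.00"):
        print(candidate, classify_driver(candidate))
```

Comparing tuples of integers avoids the pitfalls of comparing version strings, where "6.704.12.00" would sort before "6.704.9.00".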

Document Location

Worldwide

Operating System

System x:Windows Server 2008

System x:Windows Server 2008 x86-64 & 2008 R2

System x Hardware Options:Windows Server 2008

System x Hardware Options:Windows Server 2008 x86-64 & 2008 R2

System x:Windows Server 2012

System x Hardware Options:Windows Server 2012

System x:Windows Server 2012 R2

Lenovo x86 servers:Windows Server 2008

Lenovo x86 servers:Windows Server 2008 x86-64 & 2008 R2

Lenovo x86 servers:Windows Server 2012

Lenovo x86 servers:Windows Server 2012 R2

Contents

  1. LSI MegaRAID SAS 9271-8i beeping
  2. Problem with LSI 9260-8i under intensive single-threaded writes
  3. How do I Resolve the "Controller ID: 0 Fatal firmware. " Error?
  4. IBM ServeRAID m5110e Fatal firmware errors on Server 2008
  5. My RAID card keeps crashing lately, can’t even do a backup.

LSI MegaRAID SAS 9271-8i beeping

I inherited a server with the controller named above, but the trouble is that it beeps terribly.

It is the RAID controller itself that beeps, about once every 2 seconds. The server boots normally, and the controller management utility shows all disks as healthy.

The problem appeared (as I was told) after an emergency power loss. The controller starts beeping once it has "found" the disks.

The controller management utility also shows alarming messages like

Please tell me how to fix this beeping. Here is a link to all the photos and a video of the server booting: https://yadi.sk/d/ZszudkCMhPxjE

What is the controller management utility called?

OK, got it. Close the GUI, install http://www.lsi.com/downloads/Public/RAID Controllers/RAID Controllers Common . and see what is actually going on with the controller. While you are at it, the firmware should be updated too: five releases have come out between yours and the current one.

Thanks for the help.

I have already installed MegaCli; as I understand it, the two tools are equivalent, but the User Guide leans more toward MegaCli.
I studied that User Guide thoroughly. Yes, an update would not hurt, but it is scary: this is a production system.
It is also not entirely clear how to update; the guide gives only one command

Can it be run from the main system? Won't the array fall apart?

And what bothers me most is that the changelog does not mention my error even once.

But you are probably right, I have to start somewhere, so I will try updating first.

Could you tell me whether this command is correct?
Can it be run from the main system?

StorCli is the newer tool. MegaCli may well work, but I would follow the vendor's advice.

Can it be run from the main system? Won't the array fall apart?

In general it should not. But after the reboot it probably will not work until you intervene manually. After updating a SAS 2208, I had to perform step

3. Set the ‘Set Factory Defaults’ setting to ‘Yes’ and submit the change.

from the firmware update instructions. Only after that did it see the disks.

I flashed it like this: ./storcli64 /c0 download file=./MR511.rom nosigchk
(that is for the 2208, of course; for yours the file name will probably be different)

No, no, no: run storcli64 /c0 show all > out and post the result.
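For anyone scripting these steps, the two commands quoted in this thread can be assembled as argument lists before being run. This is only a sketch: the helper names and the ./storcli64 binary path are assumptions, and the only StorCLI syntax used is what appears in the posts above.

```python
import shlex


def storcli_show_all(controller=0, binary="./storcli64"):
    """Build the 'show all' command quoted in the thread."""
    return [binary, f"/c{controller}", "show", "all"]


def storcli_flash(rom_path, controller=0, binary="./storcli64",
                  skip_signature_check=False):
    """Build the firmware download command quoted in the thread."""
    argv = [binary, f"/c{controller}", "download", f"file={rom_path}"]
    if skip_signature_check:
        # 'nosigchk' is copied from the poster's command; use with care.
        argv.append("nosigchk")
    return argv


if __name__ == "__main__":
    # Print the commands instead of running them, so nothing is flashed by accident.
    print(shlex.join(storcli_show_all()))
    print(shlex.join(storcli_flash("./MR511.rom", skip_signature_check=True)))
```

To actually capture the controller report, the list from storcli_show_all() could be passed to subprocess.run(argv, capture_output=True, text=True) and the stdout saved to a file, which mirrors the "> out" redirection above.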

So, I got in touch with the developers.
Surprisingly, they responded very quickly.
They explained the possible solutions clearly.

If anyone else has to contact the developers, the first thing to do is collect all the system logs; they will ask for them anyway. The log collection script is here: http://mycusthelp.info/LSI/_cs/AnswerDetail.aspx?s&inc=8264

Next they advised me to boot from a flash drive and run two sets of commands: if the first does not help, run the second. The first one helped me. The full answer is below. Maybe someone will find it useful.


Problem with LSI 9260-8i under intensive single-threaded writes

Post by yu_mor » 29 Nov 2016, 10:12

Hello.
We have a backup server built on an S5520UR with an LSI 9260-8i in it; six HUS724040ALS640 drives are attached to the controller and assembled into a RAID 5, and WS2012R2 is installed on that disk space.
The backstory: we bought 6 large disks but could not immediately get the OS to boot from a GPT disk, so we built a RAID 5 from 5 of them, set the 6th aside as a spare, plugged a smaller disk in its place and installed the operating system on that. Everything was fine until the time came to set it up as originally planned (the boot partition was placed on an MBR partition of a USB flash drive).
On the 6-disk RAID 5, under intensive writes at about 220 MB/s, the system reboots after roughly 500 GB (if the files are large: VM disks) or 1500 GB (if the files are smaller: SQL database backups), and the log shows Fatal firmware error: Line 621 in ../../raid/1078main.c
The controller firmware was 12.15.0-0205_SAS_2108_Fw_Image_APP_2.130.403.3835; on the newer 12.15.0-0239_MR_2108_SAS_FW_2.130.403-4660 the behavior is the same, only the line differs: Fatal firmware error: Line 624 in ../../raid/1078main.c.
It feels as though the controller accepts data faster than it can write it to the disks.
What do you suggest? Replace a disk? The controller? Wait for initialization to finish?
The volume configuration is the default, except that the disk caches are disabled.
I tried different strip sizes; with the default 256k, it seems, the reboots happen a little less often.
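To put those figures in terms of time, the amount written before the reboot can be divided by the write rate. The sketch below assumes decimal units (1 GB = 1000 MB) and a steady 220 MB/s, neither of which the post states exactly; the function name is illustrative.

```python
def seconds_to_write(gigabytes, mb_per_second):
    """Time to write a given volume at a steady rate (1 GB = 1000 MB assumed)."""
    return gigabytes * 1000 / mb_per_second


if __name__ == "__main__":
    cases = (("large files (VM disks)", 500), ("smaller files (SQL backups)", 1500))
    for label, size_gb in cases:
        minutes = seconds_to_write(size_gb, 220) / 60
        print(f"{label}: about {minutes:.0f} min of writing before the reboot")
```

Under these assumptions the crash comes after roughly 38 minutes in the first case and a little under two hours in the second, so it is not an immediate failure on the first heavy write.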

UPD
Overnight I changed the IO Policy to cached, and the backups were copied back into place without reboots.
After that, new backups completed successfully.
And toward morning it started:
ID = 47
SEQUENCE NUMBER = 85460
TIME = 29-11-2016 05:37:18
LOCALIZED MESSAGE = Controller ID: 0 Background Initialization corrected medium error: ( VD 0 Location 0x278351bba, PD -:-:5 Location 0x278351bba)
ID = 110
SEQUENCE NUMBER = 85459
TIME = 29-11-2016 05:37:18
LOCALIZED MESSAGE = Controller ID: 0 Corrected medium error during recovery: PD -:-:5 Location 0x278351bb9
and this is how it ended:
ID = 51
SEQUENCE NUMBER = 88458
TIME = 29-11-2016 06:31:12
LOCALIZED MESSAGE = Controller ID: 0 Background Initialization failed on VD: 0
ID = 251
SEQUENCE NUMBER = 88457
TIME = 29-11-2016 06:31:12
LOCALIZED MESSAGE = Controller ID: 0 VD is now DEGRADED VD 0
ID = 81
SEQUENCE NUMBER = 88456
TIME = 29-11-2016 06:31:12
LOCALIZED MESSAGE = Controller ID: 0 State change on VD: 0 Previous = Optimal Current = Degraded
ID = 114
SEQUENCE NUMBER = 88455
TIME = 29-11-2016 06:31:12
LOCALIZED MESSAGE = Controller ID: 0 State change: PD = -:-:5 Previous = Online Current = Failed

the disk shows medium error count: 6074 and Pred fail count: 1
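The event dump above lists the newest record first, which makes the sequence harder to follow. A minimal sketch for reading such a dump into records and sorting them by sequence number is shown below; the parser and its function name are assumptions based only on the four field names visible in the log, and the two sample records are copied from it.

```python
import re

# The four field names that appear in the event dump above.
FIELD = re.compile(r"^(ID|SEQUENCE NUMBER|TIME|LOCALIZED MESSAGE) = (.*)$")


def parse_events(text):
    """Split an event dump into dicts and return them oldest first."""
    events, current = [], {}
    for raw in text.splitlines():
        match = FIELD.match(raw.strip())
        if not match:
            continue  # skip commentary between records
        key, value = match.groups()
        if key == "ID" and current:
            events.append(current)  # a new "ID" line starts the next record
            current = {}
        current[key] = value
    if current:
        events.append(current)
    for event in events:
        event["SEQUENCE NUMBER"] = int(event.get("SEQUENCE NUMBER", "0"))
    return sorted(events, key=lambda event: event["SEQUENCE NUMBER"])


SAMPLE = """\
ID = 51
SEQUENCE NUMBER = 88458
TIME = 29-11-2016 06:31:12
LOCALIZED MESSAGE = Controller ID: 0 Background Initialization failed on VD: 0
ID = 114
SEQUENCE NUMBER = 88455
TIME = 29-11-2016 06:31:12
LOCALIZED MESSAGE = Controller ID: 0 State change: PD = -:-:5 Previous = Online Current = Failed
"""

if __name__ == "__main__":
    for event in parse_events(SAMPLE):
        print(event["SEQUENCE NUMBER"], event["LOCALIZED MESSAGE"])
```

Read in this order, the drive state change (PD -:-:5 going from Online to Failed) comes first and the failed background initialization follows from it.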


How do I Resolve the "Controller ID: 0 Fatal firmware. " Error?

Environment

Intel® RAID Controller RS2BL040 / 80 Server board with the latest BIOS/firmware

Troubleshooting hints on how to approach the «Fatal Firmware» errors.

When the Controller ID: 0 Fatal firmware. error occurs, the system reboots.

The fatal firmware error initially points to the controller having failed. However, consider the following steps first:

  1. See that the Board-Card combination is a compatible one.
  2. Contact Support with the RAID logs and System Information Retrieval Utility (Sysinfo) logs if the situation persists.


February 26, 2016

IBM ServeRAID m5110e Fatal firmware errors on Server 2008

In our data center, we are using IBM Xseries 3650 M4 series servers. We updated the firmware and drivers on one of these System x servers a month ago. Since then it had been halting on random days, especially under heavy load (for example when a backup runs), and was showing the following errors on the screen.

RAID controller details were as follows.

After some diagnostics it was found that the culprit was driver version “6.702.07.00”, as stated on the IBM web site.

After that, we downloaded IBM UpdateXpress ver 9.63 (ibm_utl_uxspi_9.63_winsrvr_32-64.exe) and executed the update for selected drivers on the live running system that was hosting our Lotus Domino email system. The download and update took around 1 hour, and after rebooting, up to the time of writing this post, the problem seems to be solved.

After Update,

Note: I would recommend NOT upgrading any critical system firmware until it is really required.

Regards,
Syed Jahanzaib


My RAID card keeps crashing lately, can’t even do a backup.

Cyber_Akuma

I have an LSI MegaRAID 9260-8i RAID card. It was originally an IBM ServeRAID M5014 card, but since those are just re-branded 9260-8i cards I reflashed it; it took FAR less time to finish POST on LSI firmware and was updated more frequently to boot.

I then set up a RAID 5 using four 3TB WD Red hard drives connected by a SAS to 4-SATA port breakout cable. I admit though that I am not very knowledgeable about the lower level workings of RAID setups.

Anyway, the card was running fine for years, but lately it’s been crashing on me.

I noticed these crashes started when I upgraded to firmware 12.15.0-0189, although I don’t know if that is just a coincidence. It wasn’t until after the upgrade that I noticed it had a warning message about the firmware having issues with extender backplanes, saying that if you use one you should not upgrade and should wait for a firmware release that addresses these issues.

Other than the BBU and SAS->SATA cable there is no additional hardware attached to this card, and the card is just simply in a desktop system and not any kind of server or NAS, but when I use the MegaRAID Storage Manager to check the status of the card (which always is listed as operating optimal and healthy) it lists my drives under a backplane. I admit, I have no idea what a backplane even is (attempting to Google information about it leads me to believe they are hot-swap bays for servers), or if the SAS to SATA cable is what it is considering a backplane.

Anyway, lately when the RAID array is under heavy use the controller keeps crashing, seeing errors like:

Controller ID: 0 Fatal firmware error:
Driver detected possible FW hang, halting FW.

Controller ID: 0 Fatal firmware error:
Line 1209 in ../../raid/1078dma.c

Controller ID: 0 Fatal firmware error:
Line 1209 in ../../raid/1078dma.c

Then the array is down until I reboot the system. It can go for weeks without a problem, but it seems that whenever I use it heavily it crashes. Whether it’s one very large read operation (such as a backup of the array) or a very large number of small random read/write operations, it seems to "overload" and crash whenever it is under heavy load, but otherwise it works fine for weeks.

This has made it so I cannot even back up my array, because after a few hours it WILL crash, so now even backups are impossible! (I will have to see if I can make the backup app read from the array at a slower speed; for the moment, just so I have SOME backup, I am literally dragging and dropping files through Windows from the array onto the backup drive instead of making a backup image of the drive.)

There was a firmware update released about two weeks ago, version 12.15.0-0205, but it didn’t seem to fix the issue: same hangs when attempting to back up. On top of that, now the card takes much, much longer to finish POST, as if it were an IBM card again.

As for trying to downgrade my firmware to see if that fixes it, there are a few issues with that.

First of all, as I said, I am not very knowledgeable about the lower level workings of RAID setups, so I don’t know if a downgrade could wipe my array. I know upgrades don’t, but I am not sure about downgrades (mainly due to possible updates to the configuration data in newer firmwares), so I don’t want to try that until I am at least done copying my files over to the backup drive. Second, I can’t find any information on the largest size a single drive (not the array, but a physical drive) can be for any firmware version (or the card itself). Currently I have 3TB drives and they work fine, but I want to eventually upgrade to 6TB or larger drives in a year or two, and if the older firmwares before 12.15.0-0189 don’t support drives that big I am stuck. And finally, LSI seems to only host the latest versions of the drivers, software, and firmware on the main product page now, referring you to another page for previous versions... which is down.

Does anyone have any ideas or suggestions? Anyone else by any chance also have this card and is running into these errors? Does this look like a fault of the hard drives? Or the card? Or is nothing likely wrong with either and it’s a firmware issue?

Источник

Cyber_Akuma



Oct 5, 2002



308



4



18,785

0


  • #1

I have a LSI MegaRAID 9260-8i raid card. It was originally an IBM ServeRAID M5014 card, but since those are just re-branded 9260-9i cards I reflashed it, took FAR less time to finish POST on LSI firmware and was updated more frequently to boot.

I then setup a RAID5 using four 3TB WD Red Harddrives connected by a SAS to 4-SATA port breakout cable. I admit though that I am not very knowledgeable about the lower level workings of RAID setups.

Anyway, the card was running fine for years, but lately it’s been crashing on me.

I noticed these crashes started when I upgraded to firmware 12.15.0-0189, although I don’t know if that is just a coincidence. It wasn’t after the upgrade that I noticed it had a warning message about the firmware having issues with extender backplanes and if you use one to not upgrade and wait for a firmware upgrade to address these issues.

Other than the BBU and SAS->SATA cable there is no additional hardware attached to this card, and the card is just simply in a desktop system and not any kind of server or NAS, but when I use the MegaRAID Storage Manager to check the status of the card (which always is listed as operating optimal and healthy) it lists my drives under a backplane. I admit, I have no idea what a backplane even is (attempting to Google information about it leads me to believe they are hot-swap bays for servers), or if the SAS to SATA cable is what it is considering a backplane.

Anyway, lately when the RAID array is under heavy use the controller keeps crashing, seeing errors like:

Controller ID: 0 Fatal firmware error:
Driver detected possible FW hang, halting FW.

Controller ID: 0 Fatal firmware error:
Line 1209 in ../../raid/1078dma.c

Controller ID: 0 Fatal firmware error:
Line 1209 in ../../raid/1078dma.c

Then the array is down until I reboot the system. If can go for weeks without a problem, but it seems that whenever I use it heavily it crashes. Whether it’s one very large read operation (such as a backup of the array), or many small random read/write operations but in a very large amount, it seems to «overload» and crash frequently whenever it is under heavy load, but when not works fine for weeks.

This has made it so I cannot even back up my array, because after a few hours it WILL crash, so now even backups are impossible! I will have to see if I can make the backup app read from the array at a slower speed. For the moment, just so I have SOME backup, I am literally dragging and dropping files through Windows from the array onto the backup drive instead of making a backup image of the drive.
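The idea of reading from the array at a slower speed can be sketched in a few lines. This is only an illustration, not something the backup app in question provides: the function name `throttled_copy`, the 20 MiB/s default, and the 1 MiB chunk size are all assumptions made for the example, and whether a lower read rate actually avoids the firmware hang is untested.

```python
import time
from pathlib import Path


def throttled_copy(src, dst, max_bytes_per_sec=20 * 1024 * 1024,
                   chunk_size=1024 * 1024):
    """Copy src to dst while keeping the average rate at or below the limit.

    All names and default values here are hypothetical; tune them by hand.
    Returns the number of bytes copied.
    """
    if max_bytes_per_sec <= 0:
        raise ValueError("max_bytes_per_sec must be positive")
    src, dst = Path(src), Path(dst)
    copied = 0
    start = time.monotonic()
    with src.open("rb") as fin, dst.open("wb") as fout:
        while True:
            chunk = fin.read(chunk_size)
            if not chunk:
                break
            fout.write(chunk)
            copied += len(chunk)
            # How long the copy should have taken so far at the target rate.
            expected = copied / max_bytes_per_sec
            elapsed = time.monotonic() - start
            if expected > elapsed:
                # We are ahead of schedule, so pause before the next read.
                time.sleep(expected - elapsed)
    return copied
```

Calling `throttled_copy(r"D:\video.mkv", r"E:\backup\video.mkv", max_bytes_per_sec=10 * 1024 * 1024)` would cap that one copy at roughly 10 MiB/s (the paths are placeholders). The limit applies to the average rate over the whole file, so short bursts up to one chunk are still possible.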

There was a firmware update released about two weeks ago, version 12.15.0-0205, but it didn’t seem to fix the issue, same hangs when attempting to backup. On top of that, now the card takes much much longer to finish POST as if it was an IBM card again.

As for trying to downgrade my firmware to see if that fixes it, there are a few issues with that.

First of all, as I said, I am not very knowledgeable about the lower-level workings of RAID setups, so I don’t know if a downgrade could wipe my array. I know upgrades don’t, but I am not sure about downgrades (mainly due to possible updates to the configuration data in newer firmware), so I don’t want to try that until I am at least done copying my files over to the backup drive.

Second, I can’t find any information on the largest size a single physical drive (not the array) can be for any firmware version, or for the card itself. Currently I have 3TB drives and they work fine, but I want to eventually upgrade to 6TB or larger drives in a year or two, and if the firmware versions before 12.15.0-0189 don’t support drives that big, I am stuck.

Finally, LSI now seems to host only the latest versions of the drivers, software, and firmware on the main product page, referring you to another page for previous versions, and that page is down.

Does anyone have any ideas or suggestions? Does anyone else by any chance have this card and run into these errors? Does this look like a fault of the hard drives? Or the card? Or is nothing likely wrong with either and it’s a firmware issue?

das_stig





  • #2

I would flash back to the IBM firmware, then flash to LSI again. The cards may be more than simple rebrands and may have custom additions to the hardware/firmware.

A backplane, to put it simply, is hardware that supplies power and data connections to the drives in place of a mass of cables. Backplanes also do a lot more than that, hence they have firmware.

Cyber_Akuma





  • #3

I flashed this card to LSI firmware pretty much the day I got it in late 2012, and it has been upgraded with LSI firmware periodically since. These problems only started cropping up about two months ago, with the last two firmware updates.

Cyber_Akuma





  • #4

Well, I just ran into something interesting when attempting to use Windows to copy the files onto a backup drive.

It’s a single video file in the entire RAID that is causing my card to crash. I can access the entire rest of the RAID just fine, but trying to access that one file instantly makes it crash, whether I have just rebooted and tried to access only that file, or it’s been weeks and I have performed tons of read/write operations before running into that file.

This doesn’t make sense to me. If the file were corrupt or the disk were dying, I would understand, but then my card would just warn me of this. Why would the firmware of the RAID card itself crash when trying to access a specific file?
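For anyone trying to pin down which file triggers the hang in the same way, here is a minimal sketch of a file-by-file copy that records each file name before touching it. Every name in it (`copy_tree_with_journal`, `unfinished_files`, the `START`/`DONE` journal format) is hypothetical and made up for this example. It assumes the journal lives on a disk other than the RAID volume, so the last `START` entry without a matching `DONE` survives the crash and points at the suspect file.

```python
import os
import shutil
from pathlib import Path


def unfinished_files(journal_path):
    """Return paths that were started but never finished, in journal order."""
    journal = Path(journal_path)
    if not journal.exists():
        return []
    started, done = [], set()
    for line in journal.read_text(encoding="utf-8").splitlines():
        kind, _, path = line.partition(" ")
        if kind == "START":
            started.append(path)
        elif kind == "DONE":
            done.add(path)
    return [p for p in started if p not in done]


def copy_tree_with_journal(src_root, dst_root, journal_path):
    """Copy every file under src_root to dst_root, journaling each step.

    Files flagged as unfinished by an earlier run are skipped, so a rerun
    after a crash does not hit the same file again.
    """
    src_root, dst_root = Path(src_root), Path(dst_root)
    suspects = set(unfinished_files(journal_path))
    copied = []
    with open(journal_path, "a", encoding="utf-8") as journal:
        for src in sorted(p for p in src_root.rglob("*") if p.is_file()):
            if str(src) in suspects:
                continue  # this file was in flight during an earlier crash
            dst = dst_root / src.relative_to(src_root)
            if dst.exists() and dst.stat().st_size == src.stat().st_size:
                continue  # already copied by an earlier run
            journal.write(f"START {src}\n")
            journal.flush()
            os.fsync(journal.fileno())  # force the entry to disk before reading
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copyfile(src, dst)
            journal.write(f"DONE {src}\n")
            journal.flush()
            copied.append(src)
    return copied
```

After a crash and reboot, `unfinished_files(journal_path)` lists the file or files that were being read when the controller went down, and running `copy_tree_with_journal` again with the same journal carries on with the rest of the tree.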





  • #5

I have also been experiencing the same issue with a specific file for the past two days. Did you ever figure out how to resolve it, or whether there is other firmware that fixes it?


Peter Martin





  • #6

TL;DR: the card is probably dying.


I have a Server 2019 box. It’s a Supermicro SAS chassis with an 8-drive RAID 6 array on a hardware MegaRAID 3108.

What happened originally is that I found Windows stuck on a black screen, so I reset the machine. It then hung on the Supermicro logo screen with the Windows spinning dotted circle, and Num Lock was unresponsive.

So I checked the array in the BIOS and found 2 drives missing. I moved those 2 (possibly bad) drives to another set of slots; they went to foreign mode and then started rebuilding.

Firstly, I’m not sure why a degraded array would prevent Windows from booting. Even worse, I’ve tried 2 USB boot sticks that are known to have worked fine in the past, and they have issues too.

I can get the USB stick to boot, but only to a blank blue screen. With Shift+F10 I can get to a command prompt; if I type setup and hit Enter, it says setup is starting but then sticks.

Back at the prompt, if I run diskpart (with or without the USB stick in), it hangs at «machine: miniwanpc» (I forget the spelling, but I guess it is the booted USB OS). It never shows anything more, but I can open other prompts with Shift+F10.

I’ve already run chkdsk on C: prior to this.

I’m really at a loss here. (It’s our DPM backup server with all data on the one RAID 6 array; the OS was carved out of it in the original setup.)

I’m not sure what to do. I do note that the 2nd «bad»/recovered drive is at 45% on its rebuild; the first drive fixed itself.

I’d think at least a boot stick should work, but I can’t even get that far. Maybe one or both of these drives are causing some sort of cascading failure. (As a last-ditch effort I will pull those two and try again, or disconnect the MegaRAID and try the USB stick with basically no drives.)

I’m pretty sure it’s going to boil down to getting 2 new drives and, if that doesn’t work, wiping out the virtual array and starting over (most likely with loss of data, at least on the backed-up server data side, though I may have a tape backup of the actual DPM server data).

Any thoughts or suggestions? 

