#1
Hi,

I am getting the following errpt errors constantly:

BC669AA7 0812214407 P H dac1 CONTROLLER HEALTH CHECK FAILURE
3074FEB7 0812214407 T H fscsi0 ADAPTER ERROR
3074FEB7 0812214407 T H fscsi0 ADAPTER ERROR
3074FEB7 0812214407 T H fscsi0 ADAPTER ERROR
3074FEB7 0812214407 T H fscsi0 ADAPTER ERROR
3074FEB7 0812214407 T H fscsi0 ADAPTER ERROR
BC669AA7 0812214307 P H dac0 CONTROLLER HEALTH CHECK FAILURE
3074FEB7 0812214307 T H fscsi0 ADAPTER ERROR
3074FEB7 0812214307 T H fscsi0 ADAPTER ERROR

Server-ux1 > lsdev -Cc adapter
ent0 Available 1f-08 10/100 Mbps Ethernet PCI Adapter II (1410ff01)
ent1 Available 11-08 Gigabit Ethernet-SX PCI-X Adapter (14106802)
ent2 Available 3H-08 Gigabit Ethernet-SX PCI-X Adapter (14106802)
ent3 Available 29-08 10/100 Mbps Ethernet PCI Adapter II (1410ff01)
fcs0 Available 14-08 FC Adapter
fcs1 Available 1A-08 FC Adapter
fcs2 Available 3L-08 FC Adapter
fcs3 Available 3S-08 FC Adapter
sa0 Available LPAR Virtual Serial Adapter
scsi0 Defined 3V-08 Wide/Ultra-3 SCSI I/O Controller
scsi1 Defined 3V-09 Wide/Ultra-3 SCSI I/O Controller
scsi2 Available 2w-08 Wide/Ultra-3 SCSI I/O Controller
scsi3 Available 2w-09 Wide/Ultra-3 SCSI I/O Controller

Server-ux1 is clustered using HACMP with Server-ux5, which shows:

Server-ux5 > lsdev -Cc adapter
ent0 Available 1n-08 Gigabit Ethernet-SX PCI-X Adapter (14106802)
ent1 Available 2U-08 10/100 Mbps Ethernet PCI Adapter II (1410ff01)
ent2 Available 3Z-08 10/100 Mbps Ethernet PCI Adapter II (1410ff01)
ent3 Available 2R-08 Gigabit Ethernet-SX PCI-X Adapter (14106802)
fcs0 Available 1j-08 FC Adapter
fcs1 Available 1D-08 FC Adapter
fcs2 Available 3c-08 FC Adapter
fcs3 Available 3n-08 FC Adapter
sa0 Available LPAR Virtual Serial Adapter
scsi0 Available 1Z-08 Wide/Ultra-3 SCSI I/O Controller
scsi1 Available 1Z-09 Wide/Ultra-3 SCSI I/O Controller
scsi2 Defined 4M-08 Wide/Ultra-3 SCSI I/O Controller
scsi3 Defined 4M-09 Wide/Ultra-3 SCSI I/O Controller

After a little more searching on the same error, I found that if a switch connecting the SAN and the AIX box is rebooted without a graceful shutdown of AIX, it can cause this error. Can anyone please confirm whether that is true, and tell me what I should do to avoid these errors?

Regards
Yash
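To see at a glance which resource is generating most of the entries, the short errpt listing can be tallied by identifier and resource name. The following is only a minimal sketch: the function names `sample_errpt` and `summarize_errpt` are made up for illustration, and the sample input is simply the listing quoted in the post, so that the script runs on any POSIX shell. On AIX the same function could presumably be fed from `errpt` directly.

```shell
#!/bin/sh
# Tally a short-format errpt listing by identifier and resource name.

# Sample input: the listing quoted in the post above (hypothetical helper).
sample_errpt() {
cat <<'EOF'
BC669AA7 0812214407 P H dac1 CONTROLLER HEALTH CHECK FAILURE
3074FEB7 0812214407 T H fscsi0 ADAPTER ERROR
3074FEB7 0812214407 T H fscsi0 ADAPTER ERROR
3074FEB7 0812214407 T H fscsi0 ADAPTER ERROR
3074FEB7 0812214407 T H fscsi0 ADAPTER ERROR
3074FEB7 0812214407 T H fscsi0 ADAPTER ERROR
BC669AA7 0812214307 P H dac0 CONTROLLER HEALTH CHECK FAILURE
3074FEB7 0812214307 T H fscsi0 ADAPTER ERROR
3074FEB7 0812214307 T H fscsi0 ADAPTER ERROR
EOF
}

summarize_errpt() {
    # Columns: IDENTIFIER TIMESTAMP T C RESOURCE_NAME DESCRIPTION.
    # Skip the header line that a live errpt prints first.
    awk '$1 != "IDENTIFIER" && NF >= 5 { n[$1 " " $5]++ }
         END { for (k in n) print n[k], k }' | sort -rn
}

sample_errpt | summarize_errpt
```

For the listing in the post this reports seven fscsi0 adapter errors against one health check failure each for dac0 and dac1, which points at the fscsi0 path rather than at the two controllers.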

#2
OK, first of all, the fscsi and scsi devices you've highlighted are not related. The scsi devices belong to a SCSI adapter. It may be that you changed the profile at some point and removed that SCSI adapter from the LPAR. Before spending any more time on them, try deleting the scsi2 and scsi3 devices and running cfgmgr to see if they come back, on both servers.

The errors you see are related to your fcs0 device connecting through to the SAN. What is the detailed error text for these errors (one of each type)?

BC669AA7 0812214407 P H dac1 CONTROLLER HEALTH CHECK FAILURE
3074FEB7 0812214407 T H fscsi0 ADAPTER ERROR

What hardware is being used, and which version of AIX is running? What SAN hardware do you have: a DS4000, by the looks of the errors?
__________________ Ross Mather, IBM AIX IT Specialist. That said anything I say here is my own opinion and not anything that you can ever hold against IBM. Ohhh and don't forget that I make mistakes too....
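The "delete the Defined devices and rerun cfgmgr" suggestion can be sketched as a dry run that only prints the commands it would use. This is an illustration, not a tested procedure: `sample_lsdev_ux1` and `plan_defined_cleanup` are hypothetical names, the sample input is the Server-ux1 listing from the first post, and nothing is executed. The filter keys on the Defined state rather than on fixed device names, since the two nodes show different scsi devices in that state.

```shell
#!/bin/sh
# Dry run: print (never execute) cleanup commands for adapters that
# lsdev reports in the Defined state.

# Sample input: the Server-ux1 listing from the first post (hypothetical helper).
sample_lsdev_ux1() {
cat <<'EOF'
ent0 Available 1f-08 10/100 Mbps Ethernet PCI Adapter II (1410ff01)
fcs0 Available 14-08 FC Adapter
sa0 Available LPAR Virtual Serial Adapter
scsi0 Defined 3V-08 Wide/Ultra-3 SCSI I/O Controller
scsi1 Defined 3V-09 Wide/Ultra-3 SCSI I/O Controller
scsi2 Available 2w-08 Wide/Ultra-3 SCSI I/O Controller
scsi3 Available 2w-09 Wide/Ultra-3 SCSI I/O Controller
EOF
}

plan_defined_cleanup() {
    # Input: `lsdev -Cc adapter` output; column 2 is the device state.
    # rmdev -d deletes the ODM definition, -l names the logical device.
    awk '$2 == "Defined" { print "rmdev -dl " $1 }'
    # Rescan afterwards so hardware that is still present is rediscovered.
    echo "cfgmgr"
}

sample_lsdev_ux1 | plan_defined_cleanup
```

On a live node the input would come from `lsdev -Cc adapter` itself; review the printed commands before running any of them by hand.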

#3
Hi,

You are right. I checked the previous configs: scsi2 and scsi3 have been in this state for quite a long time, so they are not the issue. The issue is with the fcs0 device. Here are the detailed errors.

Server-ux1# errpt -aj BC669AA7
LABEL: FCP_ARRAY_ERR7
IDENTIFIER: BC669AA7
Date/Time: Mon 13 Aug 11:43:22 2007
Sequence Number: 3882541
Machine Id: 0053485A4C00
Node Id: Server-ux1
Class: H
Type: PERM
Resource Name: dac1
Resource Class: array
Resource Type: ibm-dac-V4
Location: U0.1-P2-I2/Q1-W200400A0B80F8FD4
VPD:
Manufacturer................IBM
Machine Type and Model......1742
Part Number.................348-0046200
ROS Level and ID............0520
Device Specific.(Z1)........05400400
Device Specific.(Z2)........05401103
Description: CONTROLLER HEALTH CHECK FAILURE
Probable Causes: ARRAY CONTROLLER CABLES AND CONNECTIONS ARRAY DASD MEDIA
Failure Causes: DASD MEDIA ARRAY CONTROLLER CABLES AND CONNECTIONS
Recommended Actions: PERFORM PROBLEM DETERMINATION PROCEDURES
Detail Data
SENSE DATA ...... ....

Server-ux1# errpt -aj 3074FEB7
LABEL: FSCSI_ERR4
IDENTIFIER: 3074FEB7
Date/Time: Mon 13 Aug 11:53:28 2007
Sequence Number: 3882558
Machine Id: 0053485A4C00
Node Id: tubairux1
Class: H
Type: TEMP
Resource Name: fscsi0
Resource Class: driver
Resource Type: efscsi
Location: U0.1-P2-I2/Q1
Description: ADAPTER ERROR
Probable Causes: ADAPTER HARDWARE OR CABLE ADAPTER MICROCODE FIBRE CHANNEL SWITCH OR FC-AL HUB
Failure Causes: ADAPTER CABLES AND CONNECTIONS DEVICE
Recommended Actions: PERFORM PROBLEM DETERMINATION PROCEDURES CHECK CABLES AND THEIR CONNECTIONS VERIFY DEVICE CONFIGURATION
Detail Data
SENSE DATA
0000 0000 0000 00AD 0000 0045 0200 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 000A 0A00 0000 0000

The SAN hardware is a FAStT700, AIX is 5.2008, and the server is a P650. All I want is to fix these errors; after that I need to add two disks to a VG.

Regards
Yash

#4
OK, I see the trouble you are having. The error messages are in fact reporting a problem with the link from the server to the disk. If the switch really is being rebooted, that would explain what is happening. In that case the only way to avoid the errors is to put that one path offline before the switch is rebooted. The LUN itself is fine, and I can't see anything else in the error message that suggests an actual problem with the server itself. The only thing you could check is to run diagnostics on the fcs card and check the level of firmware it uses.
__________________ Ross Mather, IBM AIX IT Specialist.

#5
Hi ross.mather,

Yesterday an IBM hardware specialist visited, and we concluded that the link between the SAN and AIX is not the problem. We unplugged and reseated the fibre cable several times at both ends, but no light appears to be coming from the card, which strongly suggests the card is faulty. The green LED blinks on and off at a regular interval, and the orange LED does not light up as it does on the other cards.

I now have no choice but to replace the fcs0 HBA; disk I/O is currently running over the other HBA, fcs2. Is there anywhere I can find good information on how to replace an RDAC storage driver?

Rgds
Yash

#6
Not sure I understand the question. Why do you want to replace the RDAC device driver? If you put a new Fibre Channel card in and then rezone your SAN, everything should work just fine.
__________________ Ross Mather, IBM AIX IT Specialist.

#7
We use the RDAC driver, which creates the dar, dac, hdisk, utm devices etc. OK, since the other devices (e.g. fcs1) also use RDAC, there is no need to remove the driver. Let me put it another way: how do I reconfigure these drivers for the new HBA so that they keep the same multipath and failover policies?

Our setup uses a FAStT700 as shared storage with HACMP in an active-passive cluster. The failed HBA is on the active node. There are two servers in the cluster, with two DARs (dar0, dar1) and two DACs (dac0, dac2); shared VGs and LVs are created across both to provide resilience under HACMP.

I read something relating to the FAStT700. Will the following be sufficient?

1. Move the HACMP resource groups to the failover node and ensure the VGs are varied off.
2. Since all the RGs have moved to the other node, the devices behind the faulty HBA are no longer in use, so I can remove fcs0, fscsi0, dar0, dar1, hdisk* using rmdev.
3. Run cfgmgr -v.
4. After that, do I need to export the VGs, or do the ODM entries for the VGs, disks etc. remain the same?
5. Move the RGs back.

Rgds
Yash
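Steps 2 and 3 of the list above can be written down as a dry run before touching the node. This is a sketch under assumptions, not a verified FAStT700 procedure: `plan_hba_swap` is a hypothetical helper, it only prints commands, and the device list is whatever the operator passes in. The hdisk names are elided in the post, so none are filled in here; child devices would normally have to be listed before their parents, since rmdev refuses to delete a device that still has configured children unless `-R` is used to recurse.

```shell
#!/bin/sh
# Dry run: print the device-removal and rescan commands for an HBA swap.
# Nothing is executed. The resource groups are assumed to have been moved
# to the other node already (step 1), which this sketch does not check.
plan_hba_swap() {
    for dev in "$@"; do
        # -d deletes the ODM definition, -l names the logical device.
        echo "rmdev -dl $dev"
    done
    # Rediscover the replacement adapter and everything behind it.
    echo "cfgmgr -v"
}

# Devices named in the post, children before parents.
plan_hba_swap dar0 dar1 fscsi0 fcs0
```

Printing the plan first makes it easy to compare against `lsdev` output on the node and to have a second person review it before any definitions are deleted.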

#8
It should all work online. You may well need to check the zoning on your SAN switches and on the DS4000, as the WWN of the original card will still be in the configuration.
__________________ Ross Mather, IBM AIX IT Specialist.
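To update the zoning and the storage-side host mapping, the WWN of the replacement card is needed. A minimal sketch, assuming the adapter's VPD from `lscfg -vl fcs0` carries a "Network Address" line in the usual dotted label format; `extract_wwpn` and `colonize_wwpn` are hypothetical helper names, and the adapter name fcs0 is taken from the thread.

```shell
#!/bin/sh
# Pull the WWPN out of `lscfg -vl <adapter>` output and format it for zoning.

extract_wwpn() {
    # The VPD line is assumed to look like
    # "Network Address.............<16 hex digits>".
    sed -n 's/.*Network Address\.*\([0-9A-Fa-f]\{16\}\).*/\1/p'
}

colonize_wwpn() {
    # 16 hex digits -> colon-separated byte pairs, the form many
    # switch tools display.
    sed 's/../&:/g; s/:$//'
}

# On the AIX node, after cfgmgr has configured the replacement card:
#   lscfg -vl fcs0 | extract_wwpn | colonize_wwpn
```

The value printed for the new card can then be compared with the old WWN still present in the switch zones and in the storage subsystem's host definitions.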