#1
August 12th, 2007
nagar_yash
Junior Member
Join Date: May 2007
Posts: 16
Controller health check

Hi

I am getting the following errpt errors constantly.

BC669AA7 0812214407 P H dac1 CONTROLLER HEALTH CHECK FAILURE
3074FEB7 0812214407 T H fscsi0 ADAPTER ERROR
3074FEB7 0812214407 T H fscsi0 ADAPTER ERROR
3074FEB7 0812214407 T H fscsi0 ADAPTER ERROR
3074FEB7 0812214407 T H fscsi0 ADAPTER ERROR
3074FEB7 0812214407 T H fscsi0 ADAPTER ERROR
BC669AA7 0812214307 P H dac0 CONTROLLER HEALTH CHECK FAILURE
3074FEB7 0812214307 T H fscsi0 ADAPTER ERROR
3074FEB7 0812214307 T H fscsi0 ADAPTER ERROR
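
For a quick view of which resource is logging what, the summary lines above can be tallied. This is only a sketch in Python: it assumes the default errpt summary columns (identifier, timestamp, type, class, resource name, description) and the usual MMDDhhmmYY timestamp layout, and the function names are made up for this example.

```python
from collections import Counter
from datetime import datetime

# Summary lines copied from the errpt output above.
SAMPLE = """\
BC669AA7 0812214407 P H dac1 CONTROLLER HEALTH CHECK FAILURE
3074FEB7 0812214407 T H fscsi0 ADAPTER ERROR
3074FEB7 0812214407 T H fscsi0 ADAPTER ERROR
3074FEB7 0812214407 T H fscsi0 ADAPTER ERROR
3074FEB7 0812214407 T H fscsi0 ADAPTER ERROR
3074FEB7 0812214407 T H fscsi0 ADAPTER ERROR
BC669AA7 0812214307 P H dac0 CONTROLLER HEALTH CHECK FAILURE
3074FEB7 0812214307 T H fscsi0 ADAPTER ERROR
3074FEB7 0812214307 T H fscsi0 ADAPTER ERROR
"""


def parse_errpt_line(line):
    """Split one errpt summary line into its six columns."""
    ident, stamp, etype, eclass, resource, description = line.split(None, 5)
    return {
        "identifier": ident,
        # Assumption: timestamp is MMDDhhmmYY, e.g. 0812214407 = 12 Aug 2007 21:44
        "when": datetime.strptime(stamp, "%m%d%H%M%y"),
        "type": etype,
        "class": eclass,
        "resource": resource,
        "description": description.strip(),
    }


def tally(text):
    """Count entries per (resource, description) pair."""
    counts = Counter()
    for line in text.splitlines():
        if not line.strip():
            continue
        entry = parse_errpt_line(line)
        counts[(entry["resource"], entry["description"])] += 1
    return counts


if __name__ == "__main__":
    for (resource, description), n in tally(SAMPLE).most_common():
        print(f"{n:3d}  {resource:8s} {description}")
```

On the lines posted here, almost all entries come from fscsi0, with one health check failure each for dac0 and dac1.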

Server-ux1 > lsdev -Cc adapter
ent0 Available 1f-08 10/100 Mbps Ethernet PCI Adapter II (1410ff01)
ent1 Available 11-08 Gigabit Ethernet-SX PCI-X Adapter (14106802)
ent2 Available 3H-08 Gigabit Ethernet-SX PCI-X Adapter (14106802)
ent3 Available 29-08 10/100 Mbps Ethernet PCI Adapter II (1410ff01)
fcs0 Available 14-08 FC Adapter
fcs1 Available 1A-08 FC Adapter
fcs2 Available 3L-08 FC Adapter
fcs3 Available 3S-08 FC Adapter
sa0 Available LPAR Virtual Serial Adapter
scsi0 Defined 3V-08 Wide/Ultra-3 SCSI I/O Controller
scsi1 Defined 3V-09 Wide/Ultra-3 SCSI I/O Controller

scsi2 Available 2w-08 Wide/Ultra-3 SCSI I/O Controller
scsi3 Available 2w-09 Wide/Ultra-3 SCSI I/O Controller

Server-ux1 is clustered using HACMP with Server-ux5, which shows the following.

Server-ux5 > lsdev -Cc adapter
ent0 Available 1n-08 Gigabit Ethernet-SX PCI-X Adapter (14106802)
ent1 Available 2U-08 10/100 Mbps Ethernet PCI Adapter II (1410ff01)
ent2 Available 3Z-08 10/100 Mbps Ethernet PCI Adapter II (1410ff01)
ent3 Available 2R-08 Gigabit Ethernet-SX PCI-X Adapter (14106802)
fcs0 Available 1j-08 FC Adapter
fcs1 Available 1D-08 FC Adapter
fcs2 Available 3c-08 FC Adapter
fcs3 Available 3n-08 FC Adapter
sa0 Available LPAR Virtual Serial Adapter
scsi0 Available 1Z-08 Wide/Ultra-3 SCSI I/O Controller
scsi1 Available 1Z-09 Wide/Ultra-3 SCSI I/O Controller
scsi2 Defined 4M-08 Wide/Ultra-3 SCSI I/O Controller
scsi3 Defined 4M-09 Wide/Ultra-3 SCSI I/O Controller
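
To see at a glance which adapters are only Defined on each node, the two listings above can be filtered. A rough Python sketch, assuming each `lsdev -Cc adapter` line starts with the device name followed by its state; `defined_devices` is a name invented here, and only the FC, serial and SCSI lines from the post are repeated to keep it short.

```python
# Excerpts of the lsdev -Cc adapter output posted above.
UX1 = """\
fcs0 Available 14-08 FC Adapter
sa0 Available LPAR Virtual Serial Adapter
scsi0 Defined 3V-08 Wide/Ultra-3 SCSI I/O Controller
scsi1 Defined 3V-09 Wide/Ultra-3 SCSI I/O Controller
scsi2 Available 2w-08 Wide/Ultra-3 SCSI I/O Controller
scsi3 Available 2w-09 Wide/Ultra-3 SCSI I/O Controller
"""

UX5 = """\
fcs0 Available 1j-08 FC Adapter
sa0 Available LPAR Virtual Serial Adapter
scsi0 Available 1Z-08 Wide/Ultra-3 SCSI I/O Controller
scsi1 Available 1Z-09 Wide/Ultra-3 SCSI I/O Controller
scsi2 Defined 4M-08 Wide/Ultra-3 SCSI I/O Controller
scsi3 Defined 4M-09 Wide/Ultra-3 SCSI I/O Controller
"""


def defined_devices(lsdev_output):
    """Return the names of devices whose state column reads 'Defined'."""
    names = []
    for line in lsdev_output.splitlines():
        fields = line.split()
        # Only the first two columns (name, state) are relied on here.
        if len(fields) >= 2 and fields[1] == "Defined":
            names.append(fields[0])
    return names


if __name__ == "__main__":
    print("Server-ux1:", defined_devices(UX1))
    print("Server-ux5:", defined_devices(UX5))
```

On the output posted here it reports scsi0 and scsi1 on Server-ux1, and scsi2 and scsi3 on Server-ux5.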

After a little more searching on the same error, I found that if the switch connecting the SAN and the AIX box is rebooted without a graceful AIX shutdown, it can cause this error.

Can anyone please confirm whether that is true, and tell me what I should do to avoid these errors?

Regards
Yash
#2
August 13th, 2007
ross.mather
Senior Member
Join Date: January 2007
Location: Nomadic in the UK
Posts: 574
Re: Controller health check

OK, first of all, the fscsi and scsi devices you've highlighted are not related. The scsi devices belong to a SCSI adapter. It may be that you changed the LPAR profile at some point and removed that SCSI adapter from the LPAR. Before spending any more time on those, try deleting the scsi2 and scsi3 devices and running cfgmgr to see if they come back, on both servers.

The errors you see are related to your fcs0 device connecting through to the SAN.

What is the detailed error text on these errors (one of each type):
BC669AA7 0812214407 P H dac1 CONTROLLER HEALTH CHECK FAILURE
3074FEB7 0812214407 T H fscsi0 ADAPTER ERROR

What hardware is being used, and which version of AIX is running? What SAN hardware do you have: a DS4000, by the looks of the errors?
__________________
Ross Mather, IBM AIX IT Specialist.
That said anything I say here is my own opinion and not anything that you can ever hold against IBM.
Ohhh and don't forget that I make mistakes too....
#3
August 13th, 2007
nagar_yash
Re: Controller health check

Hi

You are right. I checked the previous configs: scsi2 and scsi3 are not part of the problem. They have been in this state for a long time, so they are not the issue. The issue is with the fcs0 device.

Here is the detailed error output:

Server-ux1#errpt -aj BC669AA7

LABEL: FCP_ARRAY_ERR7
IDENTIFIER: BC669AA7

Date/Time: Mon 13 Aug 11:43:22 2007
Sequence Number: 3882541
Machine Id: 0053485A4C00
Node Id: Server-ux1
Class: H
Type: PERM
Resource Name: dac1
Resource Class: array
Resource Type: ibm-dac-V4
Location: U0.1-P2-I2/Q1-W200400A0B80F8FD4
VPD:
Manufacturer................IBM
Machine Type and Model......1742
Part Number.................348-0046200
ROS Level and ID............0520
Device Specific.(Z1)........05400400
Device Specific.(Z2)........05401103

Description
CONTROLLER HEALTH CHECK FAILURE

Probable Causes
ARRAY CONTROLLER
CABLES AND CONNECTIONS
ARRAY DASD MEDIA

Failure Causes
DASD MEDIA
ARRAY CONTROLLER
CABLES AND CONNECTIONS

Recommended Actions
PERFORM PROBLEM DETERMINATION PROCEDURES

Detail Data
SENSE DATA
......
....
Server-ux1#errpt -aj 3074FEB7

LABEL: FSCSI_ERR4
IDENTIFIER: 3074FEB7

Date/Time: Mon 13 Aug 11:53:28 2007
Sequence Number: 3882558
Machine Id: 0053485A4C00
Node Id: tubairux1
Class: H
Type: TEMP
Resource Name: fscsi0
Resource Class: driver
Resource Type: efscsi
Location: U0.1-P2-I2/Q1

Description
ADAPTER ERROR

Probable Causes
ADAPTER HARDWARE OR CABLE
ADAPTER MICROCODE
FIBRE CHANNEL SWITCH OR FC-AL HUB

Failure Causes
ADAPTER
CABLES AND CONNECTIONS
DEVICE

Recommended Actions
PERFORM PROBLEM DETERMINATION PROCEDURES
CHECK CABLES AND THEIR CONNECTIONS
VERIFY DEVICE CONFIGURATION

Detail Data
SENSE DATA
0000 0000 0000 00AD 0000 0045 0200 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 000A 0A00 0000 0000


The SAN hardware is a FAStT700, AIX is 5.2008, and the server is a p650.

All I want is to know how to fix these errors; after that I need to add two disks to a VG.


Regards
Yash
#4
August 13th, 2007
ross.mather
Re: Controller health check

OK I see the trouble you are having. The error messages are in fact reporting that there is a problem with the link from server to disk.

If the switch is really being rebooted, then that would explain what is happening. In that case the only way to fix it is to put that one path offline before the switch is rebooted.

The LUN itself is fine, and I can't see anything else in the error message that suggests there is an actual problem with the server itself. The only things you could check are the diagnostics for the fcs card and the level of firmware it uses.
#5
August 21st, 2007
nagar_yash
Re: Controller health check

Hi ross.mather

Yesterday an IBM hardware specialist visited, and we concluded that the link between the SAN and AIX is not the problem. We unplugged and re-plugged the Fibre cable several times at both ends, but there appears to be no light coming from the card, which strongly suggests the card is faulty. The green LED is blinking on and off at a regular interval, and the orange LED is not lit as it is on the other cards.

Now I have no choice: we need to replace the HBA fcs0. Currently the disk I/O is running through the other HBA, fcs2.

Is there anywhere I can find good information on how to replace an RDAC storage driver?

Rgds
Yash
#6
August 21st, 2007
ross.mather
Re: Controller health check

Not sure I understand the question. Why do you want to replace the RDAC device driver? If you put a new Fibre card in and then rezone your SAN, everything should work just fine.
#7
August 23rd, 2007
nagar_yash
Re: Controller health check

We use the RDAC driver, which creates the dar, dac, hdisk, utm devices, etc.

OK, since other devices (e.g. fcs1) also use RDAC, there is no need to remove the driver. Let me put it another way: how do I reconfigure these drivers for the new HBA so that they keep the same multipath and failover policies?

In our setup we use a FAStT700 as shared storage with HACMP in an active-passive cluster. The failed HBA is on the active node. There are two servers in the cluster, with two DARs (dar0, dar1) and two DACs (dac0, dac2); the shared VGs and LVs are created across both to provide resilience under HACMP.

I read something relating to the FAStT700. Will the following be sufficient?

1. Move the HACMP resource groups to the failover node and ensure the VGs are varied off.
2. Since all the RGs have been moved to the other node, the devices on the faulty HBA are no longer in use, so I can remove fcs0, fscsi0, dar0, dar1 and hdisk* using rmdev.
3. Run cfgmgr -v.

After that, do I need to export the VGs, or do the ODM entries for the VGs, disks, etc. remain the same?

4. Move the RGs back.
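
The steps listed above could be written down as a dry-run checklist before touching the active node. The Python sketch below only prints commands and never runs them. It is a transcription of the plan in this post, not a verified procedure: the HACMP resource group moves are left as manual steps, the device list and its order (children before parents) are the caller's responsibility, the final `lsdev` check is an addition, and `build_plan` and `render` are made-up helper names.

```python
def build_plan(devices):
    """Return (description, command) pairs; command is None for manual steps."""
    plan = [
        ("Move HACMP resource groups to the other node and confirm "
         "the VGs are varied off", None),
    ]
    for name in devices:
        # rmdev -dl removes the device definition; nothing is executed here.
        plan.append((f"Remove {name}", f"rmdev -dl {name}"))
    plan.append(("Rediscover devices", "cfgmgr -v"))
    plan.append(("Check adapter states", "lsdev -Cc adapter"))
    plan.append(("Move HACMP resource groups back", None))
    return plan


def render(plan):
    """Format the plan as a numbered checklist."""
    lines = []
    for step, (description, command) in enumerate(plan, start=1):
        shown = command if command is not None else "(manual step)"
        lines.append(f"{step}. {description}: {shown}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Device names taken from the post; hdisk names are site-specific
    # and deliberately left out of this example.
    print(render(build_plan(["dar0", "dar1", "fscsi0", "fcs0"])))
```

Printing the list first makes it easier to review the order with whoever is doing the hardware swap before anything is removed.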


Rgds
Yash
#8
August 23rd, 2007
ross.mather
Re: Controller health check

It should all work online. You may well need to check the zoning on your SAN switches and on the DS4000, as the WWN of the original card will still be in the configuration.
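
Switch zoning and host-port definitions are often shown as colon-separated WWNs, while AIX prints 16 unbroken hex digits, for example after `-W` in the dac1 location code posted earlier in the thread (that one appears to be the array controller port, not the HBA). A small Python sketch for normalising the two forms so they can be compared; the function names are invented for this example.

```python
import re


def format_wwn(raw):
    """Turn a WWN in either form into lower-case colon-separated bytes."""
    digits = re.sub(r"[^0-9A-Fa-f]", "", raw)
    if len(digits) != 16:
        raise ValueError(f"expected 16 hex digits, got {len(digits)}: {raw!r}")
    digits = digits.lower()
    return ":".join(digits[i:i + 2] for i in range(0, 16, 2))


def wwn_from_location(location):
    """Extract the W<wwn> field from an AIX location code, or None if absent."""
    match = re.search(r"-W([0-9A-Fa-f]{16})(?:-|$)", location)
    return format_wwn(match.group(1)) if match else None


def same_wwn(a, b):
    """Compare two WWNs regardless of case or separators."""
    return format_wwn(a) == format_wwn(b)


if __name__ == "__main__":
    # Location string copied from the dac1 errpt entry in this thread.
    print(wwn_from_location("U0.1-P2-I2/Q1-W200400A0B80F8FD4"))
```

The same `same_wwn` check could be used to confirm that the old card's WWN has been replaced by the new one wherever it is listed.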