Skip to content

How To Debug CCI Issues

Updated  by hvuser
  • PDF
  • Print
  • Copy To Clipboard
  • Collapse All Expand All

Content

How To Debug CCI Issues – Version 1.3

 

Introduction

This short document describes some of the many ways CCI can break. It should be used as a list of the things to check with a customer whenever CCI problems are raised.

Check:            This tag is used to highlight what to look for.

Installation and Environment Variables

UNIX

 

You would think it is easy to get this right – but we see many errors caused due to bad installation. Here is what should be done. Let us assume a non-default installation – as that is the easiest for the user to get wrong J

Let us assume we want to install CCI in /opt/HORCM

  1. Copy the installation file to the hard disk somewhere. It is called RMHORC. Let’s copy it to /var/tmp
  2. cd /opt
    Change directory to where you want /HORCM created

  3. cpio -idmu < /var/tmp/RMHORC
    This will copy all the files in the RMHORC “package” to /opt/HORCM

 

Check:            We have seen cases where users build their own installation “packages” for HORCM – and then copy the files from one Host to another. In UNIX particularly this is dangerous. CCI needs a “hidden” directory called .uds or it will not start.

  • In CCI 01-16-03 and below it was in /var/tmp
  • In CCI 01-17-03 and above it is in /yourdirectory/HORCM

This directory contains UNIX “pipes” when the instances are started. The “pipes” are deleted when the instance stops. Thus, you will see this:

 

root@SYD-E250-1:/opt/HORCM/.uds#:ls -al

total 4

drwxrwxrwx   2 root     sys          512 Feb 22 15:26 .

dr-xr-xr-x  12 root     sys          512 Feb 22 15:04 ..

 

root@SYD-E250-1:/opt/HORCM/.uds#:horcmstart.sh 4

starting HORCM inst 4

HORCM inst 4 starts successfully.

 

root@SYD-E250-1:/opt/HORCM/.uds#:ls -al

total 6

drwxrwxrwx   3 root     sys          512 Feb 22 16:29 .

dr-xr-xr-x  12 root     sys          512 Feb 22 15:04 ..

drwxrwxrwx   2 root     other        512 Feb 22 16:29 .lcmcl04

srwxrwxrwx   1 root     other          0 Feb 22 16:29 .lcmep04

 

Always check for this directory if you have a case where CCI does not start.

4          ln -s /opt/HORCM /HORCM

Check: You must create a link or the install in the next step will fail.

5          /HORCM/horcminstall.sh

Check:            You must do this on UNIX to create links to the CCI commands.

6          raidqry -h

 

Here is what you see if the user has done everything right.

 

root@SYD-E250-1:/opt/HORCM/.uds#:raidqry -h

Model  : RAID-Manager/Solaris

Ver&Rev: 01-19-03/04

Usage  : raidqry [options] for HORC

 -h     Help/Usage

 -I[#]  Set to HORCMINST#

 -IH[#] or -ITC[#] Set to HORC mode [and HORCMINST#]

 -IM[#] or -ISI[#] Set to MRCF mode [and HORCMINST#]

 -z     Set to the interactive mode

 -zx    Set to the interactive mode and HORCM monitoring

 -q     Quit(Return to main())

 -g            Specify for getting all group name on local

 -l            Specify the local query

 -r <group>    Specify the remote query

 -f            Specify display for floatable host

 

Check:            Always get the user to run this command and send you the output.

  • If user is using an old version of CCI, ask why. 01-19-03/04 or later is preferred for many reasons which will be discussed later.
  • In some special cases, like HPtM, a specific level of CCI may be stated in the ECN or Release Notes. In this case, it may be advisable to stick with that level. Also, the microcode ECNs for 9900V and USP always recommend a CCI level.
  • However, in my experience, CCI is always backwards compatible – and the developer has confirmed this – so one should always use the minimum level stated in ECNs.
  • Any command this user issues will be assumed to be TrueCopy (refer above - HORC). If the user is trying to perform ShadowImage operation, you now know why it is failing J
  • No instance has been set. Here is what you see if the instance variable has been set:

 

root@SYD-E250-1:/opt/HORCM/.uds#:export HORCMINST=4

root@SYD-E250-1:/opt/HORCM/.uds#:raidqry -h

Model  : RAID-Manager/Solaris

Ver&Rev: 01-19-03/04

Usage  : raidqry [options] for HORC[4]

 -h     Help/Usage

 

If the user is trying to control Instance 1, now you know why it is failing J

 

Windows

Installation is easier. Double click the EXE and follow the bouncing ball. I always recommend taking the default of C:\HORCM. The directory is only about 10 MB in size so we are not likely to fill the drive.

Check:            The same rule applies as for UNIX. Always ask for raidqry output.

Environment Variables

 

Here is a ShadowImage example.

C:\HORCM\ETC>set horcminst=4

C:\HORCM\ETC>set horcc_mrcf=1

 

C:\HORCM\ETC>raidqry -h

Model  : RAID-Manager/WindowsNT

Ver&Rev: 01-19-03/04

Usage  : raidqry [options] for HOMRCF[4]

 -h     Help/Usage

 -I[#]  Set to HORCMINST#

 -IH[#] or -ITC[#] Set to HORC mode [and HORCMINST#]

 -IM[#] or -ISI[#] Set to MRCF mode [and HORCMINST#]

 -z     Set to the interactive mode

 

Check:            horcminst is case insensitive on windows – it is case sensitive on UNIX

Change mode of operation to TrueCopy

C:\HORCM\ETC>set horcc_mrcf=

C:\HORCM\ETC>raidqry -h

Model  : RAID-Manager/WindowsNT

Ver&Rev: 01-19-03/04

Usage  : raidqry [options] for HORC[4]

 

Now try this on UNIX. Change mode of operation:

 

root@SYD-E250-1:/opt/HORCM/.uds#:export HORCC_MRCF=1

root@SYD-E250-1:/opt/HORCM/.uds#:raidqry -h

Model  : RAID-Manager/Solaris

Ver&Rev: 01-19-03/04

Usage  : raidqry [options] for HOMRCF[4]

 

root@SYD-E250-1:/opt/HORCM/.uds#:export HORCC_MRCF=

root@SYD-E250-1:/opt/HORCM/.uds#:raidqry -h

Model  : RAID-Manager/Solaris

Ver&Rev: 01-19-03/04

Usage  : raidqry [options] for HOMRCF[4]

 

Check:            This does not work. You must do this.

root@SYD-E250-1:/opt/HORCM/.uds#:unset HORCC_MRCF

root@SYD-E250-1:/opt/HORCM/.uds#:raidqry -h

Model  : RAID-Manager/Solaris

Ver&Rev: 01-19-03/04

Usage  : raidqry [options] for HORC[4]

 

Sending logs to GSC

If you have to escalate the problem to GSC, we will need the complete set of HORCM logs and all the HORCM CONF files. In general, the preferred method of doing this is to run “getconfig”. These executables/scripts are available on the TUF Knowledge area here.

If for any reason you do not run these scripts, you must zip up all the LOG directories underneath the HORCM directory. Never pick and choose which log to upload – many of them have the same name – and GSC may need to refer to all of them!

In addition, the factory always asks for the output from these commands (Windows only):

 

inqraid -CLI -fgx  $Phys

inqraid -CLI -fgvx $Vol

inqraid -CLI -fgx  $LETALL

 

Finding Command Devices

You cannot create a HORCM CONF file or check it for accuracy without doing INQRAID commands for UNIX/Windows and RAIDSCAN commands for Windows.

 

UNIX

Check:            Get the user to send you the result of this command:

 

root@SYD-E250-1:/opt/HORCM/.uds#:ls /dev/rdsk/* | inqraid -CLI -fxg

DEVICE_FILE     PORT    SERIAL  LDEV CTG  H/M/12  SSID R:Group PRODUCT_ID

...

c2t0d16s2       CL1-A-4  10111     0   -       -     -       - OPEN-V-CM

c2t2d36s2       -            -     -   -       -     -       - -        

c2t4d0s2        CL1-A-20  20169   43   -       -     -       - OPEN-V-CM

c2t6d0s2        CL1-A-11  80025  31F   -       -     -       - OPEN-V-CM

c2t6d14s2       -            -     -   -       -     -       - -        

c3t2d128s2      CL2-A-6  10262  2180   -       -     -       - OPEN-V-CM

c3t3d4s2        CL2-A-7   3157     1   -       -     -       - DF600F-CM

...

 

Here are 5 command devices – 2 are in bold. The first is a USP – the second is a 9570V. If the user wants to use the first one, they need to code:

/dev/rdsk/c2t6d0s2

in the HORCM CONF file.

Windows

Check:            Get the user to send you the result of these commands:

 

C:\HORCM\ETC>raidscan -x findcmddev h0,20

cmddev of Ser#   10111 = \\.\PhysicalDrive2

cmddev of Ser#   10111 = \\.\PhysicalDrive5

cmddev of Ser#      41 = \\.\PhysicalDrive7

cmddev of Ser#   10262 = \\.\PhysicalDrive8

cmddev of Ser#   80025 = \\.\PhysicalDrive10

cmddev of Ser#   20169 = \\.\PhysicalDrive11

cmddev of Ser#   20169 = \\.\Volume{3c107ab6-7dbf-11db-a1ed-000e0c6abf1d}

 

Check:            Do not use ANY of these names! If you find a user using this syntax, ask that it be changed J See INQRAID output below:

 

  • Harddisk numbers can change after a reboot.
  • GUID numbers can change in a MS Cluster environment after reboot
  • Do yourself a favour – do not use these names

 

C:\HORCM\ETC>inqraid $Phys -CLI

DEVICE_FILE     PORT    SERIAL  LDEV CTG  H/M/12  SSID R:Group PRODUCT_ID

Harddisk0       -            -     -   -       -     -       - 00JS-22MHB0    

Harddisk1       -            -     -   -       -     -       - 00JS-22MHB0    

Harddisk2       CL1-A    10111   515   -       -     -       - OPEN-V-CM      

Harddisk3       CL1-A    10111  1920   -  s/s/ss  9997 5:03-02 OPEN-V         

Harddisk4       CL1-A    10111   768   -  s/s/ss  9993 5:06-02 OPEN-V         

Harddisk5       CL1-A    10111  1856   -       -     -       - OPEN-V-CM      

Harddisk6       CL1-A    10111  2632   -  P/s/ss  999A 5:06-02 OPEN-V         

Harddisk7       CL1-A       41     0   -       -     -       - DF600F-CM       

Harddisk8       CL1-A    10262    16   -       -     -       - OPEN-V-CM      

Harddisk9       CL1-A    10262  8320   -  s/s/ss  2000 5:01-05 OPEN-V         

Harddisk10      CL1-A    80025   784   -       -     -       - OPEN-V-CM      

Harddisk11      CL1-A    20169    13   -       -     -       - OPEN-V-CM 

 

\\.\CMD syntax has been around since 01-17-03/05. There is no reason not to use it! If the user is running 01-17-03/05 or below, get them to use 01-19-03/04 or higher.

 

In this case, for Harddisk8/USP 10262, the correct syntax in the HORCM CONF file is:

\\.\CMD-10262-16                              or even

\\.\CMD-10262-16-CL1-A-12 if you know this is HSD 12 - or

\\.\CMD-10262-16-CL1-A                  or, for slack people,

\\.\CMD-10262                                    J

 

\\.\CMD-10262-16 is my preferred coding technique – as this takes care of multipath environments as well.

 

Starting HORCM Instances

There are so many ways for this to fail that I could write a book on this topic J

 

So, always take the easy way out. Send the user a deck that is bound to work. If it does not, then you have very little to debug. Here is such a deck – HORCM4.CONF:

 

UNIX

HORCM_MON

#ip_address     service         poll(10ms)      timeout(10ms)

10.129.2.53     11004           1000            3000

 

HORCM_CMD

#dev_name       dev_name    dev_name            dev_name

/dev/rdsk/c2t6d0s2

 

HORCM_DEV

#dev_group      dev_name        port#   TargetID        LU#     MU#

 

HORCM_INST

#dev_group      ip_address      service

 

There are only 3 things to check:

  • Is the IP address correct? Note: You can use “localhost” here, but this will not work for TC environments using 2 different CCI servers.
  • Is 11004 a “free” UDP port? Almost certainly it is.
  • Is the CMDDEV right? You can tell that from the commands we have already issued.

 

UNIX HORCM CONF files are kept in /etc

 

Windows

Here is HORCM8.CONF for Windows:

 

HORCM_MON

#ip_address     service         poll(10ms)      timeout(10ms)

10.129.3.127    11008           1000            3000

 

HORCM_CMD

#dev_name       dev_name    dev_name            dev_name

\\.\CMD-10262-16

 

HORCM_DEV

#dev_group      dev_name        port#   TargetID        LU#     MU#

 

HORCM_INST

#dev_group      ip_address      service

 

Use the same logic as for UNIX. Windows HORCM CONF files are in C:\WINDOWS

Other recommendations:

  • HDvM uses HORCM CONF files called HORCM900.CONF to HORCM988.CONF for temporary HORCM CONF files. Do not use these numbers yourself.
  • I suggest that you use 0-799 for user created files and 800-899 for HDvM created permanent HORCM CONF files.
  • I also suggest a numbering convention of 1100x where x is the number in HORCMx.CONF. This means that you will need to “reserve” UDP ports 11000 to 11899 for HORCM CONF usage.

 

Updating the “Services” file

Many people code HORCM CONF files like this:

HORCM_MON

#ip_address     service         poll(10ms)      timeout(10ms)

10.129.3.127    horcm8          1000            3000

 

In this case, the UDP port – horcm8 – must be defined in the “Services” file:

Windows         C:\WINDOWS\system32\drivers\etc\services

UNIX              /etc/services

 

Like this:

 

horcm0                        11000/udp

horcm1                        11001/udp

horcm8                        11008/udp

horcm9                        11009/udp

“blank line”

 

Check:            Under Windows, if there is no blank line after horcm9 (in this example) that definition will be ignored! PS No blank lines at the end of the HORCM CONF file, please J

Check:            If you have 2 CCI servers, using horcm8 and horcm9, for example, then both horcm8 and horcm9 have to be defined in both servers!

 

Reading the LOGS

 

Windows

Let’s start with Windows first this time J

In our example – we used Instance 8 – so you will find the log here:

 

C:\HORCM\log8\curlog\horcm_ml_acer510_log.txt

because this server is called ml_acer510.

 

Let us examine it in detail:

 

*****************************************************************************

            -  HORCM STARTUP LOG - Thu Feb 22 18:03:08 2007

*****************************************************************************

18:03:08-39210-05000- horcmgr started on Thu Feb 22 18:03:08 2007

18:03:08-39210-05000- _spawnvp() horcmd_08 using horcmgr [CWD=C:\HORCM\ETC]

18:03:08-3d090-07240- Fibre address conversion TBL has been set to '2'

 

P.P.   : RAID Manager for WindowsNT

Model  : RAID-Manager/WindowsNT

Ver&Rev: 01-19-03/04

Release: Production(GA)

 

ALL Rights Reserved, Copyright (c) 1998-2006, Hitachi, Ltd.

 

HORCM(ml_acer510 7240) started by Administrator (0) on Thu Feb 22 18:03:08 2007

 

Lots of useful information here. See the data in bold.

 

18:03:08-3d090-07240- horcmd_08 started on Thu Feb 22 18:03:08 2007

18:03:08-3d090-07240- [horcmcfgrdf] access(conf_file) OK.

18:03:08-3d090-07240- [horcmcfgrdf] access(check) OK.

18:03:08-3d090-07240- [horcmcfgrdf] open(conf_file) OK.

18:03:08-3d090-07240- [horcmcfgetent] fseek(top) OK.

18:03:08-40b28-07240- converted CMDDEV filename \\.\CMD-10262-16 to \\.\PhysicalDrive8

 

 

Here is where \\.\CMD syntax is converted to a physical drive number.

 

18:03:08-40b28-07240- [horcmcfgetent] read(conf_file) OK.

18:03:08-40b28-07240- [horcmcfgrdf] close(conf_file) OK.

18:03:08-40b28-07240- [horcmcfgrdf] check(conf) OK.

18:03:08-40b28-07240- [horcmcfgrdf] horccmddev(0) OK.

18:03:08-40b28-07240- [horcread] cmddevopen() start

18:03:08-40b28-07240- [horcread] cmddevopen() finished

18:03:08-40b28-07240-                *****  horcread  *****

 ---ADDR--- -OFF-   0-1-2-3- 4-5-6-7- 8-9-A-B- C-D-E-F-     ------CHAR------

[0x0012d8d4]0000:   48495441 43484920 4f50454e 2052454d     HITACHI OPEN REM 

[0x0012d8e4]0010:   4f544520 434f5059 20535953 54454d20     OTE COPY SYSTEM  

[0x0012d8f4]0020:   44415441 20545950 45203031 00000000     DATA TYPE 01.... 

[0x0012d904]0030:   f0f0f0f0 f0f0f0f1 f0f2f6f2 00040d09     000000010262.... 

 

Here is the USP serial number.

[0x0012d914]0040:   50090100 00040000 00040004 00040004     P............... 

[0x0012d924]0050:   ffffffff ffffffff 00060006 00060006     ................ 

[0x0012d934]0060:   00070007 00070007 000f0c00 00000000     ................ 

[0x0012d944]0070:   00000000 ef00e011 08030100 01004000     ..............@. 

[0x0012d954]0080:   38000400 04400100 01000400 00ff0100     8....@.......... 

[0x0012d964]0090:   80000000 00000000 00000000 00000000     ................ 

[0x0012d974]00a0:   00000000 00000000 00000000 00000000     ................ 

[0x0012d984]00b0:   00800012 000e0002 00000000 00000000     ................ 

[0x0012d994]00c0:   00000000 00000000 00000000 00000000     ................ 

[0x0012d9a4]00d0:   00000000 00000000 00000000 00000000     ................ 

[0x0012d9b4]00e0:   00000000 00000000 00000000 00000000     ................ 

[0x0012d9c4]00f0:   00000000 00000000 00000000 00000000     ................ 

 ---ADDR--- -OFF-   0-1-2-3- 4-5-6-7- 8-9-A-B- C-D-E-F-     ------CHAR------

[0x0012d9d4]0100:   0004ffff 00060007 ffffffff ffff000b     ................ 

[0x0012d9e4]0110:   ffff000d 000e000f 00100011 00120013     ................ 

[0x0012d9f4]0120:   00140015 00160017 9914ffff 001a001b     ................ 

[0x0012da04]0130:   001c001d 001e001f 00200021 00220023     ......... .!.".# 

[0x0012da14]0140:   20002001 00260027 00280029 002a002b      . ..&.'.(.).*.+ 

[0x0012da24]0150:   002c002d ffff002f 00300031 00320033     .,.-.../.0.1.2.3 

[0x0012da34]0160:   00340035 00360037 00380039 003a003b     .4.5.6.7.8.9.:.; 

[0x0012da44]0170:   003c003d 003e003f 00400041 00420043     .<.=.>.?.@.A.B.C 

18:03:08-40b28-07240- [HORCREAD]: maxldev = 16384, unitnum = 256

18:03:08-40b28-07240- [HORCREAD]: maxhorc = 4, maxmrcf = 64, maxlun = 1024, maxctg = 256, maxjnlg = 256, mixport =1, slprflag = 1

18:03:08-40b28-07240- [HORCREAD]: Number of used instance(s) = 17, Number of attached instance(s) = UNKNOWN, Number of same instance(s) = 1

18:03:08-40b28-07240- [HORCREAD] execute-test read is done.:\\.\PhysicalDrive8

18:03:08-40b28-07240- [HORCMCFGRDF] SLPR is supported.

18:03:08-40b28-07240-                *****  SLPR bitmap  *****

 ---ADDR--- -OFF-   0-1-2-3- 4-5-6-7- 8-9-A-B- C-D-E-F-     ------CHAR------

[0x0012db18]0000:   80000000 00000000 00000000 00000000     ................ 

[0x0012db28]0010:   00000000 00000000 00000000 00000000     ................ 

18:03:08-40b28-07240- [HORCRELOWNLBA] floatable LBA(e011) is released.ID:0:\\.\PhysicalDrive8

18:03:08-40b28-07240- [horcread] cmddevopen() start

18:03:08-40b28-07240- [horcread] cmddevopen() finished

18:03:08-40b28-07240-                *****  horcread  *****

 ---ADDR--- -OFF-   0-1-2-3- 4-5-6-7- 8-9-A-B- C-D-E-F-     ------CHAR------

[0x0012d8d4]0000:   48495441 43484920 4f50454e 2052454d     HITACHI OPEN REM 

[0x0012d8e4]0010:   4f544520 434f5059 20535953 54454d20     OTE COPY SYSTEM  

[0x0012d8f4]0020:   44415441 20545950 45203031 00000000     DATA TYPE 01.... 

[0x0012d904]0030:   f0f0f0f0 f0f0f0f1 f0f2f6f2 00040d09     000000010262.... 

[0x0012d914]0040:   50090100 00040000 00040004 00040004     P............... 

[0x0012d924]0050:   ffffffff ffffffff 00060006 00060006     ................ 

[0x0012d934]0060:   00070007 00070007 000f0c00 00000000     ................ 

[0x0012d944]0070:   00000000 ef00e011 08030100 01004000     ..............@. 

[0x0012d954]0080:   38000400 04400100 01000400 00ff0100     8....@.......... 

[0x0012d964]0090:   80000000 00000000 00000000 00000000     ................ 

[0x0012d974]00a0:   00000000 00000000 00000000 00000000     ................ 

[0x0012d984]00b0:   00800012 000e0002 00000000 00000000     ................ 

[0x0012d994]00c0:   00000000 00000000 00000000 00000000     ................ 

[0x0012d9a4]00d0:   00000000 00000000 00000000 00000000     ................ 

[0x0012d9b4]00e0:   00000000 00000000 00000000 00000000     ................ 

[0x0012d9c4]00f0:   00000000 00000000 00000000 00000000     ................ 

 ---ADDR--- -OFF-   0-1-2-3- 4-5-6-7- 8-9-A-B- C-D-E-F-     ------CHAR------

[0x0012d9d4]0100:   0004ffff 00060007 ffffffff ffff000b     ................ 

[0x0012d9e4]0110:   ffff000d 000e000f 00100011 00120013     ................ 

[0x0012d9f4]0120:   00140015 00160017 9914ffff 001a001b     ................ 

[0x0012da04]0130:   001c001d 001e001f 00200021 00220023     ......... .!.".# 

[0x0012da14]0140:   20002001 00260027 00280029 002a002b      . ..&.'.(.).*.+ 

[0x0012da24]0150:   002c002d ffff002f 00300031 00320033     .,.-.../.0.1.2.3 

[0x0012da34]0160:   00340035 00360037 00380039 003a003b     .4.5.6.7.8.9.:.; 

[0x0012da44]0170:   003c003d 003e003f 00400041 00420043     .<.=.>.?.@.A.B.C 

18:03:08-40b28-07240- [HORCREAD]: maxldev = 16384, unitnum = 256

18:03:08-40b28-07240- [HORCREAD]: maxhorc = 4, maxmrcf = 64, maxlun = 1024, maxctg = 256, maxjnlg = 256, mixport =1, slprflag = 1

18:03:08-40b28-07240- [HORCREAD]: Number of used instance(s) = 17, Number of attached instance(s) = UNKNOWN, Number of same instance(s) = 1

18:03:08-40b28-07240- [HORCREAD] execute-test read is done.:\\.\PhysicalDrive8

18:03:08-40b28-07240- [HORCMCFGRDF] SLPR bitmap is checked.

18:03:08-40b28-07240- [horcmcfgrdf] horccmddev(0) OK.

18:03:08-40b28-07240- [HORCRELOWNLBA] floatable LBA(e011) is released.ID:0:\\.\PhysicalDrive8

18:03:08-40b28-07240- [horcread] cmddevopen() start

18:03:08-40b28-07240- [horcread] cmddevopen() finished

18:03:08-449a8-07240-                *****  horcread  *****

 ---ADDR--- -OFF-   0-1-2-3- 4-5-6-7- 8-9-A-B- C-D-E-F-     ------CHAR------

[0x0012d8d4]0000:   48495441 43484920 4f50454e 2052454d     HITACHI OPEN REM 

[0x0012d8e4]0010:   4f544520 434f5059 20535953 54454d20     OTE COPY SYSTEM  

[0x0012d8f4]0020:   44415441 20545950 45203031 00000000     DATA TYPE 01.... 

[0x0012d904]0030:   f0f0f0f0 f0f0f0f1 f0f2f6f2 00040d09     000000010262.... 

[0x0012d914]0040:   50090100 00040000 00040004 00040004     P............... 

[0x0012d924]0050:   ffffffff ffffffff 00060006 00060006     ................ 

[0x0012d934]0060:   00070007 00070007 000f0c00 00000000     ................ 

[0x0012d944]0070:   00000000 ef00e011 08030100 01004000     ..............@. 

[0x0012d954]0080:   38000400 04400100 01000400 00ff0100     8....@.......... 

[0x0012d964]0090:   80000000 00000000 00000000 00000000     ................ 

[0x0012d974]00a0:   00000000 00000000 00000000 00000000     ................ 

[0x0012d984]00b0:   00800012 000e0002 00000000 00000000     ................ 

[0x0012d994]00c0:   00000000 00000000 00000000 00000000     ................ 

[0x0012d9a4]00d0:   00000000 00000000 00000000 00000000     ................ 

[0x0012d9b4]00e0:   00000000 00000000 00000000 00000000     ................ 

[0x0012d9c4]00f0:   00000000 00000000 00000000 00000000     ................ 

 ---ADDR--- -OFF-   0-1-2-3- 4-5-6-7- 8-9-A-B- C-D-E-F-     ------CHAR------

[0x0012d9d4]0100:   0004ffff 00060007 ffffffff ffff000b     ................ 

[0x0012d9e4]0110:   ffff000d 000e000f 00100011 00120013     ................ 

[0x0012d9f4]0120:   00140015 00160017 9914ffff 001a001b     ................ 

[0x0012da04]0130:   001c001d 001e001f 00200021 00220023     ......... .!.".# 

[0x0012da14]0140:   20002001 00260027 00280029 002a002b      . ..&.'.(.).*.+ 

[0x0012da24]0150:   002c002d ffff002f 00300031 00320033     .,.-.../.0.1.2.3 

[0x0012da34]0160:   00340035 00360037 00380039 003a003b     .4.5.6.7.8.9.:.; 

[0x0012da44]0170:   003c003d 003e003f 00400041 00420043     .<.=.>.?.@.A.B.C 

18:03:08-449a8-07240- [HORCREAD]: maxldev = 16384, unitnum = 256

18:03:08-449a8-07240- [HORCREAD]: maxhorc = 4, maxmrcf = 64, maxlun = 1024, maxctg = 256, maxjnlg = 256, mixport =1, slprflag = 1

18:03:08-449a8-07240- [HORCREAD]: Number of used instance(s) = 17, Number of attached instance(s) = UNKNOWN, Number of same instance(s) = 1

18:03:08-449a8-07240- [HORCREAD] execute-test read is done.:\\.\PhysicalDrive8

18:03:08-449a8-07240- [horcmcfgrdf] seldevdata() OK.

18:03:08-449a8-07240- MON(HORC): Size of memory allocation for CONFIG_DB = 64 bytes.

18:03:08-449a8-07240- MON:HORCM_CMD=\\.\PhysicalDrive8[Fibre][AL-PA=0xef -> C=5,T=1] port=CL1-A , targ=1, lun=12

 

Here is the AL-PA for the Port, and the Port, target ID and LUN.

18:03:08-449a8-07240- MON(HORC):number of Mus = 0

18:03:08-449a8-07240- MON(HOMRCF): Size of memory allocation for CONFIG_DB = 0 bytes.

18:03:08-449a8-07240- MON(HOMRCF):number of Mus = 0

18:03:10-d1b78-05000- horcmgr : executed 'CreateProcess(raidscan.exe -pi $PhysicalDrive -find inst -z2w=NUL -z1w=NUL)',exit = 0.

 

I have quoted this in full for a reason. This is what you should expect to see if it all works. If it does not work, at least you can compare the user log with this one J

 

UNIX

Here is the output for a Solaris server called SYD-E250-1. This is the log for instance 4. As we installed CCI in /opt/HORCM, the log is here:

 

root@SYD-E250-1:/opt/HORCM/log4/curlog#:ls -al

total 28

drwxr-xr-x   3 root     other        512 Mar  7 16:29 .

drwxr-xr-x   4 root     other        512 Mar  7 16:29 ..

-rw-r--r--   1 root     other      10274 Mar  7 16:30 horcm_SYD-E250-1.log

drwxr-xr-x   2 root     other        512 Mar  7 16:29 horcmlog_SYD-E250-1

 

*****************************************************************************

            -  HORCM STARTUP LOG - Wed Mar  7 16:29:59 2007

*****************************************************************************

16:29:59-cac9d-11271- horcmgr started on Wed Mar  7 16:29:59 2007

16:29:59-cd940-11271- execvp() horcmd_04 using /etc/horcmgr [CWD=/]

16:29:59-e99c5-11272- Fibre address conversion TBL has been set to '1'

 

P.P.   : RAID Manager for Solaris

Model  : RAID-Manager/Solaris

Ver&Rev: 01-19-03/04

Release: Production(GA)

 

ALL Rights Reserved, Copyright (c) 1998-2006, Hitachi, Ltd.

 

HORCM(SYD-E250-1 11272) started by root (0) on Wed Mar  7 16:30:00 2007

 

16:30:00-11d9d-11272- horcmd_04 started on Wed Mar  7 16:30:00 2007

16:30:00-17e65-11272- [horcmcfgrdf] access(conf_file) OK.

16:30:00-1c076-11272- [horcmcfgrdf] access(check) OK.

16:30:00-1e127-11272- [horcmcfgrdf] open(conf_file) OK.

16:30:00-29cf3-11272- [horcmcfgetent] fseek(top) OK.

16:30:00-31d0e-11272- [horcmcfgetent] read(conf_file) OK.

16:30:00-34856-11272- [horcmcfgrdf] close(conf_file) OK.

16:30:00-389cb-11272- [horcmcfgrdf] check(conf) OK.

16:30:00-4a34c-11272- [horcmcfgrdf] horccmddev(0) OK.

16:30:00-5ac7f-11272- [horcread] cmddevopen() start

16:30:00-63837-11272- [horcread] cmddevopen() finished

16:30:00-6e384-11272-                *****  horcread  *****

 ---ADDR--- -OFF-   0-1-2-3- 4-5-6-7- 8-9-A-B- C-D-E-F-     ------CHAR------

[0xffbfcb94]0000:   48495441 43484920 4f50454e 2052454d     HITACHI OPEN REM 

[0xffbfcba4]0010:   4f544520 434f5059 20535953 54454d20     OTE COPY SYSTEM  

[0xffbfcbb4]0020:   44415441 20545950 45203031 00000000     DATA TYPE 01.... 

[0xffbfcbc4]0030:   f0f0f0f0 f0f0f0f8 f0f0f2f5 00070d09     000000080025.... 

 

NSC55 with a Serial Number of 80025

 

[0xffbfcbd4]0040:   50090500 00020000 00050005 00050005     P............... 

[0xffbfcbe4]0050:   00040004 00040004 00060006 00060006     ................ 

[0xffbfcbf4]0060:   00070007 00070007 000f2a00 00000000     ..........*..... 

[0xffbfcc04]0070:   00000000 b200e00c 08030100 01004000     ..............@. 

[0xffbfcc14]0080:   f8000400 04400100 01000400 00ff0100     8....@.......... 

[0xffbfcc24]0090:   80000000 00000000 00000000 00000000     ................ 

[0xffbfcc34]00a0:   00000000 00000000 00000000 00000000     ................ 

[0xffbfcc44]00b0:   0080000e 00080002 00000000 00000000     ................ 

[0xffbfcc54]00c0:   00000000 00000000 00000000 00000000     ................ 

[0xffbfcc64]00d0:   00000000 00000000 00000000 00000000     ................ 

[0xffbfcc74]00e0:   00000000 00000000 00000000 00000000     ................ 

[0xffbfcc84]00f0:   00000000 00000000 00000000 00000000     ................ 

 ---ADDR--- -OFF-   0-1-2-3- 4-5-6-7- 8-9-A-B- C-D-E-F-     ------CHAR------

[0xffbfcc94]0100:   00050004 00060007 00080009 0060ffff     .............`.. 

[0xffbfcca4]0110:   ffffffff ffffffff ffff000d ffffffff     ................ 

[0xffbfccb4]0120:   0020ffff ffffffff ffffffff ffffffff     . .............. 

[0xffbfccc4]0130:   ffffffff ffffffff ffffffff ffffffff     ................ 

[0xffbfccd4]0140:   0039ffff ffffffff ffffffff ffffffff     .9.............. 

[0xffbfcce4]0150:   0028ffff ffffffff ffff002d ffffffff     .(.........-.... 

[0xffbfccf4]0160:   ffffffff 00320033 ffffffff ffffffff     .....2.3........ 

[0xffbfcd04]0170:   ffffffff ffffffff ffffffff ffff000a     ................ 

16:30:01-ae6ea-11272- [HORCREAD]: maxldev = 16384, unitnum = 256

16:30:01-b1cea-11272- [HORCREAD]: maxhorc = 4, maxmrcf = 64, maxlun = 1024, maxctg = 256, maxjnlg = 256, mixport =1, slprflag = 1

16:30:01-b5e34-11272- [HORCREAD]: Number of used instance(s) = 13, Number of attached instance(s) = UNKNOWN, Number of same instance(s) = 1

16:30:01-c2226-11272- [HORCREAD] execute-test read is done.:/dev/rdsk/c2t6d0s2

 

Here is the CMDDEV

 

16:30:01-c636e-11272- [HORCMCFGRDF] SLPR is supported.

16:30:01-ca4bf-11272-                *****  SLPR bitmap  *****

 ---ADDR--- -OFF-   0-1-2-3- 4-5-6-7- 8-9-A-B- C-D-E-F-     ------CHAR------

[0xffbfce08]0000:   80000000 00000000 00000000 00000000     ................ 

[0xffbfce18]0010:   00000000 00000000 00000000 00000000     ................ 

16:30:01-dad71-11272- [HORCRELOWNLBA] floatable LBA(e00c) is released.ID:0:/dev/rdsk/c2t6d0s2

16:30:01-deb6b-11272- [horcread] cmddevopen() start

16:30:01-e2d12-11272- [horcread] cmddevopen() finished

16:30:01-e7502-11272-                *****  horcread  *****

 ---ADDR--- -OFF-   0-1-2-3- 4-5-6-7- 8-9-A-B- C-D-E-F-     ------CHAR------

[0xffbfcb94]0000:   48495441 43484920 4f50454e 2052454d     HITACHI OPEN REM 

[0xffbfcba4]0010:   4f544520 434f5059 20535953 54454d20     OTE COPY SYSTEM  

[0xffbfcbb4]0020:   44415441 20545950 45203031 00000000     DATA TYPE 01.... 

[0xffbfcbc4]0030:   f0f0f0f0 f0f0f0f8 f0f0f2f5 00070d09     000000080025.... 

[0xffbfcbd4]0040:   50090500 00020000 00050005 00050005     P............... 

[0xffbfcbe4]0050:   00040004 00040004 00060006 00060006     ................ 

[0xffbfcbf4]0060:   00070007 00070007 000f2a00 00000000     ..........*..... 

[0xffbfcc04]0070:   00000000 b200e00c 08030100 01004000     ..............@. 

[0xffbfcc14]0080:   f8000400 04400100 01000400 00ff0100     8....@.......... 

[0xffbfcc24]0090:   80000000 00000000 00000000 00000000     ................ 

[0xffbfcc34]00a0:   00000000 00000000 00000000 00000000     ................ 

[0xffbfcc44]00b0:   0080000e 00080002 00000000 00000000     ................ 

[0xffbfcc54]00c0:   00000000 00000000 00000000 00000000     ................ 

[0xffbfcc64]00d0:   00000000 00000000 00000000 00000000     ................ 

[0xffbfcc74]00e0:   00000000 00000000 00000000 00000000     ................ 

[0xffbfcc84]00f0:   00000000 00000000 00000000 00000000     ................ 

 ---ADDR--- -OFF-   0-1-2-3- 4-5-6-7- 8-9-A-B- C-D-E-F-     ------CHAR------

[0xffbfcc94]0100:   00050004 00060007 00080009 0060ffff     .............`.. 

[0xffbfcca4]0110:   ffffffff ffffffff ffff000d ffffffff     ................ 

[0xffbfccb4]0120:   0020ffff ffffffff ffffffff ffffffff     . .............. 

[0xffbfccc4]0130:   ffffffff ffffffff ffffffff ffffffff     ................ 

[0xffbfccd4]0140:   0039ffff ffffffff ffffffff ffffffff     .9.............. 

[0xffbfcce4]0150:   0028ffff ffffffff ffff002d ffffffff     .(.........-.... 

[0xffbfccf4]0160:   ffffffff 00320033 ffffffff ffffffff     .....2.3........ 

[0xffbfcd04]0170:   ffffffff ffffffff ffffffff ffff000a     ................ 

16:30:02-62fd9-11272- [HORCREAD]: maxldev = 16384, unitnum = 256

16:30:02-6712a-11272- [HORCREAD]: maxhorc = 4, maxmrcf = 64, maxlun = 1024, maxctg = 256, maxjnlg = 256, mixport =1, slprflag = 1

16:30:02-6b268-11272- [HORCREAD]: Number of used instance(s) = 13, Number of attached instance(s) = UNKNOWN, Number of same instance(s) = 1

16:30:02-77659-11272- [HORCREAD] execute-test read is done.:/dev/rdsk/c2t6d0s2

16:30:02-7b7d2-11272- [HORCMCFGRDF] SLPR bitmap is checked.

16:30:02-7f90c-11272- [horcmcfgrdf] horccmddev(0) OK.

16:30:02-85faf-11272- [HORCRELOWNLBA] floatable LBA(e00c) is released.ID:0:/dev/rdsk/c2t6d0s2

16:30:02-89c66-11272- [horcread] cmddevopen() start

16:30:02-8de05-11272- [horcread] cmddevopen() finished

16:30:02-925ff-11272-                *****  horcread  *****

 ---ADDR--- -OFF-   0-1-2-3- 4-5-6-7- 8-9-A-B- C-D-E-F-     ------CHAR------

[0xffbfcb94]0000:   48495441 43484920 4f50454e 2052454d     HITACHI OPEN REM 

[0xffbfcba4]0010:   4f544520 434f5059 20535953 54454d20     OTE COPY SYSTEM  

[0xffbfcbb4]0020:   44415441 20545950 45203031 00000000     DATA TYPE 01.... 

[0xffbfcbc4]0030:   f0f0f0f0 f0f0f0f8 f0f0f2f5 00070d09     000000080025.... 

[0xffbfcbd4]0040:   50090500 00020000 00050005 00050005     P............... 

[0xffbfcbe4]0050:   00040004 00040004 00060006 00060006     ................ 

[0xffbfcbf4]0060:   00070007 00070007 000f2a00 00000000     ..........*..... 

[0xffbfcc04]0070:   00000000 b200e00c 08030100 01004000     ..............@. 

[0xffbfcc14]0080:   f8000400 04400100 01000400 00ff0100     8....@.......... 

[0xffbfcc24]0090:   80000000 00000000 00000000 00000000     ................ 

[0xffbfcc34]00a0:   00000000 00000000 00000000 00000000     ................ 

[0xffbfcc44]00b0:   0080000e 00080002 00000000 00000000     ................ 

[0xffbfcc54]00c0:   00000000 00000000 00000000 00000000     ................ 

[0xffbfcc64]00d0:   00000000 00000000 00000000 00000000     ................ 

[0xffbfcc74]00e0:   00000000 00000000 00000000 00000000     ................ 

[0xffbfcc84]00f0:   00000000 00000000 00000000 00000000     ................ 

 ---ADDR--- -OFF-   0-1-2-3- 4-5-6-7- 8-9-A-B- C-D-E-F-     ------CHAR------

[0xffbfcc94]0100:   00050004 00060007 00080009 0060ffff     .............`.. 

[0xffbfcca4]0110:   ffffffff ffffffff ffff000d ffffffff     ................ 

[0xffbfccb4]0120:   0020ffff ffffffff ffffffff ffffffff     . .............. 

[0xffbfccc4]0130:   ffffffff ffffffff ffffffff ffffffff     ................ 

[0xffbfccd4]0140:   0039ffff ffffffff ffffffff ffffffff     .9.............. 

[0xffbfcce4]0150:   0028ffff ffffffff ffff002d ffffffff     .(.........-.... 

[0xffbfccf4]0160:   ffffffff 00320033 ffffffff ffffffff     .....2.3........ 

[0xffbfcd04]0170:   ffffffff ffffffff ffffffff ffff000a     ................ 

16:30:02-e7a8a-11272- [HORCREAD]: maxldev = 16384, unitnum = 256

16:30:02-ebbdb-11272- [HORCREAD]: maxhorc = 4, maxmrcf = 64, maxlun = 1024, maxctg = 256, maxjnlg = 256, mixport =1, slprflag = 1

16:30:02-efd23-11272- [HORCREAD]: Number of used instance(s) = 13, Number of attached instance(s) = UNKNOWN, Number of same instance(s) = 1

16:30:03-07ece-11272- [HORCREAD] execute-test read is done.:/dev/rdsk/c2t6d0s2

16:30:03-0e0d4-11272- [horcmcfgrdf] seldevdata() OK.

16:30:03-12354-11272- MON(HORC): Size of memory allocation for CONFIG_DB = 64 bytes.

16:30:03-16392-11272- MON:HORCM_CMD=/dev/rdsk/c2t6d0s2[Fibre][AL-PA=0xb2 -> C=2,T=32] port=CL1-A , targ=32, lun=42

 

Here is the AL-PA for the Port, and the Port, target ID and LUN.

 

16:30:03-1a4ba-11272- MON(HORC):number of Mus = 0

16:30:03-1e633-11272- MON(HOMRCF): Size of memory allocation for CONFIG_DB = 0 bytes.

16:30:03-2275a-11272- MON(HOMRCF):number of Mus = 0

16:30:07-b3adf-11271- horcmgr : executed 'system(/bin/ls /dev/rdsk/* | /HORCM/usr/bin/raidscan -find inst)'.

 

Audit Logging

 

Check:            Always set full logging if possible. This was introduced with 01-17-03/05 – but it is disabled by default. The environment variable is HORCC_LOGSZ. If this environment variable is not set, only errors are logged. With this variable set, successful commands are logged as well – very useful if you need to know what was typed and when.

 

However, only the input is logged, not the output! So, always cut and paste the entire Command Prompt session and send that to GSC as well.

 

Check:            Can the user reproduce this problem at will? If so, get them to stop CCI, delete the LOGx directories and then start CCI and issue the command that fails. This will make reading the LOGx files much easier as the only messages in the logs will be what you want to look at J

 

Windows

 

TSTART.BAT  BAT file to start CCI and set the correct options for TC

 

@echo off

rem

rem Batch file to start HORCM for TrueCopy operations

rem

rem turn on CCI logging for 01-17-03/05 or later

set HORCC_LOGSZ=2048

rem

raidscan -x findcmddev h0,20

set horcmfctbl=2

rem set instance to match your naming convention for the PVOL instance

set horcminst=0

rem next line with a value for SI only!

set horcc_mrcf=

horcmstart 0 1

 

TSTOP.BAT         BAT file to stop CCI

 

@echo off

rem

rem Batch file to stop HORCM after TrueCopy operations

rem

horcmshutdown 0 1

set horcmfctbl=

set horcminst=

set horcc_mrcf=

set HORCC_LOGSZ=

 

UNIX

 

Check:            Always ask the user to “cut and paste” the command line input and output – you need to know what they typed and what the result was J

 

root@SYD-E250-1:/opt/HORCM/log4/curlog#:horcmstart.sh 4

starting HORCM inst 4

 

HORCM inst 4 starts successfully.

root@SYD-E250-1:/opt/HORCM/log4/curlog#:export HORCC_LOGSZ=2048

root@SYD-E250-1:/opt/HORCM/log4/curlog#:raidscan -p CL1-A

root@SYD-E250-1:/opt/HORCM/log4/curlog#:horcmshutdown.sh 4

inst 4:

HORCM Shutdown inst 4 !!!

 

root@SYD-E250-1:/opt/HORCM/log4/curlog#:cd ..

root@SYD-E250-1:/opt/HORCM/log4#:ls -al

total 10

drwxr-xr-x   4 root     other        512 Mar  7 16:50 .

dr-xr-xr-x  12 root     sys          512 Feb 22 15:04 ..

drwxr-xr-x   3 root     other        512 Mar  7 16:49 curlog

-rw-r--r--   1 root     other        289 Mar  7 16:51 horcc_SYD-E250-1.log

drwxr-xr-x   3 root     other        512 Mar  7 16:29 tmplog

 

Here are the contents of LOG file: horcc_SYD-E250-1.log

 

COMMAND NORMAL : EUserId for HORC[4] : root (0)  Wed Mar  7 16:50:36 2007

CMDLINE : raidscan -p CL1-A

16:50:37-450c6-11368- [raidscan][exit(0)]

COMMAND NORMAL : EUserId for HORC[4] : root (0)  Wed Mar  7 16:51:53 2007

CMDLINE : /usr/bin/horcctl -S

16:51:54-0f8cf-11376- [horcctl][exit(0)]

 

Command Device Reject

Most CCI errors are self explanatory – however this one is usually impossible for the user to debug L

Here is a simple ShadowImage example:

 

HORCM8.CONF

 

HORCM_MON

#ip_address         service             poll(10ms)          timeout(10ms) 

localhost           11008               1000                3000          

HORCM_CMD

\\.\CMD-977-5

HORCM_DEV

#dev_group          dev_name            port#     TargetID  LU#     MU#

#

VG01                LDEV49              CL1-A-1   1         7       0

HORCM_INST

#dev_group          ip_address          service  

#

VG01                localhost           11009   

 

HORCM9.CONF

 

HORCM_MON

#ip_address         service             poll(10ms)          timeout(10ms) 

localhost           11009               1000                3000          

HORCM_CMD

\\.\CMD-977-5

HORCM_DEV

#dev_group          dev_name            port#     TargetID  LU#     MU#

#

VG01                LDEV49              CL1-A-1   1         8       0

HORCM_INST

#dev_group          ip_address          service  

#

VG01                localhost           11008 

 

Check:            Is the user using “good syntax”? 

  • Even though this is a 9500V, users should always use Port-HSD-LUN syntax
  • I strongly recommend not to use Port-LDEV syntax – where is the cross-check?
  • Is MU# specified for ShadowImage? On some levels of CCI, this is mandatory. However, you should specify it anyway as this is Best Practice.

 

How to check if the HORCM CONF files are correct:

 

C:\HORCM\ETC>raidscan -p CL1-A-1 -m 0

PORT# /ALPA/C,TID#, LU#..Num(LDEV#....)...P/S, Status, LDEV#,P-Seq#,P-LDEV#

CL1-A-1/ef/ 5, 1,   0-0 .1(13)............S-VOL PAIR     13,  -----     10

CL1-A-1/ef/ 5, 1,   1-0 .1(29)............P-VOL PSUS     29,    977    309

CL1-A-1/ef/ 5, 1,   2-0 .1(48)............P-VOL PSUS     48,    977    300

CL1-A-1/ef/ 5, 1,   3-0 .1(309)...........S-VOL SSUS    309,  -----     29

CL1-A-1/ef/ 5, 1,   4-0 .1(310)...........S-VOL SSUS    310,  -----     29

CL1-A-1/ef/ 5, 1,   5-0 .1(308)...........S-VOL SSUS    308,  -----     24

CL1-A-1/ef/ 5, 1,   6-0 .1(305)...........S-VOL SSUS    305,  -----      1

CL1-A-1/ef/ 5, 1,   7-0 .1(49)............SMPL  ----  -----,  -----  -----

CL1-A-1/ef/ 5, 1,   8-0 .1(50)............SMPL  ----  -----,  -----  -----

 

C:\HORCM\ETC>pairdisplay -g VG01

Group   PairVol(L/R) (Port#,TID, LU-M) ,Seq#,LDEV#.P/S,Status, Seq#,P-LDEV# M

VG01    LDEV49(L)   (CL1-A-1, 1,   7-0977    49.SMPL  ----,-----   ----- -

VG01    LDEV49(R)   (CL1-A-1, 1,   8-0977    50.SMPL  ----,-----   ----- -

  • Check the PVOL and SVOL LDEV numbers (in this case, 49 and 50)
  • Check the Port/HSD/LUN (in this case, CL1-A-1 – LUNs 7 and 8)

 

But now the failure:

C:\HORCM\ETC>paircreate -g VG01 -vl

paircreate: [EX_CMDRJE] An order to the control/command device was rejected

Refer to the command log(C:\HORCM\log8\horcc_hp2k5_log.txt) for details.

 

And in the log we see this:

 

COMMAND ERROR  : EUserId for HOMRCF[8] : Administrator (0)  Wed Mar 07 17:02:30 2007

CMDLINE : paircreate -g VG01 -vl

17:02:30-9a8a8-12452- ERROR:cm_sndrcv[rc < 0 from HORCM]

17:02:30-9e728-12452- [paircreate] L_CMD(CREATEPAIR) ERROR :rc = -35

17:02:30-9e728-12452- [paircreate][exit(221)]

[EX_CMDRJE] An order to the control/command device was rejected

[Cause ]: An order to the command(control) device failed,or was rejected.

[Action]:Please confirm the following items.If this trouble doesn't resolve,then collect HORCM error logs(HORCM_LOG=C:\HORCM\log8\curlog) and Remote HORCM logs,and send them to service personnel.

(1) Check if the 'HORC or HOMRCF function' is installed in the RAID.

(2) Check if the RCP and LCP are installed in the RAID.

(3) Check if the path between the RAID CUs is established by using the SVP.

(4) Check if the pair target volume is an appropriate status.

 

Yes, meaningless error message numbers like -35 and 221. If this is a RAID subsystem, check the SSBLOGS on the SVP. However, for DF, the SSB is logged in CCI J

 

Contents of C:\HORCM\log8\curlog\horcmlog_servername\horcm_log.txt

 

17:02:30-9a8a8-14140- SCSI : Check Condition.

17:02:30-9a8a8-14140-                *****  SCSI SENSE DATA  *****

 ---ADDR--- -OFF-   0-1-2-3- 4-5-6-7- 8-9-A-B- C-D-E-F-     ------CHAR------

[0x0012f2b4]0000:   70000500 00000038 8400000d 961c0000     p......8........ 

[0x0012f2c4]0010:   00001000 00000000 00000000 00000000     ................ 

[0x0012f2d4]0020:   00000000 00000000 00000000 00000000     ................ 

[0x0012f2e4]0030:   00000000 00000000 00000000 00000000     ................ 

[0x0012f2f4]0040:   00000000 00000000 00000000 00000000     ................ 

[0x0012f304]0050:   00000000 00000000 00000000 00000000     ................ 

17:02:30-9a8a8-14140- SKEY = 0x05

17:02:30-9a8a8-14140- ASC  = 0x96

17:02:30-9a8a8-14140- SSB  = 0x8400,000d

 

17:02:30-9a8a8 is the cross-check. Next, it is not obvious, but the error code is:

 

961C   000D

 

Now, get hold of the latest AMS CCI manual which contains Appendix A.4

 

A.4 How to Read Detailed Error Log Codes

and this subsection

A.4.4 Sense Code and Detail Code

 

Beware – some versions of this manual do not contain these sections L Find one that does.

 

Table A.5 Sense Codes and Detailed Codes

Error                       Contents                                                                                Recommended Action

961C 000C              The S-VOL is a Sub LU of a unified LU. Check the status of the LU.

961C 000D              The default controllers controlling the P-VOL and S-VOL are not the same….

961C 000E              The P-VOL is a Cache Residency LU.                    Check the status of …

 

In this case, the PVOL and SVOL default controllers are not the same.

 

“Old Syntax” HORCM CONF Files

 

This problem only applies to RAID subsystems from 9900V onwards. With 7700E and 9900, there were no Host Storage Domains (HSD) so all LUNs were on the “real” port. With 9900V, USP etc the LUNs are normally considered to be attached to “logical” ports – which are called HSD or Host Groups.

 

However, it is still possible to use the “old” syntax. This always causes confusion after a while as LUNs get added and deleted from various HSD. Here is an example:

  • Imagine that 3 HSD are created on an empty port – HSD 1,2 and 3
  • Each HSD has 3 LUNs added – numbered as 0, 1 and 2

 

If this is done in sequence, HSD 1 has “absolute” LUNs 0-2, HSD 2 has “absolute” LUNs 3-5 and HSD 3 has “absolute” LUNs 6-8

 

Now imagine that the following actions have been performed some time later:

  • Delete HSD 2
  • Add HSD 4 with LUNs 0 and 1

 

And then you allocate LUN 3 to HSD 1 and 3. If you did not know that the previous changes had been made, it would be impossible for you to “guess” that:

  • HSD 1 LUN 3 was “absolute” LUN 5
  • HSD 3 LUN 3 was “absolute” LUN 9

 

Even worse, you have no way of looking at the LUN allocations via Storage Navigator as that only shows “relative” LUN numbers L

 

In a recent case, 47 S-VOL LUNs were deleted by mistake from a HSD. When the mistake was noticed, the same 47 S-VOL LUNs were added back in the “same order”. However, a subsequent pairdisplay showed the following:

 

TC-WRP 1003-108A(L) (CL2-F , 0, 45)32179 10b5.S-VOL PAIR ASYNC , 0 102e
TC-WRP 1003-108A(R) (CL1-C , 0, 4)32208 1003.P-VOL PAIR ASYNC , 0 108a - (1)

TC-WRP 1004-108B(L) (CL2-F , 0, 46)32179 -----.---- ---- ------,----- ----- -
TC-WRP 1004-108B(R) (CL1-C , 0, 5)32208 1004.P-VOL PAIR ASYNC , 0 108b -

TC-WRP 1005-108C(L) (CL2-F , 0, 47)32179 -----.---- ---- ------,----- ----- -
TC-WRP 1005-108C(R) (CL1-C , 0, 6)32208 1005.P-VOL PAIR ASYNC , 0 108c -
TC-WRP 1006-108D(L) (CL2-F , 0, 48)32179 -----.---- ---- ------,----- ----- -
TC-WRP 1006-108D(R) (CL1-C , 0, 7)32208 1006.P-VOL PAIR ASYNC , 0 108d -

TC-WRP 1007-108E(L) (CL2-F , 0, 49)32179 108a.S-VOL PAIR ASYNC , 0 1003 - (2)
TC-WRP 1007-108E(R) (CL1-C , 0, 8)32208 1007.P-VOL PAIR ASYNC , 0 108e - (3)

...

 

What can you tell from the display above? Firstly, the pairdisplay was issued by the “DR” CCI server – as (L) refers to the S-VOL. Next, we have obvious mismatches – in yellow.  What is less obvious is that the turquoise and green pairs are also invalid. Indeed:

 

(3)  P-VOL is 10:07 and the associated S-VOL is 10:8E

(2) S-VOL is 10:8A and the associated P-VOL is 10:03

This entry does not go with (3)

(1) This is the associated P-VOL for (2)

 

Here is an excerpt from the “old” HORCM CONF file – using “absolute” LUN numbers:

 

TC-WRP                1003-108A          CL2-F        0          45

TC-WRP                1004-108B          CL2-F        0          46

TC-WRP                1005-108C          CL2-F        0          47

TC-WRP                1006-108D          CL2-F        0          48

 

And here is the same excerpt after the file has been changed to use HSD syntax:

 

TC-WRP                1003-108A          CL2-F-2      0          6

TC-WRP                1004-108B          CL2-F-2      0          7

TC-WRP                1005-108C          CL2-F-2      0          8

TC-WRP                1006-108D          CL2-F-2      0          9

 

As you can, the new HORCM CONF file is easier to understand and compare with Storage Navigator J

 

By the way, here is how you find out the “absolute” and “relative” LUN numbers:

 

raidscan -p CL2-F -fx

...

CL2-F   /88/ 3,  0,  49.1(108a)..........S-VOL PAIR  ASYNC   108a, ----- 1003

CL2-F   /88/ 3,  0,  50.1(108b)..........S-VOL PAIR  ASYNC   108b, ----- 1004

CL2-F   /88/ 3,  0,  51.1(108c)..........S-VOL PAIR  ASYNC   108c, ----- 1005

...

 

raidscan -p CL2-F-2 -fx

...

CL2-F-2 /88/ 3,  0,   6.1(108a)..........S-VOL PAIR  ASYNC   108a, ----- 1003

CL2-F-2 /88/ 3,  0,   7.1(108b)..........S-VOL PAIR  ASYNC   108b, ----- 1004

CL2-F-2 /88/ 3,  0,   8.1(108c)..........S-VOL PAIR  ASYNC   108c, ----- 1005

...

 

Secured CMDDEV and HORCMPERM Implications

If you use a “normal” ie non-secured CMDDEV, you can control CCI for any LUNs on any Host. This also means that you can destroy anyone’s data by using SI to copy your LUNs over the top of their LUNs J

 

For this reason, you normally only let the Storage Administrator have access to a “normal” CMDDEV – and you always give normal users access to a Secured CMDDEV.

 

You can tell if a CMDDEV is secured as follows:

 

C:\HORCM\ETC>horcmstart 0

starting HORCM inst 0

HORCM inst 0 starts successfully.

C:\HORCM\ETC>set horcminst=0

C:\HORCM\ETC>horcctl –D

Current control device = \\.\PHYSICALDRIVE1

 

This is a “normal” CMDDEV. For this test, I also had access to a secured CMDDEV – and it is possible to swap between them as follows:

 

C:\HORCM\ETC>horcctl -C

Changed control device(\\.\PHYSICALDRIVE1 -> \\.\PHYSICALDRIVE10*

C:\HORCM\ETC>horcctl -D

Current control device = \\.\PHYSICALDRIVE10*

 

The asterisk means that the CMDDEV is secured. Normally, of course, you would not give any user access to different types of CMDDEV as that will cause problems.

 

To test what will happen before giving a secured CMDDEV to a user, you can set the HORCMPROMOD environment variable as follows:

 

C:\HORCM\etc>set HORCMPROMOD=1

C:\HORCM\ETC>horcmstart 410

starting HORCM inst 410

HORCM inst 0 starts successfully.

C:\HORCM\ETC>set horcminst=410

C:\HORCM\etc>horcctl -D

Current control device = \\.\PhysicalDrive53

 

Note, however, that this does not affect the horcctl display!

 

Here is some pairdisplay output when HORCMPROMOD is not set on any CCI server.

 

C:\HORCM\etc>pairdisplay -g VG01

Group   PairVol(L/R) (Port#,TID, LU),Seq#,LDEV#.P/S,Status,Fence,Seq#,P-LDEV# M

VG01    d0(L)       (CL2-D , 1, 410)77010027   410.P-VOL PAIR NEVER ,75010010   410 -

VG01    d0(R)       (CL1-A , 1, 410)75010010   410.S-VOL PAIR NEVER ,-----   410 -

VG01    d1(L)       (CL2-D , 1, 411)77010027   411.P-VOL PAIR NEVER ,75010010   411 -

VG01    d1(R)       (CL1-A , 1, 411)75010010   411.S-VOL PAIR NEVER ,-----   411 -

VG01    d2(L)       (CL2-D , 1, 412)77010027   412.P-VOL PAIR NEVER ,75010010   412 -

VG01    d2(R)       (CL1-A , 1, 412)75010010   412.S-VOL PAIR NEVER ,-----   412 -

VG01    d3(L)       (CL2-D , 1, 413)77010027   413.P-VOL PAIR NEVER ,75010010   413 -

VG01    d3(R)       (CL1-A , 1, 413)75010010   413.S-VOL PAIR NEVER ,-----   413 -

VG01    d4(L)       (CL2-D , 1, 414)77010027   414.P-VOL PAIR NEVER ,75010010   414 -

VG01    d4(R)       (CL1-A , 1, 414)75010010   414.S-VOL PAIR NEVER ,-----   414 -

 

As you can see, LDEVs 410-414 on an AMS1000 (SN begins with 770x) are paired with LDEVs 410-414 on an AMS500 (SN begins with 750x).

 

Here is the same pairdisplay output when HORCMPROMOD has been set on both CCI servers.

 

C:\HORCM\etc>set HORCMPROMOD=1

C:\HORCM\etc>horcmstart 410

starting HORCM inst 410

HORCM inst 410 starts successfully.

C:\HORCM\ETC>set horcminst=410

C:\HORCM\etc>pairdisplay -g VG01

Group   PairVol(L/R) (Port#,TID, LU),Seq#,LDEV#.P/S,Status,Fence,Seq#,P-LDEV# M

VG01    d0(L)       (CL2-D , 1, 410)77010027   410.P-VOL PAIR NEVER ,75010010   410 -

VG01    d0(R)       (CL1-A , 1, 410)75010010 *****.----  ---- ------,----- ----- -

VG01    d1(L)       (CL2-D , 1, 411)77010027   411.P-VOL PAIR NEVER ,75010010   411 -

VG01    d1(R)       (CL1-A , 1, 411)75010010 *****.----  ---- ------,----- ----- -

VG01    d2(L)       (CL2-D , 1, 412)77010027   412.P-VOL PAIR NEVER ,75010010   412 -

VG01    d2(R)       (CL1-A , 1, 412)75010010 *****.----  ---- ------,----- ----- -

VG01    d3(L)       (CL2-D , 1, 413)77010027   413.P-VOL PAIR NEVER ,75010010   413 -

VG01    d3(R)       (CL1-A , 1, 413)75010010 *****.----  ---- ------,----- ----- -

VG01    d4(L)       (CL2-D , 1, 414)77010027   414.P-VOL PAIR NEVER ,75010010   414 -

VG01    d4(R)       (CL1-A , 1, 414)75010010 *****.----  ---- ------,----- ----- -

 

As you can see, the local CCI instance (L) has access to all its LUNs/LDEVs. However, the DR CCI server (R) has no access to LDEVs 410-414.

 

If you attempt to do any commands such as pairsplit, the following will happen:

 

C:\HORCM\etc>pairsplit -g VG01

pairsplit: [EX_ENPERM] Permission denied with the LDEV

Refer to the command log(C:\HORCM\log410\horcc_Verdande_log.txt) for details.

 

You can use HORCMPERM*.CONF (* is the instance number) to further limit CCI access. HORCMPERM*.CONF does not give you access to LDEVs that you are not allowed to process. It removes access to LDEVs that you are allowed to process but do not wish to process.

 

How does this work? Let’s start instance 410 with HORCMPROMOD=1 and no HORCMPERM.CONF file. At the bottom of the start up log you will see this:

 

11:01:48-518b0-02092- HORCM has been set to the PROTECT MODE on 'ENV'.

...

11:01:50-e2900-01428- horcmgr : executed 'CreateProcess(raidscan.exe -pi $PhysicalDrive -find inst -z2w=NUL -z1w=NUL)',exit = 0.

 

As you can see, raidscan is called internally with an argument of  -pi $PhysicalDrive (all physical drives) – thus allowing all LUNs on this server to be accessed.

 

Now let’s stop horcm and define a file as follows:

 

C:\HORCM\etc>type C:\WINDOWS\HORCMPERM410.CONF

hd0-56

 

C:\HORCM\etc>

 

HORCMPERM410.CONF contains a list of every device that we wish to be able to access via CCI. Here is the resultant pairdisplay after a restart of horcm:

 

C:\HORCM\etc>pairdisplay -g VG01

Group   PairVol(L/R) (Port#,TID, LU),Seq#,LDEV#.P/S,Status,Fence,Seq#,P-LDEV# M

VG01    d0(L)       (CL2-D , 1, 410)77010027   410.P-VOL PAIR NEVER ,75010010   410 -

VG01    d0(R)       (CL1-A , 1, 410)75010010 *****.----  ---- ------,----- ----- -

VG01    d1(L)       (CL2-D , 1, 411)77010027   411.P-VOL PAIR NEVER ,75010010   411 -

VG01    d1(R)       (CL1-A , 1, 411)75010010 *****.----  ---- ------,----- ----- -

VG01    d2(L)       (CL2-D , 1, 412)77010027   412.P-VOL PAIR NEVER ,75010010   412 -

VG01    d2(R)       (CL1-A , 1, 412)75010010 *****.----  ---- ------,----- ----- -

VG01    d3(L)       (CL2-D , 1, 413)77010027 *****.----  ---- ------,----- ----- -

VG01    d3(R)       (CL1-A , 1, 413)75010010 *****.----  ---- ------,----- ----- -

VG01    d4(L)       (CL2-D , 1, 414)77010027 *****.----  ---- ------,----- ----- -

VG01    d4(R)       (CL1-A , 1, 414)75010010 *****.----  ---- ------,----- ----- -

 

The bold lines show what has changed. Here is the bottom of the start up log file.

 

11:08:03-7d3e8-02408- horcmgr : executed 'CreateProcess(raidscan.exe -find inst -z0r=C:\WINDOWS\horcmperm410.conf -z2w=NUL -z1w=NUL)',exit = 0.

 

As you can, raidscan has been called internally and is using the list of disks in HORCMPERM410.CONF to determine which LDEVs can be accessed. Here is some inqraid output:

 

C:\HORCM\etc>inqraid $LETALL -CLI

DEVICE_FILE     PORT    SERIAL  LDEV CTG  H/M/12  SSID R:Group PRODUCT_ID

E:\Vol13\Dsk54  CL2-D   77010027   410   -  P/s/ss  0000 A:07-00 DF600F

F:\Vol14\Dsk55  CL2-D   77010027   411   -  P/s/ss  0000 A:07-00 DF600F

Q:\Vol11\Dsk12  CL1-B     3157   169   -  P/s/ss  0000 5:02-00 DF600F

G:\Vol15\Dsk56  CL2-D   77010027   412   -  P/s/ss  0000 A:07-00 DF600F

R:\Vol12\Dsk13  CL1-B     3157   170   -  P/s/ss  0000 5:02-00 DF600F

H:\Vol16\Dsk57  CL2-D   77010027   413   -  P/s/ss  0000 A:07-00 DF600F

I:\Vol17\Dsk58  CL2-D   77010027   414   -  P/s/ss  0000 A:07-00 DF600F

J:\Vol2\Dsk0    -            -     -   -       -     -       - ST336754LC

 

The bold lines show that LDEVs 413 and 414 are Physical Drives 57 and 58 – and as we only allowed access to Physical Drives 0-56, this explains why the pairdisplay has changed.

 

Note that it is possible to “fix” this “mistake” by manual use of the raidscan command as follows:

 

C:\HORCM\etc>echo hd57-58 | raidscan -find inst

DEVICE_FILE             Group    PairVol     PORT   TARG  LUN M   SERIAL  LDEV

Harddisk57              VG01     d3          CL2-D     1  413 0  77010027   413

Harddisk57              VG01     d3          CL2-D     1  413 -  77010027   413

Harddisk58              VG01     d4          CL2-D     1  414 0  77010027   414

Harddisk58              VG01     d4          CL2-D     1  414 -  77010027   414

 

C:\HORCM\etc>pairdisplay -g VG01

Group   PairVol(L/R) (Port#,TID, LU),Seq#,LDEV#.P/S,Status,Fence,Seq#,P-LDEV# M

VG01    d0(L)       (CL2-D , 1, 410)77010027   410.P-VOL PAIR NEVER ,75010010   410 -

VG01    d0(R)       (CL1-A , 1, 410)75010010 *****.----  ---- ------,----- ----- -

VG01    d1(L)       (CL2-D , 1, 411)77010027   411.P-VOL PAIR NEVER ,75010010   411 -

VG01    d1(R)       (CL1-A , 1, 411)75010010 *****.----  ---- ------,----- ----- -

VG01    d2(L)       (CL2-D , 1, 412)77010027   412.P-VOL PAIR NEVER ,75010010   412 -

VG01    d2(R)       (CL1-A , 1, 412)75010010 *****.----  ---- ------,----- ----- -

VG01    d3(L)       (CL2-D , 1, 413)77010027   413.P-VOL PAIR NEVER ,75010010   413 -

VG01    d3(R)       (CL1-A , 1, 413)75010010 *****.----  ---- ------,----- ----- -

VG01    d4(L)       (CL2-D , 1, 414)77010027   414.P-VOL PAIR NEVER ,75010010   414 -

VG01    d4(R)       (CL1-A , 1, 414)75010010 *****.----  ---- ------,----- ----- -

 

Of course, you are unlikely to fix such an issue with raidscan. You would normally fix HORCMPERM*.CONF and then stop and restart horcm.

 

“Basic” HORCM CONF problems

When HORCM will not start, you strip the CONF file back to the bare essentials – and then change one thing at a time. Sometimes even this fails. Here are the most common reasons.

 

HORCM_MON

#ip_address            service             poll(10ms)          timeout(10ms) 

10.129.3.127           11042               1000                3000          

 

HORCM_CMD

#dev_name

# CMDDEV0 - USP600 - SN 10111 - \\.\CMD-10111-4

\\.\CMD-10111-4

 

The above file is correct – let us make some simple changes to break it.

 

1          Wrong IP Address

Change 10.129.3.127 to 10.129.2.127. A simple typo, but here is what you get:

 

Windows

[System Call Error]

SysCall: bind

WSAerr : 10049(0x00002741) (See winsock2.h)

ErrInfo: Internal Error

ErrTime: Mon Sep 08 12:43:03 2008

SrcFile: shorcmc.c

SrcLine: 2405

 

ERROR:cmr_repcre[scmcrepcr fail]

Of course, it is the “Internal Error” that confuses most people here. The real error is in the line above. It is the result of a standard call to an OS socket service, in this case Winsock.

Here is the relevant section from winsock2.h:

 

#define WSABASEERR              10000

#define WSAEADDRNOTAVAIL        (WSABASEERR+49)

 

Here is a useful web page – and some useful information:

http://www.sockets.com/err_lst1.htm

WSAEADDRNOTAVAIL (10049) Cannot assign requested address.

Berkeley description: Normally results from an attempt to create a socket with an address not on this machine.

So, the error is obvious when you know where to look. The problem is, not many people know where to look!

 

UNIX

UNIX errors messages are not only different, they are different on each platform! Here is the same error for Solaris:

 

[System Call Error]

SysCall: bind

Errorno: 126 (Cannot assign requested address)

ErrInfo: Internal Error

ErrTime: Tue Sep  2 11:45:40 2008

SrcFile: shorcmc.c

SrcLine: 2427

 

ERROR:cmr_repcre[scmcrepcr fail]

Here is a useful web page:

http://www.ioplex.com/~miallen/errcmpp.html

 

The relevant line for this error says:

 

 

AIX 4.3,5.1

HP-UX 11.22

Solaris 9,10

EADDRNOTAVAIL

68

Can't assign requested address

227

Can't assign requested address

126

Can't assign requested address

             

 

Once again, this is not the most intuitive error I have seen.

 

2          Invalid CMDDEV

Here is what you get if you change the CMDDEV to \\.\CMD-10111-42

 

12:52:23-16b48-04004- horcread():cannot open command device:\\.\CMD-10111-42

12:52:23-16b48-04004- [WARNING] This device(\\.\CMD-10111-42) is not ready for receiving a command.

12:52:23-16b48-04004- No device is ready for receiving a command in 1 line from HORC_CMD.

12:52:23-16b48-04004- ERROR:horcm_cfg_create

12:52:28-0b3b0-01136- horcmgr:Failed to connect to HORCM.

 

Here I think it is pretty obvious what the problem is J

 

3          Invalid service name

Change 11042 to “horcm42”

 

17:29:02-d59f8-02260- [horcmcfgrdf] open(conf_file) OK.

17:29:02-d59f8-02260- ERROR: A wrong ipaddr or servicename line exists in HORCM_MON: line 4

17:29:02-d59f8-02260- 10.129.3.127     horcm42         1000           3000

17:29:02-d59f8-02260- [horcmcfgrdf] close(conf_file) OK.

17:29:02-d59f8-02260- ERROR:horcm_cfg_create

 

Once again, it is more obvious what is wrong.

 

4          UDP port which is in use

Change 11042 to 1030.  This is not a “sensible” port number. It was chosen to cause an error.

 

[System Call Error]

SysCall: bind

WSAerr : 10013(0x0000271d) (See winsock2.h)

ErrInfo: Internal Error

ErrTime: Mon Sep 08 17:39:46 2008

SrcFile: shorcmc.c

SrcLine: 2405

 

ERROR:cmr_repcre[scmcrepcr fail]

 

Here is the relevant section from winsock2.h:

 

#define WSAEACCES               (WSABASEERR+13)

 

The following web page has more information:

 

http://www.sockets.com/err_lst1.htm

WSAEACCES (10013) Permission denied.

Berkeley description: An attempt was made to access a file in a way forbidden by its file access permissions.

However, in this case, that is hardly descriptive of the problem. Of course, if one had access to a command prompt, one could do this:

 

C:\HORCM\ETC>netstat -a -p UDP

 

Active Connections

  Proto  Local Address          Foreign Address        State

  UDP    ml_acer510:microsoft-ds  *:*

  UDP    ml_acer510:isakmp      *:*

  UDP    ml_acer510:1030        *:*

  …

  UDP    ml_acer510:54323       *:*

 

It is not likely that you will be this lucky J