Midrange Performance Data Collection

AMS/WMS/HUS1xx

Please collect the following information for performance analysis by the Hitachi Global Support Center (GSC). If you need help, ask your local Hitachi Hardware Engineer (CE) or Hitachi Systems Engineer (SE). Collect all the information in the order shown. Ideally, collect all the data while the problem is occurring.

Brief Description and Timeline of the Performance Problem
Data Collection from Performance Monitor
Simple Trace
Time Difference Between Storage and Server(s)
GetConfig from Affected Server(s)
DLMgetras (if HDLM is used)
Remote Copy Information (if applicable)
OS Performance Information (useful)
SAN Config Information (useful)

Once collected, upload the data collection to TUF.

Brief Description and Timeline of the Performance Problem

Description. This is very important. Please remember that we do not know your server naming conventions or which servers are connected to which ports. Please answer all these questions:

What are the concerns?
What server(s) are affected (single or multiple)? (Include Host Names)
Which operating system (OS) and version is each affected server running?
If the OS is Windows, specify the LUN-to-Drive Letter relationship.
Which Ports/Host Storage Domains/LUNs/Array Groups/LDEVs are having performance issues?
What types of applications are affected?
Is replication in use (TC, TCED, COW, SI)?
Is this array externalized (used as external storage)?
How is this problem affecting production?
What type of array disk is being affected by this performance problem? (SAS, SATA, Fibre Channel, SSD)

Timeline. Performance is sometimes good at certain times of the day and bad at others. We need to know the exact times at which it was good as well as the exact times at which it was bad. Please answer all these questions:

Does the problem only occur at certain times or days of the week/month?
At what time(s) does the problem start?
At what time(s) does the problem go away (or is it ongoing)?
What is the exact timeline of all events before, during and after the problem?

Please be as specific as possible.

Data Collection from Performance Monitor

You must supply the Performance Monitor data using the following procedures:

Performance Monitor Data Collection - GUI or
Performance Monitor Data Collection - CLI
The minimum requirement is one hour of data collected while the problem is happening.

Performance Monitor is a standard feature of the AMS/WMS/HUS1xx (DF) arrays. It does not work like Performance Monitor on RAID arrays. With RAID Performance Monitor, you set it to collect data all the time and then export what you want. With DF, you collect the data when you need it.

If data collection fails when you try to collect at 1-minute intervals, the DF subsystem may be under stress. If this happens, collect data at 5-minute intervals instead. For more details, see this topic.
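
As a rough planning aid, the sketch below works out how many samples a collection run produces at each interval for the one-hour minimum stated above. The function name samples_in_window is hypothetical and is not part of any Hitachi tool; it is plain arithmetic only.

```python
import math

# Minimum collection window required by this procedure: 1 hour.
MIN_WINDOW_MINUTES = 60


def samples_in_window(window_minutes: int, interval_minutes: int) -> int:
    """Return how many samples are needed to cover the window.

    Hypothetical helper for planning only; rounds up so that the
    whole window is covered when it is not a multiple of the interval.
    """
    if window_minutes <= 0 or interval_minutes <= 0:
        raise ValueError("window and interval must be positive")
    return math.ceil(window_minutes / interval_minutes)


if __name__ == "__main__":
    for interval in (1, 5):
        count = samples_in_window(MIN_WINDOW_MINUTES, interval)
        print(f"{interval}-minute interval: {count} samples for 1 hour")
```

At 5-minute intervals the same hour yields far fewer data points, so a longer collection window may be worth considering when the fallback interval is used.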

Simple Trace

You must supply the following:

Simple Trace from Storage Array

For AMS/WMS/HUS1xx, this is provided as an option on the Web GUI. The trace must be taken within 24 hours of the problem.

Ideally, you should collect the Simple Trace immediately after collecting the Performance Monitor data - see above.

Time Difference Between Storage and Server(s)

Supply the following:

Time difference between the DF internal clock and that of the host with the performance problem. The DF time can be obtained from the SNM management utility or by using the Web GUI.

Unless the storage array is set to use Network Time Protocol (NTP), its internal clock is most likely not set to "server time".
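
The time difference matters because timestamps reported from the host have to be matched against timestamps in the array data. A minimal sketch of that arithmetic follows, assuming both clocks are read at the same moment; the function names and the sample readings are hypothetical and serve only as an illustration.

```python
from datetime import datetime, timedelta


def clock_offset(array_time: datetime, host_time: datetime) -> timedelta:
    """Offset to add to a host timestamp to get the array timestamp.

    Both readings are assumed to have been taken at the same moment.
    """
    return array_time - host_time


def host_to_array_time(host_event: datetime, offset: timedelta) -> datetime:
    """Translate an event time seen on the host into array clock time."""
    return host_event + offset


if __name__ == "__main__":
    # Hypothetical readings noted at the same moment.
    array_now = datetime(2024, 3, 1, 14, 7, 30)  # from SNM or the Web GUI
    host_now = datetime(2024, 3, 1, 14, 2, 0)    # from the affected host

    offset = clock_offset(array_now, host_now)
    print("Array is ahead of host by", offset.total_seconds(), "seconds")

    # A slowdown observed on the host at 13:45:00 host time.
    slow_on_host = datetime(2024, 3, 1, 13, 45, 0)
    print("Look in the array data around", host_to_array_time(slow_on_host, offset))
```

Reporting the offset with its sign (which clock is ahead) avoids ambiguity when the timeline is compared with the Performance Monitor data and the Simple Trace.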

GetConfig from Affected Server(s)

Supply a GetConfig for the server(s) in question:

Download and run the latest GetConfig for your server(s).

In many cases, the performance problem is caused by the server configuration, for example the I/O queue depth. We need the GetConfig to see (a) how LUNs are allocated and (b) how the server has been set up for performance.

DLMgetras (if HDLM is used)

For UNIX, you do not need to run this command separately, because the GetConfig script runs it if applicable.

For Windows, run this command only if HDLM is not installed in the default program location (C:\Program Files).

Remote Copy Information (if applicable)

When remote copy is being used, provide the data below:

Model and serial numbers of the Primary and Remote subsystems.
Synchronized Simple Traces from the Primary and Remote subsystems, taken while the performance problem is in progress.
A detailed description of exactly what was happening when the traces were taken. For example, was there large TC Initial Copy or Resync activity being performed?
The time difference between the clocks on the Primary and Remote subsystems.
Synchronized Performance Monitor data from both the Primary and Remote subsystems.
A diagram showing connectivity between the Primary and Remote subsystems, including switches, port numbers, links and distance.
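
One way to sanity-check that the Primary and Remote collections cover the same period is sketched below. It assumes "synchronized" means the two collection windows overlap once the clock difference between the subsystems is taken into account; that reading, the function name common_window and the sample times are all assumptions made for illustration.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple

Window = Tuple[datetime, datetime]  # (start, end) of a collection run


def common_window(primary: Window, remote: Window,
                  remote_minus_primary: timedelta) -> Optional[Window]:
    """Return the period covered by both collections, on the Primary clock.

    remote_minus_primary is how far the Remote clock is ahead of the
    Primary clock (negative if it is behind). Returns None when the two
    collections do not overlap at all.
    """
    # Convert the Remote window to Primary clock time.
    remote_start = remote[0] - remote_minus_primary
    remote_end = remote[1] - remote_minus_primary
    start = max(primary[0], remote_start)
    end = min(primary[1], remote_end)
    return (start, end) if start < end else None


if __name__ == "__main__":
    # Hypothetical collection windows, each in its own subsystem's clock.
    primary = (datetime(2024, 3, 1, 10, 0), datetime(2024, 3, 1, 11, 0))
    remote = (datetime(2024, 3, 1, 10, 10), datetime(2024, 3, 1, 11, 10))
    skew = timedelta(minutes=4)  # Remote clock is 4 minutes ahead

    shared = common_window(primary, remote, skew)
    if shared is None:
        print("The collections do not overlap; collect again.")
    else:
        print("Shared coverage:", shared[0], "to", shared[1])
```

If the shared coverage is much shorter than the problem period, it is probably worth repeating the collection on both subsystems at the same time.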