What Information Do I Need to Gather to Allow GSC to Diagnose an HNAS Performance Problem

Question

What Information Do I Need to Gather to Allow GSC to Diagnose an HNAS "Performance Problem"?


Environment

  • Hitachi Network Attached Storage (HNAS)
    • 3100/3200
    • 3080/3090
    • 4000 series

Answer

A standard set of HNAS diagnostics taken after the event is usually insufficient to diagnose the cause of an HNAS "performance problem."  If an HNAS system is having a performance problem, the high-level process is as follows:

  • Understand what options are available for getting help with performance problems.
  • Verify that the system in question is not impacted by any of the common causes of performance problems.
  • Collect performance data from the Hitachi storage array.
  • Gather a performance-info-report (PIR) from the HNAS for the impacted file system(s).
  • Whilst the PIR is running, gather a short (~30 second) packet capture from one of the impacted clients.
  • Gather the additional required data.
  • Answer the performance problem questionnaire to provide the context for the problem.

Please note: It is important that the performance data described below, such as the PIR and packet capture, is collected while the performance problem is happening.  Data collected outside of that window will provide no insight into what is causing the problem.

What help can be provided for my performance problem?

In reality, there is no such thing as a "performance problem"; there are only "lack of capacity" problems.  As such, there are three possible approaches that can be taken to address these:

  • Perform a "performance tuning exercise,"
  • Perform a "system sizing exercise,"
  • Request a product enhancement, i.e. a change to the hardware or software design of the product that increases its capacity without requiring any additional hardware.

Performance tuning

A performance tuning exercise is something that can be led by Hitachi GSC.  There are three things to bear in mind before going down this path:

  1. The purpose of a performance tuning exercise is simply to determine whether there are any changes that could be made to the environment to increase the capacity available from a given system.  Whether or not any such changes are considered feasible is a customer business decision.
  2. Since capacity is heavily dependent on the specific workload, GSC are unable to advise whether any particular load is within the expected capacity range of any particular system configuration (both hardware and system settings/configuration).
  3. As a result, GSC are unable to guarantee that an acceptable level of performance can be achieved for any particular workload on any particular hardware configuration (or indeed whether any such configuration is even possible).

The way this process proceeds is as follows:

  1. Performance data for the HNAS and associated storage is captured that covers a time period when the perceived problem is occurring.
  2. Hitachi GSC evaluate this data to see whether any recommendations can be made for changes to the system which may allow additional load to be sustained, or improved performance to be achieved, on the existing hardware.

The changes recommended may be:

  • Simple configuration changes to the file/block storage system (usually bringing the system into a "best practice" state).  It is rare, however, that these changes have a significant impact, and we would generally expect systems to be set up to best practice at install time.
  • Changes to the way the file/block storage is used, to achieve more efficient use of the resources that are available.  These changes will usually require some data migration from the sub-optimal configuration to the more efficient one.
  • Suggested changes to client behavior that would make more efficient use of the available storage resources.

We do not, however, want to propose too many changes at once, for the following reasons:

  • Some changes may actually cause the available capacity to decrease due to unanticipated workload-related factors,
  • It becomes difficult to tell which changes were useful and which were counter-productive and need to be reversed.

As a result, a performance tuning exercise is usually iterative in nature:

  • Data is gathered and analyzed,
  • Suggested changes are proposed and applied,
  • New data is gathered and analyzed to see whether additional changes may be worthwhile.

The performance tuning loop finishes when:

  1. An acceptable level of performance is achieved,
  2. The customer no longer wishes to apply suggested changes,
  3. There are no further suggested changes.

In the case of 2 or 3 this may mean that a system sizing exercise or product enhancement request is then required.

System sizing

System sizing would be carried out by your Hitachi account team and/or Hitachi GSS, and there are two aspects to this:

  1. Understanding what load the system is required to sustain, including peak loads and a margin for growth,
  2. Determining which system configuration(s) would be suitable for handling those loads.

Once this has been carried out and the necessary additional capacity has been provisioned, you can plan and implement a migration from the old capacity to the new capacity.  This approach is particularly suitable for customers who want "one set of changes that is going to resolve my problem."

Product enhancements

Under certain circumstances it may be possible to "tune" the hardware/software of a product so that it can accommodate a greater load without requiring any additional hardware resources.  The process for asking whether any such "tuning" may be possible is to raise a Product Enhancement Request (PER).

An example of a product enhancement might be changing the system so that it can handle additional load before it exhausts the available CPU capacity.  Within Hitachi, product enhancements are considered a sales rather than a support function and should be requested through your account team.

Suggested product enhancements are reviewed by product management and, if deemed reasonable, are added to the engineering backlog for possible scheduling and implementation.  In general, the lead time for product enhancement requests is quite long, and requests may be rejected if they are not considered suitable.

Verify system not impacted by common causes of performance problems

Common causes of HNAS performance problems are documented in the article What Are Common Causes of Performance Problems in HNAS Systems?  Before escalating to Hitachi, you should identify and resolve any of the listed problems.

Storage Performance Data Collection

Kick off performance data collection from the Hitachi storage array for 60 minutes at 1-minute intervals (below are the links with detailed instructions for Hitachi Midrange and Enterprise Storage):

DF subsystems support Open Systems applications.  Performance Monitor is not constantly running on DF subsystems, so it needs to be started for this collection.

Below are links for collecting the required data:

HNAS Performance Data Collection

While the above storage data collection is happening, kick off a 10-minute Performance Information Report (PIR) on the HNAS cluster, specifying the file system that is currently not performing as expected:

See also How To: Collect a PIR if HNAS is Not Configured to Send via Email.
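
As an illustration only, a PIR focused on a single file system might be started from the HNAS CLI along the following lines; the file system name "fs1" is a placeholder, and the exact switches available on your firmware release are described in the performance-info-report man page:

  performance-info-report -f fs1

A default-length report covers 10 minutes of data and takes approximately 13 minutes to complete.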

Packet Capture on Impacted Client

Whilst the PIR above is being collected, please gather a short (~30 second) packet capture on an impacted client as per the guidance in How to Collect Packet Captures for Troubleshooting HNAS Problems.
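
For example, on a Linux client a suitable capture could be taken with tcpdump along these lines; the interface name, the EVS IP address and the output file name are placeholders, and on Windows a Wireshark/dumpcap capture serves the same purpose:

  tcpdump -i eth0 -s 0 -w hnas_client.pcap host 192.0.2.10

Start the capture, reproduce the slow operation for roughly 30 seconds, then stop the capture with Ctrl-C and keep the resulting .pcap file for upload.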

Please also provide:

  • The IP address of the client the capture was taken from,
  • The IP addresses of the EVS(es) that the client should be accessing,
  • A description of what operations were being undertaken on the client whilst the packet capture was being gathered.

Additional Required Data Collection

After the PFM (storage) and PIR (HNAS) data collections have completed, gather a simple trace (Midrange) or dump (Enterprise) from the array, gather HNAS diagnostics, and upload everything collected to TUF:

Performance Problem Questionnaire

Once the performance data collection is underway, please review and provide answers to the HNAS Performance Issues Questionnaire:

Additional Notes

Which file system should I focus the PIR on?

If the file system to focus on is not obvious from the context of the problem, try to "focus" the PIR (-f switch) on the busiest file system on the impacted EVS or storage pool (span).  You can determine the busy file systems using the process described in:

How To: Determine Busy File Systems (HNAS)

How can I identify the busiest clients using the HNAS?

Please refer to the knowledgebase article:

What if my performance problem is intermittent?

If your performance problem is intermittent then we recommend using the HNAS crontab CLI command to start a PIR on the impacted file system at 00, 15, 30 and 45 minutes past each hour.  You can then collect the PIRs and, when the problem recurs, send the PIR covering the time period in question to GSC.  A default-length "10 minute" PIR takes approximately 13 minutes to run, so starting one every 15 minutes means the previous one will have completed whilst still giving good coverage.
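
As a sketch only, and assuming the HNAS crontab command accepts standard five-field cron scheduling (check the crontab man page on your firmware release for the exact syntax), the schedule and command would conceptually look like the following, with "fs1" as a placeholder for the impacted file system:

  0,15,30,45 * * * *   performance-info-report -f fs1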

If you are using HNAS firmware 12.5 or later then you may also be able to use "continuous PIR" - see the performance-info-report HNAS CLI command man page for additional details.

What if a particular HNAS event seems to mark the start of the performance problem?

If a particular HNAS event seems to mark the start of the performance problem then it may be useful to trigger the start of a PIR when that event occurs.  The procedure for doing that is documented in:

Performance Data Collection

 
