Technical Information Document
Increasing Performance of Backups - TID10023910 (last modified 03MAR2004)
goal

Increasing Performance of Backups

fact

Novell NetWare 3.2

Novell NetWare 4.1

Novell NetWare 4.11

Novell IntraNetWare 4.11

Novell NetWare 5.0

Novell Clients

GroupWise

Computer Associates BrightStor ARCserve Backup 6.x, 6.6, 7.0, 9.0

Formerly TID 2937318

symptom

Slow backup performance with Computer Associates ARCserve

fix

Source: http://support.ca.com/techbases/as61/10015.html 

One of the most frustrating problems for any system administrator is poor performance (i.e., a relatively small amount of data transferred to tape in a given time). Throughput issues during backups manifest themselves in many ways. Backups may take too long, or may slow down during the process. The server may hang. Throughput may be good for several servers in a backup, yet backups of one or two particular remote servers may slow down or stop altogether. Sometimes it takes a bit of detective work to determine the cause. This document is intended to help identify the cause and suggest possible remedies.

It must be understood that the ARCserve software has a limited effect on the movement of data through your system. Data is channeled through many subsystems, such as your physical network devices and SCSI bus. If there are any underlying problems with these subsystems, backups and restores make those problems more likely to surface because of the heavy I/O and data streaming they involve.

Most other operations, such as normal file updates and printing, do not cause such large amounts of stress on a network.

Please note that in this document the term "host server" refers to the server on which ARCserve resides. The term "remote server" refers to any server being backed up or restored across the network.


Determining the Bottleneck:

Many factors contribute to a successful NetWare-based backup system. Some of the major components are:

  • NetWare Operating System (host or remote server)
  • Hard Disk Subsystem
  • SCSI Tape Drive
  • SCSI controller and cable (interfaces the main system board with the SCSI tape drive)
  • LAN adapter and cables (network communications)

Each of these subsystems can act as a bottleneck (constricting area) to data flow. Determining where the bottleneck lies is the top priority in increasing backup and restore performance.
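The principle that the slowest subsystem sets the pace for the whole backup can be sketched as follows. This is a hypothetical helper, not part of ARCserve or NetWare: it assumes you have already measured each subsystem separately (for example with the tests in sections B, D and F) and converted every result to the same unit, and it simply reports the slowest one.

```python
from typing import Dict


def find_bottleneck(throughput: Dict[str, float]) -> str:
    """Return the name of the subsystem with the lowest measured throughput.

    Keys are labels you choose; values must all use one unit (e.g. KB/min).
    """
    if not throughput:
        raise ValueError("no measurements supplied")
    # Data can flow no faster than the slowest subsystem it passes through.
    return min(throughput, key=throughput.get)


# Invented figures for illustration only.
measurements = {
    "hard disk subsystem": 90_000.0,
    "SCSI tape subsystem": 60_000.0,
    "LAN": 25_000.0,
}
print(find_bottleneck(measurements))  # LAN
```

Substitute your own readings; the point is only that improving any subsystem other than the slowest one will not raise overall throughput.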


Problem Scenarios:

  1. "I have not changed any software or hardware. Backups were running just fine until they started slowing down or the server started hanging."
     See sections: A, B, D, E, F, G

  2. "I have upgraded ARCserve from a previous version and now I get poor throughput or the server hangs."
  3. "I have upgraded ARCserve from 5.x to 6.x, and now I get poor throughput or the server hangs."
     See sections (scenarios 2 and 3): A, B, C, D, F

  4. "I have recently upgraded software and/or hardware and I have slow backups or the server hangs."
     See sections: B, C, D, F

  5. "This is a new install. Why do my backups take so long?"
     See sections: A, B, C, D, E, F, G

  6. "I have a remote server (a server other than the ARCserve server) that takes a very long time to back up, or hangs when it's being backed up. All other servers in that job are backing up fine."
     See sections: A, D, E, F, G

  7. "My backup performance is acceptable, but my restores are much slower."
     See sections: A, G


A) OPERATING SYSTEM SET PARAMETERS:

Packet Receive Buffers are areas of memory set aside to receive requests for service from network workstations or other NetWare servers. If the number of buffers in use has reached the configured maximum, the server must wait for buffers to be released before it can process additional requests. This can cause a bottleneck. To resolve this issue:

    • Type Load Monitor at the server console.
    • Locate the Packet Receive Buffers statistic (under "General Information").
    • If this value reaches MAXIMUM PACKET RECEIVE BUFFERS, increase the MINIMUM and MAXIMUM PACKET RECEIVE BUFFERS SET parameters. Set MINIMUM to the value currently displayed on the Monitor screen, and set MAXIMUM to twice the MINIMUM.
    • This procedure should be done on both host and remote servers.
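The sizing rule above can be expressed as a small calculation. This is a minimal sketch under stated assumptions: the two inputs are values you read off the MONITOR screen by hand, the function names are hypothetical, and the SET command text assumes the standard NetWare console syntax. Check your NetWare version's documentation for which .NCF file must contain these lines.

```python
from typing import List, Optional, Tuple


def recommend_limits(current_value: int, configured_maximum: int) -> Optional[Tuple[int, int]]:
    """Return a new (MINIMUM, MAXIMUM) pair, or None if no change is needed.

    Rule from this section: once the MONITOR value reaches the configured
    maximum, set MINIMUM to the current value and MAXIMUM to twice that.
    """
    if current_value < configured_maximum:
        return None
    return current_value, 2 * current_value


def set_commands(new_min: int, new_max: int) -> List[str]:
    # Standard NetWare SET syntax; which .NCF file must hold these lines
    # depends on the NetWare version (check its documentation).
    return [
        f"SET MINIMUM PACKET RECEIVE BUFFERS = {new_min}",
        f"SET MAXIMUM PACKET RECEIVE BUFFERS = {new_max}",
    ]


# Invented example: MONITOR shows 400 buffers and the maximum is 400.
limits = recommend_limits(current_value=400, configured_maximum=400)
if limits is not None:
    for line in set_commands(*limits):
        print(line)
```

The same doubling rule applies to the directory cache buffer settings that follow, with the parameter names changed.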

Directory Cache Buffers are areas of memory set aside to store directory entries when reading through the file system. Since ARCserve must read all directories on a server's volume during a full backup, a shortage of these buffers will cause a bottleneck. If none are available, the server waits until another process releases a buffer, and ARCserve is forced to wait as well. To resolve this issue:

    • Type Load Monitor from the server console.
    • Note the Directory Cache Buffers statistic.
    • If this value reaches MAXIMUM DIRECTORY CACHE BUFFERS, increase the MINIMUM and MAXIMUM DIRECTORY CACHE BUFFERS SET parameters. Set MINIMUM to the value currently displayed on the Monitor screen, and set MAXIMUM to twice the MINIMUM.
    • This should be done on whichever server is being backed up: host and/or remote.

NOTE: Each allocated buffer requires 4 KB. If your server is allocating 2,000 directory cache buffers, roughly 8 MB of server RAM is being used for directory caching alone. This memory cannot be released until the server is rebooted.
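The memory arithmetic in the note can be checked with a short sketch (hypothetical helper names; 1 MB is taken as 1,024 KB). It also shows how much RAM the doubling rule from this section would consume if the server grew all the way to the new maximum.

```python
KB_PER_DIRECTORY_CACHE_BUFFER = 4  # each buffer requires 4 KB (from the note above)


def directory_cache_memory_mb(buffers: int) -> float:
    """RAM in MB (1 MB = 1,024 KB) held by the given number of buffers."""
    return buffers * KB_PER_DIRECTORY_CACHE_BUFFER / 1024


def memory_at_new_maximum_mb(current_buffers: int) -> float:
    """RAM used if the server reaches MAXIMUM = 2 x MINIMUM, where MINIMUM is
    the value currently shown in MONITOR."""
    return directory_cache_memory_mb(2 * current_buffers)


print(directory_cache_memory_mb(2000))  # 7.8125, i.e. roughly 8 MB
print(memory_at_new_maximum_mb(1000))   # 7.8125 as well (2,000 buffers)
```

Comparing this figure with the server's free memory before raising the maximum is one way to follow the caution about memory in section E.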

Slow Restores

The following SET parameters affect performance when writing large amounts of data to disk:

Minimum File Delete Wait Time (set to zero for restore)

File Delete Wait Time (set to zero for restore)

Dirty Disk Cache Delay Time (increase gradually for as long as performance improves)

Maximum Concurrent Disk Cache Writes (increase gradually for as long as performance improves)


B) THE SCSI SUBSYSTEM

An ill-performing SCSI subsystem can certainly cause a bottleneck during a backup or restore operation. To rule the components of the SCSI bus in or out as contributors to poor performance, use the following testing tool built into Tapesvr.nlm:

    • Run a test backup job of the host server and note the current throughput (KB/min).
    • Type "tapesvr disable writes" at the server console while the job is actively backing up the host server. This forces ARCserve to write data to a "bit bucket" (a null device) instead of the tape drive itself. By writing to a null device, the administrator can determine whether poor performance is a direct result of a slow tape subsystem.

If the throughput remains the same, this indicates that ARCserve is delivering data as fast as the operating system is providing it and the SCSI devices are functioning properly.

If the throughput increases, the problem has been isolated to the SCSI card, cable or tape device. Focus troubleshooting on these components.

    • Convert back to normal tape drive writes prior to the job ending by typing "tapesvr enable writes"
    • If you choose to wait until the job ends, the writes will automatically be re-enabled.
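The decision described above can be sketched as a comparison of the two throughput readings. The 10 percent tolerance is an assumption added for illustration (the procedure itself only distinguishes "remains the same" from "increases"), and the function name is hypothetical.

```python
def interpret_null_write_test(normal_kb_per_min: float,
                              disabled_writes_kb_per_min: float,
                              tolerance: float = 0.10) -> str:
    """Compare throughput before and after "tapesvr disable writes".

    `tolerance` is an assumed margin (10% by default) within which the two
    readings are treated as "the same".
    """
    if normal_kb_per_min <= 0:
        raise ValueError("baseline throughput must be positive")
    gain = (disabled_writes_kb_per_min - normal_kb_per_min) / normal_kb_per_min
    if gain > tolerance:
        # Faster with the tape drive bypassed: the SCSI path is the suspect.
        return "suspect SCSI card, cable or tape device"
    # About the same: ARCserve is writing as fast as the OS supplies data.
    return "tape subsystem is not the bottleneck"


print(interpret_null_write_test(50_000, 80_000))  # suspect SCSI card, cable or tape device
print(interpret_null_write_test(50_000, 52_000))  # tape subsystem is not the bottleneck
```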

NOTE: Using this option does not save data to tape, and subsequent sessions written to that tape will be unusable. Format or erase the tape once the tests are complete. Do not select the "Delete Source" option for your test job.

Compression

Some tape drives can perform compression during backup. Compression routines may add overhead during a backup and therefore decrease performance. At the expense of tape capacity, turning off hardware compression may increase performance. The ARCserve software itself does not perform compression.

SCSI BIOS settings

The following SCSI card settings are recommended for peak performance:

SCSI DISCONNECTION should be ENABLED.

SYNCHRONOUS NEGOTIATION should be DISABLED.

PARITY should be ENABLED.

SCSI TRANSFER RATE should be set as fast as the system allows.

NOTE: These parameters may vary in name for different vendors' products.


C) ARCSERVE TAPE DRIVER ARCHITECTURE

Buffers

The key to fast, efficient tape backups is for the tape server and the SCSI adapter to maintain a steady and sufficient flow of data to the tape device. Without an even flow of data, tape drive latency (periods in which the tape drive must wait for ARCserve to provide data) can occur. This causes the tape drive to "shoe-shine" the tape across the write head, constantly repositioning itself to find the end of the last block of written data. This slows down the backup process.

ARCserve places data into designated I/O buffers for the tape drive to "pick up". When the tape engine is ready to write data, it takes as many full I/O buffers as possible and writes them to tape. If only a small number of buffers are filled at a time, the tape engine must make more requests than necessary, making the operation inefficient and slow. In other words, it is more efficient to make one trip to the store and pick up 10 items than to make 10 trips and pick up 1 item at a time.

ARCserve's tape server can accommodate up to 32 buffers of 64 K each. Setting the number of buffers to a value that exceeds the amount of data the drive can consume in a single cycle can improve performance:

    • Open the ARCserve Tapesvr.cfg file located in the \ARCserve.6\nlm subdirectory.
    • Locate the [NLMx] section that corresponds to your tape drive.
    • Within that section add the following line:

BUFFER=20 (Default is 6)

NOTE: If there is no [NLMx] section in this file, the statement can be placed on the "NLMx=" line (e.g., NLM1=STANDARD GROUP-ARCSERVE BUFFER=20).

This change can be verified only during an active job, by viewing the top of the Tapesvr screen. When BUFFER is set to 20, "I/O Buffers:" should read 1,310,720 bytes (20 buffers of 64 K, or 65,536 bytes, each).
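The expected "I/O Buffers:" figure is simply the BUFFER value multiplied by 64 K (65,536 bytes). A minimal sketch for checking the Tapesvr screen, with a hypothetical function name; the lower bound of 1 is an assumption, since this document only states the maximum of 32:

```python
BUFFER_BYTES = 64 * 1024  # each tape server buffer is 64 K (65,536 bytes)
MAX_BUFFERS = 32          # tape server accepts at most 32 buffers


def expected_io_buffers_bytes(buffer_setting: int) -> int:
    """Value the Tapesvr screen should show after "I/O Buffers:"."""
    # Lower bound of 1 is an assumption; only the maximum is documented here.
    if not 1 <= buffer_setting <= MAX_BUFFERS:
        raise ValueError(f"BUFFER must be between 1 and {MAX_BUFFERS}")
    return buffer_setting * BUFFER_BYTES


print(f"{expected_io_buffers_bytes(20):,}")  # 1,310,720
print(f"{expected_io_buffers_bytes(6):,}")   # 393,216 (default BUFFER=6)
```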

NOTE: Setting BUFFER to a value greater than needed may actually decrease performance.

Shots

The SHOTS parameter is simply a throttle by which the I/O throughput of the tape server can be increased or decreased. ARCserve delivers data to a tape drive in a number of shots, or chunks, each equivalent to 16 K of data. By default, ARCserve delivers 14 shots at a time, or 224 K per delivery. The maximum value is 32. Increase this value beyond the default of 14 only if your tape drive uses a 64 K blocksize; otherwise you may compromise data integrity.
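The shot arithmetic and the blocksize caution can be sketched as below. The function names and the `blocksize_kb` parameter are hypothetical; you would take the blocksize from the tape drive's documentation. The lower bound of 1 corresponds to the RSHOTS=1/WSHOTS=1 setting that this section describes as the lowest possible value.

```python
SHOT_KB = 16        # each shot carries 16 K of data
DEFAULT_SHOTS = 14  # ARCserve default
MAX_SHOTS = 32      # largest accepted value


def shots_allowed(shots: int, blocksize_kb: int) -> bool:
    """True if an xSHOTS value is in range and respects the 64 K blocksize rule."""
    if not 1 <= shots <= MAX_SHOTS:
        return False
    # Going past the default is only advised for drives with a 64 K blocksize.
    return shots <= DEFAULT_SHOTS or blocksize_kb == 64


def chunk_size_kb(shots: int) -> int:
    """Amount of data delivered per request for a given number of shots."""
    return shots * SHOT_KB


print(chunk_size_kb(DEFAULT_SHOTS))  # 224
print(shots_allowed(20, 32))         # False: 20 shots needs a 64 K blocksize
```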

The relationship between buffers and shots may have to be adjusted to help stream data more efficiently.

    • Open the ARCserve tapesvr.cfg file located in the \ARCserve.6\nlm subdirectory.
    • Create a section called [CONFIG] if one does not already exist.
    • Within that section add the following lines:

RSHOTS=1 (Default is 14)

WSHOTS=1 (Default is 14)

This reduces the number of shots to the lowest (and slowest) possible setting. If this helps the situation, you can then determine the optimal setting for your environment: raise the RSHOTS and WSHOTS parameters in small increments until a further increase is no longer effective, then bring them back down to the previous value. You may also try the inverse: raise the xSHOTS parameters to their maximum and lower the values until you find your peak performance.
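The step-up-then-back-off procedure is a simple hill climb. The sketch below assumes a hypothetical `measure` callback that you supply: it would apply an RSHOTS/WSHOTS value, run a test job and return the observed KB/min. The default upper limit is 14 because of the 64 K blocksize caution above; raise it to 32 only for a drive that uses a 64 K blocksize.

```python
from typing import Callable


def tune_shots(measure: Callable[[int], float], start: int = 1,
               step: int = 1, limit: int = 14) -> int:
    """Raise xSHOTS by `step` until throughput stops improving.

    Returns the last value that still improved throughput (the "previous
    value" once a further increase is no longer effective).
    """
    best_value = start
    best_rate = measure(start)
    value = start + step
    while value <= limit:
        rate = measure(value)
        if rate <= best_rate:
            break  # no longer effective: keep the previous value
        best_value, best_rate = value, rate
        value += step
    return best_value


# Synthetic throughput curve for illustration only: peaks at 8 shots.
def fake_measure(shots: int) -> float:
    return 50_000 - 400 * (shots - 8) ** 2


print(tune_shots(fake_measure))  # 8
```

The inverse approach described above would run the same loop downward from the maximum. Because real test jobs vary from run to run, repeating each measurement before accepting a value is a sensible precaution.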

Keep in mind the relationship between buffers and shots. Each buffer is 64 K in size, while shots are delivered in 16 K units, so ARCserve must deliver 4 shots to send the data from one buffer. Setting xSHOTS to more than 4 times the BUFFER value is therefore unnecessary.
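The buffer/shot relationship reduces to one line of arithmetic: a 64 K buffer holds four 16 K shots, so the useful ceiling for xSHOTS is 4 x BUFFER, capped at the maximum of 32. A minimal sketch with a hypothetical function name:

```python
SHOTS_PER_BUFFER = 64 // 16  # a 64 K buffer holds four 16 K shots
MAX_SHOTS = 32               # largest xSHOTS value accepted


def max_useful_shots(buffers: int) -> int:
    """Highest xSHOTS value worth setting for a given BUFFER value."""
    return min(MAX_SHOTS, SHOTS_PER_BUFFER * buffers)


print(max_useful_shots(2))   # 8
print(max_useful_shots(6))   # 24 (default BUFFER=6)
print(max_useful_shots(20))  # 32 (4 x 20 = 80, capped at 32)
```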

Example shown below:

[CONFIG]
RSHOTS=1
WSHOTS=1

Blocksize

The optimum blocksize for a tape drive is determined by the manufacturer and our own in-house testing, and should be left at its default. It is possible to change the blocksize; however, it is usually unnecessary to do so. If you feel the blocksize needs to be changed, do so only with the guidance of a qualified technician from either Computer Associates or the tape drive manufacturer. Setting a tape drive to an untested blocksize may cause data integrity problems.


D) HARD DISK SUBSYSTEM

The components of the hard disk subsystem are the hard drives and their controllers. When data is read from the file system, the OS cache is checked first. If the data is not there, the OS must request it from the hard drive itself. This process takes time and can also be a bottleneck. To test the speed of this process, load TESTNLM.nlm, located in the ARCserve.6\Utility directory, on the host server and do the following:

    • Log in to the host server in the appropriate login mode.
    • Answer "N" to the question "Log all file and directory names?"
    • Enter a volume name and path at the prompt "Enter starting directory:" (e.g., SYS: or SYS:\System).
    • For the test number, use one of the following:
        11: Compressed Read Test (files are not decompressed during the read, just as with ARCserve)
        12: SMS Read Test (uses the NetWare TSAs during the read)
    • When the test is complete, it reports the statistics of the read in seconds and kilobytes. Convert these numbers to MB/min with this formula: (Total kilobytes read / 1024) / (Total time in seconds / 60).

NOTE: "Failed to open" messages simply mean the file is open and cannot be read. Disregard these.

This test should be performed so that it simulates the backup scenario. If you are testing backup of the host server, run TESTNLM.nlm on the host server and log in to the host server. If you are testing remote backups, run TESTNLM.nlm on the host server and log in to the remote server. To eliminate the LAN from the test, copy the NLM to the remote server and run it locally there.

NOTE: Do not expect a real backup to reach the speed achieved in this test. The test only shows whether the cause of slow throughput can be isolated to the disk subsystem.


E) NUMBER OF FILES IN A DIRECTORY and CACHE BUFFERS:

The number of files in any given directory directly affects read and write performance. Backups may begin with good throughput, and then slow down or even stop altogether when accessing a directory with a large number of files (over 10,000). However, there will be no errors in the activity log to indicate any kind of problem.
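To find directories that may trigger this slowdown before a backup reaches them, you could count the files in each directory. The sketch below is an illustration under assumptions, not a NetWare or ARCserve tool: it assumes the volume is visible from a workstation as an ordinary path (for example a mapped drive; "N:\\" is a hypothetical drive letter), uses Python's standard `os.walk`, and counts only files, matching the 10,000-file threshold above.

```python
import os
from typing import Iterator, Tuple


def large_directories(root: str, threshold: int = 10_000) -> Iterator[Tuple[str, int]]:
    """Yield (directory, file_count) for each directory under `root`
    that contains more than `threshold` files."""
    # os.walk silently skips directories it cannot read (onerror=None).
    for dirpath, _dirnames, filenames in os.walk(root):
        if len(filenames) > threshold:
            yield dirpath, len(filenames)


if __name__ == "__main__":
    # Hypothetical mapped drive pointing at the volume being backed up.
    for path, count in large_directories("N:\\"):
        print(f"{count:>8}  {path}")
```

Walking a large volume over the network is itself heavy I/O, so run a scan like this outside the backup window.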

The NetWare SET parameters that influence I/O (input/output) efficiency are listed below with their default values:

Minimum Directory Cache Buffers = 20

Maximum Directory Cache Buffers = 500

Directory Cache Allocation Wait Time = 2.2 seconds

Directory Cache Buffer Nonreferenced Delay = 5.5 seconds

The above parameters can be changed to optimize the environment, but care should be taken. Existing memory consumption may dictate that a slow server needs to remain slow. Simply making changes to the server without verifying that it has enough memory to handle the change may lead to instability and data loss. Checking the existing amount of server memory and how it is allocated is a mandatory step when debugging performance issues.

    • At the time of this writing, Novell has several documents available that deal directly with an excessive number of files in a directory. These documents may help in understanding how NetWare uses and accesses files and the repercussions of large directories on the file system; they also suggest SET parameter values. Three of these documents can be found on Novell's website under TID numbers 2914774, 10021744, and 10023910.


F) COMMUNICATIONS

A poorly performing network can certainly be a bottleneck to data flow during a remote backup. Here are some things to investigate:

    • If a remote backup that had been working fine suddenly runs into slow throughput even though no system changes have been made (e.g., no new software installed, no configuration modifications), a hardware issue is the most likely cause. Investigate NICs (network interface cards), cabling, routers, hubs and switches. If spare parts are available on-site, try swapping out these components to locate the problem.
    • If you are using full duplex on a NIC that supports both half and full duplex, try switching to half duplex. This allows only one-way data flow through the card and eliminates interruptions from oncoming traffic. Check the router to make certain that it has been set to half duplex as well.
    • Any time cables have to go through a hub or a switch, the likelihood of problems increases. When troubleshooting, you may wish to connect the host server directly to the remote server with a crossover cable. This will help confirm or rule out a hub or switch as the source of the problem.

Check NetWare NLMs

    • Type Modules at the server console.
    • Note the dates and versions of the following NLMs:

TLI.nlm, CLIB.nlm, STREAMS.nlm, SPXS.nlm, *.LAN

    • Compare the dates and versions against the current versions available from Novell's website and/or the NIC manufacturer.
    • Apply all appropriate patches and upgrades available at Novell's website.
    • Apply the most current LAN drivers from the NIC manufacturer's website.

Check NetWare's LAN Statistics

Refer to the documentation that comes with the board to understand custom statistics for LAN adapters. Then, do the following:

    • Type Load Monitor at the server console
    • Choose LAN/WAN information
    • Check the No ECB available count statistic.

In general, if the value is low (under 100), the adapter is working correctly. If the value is high (over 100), this may indicate a problem with the board itself or with the amount of system resources allocated for its use (see MAXIMUM PACKET RECEIVE BUFFERS in section A).

NOTE: Other statistics that may indicate a network or NIC problem are "Send packet too big count", "Receive packet overflow count", "Receive packet too big count", "Send packet misc errors", and "Receive packet misc errors". Watch these values to see whether they are constantly high or growing.
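The rules of thumb above can be applied to two MONITOR snapshots taken some time apart. This is a hypothetical sketch: the counter values are typed in by hand from the LAN/WAN information screen, the names match the statistics listed above, and because no numeric threshold is given for "constantly high", it flags only the 100 limit for No ECB available count and any growth in the other counters.

```python
from typing import Dict, List

NO_ECB_LIMIT = 100  # rule of thumb: over 100 suggests a board or buffer problem

WATCHED_COUNTERS = (
    "Send packet too big count",
    "Receive packet overflow count",
    "Receive packet too big count",
    "Send packet misc errors",
    "Receive packet misc errors",
)


def lan_warnings(earlier: Dict[str, int], later: Dict[str, int]) -> List[str]:
    """Compare two snapshots of LAN statistics and describe anything suspicious."""
    warnings = []
    if later.get("No ECB available count", 0) > NO_ECB_LIMIT:
        warnings.append("No ECB available count is over 100")
    for name in WATCHED_COUNTERS:
        if later.get(name, 0) > earlier.get(name, 0):
            warnings.append(f"{name} is growing")
    return warnings


# Invented readings for illustration only.
before = {"No ECB available count": 40, "Receive packet misc errors": 3}
after = {"No ECB available count": 250, "Receive packet misc errors": 7}
for line in lan_warnings(before, after):
    print(line)
```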


G) ARCSERVE HIGH PERFORMANCE PUSH AGENT

The Push Agent is an optional ARCserve product designed to address remote server backup performance issues through communication reduction. When two devices are performing a file system copy or backup procedure, they must agree on packets sent and packets received. When files are read and transferred on an individual basis, an acknowledgment must be sent after each read. This causes latency between file transfers.

The High Performance Push Agent alleviates this behavior by requesting the entire file system from a smart agent on the target device. This agent prepares its file system for delivery and then streams it to the backup server. This single data stream can increase backup performance by 10-50%.

The Push Agent can be used with Data Multiplexing / File Interleaving. Multiplexing / Interleaving allows ARCserve to obtain files from multiple remote devices simultaneously. ARCserve's job processing engine takes these multiplexed / interleaved data streams and sorts them on tape into separate sessions. Up to 8 remote devices can be multiplexed / interleaved at a time. However, bench testing has found that performance stops improving, and actually begins to decrease, once 3 to 4 or more nodes are multiplexed / interleaved.

NOTE: The Push Agent will be more likely to improve performance if the LAN is not your performance bottleneck. Verify that the network can handle the increased bandwidth necessary for Push Agent technology. Networks with higher packet latency periods will see a larger relative increase in performance.

NOTE: Be aware that Data Multiplexing / File Interleaving will cause restores to be slower than with non-multiplexed / interleaved backups.

NOTE: Backing up across a WAN link is not recommended. If you must, it is best to use a Push Agent, if for no other reason than that it allows you to use IP, which crosses a WAN link much faster than an SPX connection does.

Document Title: Increasing Performance of Backups
Document ID: 10023910
Solution ID: 1.0.44638402.2459992
Creation Date: 23DEC1999
Modified Date: 03MAR2004
Novell Product Class: NetWare

Disclaimer

The Origin of this information may be internal or external to Novell. Novell makes all reasonable efforts to verify this information. However, the information provided in this document is for your information only. Novell makes no explicit or implied claims to the validity of this information.

Any trademarks referenced in this document are the property of their respective owners. Consult your product manuals for complete trademark information.

