Troubleshooting a NonStop DOM application or component involves evaluating information from the following sources:
Locating errors in any distributed object application can be difficult for three reasons:
On Tandem systems, event messages and configuration files for different products can help you identify configuration problems. Several products, including TCP/IP, also support tracing. In some cases, an operator must work with other operators on remote systems to discover configuration mismatches.
Unexpected conditions detected by NonStop DOM are logged through the Tandem Event Management Service (EMS). Checking the EMS messages is one of the first things to do when troubleshooting a problem. Often the source of a problem can be determined readily from the information in the log.
For background information and usage questions about EMS, refer to the EMS FastStart Manual, the EMS Manual, and the EMS Reference Summary.
By default, NonStop DOM EMS messages are sent to the system collector $0. To specify a separate collector for NonStop DOM messages, you must first start the new collector and then set the variable MY_COLLECTOR in the env.sh file to the new collector name.
To start a separate collector for NonStop DOM enter the following commands at the TACL prompt:
> ADD DEFINE =_EMS_TEMPLATES, FILE $SYSTEM.ZDOMSD20.newnres
> EMSACOLL /NAME $xxx, NOWAIT/ BLOCKING OFF
The variable $xxx is the process name for the collector that you set with variable MY_COLLECTOR in the env.sh file.
Note: If you installed the Software Development Kit (SDK) version of NonStop DOM, your files are installed into the subdirectory ZDOMSD20. If you are using the runtime version of NonStop DOM, your NonStop DOM fileset is located in the subdirectory ZDOMRT20.
To view the NonStop DOM EMS messages, enter the following commands at the TACL prompt:
> ADD DEFINE =_EMS_TEMPLATES, FILE $SYSTEM.ZDOMSD20.newnres
> EMSDIST TYPE PRINTING, COLLECTOR $xxx
The variable $xxx is the process name for the collector that you set with variable MY_COLLECTOR in the env.sh file.
Let's look at an example entry from the EMS log:
98-03-17 09:05:40 \KT22.3,330 TANDEM.NSDOM.D44 001617
GIOP Proxy profiles exhausted. Severity:
Error, Component: Application, PName:
\KT22.$:3:330, PId: 1037303857, TId: 1,
File: proxy.cpp, Line: 531
The date and time of the event are given first. Because the log may contain many messages from a number of different processes, examining the EMS messages that were logged during a particular interval is a useful strategy. The second line contains the text of the error message, in this case "GIOP Proxy profiles exhausted." This text gives information that can help you troubleshoot the problem.
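When the log is large, a small filter can help isolate the entries for one interval. The following is a minimal sketch, not part of NonStop DOM or EMS: it assumes the EMSDIST output has been saved to a text file and that every entry begins with a header line in the YY-MM-DD HH:MM:SS form seen in the example entry. The function name ems_between and the file name ems.log are hypothetical.

```shell
# Hypothetical helper: print only the EMS entries logged between two timestamps.
# Assumption: each entry starts with a "YY-MM-DD HH:MM:SS" header line and any
# following lines belong to that entry until the next header line.
ems_between() {
    # $1 = start timestamp, $2 = end timestamp (both inclusive); log on stdin
    awk -v lo="$1" -v hi="$2" '
        /^[0-9][0-9]-[0-9][0-9]-[0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9] / {
            ts = $1 " " $2                   # date and time of this entry
            keep = (ts >= lo && ts <= hi)    # string comparison of timestamps
        }
        keep { print }                       # header and continuation lines
    '
}

# Example use (ems.log is a hypothetical saved copy of the EMSDIST output):
#   ems_between '98-03-17 09:00:00' '98-03-17 09:30:00' < ems.log
```

The timestamps are compared as plain strings, which orders them correctly only while all entries fall within the same century of two-digit years.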
Distributed applications typically involve interactions among a number of processes. When problems occur, it is helpful to gain insight into these dynamic interactions. The NonStop DOM tracing facility is provided for this purpose.
By using the tracing facility, you can narrow the problem area to a specific set of interactions. For example, a client may send a request to an object and never receive a reply. In this case, focusing more narrowly on the server hosting the object should prove fruitful. NonStop DOM provides tracing for a number of internal components. You should be judicious in enabling tracing because the volume of output can be large.
Tracing in NonStop DOM is provided for the ORB components (Comm Server and LSD), Services (Naming and Event), as well as for client and server programs. To enable tracing, you modify the configuration database or set the environment variables defined below before starting the processes. To disable tracing, you reverse the earlier modification and then restart the processes.
There are two options for enabling and disabling tracing. The first is to set tracing environment variables. These are standard OSS environment variables. You set a variable with an assignment statement, usually in conjunction with the export command.
For example,
export NSDOM_CFG_TRACE_CS=TRUE
Clearing the environment variable is done with the unset statement. For example,
unset NSDOM_CFG_TRACE_CS
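Because trace variables must be set before a process starts and cleared afterward, it can be convenient to scope them to a single run. The following is a minimal sketch under that assumption; the function name run_traced and the program name my_server are hypothetical and not part of NonStop DOM.

```shell
# Hypothetical helper: run one command with the named trace variables set to
# TRUE, without changing the environment of the calling shell.
run_traced() {
    # $1 = space-separated list of trace variable names; the rest = command
    vars=$1
    shift
    (
        # The subshell keeps the exports local, so no unset is needed later.
        for v in $vars; do
            export "$v=TRUE"
        done
        "$@"
    )
}

# Example use (my_server is a hypothetical program):
#   run_traced "NSDOM_CFG_TRACE_CS" ./my_server
```

Running the command in a subshell has the same effect as an export followed by an unset, but the variable cannot be left set by accident if the command fails.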
The other option is to set a value in the configuration database. The entity default@trace contains names and values for the various trace settings. If the value is TRUE, tracing is enabled for NonStop DOM components affected by the trace flag setting. (The Naming Service uses the setting for trace in the entity NS@name_service_settings.)
NonStop DOM provides tracing for a number of components. In general, tracing should be enabled for the smallest set of components that allows you to locate the problem area. The sections that follow give recommended trace settings. The table below shows the available trace settings. The first column gives the name of the environment variable, the second column shows the corresponding database key, and the third column briefly describes the trace output you can expect.
| Environment Variable | Database Name | Description |
|---|---|---|
| NSDOM_CFG_TRACE_CS | comm_server | Comm Server Activity |
| NSDOM_CFG_TRACE_ES | event_svc | Event Service Activity |
| n/a | trace (in NS@name_service_settings) | Naming Service Activity |
| NSDOM_CFG_TRACE_IR | ir | Interface Repository Activity |
| NSDOM_CFG_TRACE_GCFEH | event_context_free | ORB Low Level Event Handling for Pathway Protocol |
| NSDOM_CFG_TRACE_GFSEH | event_file_system | ORB Low Level Event Handling for Guardian File System Protocol |
| NSDOM_CFG_TRACE_SOCKEH, NSDOM_CFG_TRACE_SOCKEH_DETAIL | event_socket | ORB Low Level Event Handling for TCP/IP Protocol |
| NSDOM_CFG_TRACE_EVENT_CORE | event_core | ORB Low Level Event Handling |
| NSDOM_CFG_TRACE_GIOP_FW | orb_giop_connections | ORB GIOP Protocol Layer |
| NSDOM_CFG_TRACE_ORB | orb_request_queue | ORB Request Processing |
| NSDOM_CFG_TRACE_POA | poa | Portable Object Adapter Activity |
| NSDOM_CFG_TRACE_PROXY, NSDOM_CFG_TRACE_PROXY_DETAIL | orb_proxy | ORB Proxy Processing: Method Dispatches and the Results of Method Invocations |
| NSDOM_CFG_TRACE_THREADS | threads | NonStop DOM Thread Framework |
| NSDOM_CFG_TRACE_TIMER | event_time | NonStop DOM Timer Objects |
| NSDOM_CFG_TRACE_DETAIL | | Sets verbose tracing |
Several NonStop DOM processes are started by the nsdstart script. This script contains trace flag settings that have been commented out. If you want to enable tracing for one or more of the system processes, uncomment the appropriate lines before running nsdstart. See the Administration Guide: nsdstart topic for details.
An example of one of these lines is:
[ set server env NSDOM_CFG_TRACE_ORB=TRUE
Removing the '[' uncomments the line.
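The edit can also be scripted. The following is a minimal sketch that assumes the commented-out lines in nsdstart look exactly like the example line (a leading '[' followed by the setting); the function name uncomment_trace and the output file name are hypothetical. It writes a modified copy to standard output rather than changing the script in place.

```shell
# Hypothetical helper: copy an nsdstart-style script from stdin to stdout,
# uncommenting the setting for one trace variable.
# Assumption: a commented-out line has the form
#   [ set server env <VARIABLE>=TRUE
uncomment_trace() {
    # $1 = trace variable name, for example NSDOM_CFG_TRACE_ORB
    # Strip the leading '[' and any spaces; all other lines pass unchanged.
    sed "s/^\[ *\(set server env $1=TRUE\)/\1/"
}

# Example use (review the copy before replacing the original script):
#   uncomment_trace NSDOM_CFG_TRACE_ORB < nsdstart > nsdstart.traced
```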
The following table shows the useful trace settings for each of the system processes and the default name of the trace log file.
| NonStop DOM process | Useful Trace Settings | Log File Name |
|---|---|---|
| Comm Server | NSDOM_CFG_TRACE_CS, NSDOM_CFG_TRACE_SOCKEH, NSDOM_CFG_TRACE_GFSEH, NSDOM_CFG_TRACE_GCFEH | cs.out |
| Location Service Daemon | NSDOM_CFG_TRACE_ORB, NSDOM_CFG_TRACE_GIOP_FW, NSDOM_CFG_TRACE_SOCKEH | lsd.out |
| Event Service | NSDOM_CFG_TRACE_ES | es.out |
| Name Service | TRACE | ns.out |
To enable tracing for a NonStop DOM client or server process, set one or more environment variables before running the process. For a client process, setting NSDOM_CFG_TRACE_PROXY to TRUE is a useful starting point. If more detail is needed, enable one or more of the protocol trace variables (depending on the protocols in use by the client). For a server process, setting NSDOM_CFG_TRACE_POA to TRUE is a useful starting point. If more detail is needed, enable NSDOM_CFG_TRACE_ORB or one or more of the protocol trace variables (depending on the protocols in use by the server).
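The starting points for client and server processes can be captured in a small lookup. This is a minimal sketch only; the function name trace_start_point is hypothetical, and the mapping simply restates the recommendation given here.

```shell
# Hypothetical helper: print the recommended first trace variable for a role.
trace_start_point() {
    case "$1" in
        client) echo "NSDOM_CFG_TRACE_PROXY" ;;   # proxy processing
        server) echo "NSDOM_CFG_TRACE_POA" ;;     # object adapter activity
        *) echo "unknown role: $1" >&2; return 1 ;;
    esac
}

# Example use, before starting a client process:
#   export "$(trace_start_point client)=TRUE"
```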
NonStop DOM contains a debug version of the Shared Runtime Library (SRL), named NSDGSRL. Among other things, the debug version of the SRL is provided to help you track down heap corruption problems and application memory leaks.
To enable the debug version of the SRL, modify the following line in your etc/env.sh file:
add_define=_SRL_01 class=map file=$G_NSDSRL
This line specifies the version of the SRL to be used by NonStop DOM. In this line, change the value _SRL_01 to NSDGSRL. Reverse this change when you are done debugging.
The debug version of the SRL (NSDGSRL) provides the following two methods:
::operator new( ), which does the following:
heap_check( )
0xcc
heap_check( )
::operator delete( ), which does the following:
heap_check( )
0xdd
heap_check( )
The effects of using these versions of new( ) and delete( ) are:
heap_check( ). This is a Tandem C runtime function that verifies the integrity of the heap data structures.
These actions introduce significant processing overhead, which can dramatically decrease application program performance. The debug SRL should be used only in a non-production environment.