The 5 Minute Cluster Communication Check for IBM Sterling B2Bi

By: CoEnterprise | August 14, 2020

Imagine, if you will, your busy data center. Banks of servers hum in their racks, united in their work. Without notice, a micro-surge from a cooling unit cascades through a network switch into one server’s network card. The NIC hiccups, locks its driver, and now it’s down.

Another server begins receiving the full load of both systems, as the load balancer begins failing over traffic to the only functioning node. Data moves, but traffic is bottlenecked. Jobs begin backing up in the overworked queue. An hour later, the lonely remaining server runs out of RAM, core dumps, and halts.

Now, you’d really have a problem if this were real.

It may sound far-fetched, but preparing a network is all about planning and loss mitigation. Every good administrator thinks through scenarios, constantly spinning themselves tall tales in order to be prepared for any occurrence. Cluster health is a major aspect of these scenarios.

Clustering is a valuable and sometimes crucial feature of IBM Sterling B2Bi’s bag of tricks. By providing multiple servers, an organization can load-balance traffic as well as provide critical failover capacity. There’s nothing like laying down a few more lanes of Information Superhighway pavement to make your EDI traffic fly and provide high availability. But all this communication voodoo depends on a symphony of precise coordination between your nodes. Without that, the entire effort is for naught.

This leads us to an important question: What is the quickest and easiest way to test inter-node communication status? IBM has an answer, and it can be implemented in a matter of minutes!

Buried in the IBM Knowledgebase is a handy little technote, Reference #1536487, which addresses the problem of Sterling B2Bi not providing proper load balancing. It recommends troubleshooting by checking logs and UI Node Status pages, but what we’re interested in today is Step 6: a workflow communication test and a snippet of BPML code.

Here is the code we want:

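<process name="TestClusterComms">
  <sequence>

    <!-- Step 1: force a timestamp call to run on the first node -->
    <operation name="Timestamp Utility">
      <participant name="TimestampUtilService"/>
      <output message="TimestampUtilServiceTypeInputMessage">
        <assign to="mandatoryNode">node1</assign>
        <assign to="action">current_time</assign>
        <assign to="format">yyyyMMddHHmmssSSS</assign>
        <assign to="." from="*"></assign>
      </output>
      <input message="inmsg">
        <assign to="." from="*"></assign>
      </input>
    </operation>

    <!-- Step 2: force the same call to run on the second node -->
    <operation name="Timestamp Utility">
      <participant name="TimestampUtilService"/>
      <output message="TimestampUtilServiceTypeInputMessage">
        <assign to="mandatoryNode">node2</assign>
        <assign to="action">current_time</assign>
        <assign to="format">yyyyMMddHHmmssSSS</assign>
        <assign to="." from="*"></assign>
      </output>
      <input message="inmsg">
        <assign to="." from="*"></assign>
      </input>
    </operation>

    <!-- Step 3: write traceable info to the Correlation Table -->
    <operation name="Business Process Metadata">
      <participant name="BPMetaDataInfoService"/>
      <output message="BPMetaDataServiceTypeInputMessage">
        <assign to="CORRELATION">TRUE</assign>
        <assign to="DISPOSITION">FALSE</assign>
        <assign to="TRACE">TRUE</assign>
        <assign to="." from="*"></assign>
      </output>
      <input message="inmsg">
        <assign to="." from="*"></assign>
      </input>
    </operation>

  </sequence>
</process>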

Source: http://www-01.ibm.com/support/docview.wss?uid=swg21536487

It’s a simple bit of code that forces a node-specific timestamp to be written to Process Data and allows traceable info to be written to the Correlation Table for easy reference later.

Simple, fast, and clean.

This particular code is designed for a two-node scenario, as indicated by the two instances of the Timestamp Utility and their <assign to="mandatoryNode"> tags. By duplicating the Timestamp BPML and changing the node assignments, this process can be expanded to cover as many nodes as you have in your cluster group. Simply check in this code as a new business process, and you’ll have a quick BP that forces Sterling B2Bi to distribute operations across every node in your setup.
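As a rough sketch of that expansion, a third Timestamp Utility operation might look like the block below, placed after the node2 operation and before the Business Process Metadata operation. The node name node3 is an assumption for illustration only; substitute the node names your own cluster actually reports.

```xml
<!-- Hypothetical third node: "node3" is an assumed name, not taken from the technote -->
<operation name="Timestamp Utility">
  <participant name="TimestampUtilService"/>
  <output message="TimestampUtilServiceTypeInputMessage">
    <!-- Pin this step to the additional node -->
    <assign to="mandatoryNode">node3</assign>
    <assign to="action">current_time</assign>
    <assign to="format">yyyyMMddHHmmssSSS</assign>
    <assign to="." from="*"></assign>
  </output>
  <input message="inmsg">
    <assign to="." from="*"></assign>
  </input>
</operation>
```

Only the mandatoryNode value changes from one copy to the next, so each extra node costs one more copy of this operation.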

Check-In

To check in the new process, we’ll start at the dashboard.

Click “Business Processes.”
Click the “Manager” sub-menu.
Under “Create” and “Process Definition” click the GO button.
We’ll name the new BP “TestClusterComms” (matching the <process name="TestClusterComms"> field in the BPML code cited above). These must match: if you plan on changing the name of the BP we’re creating, you must change the BPML code to reflect the new name.
Select “Business Process Text Editor” for the Input Mode.
Click “Next.”
We’ll use “Initial Check-In” for the Description field.
Paste the BPML code from above into the “Business Process” field.
Click “Next.”

At this point, the Process Levels need to be defined. For this, I recommend following the System Defaults, which should already be selected. Adjust these levels with care; they can be useful for providing additional information during execution or for resuming if a failure occurs. Please defer to your System Administrators if you want to change any of the default options. Click “Next” to continue.

On the following screens you can define Deadline Settings and BP Lifespan, but as with the Process Levels, I’d suggest you keep in line with the System Defaults unless you require those functions.

On the Confirm page, make sure “Enable Business Process” is selected and then click the “Finish” button at the bottom of the screen. Your BP is now checked in and ready to use.

Usage

Search for the new “TestClusterComms” BP and click on the Execution Manager.

Then click on the “Execute” field.

In the window that pops up, click “GO.” Make special note of the Instance ID number in case you need to look up the details of the BP later. A successful run of the BP will look like this:

[Screenshot: business process details for a successful run]

Steps 1 and 2 have been highlighted to show each Timestamp Utility Service running on its respective Sterling B2Bi node. The [m] following the Execution Node shows that the Timestamp Service was forced to be mandatory on that node. Each successful Timestamp step is proof positive that the job was handed off to the appropriate node in your cluster.

A failed execution might look like this:

[Screenshot: business process details for a failed run]

In this run, Node1 reports “Resource Disabled,” which makes perfect sense because for this example I intentionally disabled Node2 in order to generate the error. Whatever the error happens to be, it means something is afoot in the cluster and it’s time to look closer for a cause.

Customization

If you want to take this quick guide a step further, you could consider scheduling this BP for periodic execution and adding an OnFault flow for notification, thus converting this from a handy test into a potentially valuable first-alert of unrest in the cluster.
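One possible shape for such a notification flow is sketched below: an onFault block at the end of the main sequence that sends an email when any step faults. This is an illustration rather than part of the IBM technote. The adapter configuration name SMTP_SEND_ADAPTER, the xport-smtp-* parameter names, the sequence names, the mail host, and the addresses are all assumptions to verify against the SMTP Send Adapter configured on your own instance.

```xml
<process name="TestClusterComms">
  <sequence name="Main">
    <!-- The Timestamp Utility and Business Process Metadata operations go here, unchanged -->

    <!-- Runs only if a step above faults, for example when a mandatory node is unavailable -->
    <onFault>
      <sequence name="NotifyOnFault">
        <operation name="SMTP Send">
          <!-- Assumed adapter configuration name; use the one defined in your system -->
          <participant name="SMTP_SEND_ADAPTER"/>
          <output message="SMTP_SEND_ADAPTERInputMessage">
            <!-- Placeholder host and addresses -->
            <assign to="xport-smtp-mailhost">mail.example.com</assign>
            <assign to="xport-smtp-mailport">25</assign>
            <assign to="xport-smtp-mailfrom">b2bi@example.com</assign>
            <assign to="xport-smtp-mailto">admins@example.com</assign>
            <assign to="xport-smtp-mailsubject">TestClusterComms failed</assign>
            <assign to="." from="*"></assign>
          </output>
          <input message="inmsg">
            <assign to="." from="*"></assign>
          </input>
        </operation>
      </sequence>
    </onFault>
  </sequence>
</process>
```

Keeping the notification inside onFault means a healthy run sends nothing, so a scheduled check stays quiet until a node actually fails to respond.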

Considering the utility that checking in this code can provide you, there’s no reason why you shouldn’t take the time to do so. It does its job well, offering a clear, concise, and incredibly insightful window into a potentially crippling issue in near real-time.

Because the resource requirements for this execution are so low, it can be run often and quickly, giving you an at-a-glance verification that your cluster is alive and well and communicating with all its constituent nodes.