17. Monitoring clusters with QueueMetrics

QueueMetrics can monitor clusters of Asterisk servers, which makes it possible to supervise large call centres that are spread over a number of physical machines. This setup is often used for large deployments, as it leads to a number of advantages:

In order to implement this, QueueMetrics has been extended to support the notion of a cluster, that is, a set of Asterisk servers working together as if they were one single box. The cluster can be set up in whatever way fits best, for example:

When QueueMetrics runs in cluster mode, the whole call center is monitored as if it were a big single Asterisk box, and the basic unit for reporting remains the set of selected queues. QueueMetrics will internally query the different servers or queue_log files as needed, and will automatically dispatch events to the correct Asterisk box.

17.1. Setting up a cluster

To set up a cluster, you should define the following configuration variables in configuration.properties:

cluster.servers=aleph|trix

This statement tells QM that the current cluster has two members, called "aleph" and "trix". We suggest using a short name for each server, as it will appear in many different screenshots. One option would be to use capital letters, like "A", "B", "C" etc. for the different members of the cluster.

For each server (in our case "aleph", but we’ll have to repeat it for all members of the cluster), we will define the following properties:

cluster.aleph.manager=tcp:user:pass@10.10.3.5

This tells QueueMetrics that the manager interface for aleph can be found at 10.10.3.5, logging in as "user" with password "pass". The manager interface is needed to run Live monitoring and can be used to send commands to Asterisk (like logging agents on and off, starting chanspy sessions, etc).

cluster.aleph.queuelog=sql:P001

This tells QM that the queue_log file (or its contents) can be downloaded from partition P001 of the QM database. You must use MySQL storage in order for clustering to work at all.

cluster.aleph.monitored_calls=/share/aleph/calls/

This tells QM where to look for recorded calls on each Asterisk server. QueueMetrics uses this location to let you click and listen to recorded calls. An NFS or SMB share is usually a good starting point. As an alternative, you can enter the URL of an XML-RPC server that will return information about the recorded call (for more information on this topic, see Section 20.11, “Enabling XML-RPC call listening and streaming”).

cluster.aleph.callfilesdir=/share/aleph/callfiles/

If you do not want to connect to your Asterisk servers using the manager interface, you still need a way to send them commands (e.g. to start a chanspy session). In order to do this, you should give QM a directory to write callfiles to. If you use the manager interface, leave this entry blank. (We strongly suggest leaving it blank and using the manager interface instead.)

cluster.aleph.audioRpcServer=http://myserver/xmlRpcServer

If you use an XML-RPC "broker" in order to listen to live calls through third-party software like Orecx, you should enter its URL here. This feature must be activated for all servers at once, by setting the property default.audioRpcServer to a non-blank value. In all other cases, just leave this property blank. (For more information on this topic, see Section 20.11, “Enabling XML-RPC call listening and streaming”.)

cluster.aleph.agentSecurityKey=AAA

When using the agent’s page in cluster mode, you must make sure that each agent "points" to the correct server, as this server will be used both for pulling the agent’s data and for sending logon/logoff commands. This is done on the agent’s page through a pull-down menu where the agent must select the server he is logged on to. In order to avoid mistakes, it is possible to protect a server by adding a security key, so that only agents holding that security key will see that server. If an agent has only one possible server, that server will be selected automatically.

In practice, this means that you could create two agent classes, say AGENT_A and AGENT_B. They have the same keys, except that class AGENT_A also holds the key SERVER_A and class AGENT_B holds the key SERVER_B. We protect the first server entry with SERVER_A and the second with SERVER_B. Then we assign users to class AGENT_A (for agents working on the first server) or AGENT_B (for agents working on the second server).

If you want agents to switch servers manually, or your cluster is made up of only one machine, leave this blank.
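Putting the properties of this section together, a two-member cluster might be described as in the sketch below. The values for aleph are the ones used in the examples above; every value for trix (address, partition name, share path) is a hypothetical placeholder and must be replaced with your own. The sketch assumes that the manager interface is used, that no XML-RPC broker is present, and that agents may switch servers freely, so the corresponding entries are left blank.

```properties
# Members of the cluster, identified by their short names
cluster.servers=aleph|trix

# First member: values taken from the examples in this section
cluster.aleph.manager=tcp:user:pass@10.10.3.5
cluster.aleph.queuelog=sql:P001
cluster.aleph.monitored_calls=/share/aleph/calls/
# Blank: commands go through the manager interface, not callfiles
cluster.aleph.callfilesdir=
# Blank: no XML-RPC audio broker in this sketch
cluster.aleph.audioRpcServer=
# Blank: agents may switch servers manually
cluster.aleph.agentSecurityKey=

# Second member: all values below are hypothetical placeholders
cluster.trix.manager=tcp:user:pass@10.10.3.6
cluster.trix.queuelog=sql:P002
cluster.trix.monitored_calls=/share/trix/calls/
cluster.trix.callfilesdir=
cluster.trix.audioRpcServer=
cluster.trix.agentSecurityKey=
```

Note that each member points to its own partition (P001 and P002 here), so that the data of the two servers is kept apart in the same database.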

17.2. Setting up the members of the cluster

On each box that is a member of the cluster, you should set up the following items:

  • Call recording: if calls are recorded to be played back through QueueMetrics, you should store them all in a directory that is accessible to the QueueMetrics server, or set up an external XML-RPC call broker.
  • Commands: if commands are to be sent to each Asterisk box, you should set up the [queuemetrics] context in the dial plan, and make sure that either the manager interface is set up or the /var/spool/asterisk/callfiles directory is shared and accessible to the QueueMetrics server. A sample [queuemetrics] context can be found under WEB-INF/mysql-utils in the directory extensions-examples.
  • Logs: you should use qloaderd to upload data to a partition on the main QueueMetrics database. Make sure that each server uploads data to a different partition in the same database.
  • Clock: make sure the clocks on all members of the cluster are synchronized, and the same goes for the clocks of the QueueMetrics box and of the MySQL database. A utility such as ntpdate, which syncs your machine’s clock to an external time source, will take care of this problem if run periodically through cron.

17.3. Setting up QueueMetrics to access the cluster

First, you should make sure that you have a clustered license for QueueMetrics and that your license is big enough, in terms of agents, to cover all agents that are present in the call center. Older licenses are valid for one Asterisk server only, and QueueMetrics will complain that they are not correct. The first releases of QueueMetrics 1.4 will nevertheless allow accessing a cluster up to a specified future date (likely October 2007).

To report on all members of a cluster, you should set the property:

default.queue_log_file=cluster:*

This means that all boxes defined as members of this cluster will be used as a data source.

To report on a subset of the members of the cluster, you will use a syntax like:

default.queue_log_file=cluster:A|B|C

This way you will be reporting on boxes A, B and C.

If you want to report on a single box only, use the syntax:

default.queue_log_file=cluster:C
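As a sketch that ties these forms back to the cluster defined in Section 17.1, the fragment below assumes that the names following cluster: are the same short names listed in cluster.servers, as the A|B|C example suggests. With only aleph and trix defined, listing both explicitly should select the same boxes as cluster:* would.

```properties
# Members as defined in Section 17.1
cluster.servers=aleph|trix

# Report on both members by name
default.queue_log_file=cluster:aleph|trix
```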

You can then change this property on-the-fly by going to the "Custom reports" page and editing as needed under the "File" text box.

If you have agents using the QueueMetrics agent’s page, you should set them up so that each agent points to the correct server.

17.4. Using the Agent’s page with a clustered environment

The agent’s page on QueueMetrics acts as a kind of portal for an agent; she can use it to log on, log off, go to pause, enter pause codes, launch external apps linked to a call (e.g. CRM apps) and enter call codes (see The real-time agent page).

As the number of agents can be very high compared to the number of supervisors who run reports or monitor the call center, QM uses a "minimal impact" policy: the page must be refreshed manually by each agent, in order to avoid hammering the server with repeated page hits, and the analysis that is run is a stripped-down, low-fat version of the full analysis QueueMetrics is able to perform. When it comes to clusters, this means that, to avoid useless load, calls for an agent will be searched only on the server the agent is working on and not on the entire cluster.

We also have the problem of defining where an agent is supposed to work: as QM can issue commands to Asterisk on behalf of an agent, it needs to know which Asterisk server those commands must go to. This is done through the Server selection that appears on the agent’s page when QueueMetrics is running in clustered mode. If more than one server is selectable, the combo box will let the agent switch servers as she sees fit; if only one server is selectable, QueueMetrics will use that server immediately and will lock the combo box.

As a QM installer, you can control which servers are selectable to which agents by setting the properties cluster.---.agentSecurityKey correctly for each Asterisk server in the cluster.
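For instance, the two-class arrangement described in Section 17.1 could be expressed as in the sketch below. It assumes a cluster whose members are named aleph and trix, and that the agent classes AGENT_A and AGENT_B have already been created in QueueMetrics with the keys SERVER_A and SERVER_B respectively; the classes themselves are not defined in this file.

```properties
# Only agents holding the key SERVER_A (class AGENT_A) will see aleph
cluster.aleph.agentSecurityKey=SERVER_A

# Only agents holding the key SERVER_B (class AGENT_B) will see trix
cluster.trix.agentSecurityKey=SERVER_B
```

With this setup each agent has only one selectable server, so the server selection on the agent’s page would be preset and locked.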