Overview
MiaRec implements a redundant, high-availability architecture.
The diagram below shows a network design for redundant recording in a BroadWorks environment. A similar design applies to the Cisco Built-in-Bridge recording interface and to the SIPREC recording interface for Metaswitch CFS, Metaswitch Perimeta SBC, Avaya SBC, and Oracle/AcmePacket SBC.
Supported features
- Automatic failover to the next available server in a cluster
- Load balancing of recording traffic between multiple servers
- More than 2 master servers in a cluster
- Geographical redundancy
- Data replication that is either continuous (immediately upon call completion) or scheduled (for example, at night during low-load hours)
How it works
A MiaRec cluster supports two or more servers. Any server in a cluster may receive recordings at any time. Upon call completion, audio files and call metadata are automatically uploaded/synchronized to the other servers in the cluster.
This document describes how redundancy is implemented for the BroadWorks SIPREC and Cisco SIP Trunk Built-in-Bridge recording methods. The recording interfaces for these two platforms are based on similar principles, with some variations.
Redundancy - new recordings
At the beginning of a call recording, the phone system (BroadWorks / Cisco UCM) sends a SIP INVITE to the first available server in the cluster. If the primary server is down or disconnected from the network, it cannot respond to the INVITE. The usual SIP processing in this case is to deliver the INVITE to the next recording server in the preference list.
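The failover behavior described above can be sketched in a few lines of Python. This is only an illustration of walking a preference list, not code from the phone system or from MiaRec; the `pick_recorder` function, the `responds_to_invite` callback, and the host names are all hypothetical.

```python
from typing import Callable, Optional, Sequence


def pick_recorder(
    preference_list: Sequence[str],
    responds_to_invite: Callable[[str], bool],
) -> Optional[str]:
    """Return the first server, in preference order, that answers the INVITE.

    `responds_to_invite` is a hypothetical stand-in for sending a SIP INVITE
    and waiting for a response; it returns False on timeout or network error.
    """
    for server in preference_list:
        if responds_to_invite(server):
            return server
    return None  # no recording server in the list responded


# Example: the primary is down, so the INVITE goes to the second server.
servers = ["rec1.example.com", "rec2.example.com", "rec3.example.com"]
down = {"rec1.example.com"}
chosen = pick_recorder(servers, lambda s: s not in down)
print(chosen)  # rec2.example.com
```

In a real deployment this selection is performed by the phone system's SIP stack according to its own routing configuration.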
Redundancy - in-progress recordings
If a recording server fails, all active recordings on that server are interrupted. If the failure was caused by a network issue, the call recordings are completed automatically after a (configurable) timeout. If the failure was caused by a hardware or software issue in the recording process, the recordings remain in the ACTIVE state until an administrator manually marks them as completed. In both cases, the recording data contains the media from the beginning of the call up to the moment of failure (unless there is an issue with the disk system).
MiaRec supports an advanced architecture that provides fault tolerance for in-progress calls. It involves a dedicated recording server configured in passive recording mode. Currently this is tested only with the Cisco BiB protocol, but it may also work with the SIPREC protocol on other phone platforms. The Cisco BiB network traffic sent to the primary recording server should be mirrored to a redundant server that works in passive recording mode. This server records a copy of each call captured by the primary server. If the primary server fails in the middle of a call, the redundant server can continue recording that call until it disconnects.
This mechanism relies on the way Cisco Built-in-Bridge works. Once media forking is activated, the Cisco IP phone continues to send RTP packets to the primary recorder even if the recorder is no longer reachable. The phone does not stop sending RTP packets even if it receives a “port is unreachable” ICMP error message. The redundant server continues to capture these RTP packets until the call completes. This makes it possible to achieve 100% redundancy for call recording.
Redundancy - completed recordings
After a recording is complete, MiaRec adds it to a queue for automatic replication to the other server(s) in the cluster. Replication may start immediately upon call completion or be scheduled for a specific time of day (for example, at night).
Geographical redundancy
MiaRec servers in a cluster may reside in different datacenters for geographical redundancy. There is no requirement for low latency between servers. The only requirement is that the bandwidth between datacenters is sufficient to handle data replication.
Data replication may be configured as continuous (immediately upon call completion) or scheduled for a specific time (for example, at night during low-load hours).
The network link between datacenters does not need to be available 100% of the time. If the target replication server is unavailable, replication is retried when the network connection is restored.
The source replication server uses a queue for data replication. A call recording is removed from the queue only after it has been replicated successfully. The queue overhead is insignificant (each call recording in the replication queue uses only about a hundred bytes).
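As a rough illustration of the two modes, the sketch below decides whether replication may run at a given moment. The mode names `"continuous"` and `"scheduled"`, the function `replication_allowed`, and the window boundaries are assumptions made for this example; they are not MiaRec configuration settings.

```python
from datetime import time


def replication_allowed(mode: str, now: time, start: time, end: time) -> bool:
    """Decide whether queued recordings may be replicated right now.

    mode is "continuous" or "scheduled" (hypothetical names).
    The scheduled window may wrap past midnight, e.g. 23:00 to 05:00.
    """
    if mode == "continuous":
        return True
    if start <= end:
        return start <= now < end
    # Window wraps around midnight.
    return now >= start or now < end


# Example: a nightly window from 23:00 to 05:00.
window = (time(23, 0), time(5, 0))
print(replication_allowed("scheduled", time(2, 0), *window))    # True
print(replication_allowed("scheduled", time(12, 0), *window))   # False
print(replication_allowed("continuous", time(12, 0), *window))  # True
```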
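The queue behavior described above (remove an entry only after successful replication, retry otherwise) can be sketched as follows. This is a minimal in-memory model under stated assumptions, not MiaRec's implementation: `drain_queue` and the `replicate` callback are hypothetical names, and the backlog size in the overhead estimate is an arbitrary example value.

```python
from collections import deque
from typing import Callable, Deque


def drain_queue(queue: Deque[str], replicate: Callable[[str], bool]) -> int:
    """Replicate queued recordings in order; stop at the first failure.

    `replicate` is a hypothetical function that returns True on success.
    A recording leaves the queue only after it has been replicated, so a
    failed attempt is simply retried on the next run.
    """
    sent = 0
    while queue:
        call_id = queue[0]      # peek; do not remove yet
        if not replicate(call_id):
            break               # target unreachable; keep the entry for retry
        queue.popleft()         # remove only after success
        sent += 1
    return sent


# Example: the target is unreachable on the first run, then recovers.
queue = deque(["call-1", "call-2", "call-3"])
print(drain_queue(queue, lambda c: False), len(queue))  # 0 3
print(drain_queue(queue, lambda c: True), len(queue))   # 3 0

# Rough queue overhead at about 100 bytes per entry (figure from the text):
backlog = 1_000_000  # example backlog size, chosen arbitrarily
print(backlog * 100 / 1_000_000, "MB")  # 100.0 MB
```

Even with a backlog of a million recordings, the queue itself would occupy on the order of 100 MB, which is consistent with the overhead being described as insignificant.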