Over the past couple of weeks I’ve written two installments on voice codecs (A Cornucopia of Codecs and Codecs Continued). I mentioned some of the major characteristics of six different codecs and why you might choose one over another. However, I failed to point out something that is of great importance when making your codec choice. What do you do about Dual Tone Multi-Frequency (DTMF) touch tones? This is not something you can ignore because compressed codecs such as G.729 and G.726 cannot be relied upon to pass those tones intact from sender to receiver. Does that mean that those codecs aren’t compatible with your voice mail system and SIP phones? Absolutely not. Read on to learn why.
How many of you are old enough to remember rotary telephones? For better or worse, I certainly am. In fact, it was all I knew for the first ten or so years of my life. Heck, we even had a party line for most of my childhood. Rotary phones used something called pulse dialing. You put a finger in the numbered hole in a “finger wheel,” pulled that wheel back to the “finger stop” and let go. During the return rotation, the electrical current of the telephone line would be interrupted in accordance with the number you dialed. The number one would interrupt the circuit one time, the number nine would interrupt it nine times, and the number zero would interrupt it ten times. The central office would then translate those current interruptions into the dialed telephone number.
DTMF was introduced by AT&T back in 1963 as a way to replace pulse dialing and rotary telephones. Now, instead of interrupting the electrical current to dial a number, the telephone produced a tone to represent the dialed number. Actually, it was two tones blended together, one from a low-frequency group and one from a high-frequency group, thus the Dual Tone part of DTMF.
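If you’d like to see the “two tones blended together” idea in concrete terms, here is a minimal Python sketch. The row and column frequencies are the standard DTMF keypad values; the function names dtmf_frequencies and dtmf_samples, the 100 ms default duration, and the equal 0.5 mixing levels are my own choices for illustration, not anything a real telephone is required to use.

```python
import math

# Standard DTMF keypad: each key pairs one low-group (row) frequency
# with one high-group (column) frequency, both in Hz.
ROW_FREQS = (697, 770, 852, 941)
COL_FREQS = (1209, 1336, 1477, 1633)
KEYPAD = ("123A", "456B", "789C", "*0#D")


def dtmf_frequencies(key):
    """Return the (low, high) frequency pair in Hz for one keypad symbol."""
    if len(key) == 1:
        for row, symbols in enumerate(KEYPAD):
            col = symbols.find(key)
            if col != -1:
                return ROW_FREQS[row], COL_FREQS[col]
    raise ValueError(f"not a DTMF key: {key!r}")


def dtmf_samples(key, duration_s=0.1, sample_rate=8000):
    """Synthesize one key press as floats in the range [-1.0, 1.0].

    The 0.5 weights are an illustrative assumption that keeps the sum
    of the two sine waves from clipping.
    """
    low, high = dtmf_frequencies(key)
    count = round(duration_s * sample_rate)
    return [
        0.5 * math.sin(2 * math.pi * low * n / sample_rate)
        + 0.5 * math.sin(2 * math.pi * high * n / sample_rate)
        for n in range(count)
    ]
```

Calling dtmf_frequencies("5") gives (770, 1336), which is the second row and second column of the keypad.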
DTMF has clearly been extended to purposes beyond simply dialing a telephone number. Interactive Voice Response (IVR) systems prompt us for all sorts of things that we answer with button presses. We log into our voicemail systems and retrieve our messages with DTMF. If so inclined, you can even play Mary Had a Little Lamb using DTMF.
DTMF wasn’t a problem with digital and analog telephone systems because they both carry toll-quality audio (in the digital case, 64 kbit/s from 8000 samples per second). The tones and speech mixed easily with one another, and tone detection hardware was able to separate the DTMF out for applications that required it. However, with VoIP and bandwidth concerns came voice compression and different techniques for sending an intelligible voice stream using as few bytes as possible. These compression and voice-encoding techniques wreak havoc on DTMF and render the tones undecipherable by the components that need to detect and act upon them.
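To give a feel for what a tone detector has to do, here is a rough Python sketch using the Goertzel algorithm, a common technique for measuring signal power at a handful of known frequencies. I am not claiming any particular piece of detection hardware works this way. The function names goertzel_power and detect_key are hypothetical, and the sketch has no silence or speech rejection: it always picks the strongest row and column, so a real detector would need thresholds on top of it.

```python
import math

LOW_GROUP = (697, 770, 852, 941)        # row frequencies, Hz
HIGH_GROUP = (1209, 1336, 1477, 1633)   # column frequencies, Hz
KEYPAD = ("123A", "456B", "789C", "*0#D")


def goertzel_power(samples, freq, sample_rate=8000):
    """Relative signal power at one frequency, via the Goertzel recurrence."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / sample_rate)
    s1 = s2 = 0.0
    for x in samples:
        # s1 holds the newest filter state, s2 the one before it.
        s1, s2 = x + coeff * s1 - s2, s1
    return s1 * s1 + s2 * s2 - coeff * s1 * s2


def detect_key(samples, sample_rate=8000):
    """Pick the keypad symbol whose row and column tones are strongest.

    Assumption for illustration: the samples contain exactly one clean
    DTMF tone. There is no check for silence, speech, or distortion.
    """
    def strongest(group):
        return max(group, key=lambda f: goertzel_power(samples, f, sample_rate))

    row = LOW_GROUP.index(strongest(LOW_GROUP))
    col = HIGH_GROUP.index(strongest(HIGH_GROUP))
    return KEYPAD[row][col]


# 50 ms of the "7" key (852 Hz + 1209 Hz) at 8000 samples per second.
tone = [
    math.sin(2 * math.pi * 852 * n / 8000)
    + math.sin(2 * math.pi * 1209 * n / 8000)
    for n in range(400)
]
detected = detect_key(tone)
```

A detector like this depends on the two frequencies arriving intact, which is exactly what a speech-oriented codec does not promise.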
Enter RFC 2833. With RFC 2833 you don’t send those DTMF signals as audio inside your encoded conversation. Instead, you send them separately, as named telephone events carried in RTP packets with their own payload type. This allows you to compress the heck out of the voice without altering the DTMF signals.
Note: Technically, RFC 2833 has been replaced by RFC 4733, but for the most part people still want to call it 2833, so I do, too.
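For the curious, the telephone-event payload itself is tiny: four bytes holding an 8-bit event code, an end-of-event (E) bit, a reserved (R) bit, a 6-bit volume, and a 16-bit duration measured in RTP timestamp units. Here is a minimal Python sketch of packing and unpacking it. The function names and the default volume of 10 are my own illustrative choices, and the sketch deliberately leaves out the surrounding RTP header, retransmission of the final packet, and everything else a real sender has to handle.

```python
import struct

# Event codes 0-15 cover the DTMF keys: digits, then *, #, A, B, C, D.
EVENT_CODES = {key: code for code, key in enumerate("0123456789*#ABCD")}


def pack_telephone_event(key, duration, end=False, volume=10):
    """Build the 4-byte telephone-event payload for one DTMF key.

    duration is in RTP timestamp units (800 units is 100 ms at 8000 Hz).
    volume is the tone level in dBm0 with the sign dropped, 0 to 63.
    The default of 10 is an assumption for illustration only.
    """
    if not 0 <= volume <= 63:
        raise ValueError("volume must fit in 6 bits")
    if not 0 <= duration <= 0xFFFF:
        raise ValueError("duration must fit in 16 bits")
    flags = (0x80 if end else 0x00) | volume   # E bit, R bit left at 0
    return struct.pack("!BBH", EVENT_CODES[key], flags, duration)


def unpack_telephone_event(payload):
    """Split a 4-byte telephone-event payload back into its fields."""
    event, flags, duration = struct.unpack("!BBH", payload)
    return {
        "event": event,
        "end": bool(flags & 0x80),
        "volume": flags & 0x3F,
        "duration": duration,
    }
```

As a usage example, pack_telephone_event("5", 800, end=True) produces the bytes 05 8A 03 20: event 5, the E bit set with volume 10, and a duration of 800.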
Depending upon the origin of the DTMF signals, they can start out in a separate stream, or that separate stream might be created by stripping the tones out of an audio conversation. An example of the latter would be a gateway that converts analog to SIP. This stripping can cause problems of its own. The converter must “hear” the tone before stripping it out and sometimes there is leakage where the very beginning of the tone makes its way through. This might cause a voicemail system to “hear” two tones for a single key press. One would come from the RFC 2833 stream and the other from the voice stream. Fortunately, conversion hardware is getting better and better and these problems have become less common (albeit a bear to debug when they occur).
So, in terms of SIP, how is this RFC 2833 stream created and managed? Through Session Description Protocol (SDP), of course. SDP is used to describe the voice stream (e.g. G.729) and it’s also used to inform the recipient that RFC 2833 is available. Specifically, it uses something called telephone-event. Here is an example of an SDP media description that you might see in the body of an INVITE message. Note the format of “0-15.” This represents the ten digits plus *, #, A, B, C, and D. (Flash is event 16, so it falls outside this range.)
m=audio 12346 RTP/AVP 101
a=rtpmap:101 telephone-event/8000
a=fmtp:101 0-15
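To make the negotiation a little more concrete, here is a minimal Python sketch of how a receiver might pull the telephone-event details out of an SDP body. The function names parse_telephone_event and expand_events are made up for illustration, and the sketch only understands the plain list form of the fmtp line (such as 0-15 or 0-15,66,70). When the fmtp line is missing it reports no event list and leaves that decision to the caller.

```python
def expand_events(spec):
    """Turn an event list such as "0-15,66,70" into a set of integers."""
    events = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        first, dash, last = part.partition("-")
        # A bare number like "66" is treated as the range 66-66.
        events.update(range(int(first), int(last if dash else first) + 1))
    return events


def parse_telephone_event(sdp):
    """Find the telephone-event entry in an SDP body.

    Returns a dict with payload_type, clock_rate, and events, or None if
    telephone-event is not offered. events is None when no fmtp line
    was found for the payload type.
    """
    payload_type = clock_rate = None
    fmtp = {}
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("a=rtpmap:"):
            pt, _, encoding = line[len("a=rtpmap:"):].partition(" ")
            name, _, rate = encoding.partition("/")
            if name.lower() == "telephone-event":
                payload_type = int(pt)
                clock_rate = int(rate.split("/")[0])
        elif line.startswith("a=fmtp:"):
            pt, _, params = line[len("a=fmtp:"):].partition(" ")
            fmtp[pt] = params
    if payload_type is None:
        return None
    spec = fmtp.get(str(payload_type))
    events = expand_events(spec) if spec is not None else None
    return {"payload_type": payload_type, "clock_rate": clock_rate, "events": events}


# Hypothetical offer: G.729 for voice plus telephone-event for DTMF.
SAMPLE_SDP = """m=audio 12346 RTP/AVP 18 101
a=rtpmap:18 G729/8000
a=rtpmap:101 telephone-event/8000
a=fmtp:101 0-15
"""
offer = parse_telephone_event(SAMPLE_SDP)
```

With the sample offer above, the result has payload type 101, a clock rate of 8000, and the sixteen DTMF events 0 through 15.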
That’s probably about as much as you really need to understand about RFC 2833 and how it works. Its purpose is to carry DTMF separately so that voice codecs can concentrate strictly on creating the best possible voice stream using the fewest bytes. If you remember that, you will be ahead of the game.