The appropriate utilisation of the available dynamic range of an audio transmission system has been the subject of considerable attention in the broadcast and pro-audio industry. It has received similar consideration in the world of telephony.

Dynamic range is the difference, normally expressed in decibels, between the largest and smallest signal levels in the programme material. The quietest signals must stay above the noise inherent in the medium, while the loudest signals must not distort or cause unwanted side effects in the transmission medium.

In the broadcast environment, a sound engineer monitors the signal being transmitted or recorded using a peak programme meter (PPM) or a volume unit (VU) meter. An experienced operator can ensure that the programme peaks are at a consistent level and that the programme uses the available dynamic range efficiently. The content of the programme will influence its apparent loudness, and the engineer will raise or lower the level independently of the meter indication to maintain the desired comfort level.

The telecommunications environment is rather different. There is no engineer to monitor each call. Instead, the telephones need to respond consistently from one product to another, and the networks carrying the signals must not introduce significant increases or decreases in the apparent loudness level. The signal converted in the earpiece or loudspeaker must reach the listener at a uniform level. Several standards have emerged to define the appropriate acoustic levels at the microphone and the earpiece. Once these acoustic signals are converted to electrical signals, it becomes much more difficult to determine the level.

The performance of the human ear influences the way we need to measure signals. The ear's response is logarithmic, ie a doubling of the signal amplitude does not sound twice as loud. The difference between a whisper and a shout is about 40dB, or a factor of 100 in amplitude. Fortunately the dynamic range of ordinary speech is relatively small and so the transmission system need not offer a very wide dynamic range. The ear's sensitivity also varies with frequency. For a given amplitude, low and high frequencies do not sound very loud, whilst the ear is more sensitive to the mid-range frequencies. Thus the telephone network need only pass frequencies in the range 200Hz to 4kHz for speech to be fully intelligible. This rather limits its usefulness for orchestral reproduction, as some users of music on hold systems should realise.
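
The relationship between an amplitude ratio and its value in decibels can be checked with a short sketch. The function names are illustrative only; the formula is the standard 20 log10 relationship for amplitudes.

```python
import math

def amplitude_ratio_to_db(ratio):
    """Convert an amplitude (voltage or pressure) ratio to decibels."""
    return 20.0 * math.log10(ratio)

def db_to_amplitude_ratio(db):
    """Convert a level difference in decibels back to an amplitude ratio."""
    return 10.0 ** (db / 20.0)

# A factor of 100 in amplitude corresponds to 40dB.
print(amplitude_ratio_to_db(100.0))
# Doubling the amplitude adds only about 6dB.
print(round(amplitude_ratio_to_db(2.0), 2))
```

This is also why thresholds that differ by a factor of two in amplitude are described as being spaced 6dB apart.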

The extreme permissible programme signal levels are the clipping level, above which the peaks are cut off, and the noise floor, below which any signal is lost in noise. The effective dynamic range is the difference between the two. Different media have different dynamic ranges, eg the effective dynamic range of a replayed CD recording is about 90dB, a good analogue tape recorder has about 55dB and FM radio provides about 50dB. Many techniques exist to extend the dynamic range of a medium. They usually involve compression (reducing the dynamic range requirement by making quieter signals louder, followed by expansion of the signal at the receiving end), limiting (making sure peaks in signal level do not cause too much distortion), or both. These signal processing techniques can cause unwanted side effects such as noise tails or pumping, but these usually pass unnoticed by the uncritical listener.
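
As a rough illustration of compression, expansion and limiting, the sketch below applies a simple instantaneous power-law curve to samples normalised to the range -1.0 to +1.0. This is an assumption made for brevity: practical compressors normally act on the signal envelope with attack and release times, and the function names and the 2:1 ratio are hypothetical.

```python
import math

def level_db(x):
    """Level of a non-zero sample in dB relative to full scale (1.0)."""
    return 20.0 * math.log10(abs(x))

def compress(x, ratio=2.0):
    """Divide the level in dB by `ratio`, making quiet samples louder."""
    return math.copysign(abs(x) ** (1.0 / ratio), x)

def expand(y, ratio=2.0):
    """Inverse of compress(): restore the original level."""
    return math.copysign(abs(y) ** ratio, y)

def limit(x, ceiling=1.0):
    """Hard limiter: clamp peaks to +/- ceiling."""
    return max(-ceiling, min(ceiling, x))

quiet = 0.001                # about -60dB relative to full scale
sent = compress(quiet)       # about -30dB, further above the noise floor
restored = expand(sent)      # back to about -60dB
print(round(level_db(quiet)), round(level_db(sent)), round(level_db(restored)))
print(limit(1.7), limit(-2.0))
```

With a 2:1 ratio, programme material spanning 60dB occupies only 30dB in the medium, at the cost of the side effects mentioned above.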

Some of these analogue techniques have been utilised in the digital signal processing systems employed in the telecommunications industry. Different networks have different coding techniques designed to maximise the intelligibility of a speech signal for the data rate being provided. The more modern and sophisticated techniques rely on the different properties of speech sounds to reduce data rates to a minimum. Thus the transmitted material does not easily relate to the original sound. When it is recovered to the analogue domain for the listener, the original signal may have been converted several times from one format to another.

This multiple conversion process can give rise to significant energy or amplitude errors between the speaker and the listener. It is important that cross-network transmission should appear to be transparent to the users. For example, a call that travels as analogue to the local exchange, as PCM on the trunk, and as GSM for delivery to the called party requires analogue to digital conversion, digital to ADPCM, ADPCM to GSM, and GSM to analogue.
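
Because levels are expressed in decibels, the gain errors of successive conversions simply add. The sketch below shows this for the chain described above; the per-stage figures are hypothetical values chosen for illustration, not measurements of any real equipment.

```python
# Hypothetical per-stage gain errors in dB (illustrative values only).
stage_gain_db = [
    ("analogue to digital", 0.5),
    ("digital to ADPCM", -0.25),
    ("ADPCM to GSM", 0.75),
    ("GSM to analogue", -0.5),
]

def end_to_end_gain_db(stages):
    """Total gain error of the chain: dB values add."""
    return sum(gain for _name, gain in stages)

def level_after_chain_db(input_level_db, stages):
    """Level reaching the listener for a given level at the speaker."""
    return input_level_db + end_to_end_gain_db(stages)

print(end_to_end_gain_db(stage_gain_db))
print(level_after_chain_db(-20.0, stage_gain_db))
```

Errors of the same sign accumulate, so each stage needs to be held to a tighter tolerance than the end-to-end target.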

Transcoders translate a digital signal from one format to another. Any gain or loss in the signal level will result in errors being propagated through the network. It is not practical to devise a tone test for transcoder performance. The best method is to use real traffic and monitor the signal level over a period of time.

The peculiar nature of speech, with its discontinuities between syllables, pauses between words and sentences, and variable form factor, renders conventional sine wave type measurement methods very inaccurate. ITU Recommendation P.56 describes a technique for the measurement of speech signals that is reliable and repeatable.

The technique employed in the PCSV6 Digital Speech Voltmeter fulfils the requirements of ITU Recommendation P.56. The speech voltmeter samples the input signal, then calculates the total energy by squaring and accumulating the samples. A second-order exponential filter is applied to the samples to determine the envelope of the signal. For each sample, the envelope value increments a counter for each of the fifteen thresholds, spaced 6dB apart, that it exceeds or has exceeded in the last 200ms. This 200ms hangover period provides continuity in speech level across the natural silences that occur after certain sounds.
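
The accumulation stage described above might be sketched as follows. This is a minimal illustration, not the PCSV6 implementation. It assumes samples normalised to the range -1.0 to +1.0, rectification of each sample before filtering, a 30ms envelope time constant (the value given in P.56, not stated here), and thresholds running from half of full scale downwards by factors of two. The function name and return layout are hypothetical.

```python
import math

def p56_accumulate(samples, sample_rate, n_thresholds=15,
                   time_constant=0.03, hangover=0.2):
    """Accumulate energy and per-threshold activity counts for a signal."""
    g = math.exp(-1.0 / (sample_rate * time_constant))  # filter coefficient
    hang_samples = math.ceil(hangover * sample_rate)    # 200ms in samples
    # Thresholds a factor of two (about 6dB) apart: 0.5, 0.25, 0.125, ...
    thresholds = [2.0 ** -(j + 1) for j in range(n_thresholds)]
    counts = [0] * n_thresholds
    # Samples since each threshold was last exceeded (start "expired").
    since_exceeded = [hang_samples] * n_thresholds
    sum_sq = 0.0
    n = 0
    p = q = 0.0
    for x in samples:
        n += 1
        sum_sq += x * x                      # total energy
        # Two cascaded exponential averagers form the second-order envelope.
        p = g * p + (1.0 - g) * abs(x)
        q = g * q + (1.0 - g) * p
        for j, c in enumerate(thresholds):
            if q >= c:
                counts[j] += 1               # envelope exceeds threshold
                since_exceeded[j] = 0
            elif since_exceeded[j] < hang_samples:
                counts[j] += 1               # still within the hangover
                since_exceeded[j] += 1
    return sum_sq, n, thresholds, counts

# Half a second of constant signal followed by one second of silence at 8kHz.
sum_sq, n, thresholds, counts = p56_accumulate([0.25] * 4000 + [0.0] * 8000, 8000)
print(n, sum_sq, counts)
```

The counts for the lower thresholds include roughly 200ms of the silence that follows the burst, which is the hangover at work.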

When the sampling process is terminated, the program calculates the long term mean level from the sum of the squared samples and the number of samples. For each threshold, a candidate active level is calculated from the same sum of squares and the count accumulated for that threshold. The active level is then determined by an iterative process such that it is 15.9dB above the threshold from which it is derived. Finally, the activity of the signal, the percentage of time for which the envelope exceeds that threshold (including the hangover periods), is calculated.
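
The final calculation might be sketched as below. It is an illustration under stated assumptions: the inputs are the sum of squares, the sample count, and the threshold values and counts gathered during sampling; levels are in dB relative to a full-scale amplitude of 1.0; and a single linear interpolation between adjacent thresholds stands in for the iterative process. The function name is hypothetical.

```python
import math

MARGIN_DB = 15.9  # required gap between active level and its threshold

def p56_levels(sum_sq, n, thresholds, counts, margin_db=MARGIN_DB):
    """Return (long term level dB, active level dB, activity per cent)."""
    if n == 0 or sum_sq <= 0.0:
        raise ValueError("no signal energy to measure")
    long_term_db = 10.0 * math.log10(sum_sq / n)
    prev = None
    # Work upwards from the lowest threshold.
    for c, a in sorted(zip(thresholds, counts)):
        if a == 0:
            break                                  # envelope never got this high
        a_db = 10.0 * math.log10(sum_sq / a)       # candidate active level
        c_db = 20.0 * math.log10(c)                # threshold level
        diff = a_db - c_db - margin_db
        if diff <= 0.0:
            if prev is None:
                active_db = a_db
            else:
                # Interpolate to the point where the gap equals the margin.
                prev_a_db, prev_diff = prev
                frac = prev_diff / (prev_diff - diff)
                active_db = prev_a_db + frac * (a_db - prev_a_db)
            activity = 100.0 * 10.0 ** ((long_term_db - active_db) / 10.0)
            return long_term_db, active_db, activity
        prev = (a_db, diff)
    return long_term_db, None, None                # margin never reached

# A signal of amplitude 0.25 that is active for half of 16000 samples.
thresholds = [2.0 ** -(j + 1) for j in range(15)]
counts = [0, 0] + [8000] * 13
print(p56_levels(500.0, 16000, thresholds, counts))
```

In this example the active level comes out about 3dB above the long term level, because the same energy is attributed to half the time.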