Surround Sound Broadcasting
(first published in “International Broadcast Engineer” magazine)

Introduction

At a very early stage in the development of stereo recording, it was recognised that the stereo signal contained additional information that could be used to create an impression of "depth". A simple 90 degree crossed pair of "figure of eight" microphones, in those days typically ribbon microphones, captures sounds coming from all directions and produces a normal stereo image for sources in front of the microphones. Sounds coming from the rear are also placed in the stereo image, though sources from rear left will be placed at the right and vice versa!

Such a microphone arrangement also picks up sounds coming from the left and right but such sources produce an "out of phase" stereo image and this could be a major part of the "difference" signal. Although in no way recreating the original sound field, audio enthusiasts realised that this "out of phase" information could be retrieved and used to provide extra "depth" to a stereo image. The simplest way to do this was to place a loudspeaker behind the listener and wire it between the left and right "hi" (red) output terminals of the amplifier! The results were of course variable as the listener normally had no idea what microphone technique had been used to make the record - then the only source of domestic stereo! Results could sometimes be improved by using two loudspeakers at the rear, wired to be either in or out of phase with each other. Whilst this was really just an excuse for idle experimentation, it did illustrate the important principle that two channel stereo contained additional information that could usefully be fed to other loudspeakers.
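The behaviour of the crossed pair described above can be checked with a little arithmetic. The following is only a minimal sketch, assuming ideal "figure of eight" microphones whose output follows the cosine of the angle between the source and the microphone axis, with the two axes at 45 degrees either side of front. The function name and the angle convention (0 = front, positive = towards the left) are illustrative choices, not taken from the article.

```python
import math

def crossed_pair(angle_deg):
    """Return (left, right) outputs of an ideal 90 degree crossed pair.

    angle_deg is the source direction: 0 = front, positive = towards the left.
    Each microphone is modelled as a pure cosine (figure of eight) pattern.
    """
    theta = math.radians(angle_deg)
    left = math.cos(theta - math.radians(45))   # left microphone aimed 45 degrees left
    right = math.cos(theta + math.radians(45))  # right microphone aimed 45 degrees right
    return left, right

for name, angle in [("front centre", 0), ("front left", 45),
                    ("left side", 90), ("rear left", 135)]:
    left, right = crossed_pair(angle)
    print(f"{name:12s} L={left:+.3f} R={right:+.3f} "
          f"sum={left + right:+.3f} difference={left - right:+.3f}")
```

Under these assumptions a source at rear left appears almost entirely in the right channel, and a source directly to the side gives equal and opposite outputs, so it vanishes from the sum and appears only in the difference. That difference is what a loudspeaker wired between the two "hi" terminals receives.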

Commercially made domestic "quadraphonic" sound systems arrived in the late 1960s but the formats were unsuitable for broadcast either because they relied on high frequency carriers outside the broadcast bandwidth or because they used matrix technologies that involved phase shifting signals. Phase shifts inevitably caused worries about the mono compatibility of the broadcast signal.

They were also commercial failures for a variety of reasons and the general antipathy they generated for surround sound allowed broadcasters to largely ignore it. They certainly adopted Ambisonic technology and the Soundfield microphone as a production tool but invariably reduced the "B format" signals to conventional left/right stereo for transmission.

For years before, the cinema had been more adventurous, always looking for new effects to excite the audience. Compatibility with other systems was not an issue so new picture formats could be devised and new multi-channel sound formats to go with them. Cinemascope used its four tracks in a LEFT/CENTRE/RIGHT/SURROUND fashion and Todd-AO sometimes used one or more of its six sound tracks for "effects" to be replayed at the rear or to the sides of the audience. Cinemascope was used for many major features including Walt Disney's "Fantasia" as well as releases from MGM, Fox and later Warner Brothers. As an aside, it is worth noting that Paramount developed the "VistaVision" system which, in contrast to Cinemascope, used horizontally running 35mm film, and this had a four channel sound system called "Perspecta". However, the film had only a single mono sound track, which included control tones that steered the image by adjusting the gains of separate L, C, R and S replay amplifiers. "White Christmas" (1954) was a well known film using this system.

The Perspecta audio image was obviously far less reliable than that from Cinemascope but had a big commercial advantage because the single mono optical sound track could be replayed in virtually any cinema. Todd-AO and Cinemascope both required multitrack magnetic heads on the projector and their release prints were expensive to produce. These costs meant that only a limited number of cinemas could justify the replay equipment, and this caused multi-channel sound to all but die out. Optical soundtracks could provide a very adequate system at much lower cost.

Dolby Stereo

During the 1970s, Dolby noise reduction was being used in its multi band "A" version for professional sound recording. The simpler sliding band "B" version also gained wide acceptance domestically, mainly on cassette recorders. Both these systems allowed record/replay chains with much less objectionable background noise and it was a logical extension to use Dolby noise reduction to improve the performance of film optical soundtracks. The improvement was so great that it made possible a reduction in the width of the optical track whilst still allowing less noise to be heard. This width reduction made space for two separate tracks to be recorded in the space previously used by just one.

Placing the two optical tracks side by side also meant that older mono projectors could read the double track as though it were a single mono one and provide an output that was the sum of the two, albeit without the proper noise reduction decoding. However, conversion to Dolby Stereo was a relatively low cost process and avoided the need for film prints with mono optical tracks to be distributed as well as giving vastly improved soundtrack quality.

The usefulness of a two channel stereo system is limited in cinemas because of the large size of screens and the wide variety of viewing angles that occur from different parts of the auditorium. A simple two channel stereo system needs listeners to be close to having an equal distance and angle between their normal line of view and the left and the right loudspeakers. In a typical cinema, this can only happen for a limited number of the audience. Those sitting elsewhere hear an image coming predominantly from the speaker to which they are nearest.

The Dolby Stereo system therefore includes a central loudspeaker located behind the screen and mixes generally have most of the dialogue routed through it. This helps to create a stable central image for all of the audience. In creating a new cinema standard, it also seemed appropriate that the "effects" (rear) channel concept should be included. The early multi-channel magnetic sound systems had already pioneered the concept of placing speakers to the sides and rear of the audience so the "rear/effects" channel was designated as "surround".

There was therefore a total of four sound channels to be placed on the film. To encode the left, right, centre and surround (effects) signals, Dolby adapted the matrix technology abandoned by the "Quadraphonic" systems of a decade earlier and used the two film tracks to record the matrix output.

Replay systems often have additional speakers located behind the screen or elsewhere for low frequency "sub bass" duties though the input to these is derived from the two optical tracks. This has therefore been the basis for most cinema film production and theatres for the last 20 years. There still remain some cinemas with facilities for discrete multi-channel sound tracks but only a limited number of high budget films get released in formats to exploit these.

This design is now the one most commonly found, though the number of surround speakers varies depending on the size of the cinema. The established Dolby Stereo system of course uses only one signal to feed all the surround speakers.

Dolby Surround

Some films get the benefit of being re-mixed to produce specific sound tracks for home video and for television broadcast. A great many more end up being distributed to the home with only the Dolby Stereo soundtrack that was intended for replay through the decode matrix installed in cinemas. These two channels from the film are therefore often presented to home viewers, whenever they have a stereo VCR or stereo TV sound system.

The encoder used for cinema encoding is the Dolby MP Matrix encoder (MP = Motion Picture). It encodes the "surround" signal by mixing it at a level of -3 dB into the left and right outputs of the matrix to ensure a "constant power" arrangement. A phase shift of +90 degrees is applied to the surround component sent to the left output and -90 degrees to the right. As with any stereo transmission path, differential level and phase errors cause the image to be degraded and phase errors tend to be worse at the extremes of the frequency band. These can arise both in the transmission system and on domestic video recorders, particularly with analogue tracks. When matrix encoding is used to add additional information as in the Dolby MP matrix, these phase errors cause crosstalk into and from the centre and rear channels. Sibilance can be a particular problem, as it contains high frequency components and if these break into the surround channel, the stability of the sound image for the "on screen" dialogue can easily be damaged. The surround signal is therefore band limited to 100 Hz to 7 kHz and has a modified form of Dolby B noise reduction applied.
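The surround encoding described above can be illustrated with complex numbers, where multiplying by j represents a +90 degree phase shift at a single frequency. This is a rough sketch rather than Dolby's implementation: the function name is hypothetical, the -3 dB gain is taken as 1/sqrt(2), the centre channel is assumed to be mixed equally into both outputs at the same -3 dB level (an assumption not stated in the text), and the band limiting and noise reduction are left out.

```python
import math

G = math.sqrt(0.5)  # -3 dB expressed as a linear gain (about 0.707)

def mp_encode(left, right, centre, surround):
    """Sketch of a 4:2 matrix encode for one frequency, using complex phasors.

    Assumptions: left and right pass straight through, centre is added to both
    outputs at -3 dB, and surround is added at -3 dB with +90 degrees on the
    left output and -90 degrees on the right output.
    """
    lt = left + G * centre + 1j * G * surround
    rt = right + G * centre - 1j * G * surround
    return lt, rt

# A signal present only in the surround input
lt, rt = mp_encode(0, 0, 0, 1)
print("Lt =", lt, " Rt =", rt)
print("total power =", abs(lt) ** 2 + abs(rt) ** 2)
print("sum =", lt + rt, " difference =", lt - rt)
```

With only the surround input driven, the two outputs are equal in level and 180 degrees apart, the total power matches that of the input, and the signal cancels in the sum while appearing in the difference.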

Despite the subtleties employed in the MP matrix, the end result is that the surround signal ends up as a difference signal and it was not long before audio hobbyists once again reconnected their rear speakers to recover an approximation to it! The results of such inelegant decoding are variable at best and Dolby decoders including a rear channel amplifier, band pass filtering and the appropriate Dolby noise reduction system were soon on the market. The centre speaker is not always needed for a few viewers around a small domestic TV screen so the full complexity and cost of the cinema decoder was avoided. Whilst worthwhile surround sound reproduction is obtained, it is clearly different from what would have been heard in the cinema so the term "Dolby Surround" was devised as a convenient label.

Whilst simple matrix decoding of the Dolby Stereo signal to Dolby Surround produces an effective result, there are obvious limitations to the separation between the various channels. For example if a simple matrix is used, a source positioned front left will produce as much output in the difference channel as on the front left. Similarly, sources intended to be positioned totally in the surround will also produce an out of phase output in the front channels. The Dolby Surround - Pro Logic system provides enhanced separation between the channels by using some sophisticated techniques.
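The limited separation of a simple matrix can be seen from a few lines of arithmetic. The sketch below assumes the most basic passive decode, in which the front channels are the two transmitted signals unchanged, the centre is their sum and the surround is their difference. The function name and the unscaled sum and difference are illustrative assumptions, and the 90 degree phase shifts are ignored because only the relative polarity of the two signals matters here.

```python
def passive_decode(lt, rt):
    """Simple fixed matrix decode of a two channel signal (illustrative only)."""
    return {
        "left": lt,
        "right": rt,
        "centre": lt + rt,      # sum signal
        "surround": lt - rt,    # difference signal
    }

# A source panned fully front left: present in Lt only
print(passive_decode(1.0, 0.0))

# A source intended only for the surround: equal level, opposite polarity
print(passive_decode(0.5, -0.5))
```

In the first case the surround output is as large as the front left output. In the second the surround source also appears, out of phase, in both front channels.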

Pro Logic

A Dolby Surround Pro Logic decoder uses the same principles as its earlier cousin but applies some techniques to "steer" the apparent direction of sources. The outputs of the basic decoder could be compared to determine what should be the real direction of that signal. For example something present equally on the left and right outputs could be determined as belonging only on the centre channel and the gain of the left and right output therefore reduced. However, real world images contain many sources at different positions so a more sophisticated approach is needed. As with their noise reduction systems, Dolby again use psycho-acoustic principles. In this case they exploit two particular failings of the ear. One is that it is not good at localising two separate sound sources of similar level. Another is that when there is one dominant sound source, any change in position for other sources will generally go undetected.

The main complexity in a Dolby Pro Logic decoder is with the systems that determine the dominant component of a mix. In particular, it is important to be able to identify the differences between signal sources not as simple difference signals but as ratios. In many decoders DSP techniques can be used to achieve this.
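One way to picture "differences as ratios" is to compare signal levels in decibels, so that the result depends on direction and not on how loud the programme is. The following is only a sketch of that idea and not the circuit or algorithm used in any Dolby decoder. The function names, the block based RMS measurement and the choice of the two axes (left against right, sum against difference) are all assumptions made for illustration.

```python
import math

def level_db(samples):
    """RMS level of a block of samples in dB, floored to avoid log of zero."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10 * math.log10(max(mean_square, 1e-12))

def dominance(lt, rt):
    """Return (left/right ratio, centre/surround ratio) in dB for one block."""
    total = [a + b for a, b in zip(lt, rt)]   # centre-like component
    diff = [a - b for a, b in zip(lt, rt)]    # surround-like component
    return level_db(lt) - level_db(rt), level_db(total) - level_db(diff)

tone = [math.sin(2 * math.pi * k / 32) for k in range(64)]

# A source panned towards the left: right channel at half the amplitude
loud = dominance(tone, [0.5 * s for s in tone])
quiet = dominance([0.1 * s for s in tone], [0.05 * s for s in tone])
print(loud)
print(quiet)
```

Because both results are ratios, the quiet block gives the same pair of values as the loud one, which is the property a steering circuit needs if it is to respond to direction alone.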

Once the dominant signal has been identified, the decode matrix parameters can be adjusted to ensure that source appears to come from that direction. Particularly when digital signal processing is used, separations can be improved from the single figure performance of the simple matrix to over 35 dB between adjacent channels. Between "opposite" channels, the crosstalk is limited only by the layout of the analogue circuitry!

Dolby AC-3

Although DSP techniques are used in many Pro Logic decoders, all the systems mentioned so far are designed for analogue audio systems. The Dolby Digital format found on cinema film releases uses a bit rate reduction technique to reduce the data volume to 320 kbit/sec to allow it to be placed in an optical digital audio track that is located between the sprocket holes on 35 mm film as shown on the diagram of a frame or two of film.

Like other "perceptual coding systems", AC-3 codes only information that makes an audible contribution to the sound, ignoring low level elements that are close in frequency to higher level components. This takes advantage of the inherent masking that occurs in the ear/brain system. The lost information can be considered as "noise" but by dividing the audio spectrum into sufficiently narrow bands, the coding "noise" is masked by the dominant programme signal in that band. When the signal in a particular band is low, so is the coding noise!

AC-3 coding allows anything from a single mono channel up to more than five channels to be coded into a single bit stream with 20 bit resolution. The "3" part of AC-3 is simply a development reference number for their "Audio Code"!

AC-3 coding is capable of carrying five full bandwidth channels (20 kHz) and one narrow band channel as well as control and data identifying the channel structure being used. The narrow band channel has a low pass filter that is 3 dB down at 120 Hz and is used for the "sub-bass" channel. The overall band width of this channel is around 10% of the other audio channels so the term 5.1 channel is applied to the overall system using these six channels of audio. The five full bandwidth channels are then used to carry discrete left, right, centre, left surround and right surround signals.

AC-3 coders can be used with fewer than 5.1 channels of audio and one format currently in use is sometimes described as 2/0. The reduced volume of data also means that data rates of typically 192 kbit/sec can be sufficient for two channel signals instead of the 384 kbit/sec more commonly used for full 5.1 systems. The two channel version also provides a simple conceptual step from the two channel stereo with which users are familiar. These two digital channels can be used to carry Dolby MP matrix encoded signals which are then replayed through conventional Pro Logic decoders. This 2/0 format, together with the mono version and also the full 5.1 have all been grouped together and given the generic common label of "Dolby Digital".
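To put these data rates in context, they can be compared with the rate that the same channels would need as uncompressed linear PCM. The figures below are a rough sketch only: the 20 bit resolution comes from the text above, but the 48 kHz sampling rate is an assumption, the narrow band sub-bass channel is ignored, and the function names are hypothetical.

```python
def pcm_kbit_per_sec(channels, sample_rate_hz=48_000, bits_per_sample=20):
    """Uncompressed linear PCM data rate in kbit/sec (48 kHz is an assumption)."""
    return channels * sample_rate_hz * bits_per_sample / 1000

def reduction_ratio(channels, coded_kbit_per_sec):
    """How many times smaller the coded stream is than the PCM equivalent."""
    return pcm_kbit_per_sec(channels) / coded_kbit_per_sec

print(reduction_ratio(2, 192))  # two channel (2/0) at 192 kbit/sec -> 10.0
print(reduction_ratio(5, 384))  # five full bandwidth channels at 384 kbit/sec -> 12.5
```

On these assumptions the coded streams are roughly a tenth or less of the size of the equivalent PCM.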

These various channel formats exist on various media with 35 mm film typically carrying the full 5.1 channels as well as having the two Dolby Stereo tracks in the conventional place on the film. 5.1 channel audio is also found on all NTSC DVD releases, and some European versions though both also have the option of other formats including MPEG-1 or MPEG-2 two channel or multi channel together with Linear PCM audio. The choice is determined by technical issues of what else is to fit on the disc by way of video/images and also who is sponsoring the project!

In the home, decoders are capable of handling the AC-3 bit stream in a way appropriate to the environment. In domestic installations there may be no sub-bass speaker so bass can be routed back into the other channels. Some early decoders routed this to all the other channels. Demonstrations have shown problems when powerful sub-bass signals that would make dramatic effects in the cinema are rerouted into the small speakers often used in the home for the rear/surround channels. No doubt later systems will be more sophisticated! The system includes the potential to provide the user with options as to how much dynamic range is decoded.
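The bass rerouting described above amounts to adding the sub-bass signal to a chosen set of the other channels. The sketch below is purely illustrative: the function name, the channel labels and the choice to share the signal equally among the chosen channels are all assumptions, and a real decoder would also apply filtering and level adjustments that are not shown.

```python
def redirect_bass(channels, sub_bass, targets):
    """Return a copy of channels with the sub-bass sample shared among targets.

    channels: dict of channel name -> sample value
    sub_bass: sample value of the narrow band channel
    targets:  names of the channels judged able to handle the bass
    The equal split between targets is an arbitrary illustrative choice.
    """
    share = sub_bass / len(targets)
    mixed = dict(channels)
    for name in targets:
        mixed[name] += share
    return mixed

frame = {"L": 0.1, "R": 0.1, "C": 0.2, "Ls": 0.0, "Rs": 0.0}

# Early approach: send the bass to every channel, including small surrounds
print(redirect_bass(frame, 0.8, ["L", "R", "C", "Ls", "Rs"]))

# Alternative: send it only to the front left and right speakers
print(redirect_bass(frame, 0.8, ["L", "R"]))
```

Restricting the target list to the larger front speakers keeps the sub-bass out of the small surround speakers, which is the problem the demonstrations highlighted.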

DTS 5.1

The DTS system encodes three AES/EBU two channel bit streams and by using bit rate reduction techniques, recodes the compressed data into a new bit stream that is broadly similar to an AES/EBU two channel stream. Of course, as this is now encoded data it cannot be replayed through a conventional AES/EBU to analogue convertor but must instead be replayed through a DTS decoder. There are important differences between Dolby AC-3 and DTS which are argued strongly by their promoters. However, from the point of view of broadcast production the differences are not significant.

Mixing for Discrete Digital Formats

The non discrete nature of the now traditional Dolby Stereo matrix means that it is essential to monitor the signals after they have been matrix encoded and subsequently decoded. This is a standard feature of film dubbing studios and the techniques are familiar to dubbing mixers. A normal by-product of the process is that the surround speakers often carry low level signals resulting from the matrix parameters even if nothing is actually routed into the surround input of the MP matrix from the mixing console.

The phase relationships that exist in stereo tracks of background atmospheres often cause them to "leak" into the surround speakers. It is a happy chance that such ambience can often sound entirely right spread around the sides and rear of the theatre. Indeed, it sometimes seems that a large part of the content of the surround channels arrives there through this "automatic" process, though of course what the dubbing mixer is hearing includes this "diffusion" of the ambience so it becomes a perfectly deliberate and controlled part of the mix.

When the matrix is abandoned and the channels are recorded discretely, this diffusion of ambiences into the surround no longer takes place. Some early 5.1 channel mixes contain surround channels which remain completely silent until a sound effect occurs in them. The listener's attention is then suddenly drawn to sounds coming from beside and behind him, which can be distracting. Mixing techniques are of course developing to ensure that some ambience is routed to both of the surround channels so as to preserve the natural "wrap around" that used to be provided automatically by the matrix system.

Future trends

Television tends to be the catalyst that persuades consumers to invest in new decoding systems. Once they are installed, there is no reason for them not to be used in radio broadcasting and on music CDs. Indeed, these media avoid the potential dilemmas of matching the sound image size to picture size so can exploit the surround mode to the full!

Although for cinema purposes, the technology is now available to encode more discrete channels (e.g. 7.1 with additional LC and RC channels) it seems probable that for most domestic environments, people will not be keen to install more speakers than those required by basic 5.1 systems, not only for ergonomic reasons, but because the 5.1 format is one which the film industry has adopted, for which dubbing studios have equipped themselves and for which a wide variety of consumer equipment is available.

All material is copyright PHM © 2004.
