The Covid-19 pandemic has had many impacts on all our lives. For musicians, one of those impacts was the inability to perform music in an ensemble, something that many (including myself) find one of the most enjoyable aspects of music making. Seeing as it is the 21st century, many musicians turned to digital platforms as a way to create, perform and share their music - if you are reading this then you are most likely already aware of my wife Abigail’s activities in this area. Over the past year she has digitally released 17 singles, 21 recordings of classical compositions and has organised two live-streamed concerts featuring her own music. It is the recordings of her classical works, and in particular the choral pieces, that I would like to focus on here. These are scored for between three and eight voices, and include two hymns, two anthems and a setting of a Shakespearean sonnet.
In more normal times these pieces would likely have received their debut performances during an in-person concert, probably with an array of talented musical friends and contacts roped in to sing the parts, sharing the same physical stage. However, as household mixing was not possible, we instead made recordings of the pieces by getting the singers to each record their part individually in their own homes while listening to a guide track (a process I will refer to as asynchronous recording). I then had the task of putting the parts together and moulding them into a vaguely convincing representation of a full choir. Going through this process for a number of pieces highlighted to me several aspects of performing in the same physical space that previously I had taken almost completely for granted, and that significantly elevate the performance. These include coordinating the timing of the music between performers (this involves not only singing at the same tempo, but also coordinating breaths and consonants at the starts and ends of words), tuning between and within parts, dynamics, and the elusive quality of ‘blend’ (which for me encompasses the modulation of tone, vowel sounds and phrasing to reach a unified consensus style between singers). I then found that the audio engineer’s job is to try and replicate all these effects by editing the individual parts together to produce a recording that sounds as if those singers were performing synchronously in the same room, rather than asynchronously, hundreds of miles apart and using completely different recording devices. Quite a challenge.
I will briefly describe how I attempted to mimic each of these effects in the editing process and outline some of the difficulties involved. The first is timing. This is theoretically a fairly trivial, if laborious, issue to fix. Modern audio editing software allows sections of audio to be stretched or shrunk to a desired length without changing the pitch of the sound, meaning the vowels of each singer’s part can be aligned to coincide with each other, correcting any timing inconsistencies. Seems simple enough. However, I soon learned there was an art to this – the joins in between pieces of stretched audio can only be placed in certain regions to avoid introducing audible clicks. You can also only stretch a vowel so far before the audio quality becomes too degraded, and only squash it so much before the vibrato turns into an inhuman, rapid quivering. This, coupled with the necessity of making autocratic decisions about precisely where noticeable consonants and breaths are to be placed (or else muted if impractical to move in certain parts), and always abiding by the overarching rule to make the end result sound natural and unedited, makes the aligning process rather difficult. For the most part I have been fairly pleased with the results (upon hearing the final recording of one piece, one of the performers remarked how impressive it was that all the singers had just ‘happened’ to be so in time despite recording separately) - until, that is, I remember that the several hours I invested in doing all this aligning just produced the same result as one 3-minute recording of the singers in the same physical space would have done.
Another advantage of in-person singing is the ability to tune to the other voice parts. I would argue that it’s quite a bit easier to tune yourself to other singers in the same physical space than to the sound of a disembodied piano coming through your headphones, particularly as the headphones themselves somewhat block the sound of your own voice, making it harder to correct mis-pitching. Fairly effective audio tools exist that can be used to improve the tuning of the individual parts, but they can also introduce unnatural artifacts that need to be carefully removed or hidden to get a pleasing result. These and other limitations mean the choice of where and how to use these tools requires much more artistry and skill than is often assumed.
Possibly the most magical advantage of in-person singing is the ability to blend with the rest of the ensemble - to match phrasing, vowel sounds and tone to one another, creating a homogeneous, enchanting sound. This is probably the hardest effect to replicate in the editing process. When presented with an identical piece of music, individual singers often produce quite different interpretations (even despite several years of shared experience singing in the same choir, as was often the case for these recordings). Some tricks, such as selectively EQing certain notes to better match the tone of the rest of the ensemble, or even reusing a more favourable version of a repeated phrase, can go some way to manufacturing the impression of blend between the performers. However, these tricks only go so far, and in my opinion blend is probably the weakest aspect of the choral recordings we have produced.
There are a few other, more technical issues as well, such as the effect of using different types of microphones for each voice (which can have a significant effect on the quality of sound), the space in which each recording is made (particularly whether any room reverberance is captured, which makes editing more challenging), and the noise level of each recording. Large discrepancies in these aspects of the individual recordings can have a negative effect on the ‘naturalness’ and polish of the final recording, and need to be compensated for as much as possible, introducing a further element of difficulty.
But it’s not all bad! The asynchronous recording regime enables individual control of each voice part, allowing for more flexibility in dynamics and balance. For example, it’s easy to bring out an interesting moment in an inner part that might otherwise be buried, or to adjust the balance of a phrase to better reflect the composer’s perceived intentions. Fixing moments when two parts come into unison and become overly loud is also trivial in this regime, whereas during an in-person ensemble recording the individual singers would have to be aware of and correct this themselves. To some extent the limitations of individual singers themselves can also be overcome; for example, if a phrase is too low for a singer to sing very loudly, they can be turned up relative to the other singers to preserve the overall balance. The spatial distribution of the parts can also be finely controlled, as can reverberation, allowing the sense of space in the recording to better match the intended ambience of the music.
So, to recap, the process of asynchronous recording and editing consists of combining separate recordings of individual parts into a single, complete, and (hopefully) natural-sounding representation of the piece, as if it were performed by an in-person ensemble. However, even with the most realistic-sounding of these products, a fundamental issue still exists: authenticity. Choral pieces are written with the intention that they be performed by a group of singers singing together in the same physical space at the same time. The interactions among the performers and with the space are complex and sometimes subtle – even with exactly the same singers performing in the same place, no two performances will be exactly alike. How, then, can a recording made up of parts sung separately ever authentically represent an in-person performance? It’s impossible to predict exactly how one singer would have reacted to the timing, volume and tone of an entry by another singer, or exactly how they would have jointly reached a consensus interpretation of a performance direction, without having them perform in the same room. It would be easy to argue that the creation of asynchronous recordings necessarily erodes the expression of individual singers in order to manufacture the timing, tuning and blend of an in-person performance, and thus loses something of the performers’ emotional interpretation of the piece. However, I would suggest that isn’t quite the case. By making all the tiny artistic decisions involved in the minutiae of editing the parts together, the audio engineer is adding in their own creative interpretation of the music. Therefore, while not quite the same as an in-person performance, a successful multi-tracked recording can still contain a similar level of emotional response to the music – it’s just that it stems in part from a different source.
In an ideal case, if the recordings are already fairly homogeneous and the technical limitations minimal, I believe the artistry of the composer, performers and editor can all be authentically represented, actually increasing the absolute level of authenticity and emotion in the final product. Whether or not any of the recordings we’ve released so far come close to this ideal, I’ll leave up to you to decide!