Sensory Mediation: Searching for Equality in Audiovisual Composition

Published on Oct 06, 2020

Bryan Dunphy, Goldsmiths, University of London.

This paper looks at the practice of generative abstract audiovisual composition and identifies some areas of interest for the audiovisual composer. Abstract audiovisual composition is the act of creating a work of art that explores the interaction between non-representational visual and sonic stimuli. This form of composition has no codified compositional method or extensive vocabulary; however, there is much exciting work being done internationally to address this problem. As such, the aim of this paper is to contribute towards the evolution of an audiovisual vocabulary that will act as the foundation for further artistic practice.


This paper proposes concepts and approaches to audiovisual composition that are unique to the discipline. In doing this, I hope to contribute to the growing body of literature and research that currently surrounds audiovisual art. My motivation for exploring the theory of audiovisual composition arises out of my personal artistic practice and experience. When approaching an audiovisual piece, what artistic principles guide the audiovisual composer? Are there any common principles existent in, or suggested by, contemporary literature and work that can be identified to aid the audiovisual composer in their practice?

The main focus of an audiovisual work is the interaction between audio and visuals. This is what makes the art form unique and is where meaning is found and created within the practice. In this paper, I will use the term audiovisual composition to refer to the temporal arrangement of audio and abstract visual elements. When I say abstract visuals, I mean that they are entirely non-representational (see Figure 1) and are either animated or computer generated. I use the term generative to describe my own work in this paper. Jon McCormack and Alan Dorin define generative art as “art that uses some form of generative process in its realization.”1 This definition is rightfully open to generative forms of practice that don’t necessarily involve a computer. Bret Battey acknowledges that the “term ‘generative art’ is subject to numerous definitions.”2 He then specifically contextualises his own usage as “another term for art (visual, music or other) that involves an artist coding and manipulating algorithms as part of his or her process” (see Figure 1).3 When I talk about generative audiovisual art, I am speaking in these terms.

Figure 1. Stills of abstract visuals from my piece Ventriloquy I (2018)

Mick Grierson describes his audiovisual art practice as a “metadiscipline” that combines several artistic fields of practice.4 I find this conceptualisation useful. However, it also implies the idea of a meta-framework for composition. Treating audiovisual composition as a distinct discipline establishes a need for a vocabulary and framework of its own. This is not to deny the influence of musical theory and more traditional musical structures, or to sever any relationship with methods common to fields such as film or the visual arts. Rather, this approach attempts to provide meaning associated with the relationships between audio and visuals. In this way, it can complement the established musical and visual art frameworks that artists are already familiar with, whilst at the same time contributing to the emergence of an artistic vocabulary specific to audiovisual art.

Audiovisual composition

Audiovisual approaches throughout the centuries have taken several forms. A particularly popular approach, synonymous with the colour organ tradition, was to map the notes of musical scales to particular hues of colour. Examples of this approach include the instruments created by Louis-Bertrand Castel (see Figure 2) and Alexander Wallace Rimington in 1734 and 1893 respectively.5
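The note-to-hue idea behind the colour organ tradition can be sketched in a few lines. This is a minimal illustration only: the even spacing of the twelve pitch classes around the hue wheel, the function name `note_to_rgb` and the use of MIDI note numbers are all assumptions made for the example, and do not reproduce the specific correspondences chosen by Castel, Rimington or any other instrument builder.

```python
import colorsys
from typing import Tuple


def note_to_rgb(midi_note: int) -> Tuple[int, int, int]:
    """Map a MIDI note number to an RGB colour by its pitch class.

    Hypothetical mapping: the 12 pitch classes are spread evenly around
    the hue wheel, so notes an octave apart share the same hue.
    """
    pitch_class = midi_note % 12
    hue = pitch_class / 12.0  # 0.0 is red, then onwards around the wheel
    r, g, b = colorsys.hsv_to_rgb(hue, 1.0, 1.0)
    return (round(r * 255), round(g * 255), round(b * 255))


# Middle C (60) and the C an octave above (72) receive the same colour.
middle_c = note_to_rgb(60)
high_c = note_to_rgb(72)
```

Any such table is a design decision rather than a discovery, which is why mappings of this kind tend to differ from one artist to the next.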

Figure 2. Charles Germain de Saint Aubin, “Caricature of Louis Bertrand Castel’s Ocular Organ,” in “SEEING/ SOUNDING/ SENSING: CAST SYMPOSIUM IN CONTEXT,” article by Sharon Lacey, posted 19 September, 2014,

A more recent example is the virtual colour organ created by Jack Ox that utilises complex colour and textural mapping strategies to visualise musical compositions.6 This approach can produce rich and complex artistic expressions. The mapping of particular tones to hues of colour has been attributed by some to artists’ experiences of synesthesia. Richard Cytowic defines synesthesia as “the involuntary physical experience of cross-modal association. That is, the stimulation of one sensory modality reliably causes a perception in one or more different senses.”7 As noted by Mitchell Whitelaw, neurological synesthesia is rare: “Auditory-to-visual synesthesia, ‘colored hearing’ is much rarer still.”8 Whitelaw further notes that Wassily “Kandinsky and composer Alexander Scriabin seem to have experienced it, while many other artists have been inspired by, or in some cases literally borrowed, synesthetic correspondences.”9 Whilst the concept of synesthesia has inspired many artists to combine audio and visual material in rich and interesting ways, each mapping strategy tends to be subjective and can only be utilised effectively by the individual artist. Whitelaw acknowledges that “when it comes to practically manifesting that sensory relation it founders on the problem of the map, the pattern of correspondences.”10 Whilst synesthetic associations between tone and colour are extremely rare and vary from person to person, studies have been conducted that show there are certain universal, non-arbitrary connections between sight and sound. These are grouped together under the term cross-modal correspondences.11 Charles Spence defines the term as “a compatibility effect between attributes or dimensions of a stimulus (i.e. an object or event) in different sensory modalities.”12 A famous example of an audiovisual cross-modal correspondence is the “bouba/kiki” effect (see Figure 3) where the vast majority of people will match the word kiki with an angular figure and bouba with a rounded figure.13

Figure 3. Kiki and Bouba. By Andrew Dunn (2004). (accessed 30/07/2020).

John Whitney Sr. is responsible for developing an approach to the composition of audiovisual material using his theory of differential dynamics.14 His compositions Matrix III (1972) and Arabesque (1975) were composed according to this approach.15 Whitney was motivated by a “search for a coherent idea of abstract composition inspired by the rules of Pythagorean harmonics.”16 Utilising these ratios, he was able to create visual animations imbued with musical movement. His ultimate goal was to build a digital instrument where he could simultaneously generate audio and visuals based on his theory. During the 1980s, he collaborated with Jerry Reed to create this instrument which he called RDTD.17 It was his belief that “sound and image composed on the same digital instrument will have totally revolutionary consequences.”18
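One way to get a feel for how whole-number ratios can produce visual "harmony" is the toy model below. It is a sketch under stated assumptions, not a reconstruction of Whitney's films or of the RDTD instrument: it assumes a set of points on a circle in which point k completes k rotations per unit of time, and the function names `phases` and `distinct_positions` are hypothetical. Exact fractions are used so that coinciding positions compare as equal.

```python
from fractions import Fraction
from typing import List


def phases(n_points: int, t: Fraction) -> List[Fraction]:
    """Phase (fraction of a full turn) of each point at time t.

    Assumption: point k (k = 1..n_points) rotates k times per unit of
    time, so the speeds stand in the whole-number ratios 1:2:3:...
    """
    return [(k * t) % 1 for k in range(1, n_points + 1)]


def distinct_positions(n_points: int, t: Fraction) -> int:
    """Count how many different positions the points occupy at time t."""
    return len(set(phases(n_points, t)))


# At simple fractions of the cycle the points gather into a few groups;
# at other times they are scattered across as many positions as points.
groups_at_half = distinct_positions(12, Fraction(1, 2))
groups_at_third = distinct_positions(12, Fraction(1, 3))
groups_elsewhere = distinct_positions(12, Fraction(1, 100))
```

In this model the moments of visual order fall at times related by simple ratios, which is loosely analogous to the way consonant intervals correspond to simple frequency ratios.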

Bill Alves recalls that he was “privileged to work with the computer animation pioneer John Whitney Sr. and was profoundly influenced by his ideas on how to apply musical concepts of harmony to visual arts of motion.”19 Adriano Abbado identifies Whitney’s work as being responsible for leaving “indelible marks on the history of visual music,” an area of audiovisual practice with an established history that includes the abstract films of Oskar Fischinger, Mary Ellen Bute and Norman McLaren, to name only a few.20 In particular, Jaap Guldemond, Marente Bloemheuvel and Cindy Keefer identify Fischinger as a pioneer of the form, stating that he “paved the way for an art form that came to be known as Visual Music.”21 Abbado recognises the difficulty in strictly defining the practice when he says that there “are different notions of what constitutes visual music.”22 One definition comes from Maura McDonnell, who states that a “visual music piece uses a visual art medium in a way that is more analogous to that of music composition or performance. Visual elements (via craft, artistic intention, mechanical means or software) are composed and presented with aesthetic strategies and procedures similar to those employed in the composing or performance of music.”23

One of the reasons Whitney’s approach has proven so popular is because it is not dependent on his own subjectivity. Instead, it is grounded in mathematical ratios that provide universal meaning due to our lived experience of tonal harmony. However, within the context of contemporary generative work, it is necessary to also account for inharmonic sounds and movement and how they might be compositionally arranged.

As an audiovisual practice that is inherently influenced by multiple fields, it is perhaps unsurprising that film theory has proven a fertile ground for audiovisual practitioners to draw inspiration from. Michel Chion’s writing on sound in cinema has been especially influential over the last two decades.24 His concepts have informed several approaches to audiovisual composition including those by Grierson, Adam Basanta, John Coulter and Myriam Boucher and Jean Piché.25 Whilst he is not the first person to recognise that the combination of audio and visual media produces an effect that can be described as more than the sum of its parts, Chion’s coinage of the term “added-value” to describe this phenomenon has remained in use.26 In fact, Grierson describes his own practice of audiovisual composition as the “process of composing audiovisual works which exploit added-value.”27 In doing this, he borrows Chion’s terminology and re-contextualises it as a central concern of his abstract practice. The identification of a core compositional aim such as this productively contributes to the development of an audiovisual vocabulary. Some abstract audiovisual practitioners will not be aware of Chion’s terminology, as it is situated first and foremost in the field of film theory. However, the re-contextualisation of his ideas demonstrates the permeable boundaries between many audiovisual practices. The development of a compositional vocabulary like this, that is specific to abstract audiovisual art, can provide direction and meaning for artists. Their work can then be discussed on its own terms rather than relying solely on musical or visual based frames of reference.

Chion is also responsible for identifying a particular case of added-value which he calls “synchresis.” He defines synchresis as “the spontaneous and irresistible weld produced between a particular auditory phenomenon and visual phenomenon when they occur at the same time.”28 Boucher and Piché use Chion’s concept of synchresis quite successfully within their Sound/Image Relationship Typology. Here they specify compositional uses of synchresis, diegesis, time and narration within a vidéomusique context. Diegesis here “refers to the believability of the link between the artificial or real sound and the image it is purported to issue from.”29 Their identification of synchresis as a defining principle of their work cements the concept as a primary concern of the vidéomusique artist. The concepts of synchresis and added-value are important to artists working across several areas of audiovisual art. They are also relevant to the ideas I am discussing in this paper, which have arisen out of a generative abstract practice. 

Audiovisual Equilibrium

A recurring sentiment within audiovisual discourse is the desire to give equal importance to the composition of both the audio and visual elements of a piece. The frequency of this concept in scholarship on the subject, as well as interviews with audiovisual artists, indicates the importance of the issue for practitioners within this discipline. Cornelia and Holger Lund consider the balance between sound and image a fundamental concern of their definition of visual music where they state that the basic objective of the practice is to achieve “evenly balanced or equilibrated interplay between visual and acoustic components.”30 Aimee Mollaghan states that John and James Whitney regarded the equality between audio and visual elements in their Five Film Exercises (1943-1944) as a core direction for original audiovisual compositions. She states that they “were adamant that their films should be original audiovisual compositions in which the sound and image shared an equal partnership.”31 However, even if this was their intention, their approach to the Five Film Exercises was not always interpreted that way. Andrew Birtwistle sees the Whitney Brothers’ optical process as a primarily visual practice where the sonic elements are in constant subjugation:

With the Film Exercises the sonic is allowed into the Whitney project, but on condition that it is shaped, moulded, driven and curtailed by the visual. The film certainly proposes an audio-visual synthesis, but one in which the sonic is absorbed by the primary term of the audio-visual contract, which despite word order remains the visual.32

Birtwistle’s analysis of the Whitney Brothers’ process illustrates the difficulty for the audiovisual composer in attempting to treat audio and visual material equally. Diego Garro acknowledged the general primacy of sight over sound in human perception. He argued that upon experiencing an audiovisual work, the audience must be conscious of the fact that audiovisual artists “hold the primacy of both ear and eye together as their artistic credo.”33 Ryo Ikeshiro’s live audiovisual pieces, Construction in Zhuangzi (2011) and Construction in Kneading (2013) are based on the simultaneous creation of audio and visual elements from a common source of data. He calls this approach “audiovisualisation,” and states that it ensures a non-hierarchical sensory structure in his pieces as “the moving image is no longer a score for performers but intended to be experienced in tandem with the sound.”34 Holly Rogers describes the simultaneous experience of sound and image as a “holistic form of engagement,” citing Norman McLaren’s Synchromy (1971) as an example in which he “achieves a single audiovisual voice where neither sound nor image can successfully be extricated from the other.”35 This evidence points to the desirability and importance of equality between the audio and visual material within an audiovisual composition. It indicates that audiovisual equilibrium, as I am calling it here, is a worthwhile avenue of conceptual development within audiovisual art. 

At its core, the principle of audiovisual equilibrium establishes the hierarchical treatment of audio and visual media as an important force responsible for the experience of tension, release and added-value across the duration of a composition. I define the principle as follows: 

in an audiovisual composition, the material should be generated in such a way as to eliminate or minimise any absolute hierarchy between the audio and visuals. Further, the composition of the generated material should be sensitive to the perception of the audioviewer such that for the majority of a piece neither sense dominates the other.

This definition implies two areas where equilibrium within an audiovisual composition should be considered. Firstly, the system or process used to generate and map the material should be designed to ensure there is no absolute hierarchy of material. Secondly, the arrangement of the material itself should make use of methods that allow the audience to experience added-value arising from the interaction of audio and visuals.

In order to further illuminate the concept of audiovisual equilibrium I will briefly consider an example outside of the field. Mainstream documentary filmmaking is an example of popular media that adheres to audiovisual equilibrium in a certain way. Similar to Ikeshiro’s “audiovisualisations,” the documentary film-maker utilises the central narrative as their source of data from which they build the spoken audio and visual elements.36 Narration, interview questions and visual footage are composed meticulously to communicate the narrative to the audience in such a way that they will become emotionally involved. Audio and visual elements here play an equal role in communicating the story that the director wants to tell.

Yet, some audiovisual works may have moments where one modality dominates the other. During these moments, the potential for added-value is minimised as the perception of the audience is dominated by one side over the other. In order to maximise the potential for added-value experiences, an audiovisual composition should attempt to be as modally balanced as possible. The potential for added-value experiences to occur is increased during a modally balanced section, as the audience can perceive the interaction between the audio and the visual material, rather than one mode over the other.

However, it must be stated that even in the event of achieving audio visual equilibrium, an added-value experience is not guaranteed. This further depends on the nature of the interaction and the suitability of the material. The latter is closely related to the concept of isolated incoherence, which will be discussed below. Following from this, a conceptual relationship may be represented as follows:

longer periods of audiovisual equilibrium will lead to a higher potential for added-value experiences.

Mapping Hierarchy

When considering audiovisual equilibrium, the technical way in which the audio and visual material is generated, and consequently mapped, becomes an important stage in the composition. The strategy employed in the mapping of parameters must avoid introducing any absolute hierarchy in the material. By absolute hierarchy, I mean a fixed, one-directional dependence of one modality on the other; to avoid it, it would be preferable to implement several mapping techniques rather than relying on only one. The system used could allow for audio-reactive visuals, sonification of visual elements and control of material from a third data source. This is important, as the presence of an absolute hierarchy between the audio and visuals for the entire duration of the piece will ultimately restrict the control of audiovisual equilibrium within the piece.
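The difference between a single fixed mapping and a system that supports several can be made concrete with a small sketch. Everything here is hypothetical and chosen only for illustration: the function `map_frame`, the direction labels, and the reduction of each modality to a single number (an audio level and a visual size, both in the range 0 to 1).

```python
from typing import Dict


def map_frame(
    direction: str, audio_level: float, visual_size: float, control: float
) -> Dict[str, float]:
    """Return the audio level and visual size to render for one frame.

    direction selects which way the mapping runs:
      "audio_to_visual": the visuals follow the audio (audio-reactive)
      "visual_to_audio": the audio follows the visuals (sonification)
      "shared_source":   a third data source drives both modalities
    """
    if direction == "audio_to_visual":
        return {"audio_level": audio_level, "visual_size": audio_level}
    if direction == "visual_to_audio":
        return {"audio_level": visual_size, "visual_size": visual_size}
    if direction == "shared_source":
        return {"audio_level": control, "visual_size": control}
    raise ValueError(f"unknown mapping direction: {direction}")


# A piece could change direction from section to section, so that no
# single modality is the source for the entire duration.
sections = ["audio_to_visual", "shared_source", "visual_to_audio"]
frames = [map_frame(d, audio_level=0.8, visual_size=0.3, control=0.5) for d in sections]
```

A system hard-wired to only one of these branches would exhibit the absolute hierarchy described above; keeping all three available leaves the choice to the composer.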

Consider a generative audiovisual system programmed in such a way as to rely entirely on the direct mapping of audio features to the movement of visual content. A simple example of this would be any basic music visualiser, such as the one included with Windows Media Player. Here, the entire visual structure and movement is derived from the audio content, thus demonstrating absolute hierarchy. As a further example, consider a system that relies completely on sonification of visual parameters, such as Ivan Kopeček and Radek Ošlejšek’s GATE (Graphics Accessible To Everyone) framework.37 The purpose of this system is to represent the visual data as sound in order to assist visually impaired people. In this case, there would be none of the interaction between the media that would be desirable in an artistic context such as audiovisual composition. Stephen Callear states that systems such as these result in a “dominant” and “dependent” relationship between the two modalities.38 In order to retain control of audiovisual equilibrium, a system that depends entirely on a dominant/dependent relationship should be avoided. Sound is the dominant modality in the example of the Windows Media Player; the visuals are an illustration of the analysed audio signal. Systems such as this generally allow for clear connections between the audio and visual elements, which makes them successful as music visualisers. Alternatively, visual material is the dominant modality in the GATE framework. The audio is functional here, in that it provides information that visually impaired people may not be able to see. However, when talking about audiovisual composition, the ability for the composer to manipulate the audience’s experience of audiovisual equilibrium is restricted if the system only allows for mapping of parameters between modalities in a single direction. For example, the composer would not be able to influence the morphological alignment of audio and visual elements, which has been posited by Simon Katan as a possible avenue of interesting experimentation “involving the temporal separation of sonic and visual events.”39 For an audiovisual composer to restrict themselves in this way would be to restrict the expressive potential of their work and would therefore be undesirable.

Figure 4: Still from Ventriloquy I (2018).     

My own compositions Ventriloquy I (see Figure 4) and Ventriloquy II (2018) were built on top of a system using this approach.40 My aim was to create an underlying architecture that would allow me to generate audio and visual material and manipulate it simultaneously in real-time. I used a neural network to map my controller data to pre-selected audio and visual parameters. A diagram of the system is shown in Figure 5 below:     

Figure 5: Neural AV mapping system for Ventriloquy I & II.

The vast majority of the mapping was done by the neural network. I hoped to eliminate as much mapping hierarchy as I could in this way, so that the audio was not derived from the visual material and vice versa. They are both generated independently but manipulated simultaneously. In this way, I hoped to avoid the dominant and dependent relationships identified by Callear. 
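The general shape of such a mapping, in which one controller reading produces audio and visual parameters side by side, can be sketched as follows. This is not the network used in Ventriloquy I and II, whose architecture and training are not described here. The layer sizes, the number of parameters and all function names are assumptions, and the weights are random rather than trained; in practice a network of this kind would be trained on example pairs of controller positions and desired parameter values.

```python
import math
import random
from typing import List, Tuple


def make_layer(n_in: int, n_out: int, rng: random.Random) -> List[List[float]]:
    """Random weights for one dense layer; the last column is the bias."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(n_in + 1)] for _ in range(n_out)]


def forward(layer: List[List[float]], inputs: List[float]) -> List[float]:
    """Dense layer with a sigmoid activation, so outputs lie in (0, 1)."""
    extended = inputs + [1.0]  # constant input that multiplies the bias
    return [
        1.0 / (1.0 + math.exp(-sum(w * x for w, x in zip(row, extended))))
        for row in layer
    ]


def map_controller(
    controller: List[float],
    hidden: List[List[float]],
    output: List[List[float]],
    n_audio: int,
) -> Tuple[List[float], List[float]]:
    """Map one controller reading to (audio parameters, visual parameters)."""
    params = forward(output, forward(hidden, controller))
    return params[:n_audio], params[n_audio:]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
hidden = make_layer(3, 8, rng)  # 3 controller inputs (assumed)
output = make_layer(8, 5, rng)  # 2 audio + 3 visual parameters (assumed)
audio, visual = map_controller([0.2, 0.9, 0.5], hidden, output, n_audio=2)
```

The point of the design is that both parameter sets come out of the same forward pass: neither is computed from the other, so neither modality is the source of the other.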

Perceptual experience

When discussing the audience’s perception of audiovisual equilibrium, I am referring to the tendency for the audioviewer’s focus to alternate between the audio and visual material. Perfectly balanced equilibrium would indicate that the audioviewer is not consciously focussed on either modality, but instead, they allow the material to cognitively bind in their perception as an audiovisual whole. The aesthetic choices and mapping approach of the composer will affect the audioviewer’s sense of equilibrium. This is the material that the audience is presented with, as opposed to the hidden architecture of the system as discussed above. In the act of composition, the artist should be conscious of how the material will affect the perception of the audience. 

Certain cognitive characteristics affect our perception in various ways so as to automatically give primacy to either sight or sound in a given situation. Adriana Sá provides an in-depth discussion of what she calls “sensory dominance” and provides strategies for minimising the natural dominance of sight over hearing so that sound can be brought to the forefront of the audience’s perception during a live audiovisual performance.41 She identifies two situations in which the sense of sight will automatically dominate the sense of hearing. The first situation arises when there are extreme discontinuities within the visual material, whilst the second situation occurs when there is a strong correlation between the visual material and the source of the audio.42 That is, when the sound and the image conceptually match, the sound can be easily perceived as belonging to the visual object. The two scenarios identified by Sá cause the audience’s visual attention to become perceptually dominant. This in turn clouds their perception of the audio. For Sá, this is undesirable as she is primarily a musician and therefore requires the visuals to augment the audio without overpowering it.

However, in the context of this discussion, my motivation is to identify concepts specific to audiovisual composition rather than musical composition. Therefore, I feel that the conceptualisation of audiovisual equilibrium is appropriate here as it implies a balance between the senses in a non-hierarchical way. Further, this balance can be purposefully swayed by the composer to favour either the audio or the visual. Sá’s work suggests that this purposeful control is certainly possible.

The following excerpt from Ventriloquy I provides an example of how audiovisual material can slip in and out of equilibrium.43 Due to the repetitive noisy movement in the visuals and the speed of the audio oscillations, the audio and visuals move in and out of synchronisation. It is an example of what Nicholas Cook calls “ventriloquism,” when the visuals adopt rhythmic qualities from the audio.44 This is also an example of synchresis and its adherence to “the laws of gestalt psychology.”45 Ventriloquy I begins with the audio and visuals in sync, and at this point, I argue that the materials are in a state of equilibrium, or balance. As you are watching the movement and listening to the sound, the tight synchronisation appears to drift, causing the audio and visual streams to separate slightly. The balance of material becomes unsteady as you focus on trying to see if the material is in sync, at first contemplating the visual movement, then switching focus to the audio rhythm. At 0:08, the object morphs into a new shape. At this point, the audio rises in pitch and the visual movement appears to speed up. The audio and visuals combine to form a single perceptual object, therefore restoring the balance. This example relies on synchresis and audiovisual ventriloquism to highlight how audiovisual material can slip in and out of equilibrium in a temporal way. With this approach, I found it easy to perceive the shifting balance. However, audiovisual equilibrium can also be affected by other factors, such as the structural completeness of the media, and the relative complexity of the material.

Video 1. Bryan Dunphy, “Ventriloquy (excerpt 2),” n.d., accessed June 24, 2020,    

Isolated Incoherence

Isolated Incoherence is a concept that has emerged through discussion, study and personal contemplation on the nature of audiovisual work. This concept states that neither the audio nor the visuals should be structurally coherent when experienced in isolation. The relative coherency of the material in each sensory mode will have an effect on the audiovisual equilibrium of the piece. When assessing audiovisual work, the audioviewer can ask themselves:

  • Could I listen to the audio on its own and be satisfied that I have experienced a fully developed piece of music or sound art?

  • Could I watch the visuals and enjoy them as a fully developed work in their own right?

I argue that if either element can be isolated from the other and experienced as a self-sufficient work in its own right from start to finish, it weakens the overall audiovisuality of the piece. I am defining the concept of isolated incoherence as follows:

in an audiovisual work, the audio and visuals should be completely dependent upon each other, such that they are not perceived as structurally complete entities when experienced separately.

When considered in relation to added-value, this concept seems logical. If the intentional combination of audio and visual material results in an experience that can be said to be greater than the sum of its parts (i.e. added-value), then listening to, or watching each element in isolation will result in a reduced experience. Although the nature of added-value is such that the combination of an individually coherent visual work and an individually coherent musical work may well result in an added-value experience, it is my contention that the purposeful combination of individually incoherent, or ambiguous, media will be more likely to lead to a stronger audiovisual whole. 

Throughout an audiovisual work, there may be sections where the audio and visuals are individually more coherent before they return to a state of ambiguity. This will alter the sense of audiovisual equilibrium. The relationship between congruent/incongruent material and tension/release is a common theme in audiovisual theory, for instance in the writings of Grierson and Callear.46 The very idea of an “audiovisual language” based on correspondence and interaction between audio and visual elements implies a mutual dependence between the two elements.47 If this dependence is absent throughout the course of a whole piece, then it would be difficult to analyse the piece as an audiovisual composition as the term is understood here. If the audio or visual material is coherent on its own, then it could be argued that it is less likely to exhibit any dependence on additional material in any other sensory modality.

If the audio from an audiovisual composition can be listened to and understood without its visual counterpart, then why have the visual counterpart at all? In this scenario, the visuals may cause the piece to become situated in the world of music videos. Similarly, if the visual element of an audiovisual composition can be viewed and understood in isolation from the audio, it is more akin to an animation or film where the music acts as a soundtrack in support of the visuals. If we consider the example of the documentary film discussed above, we can see that this form of media adheres to the concept of isolated incoherence. If you divorce the script or visuals from each other, your comprehension of the narrative would likely be severely diminished. Further to this, the scripted, spoken audio track and the visuals are mutually dependent on each other.                

This example highlights an interesting characteristic of isolated incoherence. Abstraction frees both the audio and visual elements from representation, inherent association and narrative. It is because of this that the abstract audiovisual composer must work harder to ensure that both sensory modalities work together. With narration and footage in documentary film-making, isolated incoherence is guaranteed. With abstract audiovisual composition, this is not the case.      

A composer who is conscious of audiovisual equilibrium will be more likely to adhere to the concept of isolated incoherence, thereby ensuring a strong audiovisual expression. However, it is also possible for the visuals and the audio of a piece to be considered equally by the composer, yet still survive as independent entities upon critical analysis. It is my argument that a piece which demonstrates isolated incoherence is more likely to be perceived as a stronger audiovisual expression.

The concept of isolated incoherence is in direct contradiction to the philosophy of Jean Piché, who argued in his practice of vidéomusique that the audio and visual elements should have separate, coherent identities. When speaking about Sieves (2004), Piché explained that for him the music should be able to exist as a coherent piece on its own.48 Piché states that this is to differentiate the audio in an audiovisual composition from the soundtrack to a film. If the film soundtrack were to be severed from the film “the music loses its reason to be.”49 Interestingly, this would indicate that film soundtracks exhibit isolated incoherence. He states that the visual element should be able to work well on its own; however, when the two are combined, the audioviewer should “get the immediate impression that one cannot be without the other.”50 This seems like a contradiction in the context of my discussion here. If the audioviewer was under the impression that one element could not exist without the other, then surely in isolation, each element would seem incomplete.

Compositional Examples

In order to illustrate the concept and unpick some of the issues that arise from it, I will now discuss one of my own pieces and some contemporary work viewed through the lens of isolated incoherence. The contemporary works discussed below include This City (2015) by Mark Eats, Cyclotone III (2015) by Paul Prudence and Moon Drum (1991) by John Whitney.

Figure 6: Performance of Ventriloquy II at SIML, Goldsmiths, University of London.

In Ventriloquy II (see Figure 6), I attempted to create a sense of isolated incoherence by using harsh audio textures and ambiguous visual shapes. If you close your eyes and listen to the audio, there is often no discernible structure or progression with expected resolution. This is the advantage of the noise aesthetic in this context. The visuals alternate between agitated semi-visible shapes and chaos at different points in the performance. At 2:38, the visuals transition from the opening star-type figures to chaotic ambiguous colours before consolidating again into a semi-regular form at 3:03. During this chaotic passage, the audio and visuals lose their tight bond. This suggests that if both audio and visuals are simply chaotic noise, then there has to be a connecting tissue to keep them together. Again, we return to an issue of balance. If the whole piece were to consist of formless noise, then it would lose the interest of the audioviewer quite quickly. At 3:03 a semi-formed object returns and reasserts the audiovisual bond. However, if either of the media streams is too well formed, there is a risk that the material will dominate the other senses. The section beginning at 5:36 succumbs to this. Two solid shapes emerge from the previous section and dominate the perception. Audiovisual equilibrium here is skewed towards the visual. I try to break this by using a sharp punctuating figure at 6:46 which reasserts the position of the audio. During the composition of this piece, I found that I instinctively arranged it according to the visual forms. This would suggest that I let the visual material dominate. Unfortunately, this may be the case here, as I realised on reflection that throughout the piece, the visual palette is much richer and more complex than the audio palette. This led me to the realisation that the relative complexity of the material in each modality can affect the audiovisual equilibrium. Indeed, the various ways this balance could be realised are observable in the following three audiovisual works.

This City (2015) by Mark Eats demonstrates an example whereby the audio could exist sufficiently on its own.51 Analysing this piece within the context of isolated incoherence means that the audiovisuality of the piece is weakened. The visuals are representational and show a network of roads with cars, streetlights and traffic lights. The music is performed live by Mark Eats on a range of synths and MIDI controllers. Certain parameters of the audio are mapped to the visuals and affect them in real time. For instance, as he opens the filter on his Sub39 synth at 1:18, the cars lose gravity and float into the sky. They hang there weightless as the music builds tension underneath. An ascending scale reaches the leading note before resolving on the tonic as the cars drop back onto the road at 1:40 – a visual representation of the drop that is a staple of electronic music. This build-up and resolution of tension is crafted well in this instance. However, this is a fully formed musical piece in and of itself. It follows its own chord progression and obeys the laws of tonal harmony. The correspondences here between the music and the visuals are transparent one-to-one mappings and therefore demonstrate a clear cause and effect relationship. For example, at 2:15 the performer introduces a delay effect into the music. The visuals at this point become blurred, with the cars leaving white and red trails across the screen. Although these mappings are effective, the self-contained nature of the audio and visual material prevents me from completely accepting their absolute dependence on each other. In particular, the music commands the attention and could work just as well on its own. I would argue that this piece could successfully be described as a musical composition with supporting visuals, and it works very well in this context. However, if we were to analyse it in terms of audiovisual composition, the structural strength of the music would work against it. Admittedly, the artist is not working in an abstract context, so such a critical analysis may be unsuitable. Nonetheless, this serves as a valuable illustration of the idea of isolated incoherence.
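To make the notion of a transparent one-to-one mapping concrete, the following is a minimal sketch in Python of how one audio parameter might drive exactly one visual parameter. It is not a reconstruction of how This City was built. The function names, parameter names and numeric ranges are all hypothetical and chosen purely for illustration.

```python
def scale(value, in_lo, in_hi, out_lo, out_hi):
    """Linearly map value from [in_lo, in_hi] to [out_lo, out_hi], clamping at the ends."""
    t = (value - in_lo) / (in_hi - in_lo)
    t = max(0.0, min(1.0, t))
    return out_lo + t * (out_hi - out_lo)


def map_audio_to_visuals(filter_cutoff_hz, delay_mix):
    """One-to-one mapping: each audio parameter controls a single visual parameter.

    All names and ranges here are illustrative assumptions, not values taken
    from any existing piece.
    """
    return {
        # Opening the filter lowers gravity; fully open means weightless.
        "gravity": scale(filter_cutoff_hz, 200.0, 8000.0, 9.8, 0.0),
        # More delay in the mix produces longer trails behind moving objects.
        "trail_length": scale(delay_mix, 0.0, 1.0, 0.0, 1.0),
    }


if __name__ == "__main__":
    # Filter closed, no delay: full gravity and no trails.
    print(map_audio_to_visuals(200.0, 0.0))
    # Filter fully open, delay fully wet: weightless objects with long trails.
    print(map_audio_to_visuals(8000.0, 1.0))
```

Because each visual parameter depends on a single audio parameter, the cause and effect relationship is immediately legible to the audioviewer. A mapping of this kind makes the correspondence transparent, but on its own it does nothing to stop either stream from remaining structurally complete in isolation.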

In order for an abstract audiovisual composition to be successful, it needs to demonstrate a fundamental connection between the audio and the visuals. In relation to isolated incoherence, I argue that representational visuals lend themselves more readily to narrative, so in the absence of one they can be structurally incoherent. However, popular music that is tonal and follows a regular beat has a very strong structure, as is the case with the piece above. The music is too dominant to allow a close structural bond with the visuals. It skews the audiovisual equilibrium toward the music for the entire piece. Therefore, I feel that it is more difficult to create audiovisual compositions using tightly structured and narrative material. Using these examples from outside the context of abstract audiovisual composition reinforces the point that isolated incoherence and audiovisual equilibrium are unique problems that the audiovisual composer needs to address. This helps to refine and focus the task that the audiovisual composer is faced with when approaching their work.

Paul Prudence’s Cyclotone III (2015) provides a useful case study in a generative abstract audiovisual composition.52 The visuals are abstract monochrome shapes, mainly rectangles, arranged in various circular and spherical formations. The audio is made up of mechanical clicks and machine-like noises. In parts, the audio is very tightly synchronised to certain visual movements, creating strong synchretic audiovisual correspondences. There are also less tightly synchronised ambient sounds that align themselves metaphorically with the floating characteristics of the spheres and the smooth movement of the circular arrangements. When experiencing this piece there is a strong sense of audiovisual dependence. On closing my eyes, the audio loses a certain amount of structure and meaning, becoming more ambiguous. This suggests that the audio here would not be maintained as a coherent piece in isolation. The visuals would be more suited to isolation than the audio, as they are quite solid structurally; however, the audio certainly imbues the visuals with an ethereal yet mechanical personality that enhances and instils character in them. At the same time, the visuals lend the audio a definite structure and direction. I would argue that the isolated incoherence experienced within this piece unites the elements into a strong audiovisual entity that is experienced as more than the sum of its parts.

John Whitney’s Moon Drum (1991), the first in a series of twelve pieces inspired by Native American culture, is a work where the visuals could be argued to represent a coherent expression and the audio could be argued to represent a more incoherent expression.53 The series as a whole is a substantial addition to Whitney’s catalogue and belongs to his later output. As such, it represents the culmination of decades of work with differential dynamics. It is an important link in the evolution of visual music practice towards utilising the computer as the main creative tool, and could be seen as a direct link between contemporary generative work and the film-based visual music tradition. Whitney composed both the visuals and music for this series of works, thereby realising his dream of having a system that allowed for the audio and visuals to be composed together. Previously, he had made a conscious decision to concentrate only on his visual practice.

For the time being, I elected to put aside the musical problem as it bore along my own long-term plans while I would concentrate upon new prospects for optical differential dynamics. I would settle for whatever music I might find for each new graphic composition since my optical studies were the immediate challenge.54  

The audio in the first Moon Drum section is made up of flat MIDI drum samples and some basic synthesis, which severely limits the audio aesthetic. The visuals are rich and diverse. They could be said to form a coherent whole, but the relative flatness of the audio detracts from them, thereby weakening the overall audiovisuality of the piece. This is an interesting interaction between the audio and visual elements. The strong visual material is capable of existing on its own, but the audio material would not constitute a fully developed musical statement in isolation. According to the logic I have outlined above, it would seem that the less complete audio material would be more desirable than the more complete visual material. However, it is the audio, rather than the visuals, that seems to weaken the piece. In this situation, we see an inversion of the logic of isolated incoherence. By adhering to the principle of isolated incoherence, the composer would aim to create both audio and visuals that are incapable of existing on their own merits. However, in the case of Whitney’s Moon Drum, it is the more incomplete modality that is perceived as weakening the audiovisuality of the piece. I argue that this is actually down to another aspect of audiovisual equilibrium that I have yet to explore fully: audiovisual complexity. The audio in Moon Drum weakens the piece because it is not as rich in technical or aesthetic depth as the visuals.

Following from this, I can now add a caveat to the definition of isolated incoherence. For the concept of isolated incoherence to be fully exploited, audiovisual complexity needs to be balanced. That is, both the audio and visual material should be appropriately complex in relation to each other. If one side is less complex than the other, it will create an undesirable imbalance in the overall equilibrium of the piece. Once the development of both audio and visuals is deemed to be on an appropriately equal level in terms of aesthetic richness, the composer can then potentially manipulate the perceived coherence of each media stream.

Cook’s concept of “gaps” in a media stream is a useful way to visualise the concept of coherency within the material. He states that a media stream is gapped where there is an “implication but not realization,” or “absence of closure.”55 It is this quality that makes either the audio or visual material incoherent when analysed in isolation. The incoherent nature of the material increases the potential for interaction with audiovisual content. An autonomous, fully realised musical or visual work will leave no room for interaction with complementary material. Cook states that any example of multimedia where “one or more of the constituent media has its own closure and autonomy is likely to be characterised by contest.”56 By “contest,” Cook means “the sense in which different media are, so to speak, vying for the same terrain, each attempting to impose its own characteristics upon the other.”57 Although Cook sees this state of contest as desirable, I would argue that within the context of audiovisual composition, it is undesirable as it may prevent sufficient connection between the material. An audiovisual work characterised by contest will mean that the sense of audiovisual equilibrium is always either skewed towards the audio or visual side. Considering these ideas, the conceptual model stated above may be adjusted like so: 

mapping hierarchy, sensory dominance, isolated incoherence and audiovisual complexity all affect the perception of equilibrium in an audiovisual piece. Longer periods of equilibrium will lead to a higher potential for added-value experiences.


The above concepts suggest aesthetic and compositional directions that may be useful to the audiovisual composer. There is much work to be done to develop our understanding of what is happening when we combine audio and visuals in a compositional manner. For instance, the concept of audiovisual complexity needs to be more thoroughly explored in relation to how it affects isolated incoherence, and how it might affect other aspects of the composition.

In addition to concepts such as audiovisual equilibrium and isolated incoherence, there are other factors to consider when creating audiovisual work. These include spatiotemporal alignment, mapping complexity, semantic or metaphorical alignment, automatic cross-modal correspondences and Gestalt principles. How do all of these factors relate to each other? In what ways do they affect each other and ultimately how do they affect the audience’s experience of the final work? What analytical frameworks could account for all of these elements?

The above discussion outlines some early theoretical thinking about the forces at work within abstract audiovisual compositions. My own work is generative in nature, which allows for access to system architecture and mapping through coding. These techniques and ways of working have certainly influenced my ideas on audiovisual work. It can be said that artists working in another area of audiovisual art may not find these concepts useful. However, in saying that, I have attempted to conceptualise the above discussion in a general way, to allow for a level of flexibility. Perhaps the focus I have placed on the concept of compositional equality between audio and visual material will encourage further discourse toward how this could be achieved. In addition, the concept of isolated incoherence itself needs further refinement, and the outlining of more concrete steps toward this would certainly be useful. The continuing evolution of audiovisual art is an exciting and fascinating subject. As the practice continues to evolve, it remains important to develop the critical and analytical frameworks we use to explain and understand the developments that take place in practice. Whilst hypothesising can be potentially limited, there is much value in what is learned through discussion, debate and speculation. These understandings ultimately feed back into practice to stimulate further innovation within audiovisual art, which continues to thrive and evolve in compelling ways. 


Abbado, Adriano. “Perceptual Correspondences of Abstract Animation and Synthetic Sound.” Leonardo, Supplemental Issue 1 (1988): 3.

Abbado, Adriano. Visual Music Masters: Abstract Explorations: History and Contemporary Research. Milan: Skira Editore, 2017.

Alves, Bill. “Digital Harmony of Sound and Light.” Computer Music Journal 29, no. 4 (2005): 45–54.      

Basanta, Adam. “Shades of Synchresis: A Proposed Framework for the Classification of Audiovisual Relations in Sound-and-Light Media Installations.” eContact! 19, no. 2 (2017).

Battey, Bret. “Creative Computing and the Generative Artist.” International Journal of Creative Computing 1, no. 2/3/4 (2016): 154–73.           

Birtwistle, Andrew Brian. “Cinesonica: Sounding the Audiovisuality of Film and Video.” PhD diss., Goldsmiths, University of London, 2006.

Boucher, Myriam, and Jean Piché. “Sound/Image Relationships in the Context of Abstraction: Towards a Typological Proposition.” Paper presented at Seeing Sound, Bath Spa University, Bath, March 23–25, 2018.

Callear, Stephen. “Audiovisual Particles: Parameter Mapping as a Framework for Audiovisual Composition.” PhD diss., Bath Spa University, 2012.

Chion, Michel. Audio-Vision: Sound on Screen. Edited by Claudia Gorbman and Walter Murch. New York: Columbia University Press, 1994.

Cook, Nicholas. Analysing Musical Multimedia. Oxford: Clarendon Press, 1998.

Coulter, John. “Electroacoustic Music with Moving Images: The Art of Media Pairing.” Organised Sound 15, no. 1 (2010): 26–34.            

Cytowic, Richard. “Synesthesia: Phenomenology and Neuropsychology: A Review of Current Knowledge.” Psyche: An Interdisciplinary Journal of Research on Consciousness 2, no. 10 (1995): 1–22.

Garro, Diego. “From Sonic Art to Visual Music: Divergences, Convergences, Intersections.” Organised Sound 17, no. 2 (2012): 103–113.

Grierson, Mick. “Audiovisual Composition.” PhD diss., University of Kent, 2005.

Guldemond, Jaap, Marente Bloemheuvel, and Cindy Keefer. “Oskar Fischinger: An Introduction.” In Oskar Fischinger 1900 - 1967: Experiments in Cinematic Abstraction, edited by Cindy Keefer and Jaap Guldemond, 10–30. Amsterdam and Los Angeles: EYE Filmmuseum, Center for Visual Music, 2012.

Ikeshiro, Ryo. “Studio Composition: Live audiovisualisation using emergent generative systems.” PhD diss., Goldsmiths, University of London, 2013.

Katan, Simon. “Sight, Sound, the Chicken, and the Egg: Audio-Visual Co-dependency in Music.” PhD diss., Brunel University, 2012.

Keefer, Cindy, and Jaap Guldemond, eds. Oskar Fischinger 1900 - 1967: Experiments in Cinematic Abstraction. Amsterdam, Los Angeles: EYE Filmmuseum, Center for Visual Music, 2012.

Kopeček, Ivan, and Radek Ošlejšek. “Hybrid Approach to Sonification of Color Images.” In 2008 Third International Conference on Convergence and Hybrid Information Technology, 722–27. Busan: IEEE, 2008. 

Lund, Cornelia, and Holger Lund. Audio.Visual: On Visual Music and Related Media. Stuttgart: Arnoldsche Art Publishers, 2009.

McCormack, Jon, and Alan Dorin. “Art, Emergence, and the Computational Sublime.” Paper presented at Proceedings of the Second International Conference on Generative Systems in the Electronic Arts. Monash University, Victoria. December, 2001, 67–81.

McDonnell, Maura. “Visual Music.” eContact! - Videomusic: Overview of an Emerging Art Form 15, no. 4 (2014).

Mollaghan, Aimee. The Visual Music Film. Basingstoke: Palgrave Macmillan, 2015.

Ox, Jack, and David Britton. “The 21st Century Virtual Reality Color Organ.” IEEE Multimedia, Journal of IEEE Computer Society 7, no. 3 (2000): 6–9.

Ox, Jack, and Cindy Keefer. “On Curating Recent Digital Abstract Visual Music.” Center for Visual Music, 2008. Accessed June 23, 2020.

Ramachandran, V. S., and E. M. Hubbard. “Synaesthesia - A Window Into Perception, Thought and Language.” Journal of Consciousness Studies 8, no. 12 (2001): 3–34.     

Rogers, Holly. “The Musical Script: Norman McLaren, Animated Sound, and Audiovisuality.” Animation Journal 22 (2014): 68–84.

Sá, Adriana. “A Perceptual Approach to Audio-Visual Instrument Design, Composition and Performance.” PhD diss., Goldsmiths, University of London, 2016.

Spence, Charles. “Crossmodal Correspondences: A Tutorial Review.” Attention, Perception, & Psychophysics 73, no. 4 (2011): 971–95.           

Whitelaw, Mitchell. “Synesthesia and Cross-Modality in Contemporary Audiovisuals.” The Senses and Society 3, no. 3 (2008): 259–76.

Whitney, John. Digital Harmony: On the Complementarity of Music and Visual Art. Peterborough, New Hampshire: Byte Books / McGraw-Hill, 1980.

Whitney, Michael. “The Whitney Archive: A Fulfillment of a Dream.” Animation World Magazine, August 1997.

Media Cited

Crystalsculpture2. “John Whitney-Matrix III (1972).” YouTube video, 10:34. n.d. Accessed June 21, 2020.

de Saint Aubin, Charles Germain. “Caricature of Louis Bertrand Castel’s ocular organ.” In “SEEING/ SOUNDING/ SENSING: CAST SYMPOSIUM IN CONTEXT,” article by Sharon Lacey, posted September 19, 2014.

Dunphy, Bryan. “Ventriloquy – Bryan Dunphy.” YouTube video, 11:20. December 6, 2018. Accessed June 24, 2020.

Dunphy, Bryan. “Ventriloquy II – Bryan Dunphy.” YouTube video, 10:04. December 12, 2018. Accessed June 24, 2020.

Dunphy, Bryan. “Ventriloquy (excerpt2).” Vimeo video, 1:59. n.d. Accessed June 24, 2020.

Eats, Mark. “This City.” Vimeo video, 2:34. n.d. Accessed April 5, 2020.

Ikeshiro, Ryo. “Construction in Kneading Part 1 [excerpts].” Vimeo video, 4:27. n.d. Accessed June 23, 2020.

Ikeshiro, Ryo. “Construction in Zhuangzi 1.” Vimeo video, 14:30. n.d. Accessed June 23, 2020.

McLaren, Norman. “Synchromy.” Online video, 7:39. n.d. Accessed June 24, 2020.

Postingoldtapes. “John Whitney – Arabesque (1975) early computer graphics.” YouTube video, 6:44. n.d. Accessed June 21, 2020.

Prudence, Paul. “Cyclotone III.” Vimeo video, 2:15. n.d. Accessed April 5, 2020.

Whitney, John. “Moon Drum.” Online video, 1:38:59. n.d. Accessed April 5, 2020.


Bryan Dunphy is an audiovisual composer, musician and researcher interested in generative approaches to creating audiovisual art. His work explores the interaction of abstract visual shapes, textures and synthesised sounds. He is interested in exploring strategies for creating, mapping and controlling audiovisual material in real time. His background in music has motivated him to gain a better compositional understanding of the combination of audio and visuals. His recent work has explored the implications of immersive experiences on the established language of screen-based audiovisual work. He is currently completing his PhD in Art and Computational Technology at Goldsmiths, University of London.
