The Posthumanist Synaesthesia: How Machine Learning and Virtual Reality Revolutionise Human Perception
Mandi Li, University of Amsterdam
According to liberal humanism, humans, as a sacred and exceptional species, hold a central position in the realm of perceptual experience. This humanist narcissism confines us to a subjective world and obscures objective reality. However, the advent of artificial intelligence (AI) and virtual reality (VR) challenges this humanist view of perception. This research investigates the role of machine learning (ML) and VR in facilitating synaesthesia, a phenomenon in which one sensory modality triggers sensations in another. Through a textual analysis approach grounded in media studies, two synaesthetic projects, ‘Paint with Music’ and ‘Cyberdream’, are examined. This analysis reveals a reciprocal partnership between AI, VR, and humans. More specifically, while ML and VR extend the boundaries of humans’ visual and auditory perception through machine-induced synaesthesia, they in turn acquire training data from human users. This reciprocal relationship ushers in an era of posthumanism characterised by blurred cognitive boundaries between humans and machines, alongside a persistent physical division between humans and their mechanical prostheses.
The whole vibrant cosmos surrounding humans is constricted into a barren world defined by only five senses. In other words, a person’s world is limited by her capacity for sensing. Yet humans have imagined other perceptual possibilities through their gifts of art and literature. For example, in The Great Gatsby, F. Scott Fitzgerald wrote that “the orchestra is playing yellow cocktail music.”1 This description enables readers both to hear and to ‘see’ the music at Gatsby’s lavish Long Island party. In the age of artificial intelligence (AI), however, synaesthesia is no longer merely a literary device but a tangible experience.
In this article, I investigate synaesthesia within the framework of posthumanism, guided by two questions: how do machine learning (ML) and virtual reality (VR) help humans experience synaesthesia? And what is the value of this synaesthetic logic? The advent of powerful AI, such as ChatGPT, and the rapid growth of discourse on virtual worlds make these questions timely. Moreover, these questions examine the potential of human-AI cooperation at a time of high anxiety, when some imagine a dystopian future controlled by robots. Indeed, the rise of AI has provoked one of the greatest existential anxieties of the 21st century: the fear that automation will replace humans and make them economically redundant.
Although these fears of apocalyptic machines are psychologically understandable, this article identifies a reciprocal partnership between AI, VR and humans by analysing two case studies of visual music based on ML and VR. On the one hand, human data helps train VR and ML systems to become more immersive and accurate; on the other hand, the mechanical fusion of visual and audio processing induces human synaesthesia, enabling human subjects to hear an image. Thus, my central argument is that ML and VR extend the boundaries of human perception (here referring specifically to the innate human senses of sight and hearing) through machine-induced synaesthesia, which ushers in an era of posthumanism. This era is characterised not only by the blurring of cognitive boundaries between humans and machines but also by a conspicuous physical division between humans and their mechanical prostheses.
To make this argument, I will employ textual analysis to explore the structuralist meaning behind creative AI and VR products in the fields of arts and culture. Textual analysis in media studies defines texts as “not only written material but every cultural practice or product.”2 Using this definition, I will analyse technological products as texts by deconstructing the “implicit patterns, assumptions and omissions” of media content.3 More specifically, I will deconstruct the dynamic interrelations between humans and technologies. However, a general limitation of textual analysis is its insufficient attention to the contexts of production and audience.4 To address this limitation, I will supplement the textual interpretation with technical explanations.
In the following sections, I will first review the literature on perception, posthumanism, synaesthesia, ML and VR. I will then conduct a textual analysis of two synaesthetic projects based on ML and VR: Paint with Music and Cyberdream. Finally, I will discuss the posthumanist implications of these projects for liberal humanism.
Brian Rogers defines the term ‘perception’ in two ways. First, perception is the human “experience of seeing, hearing, touching, tasting, and smelling objects and individuals in the surrounding world”; second, it can refer to “the processes that allow [humans] to extract information from the patterns of energy that impinge on [their] sense organs.”5 However, both definitions are limited by a liberal humanist perspective, which places humans at the centre of experience. A posthumanist perspective is therefore particularly valuable, because it moves the conceptualisation of perception beyond human subjectivity towards a broader ecosystem of humans, machines, and their environments.
Posthumanists argue that “the boundaries of the human subject are constructed rather than given.”6 That is to say, as N. Katherine Hayles illustrates, humans and any technology that can “compensate for [their] deficiencies,” or enhance their natural abilities, form a single entity.7 For example, a short-sighted person’s glasses become a part of them because the glasses enhance their visual perception. This blurring of the boundary between humans and technologies is further supported by the school of cybernetics. For instance, in “A Manifesto for Cyborgs: Science, Technology, and Socialist Feminism in the 1980s,” Donna Haraway describes the breakdown of the boundary between organisms and machines: “the certainty of what counts as nature” is severely weakened by lively, “intelligent” machines.8
These cybernetic and posthumanist theories suggest the need to conceptualise perception in terms of human-technology collaboration. In the context of ML and VR, this paper defines perception as the processes through which humans use technologies to extract essential information about their surroundings; during these processes, humans and technologies become a single system. Synaesthesia is a good example of such human-machine perception.
Synaesthesia is the “neural condition in which stimulation of one sensory modality triggers involuntary sensation in another.”9 For example, through synaesthesia, one can hear a painting or see a song. Furthermore, this concurrent perception generally refers to “only two [senses] in an asymmetrical configuration: a primary … that activates a secondary.”10 In this paper, synaesthesia specifically refers to the neural condition wherein the primary sense of seeing activates the secondary sense of hearing.
Additionally, despite the vital role that 21st-century technologies play in shaping humans’ synaesthetic experience, it is critical to acknowledge that synaesthesia is by no means a new phenomenon. In his book Thinking, Fast and Slow, psychologist Daniel Kahneman describes the associative mechanism of the human brain: “ideas that have been evoked trigger many other ideas, in a spreading cascade of activity in your brain.”11 By extension, one might reason that a sense that has been evoked can simultaneously trigger other, associated senses, resulting in synaesthesia. This associative synaesthesia is further supported by the “multi-modal effect” in audiovisual studies, which indicates that a healthy, conscious person may associate visual stimulation with auditory stimulation.12 Jonathan Weinel also suggests that psychedelic drugs (for example, LSD) can enhance the synaesthetic experience by “[heightening] states of interconnectivity in the brain.”13
Furthermore, before analysing synaesthetic projects based on ML and VR, it is useful to provide a wider historical context for synaesthetic art. Between the 16th and 19th centuries, our ancestors used religious rituals to induce synaesthesia. For example, the deafening drumming in Haitian Vodou rituals “may contribute towards states of sensory overload, inducing trance states.”14 Later, the spread of cinematography in the mid-20th century gave birth to short films that visualise music.15 More specifically, such short films present symbolic footage that “reflects the rhythms, timbres, and melodies of music.”16 One such example is Walt Disney’s Fantasia (1940), an animated musical anthology film consisting of several segments, each accompanied by a piece of classical music. In these segments, the visuals synchronise with the music, creating a visual narrative that heightens its emotional and dramatic qualities. Further technological innovations came in the late 20th century with home electronics such as the Atari Video Music, which was designed to generate corresponding images on a TV in response to music.17 By analysing audio input and converting it into abstract patterns and effects, the Atari Video Music generated visuals characterised by geometric shapes and colour changes. A common thread running through these synaesthetic practices is that humans tend to leverage the most advanced technologies of their time to simulate the experience of synaesthesia. Moreover, the type of technological medium shapes how humans experience synaesthesia. As the following case studies show, this pattern persists in the age of ML and VR.
As defined by Melanie Mitchell, AI is “a branch of computer science that studies the properties of intelligence by synthesizing intelligence.”18 However, because the meaning of “intelligence” is highly contested, AI lacks “a precise, universally accepted definition.”19 Given that the field of AI is extremely broad, my focus is on ML, a sub-discipline of AI that “[enables] computers to learn from data.”20 In the following analysis, I will specifically examine Differentiable Digital Signal Processing (DDSP), an ML approach that learns to generate musical audio from training data.
VR, meanwhile, has been described by Sang-Min Park and Young-Gab Kim as a technology that creates a virtual world “based on 360-degree images.”21 In this digital three-dimensional environment, the user is embodied by an avatar.22 One of the most advanced VR products currently available is Meta’s Oculus Quest, which is also the technical basis of the Cyberdream case study below. According to the official Meta Quest website, Oculus Quest consists of three parts: an optical headset that displays 360-degree video; a Touch controller that tracks the user’s hand movements in the VR space; and a processor that handles the user’s input.23
Despite the long history of synaesthetic art, the advent of AI and VR revolutionises how humans experience synaesthesia. In this section, I will analyse two synaesthetic projects that represent some of today’s most cutting-edge technologies, while acknowledging that this is not an exhaustive list of audiovisual examples.
Paint with Music is an ML-based art project developed by the Google Magenta team. It translates users’ paintings into the timbres of the violin, saxophone, trumpet and flute. More specifically, “the Y position of the [sensorial canvas] would be translated into a pitch.”24 In other words, users hear higher pitches when they draw on the upper part of the canvas and lower pitches when they draw towards the bottom.25 The sensorial canvases come in four genres inspired by nature: sky, ocean, street, and paper, and each genre generates a unique style of music (see Figure 1).26 One example is a saxophone solo accompanied by birdsong, produced on the sky canvas (see Figure 2). This piece of music associates the visual elements of the sky, such as clouds, birds, and stars, with their symbolic sounds.
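The vertical mapping described above can be illustrated with a short Python sketch. It assumes a linear mapping from canvas height to MIDI note numbers between 48 and 84 (three octaves) and a screen coordinate system in which y = 0 is the top of the canvas; the range, the linearity, and the function names y_to_midi and midi_to_hz are illustrative assumptions, not details of Magenta's implementation.

```python
def y_to_midi(y: float, canvas_height: float, low_note: int = 48, high_note: int = 84) -> int:
    """Map a vertical canvas position to a MIDI note number (hypothetical mapping).

    y = 0 is the top of the canvas, as in most screen coordinate systems,
    so strokes near the top yield higher notes.
    """
    if not 0 <= y <= canvas_height:
        raise ValueError("y must lie within the canvas")
    # Fraction of the way up the canvas: 1.0 at the top, 0.0 at the bottom.
    height_fraction = 1.0 - y / canvas_height
    # Linear interpolation between the lowest and highest note, rounded to a semitone.
    return round(low_note + height_fraction * (high_note - low_note))


def midi_to_hz(note: int) -> float:
    """Convert a MIDI note number to frequency in Hz (A4 = MIDI 69 = 440 Hz)."""
    return 440.0 * 2 ** ((note - 69) / 12)


# A stroke drawn from the bottom towards the top of a 600-pixel-high canvas.
stroke_ys = [600, 450, 300, 150, 0]
notes = [y_to_midi(y, 600) for y in stroke_ys]
print(notes)  # [48, 57, 66, 75, 84]
```

Rounding to whole MIDI notes quantises the stroke to semitones; a system that wanted continuous glides could instead keep the fractional value and convert it directly to hertz.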
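From a technical perspective, Paint with Music is based on Differentiable Digital Signal Processing (DDSP), an ML approach that combines neural networks with classical digital signal processing (DSP) components such as oscillators and filters. In Paint with Music, users’ brushstrokes are first converted into musical control signals, such as the pitch derived from a stroke’s vertical position. As Figure 3 demonstrates, a neural network then maps these signals onto synthesis parameters, which the DSP components render as realistic audio. In this way, a visual input is translated into an auditory output.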
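The DSP stage of this pipeline can be illustrated with a harmonic (additive) oscillator of the kind DDSP builds on: a sum of sine waves at integer multiples of a fundamental frequency, each weighted by its own amplitude. In a trained DDSP model these amplitudes would be predicted frame by frame by the neural network; in the minimal NumPy sketch below they are fixed, hand-picked values, and the function name harmonic_synth is hypothetical rather than part of the DDSP library.

```python
import numpy as np


def harmonic_synth(f0_hz, harmonic_amps, sample_rate=16000):
    """Additive synthesis: sum sinusoids at integer multiples of f0 (illustrative sketch).

    f0_hz: per-sample fundamental frequency in Hz, shape (T,).
    harmonic_amps: amplitudes for harmonics 1..K, held constant over time here
    (in DDSP they would come from the neural network).
    """
    f0_hz = np.asarray(f0_hz, dtype=float)
    harmonic_amps = np.asarray(harmonic_amps, dtype=float)
    harmonic_numbers = np.arange(1, len(harmonic_amps) + 1)  # 1, 2, ..., K
    # Integrate frequency into phase so that pitch can change smoothly over time.
    phase = 2 * np.pi * np.cumsum(f0_hz) / sample_rate  # shape (T,)
    harmonic_phases = np.outer(phase, harmonic_numbers)  # shape (T, K)
    # Silence any harmonic at or above the Nyquist frequency to avoid aliasing.
    harmonic_freqs = np.outer(f0_hz, harmonic_numbers)
    amps = np.where(harmonic_freqs < sample_rate / 2, harmonic_amps, 0.0)
    return np.sum(amps * np.sin(harmonic_phases), axis=1)


sr = 16000
f0 = np.full(sr, 440.0)  # one second of a steady A4
audio = harmonic_synth(f0, [1.0, 0.5, 0.25], sr)
print(audio.shape)  # (16000,)
```

Because every step is a differentiable array operation, a neural network placed in front of such an oscillator can be trained by comparing its audio output with recordings, which is the core idea the name DDSP refers to.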
Importantly, because DDSP’s primary sense of ‘seeing’ activates its secondary sense of ‘hearing,’ DDSP achieves a form of mechanical synaesthesia. This achievement depends on the extensive use of human data, such as recordings of human musical performances, to train the DDSP models. That is to say, humans consciously or unconsciously train AI to acquire ‘synaesthesia.’ Humans do not help machines for free, however. In the case of Paint with Music, users employ AI as a prosthesis to expand the perceptive boundaries of their vision and hearing, which DDSP thereby synchronises.
Jonathan Weinel designed the VR project Cyberdream, which “provides immersive spatial visualisations of music” using the Oculus Quest.27 The key to the project’s design is “symbolic representation,” in which “recurring visual themes found on album covers, event flyers, and other cultural artefacts” serve as the basis for the image design of the music visualisations.28 Weinel describes this design as “synaesthetic,” meaning that the sensory modalities of sight and hearing are experienced synchronously.29
One example is Zig-Zag Toy, which creates electronic dance music (see Figure 4). Its 3D scene of a robotic skull was designed around the futuristic and surrealist motifs of rave music.30 By manoeuvring the Oculus Touch controller, the user “emits jagged orange beams,” which in turn generate a synaesthetic music track (see Figure 5).31 In particular, by tracking the user’s hand movements, the controller and Zig-Zag Toy generate different sounds corresponding to the different beam patterns the user creates.32 As with Paint with Music, this synaesthetic process is realised through DSP, which translates the user’s visual input into auditory output.
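Weinel’s description does not specify the exact mapping from gesture to sound, so the following Python sketch is purely hypothetical. It shows one common way a VR sound toy could turn tracked controller motion into synthesis parameters: the controller’s height sets pitch and the speed of the gesture sets loudness. The ControllerSample type, the function beam_to_sound, and all ranges and thresholds are invented for illustration and are not taken from Cyberdream.

```python
import math
from dataclasses import dataclass


@dataclass
class ControllerSample:
    """One hypothetical hand-tracking sample: position in metres, time in seconds."""
    x: float
    y: float
    z: float
    t: float


def beam_to_sound(prev: ControllerSample, curr: ControllerSample,
                  min_height: float = 0.5, max_height: float = 2.0):
    """Map a controller movement to (frequency_hz, loudness) with an illustrative mapping.

    Controller height sets pitch across two octaves above 110 Hz;
    gesture speed sets loudness, clipped to the range [0, 1].
    """
    dt = curr.t - prev.t
    if dt <= 0:
        raise ValueError("samples must be in time order")
    # Straight-line speed of the controller between the two samples (m/s).
    speed = math.dist((prev.x, prev.y, prev.z), (curr.x, curr.y, curr.z)) / dt
    # Clamp height into the playable range, then normalise it to 0..1.
    height = min(max(curr.y, min_height), max_height)
    fraction = (height - min_height) / (max_height - min_height)
    frequency_hz = 110.0 * 2 ** (2 * fraction)  # 110 Hz (low) to 440 Hz (high)
    loudness = min(speed / 2.0, 1.0)  # 2 m/s or faster plays at full volume
    return frequency_hz, loudness


# A half-metre horizontal sweep at chest height over half a second.
print(beam_to_sound(ControllerSample(0.0, 1.25, 0.0, 0.0),
                    ControllerSample(0.5, 1.25, 0.0, 0.5)))  # (220.0, 0.5)
```

Mapping height to pitch mirrors the vertical logic of Paint with Music, while mapping speed to loudness lets the intensity of a gesture be heard; a real sound toy might instead classify whole beam patterns before choosing a sound.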
Humans are like the prisoners in Plato’s allegory of the cave: we mistake the shadows we perceive for reality. Indeed, as I stated in the introduction, the lively world surrounding humans is constricted into a dark cave comprising merely five senses. Cognitive psychologists Donald D. Hoffman, Manish Singh and Chetan Prakash articulate this subjectivity of human perception in their interface theory of perception, arguing that because “perception is a product of evolution,” natural selection hides the truth from humans by letting them perceive only what they need.36 Human perception is therefore akin to a computer’s user interface, which “serves to guide useful actions” rather than showing objective reality.37
Likewise, historian Yuval Noah Harari points out the limitations of humans’ visual and auditory perception. Harari suggests that “the spectrums of light and sound are far larger than what we humans can see and hear.”38 As Figure 6 illustrates, the electromagnetic spectrum is approximately 10 trillion times larger than the portion humans can see.39 Given the subjective nature of human perception, then, are humans doomed to live in their self-centred worlds? Or is it possible for them to perceive more than natural selection allows? As the following subsection shows, with the help of AI and VR, the latter is achievable.
The synaesthetic projects Paint with Music and Cyberdream both suggest a reciprocal partnership between AI, VR, and humans. Put simply, while humans use AI and VR as prosthetic sensory organs to experience mechanical synaesthesia, human data is fed back into ML systems for training. Importantly, ML and VR expand the human boundaries of vision and hearing by uniting them, suggesting that images may have sound. Humans therefore need no longer be wholly constrained by their own subjectivity; instead, they can perceive the world from the new mechanical perspective of synaesthesia. Of course, this mechanical viewpoint is not objective either. However, the value of such mechanical perception lies not in unveiling the ultimate reality but in allowing us to understand the world differently.
Moreover, during this machine-human synthesis, humans and their mechanical prostheses form a single entity. We thus evolve from Homo sapiens into posthumans, a new species of the 21st century whose perception is no longer limited by natural selection. As the posthumanist N. Katherine Hayles observes, “cybernetic systems are constituted by information.”40 In this regard, AI and VR become part of humans, because these technologies provide them with essential perceptual information about their environment. This fusion of humans and technologies also accords with Marshall McLuhan’s theory of media as the extensions of man. For McLuhan, all technologies can be understood as media, and each medium is “an extension of [humans’] physical bodies.”41 For instance, the wheel is a medium that extends the human foot.42
Does this posthumanism mark the end of humanism? In his book 21 Lessons for the 21st Century, Harari claims that liberal humanism is losing credibility because it cannot address the new challenges brought about by biotechnology and information technology.43 For instance, AI not only outperforms humans physically but also competes with them in cognitive skills such as “learning, analysing, communicating and above all understanding human emotions.”44 This shatters the humanist belief in the superiority and sacredness of human life. Similarly, Hayles argues that liberal humanism, which assumes “a coherent, rational self, the right of that self to autonomy and freedom,” is ill-suited to today’s society.45 This is because many human decisions are made through collaboration between human and nonhuman agents rather than by humans alone.46 For example, whom a person decides to date depends on both their personal preferences and Tinder’s recommendation algorithms.
It thus becomes imperative to re-evaluate and reimagine what it means to be human in today’s technologically mediated world. Posthumanism fundamentally challenges the traditional humanist conception of the autonomous, independent human subject, rendering it obsolete in the face of technological advancement. However, posthumanism does not render human bodies redundant; we are not turned into cold-blooded cyborgs with implanted chips. Instead, being posthuman is an abstract concept: the notion that humans should be conceptualised in relation to technology. This understanding allows human cognitive abilities to expand through the integration of mechanical prostheses. By embracing these prostheses, we can transcend certain biological limitations and understand the world through a new window, augmenting our cognitive faculties while retaining our human embodiment.
This abstract concept of posthumanism nonetheless has a material dimension, because ways of thinking shape concrete actions. As Hayles notes, this materiality is substantiated by George Lakoff and Mark Johnson: “our images of our bodies, their limitations and possibilities, openings and self-containments, inform how we envision the intellectual territories we stake out and occupy.”47 If we define ourselves as part of the technological ecosystem rather than as the centre of the world, we will have the opportunity to escape from Plato’s cave of human subjectivity. Posthumanism therefore encourages us to embrace a more expansive view of reality, one that acknowledges the interconnectedness of humans and technology.
The recent development of AI and VR challenges the traditional humanist view of perception and opens up new possibilities for human perception. This paper has investigated the role of ML and VR in facilitating synaesthesia, a phenomenon where one sensory modality triggers sensations in another. Through an analysis of two synaesthetic projects, Paint with Music and Cyberdream, a reciprocal partnership between AI, VR, and humans has been revealed.
The analysis shows that ML and VR extend the boundaries of human perception through machine-induced synaesthesia, whereby humans can ‘hear’ an image or ‘see’ a sound. ML and VR systems learn from human users to become more immersive and accurate, while humans use AI and VR as prostheses to expand their sensory capacities. This reciprocal relationship blurs the cognitive boundaries between humans and machines while maintaining the physical division between them, ushering in an era of posthumanism.
The implications of this research go beyond the realm of perception. It highlights the potential of human-AI cooperation and challenges the fears of a dystopian future dominated by machines. Instead, it emphasises the collaborative nature of human-technology interaction and the opportunities it presents for human enhancement. By employing textual analysis and technical explanations, this research has provided insights into the structuralist meaning behind synaesthetic projects based on ML and VR. It has expanded the understanding of perception within the framework of posthumanism, considering humans and technologies as part of a broader ecosystem.
Finally, future research could extend the concept of machine-induced synaesthesia as a type of posthumanist perception and investigate synaesthetic practices based on AI and VR that go beyond entertainment.
Bateson, Gregory. Steps to an Ecology of Mind. Chicago: University of Chicago Press, 1972.
Cox, Christoph. Sonic Flux: Sound, Art, and Metaphysics. Chicago: University of Chicago Press, 2018.
Doury, Simon, and Caroline Buttet. “Paint With Music.” Magenta, January 6, 2022. https://magenta.tensorflow.org/paint-with-music.
Engel, Jesse. “DDSP: Differentiable Digital Signal Processing.” Magenta, January 15, 2020. https://magenta.tensorflow.org/ddsp.
Fitzgerald, F. Scott. The Great Gatsby. Ware: Wordsworth Classics, 1993.
Fürsich, Elfriede. “In Defense of Textual Analysis: Restoring a Challenged Method for Journalism and Media Studies.” Journalism Studies 10, no. 2 (February 2009): 238–252.
Harari, Yuval Noah. 21 Lessons for the 21st Century. New York: Spiegel & Grau, 2018.
Harari, Yuval Noah. Homo Deus: A Brief History of Tomorrow. London: Harvill Secker, 2016.
Haraway, Donna. “A Manifesto for Cyborgs: Science, Technology, and Socialist Feminism in the 1980s.” Australian Feminist Studies 2, no. 4 (1987): 65–107.
Hayles, N. Katherine. How We Became Posthuman: Virtual Bodies in Cybernetics, Literature, and Informatics. Chicago: University of Chicago Press, 1999.
Hoffman, Donald D., Manish Singh, and Chetan Prakash. “The Interface Theory of Perception.” Psychonomic Bulletin & Review 22 (2015): 1480–1506.
Johnson, Mark. The Body in the Mind: The Bodily Basis of Meaning, Imagination, and Reason. Chicago: University of Chicago Press, 1987.
Kahneman, Daniel. Thinking, Fast and Slow. London: Penguin, 2012.
Lakoff, George, and Mark Johnson. Metaphors We Live By. Chicago: University of Chicago Press, 1980.
Li, Mandi. “A Second Life for Educators: A Hybrid Extended Reality Education Between Zuckerberg’s Vision and Educational Researchers’ Imaginary.” Journal of Metaverse 3, no. 1 (2023): 73–78.
McLuhan, Marshall. Understanding Media: The Extensions of Man. Corte Madera, CA: Gingko Press, 2003.
Meta. “Hand Tracking Privacy Notice.” Accessed March 15, 2022. https://www.meta.com/en-gb/help/quest/articles/accounts/privacy-information-and-settings/hand-tracking-privacy-notice/.
Meta. “Meta Quest 2: Immersive All-In-One VR Headset | Meta Store.” Accessed February 6, 2023. https://www.meta.com/nl/en/quest/products/quest-2/.
Mitchell, Melanie. Artificial Intelligence: A Guide for Thinking Humans. London: Pelican, 2019.
Park, Sang-Min, and Young-Gab Kim. “A Metaverse: Taxonomy, Components, Applications, and Open Challenges.” IEEE Access 10 (2022): 4209–4251.
Rogers, Brian. Perception: A Very Short Introduction. Oxford: Oxford University Press, 2017.
Weinel, Jonathan. Explosions in the Mind: Composing Psychedelic Sounds and Visualisations. London: Palgrave Macmillan, 2021.
Weinel, Jonathan. “Synaesthetic Audio-Visual Sound Toys in Virtual Reality.” Paper presented at the 16th International Audio Mostly Conference, Trento, Italy, September 2021, 135–138.
Mandi Li is a third-year Bachelor Honours student at the University of Amsterdam, majoring in Media and Information. Her work focuses on the philosophy of technology, machine learning, and digital humanities. In 2023, she published “A Second Life for Educators: A Hybrid Extended Reality Education Between Zuckerberg’s Vision and Educational Researchers’ Imaginary” in the Journal of Metaverse. In the same year, her article “From Archive to Anarchive: How BeReal Challenges Traditional Archival Concepts and Transforms Social Media Archival Practices” appeared in the Journal of Contemporary Archival Studies. Beyond the humanities, Mandi and her team programmed “ChatQuine,” a philosophy large language model that simulates the intellectual mind of the philosopher Willard Van Orman Quine, which is now available on GitHub.
Media and Information, Department of Humanities, University of Amsterdam
Amsterdam, the Netherlands