The first demo of the Opera Automata web application is now available on the site. So far, the process has been highly iterative and transformative, with various aspects of the project needing to be completely re-imagined. This DevLog dissects the project as of February 2020 and recaps the development process up to this point.
This version of the project features a heavily abstracted version of the technology I’ve been developing. The aim of the demo is to provide a glimpse into some of the core systems that the opera will be employing.
A fundamental component of this project is the use of real-world data. The final product is intended to utilise such data in as many aspects of the generative process as possible. However, in this demo, the only attributes generated from UK Census data are Sex & Sexuality. The rest have been assigned placeholder probabilities for the time being. In the demo you are able to “regenerate” your character at will, using the respective button on the character panel.
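The attribute-generation idea can be sketched as a weighted random draw per attribute. This is a minimal illustration, not the project's actual code: the attribute names and weights below are hypothetical, standing in for the census-derived and placeholder probabilities described above.

```python
import random

# Hypothetical attribute pools with probability weights. In the demo, only
# Sex and Sexuality are weighted from UK Census data; the rest use
# placeholder weights like the ones here.
ATTRIBUTES = {
    "sex": (["female", "male"], [0.51, 0.49]),                        # census-derived
    "hair_colour": (["brown", "blonde", "black", "red"],
                    [0.50, 0.25, 0.20, 0.05]),                        # placeholder
}

def generate_character():
    """Draw one value per attribute, weighted by its probabilities."""
    return {name: random.choices(values, weights=weights, k=1)[0]
            for name, (values, weights) in ATTRIBUTES.items()}

character = generate_character()
```

Pressing the “regenerate” button would simply call `generate_character()` again to produce a fresh draw.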
The characters also generate complex personalities and emotional states. I was not able to effectively show these off in the demo, but the basic idea is that each character has a personality based on the Myers-Briggs system, and this personality defines their disposition towards particular actions and methodologies. Their emotional state is affected by events surrounding them (relative to their personality). The idea is that the character’s personality and emotion dictate the leitmotif and harmonic tension of the music they sing.
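One way to picture the personality/emotion interaction is to store each Myers-Briggs-style axis as a continuous trait and let an event shift the character's emotional state by an amount scaled by the relevant trait. The class, field names, and scaling below are all hypothetical, a sketch of the idea rather than the project's implementation.

```python
from dataclasses import dataclass

@dataclass
class Character:
    extraversion: float  # one MBTI axis as a spectrum: 0.0 = I, 1.0 = E
    mood: float = 0.0    # running emotional state, clamped to -1.0 .. 1.0

    def react(self, event_valence: float) -> None:
        # Illustrative rule: more extraverted characters react more strongly
        # to the (social) events happening around them.
        sensitivity = 0.5 + 0.5 * self.extraversion
        self.mood = max(-1.0, min(1.0, self.mood + event_valence * sensitivity))

alice = Character(extraversion=0.8)
alice.react(0.4)  # a positive event nudges mood upward
```

The resulting `mood` value could then be fed into the music system to pick a leitmotif variant or scale the harmonic tension.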
Beyond themselves, each time a character is generated, they form opinions of every entity in the room. At present, these opinions are categorical, but I intend to convert them into a spectral variable to create a more realistic simulation. These opinions dictate the pool of adjectives from which the character will draw when delivering an opinion of the given entity.
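The move from categorical to spectral opinions might look like this: a continuous opinion value in [-1, 1] is bucketed into ranges, each with its own adjective pool. The ranges and adjectives below are purely illustrative.

```python
import random

# Hypothetical mapping from a continuous opinion value to an adjective pool.
ADJECTIVE_POOLS = [
    ((-1.0, -0.33), ["dreadful", "ugly"]),
    ((-0.33, 0.33), ["ordinary", "plain"]),
    ((0.33, 1.01),  ["lovely", "wonderful"]),
]

def adjective_for(opinion: float) -> str:
    """Pick an adjective from the pool whose range contains the opinion."""
    for (lo, hi), pool in ADJECTIVE_POOLS:
        if lo <= opinion < hi:
            return random.choice(pool)
    raise ValueError("opinion out of range")

print(adjective_for(0.9))  # one of "lovely" / "wonderful"
```

A categorical system is just the special case where the opinion can only take one value per bucket; the spectral version lets opinions drift smoothly as the simulation unfolds.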
I am currently in the process of redesigning the piano synthesis algorithm (as discussed in The Process), so this component cannot be experienced in this demo. That said, the opera is currently capable of generating simple musical phrases in the style of melody-dominated homophony. It does so through the systematic definition of notes, keys, intervals and time cells. Playback is humanised by slightly adjusting the velocity, timbre and timing of each note played. This creates a more realistic musical experience.
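The humanisation step can be sketched as small random perturbations applied to each scheduled note. The jitter ranges and note representation below are assumptions made for illustration, not the project's actual values.

```python
import random

def humanise(notes, vel_jitter=0.05, time_jitter=0.01):
    """Perturb each note slightly to avoid mechanical playback.

    notes: list of (midi_pitch, velocity 0.0-1.0, onset_seconds) tuples.
    """
    out = []
    for pitch, vel, onset in notes:
        vel = min(1.0, max(0.0, vel + random.uniform(-vel_jitter, vel_jitter)))
        onset = max(0.0, onset + random.uniform(-time_jitter, time_jitter))
        out.append((pitch, vel, onset))
    return out

phrase = [(60, 0.8, 0.0), (64, 0.8, 0.5), (67, 0.8, 1.0)]
performed = humanise(phrase)
```

Timbre humanisation would work the same way, jittering a synthesis parameter (e.g. brightness) per note instead of velocity or onset.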
The language generation shown in the demo involves a relatively simple system of linguistic definitions. The code contains definitions of various types of words, as well as definitions for particular types of phrases. The two phrase types available in the demo are “descriptions” and “opinions”. These contain simple formulas for the types and placement of words in a sentence. For example, a description of an entity will take into account whether the entity is an object or a person. If it is an object, it will use either the “the” or “this” article, depending on the number of that entity that exist in the room. A description of a person will omit the article entirely. If the person is the same as the person saying the sentence, the subject noun becomes “I” and the verb to be becomes “am”.
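The description rules above can be condensed into a single function. This is a sketch under stated assumptions: the entity/room representations are invented, and the post does not say which article maps to which count, so the rule below (unique object takes “the”, otherwise “this”) is an assumption.

```python
def describe(entity, speaker, room_counts, adjective):
    """Build a description sentence following the rules in the post."""
    if entity == speaker:
        # Speaker describing themselves: subject "I", verb "am".
        return f"I am {adjective}"
    if entity["is_person"]:
        # People get no article.
        return f"{entity['name']} is {adjective}"
    # Objects get an article; the choice based on count is assumed here.
    article = "the" if room_counts[entity["name"]] == 1 else "this"
    return f"{article} {entity['name']} is {adjective}"

chair = {"name": "chair", "is_person": False}
bob = {"name": "Bob", "is_person": True}
print(describe(chair, bob, {"chair": 3}, "red"))   # "this chair is red"
print(describe(bob, bob, {}, "tall"))               # "I am tall"
```

An “opinion” phrase would follow the same template approach, drawing its adjective from the opinion-keyed pools described earlier.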
Unfortunately there is currently a browser-related audio issue that blocks the audio from the web app. Hopefully I’ll be able to fix this issue soon. However, the algorithm is currently capable of replicating an approximation of the waveforms produced at the human glottis. This is the “source” component of the source-filter speech synthesis model that I’m employing. The filter / vocal tract component is currently under development.
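The post doesn’t specify which glottal model the project uses, but as an illustration of what a source waveform looks like, here is the classic Rosenberg glottal pulse, one common approximation of glottal flow in source-filter synthesis.

```python
import math

def rosenberg_pulse(t, period=1.0, open_quotient=0.6, speed_quotient=2.0):
    """One cycle of the Rosenberg glottal flow model, evaluated at time t.

    open_quotient:  fraction of the period the glottis is open.
    speed_quotient: ratio of opening-phase to closing-phase duration.
    """
    t = t % period
    t_open = open_quotient * period                      # glottis-open duration
    t_rise = t_open * speed_quotient / (1 + speed_quotient)
    if t < t_rise:                                       # opening phase
        return 0.5 * (1 - math.cos(math.pi * t / t_rise))
    elif t < t_open:                                     # closing phase
        return math.cos(math.pi * (t - t_rise) / (2 * (t_open - t_rise)))
    return 0.0                                           # closed phase

# Sample one cycle at 100 points.
samples = [rosenberg_pulse(n / 100.0) for n in range(100)]
```

In a full source-filter pipeline, this waveform would then be passed through a vocal-tract filter (the component described above as still under development) to shape it into vowels and consonants.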
The physics simulation in the project is largely handled by Unity Game Engine’s built-in capabilities, so it required minimal coding compared to other aspects of the project. I decided to make the objects in the room draggable in order to provide a superficial activity for users who require it in order to maintain focus.
Navigational AI (Pathfinding)
Thanks to financial support from the Joe Brown Memorial Award, I was able to invest in a brilliant AI Navigation system. Doing so has taken a lot of work off my plate and allowed me to spend more time working on the more artistic aspects of the project.
Moving from Sample to Physics-Based Synthesis
The biggest shift forced upon the project was the abandonment of sample-based synthesis and FMOD Studio (FMOD Studio is a procedural audio software designed for dynamic audio manipulation in video games). I reached a point where the limitations of sample manipulation became very apparent to me, particularly the limitations surrounding expressive range and spatial acoustics. With our current utilities, no amount of waveform manipulation can successfully alter the acoustic environment in which a given sample is recorded. Hopefully, with the ever-evolving development of AI, this is something we can look forward to in the future.
The alternative: physics-based synthesis. The vocal synthesis in the demo involves the virtual modelling of the actual instrument itself. This allows for a near-infinite number of variables that let you dynamically fine-tune sonic qualities in a multitude of acoustic environments. The downside, however, is that the algorithms involve a lot of dense, complicated physics and mathematics. As a result, the development of this component is a huge time commitment, and for the vocal synthesis I have so far only developed the modelling of the source waveform at the glottis.
One aspect of the project I had to cut back on is the graphics. The above image shows the visual style of the project during January 2020. Moving applications on to the web always requires a certain amount of performance optimisation but with this project in particular, with its hefty algorithms, a lot of stripping-down was required in order to make everything run smoothly in a web-based environment. As a result I have now adopted a much flatter, cartoon-like aesthetic which I have also mirrored in the design of the user interface and website.
Since I’ve designed the website myself (so that I am able to include the application and present data appropriately), I’ve had another large-scale task added to my workload. This wouldn’t be so much of an issue if I wasn’t set on having its aesthetic mirror that of the application.
Receiving the Joe Brown Memorial Award added a lot of extra pressure for this project (and I’m already a perfectionist). Despite that, I am quite happy with the current state of the project and I’m really beginning to see a glimpse of what I had envisioned. I think that, in some regards, I have already partially achieved a number of aims I set. I think the demo indeed begins to provoke thought on the possibilities of such technology in this artistic context. I believe there is so much more to come from this project and I’m excited to see it blossom.