Generative Music - Ambient Poem Rhythm Game Proposal
Since the last blog post, we had a workshop on MusicVAE using a Jupyter notebook. It is really lovely that the workshop fits my final project concept so well, so I can save some time on research and trial and error.
One thing I figured out was that I needed to simplify my concept, not just train models and generate new pieces. (Even testing pre-trained models took hours to generate a few sentences!)
I switched from prose to poems. This makes sense for the concept of listening to narration rhythmically, since poems have rhythm and musical elements by nature.
I scraped Poetry Foundation for poems, and they actually do have a Listen section as a way to appreciate poetry.
I was thinking about a very ambitious plan, such as generating narrative music in real time, but I guess I need to be satisfied with a small start.
The poem I am going to use is "A Gift for You" by Eileen Myles.
Tacotron2 predicts a Mel-spectrogram from the text, and WaveNet synthesizes a voice based on it.
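To make that intermediate representation concrete, here is a toy Mel-spectrogram computed with plain NumPy: an STFT magnitude projected onto a triangular Mel filterbank. The frame size, hop, and filter count are common defaults I picked for illustration, not the exact parameters Tacotron2 or WaveNet use.

```python
import numpy as np

def mel_spectrogram(signal, sr=22050, n_fft=1024, hop=256, n_mels=80):
    """Toy Mel-spectrogram: STFT magnitude projected onto a Mel filterbank.
    Parameter values are common defaults, not the published TTS settings."""
    # Frame the signal with a Hann window and take each frame's FFT magnitude
    frames = [signal[i:i + n_fft] * np.hanning(n_fft)
              for i in range(0, len(signal) - n_fft, hop)]
    mag = np.abs(np.fft.rfft(frames, axis=1))           # (frames, n_fft//2+1)

    # Build a triangular Mel filterbank
    def hz_to_mel(f): return 2595 * np.log10(1 + f / 700)
    def mel_to_hz(m): return 700 * (10 ** (m / 2595) - 1)
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)

    return np.log(mag @ fbank.T + 1e-6)                 # (frames, n_mels)

# 0.5 s of a 440 Hz tone as a stand-in for narration audio
t = np.linspace(0, 0.5, 11025, endpoint=False)
mel = mel_spectrogram(np.sin(2 * np.pi * 440 * t))
print(mel.shape)  # -> (40, 80): 40 frames, 80 Mel bands
```

The point is just that both models speak the same "image of sound" format: a 2-D array of time frames by Mel-frequency bands.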
Since I wanted to give it the feel of a musical piece, I chose NSynth to interpolate between the generated voice and ambient music.
I first tried cutting the audio exactly along the poem lines, but the result sounded stiff and harsh, so I re-cut the lines based on the waveform of the original narration.
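Re-cutting by waveform can be sketched as splitting on silence: mark frames whose RMS energy falls below a threshold, and cut wherever a silent stretch is long enough to count as a pause between lines. This is a minimal NumPy sketch of that idea; the threshold and gap length are guesses that would need tuning by ear against the actual narration.

```python
import numpy as np

def split_on_silence(signal, sr, frame_ms=20, threshold=0.02, min_gap_ms=300):
    """Split audio at long quiet stretches. `threshold` and `min_gap_ms`
    are placeholder values, not tuned for any particular recording."""
    frame = int(sr * frame_ms / 1000)
    n = len(signal) // frame
    rms = np.sqrt(np.mean(signal[:n * frame].reshape(n, frame) ** 2, axis=1))
    silent = rms < threshold

    min_gap = max(1, int(min_gap_ms / frame_ms))
    segments, start, gap = [], None, 0
    for i, s in enumerate(silent):
        if not s:                          # voiced frame
            if start is None:
                start = i
            gap = 0
        elif start is not None:            # silent frame inside a segment
            gap += 1
            if gap >= min_gap:             # pause long enough -> close segment
                segments.append((start * frame, (i - gap + 1) * frame))
                start, gap = None, 0
    if start is not None:
        segments.append((start * frame, n * frame))
    return segments                        # list of (start_sample, end_sample)

# Synthetic check: 0.4 s of sound, 0.5 s of silence, 0.4 s of sound at 1 kHz
sig = np.concatenate([0.5 * np.ones(400), np.zeros(500), 0.5 * np.ones(400)])
print(split_on_silence(sig, sr=1000))  # -> [(0, 400), (900, 1300)]
```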
Since it took so much time to generate each line, I ran the NSynth experiments simultaneously.
NSynth 1 [Human Voice || Tibet Singing-bowl || Ibo Drum]
NSynth 2 [Generated Voice || Ocarina sound || Flute Sound]
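What NSynth does with these pairs is interpolate in its latent embedding space rather than simply crossfading the waveforms. This toy sketch shows only that latent-mixing idea with random vectors standing in for the encoder outputs; it is not the real Magenta API, and the embeddings here are made up.

```python
import numpy as np

# Stand-ins for NSynth encoder outputs: in the real model these would be
# temporal embeddings produced by the encoder; here they are random vectors.
rng = np.random.default_rng(0)
z_voice = rng.normal(size=(16,))        # hypothetical voice embedding
z_instrument = rng.normal(size=(16,))   # hypothetical instrument embedding

def interpolate(z_a, z_b, alpha):
    """Linear interpolation in latent space: alpha=0 gives pure z_a,
    alpha=1 gives pure z_b. NSynth decodes the mixed embedding to audio."""
    return (1 - alpha) * z_a + alpha * z_b

z_mix = interpolate(z_voice, z_instrument, 0.5)  # halfway voice/instrument
```

Because pitch and rhythm live in the same embedding as timbre, the 50/50 mix blends all of them at once, which is exactly the behavior I ran into below.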
In the first iteration, I thought low-pitched instrumental sounds would be pleasant to hear, but the interpolation result with the human voice was a bit monstrous and unpleasant. I switched to high-pitched instruments for the second iteration.
Instrument sound samples below.
After these trials, it looks like NSynth might not be the model I was looking for. What I wanted was something like style transfer for music.
NSynth interpolates pitch and rhythm as well, but I wanted to keep the rhythm and pitch of the narration while changing the timbre toward an instrument or a certain music genre.
While looking for code, I found that WaveNet generates audio from a Mel-spectrogram representation, so it could be interesting to generate instrumental sound from the Mel-spectrogram of the narration file. I actually tried to do that, but since I was using a Jupyter notebook I couldn't access the output files.
For the proof of concept, I tried putting the sounds into a rhythm-game Unity asset. I still need to manually assign lines to the sound, but it was really interesting that the text and sound were roughly synced once I treated them as music in 4/4 time. Since the asset counts beats, no matter how long or short a line was, each line passed after 4 beats and it looked okay. This could be a new approach for the next step.
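The "each line gets one 4/4 measure" timing is easy to express in code. This sketch computes when each poem line should appear given a tempo; the BPM value is a placeholder, since the asset would supply the real tempo.

```python
def line_schedule(n_lines, bpm=90, beats_per_line=4):
    """Each poem line occupies one 4/4 measure regardless of its text length,
    mirroring how the rhythm-game asset counts beats. BPM is a placeholder."""
    seconds_per_line = beats_per_line * 60.0 / bpm
    return [round(i * seconds_per_line, 3) for i in range(n_lines)]

print(line_schedule(4, bpm=120))  # -> [0.0, 2.0, 4.0, 6.0]
```

At 120 BPM every line lasts exactly 2 seconds, which is why even very short lines look okay: the measure, not the text length, sets the pace.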