Voice - left or right

This project started with a simple idea: combine two class projects, one for Dynamic Web Development and one for Expressive Interaction - Voice. The key requirements were (1) network communication and (2) a voice user interface (VUI) on top of it. So Phil and I came up with the idea of making a multiuser game using voice interaction. The key concept we settled on was to let users interact with their voices at the same time, even if the gameplay itself does not quite work or come across.
The picture in my mind was a bunch of people shouting at the game near each other, together or separately, while the narration helps or hinders the gameplay. I wanted to realize the mess of all those people shouting and hearing each other; it is a bit of a participatory performance.

Basically, the game works best when the users are sitting in the same place, so I can document it as a performance. All users have laptops and use the default mics as input devices. The game also plays narration, so the voices of other users and the narration both act as interrupting inputs.

These are the references that inspired us:
Twitch Plays Pokémon: multi-user interaction to control one object together.
Last Man Standing: multiuser interaction using the same objects, but separately. Also the web-based interaction idea and the narration persona.
The Stanley Parable: narration persona.
Stranger than Fiction: narration persona.
Portal (GLaDOS): synthetic narration.

The narrator persona:
-male vs. female: Since most authoritative narration uses male voices, I am interested in trying a female authoritative voice.
-synthetic vs. voice acting: using a voice actor would make it easier to deliver paralanguage (sighs, gasps, throat clearing, mhm..), but using a synthesized voice for the sake of the experiment is also interesting.

So the voice will be a synthetic female voice with an authoritative, sarcastically humorous, joking personality that still feels somehow inhuman (she knows that she doesn't have a physical body in the real world).

It would be great to express some emotion as well (like Watson's expressive SSML), but not many voice synthesis engines offer expression features. If I cannot express emotions naturally, an obviously artificial-sounding voice would be better. (Users will excuse the missing emotional expression.)

Ultimately, it would be perfect to make a neutral, female-sounding synthetic voice that has emotional expression.

Of the voice synthesis options covered in class, Expressive SSML was the most impressive, so I decided to use it for the narrator.
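
To give a rough idea of what expressive markup for a narration line could look like, here is a minimal Python sketch that wraps a line in an express-as element. The element name and the type values (GoodNews, Apology, Uncertainty) are assumed from Watson's expressive SSML and should be checked against the current documentation of whichever engine is used. The helper name `narrate` and the sample line are made up for illustration.

```python
from xml.sax.saxutils import escape

# Assumed expression types; verify against the TTS service's documentation.
EXPRESSION_TYPES = {"GoodNews", "Apology", "Uncertainty"}


def narrate(text, expression=None):
    """Return an SSML string for one narration line (hypothetical helper)."""
    body = escape(text)  # escape &, < and > so the text cannot break the markup
    if expression is not None:
        if expression not in EXPRESSION_TYPES:
            raise ValueError(f"unknown expression type: {expression}")
        body = f'<express-as type="{expression}">{body}</express-as>'
    return f"<speak>{body}</speak>"


if __name__ == "__main__":
    # Made-up narration line, only to show the shape of the output.
    print(narrate("You went left. Again.", "Uncertainty"))
```

The resulting string would then be sent to the synthesis engine; that request is left out here because it depends on the service and credentials.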

I scraped tweets containing the keywords left and right. The hashtags #Left and #Right gave much stronger results, but then I would need to parse the hashtags out, so I just used the plain words left and right... (now I think I should refine the texts more..)
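
The hashtag parsing and text refinement mentioned above could be sketched roughly like this in Python. This is only an illustration under assumptions: it takes tweet texts that have already been fetched (how they are scraped is out of scope), the helper names are hypothetical, and the sample tweets are invented.

```python
import re

KEYWORDS = ("left", "right")


def clean_tweet(text):
    """Drop URLs and @mentions, and turn '#Left' into 'Left'."""
    text = re.sub(r"https?://\S+", "", text)  # remove links
    text = re.sub(r"@\w+", "", text)          # remove mentions
    text = re.sub(r"#(\w+)", r"\1", text)     # keep the hashtag word, drop '#'
    return " ".join(text.split())             # collapse leftover whitespace


def direction_of(text):
    """Return 'left' or 'right', or None if the text has neither or both."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    found = [k for k in KEYWORDS if k in words]
    return found[0] if len(found) == 1 else None


def select_lines(tweets):
    """Keep only tweets that clearly point one way, as (direction, text)."""
    lines = []
    for raw in tweets:
        cleaned = clean_tweet(raw)
        direction = direction_of(cleaned)
        if direction is not None:
            lines.append((direction, cleaned))
    return lines


if __name__ == "__main__":
    # Invented sample tweets, not real data.
    samples = ["Turn #Left now! https://t.co/x @bob", "leftover pizza", "left or right?"]
    for direction, text in select_lines(samples):
        print(direction, "|", text)
```

Matching whole words keeps tweets like "leftover pizza" out, and skipping tweets that mention both directions avoids narration lines that would be ambiguous for the players.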

