Cartoon Magic Mirror
14 December 2024
hello hello hello
This past semester I took a VR/AR independent study class led by Joe Geigel.
Requirements:
- Goal of the project
The goal of the project was to see whether the uncanny valley in real-time facial capture can be avoided by mapping continuous capture data onto a 2D avatar with discrete emotions and discrete animation frames within each emotion.
Idea: show distinction between mocapi test avatar and virtual avatar
Address uncanny valley effect in virtual avatar
Create a virtual avatar with hand-drawn facial expressions
Tracks head movement
Matches facial expressions with a person in real-time
- How you built it
- Tools:
- Unity as rendering/animation engine
- Rokoko Studio + iPad mini for ARKit facial capture
- OpenSeeFace to capture body position from laptop camera
- C# to convert ARKit facial capture data to discrete animation states
- Aseprite to draw character, background, and emotion animations
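The core C# step, converting ARKit's continuous blendshape weights into a discrete emotion state, might look roughly like this. This is a minimal sketch, not the project's actual code: the blendshape names are real ARKit keys, but the `EmotionClassifier` class, the per-emotion scoring, and the 0.4 threshold are all illustrative assumptions.

```csharp
using System;
using System.Collections.Generic;

public enum Emotion { Neutral, Joy, Anger, Fear, Surprise, Sadness, Contempt, Disgust }

public static class EmotionClassifier
{
    // Minimum score a candidate emotion needs to beat Neutral (illustrative value).
    const float Threshold = 0.4f;

    static float Get(IReadOnlyDictionary<string, float> w, string key) =>
        w.TryGetValue(key, out var v) ? v : 0f;

    // Takes one frame of ARKit blendshape weights (each 0..1) and picks
    // the discrete emotion whose signature blendshapes score highest.
    public static Emotion Classify(IReadOnlyDictionary<string, float> w)
    {
        var scores = new Dictionary<Emotion, float>
        {
            [Emotion.Joy]      = (Get(w, "mouthSmileLeft") + Get(w, "mouthSmileRight")) / 2f,
            [Emotion.Anger]    = (Get(w, "browDownLeft") + Get(w, "browDownRight")) / 2f,
            [Emotion.Surprise] = (Get(w, "browInnerUp") + Get(w, "jawOpen")) / 2f,
            [Emotion.Sadness]  = (Get(w, "mouthFrownLeft") + Get(w, "mouthFrownRight")) / 2f,
            [Emotion.Fear]     = (Get(w, "eyeWideLeft") + Get(w, "eyeWideRight")) / 2f,
            // An asymmetric smile is a rough stand-in for contempt.
            [Emotion.Contempt] = Math.Abs(Get(w, "mouthSmileLeft") - Get(w, "mouthSmileRight")),
            [Emotion.Disgust]  = (Get(w, "noseSneerLeft") + Get(w, "noseSneerRight")) / 2f,
        };

        var best = Emotion.Neutral;
        var bestScore = Threshold;
        foreach (var (emotion, score) in scores)
        {
            if (score > bestScore) { best = emotion; bestScore = score; }
        }
        return best;
    }
}
```

The discrete output is what makes the 2D avatar approach work: the renderer only ever has to show a hand-drawn frame for one emotion at a time, never blend between them.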
- Pre-implementation research:
- Paul Ekman's Facial Action Coding System (FACS) and the seven basic emotions
- joy
- anger
- fear
- surprise
- sadness
- contempt
- disgust
- iMotions: off-the-shelf emotion detection, but it only covers the seven basic emotions, which wasn't expressive enough for this project
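The head-tracking half can stay just as simple. A sketch of mapping a tracked head position (normalized camera coordinates, e.g. derived from OpenSeeFace's landmark output) onto the avatar's 2D offset, with exponential smoothing so the cartoon head doesn't jitter; the `HeadFollower` class, the gain, and the smoothing factor are made-up tuning values, not the project's actual implementation:

```csharp
using System;

public sealed class HeadFollower
{
    readonly float gain;       // how far the avatar moves per unit of head motion
    readonly float smoothing;  // 0 = frozen, 1 = no smoothing
    float x, y;                // current smoothed offset

    public HeadFollower(float gain = 100f, float smoothing = 0.2f)
    {
        this.gain = gain;
        this.smoothing = smoothing;
    }

    // headX/headY in [0,1], with (0.5, 0.5) meaning centered in the camera frame.
    // Returns the avatar's offset in pixels from its resting position.
    public (float X, float Y) Update(float headX, float headY)
    {
        float targetX = (headX - 0.5f) * gain;
        float targetY = (headY - 0.5f) * gain;
        x += (targetX - x) * smoothing; // move a fraction of the way each frame
        y += (targetY - y) * smoothing;
        return (x, y);
    }
}
```

Calling `Update` once per frame from the render loop is enough; the smoothing doubles as cheap noise filtering on the camera-based tracking.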
- Results
- What you learned
- Avatar charisma is more important for emotional connection than pinpoint accuracy!
- Most people liked it, some were unimpressed, but nobody said it was creepy
- There is a lot of variation in how people make facial expressions
- Anger in particular – closed-mouth vs open-mouth anger
- That distinction didn't show up in the seven basic emotions research
- Simple emotion detection algorithms only work when you have simple emotions
- What would you do next?
- Test it out with more people and record the variety of facial expressions people make
- Use a more advanced emotion detection system
- Probably not anything premade, since response time and charisma are more important than accuracy
- Use a more advanced mouth motion detection system
- Currently it only uses the jawOpen blendshape as a proxy for mouth movement, but the facial motion of speech is more complex than that
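For reference, the jawOpen-as-proxy approach described above amounts to quantizing one continuous blendshape into a handful of hand-drawn mouth frames. A minimal sketch, where the frame count and cutoff values are illustrative assumptions rather than the project's real numbers:

```csharp
using System;

public static class MouthFrames
{
    // Maps the continuous jawOpen blendshape (0..1) to an index into the
    // emotion's hand-drawn mouth sprites:
    // 0 = closed, 1 = slightly open, 2 = open, 3 = wide open.
    public static int FrameFor(float jawOpen)
    {
        float clamped = Math.Clamp(jawOpen, 0f, 1f);
        if (clamped < 0.1f)  return 0;
        if (clamped < 0.35f) return 1;
        if (clamped < 0.7f)  return 2;
        return 3;
    }
}
```

A richer version would fold in more of the ARKit mouth blendshapes (mouthFunnel, mouthPucker, the lip stretchers) to pick visually distinct speech shapes instead of just an open/closed scale.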