Cartoon Magic Mirror
14 December 2024
hello hello hello
This past semester I took a VR/AR independent study class led by Joe Geigel.
Requirements:
- Goal of the project
The goal of the project was to see whether the uncanny valley in real-time facial capture can be avoided by mapping continuous capture data onto a 2D avatar with discrete emotions and discrete animation frames within each emotion.
Idea: show distinction between mocapi test avatar and virtual avatar
Address uncanny valley effect in virtual avatar
Create a virtual avatar with hand-drawn facial expressions
Tracks head movement
Matches facial expressions with a person in real-time
- How you built it
- Tools:
- Unity as rendering/animation engine
- Rokoko Studio + iPad mini for ARKit facial capture
- OpenSeeFace to capture body position from laptop camera
- C# to convert ARKit facial capture data to discrete animation states
- Aseprite to draw character, background, and emotion animations
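The core C# step, converting ARKit's continuous blendshape weights into a discrete emotion state, might look roughly like this. This is a minimal sketch, not the project's actual code: the blendshape names are real ARKit keys, but the `EmotionClassifier` class, the per-emotion scoring, and the 0.4 threshold are all illustrative assumptions.

```csharp
using System;
using System.Collections.Generic;

public enum Emotion { Neutral, Joy, Anger, Fear, Surprise, Sadness, Contempt, Disgust }

public static class EmotionClassifier
{
    // Minimum score a candidate emotion needs to beat Neutral (illustrative value).
    const float Threshold = 0.4f;

    static float Get(IReadOnlyDictionary<string, float> w, string key) =>
        w.TryGetValue(key, out var v) ? v : 0f;

    // Takes one frame of ARKit blendshape weights (each 0..1) and picks
    // the discrete emotion whose signature blendshapes score highest.
    public static Emotion Classify(IReadOnlyDictionary<string, float> w)
    {
        var scores = new Dictionary<Emotion, float>
        {
            [Emotion.Joy]      = (Get(w, "mouthSmileLeft") + Get(w, "mouthSmileRight")) / 2f,
            [Emotion.Anger]    = (Get(w, "browDownLeft") + Get(w, "browDownRight")) / 2f,
            [Emotion.Surprise] = (Get(w, "browInnerUp") + Get(w, "jawOpen")) / 2f,
            [Emotion.Sadness]  = (Get(w, "mouthFrownLeft") + Get(w, "mouthFrownRight")) / 2f,
            [Emotion.Fear]     = (Get(w, "eyeWideLeft") + Get(w, "eyeWideRight")) / 2f,
            // An asymmetric smile is a rough stand-in for contempt.
            [Emotion.Contempt] = Math.Abs(Get(w, "mouthSmileLeft") - Get(w, "mouthSmileRight")),
            [Emotion.Disgust]  = (Get(w, "noseSneerLeft") + Get(w, "noseSneerRight")) / 2f,
        };

        var best = Emotion.Neutral;
        var bestScore = Threshold;
        foreach (var (emotion, score) in scores)
        {
            if (score > bestScore) { best = emotion; bestScore = score; }
        }
        return best;
    }
}
```

The discrete output is what makes the 2D avatar approach work: the renderer only ever has to show a hand-drawn frame for one emotion at a time, never blend between them.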
- Pre-implementation research:
- Paul Ekman's Facial Action Coding System (FACS) and the seven basic emotions
- joy
- anger
- fear
- surprise
- sadness
- contempt
- disgust
- iMotions: off-the-shelf emotion detection, but it only covers the seven basic emotions, which wasn't expressive enough for this project
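The head-tracking half can stay just as simple. A sketch of mapping a tracked head position (normalized camera coordinates, e.g. derived from OpenSeeFace's landmark output) onto the avatar's 2D offset, with exponential smoothing so the cartoon head doesn't jitter; the `HeadFollower` class, the gain, and the smoothing factor are made-up tuning values, not the project's actual implementation:

```csharp
using System;

public sealed class HeadFollower
{
    readonly float gain;       // how far the avatar moves per unit of head motion
    readonly float smoothing;  // 0 = frozen, 1 = no smoothing
    float x, y;                // current smoothed offset

    public HeadFollower(float gain = 100f, float smoothing = 0.2f)
    {
        this.gain = gain;
        this.smoothing = smoothing;
    }

    // headX/headY in [0,1], with (0.5, 0.5) meaning centered in the camera frame.
    // Returns the avatar's offset in pixels from its resting position.
    public (float X, float Y) Update(float headX, float headY)
    {
        float targetX = (headX - 0.5f) * gain;
        float targetY = (headY - 0.5f) * gain;
        x += (targetX - x) * smoothing; // move a fraction of the way each frame
        y += (targetY - y) * smoothing;
        return (x, y);
    }
}
```

Calling `Update` once per frame from the render loop is enough; the smoothing doubles as cheap noise filtering on the camera-based tracking.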
- Results
- What you learned
- Avatar charisma is more important for emotional connection than pinpoint accuracy!
- Most people liked it, some were unimpressed, but nobody said it was creepy
- There is a lot of variation in how people make facial expressions
- Anger in particular – closed-mouth vs open-mouth anger
- That distinction didn't show up in the seven basic emotions research
- Simple emotion detection algorithms only work when you have simple emotions
- What would you do next?
- Test it out with more people and record the variety of facial expressions people make
- Use a more advanced emotion detection system
- Probably not anything premade, since response time and charisma are more important than accuracy
- Use a more advanced mouth motion detection system
- Currently it only uses the jawOpen blendshape as a proxy for mouth movement, but the facial motion of speech is more complex than that
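For reference, the jawOpen-as-proxy approach described above amounts to quantizing one continuous blendshape into a handful of hand-drawn mouth frames. A minimal sketch, where the frame count and cutoff values are illustrative assumptions rather than the project's real numbers:

```csharp
using System;

public static class MouthFrames
{
    // Maps the continuous jawOpen blendshape (0..1) to an index into the
    // emotion's hand-drawn mouth sprites:
    // 0 = closed, 1 = slightly open, 2 = open, 3 = wide open.
    public static int FrameFor(float jawOpen)
    {
        float clamped = Math.Clamp(jawOpen, 0f, 1f);
        if (clamped < 0.1f)  return 0;
        if (clamped < 0.35f) return 1;
        if (clamped < 0.7f)  return 2;
        return 3;
    }
}
```

A richer version would fold in more of the ARKit mouth blendshapes (mouthFunnel, mouthPucker, the lip stretchers) to pick visually distinct speech shapes instead of just an open/closed scale.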