Lip Service


I’m not a particularly big anime fan for a variety of reasons (mostly petty snobbery). The only shows I’ve made an effort to watch are the “classic” ’70s and ’80s television series like Gundam — because it’s culturally relevant — and Macross — because it’s a goofy love triangle between an idol, a career woman, and a young, short Japanese teenage archetype trapped inside a mildly interesting space epic. Although somewhat marginal in Japan, Macross’ U.S. release as Robotech was a huge milestone for Japanese cultural importation in America.

Before it hit our homes each afternoon, I doubt we had ever seen a TV show in which multiple main characters die off over the course of the series. After enduring the depressing Buddhist saga of Gundam last year, I’ve decided to watch the original Japanese version of Macross to see what I was missing.

During my childhood, “Japanimation” was still considered a mildly retarded artform. My brother and sister used to crack me up by doing real-life impressions of Speed Racer’s jerky motions. There’s something inherently hilarious about the fact that the characters never talk until they stop moving. The other hallmark of this anime-bashing was that the characters’ lips never matched up to what they were saying. I had always imagined that this was simply because the English translation of Japanese dialogue would never fit the same mouth movements. Same problem with dubbed kung fu movies: it’s just the nature of the beast.

What I’m noticing from Macross, however, is that the lips almost never matched the Japanese dialogue in the first place. In the worst cases, a character’s mouth will start moving a good second before the speech begins. Things have obviously improved in recent years, but I feel wiser now knowing that I wasn’t losing anything in the translation.

W. David MARX (Marxy)
January 25, 2006

Marxy wrote a lot of essays back on his old site Néomarxisme. This is one of them.

35 Responses

  1. Carl Says:

    I noticed the same thing when my Japanese-American friend showed me some Kamen Rider tapes in high school. I was pretty surprised that the lip sync was so bad even in a live-action show.

  2. DB Says:

    I have a friend who works at an animation studio – when I asked him about this, he said it was because Japanese doesn’t depend as much on the formation of the mouth as English, so no need to depict the individual sounds. Seems like total bullshit of the kneejerk cultural superiority / uniqueness kind to me, but that was his answer.

  3. marxy Says:

    That explanation doesn’t make sense because – this is all speculation – mouth movements are mostly based on vowel sounds (the big opening of the mouth during an “ee” as in “bee”) or the rounding of the lips (“o” as in “go”). And if so, Japanese would be just as extreme as English, no?

  4. DB Says:

    Yeah, that’s a stupid explanation, I agree. Personally I think it’s just a result of different traditions and styles, with possibly no real conscious ‘reason’ behind it.

  5. Chris_B Says:

    my vote is for low budgets and short production deadlines

  6. marxy Says:

    Now we’re just waiting for the kneejerk “Shinto” response…

  7. Momus Says:

    There’s a pretty obvious Shinto explanation: it takes a second for the kami who lives deep in these characters’ stomachs to realise their mouths are moving and start speaking.

  8. trevor Says:

    everything i have seen on anime production says the voices are recorded… AFTER it’s been animated, basically in a big overdub session, which is still the practice today.

  9. ls Says:

    I think you’re misinterpreting this a bit … it’s not that Japanese has fewer sounds than English (although it does — Japanese has 5 vowels, English 12 or 13) but that the information stream is easier to parse and therefore less dependent on visual cues. The Japanese syllable is so stripped down (all CV except for those that end in moraic nasals or geminates) that it’s just easier to distinguish the sounds. It would be much easier to write speech-recognition software for Japanese than English or Russian. Of course, this is also the reason for the ridiculous number of homophones.

    Whether or not this is a valid explanation for the lip-syncing thing is up for debate … I think they’re just lazy about it.

    Here’s an interesting parlor game: have one person repeat the syllable “da” over and over while another person stands in front of them and mouths it in sync. Then have the person switch to mouthing “ba” instead. You’ll notice an immediate change in your perception of the sound, even if the two people are of different genders. You can’t get it to go away, either, as with some visual illusions. It’s extremely robust. The so-and-so effect, I can’t remember …

  10. Grishnackh Says:

    my vote is for low budgets and short production deadlines

    I’d like to second that. From what I’ve heard, the average animator’s monthly wage is about half that of his/her salaryman counterpart, and many can’t even afford to live independently of their parents at the beginning of their careers. The situation is only getting worse as much of the work has been outsourced to Korea and China.

    In my opinion, the overall quality of anime reached its peak around 1990 and enjoyed a brief renaissance in the late 90s, so it’s really hard to say if there’s been any significant improvement in recent years. But then again, poor lip-syncing (and low frame rate et al.) has never been an issue to most anime fans, for some reason…

  11. marxy Says:

    As for that linguistic explanation, English surely has more vowel sounds and more consonant sounds than Japanese, but aren’t anime mouth movements mostly based on the vowel part? Most characters don’t have lips, so you’re not going to get well-articulated visual bilabial representations etc. You’re going to get the mouth opening wide for i sounds and pushing out for o and u sounds. English may have more mouth positions total, but I’m not convinced that all the Japanese vowels look so similar that you can eschew animating them.

    You don’t really need cartoon mouths to match the dialogue, but I very much doubt that the Japanese language lends itself better to an internally homogeneous set of movements – especially when the mouth moves in an exaggerated fashion. In real Japanese, speakers swallow their sounds so much that there’s less movement, but anime doesn’t follow that.

  12. DB Says:

    I put a little chart from an animation book on my site so you can get an idea how they do lip sync –

    The more common question I hear is ‘Why don’t they draw the characters to look Japanese?’ I think the answer to both is that this is just how the art form has evolved – or been ‘intelligently designed’ – in Japan.

  13. Momus Says:

    But that answer (“this is just how the art form has evolved in Japan”) skips over all the interesting and relevant information. It certainly doesn’t answer the question. It’s a sort of shrug.

  14. marxy Says:

    But isn’t realistic lip syncing a “luxury” in some ways? And how would viewers in the past know that such a luxury is even possible when they’ve probably never seen a domestic cartoon featuring matched dialogue?

  15. ls Says:

    The shape of the mouth opening is no more dependent on vowels than consonants, at least in any significant way. But again, this is irrelevant. It’s not a question of whether Japanese speakers perform these movements. I don’t think that Japanese is in any way more “homogenous” in its actual performance. Rather, it’s a question of perception. Due to the much larger number of possible English syllables — syllables, not sounds — we depend more on lip-reading to distinguish between them. You know the kana … now consider that there are maybe around 50,000 syllables in English. You have to get up to 6 consonants and a vowel (“sprints”) in the same amount of time in which a Japanese speaker identifies one vowel, one consonant, and possibly a nasal or geminate (“ka” or “han”). The constrictive structure of Japanese syllables makes your perception much easier to error-check.

    English is simply more difficult to understand at the phonological level, and therefore lip-reading becomes a bigger part of our speech perception. We depend on it more to identify closely clustered consonants. When it’s not there, we might naturally find its absence less acceptable than the Japanese do — they simply never pay attention to it, in life or on TV, because they don’t need to. So, ultimately, it’s not about verisimilitude (what is, in anime?) but about catering to the perceptual needs and habits of the viewer. Again, I’m not sure this is the real “reason” behind dubbing practices, but it’s not an invalid argument.

    As for the other point about race, I think we can all agree that it’s creepy as hell …

  16. der Says:

    Re. ls: this would explain why talking on the phone is so incredibly difficult in all languages except Japanese: it’s the lip reading that is missing to help you distinguish those pesky syllables.

    (Do you have any references for your highly dubious (but not impossible, I guess) theory? I have never heard that ASR of Japanese has significantly lower error rates than that of other languages (which should follow from your assertion). And too many possible counter arguments come to mind (disambiguation by context in a less homophone-ridden language, etc. etc.))

  17. guest Says:

    Wait a minute, the Japanese aren’t white?

  18. Grishnackh Says:

    I think it’s pointless to get deep into linguistics concerning the lip-syncing problem in anime, which has always been a reduced art form since its very inception. For one thing, the standard frame rate for TV anime is only 12 per second. However much the quality of drawing has improved and however much more sophisticated the style has become over the years, it has basically stayed on the same technical level as far as I can remember. So proper lip-syncing is really really of secondary concern. Film features, especially those by world-renowned directors, are of course another story.

    Talking about anime film features, I recently had an argument over the word “Japanimation” with an acquaintance, who apparently reads a lot of writing on anime in Japanese. Until I pointed out that this word has long fallen out of use in the English-speaking world, he was under the impression that “Japanimation” referred to arthouse animated films as opposed to TV anime in general. That peculiar impression turned out to come from the way Japanese critics themselves have used the word (in katakana, of course) since around 2000. It just doesn’t make any sense to me.

  19. Momus Says:

    Didn’t we come to this point before in a conversation about TV? Marxy (quoting the thesis of “Everything Bad is Good for You”) was saying the complexity of US TV shows made for smarter viewers. I countered with some McLuhanite argument that the inverse was the case; if the TV show does all the work, the viewer gets all the more passive. Low res forms (and I think we can call bad lip sync a “low res” thing) actually make their consumers “fill in the gaps” with their own imagination, experience, etc.

  20. marxy Says:

    I imagine that the mouth movements match up to the dialogue. My experience guides me to think that the mouth movements should match up to the dialogue. I feel smarter now.

  21. Momus Says:

    But if everything bad is good for you, isn’t everything good (like good lip-syncing) bad for you?

  22. der Says:

    Low res forms (and I think we can call bad lip sync a “low res” thing) actually make their consumers “fill in the gaps” with their own imagination, experience, etc.

    That again can explain why Germans are so much more imaginative than for example British people, as here most films are dubbed (big films often fairly OKish, but TV shows (even big ones like Friends) perhaps worse than those animes we’re discussing in today’s class).

  23. ls Says:

    No, I don’t have any references, I’m pulling this directly out of my ass … you’re right, other factors would tend to make Japanese a more difficult language for speech recognition. I was just thinking of the first step of identifying syllables.

  24. Jrim Says:

    Low res forms (and I think we can call bad lip sync a “low res” thing) actually make their consumers “fill in the gaps” with their own imagination, experience, etc.

    Yes, but it’s dodgy lip-synch taking place within a world of interstellar travel, enormous flying battle-bots, and women with eyes the size of saucers and preternaturally pert H-cup breasts. Sorry, where were those gaps I was supposed to be filling in…?

  25. Grishnackh Says:

    Yes, but it’s dodgy lip-synch taking place within a world of interstellar travel, enormous flying battle-bots, and women with eyes the size of saucers and preternaturally pert H-cup breasts. Sorry, where were those gaps I was supposed to be filling in…?

    Maybe you should just check out one of those anime titles that don’t feature any of the aforementioned stereotypes…? Well, the saucer-sized eyes are kind of hard to avoid since it’s so well-established in this art form.

    I’m by no means backing Momus’ “low-res” argument, though. My previous post was simply an attempt to explain why lip-sync is so overlooked in anime by examining its rather unevolved technical side.

    All the linguistic and technical issues aside, I think the most intriguing question is: Why has anime, still pretty much a domestic industry, become the hottest cultural export from Japan? It’s had a tremendous impact on millions of young, impressionable Western minds, effectively shaping their notions of the country.

    Old-school Japanophiles are no more; new-school wapanese are taking over.

  26. woof Says:

    i agree with momus. i’m involved in making animation in the UK, and having been brought up on the visual language of japanese animation, i’ve often found it difficult to work with british animators who insist on animating everything perfectly, frame by frame. i always found it quite telling how the japanese animation industry were a little slow in embracing the supposedly “cutting edge” of 3D CG animation (even miyazaki has only recently begun using it, and even when he does, 90% of the films are still hand drawn) – CG leaves nothing to the imagination- it literally has no gaps, because although virtual, you’re “building” everything. it’s like the antithesis of the current obsession to make everything seem real or at least convincing in CG – to do away with the inbetweens- i always have to yell at animators to say “NO!! I DON’T WANT IT TO LOOK CONVINCING! IT’S ANIMATION! I WANT IT TO LOOK RIDICULOUSLY MANNERED!!”

  27. Carl Says:

    I find Japanese is usually pretty easy to listen to and understand… except when people mumble in the morning. When they talk like that, it’s basically impossible to pick out words.

  28. Dirk Says:

    The reason why the sync is off with Japanese animation is because the voice actors record the dialogue after the animation is complete, whereas in American animation the dialogue is recorded first and the animators sync the mouth movement to the actual recorded dialogue.

  29. marxy Says:

    I certainly like the overstylization of anime – that giant sweat-drop for example. But the dubbing style seems to be a logistical, economic issue, not an aesthetic one.

  30. woof Says:

    but it IS an aesthetic one. it doesn’t cost any more money to record dialogue before the animating than it does after- so it isn’t an economic issue. the key aesthetic advantage of recording the dialogue after the animation is completed is that it allows the animation, in a sense, to dictate itself rather than be dictated by the voiceovers- it can be led by the emphasis on image and movement. animation is essentially all in the rhythm and the timing of frames; as an animator it can be a substantial creative constraint to have to work with predetermined timings, especially ones recorded by a bunch of voice actors in a recording studio, who at the end of the day are just acting out between their “real” selves.
    although it is different, this whole idea kind of reminds me of the role of the “kuroko” in bunraku, or even the master puppeteers who don’t even attempt to disguise themselves.

  31. nate Says:

    according to a friend of mine who has no specific right to know… it’s a mix. Some of the episodes will be completed when the voice talent (who are actually pretty legitimate stars) are brought in to do most of the season’s (or series in other cases) work. Others won’t be, but the animation staff still never works with the recorded voice in many cases.

  32. Dirk Says:

    I wouldn’t say it’s purely an aesthetic choice. Being forced to animate precise lip movements does mean that the animation takes longer to complete (or requires more animators), and is thus more costly to produce. Woof, you seem to give short shrift to the contributions of voice actors to animation. Doing all of the recording after the fact seriously diminishes the potential for an actor to contribute to the development of the character and completely eliminates the possibility for improvisation. The crux of your argument seems to be that the animator should be allowed to determine the timing. Well, what animator are you talking about? The director has control over the key animators who then determine what the lowly inbetweening grunts will do. But ultimately, it’s the director who controls the timing. If we accept your premise that it doesn’t cost more to record the actors prior to animating (with the intention of syncing the lips to the voices), then I suppose the director does have an aesthetic choice to be made. Does he control the performance of the characters via the voice actors or via the animators? For example, Katsuhiro Otomo chose to prescore Akira because he wanted the performances to be generated by the actors. Of course, Otomo had a huge budget to work with.

    As for the legitimacy of seiyuu as “stars,” well, they certainly are within the otaku community, but I wouldn’t expect the average Japanese person to know who Seki Tomokazu is (although I’m not sure if they would know who Tadanobu Asano is either, has he been in any mediocre television dramas lately?).

  33. ndkent Says:

    There may also be a connection in that Japanese sound feature films apparently were often (or always?) shot without sound and had all the voices post-dubbed for many decades, I think up through the 1960s. Dialog in Hollywood is of course frequently looped too, but my understanding was that in Japan sound frequently (or typically?) was not recorded at all during production.

    And for what it’s worth I think “Star Blazers” (Spacecruiser Yamato) was the first full fledged U.S. aired anime series with most of the hallmarks associated with anime. It had characters dying for instance. (but no Chinese pop idol) It stirred up some very early fandom but didn’t make enough of a splash to import more.

    “Robotech” came along some years later for the next “generation” of kids. “Robotech” wasn’t just “Macross”; they also packaged it with the unrelated anime series “Southern Cross” and “Mospeada” to air every weekday rather than weekly. Though I guess the fans got into Macross and felt confused and turned off when the other series followed. But it proved some appetite for Japanese material not intended for the U.S. market.

  34. Carl Says:

    Here’s one theory I thought of about the use of white-looking characters in anime: drawing Japanese people is hard, because to tell two Japanese people apart you need to look at subtle differences in noses, skull shape, and mole placement, whereas you can tell two white-looking characters apart by giving them different colored hair and crazy shaped noses. In other words, using white characters lets you use more stylization, since you don’t have to draw fine facial details to convey the information.

    Of course, this theory doesn’t explain the big eyes. Also, all my Japanese friends tell me white people all look alike anyway.

  35. nate Says:

    ask japanese people if anime characters are caucasian. They’re not. The big-eyed, light brown, blonde, or blue haired men and women of the manga and anime world are not gaijin unless the author says so. Otherwise it’s presumed they’re just stylized J-peeps.