The UX design case of closed captions for everyone

2019-02-07

Are video subtitles really chiefly for users who cannot hear or lack an audio device? A recent Twitter thread on “closed captions for the hearing” triggered a brief qualitative exploration and thought experiment - there may well be a growing group of users being forgotten in the design of closed captions.

Most commonly perceived as an auxiliary means for the hearing impaired, video subtitles, a.k.a. closed captions (CC), have only recently started to be widely considered as an affordance for users in situations where audio is unavailable or not possible (think mobile devices in public settings, libraries, shared office spaces); the latter to the extent that contemporary “social media marketing guidelines” strongly recommend subtitling video clips uploaded to Facebook, Twitter et al.

So: subtitles are for those who cannot hear, or for those with muted devices?

Who else uses closed captions?

I’m personally a great fan of closed captions, for various reasons unrelated to either of the above, and have often noticed certain limitations in their design. Hence, the user researcher inside me did a somersault when I happened upon a Twitter thread that followed Jason Kottke asking his 247,000 followers:

After seeing several photos my (English-speaking, non-deaf) friends have taken of their TV screens over the past week, I’m realizing that many of you watch TV with closed captions (or subtitles) on?! Is this a thing? And if so, why?

The 150+ replies (I guess this qualifies as a reasonable sample for a qualitative analysis of sorts?) are a wonderful example of “accessibility features” benefiting everybody (I wrote about another instance recently). The reasons why people watch TV with closed captions on, despite having good hearing abilities and not being constrained by having to watch muted video, are manifold and go far beyond those two most commonly anticipated use cases.

Closed captions are used by people with good hearing and audio playback turned on. An overlooked use case?

Even after a rather shallow, extempore categorisation exercise based on the replies on Twitter, I end up with an impressive list to start with:

Permanent difficulties with audio content
- audio processing disorders
- short attention span (incl., but not limited to clinical conditions)
- hard of hearing, irrespective of age
Temporary impairments of hearing or perception
- watching under the influence of alcohol
- noise from eating chips while watching
Environmental/contextual factors
- environment noise from others in the room (or a snoring dog)
- distractions and multitasking (working out, child care, web browsing, working, phone calls)
Reasons related to the media itself
- bad audio levels of voice vs. music
Enabler for improved understanding
- easier to follow dialogue
- annoyance with missing dialogue
- avoidance of misinterpretations
- better appreciation of dialogue
Better access to details
- able to take note of titles of songs played
- ability to understand song lyrics
- re-watching to catch missed details
Language-related reasons
- strong accents
- fast talking, mumbling
- unable to understand foreign language
- insecurity with non-native language
Educational goals, learning and understanding
- language learning
- literacy development for children
- seeing the spelling of unknown words/names
- better retention of content that is read
Social reasons
- courtesy to others, either in need of silence or with a need/preference for subtitles
- presence of pets or sleeping children
- avoiding social conflict over sound level or distractions (“CC = family peace”)
Media habits
- ability to share screen photos with text online
Personal preferences
- preference for reading
- acquired habit
Limitations of technology skills
- lack of knowledge of how to turn them off

An attempt at designerly analysis

The reasons range from common sense to surprising, such as the examples of closed captions used to avoid family conflict or the two respondents explicitly mentioning “eating chips” as a source of disturbing noise. Motivations mentioned repeatedly refer to learning and/or understanding, but also to such apparently banal reasons as not knowing how to turn them off (a usability issue?). Most importantly, though, it becomes apparent that using CC is more often than not related to choice/preference, rather than to impairment or constraints on using audio.

At the same time, it becomes very clear that not everybody likes them, especially when forced to watch with subtitles by another person. The desire/need of some may negatively affect the experience of others present. A repeated complaint that, particularly with comedy, CC can kill the jokes may also hint that subtitles and their timing could be improved by treating them as more than an accessibility aid for those who do not hear the audio. (It appears as if the scenario of audio and CC consumed simultaneously is not something considered when subtitles are created and implemented; are we looking at another case for “exclusive design”?)

And while captions are perceived as distracting when new - this was the starting point of Kottke’s Tweet - many of the comments share the view that they become less obtrusive over time; people from countries where TV is not dubbed, in particular, are so used to them that they barely notice them (“becomes second nature”). Yet there are even such interesting behaviours as people skipping back to re-read dialogue they only listened to at first, or skipping back to pay closer attention to the picture on second viewing (e.g. details of expression) after initially reading the subtitles.

Last but not least, it is interesting how people may even feel shame over using CC. Only a conversation like the cited Twitter thread may help them realise that it is much more common than they thought and, most importantly, that it has nothing to do with a perceived stigma of being “hard of hearing”.

CC as part of video content design

The phenomenon is obviously not new. Some articles on the topic suggest that it is a generational habit of Generation Z (though Kottke’s little survey suggests otherwise), or even see it as paranoid and obsessive-compulsive behaviour of “postmodern completists”, facilitated by new technological possibilities. Research on the benefits of CC for language learning, on the other hand, reaches back several decades.

No matter what - the phenomenon in itself is interesting enough to make this a theme for deeper consideration in any design project that contains video material. Because, after all, one thing is for sure: closed captions are not for those with hearing impairments or with muted devices alone - and to deliver great UX, these users should be considered as well. At the same time, it is designers’ responsibility to ensure that extending the inclusiveness of video content through CC does not come at the cost of breaking the established science of providing access to those who need CC the most.