To appear in T. Holtgraves (Ed.), Oxford Handbook of Language and Social Psychology
Natural Language Use as a Marker of Personality
Molly E. Ireland
University of Pennsylvania
Matthias R. Mehl
University of Arizona
Address correspondence to Molly E. Ireland (firstname.lastname@example.org), Annenberg School for Communication, University of Pennsylvania, 3620 Walnut Street, Philadelphia, PA 19104, or Matthias Mehl (email@example.com), Department of Psychology, University of Arizona, 1503 E University Blvd., P.O. Box 210068, Tucson, AZ 85721.
Natural language has been integral to the study of personality since the field's inception. Lexical approaches to personality factor-analyze the words people use to describe others in order to zero in on a small number of mostly independent personality traits. Early on, this approach generated the Big Five model that remains a major force in personality research today (Costa & McCrae, 1992; Goldberg, 1981). More holistic narrative approaches code content themes in the stories that people tell about their lives in order to understand how individuals construe their own personality and represent it to others (McAdams & Pals, 2006).
Despite personality researchers' early recognition that language and stories might accurately describe personality, the idea that natural language use also reflects personality and might provide knowledge about individual differences above and beyond self-reports did not gain traction until the last decade. Before the computer revolution, Walter Weintraub (1981) spent decades amassing data on natural language use – primarily by training coders to count words and phrases by hand – with little recognition. Although some accepted his premise that verbal behavior, like other actions, reflects psychological processes, the psychological community was slow to shoulder the burdens of carrying out linguistic research on sufficiently large samples by hand (Mehl, 2006a).
The recent surge in the popularity of studying language in the social and behavioral sciences stems in large part from technological advances in computational linguistics. The internet and computer science more generally have made it possible to easily compile and analyze natural language corpora with word counts climbing into the millions and billions. As of 2010, Google Books had digitized 12 percent of all books ever published and made those data available to interested researchers (Michel et al., 2010). Social networking and publishing sites like LiveJournal and, more recently, Twitter allow researchers to download their users' language for free or at relatively low cost (e.g., Golder & Macy, 2011; Ramirez-Esparza, Chung, Kacewicz, & Pennebaker, 2008; Yarkoni, 2010). Facebook similarly allows researchers to access users' status updates, although it tends to place stricter restrictions on usage than other sites do (e.g., Kramer, 2010; Schwartz et al., 2013). The near-universal use of the internet has helped psychology widen its sampling nets for a wide range of purposes. The rise of the internet has been particularly fruitful for language research: Observing verbal communication or other language behavior online is simple. Within minutes, online social media users often produce hundreds of units of objectively and easily quantifiable (i.e., text-based) behavioral data.
Perhaps more importantly, sophisticated text analysis tools allow researchers to analyze the increasingly large and diverse datasets that technological advances have made possible. One of the first text analysis tools to enter into common usage was a computationally simple word count program called the Linguistic Inquiry and Word Count (LIWC; Pennebaker, Francis, & Booth, 2007). LIWC is a computerized text analysis program that outputs the percentage of words in a text or batch of texts that fall into one or more of over 80 grammatical (e.g., articles, first-person singular pronouns), psychological (e.g., positive emotion, insight), and topical (e.g., social, sex) categories. It does so by comparing each word in a text against a set of internal word lists or dictionaries. Although the psychometrically developed content categories have a (minor) subjective component, the grammatical and linguistic categories are, for the most part, based on objective and factual information about the established lexical members of each category.
LIWC, like its predecessors the General Inquirer (Stone, Dunphy, Smith, & Ogilvie, 1966) and DICTION (Hart, 1984), focuses on word frequencies alone, irrespective of context. For example, the sentences "I've never been less happy" and "I'm the happiest person alive" would be coded as containing identical proportions of positive emotion words despite their very different meanings. Critics have pointed out that, in examples like these, the programs' context blindness can lead to noisy or difficult-to-interpret results.
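The dictionary-based word counting just described can be sketched in a few lines of code. The category word lists below are invented for illustration and are far smaller than LIWC's actual dictionaries; they are not the program's real lexicon.

```python
import re

# Toy word lists standing in for LIWC's internal dictionaries.
# These category members are illustrative only, not LIWC's actual lexicon.
DICTIONARIES = {
    "posemo": {"happy", "happiest", "glad", "love", "forgive", "good"},
    "i": {"i", "i've", "i'm", "me", "my", "mine"},
    "article": {"a", "an", "the"},
}

def word_percentages(text):
    """Return the percentage of words in `text` that fall into each category."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = {cat: 0 for cat in DICTIONARIES}
    for word in words:
        for cat, wordlist in DICTIONARIES.items():
            if word in wordlist:
                counts[cat] += 1
    total = len(words) or 1  # avoid division by zero on empty input
    return {cat: 100.0 * n / total for cat, n in counts.items()}

print(word_percentages("I've never been less happy"))
print(word_percentages("I'm the happiest person alive"))
```

Both five-word sentences score identically (20%) on the toy positive-emotion category, mirroring the context blindness described above: the counting procedure registers *happy* and *happiest* but is blind to the negation in the first sentence.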
The role of word count programs, however, is not to obviate self- and observer reports but to complement them and to clarify the gaps and inconsistencies they leave behind. A cursory reading of the two statements about happiness above, or a single-item measure of positive emotion, could tell you that the speakers range from least to most happy, respectively. However, much of word count programs' incremental utility comes from information that questionnaires and content coding often miss: Programs like LIWC can tell us that the speakers in the examples above, despite their differences, are focusing on happiness to similar degrees. Studies have borne out the intuition that it is often speakers' focus, rather than their conveyed meaning, that matters. For example, early expressive writing studies showed that people who wrote about past traumas using positive emotion words tended to benefit more than those who used exclusively negative emotion words (Pennebaker, 1997). This finding was particularly striking because, due to the traumatic nature of the expressive writing topics, uses of positive emotion words were frequently negated (e.g., "she'll never forgive me," "I'm not a good person").
Measurement and Psychometrics of Natural Language Use
Before serious and widespread work could begin on the links between natural language use and personality, two questions needed to be addressed: First, how can we measure (i.e., analyze) natural language use? And, second, does natural language use fulfill the basic psychometric requirements for a personality or individual difference variable (i.e., is it moderately stable over time and across context)?
Before computerized text analysis, language analysis in both psychology and linguistics was by necessity primarily qualitative. Linguists would often subject single utterances or exchanges to intense scrutiny (see Tanenhaus & Trueswell, 1995). Others would draw inferences about human behavior in general based on everyday observations of speech patterns in social relationships (e.g., Lakoff, 1975). Psychologists who sought psychometrically tractable linguistic data tended to use content coding methods, in which trained judges rated texts according to a set of criteria that typically focused on writers' or speakers' use of content themes, such as anxiety, hope, and health or sickness (Gottschalk & Gleser, 1969). However, even the most reliable and statistically sound of these methods were slow, labor-intensive, and subject to human error.
A major benefit of word count approaches is that their reliability is never compromised by subjective biases or experimenter error. Computer programs output the same results regardless of the mental state of the experimenter using them, and they will always find exactly how many times a certain word or group of words occurs in a given text regardless of how easily overlooked those words might be (e.g., the notoriously invisible of1). Indeed, the context-blindness of word count programs is beneficial in this respect. Whereas a human coder might be bogged down by shades of meaning or biased assumptions about a speaker or writer, computerized text analysis programs focus single-mindedly on what language alone can tell us.
Once programs were available that could rapidly and reliably identify language patterns in large samples of texts, it was possible to establish the basic psychometrics of natural language use. One of the first investigations into the reliability of language use was conducted by Gleser and colleagues (1959). In that study, participants told a personal story in monologue for about 5 minutes. Transcripts were divided into two equal halves and coded for several linguistic and psychological language categories, such as adjectives and feelings. The average correlation between the two halves of these stories across all categories was moderate-to-high, approximately r = .50. In other words, as anyone who has either written or read a story could tell you, there is some variation in stories over time. Authors use different words when laying out the setting of a story than when describing the climax (Vine & Pennebaker, 2009). Yet despite these obvious differences, the linguistic fingerprint of an author or speaker tends to remain visible over the course of a narrative. Similarly, individuals’ spoken language use tends to remain consistent during hour-long life history interviews, with consistency coefficients (Cronbach’s alpha) ranging from .41 to .64 for several stylistic and content language categories, despite stark differences in interviewer questions between the first and second halves (Fast & Funder, 2008).
Several studies have now demonstrated that natural language use evidences substantial consistency not only over the course of a narrative, but over time as well. Schnurr et al. (1986) asked medical students to describe their experiences coming to medical school in two unscripted monologues spaced one week apart. They found that language use, as analyzed with the General Inquirer (Stone et al., 1966), was highly reliable over time. Across 83 content categories measured in that study, including references to people, work, affect, and evaluations, the average correlation between the two monologues was .78. Later, around the time that computerized text analysis approaches were beginning to gain mainstream momentum, Pennebaker and King (1999) further tested the limits of linguistic stability by comparing individuals’ language use over longer stretches of time, lasting up to several years, and across diverse contexts, ranging from scientific articles to students’ stream-of-consciousness essays. They once again found, using word frequencies calculated by LIWC, that people maintain good linguistic consistency in most language categories – often despite predictable situational fluctuations in language use.
The same consistency across time and place has been found for naturalistic, everyday spoken language as well. In one of the first studies using the Electronically Activated Recorder (EAR; Mehl, Pennebaker, Crow, Dabbs, & Price, 2001) methodology, Mehl and Pennebaker (2003) recorded college students in their natural environments over the course of two 2-day periods spaced 4 weeks apart. The EAR sampled 30 seconds of ambient sounds roughly every 12 minutes. All captured talking was transcribed and coded for participants’ location, activity, and mode of conversation (i.e., telephone or face-to-face). With few exceptions, both linguistic and psychological categories were substantially correlated across time, activity, and interaction mode. Interestingly, the authors also found that function word categories, including grammatical parts of speech such as pronouns and articles, were more consistent than were content-based psychological categories (average function word r = .41; average psychological processes r = .24).
Taken together, this past research establishes that people’s natural language use is characterized by a good degree of temporal stability and cross-situational consistency. Therefore, the ways in which people spontaneously use language – for example, idiosyncrasies in word choices or speaking styles – satisfy psychometric requirements for personality or individual difference variables.
Language Style versus Language Content
Early analyses of personality and language tended to base their conclusions on transcripts that had been coded for content words and phrases (e.g., Smith, 1992). Indeed, in computational linguistics, the function words that make up language style continue to be referred to as “stop words” because they are usually ignored during automated language processing. However, individual differences in language style are often more psychologically telling and psychometrically parsimonious than are differences in language content. Language style is defined by a person’s use of function words, including pronouns, articles, conjunctions, and several other categories that make up the grammatical structure of utterances (Table 1). Function words tend to be short, frequently used, and have little meaning outside the context of a conversation. In part because of these characteristics, they are processed fluently and largely automatically during both language production and comprehension (Bell, Brenier, Gregory, Girand, & Jurafsky, 2009; Levelt, 1989). Language content, on the other hand, is defined by a person’s use of nouns, verbs, adjectives, and most adverbs. In short, content words determine what a person says, and function words determine how they say it.
Note. Only basic-level and not superordinate function word categories were included. Categories are from LIWC2007 (Pennebaker et al., 2007).
To a greater degree than function words, content words are practically constrained by the topic or context of conversation. For example, group members assigned to solve math problems together will uniformly use content words related to that task (e.g., calculate, solution) regardless of whether they are each individually thinking about the problems in different ways. Function words, on the other hand, are more loosely constrained by the topic of a conversation, allowing people to discuss the same content in different styles. The versatility of function words allows researchers to measure differences in language style across contexts rapidly and objectively. Despite some degree of natural verbal and nonverbal convergence between individuals during social interaction (Chartrand & van Baaren, 2009; Ireland & Pennebaker, 2010; Pickering & Garrod, 2004), function words used during conversation reliably reflect differences in social status, honesty, and leadership styles (Hancock, Curry, Goorha, & Woodworth, 2008; Pennebaker, 2011; Slatcher, Chung, Pennebaker, & Stone, 2007; see Tausczik & Pennebaker, 2010 for a review).
A second reason that researchers might focus on language style instead of or in addition to content is that function words tend to more directly reflect social cognition during conversation. The relationship between function word use and social cognition is primarily a practical matter: Because function words have little meaning outside of the context of a sentence, they require common ground or shared social knowledge to be interpreted (Pennebaker et al., 2003). For example, to understand the sentence "He shut the dog in there," the speaker must know that the listener shares his knowledge of the man, the dog, and the location in question. This mutual understanding of a situation, its potential referents, and each conversation partner's knowledge of the situation theoretically forms the foundation of any successful conversation (Clark & Brennan, 1991). Given that interest in and attention to others' thoughts and feelings is an integral aspect of personality (e.g., particularly Big Five extraversion and agreeableness), the ability to automatically extract individuals' social cognitive styles from their language use could be a valuable addition to personality researchers' toolboxes.
A final reason for paying attention to style in addition to content is purely psychometric. Individual differences in language observed in early language research focused primarily on phrases that include both language content and style. To use women as an example, female speakers tend to use more uncertainty phrases, such as I wonder if, and extra-polite phrases, such as would you mind, than male speakers do (Holmes, 1995; Lakoff, 1975; Poole, 1979; Rubin & Green, 1992). However, many of these phrases can be measured more simply by counting the function words that they commonly include – specifically, in the previous examples, first-person singular pronouns (e.g., I) and auxiliary verbs (e.g., would). Indeed, an analysis of formal writing in the British National Corpus found that function words offered the most efficient way to classify texts as authored by men or women (Koppel, Argamon, & Shimoni, 2003). Similarly, a corpus analysis of spoken and written language collected in 70 studies revealed that function words more reliably discriminated between male and female participants than did content words (Newman, Groom, Handelman, & Pennebaker, 2008). In personality research specifically, direct comparisons are relatively sparse. However, style appears to provide the best classification accuracy for neuroticism, providing gains over content alone and even over content and style combined (Argamon, Koppel, Pennebaker, & Schler, 2009). Whether similar effects will be found for other personality traits and individual differences has yet to be conclusively determined.
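Classifying texts by their function-word rates, as in the author-gender studies above, can be illustrated with a toy model. The feature values and labels below are invented, and nearest-centroid classification is a deliberately simple stand-in for the machine learning models those studies actually used.

```python
from math import dist

def nearest_centroid(train, labels, x):
    """Classify feature vector x by the closest class centroid."""
    centroids = {}
    for lab in set(labels):
        rows = [t for t, l in zip(train, labels) if l == lab]
        centroids[lab] = [sum(col) / len(rows) for col in zip(*rows)]
    return min(centroids, key=lambda lab: dist(centroids[lab], x))

# Each row: invented rates (% of words) of [first-person singular pronouns,
# articles, auxiliary verbs] in one text. Labels are toy data, not the BNC.
texts = [
    [6.1, 4.2, 9.0], [5.8, 4.5, 8.7], [6.5, 4.0, 9.3],  # toy "female" texts
    [3.9, 7.1, 6.2], [4.2, 6.8, 6.5], [3.6, 7.4, 6.0],  # toy "male" texts
]
authors = ["female"] * 3 + ["male"] * 3

print(nearest_centroid(texts, authors, [6.0, 4.3, 8.9]))  # → female
```

The point of the sketch is that short, topic-independent function-word rates alone carry enough signal to separate the two toy classes; no content words enter the feature vector.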
In the end, whether language content or style is a more reliable indicator of personality and individual differences may depend to a large degree on what personality measures are used to establish criterion validity. Although individual differences such as age and sex appear to be more strongly linked to language style, personality traits as measured by Big Five scales are often more consistently and strongly linked with language content – including both language categories and individual words – than language style. This may be because research exploring the link between language use and the Big Five has overwhelmingly used personality self-reports as the gold standard of true personality, whereas demographic variables can be more objectively observed.
The pattern of personality self-reports matching language content and demographic individual differences matching language style may be due to a match between the automaticity of the behavior and the measurement (Eastwick, Eagly, Finkel, & Johnson, 2011). For example, a neurotic person who realizes that neuroticism is socially undesirable may – deliberately or not – project a cheerful exterior by using positive emotion words and by downplaying his anxiety in self-report questionnaires. However, less accessible components of language use, such as increased use of self-references like I and me, may correlate with less accessible behavioral indicators of worry such as compulsively checking the status of a loved one's flight or spending extensive time on WebMD. Given the abundance of online language use – including e-mails, blogs, and online chats, which are often archived by default (see Baddeley, 2011) – and the fact that many nonverbal behaviors can be accessed simply by downloading browser histories, future personality research may be able to incorporate naturalistic measures of individuals' online behavior to triangulate when and where language style and content and behavioral and self-reported personality converge.
The Big Five Personality Domain
The literature on language and the Big Five is the largest of the subareas within the study of personality and natural language. To accommodate its size, the sections below first summarize the samples commonly used in this research and next address major findings for each Big Five dimension individually.
Language samples. The kind of naturalistic language that has perhaps most frequently been subjected to computerized text analysis and linked with the Big Five is online or computerized language use. In the roughly 20 years since the internet was made accessible to the general public, language has become the most accessible naturalistic behavior available to behavioral scientists. As the sections below will explore, everyday verbal behavior is carried out online and often automatically saved in blogs, social networking sites, e-mail accounts, online chats, and text messages. More formal texts abound as well, including a huge range of academic submissions, from admissions essays to published scholarly work, not to mention a substantial share of the novels, poetry collections, and nonfiction books published in recorded history (see Michel et al., 2010).
Considering that this goldmine of information is often free and accessible to anyone with the necessary web programming or copying-and-pasting skills, it is surprising that only a few studies have linked the Big Five to the kind of quasi-naturalistic language use that occurs in these formats. Those studies show great promise, however, both for understanding naturalistic manifestations of personality and for the longstanding goal of automatically building personality profiles from behavioral data (Dodds & Danforth, 2009; Mairesse & Walker, 2011; Mairesse, Walker, Mehl, & Moore, 2007).
A few studies have gone to the effort of collecting spoken language as it occurs in real life. These studies were made possible by the advent of the Electronically Activated Recorder (EAR) about a decade ago (Mehl et al., 2001). The EAR is a programmable audio recorder that periodically records snippets of ambient sounds (e.g., 30 seconds every 12 minutes). When the EAR records, it captures any surrounding noise – including language used by subjects in their daily interactions with their social networks. Later, trained transcribers listen to the recordings and transcribe the language they hear, typically also coding for basic features of subjects' momentary social environments (e.g., location, activity).
Within studies that have looked at laboratory writing or dialogue tasks, language use largely falls into two categories: tasks with face-valid relevance to personality, such as asking people to talk about events that were important in shaping their identity, and tasks that attempt a more indirect route, such as asking students to describe an object (e.g., a water bottle; Pennebaker, 2011). Not surprisingly, considering that the criterion for personality is nearly always responses to face-valid self-report scales, language used in the former tasks tends to correlate more strongly with personality dimensions. For example, although Pennebaker and King (1999) found only a small number of modest significant correlations (rs < .20) between self-reported Big Five traits and language used in stream-of-consciousness writing and essays about coming to college, Hirsh and Peterson (2008) and Fast and Funder (2008) found a large number of moderate correlations (rs = .20–.40) between self-reported personality and language used in separate tasks that asked participants to describe their life stories.