Summary and Future Directions

The aim of this chapter has been to provide a blueprint of research on language and personality up to this point that depicts both its structural soundness and its need for additions and improvements. In closing, we provide an overview of the existing studies in the form of a summary table (see Table 2) and outline a few recommendations for future progress.

Table 2. Summary of Linguistic Indicators of Personality


Qualifier (e.g., Moderator)

Big Five


second-person pronouns (+), first-person plural pronouns (+), positive emotion (+), social (+), leisure (+), sex (+), inhibition (-), tentativeness (-)



positivity (+), first-person singular pronouns (+), social (+), home (+), family (+), communication (+), death (-), money (-), swearing (-)


articles (+), prepositions (+), personal pronouns (-), family (-), home (-), rest (-)

Contradictory facets


swearing (-), negative emotion (-)



first-person singular pronouns (+), negative emotion (+)

Sex, both only in private


Type A

first-person singular pronouns (+)


first-person singular pronouns (+), negative emotion (+), anger (+), swear (+), sex (+), first-person plural pronouns (-)

Sex, Communication Context


first-person singular pronouns (+), first-person plural pronouns (+), second-person pronouns (-), third-person pronouns (-)



first-person singular pronouns (+), negative emotion (+)

Public vs. private, correspondent closeness

Trait Emotionality


second-person singular pronouns (+)

Negative emotions

negative emotion (+)

Writing or speaking topic

Positive emotions

positive emotion (+)

Note. See text for references. For the Big Five, only the most common and universal correlates are listed.

Differing methodologies sometimes make it difficult to find consistent threads among studies. Even among studies that used the same text analysis tool, some focused only on linguistic content rather than all categories, and others used different versions of a program that include several non-overlapping categories. The literature on language and personality would no doubt benefit from more comprehensive reporting of effects, in papers or in online supplemental materials. The existing studies suggest that both content and style categories are critical. Although content words are more susceptible to self-regulation and thus tend to be lower-fidelity indicators of internal states, the degree to which a person’s language use fails to reflect their self- or informant-reported personality is often a telling indicator of self-regulatory personality processes and person × situation interactions (Baddeley, 2011; Baddeley & Pennebaker, 2012; Mehl et al., 2006; Mehl & Holleran, 2008). Style words are often more challenging to interpret, but they are valuable as mostly automatic, and therefore more psychologically representative, indicators of attentional focus and thinking styles (see Tausczik & Pennebaker, 2010). Content and style are two sides of a data-rich coin, and personality psychology has much to gain from increasingly considering both aspects of language use.

To interpret the nature and true magnitude of effects correctly, studies of language and personality may also need to measure and consider a wider range of potential moderators or modifiers, including facet-level trait measures (Yarkoni, 2010), individuals’ sex (Mehl et al., 2006), whether language use is public or private (Mehl & Holleran, 2008), the closeness of conversation partners (Baddeley, 2011), and linguistic co-occurrences (Gill & Oberlander, 2002). For function words in particular, which are by definition extraordinarily versatile, research has shown that moderators matter. For example, whether “I” or “you” is said by a man or a woman, and in the context of an angry or a cheerful communication, can dramatically influence which psychological processes those words reflect (Fast & Funder, 2010; Mehl et al., 2006; Tausczik & Pennebaker, 2010).
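As a rough illustration of what checking for a moderator can look like in practice, the following Python sketch computes the correlation between a word-category rate and a trait score separately within each level of a moderator (here, a hypothetical public vs. private context label). The field names, the helper functions, and the numbers in `toy_records` are invented for illustration only; they are not data or procedures from any of the studies cited above.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation computed from first principles (stdlib only)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / sqrt(sxx * syy)


def correlations_by_group(records, word_key, trait_key, group_key):
    """Return one word-trait correlation per level of the moderator."""
    groups = {}
    for rec in records:
        groups.setdefault(rec[group_key], []).append(rec)
    return {
        level: pearson_r([r[word_key] for r in rows],
                         [r[trait_key] for r in rows])
        for level, rows in groups.items()
    }


# Made-up records: "i_rate" is a hypothetical percentage of first-person
# singular pronouns, "trait" a hypothetical trait score.
toy_records = [
    {"context": "private", "i_rate": 2, "trait": 1},
    {"context": "private", "i_rate": 4, "trait": 2},
    {"context": "private", "i_rate": 6, "trait": 3},
    {"context": "private", "i_rate": 8, "trait": 4},
    {"context": "public", "i_rate": 2, "trait": 3},
    {"context": "public", "i_rate": 4, "trait": 2},
    {"context": "public", "i_rate": 6, "trait": 3},
    {"context": "public", "i_rate": 8, "trait": 2},
]

by_context = correlations_by_group(toy_records, "i_rate", "trait", "context")
```

In this toy example the word-trait association is strongly positive in one context and weakly negative in the other, a pattern that a single pooled correlation would obscure. A full analysis would more typically test the interaction term in a regression model rather than compare subgroup correlations by eye.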

Context effects, such as the types of communication that a situation affords or demands, are important considerations in any area of behavioral research. Studies of language use are no exception. Just as a highly extraverted person would not be expected to behave dramatically differently from an introverted person in a situation lacking the potential for social interaction, personality traits that are predominantly defined by differences in social interaction are likely to leave fewer observable traces in solitary writing such as stream-of-consciousness essays. Furthermore, writing or speaking tasks that resemble criterion measures of personality (e.g., self-report personality questionnaires and essays describing one’s personality) are bound to be more highly correlated with those criterion measures than naturalistic measures of language are (e.g., Hirsh & Peterson, 2008). However, perhaps in part due to the influence of corpus linguistics, where language from a wide range of communication media is frequently compiled into a single dataset comprising billions of words, studies of linguistic indicators of personality have only recently come to seriously consider communication context. Given that so many personality dimensions hinge on how people react to and interact with others, it is particularly important, in studies of natural language use and beyond, for personality research to increasingly study the links between naturally occurring dialog, self-reports, and observer reports. As naturalistic language research expands with ongoing advances in audio recording technology and computer science methods, it should become easier to understand how linguistic signals are attenuated and warped by contextual influences such as experimental task, communication medium, and motivation.

The accomplishments of computerized text analysis in the last 15 years have been extraordinary. However, the software designers, programmers, and data analysts behind this revolution readily admit that there is room for improvement. Cohen and colleagues’ (2009) and S. Cohen’s (2011) research on the measurement of trait affect points to a possible need to improve word-count measurements of common positive emotion words, which are often used in ways that do not reflect positivity (e.g., I was pretty bored, someone like you), by considering their linguistic contexts. New discoveries made in function word categories that are new to the most recent version of LIWC (Pennebaker, Booth, & Francis, 2007) suggest that finer-grained analyses based on words’ grammatical roles have the potential to clarify mixed results in past research and shed light on the cognitive mechanisms underlying personality dimensions.
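To make the idea of context-sensitive word counting concrete, the following Python sketch contrasts a plain dictionary count with one that applies two crude context rules to the ambiguous words in the examples above. The mini-lexicon, the rules, and all function names are hypothetical and chosen only for illustration; they are not the LIWC dictionary or the procedure used in the cited studies.

```python
import re

# Hypothetical mini-lexicon for illustration only (not the LIWC dictionary).
POSITIVE_WORDS = {"happy", "love", "great", "pretty", "like"}
SUBJECT_PRONOUNS = {"i", "we", "you", "they"}


def tokenize(sentence):
    """Lowercase a sentence and split it into simple word tokens."""
    return re.findall(r"[a-z']+", sentence.lower())


def is_positive_use(tokens, i):
    """Toy context rules for two ambiguous words; deliberately crude."""
    word = tokens[i]
    prev_tok = tokens[i - 1] if i > 0 else None
    next_tok = tokens[i + 1] if i + 1 < len(tokens) else None
    if word == "pretty":
        # Assumption: "pretty" followed by another word in the same sentence
        # is an intensifier ("pretty bored"). This wrongly excludes phrases
        # such as "a pretty dress".
        return next_tok is None
    if word == "like":
        # Assumption: "like" is a verb of liking only after a subject pronoun
        # ("I like you"), not in comparisons ("someone like you").
        return prev_tok in SUBJECT_PRONOUNS
    return True


def count_positive(text, use_context=True):
    """Count positive emotion words, optionally applying the context rules."""
    hits = 0
    for sentence in re.split(r"[.!?]+", text):
        tokens = tokenize(sentence)
        for i, tok in enumerate(tokens):
            if tok in POSITIVE_WORDS:
                if not use_context or is_positive_use(tokens, i):
                    hits += 1
    return hits


naive = count_positive("I was pretty bored", use_context=False)
contextual = count_positive("I was pretty bored", use_context=True)
```

Here the plain count credits "pretty" as a positive emotion word, whereas the context-sensitive count does not. Realistic systems would likely rely on part-of-speech tagging or parsing rather than hand-written rules like these.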

Measures of within-text context, and the usability of tools that consider linguistic context, are bound to improve studies of language and personality as well. A word’s location in a text or sentence (Vine & Pennebaker, 2009) and its neighboring words (Gill & Oberlander, 2004) clearly matter but are rarely considered in psychological text analyses. Methods and programs such as Latent Semantic Analysis (Landauer & Dumais, 1997) and WordSmith (Scott, 2008) handle such variables and, as they become more widely known and user-friendly, stand to greatly enrich future research.
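The two within-text variables mentioned above, a word’s location and its neighboring words, can be sketched in a few lines of Python. This is a simplified illustration under our own assumptions (whitespace-level tokens, a fixed symmetric window, hypothetical function names); it does not reproduce Latent Semantic Analysis, WordSmith, or the methods of the cited studies.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def relative_positions(tokens, target):
    """Position of each occurrence of target, scaled from 0.0 (first token)
    to 1.0 (last token)."""
    last_index = len(tokens) - 1
    if last_index <= 0:
        return [0.0 for tok in tokens if tok == target]
    return [i / last_index for i, tok in enumerate(tokens) if tok == target]


def neighbor_counts(tokens, target, window=2):
    """Count the words appearing within `window` tokens of each target hit."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == target:
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            counts.update(tokens[start:i] + tokens[i + 1:end])
    return counts


tokens = tokenize("I think I know what I want")
positions = relative_positions(tokens, "i")
neighbors = neighbor_counts(tokens, "i", window=1)
```

In this example, `positions` records where in the text each "I" occurs and `neighbors` tallies the words immediately adjacent to it. Either output could then be summarized per participant and related to trait scores in the usual way.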


In his famous monograph on personality, Allport (1937) wrote “language is a codification of common human experience, and by analyzing it much may be found that reflects the nature of human personality” (p. 373). Interestingly, the field of personality and language use only began to gain serious momentum more than half a century later. As the research reviewed in this chapter reveals, though, the field is now rich and vibrant and has already produced many important discoveries. We expect that the immense progress in (stationary and mobile) computing technology and parallel advances in computational linguistics will create a strong push for the field over the next years and lead to critical improvements in the complexity with which naturalistic language can be analyzed. It is our sense that the field will thrive to the extent that it uses these technologically driven, “bottom-up” analytic advances and, at the same time, balances them with innovative theoretical developments and clarifications from the “top down”. To achieve this, it will undoubtedly become necessary for researchers from different fields to “cross-talk”. Social psychologists, personality psychologists, cognitive psychologists, linguists, communication scholars, computer scientists, and other researchers will need to engage in conversations and collaborations and thereby transcend (and hopefully reduce) traditional disciplinary boundaries to more fully understand how our words reflect our selves.


1 At some point, you may have received the following test over e-mail: “How many Fs does the following passage contain? ‘Finished files are the result of years of scientific study combined with the experience of years.’” Finding only three Fs tends to result from readers skipping over each “of”.

2 The term sex is used by default to refer to all differences in personality-language links between men and women. However, gender may be more appropriate in cases where linguistic differences seem to be more strongly influenced by gender norms than biology (see Eagly, 1995).


