What does your conscience have to do with psychosis and evolution? Why does psychometrics find that success in humans, a social species, is more dependent on book smarts than people smarts? Should personality models explain who we are or how we got here? And what does this all have to do with language? This series seeks to answer those questions.
α and β
Readers of the blog are familiar with the genesis of the Big Five. Calling upon the lexical hypothesis, intrepid psychometricians set out to map the linguistic landscape of character. Using proto word vectors, correlation matrices of common adjectives were reduced to a few dimensions.
In the 1990s the field coalesced around the Big Five, a rotation of the first five principal components of such data. Proponents’ arguments from that time still reflect the current consensus about what the model can accomplish.
“The [Five Factor Model] does not provide, nor was it ever intended to provide, a model of personality dynamics or personality development. This does not mean that the Big Five dimensions cannot or will not be explicated at a dynamic and developmental level, only that the model was developed to account for the empirically observed relations among trait descriptions.” John and Robins (1994) Traits and Types, Dynamics and Development: No Doors Should Be Closed in the Study of Personality
Personality is one of the most theory-rich domains of life. People everywhere, not least psychologists, have causal models for how traits relate and develop. It is odd for the field’s general model to be explicitly silent on the topic. While there is some theoretical work with the Big Five, this often involves reducing the factors to 1 or 2 super-factors (1,2,3,4,5). The first such explication of meta-traits came in 1997 with Digman’s Higher-order factors of the Big Five. His meta-analysis consisted of PCA on the 5x5 correlation matrices reported in 14 different Big Five studies. The first two factors were remarkably consistent across the studies and aligned with theoretical models of development. Here he describes the first PC.
Another possibility, one that looks upon factors as causal agents, rather than simply a collection of correlated variables, is that Factor α represents the socialization process itself. From Freud (1930) to Kohut (1977), from Watson (1929) to Skinner (1971), personality theorists of various persuasions have been concerned with the development of impulse restraint and conscience, and the reduction of hostility, aggression, and neurotic defense. From this point of view, Factor α is what personality development is all about. Thus, if all proceeds according to society's blueprint, the child develops superego and learns to restrain or redirect id impulses and to discharge aggression in socially approved ways. Failure of socialization is indicated by neurosis, by deficient superego, or by excessive aggressiveness. ~John M. Digman, Higher-order factors of the Big Five
Decades later, there is still debate about the legitimacy of these factors. On the one hand, there is work that finds these same factors in many languages, and relates them to several other theoretical models. Testing the latter case, Saucier, Thalmayer and Payne explored whether four different theoretically-derived systems in other literatures are rotations of, and reducible to, the Big Two:
The interpersonal circumplex.
Current models of morality/warmth and competence.
The two largest dimensions in clinical symptom reports – internalizing and externalizing tendencies – as they manifest in normal-range populations.
Approach and avoidance tendencies, a prominent theoretical approach for constructing a biological-process model.
On the other hand, some still litigate how these factors relate to the Big Five. This could have been settled from the start by analyzing lexical data, rather than instruments and constructs. Consider this well-cited paper critical of the Big Two theory, written a decade after Digman’s discovery:
The text is lifted from the abstract. Ironically, the correct course is also suggested: “Therefore, the hypotheses of higher order factors and blended variables can only be tested with data on lower level personality variables that define the personality factors.”
α from the item-level
A cornerstone of the Big Five is that they are defined by their word loadings on common adjectives. What happens when we do PCA on the Big Five? We must return to the “variables that define the personality factors”: words.
Consider another angle. Imagine you have word-level survey data and perform PCA, extracting five factors. If you report the varimax-rotated structure (the Big Five), how well can someone else recover the unrotated factors?
The good news is that we have the data to find out. Saucier and Goldberg published Big Five loadings for 435 adjectives and have kindly made the item-level survey data available. Each word is defined by how well 900 students rated it as describing them. From this we can calculate word loadings on the unrotated factors. I also do PCA on the Big Five loadings reported in the paper. The two sets of factors are correlated in the figure below. Following Digman, the latter are named with Greek letters. We expect to recover the original factors, plus some distortion, since some variance will be double-counted due to varimax rotation.
Calculated either way, the first two PCs align at 0.8 and 0.91 respectively. One can get a fairly good picture of the unrotated factors, even when rotated factors are published. If we were feelin’ nuts, we could remove ourselves further from the lexical PCs by designing an instrument to approximate the rotated factors (such as the BFI). PCA on those factors would recover an even more distorted version of PC1. This is the analysis Digman performed. Munging data so far from the source, he could not make strong claims about how the Big Two relate to the Big Five, language, or general personality. However, they are by construction a distorted version of the first two unrotated factors in lexical data. Alpha and Beta are what the first two factors of the Big Five would be without varimax rotation.
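The recovery argument can be sketched on simulated data. Everything below is an assumption made for illustration: the ratings are synthetic rather than the Saucier and Goldberg survey, the factor strengths are invented, the `varimax` helper is a common textbook implementation rather than the code behind the figure, and the five loading columns are only centered (not standardized) before the second PCA, since the post does not say which was done. Because the data are idealized, the correlations it prints need not resemble the 0.8 and 0.91 reported above.

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-6):
    """Orthogonal varimax rotation of an (n_items, n_factors) loading matrix."""
    n_items, n_factors = loadings.shape
    rotation = np.eye(n_factors)
    criterion_old = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        target = rotated**3 - rotated @ np.diag((rotated**2).sum(axis=0)) / n_items
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt
        criterion_new = s.sum()
        if criterion_old != 0 and criterion_new / criterion_old < 1 + tol:
            break
        criterion_old = criterion_new
    return loadings @ rotation

def top_components(matrix, k):
    """Largest k eigenvalues (descending) and eigenvectors of a symmetric matrix."""
    values, vectors = np.linalg.eigh(matrix)
    order = np.argsort(values)[::-1][:k]
    return values[order], vectors[:, order]

rng = np.random.default_rng(0)
n_people, n_words, n_factors = 900, 100, 5

# Synthetic self-ratings: five latent factors of decreasing strength (invented values).
strengths = np.array([1.0, 0.7, 0.5, 0.4, 0.35])
true_loadings = rng.normal(size=(n_words, n_factors)) * strengths
latent = rng.normal(size=(n_people, n_factors))
ratings = latent @ true_loadings.T + rng.normal(size=(n_people, n_words))

# Step 1: unrotated word loadings from the item-level data.
word_corr = np.corrcoef(ratings, rowvar=False)
eigvals, eigvecs = top_components(word_corr, n_factors)
unrotated = eigvecs * np.sqrt(eigvals)

# Step 2: the varimax-rotated loadings, standing in for a published table.
published = varimax(unrotated)

# Step 3: PCA on the published loadings alone (words as rows, five factors as columns).
centered = published - published.mean(axis=0)
_, directions = top_components(centered.T @ centered, 2)
recovered = centered @ directions

# Sign of a principal component is arbitrary, so compare absolute correlations.
r_first = abs(np.corrcoef(unrotated[:, 0], recovered[:, 0])[0, 1])
r_second = abs(np.corrcoef(unrotated[:, 1], recovered[:, 1])[0, 1])
print(f"|r| PC1 vs alpha: {r_first:.2f}, PC2 vs beta: {r_second:.2f}")
```

The design point is that step 3 never touches the ratings: it sees only the rotated loading table, which is all a reader of a published paper would have.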
Thousands of papers cite Digman, including many pushing back on his claims. To my knowledge, none make this simple argument to explain the Big Two. In the literature they are typically treated as hierarchically related.
In fact, the foremost paper, memed above, analyzes data one step further removed from language than the BFI. Instead of using item- or word-level data, which the authors had in abundance, they rely on Big Five Aspects which are derivative of the derivative Factors.
Relation to the Big Five
A little known fact in personality science is that the first factors dwarf the minor factors of the Big Five. Consider the eigenvalues of 435 adjectives below. They represent how much variance (personality information) each factor explains in the survey and NLP data.
The first factor is 8 times greater than the fifth, a disparity papered over with a name like Big Five. Varimax rotation results in a redistribution of personality information. Content is moved from the first factor to the rest. Without this boost, the last 2-3 factors are tagalongs. With the added variance they are brought up to par, becoming Conscientiousness, Neuroticism, and Openness respectively. Even still, Openness is not consistently recovered, and is often grouped with other traits like intelligence. This rotation is to the detriment of PC1 (α), which becomes Agreeableness.
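The redistribution can be made concrete with a small simulation. This is a sketch under invented assumptions, not a reanalysis of the 435 adjectives: the ratings are synthetic, with five clusters of words whose underlying traits share a common component, and `varimax` is a common textbook implementation. It compares the variance carried by each factor (the column sums of squared loadings) before and after rotation.

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-6):
    """Orthogonal varimax rotation of an (n_items, n_factors) loading matrix."""
    n_items, n_factors = loadings.shape
    rotation = np.eye(n_factors)
    criterion_old = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        target = rotated**3 - rotated @ np.diag((rotated**2).sum(axis=0)) / n_items
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt
        criterion_new = s.sum()
        if criterion_old != 0 and criterion_new / criterion_old < 1 + tol:
            break
        criterion_old = criterion_new
    return loadings @ rotation

rng = np.random.default_rng(1)
n_people, n_traits, words_per_trait = 900, 5, 20

# Hypothetical data: five traits that all share a general component,
# each measured by 20 noisy adjectives.
general = rng.normal(size=(n_people, 1))
traits = 0.6 * general + 0.8 * rng.normal(size=(n_people, n_traits))
assignment = np.repeat(np.arange(n_traits), words_per_trait)
ratings = traits[:, assignment] + rng.normal(size=(n_people, assignment.size))

# Unrotated loadings: top five principal components of the word correlations.
corr = np.corrcoef(ratings, rowvar=False)
values, vectors = np.linalg.eigh(corr)
order = np.argsort(values)[::-1][:n_traits]
unrotated = vectors[:, order] * np.sqrt(values[order])

rotated = varimax(unrotated)

# Variance carried by each factor = column sum of squared loadings.
before = (unrotated**2).sum(axis=0)
after = (rotated**2).sum(axis=0)
print("before rotation:", np.round(before, 2))
print("after varimax:  ", np.round(after, 2))
print("total preserved:", np.isclose(before.sum(), after.sum()))
```

An orthogonal rotation cannot create or destroy variance in the five-factor space, so the totals match. Since the first principal component already holds the maximum variance any single direction can, rotation can only move variance away from it and toward the smaller factors.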
As seen in the table above, PC1 is distributed to every factor (save Neuroticism), and Agreeableness is constructed mostly from the first three unrotated factors. Some anti-Dynamism is taken from PC2, and some anti-Order from PC3. (For more information on the content of the unrotated factors see this post, which includes code to derive them from language models.) Has anybody waxed as eloquent about Agreeableness as Digman did about α? Much of the original construct is lost, with just a 0.64 correlation. This mutilates the simple structure and has led to much confusion, including underrating the theoretical richness of lexical data. Saruman explains a similar process: “Do you know how the Orcs first came to being? They were Elves once, taken by the dark powers, tortured and mutilated. A ruined and terrible form of life.”
We could have been measuring α for decades but instead we got Agreeableness and confusion on how the two relate. (Elf→Orc, it turns out.) This argument is a more colorful version of the one David and I make in Study 1 of the Deep Lexical Hypothesis: that it’s not clear if the last 2 (or 3) factors are statistically justified. Obviously, I also happen to be a big fan of α, and think that no model can make up for distributing the ideas it contains. Even if that model contains additional useful factors. (As a note, my co-author has no connection with this unauthorized version of the argument.)
Because we have access to word level data, let’s take a closer look at α.
What is α?
By going back to language data, we can view α without the distortion introduced by varimax rotation and instrument design. There are of course problems with surveys. They are often collected on WEIRD undergrads who may or may not know the definition of all the words. Surveys are boring, and there’s not much personal benefit to filling them out accurately. Natural language processing solves this by finding word relations in orders of magnitude more data, using text from speakers in all walks of life. To those more familiar with surveys, rest assured: the two methods correlate 0.93 on the first PC. Using the same 435 words as Saucier and Goldberg, here are the top 40 words loading on each pole of α:
considerate, peaceful, respectful, kind, courteous, unaggressive, polite, agreeable, cordial, reasonable, pleasant, benevolent, compassionate, understanding, charitable, helpful, accommodating, cooperative, amiable, tolerant, humble, trustful, patient, genial, altruistic, easygoing, modest, unselfish, friendly, down-to-earth, generous, diplomatic, mannerly, relaxed, selfless, sincere, undemanding, warm, tactful, affectionate
vs
abusive, belligerent, disrespectful, quarrelsome, unkind, rude, bigoted, intolerant, inconsiderate, uncooperative, irritable, vindictive, impolite, prejudiced, antagonistic, ungracious, crabby, egotistical, cruel, surly, uncouth, cranky, scornful, impatient, selfish, egocentric, possessive, greedy, jealous, tactless, combative, callous, conceited, bitter, uncharitable, unsympathetic, unruly, unstable, bullheaded, unfriendly
Digman puts forth a good description of this factor, though in terms too Freudian for this century. Now the characterization would be social self-regulation: how well one can regulate one's own desires, beliefs, and goals to make life pleasant for others. Language is the view from society; therefore α represents society’s approval.
Saucier notes that the Big Two relate to morality, a realm that was summarized well two millennia ago. It is said that a prospective Jewish convert asked the rabbi Hillel to explain the law and the prophets while standing on one foot. He replied, "What is hateful to you, do not do to your fellow: this is the whole Torah; the rest is the commentary." The word list above, too, can be reduced to the Golden Rule. Are you considerate? Do you make peace? Do you refrain from abuse?
Enter GFP
It turns out that the first PC of almost any personality survey looks like α. Surveys of drug dependence, psychiatric disorders, or what you think of werewolves all return a suspiciously similar first PC. This has come to be known as the general factor of personality (GFP). If you are looking for commentary beyond “do unto others”, there is an extensive literature.
Despite its universality, there is still debate about what it is. Consider the light hand even proponents take when summarizing the situation.
“Numerous studies and meta-analyses have now confirmed that personality traits tend to correlate such that a general factor of personality (GFP) emerges. Nevertheless, there is an ongoing debate about what these correlations, and therefore the GFP, represents. One interpretation is that the GFP reflects a substantive factor that indicates general social effectiveness or emotional intelligence. Another interpretation is that the GFP merely is an artifact based on measurement or response bias.” Van der Linden et al, Is there a Meaningful General Factor of Personality?
So, α is now more commonly called GFP. In addition to an open debate about its relationship to the Big Five, some believe it is a statistical artifact. Finally, as posed by the title of the review, is it a meaningful general factor?
Is GFP general?
In what sense is GFP general? g is general because 1) it has substantial loading on every intelligence subtest 2) it can explain large proportions of test data 3) it is highly externally reliable. GFP meets the first requirement. It’s hard to measure any construct without picking up some GFP.
On the second point, Revelle does good work comparing eigenvalues to demonstrate that GFP doesn’t explain nearly as much of the data as does g. This actually goes back to the namesake of the blog, The Vectors of Mind. Intelligence research had seen great success reducing test data down to one dimension. Thurstone realized that more were required to properly represent personality. Remember, we have many words for whether someone is clever; intelligence is a subset of personality. As such, we expect a model of personality to necessarily be more complex. NLP uses word vectors, not word scalars, after all. Thurstone invented multiple factor analysis for this very reason: more than one factor is required!
Additionally, I’d like to note that there is method bias that exaggerates the first eigenvalue on intelligence tests. The instruments are designed with questions that are either right or wrong. This is psychometrically a good strategy; it’s easy to score. Though it’s difficult to measure, telling an engaging story requires intelligence. If one did devise a gauge, it may indeed correlate more with intelligence’s second PC—verbal tilt—than g. This isn’t a dig on intelligence research, just noting that what is easy to score emphasizes the huge monopolar first PC. Perhaps a good attribute of a map, as long as it is not confused for the territory.
Which brings us to the third point. My opinion is that personality is harder to measure, so comparing g to personality scores is a bit misleading because the latter are more corrupted by noise. I am interested in traits themselves, which can be measured and described via word vectors without ever introducing person-based instruments. Like I said before, convert to word space. Even with these caveats, Roberts et al make a compelling case that over many studies personality is on par with SES and intelligence as a predictor of life outcomes.
These three points considered, I suggest another name for α/GFP: the primary factor of personality (PFP). Primary means:
Of chief importance; principal.
Earliest in time or order.
The personality factor matches both. GFP is misleading as it is not analogous to g; more factors are required for a general model.
Is PFP a statistical artifact?
A 2013 paper on the subject opens “The overwhelmingly dominant view of the GFP is that it represents an artefact due either to evaluative bias or responding in a socially desirable manner.” Viewed lexically, response of whom? Word vectors? All of reddit? If it is an artifact, why spread it around to the rest of the Big Five via varimax rotation? Can’t have it both ways.
Conclusion
Psychometrics is a land without ground truths.¹ As such we should grab hold of the Lexical Hypothesis when we can.
"...our common stock of words embodies all the distinctions men have found worth drawing, and the connections they have found worth marking, in the lifetime of many generations: these surely are likely to be more numerous, more sound, since they have stood up to the long test of survival of the fittest, and more subtle, at least in all ordinary and reasonable practical matters, than any that you or I are likely to think up in our armchair of an afternoon—the most favourite alternative method." J.L. Austin, A Plea for Excuses
It’s for this reason that I prefer describing PFP as the Golden Rule. Often disciplines prefer to construct jargon that does not bear the impression of millions of lives. There is less baggage, and one can speak more precisely, the thinking goes. But this tends to silo knowledge in an ivory tower detached from mundane—human—realities. The next post argues that evolution along PFP transformed us from beast to a species with a collective conscience. This connection is more difficult to see if PFP is mentally stored as the more sterile social self-regulation. Perhaps a romantic proposition, but words do matter.
It is somewhat ironic that the success of John’s arguments in No Doors Should Be Closed in the Study of Personality has tended to close doors on word-level research. Why study unwieldy vocabularies, when streamlined general factors exist? This post took a lexical view of α, β, and the GFP, which enabled us to answer the long-standing questions of how they relate to the Big Five and if they are more than a statistical artifact. Are these things obvious to those in the field? Are there implications for IO psychology? What can PFP’s magnitude tell us about evolution and religion?
¹ In his dense Clocking the Mind, Jensen makes the point that reaction time is the only psychometric variable with a physically meaningful unit. Everything else must be normed against a population. The Lexical Hypothesis is valuable as it provides a frame of reference to the physical and social world.
1. I tend to strongly prefer fitting general factors through hierarchical factor analysis than through taking the first principal component. My issue with taking the first principal component is that it seems extremely sensitive to the universe of item content; if e.g. a dimension by accident gets 2x more items than other dimensions, then PCA will tend to turn that dimension into PC1, whereas hierarchical factor analysis can still easily distinguish it from the general factor (if such a general factor exists).
Of course a big issue with this point is that it is questionable whether there even is an "objectively correct" way of selecting items.
I'm holding my hope out for genomics as I think it can completely cut through these issues, because (in PCA terminology) genetic variants are discrete and so give you a privileged basis.
2. What do you think about the Halo model by Anusic and Schimmack? https://psycnet.apa.org/record/2009-22579-009
Key point: "The most important finding was that the halo factors of different raters were unrelated to each other (r = .08, SE = .07) and that the 95% CI suggests that the true parameter is likely to be small, ranging from -.06 to .22 (see Tables 1 and 2)."
I.e. they find that different people don't agree on the Halo factor, which seems like what you would expect if it's an evaluative artifact.
(One complication is that your general factor is a mixture of the traditional alpha and their general factor, and they do find agreement on alpha.)
A thought I had. The general factor of personality has high positive loadings for terms like considerate, helpful, etc. You see this as a sort of golden rule.
In WEIRD societies, it may be virtuous to be helpful, kind, considerate, etc., generally of all humans, but in certain societies, people may value reciprocity to one's own tribe/ethnic group rather than the trait just generally. I wonder if this would be reflected in the language.