22 Comments
Jul 26, 2022 · Liked by Andrew Cutler

1. I tend to strongly prefer fitting general factors through hierarchical factor analysis than through taking the first principal component. My issue with taking the first principal component is that it seems extremely sensitive to the universe of item content; if e.g. a dimension by accident gets 2x more items than other dimensions, then PCA will tend to turn that dimension into PC1, whereas hierarchical factor analysis can still easily distinguish it from the general factor (if such a general factor exists).
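To make the item-sampling sensitivity concrete, here's a small simulation sketch (numpy, purely illustrative: two independent latent dimensions, one with twice as many items). PC1 of the item correlation matrix gets pulled toward the oversampled dimension:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Two independent latent dimensions; dimension A gets twice as many items
f_a, f_b = rng.standard_normal((2, n))
items_a = np.stack([f_a + rng.standard_normal(n) for _ in range(6)])
items_b = np.stack([f_b + rng.standard_normal(n) for _ in range(3)])
X = np.vstack([items_a, items_b])          # 9 items x n respondents

# PCA via eigendecomposition of the item correlation matrix
R = np.corrcoef(X)
eigvals, eigvecs = np.linalg.eigh(R)
pc1 = eigvecs[:, -1]                       # loadings on the largest component

# PC1 is dominated by the oversampled dimension A
print(np.abs(pc1[:6]).mean(), np.abs(pc1[6:]).mean())
```

A hierarchical model fit to the same data would recover the two group factors and (correctly) no general factor, whereas PC1 here just mirrors the item budget.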

Of course a big issue with this point is that it is questionable whether there even is an "objectively correct" way of selecting items.

I'm holding out hope for genomics, as I think it can completely cut through these issues, because (in PCA terminology) genetic variants are discrete and so give you a privileged basis.

2. What do you think about the Halo model by Anusic and Schimmack? https://psycnet.apa.org/record/2009-22579-009

Key point: "The most important finding was that the halo factors of different raters were unrelated to each other (r = .08, SE = .07) and that the 95% CI suggests that the true parameter is likely to be small, ranging from -.06 to .22 (see Tables 1 and 2)."

I.e. they find that different people don't agree on the Halo factor, which seems like what you would expect if it's an evaluative artifact.

(One complication is that your general factor is a mixture of the traditional alpha and their general factor, and they do find agreement on alpha.)

author

"if e.g. a dimension by accident gets 2x more items than other dimensions, then PCA will tend to turn that dimension into PC1"

One of the advantages of using words instead of survey items is that the decision about which items to use is outsourced to millions of speakers of the language. One could argue that the results here rest on only 435 of the tens of thousands of English words used to describe one another. But in the Deep Lexical Hypothesis paper we show results where we also use 2k and 20k words, and the content of PC1 doesn't change. It's robust to an incredible array of decisions. In fact, to try to tease out later PCs I threw out words that loaded mostly on PC1 and then did factor analysis again. PC1 still showed up as PFP, and the order of all subsequent factors remained unchanged.

The same thing holds on a striking number of different surveys and populations. Take almost any survey in almost any group and something very close to PFP emerges as PC1. Even if the population is residents of a psych ward and the survey is about substance abuse, the first PC will correlate about 0.8 with the first PC of the Big Five Inventory. 0.8 isn't 1, but that is remarkable considering the biased survey and population.

Haven't read about Halo, will put it on the list.


"Haven't read about Halo, will put it on the list."

I definitely think it's important, since to me it seems like the critical way to investigate whether it's true variance or measurement bias.

"Even if the population are residents of a psyche ward and it's a survey about substance abuse, the first PC will correlate about 0.8 with the first PC of the Big Five Inventory. 0.8 isn't one, but that is remarkable considering the biased survey and population."

That is surprisingly high. Do you have a link?

Though I guess that makes sense under the Halo model just as well as under the GFP model.

author

How can it be measurement bias when it's word vectors? When it's more predictive of real-world outcomes than any other trait? Also, if it is measurement bias, then how can varimax be justified? I don't see how it's even a debate. Maybe psychologists see something I don't.

Something else I didn't bring up is that PCA is done on correlation matrices, not data matrices. In a correlation matrix, each item is zero-meaned before being correlated, so that response bias is removed. Of course there could be second-order response bias, where positive items co*vary* with each other. But, well, no other trait is held to controlling for second-order effects like that.
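To see concretely what the correlation matrix buys you, here's a small numpy sketch (simulated data): adding an arbitrary constant offset to every response on an item leaves the correlation matrix, and hence the PCA computed from it, unchanged.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((10, 500))           # 10 items x 500 respondents

# Shift each item by an arbitrary constant (e.g. a per-item "easiness" bias)
offsets = rng.uniform(-3, 3, size=(10, 1))
X_shifted = X + offsets

# Correlation zero-means (and rescales) each item, so the matrices match
R = np.corrcoef(X)
R_shifted = np.corrcoef(X_shifted)
print(np.allclose(R, R_shifted))             # True
```

Note this only removes per-item shifts; a per-respondent shift (acquiescence) survives correlation, which is the second-order worry mentioned above.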

Link: https://www.researchgate.net/publication/323984208_General_Factors_of_Psychopathology_Personality_and_Personality_Disorder_Across_Domain_Comparisons

Relevant portion: "The correlation between the p factor and g-PD was r = .92 (SE = .02, p < .001), the correlation between the p factor and the GFP was r = -.70 (SE = .05, p < .001), and the correlation between g-PD and the GFP was r = -.90 (SE = .04, p < .001). The results of this model clearly indicated that the general factors were highly correlated."

I think there are other studies that compare these three; the actual number I remembered was 0.78, not 0.7 and 0.9.


One issue that comes up is, what does it mean for a general factor of personality to truly exist?

If agreeableness, conscientiousness, emotional stability, extraversion and openness all exist, then one can of course take the sum of those and get something that more or less exists too. And in fact for many item samplings this might be the PC1, even without measurement error.

However, when people then do a whole bunch of rotation trickery, they end up with five different traits. And those five traits are, for whatever reason, correlated in the same direction as this PC1. So rather than defining the general factor as the sum of those five traits, one might instead take a page out of IQ research's book and define the general factor as whichever latent variable generates the correlations between those traits.

And that can be nearly or entirely due to Halo effects, even if the sum of the non-Halo component of the traits actually exist in a genuine way!

When I think about the "general factor of personality", then I tend to think of it in the latter sense, by analogy with intelligence research. I'd suggest "good personality" for the former sense.

(... "Social self-regulation" seems to imply the latter sense? In that it makes statements about the underlying cause of the axis, that the axis is because of variation in how much people self-regulate to fit social norms.)


"How can it be measurement bias when it's word vectors?"

As I see it, word vectors capture the semantic and connotational meaning of the words. Approximately, we might think of it as capturing the effects of the personality traits. (Whereas the "person vectors" you get from large-sample surveys approximately capture the *causes* of the personality traits, though because humans are rational and because the measurement is verbal, there's a lot of overlap between the structure of the effects and the structure of the causes, leading to the same Big 5 in each method.)

For a rational agent, the primary thing one cares about is utility, i.e. good or bad. As such, it would be logical that the primary way that words semantically and connotationally differ from each other is utility.

However, what's good from one perspective may be bad from another perspective. So it also makes sense if the utility aspect is less interpersonally correlated than the other aspects.

"When it's more predictive of real-world outcomes than any other trait?"

I think this is easier to think about after rotating the traits to a Big 5 like structure.

The usual Big 5 scales have some correlation between each other. So you could infer that there is a general factor, and take their mean to get an estimate of the general factor.

If the general factor is just a rater-based Halo effect, then it shouldn't correlate with outcomes unless those outcomes are also rated by the same rater. However, the correlation between the Big Five is of course not perfect, so the estimate of this general (Halo) factor will have a lot of nuisance variance. This nuisance variance won't just be random noise, but will instead be the original Big Five. This means that even if the general factor doesn't correlate with an outcome, the estimate of the general factor might, because it also contains the sum of the Big Five.
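The nuisance-variance point can be checked with a quick simulation (purely illustrative numbers): even when the halo factor is unrelated to an outcome, the *estimate* of the general factor, taken as the mean of the five observed scales, still predicts the outcome, because it carries the sum of the true traits.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20000

traits = rng.standard_normal((5, n))        # five independent true traits
halo = rng.standard_normal(n)               # rater halo, unrelated to outcome
outcome = traits.sum(axis=0) + rng.standard_normal(n)

observed = traits + 0.7 * halo              # each scale contaminated by halo
gfp_estimate = observed.mean(axis=0)        # naive general-factor estimate

print(np.corrcoef(halo, outcome)[0, 1])         # ~0 by construction
print(np.corrcoef(gfp_estimate, outcome)[0, 1]) # clearly positive
```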

"Also, if it is measurement bias, then how can varimax be justified?"

Not sure what issue exactly you have in mind. But usually psychologists ignore the exact varimax solution and instead just pick the top pure-loading scales for each factor and use those as the definitions for the factors.

"But, well, no other trait is held to controlling for second-order effects like that."

IMO they should! If you look at informant-reports, systematic measurement error can get pretty severe, making up 30-50% of the total variance!

"Link"

This is for personality disorders, not drug use, right?

Dec 26, 2022 · Liked by Andrew Cutler

A thought I had. The general factor of personality has high positive loadings for terms like considerate, helpful, etc. You see this as a sort of golden rule.

In WEIRD societies it may be virtuous to be helpful, kind, considerate, etc., toward all humans generally, but in certain societies people may value reciprocity toward one's own tribe or ethnic group rather than the trait in general. I wonder if this would be reflected in the language.

author

Yeah, even when the Jews coined the Golden Rule it was radical that it would apply to the outgroup. The parable of the Good Samaritan is "radical" in empathizing with a neighboring tribe (who even spoke the same language?).

There's also an interesting interpretation of the Cain and Abel story, which is from much earlier. Cain kills Abel, and then wanders the earth (in guilt?). Jehovah then marks him as different. Some think this is about conflict between agricultural tribes and pastoral tribes, with the pastoral tribes still being treated as the same people: "We have had wars, but they still count as our estranged brother." A bit of an aside, but interesting to see the idea of the "other" evolve through history.

As to your question, it would be very interesting if this showed up in language. My guess is this is a better question for a linguist, and that factor analysis would fail to show this particular difference. But who knows.

It's a medium-term goal to find a collaborator to do some multi-lingual factor analysis. For now I'm more interested in consciousness.


Ashton et al (2015) also acquired data on Alpha. Ordered from highest to lowest loadings, the terms loading on Alpha are:

considerate

helpful

sincere

gentle

kind

respectful

reliable

giving

thoughtful

kind-hearted

careful

good-hearted

ethical

cooperative

pleasant

law-abiding

hard-working

trustworthy

gracious

companionable

patient

diligent

polite

thorough

efficient

responsible

honest

warm-hearted

conscientious

courteous

understanding

warm

moral

accommodating

sympathetic

mature

agreeable

organized

reasonable

peaceful

good-natured

truthful

stable

mild

generous

tolerant

well-mannered

charitable

studious

dependable

approachable

humble

proper

modest

empathetic

soft-spoken

self-disciplined

sensitive

quiet

faithful

down-to-earth

loyal

forgiving

big-hearted

loving

friendly

tidy

discreet

hospitable

rational

conservative

diplomatic

selfless

conventional

calm

methodical

dignified

scholarly

natural

congenial

civil

reserved

flexible

industrious

meticulous

alert

cautious

affectionate

feminine

resourceful

[Snip: Terms with absolute values below .300]

hot-tempered

superficial

irritable

self-destructive

childish

sloppy

lazy

insincere

insensitive

patronizing

compulsive

unreliable

antagonistic

egocentric

domineering

immature

hasty

blunt

boastful

loud

shallow

argumentative

irrational

mischievous

cruel

dishonest

callous

rowdy

disrespectful

abusive

rash

aggressive

abrasive

defiant

rough

sneaky

deceptive

abrupt

careless

egotistical

violent

sly

inconsiderate

overbearing

overconfident

self-centered

selfish

scheming

hostile

conceited

ruthless

rebellious

deceitful

destructive

vindictive

irresponsible

condescending

malicious

greedy

manipulative

harsh

rude

arrogant

devious

reckless

author

It's amazing how consistent the list is: considerate, helpful, reliable. It's all about thinking of others. In another post I argue that this represents a fitness landscape. As a social species it is fit to be a good team member. It is surprisingly rare to connect the Lexical Hypothesis to evolution, in part because there is disagreement on whether the first PC is even real. Given that it is so theoretically satisfying (highlighted by Darwin, later studied as reciprocal altruism), that should have been good evidence that it's not simply noise!


Agreed; there's no reason to think it's merely a coincidence that natural language terms related to prosocial behaviors appear first under factor analysis. But there's a problem your model has to address before we reach your conclusion: the lexical hypothesis depends upon an interaction between A) individuals, who are described by B) a language-producing society. Does alpha's prominence depend on A, or on B?

In other words, the fact that prosociality appears as the largest factor in language terms *may* indicate that this is the primary axis on which human personality itself varies, or it may simply be an artifact of the importance of this factor to humans as they formed coalitions and determined whom to cooperate with. And there is some reason to think it's the latter. If we look at natural language terms for color, "red" is much more common and appears earlier in languages than colors like blue or green, presumably because identifying ripe fruits and open wounds was useful. Similarly, identifying who is going to help or hinder you can be absolutely critical to survival in a highly social situation, while knowing who is imaginative or who is nervous matters much less.

So what's the best explanation? Is prosociality the first factor because it's been under intense selective pressure as a trait that distinguishes us from other animals, or, is it the first factor because it was socially relevant, and people make words for things that matter to their own success? We do know that intelligence has been under intense selective pressure in hominids, even before the split from chimpanzees, and its heritability is remarkably high. Yet Big Five Agreeableness, and HEXACO Agreeableness and Honesty-Humility, show lower heritability than other personality factors. This may not resolve the issue completely, but it tends to disconfirm the idea that Alpha is the primary personality trait under selection over the past million years.

There are other ways of investigating the issue, however. For instance, studies have established that inbreeding depression severely reduces intelligence. If it could be shown that alpha (however defined or measured) is more affected by inbreeding depression than other personality traits, then that would definitely support your model. If you can find, or carry out, a study like this, I'd be very eager to see the results!

author

>The lexical hypothesis depends upon an interaction between A) individuals, who are described by B) language-producing society. Does alpha's prominence depend on A, or on B?

I'm all in for B. Language is very much the "view from society," as is factor analysis of word vectors. As you note, it's a record of what society finds important and not necessarily the major axis of variance of individuals' personality. These are obviously related, because it's _usually_ best to go along to get along. We are domesticated, for the most part. I guess I'm a bit less cynical about human nature than most evo-psych people.

My current project is to try to understand what exactly society was selecting for. The Golden Rule is my favorite description. But what would that have done to our minds? What mechanisms did we evolve, step by step, in order to get along? I think they have to do with language, which makes them unique to humans.

>So what's the best explanation? Is prosociality the first factor because it's been under intense selective pressure as a trait that distinguishes us from other animals, or, is it the first factor because it was socially relevant, and people make words for things that matter to their own success?

One indication that it has been successfully selected for is the gender difference. Historically, into the deep past, socially defecting was more often fit for men than women. And in fact we do see large gender differences in the GFP. "Feminine" loads on PC1 about 1 SD higher than "masculine" does (where SDs are calculated among a pool of 2k personality adjectives). Studies on individuals also find big differences, though I'm not sure of the number off the top of my head.

Of course, lexical data is also social bias... language is the view from society! There is a contradiction in the literature where this is recognized in the case of the GFP, to the extent that its existence is questioned. But the Big Five are derived from the same data, and that same variance is distributed across all of the factors. Why does the Big Five escape the question of whether lexical data reflects actual personality structure or simply societal bias?

>This may not resolve the issue completely, but it tends to disconfirm the idea that Alpha is the primary personality trait under selection over the past million years.

Part of the problem must be that personality is harder to measure than IQ? I know that one can "correct" for instrument noise, but that's never quite the same. Intelligence may also be more of an unalloyed good, whereas there need to be mechanisms overlaying agreeableness so that one is not a doormat. That would not necessarily undercut the preeminence of the trait, but it would tend to decrease correlation values.


There's definitely a lot here! I'll try to be concise:

1. I don't believe women are more prosocial or less prone to defection than men. You might look for "Our Grandmother's Legacy" by Tania Reynolds for an introduction to a large body of work on female competition.

2. Most researchers regard the GFP differently from the way that you seem to. I agree with you that one can take any rotation of a factor space that one chooses. But when most researchers speak of a GFP, it is in the context that it somehow "subsumes" or "exists at a higher level than" other personality traits, and I think detractors are usually arguing against this overly simplistic view when they talk about bias.

3. Personality is currently harder to measure than IQ, yes. But that doesn't suggest we should, ceteris paribus, expect Alpha or similar traits to have lower heritability than other traits. Yet as I recall, this is true across samples and across instruments. If anything, we should suspect that the large number of synonyms for Alpha means it can be measured with more accuracy than other traits, potentially increasing our estimates of its heritability.

Oct 11, 2022 · Liked by Andrew Cutler

Thanks. I have always been well-disposed to a general factor of personality, if only because personality questionnaires are coy about some people being a real pain to work with, and adopt an "all types are necessary and welcome" stance when the reality is that the uncooperative are a social drag.

https://www.unz.com/jthompson/intelligence-emotions-and-personality/

author

"So, leaving aside personality, and looking only at the putative new emotional-state-understanding-skill, they designed tests of emotional intelligence. This proved to be quite difficult. After a decade of work they found that there was some evidence for this skill, but to my reading no more outstanding than a minor subtest in a general intelligence test. Working out the emotions of others is related to general intelligence."

My issue with those tests is that EQ is simply much harder to test. The scores end up being a lot of noise + general test-taking ability. If instead we could ask God for a subject's true EQ, it might correlate less with IQ. I resort to the same thinking to say that EQ > IQ for outcomes; if only we could measure it, we'd find higher external validity!

It is an interesting dynamic in the psychometrics community: the people okay with ranking others (often hardnosed and disagreeable) find themselves champions of the GFP, whereas people who sense the taboo end up downplaying the importance of emotional intelligence. Consider the implications of an unalloyed good. The horror!


Are there any Big Three personality tests (Affiliation, Dynamism, Order) currently available to take or is it too early days for that yet? Would be cool.

author

Yeah, early days yet. You could probably take any broad survey like the Big Five Inventory and then do PCA to map scores to the Big Three. I think this would be better than most psychometric instruments, just by virtue of not assigning each item to a single factor with a weight of either 0 or 1. This is especially true if people can't help but evaluate an item in relation to Affiliation/PFP, as I think is the case. PCs after the first are calculated via residualizing, and it's not always possible to directly measure a residual variable: https://twitter.com/AndrewCutler13/status/1547943121446612992

Aug 5, 2022 · Liked by Andrew Cutler

You provide 30 words for each pole of "social regulation". Based on which metric are these the closest, and which word embedding are they from?

The reason I'm asking is that word2vec-style embeddings use a noisy metric (an unmotivated weighted distance within a document) and even noisier reference data (e.g. a collection of MSNBC articles from 2012) from which the embedding is computed. As a result, such embeddings in practice give (I claim) quite distorted approximations to the language models people have in their heads today.

Perhaps as a result of this, I also feel that your lists of close words contain two quite different strands, so are not well matched to either a positive or negative phrasing of the Golden Rule, and also seem at odds with "Social Regulation". The one strand is something like "nice"/"awful", the other is something like "pushover"/"activist". These may well form a useful aggregate in your analysis for statistical purposes but when assessing someone's personality they are separate for me: many people want the charity they give to headed by nice people who are pushy, even if they prefer a friend to be gentle and obliging. This mismatch might be because you used a poor embedding and a better embedding would reflect different lists of words.

author
Aug 5, 2022 · edited Aug 5, 2022

Great points all around. There's kind of a give and take with the blog format where I don't want to get lost in the weeds, but I do have answers if pressed.

>You provide 30 words for each pole of "social regulation". Based on which metric are these the closest, and which word embedding are they from?

I calculated this by using the word vectors to build a word × word affinity matrix, using the Pearson correlation. The 30 words are those with the highest loadings on PC1 according to dimensionality reduction of this affinity matrix.
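As a rough sketch of that computation (random vectors standing in for the real embeddings, and a toy word list):

```python
import numpy as np

rng = np.random.default_rng(3)
words = ["kind", "helpful", "cruel", "lazy", "honest", "rude"]  # toy list
vectors = rng.standard_normal((len(words), 768))  # stand-in embeddings

# Word x word affinity matrix: Pearson correlation between word vectors
affinity = np.corrcoef(vectors)

# PC1 loadings via eigendecomposition of the affinity matrix
eigvals, eigvecs = np.linalg.eigh(affinity)
pc1 = eigvecs[:, -1]

# Words sorted by PC1 loading; the reported poles are the extremes
for w, loading in sorted(zip(words, pc1), key=lambda t: t[1]):
    print(f"{w:>8}  {loading:+.3f}")
```

With real embeddings the affinity matrix is 435 × 435 and PC1 separates the prosocial from the antisocial pole.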

The word vectors are actually from transformers rather than word2vec. Specifically, I use DeBERTa, which was the state of the art when I was working on the embedding. For a much more in-depth explanation, check out: https://psyarxiv.com/gdm5v/

You can also run the code on a colab notebook here: https://colab.research.google.com/drive/1SXZNVqH0m_Bnd2hvIJFYiKQvHWpGu8ZM?usp=sharing

>Perhaps as a result of this, I also feel that your lists of close words contain two quite different strands, so are not well matched to either a positive or negative phrasing of the Golden Rule, and also seem at odds with "Social Regulation". The one strand is something like "nice"/"awful", the other is something like "pushover"/"activist".

This may be true. It's quite difficult to sum up a whole factor. Even the list of 30 words (quite a lot to fit in your brain) may be misleading as these are just the poles. All of the 435 words exist on and contribute to the factor. No guarantees all of that can be summed up parsimoniously.

As for this changing with a better embedding... I've looked at results from dozens of language models coupled with dozens of different extraction and dimensionality-reduction choices. The resulting factor is extraordinarily stable. So much so that it correlates 0.93 even when compared to _survey_ results of college kids (the current gold standard, though I do think word vectors should replace that).


Knowing that the result is stable across different language models is reassuring.

Jul 26, 2022 · Liked by Andrew Cutler

Very interesting stuff. Thank you for sharing.

deleted · May 9, 2023 · Liked by Andrew Cutler
Comment deleted
author

Yeah, those are absolutely important, and it wouldn't surprise me if you can guess a lot about a person's personality from their gait. Certainly humans make a lot of inferences like that, at least about dominance, age, etc. And we evolved to infer personality.

One distinction I would make is that lexical work does not produce an instrument to measure an individual's personality. At the end of factorizing adjective similarities, psychologists still had to build the Big Five Inventory. They just sort of had a template for the factors they were trying to measure. So I think you get pretty close to all personality variation with language, but it doesn't tell you how to measure it. Humans definitely measure it using non-verbal behavior.
