1. I tend to strongly prefer fitting general factors through hierarchical factor analysis rather than through taking the first principal component. My issue with taking the first principal component is that it seems extremely sensitive to the universe of item content; if e.g. a dimension by accident gets 2x more items than other dimensions, then PCA will tend to turn that dimension into PC1, whereas hierarchical factor analysis can still easily distinguish it from the general factor (if such a general factor exists).
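This sensitivity is easy to demonstrate with a toy simulation (made-up loadings, not any real inventory): two uncorrelated latent dimensions, one of which happens to get twice as many items, and PC1 of the correlation matrix ends up belonging to the over-sampled dimension.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Two uncorrelated latent dimensions -- by construction, no general factor.
A = rng.normal(size=n)
B = rng.normal(size=n)

# Dimension A happens to get 10 items, dimension B only 5 (the "2x" imbalance).
items_A = np.column_stack([0.7 * A + rng.normal(scale=0.7, size=n) for _ in range(10)])
items_B = np.column_stack([0.7 * B + rng.normal(scale=0.7, size=n) for _ in range(5)])
X = np.column_stack([items_A, items_B])

# PCA on the item correlation matrix.
R = np.corrcoef(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)   # eigenvalues in ascending order
pc1 = eigvecs[:, -1]                   # eigenvector of the largest eigenvalue

# PC1 is dominated by the over-sampled dimension A, even though A and B
# are symmetric in every respect except item count.
mean_load_A = np.abs(pc1[:10]).mean()
mean_load_B = np.abs(pc1[10:]).mean()
print(mean_load_A > 3 * mean_load_B)   # True
```

A hierarchical or bifactor model can in principle separate a genuine general factor from this kind of item-sampling artifact; a plain PC1 cannot.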
Of course a big issue with this point is that it is questionable whether there even is an "objectively correct" way of selecting items.
I'm holding out hope for genomics as I think it can completely cut through these issues, because (in PCA terminology) genetic variants are discrete and so give you a privileged basis.
Key point from Anusic and Schimmack's Halo paper: "The most important finding was that the halo factors of different raters were unrelated to each other (r = .08, SE = .07) and that the 95% CI suggests that the true parameter is likely to be small, ranging from -.06 to .22 (see Tables 1 and 2)."
I.e. they find that different people don't agree on the Halo factor, which seems like what you would expect if it's an evaluative artifact.
(One complication is that your general factor is a mixture of the traditional alpha and their general factor, and they do find agreement on alpha.)
"if e.g. a dimension by accident gets 2x more items than other dimensions, then PCA will tend to turn that dimension into PC1"
One of the advantages of using words instead of items is that the decisions about what items to use are outsourced to millions of speakers of that language. One can argue that the results here rest on only 435 out of tens of thousands of English words used to describe one another. But in the Deep Lexical Hypothesis paper we show results where we also use 2k and 20k words, and the content of PC1 doesn't change. It's robust to an incredible array of decisions. In fact, to try and tease out later PCs I threw out words that mostly loaded on just PC1 and then did factor analysis again. PC1 still showed up as PFP, and the order of all subsequent factors remained unchanged.
Same thing holds on a striking number of different surveys and populations. Take almost any survey in almost any group and something very close to PFP emerges as PC1. Even if the respondents are residents of a psych ward and it's a survey about substance abuse, the first PC will correlate about 0.8 with the first PC of the Big Five Inventory. 0.8 isn't one, but that is remarkable considering the biased survey and population.
"Haven't read about Halo, will put it on the list."
I definitely think it's important, since to me it seems like the critical way to investigate whether it's true variance or measurement bias.
"Even if the respondents are residents of a psych ward and it's a survey about substance abuse, the first PC will correlate about 0.8 with the first PC of the Big Five Inventory. 0.8 isn't one, but that is remarkable considering the biased survey and population."
That is surprisingly high. Do you have a link?
Though I guess that makes sense under the Halo model just as well as under the GFP model.
How can it be measurement bias when it's word vectors? When it's more predictive of real-world outcomes than any other trait? Also, if it is measurement bias, then how can varimax be justified? I don't see how it's even a debate. Maybe psychologists see something I don't.
Something else I didn't bring up is that PCA is done on correlation matrices, not data matrices. In a correlation matrix, each item is zero-meaned before being correlated, so the response bias is removed. Of course there could be second-order response bias where positive items co*vary* with each other. But, well, no other trait is held to controlling for second-order effects like that.
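The first half of that claim is mechanical and easy to verify (a minimal numpy sketch with simulated responses): the correlation matrix is unchanged by any per-item shift or positive rescaling, since each item is centered and standardized before correlating. A shared per-person bias, the "second-order" case mentioned above, would not be removed this way.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 6))            # 200 respondents, 6 items

# Add an arbitrary constant shift and positive rescaling to every item,
# as a uniform item-level bias would.
shifts = rng.uniform(-3, 3, size=6)
scales = rng.uniform(0.5, 2.0, size=6)
X_biased = X * scales + shifts

# Correlation matrices are identical: each item is zero-meaned and
# standardized before the correlations are taken.
R_clean = np.corrcoef(X, rowvar=False)
R_biased = np.corrcoef(X_biased, rowvar=False)
print(np.allclose(R_clean, R_biased))    # True
```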
Relevant portion: "The correlation between the p factor and g-PD was r = .92 (SE = .02, p < .001), the correlation between the p factor and the GFP was r = -.70 (SE = .05, p < .001), and the correlation between g-PD and the GFP was r = -.90 (SE = .04, p < .001). The results of this model clearly indicated that the general factors were highly correlated."
I think there are other studies that compare these three; the actual number I remembered was 0.78, not 0.7 and 0.9.
One issue that comes up is, what does it mean for a general factor of personality to truly exist?
If agreeableness, conscientiousness, emotional stability, extraversion and openness all exist, then one can of course take the sum of those and get something that more or less exists too. And in fact for many item samplings this might be the PC1, even without measurement error.
However, when people then do a whole bunch of rotation trickery, they end up with five different traits. And those five traits are for whatever reason correlated, in the same direction as this PC1. So rather than defining the general factor as the sum of those five traits, one might instead take a page out of IQ research's book and define the general factor as whichever latent variable generates the correlations between those traits.
And that can be nearly or entirely due to Halo effects, even if the sum of the non-Halo component of the traits actually exist in a genuine way!
When I think about the "general factor of personality", then I tend to think of it in the latter sense, by analogy with intelligence research. I'd suggest "good personality" for the former sense.
(... "Social self-regulation" seems to imply the latter sense? In that it makes statements about the underlying cause of the axis, that the axis is because of variation in how much people self-regulate to fit social norms.)
"How can it be measurement bias when it's word vectors?"
As I see it, word vectors capture the semantic and connotational meaning of the words. Approximately, we might think of it as capturing the effects of the personality traits. (Whereas the "person vectors" you get from large-sample surveys approximately capture the *causes* of the personality traits, though because humans are rational and because the measurement is verbal, there's a lot of overlap between the structure of the effects and the structure of the causes, leading to the same Big 5 in each method.)
For a rational agent, the primary thing one cares about is utility, i.e. good or bad. As such, it would be logical that the primary way that words semantically and connotationally differ from each other is utility.
However, what's good from one perspective may be bad from another perspective. So it also makes sense if the utility aspect is less interpersonally correlated than the other aspects.
"When it's more predictive of real-world outcomes than any other trait?"
I think this is easier to think about after rotating the traits to a Big 5 like structure.
The usual Big 5 scales have some correlation between each other. So you could infer that there is a general factor, and take their mean to get an estimate of the general factor.
If the general factor is just a rater-based Halo effect, then it shouldn't correlate with outcomes unless those outcomes are also rated by the rater or something. However, the correlation between the Big Five is of course not perfect, so the estimate of this general (Halo) factor will have a lot of nuisance variance. This nuisance variance won't just be random noise, but will instead be the original Big Five. This means that even if the general factor doesn't correlate with an outcome, the estimate of the general factor might, because it also contains the sum of the Big Five.
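A toy simulation makes the nuisance-variance argument concrete (made-up numbers, purely illustrative): the halo itself predicts nothing, yet the mean of the five contaminated scales still correlates with the outcome, because that mean also carries real trait variance.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20000

traits = rng.normal(size=(n, 5))        # five independent "real" traits
halo = rng.normal(size=n)               # rater halo, causally inert here

# Each observed Big Five scale = real trait + the rater's shared halo.
scales = traits + halo[:, None]

# The outcome depends only on the real traits, not on halo at all.
outcome = traits.sum(axis=1) + rng.normal(size=n)

# Estimate the "general factor" as the mean of the five scales.
gfp_estimate = scales.mean(axis=1)

r_halo = np.corrcoef(halo, outcome)[0, 1]          # ~ 0 by construction
r_gfp = np.corrcoef(gfp_estimate, outcome)[0, 1]   # clearly positive
print(abs(r_halo) < 0.05, r_gfp > 0.3)             # True True
```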
"Also, if it is measurement bias, then how can varimax be justified?"
Not sure what issue exactly you have in mind. But usually psychologists ignore the exact varimax solution and instead just pick the top pure-loading scales for each factor and use those as the definitions for the factors.
"But, well, no other trait is held to controlling for second-order effects like that."
IMO they should! If you look at informant-reports, systematic measurement error can get pretty severe, making up 30-50% of the total variance!
"Link"
This is for personality disorders, not drug use, right?
It's amazing how consistent the list is: considerate, helpful, reliable. It's all about thinking of others. In another post I argue that this represents a fitness landscape. As a social species it is fit to be a good team member. It is surprisingly rare to connect the Lexical Hypothesis to evolution, in part because there is disagreement on whether the first PC is even real. Given that it is so theoretically satisfying (highlighted by Darwin, later studied as reciprocal altruism), that should have been good evidence that it's not simply noise!
Agreed; there's no reason to think that it's merely a coincidence that natural language terms related to prosocial behaviors appear first under factor analysis. But there's a problem your model has to address before we reach your conclusion: The lexical hypothesis depends upon an interaction between A) individuals, who are described by B) language-producing society. Does alpha's prominence depend on A, or on B?
In other words, the fact that prosociality appears as the largest factor in language terms *may* indicate that this is the primary axis on which human personality itself varies, or it may simply be an artifact of the importance of this factor to humans as they formed coalitions and determined whom to cooperate with. And there is some reason to think it's the latter. If we look at natural language terms for color, "red" is much more common and appears earlier in languages than colors like blue or green, not because red is a larger axis of color variation, but because identifying ripe fruits and open wounds was useful. Similarly, identifying who is going to help or hinder you can be absolutely critical to survival in a highly social situation, while knowing who is imaginative or who is nervous matters much less.
So what's the best explanation? Is prosociality the first factor because it's been under intense selective pressure as a trait that distinguishes us from other animals, or, is it the first factor because it was socially relevant, and people make words for things that matter to their own success? We do know that intelligence has been under intense selective pressure in hominids, even before the split from chimpanzees, and its heritability is remarkably high. Yet Big Five Agreeableness, and HEXACO Agreeableness and Honesty-Humility, show lower heritability than other personality factors. This may not resolve the issue completely, but it tends to disconfirm the idea that Alpha is the primary personality trait under selection over the past million years.
There are other ways of investigating the issue, however. For instance, studies have established that inbreeding depression severely reduces intelligence. If it could be shown that alpha (however defined or measured) is more affected by inbreeding depression than other personality traits, then that would definitely support your model. If you can find, or carry out, a study like this, I'd be very eager to see the results!
>The lexical hypothesis depends upon an interaction between A) individuals, who are described by B) language-producing society. Does alpha's prominence depend on A, or on B?
I'm all in for B. Language is very much the "view from society", as is factor analysis of word vectors. As you note, it's a record of what society finds important and not necessarily the major axis of variance of individuals' personality. These are obviously related because it's _usually_ best to go along to get along. We are domesticated, for the most part. I guess I'm a bit less cynical about human nature than most evo-psych people.
My current project is to try and understand what exactly society was selecting for. The Golden Rule is my favorite description. But what would that have done to our minds? What mechanisms did we evolve, step by step, in order to get along? I think they have to do with language, which makes them unique to humans.
>So what's the best explanation? Is prosociality the first factor because it's been under intense selective pressure as a trait that distinguishes us from other animals, or, is it the first factor because it was socially relevant, and people make words for things that matter to their own success?
One indication that it has been successfully selected for is the gender difference. Historically, into the deep past, socially defecting was more often fit for men than for women. And in fact we do see large gender differences in GFP. "Feminine" loads on PC1 about 1 SD higher than "masculine" does (where SDs are calculated among a pool of 2k personality adjectives). Studies on individuals also find big differences, though I'm not sure of the number off the top of my head.
Of course, lexical data is also social bias... language is the view from society! There is a contradiction in the literature where this is recognized in the case of the GFP, to the extent that its existence is questioned. But the Big Five are derived from the same data, and that same variance is distributed across all of the factors. Why does the Big Five escape the question of whether lexical data reflects actual personality structure or simply societal bias?
>This may not resolve the issue completely, but it tends to disconfirm the idea that Alpha is the primary personality trait under selection over the past million years.
Part of the problem must be that personality is harder to measure than IQ? I know that one can "correct" for instrument noise, but that's never quite the same. Intelligence may also be more of an unalloyed good, whereas there need to be mechanisms overlaying agreeableness so that one is not a doormat. That would not necessarily undercut the preeminence of the trait, but it would tend to decrease correlation values.
There's definitely a lot here! I'll try to be concise:
1. I don't believe women are more prosocial or less prone to defection than men. You might look for "Our Grandmothers' Legacy" by Tania Reynolds for an introduction to a large body of work on female competition.
2. Most researchers regard the GFP differently from the way that you seem to. I agree with you that one can take any rotation of a factor space that one chooses. But when most researchers speak of a GFP, it is in the context that it somehow "subsumes" or "exists at a higher level than" other personality traits, and I think detractors are usually arguing against this overly simplistic view when they talk about bias.
3. Personality is currently harder to measure than IQ, yes. But that doesn't suggest we should, ceteris paribus, expect Alpha or similar traits to have lower heritability than other traits; yet as I recall, Alpha's lower heritability holds across samples and across instruments. If anything, we should suspect that the large number of synonyms for Alpha means it can be measured with more accuracy than other traits, potentially increasing our estimates of its heritability.
Thanks. I have always been well-disposed to a general factor of personality, if only because personality questionnaires are coy about some people being a real pain to work with, and adopt an "all types are necessary and welcome" stance when the reality is that the uncooperative are a social drag.
"So, leaving aside personality, and looking only at the putative new emotional-state-understanding-skill, they designed tests of emotional intelligence. This proved to be quite difficult. After a decade of work they found that there was some evidence for this skill, but to my reading no more outstanding than a minor subtest in a general intelligence test. Working out the emotions of others is related to general intelligence."
My issue with those tests is that EQ is simply much harder to test. The scores end up being a lot of noise + general test-taking ability. If instead we could ask god for a subject's EQ, it may correlate less with IQ. I resort to the same thinking to say that EQ > IQ for outcomes; if only we could measure it, we'd find higher external validity!
It is an interesting dynamic in the psychometrics community where the people okay with ranking others (often hardnosed and disagreeable) find themselves champions of GFP. Whereas people who sense the taboo end up downplaying the importance of emotional intelligence. Consider the implications of an unalloyed good. The horror!
Are there any Big Three personality tests (Affiliation, Dynamism, Order) currently available to take or is it too early days for that yet? Would be cool.
Yeah, early days yet. One could probably take any broad survey like the Big Five Inventory and then do PCA to map scores to the Big Three. I think this would be better than most psychometric instruments just by virtue of not assigning items to just one factor and weighting them either 0 or 1. This is especially true if people can't help but evaluate an item in relation to Affiliation/PFP, as I think is the case. PCs after the first are calculated via residualizing, and it's not always possible to directly measure a residual variable: https://twitter.com/AndrewCutler13/status/1547943121446612992
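The residualizing point can be illustrated with a quick sketch (toy data, not any real survey): project PC1 out of the item matrix, re-run the decomposition, and the leading component of the residual is exactly PC2 of the original data, up to sign.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(300, 10))
X = (X - X.mean(axis=0)) / X.std(axis=0)       # standardize items

# Ordinary PCA via SVD; rows of Vt are the principal directions.
U, S, Vt = np.linalg.svd(X, full_matrices=False)
pc1_dir, pc2_dir = Vt[0], Vt[1]

# "Residualize": remove PC1's projection, then take PC1 of what's left.
X_resid = X - np.outer(X @ pc1_dir, pc1_dir)
_, _, Vt_resid = np.linalg.svd(X_resid, full_matrices=False)

# PC1 of the residualized data is PC2 of the original (up to sign).
print(abs(np.dot(Vt_resid[0], pc2_dir)) > 0.999)   # True
```

Measuring that residual directly with survey items is the hard part, since respondents can't residualize their own answers.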
You provide 30 words for each pole of "social regulation". Based on which metric are these the closest, and which word embedding are they from?
The reason I'm asking is that word2vec style embeddings use a noisy metric (an unmotivated weighted distance within a document) and even noisier reference data (e.g. a collection of MSNBC articles from 2012) from which the embedding is computed. As a result such embeddings in practice give (I claim) quite distorted approximations to the language models people have in their heads today.
Perhaps as a result of this, I also feel that your lists of close words contain two quite different strands, so are not well matched to either a positive or negative phrasing of the Golden Rule, and also seem at odds with "Social Regulation". The one strand is something like "nice"/"awful", the other is something like "pushover"/"activist". These may well form a useful aggregate in your analysis for statistical purposes, but when assessing someone's personality they are separate for me: many people want the charity they donate to headed by nice people who are pushy, even if they prefer a friend to be gentle and obliging. This mismatch might be because you used a poor embedding, and a better embedding would reflect different lists of words.
Great points all around. There's kind of a give and take with the blog format where I don't want to get lost in the weeds, but I do have answers if pressed.
>You provide 30 words for each pole of "social regulation". Based on which metric are these the closest, and which word embedding are they from?
I calculated this by using word vectors to build a word x word affinity matrix, using the Pearson correlation. The 30 words are those that have the highest loading on PC1 according to dimensionality reduction of this affinity matrix.
The word vectors are actually from transformers rather than word2vec. Specifically I use DeBERTa, which was the state of the art when I was working on the embedding. For a much more in depth explanation check out: https://psyarxiv.com/gdm5v/
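The pipeline described above can be sketched in a few lines. The vectors below are random stand-ins with one shared "evaluative" direction baked in, not actual DeBERTa embeddings; only the mechanics (Pearson affinity matrix, PC1 loadings, 30-word poles) follow the description.

```python
import numpy as np

rng = np.random.default_rng(4)
n_words, dim = 435, 64

# Hypothetical word vectors: random, plus one shared latent direction.
valence = rng.normal(size=(n_words, 1))
vectors = valence * rng.normal(size=(1, dim)) + rng.normal(size=(n_words, dim))

# Word-by-word affinity matrix: Pearson correlation between word vectors.
affinity = np.corrcoef(vectors)                    # shape (435, 435)

# PC1 via eigendecomposition of the affinity matrix.
eigvals, eigvecs = np.linalg.eigh(affinity)        # ascending eigenvalues
pc1_loadings = eigvecs[:, -1] * np.sqrt(eigvals[-1])

# The 30 words at each pole are those with the most extreme loadings.
order = np.argsort(pc1_loadings)
pole_high, pole_low = order[-30:][::-1], order[:30]
print(len(pole_high), len(pole_low))               # 30 30
```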
>Perhaps as a result of this, I also feel that your lists of close words contain two quite different strands, so are not well matched to either a positive or negative phrasing of the Golden Rule, and also seem at odds with "Social Regulation". The one strand is something like "nice"/"awful", the other is something like "pushover"/"activist".
This may be true. It's quite difficult to sum up a whole factor. Even the list of 30 words (quite a lot to fit in your brain) may be misleading as these are just the poles. All of the 435 words exist on and contribute to the factor. No guarantees all of that can be summed up parsimoniously.
As for this changing with a better embedding... I've looked at results from dozens of language models coupled with dozens of different extraction and dimensionality reduction choices. The resulting factor is extraordinarily stable. So much so that it correlates 0.93 even when compared to _survey_ results of college kids (the current gold standard, though I do think word vectors should replace that).
Yeah, those are absolutely important, and it wouldn't surprise me if you can guess a lot about a person's personality from their gait. Certainly humans make a lot of inferences like that, at least about dominance, age, etc. And we evolved to infer personality.
One distinction I would make is that lexical work does not produce an instrument to measure an individual's personality. At the end of factorizing adjective similarities, psychologists still had to build the Big Five Inventory. They just sort of had a template for the factors they were trying to measure. So I think you get pretty close to all personality variation with language, but it doesn't tell you how to measure it. Humans definitely measure it using non-verbal behavior.
Yeah, even when the Jews coined the Golden Rule it was radical that it would apply to the outgroup. The parable of the Good Samaritan is "radical" in empathizing with a neighboring tribe (who even spoke the same language?).
There's also an interesting interpretation of the Cain and Abel story, which is from much earlier. Cain kills Abel, and then wanders the earth (in guilt?). Jehovah then marks him as different. Some think this is about conflict between agricultural tribes and pastoral tribes, but the pastoral tribes still being treated as the same people. "We have had wars, but they still count as our estranged brother". A bit of an aside, but interesting to see the idea of "other" evolve through history.
As to your question, it's very interesting whether this would show up in language. My guess is that this is a question better suited to a linguist, and that factor analysis would fail to show this particular difference. But who knows.
It's a medium-term goal to find a collaborator to do some multi-lingual factor analysis. For now I'm more interested in consciousness.
2. What do you think about the Halo model by Anusic and Schimmack? https://psycnet.apa.org/record/2009-22579-009
Haven't read about Halo, will put it on the list.
Link: https://www.researchgate.net/publication/323984208_General_Factors_of_Psychopathology_Personality_and_Personality_Disorder_Across_Domain_Comparisons
Ashton et al (2015) also acquired data on Alpha. Ordered from highest to lowest loadings, the terms loading on Alpha are:
considerate
helpful
sincere
gentle
kind
respectful
reliable
giving
thoughtful
kind-hearted
careful
good-hearted
ethical
cooperative
pleasant
law-abiding
hard-working
trustworthy
gracious
companionable
patient
diligent
polite
thorough
efficient
responsible
honest
warm-hearted
conscientious
courteous
understanding
warm
moral
accommodating
sympathetic
mature
agreeable
organized
reasonable
peaceful
good-natured
truthful
stable
mild
generous
tolerant
well-mannered
charitable
studious
dependable
approachable
humble
proper
modest
empathetic
soft-spoken
self-disciplined
sensitive
quiet
faithful
down-to-earth
loyal
forgiving
big-hearted
loving
friendly
tidy
discreet
hospitable
rational
conservative
diplomatic
selfless
conventional
calm
methodical
dignified
scholarly
natural
congenial
civil
reserved
flexible
industrious
meticulous
alert
cautious
affectionate
feminine
resourceful
[Snip: Terms with absolute values below .300]
hot-tempered
superficial
irritable
self-destructive
childish
sloppy
lazy
insincere
insensitive
patronizing
compulsive
unreliable
antagonistic
egocentric
domineering
immature
hasty
blunt
boastful
loud
shallow
argumentative
irrational
mischievous
cruel
dishonest
callous
rowdy
disrespectful
abusive
rash
aggressive
abrasive
defiant
rough
sneaky
deceptive
abrupt
careless
egotistical
violent
sly
inconsiderate
overbearing
overconfident
self-centered
selfish
scheming
hostile
conceited
ruthless
rebellious
deceitful
destructive
vindictive
irresponsible
condescending
malicious
greedy
manipulative
harsh
rude
arrogant
devious
reckless
It's amazing how consistent the list is: considerate, helpful, reliable. It's all about thinking of others. In another post I argue that this represents a fitness landscape. As a social species it is fit to be a good team member. It is surprisingly rare to connect the Lexical Hypothesis to evolution, in part because there is disagreement on whether the first PC is even real. Given that it is so theoretically satisfying (highlighted by Darwin, later studied as reciprocal altruism), that should have been good evidence that it's not simply noise!
Agreed; there's no reason to think that it's merely a coincidence that natural language terms related to prosocial behaviors appear first under factor analysis. But there's a problem your model has to address before we reach your conclusion: The lexical hypothesis depends upon an interaction between A) individuals, who are described by B) language-producing society. Does alpha's prominence depend on A, or on B?
In other words, the fact that prosociality appears as the largest factor in language terms *may* indicate that this is the primary axis on which human personality itself varies, or it may simply be an artifact of the importance of this factor to humans as they formed coalitions and determined whom to cooperate with. And there is some reason to think it's the latter. If we look at natural language terms for color, "red" is much more common and appears earlier in languages than colors like blue or green, not because red is intrinsically more prominent, but because identifying ripe fruits and open wounds was useful. Similarly, identifying who is going to help or hinder you can be absolutely critical to survival in a highly social situation, while knowing who is imaginative or who is nervous matters much less.
So what's the best explanation? Is prosociality the first factor because it's been under intense selective pressure as a trait that distinguishes us from other animals, or, is it the first factor because it was socially relevant, and people make words for things that matter to their own success? We do know that intelligence has been under intense selective pressure in hominids, even before the split from chimpanzees, and its heritability is remarkably high. Yet Big Five Agreeableness, and HEXACO Agreeableness and Honesty-Humility, show lower heritability than other personality factors. This may not resolve the issue completely, but it tends to disconfirm the idea that Alpha is the primary personality trait under selection over the past million years.
There are other ways of investigating the issue, however. For instance, studies have established that inbreeding depression severely reduces intelligence. If it could be shown that alpha (however defined or measured) is more affected by inbreeding depression than other personality traits, then that would definitely support your model. If you can find, or carry out, a study like this, I'd be very eager to see the results!
>The lexical hypothesis depends upon an interaction between A) individuals, who are described by B) language-producing society. Does alpha's prominence depend on A, or on B?
I'm all in for B. Language is very much the "view from society", as is factor analysis of word vectors. As you note, it's a record of what society finds important and not necessarily the major axis of variance of individuals' personality. These are obviously related because it's _usually_ best to go along to get along. We are domesticated, for the most part. I guess I'm a bit less cynical about human nature than most evo-psyche people.
My current project is to try to understand what exactly society was selecting for. The Golden Rule is my favorite description. But what would that have done to our minds? What mechanisms did we evolve, step by step, in order to get along? I think they have to do with language, which makes them unique to humans.
>So what's the best explanation? Is prosociality the first factor because it's been under intense selective pressure as a trait that distinguishes us from other animals, or, is it the first factor because it was socially relevant, and people make words for things that matter to their own success?
One indication that it has been successfully selected for is the gender difference. Historically, and into the deep past, socially defecting was more often fit for men than for women. And in fact we do see large gender differences in the GFP. "Feminine" loads on PC1 about 1 SD higher than "masculine" does (where SDs are calculated among a pool of 2k personality adjectives). Studies on individuals also find big differences, though I don't have the numbers off the top of my head.
Of course, lexical data is also social bias...language is the view from society! There is a contradiction in the literature: this is recognized in the case of the GFP, to the extent that its very existence is questioned, yet the Big Five are derived from the same data, with that same variance simply distributed across all of the factors. Why does the Big Five escape the question of whether lexical data reflects actual personality structure or simply societal bias?
>This may not resolve the issue completely, but it tends to disconfirm the idea that Alpha is the primary personality trait under selection over the past million years.
Part of the problem must be that personality is harder to measure than IQ? I know that one can "correct" for instrument noise, but that's never quite the same. Intelligence may also be more of an unalloyed good, whereas there need to be mechanisms overlaying agreeableness so that one is not a doormat. That would not necessarily undercut the preeminence of the trait, but it would tend to decrease correlation values.
There's definitely a lot here! I'll try to be concise:
1. I don't believe women are more prosocial or less prone to defection than men. You might look for "Our Grandmother's Legacy" by Tania Reynolds for an introduction to a large body of work on female competition.
2. Most researchers regard the GFP differently from the way that you seem to. I agree with you that one can take any rotation of a factor space that one chooses. But when most researchers speak of a GFP, it is in the context that it somehow "subsumes" or "exists at a higher level than" other personality traits, and I think detractors are usually arguing against this overly simplistic view when they talk about bias.
3. Personality is currently harder to measure than IQ, yes. But that doesn't suggest we should, ceteris paribus, expect Alpha or similar traits to have lower heritability than other traits. Yet, as I recall, this is true across samples and across instruments. If anything, we should suspect that the large number of synonyms for Alpha means it can be measured with more accuracy than other traits, potentially increasing our estimates of its heritability.
Thanks. I have always been well-disposed to a general factor of personality, if only because personality questionnaires are coy about some people being a real pain to work with, and adopt an "all types are necessary and welcome" stance when the reality is that the uncooperative are a social drag.
https://www.unz.com/jthompson/intelligence-emotions-and-personality/
"So, leaving aside personality, and looking only at the putative new emotional-state-understanding-skill, they designed tests of emotional intelligence. This proved to be quite difficult. After a decade of work they found that there was some evidence for this skill, but to my reading no more outstanding than a minor subtest in a general intelligence test. Working out the emotions of others is related to general intelligence."
My issue with those tests is that EQ is simply much harder to test. The scores end up being a lot of noise + general test-taking ability. If instead we could ask God for a subject's EQ, it might correlate less with IQ. I resort to the same thinking to say that EQ > IQ for outcomes: if only we could measure it, we'd find higher external validity!
It is an interesting dynamic in the psychometrics community where the people okay with ranking others (often hardnosed and disagreeable) find themselves champions of GFP. Whereas people who sense the taboo end up downplaying the importance of emotional intelligence. Consider the implications of an unalloyed good. The horror!
Are there any Big Three personality tests (Affiliation, Dynamism, Order) currently available to take or is it too early days for that yet? Would be cool.
Yeah, early days yet. You could probably take any broad survey like the Big Five Inventory and then do PCA to map scores to the Big Three. I think this would be better than most psychometric instruments just by virtue of not assigning items to just one factor and weighting them either 0 or 1. This is especially true if people can't help but evaluate an item in relation to Affiliation/PFP, as I think is the case. PCs after the first are calculated via residualizing, and it's not always possible to directly measure a residual variable: https://twitter.com/AndrewCutler13/status/1547943121446612992
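To make the suggestion concrete, here is a minimal sketch of mapping inventory responses onto three components with PCA. The data is a random placeholder for real Big Five Inventory responses, and the "Affiliation"/"Dynamism"/"Order" labels just follow the post's naming; which PC corresponds to which construct would have to be checked against the item loadings.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)

# Placeholder for Big Five Inventory responses: 300 people x 44 Likert items.
responses = rng.integers(1, 6, size=(300, 44)).astype(float)

# Standardize items, then project onto the first three PCs. Every item gets
# a continuous weight on every component, instead of a 0/1 assignment to
# exactly one scale.
X = StandardScaler().fit_transform(responses)
pca = PCA(n_components=3)
scores = pca.fit_transform(X)  # (300, 3): one column per putative factor

# Illustrative labels only; real use would require inspecting pca.components_.
big_three = dict(zip(["Affiliation", "Dynamism", "Order"], scores.T))
print({name: col.shape for name, col in big_three.items()})
```

Note that because PCs after the first are residualized against earlier PCs, the "Dynamism" and "Order" columns here are by construction uncorrelated with "Affiliation", which is exactly the measurement difficulty the linked tweet is about.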
You provide 30 words for each pole of "social regulation". Based on which metric are these the closest, and which word embedding are they from?
The reason I'm asking is that word2vec style embeddings use a noisy metric (an unmotivated weighted distance within a document) and even noisier reference data (e.g. a collection of MSNBC articles from 2012) from which the embedding is computed. As a result such embeddings in practice give (I claim) quite distorted approximations to the language models people have in their heads today.
Perhaps as a result of this, I also feel that your lists of close words contain two quite different strands, so are not well matched to either a positive or negative phrasing of the Golden Rule, and also seem at odds with "Social Regulation". The one strand is something like "nice"/"awful", the other is something like "pushover"/"activist". These may well form a useful aggregate in your analysis for statistical purposes, but when assessing someone's personality they are separate for me: many people want the charity they give to be headed by nice people who are pushy, even if they prefer a friend to be gentle and obliging. This mismatch might be because you used a poor embedding, and a better embedding would reflect different lists of words.
Great points all around. There's kind of a give and take with the blog format where I don't want to get lost in the weeds, but I do have answers if pressed.
>You provide 30 words for each pole of "social regulation". Based on which metric are these the closest, and which word embedding are they from?
I used word vectors to build a word x word affinity matrix, where each entry is the Pearson correlation between two words' vectors. The 30 words are those with the highest loadings on PC1 after dimensionality reduction of this affinity matrix.
The word vectors are actually from transformers rather than word2vec. Specifically, I use DeBERTa, which was the state of the art when I was working on the embedding. For a much more in-depth explanation, check out: https://psyarxiv.com/gdm5v/
You can also run the code on a colab notebook here: https://colab.research.google.com/drive/1SXZNVqH0m_Bnd2hvIJFYiKQvHWpGu8ZM?usp=sharing
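The pipeline described above can be sketched in a few lines. The word list and vectors below are random placeholders for the 435 adjectives and DeBERTa embeddings used in the paper; only the shape of the computation is meant to be accurate.

```python
import numpy as np

rng = np.random.default_rng(2)

# Placeholder for real data: in the paper, rows would be the 435 personality
# adjectives and columns their DeBERTa embedding dimensions.
words = [f"word_{i}" for i in range(100)]
vectors = rng.normal(size=(100, 64))

# Word x word affinity matrix: Pearson correlation between word vectors.
affinity = np.corrcoef(vectors)  # (100, 100), diagonal of 1s

# PC1 loadings via SVD of the column-centered affinity matrix.
centered = affinity - affinity.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
pc1 = vt[0]  # loading of each word on the first principal component

# The 30 words at each pole of PC1.
order = np.argsort(pc1)
negative_pole = [words[i] for i in order[:30]]
positive_pole = [words[i] for i in order[-30:]]
print(positive_pole[:5], negative_pole[:5])
```

With real embeddings, the sign of PC1 is arbitrary, so which pole is "prosocial" has to be read off from the words themselves.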
>Perhaps as a result of this, I also feel that your lists of close words contain two quite different strands, so are not well matched to either a positive or negative phrasing of the Golden Rule, and also seem at odds with "Social Regulation". The one strand is something like "nice"/"awful", the other is something like "pushover"/"activist".
This may be true. It's quite difficult to sum up a whole factor. Even the list of 30 words (quite a lot to fit in your brain) may be misleading as these are just the poles. All of the 435 words exist on and contribute to the factor. No guarantees all of that can be summed up parsimoniously.
As for this changing with a better embedding...I've looked at results from dozens of language models coupled with dozens of different extraction and dimensionality reduction choices. The resulting factor is extraordinarily stable. So much so that it correlates at 0.93 even when compared to _survey_ results of college kids (the current gold standard, though I do think word vectors should replace that).
Knowing that the result is stable across different language models is reassuring.
Yeah, those are absolutely important, and it wouldn't surprise me if you can guess a lot about a person's personality from their gait. Certainly humans make a lot of inferences like that, at least about dominance, age, etc. And we evolved to infer personality.
One distinction I would make is that lexical work does not produce an instrument to measure an individual's personality. At the end of factorizing adjective similarities, psychologists still had to build the Big Five Inventory. They just sort of had a template for the factors they were trying to measure. So I think you get pretty close to all personality variation with language, but it doesn't tell you how to measure it. Humans definitely measure it using non-verbal behavior.
Yeah, even when the Jews coined the Golden Rule it was radical that it would apply to the outgroup. The parable of the Good Samaritan is "radical" in empathizing with a neighboring tribe (who even spoke the same language?).
There's also an interesting interpretation of the Cain and Abel story, which is from much earlier. Cain kills Abel, and then wanders the earth (in guilt?). Jehovah then marks him as different. Some think this is about conflict between agricultural tribes and pastoral tribes, with the pastoral tribes still being treated as the same people. "We have had wars, but they still count as our estranged brother". A bit of an aside, but interesting to see the idea of "other" evolve through history.
As to your question, it would be very interesting if this showed up in language. My guess is that this is a better question for a linguist, and that factor analysis would fail to show this particular difference. But who knows.
It's a medium-term goal to find a collaborator to do some multi-lingual factor analysis. For now I'm more interested in consciousness.