
What are psychometric tests?

The use of tests in recruitment, career guidance and coaching has grown strongly, and this growth is expected to accelerate alongside the current disruption of work. It is important for test users, and for those planning to use tests, to make informed decisions, for which common sense to a large extent suffices. This article is an updated abstract of the book Psychological Assessment Methods at Work (Niitamo, 2003).

Tests used at work


The tests commonly used in work settings can be divided into three broad classes: tests measuring cognitive or intellectual abilities, personality tests, and tests of behavioral styles. Cognitive ability tests measure a person's maximum performance, e.g., the ability to visually perceive and process three-dimensional spatial relations, a strict requirement for airplane pilots and air traffic controllers. Depending on the classification, altogether 8-20 distinct cognitive abilities are said to exist.

In contrast to maximum performance, personality tests concern the person's typical behavior, e.g., the tendency to behave in an extraverted manner or to strive for good achievements in one's undertakings. The different personality factors may be classified into two broad classes: external behavior traits and goal-oriented factors. The former class is represented by the currently dominant Big Five model and its close variants, e.g., HEXACO. The most well-known representatives of the latter class are motives or needs and, in the domain of work, vocational interests. Cognitive or thinking styles, brought to wide attention in the previous decade by the Nobel prize winner Kahneman (2011), are sometimes seen as part of intelligence and sometimes as part of personality. The different personality factors do not compete with each other but represent different perspectives on personality.

All of the previous factors are most frequently measured with standardized self-report questionnaires. Among the many trait questionnaires, one of the most well-known is the NEO-PI-R (Costa & McCrae, 1992), which measures the Big Five traits. Among motive or need questionnaires, the PRF (Jackson, 1984) stands out as a widely popular choice. The most well-known theory of vocational interests, or vocational personality, is Holland's RIASEC (Holland, 1997), which has served as the basis for the development of numerous self-report questionnaires. Among the questionnaires measuring cognitive styles is the MBTI (Briggs-Myers & Briggs, 1985), purportedly the most widely used personality test in the world. Attitudes such as optimism often appear as complements to traits and motives in personality inventories.

Tests of behavioral styles differ from personality tests in their concern for behavior in particular settings such as leadership, teamwork and stress situations. Well-known exemplars include the MLQ (Bass & Avolio, 1990), measuring leadership styles, the BTRI (Belbin, 1993), a test of team roles, and COPE (Carver et al., 1989), a test of strategies used in coping with stress.

Predicting job performance - meta-analyses


Meta-analyses begun in the 1980s have led to a scientifically robust picture of the ability of tests and other methods to predict performance at work. To begin with, it is important to know that the tests' ability to predict performance is relatively modest. At the same time it is significant, depending on what the tests are compared with. The proportion of variance in job performance explained by scientifically valid tests lies between 8 and 20%.

The modest amount of variance explained means that 80-92% of job performance remains unpredicted. In other words, for the greater part, success at work (or in life) remains (fortunately) unpredictable from personal factors. However, according to the meta-analyses, no other preceding information or method, with the exception of the structured interview, reaches even these figures. Moreover, if a 20-45 minute testing session can lead to even weak predictions of important outcomes such as success at work or an individual's career options, tests definitely rise to an important position.
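
To put these percentages in perspective, the share of variance explained is simply the square of the validity coefficient, i.e., the correlation between test score and job performance. A minimal Python sketch, using hypothetical coefficient values, makes the conversion explicit:

```python
# Illustrative only: converting a validity coefficient r to the share of
# variance in job performance it explains (r squared).
for r in (0.28, 0.35, 0.45):  # hypothetical validity coefficients
    print(f"r = {r:.2f} -> variance explained = {r**2:.0%}")
# r = 0.28 explains about 8% and r = 0.45 about 20%, so the 8-20% range
# quoted above corresponds to correlations of roughly .28-.45.
```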

According to the early meta-analyses, cognitive ability tests and structured interviews occupy a shoulder-to-shoulder position as the two strongest predictors of performance at work. Personality tests have been seen as following one step behind. However, several meta-analyses in the new millennium have yielded significantly lower average prediction coefficients for cognitive ability tests. Particularly noteworthy is the meta-analysis by Bertua et al. (2005) on Northern European samples.

A recent review article in a leading journal anticipates shock waves for the field by showing that the early meta-analyses on personnel selection methods produced overly high average prediction coefficients for many of the methods (Sackett et al., 2021). Corrections for range restriction, which is typical in research samples, were carried out on unfounded grounds, inflating the coefficients by an additional 0.1-0.2 points. However, the average coefficients remain statistically significant and useful in applied settings. The top-5 ranking of the methods coincides fairly well with the earlier meta-analyses, with two dramatic changes. Long regarded as the leading method alongside structured interviews, cognitive ability tests drop to fourth place. In contrast, vocational interests, minimally included in the early meta-analyses, surpass general personality traits as predictors of work performance (Sackett et al., 2021).
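
To make the mechanism concrete, the sketch below applies a standard correction for direct range restriction (Thorndike's Case II, a formula not named in the article) with invented numbers; the point made by Sackett et al. (2021) is that the ratio u used in such corrections was often assumed rather than observed, which inflates the corrected coefficient:

```python
import math

def correct_for_range_restriction(r_restricted: float, u: float) -> float:
    """Thorndike Case II correction for direct range restriction.
    u = SD of the predictor in the full applicant pool divided by its SD
    in the pre-screened research sample (u > 1 when the sample is restricted)."""
    return (r_restricted * u) / math.sqrt(1 - r_restricted**2 + (r_restricted * u)**2)

# Hypothetical figures: an observed r of .30 grows to roughly .43 when u = 1.5
# is assumed -- an inflation of the size (0.1-0.2 points) discussed above.
print(round(correct_for_range_restriction(0.30, 1.5), 2))
```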

One recent meta-analysis not only reiterates the finding of the lower predictiveness of cognitive ability tests but shows equal predictive power for motivational factors (Van Iddekinge et al., 2018). Half of the meta-analysis samples measured motivation as a temporary state of mind, while the other half measured it as an enduring personality characteristic assessed with personality tests. The researchers go on to suggest that this tie holds only until the first job, after which motivation takes the lead in predicting work performance. This would be in line with the idea that the first job is always about learning basic skills and matters in which intelligence may well play a significant role. Subsequent jobs capitalize increasingly on all kinds of interaction and creative problem solving, where the role of intelligence is smaller.

The change in working life may ultimately be the main reason for the declining significance of cognitive ability tests. Namely, the data in the early meta-analyses end in the 1980s, whereas the more recent analyses draw upon data from recent decades. In other words, the significance of intelligence would have been greater during the era of structured industrial production. The current service society calls instead for social interaction skills.

Meta-analyses performed in the 2010s have drawn renewed attention to vocational interests. While vocational interests were previously seen as useful only in occupational choice and career counseling, three meta-analyses have shown that they can also predict performance at work, even better than personality traits (e.g., Nye et al., 2017). Moreover, the prediction became even stronger when the match to jobs prescribed by Holland's (1997) theory was included in the formula. For example, entrepreneurially interested individuals performed better in entrepreneurial jobs than in jobs that emphasize social interests. Together these results suggest that the position of intellectual abilities is on the decline, while personality tests describing directional and situation-specific behavior are on the rise as predictors of work performance. The structured interview retains its position as the top predictor.

There has been real bafflement over the observation that assessment centers, which combine different assessment methods, fall short of the 20% prediction level. The explanation for this counterintuitive result is that these meta-analyses have drawn upon samples consisting of already employed individuals (usually managers) or of candidates in the final recruitment phase, where tests are typically deployed. In both cases the variance in the predictors has essentially shrunk due to the preceding screening steps. In other words, in such situations people are already "intelligent, results-oriented, conscientious enough", which makes capturing individual differences more difficult than in meta-analyses using broader population samples.

Technical criteria


Behavioral processes are fuzzy-boundaried and complex in nature. It is definitely not enough just to name a particular set of questions as a measure of a particular cognitive ability, personality factor or behavioral style. The measurement and predictive ability of such a question or item set must be verified and documented in a scientifically agreed-upon manner. The written documentation must be accessible, easily or with reasonable effort, to those interested. The core technical, psychometric criteria include reliability as an indicator of the test's ability to measure anything at all, validity as an indicator of its ability to predict the intended behaviors, and reference norms, which concern the interpretation of test scores.

Reliability

Reliability is the test's ability to measure some quality. Construction of a test is always a tedious process in behavioral science, and an early milestone is to establish the test's measurement ability. Reliability is a necessary requirement for validity, for without it the test can never predict anything. Reliability is indicated as internal consistency among the set of questions, that is, their ability to form a sufficiently coherent set of data points as a reflection of some quality. Another way to assess reliability concerns stability over time, by showing that measurements performed at different points in time yield sufficiently similar results. The test's measurement ability is of such importance that reliability calculations should be carried out on data from the population where the test is planned to be used.
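
Internal consistency is commonly summarized with a coefficient such as Cronbach's alpha (the article does not name a specific index). A minimal Python sketch with made-up item responses:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Internal-consistency reliability for a respondents-by-items score matrix."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Hypothetical data: five respondents answering four questions on a 1-5 scale.
scores = np.array([[4, 5, 4, 5],
                   [2, 3, 2, 2],
                   [3, 3, 4, 3],
                   [5, 4, 5, 5],
                   [1, 2, 1, 2]])
print(round(cronbach_alpha(scores), 2))  # about 0.96: the items hang together
```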

Validity

Validity concerns the test's ability to measure the intended quality and to predict behavior external to the test itself. The former criterion is verified in many ways, for example by showing that the test relates to neighboring qualities in ways prescribed by theory: one would expect a test of ideational thinking to correlate with measures of creative thinking. Another commonly used procedure concerns group differences; a test of leadership motive should differentiate leaders from non-leaders. The most important aspect of validity is the test's ability to predict behavior: a test of leadership motivation should predict independently appraised leadership performance.
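
As a minimal illustration of this predictive aspect, the sketch below correlates hypothetical leadership-motivation test scores with independently appraised performance ratings (all numbers invented; real validation studies use far larger samples and report confidence intervals):

```python
import numpy as np

test_scores = np.array([52, 61, 45, 70, 58, 66, 49, 63])          # motivation test scores
performance = np.array([3.4, 3.8, 3.0, 3.7, 2.9, 4.2, 3.3, 3.5])  # appraised ratings
validity = np.corrcoef(test_scores, performance)[0, 1]            # criterion validity
print(f"criterion validity r = {validity:.2f}")
```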

Norms

The numerical scores given to test responses are not very interpretable in themselves, because numerical scores may vary across different populations. The same raw score on sociability reflects strong sociability in the introverted Finnish population, while in the extraverted Sicilian population it barely reaches the level of average sociability. In contrast to the measurement of temperature, the measurement of mental states or qualities lacks an absolute zero point on which to anchor itself. Measurement is realized by comparing the observed scores to scores attained in some reference population. So-called raw scores (RS), the responses to single test questions, are standardized, i.e., related to some larger reference group such as working-age adults in a particular country.

Standard scores are usually expressed in the test's outcome profile and indicate the test-taker's position in relation to some reference group: how large a proportion of people in the reference group receives an equal or higher (or lower) test score. Tests should generally be normed on the populations and countries where the test is going to be used.
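
For concreteness, the sketch below turns a raw score into a standard (z) score and a percentile, assuming invented norm-group parameters and an approximately normal reference distribution:

```python
from statistics import NormalDist

# Hypothetical norms for a sociability scale: reference-group mean 30, SD 6.
norm_group = NormalDist(mu=30, sigma=6)

raw_score = 38
z = (raw_score - norm_group.mean) / norm_group.stdev  # standard score
percentile = norm_group.cdf(raw_score) * 100          # share scoring at or below

print(f"z = {z:.2f}, percentile = {percentile:.0f}")  # z = 1.33, percentile = 91
# The same raw score lands on a lower percentile if the test is normed on a
# population with a higher mean, e.g. NormalDist(mu=35, sigma=6).
```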

Content criteria


The technical features of tests can be examined in test publishers' documentation and in peer-reviewed reports (e.g., Mental Measurements Yearbook, 1938-2021). Today a large number of tests are offered that fulfill the mandatory technical or psychometric criteria. Moreover, no dramatic differences can be expected in the ability to predict job performance between competing, scientifically constructed test brands. Neither are there such things as wonder tests. The choice among tests has increasingly come to be based on their content aspects: the test's background theory, its areas of use, and the user experience of both test-takers and the clients receiving the service.

Background theory

Tests, particularly personality tests, have become big business all around the world. Many new tests are published today with sky-high promises and lofty slogans, fated to fall into obsolescence after short hype periods, their product cycles. Some of the new tests claim to measure timely qualities, fads that lack any theoretical or research grounding. Some new tests appear as newly named versions of already established qualities, exemplifying reinventions of the wheel. It is always useful to examine the kind of theory, concept or big picture behind a test. While not shutting one's eyes to genuinely new and interesting concepts, it is usually wise to prefer tests that are based on some established behavioral theory. Given their abundance today, it makes more sense to shift attention to the usability of concepts and tests in different situations.

Areas of use

The uses of tests may be roughly divided into recruitment and development purposes. Recruitment relies on cognitive ability and personality tests, which predict behavior in a wide range of situations. Behavioral style tests, with more narrowly focused target behaviors, are naturally used less frequently. In addition to predictive purposes, personality tests are argued to be useful in illuminating the candidate as a person. For example, in addition to spatial ability, it makes sense to predict a pilot candidate's collaborative style in and outside the cockpit. Personality tests systematize the often very casual and intuitive picture derived from (unstructured) interviews. They enable more systematic coverage of questions such as whether the candidate's competence capitalizes on seeking quality vs. results, whether a managerial candidate's strength lies in leading others' behavior vs. their thoughts, or whether the person's collaboration emphasizes communication, guidance or listening to others.

In addition to the traditional, performance-emphasizing criteria of job success (salary, career progress etc.), organizations increasingly recognize different organizational citizenship behaviors (OCB) as valuable assets. Such "civic virtues" as collaboration, helping and supporting others can be predicted more strongly with personality tests than the traditional performance criteria can. Besides social behavior, it is worth noting that meta-analyses have shown a clear relation of personality tests to creativity (Feist, 1998), which is not predictable from intelligence tests. It is also curious that conscientiousness is the best personality predictor of traditionally measured work performance, whereas creative performance is predicted by a lesser amount of conscientiousness (!). A third, perhaps most important feature that has raised much discussion in recent years is the ability of personality factors to predict people's well-being, health and mortality, raising them into an important socio-political spotlight (Roberts et al., 2007; Bleidorn et al., 2019).

Development activities rarely involve intellectual skills, because these are not seen as something that can be developed through practice. In contrast, development programs draw heavily upon different behavioral style and personality tests. The former set their development target on behaviors in particular situations, while the latter focus on general behavior tendencies driven by personality or on personality factors directly amenable to change.

User and client UX

Much more important than technical specifications such as mobile usability is how understandable the test contents are, both to test users and to test service clients. The test user runs repeatedly into situations where he or she has to explain test information to test-takers and to the managers receiving the service. All this calls for understandable language and terminology in the test content. Obviously the most important question for the user concerns the test's ability to differentiate work processes and to guide the test user in development efforts. Here again, terminological simplicity and commonsense language rise to an important position.

Clients of the testing service include recruitment candidates taking the test, managers who hire the candidates, and test-takers participating in the organization's development programs. In recruitment situations, test-takers must feel they are treated in an appropriate and fair manner. Moreover, legislation in most countries requires that the test content be justified as relevant for success in the particular target job.

Managers who receive the testing service should be able to understand the test content without difficulty while making their hiring decisions. In the organization's development programs, it is in turn important that the test-takers experience the test as interesting, credible and, if possible, inspiring. Both in recruitment and in development, it is useful to be able to provide test-takers with documents for interpreting the outcome profiles. Some tests offer machine-generated descriptive reports on individuals, creating an illusion of accuracy that can in no way be justified given the modest predictive power of tests. Another risk is that such illusory precision removes the perceived need to interview people, which in turn can lead to serious misjudgments.

Summary


The predictive ability of tests, although "only" small to modest in magnitude, has been scientifically established through the meta-analyses begun in the 1980s. Today, test users are offered a large number of tests that fulfill the required technical criteria. Therefore, it no longer suffices to provide proof of a test's predictive power; the focus has shifted to the content features of tests. Content issues concern so-called ecological validity, which refers to the test's usability in the contexts and situations where it will be applied. Having already moved to the online environment, testing must attend to cyber-security issues and comply with the GDPR regulations. The online environment enables so-called Big Data analyses, and tests increasingly offer the possibility to use test data in so-called People Analytics undertakings.

The disruption of work brings a wholly new kind of criterion to the appraisal of ecological validity. The question is whether, and how, tests can be used to guide people in their navigation toward unforeseen job contents and competencies. Finally, it must be understood that, because of their relatively modest effect size and explained variance, tests as such cannot serve as "king makers" for HR. The value of tests becomes realized to its full extent only when they function as part of the organization's competency concept and its acquisition and development.

Development processes are still implemented in separate silos across different parts of the organization. HR professionals are still overly concerned about being able to offer "something new" instead of focusing on the process contents and their systematic implementation. Only with integrated processes extending beyond quarterly cycles can HR assume its important leadership role in the disrupting world of work.

Bass, B. M., & Avolio, B. J. (1990). Transformational leadership development: Manual for the multifactor leadership questionnaire. Palo Alto, CA: Consulting Psychologists Press.
Belbin, R.M. (1993). Team roles at work. London, UK: Butterworth-Heinemann.
Bertua, C., Anderson, N., & Salgado, J.F. (2005). The predictive validity of cognitive ability tests: A UK meta-analysis. Journal of Occupational and Organizational Psychology, 78: 387-409.
Bleidorn, W., et al. (2019). The policy relevance of personality traits. American Psychologist, 74(9), 1056-1067.
Briggs-Myers, I., & Briggs, K.C. (1985). Myers-Briggs Type Indicator (MBTI). Palo Alto, CA: Consulting Psychologists Press.
Carver, C.S., Scheier, M.F., & Weintraub, J.K. (1989). Assessing coping strategies: A theoretically based approach. Journal of Personality and Social Psychology, 56, 267-283.
Costa, P. T. & McCrae, R. R. (1992). Revised NEO personality inventory and NEO Five-Factor inventory professional manual. Odessa, FL: Psychological Assessment Resources.
Feist, G. J. (1998). A meta-analysis of personality in scientific and artistic creativity. Personality and Social Psychology Review, 2(4), 290-309.
Jackson, D.N. (1984). Personality Research Form Manual. 3rd ed. Port Huron, MI: Research Psychologists Press.
Mental Measurements Yearbook (1938-2021). Lincoln, NE: Buros Center for Testing.
Nye, C.D., Su, R., Rounds, J., & Drasgow, F. (2017). Interest congruence and performance: Revisiting recent meta-analytic findings. Journal of Vocational Behavior, 98, 138-151.
Roberts, B.W., Kuncel, N.R., Shiner, R., Caspi, A., & Goldberg, L.R. (2007). The power of personality. Perspectives on Psychological Science, 2(4), 313-345.
Sackett, P.R., Zhang, C., Berry, C.M., & Lievens, F. (2021). Revisiting meta-analytic estimates of validity in personnel selection: Addressing systematic overcorrection for restriction of range. Journal of Applied Psychology. Advance online publication.
Van Iddekinge, C.H., Aguinis, H., Mackey, J.D., & DeOrtentiis, P.S. (2018). A meta-analysis of the interactive, additive, and relative effects of cognitive ability and motivation on performance. Journal of Management, 44(1), 249-279.
