Understanding Validity

Marian Pitel

VP Research @ nugget.ai

I/O researcher, consultant, and nerd with a passion for making the world better - one workplace at a time.

This paper was written by students in the Masters of I/O Science program at the University of Guelph, located in Ontario, CA. Special mentions to the students for their generous contributions: Melissa Pike, Molly Contini, Julia Kearney, and Jordan Moore. Supervised by Marian Pitel, VP Research at nugget.ai.



Welcome to the second post in our series about measurement! In organizations, it is incredibly important that research be conducted with rigor. The rigor and quality of research can be assessed through the measurement of validity and reliability. Validity is defined as the extent to which a test accurately assesses what it intends to measure (Heale & Twycross, 2015). Reliability, on the other hand, refers to the consistency of a test. More specifically, reliability is the degree to which a test produces similar scores within people or across time and situations (Nunnally & Bernstein, 1994). This post will further explore the concept of validity. If you missed our post about reliability, you can read it here.

I’m sure most of you have taken the sort of quiz that lets you know your best attribute based on your food preferences. Even though they seem silly, these types of quizzes are really popular. Naturally, individuals are interested in finding out information about themselves and others. However, when we take any type of quiz or survey, we need to ask ourselves: what kind of information is this telling me? Can I trust what the survey is saying? How do I know if the results are true? Does liking chocolate ice cream with sprinkles really mean I’m open to new experiences?



You can't always assume that a test accurately measures what the creators say that it measures.



In science, tests are often created to measure different constructs. A construct is a skill, attribute, or ability that exists within a person and is not directly observable (Williams, 2015). For example, though you may know a person is smart by the way they speak and what they say, you cannot directly observe intelligence (Williams, 2015). A valid test is able to accurately classify people who have different levels of the construct that you're measuring (Borsboom, Mellenbergh, & van Heerden, 2004). For example, a valid intelligence test would generate scores that correctly reflect different individuals' levels of intelligence. In this blog, we're going to review two types of validity: construct validity and criterion-related validity.


Construct Validity

Construct validity provides evidence as to whether you can draw inferences from test scores about the construct being studied (Heale & Twycross, 2015). In simpler terms, construct validity represents the extent to which a test accurately assesses the construct it aims to measure (Cronbach & Meehl, 1955). Imagine I’m designing my own intelligence quotient (IQ) test, called the Brainy Bowl. To be considered a valid test, scores on the Brainy Bowl should reflect the range of intelligence levels of the individuals in any given group. The Brainy Bowl would be considered high in construct validity if it were capable of effectively differentiating individuals who are high in intelligence from those who are low in intelligence (Cronbach & Meehl, 1955). Construct validity can further be broken into two specific forms of validity: convergent and divergent validity.


Convergent Validity

Convergent validity refers to the extent to which tests for constructs that should be similar to one another actually provide similar results (Holton III, Bates, Bookter & Yamkovenko, 2007). This form of validity is determined by examining the relationship between two tests, typically by examining their correlation (a statistical index used to represent the strength of a relationship between two variables; Bobko, 2001; Holton III et al., 2007). For example, convergent validity could be demonstrated by a strong correlation between the Brainy Bowl and a common and previously validated IQ test, the Wechsler Adult Intelligence Scale (WAIS). If people who scored highly on the Brainy Bowl did not also score highly on the WAIS, the Brainy Bowl would be considered to have low convergent validity, as the two intelligence tests should produce similar scores.
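To make the idea of a correlation concrete, here is a minimal sketch in Python of how a convergent validity check might look. The scores are invented purely for illustration (they are not real Brainy Bowl or WAIS data), and the helper name pearson_r is our own. A real validation study would use a much larger sample.

```python
from math import sqrt


def pearson_r(x, y):
    """Return the Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of the same length with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products of deviations from each mean
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each test
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


# Hypothetical scores for the same six people on both tests (made up for this example)
brainy_bowl = [95, 102, 110, 118, 125, 131]
wais = [98, 100, 112, 115, 127, 129]

r_convergent = pearson_r(brainy_bowl, wais)
print(f"Brainy Bowl vs. WAIS: r = {r_convergent:.2f}")
```

With these made-up numbers, people who score high on one test also score high on the other, so the correlation comes out close to 1. That is the pattern you would hope to see as evidence of convergent validity.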


Divergent Validity

Divergent validity is the extent to which tests for constructs that are theoretically distinct from one another provide unrelated results (Holton III et al., 2007). Divergent validity is established by a weak correlation between two tests that measure theoretically different constructs. For example, if the Brainy Bowl measures intelligence, we would expect it not to be strongly correlated with tools that measure communication skills.



While it is possible for an individual to be highly intelligent and also possess strong communication skills, tools used to measure these constructs shouldn’t be strongly correlated across people, as they are intended to measure quite different constructs.



The Brainy Bowl, therefore, would have low divergent validity if it were found to be strongly correlated with results from tools designed to measure communication skills.
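As a rough sketch of a divergent validity check, the Python example below correlates Brainy Bowl scores with scores from a communication skills measure. All numbers are invented for illustration, the helper name pearson_r is our own, and the 0.3 cutoff for a "weak" correlation is only an assumed rule of thumb, not a fixed standard.

```python
from math import sqrt


def pearson_r(x, y):
    """Return the Pearson correlation between two equal-length lists of scores."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


# Hypothetical scores for the same six people (made up for this example)
brainy_bowl = [95, 102, 110, 118, 125, 131]
communication = [70, 55, 80, 60, 75, 62]

r_divergent = pearson_r(brainy_bowl, communication)

# Assumed rule of thumb: treat |r| below 0.3 as a weak relationship
WEAK_CUTOFF = 0.3
looks_divergent = abs(r_divergent) < WEAK_CUTOFF

print(f"Brainy Bowl vs. communication skills: r = {r_divergent:.2f}")
print("Weak correlation (supports divergent validity):", looks_divergent)
```

Here the two sets of scores rise and fall independently of each other, so the correlation sits near zero, which is what you would want to see between tests of two different constructs.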


Criterion-Related Validity

The second type of validity we will discuss is called criterion-related validity. Criterion-related validity assesses whether there is a relationship between scores produced by a test and scores on a specific outcome, called a criterion variable (Adcock & Collier, 2001). Basically, criterion-related validity captures how well one measure predicts an outcome. Intelligence, for instance, is correlated with the outcome of future job performance (Gagnon & Barber, 2018). Valid intelligence tests should, therefore, have a relationship with measures of job performance. Criterion-related validity can be broken into three forms: predictive, concurrent, and postdictive.


Predictive Validity

Predictive validity demonstrates how well a test predicts success on a future outcome (Gagnon & Barber, 2018). Predictive validity is measured by correlating test scores with the outcome. The size of this correlation should be considered in order to understand how well the test predicts the outcome (Gagnon & Barber, 2018). Let’s say all incoming employees at Company A must take the Brainy Bowl on their first day. If employees’ Brainy Bowl scores are strongly correlated with their job performance at their one-year performance appraisal, that would demonstrate high predictive criterion-related validity for the Brainy Bowl. As such, you could use Brainy Bowl scores to effectively predict employee job performance.
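The Company A example can be sketched in Python as follows. The day-one test scores and one-year appraisal ratings are invented for illustration, and the 1-to-5 rating scale is an assumption of ours. The sketch computes the correlation, squares it to get the share of variance in performance that the test scores account for, and fits a simple straight line to turn a test score into a predicted rating.

```python
from math import sqrt

# Hypothetical data for six employees (made up for this example)
day_one_scores = [95, 102, 110, 118, 125, 131]    # Brainy Bowl on the first day
appraisal_year_one = [2.8, 3.1, 3.0, 3.9, 4.2, 4.4]  # assumed 1-to-5 rating scale

n = len(day_one_scores)
mean_x = sum(day_one_scores) / n
mean_y = sum(appraisal_year_one) / n

# Building blocks shared by the correlation and the regression line
cov = sum((x - mean_x) * (y - mean_y)
          for x, y in zip(day_one_scores, appraisal_year_one))
ss_x = sum((x - mean_x) ** 2 for x in day_one_scores)
ss_y = sum((y - mean_y) ** 2 for y in appraisal_year_one)

r_predictive = cov / sqrt(ss_x * ss_y)   # strength of the relationship
r_squared = r_predictive ** 2            # share of variance accounted for

# Least-squares line: predicted rating = intercept + slope * test score
slope = cov / ss_x
intercept = mean_y - slope * mean_x


def predict_rating(score):
    """Predict a one-year appraisal rating from a day-one Brainy Bowl score."""
    return intercept + slope * score


print(f"r = {r_predictive:.2f}, r squared = {r_squared:.2f}")
print(f"Predicted rating for a score of 120: {predict_rating(120):.1f}")
```

The larger the correlation, the more closely the predicted ratings will track the ratings employees actually receive. With a weak correlation, the same line could still be drawn, but its predictions would be of little practical use.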


Concurrent Validity

Concurrent validity examines whether two separate measures taken around the same time are related to each other (Gagnon & Barber, 2018). For example, we’ve established that intelligence is correlated with job performance. Concurrent validity indicates that an individual who has a high score on an intelligence test would score similarly well on a job performance test taken at the same time. Using Company A as an example, let’s pretend that instead of taking the Brainy Bowl on their first day, employees take it on the day of their one-year performance appraisal. The Brainy Bowl would have high concurrent validity if Brainy Bowl scores were strongly correlated with individual performance appraisal scores. As such, you could use Brainy Bowl scores to estimate employees’ current job performance.


Postdictive Validity

Postdictive validity is an indication of how well a test can be used to predict the value of an outcome measured before the test was taken (Gagnon & Barber, 2018). This sounds like a mouthful! Let’s say Company A has access to all of its employees’ university GPAs and also has its employees take the Brainy Bowl on their first day. The Brainy Bowl would be considered to have postdictive validity if you could use Brainy Bowl scores to predict the GPA that employees attained while in university (a score they received at an earlier point in time).



We’ll ask you one more time: Does liking chocolate ice cream with sprinkles really mean I’m open to new experiences? After reading this blog and thinking about the construct and criterion-related validity of those kinds of quizzes, I hope you’ll agree that the answer is a resounding no! It’s important to note that not every type of validity will apply to every test. Most of the time, testing the validity of a test with one or two of the methods described here will do the trick. Keep these forms of validity in mind when taking any test from now on, and they might provide some insight into why the test may or may not be valid!




To learn more about our research, contact Marian here.

Click here for whitepaper references and citations.