Quantitative Data Isn't Only About Numbers
All quantitative data is counting something. The question is, what's being counted? Are you counting individual things, categories, or something that might fall on a number line? The concept of discrete and continuous data was discussed in this Topic.
But you can further categorize quantitative data as shown in the diagram below.
Being able to distinguish the type of quantitative data you’re planning to collect is an incredibly valuable skill in quantitative research. If you know the data type, you can figure out how to collect that data in your study, how to analyze it (with various analytical techniques, covered more in this Handbook), and even how to visualize and report your quantitative findings.
The first step is to recognize if you’re collecting quantitative-categorical or quantitative-numerical data. To discuss either type means understanding the four levels of measurement proposed by S.S. Stevens.
S.S. Stevens' Scales of Measurement
In 1946, the psychologist Stanley Smith Stevens (or S.S. Stevens) published a paper that divided measurement into four scales: nominal, ordinal, interval, and ratio. Each of these levels has different properties and captures different types of information.
The first two levels are examples of quantitative-categorical data, while the last two are quantitative-numerical data. This sounds like fancy wording, but it does help explain the properties of the four levels that Stevens proposed.
Stevens’ four scales aren’t without criticism. This article discusses issues with Stevens’ proposed framework. Even so, decades after they were proposed, the four scales are still commonly used across industries and domains, including data science. However popular the framework is, you’ll still have to identify and defend the scale of the quantitative data you’re working with.
Categorical Data: Nominal (by name)
Nominal data is data that's put into groups or categories that don’t have a natural order. It sits at the lowest of Stevens' levels of measurement. You can only use frequency to compare categories (“people picked red more often than blue as their favorite color”).
Examples of Nominal Data
- Color (e.g., red, yellow, blue, etc.)
- Mobile operating system (e.g., Android, iOS, Windows, etc.)
- Device type (e.g., smartphone, tablet, etc.)
- Application (e.g., Facebook, YouTube, Spotify, etc.)
- Country (e.g., Russia, Egypt, South Africa, etc.)
You can also have nominal data with only two groups, such as true/false, yes/no, male/female, or apple/banana. This type of data is known as binary or dichotomous data and can be useful in some situations (like in recruitment screeners).
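As a minimal sketch of what "using frequency" looks like in practice, the Python snippet below counts how often each category appears and reports the most common one. The responses are hypothetical, invented only for illustration.

```python
from collections import Counter

# Hypothetical answers to "What is your favorite color?" (illustrative data only).
responses = ["red", "blue", "red", "yellow", "red", "blue"]

# Counting how often each category occurs is the core operation for nominal data.
counts = Counter(responses)

# The mode is the most frequently chosen category.
mode, mode_count = counts.most_common(1)[0]

# Relative frequency: the share of all responses that fell into each category.
shares = {color: n / len(responses) for color, n in counts.items()}

print(counts)            # how many times each color was chosen
print(mode, mode_count)  # the most common color and its count
print(shares)            # each color's share of the responses
```

Note that nothing here sorts the colors or averages them; with nominal data, the category labels carry no order or distance.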
Categorical Data: Ordinal (by order)
Ordinal data fall into a limited number of categories or groups that have a natural or assigned order. The possible responses in both nominal and ordinal data are known as levels. The overall categorical variable (such as satisfaction or device type) is also known as a categorical factor.
Examples of Ordinal Data
- Score (e.g., high, moderate, low)
- Medals (e.g., gold, silver, bronze)
- Satisfaction (e.g., strongly satisfied, satisfied, neither satisfied nor dissatisfied, etc.)
- Reported frequency (e.g., yearly, monthly, weekly, etc.)
- Preference (e.g., most preferred, somewhat preferred, neutral, etc.)
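Because ordinal levels have an order, you can rank responses and report a middle (median) response, which isn't possible with nominal data. The sketch below assumes a five-point satisfaction scale; only three of its labels appear in the list above, so the two "dissatisfied" labels and all of the responses are hypothetical.

```python
from statistics import median_low

# Assumed five-point scale, listed from lowest to highest.
LEVELS = [
    "strongly dissatisfied",
    "dissatisfied",
    "neither satisfied nor dissatisfied",
    "satisfied",
    "strongly satisfied",
]

# Map each level to its position in the order (a rank, not a measured value).
RANK = {level: position for position, level in enumerate(LEVELS)}

# Hypothetical survey responses (illustrative data only).
responses = [
    "satisfied",
    "strongly satisfied",
    "dissatisfied",
    "satisfied",
    "neither satisfied nor dissatisfied",
]

# median_low always returns a value that occurs in the data,
# so the result maps back to a real level rather than a point between two levels.
median_rank = median_low(RANK[r] for r in responses)
median_response = LEVELS[median_rank]

print(median_response)
```

The ranks are used only to find the middle response; they are not treated as distances between levels.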
Moving on from quantitative-categorical data, let's look at quantitative-numerical data. Stevens proposed two levels here: interval and ratio data.
Numerical Data: Interval (by distance)
Interval data have a clear order and exact distances (aka intervals) between each data point. Unlike nominal and ordinal data, interval data can be measured along a number line. This number line can extend below zero into negative numbers, as shown in the diagram below:
Interval data is somewhat challenging to use because there's debate about how equal the intervals really are. Should interval data in the experience research world be treated as having equal distances between points or not? (This is covered in more detail at the end of this section.) For this reason, most of the examples below come from outside the experience research world:
Examples of Interval Data
- IQ
- SAT scores
- Calendar year (e.g., 2020, 2021, 2022, etc.)
- Temperature (in degrees Celsius or Fahrenheit)
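One way to see what "exact distances but no meaningful zero" implies is to move the zero point and check what survives. The sketch below uses calendar years from the list above and a hypothetical alternative calendar whose year zero falls 2,000 years later; the offset is an assumption made purely for illustration.

```python
# Two calendar years taken from the examples above.
earlier, later = 2020, 2022

# Hypothetical alternative calendar that starts counting 2,000 years later.
OFFSET = 2000


def shift(year: int) -> int:
    """Express a year in the hypothetical alternative calendar."""
    return year - OFFSET


# Differences are unchanged when the zero point moves.
diff_original = later - earlier
diff_shifted = shift(later) - shift(earlier)

# Ratios depend on where zero happens to sit, so they carry no real meaning.
ratio_original = later / earlier
ratio_shifted = shift(later) / shift(earlier)

print(diff_original, diff_shifted)    # same gap in both calendars
print(ratio_original, ratio_shifted)  # different ratios for the same two years
```

The two-year gap is the same in both calendars, but the ratio between the years changes, which is why statements like "twice as much" don't apply to interval data.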
Numerical Data: Ratio (by relationship)
The last data type is the most informative of Stevens' four levels. Ratio data has a clear order, exact intervals between data points, and, most importantly, a meaningful zero point. A meaningful zero point means your data can't be negative. A value of "zero weight" or "zero height" just means there's nothing there to measure.
Zero at the ratio data level means a complete absence of what you're measuring. If you used a number line to represent ratio data, it would start at zero and continue in the positive direction only.
Examples of Ratio Data
- Age
- Average time spent scrolling (in minutes)
- Number of likes
- Total number of errors in a usability test
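A meaningful zero is what makes statements like "twice as long" valid. As a rough illustration, the sketch below compares two hypothetical scrolling times and shows that their ratio stays the same when the unit changes from minutes to seconds; the participant values are invented for the example.

```python
# Hypothetical scrolling times for two participants, in minutes.
minutes_a, minutes_b = 10.0, 5.0

# With a true zero, the ratio is meaningful: A scrolled twice as long as B.
ratio_minutes = minutes_a / minutes_b

# Changing the unit rescales both values but leaves zero where it is.
seconds_a, seconds_b = minutes_a * 60, minutes_b * 60
ratio_seconds = seconds_a / seconds_b

print(ratio_minutes, ratio_seconds)  # the ratio is identical in both units
```

Contrast this with interval data, where moving the zero point changes the ratio between two values.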
No matter what type of data you collect, understanding how to interpret and use it is essential in your quantitative research.
Behavioral vs. Self-Report Data
Let’s tie the four levels Stevens proposed back to experience research. In practice, most of your quantitative-numerical data will come from sources that record behavioral data. Things like how long someone spends using the app, the median e-commerce purchase amount, and the average number of shares over a year are all examples of behavioral data.
On the other side are self-report data. Self-report data are all the data you collect when you’re interacting directly with people. For example, every survey you run collects self-report data. However, self-report data is always quantitative-categorical data. People can’t precisely or accurately provide interval or ratio data.
But don’t some survey questions, like the Likert scale, collect interval data?
Agree-and-A-Half: The Likert, Interval, & Ordinal Debate
The Likert scale is a very popular question type used in surveys. While surveys are covered more in this Handbook, many inexperienced researchers assume the quantitative data from Likert items are examples of interval data, when in fact they’re more likely examples of ordinal data.
Likert items may use numbers (as anchor points or when scoring), but the numbers represent categories, not values on a number line. Let’s look at an example to understand this confusion better.
Imagine that you're measuring how nice or mean fruits are to each other on Fruitful Island. To do this, you randomly sample one hundred fruits and ask them the following Likert-item type question:
Let's say you’ve collected the data and you're starting your analysis. You could analyze this data by taking each response, assigning it a value (as shown below), and then taking the average of all the responses.
The analysis here seems straightforward and appropriate. You could multiply each assigned value by the percentage of respondents who selected that answer and then add up the results to get a weighted average. But this is a bad approach to take.
When you assign numbers, you unknowingly treat the differences between the responses as equal. However, while the assigned values have equal distances, the actual responses they represent don’t. Can you really say that the distance between feeling "very nice" and "nice" is the same as the distance between any other two neighboring responses? And even if you think the responses are equally spaced, can you be sure that every respondent viewed the distances between responses as equal? Probably not.
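For reference, the calculation described above can be sketched as follows. The five response labels, the 1-to-5 values, and the percentages are all assumptions made for illustration; they are not results from the Fruitful Island example.

```python
# Assumed response options with the numeric values a researcher might assign.
assigned_values = {
    "very mean": 1,
    "mean": 2,
    "neutral": 3,
    "nice": 4,
    "very nice": 5,
}

# Hypothetical percentage of respondents who selected each option.
percent_selected = {
    "very mean": 10,
    "mean": 20,
    "neutral": 30,
    "nice": 25,
    "very nice": 15,
}

# The percentages should account for every respondent.
assert sum(percent_selected.values()) == 100

# Weighted average: each assigned value weighted by its share of responses.
weighted_mean = (
    sum(assigned_values[r] * percent_selected[r] for r in assigned_values) / 100
)

print(weighted_mean)
```

The arithmetic runs without complaint, which is part of the problem: nothing in the calculation checks whether the assigned values reflect real, equal distances between the responses.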
You can’t average “very nice” and “mean” and get “mean-and-a-half.”
But suppose you did decide that the responses all had equal intervals. What happens when you use the assigned scores and take an average? Your analysis will give you an answer, but what does that answer mean in the real world? What does it mean to take the average of "very nice" and "mean"? Does it give you an answer of "nice-and-a-half?" Or how about "mean-and-one-quarter-nice"? The numbers, in this case, make it seem like your data is more informative than it is.
The point is simple: don't assign numerical values (interval) to Likert-type item responses (ordinal). You can assign pretty much any numbers to the responses, as long as they're equally spaced, and still come to "quantitative results." Mathematically, all of the possible sets of assigned values are equivalent, even if what the numbers represent is very different (as shown below).
Ultimately, it's up to you to determine what type of data you have. Know that your stakeholders or other researchers might wonder why you chose to interpret the data the way you did. Being able to defend and explain your rationale for interpreting your data at a particular level can be as important as figuring out what quantitative data to collect in the first place.
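To make the coding problem concrete, the sketch below compares two hypothetical groups of respondents under three different codings. Two codings are equally spaced and one is not, yet all three respect the order of the responses. The labels, groups, and codes are assumptions invented for this illustration.

```python
from statistics import mean

# Assumed response options, ordered from lowest to highest.
LEVELS = ["very mean", "mean", "neutral", "nice", "very nice"]

# Hypothetical responses from two groups (illustrative data only).
group_a = ["nice", "nice", "nice", "nice"]
group_b = ["neutral", "neutral", "neutral", "very nice"]


def coded_mean(responses, codes):
    """Average the responses after replacing each level with its numeric code."""
    lookup = dict(zip(LEVELS, codes))
    return mean(lookup[r] for r in responses)


codings = {
    "equally spaced 1-5": [1, 2, 3, 4, 5],
    "equally spaced 10-50": [10, 20, 30, 40, 50],
    "unequally spaced": [1, 2, 3, 4, 10],  # same order, different distances
}

for name, codes in codings.items():
    a = coded_mean(group_a, codes)
    b = coded_mean(group_b, codes)
    print(f"{name}: A={a}, B={b}, higher={'A' if a > b else 'B'}")
```

Both equally spaced codings rank group A above group B, but the unequally spaced coding reverses that ranking. Since ordinal responses only guarantee order, not spacing, the data alone can't tell you which coding is the right one, and so can't tell you which conclusion to trust.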
Handbook Closing Thoughts
Quantitative data is more than just numbers and statistics. The numbers represent the complex, messy things you're collecting. Knowing what type of data your hypothesis needs you to collect or what type of data to avoid can give you focus and clarity when planning your quantitative approach.
- S. S. Stevens' Levels of Measurement
- discrete and continuous data
- categorical data; numerical data
- nominal, ordinal, interval, ratio data