Connecting survey goals to question formats

The “Perfect” Survey Question

Every survey question has two parts: the question stem and the response choices. The question stem is the question or statement a respondent reads and interprets, while the response choices dictate how they should respond. Together, you get an individual survey item.

What’s interesting is that the response format (or how response choices are presented to respondents) drives how survey questions are categorized. Below are some of the common response formats used in survey questions; each is discussed further below.

Common Survey Question Response Formats

While there isn’t a “perfect” survey, there are qualities of the “perfect” or “ideal” survey question. Below are some of those qualities. It can be challenging to meet all of them, but it’s the process of writing questions to meet these qualities that matters. Without thinking about these qualities, your questions only appear to be collecting useful survey data.

Qualities of the “Perfect” Survey Question
  • The question stem is short, unambiguous, and relevant to the target segment
  • The response choices are relevant, meaningful, and comprehensive to the target segment
  • The question stem or response choices don’t bias, influence, or suggest a response
  • The survey item measures the attitude, behavior, opinion, or construct you want (aka validity)
  • The survey item is interpreted in the same way by all respondents (aka reliability)
  • The survey item doesn’t measure other concepts (aka discriminant validity)

You might not satisfy every quality, but cognitive testing can help you see which qualities your questions lack. Keep these qualities in mind when writing open- and closed-ended survey questions. Let’s take a closer look at both.

Open-Ended Questions

All questions ask someone to respond to something; what’s different is how people provide that response. You can categorize every survey question into two buckets: open-ended and closed-ended questions.

Open-ended questions are where respondents write their responses, while closed-ended questions ask respondents to respond using predetermined answer choices. One type of question isn’t better than the other because they have different strengths and weaknesses. Let’s first tackle open-ended questions.

The table below shows the major strengths, weaknesses, and risks of open-ended survey questions.

Open-ended questions allow respondents to answer with written responses. This could be an empty text box or even a fill-in-the-blank. However, your goal is the same: to write a meaningful open-ended question that offers boundaries or context for their response.

This example open-ended question illustrates the idea of context further: “How often do you buy shoes?” When you first read this question, it seems to tick many of the boxes discussed earlier in this chapter. But when you go to answer it, it’s more challenging.

When the question asks “how often,” does the question creator mean in the last year or just in general? And is the question asking about running shoes or hiking shoes? Without clear boundaries, the responses might be irrelevant to what you’re trying to learn. You can catch some of these issues by cognitive testing your questions before launching.

Guide 10: Cognitive Testing your Survey

While you can roughly control the context for an open-ended question, you can’t control a response’s level of detail or its relevance. You might get the responses below for the example question above. If you care about the frequency of buying shoes, the first response is helpful. If you’re focused on frequency and the types of shoes someone buys, the second response is better, even if you have to parse through extra detail.

The takeaway is to use open-ended questions sparingly and to test them early. You’ll have to qualitatively analyze the open-ended response data, which can be very time-consuming (jump to this Handbook for more). If done poorly, you’re just adding to your analysis time without extracting any value. If done correctly, you can be confident in the responses and their accuracy.

What’s faster and easier for respondents? Closed-ended questions.

Closed-ended Questions

When you think of a survey, what ideas pop into your head? Perhaps you think about Likert-type items, rankings, and multi-select questions. All of these are examples of closed-ended questions.

Most surveys tend to have more closed-ended questions because they allow the survey creator to collect lots of data and analyze it relatively easily, while allowing respondents to answer quickly. Read, click, go.

Additional strengths, weaknesses, and risks are listed below for closed-ended questions.

One strength of closed-ended survey questions has to do with your analysis. When you write the answer choices for a question, you can see what level of quantitative data you’re collecting. Is it nominal, ordinal, interval, or ratio data? (jump to this Topic for more).

If you know what level you’re collecting, you can also plan what statistical tests you want to use, how you want to visualize the data, or how you might need to structure that data before any analysis (jump to this Handbook for more). This means a faster, more intentional survey data analysis phase for you. And a faster analysis phase means you’ll be able to deliver valuable findings back to your stakeholders quicker.
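
To make this concrete, here’s a minimal sketch (using pandas, with entirely hypothetical column names and responses) of how the level of measurement changes how you’d prepare and summarize survey data:

```python
# A minimal sketch, assuming pandas and hypothetical survey data,
# of how the level of measurement shapes preparation and summaries.
import pandas as pd

responses = pd.DataFrame({
    "favorite_color": ["red", "blue", "blue", "red"],   # nominal
    "satisfaction": ["Low", "High", "Medium", "High"],  # ordinal
    "purchases_last_year": [2, 5, 1, 3],                # ratio
})

# Nominal data: unordered categories, so report frequencies.
print(responses["favorite_color"].value_counts())

# Ordinal data: declare the order explicitly so sorting and
# comparisons respect it, but don't treat the gaps as equal.
responses["satisfaction"] = pd.Categorical(
    responses["satisfaction"],
    categories=["Low", "Medium", "High"],
    ordered=True,
)
print(responses["satisfaction"].value_counts(sort=False))

# Ratio data: a true zero point, so means and ratios are meaningful.
print(responses["purchases_last_year"].mean())
```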


The biggest risk with closed-ended questions has to do with intention. When someone selects a closed-ended response, you can see the result of their actions in the data you collect. But you don’t know what they were thinking or intending when they selected a specific response.

Think of closed-ended survey data like a footprint in the sand: you can see that someone pressed their foot there, but you don’t know why they did it. You might get thousands of respondents selecting the same response. But that doesn’t mean you fully understand their interpretation of the question or their intention when selecting a specific answer choice.

Unlike open-ended questions, there’s a variety of closed-ended survey question types you can use. The table below shows some common and effective variants. The last type in the table, scales, is a special closed-ended question, discussed more in the next topic.

Rating vs. Response Scales

Closed-ended rating questions are popular in quantitative experience research because the collected data can be easily visualized and broken apart to find interesting patterns. They’re more commonly known as rating scales.

You might’ve seen the word “scale” used when reading about surveys online, but there’s a very confusing and often unspoken idea behind it: a scale can mean different things even within one survey question. There’s a distinction between rating scales and response scales.

A response scale is a specific format for how response choices are ordered and shown to your respondents. Your question stem can be anything, but you’re asking respondents to select from a range of ordered responses.

You can order the response scale format in two ways: unipolar or bipolar. You’ve probably seen examples of both if you’ve ever looked online for survey design help, but a quick example of each is provided below.

A unipolar response scale measures the presence or absence of one specific variable (an emotion, a behavior, a belief, etc.). A bipolar response scale measures how far and in what direction from a neutral midpoint someone is. Each specific response choice is known as an anchor point.

It’s best practice to have only five to seven points on your scales, where the middle response on a bipolar scale is “undecided” or “neutral.” While rating scales are useful, writing congruent or consistent response choices can be challenging, so it’s best to treat the resulting data as ordinal, not interval (jump to this Topic for more on quantitative data types).
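
As a rough illustration, the sketch below encodes a bipolar response scale as ordered categories in pandas and summarizes hypothetical responses as ordinal data. The anchor labels are illustrative, not a standard:

```python
# A rough sketch of unipolar vs. bipolar response scales as ordered
# categories in pandas. Anchor labels and responses are hypothetical.
import pandas as pd

# Unipolar: presence or absence of one variable.
unipolar = [
    "Not at all satisfied", "Slightly satisfied",
    "Moderately satisfied", "Very satisfied", "Extremely satisfied",
]

# Bipolar: direction and distance from a neutral midpoint.
bipolar = [
    "Strongly disagree", "Disagree",
    "Neither agree nor disagree", "Agree", "Strongly agree",
]

# Hypothetical responses to a bipolar question.
answers = pd.Series(["Agree", "Disagree", "Agree", "Strongly agree"])

# Treat the data as ordinal: ordered categories, not numbers, so you
# report frequencies and medians rather than means.
ordinal = pd.Categorical(answers, categories=bipolar, ordered=True)
print(pd.Series(ordinal).value_counts(sort=False))
```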

A rating scale is when your response choices are ordered but the question stem is trying to measure opinions, attitudes, beliefs, or reactions. You can not only visualize the results easily (as it’s closed-ended data) but also see nuance across the measured attitudes or opinions. Rating scale questions use a response scale format, as shown below.

What sounds like unnecessarily complex detail actually serves as the foundation for one of the most complex, debated, and misunderstood ideas in survey research: the idea of scaling.

Scaling

A quick starting note: the concept of scales is somewhat confusing and there’s disagreement and debate over how scales are used and written about in survey research. Think of this section as a quick introduction to the magical, complex world of scales.

Sometimes in a survey, you want to provide a range of response choices to see subtle variations in how people feel or think. It might be easy to ask if something was clean or not, but what if you want to measure a construct like perception?

For example, you might want to measure how fast the redesigned mobile app felt. Perception of speed is a qualitative construct but, in the survey, it’s measured with quantitative units (aka specific, ordered response choices), as shown in the example below. The qualitative construct has been put on an ordinal scale (jump to this Topic for more on ordinal data).

Scaling is when you assign numerical values to a qualitative construct. Scales are used to measure a hard-to-measure qualitative construct quantitatively. This includes constructs like addiction, happiness, satisfaction, or intelligence (read more on constructs in this Topic). The scales make it easier to use numbers to describe, analyze, and interpret these constructs rather than describe them using qualitative data.

Scales are used to measure unidimensional and multidimensional constructs (see 5c for more). Think of scales as groups of related or similar rating scale questions, with the same or similar answer choices, combined together. Each question measures some small part of an underlying construct, but the combination should provide a good estimate of that construct (if the scale was made appropriately).

If you believe the construct you want to measure is unidimensional, you might just need one rating scale question to measure it. If you believe the construct to be multidimensional, then you’d need several rating scales grouped to measure specific dimensions of the larger construct. Check out the example in the next subsection as well as this article for more help.
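
To make the grouping idea concrete, here’s an illustrative sketch in which hypothetical rating scale items (already scored 1-5) are combined into dimension scores and an overall construct estimate:

```python
# An illustrative sketch of grouping rating scale items by the dimension
# they measure. Item names, dimensions, and scores are hypothetical.
import pandas as pd

# Per-respondent item responses, already scored 1-5.
scored = pd.DataFrame({
    "q1_inclusive": [4, 5, 3],
    "q2_inclusive": [4, 4, 2],
    "q3_humor":     [5, 3, 3],
    "q4_humor":     [4, 2, 4],
})

# Map each item to the dimension of the larger construct it measures.
dimensions = {
    "inclusiveness": ["q1_inclusive", "q2_inclusive"],
    "humor":         ["q3_humor", "q4_humor"],
}

# Dimension score: the mean of that dimension's items per respondent.
for name, items in dimensions.items():
    scored[name] = scored[items].mean(axis=1)

# A simple overall estimate of the construct: the mean of the dimensions.
scored["friendliness"] = scored[list(dimensions)].mean(axis=1)
print(scored[["inclusiveness", "humor", "friendliness"]])
```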

The most popular scale used in surveys is the Likert scale.

Likert Scales

Pronounced “lick-ert,” the Likert scale is used to capture attitudes toward a declarative statement with specific, ordered responses. Named after an American psychologist, Rensis Likert, the scales help measure attitudes and opinions. It’s possibly the most popular rating scale question, used across dozens of academic and business domains.

But there’s confusion about what exactly a Likert scale is. Non-researchers tend to use “Likert scale” and “Likert item” interchangeably, even though they’re different. Explaining the difference can be one way to establish credibility as a researcher, especially when working with new teammates or people who dislike survey research. Let’s break down the diagram below, which shows the relationship between a Likert scale and an individual Likert item.

A Likert item is an individual survey question with a specific question stem and response choices. The question stem is always a declarative statement about which someone can have an opinion or attitude. The response scale format is always ordered from “strongly agree,” “agree,” “neither agree nor disagree,” “disagree,” to “strongly disagree.” Check out this resource for some common Likert scale responses.

If you use a Likert item but don’t follow this order, you’re creating a Likert-type item. Likert-type items resemble Likert items but don’t share the same validity when used.

On the other hand, Likert scales are built up of at least five individual but related Likert items. Likert scales are used to measure constructs, on the assumption respondents feel more or less strongly about the construct being measured or studied.

When you score (or assign a numerical value to each Likert scale response choice) and aggregate all of the Likert items together, you end up with a Likert scale that shows the range of differences for the specific construct.
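
Here’s a minimal sketch of that scoring step, assuming hypothetical items and the common (but not the only defensible) convention of mapping the five responses to 1 through 5:

```python
# A minimal sketch of scoring Likert items: map each ordered response
# to a number, then aggregate the related items into a scale score.
# The items and responses are hypothetical.
import pandas as pd

SCORES = {
    "Strongly disagree": 1,
    "Disagree": 2,
    "Neither agree nor disagree": 3,
    "Agree": 4,
    "Strongly agree": 5,
}

items = pd.DataFrame({
    "item_1": ["Agree", "Strongly agree", "Disagree"],
    "item_2": ["Agree", "Agree", "Neither agree nor disagree"],
    "item_3": ["Strongly agree", "Agree", "Disagree"],
})

# Map labels to 1-5, then sum across items for each respondent.
scored = items.apply(lambda col: col.map(SCORES))
scored["scale_score"] = scored.sum(axis=1)
print(scored)
```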

Things get even more complex when you use several Likert scales to measure a multidimensional construct. For example, let’s say you were interested in measuring attitudes about how “friendly” fruits are, with “friendly” being the construct you want to measure. You conceptualize “friendliness” as all of the “outwardly social behaviors or interactions that any fruit has with another fruit in public spaces.”

You could then operationalize this further by having “friendliness” be made up of different dimensions (also known as factors) such as “inclusiveness,” “humor,” and “similarity.” Finally, you could write different survey questions to measure the various dimensions, as shown in the example below. Check out this paper for a step-by-step example of creating a multidimensional scale.

You’re out of luck if you want simple, clear advice on how to score (or assign values to) each Likert item. There’s debate on how to view Likert scale data (and evidence that even academic researchers score in inconsistent or incongruent ways, as discussed in this paper). If you need some quick advice, check out the decision tree below.

The last piece of advice is to avoid Likert scales until you understand and practice with them. Feel free to use all the Likert-type items you want but treat them as ordinal data. Using Likert scales appropriately means performing the statistical methods (like calculating Cronbach’s alpha or reducing items with factor analysis), which are outside the scope of this library. (You can get a small overview of factor analysis in this Topic). Check out this great article on developing scales for more help.
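
If you do go down that path, Cronbach’s alpha is straightforward to compute from its textbook formula: alpha = k / (k − 1) × (1 − sum of item variances / variance of the total score). Below is a small sketch with hypothetical, already-scored items:

```python
# A small sketch of Cronbach's alpha from its textbook formula:
# alpha = k / (k - 1) * (1 - sum(item variances) / var(total score)).
# The already-scored items below are hypothetical.
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    """Estimate the internal consistency of a set of scored scale items."""
    k = items.shape[1]
    item_variances = items.var(ddof=1).sum()
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances / total_variance)

scored_items = pd.DataFrame({
    "item_1": [4, 5, 2, 4, 3],
    "item_2": [4, 4, 3, 5, 3],
    "item_3": [5, 4, 2, 4, 2],
})

# Values closer to 1.0 suggest the items hang together more consistently.
print(round(cronbach_alpha(scored_items), 2))
```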

Survey Logic

While survey logic isn’t a type of question, it can affect the order and type of questions you use. Below are some of the more common types of survey logic available in popular survey tools (such as Qualtrics).

Piping is more complex and prone to error (like when respondents misinterpret a question or satisfice), so looking for survey tools that offer branching and randomization (such as Microsoft Forms or Google Forms) is a good place to start. When designing your survey, think about how the survey logic can allow you to write fewer questions or create fewer branches or paths. More branches can lead to more “customizable” survey experiences, but it can also be harder to compare branches when analyzing the collected data.
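
One way to keep branching manageable is to treat it as plain data. The sketch below is not any survey tool’s real API; it’s a hypothetical illustration of skip logic written so that every path can be reviewed and tested before launch:

```python
# A hypothetical sketch (not any survey tool's real API) of skip/branch
# logic written as plain data. Question names and answers are made up.
def next_question(current: str, answer: str) -> str:
    branches = {
        ("owns_shoes", "Yes"): "shoe_frequency",
        ("owns_shoes", "No"): "end",
        ("shoe_frequency", "Never"): "end",
    }
    # Default: fall through to a shared follow-up question.
    return branches.get((current, answer), "shoe_type")

# Testing every path up front catches broken or biasing logic early.
assert next_question("owns_shoes", "No") == "end"
assert next_question("owns_shoes", "Yes") == "shoe_frequency"
assert next_question("shoe_frequency", "Sometimes") == "shoe_type"
```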

Consider the tradeoffs you get with survey logic and use what’s appropriate. As mentioned, you must test your survey logic before launching it. Avoid the awful situation of looking at your data and realizing the survey logic is either biasing respondents or not working as intended.

Handbook Closing Thoughts

While a lot was covered in this Handbook, the world of effective and valid survey design is incredibly complex. Once again, while there isn’t a single perfect survey, using the constraints, strategies, and frameworks in this chapter can make every survey you run impactful.

Interviews and surveys help you better understand the people you build for. But how do you know if your product is usable? What does it mean to be “usable”? And what’s easier to measure – usability or un-usability? These ideas are discussed in the next chapter, Usability Testing, arguably the quintessential experience research method.
