Unit 5

Handbook

–

Lesson 1

Exploring your quant data


Practical Levels for Quantitative Analysis

If you’re new to quantitative research, the terms in this handbook might seem all over the place and overwhelming. Even simple ideas and terms can seem scary. You can think of quantitative analysis as having practical levels to make it easier to understand.

Each level represents a clear jump in terms of mathematical complexity, data manipulation, interpretation of your results, and the value gained from your data. Once you figure out what level you’re currently comfortable with, the goal then is to move through the levels via education, practice, and feedback.

As you move to higher levels, the amount of skill and confidence needed to manipulate your quantitative data goes up. The higher the level, the more data manipulation and data interpretation skills are required.

For example, visualizing the frequency of one variable (aka level 3) is relatively easy in Microsoft Excel, but you’d need specialized software and experience to run a principal components analysis (PCA, covered more in Topic 4 in this handbook), which involves multiple variables and interpretations.


This chapter doesn’t cover mathematically heavy analytical techniques for quantitative data. If you didn’t plan to use such techniques before collecting data, you can’t expect to use them afterward. As mentioned throughout this handbook, quantitative research is about planning, intention, and logic. Without clear research questions, study goals, or research hypotheses, analyzing your quantitative data will be a tedious, confusing process.

Without a clear idea of what you’re trying to learn or validate, you’ll find yourself in one of two negative situations: inappropriate or endless data analysis. Inappropriate in this context means your analysis doesn’t match the properties of the data you have (such as using ratio data techniques on nominal data).
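To make the mismatch concrete, here is a minimal Python sketch. The device codes and their labels are hypothetical, made up for illustration. Averaging the codes runs without error, but the result has no meaning because the numbers are only labels; the mode is a summary that suits nominal data.

```python
from statistics import mean, mode

# Hypothetical nominal variable: device type coded as 1=phone, 2=tablet, 3=desktop.
device_labels = {1: "phone", 2: "tablet", 3: "desktop"}
device_codes = [1, 1, 3, 3, 3, 2, 1, 3]

# Inappropriate: a mean treats the codes as ratio data, but they are only labels.
misleading_mean = mean(device_codes)

# Appropriate: the mode (most frequent category) relies only on counting.
most_common_device = device_labels[mode(device_codes)]

print(misleading_mean)      # a number that matches no real device category
print(most_common_device)
```

The code runs either way, which is the point: software won’t stop you from applying the wrong technique, so the check has to come from your analysis plan.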

Endless means that, without goals, you exhaust yourself trying many techniques without extracting any value from your quantitative data, which is the worse of the two situations.

When you know exactly what you’re trying to find or validate in your quantitative data, it’s easier to (a) move quickly and (b) move with intention. The worst situation is realizing, after the fact, that you should’ve collected (or avoided collecting) certain types of responses to address your quantitative research questions or test your hypotheses.

Levels 0 through 3 form the foundation of the best starting point for analyzing any quantitative dataset: the exploratory data analysis (EDA) process.

The Quantitative Data Analysis Process

Quantitative data analysis takes structured quantitative data and translates that into patterns, relationships, and results using specific statistical methods. You can think of any quantitative analytical technique as a way of breaking down your data into smaller pieces. You interpret the patterns and relationships among the smaller pieces to arrive at quantitative findings and results.

The quantitative data analysis process can, in fact, be listed neatly:

The Basic Quantitative Data Analysis Process:

Plan your analysis when designing your study
Explore and describe data (EDA process)
Use statistical methods and tests
Interpret test results
Make conclusions

On paper, the process seems pretty simple. But the process above is misleading: meaningful quantitative data analysis is cyclical, not linear. It'd look like the diagram below if you had to visualize the process.

Like qualitative data analysis, it’s a cyclical process without a clear end. You figure out what to do first, where to take your analysis next, and when to wrap up and think about reporting findings.

The first step is determining study goals and writing hypotheses (jump to this Topic for more). Then you collect your quantitative data using a quantitative method. Before starting your analysis, you clean or structure your data to make it easier to analyze. Next comes the core analytical cycle: the exploratory data analysis (EDA) cycle.

The EDA cycle, covered in more detail below, is about recognizing and describing patterns in your data. If you’ve also designed your quantitative study to take advantage of specific statistical methods, you’d use them after the EDA cycle and interpret their results.

You might have to restructure your data several times if you find unexpected results. When you feel confident you’ve really understood your quantitative data, and addressed your research questions and study goals, you’d arrive at your final conclusions.

Let’s break down the various parts of this process throughout this chapter. You might have some quantitative data right now, but that doesn’t mean you can jump into analysis right away. If you’re using an instrument to collect quantitative data (like a survey or A/B testing tool), you’ll first have to clean and structure your data before exploring it. Let’s also set aside designing a quantitative study, as that’s the focus of an earlier handbook (Collection 3, Handbook 3).

Data Cleaning & Structuring

Data cleaning means taking your raw quantitative dataset and making it easier to upload, work with, manipulate, and read when analyzing. Data structuring is when you take your cleaned dataset and modify it to fit your current quantitative study. Data cleaning is generic across quantitative studies while data structuring is contextual to each study.

Data cleaning is generic while data structuring is contextual.

For example, if you use a survey tool to collect quantitative data, you might always have to remove (or clean) the column that lists the date the survey was taken. But if you wanted to focus only on analyzing the open-ended data, you might split (or structure) your dataset into two datasets, one open-ended and one closed-ended.
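The difference between the two activities can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the column names (date_taken, satisfaction, why) and the responses are hypothetical.

```python
# Hypothetical raw survey export: one dict per respondent.
raw_rows = [
    {"id": "r1", "date_taken": "2024-01-05", "satisfaction": "4", "why": "Fast checkout"},
    {"id": "r2", "date_taken": "2024-01-06", "satisfaction": "2", "why": "Confusing menus"},
]

# Cleaning (generic): drop a column you never analyze, such as the survey date.
cleaned = [{k: v for k, v in row.items() if k != "date_taken"} for row in raw_rows]

# Structuring (study-specific): split into closed-ended and open-ended datasets.
closed_cols = ["id", "satisfaction"]  # assumed closed-ended question
open_cols = ["id", "why"]             # assumed open-ended question
closed_ended = [{k: row[k] for k in closed_cols} for row in cleaned]
open_ended = [{k: row[k] for k in open_cols} for row in cleaned]

print(closed_ended)
print(open_ended)
```

Keeping the respondent id in both datasets is a deliberate choice here, so the two halves can be joined again later if needed.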

Other common cleaning or structuring activities are listed below.

Common data cleaning and structuring activities

Removing irrelevant columns or rows
Adding or combining relevant columns, rows, or data
Adding in log or product analytics data
Splitting data by questions, topics, or participants
Dealing with missing data, skips, blanks, and duplicates
Weighting your data based on known population parameters
Ignoring or imputing missing values/data
Renaming variables or columns to something useful & human
Removing blanks or converting them to "NA" (not applicable)
Reordering columns or splitting groups of tabs into their own data based on similar or important characteristics
Rebasing your data
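A few of the activities above (renaming columns, marking blanks, and dropping duplicates) might look like the following in Python. This is a sketch under assumptions: the column names, the new names, and the rule of keeping the first submission per respondent are all hypothetical choices, not a standard.

```python
# Hypothetical export with tool-generated column names, a blank, and a duplicate row.
raw_rows = [
    {"ResponseId": "r1", "Q1": "5", "Q2": "Yes"},
    {"ResponseId": "r2", "Q1": "", "Q2": "No"},
    {"ResponseId": "r1", "Q1": "5", "Q2": "Yes"},  # duplicate submission
]

# Rename columns to something readable (the new names are assumptions).
renames = {"ResponseId": "respondent", "Q1": "ease_rating", "Q2": "would_recommend"}

cleaned, seen = [], set()
for row in raw_rows:
    if row["ResponseId"] in seen:  # drop duplicates, keeping the first occurrence
        continue
    seen.add(row["ResponseId"])
    # Convert blanks to an explicit "NA" marker so they are easy to spot and count.
    cleaned.append({renames[k]: (v if v.strip() else "NA") for k, v in row.items()})

print(cleaned)
```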

Cleaning and structuring your data is essential when analyzing quantitative data for two reasons: (1) you’re making it easier and faster for you to start your planned analysis and (2) you’re learning about unexpected or interesting patterns and areas to understand further and dissect. For more, check out this link or this public access book. Jump here for more on sample weighting.
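As a rough illustration of weighting by known population parameters, the sketch below gives each group a weight equal to its population share divided by its sample share. The age groups, the population shares, and the responses are hypothetical, and real weighting schemes are often more involved than this.

```python
from collections import Counter

# Hypothetical sample: each respondent's age group and whether they use a feature (1/0).
sample = [("18-34", 1), ("18-34", 1), ("18-34", 0), ("18-34", 1),
          ("35+", 0), ("35+", 1)]

# Assumed known population shares for each age group.
population_share = {"18-34": 0.5, "35+": 0.5}

# Weight per group = population share / sample share.
counts = Counter(group for group, _ in sample)
n = len(sample)
weights = {g: population_share[g] / (counts[g] / n) for g in counts}

unweighted = sum(uses for _, uses in sample) / n
weighted = (sum(weights[g] * uses for g, uses in sample)
            / sum(weights[g] for g, _ in sample))

print(weights)
print(round(unweighted, 3), round(weighted, 3))
```

In this made-up sample the younger group is overrepresented, so its responses are weighted down and the weighted usage rate comes out lower than the unweighted one.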

After structuring, you start the core analytical cycle of quantitative data analysis: the exploratory data analysis (EDA) cycle.

The Exploratory Data Analysis (EDA) Cycle

Exploration and description are your two goals when performing something known as exploratory data analysis or EDA for short. EDA is a process of summarizing and visualizing your data to “explore” what patterns exist in your collected data. You then confirm those patterns in the later stages of your analysis. And like analysis in general, the EDA process is also cyclical.

The diagram below shows how you jump between approaches, focuses, and methods in the EDA cycle.

The cyclical nature of Exploratory Data Analysis (EDA)

Let’s break down each part of the cycle.


EDA: Planned vs. Emergent Approaches

The first micro-cyclical element for exploring your data starts with your approach. Approaches fall into two types: planned and emergent approaches. Planned approaches are things you knowingly set out to do (aka your research questions and hypotheses). A planned approach leads to faster quantitative data analysis because you know exactly what to focus on. Planned analysis steps lead to findings.

The other type is an emergent approach, which you find yourself using when your data is different from what you expected. With this approach, you keep an open mind about what’s there, allowing yourself to notice unexpected or insightful patterns. An emergent analysis leads to insights: non-obvious, relevant, and actionable patterns.

Possible steps for planned and emergent approaches are listed below.

One very helpful emergent step you can take is looking for informative nonresponse. Recall that nonresponse bias occurs when participants can’t or don’t want to participate (or complete the full study). Take a look at your quantitative data. Look at who didn’t respond but was expected to. Review all of the empty cells in your raw data. Describe who didn’t fully complete the study and at what points.

Looking for informative nonresponse is about asking yourself one question: “Is the lack of data or participation evidence of a useful, unexpected pattern?”

For example, this diagram shows two different survey datasets as rows and columns. If you looked at only the blank cells (the evidence of nonresponse), you could see that dataset A doesn’t seem to have a real pattern. In contrast, dataset B clearly shows a large chunk of respondents skipping or clicking “no response” for a sequence of survey questions.

Your data will probably never be this obvious, but if you looked at the actual survey questions, you could learn why certain data is missing (an insight coming out of the EDA cycle).
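One way to look for this kind of pattern is to compute the share of blank cells per question. The sketch below uses a small hypothetical grid shaped like dataset B, with None standing in for a blank cell; the 50% threshold for flagging a question is an arbitrary assumption.

```python
# Hypothetical survey grid: rows are respondents, columns are questions Q1..Q6.
# None marks a blank cell (a skip or "no response").
questions = ["Q1", "Q2", "Q3", "Q4", "Q5", "Q6"]
dataset_b = [
    [5, 4, None, None, None, 3],
    [4, 4, None, None, None, 5],
    [3, 5, 2, 4, 4, 4],
    [5, 3, None, None, None, 4],
]

# Share of respondents who left each question blank.
blank_rate = {
    q: sum(row[i] is None for row in dataset_b) / len(dataset_b)
    for i, q in enumerate(questions)
}

# Flag questions skipped by at least half of respondents (threshold is an assumption).
flagged = [q for q, rate in blank_rate.items() if rate >= 0.5]

print(blank_rate)
print(flagged)
```

A run of adjacent flagged questions is the cue to go back and read those questions, since the wording or placement may explain the skips.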

Sometimes the lack of data is evidence of an important pattern.

Alternate between a planned and an emergent approach; using both gives you speed and flexibility when analyzing your quantitative data.


EDA: Focusing on 1 or 2+ Variables

Next is the focus of your exploration. You could focus on one variable, one question/response, or a column in your data. For example, you might focus on a single column of data showing the survey completion times to see if there’s a pattern. When you’re focusing on just one variable/column, you’re doing univariate analysis (“uni” meaning one).
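A univariate look at completion times could be as simple as the summary below. This is a minimal sketch using Python’s statistics module; the completion times are made-up values in seconds.

```python
from statistics import mean, median, stdev

# Hypothetical survey completion times in seconds (one column of data).
completion_times = [95, 110, 102, 98, 640, 105, 12, 101]

summary = {
    "n": len(completion_times),
    "min": min(completion_times),
    "median": median(completion_times),
    "mean": mean(completion_times),
    "max": max(completion_times),
    "stdev": stdev(completion_times),
}
for name, value in summary.items():
    print(f"{name}: {value}")
```

In this made-up column the mean sits well above the median, which hints at an unusually slow respondent, while the very low minimum might point to someone who rushed through. Both are patterns worth a closer look.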

Other times, you might want to focus on the patterns between two or more variables or columns. This is known as multivariate analysis (or bivariate analysis when you look at exactly two variables). Start your analysis by exploring one variable/column and then move to multiple variables/columns.
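For two categorical columns, a simple cross-tabulation is often a reasonable first look. The sketch below counts each combination of two hypothetical variables (plan type and a yes/no recommendation answer) using only the standard library.

```python
from collections import Counter

# Hypothetical responses: (plan type, would recommend?) for each respondent.
responses = [
    ("free", "yes"), ("free", "no"), ("free", "no"),
    ("paid", "yes"), ("paid", "yes"), ("paid", "no"),
]

# Count each combination of the two variables (a simple cross-tabulation).
crosstab = Counter(responses)

plans = sorted({plan for plan, _ in responses})
answers = sorted({answer for _, answer in responses})
print("plan".ljust(6) + "".join(a.rjust(5) for a in answers))
for plan in plans:
    print(plan.ljust(6) + "".join(str(crosstab[(plan, a)]).rjust(5) for a in answers))
```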

But if you have many, many columns of data, you could spend a significant amount of time just going through all the pairs of columns (if you had ten columns of data, you’d have 45 possible two-column pairs to analyze, or 90 if the order of the columns matters).
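The number of two-column sets depends on whether order matters. With ten columns there are 10 × 9 / 2 = 45 unordered pairs and 10 × 9 = 90 ordered pairs. The short check below uses placeholder column names.

```python
from itertools import combinations
from math import comb, perm

columns = [f"col_{i}" for i in range(1, 11)]  # ten hypothetical column names

unordered_pairs = list(combinations(columns, 2))  # (A, B) and (B, A) count once
print(len(unordered_pairs))    # 45
print(comb(len(columns), 2))   # 45, the same count from n * (n - 1) / 2
print(perm(len(columns), 2))   # 90, if (A, B) and (B, A) are counted separately
```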

That count assumes every column has unique data, but that’s not always true. Once again, if you’ve designed your study to look for specific patterns or test specific hypotheses, only certain column sets will be important to you.

This is yet another reason you need a clear idea of what you expect to find in your data and how you’ll analyze it. Given your study goals, you might be content with performing one or two multivariate analyses on specific pairs of variables instead of going through every possible combination of columns. It’s about analyzing with intention and speed.

EDA: Exploring Numerically or Graphically

Finally, let’s look at how exactly you make sense of your data. You either use numerical methods or graphical or display methods, switching between the two to notice and confirm patterns.
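Switching between the two views can be as lightweight as the sketch below, which builds a frequency table (numerical) and a text bar chart (graphical) from the same hypothetical 1 to 5 ratings. A spreadsheet or plotting tool would do the same job; this is only an illustration.

```python
from collections import Counter

# Hypothetical ratings on a 1-5 scale.
ratings = [4, 5, 3, 4, 4, 2, 5, 4, 1, 4, 5, 3]

# Numerical view: a frequency table.
freq = Counter(ratings)

# Graphical view: a text bar chart built from the same counts.
for value in range(1, 6):
    print(f"{value} | {'#' * freq[value]} ({freq[value]})")
```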

Numerical methods include descriptive statistics (which describe the patterns in your collected data and sample, covered in the next Topic) and inferential statistical methods (which infer or try to understand your sample relative to your larger population or segment). Inferential methods are covered in Topic 3 in this Handbook, so let’s break down descriptive statistics (using both numbers and graphs) in the next topic.

Exploratory data analysis
Data structuring; data cleaning; data wrangling
Descriptive and inferential statistics
Informative nonresponse


Resources