Advanced quant analysis techniques

Complex, Mathematical, Powerful

All the techniques in this final subsection are much more complex than basic descriptive statistics or running a chi-squared test — and that’s the point.

These techniques require a strong grasp of statistical thinking concepts (like probability, distributions, data reduction, multivariate analysis, etc.) and a functional understanding of the underlying mathematics that fuels each technique (such as linear algebra or Bayesian statistics). Like significance tests, all of the techniques have specific requirements you need to meet to use them properly.

One quick note: nearly all these techniques will require advanced statistical software or programming tools (like R or Python). Online calculators exist for some of these techniques, but they tend to be very basic or assume that you’re collecting “perfectly clean” data, which is never the case.

This topic is here to expand your view of quantitative data analysis and get you excited about how and when to use these techniques in your research. If you want to supercharge your quantitative abilities, understanding and applying these techniques appropriately is the best way to do it.

The techniques aren’t ordered in any particular way, so let’s jump right in.

Analysis of Variance (ANOVA)

Many basic statistical techniques use only two groups and compare the differences between means or proportions. But sometimes you want to compare three or more groups. In these cases, a technique known as Analysis of Variance (ANOVA for short) can be helpful.

What gets complex is that using ANOVA means looking for relationships between a qualitative/categorical variable (commonly referred to as a factor) and a quantitative/numerical variable.

For example, you might use ANOVA to see if there’s a relationship between the country where someone lives and uses your product (an independent, categorical variable) and the amount of time someone spends per month using your product (a dependent, numerical variable). Using ANOVA, you can see if – and how much – country affects usage.
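Here’s a minimal sketch of that example in Python, using a one-way ANOVA from SciPy. The usage numbers and country groups below are entirely made up for illustration:

```python
from scipy import stats

# Hypothetical monthly usage (in minutes) for users from three countries.
# The numbers are made up purely for illustration.
usage_us = [320, 290, 410, 380, 305, 355]
usage_uk = [275, 260, 300, 240, 310, 285]
usage_jp = [420, 390, 450, 405, 435, 398]

# One-way ANOVA: does mean usage differ across the three countries?
f_stat, p_value = stats.f_oneway(usage_us, usage_uk, usage_jp)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```

A small p-value suggests that at least one country’s mean usage differs from the others; follow-up (post-hoc) comparisons are needed to say which one.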

You can learn more here.

Factor Analysis

When you get to more advanced techniques, you wind up dealing with lots of data, and studying many variables at one time. For example, you might ask twenty questions in a single survey, meaning you’d have twenty columns of data to analyze.

But how likely is it that every question/column contains unique or meaningful patterns? It’s possible that many of the patterns within your data can be explained or condensed to just a few essential columns of data. This process of data reduction is what factor analysis helps you do.

Factor analysis works on the assumption that much of the variation, differences, and patterns within your multivariate quantitative dataset can be reduced and explained by just a handful of variables (also known as factors). The goal of factor analysis is to narrow your focus from every variable to the few that are most meaningful, so your further analysis is more focused and faster.
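As a rough sketch of what this looks like in practice, here’s a factor analysis in Python with scikit-learn, assuming the twenty survey questions mentioned above sit in a respondents-by-questions matrix. The data and the choice of three factors are made up for illustration; in real work you’d justify the number of factors with eigenvalues or a scree plot:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

# Hypothetical survey responses: 200 respondents x 20 questions on a 1-7 scale.
rng = np.random.default_rng(42)
responses = rng.integers(1, 8, size=(200, 20)).astype(float)

# Try to explain the 20 questions with 3 underlying factors.
fa = FactorAnalysis(n_components=3, random_state=42)
scores = fa.fit_transform(responses)   # each respondent's position on the 3 factors
loadings = fa.components_              # how strongly each question relates to each factor

print(scores.shape)    # (200, 3)
print(loadings.shape)  # (3, 20)
```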

You can read more at these links: resource 1, resource 2, resource 3.

Cluster Analysis

You might choose to visualize your data to see how similar or different it is. When you visualize your data, you can see patterns or trends that might be hidden when you view your quantitative data as only rows and columns.

If groups appear when the data is visualized, you can collect nearby data points into clusters. The data points within a cluster are similar to each other and different from the data points in other clusters.

Cluster analysis is used across a range of domains, but it’s common in experience research to cluster people into meaningful and similar segments (like “power users” or “small-dollar-purchasers”).

Cluster analysis is very useful for discovering groups that “naturally” exist in the data you’ve collected. There are also techniques that begin with an initial number of clusters and use math to determine the smallest number of clusters that is still informative.
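Here’s a minimal sketch of one common approach, k-means clustering, in Python with scikit-learn. The behavioral data and the choice of three clusters are made up for illustration:

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical behavioral data: monthly sessions and average spend per user.
rng = np.random.default_rng(0)
sessions = rng.poisson(lam=12, size=300)
spend = rng.gamma(shape=2.0, scale=15.0, size=300)
X = np.column_stack([sessions, spend])

# Ask for 3 clusters; in practice you'd compare several values of k
# (for example with an elbow plot or silhouette scores) before settling on one.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

print(np.bincount(labels))       # how many users fall into each cluster
print(kmeans.cluster_centers_)   # the "average" user in each cluster
```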

You can read more here: resource 1, resource 2, resource 3.

Generalized Linear Models (GLM)

If you want to make predictions or inferences for future or unobserved events, you might want to look at generalized linear models (GLM or GLiM for short). GLMs form the base for many advanced statistical techniques (see the search terms at the end of this topic), like the ANOVA technique described above.

GLMs take your independent and dependent variables (sometimes called predictor and response variables) and model them as a linear relationship. You can use this linear relationship to predict values with some amount of error (remember that error always exists in quantitative research and can only be reduced, never removed).

This is a very simple explanation for a very complex idea. The helpful resources listed below get into the math. You might not like the math, but without understanding how it works, GLMs are best used in partnership with a quantitative researcher or data scientist (or in the later stages of your own, guided education).
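As one hedged illustration, here’s a logistic regression (a GLM with a binomial family and logit link) in Python with statsmodels. The dataset, column names, and scenario are all invented for the sketch:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Hypothetical data: country and monthly sessions (predictors) and whether a
# user converted to a paid plan (binary response). Entirely made up.
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "country": rng.choice(["US", "UK", "JP"], size=500),
    "sessions": rng.poisson(10, size=500),
})
df["converted"] = rng.binomial(1, 0.25, size=500)

# A GLM models the response as a (transformed) linear function of the predictors.
model = smf.glm("converted ~ country + sessions", data=df,
                family=sm.families.Binomial()).fit()
print(model.summary())
```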

You can read more with these resources: resource 1, resource 2

Resampling Techniques

When you sample a population, you use some people to estimate or infer what’s true about that larger population. But if you only have one sample that produces one estimate for the true population value you’re studying, you don’t know how good that single estimate is.

You might over- or underestimate the true population value but never know it. But what if you could use your sample to create many “new” samples? You’d then have multiple estimates, allowing for a more precise estimate of that true population value.

The math for resampling is basic, but using it can be tedious. You need statistical software to resample. With software (like R), you could resample from one sample hundreds – if not hundreds of thousands – of times to produce new estimates. For these reasons, resampling is becoming a popular and practical way of performing inferential statistics, creating confidence intervals, and testing hypotheses.
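Here’s a minimal bootstrap sketch in Python using only NumPy; the task-time sample is made up for illustration:

```python
import numpy as np

# Hypothetical sample: task completion times (in seconds) from 40 participants.
rng = np.random.default_rng(7)
sample = rng.normal(loc=95, scale=20, size=40)

# Bootstrap: resample with replacement from the one sample you have,
# compute the mean of each resample, and repeat many times.
n_resamples = 10_000
boot_means = np.array([
    rng.choice(sample, size=sample.size, replace=True).mean()
    for _ in range(n_resamples)
])

# A 95% confidence interval for the population mean, taken from the
# percentiles of the bootstrapped means.
low, high = np.percentile(boot_means, [2.5, 97.5])
print(f"Sample mean: {sample.mean():.1f}s, 95% CI: [{low:.1f}, {high:.1f}]")
```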

You can read more here: resource 1, resource 2, resource 3

Bayesian Statistics

Bayes’ theorem
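In its simplest form, the theorem says that your updated belief in a hypothesis H after seeing data D is:

P(H | D) = P(D | H) × P(H) / P(D)

where P(H) is your prior belief (what you thought before collecting the data), P(D | H) is how likely the observed data would be if the hypothesis were true, and P(H | D) is your posterior belief (what you think after seeing the data).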

The final advanced idea is known as Bayesian statistics. This Handbook is built around a school of statistics known as frequentist statistics. Frequentists assume that the true population value is fixed. Statisticians using the frequentist approach believe this fixed true population value can only be estimated accurately with multiple trials, experiments, or samples. Essentially, frequentists believe the more data you collect, the better your estimate of the true population value.

On the other hand, Bayesian statisticians believe that each time you measure something and make an estimate, you’re getting a better understanding of what the true population value could be. You’re updating your beliefs about not only what the true population value is, but what values are even possible in the first place. Bayesian statisticians aren’t focused on collecting more data, but on recognizing what’s different from what was expected and updating their approach accordingly.

A frequentist approach means collecting lots of data in a reliable, consistent way to see patterns that can only be seen with many samples. You approach the true population value as you average or aggregate all of those samples. A Bayesian approach means you’re collecting some data to update your beliefs, and then collecting better, more focused data.

There’s a very simple example below to help explain the differences between the two. It’s not perfect, but it should help make these ideas slightly easier to digest.

Frequentist vs. Bayesian Statistics Example

You're at home and can't find your phone. You have an important phone call in 15 minutes. You begin your search.

  • Frequentist: There are ten rooms in the house. This means you have a 1-in-10 chance of finding your phone in the first room you check. To confirm this, you’d essentially check a room, see if the phone is in there or not, and then check the next closest room. If the phone isn’t in one room, the chance you find it in the next room improves to 1-in-9, and so on, until you find it.
  • Bayesian: You know you don’t really use your phone upstairs because of the bad internet connectivity, which means it can only be in the rooms on the bottom floor. You also know that you like to use your phone in the kitchen. You start your search in the kitchen, using your past experiences and knowledge to limit your search. If your phone isn’t in the kitchen, you’ll continue your search in the other rooms you usually use your phone in.

From a non-researcher’s perspective, the Bayesian approach is easier or more “intuitive” to understand. But in practice, you don’t always have a prior belief or data you can use to refine your approach.

The math required for a Bayesian approach (such as understanding linear algebra and probabilities) is much more complex than that required for a frequentist approach. But understanding the topics covered in this handbook should make learning and using a Bayesian approach easier.
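To make the idea of updating beliefs a little more concrete, here’s a minimal beta-binomial sketch in Python. The prior, the completion counts, and the scenario are all made up for illustration:

```python
from scipy import stats

# Hypothetical question: what share of users will complete a new onboarding flow?
# Prior belief (say, from earlier research): roughly 60%, with some uncertainty.
# A Beta(6, 4) prior has a mean of 0.6.
prior_alpha, prior_beta = 6, 4

# New (made-up) data: 50 users observed, 38 completed the flow.
completions, failures = 38, 12

# Bayesian updating with a beta-binomial model: the posterior is also a Beta
# distribution, with the observed counts added to the prior parameters.
posterior = stats.beta(prior_alpha + completions, prior_beta + failures)

print(f"Posterior mean completion rate: {posterior.mean():.2f}")
print(f"95% credible interval: {posterior.interval(0.95)}")
```

Each time new data arrives, the posterior from the last round can serve as the prior for the next, which is the “updating your beliefs” loop described above.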

You can learn more with these resources: resource 1, resource 2, resource 3

Handbook Closing Thoughts

Quantitative data analysis can and should be rigorous. Numbers can be unforgiving, abstract, and intangible. Statistical techniques can help you make sense of the complexity of people and their behaviors, attitudes, opinions, and expectations.

When you practice and budget for meaningful, cyclical, and skeptical quantitative analysis, you’re training your brain and stakeholders to be structured and logical with what’s learned.

The final tip is to start small! The world of quantitative research is vast and only growing with more and more tools (like software and different analytical strategies) and more practitioners (from data scientists to quant-savvy marketers).

Before considering more complex techniques, doing solid summary statistics and educating your stakeholders is a great place to start. Over time, you can increase your knowledge, confidence, and hunger for effective quantitative findings and insights.

If you’ve confidently analyzed your qualitative and quantitative data, you enter the research study's final phase: reporting.

Search Terms
  • Analysis of Variance (ANOVA); Analysis of Covariance (ANCOVA)
  • Multiple analysis of variance (MANOVA); Multiple analysis of covariance (MANCOVA)
  • General linear models (GLM)
  • Structural equation modeling (SEM)
  • Exploratory factor analysis (EFA); confirmatory factor analysis (CFA); principal component analysis (PCA); eigenvalues; factor loadings; correlation matrix
  • K-means clustering, regression trees, random forests, intra-class and inter-class similarity, partitioning algorithms
  • Linear models, generalized linear models, linear regression, logistic regression, Poisson regression, time-series modeling, data-driven or empirical models vs. theory-driven models
  • Bayes’ formula, credible interval, conditional probabilities
