Light bulb moments are rare. But what if they occurred at the flip of a switch? And what if that let you see and change the future? Making that happen is our mission, starting with our plan to address a problem in the $90 billion market research industry: an emphasis on scorekeeping at the expense of prediction.
In a 1976
Throughout his career, Harris advised clients, including John F. Kennedy, on how to raise scores on crucial measures. For JFK, Harris had to estimate the vote-share bump JFK would earn if he emphasized specific issues in the run-up to the 1960 presidential election. To be effective, Harris had to transform typical scorekeeping surveys into platforms for prediction. It was a massive, mathematically intensive challenge and a reason he worked
First, a bit of background on modern-day crystal balls. Linear regression is excellent for predicting how changes in the values of independent variables (e.g., years of experience) will affect a continuous dependent variable (e.g., salary). But when the dependent variable is binary (e.g., "engaged" or "not engaged"), logistic regression "
Binary dependent variables are everywhere in market research. Yet market researchers rely on logistic regression only rarely because they've been unable to translate logits and odds (conventional measures for describing a predictor variable's impact) into clear findings and recommendations. Here's what a researcher might say to a hotel client after using logistic regression: "If all five million of your guests check in remotely, their logit of recommending the hotel will increase by 0.51." Or maybe this: "Guests who check in remotely have 33% higher odds of recommending the hotel than those who check in traditionally." Those statements would leave most people scratching their heads.
From Esoteric to Electric
Percentage point effects simulation would have enabled the researcher to say: "If all five million of your guests check in remotely, the percentage who recommend the hotel will rise from 20 to 24. That translates to an increase of 200,000 guests (from 1 to 1.2 million)." The researcher also could have predicted remote check-in's effect on key segments (e.g., business travelers) and individual guests. That's the kind of clear-cut information most people want and need. Check out our simulations involving Trust in Pollsters and NBA 3-Pointers.
Is percentage point effects simulation right for you? Opportunities abound to figure out how to raise scores, grow groups, and change the future.
Percentage point effects simulation works with any data type—it can enrich or even resuscitate what you've got already.
Starting anew can be a good option, too: the right design can enable optimal simulation. (See our research design FAQ.)
Send us your data. We'll return simulators in as little as a day.
We'll handle everything from survey (or information-system) design to data collection, reporting, analysis, and consulting.
We'll give you the keys to the factory.
We'd be happy to work on retainer.
There are three main reasons:
It's not possible to represent a percentage point effect with a single number in a summary regression equation (in contrast to logits and linear regression coefficients). The size of the effect will depend on how close the predicted probability (i.e., y-hat) is to 0 or 1 and the values of the model's other variables (i.e., the x's). The effect is slimmer at the top and bottom of the probability scale than in the middle. Alfred DeMaris, a sociologist and statistician, called this an "intractable" problem.
We believe the phenomenon reflects reality. Consider, for example, a television network like Fox News. Its advertising probably won't persuade diehard Democrats to make Fox their go-to news source. That same advertising probably won't make loyal Fox News viewers even more loyal, because they're already all-in. But Fox's advertising could, and should, be quite effective among people more in the middle. (They're part of what some term the movable middle.)
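The dependence on the starting probability can be checked with a few lines of arithmetic. The following is a minimal Python sketch, not a description of any particular model: the logit shift of 0.5 (`SHIFT`) and the starting probabilities are hypothetical values chosen for illustration. It applies the same logit change at several starting points and reports the resulting change in percentage points.

```python
import math

def logit(p):
    """Log-odds of a probability p (0 < p < 1)."""
    return math.log(p / (1.0 - p))

def inv_logit(z):
    """Convert a logit back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def point_effect(start_prob, logit_shift):
    """Percentage point change from adding logit_shift at start_prob."""
    return 100.0 * (inv_logit(logit(start_prob) + logit_shift) - start_prob)

# Hypothetical logit coefficient for a single lever (illustrative only).
SHIFT = 0.5

# The same logit shift is applied at every starting probability.
for start in (0.02, 0.25, 0.50, 0.75, 0.98):
    print(f"start {start:.2f}: {point_effect(start, SHIFT):+.1f} points")
```

Under these assumptions, the identical logit shift moves the probability far less near 0 or 1 than near the middle of the scale, which is why no single number can summarize the percentage point effect.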
Compared to other methods (and models), logistic regression produces evidence that’s often more credible and trustworthy. That's one reason experts, such as sociologist Paul Allison, describe it as the
Logistic regression is central to some applications, such as discrete choice modeling (e.g., statistically modeling the choice to stay at one hotel rather than five alternatives) and attribution modeling (e.g., determining how different advertising "touch points" contribute to a desired action, such as purchasing a product).
It's also the go-to method of survey researchers who use propensity score approaches to enhance sample representativeness and improve accuracy. (
Multinomial logit modeling would be the method of choice. The downside is that it complicates reporting and analysis.
If the starting probability is close to 0 or 1 (as in online advertising, where click-through rates are often below 1 percent), the percentage point effect of a change in an otherwise important predictor variable could fly under the radar. An analysis might suggest, for example, that the use of active-voice language in call-to-action display ads, controlling for other variables' effects, increases the click-through probability from .0025 to .0067, a tiny change that is easy to overlook.
Conventional logistic regression reporting would supply the information (e.g., logits, z scores, odds ratios) needed to reduce the risk of making a mistake: the 0.42 percentage point increase would translate to a 172% odds increase and a one-point logit increase, both of which are substantial. It is a good example of what the statistician Frederick Mosteller termed "balancing biases," or letting "weaknesses from one method...be buttressed by strength from another."
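The click-through example can be re-expressed on all three scales with a short calculation. This Python sketch uses only the two rounded probabilities quoted above (.0025 and .0067); the function names are our own. Because the inputs are rounded, the results land close to, rather than exactly on, the 172% and one-point figures.

```python
import math

def odds(p):
    """Odds corresponding to a probability p."""
    return p / (1.0 - p)

def compare(p_before, p_after):
    """Express one change in probability on three scales."""
    odds_ratio = odds(p_after) / odds(p_before)
    return {
        "points": 100.0 * (p_after - p_before),   # percentage point change
        "odds_pct": 100.0 * (odds_ratio - 1.0),   # percent change in odds
        "logit": math.log(odds_ratio),            # change in log-odds
    }

result = compare(0.0025, 0.0067)
for scale, value in result.items():
    print(f"{scale}: {value:.3f}")
```

Reporting the three measures side by side is one way to apply the "balancing biases" idea: the tiny percentage point change and the large odds change describe the same shift.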
Absolutely. If 40% of Americans say they are enthusiastic about the development of driverless vehicles, then any single person's probability of being enthusiastic is .40 (assuming we know nothing else about that person).
A crosstabulation ("crosstab") reports the relationship between two or more categorical variables. It might show, for example, that 44% of men and 36% of women are enthusiastic about the development of driverless vehicles. It tells you how things are. Percentage point effects simulation, in contrast, tells you how things would be had survey respondents responded differently. In short, it produces predictions.
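To make the contrast with a crosstab concrete, here is a minimal Python sketch of the general idea behind this kind of simulation: take a fitted logistic regression, set one predictor to a chosen value for everyone while holding the other predictors "as is," and average the predicted probabilities. The coefficients, variable names (`remote_checkin`, `business_traveler`), and guest records are hypothetical, and the averaging step is a common approach rather than a description of how the simulators on this site are built.

```python
import math

# Hypothetical coefficients from an already fitted logistic regression
# (illustrative values only, not estimates from real data).
COEFS = {"intercept": -1.6, "remote_checkin": 0.4, "business_traveler": 0.3}

# Hypothetical guest records: 1 = yes, 0 = no.
GUESTS = [
    {"remote_checkin": 0, "business_traveler": 0},
    {"remote_checkin": 0, "business_traveler": 1},
    {"remote_checkin": 1, "business_traveler": 0},
    {"remote_checkin": 0, "business_traveler": 0},
    {"remote_checkin": 1, "business_traveler": 1},
    {"remote_checkin": 0, "business_traveler": 1},
]

def predict(guest):
    """Predicted probability of recommending the hotel for one guest."""
    z = COEFS["intercept"] + sum(COEFS[name] * value for name, value in guest.items())
    return 1.0 / (1.0 + math.exp(-z))

def simulate(guests, lever, value):
    """Average predictions as-is and with `lever` set to `value` for everyone,
    holding every other predictor at its observed value."""
    as_is = sum(predict(g) for g in guests) / len(guests)
    what_if = sum(predict({**g, lever: value}) for g in guests) / len(guests)
    return as_is, what_if, 100.0 * (what_if - as_is)

as_is, what_if, effect = simulate(GUESTS, "remote_checkin", 1)
print(f"as is: {as_is:.1%}, what if: {what_if:.1%}, effect: {effect:+.1f} points")

# The same call works for a segment, e.g., business travelers only.
business = [g for g in GUESTS if g["business_traveler"] == 1]
print(simulate(business, "remote_checkin", 1))
```

A crosstab would stop at the observed percentages; the sketch goes one step further by asking what the percentage would be under a different set of responses.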
Researchers have explored the possibility of estimating causal effects through survey research for decades. The belief is that a high-quality survey, coupled with an analytic method like percentage point effects simulation, can generate estimates of effect equivalent to those from a randomized controlled experiment. Through an experimenter's lens, the survey-based approach would be akin to quasi-experimentation, a method for estimating causal effects without random assignment. (See our
Although many different research methods and modules answer specific questions, they don't necessarily deepen stakeholders' understanding of markets, customers, prospects, and competitors. That's a big problem for companies looking to build and sustain a competitive advantage. Percentage point effects simulation makes it easy to dig beneath the surface to develop that kind of deep understanding.
Let's say a specific action you take grows a key group by 30 percentage points. You'd then need to estimate the total cost of that action. If it's $3 million, then the cost per percentage point increase (or ROI) would be $100,000 (i.e., $3 million/30 percentage points). See also page 645 of
Most concept tests (e.g., BASES) estimate the percentage (and number) of definite/probable buyers. We'd reframe that percentage as the probability of buying the eventual product. With that in hand, we'd develop a model to predict it, using diagnostic and other data from the concept test. We'd then package it in a simulator (akin to those on this site). You then could refine the concept before re-testing it (e.g., by emphasizing features that increase purchase probability; the simulator will report the size of each feature's increase). The simulator also will enable you to identify desirable groups (e.g., 30-49-year-old women living in the Northeast) and individuals based on their purchase probability.
Proponents of CX systems market their systems aggressively (e.g., "software to help turn customers into fanatics, products into obsessions, employees into ambassadors, and brands into religions."). And in fact, good CX systems possess many attributes. Through analysis of customer data (e.g., customer experience surveys, customer transactions), for instance, they make it easy to assign scores to individual customers, with those scores representing membership within, or proximity to, a critical group, such as brand promoters or repeat customers. They also make it easy to notify stakeholders of issues that may require attention. But they're missing at least one feature: the ability to produce precise predictions of the likely impact of potential actions on, say, group membership/size. That's where percentage point effects simulation fits in—it's all about producing those predictions.
There are many ways to estimate a variable's importance. One way, stated importance, involves direct questioning (e.g., "How important (not at all important, not too important, somewhat important, very important) is it for a hotel to offer guests a mobile check-in option?"). A second way, derived importance, uses correlation or regression analysis to estimate the degree to which a response to a stated importance question is associated with a response to a key outcome question (e.g., "On a scale of 0-10 where 0 represents not at all satisfied and 10 represents completely satisfied, how would you rate your overall satisfaction with your hotel stay?").
Many research agencies will create a matrix comparing these two importance measures. But that won't show you how a higher rating on a particular variable (e.g., use of a mobile check-in option) would increase the size of a critical group (e.g., the percentage of people who give you a 9 or 10 on the overall satisfaction question). Percentage point effects simulation would report the increase.
Percentage point effects simulation works with all data, including B2B survey data (e.g., an annual customer experience survey) and CRM data (e.g., salesforce.com data).
Yes, it works with any data that can be coded, not just survey data.
It's similar, but typing tools, unlike percentage point effects simulation, don't allow you to hold constant the values of predictor variables. Typing tools also don't make it easy for you to understand how individuals within the same segment differ from one another. Percentage point effects simulation will generate a unique prediction (of belonging to the group of interest) for every member of every segment, thereby enabling 1-to-1 communication.
Through Nielsen, IRI, NPD, GfK, or STR, you already may know the size of your market, your share, and the average selling price of products (or services) in particular categories. It should be straightforward, as a next step, to convert those market-wide measures to per-customer spend estimates. Through linkage to other data sources (e.g., brand health data, customer experience, loyalty card data), synchronization, and percentage point effects simulation, you then could estimate how changes you could make would grow (or shrink) the size of your customer base, customer spending, and your market share. (See our
It depends on the character and quality of the available data. Ideally, that data will include at least one binary dependent variable and several potential drivers, as in our simulations.
We think of percentage point effects simulation as a quasi-experiment. An implication from a design perspective is that you should conceptualize your survey or information system as a platform for estimating causal effects. The platform should include the variables needed to enable percentage point effects simulation. In principle, the dependent variable—the one you’re trying to grow or shrink—should be a true dichotomy from the start (rather than, say, a 10-point scale you transform to a dichotomy post hoc). And the predictor variables, aside from any socio-demographic ones, should be potential levers.
If you're trying to figure out what question types to include as predictor variables in a survey, follow best practice. Recently conducted research suggests that unipolar, four-category, fully anchored scales work well (in terms of usability, reliability, and validity) across modes (e.g., mobile, online, telephone, face-to-face, mail) and languages. Think about using two-point scales, too, because they ease interpretation.
You also should keep in mind the concept of linking and syncing. It involves enhancing the extent to which the data sources on which you rely share common factors (e.g., sampling frames, data collection dates, questions). (See our
Data linkage is the act of putting together different data sources to enhance the usefulness of the combined information. It also can involve using data from one source to adjust data from another; we call that a link and sync process. For instance, a brand might use key data from market size and share reporting as a check and basis for adjusting survey information (e.g., self-reports of purchase behavior) from a brand health or customer experience survey. It would be akin to how Pew, Gallup, Ipsos, or YouGov use Census data (e.g., population percentages for key demographics) to adjust survey data. To describe that process, they would say something like this: "results were weighted for age within sex, region, and race-ethnicity…to align them with their population proportions." Although data linkage may seem sensible in theory, it can be difficult to apply, particularly when the targeted data sources were neither designed nor conceived of as neatly fitting puzzle pieces.
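The weighting adjustment mentioned above can be illustrated with a simplified Python sketch. It shows one-variable post-stratification (weight = population share divided by sample share), which is a simpler stand-in for the multi-variable weighting that polling organizations describe. The age groups, sample, population shares, and purchase responses are all hypothetical.

```python
from collections import Counter

def poststratification_weights(sample_groups, population_shares):
    """Weight for each group = population share / sample share."""
    counts = Counter(sample_groups)
    n = len(sample_groups)
    return {group: population_shares[group] / (counts[group] / n) for group in counts}

def weighted_share(values, groups, weights):
    """Weighted proportion of respondents whose value is 1."""
    total = sum(weights[g] for g in groups)
    return sum(weights[g] * v for v, g in zip(values, groups)) / total

# Hypothetical survey sample: age group of each respondent.
SAMPLE = ["18-34"] * 2 + ["35-54"] * 3 + ["55+"] * 5
# Hypothetical population proportions (stand-ins for Census figures).
POPULATION = {"18-34": 0.30, "35-54": 0.35, "55+": 0.35}
# Hypothetical self-reports: 1 = bought the brand, 0 = did not.
BOUGHT = [1, 1, 0, 1, 0, 0, 0, 0, 1, 0]

weights = poststratification_weights(SAMPLE, POPULATION)
unweighted = sum(BOUGHT) / len(BOUGHT)
weighted = weighted_share(BOUGHT, SAMPLE, weights)
print(f"unweighted: {unweighted:.1%}, weighted: {weighted:.1%}")
```

The same logic would apply when a market size and share report, rather than the Census, supplies the target proportions.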
Those systems lack built-in modules for producing and reporting percentage point effects, so custom programming would be needed.
After you assess all plausible scenarios, develop a plan to increase the target population's probability of belonging to the group of interest (e.g., customers). A successfully executed plan will increase the size (and, where applicable, spend) of the key group. You'll also need to think through how difficult it may be to make a particular change (or changes).
Interpreting the effect of a change in the value of a socio-demographic predictor variable on a key outcome can be tricky. Imagine that you select the 18-29-year-old value for the Age variable and hold constant (i.e., keep "as is") the values of all other predictor variables in our marijuana legalization simulator. In return, the simulator would churn out:
We would need to identify a handful of plausible values for the continuous variable and include them in the simulator. We also could convert a categorical predictor variable into a continuous one and take the same steps.
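As a rough illustration of the first step, the Python sketch below evaluates a continuous predictor at a handful of plausible values while holding the other predictor at each respondent's observed value. The coefficients, the variables (`nights`, `loyalty`), the respondents, and the candidate values are all hypothetical; a real simulator would draw them from the fitted model and the data.

```python
import math

# Hypothetical fitted coefficients (illustrative only, not real estimates).
INTERCEPT = -2.0
COEF_NIGHTS = 0.15    # continuous predictor: nights stayed per year
COEF_LOYALTY = 0.60   # binary predictor: loyalty-program member

# Hypothetical respondents.
RESPONDENTS = [
    {"nights": 2, "loyalty": 0},
    {"nights": 4, "loyalty": 1},
    {"nights": 1, "loyalty": 0},
    {"nights": 9, "loyalty": 1},
]

# A handful of plausible values to offer in the simulator.
PLAUSIBLE_NIGHTS = (1, 3, 5, 7)

def predict(nights, loyalty):
    """Predicted probability for one combination of predictor values."""
    z = INTERCEPT + COEF_NIGHTS * nights + COEF_LOYALTY * loyalty
    return 1.0 / (1.0 + math.exp(-z))

def scenario_table(respondents, values):
    """Average predicted probability if everyone had each candidate value."""
    return {
        value: sum(predict(value, r["loyalty"]) for r in respondents) / len(respondents)
        for value in values
    }

table = scenario_table(RESPONDENTS, PLAUSIBLE_NIGHTS)
for value, prob in table.items():
    print(f"nights = {value}: {prob:.1%}")
```

Each entry in the table would correspond to one selectable option for the continuous variable in the simulator.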
Our simulator produces predictions with 95% confidence intervals, though some clients prefer that we not show them. With that said, those intervals can be an important safety check. In general, predictions based on larger sample sizes are more trustworthy than those based on smaller ones. By reporting 95% confidence intervals, or lower and upper estimates, we quantify the trustworthiness of our predictions. Here's how you should interpret 95% confidence intervals: "If the study had been fielded (or had the same data been collected), say, 100 times, and nothing else had changed, then our predictions would lie within the upper and lower estimates 95 times out of 100."
That statement assumes that there were no biases other than those associated with how cases (e.g., people) were selected (i.e., sampled). That may not be realistic. In survey research, potential biases include non-coverage error, non-response error, question wording, and question order. Other data types (e.g., point-of-sale) may suffer from different biases.
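The link between sample size and interval width can be seen in a small Python sketch. It uses the textbook normal-approximation interval for a simple proportion, which is an assumption made for illustration and not necessarily the method behind the simulator's intervals. The predicted proportion (0.24) and the sample sizes are hypothetical, and the interval reflects sampling error only, not the other biases listed above.

```python
import math

Z_95 = 1.96  # standard normal critical value for a 95% interval

def proportion_interval(p_hat, n, z=Z_95):
    """Normal-approximation interval for a proportion (sampling error only)."""
    margin = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return max(0.0, p_hat - margin), min(1.0, p_hat + margin)

def width(p_hat, n):
    """Distance between the upper and lower estimates."""
    low, high = proportion_interval(p_hat, n)
    return high - low

# Hypothetical predicted proportion evaluated at three sample sizes.
for n in (100, 400, 1600):
    low, high = proportion_interval(0.24, n)
    print(f"n = {n}: {low:.3f} to {high:.3f}")
```

Under this approximation, quadrupling the sample size halves the width of the interval.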
In logistic regression analysis, those terms are synonyms for a variable (i.e., "an attribute that describes a person, place, thing, or idea") on the right-hand side of the equation. They explain or predict the binary dependent variable, or y. Sometimes, they can be thought of as levers, actions, linchpins, or drivers (e.g., a mobile check-in option at a hotel); it depends partly on what they describe (e.g., a person, place, thing, or idea).
The model is unlikely to change meaningfully from month to month. Our suggestion, absent additional information, is to update it once or twice a year. That will give you time to implement the actions you've identified to affect the key outcome(s) from the tracker.
We'd start by reviewing your objectives and the data available to support them. If you have what you need, we'd work with you to identify the group(s) (e.g., customers) of interest. We'd then build one or more logistic regression models to understand which predictor variables move the needle. Once we decide on the ultimate model(s), we'd generate all predicted probabilities before building the simulator(s). We'd also make sure you understand how to use them.
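The modeling steps in that workflow can be sketched in Python under some loud assumptions. The data are hypothetical cell counts, the fitting routine is a bare-bones gradient ascent written for illustration (a statistics package would normally do this job), and "generate all predicted probabilities" is interpreted here as precomputing a prediction for every combination of predictor values. The names `fit_logistic` and `build_lookup` are our own.

```python
import itertools
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(rows, outcomes, steps=5000, rate=0.5):
    """Fit intercept and coefficients by gradient ascent on the mean log-likelihood."""
    weights = [0.0] * (len(rows[0]) + 1)  # weights[0] is the intercept
    for _ in range(steps):
        gradient = [0.0] * len(weights)
        for x, y in zip(rows, outcomes):
            error = y - sigmoid(weights[0] + sum(w * v for w, v in zip(weights[1:], x)))
            gradient[0] += error
            for j, v in enumerate(x):
                gradient[j + 1] += error * v
        weights = [w + rate * g / len(rows) for w, g in zip(weights, gradient)]
    return weights

def build_lookup(weights, levels):
    """Precompute a predicted probability for every combination of predictor values."""
    lookup = {}
    for combo in itertools.product(*levels):
        z = weights[0] + sum(w * v for w, v in zip(weights[1:], combo))
        lookup[combo] = sigmoid(z)
    return lookup

# Hypothetical survey cells:
# (remote_checkin, business_traveler) -> (guests, recommenders)
CELLS = {(0, 0): (10, 2), (1, 0): (10, 4), (0, 1): (10, 3), (1, 1): (10, 6)}
ROWS, OUTCOMES = [], []
for cell, (n, yes) in CELLS.items():
    ROWS += [cell] * n
    OUTCOMES += [1] * yes + [0] * (n - yes)

weights = fit_logistic(ROWS, OUTCOMES)
lookup = build_lookup(weights, levels=[(0, 1), (0, 1)])
for combo, prob in lookup.items():
    print(combo, f"{prob:.1%}")
```

A simulator front end could then read its predictions straight from a table like `lookup` instead of re-running the model for each scenario.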
If you don't have the data you need, we'd advise you on how to produce it. Topics we typically touch on include
If the future is like the past, our predictions will be accurate.
Words like simplify, predict, and advise resonate with George Terhanian, founder of Electric Insights. The experiences he had as a basketball player and coach shape his view. Every player wanted to know how to increase their shooting percentage. They were looking for simple numbers: the expected increase in percentage points associated with any change they could make. Then they'd decide whether to put in the effort to make a particular change. Terhanian knows that CEOs, CMOs, brand managers, insights professionals, and others want that same thing: trustworthy predictions of how their actions will affect critical outcomes.
Terhanian has held C-level roles and worked for The NPD Group, Toluna, and Harris Interactive, as his
Earlier in his career, he taught and coached in public and private schools. The basketball teams he helped coach at the Episcopal Academy in Philadelphia won three league titles, with an overall record of 75 wins and 6 losses.
Terhanian holds a Ph.D. from the University of Pennsylvania, a master’s degree from Harvard, and his undergraduate degree from Haverford College. He's known for conceiving of the idea of using propensity score matching to make survey data more accurate.
His work is published in several refereed journals. The UK’s Market Research Society named The Possible Benefits of Reporting Percentage Point Effects a finalist for "best paper in 2019's International Journal of Market Research."
Hit? Stand? Double? Master "Likely Effects" to Make the Right Call is Terhanian's most recent work. It is published in Quirk's.