Light bulb moments are rare. But what if they occurred at the flip of a switch? And what if that enabled you to see and change the future?
Making that happen is our mission, starting with our plan to
tackle a big problem in the $90 billion "insights" industry: a massive over-reliance
on scorekeeping at the expense of prediction.
In a 1976 interview, pollster and consultant Lou Harris derided peers, whom he dubbed political eunuchs, for believing their job was done once they had tallied scores, such as presidential approval ratings. He directed his criticism at George Gallup, who had argued that pollsters should be
"fact-finders and scorekeepers, nothing else."
Like Gallup, Harris took great care to gauge opinions, attitudes, and experiences.
Unlike Gallup, who coined the term scientific polling decades earlier, Harris believed that all social scientists and all pollsters had a fundamental duty to
produce roadmaps that showed the way to higher (or lower) scores.
A great example is not hard to find.
In the run-up to the 1960 presidential election,
Harris predicted the vote-share bump his client John F. Kennedy
would earn by emphasizing specific issues in his battle with Richard Nixon.
To accomplish that feat, Harris broke new ground by transforming typical scorekeeping surveys into platforms for prediction.
It was an enormous, mathematically intensive challenge and a reason he worked "until 3 or 4 a.m." (See our surveys as experiments FAQ.)
Percentage point effects simulation supercharges Harris's approach, determining instantly in plain language how changes in opinions, attitudes, and experiences will lift scores and individual probabilities (e.g., a specific person's probability of voting for JFK).
It also empowers you to like, rate, and comment on moves you might make. So you'll know what to do, who to target, and what to expect in return.
See our simulations involving
Trust in Pollsters,
NBA 3-Pointers, and
Steph Curry 3-Pointers.
First, a bit of background on modern-day crystal balls. Linear regression is excellent for predicting how changes in the values of independent variables (e.g., years of experience) will affect a continuous dependent variable (e.g., salary). But when the dependent variable is binary (e.g., "engaged" or "not engaged"), logistic regression "dominates
all other methods in the social and biomedical sciences," producing
"a more accurate description of the world."
Binary dependent variables (that is, two mutually exclusive groups) are ubiquitous, yet "insights" industry researchers rely on logistic regression only rarely because
they have been unable to translate logits and odds (the conventional measures for describing a predictor variable's impact) into clear findings and recommendations.
Here is what a researcher might say to a hotel client after using logistic regression: "If all five million of your guests check in remotely, their logit of recommending the hotel will increase by 0.51." Or maybe this: "Guests who check in remotely have 33% higher odds of recommending the hotel than those who check in traditionally."
Those statements would leave most people scratching their heads.
Percentage point effects simulation would have enabled the researcher to say: "If all five million of your guests check in remotely, the percentage who recommend the hotel will rise from 20% to 24%. That translates to an increase of 200,000 guests (from 1 to 1.2 million)."
The researcher also could have predicted remote check-in's effect on key segments (e.g., business travelers) and individual guests. That's the kind of clear-cut information most people want and need.
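The translation from logits to percentage points can be sketched in a few lines of Python. This is a minimal illustration, not our production simulator: the coefficients, the guest records, and the function names (inv_logit, recommend_probability, headcount_change) are all hypothetical. The idea is to predict each guest's probability twice, once as observed and once with remote check-in switched on for everyone, and then compare the averages.

```python
import math

def inv_logit(z):
    """Convert a logit (log-odds) into a probability."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients on the logit scale (assumptions for illustration).
INTERCEPT = -1.9
B_REMOTE = 0.5      # checking in remotely
B_BUSINESS = 0.6    # being a business traveler

# Hypothetical guest records: 1 = yes, 0 = no.
guests = [
    {"remote": 0, "business": 0},
    {"remote": 0, "business": 1},
    {"remote": 1, "business": 0},
    {"remote": 0, "business": 0},
    {"remote": 1, "business": 1},
    {"remote": 0, "business": 1},
]

def recommend_probability(guest, remote=None):
    """Predicted probability of recommending; optionally override check-in mode."""
    remote_value = guest["remote"] if remote is None else remote
    z = INTERCEPT + B_REMOTE * remote_value + B_BUSINESS * guest["business"]
    return inv_logit(z)

def average(values):
    return sum(values) / len(values)

# Score as things are, then as they would be if every guest checked in remotely.
baseline = average([recommend_probability(g) for g in guests])
all_remote = average([recommend_probability(g, remote=1) for g in guests])
effect_points = 100 * (all_remote - baseline)

print(f"Baseline: {100 * baseline:.1f}% recommend")
print(f"If everyone checks in remotely: {100 * all_remote:.1f}% recommend")
print(f"Percentage point effect: {effect_points:+.1f}")

def headcount_change(population, start_pct, end_pct):
    """Translate a whole-number percentage point change into a count of people."""
    return population * (end_pct - start_pct) // 100

# The hotel example: 5 million guests, 20% rising to 24%.
print(headcount_change(5_000_000, 20, 24))  # 200000
```

Because recommend_probability returns a value for each guest, the same sketch also yields segment-level and individual-level predictions.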
Is percentage point effects simulation right for you? Opportunities abound to figure out how to raise scores, grow groups, and change the future.
Percentage point effects simulation works with any data type—it can enrich or even resuscitate what you've got already.
Starting anew can be a good option, too: the right design can enable optimal simulation. (See our research design FAQ.)
Send us your data. We'll return simulators in as little as a day.
We'll handle everything from survey (or information-system) design to data collection, reporting, analysis, and consulting.
We'll give you the keys to the factory.
We'd be happy to work on retainer.
There are three main reasons:
It's not possible to represent a percentage point effect with a single number in a summary regression equation (in contrast to logits and linear regression coefficients). The size of the effect will depend on how close the predicted probability (i.e., y-hat) is to 0 or 1 and the values of the model's other variables (i.e., the x's). The effect is slimmer at the top and bottom of the probability scale than in the middle. Alfred DeMaris, a sociologist and statistician, called this an "intractable" problem.
We believe the phenomenon reflects reality. Consider, for example, a television program like Fox News. Its advertising probably won't persuade diehard Democrats to make Fox their go-to news source. That same advertising probably won't make loyal Fox News viewers even more loyal—they’re already all-in. But Fox's advertising could, and should, be quite effective at influencing people more in the middle.
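A short Python sketch makes the point concrete. It applies the same shift on the logit scale to three starting probabilities and reports the resulting percentage point effect. The shift of 0.5 and the starting probabilities are assumptions chosen for illustration, and the function names are ours.

```python
import math

def logit(p):
    """Convert a probability into a logit (log-odds)."""
    return math.log(p / (1.0 - p))

def inv_logit(z):
    """Convert a logit back into a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def point_effect(p_start, logit_shift):
    """Percentage point change produced by a fixed shift on the logit scale."""
    p_end = inv_logit(logit(p_start) + logit_shift)
    return 100 * (p_end - p_start)

# Same logit shift, three different starting points (hypothetical values).
for p_start in (0.02, 0.50, 0.98):
    effect = point_effect(p_start, logit_shift=0.5)
    print(f"start {p_start:.2f}: {effect:+.2f} percentage points")
```

The effect is largest for the person starting at .50 and much smaller for the two people near the ends of the scale, which is why no single number can summarize it.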
George Terhanian asked a team of marketing scientists from Harris Interactive that same question a decade ago. They returned a list of 31 methods.
Compared to other methods (and models), logistic regression produces evidence that’s often more credible and trustworthy. That's one reason experts, such as sociologist Paul Allison, describe it as the "dominant" method for predicting binary dependent variables.
Logistic regression is central to some applications, such as discrete choice modeling (e.g., statistically modeling the choice to stay at one hotel rather than five alternatives) and attribution modeling (e.g., determining how different advertising "touch points" contribute to a desired action, such as purchasing a product).
It's also the go-to method of survey researchers who use propensity score approaches to enhance sample representativeness and improve accuracy. (George Terhanian introduced propensity scoring to the "insights" industry community in the late 1990s, as described in Public Opinion Quarterly.) But "insights" industry researchers turn to logistic regression only rarely for core work (e.g., brand tracking, concept testing), despite the ubiquity of binary dependent variables.
Multinomial logit modeling would be the method of choice. The downside is that it complicates reporting and analysis.
If the starting probability is close to 0 or 1 (as in online advertising where click-through rates are often below 1 percent), the percentage point effect of a change in an otherwise important predictor variable could fall under the radar. An analysis might suggest, for example, that the use of active-voice language in call-to-action display ads, controlling for other variables' effects, increases the click-through probability from .0025 to .0067, a tiny change that is easy to overlook.
Conventional logistic regression reporting would supply the information (e.g., logits, z scores, odds ratios) needed to reduce the risk of making a mistake: the 0.42 percentage point increase (.0042 on the probability scale) would translate to an odds increase of roughly 172% and a logit increase of roughly one point, both of which are substantial. It is a good example of what the statistician Frederick Mosteller termed "balancing biases," or letting "weaknesses from one method...be buttressed by strength from another."
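The arithmetic behind that comparison can be checked with a few lines of Python. This sketch uses only the two rounded click-through probabilities quoted above, so its results are approximate; the function and variable names are ours.

```python
import math

def odds(p):
    """Convert a probability into odds."""
    return p / (1.0 - p)

p_before, p_after = 0.0025, 0.0067  # rounded probabilities from the example

point_change = 100 * (p_after - p_before)    # change in percentage points
odds_ratio = odds(p_after) / odds(p_before)  # multiplicative change in odds
logit_change = math.log(odds_ratio)          # change on the logit scale

print(f"Percentage point change: {point_change:.2f}")
print(f"Odds ratio: {odds_ratio:.2f}")
print(f"Logit change: {logit_change:.2f}")
```

With these rounded inputs the odds ratio comes out near 2.7 and the logit change near 1, so a change of less than half a percentage point still corresponds to a large relative effect.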
Absolutely. If 40% of Americans say they are enthusiastic about the development of driverless vehicles, then any single person's probability of being enthusiastic is .40 (assuming we know nothing else about that person).
A crosstabulation ("crosstab") does not change the underlying (i.e., recorded, observed, original) data. It tells you how things are. Percentage point effects simulation tells you how things would be if the underlying data were to change. It produces predictions.
Researchers have explored the possibility of estimating causal effects through survey research for decades. The belief is that a high-quality survey, coupled with an analytic method like percentage point effects simulation, can generate estimates of effect equivalent to those from a randomized controlled experiment. Through an experimenter's lens, the approach would be akin to quasi-experimentation, a method for estimating causal effects without random assignment. (See our research design FAQ.)
Although many different research methods and modules answer specific questions, they don't necessarily deepen stakeholders' understanding of markets, customers, prospects, and competitors. That's a big problem for companies looking to build and sustain a competitive advantage. Percentage point effects simulation makes it easy to dig beneath the surface to develop that kind of deep understanding.
Let's say a specific action you take grows a key group by 30 percentage points. You'd then need to estimate the total cost of that action. If it's $3 million, then the cost per percentage point increase (or ROI) would be $100,000 (i.e., $3 million/30 percentage points). See also page 645 of The Possible Benefits of Reporting Percentage Point Effects.
Most concept tests (e.g., BASES) estimate the percentage (and number) of definite/probable buyers. We'd reframe that percentage as the probability of buying the eventual product. With that in hand, we'd develop a model to predict it, using diagnostic and other data from the concept test. We'd then package it in a simulator (akin to those on this site). You then could refine the concept (e.g., by emphasizing features that increase purchase probability—the simulator will report the size of any/every feature's increase)—before re-testing it. The simulator also will enable you to identify desirable groups (e.g., 30-49-year-old women living in the Northeast) and individuals based on their purchase probability.
Proponents of CX systems market their systems aggressively (e.g., "software to help turn customers into fanatics, products into obsessions, employees into ambassadors, and brands into religions."). And in fact, good CX systems possess many attributes. Through analysis of customer data (e.g., customer experience surveys, customer transactions), for instance, they make it easy to assign scores to individual customers, with those scores representing membership within, or proximity to, a critical group, such as brand promoters or repeat customers. They also make it easy to notify stakeholders of issues that may require attention. But they're missing at least one feature: the ability to produce precise predictions of the likely impact of potential actions on, say, group membership/size. That's where percentage point effects simulation fits in—it's all about producing those predictions.
There are many ways to estimate a variable's importance. One way, stated importance, involves direct questioning (e.g., "How important (not at all important, not too important, somewhat important, very important) is it for a hotel to offer guests a mobile check-in option?"). A second way, derived importance, uses correlation or regression analysis to estimate the degree to which a response to a stated importance question is associated with a response to a key outcome question (e.g., "On a scale of 0-10 where 0 represents not at all satisfied and 10 represents completely satisfied, how would you rate your overall satisfaction with your hotel stay?").
Many research agencies will create a matrix comparing these two importance measures. But that won't show you how a higher rating on a particular variable (e.g., use of a mobile check-in option) would increase the size of a critical group (e.g., the percentage of people who give you a 9 or 10 on the overall satisfaction question). Percentage point effects simulation would report the increase.
Percentage point effects simulation works with all data, including B2B survey data (e.g., an annual customer experience survey) and CRM data (e.g., Salesforce data).
Yes, it works with any data that can be coded, not just survey data.
It’s similar but typing tools don’t allow you to hold constant the values of predictor variables, in contrast to percentage point effects simulation. Typing tools also don’t make it easy for you to understand how individuals within the same segment differ from one another—percentage point effects simulation will generate a unique prediction (of belonging to the group of interest) for every member of every segment (thereby enabling 1-to-1 communication).
Through Nielsen, IRI, NPD, GfK, or STR, you already may know the size of your market, your share, and the average selling price of products (or services) in particular categories. It should be straightforward, as a next step, to convert those market-wide measures to per-customer spend estimates. Through linkage to other data sources (e.g., brand health data, customer experience, loyalty card data), synchronization, and percentage point effects simulation, you then could estimate how changes you could make would grow (or shrink) the size of your customer base, customer spending, and your market share. (See our prediction accuracy FAQ.) You also could learn more (e.g., socio-demographics, attitudes, behaviors, beliefs, proclivities) about the customers and prospects most likely to contribute to your growth.
It depends on the character and quality of the available data. Ideally, that data will include at least one binary dependent variable and several potential drivers, as in our simulations.
We think of percentage point effects simulation as a quasi-experiment. An implication from a design perspective is that you should conceptualize your survey or information system as a platform for estimating causal effects. The platform should include the variables needed to enable percentage point effects simulation. In principle, the dependent variable—the one you’re trying to grow or shrink—should be a true dichotomy from the start (rather than, say, a 10-point scale you transform to a dichotomy post hoc). And the predictor variables, aside from any socio-demographic ones, should be potential levers.
If you're trying to figure out what question types to include as predictor variables in a survey, follow best practice. Recently conducted research suggests that unipolar, four-category, fully-anchored scales work well (in terms of usability, reliability, and validity) across modes (e.g., mobile, online, telephone, face-to-face, mail) and languages. Think about using two-point scales, too—they ease interpretation.
You also should keep in mind the concept of linking and syncing. It involves enhancing the extent to which the data sources on which you rely share common factors (e.g., sampling frames, data collection dates, questions). (See our data linkage FAQ.)
Data linkage is the act of putting together different data sources to enhance the usefulness of the combined information. It also can involve using data from one source to adjust data from another—we call that a link and sync process. For instance, a brand might use key data from market size and share reporting as a check and basis for adjusting survey information (e.g., self-reports of purchase behavior) from a brand health or customer experience survey. It would be akin to how Pew, Gallup, Ipsos, or YouGov use Census data (e.g., population percentages for key demographics) to adjust survey data. To describe that process, they would say something like this: results were weighted for age within sex, region, and race-ethnicity…to align them with their population proportions. Although data linkage may seem sensible in theory, it can be difficult to apply, particularly when the targeted data sources were neither designed nor conceived of as neatly-fitting puzzle pieces.
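The weighting step can be sketched in Python as follows. It is a simplified illustration that adjusts on a single variable (age group) rather than the age-within-sex, region, and race-ethnicity cells mentioned above. The population shares, respondent records, and names are hypothetical.

```python
from collections import Counter

# Hypothetical population shares (e.g., taken from Census data).
population_share = {"18-34": 0.30, "35-54": 0.35, "55+": 0.35}

# Hypothetical survey respondents; "bought" is a self-reported purchase.
respondents = (
    [{"age": "18-34", "bought": 1}] * 3
    + [{"age": "18-34", "bought": 0}] * 2
    + [{"age": "35-54", "bought": 1}] * 1
    + [{"age": "35-54", "bought": 0}] * 2
    + [{"age": "55+", "bought": 0}] * 2
)

n = len(respondents)
sample_count = Counter(r["age"] for r in respondents)

# Weight = population share divided by sample share for the respondent's group.
weight = {g: population_share[g] / (sample_count[g] / n) for g in sample_count}

def weighted_share(group):
    """Share of the weighted sample that falls in the given age group."""
    total = sum(weight[r["age"]] for r in respondents)
    return sum(weight[r["age"]] for r in respondents if r["age"] == group) / total

unweighted = sum(r["bought"] for r in respondents) / n
weighted = (
    sum(weight[r["age"]] * r["bought"] for r in respondents)
    / sum(weight[r["age"]] for r in respondents)
)

print(f"Unweighted purchase rate: {unweighted:.3f}")
print(f"Weighted purchase rate:   {weighted:.3f}")
```

After weighting, each age group's share of the sample matches its population share, and the purchase rate shifts because the over-represented group no longer counts for more than it should.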
Those systems lack built-in modules for producing and reporting percentage point effects, so custom programming would be needed.
After you assess all plausible scenarios, develop a plan to increase the target population's probability of belonging to the group of interest (e.g., customers). A successfully executed plan will increase the size (and, where applicable, spend) of the key group. You'll also need to think through how difficult it may be to make a particular change (or changes).
We would need to identify a handful of plausible values for the continuous variable and include them in the simulator. We also could convert a categorical predictor variable into a continuous one and take the same steps.
Our simulator produces predictions with 95% confidence intervals, though some clients prefer that we not show them. With that said, those intervals can be an important safety check.
In general, predictions based on larger sample sizes are more trustworthy than those based on smaller ones. By reporting 95% confidence intervals, or lower and upper estimates, we quantify the trustworthiness of our predictions. Here's how you should interpret 95% confidence intervals: "If the study had been fielded (or had the same data been collected), say, 100 times, and nothing else had changed, then the lower and upper estimates would capture the true value about 95 times out of 100."
That statement assumes that there were no biases other than those associated with how cases (e.g., people) were selected (i.e., sampled). That may not be realistic. In survey research, potential biases include non-coverage error, non-response error, question wording, and question order. Other data types (e.g., point-of-sale) may suffer from different biases.
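To show how sample size drives the width of an interval, here is a minimal Python sketch using the textbook normal-approximation (Wald) interval for a proportion. It is an illustration only and not necessarily how our simulator computes its intervals. The function name, the 40% estimate, and the sample sizes are assumptions.

```python
import math

def wald_interval(p_hat, n, z=1.96):
    """Approximate 95% interval for a proportion (normal approximation)."""
    standard_error = math.sqrt(p_hat * (1.0 - p_hat) / n)
    lower = max(0.0, p_hat - z * standard_error)
    upper = min(1.0, p_hat + z * standard_error)
    return lower, upper

# Same estimate (40%), two hypothetical sample sizes.
for n in (400, 1600):
    lower, upper = wald_interval(0.40, n)
    print(f"n={n}: {100 * lower:.1f}% to {100 * upper:.1f}%")
```

Quadrupling the sample size halves the width of the interval. Like the interpretation above, the sketch accounts only for sampling variability and not for the other biases listed.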
In logistic regression analysis, those terms are synonyms for a variable (i.e., “an attribute that describes a person, place, thing, or idea") on the right-hand side of the equation. They explain or predict the binary dependent variable, or y. Sometimes, they can be thought of as levers, actions, linchpins, or drivers (e.g., a mobile check-in option at a hotel)—it depends partly on what they describe (e.g., a person, place, thing, or idea).
The model is unlikely to change meaningfully from month to month. Our suggestion, absent additional information, is to update it once or twice a year. That will give you time to implement the actions you've identified to affect the key outcome(s) from the tracker.
Electric Insights is a new start-up, so we don't have many clients yet. With that said, we've worked with data sets covering several broad areas, including patient outcomes, sports performance, hospitality, public relations, public opinion, and sales performance.
We'd start by reviewing your objectives and the data available to support them. If you have what you need, we'd work with you to identify the group(s) (e.g., customers) of interest. We'd then build one or more logistic regression models to understand which predictor variables move the needle. Once we decide on the ultimate model(s), we'd generate all predicted probabilities before building the simulator(s). We'd also make sure you understand how to use them.
If you don't have the data you need, we'd advise you on how to produce it. Topics we typically touch on include
data linkage and any other topic that contributes to better data, reporting, analysis, and decisions.
If the future is like the past, our predictions will be accurate.
Words like simplify, predict, and advise resonate with George Terhanian, founder of Electric Insights. The experiences he had as a basketball player and coach shape his view. Every player wanted to know how to increase their shooting percentage.
They were looking for simple numbers: the expected increase in percentage points associated with any change they could make. Then they'd decide whether to put in the effort to make a particular change. Terhanian knows that CEOs, CMOs, brand managers, insights professionals, and others want that same thing: trustworthy predictions of how their actions will affect critical outcomes.
Terhanian has held C-level roles and worked for The NPD Group, Toluna, and Harris Interactive, as his curriculum vitae shows. He has also served as a board or advisory group member for the National Academy of Sciences, the US Department of Education, the Advertising Research Foundation, the Insights Association, and the British Polling Council.
Earlier in his career, he taught and coached in public and private schools. The basketball teams he helped coach at the Episcopal Academy in Philadelphia won three league titles, with an overall record of 75 wins and 6 losses.
Terhanian holds a Ph.D. from the University of Pennsylvania, a master's degree from Harvard, and an undergraduate degree from Haverford College. He's known for conceiving the idea of using propensity score matching to make survey data more accurate.
His work is published in several refereed journals.
The UK’s Market Research Society named The Possible Benefits of Reporting Percentage Point Effects a finalist for "best paper in 2019's International Journal of Market Research."
Hit? Stand? Double? Master "Likely Effects" to Make the Right Call is Terhanian's most recent work. It is published in Quirk's.