Electric Insights: Simulation, Prediction, Forecasting, Optimization Electric Insights: Simulation for Better Decisions
Electric Insights
See & Change the Future

Light bulb moments are rare. But what if they occurred at the flip of a switch? And what if that let you see and change the future? Making that happen is our mission, starting with our plan to address a problem in the $90 billion market research industry: an emphasis on scorekeeping at the expense of prediction.

Scorekeeping
  1. "39% of Americans approve of how Joe Biden is handling his job as president."
  2. "Your market share is 12%."
Prediction
  1. "If all Americans (vs. 12%) felt economic conditions were excellent, 67% would approve of Joe Biden's performance."
  2. "If you eliminate all concern about your product's safety, you'll increase your share from 12 to 14%."
The Scorekeeping Problem is Not New

In a 1976 interview, pollster and consultant Lou Harris derided peers, whom he dubbed political eunuchs, for believing their job was done once they'd tallied scores, such as presidential approval ratings. He directed his criticism at George Gallup, who had argued that pollsters should be "fact-finders and scorekeepers, nothing else." Like Gallup, Harris took great care to gauge opinions, attitudes, and experiences. In contrast to Gallup, who coined the term scientific polling decades earlier, Harris believed that all social scientists and all pollsters had a fundamental duty to uncover cause-and-effect relationships. Scorekeeping was a step in that process.

Prediction is Paramount

Throughout his career, Harris advised clients, including John F. Kennedy, on how to raise scores on crucial measures. For JFK, Harris had to estimate the vote-share bump JFK would earn if he emphasized specific issues in the run-up to the 1960 presidential election. To be effective, Harris had to transform typical scorekeeping surveys into platforms for prediction. It was a massive, mathematically intensive challenge and a reason he worked "until 3 or 4 a.m." (See our surveys as experiments FAQ.)

Simulation Supercharged

Percentage point effects simulation supercharges Harris's approach, determining instantly in plain language how changes in opinions, attitudes, and experiences will lift scores and individual probabilities (e.g., a specific person's probability of voting for JFK). It also empowers you to like, rate, and comment on moves you might make. So you'll know what to do, who to target, and what to expect in return.

Percentage Point Effects Simulation

First, a bit of background on modern-day crystal balls (i.e., widely-used prediction methods). Linear regression is excellent for predicting how changes in the values of independent variables (e.g., years of relevant experience) will affect a continuous dependent variable (e.g., salary). But when the dependent variable is binary (e.g., "motivated" or "not motivated"), logistic regression "dominates all other methods in the social and biomedical sciences." It produces "a more accurate description of the world" and lets you interpret each prediction as a valid probability (e.g., a 0.77 probability of being a motivated employee).

Peculiar Predictions No More

Valid probabilities (i.e., between 0 and 1) are important. "If you want to give osteoporosis patients an estimate of their probability of hip fracture in the next 5 years," sociologist Paul Allison wrote, "you won't want to tell them it's 1.05 [as you could with linear regression]." Nor would Lou Harris have wanted to tell JFK he had a -0.13 probability of winning the 1960 presidential election before the voting booths had opened. Logistic regression won't produce peculiar predictions like that.
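To make the contrast concrete, here is a minimal Python sketch. All coefficients and the "risk score" input are invented for illustration (they do not come from Allison's work or from any fitted model). The only point is that a straight line can leave the 0-to-1 range, while the logistic curve cannot.

```python
import math

# Hypothetical coefficients, chosen only to illustrate the shapes of the two models.
LINEAR_INTERCEPT, LINEAR_SLOPE = 0.05, 0.10
LOGIT_INTERCEPT, LOGIT_SLOPE = -2.0, 0.5


def linear_probability(risk_score: float) -> float:
    """Linear model: the prediction is an unbounded straight line."""
    return LINEAR_INTERCEPT + LINEAR_SLOPE * risk_score


def logistic_probability(risk_score: float) -> float:
    """Logistic model: the linear term is passed through the logistic
    function, so the result always falls strictly between 0 and 1."""
    logit = LOGIT_INTERCEPT + LOGIT_SLOPE * risk_score
    return 1.0 / (1.0 + math.exp(-logit))


for score in (-2, 0, 5, 10):
    print(score, round(linear_probability(score), 2), round(logistic_probability(score), 3))
```

At the extremes of the made-up score, the linear model returns values below 0 and above 1, while the logistic model stays inside the valid range.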

What the Hell's a Logit?

In market research, binary dependent variables are everywhere. Yet market researchers rely on logistic regression only rarely because they've been unable to translate logits and odds (conventional measures for describing a predictor variable's impact) into clear findings and recommendations.

Speaking in Tongues

Here's what a researcher might say to a hotel client after using logistic regression: "If all five million of your guests check in remotely, their logit of recommending the hotel will increase by 0.51." Or maybe this: "Guests who check in remotely have 33% higher odds of recommending the hotel than those who check in traditionally." Those statements would leave most people scratching their heads.

From Esoteric to Electric

Percentage point effects simulation would have enabled the researcher to say: "If all five million of your guests check in remotely, the percentage who recommend the hotel will rise from 20 to 24. That translates to an increase of 200,000 guests (from 1 to 1.2 million)." The researcher also could have predicted remote check-in's effect on key segments (e.g., business travelers) and individual guests. That's the kind of clear-cut information most people want and need. Check out our simulations involving Driverless Vehicles, Trump Approval & Gun Policy, Trump Approval & Coronavirus, and Marijuana Legalization.
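One way to picture the calculation behind a statement like that is the sketch below. It is an assumption-laden illustration, not a description of Electric Insights' actual simulator: the guest records, the intercept, the coefficients, and the helper name `simulated_share` are all hypothetical. The idea is to predict each guest's probability twice, once with the data as observed and once with remote check-in switched on for everyone while other predictors are held constant, and then compare the averages.

```python
import math


def sigmoid(logit: float) -> float:
    """Convert a logit into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-logit))


def simulated_share(rows, intercept, coefs, overrides=None):
    """Average predicted probability across rows.

    `overrides` forces the named predictors to fixed values for everyone;
    every other predictor keeps its observed value (is "held constant").
    """
    overrides = overrides or {}
    total = 0.0
    for row in rows:
        scenario = {**row, **overrides}
        logit = intercept + sum(coefs[name] * scenario[name] for name in coefs)
        total += sigmoid(logit)
    return total / len(rows)


# Hypothetical guests and coefficients (not estimates from real data).
guests = [
    {"remote_checkin": 0, "business_traveler": 0},
    {"remote_checkin": 0, "business_traveler": 1},
    {"remote_checkin": 1, "business_traveler": 0},
    {"remote_checkin": 0, "business_traveler": 0},
    {"remote_checkin": 1, "business_traveler": 1},
]
intercept = -1.6
coefs = {"remote_checkin": 0.51, "business_traveler": 0.30}

baseline = simulated_share(guests, intercept, coefs)
all_remote = simulated_share(guests, intercept, coefs, {"remote_checkin": 1})
effect_points = (all_remote - baseline) * 100        # percentage point effect
extra_guests = (all_remote - baseline) * 5_000_000   # scaled to five million guests

print(f"{baseline:.1%} -> {all_remote:.1%} "
      f"({effect_points:+.1f} points, about {extra_guests:,.0f} guests)")
```

Because each guest gets an individual prediction, the same function could be run on a subset of rows (e.g., business travelers only) to report a segment-level effect.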


Applicability

Is percentage point effects simulation right for you? Opportunities abound to figure out how to raise scores, grow groups, and change the future.

Illustrative Group-Growing Opportunities
  • Market Size & Share Tracking: Buyers rather than Non-Buyers (See our market share FAQ.)
  • Usage & Attitudes: Cannabis Users rather than Non-Users
  • Targeting & Segmentation: Chocolate Lovers vs. All Others
  • Customer Experience Monitoring: Promoters vs. All Others (See our CX monitoring systems FAQ.)
  • Brand Tracking: Brand Lovers vs. All Others (See our continuous tracking surveys FAQ.)
  • Concept Testing: Definite/Probable Buyers vs. All Others (See our concept testing FAQ.)
  • Ad Testing: Ad Lovers vs. All Others
  • Pricing: Definite/Probable Buyers at specific price points vs. All Others
Data Needs

Percentage point effects simulation works with any data type—it can enrich or even resuscitate what you've got already.

Starting anew can be a good option, too: the right design can enable optimal simulation. (See our research design FAQ.)

Working Together
FAQs

Binary Dependent Variables

There are three main reasons:

  1. Market researchers are following the lead of academic researchers, who often focus on the sign and statistical significance of logit coefficients, with "little emphasis on the substantive and practical significance of the findings," according to sociologist Richard Williams.
  2. Most major statistical software packages don’t produce percentage point effects through pre-packaged procedures.
  3. It's not possible to represent a percentage point effect with a single number in a summary regression equation (in contrast to logits and linear regression coefficients). The size of the effect will depend on how close the predicted probability (i.e., y-hat) is to 0 or 1 and the values of the model's other variables (i.e., the x's). The effect is slimmer at the top and bottom of the probability scale than in the middle. Alfred DeMaris, a sociologist and statistician, called this an "intractable" problem. We believe the phenomenon reflects reality.

    Consider, for example, a television program like Fox News. Its advertising probably won't persuade diehard Democrats to make Fox their go-to news source. That same advertising probably won't make loyal Fox News viewers even more loyal—they’re already all-in. But Fox's advertising could, and should, be quite effective among people more in the middle. (They're part of what some term the movable middle.)
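The third reason can be illustrated with a short sketch. It applies one and the same logit change to three starting probabilities and reports the resulting percentage point effect. The 0.51 logit change and the three starting values are illustrative assumptions, not estimates from any model.

```python
import math


def logit(probability: float) -> float:
    """Log-odds of a probability."""
    return math.log(probability / (1.0 - probability))


def inverse_logit(value: float) -> float:
    """Probability that corresponds to a log-odds value."""
    return 1.0 / (1.0 + math.exp(-value))


def point_effect(start_probability: float, logit_change: float) -> float:
    """Percentage point change produced by shifting the logit by `logit_change`."""
    new_probability = inverse_logit(logit(start_probability) + logit_change)
    return (new_probability - start_probability) * 100


LOGIT_CHANGE = 0.51  # illustrative value

for start in (0.02, 0.50, 0.98):
    print(f"start {start:.2f}: {point_effect(start, LOGIT_CHANGE):+.1f} points")
```

The identical logit change moves the middle starting point by far more percentage points than either extreme, which is why no single number can summarize the effect.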

George Terhanian asked a team of marketing scientists from Harris Interactive that same question a decade ago. They returned a list of 31 methods.

Compared to other methods (and models), logistic regression produces evidence that’s often more credible and trustworthy. That's one reason experts, such as sociologist Paul Allison, describe it as the "dominant" method for predicting binary dependent variables.

Logistic regression is central to some applications, such as discrete choice modeling (e.g., statistically modeling the choice to stay at one hotel rather than five alternatives) and attribution modeling (e.g., determining how different advertising "touch points" contribute to a desired action, such as purchasing a product).

It's also the go-to method of survey researchers who use propensity score approaches to enhance sample representativeness and improve accuracy. (George Terhanian introduced propensity scoring to the market research community in the late 1990s, as described in Public Opinion Quarterly.) But market researchers turn to logistic regression only rarely for core work (e.g., brand tracking, concept testing), despite the ubiquity of binary dependent variables.

Multinomial logit modeling would be the method of choice. The downside is that it complicates reporting and analysis.

If the starting probability is close to 0 or 1 (as in online advertising where click-through rates are often below 1 percent), the percentage point effect of a change in an otherwise important predictor variable could fall under the radar. An analysis might suggest, for example, that the use of active-voice language in call-to-action display ads, controlling for other variables' effects, increases the click-through probability from .0025 to .0067, a tiny number easy to overlook.

Conventional logistic regression reporting would supply the information (e.g., logits, z scores, odds ratios) needed to reduce the risk of making a mistake: the 0.42 percentage point increase (.0042 on the probability scale) would translate to a 172% odds increase and a one-point logit increase, both of which are substantial. It is a good example of what the statistician Frederick Mosteller termed "balancing biases," or letting "weaknesses from one method...be buttressed by strength from another."
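The conversions in the click-through example can be checked with a few lines of Python. The two probabilities are the rounded ones quoted in the example; everything else follows from the standard definitions of odds and logits. With those rounded inputs, the sketch lands at roughly a 169% odds increase and a logit increase of about 0.99, in line with the rounded figures quoted in the text.

```python
import math


def odds(probability: float) -> float:
    """Odds that correspond to a probability."""
    return probability / (1.0 - probability)


start, end = 0.0025, 0.0067  # click-through probabilities from the example

point_change = (end - start) * 100           # change in percentage points
odds_ratio = odds(end) / odds(start)
odds_increase_pct = (odds_ratio - 1) * 100   # percent increase in the odds
logit_change = math.log(odds_ratio)          # change on the logit scale

print(f"percentage point change: {point_change:.2f}")
print(f"odds increase: {odds_increase_pct:.0f}%")
print(f"logit change: {logit_change:.2f}")
```

The same tiny movement on the probability scale looks large on the odds and logit scales, which is the "balancing biases" point: each scale covers for the other's blind spot.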

Absolutely. If 40% of Americans say they are enthusiastic about the development of driverless vehicles, then any single person's probability of being enthusiastic is .40 (assuming we know nothing else about that person).

Researchers have explored the possibility of estimating causal effects through survey research for decades. The belief is that a high-quality survey, coupled with an analytic method like percentage point effects simulation, can generate estimates of effect equivalent to those from a randomized controlled experiment. Through an experimenter's lens, the survey-based approach would be akin to quasi-experimentation, a method for estimating causal effects without random assignment. (See our research design FAQ.)

Benefits of Percentage Point Effects Simulation

Although many different research methods and modules answer specific questions, they don't necessarily deepen stakeholders' understanding of markets, customers, prospects, and competitors. That's a big problem for companies looking to build and sustain a competitive advantage. Percentage point effects simulation makes it easy to dig beneath the surface to develop that kind of deep understanding.

Let's say a specific action you take grows a key group by 30 percentage points. You'd then need to estimate the total cost of that action. If it's $3 million, then the cost per percentage point increase (or ROI) would be $100,000 (i.e., $3 million/30 percentage points). See also page 645 of The Possible Benefits of Reporting Percentage Point Effects.

Most concept tests (e.g., BASES) estimate the percentage (and number) of definite/probable buyers. We'd reframe that percentage as the probability of buying the eventual product. With that in hand, we'd develop a model to predict it, using diagnostic and other data from the concept test. We'd then package it in a simulator (akin to those on this site). You then could refine the concept before re-testing it (e.g., by emphasizing features that increase purchase probability; the simulator will report the size of every feature's increase). The simulator also will enable you to identify desirable groups (e.g., 30-49-year-old women living in the Northeast) and individuals based on their purchase probability.

Proponents of CX systems market their systems aggressively (e.g., "software to help turn customers into fanatics, products into obsessions, employees into ambassadors, and brands into religions."). And in fact, good CX systems possess many attributes. Through analysis of customer data (e.g., customer experience surveys, customer transactions), for instance, they make it easy to assign scores to individual customers, with those scores representing membership within, or proximity to, a critical group, such as brand promoters or repeat customers. They also make it easy to notify stakeholders of issues that may require attention. But they're missing at least one feature: the ability to produce precise predictions of the likely impact of potential actions on, say, group membership/size. That's where percentage point effects simulation fits in—it's all about producing those predictions.

There are many ways to estimate a variable's importance. One way, stated importance, involves direct questioning (e.g., “How important—not at all important, not too important, somewhat important, very important—is it for a hotel to offer guests a mobile check-in option?"). A second way, derived importance, uses correlation or regression analysis to estimate the degree to which a response to a stated importance question is associated with a response to a key outcome question (e.g., “On a scale of 0-10 where 0 represents not at all satisfied and 10 represents completely satisfied, how would you rate your overall satisfaction with your hotel stay?").

Many research agencies will create a matrix comparing these two importance measures. But that won't show you how a higher rating on a particular variable (e.g., use of a mobile check-in option) would increase the size of a critical group (e.g., the percentage of people who give you a 9 or 10 on the overall satisfaction question). Percentage point effects simulation would report the increase.

Percentage point effects simulation works with all data, including B2B survey data (e.g., an annual customer experience survey) and CRM data (e.g., salesforce.com data).

Yes, it works with any data that can be coded, not just survey data.

It’s similar but typing tools don’t allow you to hold constant the values of predictor variables, in contrast to percentage point effects simulation. Typing tools also don’t make it easy for you to understand how individuals within the same segment differ from one another—percentage point effects simulation will generate a unique prediction (of belonging to the group of interest) for every member of every segment (thereby enabling 1-to-1 communication).

Through Nielsen, IRI, NPD, GfK, or STR, you already may know the size of your market, your share, and the average selling price of products (or services) in particular categories. It should be straightforward, as a next step, to convert those market-wide measures to per-customer spend estimates. Through linkage to other data sources (e.g., brand health data, customer experience, loyalty card data), synchronization, and percentage point effects simulation, you then could estimate how changes you could make would grow (or shrink) the size of your customer base, customer spending, and your market share. (See our prediction accuracy FAQ.) You also could learn more (e.g., sociodemographics, attitudes, behaviors, beliefs, proclivities) about the customers and prospects most likely to contribute to your growth.

Design & Analysis

It depends on the character and quality of the available data. Ideally, that data will include at least one binary dependent variable and several potential drivers, as in our simulations.

We think of percentage point effects simulation as a quasi-experiment. An implication from a design perspective is that you should conceptualize your survey or information system as a platform for estimating causal effects. The platform should include the variables needed to enable percentage point effects simulation. In principle, the dependent variable—the one you’re trying to grow or shrink—should be a true dichotomy from the start (rather than, say, a 10-point scale you transform to a dichotomy post hoc). And the predictor variables, aside from any sociodemographic ones, should be potential levers.

If you're trying to figure out what question types to include as predictor variables in a survey, follow best practice. Recently conducted research suggests that unipolar, four-category, fully-anchored scales work well (in terms of usability, reliability, and validity) across modes (e.g., mobile, online, telephone, face-to-face, mail) and languages. Think about using two-point scales, too—they ease interpretation. You also should keep in mind the concept of linking and syncing. It involves enhancing the extent to which the data sources on which you rely share common factors (e.g., sampling frames, data collection dates, questions). (See our data linkage FAQ.)

Data linkage is the act of putting together different data sources to enhance the usefulness of the combined information. It also can involve using data from one source to adjust data from another—we call that a link and sync process. For instance, a brand might use key data from market size and share reporting as a check and basis for adjusting survey information (e.g., self-reports of purchase behavior) from a brand health or customer experience survey. It would be akin to how Pew, Gallup, Ipsos, or YouGov use Census data (e.g., population percentages for key demographics) to adjust survey data. To describe that process, they would say something like this: results were weighted for age within sex, region, and race-ethnicity…to align them with their population proportions. Although data linkage may seem sensible in theory, it can be difficult to apply, particularly when the targeted data sources were neither designed nor conceived of as neatly-fitting puzzle pieces.
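The weighting step mentioned above (aligning survey results with known population proportions) can be sketched as simple post-stratification. The age cells, population shares, and responses below are hypothetical, and real weighting schemes used by polling organizations are typically more elaborate (e.g., several variables adjusted together), so treat this as a minimal illustration of the idea only.

```python
from collections import Counter


def poststratification_weights(sample_cells, population_shares):
    """Weight for each cell = population share / sample share."""
    counts = Counter(sample_cells)
    sample_size = len(sample_cells)
    return {
        cell: population_shares[cell] / (counts[cell] / sample_size)
        for cell in counts
    }


def weighted_share(responses, weights):
    """Weighted percentage answering yes; `responses` holds (cell, said_yes) pairs."""
    yes_weight = sum(weights[cell] for cell, said_yes in responses if said_yes)
    total_weight = sum(weights[cell] for cell, _ in responses)
    return yes_weight / total_weight


# Hypothetical survey: younger adults are over-represented in the sample.
responses = (
    [("18-29", True)] * 3 + [("18-29", False)] * 1
    + [("30+", True)] * 2 + [("30+", False)] * 4
)
population_shares = {"18-29": 0.25, "30+": 0.75}  # assumed external benchmark

weights = poststratification_weights([cell for cell, _ in responses], population_shares)
unweighted = sum(said_yes for _, said_yes in responses) / len(responses)
weighted = weighted_share(responses, weights)

print(f"unweighted: {unweighted:.1%}, weighted: {weighted:.1%}")
```

The same pattern would apply when a brand uses market size and share figures, rather than Census data, as the benchmark for adjusting self-reported purchase behavior.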

Those systems lack built-in modules for producing and reporting percentage point effects, so custom programming would be needed.

After you assess all plausible scenarios, develop a plan to increase the target population's probability of belonging to the group of interest (e.g., customers). A successfully-executed plan will increase the size (and, where applicable, spend) of the key group. You'll also need to think through how difficult it may be to make a particular change (or changes).

Interpreting the effect of a change in the value of a sociodemographic predictor variable on a key outcome can be tricky. Imagine that you select the 18-29-year-old value for the Age variable and hold constant (i.e., keep "as is") the values of all other predictor variables in our marijuana legalization simulator. In return, the simulator would churn out:

  1. A 0.75 simulated probability of supporting marijuana's legalization, with the .08 difference from the .68 starting probability (Pew, 2019) being the percentage point effect, and
  2. The associated 192 million (75%) adults who would support marijuana's legalization if all 255 million adults were 18-29 years old.
Knowing those numbers can be important for strategy, planning, and policy. To assume that all 255 million adults are (or could be) 18-29, however, is a stretch. That's why some people hold constant the values of sociodemographic variables when estimating population-wide effects. If need be, they also can zoom in on specific sub-populations and even particular individuals.

We would need to identify a handful of plausible values for the continuous variable and include them in the simulator. We also could convert a categorical predictor variable into a continuous one and take the same steps.

Miscellaneous

Our simulator produces predictions with 95% confidence intervals, though some clients prefer that we not show them. With that said, those intervals can be an important safety check. In general, predictions based on larger sample sizes are more trustworthy than those based on smaller ones. By reporting 95% confidence intervals, or lower and upper estimates, we quantify the trustworthiness of our predictions. Here's how you should interpret 95% confidence intervals: "If the study had been fielded (or had the same data been collected), say, 100 times, and nothing else had changed, then our predictions would lie within the upper and lower estimates 95 times out of 100."

That statement assumes that there were no biases other than those associated with how cases (e.g., people) were selected (i.e., sampled). That may not be realistic. In survey research, potential biases include non-coverage error, non-response error, question wording, and question order. Other data types (e.g., point-of-sale) may suffer from different biases.
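The link between sample size and interval width can be shown with a simplified sketch. It uses the textbook normal-approximation interval for a single percentage, which is an assumption made here for illustration; the simulator's own interval calculation is not described in this text and may differ. The 0.68 share and the two sample sizes are hypothetical inputs.

```python
import math


def proportion_interval(share: float, sample_size: int, z: float = 1.96):
    """Approximate 95% interval for a proportion (normal approximation),
    clipped to the valid 0-to-1 range."""
    margin = z * math.sqrt(share * (1.0 - share) / sample_size)
    return max(0.0, share - margin), min(1.0, share + margin)


for n in (500, 5000):
    lower, upper = proportion_interval(0.68, n)
    print(f"n={n}: {lower:.3f} to {upper:.3f} (width {upper - lower:.3f})")
```

The interval narrows as the sample grows, but it captures sampling variability only; the other biases listed above are not reflected in it.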

In logistic regression analysis, those terms are synonyms for a variable (i.e., “an attribute that describes a person, place, thing, or idea") on the right-hand side of the equation. They explain or predict the binary dependent variable, or y. Sometimes, they can be thought of as levers, actions, linchpins, or drivers (e.g., a mobile check-in option at a hotel)—it depends partly on what they describe (e.g., a person, place, thing, or idea).

The model is unlikely to change meaningfully from month to month. Our suggestion, absent additional information, is to update it once or twice a year. That will give you time to implement the actions you've identified to affect the key outcome(s) from the tracker.

Electric Insights is a new start-up, so we don't have many clients yet. With that said, we've worked with data sets covering several broad areas, including patient outcomes, sports performance, hospitality, public relations, public opinion, and sales performance.

We'd start by reviewing your objectives and the data available to support them. If you have what you need, we'd work with you to identify the group(s) (e.g., customers) of interest. We'd then build one or more logistic regression models to understand which predictor variables move the needle. Once we decide on the ultimate model(s), we'd generate all predicted probabilities before building the simulator(s). We'd also make sure you understand how to use them. If you don't have the data you need, we'd advise you on how to produce it. Topics we typically touch on include research design, data linkage, and anything else that contributes to better data, reporting, analysis, and decisions.


If the future is like the past, our predictions will be accurate.

About

George Terhanian

Words like simplify, predict, and advise resonate with George Terhanian, founder of Electric Insights. The experiences he had as a basketball player and coach shape his view. Every player wanted to know how to increase their shooting percentage. They were looking for simple numbers: the expected increase in percentage points associated with any change they could make. Then they'd decide whether to put in the effort to make that change. Terhanian knows that CEOs, CMOs, brand managers, insights professionals, and others want that same thing: predictions of how their actions will affect critical outcomes.

Experience

Terhanian has held C-level roles and has worked for The NPD Group, Toluna, and Harris Interactive, as his curriculum vitae shows. He has also served as a board or advisory group member for the National Academy of Sciences, the US Department of Education, the Advertising Research Foundation, the Insights Association, and the British Polling Council.

Earlier in his career, he taught and coached in public and private schools. The basketball teams he helped coach at the Episcopal Academy in Philadelphia won three league titles, with an overall record of 75 wins and 6 losses.

Academic Background

Terhanian holds a Ph.D. from the University of Pennsylvania, a master’s degree from Harvard, and his undergraduate degree from Haverford College. He's known for conceiving of the idea of using propensity score matching to make survey data more accurate.

His work is published in several refereed journals. The UK’s Market Research Society named The Possible Benefits of Reporting Percentage Point Effects a finalist for "best paper in 2019's International Journal of Market Research."

Contact

Question? Drop a note!
New York, US
+1-646-430-3420
Info@ElectricInsights.com