1. How does Scoring Science improve my decision-making?
A: Scoring Science uses the power of predictive modeling to replace guesswork in decision-making. If you have decisions that involve interpreting large amounts of customer or client data, Scoring Science can be used to predict the outcomes of your actions, such as whether customers are likely to respond to your promotional campaigns.
2. What is modeling?
A: Think about the processes you use every day to make decisions and judgments: given a wealth of past experiences and a variety of observations about the question at hand, you weigh the facts and make a determination. You are, in effect, modeling. Statistical modeling is the standardization of this process. Scoring models, in particular, seek to explain the relationship between a critical variable (such as response to a marketing campaign) and a group of factors that help predict its outcome (such as the demographic characteristics of the individuals to whom the campaign will be directed). The goal is to replace guesswork with objective analysis.
3. How reliable are my results?
A: Even the most sophisticated models in the world are only as good as the data used to build them. That said, Scoring Science employs some of the most effective and robust modeling techniques available. This includes a rating function that automatically evaluates the accuracy of the models it creates. Furthermore, whenever you deploy a model built with Scoring Science, it generates an expected response rate and a confidence interval that predicts the range in which your results will fall 95 times out of 100.
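To make the idea of a 95% confidence interval concrete, here is a minimal Python sketch. It assumes the common normal approximation for a proportion; the FAQ does not say how Scoring Science computes its interval, and the function name and the figures are hypothetical.

```python
import math

def response_rate_interval(responders, contacted, z=1.96):
    """Approximate 95% interval for a response rate (normal approximation).

    This is a textbook formula used for illustration only; it is not
    taken from Scoring Science.
    """
    rate = responders / contacted
    margin = z * math.sqrt(rate * (1 - rate) / contacted)
    # Clamp to the valid range for a proportion.
    return rate, max(0.0, rate - margin), min(1.0, rate + margin)

# Hypothetical campaign: 50 responders out of 1,000 people contacted.
rate, low, high = response_rate_interval(50, 1000)
print(f"expected rate {rate:.3f}, 95% interval {low:.3f} to {high:.3f}")
```

The interval narrows as the number of people contacted grows, which is one reason larger datasets give more dependable results.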
4. Do I need to be mathematically savvy to use Scoring Science?
A: Absolutely not. Scoring Science enables you to build and deploy sophisticated predictive models without the need to apply or decipher complex mathematical functions. It is very helpful, however, to have experience working with the data that you input so that you can better interpret the results of your work.
5. What is a score and how do I use it?
A: Scoring Science uses a scoring system to rank a population in order of probability. When building or deploying a model, Scoring Science assigns a score between 0 and 1 to each data point, whether the data point represents an individual or a geographic region such as a census tract or block group. Data points with scores closer to 1 are more likely to be of the type that you are looking for than data points with scores closer to 0. In other words, using the above example of a promotional campaign, the closer a score is to 1, the more likely it is to represent a customer who will respond positively to the campaign.
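As an illustration of how scores can be used to rank a population, the following Python sketch sorts a handful of customers from most to least likely to respond. The customer IDs and score values are made up for the example.

```python
# Hypothetical scores for four customers; each score lies between 0 and 1.
scores = {"C001": 0.82, "C002": 0.15, "C003": 0.47, "C004": 0.91}

# Guard against values outside the expected 0-to-1 range.
for customer_id, score in scores.items():
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score for {customer_id} is outside 0..1")

# Rank customers from highest score (most likely to respond) to lowest.
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)

for rank, (customer_id, score) in enumerate(ranked, start=1):
    print(rank, customer_id, score)
```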
6. Can't I just use the statistical functions in Excel? Why do I need Scoring Science?
A: Scoring Science offers functionality far superior to the basic tools found in Excel and other spreadsheet applications. Using Excel is analogous to sticking with a typewriter when word processing programs are available. You might get basic results, but you are going to struggle in the process and make a lot of mistakes that are difficult to correct.
1. What is correlation and how do I use it?
A: Correlation is a statistical concept that measures the strength of a relationship between two variables. When building a model, you want to choose predictor variables that are highly correlated with your target variable. For example, if you were deciding where to locate a new luxury goods store, you would include predictor variables that describe a potential area's average income level and age because these factors are likely to correlate well with luxury purchases. Whether or not people exercise on a regular basis is probably less relevant.
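For readers who want to see correlation as a number, the sketch below computes the Pearson correlation coefficient in plain Python. The data are invented to mirror the luxury store example, and the FAQ does not state which correlation measure Scoring Science itself uses.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical figures for five areas.
avg_income = [40, 55, 70, 85, 100]            # average income, in thousands
luxury_sales = [10, 14, 22, 27, 35]           # target: luxury purchases
exercise_rate = [0.30, 0.45, 0.25, 0.50, 0.35]  # share who exercise regularly

r_income = pearson(avg_income, luxury_sales)
r_exercise = pearson(exercise_rate, luxury_sales)
print(f"income vs. sales: {r_income:.2f}")
print(f"exercise vs. sales: {r_exercise:.2f}")
```

In this made-up data, income tracks luxury sales closely while exercise does not, so income would be the stronger candidate predictor.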
2. In the User Guide, it says that you must formulate your target variable using a Boolean variable. What is a Boolean variable?
A: A Boolean variable is a variable for which there are only two possible answers, such as Yes or No, or True or False. For example, if you are looking for a good location to open a new store, you would assign True to indicate profitable locations and False to indicate unprofitable locations. Using profit information about existing store locations, you can build a model to predict whether or not prospective locations are likely to be profitable.
3. How do I turn my question into a Boolean variable?
A: The first step is to name your target variable, which in the above case might be called "Profitable Locations." The next step is to decide on a threshold to use in determining what qualifies as a profitable location. For example:
Profitable Store = Sales greater than $500,000 per year
In this case, a True value designates areas that contain stores with sales greater than $500,000 annually.
ArcView provides an option for creating a new data field as a Boolean variable. You use this option to assign a value of True to store locations with sales greater than $500,000, and a value of False to store locations with sales less than or equal to $500,000. Now you have a target variable with only two possible outcomes: True or False.
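The same thresholding logic can be sketched outside ArcView in a few lines of Python. The field names and sales figures below are hypothetical; only the $500,000 threshold comes from the example above.

```python
SALES_THRESHOLD = 500_000  # dollars per year, from the example above

# Hypothetical existing stores with their annual sales.
stores = [
    {"store_id": "S1", "annual_sales": 750_000},
    {"store_id": "S2", "annual_sales": 500_000},
    {"store_id": "S3", "annual_sales": 320_000},
]

# True only when sales are strictly greater than the threshold;
# sales less than or equal to the threshold map to False.
for store in stores:
    store["profitable"] = store["annual_sales"] > SALES_THRESHOLD

for store in stores:
    print(store["store_id"], store["profitable"])
```

Note that a store with sales of exactly $500,000 is assigned False, matching the "less than or equal to" rule.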
4. What guidelines should I use when choosing predictor variables?
A: The following are general guidelines to keep in mind when choosing predictor variables:
- More is not necessarily better. A few good predictor variables are typically all that you need.
- Choose only variables whose relationship to the target you can justify and explain.
- Avoid including two predictor variables that are highly correlated with each other, such as monthly income and yearly income.
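One way to apply the last guideline is to screen every pair of candidate predictors and flag pairs that move almost in lockstep. The Python sketch below is an illustration under assumptions: the 0.9 cutoff, the variable names, and the data are hypothetical and are not a Scoring Science rule.

```python
import math
from itertools import combinations

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical candidate predictors for five records.
predictors = {
    "monthly_income": [3, 4, 5, 6, 7],       # thousands
    "yearly_income": [36, 48, 60, 72, 84],   # thousands (12 x monthly)
    "median_age": [52, 31, 45, 28, 39],
}

CUTOFF = 0.9  # assumed threshold for "highly correlated"

# Flag every pair whose absolute correlation meets the cutoff.
redundant_pairs = [
    (a, b)
    for a, b in combinations(predictors, 2)
    if abs(pearson(predictors[a], predictors[b])) >= CUTOFF
]
print(redundant_pairs)
```

For each flagged pair, keeping only one of the two variables is usually enough, since the second adds little new information.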
5. How many records do I need to build a successful model?
A: There are no hard and fast rules regarding the minimum amount of data needed to build a model. Generally speaking, the bigger the dataset, the better the outcome. There are several factors to keep in mind:
- The dataset used to build the model should be similar in composition to the dataset on which the model will be deployed.
- Scoring Science randomly selects 20% of the records as a "hold-out set" on which to test the validity of the model.
- If you must use a very small dataset (say, 50 records), keep in mind that the results will be less reliable than those from a larger dataset of, say, 10,000 records.
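The hold-out idea can be sketched in Python as a random 80/20 split. This is an illustration only: the function name and the fixed seed are assumptions, and the FAQ does not describe how Scoring Science performs its own selection.

```python
import random

def split_holdout(records, holdout_fraction=0.2, seed=0):
    """Randomly set aside a fraction of records for testing the model."""
    rng = random.Random(seed)      # fixed seed so the example is repeatable
    shuffled = list(records)       # copy, so the caller's list is unchanged
    rng.shuffle(shuffled)
    n_holdout = round(len(shuffled) * holdout_fraction)
    return shuffled[n_holdout:], shuffled[:n_holdout]

# With 100 records, 80 are used to build the model and 20 to test it.
build_set, holdout_set = split_holdout(list(range(100)))
print(len(build_set), len(holdout_set))

# With only 50 records, the hold-out set shrinks to 10 records.
small_build, small_holdout = split_holdout(list(range(50)))
print(len(small_build), len(small_holdout))
```

The second call shows why very small datasets are a concern: a test on 10 records says much less about a model than a test on 2,000.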
6. What should I do if I need to build more advanced models?
A: Contact Stone Analytics at firstname.lastname@example.org or call (877) 503-7540. Stone Analytics offers custom model building services for organizations that need the assistance of experienced statisticians.
1. How do I select the Unique Identifier in my data?
A: The Unique Identifier is the field in your data that provides a distinct ID for every record. If your data focuses on individuals, it may be a customer ID number or Social Security number; for aggregate data, it may be the block group or ZIP code. No two entries in the Unique Identifier column of your data can be the same.
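Before choosing a field as the Unique Identifier, it can help to confirm that no value repeats. The Python sketch below shows one way to check this outside the product; the function, field names, and records are hypothetical.

```python
from collections import Counter

def duplicate_ids(records, id_field):
    """Return the ID values that appear on more than one record."""
    counts = Counter(record[id_field] for record in records)
    return sorted(value for value, count in counts.items() if count > 1)

# Hypothetical customer records; "C002" appears twice.
records = [
    {"customer_id": "C001", "zip": "02139"},
    {"customer_id": "C002", "zip": "02139"},
    {"customer_id": "C002", "zip": "10001"},
]

print(duplicate_ids(records, "customer_id"))  # customer_id has a repeat
print(duplicate_ids(records, "zip"))          # zip repeats too
```

A field is a valid candidate only when the check returns an empty list; in this made-up data, neither field qualifies until the repeated customer ID is resolved.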
2. What does the Scan button do in the initial build and deploy screens?
A: The Scan button automatically searches for the Unique Identifier or Target Variable. It can save you the work of manually selecting the appropriate variables for these fields. In some cases, the Scan button can only narrow the choices to a few variables from which you then select.
3. How do I use the Apply Model to Map screen?
A: The Apply Model to Map screen contains a slider bar that you use to select the number of records to view. Moving the slider bar to the right increases the number of records selected. Records are selected in order of score, highest to lowest. The results are then automatically updated in the ArcView map view.
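The selection behavior described above can be approximated in Python as follows. The sketch treats the slider position as a fraction between 0 and 1, which is an assumption made for illustration; the block group IDs and scores are invented.

```python
def select_by_slider(scored_records, slider_position):
    """Return the top-scoring records for a slider position in 0..1.

    slider_position is a hypothetical stand-in for the slider bar:
    0 selects nothing and 1 selects every record.
    """
    ranked = sorted(scored_records, key=lambda rec: rec[1], reverse=True)
    n_selected = round(len(ranked) * slider_position)
    return ranked[:n_selected]

# Hypothetical (block group ID, score) pairs.
block_groups = [
    ("BG1", 0.12), ("BG2", 0.88), ("BG3", 0.54), ("BG4", 0.73),
    ("BG5", 0.31), ("BG6", 0.95), ("BG7", 0.07), ("BG8", 0.66),
]

quarter = select_by_slider(block_groups, 0.25)  # slider a quarter of the way
half = select_by_slider(block_groups, 0.50)     # slider moved further right
print([bg for bg, _ in quarter])
print([bg for bg, _ in half])
```

Moving the position to the right only ever adds lower-scoring records to the selection; the records already selected stay in place.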
4. What is the Campaign Scenario screen and how do I use it?
A: When you click the More button on the Apply Model to Map screen, the Campaign Scenario screen becomes active. It projects the profitability of different marketing scenarios based on expected response rates, costs, and revenues per customer. You must enter values in the cost and revenue fields in order to complete the analysis. Fixed cost is the portion of your expenses that does not vary regardless of the number of customers you solicit, e.g., the cost of designing your promotional flyer. Unit cost, on the other hand, is the additional cost incurred for each solicitation, such as the postage for mailing each flyer. Expected revenue per response is the amount you are projected to earn from each customer who responds to your campaign.
Once values are entered, Scoring Science automatically calculates an Expected Net Return for your campaign. It also reports a 95% confidence interval for this return and displays it on a corresponding graph. The dotted line on the graph represents the most profitable cutoff percentage; the slider bar automatically moves to this profit-maximizing position when you change your values.
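The arithmetic behind a net return and a profit-maximizing cutoff can be sketched in Python. This sketch rests on stated assumptions: it treats each score as that record's probability of responding and computes net return as expected revenue minus fixed cost minus unit cost per solicitation. The FAQ does not give Scoring Science's actual formula, the confidence interval is omitted, and all names and figures are hypothetical.

```python
def net_returns_by_depth(scores, fixed_cost, unit_cost, revenue_per_response):
    """Expected net return when soliciting the top 1, 2, ..., N records."""
    ranked = sorted(scores, reverse=True)
    results = []
    expected_responses = 0.0
    for n_solicited, score in enumerate(ranked, start=1):
        # Assumption: the score is the record's probability of responding.
        expected_responses += score
        net = (expected_responses * revenue_per_response
               - fixed_cost
               - unit_cost * n_solicited)
        results.append((n_solicited, net))
    return results

# Hypothetical campaign: five prospects, $10 fixed cost,
# $5 per solicitation, $20 revenue per response.
results = net_returns_by_depth(
    [0.05, 0.9, 0.2, 0.6, 0.4],
    fixed_cost=10.0, unit_cost=5.0, revenue_per_response=20.0,
)

# The most profitable cutoff is the depth with the highest net return.
best_n, best_net = max(results, key=lambda pair: pair[1])
cutoff_pct = 100 * best_n / len(results)
print(f"solicit top {best_n} records ({cutoff_pct:.0f}%), net {best_net:.2f}")
```

In this example it pays to keep soliciting only while a record's expected revenue (score times $20) exceeds the $5 unit cost, so the return peaks once scores drop below 0.25.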