Scoring Science® SDK
A White Paper on Integration

Executive Summary
The Scoring Science Software Development Kit (SDK) has been developed to provide analytical features for vertical software applications. It is an ideal enhancement to almost any data-rich application environment, such as those used to record and retrieve customer information. Scoring Science maximizes the value of data stored in the host application by allowing users to generate predictive models that optimize decision-making about customers, products and services, and future resource requirements.

The Scoring Science SDK can perform analytics against any ODBC-compliant database. It works seamlessly with Oracle, SQL Server, DB2, and Access. The SDK can be integrated quickly and easily, with full support and updates available on a timely basis from Stone Analytics. This paper provides an overview of the key features of the SDK, the integration process, and how predictive models can be built and deployed.

SDK Overview
The SDK is a C Application Programming Interface (API) that allows developers to create applications that build and deploy statistical scoring models from an existing ODBC-compliant database. It is currently available for multiple operating environments. The SDK is delivered as a Dynamic Link Library (DLL) that can be redistributed with the resulting application, pursuant to the license agreement. Because the library is loaded at run time, end users can update the Scoring Science SDK without the parent application having to be recompiled and redistributed.

The SDK provides interfaces to build scoring models either manually or automatically. Interfaces are also included for returning a list of previously built models, for performing categorical selection, and for deploying a current model to a new data set. Most of the API follows the same sequence of calls: initialize, set options, start the process, poll the progress, end the process, and then obtain the results.
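The call sequence described above can be sketched as a small state machine. This is an illustration only, written in Python for brevity: the class `ModelBuildJob`, its methods, and the option name `target` are hypothetical and are not taken from the SDK, whose actual C function names are documented in the Developers Guide. The work itself is simulated, so each poll simply advances a counter.

```python
from enum import Enum, auto


class State(Enum):
    CREATED = auto()
    CONFIGURED = auto()
    RUNNING = auto()
    FINISHED = auto()
    CLOSED = auto()


class ModelBuildJob:
    """Hypothetical stand-in for one SDK process; not the real API."""

    def __init__(self):                      # 1. initialize
        self.state = State.CREATED
        self.options = {}
        self._done = 0
        self._total = 0

    def set_option(self, name, value):       # 2. set options
        if self.state not in (State.CREATED, State.CONFIGURED):
            raise RuntimeError("options must be set before start()")
        self.options[name] = value
        self.state = State.CONFIGURED

    def start(self, total_steps=4):          # 3. start the process
        if self.state is not State.CONFIGURED:
            raise RuntimeError("set options before start()")
        self._total = total_steps
        self.state = State.RUNNING

    def poll(self):                          # 4. poll the progress
        """Advance the simulated work one step; return the fraction complete."""
        if self.state is State.RUNNING:
            self._done += 1
            if self._done >= self._total:
                self.state = State.FINISHED
        return self._done / self._total if self._total else 0.0

    def end(self):                           # 5. end the process
        if self.state is not State.FINISHED:
            raise RuntimeError("job has not finished")
        self.state = State.CLOSED

    def results(self):                       # 6. obtain the results
        if self.state is not State.CLOSED:
            raise RuntimeError("call end() before results()")
        return {"target": self.options.get("target"), "steps_run": self._done}


def run_job(target):
    """Drive one job through the full sequence and collect progress readings."""
    job = ModelBuildJob()
    job.set_option("target", target)
    job.start()
    progress = []
    while job.state is State.RUNNING:
        progress.append(job.poll())
    job.end()
    return progress, job.results()
```

Enforcing the order in the object itself, rather than trusting the caller, makes an out-of-sequence call fail immediately instead of returning stale or empty results.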

Scoring Science performs best when the database definition is well understood. The greater the insight into the meaning of the data, and into the problem being solved, the better the resulting solution. For example, if you are using Scoring Science to build a model to predict which individuals are most likely to buy a luxury automobile, data fields that help predict individual net worth or disposable income (e.g., income, age, and number of children) might have the greatest impact on the model. The complexity of integration has less to do with the SDK or its API, and more to do with the complexity of the data, its ability to support model solutions, and how much is exposed for interaction by the end user.

Integration effort will also vary greatly with the complexity of the parent application. Graphical User Interface applications take more time to create, because of the design and implementation work needed to let the user interact with the SDK. A simple, command-line-driven application that builds and deploys Scoring Science results may require only an hour to develop. Most of the integration effort is focused on defining your data source, and possibly on determining which data fields are strong predictors for the models.

Development Environments
The SDK can be integrated into a C, C++, MFC, or VB application environment. A developer who is familiar with any of these environments will find the Scoring Science SDK fairly straightforward. Additionally, the SDK includes a detailed Developers Guide that outlines each function and provides numerous code samples for each of the functions, as well as several "How To" applications. Most importantly, the SDK includes a Microsoft Visual Studio Project that exercises all the SDK interfaces and provides a sample database so that you can create a Scoring Science application right out of the box.

Scoring Science can be run in a threaded environment. Because some functions that run against large data sets take some time to complete, it is recommended that you run a Scoring Science "job" inside its own thread. Running multiple "jobs" at the same time is not recommended. However, there are no restrictions against running Scoring Science in multiple, simultaneous applications, subject to the constraints of system resources (memory, CPU, etc.).
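One way to follow this recommendation is to hand each job to a single dedicated worker thread, so that the calling thread stays responsive and jobs never overlap. The sketch below is in Python for brevity and uses only the standard library. `long_running_job` is a placeholder that sleeps instead of calling the SDK, and all names are illustrative assumptions.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def long_running_job(name, steps, progress, lock):
    """Placeholder for a slow SDK call; sleeps instead of analyzing data."""
    for step in range(1, steps + 1):
        time.sleep(0.01)
        with lock:
            progress[name] = step / steps
    return f"{name}: done"


def run_jobs_serially(job_names, steps=5):
    """Run jobs on one worker thread, one at a time, while the caller polls."""
    progress = {}
    lock = threading.Lock()
    # max_workers=1 keeps jobs off the calling thread and prevents two jobs
    # from running at the same time.
    with ThreadPoolExecutor(max_workers=1) as pool:
        futures = [
            pool.submit(long_running_job, name, steps, progress, lock)
            for name in job_names
        ]
        # The calling thread stays free; a GUI would redraw a progress bar
        # from each snapshot taken here.
        while not all(f.done() for f in futures):
            with lock:
                snapshot = dict(progress)
            time.sleep(0.005)
        return [f.result() for f in futures], progress
```

For example, `run_jobs_serially(["build", "deploy"])` queues both jobs at once but executes the second only after the first has finished.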

Support for the SDK is provided through its Developers Manual, FAQ document, sample code, and technical support via email. More advanced technical support packages are also available.

The Modeling Process
The Scoring Science SDK consists of two main processes. The first is the model-building phase, which uses a data set with a known outcome for the target variable. It defines a model using predictor variables selected for their relevance in determining the desired outcome. The second process involves deploying a previously defined model on a new data set in which the outcome of the target variable is unknown. The deployed model ranks, or scores, the records in the data set according to the likelihood of a desired outcome for the target variable. These scores can then be presented in a manner best suited for the product's integration.
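The two phases can be illustrated with a deliberately simple stand-in model. This paper does not describe the statistical method the SDK uses, so the weighting rule below (difference of class means) is an assumption chosen only to make the build and deploy steps concrete. The function and field names are hypothetical, and the example is in Python for brevity.

```python
def build_model(training_rows, predictors, target):
    """Build phase: learn one weight per predictor from rows with a known outcome.

    Weight = mean of the predictor among positive outcomes minus its mean
    among negative outcomes (an illustrative rule, not the SDK's algorithm).
    """
    pos = [r for r in training_rows if r[target] == 1]
    neg = [r for r in training_rows if r[target] == 0]
    if not pos or not neg:
        raise ValueError("training data needs both outcomes")
    model = {}
    for p in predictors:
        mean_pos = sum(r[p] for r in pos) / len(pos)
        mean_neg = sum(r[p] for r in neg) / len(neg)
        model[p] = mean_pos - mean_neg
    return model


def deploy_model(model, new_rows, id_field):
    """Deploy phase: score rows whose outcome is unknown, highest score first."""
    scored = [
        (row[id_field], sum(weight * row[p] for p, weight in model.items()))
        for row in new_rows
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Training data: the outcome ("bought") is known for every record.
training = [
    {"income": 150, "age": 50, "bought": 1},
    {"income": 130, "age": 40, "bought": 1},
    {"income": 40, "age": 30, "bought": 0},
    {"income": 60, "age": 20, "bought": 0},
]
# New data: the outcome is unknown, so the model ranks the records instead.
prospects = [
    {"id": "A", "income": 120, "age": 45},
    {"id": "B", "income": 45, "age": 28},
    {"id": "C", "income": 90, "age": 60},
]
model = build_model(training, ["income", "age"], "bought")
ranking = deploy_model(model, prospects, "id")
```

The host application is then free to present `ranking` however suits it, for example as a sorted contact list or as a score column written back to the database.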

The Scoring Science SDK provides additional functionality to produce quality models. Optional APIs can be called based on the type of integration being accomplished. Correlation, for example, allows the integrator to determine the strength of the relationship between the target variable and each of the selected predictor variables. Variables with strong correlation to the target variable are generally more accurate predictors.
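As an illustration of what such a correlation check measures, the sketch below computes the Pearson correlation coefficient between each candidate predictor and the target and orders the predictors by the magnitude of the result. Pearson correlation is an assumption here, since this paper does not state which correlation measure the SDK reports. The function names are hypothetical and the example is in Python for brevity.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(vx * vy)


def rank_predictors(rows, predictors, target):
    """Return (predictor, r) pairs ordered by |r| with the target, strongest first."""
    ys = [row[target] for row in rows]
    pairs = [(p, pearson([row[p] for row in rows], ys)) for p in predictors]
    return sorted(pairs, key=lambda pair: abs(pair[1]), reverse=True)
```

Ranking by absolute value treats a strong negative relationship as just as informative as a strong positive one.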

For example, a target variable that defines which individuals are likely to be good customers at a new luxury car dealership might be strongly correlated with data fields that measure income, age, or number of dependents. Additionally, variables with a high percentage of missing values are typically unreliable predictors and should be avoided. Scoring Science provides an optional API that can be called to calculate the percentage of missing values for all potential predictor variables.
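The missing-value check amounts to a simple count per field. The following Python sketch shows the calculation on in-memory records. The function names, the choice of `None` and the empty string as missing markers, and the 30 percent cutoff are all assumptions made for illustration, not SDK behavior.

```python
def missing_percentages(rows, fields, missing=(None, "")):
    """Percentage of rows in which each field is absent or holds a missing marker."""
    if not rows:
        raise ValueError("no rows to inspect")
    result = {}
    for field in fields:
        n_missing = sum(1 for row in rows if row.get(field) in missing)
        result[field] = 100.0 * n_missing / len(rows)
    return result


def usable_predictors(rows, fields, max_missing_pct=30.0):
    """Keep only fields whose missing percentage is at or below the cutoff."""
    pct = missing_percentages(rows, fields)
    return [f for f in fields if pct[f] <= max_missing_pct]
```

A legitimate value of zero is not counted as missing, because only the listed markers are compared against.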

Automatic Analyst™ Function
A critical feature of the Scoring Science SDK is its ability to automatically build a model without requiring the intervention of the end user.

The Automatic Analyst function can construct a quality model by scanning a complete set of data and performing a series of analyses to determine which variables are effective predictors of the target outcome. More specifically, Automatic Analyst analyzes each potential predictor's relationship with the target variable and removes those with weak predictive strength. The remaining variables are then sorted by the degree to which each one can represent the entire set of candidate variables. For example, several candidate variables in a data set may describe an individual's income in a related manner, e.g., on an annual, monthly, and hourly basis. The variable that best describes a given set is elevated on the list of candidates and the others are lowered or removed. Automatic Analyst then returns a full sequence of variable names selected as the most effective predictors.
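The two screening steps described above can be approximated by a short greedy procedure: discard predictors weakly related to the target, then keep a predictor only if it is not nearly a duplicate of one already kept. This is a simplified stand-in and not the Automatic Analyst algorithm itself, which this paper does not specify. The use of Pearson correlation, the thresholds `min_strength` and `max_overlap`, and all names are assumptions, and the example is in Python for brevity.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either sequence is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def select_predictors(columns, target, min_strength=0.3, max_overlap=0.9):
    """columns maps a variable name to its values; target holds the outcomes."""
    # Step 1: remove predictors with a weak relationship to the target.
    strength = {name: abs(pearson(vals, target)) for name, vals in columns.items()}
    candidates = [name for name in columns if strength[name] >= min_strength]
    # Step 2: visit candidates strongest first and keep one only if it does
    # not largely repeat a variable that has already been kept.
    candidates.sort(key=lambda name: strength[name], reverse=True)
    kept = []
    for name in candidates:
        if all(abs(pearson(columns[name], columns[k])) < max_overlap for k in kept):
            kept.append(name)
    return kept


columns = {
    "annual_income": [40, 60, 130, 150],
    "monthly_income": [3, 5, 11, 12],   # nearly the same information as annual
    "age": [30, 20, 40, 50],
    "shoe_size": [9, 10, 10, 9],        # unrelated to the outcome
}
bought = [0, 0, 1, 1]
selected = select_predictors(columns, bought)
```

In this toy data set, only one of the two income variables survives, and the unrelated variable is removed in the first step.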

Scoring Science's Automatic Analyst function can produce a quality predictive model without requiring end users to be knowledgeable in statistical methods or processes.

Conclusion
The Scoring Science SDK provides robust predictive modeling technology that can be quickly and seamlessly integrated into data-rich application environments. The software puts powerful modeling tools at the fingertips of statistical novices to optimize decision making in business environments.

Evaluation copies of the Scoring Science SDK are available upon request. The evaluation copy contains the full API but produces models with zeroed-out results. It is intended to let the integrator measure the integration effort and the performance characteristics of the Scoring Science engine.

For more information or an evaluation copy of the SDK, please contact sales@stoneanalytics.com.

