Foundations of Predictive Analytics
Features of the book include:
- Contains all of the key elements required for statistical modeling and predictive analytics
- Covers a wide range of important but difficult-to-find topics
- Gives a step-by-step mathematical derivation of each technique, from the underlying assumptions to the final conclusions
- Discusses the practical aspects of modeling and prediction, with many examples drawn from consumer behavior modeling and other applied areas
Summary:
Foundations of Predictive Analytics presents the fundamental background required for analyzing data and building models for many practical applications such as consumer behavior modeling, risk and marketing analytics, stock forecasting, and many other applied areas. In addition to the foundational theory and methodologies, it discusses a variety of practical topics that are frequently missing from similar texts.
The book begins with the statistical and linear algebra/matrix foundation of modeling methods, from distributions to cumulant and copula functions to Cornish-Fisher expansion and other useful but hard-to-find statistical techniques. It then describes common and unusual linear methods as well as popular nonlinear modeling approaches, including additive models, trees, support vector machines, fuzzy systems, clustering, naive Bayes, and neural nets. The authors go on to cover methodologies used in time series and forecasting such as ARIMA, GARCH, and survival analysis. They also present a wide variety of optimization techniques and explore several unusual topics, such as Dempster-Shafer theory.
An in-depth collection of the most important fundamental material on predictive analytics, this self-contained book provides the necessary information for understanding various techniques for exploratory data analysis and modeling. It explains the algorithmic details behind each technique (including underlying assumptions and mathematical formulations) and shows how to prepare and encode data, select variables, use a variety of model goodness measures, normalize odds, and perform reject inference.
Foundations of Predictive Analytics is a volume in the Chapman & Hall/CRC Data Mining and Knowledge Discovery Series edited by Vipin Kumar, University of Minnesota, Minneapolis.
We invite you to browse through the Table of Contents below; to give a flavor of the material, a short illustrative Python sketch follows each technical chapter's listing. If you are interested in these topics, you can buy a copy at Amazon.com. We hope you enjoy reading it!
Table of Contents
- 1 Introduction
- 1.1 What Is a Model?
- 1.2 What Is a Statistical Model?
- 1.3 The Modeling Process
- 1.4 Modeling Pitfalls
- 1.5 Characteristics of Good Modelers
- 1.6 The Future of Predictive Analytics
- 2 Properties of Statistical Distributions
- 2.1 Fundamental Distributions
- 2.1.1 Uniform Distribution
- 2.1.2 Details of the Normal (Gaussian) Distribution
- 2.1.3 Lognormal Distribution
- 2.1.4 Gamma Distribution
- 2.1.5 Chi-Squared Distribution
- 2.1.6 Non-Central Chi-Squared Distribution
- 2.1.7 Student's t-Distribution
- 2.1.8 Multivariate t-Distribution
- 2.1.9 F-Distribution
- 2.1.10 Binomial Distribution
- 2.1.11 Poisson Distribution
- 2.1.12 Exponential Distribution
- 2.1.13 Geometric Distribution
- 2.1.14 Hypergeometric Distribution
- 2.1.15 Negative Binomial Distribution
- 2.1.16 Inverse Gaussian (IG) Distribution
- 2.1.17 Normal Inverse Gaussian (NIG) Distribution
- 2.2 Central Limit Theorem
- 2.3 Estimate of Mean, Variance, Skewness, and Kurtosis from Sample Data
- 2.4 Estimate of the Standard Deviation of the Sample Mean
- 2.5 (Pseudo) Random Number Generators
- 2.5.1 Mersenne Twister Pseudorandom Number Generator
- 2.5.2 Box-Muller Transform for Generating a Normal Distribution
- 2.6 Transformation of a Distribution Function
- 2.7 Distribution of a Function of Random Variables
- 2.7.1 Z = X + Y
- 2.7.2 Z = X · Y
- 2.7.3 (Z1, Z2, ..., Zn) = (X1, X2, ..., Xn) · Y
- 2.7.4 Z = X/Y
- 2.7.5 Z = max(X, Y)
- 2.7.6 Z = min(X, Y)
- 2.8 Moment Generating Function
- 2.8.1 Moment Generating Function of the Binomial Distribution
- 2.8.2 Moment Generating Function of the Normal Distribution
- 2.8.3 Moment Generating Function of the Gamma Distribution
- 2.8.4 Moment Generating Function of the Chi-Squared Distribution
- 2.8.5 Moment Generating Function of the Poisson Distribution
- 2.9 Cumulant Generating Function
- 2.10 Characteristic Function
- 2.10.1 Relationship between Cumulative Function and Characteristic Function
- 2.10.2 Characteristic Function of the Normal Distribution
- 2.10.3 Characteristic Function of the Gamma Distribution
- 2.11 Chebyshev's Inequality
- 2.12 Markov's Inequality
- 2.13 Gram-Charlier Series
- 2.14 Edgeworth Expansion
- 2.15 Cornish-Fisher Expansion
- 2.15.1 Lagrange Inversion Theorem
- 2.15.2 Cornish-Fisher Expansion
- 2.16 Copula Functions
- 2.16.1 Gaussian Copula
- 2.16.2 t-Copula
- 2.16.3 Archimedean Copula
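To give a quick taste of Chapter 2, here is a minimal Python sketch (ours, not the book's) of the Box-Muller transform of Section 2.5.2, which turns pairs of uniform draws into independent standard normal draws; the function names and sample size are illustrative.

```python
import math
import random

def box_muller(n, rng=random.random):
    """Generate n standard normal draws from uniform (0,1) draws
    via the Box-Muller transform (Section 2.5.2)."""
    out = []
    while len(out) < n:
        u1 = 1.0 - rng()                    # shift to (0, 1] so log(u1) is defined
        u2 = rng()
        r = math.sqrt(-2.0 * math.log(u1))  # common radius for the pair
        out.append(r * math.cos(2.0 * math.pi * u2))
        out.append(r * math.sin(2.0 * math.pi * u2))
    return out[:n]

sample = box_muller(100000)
mean = sum(sample) / len(sample)
var = sum((x - mean) ** 2 for x in sample) / (len(sample) - 1)
print(f"mean ~ {mean:.3f}, variance ~ {var:.3f}")  # should be near 0 and 1
```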
- 3 Important Matrix Relationships
- 3.1 Pseudo-Inverse of a Matrix
- 3.2 A Lemma of Matrix Inversion
- 3.3 Identity for a Matrix Determinant
- 3.4 Inversion of Partitioned Matrix
- 3.5 Determinant of Partitioned Matrix
- 3.6 Matrix Sweep and Partial Correlation
- 3.7 Singular Value Decomposition (SVD)
- 3.8 Diagonalization of a Matrix
- 3.9 Spectral Decomposition of a Positive Semi-Definite Matrix
- 3.10 Normalization in Vector Space
- 3.11 Conjugate Decomposition of a Symmetric Definite Matrix
- 3.12 Cholesky Decomposition
- 3.13 Cauchy-Schwarz Inequality
- 3.14 Relationship of Correlation among Three Variables
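As a sample of Chapter 3, a bare-bones sketch (again ours, with illustrative names) of the Cholesky decomposition of Section 3.12, which factors a symmetric positive-definite matrix A as A = L L^T with L lower triangular:

```python
import math

def cholesky(A):
    """Lower-triangular L with A = L L^T for a symmetric
    positive-definite matrix A (Section 3.12)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)    # diagonal entry
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]   # below-diagonal entry
    return L

print(cholesky([[4.0, 2.0], [2.0, 3.0]]))
# [[2.0, 0.0], [1.0, 1.414...]]
```

The factor L is what one uses, for example, to turn independent normal draws into correlated ones.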
- 4 Linear Modeling and Regression
- 4.1 Properties of Maximum Likelihood Estimators
- 4.1.1 Likelihood Ratio Test
- 4.1.2 Wald Test
- 4.1.3 Lagrange Multiplier Statistic
- 4.2 Linear Regression
- 4.2.1 Ordinary Least Squares (OLS) Regression
- 4.2.2 Interpretation of the Coefficients of Linear Regression
- 4.2.3 Regression on Weighted Data
- 4.2.4 Incrementally Updating a Regression Model with Additional Data
- 4.2.5 Partitioned Regression
- 4.2.6 How Does the Regression Change When Adding One More Variable?
- 4.2.7 Linearly Restricted Least Squares Regression
- 4.2.8 Significance of the Correlation Coefficient
- 4.2.9 Partial Correlation
- 4.2.10 Ridge Regression
- 4.3 Fisher's Linear Discriminant Analysis
- 4.4 Principal Component Regression (PCR)
- 4.5 Factor Analysis
- 4.6 Partial Least Squares Regression (PLSR)
- 4.7 Generalized Linear Model (GLM)
- 4.8 Logistic Regression: Binary
- 4.9 Logistic Regression: Multiple Nominal
- 4.10 Logistic Regression: Proportional Multiple Ordinal
- 4.11 Fisher Scoring Method for Logistic Regression
- 4.12 Tobit Model: A Censored Regression Model
- 4.12.1 Some Properties of the Normal Distribution
- 4.12.2 Formulation of the Tobit Model
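A flavor of Chapter 4: the OLS estimator of Section 4.2.1 solves the normal equations (X^T X) beta = X^T y. A minimal sketch assuming NumPy is available (data and names are invented for illustration); it solves via least squares rather than forming the inverse explicitly, which is the numerically safer route:

```python
import numpy as np

def ols(X, y):
    """OLS fit (Section 4.2.1): beta solves the normal equations
    (X^T X) beta = X^T y; lstsq avoids forming the inverse."""
    X = np.column_stack([np.ones(len(X)), X])   # prepend an intercept column
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

rng = np.random.default_rng(0)
x = rng.normal(size=(200, 1))
y = 1.5 + 2.0 * x[:, 0] + rng.normal(scale=0.1, size=200)
print(ols(x, y))  # approximately [1.5, 2.0]
```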
- 5 Nonlinear Modeling
- 5.1 Naive Bayesian Classifier
- 5.2 Neural Network
- 5.2.1 Back Propagation Neural Network
- 5.3 Segmentation and Tree Models
- 5.3.1 Segmentation
- 5.3.2 Tree Models
- 5.3.3 Sweeping to Find the Best Cutpoint
- 5.3.4 Impurity Measure of a Population: Entropy and Gini Index
- 5.3.5 Chi-Square Splitting Rule
- 5.3.6 Implementation of Decision Trees
- 5.4 Additive Models
- 5.4.1 Boosted Tree
- 5.4.2 Least Squares Regression Boosting Tree
- 5.4.3 Binary Logistic Regression Boosting Tree
- 5.5 Support Vector Machine (SVM)
- 5.5.1 Wolfe Dual
- 5.5.2 Linearly Separable Problem
- 5.5.3 Linearly Inseparable Problem
- 5.5.4 Constructing Higher-Dimensional Space and Kernel
- 5.5.5 Model Output
- 5.5.6 C-Support Vector Classification (C-SVC) for Classification
- 5.5.7 Epsilon-Support Vector Regression (Epsilon-SVR) for Regression
- 5.5.8 The Probability Estimate
- 5.6 Fuzzy Logic System
- 5.6.1 A Simple Fuzzy Logic System
- 5.7 Clustering
- 5.7.1 K Means, Fuzzy C Means
- 5.7.2 Nearest Neighbor, K Nearest Neighbor (KNN)
- 5.7.3 Comments on Clustering Methods
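A taste of Chapter 5: a plain k-means clusterer in the spirit of Section 5.7.1, assuming NumPy; the initialization scheme, iteration cap, and toy data are our illustrative choices, not the book's.

```python
import numpy as np

def k_means(X, k, iters=100, seed=0):
    """Alternate between assigning points to the nearest centroid and
    recomputing each centroid as the mean of its cluster (Section 5.7.1)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # squared distances of every point to every centroid
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):   # assignments have stabilized
            break
        centroids = new
    return centroids, labels

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(5, 0.5, (50, 2))])
centroids, labels = k_means(X, k=2)
print(np.round(centroids, 2))  # one centroid near (0, 0), one near (5, 5)
```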
- 6 Time Series Analysis
- 6.1 Fundamentals of Forecasting
- 6.1.1 Box-Cox Transformation
- 6.1.2 Smoothing Algorithms
- 6.1.3 Convolution of Linear Filters
- 6.1.4 Linear Difference Equation
- 6.1.5 The Autocovariance Function and Autocorrelation Function
- 6.1.6 The Partial Autocorrelation Function
- 6.2 ARIMA Models
- 6.2.1 MA(q) Process
- 6.2.2 AR(p) Process
- 6.2.3 ARMA(p, q) Process
- 6.3 Survival Data Analysis
- 6.3.1 Sampling Method
- 6.4 Exponentially Weighted Moving Average (EWMA) and GARCH(1, 1)
- 6.4.1 Exponentially Weighted Moving Average (EWMA)
- 6.4.2 ARCH and GARCH Models
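For Chapter 6, the EWMA variance recursion of Section 6.4.1 is simple enough to show in full: sigma2_t = lam * sigma2_{t-1} + (1 - lam) * r_{t-1}^2. A sketch with an illustrative seed value and lam = 0.94, the RiskMetrics daily convention:

```python
def ewma_variance(returns, lam=0.94):
    """EWMA variance recursion (Section 6.4.1):
    sigma2_t = lam * sigma2_{t-1} + (1 - lam) * r_{t-1}^2."""
    sigma2 = returns[0] ** 2            # seed with the first squared return
    path = [sigma2]
    for r in returns[1:]:
        sigma2 = lam * sigma2 + (1.0 - lam) * r * r
        path.append(sigma2)
    return path

print(ewma_variance([0.010, -0.020, 0.015, 0.030]))
```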
- 7 Data Preparation and Variable Selection
- 7.1 Data Quality and Exploration
- 7.2 Variable Scaling and Transformation
- 7.3 How to Bin Variables
- 7.3.1 Equal Interval
- 7.3.2 Equal Population
- 7.3.3 Tree Algorithms
- 7.4 Interpolation in One and Two Dimensions
- 7.5 Weight of Evidence (WOE) Transformation
- 7.6 Variable Selection Overview
- 7.7 Missing Data Imputation
- 7.8 Stepwise Selection Methods
- 7.8.1 Forward Selection in Linear Regression
- 7.8.2 Forward Selection in Logistic Regression
- 7.9 Mutual Information, KL Distance
- 7.10 Detection of Multicollinearity
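From Chapter 7, the weight-of-evidence transformation of Section 7.5 maps each bin of a binned variable to WOE_i = ln((g_i / G) / (b_i / B)), where g_i and b_i are the bin's good and bad counts and G and B the totals. A minimal sketch (the counts are invented; real code would smooth zero-count bins):

```python
import math

def woe(goods, bads):
    """Per-bin weight of evidence (Section 7.5):
    WOE_i = ln((g_i / G) / (b_i / B))."""
    G, B = sum(goods), sum(bads)
    return [math.log((g / G) / (b / B)) for g, b in zip(goods, bads)]

# good/bad counts for three bins of a binned variable
print([round(w, 3) for w in woe([100, 200, 50], [10, 30, 40])])
```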
- 8 Model Goodness Measures
- 8.1 Training, Testing, Validation
- 8.2 Continuous Dependent Variable
- 8.2.1 Example: Linear Regression
- 8.3 Binary Dependent Variable (Two-Group Classification)
- 8.3.1 Kolmogorov-Smirnov (KS) Statistic
- 8.3.2 Confusion Matrix
- 8.3.3 Concordant and Discordant
- 8.3.4 R² for Logistic Regression
- 8.3.5 AIC and SBC
- 8.3.6 Hosmer-Lemeshow Goodness-of-Fit Test
- 8.3.7 Example: Logistic Regression
- 8.4 Population Stability Index Using Relative Entropy
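From Chapter 8, the KS statistic of Section 8.3.1 measures the maximum gap between the cumulative score distributions of the two classes. A straightforward sketch (tie handling is kept naive for brevity; the names and toy data are illustrative):

```python
def ks_statistic(scores, labels):
    """Maximum gap between the empirical CDFs of class-1 and class-0
    scores (Section 8.3.1); labels are 1/0."""
    n1 = sum(labels)              # count of class 1
    n0 = len(labels) - n1         # count of class 0
    c1 = c0 = 0
    best = 0.0
    for _, y in sorted(zip(scores, labels)):
        if y == 1:
            c1 += 1
        else:
            c0 += 1
        best = max(best, abs(c1 / n1 - c0 / n0))
    return best

scores = [0.1, 0.2, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
print(ks_statistic(scores, labels))  # 0.75 for this toy data
```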
- 9 Optimization Methods
- 9.1 Lagrange Multiplier
- 9.2 Gradient Descent Method
- 9.3 Newton-Raphson Method
- 9.4 Conjugate Gradient Method
- 9.5 Quasi-Newton Method
- 9.6 Genetic Algorithms (GA)
- 9.7 Simulated Annealing
- 9.8 Linear Programming
- 9.9 Nonlinear Programming (NLP)
- 9.9.1 General Nonlinear Programming (GNLP)
- 9.9.2 Lagrange Dual Problem
- 9.9.3 Quadratic Programming (QP)
- 9.9.4 Linear Complementarity Programming (LCP)
- 9.9.5 Sequential Quadratic Programming (SQP)
- 9.10 Nonlinear Equations
- 9.11 Expectation-Maximization (EM) Algorithm
- 9.12 Optimal Design of Experiment
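From Chapter 9, Newton-Raphson (Section 9.3) in its one-dimensional form, x_{n+1} = x_n - f(x_n) / f'(x_n); the tolerance and iteration cap below are illustrative defaults of our own:

```python
def newton_raphson(f, fprime, x0, tol=1e-10, max_iter=50):
    """Newton-Raphson root finding (Section 9.3):
    iterate x <- x - f(x) / f'(x) until the step is tiny."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / fprime(x)
        x -= step
        if abs(step) < tol:
            return x
    raise RuntimeError("did not converge")

# root of x^2 - 2, i.e. sqrt(2)
print(newton_raphson(lambda x: x * x - 2.0, lambda x: 2.0 * x, 1.0))
```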
- 10 Miscellaneous Topics
- 10.1 Multidimensional Scaling
- 10.2 Simulation
- 10.3 Odds Normalization and Score Transformation
- 10.4 Reject Inference
- 10.5 Dempster-Shafer Theory of Evidence
- 10.5.1 Some Properties in Set Theory
- 10.5.2 Basic Probability Assignment, Belief Function, and Plausibility Function
- 10.5.3 Dempster-Shafer's Rule of Combination
- 10.5.4 Applications of Dempster-Shafer Theory of Evidence: Multiple Classifier Function
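Finally, from Chapter 10, a compact sketch of Dempster-Shafer's rule of combination (Section 10.5.3): two basic probability assignments are combined by multiplying masses on intersecting sets and renormalizing away the conflicting mass. The frozenset representation and the toy assignments are our own choices:

```python
def combine(m1, m2):
    """Dempster's rule of combination (Section 10.5.3) for two basic
    probability assignments over frozensets of hypotheses."""
    raw = {}
    conflict = 0.0
    for a, p in m1.items():
        for b, q in m2.items():
            inter = a & b
            if inter:
                raw[inter] = raw.get(inter, 0.0) + p * q
            else:
                conflict += p * q           # mass assigned to the empty set
    k = 1.0 - conflict                      # normalization constant
    return {s: v / k for s, v in raw.items()}

A, B = frozenset({"A"}), frozenset({"B"})
AB = A | B
m1 = {A: 0.6, AB: 0.4}
m2 = {B: 0.5, AB: 0.5}
print(combine(m1, m2))  # masses ~0.429 on A, ~0.286 on B, ~0.286 on {A, B}
```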
- Appendix A Useful Mathematical Relations
- A.1 Information Inequality
- A.2 Relative Entropy
- A.3 Saddle-Point Method
- A.4 Stirling's Formula
- A.5 Convex Function and Jensen's Inequality
- Appendix B DataMinerXL - Microsoft Excel Add-In for Building Predictive Models
- B.1 Overview
- B.2 Utility Functions
- B.3 Data Manipulation Functions
- B.4 Basic Statistical Functions
- B.5 Modeling Functions for All Models
- B.6 Weight of Evidence Transformation Functions
- B.7 Linear Regression Functions
- B.8 Partial Least Squares Regression Functions
- B.9 Logistic Regression Functions
- B.10 Time Series Analysis Functions
- B.11 Naive Bayes Classifier Functions
- B.12 Tree-Based Model Functions
- B.13 Clustering and Segmentation Functions
- B.14 Neural Network Functions
- B.15 Support Vector Machine Functions
- B.16 Optimization Functions
- B.17 Matrix Operation Functions
- B.18 Numerical Integration Functions
- B.19 Excel Built-in Statistical Distribution Functions
- Bibliography
- Index