The analytics industry is becoming ever more sophisticated. Nearly every week, we hear about an innovation involving a new technique, a new area of data or a new approach to an old problem that enables some company to take a significant step forward in how it analyzes its customer base to make better decisions.

That said, I find that there are still plenty of opportunities to apply our favorite analytics workhorse – the logistic regression predictive model – to a wide variety of problems.

I write this as I am staring at the output for a model that predicts which automobiles out on the road are most likely to need a major brake job in the next six months, and it strikes me just how similar this problem is to so many others that we in the analytics industry face practically every day. In short, we have a deep, longitudinal data set containing a long list of highly correlated variables, and we want to assign each customer the odds that a given event occurs in the outcome period so that we can take targeted customer-level action.
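To make that setup concrete, here is a minimal sketch in plain Python of the kind of problem described above: a small logistic regression fit on two correlated predictors, with the predicted probability converted to odds for each customer. Everything in it is an assumption made for illustration (the `mileage` and `wear` variables, the synthetic data, the coefficients used to generate it, and the gradient-descent fitting routine); it is not the brake-job model itself.

```python
import math
import random

random.seed(0)  # fixed seed so the synthetic data is reproducible


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Synthetic, made-up data: two deliberately correlated predictors per vehicle.
#   mileage: scaled miles since the last brake service (0 to 1)
#   wear:    a noisy proxy that tracks mileage closely
X, y = [], []
for _ in range(200):
    mileage = random.random()
    wear = mileage + random.gauss(0, 0.1)
    p_true = sigmoid(-2.0 + 4.0 * mileage)  # assumed relationship, for illustration
    X.append((mileage, wear))
    y.append(1 if random.random() < p_true else 0)


def fit_logistic(X, y, lr=0.5, epochs=500, l2=0.01):
    """Fit intercept and weights by batch gradient descent with an L2 penalty."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(p):
                grad_w[j] += err * xi[j]
            grad_b += err
        # The penalty shrinks the weights, which steadies correlated predictors.
        w = [wj - lr * (gw / n + l2 * wj) for wj, gw in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Predicted probability of the event for one customer."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


def odds(p):
    """Convert a probability into odds in favor of the event."""
    return p / (1.0 - p)


w, b = fit_logistic(X, y)
low = predict_proba(w, b, (0.1, 0.1))   # recently serviced vehicle
high = predict_proba(w, b, (0.9, 0.9))  # long overdue vehicle
print(f"low-mileage odds:  {odds(low):.2f}")
print(f"high-mileage odds: {odds(high):.2f}")
```

With predictors this correlated, the individual coefficients can shift noticeably from sample to sample even when the scores barely move, so in a sketch like this the customer-level odds are the part worth acting on.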

I’ve noticed, however, that many analysts have become so adept at “new” analytic techniques that their ability to create a truly effective logistic regression model has suffered. To help prevent “great” modeling from becoming a lost art, and to serve as a resource for any aspiring (and/or rusty) modelers out there, I dusted off an old white paper I co-wrote back in 2003 with Randy Smith, a former colleague of mine from Peak Data Solutions, and found that many of its key points are still highly relevant today. I have updated some of the examples to reflect new realities (and removed the pages and pages that talked about sampling in order to save disk space and computational power, which was still a factor just five years ago!) and I offer a revised link here (172.1K PDF).
