Abbott Analytics: Data Mining Consulting
Services

Services: Data Mining Project Assessment, Data Preparation For Data Mining, Data Mining Model Development, Data Mining Model Deployment, Data Mining Course: Overview for Project Managers, Data Mining Course: Overview for Practitioners, Customized Data Mining Engagements

Abbott Insights

Insight 1: Find Correlated Variables Prior to Modeling. Topic: Data Understanding and Data Preparation. Sub-Topic: Feature Selection.
Insight 2: Beware of Outliers in Computing Correlations. Topic: Data Preparation. Sub-Topic: Outliers.
Insight 3: Create Three Sampled Data Sets, not Two. Topic: Modeling. Sub-Topic: Sampling.
Insight 4: Use Priors to Balance Class Counts. Topic: Modeling. Sub-Topic: Decision Trees.
Insight 5: Beware of Automatic Handling of Categorical Variables. Topic: Data Understanding and Data Preparation. Sub-Topic: Feature Selection and Creation.
Insight 6: Gain Insights by Building Models from Several Algorithms. Topic: Modeling. Sub-Topic: Algorithm Selection.
Insight 7: Beware of Being Fooled with Model Performance. Topic: Data Evaluation. Sub-Topic: Model Performance.

Data Mining Clients

Client List and Case Studies

Courses and Seminars

Upcoming Data Mining Seminars
A Practical Introduction to Data Mining
Upcoming courses (nationwide):
Data Mining Level II: A drill-down of the data mining process, techniques, and applications
Data Mining Level III: A hands-on day of data mining using real data and real data mining software
Anytime Courses:
Overview for Project Managers: Train project managers on the data mining process.
Overview for Practitioners: Train practitioners (data analysts, project managers, managers) on the data mining process.

Data Mining Resources

Data Mining Resources, Books, Websites, White Papers, Presentations, Tutorials

About Us

Mr. Abbott is a seasoned instructor who has taught a wide range of data mining tutorials and seminars for a decade, to audiences of up to 400, including at DAMA, KDD, AAAI, and IEEE conferences. He is the instructor of well-regarded data mining courses, explaining concepts in language readily understood by a wide range of audiences, including analytics novices, data analysts, statisticians, and business professionals. Mr. Abbott has also taught applied data mining courses on software from major vendors, including Clementine (SPSS), Affinium Model (Unica Corporation), and Model 1 (Group1 Software), as well as hands-on courses using S-Plus and Insightful Miner (Insightful Corporation) and CART (Salford Systems).

Abbott Insights™, Data Mining Advice
Data Mining and Predictive Analytics
08/25/2010 | 02:39 AM
Predictive Models are Only as Good as Their Acceptance by Decision-Makers
I have been reminded over the past couple of weeks, working with customers, that in many applications of data mining and predictive analytics, predictive models are utterly useless unless their stakeholders understand what the models are doing. When rules from a decision tree, no matter how statistically significant, don't resonate with domain experts, they won't be believed. Arguments that "the model wouldn't have picked this rule if it wasn't really there in the data" make no difference when the rule doesn't make sense.

There is always a tradeoff in these cases between the "best" model (i.e., most accurate by some measure) and ... Read More >>
08/19/2010 | 08:58 PM
Building Correlations in Clementine / Modeler
I just responded to this question in the LinkedIn Clementine group and thought it might be of interest to a broader audience.

Q: Hi,
Does anyone have any suggestion or any knowledge on how to make cross-correlation in the Modeler/Clementine?

A:

Read More >>
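As a rough illustration of what cross-correlation between two series usually means (the Pearson correlation between one series and a shifted copy of the other, computed over a range of lags), here is a minimal Python sketch using NumPy, computed outside Modeler. It is not the Modeler/Clementine approach from the post; the function name `lagged_cross_correlation` and the toy series are hypothetical.

```python
import numpy as np


def lagged_cross_correlation(x, y, max_lag):
    """Pearson correlation of x[t] with y[t + lag] for lag in -max_lag..max_lag."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape:
        raise ValueError("x and y must have the same length")
    results = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            # Pair x[t] with y[t + lag]: drop the tail of x and the head of y.
            a, b = x[: len(x) - lag], y[lag:]
        else:
            # Pair x[t] with y[t - |lag|]: drop the head of x and the tail of y.
            a, b = x[-lag:], y[: len(y) + lag]
        results[lag] = float(np.corrcoef(a, b)[0, 1])
    return results


# Hypothetical series: y is x delayed by two time steps.
x = [1, 3, 2, 5, 4, 7, 6, 9, 8, 10]
y = [0, 0] + x[:-2]
r = lagged_cross_correlation(x, y, max_lag=3)
best_lag = max(r, key=r.get)
print(best_lag, round(r[best_lag], 3))
```

Because `y` is simply `x` delayed by two steps, the correlation peaks at lag 2; in practice the lag with the strongest correlation is the one worth investigating as a leading indicator.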
08/13/2010 | 11:25 AM
IBM and Unica, Affinium Model and Clementine
After seeing that IBM has purchased Unica, I have to wonder how this will affect Affinium Model and Clementine (I revert here to the names that were used for so long; they are now PredictExpress and Modeler, respectively). They are so different in interfaces, features, and deployment options that it is hard to see how they will be "joined": the big-button wizard interface vs. the block-diagram flow interface.

One thing I always liked about Affinium Model was the ability to automate the building of thousands of models. Clementine now has that same capability, so that advantage is lost. To me, that ... Read More >>
08/02/2010 | 10:52 PM
Is there too much data?
I was reading back over some old blog posts and came across this quote from Moneyball: The Art of Winning an Unfair Game:

Intelligence about baseball statistics had become equated in the public mind with the ability to recite arcane baseball stats. What ... Read More >>
07/08/2010 | 11:24 PM
Neural Network books
I was talking today with a colleague who is taking a business-oriented data mining course, and the instructor had recommended a list of neural network books. It was fascinating to look at the books on the list because I didn't know several of them. When I examined several of the recommended books on amazon.com, I found they contained what I would call "academic" treatments of neural networks. That means they covered all kinds of neural networks, including brain-state-in-a-box, Boltzmann machines, Read More >>
06/22/2010 | 06:57 PM
Salford to Launch New Integrated Data Mining Suite
Tomorrow night is the launch of SPM (Salford Predictive Miner). If you are in San Diego, give them a holler to let them know you are coming. See you there! Read More >>
06/22/2010 | 06:45 PM
A/B Testing and the Need for Clear Business Objectives
The website http://videolectures.net/ contains a wealth of interesting lectures on a wide variety of topics, including data mining. Today I was reminded of one by Ronny Kohavi entitled "Practical Guide to Controlled Experiments on the Web: Listen to Your Customers not to the HiPPO." It's short (only 23 minutes) and filled with very good common-sense principles.

First, it is a talk about the importance of A/B testing, or in other words, constructing experiments to learn customer behavior rather than ... Read More >>
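One common way to read out a simple A/B test on a conversion rate is a two-proportion z-test, which asks whether the difference between the two groups is larger than sampling noise would explain. The sketch below is not taken from Kohavi's talk; the function name and the visitor and conversion counts are hypothetical, and it assumes samples large enough for the normal approximation to hold.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates between A and B."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that A and B convert equally.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical experiment: 1,000 visitors per variant.
z, p = two_proportion_z_test(conv_a=100, n_a=1000, conv_b=150, n_b=1000)
print(f"z = {z:.2f}, two-sided p = {p:.4f}")
```

The statistic only says whether B differs from A; deciding which metric to compare in the first place is the business-objective question the post raises.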
06/02/2010 | 10:39 PM
Embedded Analytics and Business Rules: The Holy Grail?
Tomorrow (Thursday) at 3pm EDT I'll be on DM Radio for the broadcast "Embedded Analytics and Business Rules: The Holy Grail?" I'm not sure what the other guests are going to talk about, but my comments will resemble the talk I gave at Predictive Analytics World in February 2010, "Rules Rule: Inductive Business-Rule Discovery in Text Mining." In this help-desk case study, we used decision trees to ... Read More >>
05/27/2010 | 09:20 PM
PAKDD-10 Data Mining Competition Winner: Ensembles Again!
The PAKDD-10 Data Mining Competition results are in, and ensembles occupied the top four positions (and, I think, the top five). The winner used Stochastic Gradient Boosting and Random Forests in Statistica; second place used a combination of logistic regression and Stochastic Gradient Boosting (with Salford Systems CART for some feature extraction). Interestingly to me, the fifth-place finisher used WEKA, an open source software tool.
Read More >>
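The competition entries themselves are not reproduced here. As a minimal sketch of the general idea, assuming scikit-learn is installed, the code below averages the predicted probabilities of a stochastic gradient boosting model (`GradientBoostingClassifier` with `subsample` below 1.0, so each tree is fit on a random subsample) and a random forest via soft voting. The synthetic data set and every parameter value are illustrative assumptions, not the settings any finisher used.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    VotingClassifier,
)
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split


def build_ensemble(random_state=0):
    """Soft-voting ensemble of stochastic gradient boosting and a random forest."""
    # subsample < 1.0 makes gradient boosting "stochastic".
    sgb = GradientBoostingClassifier(subsample=0.8, random_state=random_state)
    rf = RandomForestClassifier(n_estimators=100, random_state=random_state)
    # voting="soft" averages the two models' predicted class probabilities.
    return VotingClassifier(estimators=[("sgb", sgb), ("rf", rf)], voting="soft")


# Synthetic binary-classification data standing in for a real problem.
X, y = make_classification(
    n_samples=2000, n_features=20, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)
ensemble = build_ensemble().fit(X_train, y_train)
scores = ensemble.predict_proba(X_test)[:, 1]
print(f"Ensemble test AUC: {roc_auc_score(y_test, scores):.3f}")
```

Averaging probabilities from two quite different tree ensembles tends to smooth out each model's individual errors, which is one plausible reason combinations like these keep appearing at the top of competition leaderboards.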
05/27/2010 | 09:12 PM
The Trimmed Mean has Intuitive Appeal
I was listening to Colin Cowherd of ESPN radio this morning, and he made a very interesting observation, one that we data miners know (or at least should know) and should make good use of. The context was evaluating teams and programs: are they dynasties, or were they built on one great player or coach? Lakers? Dynasty. Celtics? Dynasty. Bulls? Without Jordan, they have been a mediocre franchise. The Lakers without Magic are still a dynasty. The Celtics without Bird are still a dynasty.

So his rule of thumb that he applied to college football programs was this: remove the best coach ... Read More >>
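The statistical counterpart of this rule of thumb is the trimmed mean: sort the values, drop a fixed fraction from each end, and average what remains, so one exceptional value cannot dominate the summary. The standard trimmed mean removes both the highest and the lowest values, whereas the rule as quoted starts by removing only the best coach. Below is a minimal Python sketch; the function name `trimmed_mean` and the season win totals are hypothetical.

```python
def trimmed_mean(values, proportion):
    """Mean of `values` after dropping `proportion` of them from each end."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    data = sorted(values)
    k = int(len(data) * proportion)  # values removed from EACH tail (rounded down)
    kept = data[k:len(data) - k]
    return sum(kept) / len(kept)


# Hypothetical win totals; one outlier season (30) inflates the plain mean.
wins = [3, 4, 4, 5, 5, 5, 6, 6, 30, 2]
print(sum(wins) / len(wins))    # plain mean  -> 7.0
print(trimmed_mean(wins, 0.1))  # drops 2 and 30 -> 4.75
```

If SciPy is available, `scipy.stats.trim_mean(a, proportiontocut)` computes the same statistic.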
05/24/2010 | 12:37 AM
Upcoming DMRadio Interview: Analytics and Business Rules
On June 3rd, a week from this Thursday, I'll be participating in my third DMRadio interview, this time on business rules (the first two were related to text mining, including this one last year). I have always found these interviews enjoyable. I'll probably be discussing an inductive rule discovery process I participated in with a Fortune 500 company (and Read More >>

Health Club Survey Analysis, Part I: Successful application of data mining by Abbott Analytics

DM Radio Broadcast: September 9, 2010
Predictive Analytics World Conference Washington, DC - October 18 - 20, 2010
ACM Data Mining Camp November 13, 2010

Abbott, D.W., I.P. Matkovsky, and J.F. Elder, An Evaluation of High-end Data Mining Tools for Fraud Detection, 1998 IEEE International Conference on Systems, Man, and Cybernetics, San Diego, CA, October 12-14, 1998.