Abbott Analytics: Data Mining Consulting
Services

Services: Data Mining Project Assessment, Data Preparation For Data Mining, Data Mining Model Development, Data Mining Model Deployment, Data Mining Course: Overview for Project Managers, Data Mining Course: Overview for Practitioners, Customized Data Mining Engagements

Abbott Insights

Insight 1: Find Correlated Variables Prior to Modeling. Topic: Data Understanding and Data Preparation. Sub-Topic: Feature Selection.
Insight 2: Beware of Outliers in Computing Correlations. Topic: Data Preparation. Sub-Topic: Outliers.
Insight 3: Create Three Sampled Data Sets, not Two. Topic: Modeling. Sub-Topic: Sampling.
Insight 4: Use Priors to Balance Class Counts. Topic: Modeling. Sub-Topic: Decision Trees.
Insight 5: Beware of Automatic Handling of Categorical Variables. Topic: Data Understanding and Data Preparation. Sub-Topic: Feature Selection and Creation.
Insight 6: Gain Insights by Building Models from Several Algorithms. Topic: Modeling. Sub-Topic: Algorithm Selection.
Insight 7: Beware of Being Fooled with Model Performance. Topic: Data Evaluation. Sub-Topic: Model Performance.

Data Mining Clients

Client List and Case Studies

Courses and Seminars

Upcoming Data Mining Seminars
A Practical Introduction to Data Mining
Upcoming courses (nationwide)
Data Mining Level II: A drill-down of the data mining process, techniques, and applications
Data Mining Level III: A hands-on day of data mining using real data and real data mining software
Anytime Courses
Overview for Project Managers: Train project managers on the data mining process.
Overview for Practitioners: Train practitioners (data analysts, project managers, managers) on the data mining process.

Data Mining Resources

Data Mining Resources, Books, Websites, White Papers, Presentations, Tutorials

About Us

Mr. Abbott is a seasoned instructor, having taught a wide range of data mining tutorials and seminars for a decade to audiences of up to 400, including DAMA, KDD, AAAI, and IEEE conferences. He is the instructor of well-regarded data mining courses, explaining concepts in language readily understood by a wide range of audiences, including analytics novices, data analysts, statisticians, and business professionals. Mr. Abbott has also taught applied data mining courses for major software vendors, covering Clementine (SPSS), Affinium Model (Unica Corporation), and Model 1 (Group1 Software), as well as hands-on courses using S-Plus and Insightful Miner (Insightful Corporation) and CART (Salford Systems).


Abbott Insights™, Data Mining Advice
Data Mining and Predictive Analytics
01/05/2012 | 07:44 PM
Top 5 Posts from 2011
By far, the most visited post of 2011 was the "What Do Data Miners Need to Learn" post from June.

The top five visited posts that were first posted in 2011 are (with actual ranks for all posts):
1. What Do Data Miners Need to Learn
2. Statistical Rules of Thumb, Part III
3. Read More >>
12/29/2011 | 12:42 AM
Models Behaving Badly
I just read a fascinating book review in the Wall Street Journal, Physics Envy: Models Behaving Badly. The author of the book, Emanuel Derman (former head of Quantitative Analysis at Goldman Sachs), argues that the financial models involved human beings and therefore were inherently brittle: as human behavior changed, the models failed. "In physics you're playing against God, and He doesn't change His laws very often. In finance, you're playing against God's creatures."

I'll agree with Derman that whenever human beings ... Read More >>
11/04/2011 | 06:36 PM
Statistical Rules Of Thumb, part III: Always Visualize the Data
As I perused Statistical Rules of Thumb again, as I do from time to time, I came across this gem. (Note: I live in CA, so I get no money from these Amazon links.)

Van Belle uses the term "Graph" rather than "Visualize", but it is the same idea. The point is to visualize in addition to computing summary statistics. Summaries are useful, but can be deceiving; any time you summarize data you will lose some information unless the distributions are well behaved. The scatterplot, histogram, box ... Read More >>
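The point about summaries being deceiving can be illustrated with a minimal sketch. The two samples below are hypothetical, constructed only so that they share the same mean and standard deviation; neither comes from the book or the post. A crude text histogram stands in for a real plot.

```python
from collections import Counter
from statistics import mean, pstdev

# Hypothetical samples, built so the summary statistics match exactly.
bimodal = [1, 1, 1, 1, 9, 9, 9, 9]    # two clusters, nothing in the middle
peaked = [-3, 5, 5, 5, 5, 5, 5, 13]   # one tight cluster plus two extremes


def text_histogram(values):
    """Return one line per distinct value, with a bar of '#' per occurrence."""
    counts = Counter(values)
    return "\n".join(f"{v:>3} | {'#' * counts[v]}" for v in sorted(counts))


for name, sample in (("bimodal", bimodal), ("peaked", peaked)):
    # Both samples print mean 5 and population standard deviation 4.
    print(f"{name}: mean={mean(sample)}, pstdev={pstdev(sample)}")
    print(text_histogram(sample))
    print()
```

The printed summaries are identical, yet the histograms show two very different shapes, which is the information the summaries lose.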
07/30/2011 | 12:23 AM
Yet another "Wisdom of Crowds" success
I was at the Federal Building in downtown San Diego for a consulting job and met some representatives of a life and disability insurance company who were giving away a big-screen HD TV to the individual who came closest to guessing the number of M&Ms (chocolate and peanut butter filled) in a container. Because they do this often, I won't show the specific container they use.

I offered to make a guess of the total, but only if I could see all of the guesses so far. I was drawing from the Wisdom of Crowds example from Chapter 1 of the book where a set ... Read More >>
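As a rough sketch of the idea of pooling a crowd's guesses into one estimate: the teaser does not say which aggregate was actually used, so both the mean and the median are shown here as assumptions, and the guesses are hypothetical numbers invented for illustration.

```python
from statistics import mean, median


def crowd_estimate(guesses):
    """Combine individual guesses into two candidate crowd estimates."""
    return {"mean": mean(guesses), "median": median(guesses)}


# Hypothetical guesses, not the ones from the contest in the post.
hypothetical_guesses = [250, 400, 520, 610, 700, 3000]

estimate = crowd_estimate(hypothetical_guesses)
print(estimate)
```

With these made-up numbers the single wild guess of 3000 pulls the mean well above the median, which is one reason to look at both before settling on a combined guess.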
06/27/2011 | 08:09 PM
What do Data Miners Need to Learn?
I've been asked by several folks recently what they need to learn to succeed in data mining and predictive analytics. This is a different twist on the question I also get, namely what degree should one get to be a good (albeit "green") data miner. Usually, the latter question gets the answer "it doesn't matter" because I know so many great data miners without a statistics or mathematics degree. Understandably, there are many non-stats/math degrees that have a very strong statistics or mathematics component, such as psychology, political science, and engineering to name a few. But then again, you don't necessarily have to load up on the ... Read More >>
05/06/2011 | 12:27 AM
Number of Hidden Layer Neurons to Use
In the linkedin.com Artificial Neural Networks group, a question arose about how many hidden neurons one should choose. I've never found a fully satisfactory answer to this, but there are quite a lot of guesses and rules of thumb out there.

I've always liked Warren Sarle's neural network FAQ, which includes a discussion of this topic.

There is another reference on the web that I agree with only about 50%, but the references are excellent: http://www.faqs. Read More >>
04/26/2011 | 01:34 AM
Statistical Rules of Thumb, part II
A while back, Will Dwinnell posted on two books, one of which is one of my favorites as well:

Will mentioned a few general topics covered in the book, but I thought I would mention two specific ones that I agree with wholeheartedly.

Read More >>
04/19/2011 | 04:59 PM
Rexer Analytics data mining survey
Rexer Analytics, a data mining consulting firm, is conducting their 5th annual survey of the analytic behaviors, views and preferences of data mining professionals. I urge all of you to respond to the survey and help us all understand better the nature of the data mining and predictive analytics industry. The following text contains their instructions and overview.

If you want to skip the verbiage and just get on with the survey, use code RL3X1 and go here.

Your responses are completely confidential: no information you provide on ... Read More >>
04/11/2011 | 07:36 PM
Predictive Models are not Statistical Models — JT on EDM
This post was first posted on Predictive Models are not Statistical Models — JT on EDM

My friend and colleague James Taylor asked me last week to comment on a question regarding statistics vs. predictive analytics. The bulk of my reply is on James' blog; my full reply is here, re-worked from my initial response to clarify some points further.

I have always loved reading the green "Sage" books, such as Read More >>
03/29/2011 | 01:07 PM
Analyzing the Results of Analysis
Sometimes, the output of analytical tools can be voluminous and complicated. Making sense of it sometimes requires, well, analysis. Following are two examples of applying our tools to their own output.


Model Deployment Verification

From time to time, I have deployed predictive models on a vertical application in the finance industry which is not exactly "user friendly". I have virtually no access to the actual deployment and execution processes, and am largely limited to examining the production-mode output, as implemented on the system in question.

As sometimes happens, the model output ... Read More >>
03/10/2011 | 07:30 PM
Statistics: The Need for Integration
I'd like to revisit an issue we covered here, way back in 2007: Statistics: Why Do So Many Hate It?. Recent comments made to me, both in private conversation ("Statistics? I hated that class in college!") and in print, prompt me to reconsider this issue.

One thing which occurs to me is that many people have a tendency to think of statistics in an isolated way. This world view keeps statistics at bay, as something which is done separately from other business activities, and, importantly, which is done and understood only by ... Read More >>

Health Club Survey Analysis, Part I: Successful application of data mining by Abbott Analytics

University of California, San Diego La Jolla, CA - November 4, 11, & 18, 2011

Predictive Analytics World Conference New York, NY - March 4 - 10, 2012

DM Radio April 5, 2012