COPYING THIS IS UNLAWFUL WITHOUT PERMISSION FROM AUTHOR
Data Mining
INTRODUCTION
CONTENTS OF A DATABASE
HOW DATA IS GATHERED
HISTORICAL UTILIZATION OF CUSTOMER INFORMATION
STATISTICAL SOFTWARE PACKAGES
DATA MINING TECHNOLOGY
HOW DATA MINING WORKS
Association Rule
Apriori Algorithms
Distributed/Parallel Algorithms
Sequential Rule
AprioriAll and AprioriSome Algorithms
Classification or Clustering Rule
ID3 Algorithm
C4.5 Algorithm
SLIQ Algorithm
Other Types of Algorithms
REAL-LIFE EXAMPLES
CONCLUSION
It does not make sense to
spend your marketing dollars on the entire population when you can target only
the prime customer candidates and market the product to them through
segmentation. By reducing wasted money
spent chasing after people who are unlikely to purchase your product, you will
maximize your return on the marketing investment.
The key to segmentation is gathering and utilizing information about your customers and/or prospects effectively. This information is collected and stored in a database, several databases with the ability to interact, or a data warehouse. A “data warehouse is a central repository for all or significant parts of the data that an enterprise's various business systems collect” (www.whatis.com). “Data warehousing emphasizes the capture of data from diverse sources for useful analysis and access, but does not generally start from the point-of-view of the end user or knowledge worker who may need access to specialized, sometimes local databases” (www.whatis.com). Many companies do not have the database capability to maximize segmentation and are reluctant to place the appropriate emphasis on, and investment in, database technology.
Simple databases include information such as first name, last name, salutation, department, title, company, address line 1, address line 2, city, state, country, and zip code. In a more sophisticated database, other fields would be stored: gender, income, age, interests, products purchased, purchase dates, and the marketing strategy or medium that initially brought about the response. For a very thorough analysis, customer attributes can be collected, such as demographics (address, income, etc.), psychographic information (personality types), technographics (the type of system used, if a technical interface is involved), product characteristics, buyer or visitor statistics (purchase history, clickstream information), and permissions (the media or options that the person opted in to).

Companies gather data, online or offline, through forms, surveys, focus groups, and other marketing research techniques. Questions should be presented in a way that allows the answers to be represented quantitatively. Results are entered, directly or indirectly, into the database, databases, or data warehouse. The following illustrations provide an example: the form that users must complete when signing up for a free Yahoo email account.

Generally, marketers would
use the information stored in the database (or databases) to run queries based
on assumptions about purchase behavior.
For example, if you are marketing Jaguars, you may have a suspicion that
your customers earn upwards of $100,000 per year. This assumption may be based on a hunch or a logical prediction
of the target market. It may even be
based on an examination of the customer records in your database.
Oftentimes, entire marketing plans were (and in many cases still are) built on human hunches and initial predictions. Wouldn’t it be useful to draw relationships first through data analysis for better accuracy and add the human interpretation later? What if the information is too complex for humans to discover the patterns?
Some tools used to assist in examining or reporting on data are online analytical processing (OLAP) systems and statistical packages. This technology “gives users access to analytical content
such as time series and trend analysis views and summary-level information as
well as insight into data organized into multiple dimensions”
(Carickhoff). Many applications of OLAP
and statistical packages are basically extensions of database or data warehouse
capability. OLAP and statistical software generally provide users with a graphical user interface (GUI) that simplifies the complicated input required to run an analysis. These systems “rely on you to discover patterns and decide what to do with them” (Greening). While this technology has been at the forefront of the decision support industry, it is only a tool in the knowledge discovery process and cannot itself perform the knowledge discovery.
OLAP has been useful not only in the analysis of relational databases, but particularly with regard to multidimensional databases. Another advantage of OLAP is that many vendors have been able to provide web browser access to their OLAP engines, often referred to as Web OLAP or WOLAP.
For true knowledge
discovery, a data-mining tool should be utilized. Data mining tools discover relationships or patterns among data
and can report or act on those findings.
“Data mining is data-driven, not user-driven or verification-driven”
(Gilman). With this technology, the marketer examines relationships found in the data itself, whereas before she would develop theories based on hunches about potential customers (as in our example of marketing Jaguars). If the marketer for Jaguar had been using data-mining technology, the system might have supported her theory that customers earn $100,000 or more. But the technology might also have discovered that customers are generally under the age of 45. This bit of information would be very useful in determining whom to target and how best to relay the message (i.e., which media to use and how the message should look, sound, and feel). The marketer may not have been looking for this relationship among customers, so the pattern might have been overlooked had the data not been analyzed by a data-driven system.
Data mining works by
utilizing algorithms to search the database for hidden patterns. The technology was first developed to help scientists
make sense of experimental data, but was quickly applied to business
applications. “Data-mining tools can
sift through immense collections of customer, marketing, production, financial
data, and statistical and artificial intelligence techniques, identify what’s
worth noting and what’s not” (Verity).
Data mining has three components: (1) associations (one event is correlated with another), (2) sequences (one event leads to another), and (3) classification (pattern recognition that leads to data reorganization) or clustering (finding and visualizing previously unknown groups of facts). We can use the results from these components as they are, or use them to forecast, since the uncovered patterns lead to educated predictions about the future.
Technical designers have
tried various methods of programming to best attain the desired results. Machine learning algorithms have had the
widest use and have thus far been the most successful. I’ll discuss machine learning algorithms in
depth, following the brief explanation of these other techniques:
· Statistical techniques, as implemented in packages such as SAS and SPSS, have been widely used to detect unusual patterns and to explain patterns using linear models.
· Genetic algorithms are “optimization techniques that use processes such as genetic combination, mutation, and natural selection in a design based on the concepts of natural evolution” (Joshi).
· The nearest neighbor method classifies each record based on the classes of the records most similar to it.
· Rule induction extracts sets of if/then rules from the data based on statistical significance.
· Data visualization provides a “visual interpretation of complex relationships in multidimensional data” (Joshi).
· Neural networks, a relatively new data mining tool, are a form of artificial intelligence “modeled on the logical associations made by the human brain” (DiCarlo). The network is trained to recognize parameters set by administrators, which are based on mathematical models that accumulate data. Once the network recognizes these parameters, it makes an evaluation, reaches a conclusion, and takes action (predetermined and set by administrators). Neural networks have been particularly successful in applications that involve classification.
The following figure (Figure
2) provides an overview of the data mining process.

In the next portion of this paper, I will discuss the types of machine learning algorithms that are successful for each of the three components: association, sequential, and classification or clustering.
Association rules scour the database to find associations between items that satisfy user-specified minimum support and minimum confidence constraints. An association rule takes the form ‘A ⇒ B’ (if A, then B), where ‘A’ and ‘B’ are sets of items. The rule means that transactions in the database that contain ‘A’ tend to contain ‘B’. For example, 60% of customers at a fast food chain who order hamburgers also purchase soda, and 25% of all transactions in the database contain both of these items. In this case, 60% is the confidence level and 25% is the support level of the rule.
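To make the two measures concrete, here is a minimal Python sketch. The function names and the twelve sample orders are my own assumptions, invented only so that the numbers match the hamburger and soda example; they do not come from any real database.

```python
def support(transactions, items):
    """Share of all transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def confidence(transactions, a, b):
    """Share of the transactions containing A that also contain B."""
    return support(transactions, set(a) | set(b)) / support(transactions, a)


# Hypothetical fast food orders, invented to mirror the figures in the text.
orders = (
    [{"hamburger", "soda"}] * 3   # hamburger and soda together
    + [{"hamburger"}] * 2         # hamburger without soda
    + [{"salad"}] * 7             # neither item
)

print(round(support(orders, {"hamburger", "soda"}), 2))       # 0.25
print(round(confidence(orders, {"hamburger"}, {"soda"}), 2))  # 0.6
```

Support is measured against every transaction, while confidence is measured only against the transactions that already contain the left-hand side of the rule, which is why the two percentages differ.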
Apriori algorithms are a
type of association rule algorithm that was developed by IBM’s Quest project
team for use on large transaction databases.
They begin by finding all combinations of items that meet the minimum support requirement and then determine whether each candidate rule holds. For a rule of the form ‘if AB, then CD’, this means computing the ratio
r = support(ABCD) / support(AB).
The rule is kept only if r meets the minimum confidence requirement.
Apriori algorithms reach the final result by passing over the database multiple times, gathering frequency information first and then joining frequent combinations into larger candidates. They use a tree data structure (a hash tree, in the original Apriori formulation) to hold the counts of potential candidates. Decision trees are considered among the best models for displaying results for a number of reasons: they are inexpensive to construct, easy to interpret, easy to integrate, and return accuracy comparable to or better than that of other models. An illustration of a decision tree follows in Figure 3.
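The passes described above can be sketched in a few lines of Python. This is a simplified illustration under my own assumptions, not IBM's implementation: candidate counts are kept in a plain dictionary rather than a tree, and the function names and sample baskets are hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise search: one pass over the data per itemset size."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    candidates = {frozenset([i]) for t in transactions for i in t}
    frequent = {}  # itemset -> support
    k = 1
    while candidates:
        # One pass over the database: count every candidate of size k.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        # Join frequent k-itemsets into candidates of size k + 1 ...
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # ... and prune any candidate that has an infrequent subset.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def rules(frequent, min_confidence):
    """Split each frequent itemset into 'if lhs, then rhs' rules."""
    found = []
    for itemset, sup in frequent.items():
        for size in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, size)):
                r = sup / frequent[lhs]  # support(whole) / support(lhs)
                if r >= min_confidence:
                    found.append((lhs, itemset - lhs, r))
    return found


# Hypothetical market baskets, for illustration only.
baskets = [
    {"burger", "soda", "fries"},
    {"burger", "soda"},
    {"burger", "fries"},
    {"soda"},
    {"burger", "soda", "fries"},
]
freq = frequent_itemsets(baskets, min_support=0.5)
for lhs, rhs, r in rules(freq, min_confidence=0.7):
    print(sorted(lhs), "=>", sorted(rhs), round(r, 2))
```

The pruning step is what keeps the number of passes manageable: an itemset can only be frequent if every one of its subsets is frequent, so most large combinations never need to be counted.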

Distributed/Parallel algorithms were developed in order to mine data in less time. This approach distributes the processing across multiple sites so as to generate results quickly. Results are obtained in a manner similar to Apriori; however, the number of messages passed between sites is reduced by exploring relationships between large sets of items and by using pruning techniques to discard useless candidates at individual processing sites.
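One way to picture the pruning at individual sites is the sketch below. It simulates the sites inside a single Python process, and the function name, the sample partitions, and the specific pruning rule (a site reports only the candidates that are frequent in its own data) are my assumptions for illustration rather than a description of any particular published algorithm.

```python
def distributed_frequent(partitions, candidates, min_support):
    """Find globally frequent candidates while limiting what each site reports."""
    # Each site counts the candidates against its own transactions only.
    per_site = [
        {c: sum(1 for t in part if c <= t) for c in candidates}
        for part in partitions
    ]
    # Local pruning: a site reports a candidate only if it is frequent locally.
    # A candidate that is frequent overall must be frequent at some site,
    # so nothing globally frequent is lost.
    reported = set()
    for part, counts in zip(partitions, per_site):
        reported |= {c for c in candidates if counts[c] >= min_support * len(part)}
    # Only the reported candidates are polled across all sites.
    total = sum(len(part) for part in partitions)
    result = {}
    for c in reported:
        global_count = sum(counts[c] for counts in per_site)
        if global_count >= min_support * total:
            result[c] = global_count / total
    return result


# Hypothetical transactions held at two sites.
site_a = [{"a", "b"}, {"a"}, {"a", "c"}]
site_b = [{"b"}, {"b", "c"}, {"a", "b"}]
items = [frozenset({"a"}), frozenset({"b"}), frozenset({"c"})]

result = distributed_frequent([site_a, site_b], items, min_support=0.5)
print(sorted("".join(sorted(c)) for c in result))  # ['a', 'b']
```

In this example item "c" is never exchanged between the sites at all, because neither site finds it frequent in its own data.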
The discovery of sequential rules was motivated by efforts to improve customer satisfaction and by opportunities to cross-sell products, but the results of sequential analysis can be applied to many fields both inside and outside of business. In order to analyze sequential patterns, the input data must be organizable into sequences. A sequence is an ordered list of transactions or items, and each item may have a transaction time associated with it. Sequential rules scour the database to find all of the sequential patterns that meet the minimum level of support (the percentage of data sequences that contain the pattern) specified by the user. For example, 45% of a training company’s customers took its introductory course on word processing and later enrolled in its more advanced course on Microsoft Windows. In this scenario, 45% is the support level.
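The support of a sequential pattern can be computed with a short Python sketch. The function names and the four customer histories are hypothetical; the point is only that the order of the transactions matters, not just their contents.

```python
def contains(sequence, pattern):
    """True if the pattern's itemsets appear, in order, within the sequence."""
    pos = 0
    for wanted in pattern:
        # Advance to the next transaction that includes this step of the pattern.
        while pos < len(sequence) and not set(wanted) <= set(sequence[pos]):
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1  # later steps must match strictly later transactions
    return True


def sequence_support(customers, pattern):
    """Share of customer sequences that contain the pattern."""
    return sum(contains(s, pattern) for s in customers.values()) / len(customers)


# Hypothetical enrollment histories, each ordered by transaction time.
customers = {
    "c1": [{"intro"}, {"advanced"}],
    "c2": [{"advanced"}, {"intro"}],                   # wrong order
    "c3": [{"intro"}, {"spreadsheets"}, {"advanced"}],
    "c4": [{"intro"}],
}

print(sequence_support(customers, [{"intro"}, {"advanced"}]))  # 0.5
```

Customer c2 bought both courses but in the opposite order, so that history does not count toward the pattern's support.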
Sequential analysis begins with a sorting phase, in which items or transactions are concatenated to form sequences so that sequential algorithms can be run to discover the underlying patterns. The next step identifies all large itemsets (those that meet the minimum support). The customer sequences in the database are then transformed in terms of these large itemsets. Records that contain none of them are dropped from the newly transformed database, but are still counted in the total number of records. The large sequences are then found by combing the data multiple times using one of two types of algorithms, count-all and count-some, and the final phase determines the maximal sequences among them. The familiar Apriori algorithm can be adapted to sequential rules in either style (called AprioriAll and AprioriSome, respectively).
The count-all approach counts all large sequences, and the result must later be pruned to remove the non-maximal ones. The count-some approach counts the longer sequences first, since many shorter sequences are contained in the longer ones, so as to limit the count to the maximal sequences and leave little or no pruning to do.
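The pruning that follows a count-all pass can be sketched as below. As a simplifying assumption, each sequence here is a tuple of single items rather than a list of itemsets, and the function names and sample sequences are hypothetical.

```python
def is_subsequence(short, long):
    """True if the items of `short` appear in `long` in the same order."""
    remaining = iter(long)
    # `item in remaining` consumes the iterator, which enforces the ordering.
    return all(item in remaining for item in short)


def maximal(sequences):
    """Keep only the sequences that are not contained in another sequence."""
    return [
        s for s in sequences
        if not any(s != other and is_subsequence(s, other) for other in sequences)
    ]


# Hypothetical large sequences produced by a count-all pass.
large = [("a",), ("b",), ("a", "b"), ("a", "c"), ("a", "b", "c"), ("d",)]
print(maximal(large))  # [('a', 'b', 'c'), ('d',)]
```

Four of the six counted sequences are thrown away here, which illustrates the wasted effort that the count-some approach tries to avoid.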
Classification or clustering rules attempt to create decision trees that label and shelve data into categories. A company might “build a classification model to predict who is likely to purchase identified products or services” (Liu and Yap). Or the company may “build a classification model to predict the likelihood of buying a product based on those customers that have been identified from association rules only” (Liu and Yap). Classification analysis has been of particular interest to direct marketers for its ability to determine who would be best to target and then to select those individuals for the campaign. Classification can also predict customer attrition (churn), that is, a customer’s loyalty and/or likelihood of switching to a competitor.
Classification has also been highly useful in detecting fraud at banks and credit card companies. By learning from transactions labeled ‘honest’ or ‘fraud’, together with purchase and payment history, the classification algorithm can detect fraud by monitoring transactions on the account.
Classification, or clustering, works by finding groups in which data points are more similar to one another than to the data points in other groups. One of the earliest and most widely used classification algorithms is Hunt’s method, and many later algorithms were written based on its principles. Hunt’s method constructs decision trees using tests that partition the data by class distribution, with each test chosen by a calculation based on either information theory (used in the ID3 and C4.5 algorithms) or the Gini index (used in the SLIQ algorithm). Information theory (INFO) tends to result in many smaller clusters, whereas Gini tends to lump more data together in fewer large groups.
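Both calculations measure how mixed the classes are within a group of records. The following Python sketch uses the standard definitions of entropy (the information theory measure) and the Gini index; the function names and sample labels are my own.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Information-theory impurity: 0 for a pure group, 1 for a 50/50 split."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini index: 0 for a pure group, 0.5 for a 50/50 two-class split."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())


# Hypothetical class labels for two groups of customers.
mixed = ["buy", "no", "buy", "no"]
pure = ["buy", "buy", "buy", "buy"]

print(entropy(mixed), gini(mixed))  # 1.0 0.5
print(gini(pure))                   # 0.0
```

A tree-building algorithm evaluates each possible test by how far it lowers one of these measures in the groups it produces.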
The ID3 algorithm builds decision trees by testing the values of the properties of objects in the database. The tree is built top-down: a property is tested at each node, and the results of the test partition the data, until each leaf node contains homogeneous data.
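A compact Python sketch of this top-down construction follows. It is a simplified illustration under my own assumptions: properties are categorical, the property chosen at each node is the one with the highest information gain, and the four customer records are invented.

```python
from collections import Counter
from math import log2


def entropy(rows, target):
    n = len(rows)
    counts = Counter(r[target] for r in rows)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def information_gain(rows, attr, target):
    """Drop in entropy obtained by partitioning the rows on `attr`."""
    n = len(rows)
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r for r in rows if r[attr] == value]
        remainder += len(subset) / n * entropy(subset, target)
    return entropy(rows, target) - remainder


def id3(rows, attrs, target):
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1:   # homogeneous data: make a leaf
        return labels[0]
    if not attrs:               # nothing left to test: majority class
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: information_gain(rows, a, target))
    rest = [a for a in attrs if a != best]
    return {
        best: {
            value: id3([r for r in rows if r[best] == value], rest, target)
            for value in {r[best] for r in rows}
        }
    }


# Hypothetical customer records, for illustration only.
records = [
    {"income": "high", "age": "under45", "bought": "yes"},
    {"income": "high", "age": "over45", "bought": "yes"},
    {"income": "low", "age": "under45", "bought": "no"},
    {"income": "low", "age": "over45", "bought": "no"},
]

tree = id3(records, ["income", "age"], "bought")
print(tree)
```

In this data income separates buyers from non-buyers perfectly, so the tree tests income at the root and never needs to test age.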
The C4.5 algorithm is also based on the principles of Hunt’s method. It attempts to accomplish the same thing as ID3 and grows the tree depth-first: at each node it considers all possible tests and begins with the one that will provide the most information gain.
Supervised Learning in Quest (SLIQ) generates a decision tree in a breadth-first fashion. The values of each attribute are pre-sorted once, and a separate class list keeps track of each record’s class, so the attribute lists do not have to be split at every node. This method, though cost-effective, consumes an excessive amount of memory. To remove this limitation, IBM developed a new version of the algorithm without the memory restriction, which they called ‘SPRINT’ (Scalable Parallelizable Induction of Decision Trees). SPRINT works similarly to SLIQ, but much like distributed/parallel association algorithms, SPRINT distributes the processing across multiple sites so as to generate results quickly.
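The benefit of pre-sorting can be seen in a small Python sketch that finds the best Gini split on one numeric attribute. It illustrates the general idea only and is not the SLIQ implementation; the function names, the midpoint threshold, and the income figures are my assumptions.

```python
from collections import Counter


def gini(counts, n):
    """Gini index of a group described by its class counts."""
    return 1 - sum((c / n) ** 2 for c in counts.values())


def best_split(values, labels):
    """Sort once, then sweep the sorted list keeping running class counts."""
    pairs = sorted(zip(values, labels))  # the one-time pre-sort
    n = len(pairs)
    left = Counter()
    right = Counter(label for _, label in pairs)
    best_score, best_threshold = float("inf"), None
    for i in range(n - 1):
        value, label = pairs[i]
        left[label] += 1    # move one record from the right group to the left
        right[label] -= 1
        next_value = pairs[i + 1][0]
        if value == next_value:
            continue        # cannot split between two equal values
        n_left = i + 1
        score = (n_left / n) * gini(left, n_left) \
            + ((n - n_left) / n) * gini(right, n - n_left)
        if score < best_score:
            best_score, best_threshold = score, (value + next_value) / 2
    return best_score, best_threshold


# Hypothetical incomes (in thousands of dollars) and purchase outcomes.
incomes = [120, 30, 150, 60, 45, 110]
bought = ["yes", "no", "yes", "no", "no", "yes"]

print(best_split(incomes, bought))  # (0.0, 85.0)
```

Because the list is sorted only once, every possible split point is scored in a single sweep instead of re-examining the records for each candidate threshold.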
Other types of machine learning algorithms include: Nearest-Neighbor, Naïve-Bayes, OODG (Oblivious Read-Once Decision Graph), and Lazy Decision Trees. Also, hybrids are often created and/or used in practice so they can be tailored to serve the needs of the users. As I mentioned earlier, though machine learning algorithms have had the widest use thus far, many other methods are used as well.
There are many real-life data mining success stories. The phone company U S West Inc. wanted to pinpoint customers who would install second phone lines and keep them long enough for the carrier to make a profit. The company designed a data mining program called “PALMS”, which runs on a powerful NCR parallel-processing computer and was ‘told’ to provide a statistical model of the ideal prospect. Using this information, marketers launched a campaign targeting the clusters of prospects that fit the profile, which were also identified in the database through the use of PALMS. The marketers chose to relay their message through several direct mail campaigns, “which ran from November 4 to early January. U S West has enjoyed a response rate equal to that of a broadcast campaign costing ‘several million dollars’ more” (Verity).
Wal-Mart has been collecting transaction data through its cash registers since the early 1980s. The company was “faced with a mind-boggling 700 million potential forecasts to calculate – one for each item in 2,700 stores” (Verity) but was unable to use all of the data, until the introduction of data mining. Recently, Wal-Mart has taken advantage of knowledge discovery software to predict demand for individual items in specific stores, and to work on improving accuracy on their market-basket analysis (examining the combinations of items that customers purchase together).
I’m sure we all remember the 1997 chess match between human champion Garry Kasparov and IBM’s Deep Blue supercomputer. The implications of that chess match ignited serious controversy over artificial intelligence. Data mining draws on artificial intelligence techniques and has changed many facets of business, particularly marketing. The chess match may have left many of us sympathizing with Garry and forming negative opinions about AI, but as we’ve seen throughout this paper, the positive outcomes cannot be ignored. Of course, I wouldn’t suggest ignoring human predictions and intuition altogether. “It’s a good idea to combine analytical results with business intuition” (Liu and Yap). There should always be a balance between data-driven and user-driven analysis.
Relationships in data that would otherwise be overlooked can be exploited through data mining technology, saving considerable time and money. Today, many companies have a significant edge over the competition simply because of the investment they have made in knowledge discovery. As the saying goes, ‘knowledge is power’.
Carickhoff, Rich. “A New Face for OLAP”. DBMS Magazine, January 1997.
DiCarlo, Lisa. “The Rebirth of Artificial Intelligence”. Forbes Magazine, May 16, 2000.
Gilman, Michael, PhD. “Data Mining Overview”. The Direct Marketing Association White Papers, 2000.
Greening, Dan R. “Data Mining on the Web – There’s Gold in that Mountain of Data”. DBMS Magazine, 2000.
Joshi, Karuna Pande. “Analysis of Data Mining Algorithms”. 1997.
Liu, Shiping and Jeremy Yap. “Beyond Intuition”. DB2 Magazine, Quarter 4, 2001.
Verity, John W. “Coaxing Meaning Out of Raw Data – Software can now Find Patterns Never Seen Before”. Business Week, May 1997.