Thursday, December 11, 2008

Associative Classification Method

ASSOCIATIVE-CLASSIFICATION METHOD
CMAR (Classification based on Multiple Association Rules) is a classification method adapted from the frequent-pattern (FP-growth) method for generating frequent itemsets. The basic principles of CMAR are explained in this section; for a better understanding of the details, we suggest that the reader start with Section 8.6 as an introduction. The main reasons for including the CMAR methodology in this chapter are its logic-based approach to classification problems and the possibility of comparing its accuracy and efficiency with those of the C4.5 methodology.

Suppose data samples are given with n attributes (A1, A2, …, An). Attributes can be categorical or continuous. For a continuous attribute, we assume that its values are discretized into intervals in the preprocessing phase. Let C = {c1, c2, …, cm} be a finite set of class labels. A training data set T is a set of samples in which each sample has an associated class label from C.

In general, a pattern P = {a1, a2, …, ak} is a set of attribute values for k different attributes (1 ≤ k ≤ n). A sample is said to match the pattern P if it has all the attribute values given in the pattern. For a rule R: P → c, the number of data samples that match pattern P and have class label c is called the support of rule R, denoted sup(R). The ratio of the number of samples that match pattern P and have class label c to the total number of samples that match pattern P is called the confidence of R, denoted conf(R). The associative-classification method (CMAR) consists of two phases:

1. Rule generation, or training, and

2. Classification, or testing.
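Before looking at the two phases, the definitions of sup(R) and conf(R) given above can be made concrete with a small sketch. The toy data set, the attribute names (outlook, windy), and the helper names matches and support_and_confidence are all assumptions made for illustration; they are not part of CMAR itself.

```python
def matches(sample, pattern):
    """A sample matches a pattern if it has every attribute value in the pattern."""
    return all(sample.get(attr) == val for attr, val in pattern.items())


def support_and_confidence(samples, labels, pattern, label):
    """Return (sup(R), conf(R)) for the rule R: pattern -> label."""
    # Class labels of all training samples that match the pattern.
    matching = [lab for s, lab in zip(samples, labels) if matches(s, pattern)]
    # Support: samples that match the pattern AND carry the rule's class label.
    sup = sum(1 for lab in matching if lab == label)
    # Confidence: support divided by the number of samples matching the pattern.
    conf = sup / len(matching) if matching else 0.0
    return sup, conf


# Hypothetical training set: each sample is a dict of attribute -> value.
samples = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "yes"},
    {"outlook": "sunny", "windy": "no"},
]
labels = ["play", "stay", "stay", "play"]

sup, conf = support_and_confidence(samples, labels, {"outlook": "sunny"}, "play")
print(sup, conf)
```

In this toy set, three samples match the pattern {outlook = sunny} and two of them have the label play, so the rule {outlook = sunny} → play has support 2 and confidence 2/3.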

In the first phase, rule generation, CMAR computes the complete set of rules of the form R: P → c such that sup(R) and conf(R) pass the given thresholds. That is, for a given support threshold and confidence threshold, the associative-classification method finds the complete set of class-association rules (CAR) passing both thresholds. In the testing phase, when a new (unclassified) sample arrives, the classifier, represented by a set of association rules, selects the rule that matches the sample and has the highest confidence, and uses it to predict the class of the new sample.
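The two phases can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it enumerates candidate patterns by brute force rather than with the FP-growth-based mining that CMAR adapts, the toy data and the names generate_rules and classify are hypothetical, and breaking confidence ties by higher support is an assumption not stated in the text.

```python
from itertools import combinations


def generate_rules(samples, labels, min_sup, min_conf):
    """Phase 1 (brute force, NOT FP-growth): find all rules P -> c whose
    support and confidence pass the thresholds."""
    attrs = sorted({a for s in samples for a in s})
    rules, seen = [], set()
    for k in range(1, len(attrs) + 1):
        for subset in combinations(attrs, k):
            for s in samples:
                if not all(a in s for a in subset):
                    continue
                # Candidate pattern: value combination observed in a sample.
                pattern = tuple((a, s[a]) for a in subset)
                if pattern in seen:
                    continue
                seen.add(pattern)
                matching = [lab for t, lab in zip(samples, labels)
                            if all(t.get(a) == v for a, v in pattern)]
                for c in sorted(set(matching)):
                    sup = matching.count(c)
                    conf = sup / len(matching)
                    if sup >= min_sup and conf >= min_conf:
                        rules.append((dict(pattern), c, sup, conf))
    return rules


def classify(rules, sample, default=None):
    """Phase 2: pick the matching rule with the highest confidence.
    Ties are broken by higher support (an assumption of this sketch)."""
    candidates = [r for r in rules
                  if all(sample.get(a) == v for a, v in r[0].items())]
    if not candidates:
        return default
    best = max(candidates, key=lambda r: (r[3], r[2]))
    return best[1]


# Hypothetical training set, for illustration only.
train = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "yes"},
    {"outlook": "sunny", "windy": "no"},
]
train_labels = ["play", "stay", "stay", "play"]

rules = generate_rules(train, train_labels, min_sup=2, min_conf=0.6)
prediction = classify(rules, {"outlook": "sunny", "windy": "yes"})
print(prediction)
```

For the new sample {outlook = sunny, windy = yes}, two rules match: {outlook = sunny} → play with confidence 2/3 and {windy = yes} → stay with confidence 1, so the sketch predicts stay. A sample that matches no rule falls back to the default argument.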
