ASSOCIATIVE-CLASSIFICATION METHOD
CMAR (Classification based on Multiple Association Rules) is a classification method adapted from the frequent-pattern growth (FP-growth) method for generating frequent itemsets. Although the basic principles of CMAR are explained in this section, we suggest that the reader start with Section 8.6 as an introduction for a better understanding of the details. The main reasons we include the CMAR methodology in this chapter are its logic-based approach to classification problems and the possibility of comparing its accuracy and efficiency with those of the C4.5 methodology.
Suppose data samples are given with n attributes (A1, A2, …, An). Attributes can be categorical or continuous. For a continuous attribute, we assume that its values are discretized into intervals in the preprocessing phase. A training data set T is a set of samples such that for each sample there exists a class label associated with it. Let C = {c1, c2, …, cm} be a finite set of class labels.
In general, a pattern P = {a1, a2, …, ak} is a set of attribute values for different attributes (1 ≤ k ≤ n). A sample is said to match the pattern P if it has all the attribute values given in the pattern. For a rule R: P → c, the number of data samples matching pattern P and having class label c is called the support of rule R, denoted sup(R). The ratio of the number of samples matching pattern P and having class label c to the total number of samples matching pattern P is called the confidence of R, denoted conf(R). The associative-classification method (CMAR) consists of two phases:
1. Rule generation, or training, and
2. Classification, or testing.
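The definitions of matching, sup(R), and conf(R) given above can be sketched in a few lines of Python. This is only an illustration: the function names, the dictionary representation of a sample, and the toy data set are assumptions made for the example, not part of CMAR itself.

```python
from typing import Dict, List, Tuple

# Assumed representation: a sample is (attribute -> value, class label).
Sample = Tuple[Dict[str, str], str]


def matches(attrs: Dict[str, str], pattern: Dict[str, str]) -> bool:
    """A sample matches pattern P if it has every attribute value in P."""
    return all(attrs.get(a) == v for a, v in pattern.items())


def support(data: List[Sample], pattern: Dict[str, str], label: str) -> int:
    """sup(R): number of samples matching P that also carry class label c."""
    return sum(1 for attrs, c in data if c == label and matches(attrs, pattern))


def confidence(data: List[Sample], pattern: Dict[str, str], label: str) -> float:
    """conf(R): sup(R) divided by the number of samples matching P."""
    n_match = sum(1 for attrs, _ in data if matches(attrs, pattern))
    # Convention chosen for this sketch: a pattern no sample matches gets 0.0.
    return support(data, pattern, label) / n_match if n_match else 0.0


# Hypothetical toy training set T, invented for illustration.
TOY: List[Sample] = [
    ({"outlook": "sunny", "windy": "no"}, "play"),
    ({"outlook": "sunny", "windy": "yes"}, "stay"),
    ({"outlook": "sunny", "windy": "no"}, "play"),
    ({"outlook": "rainy", "windy": "no"}, "stay"),
]

rule_sup = support(TOY, {"outlook": "sunny"}, "play")
rule_conf = confidence(TOY, {"outlook": "sunny"}, "play")
```

For the rule {outlook = sunny} → play, three samples match the pattern and two of them have the label play, so the support is 2 and the confidence is 2/3.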
In the first phase, rule generation, CMAR computes the complete set of rules of the form R: P → c such that sup(R) and conf(R) pass the given thresholds. That is, for a given support threshold and confidence threshold, the associative-classification method finds the complete set of class-association rules (CAR) passing the thresholds. In the testing phase, when a new (unclassified) sample arrives, the classifier, represented by a set of association rules, selects the rule that matches the sample and has the highest confidence, and uses it to predict the class of the new sample.
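The two phases can be sketched as follows. Note the assumptions: this sketch enumerates patterns by brute force rather than with the FP-growth-based mining that CMAR adapts, so it is only practical for tiny data sets; the function names, the toy data, the threshold values, and the tie-breaking by support are all invented for the illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy training set: (attribute -> value, class label).
TOY = [
    ({"outlook": "sunny", "windy": "no"}, "play"),
    ({"outlook": "sunny", "windy": "yes"}, "stay"),
    ({"outlook": "sunny", "windy": "no"}, "play"),
    ({"outlook": "rainy", "windy": "no"}, "stay"),
]


def generate_rules(data, min_sup, min_conf):
    """Phase 1 (brute force): keep every rule P -> c passing both thresholds."""
    pattern_counts = Counter()  # pattern -> number of samples matching it
    rule_counts = Counter()     # (pattern, label) -> sup(R)
    for attrs, label in data:
        items = sorted(attrs.items())
        # Every non-empty subset of a sample's attribute values is a pattern
        # that the sample matches.
        for k in range(1, len(items) + 1):
            for combo in combinations(items, k):
                pattern_counts[combo] += 1
                rule_counts[(combo, label)] += 1
    rules = []
    for (pattern, label), sup in rule_counts.items():
        conf = sup / pattern_counts[pattern]
        if sup >= min_sup and conf >= min_conf:
            rules.append((dict(pattern), label, sup, conf))
    return rules


def classify(rules, attrs, default=None):
    """Phase 2: predict with the matching rule of highest confidence."""
    matching = [r for r in rules
                if all(attrs.get(a) == v for a, v in r[0].items())]
    if not matching:
        return default  # no rule applies to this sample
    # Assumption of this sketch: ties in confidence are broken by support.
    best = max(matching, key=lambda r: (r[3], r[2]))
    return best[1]


rules = generate_rules(TOY, min_sup=2, min_conf=0.6)
prediction = classify(rules, {"outlook": "sunny", "windy": "no"})
```

With these thresholds, three rules survive, all predicting play: {outlook = sunny}, {windy = no}, and {outlook = sunny, windy = no}. A sample that matches none of them falls through to the `default` value, which a practical classifier would typically set to some fallback class.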
Thursday, December 11, 2008