Tuesday, December 16, 2008

Apriori Algorithm

ALGORITHM APRIORI
The algorithm Apriori computes the frequent itemsets in the database through several iterations. Iterations i computes all frequent i-itemsets (itemsets with i elements). Each iteration has two steps: candidate generation and candidate counting and selection.

In the first phase of the first iteration, the generated set of candidate itemsets contains all I-itemsets (i.e, all items in the database). In the counting phase, the algorithm counts their support searching again through the whole database. Finally, only I-itemsets (items) with s above required threshold will be selected as frequent. Thus, after the first iteration, all frequent I-itemsets will be known.

What are the itemsets generated in the second iteration? In other words, how does one generate 2-itemset candidates? Basically, all pairs of items are candidates. Based on knowledge about infrequent itemsets obtained from previous iterations, the Apriori algorithm reduces the set of candidate itemsets by pruning -a priori-those candidate itemsets that cannot be frequent. The pruning is based on the observation that if an itemset is frequent all its subsets could be frequent as well. Therefore, before entering the candidate-counting step, the algorithm discards every candidate itemset that has an infrequent subset.

Assume that the minimum support s= 50% so an itemset is frequent if it is contained in at least 50% of the transactions-in our example, in two out of every four transactions in the database. In each iteration, the Apriori algorithm constructs a candidate set of large itemsets, counts the number of occurrences of each candidate, and then determines large itemsets based on the predetermined minimum support s = 50%.

No comments: