Wednesday, December 10, 2008

Decision Tree

Decision tree

A decision tree is a classification technique with a tree-like flowchart structure (Han, 2001). A decision tree is built from two kinds of nodes: internal nodes and leaves. Each internal node represents a test on an attribute, and each branch leaving a node corresponds to one possible outcome of that test. Each leaf represents a class value (Kantardzic, 2003).
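The structure described above can be sketched in Python. This is a minimal illustration, not code from the cited sources: the classes `Node` and `Leaf`, the function `classify`, and the weather-style example tree are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Union


@dataclass
class Leaf:
    # A leaf holds a class value.
    label: str


@dataclass
class Node:
    # An internal node tests one attribute; each branch is keyed by a test outcome.
    attribute: str
    branches: Dict[str, Union["Node", Leaf]] = field(default_factory=dict)


def classify(tree, record):
    """Follow the branch matching each test outcome until a leaf is reached."""
    while isinstance(tree, Node):
        outcome = record[tree.attribute]
        tree = tree.branches[outcome]
    return tree.label


# Hypothetical example tree: test "outlook" first, then "windy" under "rain".
tree = Node("outlook", {
    "sunny": Leaf("no"),
    "overcast": Leaf("yes"),
    "rain": Node("windy", {"true": Leaf("no"), "false": Leaf("yes")}),
})

print(classify(tree, {"outlook": "rain", "windy": "false"}))  # yes
```

Classifying a record is a walk from the root to a leaf, so its cost grows with the depth of the tree, not with the number of training records.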

A decision tree handles two kinds of attributes:

  1. Numeric or continuous: the domain has infinitely many values, represented as real numbers. Examples: age, salary.
  2. Nominal or categorical: the domain has a finite set of values. Examples: job, status.
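The two attribute kinds usually lead to different node tests. A minimal Python sketch, in which the `split` helper, its `threshold` parameter, and the sample records are hypothetical: a numeric attribute gets a binary test against a threshold, and a nominal attribute gets one branch per value. This is one common choice, not the only one.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def split(records: List[dict], attribute: str,
          threshold: Optional[float] = None) -> Dict[str, List[dict]]:
    """Partition records by the outcome of a test on `attribute`.

    Numeric attribute (threshold given): binary test value <= threshold.
    Nominal attribute (no threshold): one branch per distinct value.
    """
    groups: Dict[str, List[dict]] = defaultdict(list)
    for r in records:
        v = r[attribute]
        if threshold is not None:
            key = f"<= {threshold}" if v <= threshold else f"> {threshold}"
        else:
            key = str(v)
        groups[key].append(r)
    return dict(groups)


people = [
    {"age": 25, "job": "student"},
    {"age": 41, "job": "teacher"},
    {"age": 33, "job": "student"},
]
print(sorted(split(people, "age", threshold=30)))        # ['<= 30', '> 30']
print({k: len(v) for k, v in split(people, "job").items()})  # {'student': 2, 'teacher': 1}
```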

Missing Data

Missing data (missing values) are a common problem in data processing, because collected data do not always have complete values. In a large dataset, a few missing values may not noticeably affect the result of data processing. However, if many values are missing, they can influence the result.
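Whether missing values matter depends on how many there are, so it helps to measure them first. A minimal sketch, assuming records are Python dicts and `None` marks a missing value; the function `missing_summary` and the sample data are hypothetical.

```python
from typing import Dict, List, Optional


def missing_summary(records: List[Dict[str, Optional[object]]]):
    """Return per-attribute missing counts and the fraction of incomplete records.

    A value of None is treated as missing (an assumption for this sketch).
    """
    counts: Dict[str, int] = {}
    incomplete = 0
    for r in records:
        has_missing = False
        for attr, value in r.items():
            if value is None:
                counts[attr] = counts.get(attr, 0) + 1
                has_missing = True
        incomplete += has_missing  # bool counts as 0 or 1
    fraction = incomplete / len(records) if records else 0.0
    return counts, fraction


data = [
    {"age": 25, "job": "student"},
    {"age": None, "job": "teacher"},
    {"age": 33, "job": None},
    {"age": 47, "job": "clerk"},
]
counts, fraction = missing_summary(data)
print(counts, fraction)  # {'age': 1, 'job': 1} 0.5
```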

In general, missing values can be handled with the following methods:

1. Delete all records that have missing values.

2. Create a new algorithm, or modify an existing one, so that it handles missing data (Kantardzic, 2003).
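The first method is straightforward to sketch. A minimal example, assuming records are Python dicts with `None` for a missing value; `drop_incomplete` and the sample data are hypothetical.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[object]]


def drop_incomplete(records: List[Record]) -> List[Record]:
    # Method 1: keep only records in which every attribute has a value.
    return [r for r in records if all(v is not None for v in r.values())]


data = [
    {"age": 25, "job": "student", "class": "yes"},
    {"age": None, "job": "teacher", "class": "no"},
    {"age": 33, "job": None, "class": "yes"},
    {"age": 47, "job": "clerk", "class": "no"},
]
complete = drop_incomplete(data)
print(len(data) - len(complete), "records dropped")  # 2 records dropped
```

Here deletion throws away half of the records, including the values they did have, which is why the second method (an algorithm that tolerates missing values) can be preferable.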

C4.5 Algorithm

The C4.5 algorithm is a decision tree algorithm introduced by Quinlan as a development of his earlier ID3 algorithm. Its improvements are:

1. Its attribute selection measure yields a more accurate tree: C4.5 computes the information gain or the gain ratio of each candidate attribute.

2. It can handle training data with missing values. To do so, C4.5 adjusts the gain-ratio calculation used to select test attributes so that cases with missing values are taken into account.

3. It can handle continuous (numeric) attributes.
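The three improvements can be sketched together in Python. This is an illustrative sketch, not Quinlan's implementation: the names `entropy`, `gain_and_ratio`, and `best_threshold` are hypothetical, and `None` marks a missing value. The missing-value rule (gain computed on the cases whose value is known and scaled by the fraction of such cases, with the unknown cases counted as an extra subset in the split information) follows our reading of C4.5. For numeric attributes the sketch tries midpoints between adjacent known values, a common simplification.

```python
import math
from collections import Counter
from typing import Dict, List, Optional, Tuple


def entropy(labels: List[str]) -> float:
    # Info(T) = -sum p_j * log2(p_j) over class frequencies.
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def outcome(value, threshold: Optional[float]) -> str:
    # Numeric attributes get a binary threshold test; nominal ones branch per value.
    if threshold is None:
        return str(value)
    return "<=" if value <= threshold else ">"


def gain_and_ratio(records: List[dict], attr: str, target: str,
                   threshold: Optional[float] = None) -> Tuple[float, float]:
    """Information gain and gain ratio of a test on `attr`.

    Records with attr == None are treated as missing: gain is computed on the
    known cases and scaled by the fraction f of known cases, and the unknown
    cases form an extra subset in the split information.
    """
    known = [r for r in records if r[attr] is not None]
    if not known:
        return 0.0, 0.0
    f = len(known) / len(records)
    subsets: Dict[str, List[str]] = {}
    for r in known:
        subsets.setdefault(outcome(r[attr], threshold), []).append(r[target])
    info_x = sum(len(s) / len(known) * entropy(s) for s in subsets.values())
    gain = f * (entropy([r[target] for r in known]) - info_x)
    sizes = [len(s) for s in subsets.values()]
    if len(known) < len(records):
        sizes.append(len(records) - len(known))  # unknowns as an extra outcome
    n = len(records)
    split_info = -sum(s / n * math.log2(s / n) for s in sizes)
    return gain, (gain / split_info if split_info > 0 else 0.0)


def best_threshold(records: List[dict], attr: str, target: str) -> Optional[float]:
    # Candidate thresholds: midpoints between consecutive distinct known values;
    # keep the one with the highest information gain.
    values = sorted({r[attr] for r in records if r[attr] is not None})
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    if not candidates:
        return None
    return max(candidates, key=lambda t: gain_and_ratio(records, attr, target, t)[0])


# Hypothetical training data with one missing "age" and one missing "job".
data = [
    {"age": 22, "job": "student", "buys": "no"},
    {"age": 25, "job": "student", "buys": "no"},
    {"age": 35, "job": "teacher", "buys": "yes"},
    {"age": 41, "job": None, "buys": "yes"},
    {"age": 52, "job": "teacher", "buys": "yes"},
    {"age": None, "job": "clerk", "buys": "no"},
]
t = best_threshold(data, "age", "buys")
print(t)  # 30.0
print(gain_and_ratio(data, "age", "buys", t))
print(gain_and_ratio(data, "job", "buys"))
```

On this toy data both tests separate the known cases perfectly and so have the same gain, but the numeric test on age has a smaller split information than the three-way test on job and therefore the higher gain ratio. The sketch covers only attribute selection; C4.5 additionally decides how cases with missing values are passed down the branches, which is not shown here.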
