The decision tree is a classification method that uses a tree structure in which each internal node represents an attribute, each branch represents a value of that attribute, and each leaf represents a class. The top node of the decision tree is called the root.
Breiman et al. (1984) stated that this method is very popular because the resulting models are easy to understand. It is called a decision tree because its rules branch out in the shape of a tree. The tree is formed by binary recursive partitioning of the data: each group is repeatedly split in two so that the values of the response variable within each resulting group become more homogeneous.
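To make the idea of "more homogeneous groups" concrete, here is a minimal sketch of one binary split. It assumes Gini impurity as the homogeneity measure and a single numeric attribute; the function names and the tiny `ages` / `bought` data set are made up for illustration and are not taken from the original text.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 0.0 for a pure group, larger for a mixed group."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_threshold(values, labels):
    """Return (threshold, weighted impurity) of the best split `x <= threshold`.

    Returns (None, impurity of the whole group) if no split improves on it.
    """
    n = len(labels)
    best = (None, gini(labels))
    # Skip the largest value: splitting there would leave the right group empty.
    for t in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (t, score)
    return best

# Hypothetical toy data: customer age and whether they bought a product.
ages = [22, 25, 31, 38, 45, 52]
bought = ["no", "no", "no", "yes", "yes", "yes"]

threshold, impurity = best_threshold(ages, bought)
print(threshold, impurity)  # 31 0.0
```

Splitting at `age <= 31` produces two groups that each contain a single class, so the weighted impurity drops from 0.5 to 0.0. A full tree builder would apply the same search recursively to each group.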
The concept of a decision tree is to convert data into a decision tree and decision rules. The main benefit of using a decision tree is the ability to simplify complex decision-making processes so that decision-makers can interpret solutions to problems.
A name closely associated with the decision tree is CART (Classification and Regression Trees), the method introduced by Breiman et al. (1984). It combines two types of tree: the classification tree and the regression tree.
Types of decision tree nodes
1. Root Node
The root is the top node. It has no input, and it may have no output or more than one output.
2. Internal Node
An internal node is a branch node. It has exactly one input and at least two outputs.
3. Leaf Node
A leaf is an end node, or terminal node. It has exactly one input and no output.
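The three node types can be sketched with one small data structure. This is an illustrative sketch only: the `Node` class, its field names, and the weather example are assumptions introduced here, not something defined in the original text.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class Node:
    attribute: Optional[str] = None   # attribute tested here (None on a leaf)
    children: Dict[str, "Node"] = field(default_factory=dict)  # branch value -> child
    label: Optional[str] = None       # class label (set only on a leaf)

    def is_leaf(self) -> bool:
        # A leaf has one input (its parent) and no outputs.
        return not self.children

def classify(node: Node, record: Dict[str, str]) -> Optional[str]:
    """Follow the branches matching the record until a leaf is reached."""
    while not node.is_leaf():
        node = node.children[record[node.attribute]]
    return node.label

# Hypothetical tree: the root tests "outlook", one internal node tests "windy".
tree = Node("outlook", {
    "sunny": Node("windy", {
        "yes": Node(label="stay in"),
        "no": Node(label="play"),
    }),
    "rainy": Node(label="stay in"),
})

print(classify(tree, {"outlook": "sunny", "windy": "no"}))  # play
```

Here `tree` is the root (nothing points to it), the `windy` node is an internal node with one input and two outputs, and every node carrying a `label` is a leaf.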
Decision Tree Formation Stage
1. Tree construction begins with the formation of the root (located at the top). The data is then split repeatedly on suitable attributes until leaves are formed.
2. Tree pruning identifies and removes unnecessary branches from an already formed tree. A decision tree can grow large, so it is simplified by pruning based on a confidence level. Besides reducing the size of the tree, pruning is also carried out to reduce the prediction error rate on new cases.
3. Formation of decision rules, that is, deriving decision rules from the tree that has been built. The rules take an if-then form and are read off the tree by tracing each path from the root to a leaf. Each node and branch on the path contributes a condition to the if part, and the leaf value supplies the then part. After all the rules have been created, they can be simplified or combined.
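Stages 2 and 3 can be sketched on a tree that has already been built. This is a simplified sketch under stated assumptions: the tree is written by hand as nested dictionaries, and `prune_redundant` only collapses subtrees whose leaves all predict the same class. It is not the confidence-based pruning described above, which also needs error estimates from the data. Both function names are hypothetical.

```python
def prune_redundant(tree):
    """Collapse any subtree whose leaves all predict the same class."""
    if not isinstance(tree, dict):      # a leaf is stored as a plain class label
        return tree
    attribute, branches = next(iter(tree.items()))
    pruned = {value: prune_redundant(sub) for value, sub in branches.items()}
    leaves = list(pruned.values())
    all_leaves = all(not isinstance(leaf, dict) for leaf in leaves)
    if all_leaves and len(set(leaves)) == 1:
        return leaves[0]                # the test on `attribute` changes nothing
    return {attribute: pruned}

def extract_rules(tree, conditions=()):
    """Yield one if-then rule for every root-to-leaf path."""
    if not isinstance(tree, dict):
        premise = " and ".join(conditions) or "true"
        yield f"if {premise} then class = {tree}"
        return
    attribute, branches = next(iter(tree.items()))
    for value, subtree in branches.items():
        yield from extract_rules(subtree, conditions + (f"{attribute} = {value}",))

# Hypothetical tree: {attribute: {branch value: subtree or class label}}.
tree = {"outlook": {
    "sunny": {"windy": {"yes": "stay in", "no": "play"}},
    "rainy": {"humidity": {"high": "stay in", "normal": "stay in"}},
}}

pruned = prune_redundant(tree)
for rule in extract_rules(pruned):
    print(rule)
# if outlook = sunny and windy = yes then class = stay in
# if outlook = sunny and windy = no then class = play
# if outlook = rainy then class = stay in
```

The `humidity` test is removed because both of its branches lead to the same class, which leaves three rules instead of four.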
The decision tree is one of the most popular classification models because it can be easily interpreted by humans. Many algorithms can be used to build decision trees, such as ID3, C4.5, CART, and GUIDE.
Benefits of the Decision Tree in Machine Learning
The decision tree is also useful for exploring data and finding hidden relationships between candidate input variables and a target variable. Because it combines data exploration and modeling, it is an excellent first step in the modeling process, even when the final model is built with another technique.
In some applications, the accuracy of the classification or prediction is the only thing that matters. For example, a direct mail company builds an accurate model to predict which members are likely to respond to a request, regardless of how or why the model works.
Another advantage of this method is that it can eliminate unnecessary calculations or data, because each sample is tested only against certain criteria or classes.
Although it has many advantages, the method is not without drawbacks. Decision trees may overlap, especially when very many classes and criteria are used, and this increases both the decision time and the amount of memory required.
Strengths and Weaknesses of the Decision Tree
Pros of the Decision Tree:
- Easy integration into database systems.
- Has good accuracy.
- Can find unexpected combinations of data.
- Decision areas that were previously complex and highly global can be made simpler and more specific.
- Can eliminate unnecessary calculations, because with this method a sample is only tested based on certain criteria or classes.
- Feature selection is flexible: different internal nodes can use different features, and the feature selected at a node is the one that distinguishes a criterion from the other criteria at that node.
Weaknesses of the Decision Tree:
- Overlap occurs, especially when very many classes and criteria are used. This can also lead to longer decision times and higher memory requirements.
- Errors accumulate from each level in a large decision tree.
- Designing an optimal decision tree is difficult. The quality of the decisions obtained with the decision tree method depends largely on how the tree is designed.
Decision Tree Consists of Three Types of Nodes
- Decision node – usually represented by a box
- Chance node – usually represented by a circle
- End node – usually represented by a triangle
Application of the Decision Tree
A decision tree is a decision support tool that uses a tree-shaped model of decisions. It describes the alternatives available for solving a problem, the factors that can influence those alternatives, and the estimated final outcome when an alternative is selected. A decision tree can also be used to display an algorithm that contains only conditional control statements.
Decision trees are generally used in operations research, especially in decision analysis, where the purpose is to identify the strategy most likely to achieve a goal. They are also a popular tool in machine learning.
The decision tree is a flowchart-like structure in which each internal node represents a test on an attribute, each branch represents an outcome of that test, and each leaf node represents a class label (the decision made after all attributes on the path have been evaluated). The path from the root to a leaf represents a classification rule.
In decision analysis, decision trees and related diagrams are used as visual and analytical decision support tools, where the expected value or utility of the alternatives is calculated.
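The expected value calculation can be sketched with the three node types used in decision analysis: decision, chance, and end nodes. This is a minimal sketch with made-up payoffs and probabilities; the dictionary layout and the function names are assumptions chosen for illustration.

```python
def expected_value(node):
    """Evaluate a decision-analysis tree from the end nodes back to the root."""
    kind = node["type"]
    if kind == "end":
        # End node: a fixed payoff.
        return node["payoff"]
    if kind == "chance":
        # Chance node: probability-weighted average over the outcomes.
        return sum(p * expected_value(child) for p, child in node["outcomes"])
    if kind == "decision":
        # Decision node: the decision-maker picks the best alternative.
        return max(expected_value(child) for child in node["options"].values())
    raise ValueError(f"unknown node type: {kind!r}")

def best_option(decision_node):
    """Name of the alternative with the highest expected value."""
    options = decision_node["options"]
    return max(options, key=lambda name: expected_value(options[name]))

# Hypothetical choice: launch a product (uncertain payoff) or hold (certain payoff).
launch = {"type": "chance", "outcomes": [
    (0.6, {"type": "end", "payoff": 100}),   # success
    (0.4, {"type": "end", "payoff": -50}),   # failure
]}
hold = {"type": "end", "payoff": 20}
root = {"type": "decision", "options": {"launch": launch, "hold": hold}}

print(best_option(root), expected_value(root))
```

In this example, launching has an expected value of 0.6 × 100 + 0.4 × (−50) = 40, which is higher than the certain payoff of 20 for holding, so `launch` is the preferred alternative. Replacing payoffs with utilities would give the expected utility instead.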