{"id":382,"date":"2018-09-15T22:52:22","date_gmt":"2018-09-16T04:52:22","guid":{"rendered":"http:\/\/www.jacobsoft.com.mx\/?p=382"},"modified":"2020-03-27T22:48:50","modified_gmt":"2020-03-28T04:48:50","slug":"arboles-de-regresion-usando-python","status":"publish","type":"post","link":"https:\/\/www.jacobsoft.com.mx\/en\/arboles-de-regresion-usando-python\/","title":{"rendered":"Regression trees using Python"},"content":{"rendered":"<h2 class=\"wp-block-heading\">Regression trees using Python<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">In data mining, <a rel=\"noreferrer noopener\" aria-label=\"En miner\u00eda de datos, machine learning y\/o ciencia de datos, en lo que se refiere al an\u00e1lisis con \u00e1rboles, existen dos enfoques principales: los \u00e1rboles de decisi\u00f3n y los \u00e1rboles de regresi\u00f3n. (opens in a new tab)\" href=\"https:\/\/click.linksynergy.com\/fs-bin\/click?id=cTjR400Zjac&amp;offerid=579862.462&amp;type=3&amp;subid=0&amp;LSNSUBSITE=LSNSUBSITE\" target=\"_blank\"><strong>machine learning<\/strong><\/a> and\/or data science, tree-based analysis follows two main approaches: <strong><em>decision trees<\/em><\/strong> and <strong><em>regression trees<\/em><\/strong>.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">In both cases, the trees are <strong>predictive segmentation methods<\/strong>, known as <strong>classification trees<\/strong>. 
They partition the data set sequentially, splitting the cases into groups so as to maximize the differences in the dependent variable between those groups.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Using various indices and <strong><a href=\"https:\/\/click.linksynergy.com\/fs-bin\/click?id=cTjR400Zjac&amp;offerid=579862.370&amp;type=3&amp;subid=0&amp;LSNSUBSITE=LSNSUBSITE\" target=\"_blank\" rel=\"noreferrer noopener\" aria-label=\"A trav\u00e9s de diferentes \u00edndices y procedimientos estad\u00edsticos se determina la divisi\u00f3n m\u00e1s discriminante de entre los criterios seleccionados, aquella que permite diferenciar mejor a los distintos grupos del criterio base, con lo que se obtiene as\u00ed, una primera segmentaci\u00f3n. A partir de esa primera segmentaci\u00f3n, se realizan nuevas segmentaciones de cada uno los segmentos resultantes y as\u00ed sucesivamente hasta que el proceso finaliza con alguna norma estad\u00edstica. (opens in a new tab)\">statistical procedures<\/a><\/strong>, the method determines the most discriminating division among the selected criteria, the one that best separates the groups of the base criterion, which yields a first segmentation. Each resulting segment is then segmented again, and so on, until the process stops according to some statistical rule.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Suppose now that we want to know which Titanic passengers were most likely to survive the sinking and which characteristics were associated with survival. 
In this case, the variable of interest (GS) is the<strong> degree of survival<\/strong>.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">We could then divide the passengers into groups by <strong>age<\/strong>, <strong>sex<\/strong>, <strong>class<\/strong> in which they traveled and observe the proportion of survivors of each group.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">A procedure based on <strong><a href=\"https:\/\/click.linksynergy.com\/fs-bin\/click?id=cTjR400Zjac&amp;offerid=579862.372&amp;type=3&amp;subid=0&amp;LSNSUBSITE=LSNSUBSITE\" target=\"_blank\" rel=\"noreferrer noopener\" aria-label=\"Un procedimiento basado en \u00e1rboles, selecciona autom\u00e1ticamente los grupos homog\u00e9neos con la mayor diferencia en proporci\u00f3n de supervivientes entre ellos. En el primer caso, sexo (hombres y mujeres). (opens in a new tab)\">trees<\/a><\/strong>, automatically selects the homogeneous groups with the greatest difference in the proportion of survivors among them. In the first case, sex (men and women).<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The next step consists in subdividing each group of men and women according to other characteristics. As a result, men are divided into adults and children, while women are divided into groups based on the class in which they traveled.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">When the subdivision process is completed, the result is a set of rules that can be easily visualized by a <strong><a href=\"https:\/\/click.linksynergy.com\/fs-bin\/click?id=cTjR400Zjac&amp;offerid=579862.372&amp;type=3&amp;subid=0&amp;LSNSUBSITE=LSNSUBSITE\" target=\"_blank\" rel=\"noreferrer noopener\" aria-label=\"Cuando se termina el proceso de subdivisi\u00f3n, el resultado es un conjunto de reglas que pueden visualizarse f\u00e1cilmente mediante un \u00e1rbol. 
(opens in a new tab)\">tree<\/a><\/strong>.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"652\" height=\"265\" src=\"https:\/\/www.jacobsoft.com.mx\/wp-content\/uploads\/2018\/09\/Arboles_1.png\" alt=\"\" class=\"wp-image-384\" srcset=\"https:\/\/www.jacobsoft.com.mx\/wp-content\/uploads\/2018\/09\/Arboles_1.png 652w, https:\/\/www.jacobsoft.com.mx\/wp-content\/uploads\/2018\/09\/Arboles_1-300x122.png 300w\" sizes=\"auto, (max-width: 652px) 100vw, 652px\" \/><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">From the figure above we can see, for example, that an adult male passenger has a 20% chance of survival.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The proportion of survivors in each subdivision can be used for <strong><a href=\"https:\/\/click.linksynergy.com\/fs-bin\/click?id=cTjR400Zjac&amp;offerid=579862.462&amp;type=3&amp;subid=0&amp;LSNSUBSITE=LSNSUBSITE\" target=\"_blank\" rel=\"noreferrer noopener\" aria-label=\"La proporci\u00f3n de la supervivencia en cada una de las subdivisiones puede utilizarse con fines predictivos para vaticinar el grado de supervivencia de los miembros de ese grupo. (opens in a new tab)\">predictive<\/a><\/strong> purposes,<a href=\"https:\/\/click.linksynergy.com\/fs-bin\/click?id=cTjR400Zjac&amp;offerid=579862.462&amp;type=3&amp;subid=0&amp;LSNSUBSITE=LSNSUBSITE\" target=\"_blank\" rel=\"noreferrer noopener\" aria-label=\"La proporci\u00f3n de la supervivencia en cada una de las subdivisiones puede utilizarse con fines predictivos para vaticinar el grado de supervivencia de los miembros de ese grupo. 
(opens in a new tab)\"> <\/a>to predict the degree of survival of the members of that group.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Using different predictors (independent variables) at each level of the division process is a simple and elegant way to handle interactions.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The procedure creates a classification model based on <a rel=\"noreferrer noopener\" aria-label=\"El procedimiento crea un modelo de clasificaci\u00f3n basado en \u00e1rboles, y clasifica casos en grupos o pronostica valores de una variable dependiente basada en los valores de las variables independientes. (opens in a new tab)\" href=\"https:\/\/click.linksynergy.com\/fs-bin\/click?id=cTjR400Zjac&amp;offerid=579862.462&amp;type=3&amp;subid=0&amp;LSNSUBSITE=LSNSUBSITE\" target=\"_blank\"><strong>trees<\/strong><\/a>, and either classifies cases into groups or forecasts the values of a dependent variable from the values of the independent variables.<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li><strong>Segmentation<\/strong>. Identify individuals who can be members of a specific group.<\/li><li><strong>Stratification<\/strong>. Assign the cases to one of several categories, for example high-risk, intermediate-risk or low-risk groups.<\/li><li><strong>Prediction<\/strong>. 
Create rules and use them to predict future events, such as the likelihood that a person will default on a loan, or the resale value of a vehicle or property.<\/li><\/ul>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Tree-based data analysis<\/strong> makes it possible to identify homogeneous groups with high or low risk and facilitates the construction of rules for making forecasts about individual cases.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">In trees, both dependent and independent variables can be nominal, ordinal or scale.<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>They are <strong>nominal<\/strong> when their values are categories with no intrinsic order. For example, the area where an employee works.<\/li><li>They are <strong>ordinal<\/strong> when their values are categories with some intrinsic order. For example, the levels of satisfaction with a service.<\/li><li>They are <strong>scale<\/strong> when their values are ordered categories with a meaningful metric, so that comparing distances between values makes sense. For example, age in years, income in currency, etc.<\/li><\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Types of trees<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">The three types of trees most used today are CHAID trees, CART trees and QUEST trees.<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li><strong>CHAID trees<\/strong> (Chi-square Automatic Interaction Detector). It is the culmination of a series of methods based on the automatic interaction detector (AID) of Morgan and Sonquist. It is a useful exploratory method for identifying important variables and their interactions, focused on segmentation and descriptive analysis.<\/li><li><strong>CART trees<\/strong> (Classification and Regression Trees). It is an alternative to the exhaustive CHAID for classification trees with categorical dependent variables. 
It is used for classification with qualitative dependent variables and for regression with quantitative dependent variables, generating binary trees.<\/li><li><strong>QUEST trees<\/strong> (Quick, Unbiased, Efficient, Statistical Tree). It is a tree-based classification algorithm created specifically to solve two of the main problems that the CART and exhaustive CHAID methods present when dividing a group of subjects according to an independent variable.<\/li><\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">In this article we will focus on CART trees for <strong><a href=\"https:\/\/click.linksynergy.com\/fs-bin\/click?id=cTjR400Zjac&amp;offerid=579862.462&amp;type=3&amp;subid=0&amp;LSNSUBSITE=LSNSUBSITE\" target=\"_blank\" rel=\"noreferrer noopener\" aria-label=\"En este art\u00edculo nos enfocaremos en los \u00e1rboles CART para regresi\u00f3n y en el siguiente lo haremos para clasificaci\u00f3n. (opens in a new tab)\">regression<\/a><\/strong>; the next one will cover classification.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Algorithm for Regression CART Trees<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Suppose we have a data set with two independent variables, X1 and X2, and Y as the dependent variable to predict.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><a href=\"https:\/\/www.jacobsoft.com.mx\/wp-content\/uploads\/2018\/09\/Arboles_2.png\"><img loading=\"lazy\" decoding=\"async\" width=\"837\" height=\"474\" src=\"https:\/\/www.jacobsoft.com.mx\/wp-content\/uploads\/2018\/09\/Arboles_2.png\" alt=\"\" class=\"wp-image-385\" srcset=\"https:\/\/www.jacobsoft.com.mx\/wp-content\/uploads\/2018\/09\/Arboles_2.png 837w, https:\/\/www.jacobsoft.com.mx\/wp-content\/uploads\/2018\/09\/Arboles_2-300x170.png 300w, https:\/\/www.jacobsoft.com.mx\/wp-content\/uploads\/2018\/09\/Arboles_2-768x435.png 768w\" sizes=\"auto, (max-width: 837px) 100vw, 837px\" \/><\/a><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">According to some established 
criteria, we could begin to segment the data around certain values of X1 and X2. For example, if we need to separate the data where X1 is less than 20, we get one group for X1 &lt; 20 and another for X1 &gt;= 20, so the algorithm creates a division at X1 = 20.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><a href=\"https:\/\/www.jacobsoft.com.mx\/wp-content\/uploads\/2018\/09\/Arboles_3.png\"><img loading=\"lazy\" decoding=\"async\" width=\"747\" height=\"474\" src=\"https:\/\/www.jacobsoft.com.mx\/wp-content\/uploads\/2018\/09\/Arboles_3.png\" alt=\"\" class=\"wp-image-387\" srcset=\"https:\/\/www.jacobsoft.com.mx\/wp-content\/uploads\/2018\/09\/Arboles_3.png 747w, https:\/\/www.jacobsoft.com.mx\/wp-content\/uploads\/2018\/09\/Arboles_3-300x190.png 300w\" sizes=\"auto, (max-width: 747px) 100vw, 747px\" \/><\/a><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">Then, for the data where X1 &gt;= 20, we need one group with X2 &gt; 170 and another with X2 &lt;= 170, so we mark a second division, as in the following graph.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><a href=\"https:\/\/www.jacobsoft.com.mx\/wp-content\/uploads\/2018\/09\/Arboles_4.png\"><img loading=\"lazy\" decoding=\"async\" width=\"761\" height=\"474\" src=\"https:\/\/www.jacobsoft.com.mx\/wp-content\/uploads\/2018\/09\/Arboles_4.png\" alt=\"\" class=\"wp-image-388\" srcset=\"https:\/\/www.jacobsoft.com.mx\/wp-content\/uploads\/2018\/09\/Arboles_4.png 761w, https:\/\/www.jacobsoft.com.mx\/wp-content\/uploads\/2018\/09\/Arboles_4-300x187.png 300w\" sizes=\"auto, (max-width: 761px) 100vw, 761px\" \/><\/a><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">After that, we create two more divisions. Division 3 takes the data where X1 &lt; 20 and splits it into two groups: those with X2 below 200 and those with X2 above 200. 
Division 4 applies to the data where X1 &gt; 20 and X2 &lt; 170, where we separate the points with X1 greater than 40.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><a href=\"https:\/\/www.jacobsoft.com.mx\/wp-content\/uploads\/2018\/09\/Arboles_5.png\"><img loading=\"lazy\" decoding=\"async\" width=\"753\" height=\"474\" src=\"https:\/\/www.jacobsoft.com.mx\/wp-content\/uploads\/2018\/09\/Arboles_5.png\" alt=\"\" class=\"wp-image-389\" srcset=\"https:\/\/www.jacobsoft.com.mx\/wp-content\/uploads\/2018\/09\/Arboles_5.png 753w, https:\/\/www.jacobsoft.com.mx\/wp-content\/uploads\/2018\/09\/Arboles_5-300x189.png 300w\" sizes=\"auto, (max-width: 753px) 100vw, 753px\" \/><\/a><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">As the segments are created, a binary tree structure takes shape as follows, where the groups correspond to the division lines drawn in the previous graphs.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><a href=\"https:\/\/www.jacobsoft.com.mx\/wp-content\/uploads\/2018\/09\/Arboles_6.png\"><img loading=\"lazy\" decoding=\"async\" width=\"739\" height=\"474\" src=\"https:\/\/www.jacobsoft.com.mx\/wp-content\/uploads\/2018\/09\/Arboles_6.png\" alt=\"\" class=\"wp-image-390\" srcset=\"https:\/\/www.jacobsoft.com.mx\/wp-content\/uploads\/2018\/09\/Arboles_6.png 739w, https:\/\/www.jacobsoft.com.mx\/wp-content\/uploads\/2018\/09\/Arboles_6-300x192.png 300w\" sizes=\"auto, (max-width: 739px) 100vw, 739px\" \/><\/a><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">The blue nodes represent the four divisions made in the graphs of the data set, and the white nodes represent the data belonging to each group. Suppose the value of the dependent variable is the same within each of these groups and different between groups. 
We can then predict the value of Y (the dependent variable) for new or unseen data whose X1 and X2 values fall into a specific group.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><a href=\"https:\/\/www.jacobsoft.com.mx\/wp-content\/uploads\/2018\/09\/Arboles_7.png\"><img loading=\"lazy\" decoding=\"async\" width=\"839\" height=\"474\" src=\"https:\/\/www.jacobsoft.com.mx\/wp-content\/uploads\/2018\/09\/Arboles_7.png\" alt=\"\" class=\"wp-image-391\" srcset=\"https:\/\/www.jacobsoft.com.mx\/wp-content\/uploads\/2018\/09\/Arboles_7.png 839w, https:\/\/www.jacobsoft.com.mx\/wp-content\/uploads\/2018\/09\/Arboles_7-300x169.png 300w, https:\/\/www.jacobsoft.com.mx\/wp-content\/uploads\/2018\/09\/Arboles_7-768x434.png 768w\" sizes=\"auto, (max-width: 839px) 100vw, 839px\" \/><\/a><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">The green boxes show the value of Y (the dependent variable) for each group; carried over to the tree diagram, they look like this:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><a href=\"https:\/\/www.jacobsoft.com.mx\/wp-content\/uploads\/2018\/09\/Arboles_8.png\"><img loading=\"lazy\" decoding=\"async\" width=\"720\" height=\"474\" src=\"https:\/\/www.jacobsoft.com.mx\/wp-content\/uploads\/2018\/09\/Arboles_8.png\" alt=\"\" class=\"wp-image-392\" srcset=\"https:\/\/www.jacobsoft.com.mx\/wp-content\/uploads\/2018\/09\/Arboles_8.png 720w, https:\/\/www.jacobsoft.com.mx\/wp-content\/uploads\/2018\/09\/Arboles_8-300x198.png 300w\" sizes=\"auto, (max-width: 720px) 100vw, 720px\" \/><\/a><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">With this diagram we can determine, for example, that for the point (28, 115), where X1 = 28 and X2 = 115, Y equals -64.10.<\/p>\n\n\n\n<figure class=\"wp-block-embed-youtube wp-block-embed is-type-video is-provider-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio\"><div class=\"wp-block-embed__wrapper\">\n<iframe loading=\"lazy\" title=\"Arboles 
de Regresion Parte 1\" width=\"780\" height=\"439\" src=\"https:\/\/www.youtube.com\/embed\/WiP2B_WYtp8?feature=oembed\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture\" allowfullscreen><\/iframe>\n<\/div><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\">Regression trees with Python<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">For the example with <strong><a href=\"https:\/\/click.linksynergy.com\/fs-bin\/click?id=cTjR400Zjac&amp;offerid=579862.373&amp;type=3&amp;subid=0&amp;LSNSUBSITE=LSNSUBSITE\" target=\"_blank\" rel=\"noreferrer noopener\" aria-label=\"Para el ejemplo con python, supongamos que tenemos el siguiente conjunto de datos que representa el salario de un empleado de acuerdo al nivel y puesto en el que se encuentra en la organizaci\u00f3n (opens in a new tab)\">Python<\/a><\/strong>, suppose we have the following data set, which gives an employee's salary according to their position and level in the organization:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><tbody><tr><td><strong>Position<\/strong><\/td><td><strong>Level<\/strong><\/td><td><strong>Salary<\/strong><\/td><\/tr><tr><td>Business Analyst<\/td><td>1<\/td><td>45,000<\/td><\/tr><tr><td>Junior Consultant<\/td><td>2<\/td><td>50,000<\/td><\/tr><tr><td>Senior Consultant<\/td><td>3<\/td><td>60,000<\/td><\/tr><tr><td>Manager<\/td><td>4<\/td><td>80,000<\/td><\/tr><tr><td>Country Manager<\/td><td>5<\/td><td>110,000<\/td><\/tr><tr><td>Region Manager<\/td><td>6<\/td><td>150,000<\/td><\/tr><tr><td>Partner<\/td><td>7<\/td><td>200,000<\/td><\/tr><tr><td>Senior Partner<\/td><td>8<\/td><td>300,000<\/td><\/tr><tr><td>C-level<\/td><td>9<\/td><td>500,000<\/td><\/tr><tr><td>CEO<\/td><td>10<\/td><td>1,000,000<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">We load the libraries and the data file containing the table above, using the level as the independent variable and 
the salary as the dependent variable.<\/p>\n\n\n<div id=\"code\">\n\n\n\n<pre class=\"wp-block-preformatted\"># Regression trees\nimport numpy as np\nimport matplotlib.pyplot as plt\nimport pandas as pd\n\n# Column index 1 is the level (kept 2-D, as a feature matrix); column index 2 is the salary\ndataset = pd.read_csv(&#039;Salary_per_Position.csv&#039;)\nX = dataset.iloc[:, 1:2].values\ny = dataset.iloc[:, 2].values\n<\/pre>\n\n\n<\/div>\n\n\n\n<p class=\"wp-block-paragraph\">Once we have executed the previous code fragment, we obtain the dataset and the variables X and y.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><a href=\"https:\/\/www.jacobsoft.com.mx\/wp-content\/uploads\/2018\/09\/Arboles_9.png\"><img loading=\"lazy\" decoding=\"async\" width=\"470\" height=\"428\" src=\"https:\/\/www.jacobsoft.com.mx\/wp-content\/uploads\/2018\/09\/Arboles_9.png\" alt=\"\" class=\"wp-image-394\" srcset=\"https:\/\/www.jacobsoft.com.mx\/wp-content\/uploads\/2018\/09\/Arboles_9.png 470w, https:\/\/www.jacobsoft.com.mx\/wp-content\/uploads\/2018\/09\/Arboles_9-300x273.png 300w\" sizes=\"auto, (max-width: 470px) 100vw, 470px\" \/><\/a><\/figure>\n\n\n\n<figure class=\"wp-block-image\"><a href=\"https:\/\/www.jacobsoft.com.mx\/wp-content\/uploads\/2018\/09\/Arboles_10.png\"><img loading=\"lazy\" decoding=\"async\" width=\"594\" height=\"421\" src=\"https:\/\/www.jacobsoft.com.mx\/wp-content\/uploads\/2018\/09\/Arboles_10.png\" alt=\"\" class=\"wp-image-395\" srcset=\"https:\/\/www.jacobsoft.com.mx\/wp-content\/uploads\/2018\/09\/Arboles_10.png 594w, https:\/\/www.jacobsoft.com.mx\/wp-content\/uploads\/2018\/09\/Arboles_10-300x213.png 300w\" sizes=\"auto, (max-width: 594px) 100vw, 594px\" \/><\/a><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">The next step is to import the DecisionTreeRegressor class from the tree module of the sklearn library, create a regressor object, fit it to the X and y data, and then predict the salary for level 6.5, that is, for a position in the organization at level 6.5.<\/p>\n\n\n<div id=\"code\">\n\n\n\n<pre 
class=\"wp-block-preformatted\"># Regression trees\nimport numpy as np\nimport matplotlib.pyplot as plt\nimport pandas as pd\n\ndataset = pd.read_csv(&#039;Salary_per_Position.csv&#039;)\nX = dataset.iloc[:, 1:2].values\ny = dataset.iloc[:, 2].values\n\n# Fit the decision tree to the dataset\nfrom sklearn.tree import DecisionTreeRegressor\nregressor = DecisionTreeRegressor(random_state=0)\nregressor.fit(X, y)\n\n# Salary prediction for level 6.5 (predict expects a 2-D array: one row per sample)\ny_pred = regressor.predict([[6.5]])\n<\/pre>\n\n\n<\/div>\n\n\n\n<p class=\"wp-block-paragraph\">After running the last fragment, y_pred holds the salary predicted for level 6.5, that is, the value of the group that level falls into:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><a href=\"https:\/\/www.jacobsoft.com.mx\/wp-content\/uploads\/2018\/09\/Arboles_11.png\"><img loading=\"lazy\" decoding=\"async\" width=\"416\" height=\"338\" src=\"https:\/\/www.jacobsoft.com.mx\/wp-content\/uploads\/2018\/09\/Arboles_11.png\" alt=\"\" class=\"wp-image-396\" srcset=\"https:\/\/www.jacobsoft.com.mx\/wp-content\/uploads\/2018\/09\/Arboles_11.png 416w, https:\/\/www.jacobsoft.com.mx\/wp-content\/uploads\/2018\/09\/Arboles_11-300x244.png 300w\" sizes=\"auto, (max-width: 416px) 100vw, 416px\" \/><\/a><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">We observe that the result is 150,000, which corresponds to the salary of level 6. In other words, the leaf for level 6 covers values of X above 5.5 and up to 6.5, since the tree places its thresholds midway between consecutive levels. To see this, we create the following graph:<\/p>\n\n\n<div id=\"code\">\n\n\n\n<pre class=\"wp-block-preformatted\"># Visualization of decision tree results\n# A fine grid (step 0.01) makes the stepped shape of the prediction visible\nX_grid = np.arange(X.min(), X.max(), 0.01)\nX_grid = X_grid.reshape((len(X_grid), 1))\nplt.scatter(X, y, color=&#039;red&#039;)\nplt.plot(X_grid, regressor.predict(X_grid), color=&#039;blue&#039;)\nplt.title(&#039;Decision Tree Regression&#039;)\nplt.xlabel(&#039;Position Level&#039;)\nplt.ylabel(&#039;Salary&#039;)\nplt.show()\n<\/pre>\n\n\n<\/div>\n\n\n\n<figure class=\"wp-block-image\"><a href=\"https:\/\/www.jacobsoft.com.mx\/wp-content\/uploads\/2018\/09\/Arboles_12.png\"><img loading=\"lazy\" decoding=\"async\" width=\"656\" height=\"584\" src=\"https:\/\/www.jacobsoft.com.mx\/wp-content\/uploads\/2018\/09\/Arboles_12.png\" alt=\"\" class=\"wp-image-397\" srcset=\"https:\/\/www.jacobsoft.com.mx\/wp-content\/uploads\/2018\/09\/Arboles_12.png 656w, https:\/\/www.jacobsoft.com.mx\/wp-content\/uploads\/2018\/09\/Arboles_12-300x267.png 300w\" sizes=\"auto, (max-width: 656px) 100vw, 656px\" \/><\/a><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">As the stepped graph shows, the salary of 150,000 holds for levels above 5.5 and up to 6.5. A level of 6.6 already gives a salary of 200,000, corresponding to level 7.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Regression trees let you predict the dependent variable for a new observation as the value of the group that the tree assigns it to.<\/p>\n\n\n\n<figure class=\"wp-block-embed-youtube wp-block-embed is-type-video is-provider-youtube wp-embed-aspect-4-3 wp-has-aspect-ratio\"><div class=\"wp-block-embed__wrapper\">\n<iframe loading=\"lazy\" title=\"\u00c1rboles de Regresi\u00f3n Parte 2 ejemplo con Python\" width=\"780\" height=\"439\" src=\"https:\/\/www.youtube.com\/embed\/OzQfeYiblKc?feature=oembed\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture\" allowfullscreen><\/iframe>\n<\/div><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\">Resources and additional comments<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Run the models of <strong><a rel=\"noreferrer noopener\" aria-label=\"Ejecutar los modelos de machine learning en la nube puede (opens in a new tab)\" 
href=\"https:\/\/click.linksynergy.com\/fs-bin\/click?id=cTjR400Zjac&amp;offerid=579862.462&amp;type=3&amp;subid=0&amp;LSNSUBSITE=LSNSUBSITE\" target=\"_blank\">machine learning<\/a><\/strong> in the cloud it can be an advantage depending on the amount of data we have given that we could require more processing power for model training.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">In this sense, knowing the advantages of cloud services becomes an important need, so in the following link you could learn, in a very economical way, AWS (Amazon Web Service) with the certification of either <strong><a href=\"https:\/\/click.linksynergy.com\/fs-bin\/click?id=cTjR400Zjac&amp;offerid=579862.373&amp;type=3&amp;subid=0&amp;LSNSUBSITE=LSNSUBSITE\" target=\"_blank\" rel=\"noreferrer noopener\" aria-label=\"En este sentido, conocer las ventajas de los servicios en la nube se convierte en una necesidad importante, por lo que en el siguiente enlace podr\u00edas aprender, de manera muy econ\u00f3mica, AWS (Amazon Web Service) con la certificaci\u00f3n ya sea de asociado o profesional: AWS Professional Certification (opens in a new tab)\">associated <\/a><\/strong>or professional: <strong><a rel=\"noreferrer noopener\" aria-label=\"En este sentido, conocer las ventajas de los servicios en la nube se convierte en una necesidad importante, por lo que en el siguiente enlace podr\u00edas aprender, de manera muy econ\u00f3mica, AWS (Amazon Web Service) con la certificaci\u00f3n ya sea de asociado o profesional: AWS Professional Certification (opens in a new tab)\" href=\"https:\/\/click.linksynergy.com\/fs-bin\/click?id=cTjR400Zjac&amp;offerid=579862.372&amp;type=3&amp;subid=0&amp;LSNSUBSITE=LSNSUBSITE\" target=\"_blank\">AWS Professional Certification<\/a><\/strong><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">If, on the other hand, you are interested in Azure, here you can see for free a webinar that presents all the advantages and how to start with <strong><a 
href=\"https:\/\/click.linksynergy.com\/fs-bin\/click?id=cTjR400Zjac&amp;offerid=579862.462&amp;type=3&amp;subid=0&amp;LSNSUBSITE=LSNSUBSITE\" target=\"_blank\" rel=\"noreferrer noopener\" aria-label=\"Si por el contrario te interesa Azure, aqu\u00ed puedes ver de forma gratuita un webinar que te presenta todas las ventajas y como iniciar con Azure (opens in a new tab)\">Azure<\/a><\/strong><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><\/p>","protected":false},"excerpt":{"rendered":"<p>In data mining, machine learning and \/ or data science, in terms of tree analysis, there are two main approaches: decision trees and regression trees. In both cases trees are predictive methods of segmentation, known as classification trees. They are sequential partitions of the data set made to maximize the differences of the dependent variable given that a division of the cases into groups is carried out. Through different indexes and statistical procedures, the most discriminating division among the selected criteria is determined, the one that makes it possible to better differentiate the different groups from the base criterion, thus obtaining a first segmentation. From that first segmentation, new segmentations are made of each of the resulting segments and so on until the process ends with some statistical rule. 
<\/p>","protected":false},"author":2,"featured_media":402,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"advgb_blocks_editor_width":"","advgb_blocks_columns_visual_guide":"","_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"_uf_show_specific_survey":0,"_uf_disable_surveys":false,"footnotes":""},"categories":[25,35,46],"tags":[66,67,68,69,57,58,56,50,70,59,51],"class_list":["post-382","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-algoritmos","category-inteligencia-artificial","category-machine-learning","tag-analisis-de-datos","tag-arboles-de-clasificacion","tag-arboles-de-decision","tag-arboles-de-regresion","tag-ciencia-de-datos","tag-data-mining","tag-data-science","tag-machine-learning","tag-metodos-predictivos","tag-mineria-de-datos","tag-regresion"],"aioseo_notices":[],"author_meta":{"display_name":"Jacob Avila Camacho","author_link":"https:\/\/www.jacobsoft.com.mx\/en\/author\/jacob-avila\/"},"featured_img":"https:\/\/www.jacobsoft.com.mx\/wp-content\/uploads\/2018\/09\/destacada_arboles_regresion-300x165.png","featured_image_src":"https:\/\/www.jacobsoft.com.mx\/wp-content\/uploads\/2018\/09\/destacada_arboles_regresion.png","featured_image_src_square":"https:\/\/www.jacobsoft.com.mx\/wp-content\/uploads\/2018\/09\/destacada_arboles_regresion.png","author_info":{"display_name":"Jacob Avila Camacho","author_link":"https:\/\/www.jacobsoft.com.mx\/en\/author\/jacob-avila\/"},"coauthors":[],"tax_additional":{"categories":{"linked":["<a href=\"https:\/\/www.jacobsoft.com.mx\/en\/category\/algoritmos\/\" class=\"advgb-post-tax-term\">Algoritmos<\/a>","<a href=\"https:\/\/www.jacobsoft.com.mx\/en\/category\/inteligencia-artificial\/\" class=\"advgb-post-tax-term\">Inteligencia Artificial<\/a>","<a 
href=\"https:\/\/www.jacobsoft.com.mx\/en\/category\/inteligencia-artificial\/machine-learning\/\" class=\"advgb-post-tax-term\">Machine Learning<\/a>"],"unlinked":["<span class=\"advgb-post-tax-term\">Algoritmos<\/span>","<span class=\"advgb-post-tax-term\">Inteligencia Artificial<\/span>","<span class=\"advgb-post-tax-term\">Machine Learning<\/span>"]},"tags":{"linked":["<a href=\"https:\/\/www.jacobsoft.com.mx\/en\/category\/inteligencia-artificial\/machine-learning\/\" class=\"advgb-post-tax-term\">an\u00e1lisis de datos<\/a>","<a href=\"https:\/\/www.jacobsoft.com.mx\/en\/category\/inteligencia-artificial\/machine-learning\/\" class=\"advgb-post-tax-term\">\u00e1rboles de clasificaci\u00f3n<\/a>","<a href=\"https:\/\/www.jacobsoft.com.mx\/en\/category\/inteligencia-artificial\/machine-learning\/\" class=\"advgb-post-tax-term\">\u00e1rboles de decisi\u00f3n<\/a>","<a href=\"https:\/\/www.jacobsoft.com.mx\/en\/category\/inteligencia-artificial\/machine-learning\/\" class=\"advgb-post-tax-term\">\u00e1rboles de regresi\u00f3n<\/a>","<a href=\"https:\/\/www.jacobsoft.com.mx\/en\/category\/inteligencia-artificial\/machine-learning\/\" class=\"advgb-post-tax-term\">Ciencia de Datos<\/a>","<a href=\"https:\/\/www.jacobsoft.com.mx\/en\/category\/inteligencia-artificial\/machine-learning\/\" class=\"advgb-post-tax-term\">Data Mining<\/a>","<a href=\"https:\/\/www.jacobsoft.com.mx\/en\/category\/inteligencia-artificial\/machine-learning\/\" class=\"advgb-post-tax-term\">Data Science<\/a>","<a href=\"https:\/\/www.jacobsoft.com.mx\/en\/category\/inteligencia-artificial\/machine-learning\/\" class=\"advgb-post-tax-term\">machine learning<\/a>","<a href=\"https:\/\/www.jacobsoft.com.mx\/en\/category\/inteligencia-artificial\/machine-learning\/\" class=\"advgb-post-tax-term\">m\u00e9todos predictivos<\/a>","<a href=\"https:\/\/www.jacobsoft.com.mx\/en\/category\/inteligencia-artificial\/machine-learning\/\" class=\"advgb-post-tax-term\">Miner\u00eda de Datos<\/a>","<a 
href=\"https:\/\/www.jacobsoft.com.mx\/en\/category\/inteligencia-artificial\/machine-learning\/\" class=\"advgb-post-tax-term\">regresi\u00f3n<\/a>"],"unlinked":["<span class=\"advgb-post-tax-term\">an\u00e1lisis de datos<\/span>","<span class=\"advgb-post-tax-term\">\u00e1rboles de clasificaci\u00f3n<\/span>","<span class=\"advgb-post-tax-term\">\u00e1rboles de decisi\u00f3n<\/span>","<span class=\"advgb-post-tax-term\">\u00e1rboles de regresi\u00f3n<\/span>","<span class=\"advgb-post-tax-term\">Ciencia de Datos<\/span>","<span class=\"advgb-post-tax-term\">Data Mining<\/span>","<span class=\"advgb-post-tax-term\">Data Science<\/span>","<span class=\"advgb-post-tax-term\">machine learning<\/span>","<span class=\"advgb-post-tax-term\">m\u00e9todos predictivos<\/span>","<span class=\"advgb-post-tax-term\">Miner\u00eda de Datos<\/span>","<span class=\"advgb-post-tax-term\">regresi\u00f3n<\/span>"]}},"comment_count":"4","relative_dates":{"created":"Posted 8 years ago","modified":"Updated 6 years ago"},"absolute_dates":{"created":"Posted on September 15, 2018","modified":"Updated on March 27, 2020"},"absolute_dates_time":{"created":"Posted on September 15, 2018 10:52 pm","modified":"Updated on March 27, 2020 10:48 
pm"},"featured_img_caption":"","series_order":"","_links":{"self":[{"href":"https:\/\/www.jacobsoft.com.mx\/en\/wp-json\/wp\/v2\/posts\/382","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.jacobsoft.com.mx\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.jacobsoft.com.mx\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.jacobsoft.com.mx\/en\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.jacobsoft.com.mx\/en\/wp-json\/wp\/v2\/comments?post=382"}],"version-history":[{"count":12,"href":"https:\/\/www.jacobsoft.com.mx\/en\/wp-json\/wp\/v2\/posts\/382\/revisions"}],"predecessor-version":[{"id":1446,"href":"https:\/\/www.jacobsoft.com.mx\/en\/wp-json\/wp\/v2\/posts\/382\/revisions\/1446"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.jacobsoft.com.mx\/en\/wp-json\/wp\/v2\/media\/402"}],"wp:attachment":[{"href":"https:\/\/www.jacobsoft.com.mx\/en\/wp-json\/wp\/v2\/media?parent=382"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.jacobsoft.com.mx\/en\/wp-json\/wp\/v2\/categories?post=382"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.jacobsoft.com.mx\/en\/wp-json\/wp\/v2\/tags?post=382"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}