{"id":295,"date":"2018-09-09T19:34:00","date_gmt":"2018-09-10T01:34:00","guid":{"rendered":"https:\/\/jacobsoft.com.mx\/?p=295"},"modified":"2021-08-19T23:11:20","modified_gmt":"2021-08-20T05:11:20","slug":"regresion-lineal-simple-con-python","status":"publish","type":"post","link":"https:\/\/www.jacobsoft.com.mx\/en\/regresion-lineal-simple-con-python\/","title":{"rendered":"Simple Linear Regression with Python"},"content":{"rendered":"<h1 class=\"wp-block-heading\">Linear Regression with Python<\/h1>\n\n\n\n<p><strong>Introduction to Linear Regression<\/strong><\/p>\n\n\n\n<p><strong>Linear regression<\/strong> is one of the analytical or inferential methods in which one of the variables stands out as the <strong><em>main dependent<\/em><\/strong> variable with respect to the rest; that is, the dependent variable is defined or explained by the other variables, the <strong><em>independent variables<\/em><\/strong>.<\/p>\n\n\n\n<p>The relationship between the dependent variable and the independent variables may be described by an equation or model that links them, mainly when all the variables are quantitative. In this way, it is possible to <strong><a href=\"https:\/\/click.linksynergy.com\/fs-bin\/click?id=cTjR400Zjac&amp;offerid=579862.373&amp;type=3&amp;subid=0&amp;LSNSUBSITE=LSNSUBSITE\" target=\"_blank\" rel=\"noreferrer noopener\" aria-label=\"predict 
(opens in a new tab)\">predict <\/a><\/strong>the value of the dependent variable from the profile of all the others.<\/p>\n\n\n\n<p>If the dependent variable is qualitative and dichotomous, that is, it takes values such as (0, 1) or (Yes, No), linear regression can be used as a <strong>classifier<\/strong>. If the qualitative dependent variable indicates the assignment of each element to one of two or more previously defined groups, the model can be used to classify new cases, and the technique becomes <strong>discriminant analysis<\/strong>.<\/p>\n\n\n\n<p><\/p>\n\n\n\n<script async src=\"https:\/\/pagead2.googlesyndication.com\/pagead\/js\/adsbygoogle.js?client=ca-pub-2380084220870127\"\n     crossorigin=\"anonymous\"><\/script>\n<ins class=\"adsbygoogle\"\n     style=\"display:block; text-align:center;\"\n     data-ad-layout=\"in-article\"\n     data-ad-format=\"fluid\"\n     data-ad-client=\"ca-pub-2380084220870127\"\n     data-ad-slot=\"2437322509\"><\/ins>\n<script>\n     (adsbygoogle = window.adsbygoogle || []).push({});\n<\/script>\n\n\n\n<p><\/p>\n\n\n\n<p>On the other hand, if the dependent variable is quantitative and the independent variables are qualitative, the model is an <strong>analysis of variance<\/strong>. 
But if the dependent variable is qualitative or quantitative and the independent variables are qualitative, it is a case of <strong>segmentation<\/strong>.<\/p>\n\n\n\n<p>In <strong>linear regression<\/strong>, both the independent variables and the dependent variable are quantitative, and the linear model is given by the equation:<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter\"><a href=\"http:\/\/jacobsoft.com.mx\/wp-content\/uploads\/2018\/09\/ecuacion-regresion-lineal.png\"><img loading=\"lazy\" decoding=\"async\" width=\"300\" height=\"23\" src=\"http:\/\/jacobsoft.com.mx\/wp-content\/uploads\/2018\/09\/ecuacion-regresion-lineal-300x23.png\" alt=\"\" class=\"wp-image-298\" srcset=\"https:\/\/www.jacobsoft.com.mx\/wp-content\/uploads\/2018\/09\/ecuacion-regresion-lineal-300x23.png 300w, https:\/\/www.jacobsoft.com.mx\/wp-content\/uploads\/2018\/09\/ecuacion-regresion-lineal.png 477w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><\/figure><\/div>\n\n\n\n<p>Here b1, b2, ..., bn are the coefficients or parameters that measure the magnitude of the effect that the independent variables x1, x2, ..., xn have on the dependent variable y.<\/p>\n\n\n\n<p>The coefficient b0 is the constant (or intercept) term of the model, and u is the term that represents the model error.<\/p>\n\n\n\n<p>Now, if we have a set of observations for each of the independent variables and for the dependent variable, how can we determine the numerical values of the parameters b0, b1, ..., bn from the data? 
This is known as estimating the model parameters, and once these values have been obtained, we can make a <strong><a href=\"https:\/\/click.linksynergy.com\/fs-bin\/click?id=cTjR400Zjac&amp;offerid=579862.462&amp;type=3&amp;subid=0&amp;LSNSUBSITE=LSNSUBSITE\" target=\"_blank\" rel=\"noreferrer noopener\" aria-label=\"prediction (opens in a new tab)\">prediction<\/a><\/strong> of the future behavior of the variable y.<\/p>\n\n\n\n<p>For example, consider a hypothetical table with information about sales in past periods, together with the advertising expense, the number of interested prospects and the number of quotes made in each period. From these data we could infer sales for later periods:<\/p>
\n\n\n\n<figure class=\"wp-block-table\"><table><tbody><tr><td><strong>Prospects<\/strong><\/td><td><strong>Advertising<\/strong><\/td><td><strong>Quotes<\/strong><\/td><td><strong>Sales<\/strong><\/td><\/tr><tr><td>300<\/td><td>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 5,000.00<\/td><td>100<\/td><td>&nbsp;&nbsp;&nbsp;&nbsp; 50,000.00<\/td><\/tr><tr><td>400<\/td><td>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 4,500.00<\/td><td>120<\/td><td>&nbsp;&nbsp;&nbsp;&nbsp; 45,000.00<\/td><\/tr><tr><td>200<\/td><td>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 7,000.00<\/td><td>90<\/td><td>&nbsp;&nbsp;&nbsp;&nbsp; 30,000.00<\/td><\/tr><tr><td>800<\/td><td>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 7,000.00<\/td><td>350<\/td><td>&nbsp;&nbsp;&nbsp;&nbsp; 90,000.00<\/td><\/tr><tr><td>600<\/td><td>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 7,500.00<\/td><td>220<\/td><td>&nbsp;&nbsp;&nbsp;&nbsp; 75,000.00<\/td><\/tr><tr><td>650<\/td><td>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 4,800.00<\/td><td>300<\/td><td>&nbsp;&nbsp;&nbsp;&nbsp; 81,000.00<\/td><\/tr><tr><td>180<\/td><td>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 3,000.00<\/td><td>100<\/td><td>&nbsp;&nbsp;&nbsp;&nbsp; 28,000.00<\/td><\/tr><tr><td>700<\/td><td>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 6,500.00<\/td><td>400<\/td><td>&nbsp;&nbsp; 128,000.00<\/td><\/tr><tr><td>700<\/td><td>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 5,400.00<\/td><td>300<\/td><td>&nbsp;&nbsp;&nbsp;&nbsp; 89,000.00<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>Prospects (x1), advertising (x2) and quotes (x3) are the independent variables for 9 cases, and sales is the dependent variable for those same 9 cases or observations.<\/p>\n\n\n\n<p>Using the 
linear regression model, we could estimate the sales that would be obtained with 900 prospects, an advertising expense of 10,000 and 500 quotes. To do this, we would apply the model, which requires knowing the values of its parameters, mainly b1, b2 and b3.<\/p>\n\n\n\n<p><strong>Simple Linear Regression<\/strong><\/p>\n\n\n\n<p>In simple linear regression there is only one independent variable, so besides the constant b0 the model has a single coefficient b1, as in the following equation:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><a href=\"http:\/\/jacobsoft.com.mx\/wp-content\/uploads\/2018\/09\/ecuacion-regresion-lineal-simple_.png\"><img loading=\"lazy\" decoding=\"async\" width=\"219\" height=\"44\" src=\"http:\/\/jacobsoft.com.mx\/wp-content\/uploads\/2018\/09\/ecuacion-regresion-lineal-simple_.png\" alt=\"\" class=\"wp-image-305\"\/><\/a><\/figure>\n\n\n\n<p>where <strong>i<\/strong> = 1, ..., n and <strong>n<\/strong> is the total number of cases or observations.<\/p>\n\n\n\n<p>Solving for b0, we obtain:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><a href=\"http:\/\/jacobsoft.com.mx\/wp-content\/uploads\/2018\/09\/coeficiente-b0-regresion-lineal.png\"><img loading=\"lazy\" decoding=\"async\" width=\"174\" height=\"39\" src=\"http:\/\/jacobsoft.com.mx\/wp-content\/uploads\/2018\/09\/coeficiente-b0-regresion-lineal.png\" alt=\"\" class=\"wp-image-307\"\/><\/a><\/figure>\n\n\n\n<p>where <strong>Y<\/strong>(average) is the mean of the dependent variable over all cases and <strong>x<\/strong>(average) is the mean of the independent variable over all cases.<\/p>\n\n\n\n<p>Solving for b1 using all the cases, the value of b1 is given by:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><a href=\"http:\/\/jacobsoft.com.mx\/wp-content\/uploads\/2018\/09\/coeficiente-b1-regresion-lineal.png\"><img loading=\"lazy\" decoding=\"async\" width=\"431\" height=\"90\" 
src=\"http:\/\/jacobsoft.com.mx\/wp-content\/uploads\/2018\/09\/coeficiente-b1-regresion-lineal.png\" alt=\"\" class=\"wp-image-308\" srcset=\"https:\/\/www.jacobsoft.com.mx\/wp-content\/uploads\/2018\/09\/coeficiente-b1-regresion-lineal.png 431w, https:\/\/www.jacobsoft.com.mx\/wp-content\/uploads\/2018\/09\/coeficiente-b1-regresion-lineal-300x63.png 300w\" sizes=\"auto, (max-width: 431px) 100vw, 431px\" \/><\/a><\/figure>\n\n\n\n<p>Once the coefficients of the equation have been calculated, we can compute y for new values of x, which amounts to making a prediction.<\/p>\n\n\n\n<p>In summary, simple linear regression can be seen as follows:<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter\"><a href=\"http:\/\/jacobsoft.com.mx\/wp-content\/uploads\/2018\/09\/resumen-regresion-lineal.png\"><img loading=\"lazy\" decoding=\"async\" width=\"609\" height=\"285\" src=\"http:\/\/jacobsoft.com.mx\/wp-content\/uploads\/2018\/09\/resumen-regresion-lineal.png\" alt=\"\" class=\"wp-image-311\" srcset=\"https:\/\/www.jacobsoft.com.mx\/wp-content\/uploads\/2018\/09\/resumen-regresion-lineal.png 609w, https:\/\/www.jacobsoft.com.mx\/wp-content\/uploads\/2018\/09\/resumen-regresion-lineal-300x140.png 300w\" sizes=\"auto, (max-width: 609px) 100vw, 609px\" \/><\/a><\/figure><\/div>\n\n\n\n<p>In other words, we can calculate y for values of x that are not in the current data set, that is, values that were not part of the training set used to calculate the coefficients of the equation.<\/p>\n\n\n\n<p><\/p>\n\n\n\n<script async src=\"https:\/\/pagead2.googlesyndication.com\/pagead\/js\/adsbygoogle.js?client=ca-pub-2380084220870127\"\n     crossorigin=\"anonymous\"><\/script>\n<ins class=\"adsbygoogle\"\n     style=\"display:block; text-align:center;\"\n     data-ad-layout=\"in-article\"\n     data-ad-format=\"fluid\"\n     data-ad-client=\"ca-pub-2380084220870127\"\n     data-ad-slot=\"2437322509\"><\/ins>\n<script>\n     (adsbygoogle = 
window.adsbygoogle || []).push({});\n<\/script>\n\n\n\n<p><\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><a href=\"https:\/\/click.linksynergy.com\/fs-bin\/click?id=cTjR400Zjac&amp;offerid=347188.10000502&amp;type=3&amp;subid=0\" target=\"_blank\" rel=\"noreferrer noopener\" aria-label=\"Python (opens in a new tab)\">Python<\/a><\/h2>\n\n\n\n<p>For the Python example, let&#039;s consider the following data table with 30 observations:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><tbody><tr><td><strong>Years of experience<\/strong><\/td><td><strong>Salary<\/strong><\/td><\/tr><tr><td>1.1<\/td><td>&nbsp;&nbsp;&nbsp;&nbsp; 39,343.00<\/td><\/tr><tr><td>1.3<\/td><td>&nbsp;&nbsp;&nbsp;&nbsp; 46,205.00<\/td><\/tr><tr><td>1.5<\/td><td>&nbsp;&nbsp;&nbsp;&nbsp; 37,731.00<\/td><\/tr><tr><td>2<\/td><td>&nbsp;&nbsp;&nbsp;&nbsp; 43,525.00<\/td><\/tr><tr><td>2.2<\/td><td>&nbsp;&nbsp;&nbsp;&nbsp; 39,891.00<\/td><\/tr><tr><td>2.9<\/td><td>&nbsp;&nbsp;&nbsp;&nbsp; 56,642.00<\/td><\/tr><tr><td>3<\/td><td>&nbsp;&nbsp;&nbsp;&nbsp; 60,150.00<\/td><\/tr><tr><td>3.2<\/td><td>&nbsp;&nbsp;&nbsp;&nbsp; 54,445.00<\/td><\/tr><tr><td>3.2<\/td><td>&nbsp;&nbsp;&nbsp;&nbsp; 64,445.00<\/td><\/tr><tr><td>3.7<\/td><td>&nbsp;&nbsp;&nbsp;&nbsp; 57,189.00<\/td><\/tr><tr><td>3.9<\/td><td>&nbsp;&nbsp;&nbsp;&nbsp; 63,218.00<\/td><\/tr><tr><td>4<\/td><td>&nbsp;&nbsp;&nbsp;&nbsp; 55,794.00<\/td><\/tr><tr><td>4<\/td><td>&nbsp;&nbsp;&nbsp;&nbsp; 56,957.00<\/td><\/tr><tr><td>4.1<\/td><td>&nbsp;&nbsp;&nbsp;&nbsp; 57,081.00<\/td><\/tr><tr><td>4.5<\/td><td>&nbsp;&nbsp;&nbsp;&nbsp; 61,111.00<\/td><\/tr><tr><td>4.9<\/td><td>&nbsp;&nbsp;&nbsp;&nbsp; 67,938.00<\/td><\/tr><tr><td>5.1<\/td><td>&nbsp;&nbsp;&nbsp;&nbsp; 66,029.00<\/td><\/tr><tr><td>5.3<\/td><td>&nbsp;&nbsp;&nbsp;&nbsp; 83,088.00<\/td><\/tr><tr><td>5.9<\/td><td>&nbsp;&nbsp;&nbsp;&nbsp; 81,363.00<\/td><\/tr><tr><td>6<\/td><td>&nbsp;&nbsp;&nbsp;&nbsp; 
93,940.00<\/td><\/tr><tr><td>6.8<\/td><td>&nbsp;&nbsp;&nbsp;&nbsp; 
91,738.00<\/td><\/tr><tr><td>7.1<\/td><td>&nbsp;&nbsp;&nbsp;&nbsp; 98,273.00<\/td><\/tr><tr><td>7.9<\/td><td>&nbsp; 101,302.00<\/td><\/tr><tr><td>8.2<\/td><td>&nbsp; 113,812.00<\/td><\/tr><tr><td>8.7<\/td><td>&nbsp; 109,431.00<\/td><\/tr><tr><td>9<\/td><td>&nbsp; 105,582.00<\/td><\/tr><tr><td>9.5<\/td><td>&nbsp; 116,969.00<\/td><\/tr><tr><td>9.6<\/td><td>&nbsp; 112,635.00<\/td><\/tr><tr><td>10.3<\/td><td>&nbsp; 122,391.00<\/td><\/tr><tr><td>10.5<\/td><td>&nbsp; 121,872.00<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>Here the independent variable X represents the years of experience and the dependent variable Y represents the salary. We save these data as a comma-separated values (CSV) text file using Excel, with the column headers in the first row, and name it Salary_Data.csv.<\/p>\n\n\n\n<p>The first thing we will do is import the libraries we are going to need:<\/p>\n\n\n<div id=\"code\">\n\n\n\n<pre class=\"wp-block-preformatted brush: cpp; gutter: true; first-line: 1\"><span style=\"color: #339966;\"># Simple Linear Regression<\/span>\n<span style=\"color: #0000ff;\">import<\/span> numpy <span style=\"color: #0000ff;\">as<\/span> np\n<span style=\"color: #0000ff;\">import<\/span> matplotlib.pyplot <span style=\"color: #0000ff;\">as<\/span> plt\n<span style=\"color: #0000ff;\">import<\/span> pandas <span style=\"color: #0000ff;\">as<\/span> pd\n<\/pre>\n\n\n<\/div>\n\n\n\n<p>NumPy is the fundamental package for scientific computing with Python: it provides objects and functions for operating on multidimensional arrays, tools for integrating Fortran and C\/C++ code, linear algebra support, Fourier transforms and random number generation.<\/p>\n\n\n\n<p>Matplotlib contains the tools, functions and objects needed to create plots.<\/p>\n\n\n\n<p>pandas is a library for manipulating data structures; it is built on top of NumPy and can also read and write external data files, such as CSV files.<\/p>\n\n\n\n<p>The next step is to load the Salary_Data.csv file using the 
pandas library and separate the dependent and independent variables:<\/p>\n\n\n<div id=\"code\">\n\n\n\n<pre class=\"wp-block-preformatted brush: cpp; gutter: true; first-line: 1\"><span style=\"color: #339966;\"># Simple Linear Regression<\/span>\n<span style=\"color: #0000ff;\">import<\/span> numpy <span style=\"color: #0000ff;\">as<\/span> np\n<span style=\"color: #0000ff;\">import<\/span> matplotlib.pyplot <span style=\"color: #0000ff;\">as<\/span> plt\n<span style=\"color: #0000ff;\">import<\/span> pandas <span style=\"color: #0000ff;\">as<\/span> pd\n\n<span style=\"color: #339966;\"># Load the data set<\/span>\ndataset = pd.read_csv(&#039;<span style=\"color: #008000;\">Salary_Data.csv<\/span>&#039;)\nx = dataset.iloc[:, :<span style=\"color: #0000ff;\">-1<\/span>].values  <span style=\"color: #339966;\"># years of experience (2-D array with one column)<\/span>\ny = dataset.iloc[:, <span style=\"color: #0000ff;\">1<\/span>].values  <span style=\"color: #339966;\"># salary (1-D array)<\/span><\/pre>\n\n\n<\/div>\n\n\n\n<p>Once the independent and dependent variables have been created with the years-of-experience and salary data, we split each of them into two parts. 
The first part, made up of randomly selected observations, is used to train the model, that is, to calculate the coefficients b0 and b1; we call it the training set. The second part is used to test the model, so we call it the test set.<\/p>\n\n\n<div id=\"code\">\n\n\n\n<pre class=\"wp-block-preformatted brush: cpp; gutter: true; first-line: 1\"><span style=\"color: #339966;\"># Simple Linear Regression<\/span>\n<span style=\"color: #0000ff;\">import<\/span> numpy <span style=\"color: #0000ff;\">as<\/span> np\n<span style=\"color: #0000ff;\">import<\/span> matplotlib.pyplot <span style=\"color: #0000ff;\">as<\/span> plt\n<span style=\"color: #0000ff;\">import<\/span> pandas <span style=\"color: #0000ff;\">as<\/span> pd\n\n<span style=\"color: #339966;\"># Load the data set<\/span>\ndataset = pd.read_csv(&#039;<span style=\"color: #008000;\">Salary_Data.csv<\/span>&#039;)\nx = dataset.iloc[:, :<span style=\"color: #0000ff;\">-1<\/span>].values  <span style=\"color: #339966;\"># years of experience (2-D array with one column)<\/span>\ny = dataset.iloc[:, <span style=\"color: #0000ff;\">1<\/span>].values  <span style=\"color: #339966;\"># salary (1-D array)<\/span>\n\n<span style=\"color: #339966;\"># Split the data into the training set and the test set<\/span>\n<span style=\"color: #339966;\"># (sklearn.cross_validation was removed in scikit-learn 0.20; use sklearn.model_selection)<\/span>\n<span style=\"color: #0000ff;\">from<\/span> sklearn.model_selection <span style=\"color: #0000ff;\">import<\/span> train_test_split\nx_train, x_test, y_train, y_test = train_test_split(x, y, test_size=<span style=\"color: #0000ff;\">1\/3<\/span>, random_state=<span style=\"color: #0000ff;\">0<\/span>)\n<\/pre>\n\n\n<\/div>\n\n\n\n<p>From the scikit-learn (sklearn) library we import the train_test_split function to split the data. We pass it the variables x and y plus two additional parameters: test_size and random_state. 
With test_size we indicate that the test set, stored in the variables x_test and y_test, will contain one third of the total set, that is, 10 records, since the total set has 30. Those 10 records are chosen at random from among the 30; setting random_state = 0 fixes the seed of the random number generator, so the same split is obtained every time the code is run.<\/p>\n\n\n\n<p>The next step is to fit the linear regression model to the training set in order to calculate the coefficients, that is, to train the model.<\/p>\n\n\n<div id=\"code\">\n\n\n\n<pre class=\"wp-block-preformatted brush: cpp; gutter: true; first-line: 1\"><span style=\"color: #339966;\"># Simple Linear Regression<\/span>\n<span style=\"color: #0000ff;\">import<\/span> numpy <span style=\"color: #0000ff;\">as<\/span> np\n<span style=\"color: #0000ff;\">import<\/span> matplotlib.pyplot <span style=\"color: #0000ff;\">as<\/span> plt\n<span style=\"color: #0000ff;\">import<\/span> pandas <span style=\"color: #0000ff;\">as<\/span> pd\n\n<span style=\"color: #339966;\"># Load the data set<\/span>\ndataset = pd.read_csv(&#039;<span style=\"color: #008000;\">Salary_Data.csv<\/span>&#039;)\nx = dataset.iloc[:, :<span style=\"color: #0000ff;\">-1<\/span>].values  <span style=\"color: #339966;\"># years of experience (2-D array with one column)<\/span>\ny = dataset.iloc[:, <span style=\"color: #0000ff;\">1<\/span>].values  <span style=\"color: #339966;\"># salary (1-D array)<\/span>\n\n<span style=\"color: #339966;\"># Split the data into the training set and the test set<\/span>\n<span style=\"color: #0000ff;\">from<\/span> sklearn.model_selection <span style=\"color: #0000ff;\">import<\/span> train_test_split\nx_train, x_test, y_train, y_test = train_test_split(x, y, test_size=<span style=\"color: #0000ff;\">1\/3<\/span>, random_state=<span style=\"color: #0000ff;\">0<\/span>)\n\n<span style=\"color: #339966;\"># Fit the Linear Regression model to the training set<\/span>\n<span style=\"color: 
#0000ff;\">from<\/span> sklearn.linear_model <span style=\"color: #0000ff;\">import<\/span> LinearRegression\nregressor = LinearRegression()\nregressor.fit(x_train, y_train)\n<span style=\"color: #339966;\"># regressor.intercept_ now holds b0 and regressor.coef_[0] holds b1<\/span>\n<\/pre>\n\n\n<\/div>
\n\n\n\n<p>From the same sklearn library, in the linear_model subpackage, we import the LinearRegression class and create the regressor object with the class constructor.<br>Once the object is created, we call its fit() method, passing it the training data.<\/p>\n\n\n\n<p>With the trained model, we can now predict the values of y for the test set x_test and compare them with the real values y_test.<\/p>\n\n\n<div id=\"code\">\n\n\n\n<pre class=\"wp-block-preformatted brush: cpp; gutter: true; first-line: 1\"><span style=\"color: #339966;\"># Simple Linear Regression<\/span>\n<span style=\"color: #0000ff;\">import<\/span> numpy <span style=\"color: #0000ff;\">as<\/span> np\n<span style=\"color: #0000ff;\">import<\/span> matplotlib.pyplot <span style=\"color: #0000ff;\">as<\/span> plt\n<span style=\"color: #0000ff;\">import<\/span> pandas <span style=\"color: #0000ff;\">as<\/span> pd\n\n<span style=\"color: #339966;\"># Load the data set<\/span>\ndataset = pd.read_csv(&#039;<span style=\"color: #008000;\">Salary_Data.csv<\/span>&#039;)\nx = dataset.iloc[:, :<span style=\"color: #0000ff;\">-1<\/span>].values  <span style=\"color: #339966;\"># years of experience (2-D array with one column)<\/span>\ny = dataset.iloc[:, <span style=\"color: #0000ff;\">1<\/span>].values  <span style=\"color: #339966;\"># salary (1-D array)<\/span>\n\n<span style=\"color: #339966;\"># Split the data into the training set and the test set<\/span>\n<span style=\"color: #0000ff;\">from<\/span> sklearn.model_selection <span style=\"color: #0000ff;\">import<\/span> train_test_split\nx_train, x_test, y_train, y_test = train_test_split(x, y, test_size=<span style=\"color: #0000ff;\">1\/3<\/span>, random_state=<span style=\"color: #0000ff;\">0<\/span>)\n\n<span style=\"color: #339966;\"># Fit the Linear Regression model to the training set<\/span>\n<span style=\"color: #0000ff;\">from<\/span> sklearn.linear_model <span style=\"color: #0000ff;\">import<\/span> LinearRegression\nregressor = 
LinearRegression()\nregressor.fit(x_train, y_train)\n<span style=\"color: #339966;\"># regressor.intercept_ now holds b0 and regressor.coef_[0] holds b1<\/span>\n\n<span style=\"color: #339966;\"># Predict the results for the test set (x_test)<\/span>\ny_pred = regressor.predict(x_test)\n\n<span style=\"color: #339966;\"># Next, we compare y_pred with the real values y_test<\/span><\/pre>\n\n\n<\/div>
\n\n\n\n<p>Now that we have the values calculated with the linear regression model, we can compare them against the real values (y_pred vs. y_test) in a plot.<\/p>\n\n\n<div id=\"code\">\n\n\n\n<pre class=\"wp-block-preformatted brush: cpp; gutter: true; first-line: 1\"><span style=\"color: #339966;\"># Simple Linear Regression<\/span>\n<span style=\"color: #0000ff;\">import<\/span> numpy <span style=\"color: #0000ff;\">as<\/span> np\n<span style=\"color: #0000ff;\">import<\/span> matplotlib.pyplot <span style=\"color: #0000ff;\">as<\/span> plt\n<span style=\"color: #0000ff;\">import<\/span> pandas <span style=\"color: #0000ff;\">as<\/span> pd\n\n<span style=\"color: #339966;\"># Load the data set<\/span>\ndataset = pd.read_csv(&#039;<span style=\"color: #008000;\">Salary_Data.csv<\/span>&#039;)\nx = dataset.iloc[:, :<span style=\"color: #0000ff;\">-1<\/span>].values  <span style=\"color: #339966;\"># years of experience (2-D array with one column)<\/span>\ny = dataset.iloc[:, <span style=\"color: #0000ff;\">1<\/span>].values  <span style=\"color: #339966;\"># salary (1-D array)<\/span>\n\n<span style=\"color: #339966;\"># Split the data into the training set and the test set<\/span>\n<span style=\"color: #0000ff;\">from<\/span> sklearn.model_selection <span style=\"color: #0000ff;\">import<\/span> train_test_split\nx_train, x_test, y_train, y_test = train_test_split(x, y, test_size=<span style=\"color: #0000ff;\">1\/3<\/span>, random_state=<span style=\"color: #0000ff;\">0<\/span>)\n\n<span style=\"color: #339966;\"># Fit the Linear Regression model to the training set<\/span>\n<span style=\"color: #0000ff;\">from<\/span> sklearn.linear_model <span style=\"color: #0000ff;\">import<\/span> LinearRegression\nregressor = LinearRegression()\nregressor.fit(x_train, y_train)\n<span style=\"color: #339966;\"># regressor.intercept_ now holds b0 and regressor.coef_[0] holds b1<\/span>\n\n<span style=\"color: #339966;\"># Predict the results for the test set 
(x_test)<\/span>\ny_pred = regressor.predict(x_test)\n\n<span style=\"color: #339966;\"># Compare y_pred with the real values y_test in a plot<\/span>\nplt.scatter(x_test, y_test, color=&#039;<span style=\"color: #008000;\">red<\/span>&#039;)  <span style=\"color: #339966;\"># real salaries (test set)<\/span>\nplt.plot(x_test, y_pred, color=&#039;<span style=\"color: #008000;\">blue<\/span>&#039;)  <span style=\"color: #339966;\"># regression line (predicted salaries)<\/span>\nplt.title(&#039;<span style=\"color: #008000;\">Salary vs. Experience (Test Set)<\/span>&#039;)\nplt.xlabel(&#039;<span style=\"color: #008000;\">Years of experience<\/span>&#039;)\nplt.ylabel(&#039;<span style=\"color: #008000;\">Salary<\/span>&#039;)\nplt.show()\n<\/pre>\n\n\n<\/div>
\n\n\n\n<p>The red dots show the real values of y for the test set, and the blue line shows the regression line formed by the predicted values.<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter\"><a href=\"http:\/\/jacobsoft.com.mx\/wp-content\/uploads\/2018\/09\/Grafica-RegresionLineal.png\"><img loading=\"lazy\" decoding=\"async\" width=\"711\" height=\"471\" src=\"http:\/\/jacobsoft.com.mx\/wp-content\/uploads\/2018\/09\/Grafica-RegresionLineal.png\" alt=\"\" class=\"wp-image-326\" srcset=\"https:\/\/www.jacobsoft.com.mx\/wp-content\/uploads\/2018\/09\/Grafica-RegresionLineal.png 711w, https:\/\/www.jacobsoft.com.mx\/wp-content\/uploads\/2018\/09\/Grafica-RegresionLineal-300x199.png 300w\" sizes=\"auto, (max-width: 711px) 100vw, 711px\" \/><\/a><\/figure><\/div>\n\n\n\n<p>The values of x_test, y_test and y_pred are the following:<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter\"><img loading=\"lazy\" decoding=\"async\" width=\"537\" height=\"381\" 
src=\"http:\/\/jacobsoft.com.mx\/wp-content\/uploads\/2018\/09\/comparativa-regresionlineal.png\" alt=\"\" class=\"wp-image-327\" srcset=\"https:\/\/www.jacobsoft.com.mx\/wp-content\/uploads\/2018\/09\/comparativa-regresionlineal.png 537w, https:\/\/www.jacobsoft.com.mx\/wp-content\/uploads\/2018\/09\/comparativa-regresionlineal-300x213.png 300w\" sizes=\"auto, (max-width: 537px) 100vw, 537px\" \/><\/figure><\/div>\n\n\n\n<script async src=\"https:\/\/pagead2.googlesyndication.com\/pagead\/js\/adsbygoogle.js?client=ca-pub-2380084220870127\"\n     crossorigin=\"anonymous\"><\/script>\n<ins class=\"adsbygoogle\"\n     style=\"display:block; text-align:center;\"\n     data-ad-layout=\"in-article\"\n     data-ad-format=\"fluid\"\n     data-ad-client=\"ca-pub-2380084220870127\"\n     data-ad-slot=\"2437322509\"><\/ins>\n<script>\n     (adsbygoogle = window.adsbygoogle || []).push({});\n<\/script>\n\n\n\n<p><\/p>\n\n\n\n<p>The table on the left shows the years of experience (x_test) for the 10 values of the test set. The table in the middle (y_test) shows the original salary corresponding to those years of experience. The table on the far right (y_pred) shows the salary calculated with the linear regression model for the same years of experience.<\/p>\n\n\n\n<p>For example, the second row shows that with 10.3 years of experience the actual salary is 122,391.00, while the prediction indicates a salary of 123,079.39; the difference is 688.39.<\/p>\n\n\n\n<p>We can see that the values are different, but not very far from reality. 
In this example the error is small relative to the salary (688.39 out of about 122,000, roughly 0.6%). The result of the regression therefore seems acceptable for estimating salaries for values of x that are not in the data, at least within the range of years of experience covered by the training data. The difference should still be checked for every row of the test set, not just one.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Recommendations<\/h3>\n\n\n\n<p>To get started with Python programming, this <strong><a rel=\"noreferrer noopener\" aria-label=\"Introduction to programming with Python (opens in a new tab)\" href=\"https:\/\/click.linksynergy.com\/fs-bin\/click?id=cTjR400Zjac&amp;offerid=347188.10000502&amp;type=3&amp;subid=0\" target=\"_blank\">Introduction to programming with Python<\/a><\/strong> course can be very useful. In addition, in this webinar you can learn about the architecture details and solutions for <strong><a href=\"https:\/\/click.linksynergy.com\/fs-bin\/click?id=cTjR400Zjac&amp;offerid=579862.462&amp;type=3&amp;subid=0&amp;LSNSUBSITE=LSNSUBSITE\" target=\"_blank\" rel=\"noreferrer noopener\" aria-label=\"Azure machine learning (opens in a new tab)\">Azure machine learning<\/a><\/strong>.<\/p>\n\n\n\n<p>For cloud development we can also use <strong><em>Amazon Web Services<\/em><\/strong>; these links lead to the certification courses for the <strong><a rel=\"noreferrer noopener\" aria-label=\"Associate 
(opens in a new tab)\" href=\"https:\/\/click.linksynergy.com\/fs-bin\/click?id=cTjR400Zjac&amp;offerid=579862.373&amp;type=3&amp;subid=0&amp;LSNSUBSITE=LSNSUBSITE\" target=\"_blank\">Associate <\/a><\/strong>and <strong><a href=\"https:\/\/click.linksynergy.com\/fs-bin\/click?id=cTjR400Zjac&amp;offerid=579862.372&amp;type=3&amp;subid=0&amp;LSNSUBSITE=LSNSUBSITE\" target=\"_blank\" rel=\"noreferrer noopener\" aria-label=\"Professional (opens in a new tab)\">Professional <\/a><\/strong>levels in AWS design and architecture.<\/p>","protected":false},"excerpt":{"rendered":"<p>Linear Regression with Python Introduction to Linear Regression Linear regression is one of &hellip; <\/p>","protected":false},"author":2,"featured_media":332,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"advgb_blocks_editor_width":"","advgb_blocks_columns_visual_guide":"","_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"_uf_show_specific_survey":0,"_uf_disable_surveys":false,"footnotes":""},"categories":[25,35,46],"tags":[55,54,50,53,51,52],"class_list":["post-295","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-algoritmos","category-inteligencia-artificial","category-machine-learning","tag-clasificacion","tag-inferencia","tag-machine-learning","tag-prediccion","tag-regresion","tag-regresion-lineal"],"aioseo_notices":[],"author_meta":{"display_name":"Jacob Avila 
Camacho","author_link":"https:\/\/www.jacobsoft.com.mx\/en\/author\/jacob-avila\/"},"featured_img":"https:\/\/www.jacobsoft.com.mx\/wp-content\/uploads\/2018\/09\/imagen-destacada-regresion-lineal-simple-300x165.png","featured_image_src":"https:\/\/www.jacobsoft.com.mx\/wp-content\/uploads\/2018\/09\/imagen-destacada-regresion-lineal-simple.png","featured_image_src_square":"https:\/\/www.jacobsoft.com.mx\/wp-content\/uploads\/2018\/09\/imagen-destacada-regresion-lineal-simple.png","author_info":{"display_name":"Jacob Avila Camacho","author_link":"https:\/\/www.jacobsoft.com.mx\/en\/author\/jacob-avila\/"},"coauthors":[],"tax_additional":{"categories":{"linked":["<a href=\"https:\/\/www.jacobsoft.com.mx\/en\/category\/algoritmos\/\" class=\"advgb-post-tax-term\">Algoritmos<\/a>","<a href=\"https:\/\/www.jacobsoft.com.mx\/en\/category\/inteligencia-artificial\/\" class=\"advgb-post-tax-term\">Inteligencia Artificial<\/a>","<a href=\"https:\/\/www.jacobsoft.com.mx\/en\/category\/inteligencia-artificial\/machine-learning\/\" class=\"advgb-post-tax-term\">Machine Learning<\/a>"],"unlinked":["<span class=\"advgb-post-tax-term\">Algoritmos<\/span>","<span class=\"advgb-post-tax-term\">Inteligencia Artificial<\/span>","<span class=\"advgb-post-tax-term\">Machine Learning<\/span>"]},"tags":{"linked":["<a href=\"https:\/\/www.jacobsoft.com.mx\/en\/category\/inteligencia-artificial\/machine-learning\/\" class=\"advgb-post-tax-term\">clasificaci\u00f3n<\/a>","<a href=\"https:\/\/www.jacobsoft.com.mx\/en\/category\/inteligencia-artificial\/machine-learning\/\" class=\"advgb-post-tax-term\">inferencia<\/a>","<a href=\"https:\/\/www.jacobsoft.com.mx\/en\/category\/inteligencia-artificial\/machine-learning\/\" class=\"advgb-post-tax-term\">machine learning<\/a>","<a href=\"https:\/\/www.jacobsoft.com.mx\/en\/category\/inteligencia-artificial\/machine-learning\/\" class=\"advgb-post-tax-term\">predicci\u00f3n<\/a>","<a 
href=\"https:\/\/www.jacobsoft.com.mx\/en\/category\/inteligencia-artificial\/machine-learning\/\" class=\"advgb-post-tax-term\">regresi\u00f3n<\/a>","<a href=\"https:\/\/www.jacobsoft.com.mx\/en\/category\/inteligencia-artificial\/machine-learning\/\" class=\"advgb-post-tax-term\">regresi\u00f3n lineal<\/a>"],"unlinked":["<span class=\"advgb-post-tax-term\">clasificaci\u00f3n<\/span>","<span class=\"advgb-post-tax-term\">inferencia<\/span>","<span class=\"advgb-post-tax-term\">machine learning<\/span>","<span class=\"advgb-post-tax-term\">predicci\u00f3n<\/span>","<span class=\"advgb-post-tax-term\">regresi\u00f3n<\/span>","<span class=\"advgb-post-tax-term\">regresi\u00f3n lineal<\/span>"]}},"comment_count":"12","relative_dates":{"created":"Posted 8 years ago","modified":"Updated 5 years ago"},"absolute_dates":{"created":"Posted on September 9, 2018","modified":"Updated on August 19, 2021"},"absolute_dates_time":{"created":"Posted on September 9, 2018 7:34 pm","modified":"Updated on August 19, 2021 11:11 
pm"},"featured_img_caption":"","series_order":"","_links":{"self":[{"href":"https:\/\/www.jacobsoft.com.mx\/en\/wp-json\/wp\/v2\/posts\/295","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.jacobsoft.com.mx\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.jacobsoft.com.mx\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.jacobsoft.com.mx\/en\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.jacobsoft.com.mx\/en\/wp-json\/wp\/v2\/comments?post=295"}],"version-history":[{"count":18,"href":"https:\/\/www.jacobsoft.com.mx\/en\/wp-json\/wp\/v2\/posts\/295\/revisions"}],"predecessor-version":[{"id":1803,"href":"https:\/\/www.jacobsoft.com.mx\/en\/wp-json\/wp\/v2\/posts\/295\/revisions\/1803"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.jacobsoft.com.mx\/en\/wp-json\/wp\/v2\/media\/332"}],"wp:attachment":[{"href":"https:\/\/www.jacobsoft.com.mx\/en\/wp-json\/wp\/v2\/media?parent=295"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.jacobsoft.com.mx\/en\/wp-json\/wp\/v2\/categories?post=295"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.jacobsoft.com.mx\/en\/wp-json\/wp\/v2\/tags?post=295"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}