{"id":479,"date":"2018-10-13T22:08:05","date_gmt":"2018-10-14T04:08:05","guid":{"rendered":"http:\/\/www.jacobsoft.com.mx\/?p=479"},"modified":"2025-02-20T13:37:50","modified_gmt":"2025-02-20T19:37:50","slug":"regresion-logistica","status":"publish","type":"post","link":"https:\/\/www.jacobsoft.com.mx\/en\/regresion-logistica\/","title":{"rendered":"Logistic regression"},"content":{"rendered":"<h3 class=\"wp-block-heading\">Logistic regression<\/h3>\n\n\n\n<p>Logistic regression is a classification method. Unlike linear regression, which predicts a continuous number, logistic regression predicts a category.<\/p>\n\n\n\n<p>Classification has a wide range of applications, from medical diagnosis to marketing. Classification models can be divided into linear and non-linear models.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Linear classification models<\/h4>\n\n\n\n<ul class=\"wp-block-list\"><li>Logistic regression<\/li><li>SVM (Support Vector Machine)<\/li><li>Naive Bayes<\/li><\/ul>\n\n\n\n<h4 class=\"wp-block-heading\">Non-linear classification models<\/h4>\n\n\n\n<ul class=\"wp-block-list\"><li>Decision trees<\/li><li>K-NN (K-Nearest Neighbors)<\/li><li>Kernel SVM<\/li><li>Random Forest<\/li><\/ul>\n\n\n\n<p>When the target variable takes only two values, such as Yes or No, or &#039;0&#039; or &#039;1&#039;, the classification problem is known as a binary classification problem, and an effective way to deal with this type of problem is <strong>logistic regression<\/strong>.<\/p>\n\n\n\n<p>In linear regression, the function describes a straight line of the form <strong>f(x) = b<sub>0<\/sub> + b<sub>1<\/sub>x<\/strong>, and for multiple regression <strong>f(x) = b<sub>0<\/sub> + b<sub>1<\/sub>x<sub>1<\/sub> + ... 
+ b<sub>n<\/sub>x<sub>n<\/sub><\/strong><\/p>\n\n\n\n<p>For the classification problem with classes &#039;0&#039; and &#039;1&#039;, we can insert <strong>z = m<sup>T<\/sup>x<\/strong> into the logistic function, also known as the <strong>sigmoid<\/strong> function, expressed as follows:<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter\"><img loading=\"lazy\" decoding=\"async\" width=\"133\" height=\"63\" src=\"https:\/\/www.jacobsoft.com.mx\/wp-content\/uploads\/2018\/10\/sigmoid.png\" alt=\"\" class=\"wp-image-507\"\/><\/figure><\/div>\n\n\n\n<p>The function maps any real number to the interval (0, 1) and is used to transform an arbitrary real-valued function into one better suited to classification.<\/p>\n\n\n\n<p>The function f(x) represents the probability P(y = 1 | x; m), so logistic regression is a probabilistic classifier used to model a binary response.<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter\"><img loading=\"lazy\" decoding=\"async\" width=\"858\" height=\"450\" src=\"https:\/\/www.jacobsoft.com.mx\/wp-content\/uploads\/2018\/10\/grafica-sigmoi.png\" alt=\"\" class=\"wp-image-508\" srcset=\"https:\/\/www.jacobsoft.com.mx\/wp-content\/uploads\/2018\/10\/grafica-sigmoi.png 858w, https:\/\/www.jacobsoft.com.mx\/wp-content\/uploads\/2018\/10\/grafica-sigmoi-300x157.png 300w, https:\/\/www.jacobsoft.com.mx\/wp-content\/uploads\/2018\/10\/grafica-sigmoi-768x403.png 768w, https:\/\/www.jacobsoft.com.mx\/wp-content\/uploads\/2018\/10\/grafica-sigmoi-800x420.png 800w\" sizes=\"auto, (max-width: 858px) 100vw, 858px\" \/><\/figure><\/div>\n\n\n\n<p>The sigmoid curve gives the probability that y = 1. For binary classification, however, what is required is a decision boundary, a curve that separates the region where y = 0 from the region where y = 1, so the output is transformed with:<\/p>\n\n\n\n<div 
class=\"wp-block-image\"><figure class=\"aligncenter\"><img loading=\"lazy\" decoding=\"async\" width=\"187\" height=\"63\" src=\"https:\/\/www.jacobsoft.com.mx\/wp-content\/uploads\/2018\/10\/decision-boundary.png\" alt=\"\" class=\"wp-image-509\"\/><\/figure><\/div>\n\n\n\n<p>If the estimated probability is less than 0.5, the output will be 0, and if it is greater than or equal to 0.5, the output will be 1.<\/p>\n\n\n\n<hr class=\"wp-block-separator\"\/>\n\n\n\n<figure class=\"wp-block-embed-youtube wp-block-embed is-type-video is-provider-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio\"><div class=\"wp-block-embed__wrapper\">\n<iframe loading=\"lazy\" title=\"Regresion Logistica y Arboles de Decision ejemplo con Python\" width=\"780\" height=\"439\" src=\"https:\/\/www.youtube.com\/embed\/AahwyCvQx00?feature=oembed\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share\" allowfullscreen><\/iframe>\n<\/div><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">Logistic Regression with Python<\/h2>\n\n\n\n<p>For this example of logistic regression with Python we will use a data file that contains information about customers who do or do not buy certain products online. For each customer we have gender, age and estimated salary, and customers are labeled 0 if they did not buy and 1 if they bought.<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter\"><img loading=\"lazy\" decoding=\"async\" width=\"698\" height=\"537\" src=\"https:\/\/www.jacobsoft.com.mx\/wp-content\/uploads\/2018\/10\/Compras_en_linea.png\" alt=\"\" class=\"wp-image-510\" srcset=\"https:\/\/www.jacobsoft.com.mx\/wp-content\/uploads\/2018\/10\/Compras_en_linea.png 698w, https:\/\/www.jacobsoft.com.mx\/wp-content\/uploads\/2018\/10\/Compras_en_linea-300x231.png 300w\" sizes=\"auto, (max-width: 698px) 100vw, 698px\" \/><\/figure><\/div>\n\n\n\n<div 
style=\"height:30px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p>We load the libraries and the data set from the file with:<\/p>\n\n\n\n\n<pre><span class=\"coments\"># Logistic regression<\/span>\n\n<span class=\"coments\"># Import the libraries<\/span>\n<span class=\"keyword\">import<\/span> numpy <span class=\"keyword\">as<\/span> np\n<span class=\"keyword\">import<\/span> matplotlib.pyplot <span class=\"keyword\">as<\/span> plt\n<span class=\"keyword\">import<\/span> pandas <span class=\"keyword\">as<\/span> pd\n\n<span class=\"coments\"># Import the dataset<\/span>\ndataset = pd.read_csv(&#039;<span class=\"texto\">Compras_en_Linea.csv<\/span>&#039;)\nX = dataset.iloc[:, [<span class=\"keyword\">2<\/span>, <span class=\"keyword\">3<\/span>]].values\ny = dataset.iloc[:, <span class=\"keyword\">4<\/span>].values\n<\/pre>\n\n\n\n<p>For the independent variable X we use age and salary, columns 2 and 3. The dependent variable y is column 4, which records whether or not the customer made a purchase.<\/p>\n\n\n\n<p>We use 25% of the data contained in the file for tests and 75% for the training set. 
The complete file contains 400 customer records, so 300 will be used for training and 100 for testing.<\/p>\n\n\n\n\n<pre><span class=\"coments\"># Split the data set into training data and test data<\/span>\n<span class=\"keyword\">from<\/span> sklearn.model_selection <span class=\"keyword\">import<\/span> train_test_split\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=<span class=\"keyword\">0.25<\/span>, random_state=<span class=\"keyword\">0<\/span>)\n<\/pre>\n\n\n\n<p>Now, given that age and salary are on different scales, we rescale the features with the StandardScaler class:<\/p>\n\n\n\n\n<pre><span class=\"coments\"># Scale adjustment<\/span>\n<span class=\"keyword\">from<\/span> sklearn.preprocessing <span class=\"keyword\">import<\/span> StandardScaler\nsc = StandardScaler()\nX_train = sc.fit_transform(X_train)\nX_test = sc.transform(X_test)\n<\/pre>\n\n\n\n<p>Now we create the model with the class <strong>LogisticRegression<\/strong> from the library <strong>sklearn<\/strong> and train it with the training set data <em><strong>X_train<\/strong><\/em>.<\/p>\n\n\n\n\n<pre><span class=\"coments\"># Fit the logistic regression to the training set<\/span>\n<span class=\"keyword\">from<\/span> sklearn.linear_model <span class=\"keyword\">import<\/span> LogisticRegression\nclassifier = LogisticRegression(random_state=<span class=\"keyword\">0<\/span>)\nclassifier.fit(X_train, y_train)\n<\/pre>\n\n\n\n<p>Once the model is trained, we can predict the labels for the data contained in the test set <em><strong>X_test<\/strong><\/em>:<\/p>\n\n\n\n\n<pre><span class=\"coments\"># Test set prediction<\/span>\ny_pred = classifier.predict(X_test)\n<\/pre>\n\n\n\n<p><em><strong>y_pred<\/strong><\/em> contains the values predicted by the logistic regression model, so we can compare <em><strong>y_test<\/strong><\/em> with <em><strong>y_pred<\/strong><\/em>:<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure 
class=\"aligncenter\"><img loading=\"lazy\" decoding=\"async\" width=\"427\" height=\"531\" src=\"https:\/\/www.jacobsoft.com.mx\/wp-content\/uploads\/2018\/10\/resultados_logistica.png\" alt=\"\" class=\"wp-image-512\" srcset=\"https:\/\/www.jacobsoft.com.mx\/wp-content\/uploads\/2018\/10\/resultados_logistica.png 427w, https:\/\/www.jacobsoft.com.mx\/wp-content\/uploads\/2018\/10\/resultados_logistica-241x300.png 241w\" sizes=\"auto, (max-width: 427px) 100vw, 427px\" \/><\/figure><\/div>\n\n\n\n<p>We note that in row 9 there was an error in the prediction, so we generate the confusion matrix to analyze both the false positives and the false negatives.<\/p>\n\n\n\n\n<pre><span class=\"coments\"># Confusion Matrix<\/span>\nfrom sklearn.metrics import confusion_matrix\ncm = confusion_matrix(y_test, y_pred)\n<\/pre>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter\"><img loading=\"lazy\" decoding=\"async\" width=\"416\" height=\"338\" src=\"https:\/\/www.jacobsoft.com.mx\/wp-content\/uploads\/2018\/10\/matriz_confusion_logistica.png\" alt=\"\" class=\"wp-image-513\" srcset=\"https:\/\/www.jacobsoft.com.mx\/wp-content\/uploads\/2018\/10\/matriz_confusion_logistica.png 416w, https:\/\/www.jacobsoft.com.mx\/wp-content\/uploads\/2018\/10\/matriz_confusion_logistica-300x244.png 300w\" sizes=\"auto, (max-width: 416px) 100vw, 416px\" \/><\/figure><\/div>\n\n\n\n<p>We observe that, of the 100 records in the test set, 8 records that should be 1 were classified as 0 (false negatives) and 3 records that should be 0 were classified as 1 (false positives). The results are acceptable, although more training data would likely improve the classification.<\/p>\n\n\n\n<p>The customers in the test set are classified to determine whether each customer buys or does not buy. 
This set contains records the model has not seen: they do not belong to the training set and were separated randomly with the function <em><strong>train_test_split<\/strong><\/em>.<\/p>\n\n\n\n<p>For a clearer view of the results we can plot the data of both the test set and the training set. In our case we will only do it for the test set, as follows:<\/p>\n\n\n\n\n<pre><span class=\"coments\"># Visualization of test results<\/span>\n<span class=\"keyword\">from<\/span> matplotlib.colors <span class=\"keyword\">import<\/span> ListedColormap\nX_set, y_set = X_test, y_test\n<span class=\"coments\"># Grid covering the range of both scaled features<\/span>\nX1, X2 = np.meshgrid(np.arange(start=X_set[:, <span class=\"keyword\">0<\/span>].min() - <span class=\"keyword\">1<\/span>, stop=X_set[:, <span class=\"keyword\">0<\/span>].max() + <span class=\"keyword\">1<\/span>, step=<span class=\"keyword\">0.01<\/span>),\n                     np.arange(start=X_set[:, <span class=\"keyword\">1<\/span>].min() - <span class=\"keyword\">1<\/span>, stop=X_set[:, <span class=\"keyword\">1<\/span>].max() + <span class=\"keyword\">1<\/span>, step=<span class=\"keyword\">0.01<\/span>))\n<span class=\"coments\"># Color each grid point by its predicted class<\/span>\nplt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),\n             alpha=<span class=\"keyword\">0.75<\/span>, cmap=ListedColormap((&#039;red&#039;, &#039;green&#039;)))\nplt.xlim(X1.min(), X1.max())\nplt.ylim(X2.min(), X2.max())\n<span class=\"coments\"># Plot the test records colored by their true class<\/span>\n<span class=\"keyword\">for<\/span> i, j <span class=\"keyword\">in<\/span> enumerate(np.unique(y_set)):\n    plt.scatter(X_set[y_set == j, <span class=\"keyword\">0<\/span>], X_set[y_set == j, <span class=\"keyword\">1<\/span>],\n                color=ListedColormap((&#039;<span class=\"texto\">red<\/span>&#039;, &#039;<span class=\"texto\">green<\/span>&#039;))(i), label=j)\nplt.title(&#039;<span class=\"texto\">Logistic Regression (Test Set)<\/span>&#039;)\nplt.xlabel(&#039;<span class=\"texto\">Age<\/span>&#039;)\nplt.ylabel(&#039;<span class=\"texto\">Estimated salary<\/span>&#039;)\nplt.legend()\nplt.show()\n<\/pre>\n\n\n\n<p>The graph that we obtain is the following:<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter\"><img loading=\"lazy\" decoding=\"async\" width=\"656\" height=\"584\" src=\"https:\/\/www.jacobsoft.com.mx\/wp-content\/uploads\/2018\/10\/grafica_regresion_logistica.png\" alt=\"\" class=\"wp-image-516\" srcset=\"https:\/\/www.jacobsoft.com.mx\/wp-content\/uploads\/2018\/10\/grafica_regresion_logistica.png 656w, https:\/\/www.jacobsoft.com.mx\/wp-content\/uploads\/2018\/10\/grafica_regresion_logistica-300x267.png 300w\" sizes=\"auto, (max-width: 656px) 100vw, 656px\" \/><\/figure><\/div>\n\n\n\n<p><\/p>\n\n\n\n<p>In the graph we can see the two regions: red for customers who do not buy (y = 0) and green for customers who buy (y = 1). We can also see the 8 green points in the red zone and the 3 red points in the green zone.<\/p>\n\n\n\n<p>Since 11 of the 100 records in the test set were misclassified, we can conclude that, in this case, the accuracy of the model is 89%.<\/p>\n\n\n\n\n<pre><span class=\"coments\"># The accuracy of the model is obtained with the 
score<\/span>\nscore_test = classifier.score(X_test, y_test)\n<\/pre>\n\n\n\n<p>The value of the variable score_test is 0.89, which means 89% accuracy in the classification.<\/p>\n<style class=\"advgb-styles-renderer\">\n\t.coments{color:gray;}\n\t.keyword{color:blue;}\n\t.texto{color:green;}\n<\/style>","protected":false},"excerpt":{"rendered":"<p>Logistic regression Logistic regression is a classification method which, unlike &hellip; <\/p>","protected":false},"author":2,"featured_media":481,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"advgb_blocks_editor_width":"","advgb_blocks_columns_visual_guide":"","_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"_uf_show_specific_survey":0,"_uf_disable_surveys":false,"footnotes":""},"categories":[25,35,46],"tags":[66,57,55,56,82,50,51,84],"class_list":["post-479","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-algoritmos","category-inteligencia-artificial","category-machine-learning","tag-analisis-de-datos","tag-ciencia-de-datos","tag-clasificacion","tag-data-science","tag-inteligencia-artificial","tag-machine-learning","tag-regresion","tag-regresion-logistica"],"aioseo_notices":[],"author_meta":{"display_name":"Jacob 
Avila Camacho","author_link":"https:\/\/www.jacobsoft.com.mx\/en\/author\/jacob-avila\/"},"featured_img":"https:\/\/www.jacobsoft.com.mx\/wp-content\/uploads\/2018\/10\/destacada_regresion_logistica-300x165.png","featured_image_src":"https:\/\/www.jacobsoft.com.mx\/wp-content\/uploads\/2018\/10\/destacada_regresion_logistica.png","featured_image_src_square":"https:\/\/www.jacobsoft.com.mx\/wp-content\/uploads\/2018\/10\/destacada_regresion_logistica.png","author_info":{"display_name":"Jacob Avila Camacho","author_link":"https:\/\/www.jacobsoft.com.mx\/en\/author\/jacob-avila\/"},"coauthors":[],"tax_additional":{"categories":{"linked":["<a href=\"https:\/\/www.jacobsoft.com.mx\/en\/category\/algoritmos\/\" class=\"advgb-post-tax-term\">Algoritmos<\/a>","<a href=\"https:\/\/www.jacobsoft.com.mx\/en\/category\/inteligencia-artificial\/\" class=\"advgb-post-tax-term\">Inteligencia Artificial<\/a>","<a href=\"https:\/\/www.jacobsoft.com.mx\/en\/category\/inteligencia-artificial\/machine-learning\/\" class=\"advgb-post-tax-term\">Machine Learning<\/a>"],"unlinked":["<span class=\"advgb-post-tax-term\">Algoritmos<\/span>","<span class=\"advgb-post-tax-term\">Inteligencia Artificial<\/span>","<span class=\"advgb-post-tax-term\">Machine Learning<\/span>"]},"tags":{"linked":["<a href=\"https:\/\/www.jacobsoft.com.mx\/en\/category\/inteligencia-artificial\/machine-learning\/\" class=\"advgb-post-tax-term\">an\u00e1lisis de datos<\/a>","<a href=\"https:\/\/www.jacobsoft.com.mx\/en\/category\/inteligencia-artificial\/machine-learning\/\" class=\"advgb-post-tax-term\">Ciencia de Datos<\/a>","<a href=\"https:\/\/www.jacobsoft.com.mx\/en\/category\/inteligencia-artificial\/machine-learning\/\" class=\"advgb-post-tax-term\">clasificaci\u00f3n<\/a>","<a href=\"https:\/\/www.jacobsoft.com.mx\/en\/category\/inteligencia-artificial\/machine-learning\/\" class=\"advgb-post-tax-term\">Data Science<\/a>","<a 
href=\"https:\/\/www.jacobsoft.com.mx\/en\/category\/inteligencia-artificial\/machine-learning\/\" class=\"advgb-post-tax-term\">Inteligencia Artificial<\/a>","<a href=\"https:\/\/www.jacobsoft.com.mx\/en\/category\/inteligencia-artificial\/machine-learning\/\" class=\"advgb-post-tax-term\">machine learning<\/a>","<a href=\"https:\/\/www.jacobsoft.com.mx\/en\/category\/inteligencia-artificial\/machine-learning\/\" class=\"advgb-post-tax-term\">regresi\u00f3n<\/a>","<a href=\"https:\/\/www.jacobsoft.com.mx\/en\/category\/inteligencia-artificial\/machine-learning\/\" class=\"advgb-post-tax-term\">regresi\u00f3n log\u00edstica<\/a>"],"unlinked":["<span class=\"advgb-post-tax-term\">an\u00e1lisis de datos<\/span>","<span class=\"advgb-post-tax-term\">Ciencia de Datos<\/span>","<span class=\"advgb-post-tax-term\">clasificaci\u00f3n<\/span>","<span class=\"advgb-post-tax-term\">Data Science<\/span>","<span class=\"advgb-post-tax-term\">Inteligencia Artificial<\/span>","<span class=\"advgb-post-tax-term\">machine learning<\/span>","<span class=\"advgb-post-tax-term\">regresi\u00f3n<\/span>","<span class=\"advgb-post-tax-term\">regresi\u00f3n log\u00edstica<\/span>"]}},"comment_count":"4","relative_dates":{"created":"Posted 8 years ago","modified":"Updated 1 year ago"},"absolute_dates":{"created":"Posted on October 13, 2018","modified":"Updated on February 20, 2025"},"absolute_dates_time":{"created":"Posted on October 13, 2018 10:08 pm","modified":"Updated on February 20, 2025 1:37 
pm"},"featured_img_caption":"","series_order":"","_links":{"self":[{"href":"https:\/\/www.jacobsoft.com.mx\/en\/wp-json\/wp\/v2\/posts\/479","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.jacobsoft.com.mx\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.jacobsoft.com.mx\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.jacobsoft.com.mx\/en\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.jacobsoft.com.mx\/en\/wp-json\/wp\/v2\/comments?post=479"}],"version-history":[{"count":9,"href":"https:\/\/www.jacobsoft.com.mx\/en\/wp-json\/wp\/v2\/posts\/479\/revisions"}],"predecessor-version":[{"id":1447,"href":"https:\/\/www.jacobsoft.com.mx\/en\/wp-json\/wp\/v2\/posts\/479\/revisions\/1447"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.jacobsoft.com.mx\/en\/wp-json\/wp\/v2\/media\/481"}],"wp:attachment":[{"href":"https:\/\/www.jacobsoft.com.mx\/en\/wp-json\/wp\/v2\/media?parent=479"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.jacobsoft.com.mx\/en\/wp-json\/wp\/v2\/categories?post=479"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.jacobsoft.com.mx\/en\/wp-json\/wp\/v2\/tags?post=479"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}