{"id":1114,"date":"2019-01-17T13:28:24","date_gmt":"2019-01-17T19:28:24","guid":{"rendered":"http:\/\/www.jacobsoft.com.mx\/?p=1114"},"modified":"2025-02-20T13:37:49","modified_gmt":"2025-02-20T19:37:49","slug":"aprendizaje-con-reglas-de-asociacion","status":"publish","type":"post","link":"https:\/\/www.jacobsoft.com.mx\/en\/aprendizaje-con-reglas-de-asociacion\/","title":{"rendered":"Learning with Association Rules"},"content":{"rendered":"<h2 class=\"wp-block-heading\">Learning with Association Rules using Python<\/h2>\n\n\n\n<p><strong>Learning<\/strong> with <em><strong>association rules<\/strong><\/em> is applied mainly in recommendation systems, as when we are shown that people who bought one product also bought another, or that viewers of one movie also recommend certain others.<\/p>\n\n\n\n<p>The <strong>Apriori<\/strong> algorithm is one of the most widely used for this task. It efficiently finds sets of <strong><em>frequent items<\/em><\/strong>, which are the basis for generating <strong>association rules<\/strong> between the items.<\/p>\n\n\n\n<p>It first identifies the individual <strong>items<\/strong> that are frequent in the data set and then extends them to larger sets, as long as those sets still appear at least as often as an established <strong>threshold<\/strong>.<\/p>\n\n\n\n<p>The algorithm is applied mainly in the analysis of commercial transactions and prediction problems. 
That is why the algorithm is designed to work with databases of transactions, such as the products purchased by consumers or the details of visits to a website.<\/p>\n\n\n\n<p>Generating <strong><em>association rules<\/em><\/strong> consists of two steps:<\/p>\n\n\n\n<div class=\"wp-block-advgb-list\"><ul class=\"advgblist-4b293ceb-87d7-415b-9f63-6b796b9fdd60\"><li><strong>Generation of frequent combinations<\/strong>: the objective is to find the sets of items that are frequent in the database. A threshold is established to decide what counts as frequent.<\/li><li><strong>Generation of rules<\/strong>: the rules are created from the frequent sets and ordered by an index calculated for the groups of frequent items or products.<\/li><\/ul><\/div>\n\n\n\n<p>The index for the generation of combinations is called <strong><em>support<\/em><\/strong> and the index for generating rules is called <strong><em>confidence<\/em><\/strong>.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Algorithm<\/h3>\n\n\n\n<div class=\"wp-block-advgb-list\"><ul class=\"advgblist-a9d774d4-5515-4ca0-8dce-e2c19a0ebbe4\"><li><strong>Step 1<\/strong>. The minimum values for support and confidence are established.<\/li><li><strong>Step 2<\/strong>. All subsets of items in the transactions whose support is greater than the minimum support value are taken.<\/li><li><strong>Step 3<\/strong>. 
Take all the rules of these subsets that have a confidence greater than the minimum confidence value.<\/li><li><strong>Step 4.<\/strong> Sort the rules in decreasing order of lift.<\/li><\/ul><\/div>\n\n\n\n<p>If you would like to see this topic on video, watch it here and subscribe to the channel on YouTube.<\/p>\n\n\n\n<figure class=\"wp-block-embed is-type-video is-provider-youtube wp-block-embed-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio\"><div class=\"wp-block-embed__wrapper\">\n<iframe loading=\"lazy\" title=\"Aprendizaje con Reglas de Asociaci\u00f3n | Algoritmo Apriori\" width=\"780\" height=\"439\" src=\"https:\/\/www.youtube.com\/embed\/YRhu6yEseh8?feature=oembed\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share\" allowfullscreen><\/iframe>\n<\/div><figcaption>Go to YouTube and subscribe to the channel<\/figcaption><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\">Example<\/h3>\n\n\n\n<p>Suppose we have a set of 5 transactions, each containing different products, as in the following table:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><tbody><tr><td>1<\/td><td>Bread, milk, diapers<\/td><\/tr><tr><td>2<\/td><td>Bread, diapers, beer, egg<\/td><\/tr><tr><td>3<\/td><td>Milk, diapers, beer, soda, coffee<\/td><\/tr><tr><td>4<\/td><td>Bread, milk, diapers, beer<\/td><\/tr><tr><td>5<\/td><td>Bread, soda, milk, diapers<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>The first step is to generate the frequent combinations. If we want more than 50% support, we begin by counting the frequency of each article, that is, the number of transactions in which it appears.<\/p>\n\n\n\n<figure 
class=\"wp-block-table\"><table><tbody><tr><td><strong>Article<\/strong><\/td><td><strong>Transactions<\/strong><\/td><\/tr><tr><td>Beer<\/td><td>3<\/td><\/tr><tr><td>Bread<\/td><td>4<\/td><\/tr><tr><td>Soda<\/td><td>2<\/td><\/tr><tr><td>Diapers<\/td><td>5<\/td><\/tr><tr><td>Milk<\/td><td>4<\/td><\/tr><tr><td>Egg<\/td><td>1<\/td><\/tr><tr><td>Coffee<\/td><td>1<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>To calculate the support of each article, we divide the number of transactions that contain it by the total number of transactions. Beer, for example, appears in 3 of the 5 transactions, so its support is 3\/5 = 0.6, or 60%. For the rest of the articles we have the following:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><tbody><tr><td><strong>Article<\/strong><\/td><td><strong>Support<\/strong><\/td><\/tr><tr><td>Beer<\/td><td>60%<\/td><\/tr><tr><td>Bread<\/td><td>80%<\/td><\/tr><tr><td>Soda<\/td><td>40%<\/td><\/tr><tr><td>Diapers<\/td><td>100%<\/td><\/tr><tr><td>Milk<\/td><td>80%<\/td><\/tr><tr><td>Egg<\/td><td>20%<\/td><\/tr><tr><td>Coffee<\/td><td>20%<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>Since more than 50% support is required, we eliminate all items below this threshold: soda, egg and coffee.<\/p>\n\n\n\n<p>The next step is to generate combinations of the remaining products: first combinations of two, whose support we calculate, then combinations of three, and so on.<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><tbody><tr><td><strong>Sets<\/strong><\/td><td><strong>Frequency<\/strong><\/td><td><strong>Support<\/strong><\/td><\/tr><tr><td>Beer, Bread<\/td><td>2<\/td><td>40%<\/td><\/tr><tr><td>Beer, Diapers<\/td><td>3<\/td><td>60%<\/td><\/tr><tr><td>Beer, Milk<\/td><td>2<\/td><td>40%<\/td><\/tr><tr><td>Bread, Diapers<\/td><td>4<\/td><td>80%<\/td><\/tr><tr><td>Bread, Milk<\/td><td>3<\/td><td>60%<\/td><\/tr><tr><td>Diapers, 
Milk<\/td><td>4<\/td><td>80%<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>We eliminate the sets below 50% and are left with the first frequent sets, whose support is higher than 50%:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><tbody><tr><td>Beer, Diapers<\/td><\/tr><tr><td>Bread, Diapers<\/td><\/tr><tr><td>Bread, Milk<\/td><\/tr><tr><td>Diapers, Milk<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>From the generated sets, we create sets of three articles and calculate their support:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><tbody><tr><td><strong>Sets<\/strong><\/td><td><strong>Frequency<\/strong><\/td><td><strong>Support<\/strong><\/td><\/tr><tr><td>Beer, Diapers, Bread<\/td><td>2<\/td><td>40%<\/td><\/tr><tr><td>Beer, Diapers, Milk<\/td><td>2<\/td><td>40%<\/td><\/tr><tr><td>Bread, Diapers, Milk<\/td><td>3<\/td><td>60%<\/td><\/tr><tr><td>Bread, Milk, Beer<\/td><td>1<\/td><td>20%<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>Among these combinations of three, only the set Bread, Diapers, Milk exceeds the threshold. We would use it to build combinations of 4 items, but in this case the resulting combination (Bread, Diapers, Milk, Beer) has only 20% support, so the algorithm ends here.<\/p>\n\n\n\n<p>The result is one set of 3 articles and four sets of 2 articles:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><tbody><tr><td>Bread, Diapers, Milk<\/td><\/tr><tr><td>Beer, Diapers<\/td><\/tr><tr><td>Bread, Diapers<\/td><\/tr><tr><td>Bread, Milk<\/td><\/tr><tr><td>Diapers, Milk<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>From these 5 sets we obtain the association rules, for which we again require an index higher than 50%. 
This index is the confidence, and we calculate it by dividing the number of transactions that contain the whole set by the number of transactions that contain the left-hand side (the antecedent) of the rule:<\/p>\n\n\n\n<p>Taking the first set of Bread, Diapers, Milk, the possible rules are:<\/p>\n\n\n\n<div class=\"wp-block-advgb-list\"><ul class=\"advgblist-bc6bbd44-0347-441c-8fcb-b9c6299974cd\"><li>Bread =&gt; Diapers, Milk<\/li><li>Diapers =&gt; Bread, Milk<\/li><li>Milk =&gt; Bread, Diapers<\/li><li>Bread, Diapers =&gt; Milk<\/li><li>Bread, Milk =&gt; Diapers<\/li><li>Milk, Diapers =&gt; Bread<\/li><\/ul><\/div>\n\n\n\n<p>If we take the first rule, Bread =&gt; Diapers, Milk, we observe in the original transactions that <strong>Bread, Diapers, Milk<\/strong> appears in 3 transactions and the antecedent Bread appears in 4 transactions, so the confidence is 3\/4 = 0.75, which is 75%.<\/p>\n\n\n\n<p>For the rule Bread, Diapers =&gt; Milk, the combination Bread, Diapers, Milk appears in 3 transactions and the antecedent Bread, Diapers appears in 4 transactions, so its confidence is also 75%, that is, 3\/4 = 0.75.<\/p>\n\n\n\n<p>Once we calculate the confidence of all the rules, we order them from highest to lowest based on that calculated confidence and we obtain the association rules for the whole set, which is how the <strong>Apriori<\/strong> algorithm works.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Apriori with Python<\/h2>\n\n\n\n<p>For the example with <strong>Python<\/strong> we will use a business transaction data set called <strong><em>Market_Basket_Optimisation.csv<\/em><\/strong>, with 7,501 records or transactions, each of which contains one or more supermarket products:<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter\"><img loading=\"lazy\" decoding=\"async\" width=\"887\" height=\"532\" src=\"https:\/\/www.jacobsoft.com.mx\/wp-content\/uploads\/2019\/01\/apriori_dataset.png\" alt=\"\" class=\"wp-image-1131\" 
srcset=\"https:\/\/www.jacobsoft.com.mx\/wp-content\/uploads\/2019\/01\/apriori_dataset.png 887w, https:\/\/www.jacobsoft.com.mx\/wp-content\/uploads\/2019\/01\/apriori_dataset-300x180.png 300w, https:\/\/www.jacobsoft.com.mx\/wp-content\/uploads\/2019\/01\/apriori_dataset-768x461.png 768w, https:\/\/www.jacobsoft.com.mx\/wp-content\/uploads\/2019\/01\/apriori_dataset-100x60.png 100w\" sizes=\"auto, (max-width: 887px) 100vw, 887px\" \/><figcaption>Business transaction data set<\/figcaption><\/figure><\/div>\n\n\n\n<div class=\"wp-block-file\"><a href=\"https:\/\/www.jacobsoft.com.mx\/wp-content\/uploads\/2019\/01\/Market_Basket_Optimisation.csv\">Market_Basket_Optimisation<\/a><a href=\"https:\/\/www.jacobsoft.com.mx\/wp-content\/uploads\/2019\/01\/Market_Basket_Optimisation.csv\" class=\"wp-block-file__button\" download>Download<\/a><\/div>\n\n\n\n<iframe loading=\"lazy\" src=\"https:\/\/www.jacobsoft.com.mx\/wp-content\/uploads\/2019\/01\/apriori.html\" width=\"99%\" height=\"900\" frameborder=\"0\" scrolling=\"auto\"><\/iframe>\n\n\n\n<p>The output lists the resulting rules, in which sets of 2, 3 or more items imply another group of products, together with the support, the confidence and the lift of each rule.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Apriori Class<\/h3>\n\n\n\n<p>The Apriori class used in the previous implementation is the following:<\/p>\n\n\n\n<iframe loading=\"lazy\" src=\"https:\/\/www.jacobsoft.com.mx\/wp-content\/uploads\/2019\/01\/apriori_class.html\" width=\"99%\" height=\"900\" frameborder=\"0\" scrolling=\"auto\"><\/iframe>\n\n\n\n<p>Both files must be in the same folder in order to use the class in the script that creates the association rules.<\/p>\n\n\n\n<hr class=\"wp-block-separator\"\/>\n\n\n\n<div style=\"height:30px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<div id=\"amzn-assoc-ad-eeed538b-d6e2-4233-ac92-c9505f45495c\"><\/div><script async=\"\" 
src=\"\/\/z-na.amazon-adsystem.com\/widgets\/onejs?MarketPlace=US&amp;adInstanceId=eeed538b-d6e2-4233-ac92-c9505f45495c\"><\/script>\n\n\n\n<p><\/p>\n<style class=\"advgb-styles-renderer\">.advgblist-4b293ceb-87d7-415b-9f63-6b796b9fdd60 li { font-size: 16px; margin-left: 20px }.wp-block-advgb-list ul.advgblist-4b293ceb-87d7-415b-9f63-6b796b9fdd60 > li{font-size:16px;}.advgblist-a9d774d4-5515-4ca0-8dce-e2c19a0ebbe4 li { font-size: 16px; margin-left: 20px }.wp-block-advgb-list ul.advgblist-a9d774d4-5515-4ca0-8dce-e2c19a0ebbe4 > li{font-size:16px;}.advgblist-bc6bbd44-0347-441c-8fcb-b9c6299974cd li { font-size: 16px; margin-left: 20px }.wp-block-advgb-list ul.advgblist-bc6bbd44-0347-441c-8fcb-b9c6299974cd > li{font-size:16px;}<\/style>","protected":false},"excerpt":{"rendered":"<p>Aprendizaje con Reglas de Asociaci\u00f3n usando Python El aprendizaje con reglas de asociaci\u00f3n lo vemos &hellip; <\/p>","protected":false},"author":2,"featured_media":1115,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"advgb_blocks_editor_width":"","advgb_blocks_columns_visual_guide":"","_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"_uf_show_specific_survey":0,"_uf_disable_surveys":false,"footnotes":""},"categories":[25,35,46],"tags":[101,57,55,58,56,82,50,59,61],"class_list":["post-1114","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-algoritmos","category-inteligencia-artificial","category-machine-learning","tag-aprendizaje","tag-ciencia-de-datos","tag-clasificacion","tag-data-mining","tag-data-science","tag-inteligencia-artificial","tag-machine-learning","tag-mineria-de-datos","tag-python"],"aioseo_notices":[],"author_meta":{"display_name":"Jacob Avila 
Camacho","author_link":"https:\/\/www.jacobsoft.com.mx\/en\/author\/jacob-avila\/"},"featured_img":"https:\/\/www.jacobsoft.com.mx\/wp-content\/uploads\/2019\/01\/destacada_apriori-300x165.png","featured_image_src":"https:\/\/www.jacobsoft.com.mx\/wp-content\/uploads\/2019\/01\/destacada_apriori.png","featured_image_src_square":"https:\/\/www.jacobsoft.com.mx\/wp-content\/uploads\/2019\/01\/destacada_apriori.png","author_info":{"display_name":"Jacob Avila Camacho","author_link":"https:\/\/www.jacobsoft.com.mx\/en\/author\/jacob-avila\/"},"coauthors":[],"tax_additional":{"categories":{"linked":["<a href=\"https:\/\/www.jacobsoft.com.mx\/en\/category\/algoritmos\/\" class=\"advgb-post-tax-term\">Algoritmos<\/a>","<a href=\"https:\/\/www.jacobsoft.com.mx\/en\/category\/inteligencia-artificial\/\" class=\"advgb-post-tax-term\">Inteligencia Artificial<\/a>","<a href=\"https:\/\/www.jacobsoft.com.mx\/en\/category\/inteligencia-artificial\/machine-learning\/\" class=\"advgb-post-tax-term\">Machine Learning<\/a>"],"unlinked":["<span class=\"advgb-post-tax-term\">Algoritmos<\/span>","<span class=\"advgb-post-tax-term\">Inteligencia Artificial<\/span>","<span class=\"advgb-post-tax-term\">Machine Learning<\/span>"]},"tags":{"linked":["<a href=\"https:\/\/www.jacobsoft.com.mx\/en\/category\/inteligencia-artificial\/machine-learning\/\" class=\"advgb-post-tax-term\">aprendizaje<\/a>","<a href=\"https:\/\/www.jacobsoft.com.mx\/en\/category\/inteligencia-artificial\/machine-learning\/\" class=\"advgb-post-tax-term\">Ciencia de Datos<\/a>","<a href=\"https:\/\/www.jacobsoft.com.mx\/en\/category\/inteligencia-artificial\/machine-learning\/\" class=\"advgb-post-tax-term\">clasificaci\u00f3n<\/a>","<a href=\"https:\/\/www.jacobsoft.com.mx\/en\/category\/inteligencia-artificial\/machine-learning\/\" class=\"advgb-post-tax-term\">Data Mining<\/a>","<a href=\"https:\/\/www.jacobsoft.com.mx\/en\/category\/inteligencia-artificial\/machine-learning\/\" class=\"advgb-post-tax-term\">Data 
Science<\/a>","<a href=\"https:\/\/www.jacobsoft.com.mx\/en\/category\/inteligencia-artificial\/machine-learning\/\" class=\"advgb-post-tax-term\">Inteligencia Artificial<\/a>","<a href=\"https:\/\/www.jacobsoft.com.mx\/en\/category\/inteligencia-artificial\/machine-learning\/\" class=\"advgb-post-tax-term\">machine learning<\/a>","<a href=\"https:\/\/www.jacobsoft.com.mx\/en\/category\/inteligencia-artificial\/machine-learning\/\" class=\"advgb-post-tax-term\">Miner\u00eda de Datos<\/a>","<a href=\"https:\/\/www.jacobsoft.com.mx\/en\/category\/inteligencia-artificial\/machine-learning\/\" class=\"advgb-post-tax-term\">Python<\/a>"],"unlinked":["<span class=\"advgb-post-tax-term\">aprendizaje<\/span>","<span class=\"advgb-post-tax-term\">Ciencia de Datos<\/span>","<span class=\"advgb-post-tax-term\">clasificaci\u00f3n<\/span>","<span class=\"advgb-post-tax-term\">Data Mining<\/span>","<span class=\"advgb-post-tax-term\">Data Science<\/span>","<span class=\"advgb-post-tax-term\">Inteligencia Artificial<\/span>","<span class=\"advgb-post-tax-term\">machine learning<\/span>","<span class=\"advgb-post-tax-term\">Miner\u00eda de Datos<\/span>","<span class=\"advgb-post-tax-term\">Python<\/span>"]}},"comment_count":"4","relative_dates":{"created":"Posted 7 years ago","modified":"Updated 1 year ago"},"absolute_dates":{"created":"Posted on January 17, 2019","modified":"Updated on February 20, 2025"},"absolute_dates_time":{"created":"Posted on January 17, 2019 1:28 pm","modified":"Updated on February 20, 2025 1:37 
pm"},"featured_img_caption":"","series_order":"","_links":{"self":[{"href":"https:\/\/www.jacobsoft.com.mx\/en\/wp-json\/wp\/v2\/posts\/1114","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.jacobsoft.com.mx\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.jacobsoft.com.mx\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.jacobsoft.com.mx\/en\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.jacobsoft.com.mx\/en\/wp-json\/wp\/v2\/comments?post=1114"}],"version-history":[{"count":27,"href":"https:\/\/www.jacobsoft.com.mx\/en\/wp-json\/wp\/v2\/posts\/1114\/revisions"}],"predecessor-version":[{"id":1769,"href":"https:\/\/www.jacobsoft.com.mx\/en\/wp-json\/wp\/v2\/posts\/1114\/revisions\/1769"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.jacobsoft.com.mx\/en\/wp-json\/wp\/v2\/media\/1115"}],"wp:attachment":[{"href":"https:\/\/www.jacobsoft.com.mx\/en\/wp-json\/wp\/v2\/media?parent=1114"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.jacobsoft.com.mx\/en\/wp-json\/wp\/v2\/categories?post=1114"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.jacobsoft.com.mx\/en\/wp-json\/wp\/v2\/tags?post=1114"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}