{"id":1817,"date":"2020-04-10T16:22:44","date_gmt":"2020-04-10T13:22:44","guid":{"rendered":"http:\/\/beeeye.com\/?p=1817"},"modified":"2021-04-10T23:59:47","modified_gmt":"2021-04-10T23:59:47","slug":"weight-of-evidence-woe-implementation","status":"publish","type":"post","link":"http:\/\/beeeye.com\/weight-of-evidence-woe-implementation\/","title":{"rendered":"Guest blog: Weight of Evidence, Information Value, and Population Stability Index: Background and implementation notes"},"content":{"rendered":"
<\/p>\n

We are thrilled to have a guest post on our blog by Dr. Hershel Safer. Dr. Safer is an expert in applying advanced mathematics, statistics, and machine learning techniques to build the most robust credit risk models possible. Through many years of experience in researching and developing models, Dr. Safer has developed a set of guidelines that prove essential when developing new credit risk models. In this guest blog post, we'd like to share his cookbook for some of the most basic functions used in modelling: Weight of Evidence (WOE), Information Value (IV), and the Population Stability Index (PSI).

Introduction

In credit risk modelling, as in other fields where a predictive model is built from raw historical data, preparing the data for training is the most crucial stage in creating a strong model. The statistical nature of many raw features is often poorly aligned with the requirements of various training algorithms, which can result in inferior models. Preparing the data properly will therefore yield stronger results.

In credit risk, as in other areas of financial and behavioral modelling, certain scenarios occur repeatedly, and applying the appropriate data transformations can yield vastly improved results. This post explores several such functions along with the relevant mathematical and statistical details. It can provide a solid foundation for understanding, implementing, and using these functions in your modelling projects.

In this post, the term "characteristic" means "variable" or "feature." An "attribute" is a specific value taken by a characteristic. This terminology is not universal in machine learning, but it is common in the credit risk literature.

Weight of Evidence (WOE)

Weight of Evidence (WOE) is used to assess the predictive value of individual attribute values of a characteristic.

Suppose that the sample has $n$ negative instances and $p$ positive instances, with $n_j$ and $p_j$ being the numbers of negative and positive instances with attribute $j$. A common way to represent the data for a characteristic with $k$ attributes is a $k \times 2$ table: each row corresponds to an attribute, each column to a value of $Y$ (0 or 1), and each cell contains the number of observations with the corresponding attribute and target values.
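As a concrete illustration, here is a minimal sketch of building such a table with pandas. The data and the column names `attr` and `y` are hypothetical, invented for this example:

```python
import pandas as pd

# Hypothetical toy sample: "attr" holds the attribute of one
# characteristic, "y" holds the binary target (0 = negative, 1 = positive).
df = pd.DataFrame({
    "attr": ["A", "A", "A", "B", "B", "B", "B", "C", "C", "C"],
    "y":    [0,   1,   0,   0,   0,   1,   1,   1,   0,   1],
})

# The k x 2 table: one row per attribute, columns 0 and 1 holding the
# counts of negative and positive instances for that attribute.
counts = pd.crosstab(df["attr"], df["y"])
print(counts)
```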

The WOE for attribute $j$ is $w_j = \ln\left(\frac{n_j/n}{p_j/p}\right)$. This can be rewritten as $w_j = \ln\left(\frac{n_j}{p_j}\right) - \ln\left(\frac{n}{p}\right)$, which highlights WOE as the difference between the log odds of the attribute and the log odds of the population. Attributes whose log odds are close to the population's have WOE near zero and thus carry little evidence.
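The formula translates directly into code. A minimal sketch, continuing the hypothetical `counts` table from the snippet above; it assumes every attribute has at least one negative and one positive instance, since a zero cell would make the logarithm undefined (in practice such cells are usually smoothed):

```python
import numpy as np

n = counts[0].sum()   # total negative instances in the sample
p = counts[1].sum()   # total positive instances in the sample

# w_j = ln(n_j / p_j) - ln(n / p), computed for all attributes at once;
# the result is a Series indexed by attribute.
woe = np.log(counts[0] / counts[1]) - np.log(n / p)
print(woe)  # for the toy data: A ~ 0.693, B = 0.0, C ~ -0.693
```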

The WOE transformation replaces each attribute with this risk value. When $w_j > 0$, the proportion of instances with attribute $j$ for which $Y = 0$ is above the sample average, and vice versa for $w_j < 0$. WOE also puts all characteristics on a common log-odds scale, so the coefficients of a logistic regression fitted on WOE-transformed characteristics can be compared directly.
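Applying the transformation is then just a lookup of each attribute's WOE value. A sketch reusing the hypothetical names from the snippets above:

```python
# Replace each attribute with its WOE value; attributes unseen when the
# WOE table was built would map to NaN here and need an explicit
# fallback in a real pipeline.
df["attr_woe"] = df["attr"].map(woe)
print(df)
```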

Implementation notes