Return to site

SHAP Values and Feature Variance

Sergey Feldman

Intepretability is a Good Idea

My machine learning graduate program was technically excellent, but I had to learn how to (semi-)convincingly communicate with interdisciplinary collaborators the hard way: by failing a lot on the job. Before explainable/interpretable machine learning become a more popular research direction in 2016/2017, the end-product of my ML analyses often looked like this:

broken image

In other words, I thought demonstrating the success & importance of a ML-based analysis was the same as demonstrating methodological validity in an academic publication. This is wrong. My collaborators rarely cared about the results, and forgot them quickly. These days, I still show a table like the one above but I also show a SHAP values plot:

broken image

This image is taken directly from the SHAP Github repository. There are plenty of papers and other sources explaining SHAP values in detail, so I won't do that here. Briefly, each row is a feature/covariate input to a machine learning model, and each dot is a data point (sample). The x-axis is the SHAP value: how important a feature is for a particular sample in the model. Color is the original value of the feature. It takes some staring at this plot to fully internalize everything it's telling you, but showing it alongside standard ML results has been a great way to engage collaborators in other disciplines.

By the way, SHAP values are not the only way to interpret ML models, but they just happen to be one I like, and the SHAP library is simply excellent.

Reasons for Small SHAP Values

When looking at the SHAP value plots, what might be some reasons that certain variables/features are less important than others? If you had asked me this question a month ago, here is the list I would have given you:

  • The variable is measured in a noisy way.
  • The variable is not that causally related to the outcome of interest.
  • The variable is highly correlated with another variable that happens to be more predictive of the outcome.

Some of these are not that different from one another. For example, if a variable is noisy, then it will certainly look less related to the outcome of interest. The point isn't to have a theoretically iron-clad list of reasons, but to give the collaborators some idea of how ML models work.

Recently I chatted with a friend who does economics for a living, and he suggested another important reason that wasn't on my list: the variable has low variance. This wasn't immediately obvious to me, so I ran some simulations to gain intuition. My friend was right, and so I thought it would be a good idea to share the finding in case others have the same blind spot that I did.

Experimental Setup

Here's some Python code to generate the data:

The input is 4-dimensional, and we're only changing the variance of the 4th Bernoulli dimension. Bernoulli random variables are always 0 or 1, so all we are doing is changing the proportion of 1s and 0s. Tweaking the $p$ input doesn't at all change the ground-truth mapping between input $x$ and output $y$. Intuitively, I would have expected that SHAP values variable importance is not affected when changing $p$. Let's see what happens in practice with a simulation study:

broken image
broken image
broken image
broken image
broken image

You can see that when the variance is smallest ($p = 0.1$ and $p = 0.9$), the Bernoulli feature is at its lowest ranking. When the variance is largest ($p = 0.5$), the feature is at its highest ranking. Here is the how the sum of the absolute SHAP values looks when you plot it against $p$:

broken image

Exactly as described above: the overall importance is proportional to the variance. So we've learned that SHAP values are affected by the variance of the input feature.

Here all four variables are stable regardless of $p$. Range of SHAP values is also good to consider when thinking about a variable's overall interestingness. That is, a variable might not be predictive overall because it has low variance in our dataset, but it might very predictive for a small subset of the sample population.

Some Intuition

To gain some intuition about why this happens, let's think about a variable that is actually very causal for outcome $y$ but happens to be completely constant in our dataset. If $y$ is being able to survive and the Bernoulli feature $x_4$ is access to drinking water, then clearly $x_4$ is directly causal of $y$, but most human health datasets have $x_4 = 1$, and so the learned ML model would make no use of this variable; the SHAP values would all be zero. But imagine a dataset of 100k people where exactly one person has $x_4 = 1$ and the rest have $x_4 = 0$. Overall, there is little relationship in the data between $x_4$ and the outcome, and a ML model might not even notice this extremely rare sample as being predictive. The variance went from 0 to a little bit, and the SHAP values maybe went up a bit. The higher the variance rises, the more likely the model is to rely on this variable for making its decisions, and the larger the SHAP values will be in total.

, '
Return to site

SHAP Values and Feature Variance

Sergey Feldman

Intepretability is a Good Idea

My machine learning graduate program was technically excellent, but I had to learn how to (semi-)convincingly communicate with interdisciplinary collaborators the hard way: by failing a lot on the job. Before explainable/interpretable machine learning become a more popular research direction in 2016/2017, the end-product of my ML analyses often looked like this:

broken image

In other words, I thought demonstrating the success & importance of a ML-based analysis was the same as demonstrating methodological validity in an academic publication. This is wrong. My collaborators rarely cared about the results, and forgot them quickly. These days, I still show a table like the one above but I also show a SHAP values plot:

broken image

This image is taken directly from the SHAP Github repository. There are plenty of papers and other sources explaining SHAP values in detail, so I won't do that here. Briefly, each row is a feature/covariate input to a machine learning model, and each dot is a data point (sample). The x-axis is the SHAP value: how important a feature is for a particular sample in the model. Color is the original value of the feature. It takes some staring at this plot to fully internalize everything it's telling you, but showing it alongside standard ML results has been a great way to engage collaborators in other disciplines.

By the way, SHAP values are not the only way to interpret ML models, but they just happen to be one I like, and the SHAP library is simply excellent.

Reasons for Small SHAP Values

When looking at the SHAP value plots, what might be some reasons that certain variables/features are less important than others? If you had asked me this question a month ago, here is the list I would have given you:

  • The variable is measured in a noisy way.
  • The variable is not that causally related to the outcome of interest.
  • The variable is highly correlated with another variable that happens to be more predictive of the outcome.

Some of these are not that different from one another. For example, if a variable is noisy, then it will certainly look less related to the outcome of interest. The point isn't to have a theoretically iron-clad list of reasons, but to give the collaborators some idea of how ML models work.

Recently I chatted with a friend who does economics for a living, and he suggested another important reason that wasn't on my list: the variable has low variance. This wasn't immediately obvious to me, so I ran some simulations to gain intuition. My friend was right, and so I thought it would be a good idea to share the finding in case others have the same blind spot that I did.

Experimental Setup

Here's some Python code to generate the data:

The input is 4-dimensional, and we're only changing the variance of the 4th Bernoulli dimension. Bernoulli random variables are always 0 or 1, so all we are doing is changing the proportion of 1s and 0s. Tweaking the $p$ input doesn't at all change the ground-truth mapping between input $x$ and output $y$. Intuitively, I would have expected that SHAP values variable importance is not affected when changing $p$. Let's see what happens in practice with a simulation study:

broken image
broken image
broken image
broken image
broken image

You can see that when the variance is smallest ($p = 0.1$ and $p = 0.9$), the Bernoulli feature is at its lowest ranking. When the variance is largest ($p = 0.5$), the feature is at its highest ranking. Here is the how the sum of the absolute SHAP values looks when you plot it against $p$:

broken image

Exactly as described above: the overall importance is proportional to the variance. So we've learned that SHAP values are affected by the variance of the input feature.

Here all four variables are stable regardless of $p$. Range of SHAP values is also good to consider when thinking about a variable's overall interestingness. That is, a variable might not be predictive overall because it has low variance in our dataset, but it might very predictive for a small subset of the sample population.

Some Intuition

To gain some intuition about why this happens, let's think about a variable that is actually very causal for outcome $y$ but happens to be completely constant in our dataset. If $y$ is being able to survive and the Bernoulli feature $x_4$ is access to drinking water, then clearly $x_4$ is directly causal of $y$, but most human health datasets have $x_4 = 1$, and so the learned ML model would make no use of this variable; the SHAP values would all be zero. But imagine a dataset of 100k people where exactly one person has $x_4 = 1$ and the rest have $x_4 = 0$. Overall, there is little relationship in the data between $x_4$ and the outcome, and a ML model might not even notice this extremely rare sample as being predictive. The variance went from 0 to a little bit, and the SHAP values maybe went up a bit. The higher the variance rises, the more likely the model is to rely on this variable for making its decisions, and the larger the SHAP values will be in total.

],\n ['\\\\(', '\\\\)']\n ],\n processEscapes: true\n }\n });\n\n MathJax.Hub.Typeset()\n\n }])\n\u003c\/script\u003e","has_subscription_code_before":false,"has_subscription_code":false,"show_amp":true,"show_more_posts_with":"popup","used_disqus_comments_before":false,"show_rss":true,"enable_comments":true,"footer_custom_code":"","show_subscription_form":true,"hide_new_blog_tips":true},"isPro":true,"isV4":true,"forcedLocale":"en","userId":174108,"membership":"pro","theme":{"id":10,"css_file":"themes/fresh/main","color_list":"","created_at":"2012-08-15T19:55:05.697-07:00","updated_at":"2018-04-10T19:58:56.562-07:00","display_name":"Fresh","default_slide_list":"104","navbar_file":"fresh/navbar","footer_file":"fresh/footer","name":"fresh","thumb_image":"themes/fresh/icon.png","use_background_image":false,"demo_page_id":2002,"type_mask":1,"data_page_id":3016,"is_new":false,"priority":10,"header_file":"fresh/header","data":"{\"menu\":{\"type\":\"Menu\",\"components\":{\"logo\":{\"type\":\"Image\",\"image_type\":\"small\",\"url\":\"/images/defaults/default_logo.png\"},\"title\":{\"type\":\"RichText\",\"value\":\"Title Text\",\"text_type\":\"title\"},\"power_button\":{\"type\":\"Image\",\"image_type\":\"small\",\"url\":\"/images/themes/fresh/power.png\"}}}}","name_with_v4_fallback":"fresh"},"permalink":"data-cowboys","subscriptionPlan":"pro_yearly","subscriptionPeriod":"yearly","isOnTrial":false,"customColors":{"type":"CustomColors","id":"f_10731d12-f244-40ab-bda4-b83cb62bb89c","defaultValue":null,"active":true,"highlight1":null,"highlight2":null},"animations":{"type":"Animations","id":"f_4b94b139-5785-4649-8e36-2b255ae2318d","defaultValue":null,"page_scroll":"none","background":"parallax","image_link_hover":"none"},"s5Theme":{"type":"Theme","id":"f_1f1611a8-fa62-46cd-8b1c-42254169093d","version":"10","nav":{"type":"NavTheme","id":"f_f37195e2-ea4d-4ef0-b6f2-035854051a76","name":"topBar","layout":"a","padding":"medium","sidebarWidth":"small","topContentWidth":"full","horizontalContentAlignment":"left","verticalContentAlignment":"top","fontSize":"medium","backgroundColor1":"#dddddd","highlightColor":null,"presetColorName":"transparent","itemSpacing":"compact","dropShadow":"no","socialMediaListType":"link","isTransparent":true,"isSticky":true,"showSocialMedia":false,"highlight":{"type":"underline","textColor":null,"blockTextColor":null,"blockBackgroundColor":null,"blockShape":"pill","id":"f_11112f65-9d14-4512-91fa-a8a7f13b3365"},"border":{"enable":false,"borderColor":"#000","position":"bottom","thickness":"small"},"socialMedia":[],"socialMediaButtonList":[{"type":"Facebook","id":"d5d40f68-9e33-11ef-955f-15ccbf3d509c","url":"","link_url":"","share_text":"","show_button":false},{"type":"Twitter","id":"d5d40f69-9e33-11ef-955f-15ccbf3d509c","url":"","link_url":"","share_text":"","show_button":false},{"type":"LinkedIn","id":"d5d40f6a-9e33-11ef-955f-15ccbf3d509c","url":"","link_url":"","share_text":"","show_button":false},{"type":"Pinterest","id":"d5d40f6b-9e33-11ef-955f-15ccbf3d509c","url":"","link_url":"","share_text":"","show_button":false}],"socialMediaContactList":[{"type":"SocialMediaPhone","id":"d5d40f6e-9e33-11ef-955f-15ccbf3d509c","defaultValue":"","className":"fas fa-phone-alt"},{"type":"SocialMediaEmail","id":"d5d40f6f-9e33-11ef-955f-15ccbf3d509c","defaultValue":"","className":"fas fa-envelope"}]},"section":{"type":"SectionTheme","id":"f_dca510a1-da18-494e-80e8-68691a8754ca","padding":"normal","contentWidth":"full","contentAlignment":"center","baseFontSize":null,"titleFontSize":null,"subtitleFontSize":null,"itemTitleFontSize":null,"itemSubtitleFontSize":null,"textHighlightColor":null,"baseColor":null,"titleColor":null,"subtitleColor":null,"itemTitleColor":null,"itemSubtitleColor":null,"textHighlightSelection":{"type":"TextHighlightSelection","id":"f_09d5d3aa-2b97-44be-a1b9-c7a542c1dc39","title":false,"subtitle":true,"itemTitle":false,"itemSubtitle":true}},"firstSection":{"type":"FirstSectionTheme","id":"f_d7d2158c-1fb7-4f6a-b207-d4025ba4e365","height":"normal","shape":"none"},"button":{"type":"ButtonTheme","id":"f_d24e411e-decf-43e2-ab4a-4cbb15e28ec3","backgroundColor":"#000000","shape":"square","fill":"solid"}},"id":217121,"headingFont":"","titleFont":"bebas neue","bodyFont":"work sans","usedWebFontsNormalized":"Work+Sans:400,600,700|Varela+Round:regular","showAmp":true,"subscribersCount":13,"templateVariation":"default","showStrikinglyLogo":false,"multiPage":true,"sectionLayout":"one-smallCircle-long-none","siteName":"Data Cowboys: Machine Learning \u0026 AI Consulting","siteRollouts":{"custom_code":true,"pro_sections":true,"pro_apps":true,"multi_pages":false,"google_analytics":true,"strikingly_analytics":true,"manually_checked":false,"custom_form":false,"popup":null,"membership_feature":false,"custom_ads":true},"pageCustomDomain":"www.data-cowboys.com","pagePublicUrl":"https:\/\/www.data-cowboys.com\/","googleAnalyticsTracker":"G-5DWSBZVFVF","googleAnalyticsType":"ga4","facebookPixelId":"","gaTrackingId":"UA-25124444-6","errorceptionKey":"\"518ac810441fb4b7180002fa\"","keenioProjectId":"5317e03605cd66236a000002","keenioWriteKey":"efd460f8e282891930ff1957321c12b64a6db50694fd0b4a01d01f347920dfa3ce48e8ca249b5ea9917f98865696cfc39bc6814e4743c39af0a4720bb711627d9cf0fe63d5d52c3866c9c1c3178aaec6cbfc1a9ab62a3c9a827d2846a9be93ecf4ee3d61ebee8baaa6a1d735bff6e37b","wechatMpAccountId":null,"blogSubscriptionUrl":"\/show_iframe_component\/873380","chatSettings":null,"showNav":null,"hideNewBlogTips":true,"connectedSites":[]},"content":{"type":"Blog.BlogData","id":"f_8f0e068c-840d-4121-9177-943393ef744f","defaultValue":null,"showComments":true,"showShareButtons":null,"header":{"type":"Blog.Header","id":"f_44ffdd14-292c-42e9-8785-e8c8a20ef3c6","defaultValue":null,"title":{"type":"Blog.Text","id":"f_e4a722ff-28c3-43e5-9c98-7b69ad229041","defaultValue":false,"value":"\u003cp\u003eSHAP Values and Feature Variance\u003c\/p\u003e","backupValue":null,"version":1},"subTitle":{"type":"Blog.Text","id":"f_d1c4bebb-6599-49e0-8697-0d196e8e7bae","defaultValue":false,"value":"\u003cp\u003eSergey Feldman\u003c\/p\u003e","backupValue":null,"version":1},"backgroundImage":{"type":"Blog.BackgroundImage","id":"f_547de6ff-5501-47ed-b616-184a6028c2c4","defaultValue":false,"url":"!","textColor":"light","backgroundVariation":null,"sizing":"cover","userClassName":null,"linkUrl":null,"linkTarget":null,"videoUrl":null,"videoHtml":"","storageKey":"174108\/countour_kcct6m","storage":"s","format":"png","h":1418,"w":2048,"s":233485,"useImage":true,"noCompression":null,"focus":{},"linkInputEnabled":null,"descriptionInputEnabled":null}},"footer":{"type":"Blog.Footer","id":"f_21997036-01af-4d18-833d-58868b8134bb","defaultValue":null,"comment":{"type":"Blog.Comment","id":"f_3843dc89-1584-43e1-ab4d-709ff5aafff7","defaultValue":null,"shortName":""},"shareButtons":{"type":"Blog.ShareButtons","id":"f_d3f368e7-188c-498a-9280-7e3182f44c79","defaultValue":false,"list_type":"link","button_list":[{"type":"Facebook","id":"f_8613353d-e0be-443f-95b4-33ba0102ecb8","defaultValue":null,"url":"","link_url":null,"share_text":null,"app_id":null,"show_button":true},{"type":"Twitter","id":"f_5ef94e2d-e221-43e9-9845-7e6f52782ded","defaultValue":null,"url":"","link_url":null,"share_text":null,"show_button":true},{"type":"GPlus","id":"f_4234c4de-1e85-4ed7-8c8d-4b21898d6d29","defaultValue":null,"url":"","link_url":null,"share_text":null,"show_button":true},{"type":"LinkedIn","id":"f_2eac30b3-853f-4323-b817-15d0955f5d14","defaultValue":null,"url":"","link_url":null,"share_text":null,"show_button":false},{"type":"Pinterest","id":"f_4429931c-cacc-40fb-b4af-bbfd9cb4d98c","defaultValue":null,"url":"","link_url":null,"share_text":null,"show_button":false}]}},"sections":[{"type":"Blog.Section","id":"f_aa27b906-a966-4484-8561-348625436c19","defaultValue":null,"component":{"type":"Blog.Title","id":"f_348897cf-503a-42ae-8e53-329d1bd21eca","defaultValue":false,"value":"\u003cp\u003eIntepretability is a Good Idea\u003c\/p\u003e","backupValue":null,"version":1}},{"type":"Blog.Section","id":"f_195c8870-5d8e-47e8-8057-70e9b965e533","defaultValue":null,"component":{"type":"RichText","id":"f_19cfdb11-1437-4a95-9ee0-cbaa9577e57b","defaultValue":false,"value":"\u003cp\u003eMy machine learning graduate program was technically excellent, but I had to learn how to (semi-)convincingly communicate with interdisciplinary collaborators the hard way: by failing a lot on the job. Before explainable\/interpretable machine learning become a more popular research direction in 2016\/2017, the end-product of my ML analyses often looked like this:\u003c\/p\u003e","backupValue":null,"version":1}},{"type":"Blog.Section","id":"f_effc7ab9-8825-403d-9e3b-2b7dcf86d6ef","defaultValue":null,"component":{"type":"Image","id":"f_95776c25-4aa2-4e23-ab26-f273beafb323","defaultValue":null,"link_url":"","thumb_url":"!","url":"!","caption":"","description":"","storageKey":"174108\/969042_41783","storage":"s","storagePrefix":null,"format":"png","h":192,"w":626,"s":44635,"new_target":true,"noCompression":null,"cropMode":null,"focus":{}}},{"type":"Blog.Section","id":"f_2572d734-67fb-44b6-a879-b3e3726c9224","defaultValue":null,"component":{"type":"RichText","id":"f_6f16eaaa-099c-470a-9955-a5f1def46f75","defaultValue":false,"value":"\u003cp\u003eIn other words, I thought demonstrating the success \u0026amp; importance of a ML-based analysis was the same as demonstrating methodological validity in an academic publication. This is wrong. My collaborators rarely cared about the results, and forgot them quickly. These days, I still show a table like the one above but I \u003cem\u003ealso\u003c\/em\u003e show a SHAP values plot:\u003c\/p\u003e","backupValue":null,"version":1}},{"type":"Blog.Section","id":"f_a6c3365e-5667-4a73-83dd-be28aacf3ff6","defaultValue":null,"component":{"type":"Image","id":"f_32bf2136-f128-4db0-ba5a-21d686d8c36a","defaultValue":null,"link_url":"","thumb_url":"!","url":"!","caption":"","description":"","storageKey":"174108\/975564_674239","storage":"s","storagePrefix":null,"format":"png","h":930,"w":1086,"s":88685,"new_target":true,"noCompression":null,"cropMode":null,"focus":{}}},{"type":"Blog.Section","id":"f_9727c3f0-7310-412c-b2e8-ea5090f93d6a","defaultValue":null,"component":{"type":"RichText","id":"f_c21311c5-efe3-46d5-835b-87ce8664898c","defaultValue":null,"value":"\u003cp\u003eThis image is taken directly from the \u003ca target=\"_blank\" href=\"https:\/\/github.com\/slundberg\/shap\"\u003eSHAP Github repository\u003c\/a\u003e. There are \u003ca target=\"_blank\" href=\"http:\/\/papers.nips.cc\/paper\/7062-a-unified-approach-to-interpreting-model-predictions\"\u003eplenty\u003c\/a\u003e of \u003ca target=\"_blank\" href=\"https:\/\/arxiv.org\/abs\/1905.04610\"\u003epapers\u003c\/a\u003e and \u003ca target=\"_blank\" href=\"https:\/\/medium.com\/@gabrieltseng\/interpreting-complex-models-with-shap-values-1c187db6ec83\"\u003eother\u003c\/a\u003e \u003ca target=\"_blank\" href=\"https:\/\/christophm.github.io\/interpretable-ml-book\/shap.html\"\u003esources\u003c\/a\u003e explaining SHAP values in detail, so I won't do that here. Briefly, each row is a feature\/covariate input to a machine learning model, and each dot is a data point (sample). The x-axis is the SHAP value: how important a feature is for a particular sample in the model. Color is the original value of the feature. It takes some staring at this plot to fully internalize everything it's telling you, but showing it alongside standard ML results has been a great way to engage collaborators in other disciplines.\u003c\/p\u003e","backupValue":null,"version":null}},{"type":"Blog.Section","id":"f_c57ecb06-b20c-447b-b5ad-f853a3a47407","defaultValue":null,"component":{"type":"RichText","id":"f_2797bc62-b159-49bf-b212-4a7cb2df26eb","defaultValue":false,"value":"\u003cp\u003eBy the way, SHAP values are not the only way to interpret ML models, but they just happen to be one I like, and the SHAP library is simply excellent.\u003c\/p\u003e","backupValue":null,"version":1}},{"type":"Blog.Section","id":"f_97fdd3a7-3926-4c7e-9c14-c44819eb4656","defaultValue":null,"component":{"type":"Blog.Title","id":"f_d0d5c540-751f-48e8-9613-bb997e5d5f86","defaultValue":false,"value":"\u003cp\u003eReasons for Small SHAP Values\u003c\/p\u003e","backupValue":null,"version":1}},{"type":"Blog.Section","id":"f_6b52ec90-fa8e-4bd7-83cc-4ce3dcb8b7fc","defaultValue":null,"component":{"type":"RichText","id":"f_e3472e90-4504-45cf-a4a5-7129eec06d69","defaultValue":false,"value":"\u003cp\u003eWhen looking at the SHAP value plots, what might be some reasons that certain variables\/features are less important than others? If you had asked me this question a month ago, here is the list I would have given you:\u003c\/p\u003e\u003cul\u003e\u003cli\u003eThe variable is measured in a noisy way.\u003c\/li\u003e\u003cli\u003eThe variable is not that causally related to the outcome of interest.\u003c\/li\u003e\u003cli\u003eThe variable is highly correlated with another variable that happens to be more predictive of the outcome.\u003c\/li\u003e\u003c\/ul\u003e\u003cp\u003eSome of these are not that different from one another. For example, if a variable is noisy, then it will certainly look less related to the outcome of interest. The point isn't to have a theoretically iron-clad list of reasons, but to give the collaborators some idea of how ML models work.\u003c\/p\u003e","backupValue":null,"version":1}},{"type":"Blog.Section","id":"f_c34783fe-7cbf-4aef-926e-b78df587b3da","defaultValue":null,"component":{"type":"RichText","id":"f_6177c381-b4a0-40af-a49c-72e57f32359e","defaultValue":false,"value":"\u003cp\u003eRecently I chatted with a friend who does economics for a living, and he suggested another important reason that wasn't on my list: the variable has low variance. This wasn't immediately obvious to me, so I ran some simulations to gain intuition. My friend was right, and so I thought it would be a good idea to share the finding in case others have the same blind spot that I did.\u003c\/p\u003e","backupValue":null,"version":1}},{"type":"Blog.Section","id":"f_74ce1194-6a86-42d0-a67b-275a359265d6","defaultValue":null,"component":{"type":"Blog.Title","id":"f_d27147ad-9154-4cfe-b7df-e94db769e4e6","defaultValue":false,"value":"\u003cp\u003eExperimental Setup\u003c\/p\u003e","backupValue":null,"version":1}},{"type":"Blog.Section","id":"f_157a0357-e939-45b4-8c7c-74c07532610d","defaultValue":null,"component":{"type":"RichText","id":"f_81050059-bf83-4ddd-abeb-21ba824d978a","defaultValue":null,"value":"\u003cp\u003eHere's some Python code to generate the data:\u003c\/p\u003e","backupValue":null,"version":null}},{"type":"Blog.Section","id":"f_e6c974c4-3e4e-42d6-981d-b96f537cdadd","defaultValue":null,"component":{"type":"HtmlComponent","id":2925990,"defaultValue":false,"value":"\u0026lt;div align=\"left\"\u0026gt;\n\u0026lt;pre class=\"prettyprint linenums\"\u0026gt;\nimport numpy as np\nimport shap\nfrom sklearn.ensemble import RandomForestRegressor\nimport matplotlib.pyplot as plt\n\ndef generate_data(p, n):\n # generate X\n x123 = np.random.randn(n, 3)\n x4 = np.random.binomial(1, p, size=n)\n X = np.hstack((x123, x4.reshape(-1, 1)))\n # generate y\n noise = np.random.rand(X.shape[0]) \/ 10\n a = np.abs(X[:, 0]) \/ 2 # normal input\n b = np.cos(X[:, 1]) # normal input\n c = np.sin(X[:, 2]) \/ 2 # normal input\n d = np.cos(X[:, 3]) # bernoulli \n y = a**np.abs(b * c) + d \/ 2 + noise\n return X, y\n\u0026lt;\/pre\u0026gt;\n\u0026lt;\/div\u0026gt;","render_as_iframe":false,"selected_app_name":"HtmlApp","app_list":"{\"HtmlApp\":1575667}"}},{"type":"Blog.Section","id":"f_6f31f112-3ae1-4fa6-8fbc-3b274b811749","defaultValue":null,"component":{"type":"RichText","id":"f_aa9a5eda-455d-400f-9617-42e221384ff9","defaultValue":false,"value":"\u003cp style=\"font-size: 100%;\"\u003eThe input is 4-dimensional, and we're only changing the variance of the 4th Bernoulli dimension. Bernoulli random variables are always 0 or 1, so all we are doing is changing the proportion of 1s and 0s. Tweaking the $p$ input doesn't at all change the ground-truth mapping between input $x$ and output $y$. Intuitively, I would have expected that SHAP values variable importance is not affected when changing $p$. Let's see what happens in practice with a simulation study:\u003c\/p\u003e","backupValue":null,"version":1}},{"type":"Blog.Section","id":"f_cd6ef54d-d54c-4817-bb55-45e8a9343f7f","defaultValue":null,"component":{"type":"HtmlComponent","id":2925997,"defaultValue":false,"value":"\u0026lt;div align=\"left\"\u0026gt;\n\u0026lt;pre class=\"prettyprint linenums\"\u0026gt;\nshap_abs_sums = []\nshap_diff = []\nps = [0.1, 0.3, 0.5, 0.7, 0.9]\nfeat_names = ['$\\mathcal{N}_1
Return to site

SHAP Values and Feature Variance

Sergey Feldman

Intepretability is a Good Idea

My machine learning graduate program was technically excellent, but I had to learn how to (semi-)convincingly communicate with interdisciplinary collaborators the hard way: by failing a lot on the job. Before explainable/interpretable machine learning become a more popular research direction in 2016/2017, the end-product of my ML analyses often looked like this:

broken image

In other words, I thought demonstrating the success & importance of a ML-based analysis was the same as demonstrating methodological validity in an academic publication. This is wrong. My collaborators rarely cared about the results, and forgot them quickly. These days, I still show a table like the one above but I also show a SHAP values plot:

broken image

This image is taken directly from the SHAP Github repository. There are plenty of papers and other sources explaining SHAP values in detail, so I won't do that here. Briefly, each row is a feature/covariate input to a machine learning model, and each dot is a data point (sample). The x-axis is the SHAP value: how important a feature is for a particular sample in the model. Color is the original value of the feature. It takes some staring at this plot to fully internalize everything it's telling you, but showing it alongside standard ML results has been a great way to engage collaborators in other disciplines.

By the way, SHAP values are not the only way to interpret ML models, but they just happen to be one I like, and the SHAP library is simply excellent.

Reasons for Small SHAP Values

When looking at the SHAP value plots, what might be some reasons that certain variables/features are less important than others? If you had asked me this question a month ago, here is the list I would have given you:

  • The variable is measured in a noisy way.
  • The variable is not that causally related to the outcome of interest.
  • The variable is highly correlated with another variable that happens to be more predictive of the outcome.

Some of these are not that different from one another. For example, if a variable is noisy, then it will certainly look less related to the outcome of interest. The point isn't to have a theoretically iron-clad list of reasons, but to give the collaborators some idea of how ML models work.

Recently I chatted with a friend who does economics for a living, and he suggested another important reason that wasn't on my list: the variable has low variance. This wasn't immediately obvious to me, so I ran some simulations to gain intuition. My friend was right, and so I thought it would be a good idea to share the finding in case others have the same blind spot that I did.

Experimental Setup

Here's some Python code to generate the data:

The input is 4-dimensional, and we're only changing the variance of the 4th Bernoulli dimension. Bernoulli random variables are always 0 or 1, so all we are doing is changing the proportion of 1s and 0s. Tweaking the $p$ input doesn't at all change the ground-truth mapping between input $x$ and output $y$. Intuitively, I would have expected that SHAP values variable importance is not affected when changing $p$. Let's see what happens in practice with a simulation study:

broken image
broken image
broken image
broken image
broken image

You can see that when the variance is smallest ($p = 0.1$ and $p = 0.9$), the Bernoulli feature is at its lowest ranking. When the variance is largest ($p = 0.5$), the feature is at its highest ranking. Here is the how the sum of the absolute SHAP values looks when you plot it against $p$:

broken image

Exactly as described above: the overall importance is proportional to the variance. So we've learned that SHAP values are affected by the variance of the input feature.

Here all four variables are stable regardless of $p$. Range of SHAP values is also good to consider when thinking about a variable's overall interestingness. That is, a variable might not be predictive overall because it has low variance in our dataset, but it might very predictive for a small subset of the sample population.

Some Intuition

To gain some intuition about why this happens, let's think about a variable that is actually very causal for outcome $y$ but happens to be completely constant in our dataset. If $y$ is being able to survive and the Bernoulli feature $x_4$ is access to drinking water, then clearly $x_4$ is directly causal of $y$, but most human health datasets have $x_4 = 1$, and so the learned ML model would make no use of this variable; the SHAP values would all be zero. But imagine a dataset of 100k people where exactly one person has $x_4 = 1$ and the rest have $x_4 = 0$. Overall, there is little relationship in the data between $x_4$ and the outcome, and a ML model might not even notice this extremely rare sample as being predictive. The variance went from 0 to a little bit, and the SHAP values maybe went up a bit. The higher the variance rises, the more likely the model is to rely on this variable for making its decisions, and the larger the SHAP values will be in total.

, '$\\mathcal{N}_2
Return to site

SHAP Values and Feature Variance

Sergey Feldman

Intepretability is a Good Idea

My machine learning graduate program was technically excellent, but I had to learn how to (semi-)convincingly communicate with interdisciplinary collaborators the hard way: by failing a lot on the job. Before explainable/interpretable machine learning become a more popular research direction in 2016/2017, the end-product of my ML analyses often looked like this:

broken image

In other words, I thought demonstrating the success & importance of a ML-based analysis was the same as demonstrating methodological validity in an academic publication. This is wrong. My collaborators rarely cared about the results, and forgot them quickly. These days, I still show a table like the one above but I also show a SHAP values plot:

broken image

This image is taken directly from the SHAP Github repository. There are plenty of papers and other sources explaining SHAP values in detail, so I won't do that here. Briefly, each row is a feature/covariate input to a machine learning model, and each dot is a data point (sample). The x-axis is the SHAP value: how important a feature is for a particular sample in the model. Color is the original value of the feature. It takes some staring at this plot to fully internalize everything it's telling you, but showing it alongside standard ML results has been a great way to engage collaborators in other disciplines.

By the way, SHAP values are not the only way to interpret ML models, but they just happen to be one I like, and the SHAP library is simply excellent.

Reasons for Small SHAP Values

When looking at the SHAP value plots, what might be some reasons that certain variables/features are less important than others? If you had asked me this question a month ago, here is the list I would have given you:

  • The variable is measured in a noisy way.
  • The variable is not that causally related to the outcome of interest.
  • The variable is highly correlated with another variable that happens to be more predictive of the outcome.

Some of these are not that different from one another. For example, if a variable is noisy, then it will certainly look less related to the outcome of interest. The point isn't to have a theoretically iron-clad list of reasons, but to give the collaborators some idea of how ML models work.

Recently I chatted with a friend who does economics for a living, and he suggested another important reason that wasn't on my list: the variable has low variance. This wasn't immediately obvious to me, so I ran some simulations to gain intuition. My friend was right, and so I thought it would be a good idea to share the finding in case others have the same blind spot that I did.

Experimental Setup

Here's some Python code to generate the data:

The input is 4-dimensional, and we're only changing the variance of the 4th Bernoulli dimension. Bernoulli random variables are always 0 or 1, so all we are doing is changing the proportion of 1s and 0s. Tweaking the $p$ input doesn't at all change the ground-truth mapping between input $x$ and output $y$. Intuitively, I would have expected that SHAP values variable importance is not affected when changing $p$. Let's see what happens in practice with a simulation study:

broken image
broken image
broken image
broken image
broken image

You can see that when the variance is smallest ($p = 0.1$ and $p = 0.9$), the Bernoulli feature is at its lowest ranking. When the variance is largest ($p = 0.5$), the feature is at its highest ranking. Here is the how the sum of the absolute SHAP values looks when you plot it against $p$:

broken image

Exactly as described above: the overall importance is proportional to the variance. So we've learned that SHAP values are affected by the variance of the input feature.

Here all four variables are stable regardless of $p$. Range of SHAP values is also good to consider when thinking about a variable's overall interestingness. That is, a variable might not be predictive overall because it has low variance in our dataset, but it might very predictive for a small subset of the sample population.

Some Intuition

To gain some intuition about why this happens, let's think about a variable that is actually very causal for outcome $y$ but happens to be completely constant in our dataset. If $y$ is being able to survive and the Bernoulli feature $x_4$ is access to drinking water, then clearly $x_4$ is directly causal of $y$, but most human health datasets have $x_4 = 1$, and so the learned ML model would make no use of this variable; the SHAP values would all be zero. But imagine a dataset of 100k people where exactly one person has $x_4 = 1$ and the rest have $x_4 = 0$. Overall, there is little relationship in the data between $x_4$ and the outcome, and a ML model might not even notice this extremely rare sample as being predictive. The variance went from 0 to a little bit, and the SHAP values maybe went up a bit. The higher the variance rises, the more likely the model is to rely on this variable for making its decisions, and the larger the SHAP values will be in total.

, '$\\mathcal{N}_3
Return to site

SHAP Values and Feature Variance

Sergey Feldman

Intepretability is a Good Idea

My machine learning graduate program was technically excellent, but I had to learn how to (semi-)convincingly communicate with interdisciplinary collaborators the hard way: by failing a lot on the job. Before explainable/interpretable machine learning become a more popular research direction in 2016/2017, the end-product of my ML analyses often looked like this:

broken image

In other words, I thought demonstrating the success & importance of a ML-based analysis was the same as demonstrating methodological validity in an academic publication. This is wrong. My collaborators rarely cared about the results, and forgot them quickly. These days, I still show a table like the one above but I also show a SHAP values plot:

broken image

This image is taken directly from the SHAP Github repository. There are plenty of papers and other sources explaining SHAP values in detail, so I won't do that here. Briefly, each row is a feature/covariate input to a machine learning model, and each dot is a data point (sample). The x-axis is the SHAP value: how important a feature is for a particular sample in the model. Color is the original value of the feature. It takes some staring at this plot to fully internalize everything it's telling you, but showing it alongside standard ML results has been a great way to engage collaborators in other disciplines.

By the way, SHAP values are not the only way to interpret ML models, but they just happen to be one I like, and the SHAP library is simply excellent.

Reasons for Small SHAP Values

When looking at the SHAP value plots, what might be some reasons that certain variables/features are less important than others? If you had asked me this question a month ago, here is the list I would have given you:

  • The variable is measured in a noisy way.
  • The variable is not that causally related to the outcome of interest.
  • The variable is highly correlated with another variable that happens to be more predictive of the outcome.

Some of these are not that different from one another. For example, if a variable is noisy, then it will certainly look less related to the outcome of interest. The point isn't to have a theoretically iron-clad list of reasons, but to give the collaborators some idea of how ML models work.

Recently I chatted with a friend who does economics for a living, and he suggested another important reason that wasn't on my list: the variable has low variance. This wasn't immediately obvious to me, so I ran some simulations to gain intuition. My friend was right, and so I thought it would be a good idea to share the finding in case others have the same blind spot that I did.

Experimental Setup

Here's some Python code to generate the data:

The input is 4-dimensional, and we're only changing the variance of the 4th Bernoulli dimension. Bernoulli random variables are always 0 or 1, so all we are doing is changing the proportion of 1s and 0s. Tweaking the $p$ input doesn't at all change the ground-truth mapping between input $x$ and output $y$. Intuitively, I would have expected that SHAP values variable importance is not affected when changing $p$. Let's see what happens in practice with a simulation study:

broken image
broken image
broken image
broken image
broken image

You can see that when the variance is smallest ($p = 0.1$ and $p = 0.9$), the Bernoulli feature is at its lowest ranking. When the variance is largest ($p = 0.5$), the feature is at its highest ranking. Here is the how the sum of the absolute SHAP values looks when you plot it against $p$:

broken image

Exactly as described above: the overall importance is proportional to the variance. So we've learned that SHAP values are affected by the variance of the input feature.

Here all four variables are stable regardless of $p$. Range of SHAP values is also good to consider when thinking about a variable's overall interestingness. That is, a variable might not be predictive overall because it has low variance in our dataset, but it might very predictive for a small subset of the sample population.

Some Intuition

To gain some intuition about why this happens, let's think about a variable that is actually very causal for outcome $y$ but happens to be completely constant in our dataset. If $y$ is being able to survive and the Bernoulli feature $x_4$ is access to drinking water, then clearly $x_4$ is directly causal of $y$, but most human health datasets have $x_4 = 1$, and so the learned ML model would make no use of this variable; the SHAP values would all be zero. But imagine a dataset of 100k people where exactly one person has $x_4 = 1$ and the rest have $x_4 = 0$. Overall, there is little relationship in the data between $x_4$ and the outcome, and a ML model might not even notice this extremely rare sample as being predictive. The variance went from 0 to a little bit, and the SHAP values maybe went up a bit. The higher the variance rises, the more likely the model is to rely on this variable for making its decisions, and the larger the SHAP values will be in total.

, 'Bernoulli']\nfor p in ps:\n X, y = generate_data(p, 10000)\n model = RandomForestRegressor(n_estimators=25).fit(X, y)\n shap_values = shap.TreeExplainer(model).shap_values(X)\n feat_names[-1] = 'Bernoulli(%.2f)' % p\n shap.summary_plot(shap_values, X, feat_names)\n shap_abs_sums.append(np.abs(shap_values).sum(0))\n shap_diff.append(shap_values.max(0) - shap_values.min(0))\n\u0026lt;\/pre\u0026gt;\n\u0026lt;\/div\u0026gt;","render_as_iframe":null,"selected_app_name":"HtmlApp","app_list":"{\"HtmlApp\":1575668}"}},{"type":"Blog.Section","id":"f_2e74cc3d-11da-4971-8b8a-0efd7d4b49ef","defaultValue":null,"component":{"type":"Image","id":"f_630774d5-27a3-4398-8b0c-30fa051481b8","defaultValue":null,"link_url":"","thumb_url":"!","url":"!","caption":"","description":"","storageKey":"174108\/5121_180045","storage":"s","storagePrefix":null,"format":"png","h":228,"w":543,"s":36900,"new_target":true,"noCompression":null,"cropMode":null,"focus":{}}},{"type":"Blog.Section","id":"f_4350ec17-d5a2-4020-88cb-54a980f94fb2","defaultValue":null,"component":{"type":"Image","id":"f_028e16ab-59b2-4ff4-9819-97cc72877abf","defaultValue":null,"link_url":"","thumb_url":"!","url":"!","caption":"","description":"","storageKey":"174108\/551729_917091","storage":"s","storagePrefix":null,"format":"png","h":228,"w":543,"s":37605,"new_target":true,"noCompression":null,"cropMode":null,"focus":{}}},{"type":"Blog.Section","id":"f_736eccd7-9326-4eb0-be90-734346c20697","defaultValue":null,"component":{"type":"Image","id":"f_a744bead-9c0a-448c-b2ee-dd58c47eedaf","defaultValue":null,"link_url":"","thumb_url":"!","url":"!","caption":"","description":"","storageKey":"174108\/55222_867213","storage":"s","storagePrefix":null,"format":"png","h":228,"w":543,"s":38368,"new_target":true,"noCompression":null,"cropMode":null,"focus":{}}},{"type":"Blog.Section","id":"f_383b74b5-c7bc-4b8e-8c54-887a438a8dd7","defaultValue":null,"component":{"type":"Image","id":"f_b4b3764b-d012-4c41-8cff-bd7c1e2cbf54","defaultValue":null,"link_url":"","thumb_url":"!","url":"!","caption":"","description":"","storageKey":"174108\/96078_182253","storage":"s","storagePrefix":null,"format":"png","h":228,"w":543,"s":38486,"new_target":true,"noCompression":null,"cropMode":null,"focus":{}}},{"type":"Blog.Section","id":"f_ea48df3f-04e4-4bd1-baf9-d1d08ee47219","defaultValue":null,"component":{"type":"Image","id":"f_095eae29-a314-44e3-a967-0c76dbc67b87","defaultValue":null,"link_url":"","thumb_url":"!","url":"!","caption":"","description":"","storageKey":"174108\/71176_828195","storage":"s","storagePrefix":null,"format":"png","h":228,"w":543,"s":37489,"new_target":true,"noCompression":null,"cropMode":null,"focus":{}}},{"type":"Blog.Section","id":"f_00d01f68-55da-4d68-8820-a16b78b7a3d5","defaultValue":null,"component":{"type":"RichText","id":"f_0488e122-36a0-42b0-adeb-289ec8a16c1f","defaultValue":false,"value":"\u003cp\u003eYou can see that when the variance is smallest ($p = 0.1$ and $p = 0.9$), the Bernoulli feature is at its lowest ranking. When the variance is largest ($p = 0.5$), the feature is at its highest ranking. Here is the how the sum of the absolute SHAP values looks when you plot it against $p$:\u003c\/p\u003e","backupValue":null,"version":1}},{"type":"Blog.Section","id":"f_80947ec2-2946-419f-9001-fabd9e67756d","defaultValue":null,"component":{"type":"HtmlComponent","id":2926000,"defaultValue":false,"value":"\u0026lt;div align=\"left\"\u0026gt;\n\u0026lt;pre class=\"prettyprint linenums\"\u0026gt;\nfeat_names[-1] = 'Bernoulli(p)'\n\nplt.figure(0, figsize=(12, 8))\nplt.plot(ps, shap_abs_sums)\nplt.legend(feat_names, loc=(1.01,0))\nplt.xlabel('p')\nplt.ylabel('Sum of absolute SHAP values')\n\u0026lt;\/pre\u0026gt;\n\u0026lt;\/div\u0026gt;","render_as_iframe":null,"selected_app_name":"HtmlApp","app_list":"{\"HtmlApp\":1575669}"}},{"type":"Blog.Section","id":"f_6c455719-9062-43aa-9c2b-ddb888db1798","defaultValue":null,"component":{"type":"Image","id":"f_af739098-05ae-42ba-bb74-5c35af3eca47","defaultValue":null,"link_url":"","thumb_url":"!","url":"!","caption":"","description":"","storageKey":"174108\/934588_151283","storage":"s","storagePrefix":null,"format":"png","h":511,"w":973,"s":55107,"new_target":true,"noCompression":null,"cropMode":null,"focus":{}}},{"type":"Blog.Section","id":"f_16c8d068-d40f-4d31-89c6-56ea82137790","defaultValue":null,"component":{"type":"RichText","id":"f_d615d980-8f2e-45e4-b786-ab76dd411a9c","defaultValue":false,"value":"\u003cp\u003eExactly as described above: the overall importance is proportional to the variance. So we've learned that SHAP values are affected by the variance of the input feature.\u003c\/p\u003e","backupValue":null,"version":1}},{"type":"Blog.Section","id":"f_77c92295-a352-402a-9556-542b8c9b0d27","defaultValue":null,"component":{"type":"RichText","id":"f_fc1e4941-501b-48ea-870a-7a10059128d6","defaultValue":false,"value":"\u003cp\u003eHere all four variables are stable regardless of $p$. Range of SHAP values is also good to consider when thinking about a variable's overall interestingness. That is, a variable might not be predictive \u003cem\u003eoverall\u003c\/em\u003e because it has low variance in our dataset, but it might very predictive for a small subset of the sample population.\u003c\/p\u003e","backupValue":null,"version":1}},{"type":"Blog.Section","id":"f_7814fb26-3158-4ed4-99ea-652203ffb242","defaultValue":null,"component":{"type":"Blog.Title","id":"f_f80ef06c-7d68-4178-b93a-9093ee0fa2a8","defaultValue":false,"value":"\u003cp\u003eSome Intuition\u003c\/p\u003e","backupValue":null,"version":1}},{"type":"Blog.Section","id":"f_328ed498-b608-46cb-8528-aed606323b68","defaultValue":null,"component":{"type":"RichText","id":"f_b9d7aa51-f11a-4744-81bd-c73b21fd8204","defaultValue":false,"value":"\u003cp\u003eTo gain some intuition about why this happens, let's think about a variable that is actually \u003cem\u003every\u003c\/em\u003e causal for outcome $y$ but happens to be completely constant in our dataset. If $y$ is being able to survive and the Bernoulli feature $x_4$ is access to drinking water, then clearly $x_4$ is directly causal of $y$, but most human health datasets have $x_4 = 1$, and so the learned ML model would make no use of this variable; the SHAP values would all be zero. But imagine a dataset of 100k people where exactly one person has $x_4 = 1$ and the rest have $x_4 = 0$. Overall, there is little relationship in the data between $x_4$ and the outcome, and a ML model might not even notice this extremely rare sample as being predictive. The variance went from 0 to a little bit, and the SHAP values maybe went up a bit. The higher the variance rises, the more likely the model is to rely on this variable for making its decisions, and the larger the SHAP values will be in total.\u003c\/p\u003e","backupValue":null,"version":1}}]},"settings":{"hideBlogDate":null},"pageMode":null,"pageData":{"type":"Site","id":"f_f78008a3-6282-4a1f-af6d-6106881bb104","defaultValue":null,"horizontal":false,"fixedSocialMedia":false,"new_page":false,"showMobileNav":true,"showCookieNotification":false,"useSectionDefaultFormat":true,"showTermsAndConditions":false,"showPrivacyPolicy":false,"activateGDPRCompliance":false,"multi_pages":true,"live_chat":false,"isFullScreenOnlyOneSection":true,"showNav":true,"showFooter":false,"showStrikinglyLogo":false,"showNavigationButtons":true,"showShoppingCartIcon":true,"showButtons":true,"navFont":"","titleFont":"bebas neue","logoFont":"","bodyFont":"work sans","buttonFont":"work sans","headingFont":"","theme":"fresh","templateVariation":"default","templatePreset":"default","termsText":null,"privacyPolicyText":null,"fontPreset":null,"GDPRHtml":null,"pages":[{"type":"Page","id":"f_fcca2448-2224-497f-b0cc-ff57f721a70c","defaultValue":null,"sections":[{"type":"Slide","id":"f_08eeaa73-bf9f-4a30-b497-0583a2b114b1","defaultValue":null,"template_id":null,"template_name":"title","template_version":null,"components":{"background1":{"type":"Background","id":"f_0439e8ae-d386-47a4-ba7b-8a3c3940465d","defaultValue":false,"url":"!","textColor":"light","backgroundVariation":"","sizing":"cover","userClassName":null,"linkUrl":null,"linkTarget":null,"videoUrl":"","videoHtml":"","storageKey":"174108\/website_background_v8atjt","storage":"c","format":"png","h":961,"w":1852,"s":152390,"useImage":null,"noCompression":null,"focus":{},"backgroundColor":{}},"media1":{"type":"Media","id":"f_e9c53b1c-8970-454d-b878-ac086d4b8a70","defaultValue":null,"video":{"type":"Video","id":"f_eaf56e5c-2976-4aeb-8a12-9d7c869bd771","defaultValue":null,"html":"","url":"","thumbnail_url":null,"maxwidth":700,"description":null},"image":{"type":"Image","id":"f_f3540fd4-8632-491e-bbc0-c3f1f09b4156","defaultValue":true,"link_url":null,"thumb_url":null,"url":"","caption":"","description":"","storageKey":null,"storage":null,"storagePrefix":null,"format":null,"h":null,"w":null,"s":null,"new_target":true,"noCompression":null,"cropMode":null,"focus":{}},"current":"image"},"text3":{"type":"RichText","id":"f_23698f5e-9ea9-4fc7-a290-e1c910dd31c4","defaultValue":null,"value":null,"backupValue":null,"version":null},"text2":{"type":"RichText","id":"f_c1fd7a1d-020d-41f2-b70b-36c597e09a9d","defaultValue":false,"alignment":"","value":"","backupValue":null,"version":1},"text1":{"type":"RichText","id":"f_b316746e-8452-4373-bba3-f4b8e0fa310a","defaultValue":false,"alignment":"center","value":"\u003cp style=\"text-align: center; font-size: 160%;\"\u003eDATA SCIENCE, AI \u0026amp; MACHINE LEARNING CONSULTING\u003c\/p\u003e","backupValue":null,"version":1},"slideSettings":{"type":"SlideSettings","id":"f_2a78c78a-e361-4dea-a28a-ba1f13de513c","defaultValue":false,"show_nav":false,"show_nav_multi_mode":null,"nameChanged":null,"hidden_section":null,"name":"Data Cowboys","sync_key":null,"layout_variation":"center-bottom-full","display_settings":{},"padding":{},"layout_config":{}},"button1":{"type":"Button","id":"f_1a64fc7a-ae0c-4be1-a0df-7aa2b822237e","defaultValue":false,"text":"","link_type":null,"page_id":null,"section_id":null,"url":"","new_target":false}}},{"type":"Slide","id":"f_d1b33b17-4f8a-41ce-be85-cb8f82a7d323","defaultValue":null,"template_id":null,"template_name":"text","template_version":null,"components":{"slideSettings":{"type":"SlideSettings","id":"f_101eb165-a331-46ce-abec-b3da2c9e86d7","defaultValue":null,"show_nav":true,"show_nav_multi_mode":null,"nameChanged":null,"hidden_section":null,"name":"ABOUT","sync_key":null,"layout_variation":"box-one-text","display_settings":{},"padding":{},"layout_config":{}}}},{"type":"Slide","id":"f_3ffcc8f5-65b4-4765-9a9c-ddd0c524d58e","defaultValue":null,"template_id":null,"template_name":"blog","template_version":"beta-s6","components":{"slideSettings":{"type":"SlideSettings","id":"f_69e292a6-3ffc-427a-baeb-987161903eff","defaultValue":null,"show_nav":true,"show_nav_multi_mode":null,"nameChanged":true,"hidden_section":false,"name":"BLOG","sync_key":null,"layout_variation":"one-smallCircle-long-none","display_settings":{},"padding":{},"layout_config":{"imageShape":"circle","columns":1,"snippetLength":"long","customized":true,"imageSize":"m","imageAlignment":"left","structure":"rows","templateName":"A"}},"text1":{"type":"RichText","id":"f_9d2fa9f3-f4c9-48a3-94f8-94002e744df8","defaultValue":false,"alignment":"auto","value":"\u003ch2 class=\"s-title s-font-title\"\u003eBlog\u003c\/h2\u003e","backupValue":"","version":1},"text2":{"type":"RichText","id":"f_cd32d4bf-e70c-4721-858b-a7c7249fa9c9","defaultValue":false,"value":"","backupValue":"","version":1},"background1":{"type":"Background","id":"f_ceef6eaf-6f13-4065-8009-9eeea341b6c6","defaultValue":false,"url":"","textColor":"light","backgroundVariation":"","sizing":null,"userClassName":"s-bg-gray","linkUrl":null,"linkTarget":null,"videoUrl":"","videoHtml":"","storageKey":null,"storage":null,"format":null,"h":null,"w":null,"s":null,"useImage":false,"noCompression":null,"focus":{},"backgroundColor":{}},"blog1":{"type":"BlogCollectionComponent","id":40,"defaultValue":null,"app_instance_id":null,"app_id":null,"category":{"id":"all","name":"All Categories"}}}},{"type":"Slide","id":"f_9b0a1617-fc31-42b1-b5a1-0d9ac085c651","defaultValue":null,"template_id":null,"template_name":"rows","template_version":null,"components":{"slideSettings":{"type":"SlideSettings","id":"f_66e8835c-c23b-4fe1-8d75-b91eaa9bfed0","defaultValue":null,"show_nav":true,"show_nav_multi_mode":null,"nameChanged":true,"hidden_section":false,"name":"TESTIMONIALS","sync_key":null,"layout_variation":"col-two-text","display_settings":{},"padding":{},"layout_config":{"isNewMobileLayout":true}}}},{"type":"Slide","id":"f_86f6d7ed-3765-4aa2-844a-02cddf9f3b23","defaultValue":null,"template_id":null,"template_name":"title","template_version":null,"components":{"slideSettings":{"type":"SlideSettings","id":"f_566da130-1281-4393-bf65-abb2d781ccf1","defaultValue":null,"show_nav":true,"show_nav_multi_mode":null,"nameChanged":null,"hidden_section":null,"name":"CONTACT","sync_key":null,"layout_variation":"center-subTop-full","display_settings":{},"padding":{},"layout_config":{}}}}],"title":"Home","description":null,"uid":"77c9e0f9-c8df-4bef-b786-4638f0aaed73","path":"\/home","pageTitle":null,"pagePassword":null,"memberOnly":null,"paidMemberOnly":null,"buySpecificProductList":{},"specificTierList":{},"pwdPrompt":null,"autoPath":true,"authorized":true},{"type":"Page","id":"f_cbe77937-da0c-4d35-9cf4-a994db3e4bd4","defaultValue":null,"sections":[{"type":"Slide","id":"f_42817d6b-9bcf-4fdd-889e-bbef3a988109","defaultValue":null,"template_id":null,"template_name":"rows","template_version":null,"components":{"repeatable1":{"type":"Repeatable","id":"f_916ae5b6-6d0b-47d5-8e08-03a8fd31be3e","defaultValue":false,"list":[{"type":"RepeatableItem","id":"f_b6be4998-6998-4ac9-a489-b5c308b5e884","defaultValue":null,"components":{"text3":{"type":"RichText","id":"f_72363fdc-1373-4dd7-9f75-c3eab9058d8c","defaultValue":false,"value":"\u003cp style=\"text-align: left;\"\u003eSergey works part-time as a senior applied research scientist at AI2, on the Semantic Scholar research team. He's worked on many different projects, including:\u003c\/p\u003e\u003cul\u003e\u003cli style=\"text-align: left;\"\u003eA \u003ca target=\"_blank\" href=\"https:\/\/jamanetwork.com\/journals\/jamanetworkopen\/fullarticle\/2737103\"\u003epaper\u003c\/a\u003e about gender bias in clinical trial recruitment published in JAMA Network Open, along with \u003ca target=\"_blank\" href=\"https:\/\/qz.com\/1657408\/why-are-women-still-underrepresented-in-clinical-research\/\"\u003enews coverage\u003c\/a\u003e.\u003c\/li\u003e\u003cli style=\"text-align: left;\"\u003eA complete overhaul of the Semantic Scholar author disambiguation system, described in a published \u003ca target=\"_blank\" href=\"https:\/\/arxiv.org\/abs\/2103.07534\"\u003epaper \u003c\/a\u003eand a \u003ca target=\"_blank\" href=\"https:\/\/medium.com\/ai2-blog\/s2and-an-improved-author-disambiguation-system-for-semantic-scholar-d09380da30e6\"\u003eblog post\u003c\/a\u003e. Also, see the open-sourced \u003ca target=\"_blank\" href=\"https:\/\/github.com\/allenai\/S2AND\/\"\u003ecode \u0026amp; data\u003c\/a\u003e\u003c\/li\u003e\u003cli style=\"text-align: left;\"\u003eTwo published methods for high quality academic paper embeddings: \u003ca target=\"_blank\" href=\"https:\/\/www.semanticscholar.org\/paper\/Content-Based-Citation-Recommendation-Bhagavatula-Feldman\/e9baba6cd76ea7f347462056cde699d6e6af0abd\"\u003eCiteomatic\u003c\/a\u003e (\u003ca target=\"_blank\" href=\"https:\/\/github.com\/allenai\/citeomatic\/\"\u003ecode\u003c\/a\u003e) and \u003ca target=\"_blank\" href=\"https:\/\/www.semanticscholar.org\/paper\/SPECTER%3A-Document-level-Representation-Learning-Cohan-Feldman\/edcd5ee7cc16490804ca22e9c2717e62a99ae655\"\u003eSPECTER\u003c\/a\u003e (\u003ca target=\"_blank\" href=\"https:\/\/github.com\/allenai\/specter\"\u003ecode\u003c\/a\u003e).\u003c\/li\u003e\u003cli style=\"text-align: left;\"\u003eImproving the Semantic Scholar search engine, described in a \u003ca target=\"_blank\" href=\"https:\/\/medium.com\/ai2-blog\/building-a-better-search-engine-for-semantic-scholar-ea23a0b661e7\"\u003edetailed blog post\u003c\/a\u003e. \u003ca target=\"_blank\" href=\"https:\/\/github.com\/allenai\/s2search\/\"\u003eCode\u003c\/a\u003e is available as well.\u003c\/li\u003e\u003cli style=\"text-align: left;\"\u003e\u003ca target=\"_blank\" href=\"https:\/\/medium.com\/@sergeyfeldman\/the-association-between-early-arxiv-posting-and-citations-72034f0914b2\"\u003eA blog post\u003c\/a\u003e and \u003ca target=\"_blank\" href=\"https:\/\/www.semanticscholar.org\/paper\/Citation-Count-Analysis-for-Papers-with-Preprints-Feldman-Lo\/c9aef1d76799e24dd236889c0c7bacaa501a3b62\"\u003epaper\u003c\/a\u003e about the association between posting your papers on ArXiV before review and subsequent citations.\u003c\/li\u003e\u003c\/ul\u003e\u003cp style=\"text-align: left;\"\u003e\u0026nbsp;\u003c\/p\u003e\u003cdiv style=\"text-align: left;\"\u003eThis work is ongoing since March, 2016.\u003c\/div\u003e","backupValue":null,"version":1},"text2":{"type":"RichText","id":"f_12ca08a5-b28d-4585-9900-68133c4fa004","defaultValue":false,"value":"\u003cp style=\"text-align: left;\"\u003e\u003cstrong\u003eFor: \u003ca target=\"_blank\" href=\"https:\/\/research.semanticscholar.org\"\u003eAllen Institute of Artificial Intelligence, Semantic Scholar\u003c\/a\u003e\u003c\/strong\u003e\u003c\/p\u003e","backupValue":null,"version":1},"text1":{"type":"RichText","id":"f_b775061b-e61f-4d77-ae10-5d06e35b634a","defaultValue":false,"value":"\u003cp style=\"text-align: left;\"\u003eDeep Neural Networks for Natural Language Processing\u003c\/p\u003e","backupValue":null,"version":1},"media1":{"type":"Media","id":"f_834f8cee-3985-47ea-b13c-b7ce4d461048","defaultValue":null,"video":{"type":"Video","id":"f_eea79864-ff0e-4dad-b838-99d6591b404e","defaultValue":null,"html":"","url":"","thumbnail_url":null,"maxwidth":700,"description":null},"image":{"type":"Image","id":"f_2fc59dc7-7b12-4b5b-9d7b-e58ea8124ec8","defaultValue":false,"link_url":"https:\/\/github.com\/allenai\/citeomatic","thumb_url":"!","url":"!","caption":"","description":"","storageKey":"174108\/50534_401964","storage":"s","storagePrefix":null,"format":"png","h":141,"w":372,"s":15869,"new_target":true,"noCompression":null,"cropMode":null,"focus":{}},"current":"image"},"button1":{"type":"Button","id":"f_5b82cb03-b1c7-4413-9af9-02009892560f","defaultValue":true,"text":"","link_type":null,"page_id":null,"section_id":null,"url":"","new_target":null}}},{"type":"RepeatableItem","id":"f_bdc1a714-76ee-4d41-948d-da6a316b65d0","defaultValue":null,"components":{"text3":{"type":"RichText","id":"f_e4d29c44-287e-4b2d-92ee-f95f59e7a177","defaultValue":false,"value":"\u003cp style=\"text-align: left;\"\u003eThe Healthy Birth, Growth, and Development (HBGD) program was launched in 2013 by the Bill \u0026amp; Melinda Gates Foundation.\u003c\/p\u003e\u003cp style=\"text-align: left;\"\u003e\u0026nbsp;\u003c\/p\u003e\u003cp style=\"text-align: left;\"\u003eThe Knowledge Integration (Ki) initiative aims facilitates collaboration between researchers, quantitative experts, and policy makers in fields related to HBGD. The broad goal is to aggregate data from past longitudinal studies about pathways and risk factors that affect birth, growth, and neurocognitive development in order to better predict Ki outcomes.\u003c\/p\u003e\u003cp style=\"text-align: left;\"\u003e\u0026nbsp;\u003c\/p\u003e\u003cp style=\"text-align: left;\"\u003eSergey works closely with Ki leadership - designing and overseeing data science contests; managing external collaborations with academic research labs and software companies; and modeling many diverse global health datasets (an example is described \u003ca target=\"_blank\" href=\"https:\/\/www.kiglobalhealth.org\/case-studies\/evaluating-the-feasibility-of-using-algorithms-to-stratify-pregnancy-risk\/\"\u003ehere\u003c\/a\u003e).\u003c\/p\u003e\u003cp style=\"text-align: left;\"\u003e\u0026nbsp;\u003c\/p\u003e\u003cp style=\"text-align: left;\"\u003eThis work is ongoing since February, 2015.\u003c\/p\u003e","backupValue":null,"version":1},"text2":{"type":"RichText","id":"f_6cc39d79-1a7a-4874-9b9f-46e5ab826768","defaultValue":false,"value":"\u003cp style=\"text-align: left;\"\u003e\u003cstrong\u003eFor: \u003ca target=\"_blank\" href=\"http:\/\/hbgdki.org\/\"\u003eB\u003c\/a\u003e\u003ca href=\"http:\/\/kiglobalhealth.org\/\"\u003eill and Melinda Gates Foundation\u003c\/a\u003e\u003c\/strong\u003e\u003c\/p\u003e","backupValue":null,"version":1},"text1":{"type":"RichText","id":"f_f6fc3501-944d-485e-b981-ac8e9cf27474","defaultValue":false,"value":"\u003cp style=\"text-align: left;\"\u003eMachine Learning Strategy Consulting\u003c\/p\u003e","backupValue":null,"version":1},"media1":{"type":"Media","id":"f_2f31d35b-1e68-438d-b2f5-c57d28dc7a70","defaultValue":false,"video":{"type":"Video","id":"f_eea79864-ff0e-4dad-b838-99d6591b404e","defaultValue":null,"html":"","url":"","thumbnail_url":null,"maxwidth":700,"description":null},"image":{"type":"Image","id":"f_2fc59dc7-7b12-4b5b-9d7b-e58ea8124ec8","defaultValue":false,"link_url":"https:\/\/kiglobalhealth.org\/","thumb_url":"!","url":"!","caption":"","description":"","storageKey":"174108\/ki_xvygv4","storage":"c","storagePrefix":null,"format":"png","h":199,"w":249,"s":6745,"new_target":true,"noCompression":null,"cropMode":null,"focus":{}},"current":"image"},"button1":{"type":"Button","id":"f_df04434a-6387-44b5-a4f8-f32c7af26dd9","defaultValue":true,"text":"","link_type":null,"page_id":null,"section_id":null,"url":"","new_target":null}}},{"type":"RepeatableItem","id":"f_f5eae955-a4c7-4218-9134-d2e0cebb8cc9","defaultValue":null,"components":{"text3":{"type":"RichText","id":"f_d34047e9-3be9-47ff-b5e8-15c1b1532773","defaultValue":false,"value":"\u003cdiv class=\"s-rich-text-wrapper\" style=\"display: block;\"\u003e\u003cp style=\"text-align: left;\"\u003eWe contribute to the Python data science ecosystem.\u003c\/p\u003e\u003cp\u003e\u0026nbsp;\u003c\/p\u003e\u003cp style=\"text-align: left;\"\u003eMost notably, Sergey co-wrote and maintains the imputation package \u003ca href=\"https:\/\/github.com\/iskandr\/fancyimpute\" target=\"_blank\"\u003efancyimpute\u003c\/a\u003e, and merged \u003ca href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.impute.IterativeImputer.html\" target=\"_blank\"\u003eIterativeImputer\u003c\/a\u003e into the machine learning uber-library scikit-learn. Some other packages we've worked on:\u003c\/p\u003e\u003cul\u003e\u003cli style=\"text-align: left;\"\u003ehttps:\/\/github.com\/allenai\/S2AND\/\u003c\/li\u003e\u003cli style=\"text-align: left;\"\u003ehttps:\/\/github.com\/allenai\/s2_fos\u003c\/li\u003e\u003cli style=\"text-align: left;\"\u003ehttps:\/\/github.com\/allenai\/specter\/\u003c\/li\u003e\u003cli style=\"text-align: left;\"\u003ehttps:\/\/github.com\/allenai\/scidocs\/\u003c\/li\u003e\u003cli style=\"text-align: left;\"\u003ehttps:\/\/github.com\/allenai\/s2search\/\u003c\/li\u003e\u003cli style=\"text-align: left;\"\u003ehttps:\/\/github.com\/sergeyf\/SmallDataBenchmarks\/\u003c\/li\u003e\u003cli style=\"text-align: left;\"\u003ehttps:\/\/github.com\/allenai\/citeomatic\/\u003c\/li\u003e\u003c\/ul\u003e\u003c\/div\u003e","backupValue":null,"version":1},"text2":{"type":"RichText","id":"f_a04c25b8-882b-4c5e-bbdd-233cee97d74a","defaultValue":false,"value":"\u003cp style=\"text-align: left;\"\u003e\u003cstrong\u003eFor: Everyone\u003c\/strong\u003e\u003c\/p\u003e","backupValue":null,"version":1},"text1":{"type":"RichText","id":"f_8ef2499e-1d89-4de2-893b-3f0b0dac3ec1","defaultValue":false,"value":"\u003cp style=\"text-align: left;\"\u003eOpen Source Contributions\u003c\/p\u003e","backupValue":null,"version":1},"media1":{"type":"Media","id":"f_d71fe057-47e0-4920-866f-342da7035ecd","defaultValue":false,"video":{"type":"Video","id":"f_eea79864-ff0e-4dad-b838-99d6591b404e","defaultValue":null,"html":"","url":"","thumbnail_url":null,"maxwidth":700,"description":null},"image":{"type":"Image","id":"f_2fc59dc7-7b12-4b5b-9d7b-e58ea8124ec8","defaultValue":false,"link_url":"","thumb_url":"!","url":"!","caption":"","description":"","storageKey":"174108\/654579_364680","storage":"s","storagePrefix":null,"format":"png","h":200,"w":200,"s":9412,"new_target":true,"noCompression":null,"cropMode":null,"focus":{}},"current":"image"},"button1":{"type":"Button","id":"f_51163551-8fd1-4aff-a6c0-cc25a9f6422d","defaultValue":true,"text":"","link_type":null,"page_id":null,"section_id":null,"url":"","new_target":null}}},{"type":"RepeatableItem","id":"f_4f8d4459-2b09-42ff-a085-66eeaa512da3","defaultValue":null,"components":{"text3":{"type":"RichText","id":"f_f14f2a47-f12f-4942-93a4-e93fb91cf04f","defaultValue":false,"value":"\u003cp style=\"text-align: left;\"\u003e\u00a0\u003c\/p\u003e\u003cp style=\"text-align: left;\"\u003eActively Learn makes a reading tool that enables teachers to guide, monitor, and improve student learning. With our help, they wrote and were awarded an \u003ca target=\"_blank\" href=\"https:\/\/www.research.gov\/research-portal\/appmanager\/base\/desktop;jsessionid=LpRXVc0T5WCN5cWJWMvkT6HL52kypwKccBZGCQDt7QGPmn3PQlTF!-745634694!1239949331?_nfpb=true\u0026amp;_windowLabel=rsrRecentAwards_2\u0026amp;wsrp-urlType=blockingAction\u0026amp;wsrp-url=\u0026amp;wsrp-requiresRewrite=\u0026amp;wsrp-navigationalState=eJyLL07OL0i1Tc-JT0rMUYNQtgBZ6Af8\u0026amp;wsrp-interactionState=wlprsrRecentAwards_2_action%3DviewRsrDetail%26wlprsrRecentAwards_2_fedAwrdId%3D1534790\u0026amp;wsrp-mode=wsrp%3Aview\u0026amp;wsrp-windowState=\"\u003eNSF SBIR grant\u003c\/a\u003e to answer the key question: \"How can we personalize reading instruction so as to increase comprehension \u0026amp; learning?\" We are diving deep into the data with sophisticated machine learning tools, and bringing back testable hypotheses about what helps and hinders students.\u003c\/p\u003e\u003cp style=\"text-align: left;\"\u003e\u00a0\u003c\/p\u003e\u003cp style=\"text-align: left;\"\u003eThis work is ongoing since April, 2014.\u003c\/p\u003e","backupValue":null,"version":1},"text2":{"type":"RichText","id":"f_17eb06b2-628e-4610-90c3-4bfe0280afa4","defaultValue":false,"value":"\u003cp style=\"text-align: left;\"\u003e\u003cstrong\u003eFor: \u003ca target=\"_blank\" href=\"http:\/\/www.activelylearn.com\/\"\u003eActively Learn\u003c\/a\u003e \u003c\/strong\u003e\u003c\/p\u003e","backupValue":null,"version":1},"text1":{"type":"RichText","id":"f_bb00411a-80fb-4839-8607-431d2b0dafd0","defaultValue":false,"value":"\u003cp style=\"text-align: left;\"\u003eImproving Reading Comprehension\u003c\/p\u003e","backupValue":null,"version":1},"media1":{"type":"Media","id":"f_96ece3f7-4527-469a-af7f-951722b5205e","defaultValue":false,"video":{"type":"Video","id":"f_87bd6357-cb61-4888-8e39-41f2a571888b","defaultValue":null,"html":"","url":"","thumbnail_url":null,"maxwidth":700,"description":null},"image":{"type":"Image","id":"f_88a34a04-6a19-452c-bb24-540b11df969e","defaultValue":false,"link_url":"","thumb_url":"!","url":"!","caption":"","description":"","storageKey":"174108\/actively_learn_2_tvds7b","storage":"c","storagePrefix":null,"format":"png","h":720,"w":720,"s":22906,"new_target":true,"noCompression":null,"cropMode":null,"focus":{}},"current":"image"},"button1":{"type":"Button","id":"f_8090c8ef-1b23-460a-a8a1-b462ab13c93d","defaultValue":true,"text":"","link_type":null,"page_id":null,"section_id":null,"url":"","new_target":null}}},{"type":"RepeatableItem","id":"f_4a0a3520-57b2-46f3-8f13-a807d6317f07","defaultValue":null,"components":{"text3":{"type":"RichText","id":"f_7204c36a-95eb-41cf-9e07-dd61e5727fbb","defaultValue":false,"value":"\u003cp style=\"text-align: left;\"\u003e\u00a0\u003c\/p\u003e\u003cp style=\"text-align: left;\"\u003eJenny Dearborn, Chief Learning Officer and Senior Vice President at SAP, has written \u003ca target=\"_blank\" href=\"http:\/\/www.wiley.com\/WileyCDA\/WileyTitle\/productCd-1119043123.html\"\u003eData Driven\u003c\/a\u003e, a \"practical guide to increasing sales success, using the power of data analytics,\" and \u003ca href=\"http:\/\/www.wiley.com\/WileyCDA\/WileyTitle\/productCd-1119382203.html\"\u003eThe Data Driven Leader\u003c\/a\u003e (with David Swanson), \"a clear, accessible guide to solving important leadership challenges through human resources-focused and other data analytics.\"\u003c\/p\u003e\u003cp style=\"text-align: left;\"\u003e\u00a0\u003c\/p\u003e\u003cp style=\"text-align: left;\"\u003eWe helped her and her team come up with clear and compelling ways to communicate the deep mathematical models that are at the core of the book, as well as contributed to the plot and characterizations.\u003c\/p\u003e","backupValue":null,"version":1},"text2":{"type":"RichText","id":"f_6b55f81b-f60b-4c8e-8c1e-f71a8e785a66","defaultValue":false,"value":"\u003cp style=\"text-align: left;\"\u003e\u003cstrong\u003e\u003cstrong\u003eFor\u003c\/strong\u003e: \u003ca target=\"_blank\" href=\"http:\/\/jennydearborn.com\/\"\u003eJenny Dearborn\u003c\/a\u003e \u003c\/strong\u003e\u003c\/p\u003e","backupValue":null,"version":1},"text1":{"type":"RichText","id":"f_9231dfad-bb97-4d6d-a9a8-c73935866821","defaultValue":false,"value":"\u003cp style=\"text-align: left;\"\u003eContributing to Technical Books\u003c\/p\u003e","backupValue":null,"version":1},"media1":{"type":"Media","id":"f_ca6b1514-9f8b-4b5a-94ce-9883bc2742bc","defaultValue":false,"video":{"type":"Video","id":"f_60f73773-7591-4b94-bbf2-1e9bb14c173d","defaultValue":null,"html":"","url":"","thumbnail_url":null,"maxwidth":700,"description":null},"image":{"type":"Image","id":"f_9a39eea1-abe4-4c56-970b-b8406710a1d4","defaultValue":false,"link_url":"","thumb_url":"!","url":"!","caption":"","description":"","storageKey":"174108\/data_driven_books_hsbtil","storage":"c","storagePrefix":null,"format":"jpg","h":331,"w":428,"s":44543,"new_target":true,"noCompression":null,"cropMode":null,"focus":{}},"current":"image"},"button1":{"type":"Button","id":"f_44517dde-82fa-4666-a9de-34df6e0fbf02","defaultValue":true,"text":"","link_type":null,"page_id":null,"section_id":null,"url":"","new_target":null}}},{"type":"RepeatableItem","id":"f_562073b0-8955-420f-8382-69d79009d32e","defaultValue":null,"components":{"text3":{"type":"RichText","id":"f_cba5ab95-6af4-4571-955b-0dd33390fcef","defaultValue":false,"value":"\u003cp style=\"text-align: left;\"\u003e\u00a0\u003c\/p\u003e\u003cp style=\"text-align: left;\"\u003eSeattle Against Slavery mobilizes the community in the fight against labor and sex trafficking through education, advocacy, and collaboration with local and national partners. We are proud to provide them with analytics and statistics services on a volunteer basis.\u003c\/p\u003e","backupValue":null,"version":1},"text2":{"type":"RichText","id":"f_857f12c7-09c7-4fca-978d-6fe22ae84305","defaultValue":false,"value":"\u003cp style=\"text-align: left;\"\u003e\u003cstrong\u003eFor: \u003ca href=\"https:\/\/www.seattleagainstslavery.org\/\"\u003eSeattle Against Slavery\u003c\/a\u003e\u003c\/strong\u003e\u003c\/p\u003e","backupValue":null,"version":1},"text1":{"type":"RichText","id":"f_1efff26d-0616-46d1-bb59-59cc986b58a1","defaultValue":false,"value":"\u003cp style=\"text-align: left;\"\u003ePro Bono Data Science\u003c\/p\u003e","backupValue":null,"version":1},"media1":{"type":"Media","id":"f_6efd3354-bc13-41eb-a67e-6b372a1f904b","defaultValue":null,"video":{"type":"Video","id":"f_247d3d8f-2419-4fe8-a58b-e1e45857fa1b","defaultValue":null,"html":"","url":"","thumbnail_url":null,"maxwidth":700,"description":null},"image":{"type":"Image","id":"f_5fa881d1-ba9a-40dd-97d0-e03c7befb158","defaultValue":false,"link_url":"","thumb_url":"!","url":"!","caption":"","description":"","storageKey":"174108\/SaS_logo_blue-small_uc2b3o","storage":"c","storagePrefix":null,"format":"gif","h":414,"w":464,"s":14774,"new_target":true,"noCompression":null,"cropMode":null,"focus":{}},"current":"image"},"button1":{"type":"Button","id":"f_1373f375-fece-46d4-a912-258765ef3ee6","defaultValue":true,"text":"","link_type":null,"page_id":null,"section_id":null,"url":"","new_target":null}}},{"type":"RepeatableItem","id":"f_4159182e-fbaf-4c29-8fb0-eabc020fda04","defaultValue":null,"components":{"text3":{"type":"RichText","id":"f_f5ac93b3-096f-4fae-a094-2310d369836b","defaultValue":false,"value":"\u003cp style=\"text-align: left;\"\u003e\u00a0\u003c\/p\u003e\u003cp style=\"text-align: left;\"\u003e\u003cstrong\u003eLong Tail NLP-Based Recommendations\u003c\/strong\u003e. Most e-commerce recommendation engines have difficulty highlighting less frequently bought products, which is an issue that compounds itself and ends up recommending the same popular products over and over. We developed a language-based model for RichRelevance that identifies good recommendations based on comparisons of the product descriptions and description metadata rather than purchase data. This evens the playing field between newer products and the old standbys, so the recommendations have more variety and are generally more applicable.\u003c\/p\u003e\u003cp style=\"text-align: left;\"\u003e\u00a0\u003c\/p\u003e\u003cp style=\"text-align: left;\"\u003e\u003cstrong\u003eBayesian A\/B Testing. \u003c\/strong\u003eRichRelevance swears by their top-notch recommendations. But what's the right way to measure their efficacy? Sergey put together an intuitive, comprehensive Bayesian A\/B testing system that works for any KPI, and can provide direct answers to key customer questions like \"What is the probability that algorithm A has at least 5% lift over algorithm B?\u003c\/p\u003e\u003cp style=\"text-align: left;\"\u003e\u00a0\u003c\/p\u003e\u003cp style=\"text-align: left;\"\u003eRead all about this work in Sergey's three (archived) blog posts: \u003ca title=\"Bayesian A\/B Tests\" target=\"_blank\" href=\"https:\/\/web.archive.org\/web\/20160117035128\/http:\/\/engineering.richrelevance.com:80\/bayesian-ab-tests\/\"\u003e[1]\u003c\/a\u003e, \u003ca title=\"Bayesian Analysis of Normal Distributions with Python\" target=\"_blank\" href=\"https:\/\/web.archive.org\/web\/20160304040821\/http:\/\/engineering.richrelevance.com\/bayesian-analysis-of-normal-distributions-with-python\/\"\u003e[2]\u003c\/a\u003e, and \u003ca title=\"Bayesian A\/B Testing with a Log-Normal Model\" target=\"_blank\" href=\"https:\/\/web.archive.org\/web\/20160304034523\/http:\/\/engineering.richrelevance.com\/bayesian-ab-testing-with-a-log-normal-model\/\"\u003e[3]\u003c\/a\u003e.\u003c\/p\u003e\u003cp style=\"text-align: left;\"\u003e\u00a0\u003c\/p\u003e\u003cp style=\"text-align: left;\"\u003e\u003cstrong\u003eBandits for Online Recommendations.\u003c\/strong\u003e The most important piece of RichRelevance's impressive big data pipeline is their core recommendation system. It serves thousands of recommendations every minute, and it has to learn quickly from new data. Working with their analytics team, Sergey engineered a modern bandit-based approach to online recommendations that learns from less data, adapts easily to any optimization metric, and does not compromise quality at production-scale.\u003c\/p\u003e\u003cp style=\"text-align: left;\"\u003e\u00a0\u003c\/p\u003e\u003cp style=\"text-align: left;\"\u003eThree (now archived) blog posts describe the results of our research: \u003ca target=\"_blank\" href=\"https:\/\/web.archive.org\/web\/20161226125829\/http:\/\/engineering.richrelevance.com\/bandits-recommendation-systems\/\"\u003e[1]\u003c\/a\u003e, \u003ca target=\"_blank\" href=\"https:\/\/web.archive.org\/web\/20161226123730\/http:\/\/engineering.richrelevance.com\/recommendations-thompson-sampling\/\"\u003e[2]\u003c\/a\u003e, and \u003ca target=\"_blank\" href=\"https:\/\/web.archive.org\/web\/20161226121822\/http:\/\/engineering.richrelevance.com\/personalization-contextual-bandits\/\"\u003e[3]\u003c\/a\u003e.\u003c\/p\u003e","backupValue":null,"version":1},"text2":{"type":"RichText","id":"f_b721cbd1-4d96-4db0-a383-40b23bb18b54","defaultValue":false,"value":"\u003cp style=\"text-align: left;\"\u003e\u003cstrong\u003eFor: \u003ca href=\"http:\/\/www.richrelevance.com\/\"\u003eRichRelevance\u003c\/a\u003e \u003c\/strong\u003e\u003c\/p\u003e","backupValue":null,"version":1},"text1":{"type":"RichText","id":"f_c647854c-d234-4b3a-a4f3-18bceefb70ab","defaultValue":false,"value":"\u003cp style=\"text-align: left;\"\u003eMultiple Projects\u003c\/p\u003e","backupValue":null,"version":1},"media1":{"type":"Media","id":"f_ff39c258-7096-4562-99d2-5e48f0f179d1","defaultValue":null,"video":{"type":"Video","id":"f_247d3d8f-2419-4fe8-a58b-e1e45857fa1b","defaultValue":null,"html":"","url":"","thumbnail_url":null,"maxwidth":700,"description":null},"image":{"type":"Image","id":"f_5fa881d1-ba9a-40dd-97d0-e03c7befb158","defaultValue":false,"link_url":"","thumb_url":"!","url":"!","caption":"","description":"","storageKey":"174108\/rr_cyj3oa","storage":"c","storagePrefix":null,"format":"png","h":720,"w":720,"s":22703,"new_target":true,"noCompression":null,"cropMode":null,"focus":{}},"current":"image"},"button1":{"type":"Button","id":"f_c50f41fa-d36d-4223-a570-b67272fd1633","defaultValue":true,"text":"","link_type":null,"page_id":null,"section_id":null,"url":"","new_target":null}}}],"components":{"text3":{"type":"RichText","id":"f_7594b743-4d42-4d19-bfb9-978d40ab7755","defaultValue":null,"value":"Ixtapa, Mexico\u003cbr\u003eOpportunity Collaboration brings together nonprofit leaders, social entrepreneurs, and social investors to move together towards poverty alleviation. With Kip's help, Opportunity Collaboration's Facebook reach grew by up to 700 percent.","backupValue":null,"version":null},"text2":{"type":"RichText","id":"f_37ab4311-d044-4ffa-9c97-43621d24a27d","defaultValue":null,"value":"\u003cstrong\u003eMission: Social Change Leaders + Conversations + Beaches\u003c\/strong\u003e","backupValue":null,"version":null},"text1":{"type":"RichText","id":"f_b8d2779a-bff0-4fd2-ac4e-8d9ebcb826b8","defaultValue":null,"value":"Opportunity Collaboration","backupValue":null,"version":null},"media1":{"type":"Media","id":"f_85a102b5-9f37-4305-83ff-5097d8fcef09","defaultValue":null,"video":{"type":"Video","id":"f_d2076034-9ec2-4a50-a463-eec119f51007","defaultValue":null,"html":"","url":"","thumbnail_url":null,"maxwidth":700,"description":null},"image":{"type":"Image","id":"f_ffe437c3-89ef-46f9-9eff-960cf6b16d10","defaultValue":true,"link_url":"","thumb_url":"","url":"\/assets\/themes\/fresh\/logo3.png","caption":"","description":"","storageKey":null,"storage":null,"storagePrefix":null,"format":null,"h":null,"w":null,"s":null,"new_target":true,"noCompression":null,"cropMode":null,"focus":{}},"current":"image"}}},"text2":{"type":"RichText","id":"f_c0dc7a0a-c5b7-42a7-a21c-29a311725c6e","defaultValue":false,"value":"","backupValue":null,"version":1},"text1":{"type":"RichText","id":"f_35d9a22f-35b6-405c-aa9d-f420b8abb0fe","defaultValue":false,"value":"\u003cp style=\"text-align:center\"\u003ePAST AND ONGOING WORK\u003c\/p\u003e","backupValue":null,"version":1},"background1":{"type":"Background","id":"f_496ba060-8080-4645-81f0-eb8c01abb560","defaultValue":false,"url":null,"textColor":"dark","backgroundVariation":null,"sizing":null,"userClassName":"s-bg-white","linkUrl":null,"linkTarget":null,"videoUrl":null,"videoHtml":null,"storageKey":null,"storage":null,"format":null,"h":null,"w":null,"s":null,"useImage":null,"noCompression":null,"focus":{},"backgroundColor":{}},"slideSettings":{"type":"SlideSettings","id":"f_83d4d5e3-ad9f-4f11-96af-dbb587e03b55","defaultValue":false,"show_nav":true,"show_nav_multi_mode":null,"nameChanged":true,"hidden_section":null,"name":"PAST AND ONGOING WORK","sync_key":null,"layout_variation":"row-medium1-text-right","display_settings":{},"padding":{},"layout_config":{}}}},{"type":"Slide","id":"f_4a843bb2-9432-469c-8c0c-0e160da07f1c","defaultValue":null,"template_id":null,"template_name":"rows","template_version":null,"components":{"slideSettings":{"type":"SlideSettings","id":"f_fca3220e-6232-4526-8de9-4cc37294917f","defaultValue":null,"show_nav":true,"show_nav_multi_mode":null,"nameChanged":true,"hidden_section":false,"name":"PARTNERSHIPS","sync_key":null,"layout_variation":"row-medium1-text-right","display_settings":{},"padding":{},"layout_config":{}}}},{"type":"Slide","id":"f_091ff14d-a651-42cb-8aa6-392cc903cf63","defaultValue":null,"template_id":null,"template_name":"text","template_version":null,"components":{"slideSettings":{"type":"SlideSettings","id":"f_82999dd2-eca1-4b36-833c-85d21f022927","defaultValue":null,"show_nav":false,"show_nav_multi_mode":null,"nameChanged":null,"hidden_section":null,"name":"PUBLICATIONS","sync_key":null,"layout_variation":"text-one-text","display_settings":{},"padding":{},"layout_config":{}}}},{"type":"Slide","id":"f_ebdb3ae5-4ddd-43d0-add6-10d6249ccb79","defaultValue":null,"template_id":null,"template_name":"title","template_version":null,"components":{"slideSettings":{"type":"SlideSettings","id":"f_c2db69d9-8a0a-4723-bcb9-2cabec53c0ce","defaultValue":null,"show_nav":true,"show_nav_multi_mode":null,"nameChanged":null,"hidden_section":null,"name":"CONTACT","sync_key":null,"layout_variation":"center-subTop-full","display_settings":{},"padding":{},"layout_config":{}}}}],"title":"Work","description":"Data Cowboys is a data science and machine learning consulting cooperative, owned and run by professional consultants. We excel at using machine learning, AI, data science, and statistics tools to generate custom, practical solutions to complex real-world problems.","uid":"05ddcb0c-fc84-4b7e-b7df-5ef959b95299","path":"\/work","pageTitle":"Data Cowboys - Work","pagePassword":null,"memberOnly":null,"paidMemberOnly":null,"buySpecificProductList":{},"specificTierList":{},"pwdPrompt":null,"autoPath":true,"authorized":true},{"type":"Page","id":"f_b039a010-4494-48c8-8b03-27287bf4cc30","defaultValue":null,"sections":[{"type":"Slide","id":"f_ede10d1e-0164-4cd9-8218-c62c21a592d9","defaultValue":null,"template_id":null,"template_name":"title","template_version":null,"components":{"background1":{"type":"Background","id":"f_3392461b-fdb7-4d32-bfbd-263abafd60b9","defaultValue":false,"url":"!","textColor":"light","backgroundVariation":"","sizing":"cover","userClassName":null,"linkUrl":null,"linkTarget":null,"videoUrl":"","videoHtml":"","storageKey":"174108\/contour2_bhfkwz","storage":"c","format":"png","h":983,"w":2048,"s":83913,"useImage":true,"noCompression":null,"focus":{},"backgroundColor":{}},"media1":{"type":"Media","id":"f_1b4856cf-e364-4c72-8af3-ad5dcfdc9631","defaultValue":null,"video":{"type":"Video","id":"f_22613bca-6fcf-4e78-bd81-caa22d0c1c83","defaultValue":null,"html":"","url":"","thumbnail_url":null,"maxwidth":700,"description":null},"image":{"type":"Image","id":"f_7f7f2e60-e797-407e-a8ce-9ca68e3a6ccd","defaultValue":true,"link_url":null,"thumb_url":null,"url":"","caption":"","description":"","storageKey":null,"storage":null,"storagePrefix":null,"format":null,"h":null,"w":null,"s":null,"new_target":true,"noCompression":null,"cropMode":null,"focus":{}},"current":"image"},"text3":{"type":"RichText","id":"f_2f51e7e9-afaa-4758-8dd2-bb8166d59e07","defaultValue":null,"value":null,"backupValue":null,"version":null},"text2":{"type":"RichText","id":"f_c99e10f8-f1c5-4c53-94cb-f71b14ae075a","defaultValue":false,"value":"\u003cp style=\"font-size: 160%;\"\u003eTell us about your data challenges.\u003c\/p\u003e","backupValue":null,"version":1},"text1":{"type":"RichText","id":"f_46034566-a920-4a60-bfbd-42bd097b23f6","defaultValue":false,"value":"\u003cp style=\"text-align: center; font-size: 160%;\"\u003eILYA@DATA-COWBOYS.COM\u003c\/p\u003e","backupValue":null,"version":1},"slideSettings":{"type":"SlideSettings","id":"f_aa4f839e-6b16-413b-8f2c-badf78b89fd9","defaultValue":null,"show_nav":true,"show_nav_multi_mode":null,"nameChanged":null,"hidden_section":null,"name":"CONTACT","sync_key":null,"layout_variation":"center-subTop-full","display_settings":{},"padding":{},"layout_config":{}},"button1":{"type":"Button","id":"f_03cab037-6757-4a4f-bbbd-52ea8642517c","defaultValue":true,"text":"","link_type":null,"page_id":null,"section_id":null,"url":"","new_target":false}}}],"title":"Contact","description":"Data Cowboys is a data science and machine learning consulting cooperative, owned and run by professional consultants. We excel at using machine learning, AI, data science, and statistics tools to generate custom, practical solutions to complex real-world problems.","uid":"64443964-9faf-4e1b-b442-999a1cfacf48","path":"\/contact","pageTitle":"Data Cowboys - Contact","pagePassword":null,"memberOnly":null,"paidMemberOnly":null,"buySpecificProductList":{},"specificTierList":{},"pwdPrompt":null,"autoPath":true,"authorized":true},{"type":"Page","id":"f_dcb7b77f-d81c-433f-82df-41245139eaac","defaultValue":null,"sections":[{"type":"Slide","id":"f_68abbeff-13bf-4bfb-bc13-3f656603691b","defaultValue":null,"template_id":null,"template_name":"columns","template_version":null,"components":{"repeatable1":{"type":"Repeatable","id":"f_43c8c4e3-d93f-4679-be55-7f50223329f3","defaultValue":false,"list":[{"type":"RepeatableItem","id":"f_15b41d54-2e33-443f-bd85-a2aaf846f102","defaultValue":null,"components":{"text3":{"type":"RichText","id":"f_65d69a06-fbc4-49b6-8fab-1cec4abdc7c8","defaultValue":false,"value":"\u003cp style=\"text-align: left; font-size: 130%;\"\u003eSergey Feldman has been working with data and designing machine learning algorithms since 2007. He's done both academic and real-world data wrangling, and loves to learn about new domains in order to build the perfect solution for the problem at hand. Sergey has a PhD in machine learning from the University of Washington, and is also an expert in natural language processing, statistics, and signal processing.\u003c\/p\u003e\u003cp style=\"text-align: left; font-size: 130%;\"\u003e\u00a0\u003c\/p\u003e\u003cp style=\"text-align: left; font-size: 130%;\"\u003eSergey is based in Seattle.\u003c\/p\u003e","backupValue":null,"version":1},"text2":{"type":"RichText","id":"f_7f9dc12e-db46-4b6c-aa4c-3fe6ae8a1b69","defaultValue":false,"value":"","backupValue":null,"version":1},"text1":{"type":"RichText","id":"f_39c4086c-005d-4dea-8197-9176e53f4224","defaultValue":false,"value":"\u003cp\u003eSergey Feldman\u003c\/p\u003e","backupValue":null,"version":1},"media1":{"type":"Media","id":"f_818de00d-25c3-4c04-9703-33e9911c569f","defaultValue":null,"video":{"type":"Video","id":"f_ad9cdf7a-e4f7-497e-9c4e-bc62aa3734e1","defaultValue":null,"html":"","url":"","thumbnail_url":null,"maxwidth":700,"description":null},"image":{"type":"Image","id":"f_ecf64181-94ea-4a89-a7d2-cc60004df871","defaultValue":false,"link_url":"","thumb_url":"!","url":"!","caption":"","description":"","storageKey":"174108\/696816_594199","storage":"s","storagePrefix":null,"format":"jpg","h":4032,"w":3024,"s":2611927,"new_target":true,"noCompression":null,"cropMode":"freshColumnLegacy","focus":null},"current":"image"},"button1":{"type":"Button","id":"f_96bf50f7-369e-4703-9dc2-0a9285060440","defaultValue":true,"text":"","link_type":null,"page_id":null,"section_id":null,"url":"","new_target":null}}},{"type":"RepeatableItem","id":"f_e385cfdf-4ad7-42bf-8419-98b34ae6bea0","defaultValue":null,"components":{"text3":{"type":"RichText","id":"f_5e428f87-ffef-4234-8c19-321f86cd3723","defaultValue":false,"value":"\u003cp style=\"text-align: left; font-size: 130%;\"\u003eIlya Barshai has been tackling machine learning and data science problems with Data Cowboys since 2016, and worked in risk and failure analysis of electromechanical product designs for 8 years prior. He has built deep natural language processing systems, recommendation engines and a variety of predictive and explanatory models in all sorts of domains. Ilya has a B.S. in electrical engineering from the University of Illinois at Chicago with a focus in control theory and signal processing, and has completed the Johns Hopkins Data Science specialization program.\u003cbr\u003e\u00a0\u003cbr\u003eIlya is based in Chicago.\u003c\/p\u003e","backupValue":null,"version":1},"text2":{"type":"RichText","id":"f_170ad144-1089-4cce-8a4d-6822f57bf8e4","defaultValue":false,"value":"","backupValue":null,"version":1},"text1":{"type":"RichText","id":"f_c39ef068-adf6-46d1-9da1-857dd5d117bb","defaultValue":false,"value":"\u003cp\u003eIlya Barshai\u003c\/p\u003e","backupValue":null,"version":1},"media1":{"type":"Media","id":"f_f98a1189-87c3-4fa4-9cd4-167b596cd002","defaultValue":false,"video":{"type":"Video","id":"f_2e36d0a7-771e-454a-83e1-be109e79f0b9","defaultValue":null,"html":"","url":"","thumbnail_url":null,"maxwidth":700,"description":null},"image":{"type":"Image","id":"f_f79bd5fe-ef08-4f92-a70c-c8e6fd36b541","defaultValue":false,"link_url":"","thumb_url":"!","url":"!","caption":"","description":"","storageKey":"174108\/ilya_headshot_sockru","storage":"s","storagePrefix":null,"format":"jpg","h":320,"w":320,"s":54535,"new_target":true,"noCompression":null,"cropMode":"freshColumnLegacy","focus":null},"current":"image"},"button1":{"type":"Button","id":"f_ec201be5-64cd-4b25-a991-84d4a8f6e11b","defaultValue":true,"text":"","link_type":null,"page_id":null,"section_id":null,"url":"","new_target":null}}}],"components":{"text3":{"type":"RichText","id":"f_aabaf505-d1f7-44bb-9e24-679d99bfc5a4","defaultValue":null,"value":"Enter a description here.","backupValue":null,"version":null},"text2":{"type":"RichText","id":"f_6d69f391-e4eb-4350-9526-268753953666","defaultValue":null,"value":"Your Title","backupValue":null,"version":null},"text1":{"type":"RichText","id":"f_d9c269ef-ba69-4fb3-8f0a-275eba405a5d","defaultValue":null,"value":"Your Name","backupValue":null,"version":null},"media1":{"type":"Media","id":"f_39c138a9-85e9-4d2e-9b60-9b11cc814499","defaultValue":null,"video":{"type":"Video","id":"f_01fb06fb-d52d-48c3-a9af-50296030ad02","defaultValue":null,"html":"","url":"","thumbnail_url":null,"maxwidth":700,"description":null},"image":{"type":"Image","id":"f_a13ad4e6-5a6d-4e55-bab7-e89801a59f1c","defaultValue":true,"link_url":"","thumb_url":"","url":"\/assets\/themes\/fresh\/pip.png","caption":"","description":"","storageKey":null,"storage":null,"storagePrefix":null,"format":null,"h":null,"w":null,"s":null,"new_target":true,"noCompression":null,"cropMode":"freshColumnLegacy","focus":{}},"current":"image"}}},"text2":{"type":"RichText","id":"f_eca9dcf0-65bf-4927-8767-bbafa80c6346","defaultValue":false,"value":"","backupValue":"","version":1},"text1":{"type":"RichText","id":"f_2d8725c4-cf08-4677-865f-3e6f6f49f1db","defaultValue":false,"value":"\u003cp style=\"text-align:center\"\u003eTEAM\u003c\/p\u003e","backupValue":null,"version":1},"background1":{"type":"Background","id":"f_15a7bfbc-8894-4c60-babe-ee82c3d574a2","defaultValue":false,"url":"","textColor":"light","backgroundVariation":"","sizing":null,"userClassName":"s-bg-white","linkUrl":null,"linkTarget":null,"videoUrl":"","videoHtml":"","storageKey":null,"storage":null,"format":null,"h":null,"w":null,"s":null,"useImage":false,"noCompression":null,"focus":{},"backgroundColor":{}},"slideSettings":{"type":"SlideSettings","id":"f_8a6ae540-f5a8-4468-aeb7-f1c639022a03","defaultValue":false,"show_nav":true,"show_nav_multi_mode":null,"nameChanged":null,"hidden_section":null,"name":"WHO WE ARE","sync_key":null,"layout_variation":"col-three-text","display_settings":{},"padding":{},"layout_config":{"isNewMobileLayout":true}}}},{"type":"Slide","id":"f_a3ecfb5d-3972-42c4-b005-04642f243f8e","defaultValue":null,"template_id":null,"template_name":"title","template_version":null,"components":{"slideSettings":{"type":"SlideSettings","id":"f_75556e28-ae2d-44c8-9b82-d85fd07f4df5","defaultValue":null,"show_nav":true,"show_nav_multi_mode":null,"nameChanged":null,"hidden_section":null,"name":"CONTACT","sync_key":null,"layout_variation":"center-subTop-full","display_settings":{},"padding":{},"layout_config":{}}}}],"title":"Team","description":"Data Cowboys is a data science and machine learning consulting cooperative, owned and run by professional consultants. We excel at using machine learning, AI, data science, and statistics tools to generate custom, practical solutions to complex real-world problems.","uid":"5dd84800-5ae5-4a19-9c36-3807551ed974","path":"\/team","pageTitle":"Data Cowboys - Team","pagePassword":null,"memberOnly":null,"paidMemberOnly":null,"buySpecificProductList":{},"specificTierList":{},"pwdPrompt":null,"autoPath":true,"authorized":true}],"menu":{"type":"Menu","id":"f_261c075b-965c-4ecd-8a4b-9ae9990e059d","defaultValue":null,"template_name":"navbar","logo":null,"components":{"button1":{"type":"Button","id":"f_549fa287-e758-469d-b3ae-00fa679f4c30","defaultValue":null,"text":"Add a button","link_type":null,"page_id":null,"section_id":null,"url":"http:\/\/strikingly.com","new_target":null},"text2":{"type":"RichText","id":"f_c2ba5401-f651-43ad-b6a7-f62bdb65be7d","defaultValue":null,"value":"Subtitle Text","backupValue":null,"version":null},"text1":{"type":"RichText","id":"f_27a88b5f-c981-430a-b7c2-278145209c3a","defaultValue":null,"value":"Title Text","backupValue":null,"version":null},"image2":{"type":"Image","id":"f_5142bdd9-647f-42c2-ac7d-9b89487c8b80","defaultValue":null,"link_url":"#1","thumb_url":"\/assets\/icons\/transparent.png","url":"\/assets\/icons\/transparent.png","caption":"","description":"","storageKey":null,"storage":null,"storagePrefix":null,"format":null,"h":null,"w":null,"s":null,"new_target":true,"noCompression":null,"cropMode":null,"focus":{}},"image1":{"type":"Image","id":"f_29a158b6-f31c-4020-bd0e-52f3a9344926","defaultValue":false,"link_url":"","thumb_url":"!","url":"!","caption":"","description":"","storageKey":"174108\/6b5810c2e5c14b8c8251376818e398e0_cvuc2s","storage":"c","storagePrefix":null,"format":"png","h":302,"w":720,"s":38189,"new_target":true,"noCompression":null,"cropMode":null,"focus":{}},"background1":{"type":"Background","id":"f_a6d372ab-8812-4408-8260-53cdaaecf3f0","defaultValue":null,"url":"https:\/\/uploads.strikinglycdn.com\/static\/backgrounds\/striking-pack-2\/28.jpg","textColor":"light","backgroundVariation":"","sizing":"cover","userClassName":null,"linkUrl":null,"linkTarget":null,"videoUrl":"","videoHtml":"","storageKey":null,"storage":null,"format":null,"h":null,"w":null,"s":null,"useImage":null,"noCompression":null,"focus":{},"backgroundColor":{}}}},"footer":{"type":"Footer","id":"f_269821f4-206d-4b81-874f-5bc794ddf928","defaultValue":null,"socialMedia":{"type":"SocialMediaList","id":"f_4347ccc4-02e2-44a1-b00f-e2ce02e182d1","defaultValue":null,"link_list":[{"type":"Facebook","id":"f_d802bbe5-6891-4692-934a-92adcd956498","defaultValue":null,"url":"","link_url":"","share_text":"Data Cowboys, LLC is a Data Science \u0026 Machine Learning consultancy staffed by experienced PhDs, working out of Seattle and San Francisco.","show_button":null,"app_id":138736959550286},{"type":"Twitter","id":"f_1af2283e-2014-4dd6-a620-f5afbb3003a1","defaultValue":null,"url":"","link_url":"","share_text":"Saw an awesome one pager. Check it out #strikingly","show_button":null},{"type":"GPlus","id":"f_ab9d6ea6-2905-480f-a19a-bbbf3448d66a","defaultValue":null,"url":"","link_url":"","share_text":"Data Cowboys, LLC is a Data Science \u0026 Machine Learning consultancy staffed by experienced PhDs, working out of Seattle and San Francisco.","show_button":null}],"button_list":[{"type":"Facebook","id":"f_c8682a32-ae3d-4166-aac9-e88a288fff8d","defaultValue":null,"url":"","link_url":"","share_text":"Data Cowboys, LLC is a Data Science \u0026 Machine Learning consultancy staffed by experienced PhDs, working out of Seattle and San Francisco.","show_button":false,"app_id":138736959550286},{"type":"Twitter","id":"f_e290c403-39a2-4871-b3a3-b464bb212afe","defaultValue":null,"url":"","link_url":"","share_text":"Saw an awesome one pager. Check it out @strikingly","show_button":false},{"type":"GPlus","id":"f_f13824e5-af29-476e-95f0-f7130e19edf8","defaultValue":null,"url":"","link_url":"","share_text":"Data Cowboys, LLC is a Data Science \u0026 Machine Learning consultancy staffed by experienced PhDs, working out of Seattle and San Francisco.","show_button":false},{"type":"LinkedIn","id":"f_53e266d2-3f6e-4f11-9031-ee849e94259e","defaultValue":null,"url":"","link_url":"","share_text":"Data Cowboys, LLC is a Data Science \u0026 Machine Learning consultancy staffed by experienced PhDs, working out of Seattle and San Francisco.","show_button":false}],"list_type":null},"copyright":{"type":"RichText","id":"f_d871c121-20b6-40c1-91d3-b46662fd0f54","defaultValue":null,"value":"\u003cdiv\u003e\u00a9\u00a02014\u003c\/div\u003e","backupValue":null,"version":null},"components":{"copyright":{"type":"RichText","id":"f_06443c13-7f62-450d-85b2-da111b90c536","defaultValue":null,"value":"\u003cdiv\u003e\u00a9\u00a02014\u003c\/div\u003e","backupValue":null,"version":null},"socialMedia":{"type":"SocialMediaList","id":"f_1f1fd053-deaa-44a5-84ad-dd9a0fa1d8a3","defaultValue":null,"link_list":[{"type":"Facebook","id":"f_04d1015d-f6a4-4bb0-8685-02648b7a0308","defaultValue":null,"url":"","link_url":"","share_text":"Data Cowboys, LLC is a Data Science \u0026 Machine Learning consultancy staffed by experienced PhDs, working out of Seattle and San Francisco.","show_button":null,"app_id":138736959550286},{"type":"Twitter","id":"f_3fa9ce02-aeca-476f-8361-ecc93a2ab544","defaultValue":null,"url":"","link_url":"","share_text":"Saw an awesome one pager. Check it out #strikingly","show_button":null},{"type":"GPlus","id":"f_7d8af652-1f0f-4a04-8306-1b25bae5740d","defaultValue":null,"url":"","link_url":"","share_text":"Data Cowboys, LLC is a Data Science \u0026 Machine Learning consultancy staffed by experienced PhDs, working out of Seattle and San Francisco.","show_button":null}],"button_list":[{"type":"Facebook","id":"f_d6eb4361-dae0-4037-b0e9-582383bbcfba","defaultValue":null,"url":"","link_url":"","share_text":"Data Cowboys, LLC is a Data Science \u0026 Machine Learning consultancy staffed by experienced PhDs, working out of Seattle and San Francisco.","show_button":false,"app_id":138736959550286},{"type":"Twitter","id":"f_57df9480-db0e-4616-bee8-2e5c011cd4bf","defaultValue":null,"url":"","link_url":"","share_text":"Saw an awesome one pager. Check it out @strikingly","show_button":false},{"type":"GPlus","id":"f_c05e275e-6b8b-4493-a2d8-e54b064c80b0","defaultValue":null,"url":"","link_url":"","share_text":"Data Cowboys, LLC is a Data Science \u0026 Machine Learning consultancy staffed by experienced PhDs, working out of Seattle and San Francisco.","show_button":false},{"type":"LinkedIn","id":"f_89ea920e-4de1-45af-a3c1-ad7ba9fcbaba","defaultValue":null,"url":"","link_url":"","share_text":"Data Cowboys, LLC is a Data Science \u0026 Machine Learning consultancy staffed by experienced PhDs, working out of Seattle and San Francisco.","show_button":false}],"list_type":null}},"layout_variation":null,"padding":{}},"submenu":{"type":"SubMenu","id":"f_84887124-c210-4990-b7e9-912d37c514d0","defaultValue":null,"list":[],"components":{"link":{"type":"Button","id":"f_3638681d-7e19-45fc-9e8b-f154ef131daa","defaultValue":null,"text":"Facebook","link_type":null,"page_id":null,"section_id":null,"url":"http:\/\/www.facebook.com","new_target":true}}},"customColors":{"type":"CustomColors","id":"f_10731d12-f244-40ab-bda4-b83cb62bb89c","defaultValue":null,"active":true,"highlight1":null,"highlight2":null},"animations":{"type":"Animations","id":"f_4b94b139-5785-4649-8e36-2b255ae2318d","defaultValue":null,"page_scroll":"none","background":"parallax","image_link_hover":"none"},"s5Theme":{"type":"Theme","id":"f_1f1611a8-fa62-46cd-8b1c-42254169093d","version":"10","nav":{"type":"NavTheme","id":"f_f37195e2-ea4d-4ef0-b6f2-035854051a76","name":"topBar","layout":"a","padding":"medium","sidebarWidth":"small","topContentWidth":"full","horizontalContentAlignment":"left","verticalContentAlignment":"top","fontSize":"medium","backgroundColor1":"#dddddd","highlightColor":null,"presetColorName":"transparent","itemSpacing":"compact","dropShadow":"no","socialMediaListType":"link","isTransparent":true,"isSticky":true,"showSocialMedia":false,"highlight":{"type":"underline","textColor":null,"blockTextColor":null,"blockBackgroundColor":null,"blockShape":"pill","id":"f_11112f65-9d14-4512-91fa-a8a7f13b3365"},"border":{"enable":false,"borderColor":"#000","position":"bottom","thickness":"small"},"socialMedia":[],"socialMediaButtonList":[{"type":"Facebook","id":"d5d40f68-9e33-11ef-955f-15ccbf3d509c","url":"","link_url":"","share_text":"","show_button":false},{"type":"Twitter","id":"d5d40f69-9e33-11ef-955f-15ccbf3d509c","url":"","link_url":"","share_text":"","show_button":false},{"type":"LinkedIn","id":"d5d40f6a-9e33-11ef-955f-15ccbf3d509c","url":"","link_url":"","share_text":"","show_button":false},{"type":"Pinterest","id":"d5d40f6b-9e33-11ef-955f-15ccbf3d509c","url":"","link_url":"","share_text":"","show_button":false}],"socialMediaContactList":[{"type":"SocialMediaPhone","id":"d5d40f6e-9e33-11ef-955f-15ccbf3d509c","defaultValue":"","className":"fas fa-phone-alt"},{"type":"SocialMediaEmail","id":"d5d40f6f-9e33-11ef-955f-15ccbf3d509c","defaultValue":"","className":"fas fa-envelope"}]},"section":{"type":"SectionTheme","id":"f_dca510a1-da18-494e-80e8-68691a8754ca","padding":"normal","contentWidth":"full","contentAlignment":"center","baseFontSize":null,"titleFontSize":null,"subtitleFontSize":null,"itemTitleFontSize":null,"itemSubtitleFontSize":null,"textHighlightColor":null,"baseColor":null,"titleColor":null,"subtitleColor":null,"itemTitleColor":null,"itemSubtitleColor":null,"textHighlightSelection":{"type":"TextHighlightSelection","id":"f_09d5d3aa-2b97-44be-a1b9-c7a542c1dc39","title":false,"subtitle":true,"itemTitle":false,"itemSubtitle":true}},"firstSection":{"type":"FirstSectionTheme","id":"f_d7d2158c-1fb7-4f6a-b207-d4025ba4e365","height":"normal","shape":"none"},"button":{"type":"ButtonTheme","id":"f_d24e411e-decf-43e2-ab4a-4cbb15e28ec3","backgroundColor":"#000000","shape":"square","fill":"solid"}},"navigation":{"items":[{"type":"page","id":"77c9e0f9-c8df-4bef-b786-4638f0aaed73","visibility":true},{"id":"05ddcb0c-fc84-4b7e-b7df-5ef959b95299","type":"page","visibility":true},{"id":"5dd84800-5ae5-4a19-9c36-3807551ed974","type":"page","visibility":true},{"id":"64443964-9faf-4e1b-b442-999a1cfacf48","type":"page","visibility":true}],"links":[]}}};$S.siteData={"terms_text":null,"privacy_policy_text":null,"show_terms_and_conditions":false,"show_privacy_policy":false,"gdpr_html":null,"live_chat":false};$S.stores={"fonts_v2":[{"name":"bebas neue","fontType":"hosted","displayName":"Bebas Neue","cssValue":"\"bebas neue\", bebas, helvetica","settings":null,"hidden":false,"cssFallback":"sans-serif","disableBody":true,"isSuggested":true},{"name":"varela round","fontType":"google","displayName":"Varela Round","cssValue":"\"varela round\"","settings":{"weight":"regular"},"hidden":false,"cssFallback":"sans-serif","disableBody":false,"isSuggested":false},{"name":"work sans","fontType":"google","displayName":"Work Sans","cssValue":"work sans, helvetica","settings":{"weight":"400,600,700"},"hidden":false,"cssFallback":"sans-serif","disableBody":null,"isSuggested":true},{"name":"helvetica","fontType":"system","displayName":"Helvetica","cssValue":"helvetica, arial","settings":null,"hidden":false,"cssFallback":"sans-serif","disableBody":false,"isSuggested":false}],"features":{"allFeatures":[{"name":"ecommerce_shipping_region","canBeUsed":true,"hidden":false},{"name":"ecommerce_taxes","canBeUsed":true,"hidden":false},{"name":"ecommerce_category","canBeUsed":true,"hidden":false},{"name":"product_page","canBeUsed":true,"hidden":false},{"name":"ecommerce_free_shipping","canBeUsed":true,"hidden":false},{"name":"ecommerce_custom_product_url","canBeUsed":true,"hidden":false},{"name":"ecommerce_coupon","canBeUsed":true,"hidden":false},{"name":"ecommerce_checkout_form","canBeUsed":true,"hidden":false},{"name":"mobile_actions","canBeUsed":true,"hidden":false},{"name":"ecommerce_layout","canBeUsed":true,"hidden":false},{"name":"portfolio_layout","canBeUsed":true,"hidden":false},{"name":"analytics","canBeUsed":true,"hidden":false},{"name":"fb_image","canBeUsed":true,"hidden":false},{"name":"twitter_card","canBeUsed":true,"hidden":false},{"name":"favicon","canBeUsed":true,"hidden":false},{"name":"style_panel","canBeUsed":true,"hidden":false},{"name":"google_analytics","canBeUsed":true,"hidden":false},{"name":"blog_custom_url","canBeUsed":true,"hidden":false},{"name":"page_collaboration","canBeUsed":true,"hidden":false},{"name":"bookings","canBeUsed":true,"hidden":false},{"name":"membership","canBeUsed":true,"hidden":false},{"name":"social_feed_facebook_page","canBeUsed":true,"hidden":false},{"name":"premium_templates","canBeUsed":true,"hidden":false},{"name":"custom_domain","canBeUsed":true,"hidden":false},{"name":"premium_support","canBeUsed":true,"hidden":false},{"name":"remove_branding_title","canBeUsed":true,"hidden":false},{"name":"full_analytics","canBeUsed":true,"hidden":false},{"name":"ecommerce_layout","canBeUsed":true,"hidden":false},{"name":"portfolio_layout","canBeUsed":true,"hidden":false},{"name":"ecommerce_digital_download","canBeUsed":true,"hidden":false},{"name":"password_protection","canBeUsed":true,"hidden":false},{"name":"remove_logo","canBeUsed":true,"hidden":false},{"name":"optimizely","canBeUsed":true,"hidden":false},{"name":"custom_code","canBeUsed":true,"hidden":false},{"name":"blog_custom_code","canBeUsed":true,"hidden":false},{"name":"premium_assets","canBeUsed":true,"hidden":false},{"name":"premium_apps","canBeUsed":true,"hidden":false},{"name":"premium_sections","canBeUsed":true,"hidden":false},{"name":"blog_mailchimp_integration","canBeUsed":true,"hidden":false},{"name":"multiple_page","canBeUsed":true,"hidden":false},{"name":"ecommerce_layout","canBeUsed":true,"hidden":false},{"name":"portfolio_layout","canBeUsed":true,"hidden":false},{"name":"facebook_pixel","canBeUsed":true,"hidden":false},{"name":"blog_category","canBeUsed":true,"hidden":false},{"name":"custom_font","canBeUsed":true,"hidden":false},{"name":"blog_post_amp","canBeUsed":true,"hidden":false},{"name":"site_search","canBeUsed":true,"hidden":false},{"name":"portfolio_category","canBeUsed":true,"hidden":false},{"name":"popup","canBeUsed":true,"hidden":false},{"name":"custom_form","canBeUsed":true,"hidden":false},{"name":"portfolio_custom_product_url","canBeUsed":true,"hidden":false},{"name":"email_automation","canBeUsed":true,"hidden":false},{"name":"blog_password_protection","canBeUsed":true,"hidden":false},{"name":"custom_ads","canBeUsed":true,"hidden":false},{"name":"portfolio_form_custom_fields","canBeUsed":true,"hidden":false},{"name":"live_chat","canBeUsed":false,"hidden":false},{"name":"auto_translation","canBeUsed":false,"hidden":false},{"name":"membership_tier","canBeUsed":false,"hidden":false},{"name":"redirect_options","canBeUsed":false,"hidden":false},{"name":"portfolio_region_options","canBeUsed":false,"hidden":false},{"name":"require_contact_info_view_portfolio","canBeUsed":false,"hidden":false},{"name":"ecommerce_product_add_on_categories","canBeUsed":false,"hidden":false}]},"showStatic":{"footerLogoSeoData":{"anchor_link":"https:\/\/www.strikingly.com\/?ref=logo\u0026permalink=data-cowboys\u0026custom_domain=www.data-cowboys.com\u0026utm_campaign=footer_pbs\u0026utm_content=https%3A%2F%2Fwww.data-cowboys.com%2F\u0026utm_medium=user_page\u0026utm_source=174108\u0026utm_term=pbs_b","anchor_text":"How to build a website"},"isEditMode":false},"ecommerceProductCollection":null,"ecommerceProductOrderList":{},"ecommerceCategoryCollection":null,"hasEcommerceProducts":false,"portfolioCategoryCollection":null,"hasPortfolioProducts":false,"blogCategoryCollection":{},"hasBlogs":true};$S.liveBlog=true;
Return to site

SHAP Values and Feature Variance

Sergey Feldman

Intepretability is a Good Idea

My machine learning graduate program was technically excellent, but I had to learn how to (semi-)convincingly communicate with interdisciplinary collaborators the hard way: by failing a lot on the job. Before explainable/interpretable machine learning become a more popular research direction in 2016/2017, the end-product of my ML analyses often looked like this:

broken image

In other words, I thought demonstrating the success & importance of a ML-based analysis was the same as demonstrating methodological validity in an academic publication. This is wrong. My collaborators rarely cared about the results, and forgot them quickly. These days, I still show a table like the one above but I also show a SHAP values plot:

broken image

This image is taken directly from the SHAP Github repository. There are plenty of papers and other sources explaining SHAP values in detail, so I won't do that here. Briefly, each row is a feature/covariate input to a machine learning model, and each dot is a data point (sample). The x-axis is the SHAP value: how important a feature is for a particular sample in the model. Color is the original value of the feature. It takes some staring at this plot to fully internalize everything it's telling you, but showing it alongside standard ML results has been a great way to engage collaborators in other disciplines.

By the way, SHAP values are not the only way to interpret ML models, but they just happen to be one I like, and the SHAP library is simply excellent.

Reasons for Small SHAP Values

When looking at the SHAP value plots, what might be some reasons that certain variables/features are less important than others? If you had asked me this question a month ago, here is the list I would have given you:

  • The variable is measured in a noisy way.
  • The variable is not that causally related to the outcome of interest.
  • The variable is highly correlated with another variable that happens to be more predictive of the outcome.

Some of these are not that different from one another. For example, if a variable is noisy, then it will certainly look less related to the outcome of interest. The point isn't to have a theoretically iron-clad list of reasons, but to give the collaborators some idea of how ML models work.

Recently I chatted with a friend who does economics for a living, and he suggested another important reason that wasn't on my list: the variable has low variance. This wasn't immediately obvious to me, so I ran some simulations to gain intuition. My friend was right, and so I thought it would be a good idea to share the finding in case others have the same blind spot that I did.

Experimental Setup

Here's some Python code to generate the data:

The input is 4-dimensional, and we're only changing the variance of the 4th Bernoulli dimension. Bernoulli random variables are always 0 or 1, so all we are doing is changing the proportion of 1s and 0s. Tweaking the $p$ input doesn't at all change the ground-truth mapping between input $x$ and output $y$. Intuitively, I would have expected that SHAP values variable importance is not affected when changing $p$. Let's see what happens in practice with a simulation study:

broken image
broken image
broken image
broken image
broken image

You can see that when the variance is smallest ($p = 0.1$ and $p = 0.9$), the Bernoulli feature is at its lowest ranking. When the variance is largest ($p = 0.5$), the feature is at its highest ranking. Here is the how the sum of the absolute SHAP values looks when you plot it against $p$:

broken image

Exactly as described above: the overall importance is proportional to the variance. So we've learned that SHAP values are affected by the variance of the input feature.

Here all four variables are stable regardless of $p$. Range of SHAP values is also good to consider when thinking about a variable's overall interestingness. That is, a variable might not be predictive overall because it has low variance in our dataset, but it might very predictive for a small subset of the sample population.

Some Intuition

To gain some intuition about why this happens, let's think about a variable that is actually very causal for outcome $y$ but happens to be completely constant in our dataset. If $y$ is being able to survive and the Bernoulli feature $x_4$ is access to drinking water, then clearly $x_4$ is directly causal of $y$, but most human health datasets have $x_4 = 1$, and so the learned ML model would make no use of this variable; the SHAP values would all be zero. But imagine a dataset of 100k people where exactly one person has $x_4 = 1$ and the rest have $x_4 = 0$. Overall, there is little relationship in the data between $x_4$ and the outcome, and a ML model might not even notice this extremely rare sample as being predictive. The variance went from 0 to a little bit, and the SHAP values maybe went up a bit. The higher the variance rises, the more likely the model is to rely on this variable for making its decisions, and the larger the SHAP values will be in total.

, '
Return to site

SHAP Values and Feature Variance

Sergey Feldman

Intepretability is a Good Idea

My machine learning graduate program was technically excellent, but I had to learn how to (semi-)convincingly communicate with interdisciplinary collaborators the hard way: by failing a lot on the job. Before explainable/interpretable machine learning become a more popular research direction in 2016/2017, the end-product of my ML analyses often looked like this:

broken image

In other words, I thought demonstrating the success & importance of a ML-based analysis was the same as demonstrating methodological validity in an academic publication. This is wrong. My collaborators rarely cared about the results, and forgot them quickly. These days, I still show a table like the one above but I also show a SHAP values plot:

broken image

This image is taken directly from the SHAP Github repository. There are plenty of papers and other sources explaining SHAP values in detail, so I won't do that here. Briefly, each row is a feature/covariate input to a machine learning model, and each dot is a data point (sample). The x-axis is the SHAP value: how important a feature is for a particular sample in the model. Color is the original value of the feature. It takes some staring at this plot to fully internalize everything it's telling you, but showing it alongside standard ML results has been a great way to engage collaborators in other disciplines.

By the way, SHAP values are not the only way to interpret ML models, but they just happen to be one I like, and the SHAP library is simply excellent.

Reasons for Small SHAP Values

When looking at the SHAP value plots, what might be some reasons that certain variables/features are less important than others? If you had asked me this question a month ago, here is the list I would have given you:

  • The variable is measured in a noisy way.
  • The variable is not that causally related to the outcome of interest.
  • The variable is highly correlated with another variable that happens to be more predictive of the outcome.

Some of these are not that different from one another. For example, if a variable is noisy, then it will certainly look less related to the outcome of interest. The point isn't to have a theoretically iron-clad list of reasons, but to give the collaborators some idea of how ML models work.

Recently I chatted with a friend who does economics for a living, and he suggested another important reason that wasn't on my list: the variable has low variance. This wasn't immediately obvious to me, so I ran some simulations to gain intuition. My friend was right, and so I thought it would be a good idea to share the finding in case others have the same blind spot that I did.

Experimental Setup

Here's some Python code to generate the data:

The input is 4-dimensional, and we're only changing the variance of the 4th Bernoulli dimension. Bernoulli random variables are always 0 or 1, so all we are doing is changing the proportion of 1s and 0s. Tweaking the $p$ input doesn't at all change the ground-truth mapping between input $x$ and output $y$. Intuitively, I would have expected that SHAP values variable importance is not affected when changing $p$. Let's see what happens in practice with a simulation study:

broken image
broken image
broken image
broken image
broken image

You can see that when the variance is smallest ($p = 0.1$ and $p = 0.9$), the Bernoulli feature is at its lowest ranking. When the variance is largest ($p = 0.5$), the feature is at its highest ranking. Here is the how the sum of the absolute SHAP values looks when you plot it against $p$:

broken image

Exactly as described above: the overall importance is proportional to the variance. So we've learned that SHAP values are affected by the variance of the input feature.

Here all four variables are stable regardless of $p$. Range of SHAP values is also good to consider when thinking about a variable's overall interestingness. That is, a variable might not be predictive overall because it has low variance in our dataset, but it might very predictive for a small subset of the sample population.

Some Intuition

To gain some intuition about why this happens, let's think about a variable that is actually very causal for outcome $y$ but happens to be completely constant in our dataset. If $y$ is being able to survive and the Bernoulli feature $x_4$ is access to drinking water, then clearly $x_4$ is directly causal of $y$, but most human health datasets have $x_4 = 1$, and so the learned ML model would make no use of this variable; the SHAP values would all be zero. But imagine a dataset of 100k people where exactly one person has $x_4 = 1$ and the rest have $x_4 = 0$. Overall, there is little relationship in the data between $x_4$ and the outcome, and a ML model might not even notice this extremely rare sample as being predictive. The variance went from 0 to a little bit, and the SHAP values maybe went up a bit. The higher the variance rises, the more likely the model is to rely on this variable for making its decisions, and the larger the SHAP values will be in total.

], ['\\(', '\\)'] ], processEscapes: true } }); MathJax.Hub.Typeset() }])
Return to site

SHAP Values and Feature Variance

Sergey Feldman

Intepretability is a Good Idea

My machine learning graduate program was technically excellent, but I had to learn how to (semi-)convincingly communicate with interdisciplinary collaborators the hard way: by failing a lot on the job. Before explainable/interpretable machine learning become a more popular research direction in 2016/2017, the end-product of my ML analyses often looked like this:

broken image

In other words, I thought demonstrating the success & importance of a ML-based analysis was the same as demonstrating methodological validity in an academic publication. This is wrong. My collaborators rarely cared about the results, and forgot them quickly. These days, I still show a table like the one above but I also show a SHAP values plot:

broken image

This image is taken directly from the SHAP Github repository. There are plenty of papers and other sources explaining SHAP values in detail, so I won't do that here. Briefly, each row is a feature/covariate input to a machine learning model, and each dot is a data point (sample). The x-axis is the SHAP value: how important a feature is for a particular sample in the model. Color is the original value of the feature. It takes some staring at this plot to fully internalize everything it's telling you, but showing it alongside standard ML results has been a great way to engage collaborators in other disciplines.

By the way, SHAP values are not the only way to interpret ML models, but they just happen to be one I like, and the SHAP library is simply excellent.

Reasons for Small SHAP Values

When looking at the SHAP value plots, what might be some reasons that certain variables/features are less important than others? If you had asked me this question a month ago, here is the list I would have given you:

  • The variable is measured in a noisy way.
  • The variable is not that causally related to the outcome of interest.
  • The variable is highly correlated with another variable that happens to be more predictive of the outcome.

Some of these are not that different from one another. For example, if a variable is noisy, then it will certainly look less related to the outcome of interest. The point isn't to have a theoretically iron-clad list of reasons, but to give the collaborators some idea of how ML models work.

Recently I chatted with a friend who does economics for a living, and he suggested another important reason that wasn't on my list: the variable has low variance. This wasn't immediately obvious to me, so I ran some simulations to gain intuition. My friend was right, and so I thought it would be a good idea to share the finding in case others have the same blind spot that I did.

Experimental Setup

Here's some Python code to generate the data:

The input is 4-dimensional, and we're only changing the variance of the 4th Bernoulli dimension. Bernoulli random variables are always 0 or 1, so all we are doing is changing the proportion of 1s and 0s. Tweaking the $p$ input doesn't at all change the ground-truth mapping between input $x$ and output $y$. Intuitively, I would have expected that SHAP values variable importance is not affected when changing $p$. Let's see what happens in practice with a simulation study:

broken image
broken image
broken image
broken image
broken image

You can see that when the variance is smallest ($p = 0.1$ and $p = 0.9$), the Bernoulli feature is at its lowest ranking. When the variance is largest ($p = 0.5$), the feature is at its highest ranking. Here is the how the sum of the absolute SHAP values looks when you plot it against $p$:

broken image

Exactly as described above: the overall importance is proportional to the variance. So we've learned that SHAP values are affected by the variance of the input feature.

Here all four variables are stable regardless of $p$. Range of SHAP values is also good to consider when thinking about a variable's overall interestingness. That is, a variable might not be predictive overall because it has low variance in our dataset, but it might very predictive for a small subset of the sample population.

Some Intuition

To gain some intuition about why this happens, let's think about a variable that is actually very causal for outcome $y$ but happens to be completely constant in our dataset. If $y$ is being able to survive and the Bernoulli feature $x_4$ is access to drinking water, then clearly $x_4$ is directly causal of $y$, but most human health datasets have $x_4 = 1$, and so the learned ML model would make no use of this variable; the SHAP values would all be zero. But imagine a dataset of 100k people where exactly one person has $x_4 = 1$ and the rest have $x_4 = 0$. Overall, there is little relationship in the data between $x_4$ and the outcome, and a ML model might not even notice this extremely rare sample as being predictive. The variance went from 0 to a little bit, and the SHAP values maybe went up a bit. The higher the variance rises, the more likely the model is to rely on this variable for making its decisions, and the larger the SHAP values will be in total.