• PAST AND ONGOING WORK


    Deep Neural Networks for Natural Language Processing

    Sergey works part-time as a senior applied research scientist at AI2, on the Semantic Scholar research team. He's worked on many different projects, including:

    • A paper about gender bias in clinical trial recruitment, published in JAMA Network Open, along with news coverage.
    • A complete overhaul of the Semantic Scholar author disambiguation system, described in a published paper and a blog post. Also see the open-sourced code & data.
    • Two published methods for high-quality academic paper embeddings: Citeomatic (code) and SPECTER (code).
    • Improving the Semantic Scholar search engine, described in a detailed blog post. Code is available as well.
    • A blog post and paper about the association between posting your papers on arXiv before review and subsequent citations.

     

    This work has been ongoing since March 2016.

    Machine Learning Strategy Consulting

    The Healthy Birth, Growth, and Development (HBGD) program was launched in 2013 by the Bill & Melinda Gates Foundation.

     

    The Knowledge Integration (Ki) initiative facilitates collaboration between researchers, quantitative experts, and policy makers in fields related to HBGD. The broad goal is to aggregate data from past longitudinal studies about pathways and risk factors that affect birth, growth, and neurocognitive development in order to better predict Ki outcomes.

     

    Sergey works closely with Ki leadership: designing and overseeing data science contests; managing external collaborations with academic research labs and software companies; and modeling many diverse global health datasets (an example is described here).

     

    This work has been ongoing since February 2015.


    Open Source Contributions


    We contribute to the Python data science ecosystem.

     

    Most notably, Sergey co-wrote and maintains the imputation package fancyimpute, and merged IterativeImputer into the machine learning uber-library scikit-learn. Some other packages we've worked on:

    • https://github.com/allenai/S2AND/
    • https://github.com/allenai/s2_fos
    • https://github.com/allenai/specter/
    • https://github.com/allenai/scidocs/
    • https://github.com/allenai/s2search/
    • https://github.com/sergeyf/SmallDataBenchmarks/
    • https://github.com/allenai/citeomatic/
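As a rough illustration of what IterativeImputer does once merged into scikit-learn (a toy matrix with default settings, not drawn from fancyimpute itself):

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Toy matrix with missing entries; the columns are linearly related,
# so the imputer has a clear signal to exploit.
X = np.array([
    [1.0, 2.0, 3.0],
    [2.0, 4.0, 6.0],
    [3.0, np.nan, 9.0],
    [4.0, 8.0, np.nan],
])

# IterativeImputer models each feature with missing values as a function
# of the other features, cycling round-robin until imputations stabilize.
imputer = IterativeImputer(random_state=0, max_iter=20)
X_filled = imputer.fit_transform(X)
print(X_filled.round(1))
```

The imputed cells come out close to the values the linear relationships imply, rather than a column mean.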

    Improving Reading Comprehension

     

    Actively Learn makes a reading tool that enables teachers to guide, monitor, and improve student learning. With our help, they wrote and were awarded an NSF SBIR grant to answer the key question: "How can we personalize reading instruction so as to increase comprehension & learning?" We are diving deep into the data with sophisticated machine learning tools, and bringing back testable hypotheses about what helps and hinders students.

     

    This work has been ongoing since April 2014.


    Contributing to Technical Books

     

    Jenny Dearborn, Chief Learning Officer and Senior Vice President at SAP, has written Data Driven, a "practical guide to increasing sales success, using the power of data analytics," and The Data Driven Leader (with David Swanson), "a clear, accessible guide to solving important leadership challenges through human resources-focused and other data analytics."

     

    We helped her and her team find clear and compelling ways to communicate the deep mathematical models at the core of the books, and contributed to the plot and characterizations.


    Pro Bono Data Science

     

    Seattle Against Slavery mobilizes the community in the fight against labor and sex trafficking through education, advocacy, and collaboration with local and national partners. We are proud to provide them with analytics and statistics services on a volunteer basis.


    Multiple Projects

     

    Long Tail NLP-Based Recommendations. Most e-commerce recommendation engines have difficulty highlighting less frequently bought products, which is an issue that compounds itself and ends up recommending the same popular products over and over. We developed a language-based model for RichRelevance that identifies good recommendations based on comparisons of the product descriptions and description metadata rather than purchase data. This evens the playing field between newer products and the old standbys, so the recommendations have more variety and are generally more applicable.
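A minimal sketch of the content-based idea, using TF-IDF over hypothetical product descriptions (the names and text below are invented; the actual RichRelevance model was more sophisticated):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical catalog: two similar mugs (one brand-new, no sales yet)
# and an unrelated product.
catalog = {
    "popular_mug": "ceramic coffee mug, 12 oz, dishwasher safe",
    "new_mug": "stoneware coffee cup, 12 oz, dishwasher and microwave safe",
    "running_shoe": "lightweight running shoe with breathable mesh upper",
}
names = list(catalog)

# Represent each description as a TF-IDF vector and compare all pairs.
tfidf = TfidfVectorizer().fit_transform(catalog.values())
sim = cosine_similarity(tfidf)

# Recommend the most text-similar other product for the new item, so an
# item with zero purchase history can still surface alongside bestsellers.
i = names.index("new_mug")
ranked = sorted((s, n) for s, n in zip(sim[i], names) if n != "new_mug")
best = ranked[-1][1]
print(best)  # → popular_mug
```

Because the ranking uses only description text, a product added yesterday is on equal footing with the old standbys.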

     

    Bayesian A/B Testing. RichRelevance swears by their top-notch recommendations. But what's the right way to measure their efficacy? Sergey put together an intuitive, comprehensive Bayesian A/B testing system that works for any KPI, and can provide direct answers to key customer questions like "What is the probability that algorithm A has at least 5% lift over algorithm B?"
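That kind of question has a direct Monte Carlo answer under a standard Beta-Binomial model. The conversion counts below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical conversion counts for two recommendation algorithms.
conv_a, n_a = 130, 1000   # algorithm A: 130 conversions out of 1000
conv_b, n_b = 110, 1000   # algorithm B: 110 conversions out of 1000

# A Beta(1, 1) prior updated with the observed data gives a Beta
# posterior over each algorithm's true conversion rate.
samples_a = rng.beta(1 + conv_a, 1 + n_a - conv_a, size=200_000)
samples_b = rng.beta(1 + conv_b, 1 + n_b - conv_b, size=200_000)

# Direct answer: what fraction of posterior draws have A at least
# 5% (relative) above B?
p_lift = np.mean(samples_a > 1.05 * samples_b)
print(f"P(A beats B by >= 5% relative lift) = {p_lift:.3f}")
```

Unlike a p-value, this number is exactly the quantity the customer asked about.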

     

    Read all about this work in Sergey's three (archived) blog posts: [1], [2], and [3].

     

    Bandits for Online Recommendations. The most important piece of RichRelevance's impressive big data pipeline is their core recommendation system. It serves thousands of recommendations every minute, and it has to learn quickly from new data. Working with their analytics team, Sergey engineered a modern bandit-based approach to online recommendations that learns from less data, adapts easily to any optimization metric, and does not compromise quality at production scale.
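The core bandit idea fits in a few lines. This is a minimal Thompson-sampling toy with made-up click-through rates, not RichRelevance's production system:

```python
import random

random.seed(0)

# Hypothetical click-through rates for three candidate recommendations.
true_ctr = {"rec_a": 0.02, "rec_b": 0.08, "rec_c": 0.04}
alpha = {arm: 1 for arm in true_ctr}  # Beta(1, 1) prior per arm
beta = {arm: 1 for arm in true_ctr}

for _ in range(20_000):
    # Thompson sampling: draw a plausible CTR for each arm from its
    # posterior, and serve the arm whose draw is highest.
    arm = max(true_ctr, key=lambda a: random.betavariate(alpha[a], beta[a]))
    clicked = random.random() < true_ctr[arm]
    alpha[arm] += clicked
    beta[arm] += not clicked

# Traffic concentrates on the best arm as its posterior sharpens.
pulls = {arm: alpha[arm] + beta[arm] - 2 for arm in true_ctr}
print(max(pulls, key=pulls.get))
```

The appeal for online systems is that exploration falls out of the posterior automatically: uncertain arms still get traffic, but losing arms are abandoned quickly, so the system learns from less data than a fixed A/B split.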

     

    Three (now archived) blog posts describe the results of our research: [1], [2], and [3].

  • PARTNERSHIPS


    Preva Group

    Preva Group is dedicated to helping organizations achieve large-scale social change by combining existing structured and unstructured data with sophisticated analytics and machine learning, delivered through simple, user-centered interfaces. Data Cowboys operates a strategic employee-sharing partnership with Preva Group.

  • Publications

    Journal Papers

    • Michael Cafarella, Michael Anderson, Iz Beltagy, Arie Cattan, Sarah Chasins, Ido Dagan, Doug Downey, Oren Etzioni, Sergey Feldman, Tian Gao, Tom Hope, Kexin Huang, Sophie Johnson, Daniel King, Kyle Lo, Yuze Lou, Matthew Shapiro, Dinghao Shen, Shivashankar Subramanian, Lucy Lu Wang, Yuning Wang, Yitong Wang, Daniel S. Weld, Jenny Vo-Phamhi, Anna Zeng, and Jiayun Zou, "Infrastructure for rapid open knowledge network development," AI Magazine 43: 59–68, 2022. [pdf]
    • Sean MacAvaney, Sergey Feldman, Nazli Goharian, Doug Downey, and Arman Cohan, "ABNIRML: Analyzing the Behavior of Neural IR Models," Transactions of the Association for Computational Linguistics, 2022. [pdf]
    • Sergey Feldman, Waleed Ammar, Kyle Lo, Elly Trepman, Madeleine van Zuylen, and Oren Etzioni, "Quantifying Sex Bias in Clinical Studies at Scale With Automated Data Extraction," JAMA Network Open, 2019. [link]
    • Sergey Feldman, Maya R. Gupta, and Bela A. Frigyik, "Revisiting Stein's Paradox: Multi-Task Averaging," Journal of Machine Learning Research, 2014. [link]
    • Eric K. Garcia, Sergey Feldman, Maya R. Gupta, and Santosh Srivastava, "Completely Lazy Learning," IEEE Trans. on Knowledge and Data Engineering, 2010. [pdf] [code + data]
    • Vagisha Sharma, Jimmy K. Eng, Sergey Feldman, Priska von Haller, Michael J. MacCoss, and William S. Noble, "Precursor Charge State Prediction for Electron Transfer Dissociation Tandem Mass Spectra," Journal of Proteome Research, 2010. [pdf]

    Conference Papers

    • Shivashankar Subramanian, Daniel King, Doug Downey, and Sergey Feldman, "S2AND: A Benchmark and Evaluation System for Author Name Disambiguation," JCDL, 2021. [pdf] [code] [blog]
    • Asia J. Biega, Fernando Diaz, Michael D. Ekstrand, Sergey Feldman, Sebastian Kohlmeier, "Overview of the TREC 2020 Fair Ranking Track," TREC 2020. [pdf]
    • Sean MacAvaney, Andrew Yates, Sergey Feldman, Doug Downey, Arman Cohan, and Nazli Goharian, "Simplified Data Wrangling with ir_datasets," SIGIR, 2021. [pdf] [code]
    • Arman Cohan, Sergey Feldman, Iz Beltagy, Doug Downey, and Daniel S. Weld, "SPECTER: Document-level Representation Learning using Citation-informed Transformers," ACL, 2020. [link] [code] [data]
    • Chandra Bhagavatula, Sergey Feldman, Russell Power, and Waleed Ammar, "Content-Based Citation Recommendation," NAACL-HLT, 2018. [pdf]
    • Waleed Ammar, Dirk Groeneveld, Chandra Bhagavatula, Iz Beltagy, Miles Crawford, Doug Downey, Jason Dunkelberger, Ahmed Elgohary, Sergey Feldman, Vu Ha, Rodney Kinney, Sebastian Kohlmeier, Kyle Lo, Tyler Murray, Hsu-Han Ooi, Matthew Peters, Joanna Power, Sam Skjonsberg, Lucy Lu Wang, Chris Wilhelm, Zheng Yuan, Madeleine van Zuylen, and Oren Etzioni, "Construction of the Literature Graph in Semantic Scholar," NAACL-HLT, 2018. [pdf]
    • Sergey Feldman, Maya R. Gupta, and Bela A. Frigyik, "Multi-Task Averaging," NIPS, 2012. [pdf]
    • Luca Cazzanti, Sergey Feldman, Maya R. Gupta, and Michael Gabbay, "Multi-Task Regularization of Generative Similarity Models," Lecture Notes in Computer Science, 2011. [pdf]
    • Sergey Feldman, Marius A. Marin, Mari Ostendorf, and Maya R. Gupta, "Part-of-Speech Histogram Features for Genre Classification of Text," IEEE ICASSP, 2009. [pdf]
    • Marius A. Marin, Sergey Feldman, Mari Ostendorf, and Maya R. Gupta, "Filtering Web Text to Match Target Genres," IEEE ICASSP, 2009. [pdf]
    • Sergey Feldman, Marius A. Marin, Julie Medero, and Mari Ostendorf, "Classifying Factored Genres with Part-of-Speech Histograms," NAACL-HLT, 2009. [pdf]

     

    Theses & Technical Reports

    • Sergey Feldman, Kyle Lo, and Waleed Ammar, "Citation Count Analysis for Papers with Preprints," 2018. [pdf]
    • Sergey Feldman, "Multi-Task Averaging: Theory and Practice," University of Washington PhD Thesis, 2012. [pdf]
    • Sergey Feldman, Barbara Frewen, Michael J. MacCoss, and Maya R. Gupta, "Filtering Tandem Mass Spectra for Quality," University of Washington Dept. of Electrical Engineering Technical Report UWEETR-2012-0001, 2012. [pdf] [code + data]
  • PAST AND ONGOING WORK


    Deep Neural Networks for Natural Language Processing

    Sergey works part-time as a senior applied research scientist at AI2, on the Semantic Scholar research team. He's worked on many different projects, including:

    • A paper about gender bias in clinical trial recruitment published in JAMA Network Open, along with news coverage.
    • A complete overhaul of the Semantic Scholar author disambiguation system, described in a published paper and a blog post. Also, see the open-sourced code & data.
    • Two published methods for high-quality academic paper embeddings: Citeomatic (code) and SPECTER (code).
    • Improving the Semantic Scholar search engine, described in a detailed blog post. Code is available as well.
    • A blog post and paper about the association between posting your papers on arXiv before review and subsequent citations.
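The search work above is a learning-to-rank problem. As a hypothetical sketch (not the actual Semantic Scholar pipeline), a pairwise ranker can be trained by classifying the sign of feature differences between more- and less-relevant results; all feature names and data below are made up for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical query-document features: [text match score, log citation count, recency]
n_pairs = 500
relevant = rng.normal(loc=[1.0, 0.5, 0.2], scale=1.0, size=(n_pairs, 3))
irrelevant = rng.normal(loc=[0.0, 0.0, 0.0], scale=1.0, size=(n_pairs, 3))

# Pairwise transform: label 1 if the first document of the pair is the relevant one
X = np.vstack([relevant - irrelevant, irrelevant - relevant])
y = np.concatenate([np.ones(n_pairs), np.zeros(n_pairs)])

ranker = LogisticRegression().fit(X, y)

# Score documents with the learned weights; higher scores are ranked earlier
scores = relevant @ ranker.coef_.ravel()
```

A pairwise transform like this reduces ranking to binary classification, which is one standard way to bootstrap a ranker from click logs.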

     

    This work has been ongoing since March 2016.

    Machine Learning Strategy Consulting

    The Healthy Birth, Growth, and Development (HBGD) program was launched in 2013 by the Bill & Melinda Gates Foundation.

     

    The Knowledge Integration (Ki) initiative facilitates collaboration between researchers, quantitative experts, and policy makers in fields related to HBGD. The broad goal is to aggregate data from past longitudinal studies about pathways and risk factors that affect birth, growth, and neurocognitive development in order to better predict Ki outcomes.

     

    Sergey works closely with Ki leadership: designing and overseeing data science contests; managing external collaborations with academic research labs and software companies; and modeling many diverse global health datasets (an example is described here).

     

    This work has been ongoing since February 2015.


    Open Source Contributions

    For: Everyone

    We contribute to the Python data science ecosystem.

     

    Most notably, Sergey co-wrote and maintains the imputation package fancyimpute, and merged IterativeImputer into the machine learning uber-library scikit-learn. Some other packages we've worked on:

    • https://github.com/allenai/S2AND/
    • https://github.com/allenai/s2_fos
    • https://github.com/allenai/specter/
    • https://github.com/allenai/scidocs/
    • https://github.com/allenai/s2search/
    • https://github.com/sergeyf/SmallDataBenchmarks/
    • https://github.com/allenai/citeomatic/
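As a small illustration of the imputation work above, here is scikit-learn's IterativeImputer on a toy matrix (made-up data; each column with missing values is modeled as a function of the others):

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Toy matrix where the second column is roughly twice the first
X = np.array([
    [1.0, 2.0],
    [3.0, 6.0],
    [4.0, 8.0],
    [np.nan, 10.0],
    [7.0, np.nan],
])

imputer = IterativeImputer(random_state=0)
X_filled = imputer.fit_transform(X)
# Missing entries are now filled with regression-based estimates
```

Note the `enable_iterative_imputer` import: the estimator is still flagged as experimental in scikit-learn and must be enabled explicitly.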

    Improving Reading Comprehension

     

    Actively Learn makes a reading tool that enables teachers to guide, monitor, and improve student learning. With our help, they wrote and were awarded an NSF SBIR grant to answer the key question: "How can we personalize reading instruction so as to increase comprehension & learning?" We are diving deep into the data with sophisticated machine learning tools, and bringing back testable hypotheses about what helps and hinders students.

     

    This work has been ongoing since April 2014.


    Contributing to Technical Books

     

    Jenny Dearborn, Chief Learning Officer and Senior Vice President at SAP, has written Data Driven, a "practical guide to increasing sales success, using the power of data analytics," and The Data Driven Leader (with David Swanson), "a clear, accessible guide to solving important leadership challenges through human resources-focused and other data analytics."

     

    We helped her and her team come up with clear and compelling ways to communicate the deep mathematical models that are at the core of the book, as well as contributed to the plot and characterizations.


    Pro Bono Data Science

     

    Seattle Against Slavery mobilizes the community in the fight against labor and sex trafficking through education, advocacy, and collaboration with local and national partners. We are proud to provide them with analytics and statistics services on a volunteer basis.


    Multiple Projects

     

    Long Tail NLP-Based Recommendations. Most e-commerce recommendation engines struggle to surface less frequently bought products, an issue that compounds itself: the same popular products get recommended over and over. We developed a language-based model for RichRelevance that identifies good recommendations by comparing product descriptions and description metadata rather than purchase data. This levels the playing field between newer products and the old standbys, so the recommendations have more variety and are more broadly applicable.
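A minimal sketch of this kind of content-based matching, using TF-IDF over hypothetical product descriptions (not RichRelevance's actual model):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical catalog; the last product is new and has no purchase history
descriptions = [
    "waterproof hiking boots with ankle support",
    "lightweight trail running shoes, breathable mesh",
    "stainless steel kitchen knife set",
    "new arrival: insulated waterproof hiking boots",
]

X = TfidfVectorizer().fit_transform(descriptions)
sims = cosine_similarity(X)

# Recommend the existing catalog item most similar to the new product (index 3)
best_match = sims[3, :3].argmax()
```

Because similarity here depends only on the text, the brand-new product can be matched to related items immediately, with no purchase data required.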

     

    Bayesian A/B Testing. RichRelevance swears by their top-notch recommendations. But what's the right way to measure their efficacy? Sergey put together an intuitive, comprehensive Bayesian A/B testing system that works for any KPI, and can provide direct answers to key customer questions like "What is the probability that algorithm A has at least 5% lift over algorithm B?"
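A question like that falls straight out of posterior sampling. A minimal Beta-Binomial sketch with made-up conversion counts (the production system handled arbitrary KPIs, not just rates):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical conversion counts for the two algorithms
conversions_a, trials_a = 240, 2000
conversions_b, trials_b = 200, 2000

# Beta(1, 1) prior + binomial likelihood gives a Beta posterior over each rate
post_a = rng.beta(1 + conversions_a, 1 + trials_a - conversions_a, size=100_000)
post_b = rng.beta(1 + conversions_b, 1 + trials_b - conversions_b, size=100_000)

# Direct Monte Carlo estimate of "P(A has at least 5% lift over B)"
p_lift = np.mean(post_a >= 1.05 * post_b)
```

Unlike a p-value, `p_lift` is a direct answer to the customer's question, and the same posterior samples answer any other comparison ("at least 10% lift", expected loss, and so on) for free.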

     

    Read all about this work in Sergey's three (archived) blog posts: [1], [2], and [3].

     

    Bandits for Online Recommendations. The most important piece of RichRelevance's impressive big data pipeline is their core recommendation system. It serves thousands of recommendations every minute, and it has to learn quickly from new data. Working with their analytics team, Sergey engineered a modern bandit-based approach to online recommendations that learns from less data, adapts easily to any optimization metric, and does not compromise quality at production scale.
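The core idea can be sketched with Thompson sampling on simulated click-through rates (illustrative only; the production system was considerably more involved):

```python
import numpy as np

rng = np.random.default_rng(7)

true_ctr = [0.05, 0.10, 0.15]  # hypothetical click-through rate for each option
successes = np.ones(3)  # Beta(1, 1) priors over each option's CTR
failures = np.ones(3)

for _ in range(5000):
    # Sample a plausible CTR for each option and serve the best-looking one
    arm = int(np.argmax(rng.beta(successes, failures)))
    clicked = rng.random() < true_ctr[arm]
    successes[arm] += clicked
    failures[arm] += not clicked

pulls = successes + failures - 2  # how often each option was served
```

The sampler naturally balances exploration and exploitation: early on it tries everything, and as evidence accumulates it concentrates traffic on the best option without ever freezing out the others.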

     

    Three (now archived) blog posts describe the results of our research: [1], [2], and [3].

  • PARTNERSHIPS


    Preva Group

    Preva Group is dedicated to helping organizations achieve large-scale social change by combining existing structured and unstructured data, powered by sophisticated analytics and machine learning, delivered through simple user-centered interfaces. Data Cowboys operates a strategic employee sharing partnership with Preva Group.

  • Publications

    Journal Papers

    • Michael Cafarella, Michael Anderson, Iz Beltagy, Arie Cattan, Sarah Chasins, Ido Dagan, Doug Downey, Oren Etzioni, Sergey Feldman, Tian Gao, Tom Hope, Kexin Huang, Sophie Johnson, Daniel King, Kyle Lo, Yuze Lou, Matthew Shapiro, Dinghao Shen, Shivashankar Subramanian, Lucy Lu Wang, Yuning Wang, Yitong Wang, Daniel S. Weld, Jenny Vo-Phamhi, Anna Zeng, and Jiayun Zou, "Infrastructure for rapid open knowledge network development", AI Magazine 43: 59–68, 2022. [pdf]
    • Sean MacAvaney, Sergey Feldman, Nazli Goharian, Doug Downey, and Arman Cohan, "ABNIRML: Analyzing the Behavior of Neural IR Models," Transactions of the Association for Computational Linguistics, 2022. [pdf]
    • Sergey Feldman, Waleed Ammar, Kyle Lo, Elly Trepman, Madeleine van Zuylen, and Oren Etzioni, "Quantifying Sex Bias in Clinical Studies at Scale With Automated Data Extraction," JAMA Network Open, 2019. [link]
    • Sergey Feldman, Maya R. Gupta, and Bela A. Frigyik, "Revisiting Stein's Paradox: Multi-Task Averaging," Journal of Machine Learning Research, 2014. [link]
    • Eric K. Garcia, Sergey Feldman, Maya R. Gupta, and Santosh Srivastava, "Completely Lazy Learning," IEEE Trans. on Knowledge and Data Engineering, 2010. [pdf] [code + data]
    • Vagisha Sharma, Jimmy K. Eng, Sergey Feldman, Priska von Haller, Michael J. MacCoss, and William S. Noble, "Precursor Charge State Prediction for Electron Transfer Dissociation Tandem Mass Spectra," Journal of Proteome Research, 2010. [pdf]

    Conference Papers

    • Shivashankar Subramanian, Daniel King, Doug Downey, and Sergey Feldman, "S2AND: A Benchmark and Evaluation System for Author Name Disambiguation," JCDL, 2021. [pdf] [code] [blog]
    • Asia J. Biega, Fernando Diaz, Michael D. Ekstrand, Sergey Feldman, and Sebastian Kohlmeier, "Overview of the TREC 2020 Fair Ranking Track," TREC, 2020. [pdf]
    • Sean MacAvaney, Andrew Yates, Sergey Feldman, Doug Downey, Arman Cohan, and Nazli Goharian, "Simplified Data Wrangling with ir_datasets," SIGIR, 2021. [pdf] [code]
    • Arman Cohan, Sergey Feldman, Iz Beltagy, Doug Downey, and Daniel S. Weld, "SPECTER: Document-level Representation Learning using Citation-informed Transformers," ACL, 2020. [link] [code] [data]
    • Chandra Bhagavatula, Sergey Feldman, Russell Power, and Waleed Ammar, "Content-Based Citation Recommendation," NAACL-HLT, 2018. [pdf]
    • Waleed Ammar, Dirk Groeneveld, Chandra Bhagavatula, Iz Beltagy, Miles Crawford, Doug Downey, Jason Dunkelberger, Ahmed Elgohary, Sergey Feldman, Vu Ha, Rodney Kinney, Sebastian Kohlmeier, Kyle Lo, Tyler Murray, Hsu-Han Ooi, Matthew Peters, Joanna Power, Sam Skjonsberg, Lucy Lu Wang, Chris Wilhelm, Zheng Yuan, Madeleine van Zuylen, and Oren Etzioni, "Construction of the Literature Graph in Semantic Scholar," NAACL-HLT, 2018. [pdf]
    • Sergey Feldman, Maya R. Gupta, and Bela A. Frigyik, "Multi-Task Averaging," NIPS, 2012. [pdf]
    • Luca Cazzanti, Sergey Feldman, Maya R. Gupta, and Michael Gabbay, "Multi-Task Regularization of Generative Similarity Models," Lecture Notes in Computer Science, 2011. [pdf]
    • Sergey Feldman, Marius A. Marin, Mari Ostendorf, and Maya R. Gupta, "Part-of-Speech Histogram Features for Genre Classification of Text," IEEE ICASSP, 2009. [pdf]
    • Marius A. Marin, Sergey Feldman, Mari Ostendorf, and Maya R. Gupta, "Filtering Web Text to Match Target Genres," IEEE ICASSP, 2009. [pdf]
    • Sergey Feldman, Marius A. Marin, Julie Medero, and Mari Ostendorf, "Classifying Factored Genres with Part-of-Speech Histograms," NAACL-HLT, 2009. [pdf]

     

    Theses & Technical Reports

    • Sergey Feldman, Kyle Lo, and Waleed Ammar, "Citation Count Analysis for Papers with Preprints," 2018. [pdf]
    • Sergey Feldman, "Multi-Task Averaging: Theory and Practice," University of Washington PhD Thesis, 2012. [pdf]
    • Sergey Feldman, Barbara Frewen, Michael J. MacCoss, and Maya R. Gupta, "Filtering Tandem Mass Spectra for Quality," University of Washington Dept. of Electrical Engineering Technical Report UWEETR-2012-0001, 2012. [pdf] [code + data]