• PAST AND ONGOING WORK

    Deep Neural Networks for Natural Language Processing

     

    Citeomatic is a deep learning model for the citation prediction NLP task. You enter the title and abstract of an in-progress academic paper, and Citeomatic suggests other papers for you to review and potentially cite. It's specifically designed to learn a robust model that gives meaningful predictions, even when it’s wrong. We contributed to core algorithm design and development.

     

    This work appeared at NAACL '18. Read an intro here, and then try it yourself.

     

    Other results of our collaboration with AI2:

    This work has been ongoing since March 2016.

    Machine Learning Strategy Consulting

     

    The Healthy Birth, Growth, and Development (HBGD) program was launched in 2013 by the Bill & Melinda Gates Foundation.

     

    The Knowledge Integration (Ki) initiative facilitates collaboration between researchers, quantitative experts, and policy makers in fields related to HBGD. The broad goal is to aggregate data from past longitudinal studies about pathways and risk factors that affect birth, growth, and neurocognitive development in order to better predict Ki outcomes.

     

    Sergey works closely with Ki leadership: designing and overseeing data science contests; managing external collaborations with academic research labs and software companies; and modeling many diverse global health datasets.

     

    This work has been ongoing since February 2015.

    Improving Reading Comprehension

     

    Actively Learn makes a reading tool that enables teachers to guide, monitor, and improve student learning. With our help, they wrote and were awarded an NSF SBIR grant to answer the key question: "How can we personalize reading instruction so as to increase comprehension & learning?" We are diving deep into the data with sophisticated machine learning tools, and bringing back testable hypotheses about what helps and hinders students.

     

    This work has been ongoing since April 2014.

    Contributing to Technical Books

     

    Jenny Dearborn, Chief Learning Officer and Senior Vice President at SAP, has written Data Driven, a "practical guide to increasing sales success, using the power of data analytics," and The Data Driven Leader (with David Swanson), "a clear, accessible guide to solving important leadership challenges through human resources-focused and other data analytics."

     

    We helped her and her team find clear and compelling ways to communicate the deep mathematical models that are at the core of the book, and we contributed to the plot and characterizations.

    Pro Bono Data Science

     

    Seattle Against Slavery mobilizes the community in the fight against labor and sex trafficking through education, advocacy, and collaboration with local and national partners. We are proud to provide them with analytics and statistics services on a volunteer basis.

    Multiple Projects

     

    Long Tail NLP-Based Recommendations. Most e-commerce recommendation engines have difficulty highlighting less frequently bought products. The problem compounds itself: popular products get recommended, get bought, and get recommended again, so the engine ends up showing the same items over and over. We developed a language-based model for RichRelevance that identifies good recommendations based on comparisons of the product descriptions and description metadata rather than purchase data. This levels the playing field between newer products and the old standbys, so the recommendations have more variety and are generally more applicable.
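
    The idea of recommending from descriptions rather than purchase data can be sketched in a few lines. This is only an illustration, not the model built for RichRelevance: it swaps in plain TF-IDF vectors with cosine similarity, and the function names (tfidf_vectors, cosine, most_similar) and the whitespace tokenizer are assumptions made for the example.

```python
import math
from collections import Counter


def tfidf_vectors(descriptions):
    """Turn each description into a sparse {term: weight} TF-IDF vector."""
    tokenized = [text.lower().split() for text in descriptions]
    n_docs = len(tokenized)
    # Document frequency: in how many descriptions does each term appear?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            # Smoothed IDF so that rare terms weigh more than common ones.
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(descriptions, query_index, top_k=2):
    """Return indices of the top_k products most similar to the query product.

    No purchase counts are used, so a brand-new product competes on equal
    footing with a best seller.
    """
    vectors = tfidf_vectors(descriptions)
    scores = [
        (cosine(vectors[query_index], vec), i)
        for i, vec in enumerate(vectors)
        if i != query_index
    ]
    scores.sort(reverse=True)
    return [i for _, i in scores[:top_k]]
```

    Because the score depends only on the text, a product with zero sales history can still be surfaced next to a popular one with a similar description.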

     

    Bayesian A/B Testing. RichRelevance swears by their top-notch recommendations. But what's the right way to measure their efficacy? Sergey put together an intuitive, comprehensive Bayesian A/B testing system that works for any KPI, and can provide direct answers to key customer questions like "What is the probability that algorithm A has at least 5% lift over algorithm B?"

     

    Read all about this work in Sergey's three (archived) blog posts: [1], [2], and [3].
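
    To make the "at least 5% lift" question concrete, here is a minimal sketch for one specific KPI, a binary conversion rate. It is not the system described above (which covers any KPI); it assumes uniform Beta(1, 1) priors and estimates the probability by Monte Carlo sampling from the two posteriors. The function name prob_lift_at_least and its parameters are hypothetical.

```python
import random


def prob_lift_at_least(successes_a, trials_a, successes_b, trials_b,
                       min_lift=0.05, n_samples=20000, seed=0):
    """Estimate P(rate_A >= (1 + min_lift) * rate_B) for conversion rates.

    Assumes a Beta(1, 1) prior on each rate, so each posterior is
    Beta(1 + successes, 1 + failures).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        # Draw one plausible conversion rate for each algorithm.
        rate_a = rng.betavariate(1 + successes_a, 1 + trials_a - successes_a)
        rate_b = rng.betavariate(1 + successes_b, 1 + trials_b - successes_b)
        # Count how often A beats B by the required relative margin.
        if rate_a >= (1 + min_lift) * rate_b:
            hits += 1
    return hits / n_samples
```

    Calling prob_lift_at_least(120, 1000, 100, 1000) returns a single number between 0 and 1 that answers the customer's question directly, instead of a p-value that needs interpreting.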

     

    Bandits for Online Recommendations. The most important piece of RichRelevance's impressive big data pipeline is their core recommendation system. It serves thousands of recommendations every minute, and it has to learn quickly from new data. Working with their analytics team, Sergey engineered a modern bandit-based approach to online recommendations that learns from less data, adapts easily to any optimization metric, and does not compromise quality at production scale.

     

    Three (now archived) blog posts describe the results of our research: [1], [2], and [3].
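
    For readers unfamiliar with bandits, the sketch below shows one common variant, Thompson sampling with Beta posteriors over click-through rates. It is an illustration under stated assumptions, not the approach deployed at RichRelevance: the class BetaThompsonBandit, the simulate helper, and the click rates used in the simulation are all hypothetical.

```python
import random


class BetaThompsonBandit:
    """Thompson sampling over arms with binary (click / no click) rewards."""

    def __init__(self, n_arms, seed=0):
        self.rng = random.Random(seed)
        self.successes = [0] * n_arms
        self.failures = [0] * n_arms

    def select_arm(self):
        # Sample a plausible click rate per arm from its Beta posterior
        # (Beta(1, 1) prior), then play the arm with the highest sample.
        samples = [
            self.rng.betavariate(1 + s, 1 + f)
            for s, f in zip(self.successes, self.failures)
        ]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, arm, reward):
        # Record the observed outcome for the arm that was shown.
        if reward:
            self.successes[arm] += 1
        else:
            self.failures[arm] += 1


def simulate(true_rates, n_rounds=5000, seed=0):
    """Run the bandit against simulated users; return pulls per arm."""
    bandit = BetaThompsonBandit(len(true_rates), seed=seed)
    env = random.Random(seed + 1)  # separate stream for simulated clicks
    pulls = [0] * len(true_rates)
    for _ in range(n_rounds):
        arm = bandit.select_arm()
        reward = env.random() < true_rates[arm]
        bandit.update(arm, reward)
        pulls[arm] += 1
    return pulls
```

    Running simulate([0.02, 0.05, 0.20]) shows the behavior that matters online: traffic shifts toward the best-performing arm as evidence accumulates, while weaker arms still get occasional exposure.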

    Online Game Recommendation Engine

     

    Blastworks develops and publishes online games, with a large and growing list of content. We are helping them use behavioral data to develop a recommendation engine that provides their users with high quality suggestions for new games and creates greater exposure to their full catalog.

  • PARTNERSHIPS

    Preva Group

    Preva Group is dedicated to helping organizations achieve large-scale social change by combining existing structured and unstructured data, powered by sophisticated analytics and machine learning, delivered through simple user-centered interfaces. Data Cowboys operates a strategic employee-sharing partnership with Preva Group.

    Algorithmia

    Algorithmia is building a gigantic marketplace for algorithms. We are one of their certified partners for novel algorithmic development, working with their clients to design the machine learning algorithms hosted on Algorithmia's platform.

  • PUBLICATIONS

    Journal Papers

    • Sergey Feldman, Maya R. Gupta, and Bela A. Frigyik, "Revisiting Stein's Paradox: Multi-Task Averaging," Journal of Machine Learning Research, 2014. [link]
    • Eric K. Garcia, Sergey Feldman, Maya R. Gupta, and Santosh Srivastava, "Completely Lazy Learning," IEEE Trans. on Knowledge and Data Engineering, 2010. [pdf] [code + data]
    • Vagisha Sharma, Jimmy K. Eng, Sergey Feldman, Priska von Haller, Michael J. MacCoss, and William S. Noble, "Precursor Charge State Prediction for Electron Transfer Dissociation Tandem Mass Spectra," Journal of Proteome Research, 2010. [pdf]


    Conference Papers

    • Chandra Bhagavatula, Sergey Feldman, Russell Power, and Waleed Ammar, "Content-Based Citation Recommendation," NAACL-HLT, 2018. [pdf]
    • Waleed Ammar, Dirk Groeneveld, Chandra Bhagavatula, Iz Beltagy, Miles Crawford, Doug Downey, Jason Dunkelberger, Ahmed Elgohary, Sergey Feldman, Vu Ha, Rodney Kinney, Sebastian Kohlmeier, Kyle Lo, Tyler Murray, Hsu-Han Ooi, Matthew Peters, Joanna Power, Sam Skjonsberg, Lucy Lu Wang, Chris Wilhelm, Zheng Yuan, Madeleine van Zuylen, and Oren Etzioni, "Construction of the Literature Graph in Semantic Scholar," NAACL-HLT, 2018. [pdf]
    • Sergey Feldman, Maya R. Gupta, and Bela A. Frigyik, "Multi-Task Averaging," NIPS, 2012. [pdf]
    • Luca Cazzanti, Sergey Feldman, Maya R. Gupta, and Michael Gabbay, "Multi-Task Regularization of Generative Similarity Models," Lecture Notes in Computer Science, 2011. [pdf]
    • Sergey Feldman, Marius A. Marin, Mari Ostendorf, and Maya R. Gupta, "Part-of-Speech Histogram Features for Genre Classification of Text," IEEE ICASSP, 2009. [pdf]
    • Marius A. Marin, Sergey Feldman, Mari Ostendorf, and Maya R. Gupta, "Filtering Web Text to Match Target Genres," IEEE ICASSP, 2009. [pdf]
    • Sergey Feldman, Marius A. Marin, Julie Medero, and Mari Ostendorf, "Classifying Factored Genres with Part-of-Speech Histograms," NAACL-HLT, 2009. [pdf]

     

    Theses & Technical Reports

    • Sergey Feldman, Kyle Lo, and Waleed Ammar, "Citation Count Analysis for Papers with Preprints," 2018. [pdf]
    • Sergey Feldman, "Multi-Task Averaging: Theory and Practice," University of Washington PhD Thesis, 2012. [pdf]
    • Sergey Feldman, Barbara Frewen, Michael J. MacCoss, and Maya R. Gupta, "Filtering Tandem Mass Spectra for Quality," University of Washington Dept. of Electrical Engineering Technical Report UWEETR-2012-0001, 2012. [pdf] [code + data]