The R&D lab that takes your AI from "what if" to clear answers.

We help teams figure out which models, methods, and bets are worth making. That means experiments, benchmarking, failure analysis, and technical recommendations grounded in evidence.

Start with a question, a dataset, or a rough prototype. We'll design the work and run the experiments in weeks, not months.

  • Data science & modeling: messy data, exploratory analysis, and methods grounded in your constraints.
  • Baselines & experiments: reproducible methods tailored to your data and constraints.
  • Evaluation & benchmarking: failure analysis for LLMs, search, ranking, and forecasting.
  • Safety & reliability: red-teaming, review, and reliability analysis.
  • Findings & next steps: technical recommendations and team walkthroughs.

Trusted by teams working on hard AI problems

What we do

R&D sprints

Rapid exploration, baseline comparisons, and hard-nosed feasibility checks. See our work.

LLM evaluation & guardrails

Prompting, rubric design, reliability testing, and failure analysis. Learn about our LLM chatbot work.

Search & ranking research

Offline/online metrics, ranking experiments, and user-impact analysis. Read our search engine case study.

Messy data science

We clean messy data, find meaningful patterns, and test hypotheses. Explore our small datasets research.

Forecasting & anomaly detection

We compare forecasting approaches, quantify uncertainty, and separate signal from noise.

Findings & next steps

Clear documentation, evaluation reports, and concrete technical recommendations.

How we work

We run research sprints of 4–8 weeks. You get clear answers backed by experiments, benchmarks, and documented methods.

1–2 weeks

Discovery & data audit

Understand the decision, the data, and the constraints. We audit data quality, frame the problem, and plan experiments.

  • Problem framing & success metrics
  • Data quality review
  • Analysis plan & experiment design

2–8 weeks

Experiments & evaluation

We compare baselines, test approaches, and run evaluations to see what actually works.

  • Model development & prototyping
  • Evaluation design & benchmarking
  • Performance benchmarking & safety testing

2–8 weeks

Findings & next steps

We pull the work together into clear findings, concrete recommendations, and documented methods.

  • Written findings & recommendations
  • Benchmark results & methods documentation
  • Team walkthrough

The Team

Sergey Feldman

Sergey Feldman is a Principal Research Scientist at the Allen Institute for AI (Ai2) and Head of AI at Alongside. He has led research and development for large-scale AI, retrieval, and language systems, including work behind Semantic Scholar and Asta. His expertise spans LLMs, machine learning, natural language processing, and scientific document analysis.

Sergey founded Data Cowboys to bring research-grade thinking, honest evaluation, and clear communication to organizations solving hard problems. He holds a PhD in machine learning from the University of Washington and has been building AI systems since 2007.

Ilya Barshai

Ilya Barshai is a Principal Research Scientist at McGraw Hill, where he applies AI and machine learning to educational technology. He brings over a decade of hands-on data science experience.

Ilya joined Data Cowboys in 2016, bringing a practitioner's eye for what actually works: rigorous analysis, careful modeling, and straight answers. Before moving into data science, he spent eight years in risk and failure analysis of electromechanical product designs.

What clients say

Sergey and Ilya grasped what we do the fastest of anyone I’ve explained our startup to. They are professional, timely, and so fast with their thinking and output.

— Jay Goyal, Co-Founder, Actively Learn & Alongside

Sergey has a wonderful knack for quickly scoping a problem, making efficient decisions on model choice and parameters, and clearly communicating throughout the process.

— Apu Mishra, Lead Data Scientist, RichRelevance

Sergey Feldman is brilliant, collaborative, candid, and lightning-fast. He has a special gift for explaining complex analytical concepts in terms anyone can understand.

— Deb Arnold, Principal, Deb Arnold, Inc.

Sergey provides well-considered results quickly with in-depth knowledge communicated clearly to readers at every level.

— Jenny Dearborn, Chief People Strategy Officer, BTS

Data Cowboys’ work has been incredible. We understand our data, verify our impact, and keep people safe from exploitation.

— Robert Beiser, Technology Operations Lead, Microsoft

Got an AI or data question with real uncertainty?

Let’s figure out what the evidence says before you commit more time and budget.

Want to email instead? Reach us at ilya@data-cowboys.com.