See how companies like Uber and ZocDoc use machine learning to improve key business metrics

Ramping up for the keynotes at Strata Data Conference in New York (photo credit: the official O’Reilly Flickr site)

The majority of buzz around machine learning and AI focuses on things like computerized play of Dota or realistic speech synthesis. While these areas are sexy and present value to the field, there is not nearly enough attention on practical machine learning and the challenges that come with implementing actual pipelines.

Machine learning teams are still struggling to take advantage of ML due to challenges with inflexible frameworks, lack of reproducibility, collaboration issues, and immature software tools.

Over the past month, I had the opportunity to attend O’Reilly Media’s AI Conference and Strata Data Conference. With so many sessions and great companies presenting, it’s always difficult to pick which sessions to attend. There are many different approaches (here’s a great guidebook from The Muse) but I personally skew towards sessions around applied machine learning that cover actual implementation.

These applied ML presentations are valuable because:

  • the presenters are usually from the team that built the actual pipeline and handled specific requirements
  • the content is honest about failed approaches and pain points the team experienced, even with later iterations
  • there’s a real connection between business metrics (such as support ticket burn-down rate and customer satisfaction) and the machine learning models

The best sessions I saw at these two conferences came from Uber and ZocDoc. In this post, I’ll explain the key takeaways from those sessions and how your team can incorporate these lessons for your own machine learning workflows.

Session Deep-dives

Uber and ZocDoc are disruptive in their own ways, but both companies are using machine learning as a competitive differentiator and approach for improving the user experience.

Uber: Improving customer support with natural language processing and deep learning

With over four billion rides in 2017 alone, you can imagine that Uber’s support system needs to be scalable.

With Uber support, the Machine Learning team wanted to focus on making customer support representatives (CSRs) more effective by recommending the three most relevant solutions — essentially a ‘human-in-the-loop’ model architecture called Customer Obsession Ticket Assistant, or COTA.

The machine learning team at Uber decided to create and compare two different model pipelines to scale support: (1) COTA v1, which converted a multi-class classification task into a ranking problem, and (2) COTA v2, which used a deep learning approach called Encoder-Combiner-Decoder.

At AI Conference, Piero Molino, Huaixiu Zheng, and Yi-Chia Wang from the Uber team did an incredible job incrementally laying out both their models’ architectures and the impact their two different approaches had on revenue and ticket handling time.

Piero was kind enough to share the slides to their presentation here.

You can see more of Piero’s work on his personal website:

Uber’s support UI with three suggested replies surfaced through the COTA models.

Both models ingested the ticket, user, and trip information to suggest ticket classifications and reply templates (answers) for CSRs.
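To make the "classification as ranking" idea concrete, here is a minimal sketch of pointwise ranking: every candidate reply is scored independently against the ticket, and the three highest-scoring candidates are surfaced. This is not Uber's implementation. The feature vectors, the solution names, and the use of cosine similarity as the scoring function are all assumptions for illustration; in a real pipeline the score would come from a trained model.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Hypothetical stand-in for a trained scoring model."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_solutions(
    ticket: Vector,
    solutions: Dict[str, Vector],
    score: Callable[[Vector, Vector], float] = cosine_similarity,
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Score each (ticket, solution) pair on its own, then sort by score."""
    scored = [(name, score(ticket, vector)) for name, vector in solutions.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Made-up feature vectors, purely for illustration.
ticket_features = [1.0, 0.0, 0.0]
candidate_solutions = {
    "refund": [1.0, 0.0, 0.0],
    "lost_item": [0.0, 1.0, 0.0],
    "fare_review": [1.0, 1.0, 0.0],
    "account": [0.0, 0.0, 1.0],
}
suggestions = rank_solutions(ticket_features, candidate_solutions)
```

Because each pair is scored in isolation, the scoring function can be swapped for any binary classifier that outputs a relevance score, and the support agent still makes the final choice among the three suggestions.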

You can see the architectures for both models in the image below. To summarize, the COTA v1 random forest model combines a classification algorithm with a pointwise-ranking algorithm, while COTA v2 leverages a deep learning architecture that can learn to predict multiple outputs by optimizing losses for several different types of encoded features (text, categorical, numerical, and binary). The team conducted hyperparameter searches for each model (grid search with COTA v1 and parallel random search with COTA v2).
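For readers less familiar with the two search strategies, the sketch below contrasts them: grid search evaluates every combination of a fixed set of values, while random search samples configurations from ranges. The parameter names and the toy objective are hypothetical and stand in for a real validation metric. The random search here runs sequentially, not in parallel as the Uber team's did.

```python
import itertools
import random
from typing import Callable, Dict, Sequence, Tuple

Params = Dict[str, float]
Objective = Callable[[Params], float]


def grid_search(grid: Dict[str, Sequence[float]], objective: Objective) -> Tuple[Params, float]:
    """Evaluate every combination in the grid and keep the best one."""
    names = list(grid)
    best_params: Params = {}
    best_score = float("-inf")
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = objective(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def random_search(
    bounds: Dict[str, Tuple[float, float]],
    objective: Objective,
    n_trials: int,
    seed: int = 0,
) -> Tuple[Params, float]:
    """Sample n_trials configurations uniformly within the bounds."""
    rng = random.Random(seed)  # seeded so reruns give the same result
    best_params: Params = {}
    best_score = float("-inf")
    for _ in range(n_trials):
        params = {name: rng.uniform(low, high) for name, (low, high) in bounds.items()}
        score = objective(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_objective(params: Params) -> float:
    """Hypothetical validation score that peaks at learning_rate=0.1, max_depth=5."""
    return -((params["learning_rate"] - 0.1) ** 2) - (params["max_depth"] - 5) ** 2
```

Each trial is independent of the others, which is what makes random search straightforward to spread across workers.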

I highly recommend reading their paper to get the full set of details and implementation decisions.

From feature engineering to predictions, the Uber team maps out how they processed different inputs to populate suggested replies for the CSR team.

The Uber team was able to compare their models’ impact with A/B tests (good resource here around A/B testing) and customer surveys around their support experience. The team ultimately found that COTA v2 was 20–30% more accurate than COTA v1 in their A/B tests. COTA v1 reduced handling time by ~8%, versus COTA v2’s ~15% reduction. While both approaches helped increase customer satisfaction, it was clear that COTA v2 was the champion architecture.

The Uber team set up an A/B test for both versions of COTA where COTA v2’s accuracy was 20–30% higher than COTA v1’s (Slide 23 of 30)
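If you want to run a similar comparison on your own models, one common way to check whether an accuracy gap between two variants is more than noise is a two-proportion z-test. The sketch below is a generic illustration, not the analysis the Uber team ran, and any counts you pass in are your own. The function names are made up for this example.

```python
import math
from typing import Tuple


def relative_lift(baseline: float, treatment: float) -> float:
    """Relative improvement of treatment over baseline (0.2 means 20% better)."""
    return (treatment - baseline) / baseline


def two_proportion_z_test(
    successes_a: int, n_a: int, successes_b: int, n_b: int
) -> Tuple[float, float]:
    """Return (z, two-sided p-value) for the difference in success rates, B minus A."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if standard_error == 0.0:
        raise ValueError("standard error is zero; both groups are all successes or all failures")
    z = (p_b - p_a) / standard_error
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

A small p-value suggests the difference between variants is unlikely to be due to chance alone. Whether the lift is large enough to matter is still a business decision.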

Uber’s presentation showed how integrating machine learning into processes like customer support is an iterative process. They had to test different architectures and also make decisions around performance that impact accuracy (taking into account reasonable mistakes).

ZocDoc: Reverse engineering your AI prototype and the road to reproducibility

ZocDoc is an online medical care appointment booking service, providing a medical care search platform for end users by consolidating information about medical practices and doctors’ individual schedules.

The ZocDoc team homed in on a very specific part of their users’ journey: finding in-network physicians based on their insurance coverage.

For ZocDoc’s users, finding an in-network physician can mean significant cost savings. Typically, if you visit a physician or other provider within the network, the amount you will be responsible for paying will be less than if you go to an out-of-network provider (source).

The ZocDoc team built an insurance card checker that allowed the patient to scan a picture of their insurance card, and then extracted the relevant details from the card to check whether a particular doctor and particular procedure was covered.

ZocDoc’s image recognition task was difficult because:

  • user-submitted images often have poor resolution and vary in dimension (due to a lack of formatting constraints) resulting in poor training data quality
  • insurance cards contain many other types of information and may sometimes repeat the member ID
  • the team had to quickly build a prototype then transform their process into a reproducible pipeline

At AI Conference, ZocDoc’s Brian Dalessandro (head of data science) and Chris Smith (senior principal software engineer) outlined these technical challenges by walking through the different phases of their model architecture (see screenshot below).

The most interesting part of the session was when Chris described the team’s decision to completely tear down the infrastructure they had for the prototype because of scalability and reproducibility concerns. As they iterated, it was difficult for the team to identify and track key model artifacts such as the hyperparameters used and software dependencies.

For more details around the specific model implementation, you can read ZocDoc’s original blog post about this project here.

ZocDoc’s MemberID extraction model architecture involved a base classification network, an alignment network, and an optical character recognition (OCR) model.

The ZocDoc team was eventually able to surpass the 82% baseline accuracy (user-reported stat) with their three-part model pipeline! However, their journey was one of constant iteration and frustration around the experience of data and model management.

ZocDoc’s presentation was striking because it showed that even small tweaks in the user experience can bring tremendous value to customers, but also demand intense investment from data scientists, as expressed by a quote from their blog post:

“We quickly learned though that getting to a quality that is appropriate for a production level personal health application required a little more ingenuity and trial and error than just simply stringing together open sourced components.”

— Akash Kushal

Addressing practical ML challenges

These two presentations from Uber and ZocDoc illustrate how machine learning in practice involves much more than using the latest modeling frameworks. Imagine the frustration Chris and Brian felt when they had to rebuild their pipeline to make it production ready but realized they didn’t track their prototype’s metrics, hyperparameters, or code.

One of the most critical blockers to effective machine learning today is reproducibility. Reproducibility allows for robust models by reducing or eliminating variations when rerunning past experiments.
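As a rough idea of what tracking a run involves, the sketch below writes one JSON record per experiment with the hyperparameters, metrics, a hash of the training script, and basic environment details. It uses only the Python standard library. The function names and the record layout are assumptions for illustration, not any particular tool's format.

```python
import hashlib
import json
import platform
import sys
import time
from pathlib import Path
from typing import Any, Dict


def file_sha256(path: Path) -> str:
    """Fingerprint the training code so a run can be tied to an exact version."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def build_run_record(
    hyperparameters: Dict[str, Any],
    metrics: Dict[str, float],
    code_path: Path,
) -> Dict[str, Any]:
    """Collect what is needed to rerun and compare an experiment."""
    return {
        "timestamp": time.time(),
        "python_version": sys.version.split()[0],
        "platform": platform.platform(),
        "code_sha256": file_sha256(code_path),
        "hyperparameters": hyperparameters,
        "metrics": metrics,
    }


def save_run_record(record: Dict[str, Any], out_dir: Path) -> Path:
    """Write the record to out_dir under a name derived from its content."""
    out_dir.mkdir(parents=True, exist_ok=True)
    payload = json.dumps(record, sort_keys=True, indent=2)
    run_id = hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]
    out_path = out_dir / f"run_{run_id}.json"
    out_path.write_text(payload)
    return out_path
```

A record like this does not capture datasets or installed package versions, which a real setup would also need to pin. Even so, it is enough to answer "which settings produced this number?" months later.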

At Comet, we allow data science teams to automagically track their datasets, code changes, experimentation history, and production models, creating efficiency, transparency, and reproducibility.

See a quick video of how Comet has already helped thousands of users make their machine learning experimentation more effective and trackable:

For those in the New York area, join us on October 4th to learn about how Precision Health AI is applying machine learning to detect cancer.

We’re hosting PHAI’s director of client services, data scientist, and software engineer to explain how they’ve built out a rich ML pipeline. RSVP here!

Want to see more amazing examples of applied machine learning?

Posted by: Cecelia Shao

Product Lead @ Comet. Comet is doing for machine learning what GitHub did for software. We allow data science teams to automatically track their datasets, code changes, experimentation history, and production models, creating efficiency, transparency, and reproducibility. Learn more at
