A pickaxe for the AI gold rush, Labelbox sells training data software
Every artificial intelligence startup or corporate R&D lab has to reinvent the wheel when it comes to how humans annotate training data to teach algorithms what to look for. Whether it’s doctors assessing the size of cancer from a scan or drivers circling street signs in self-driving car footage, all this labeling has to happen somewhere. Often that means wasting six months and as much as a million dollars just developing a training data system. With nearly every type of business racing to adopt AI, that spend in cash and time adds up.
Labelbox builds artificial intelligence training data labeling software so nobody else has to. What Salesforce is to a sales team, Labelbox is to an AI engineering team. The software-as-a-service acts as the interface for human experts or crowdsourced labor to instruct computers how to spot relevant signals in data by themselves and continuously improve their algorithms’ accuracy.
Today, Labelbox is emerging from six months in stealth with a $3.9 million seed round led by Kleiner Perkins and joined by First Round and Google’s Gradient Ventures.
“There haven’t been seamless tools to allow AI teams to transfer institutional knowledge from their brains to software,” says co-founder Manu Sharma. “Now we have over 5,000 customers, and many big companies have replaced their own internal tools with Labelbox.”
Kleiner’s Ilya Fushman explains, “If you have these tools, you can ramp up to the AI curve much faster, allowing companies to realize the dream of AI.”
Inventing the best wheel
Sharma knew how annoying it was to try to forge training data systems from scratch because he’d seen it done before at Planet Labs, a satellite imaging startup. “One of the things that I observed was that Planet Labs has a superb AI team, but that team had been for over six months building labeling and training tools. Is this really how teams around the world are approaching building AI?” he wondered.
Before that, he’d worked at DroneDeploy alongside Labelbox co-founder and CTO Daniel Rasmuson, who was leading the aerial data startup’s developer platform. “Many drone analytics companies that were also building AI were going through the same pain point,” Sharma tells me. In September, the two began to explore the idea and found that 20 other companies big and small were also burning talent and capital on the problem. “We thought we could make that much smarter so AI teams can focus on algorithms,” Sharma decided.
Labelbox launched its early alpha in January and saw swift pickup from the AI community that immediately asked for additional features. With time, the tool expanded with more and more ways to manually annotate data, from gradation levels like how sick a cow is for judging its milk production to matching systems like whether a dress fits a fashion brand’s aesthetic. Rigorous data science is applied to weed out discrepancies between reviewers’ decisions and identify edge cases that don’t fit the models.
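To make that reviewer-reconciliation step concrete, here is a minimal sketch in Python of one common approach: take the majority label for each item and flag anything where reviewers agree too little. It is an illustration only, not Labelbox's actual method. The function name `consensus`, the `min_agreement` threshold and the sample scan IDs are all hypothetical.

```python
from collections import Counter


def consensus(labels_by_item, min_agreement=0.75):
    """Majority-vote each item's labels and flag low-agreement items.

    Hypothetical helper for illustration; not part of any Labelbox API.
    Returns (consensus_labels, flagged_item_ids).
    """
    consensus_labels = {}
    flagged = []
    for item_id, labels in labels_by_item.items():
        if not labels:
            # No reviewer has labeled this item yet, so send it to review.
            flagged.append(item_id)
            continue
        top_label, top_count = Counter(labels).most_common(1)[0]
        consensus_labels[item_id] = top_label
        # Share of reviewers who chose the winning label.
        agreement = top_count / len(labels)
        if agreement < min_agreement:
            flagged.append(item_id)
    return consensus_labels, flagged


# Made-up example data: three reviewers per scan.
reviews = {
    "scan_001": ["polyp", "polyp", "polyp"],
    "scan_002": ["polyp", "clear", "clear"],
    "scan_003": ["clear", "clear", "clear"],
}
labels, needs_review = consensus(reviews)
```

In this toy data, the second scan splits two to one, so it falls below the 75 percent threshold and lands in the review queue. That is the kind of edge case a team would want an expert to look at again.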
“There are all these research studies about how to make training data” that Labelbox analyzes and applies, says co-founder and COO Ysiad Ferreiras, who’d led all of sales and revenue at fast-rising grassroots campaign texting startup Hustle. “We can let people tweak different settings so they can run their own machine learning program the way they want to, instead of being limited by what they can build really quickly.” When Norway mandated all citizens get colon cancer screenings, it had to build AI for recognizing polyps. Instead of spending half a year creating the training tool, it just signed up all the doctors on Labelbox.
Any organization can try Labelbox for free, and Ferreiras claims hundreds of thousands have. Once they hit a usage threshold, the startup works with them on appropriate SaaS pricing related to the revenue the client’s AI will generate. One client, Lytx, makes DriveCam, a system installed on half a million trucks with cameras that use AI to detect unsafe driver behavior so drivers can be coached to improve. Conde Nast is using Labelbox to match runway fashion to related items in its archive of content.
Eliminating redundancy, and jobs?
The big challenge is convincing companies that they’re better off leaving the training software to the experts instead of building it in-house where they’re intimately, though perhaps inefficiently, involved in every step of development. Some turn to crowdsourcing agencies like CrowdFlower, which has its own training data interface, but it only works with generalist labor, not the experts required for many fields. Labelbox wants to cooperate rather than compete here, serving as the management software that treats outsourcers as just another data input.
Long-term, the risk for Labelbox is that it’s arrived too early for the AI revolution. Most potential corporate customers are still in the R&D phase around AI, not at scaled deployment into real-world products. The big business isn’t selling the labeling software. That’s just the start. Labelbox wants to continuously manage the fine-tuning data to help optimize an algorithm through its entire life cycle. That requires AI being part of the actual engineering process. Right now it’s often stuck as an experiment in the lab. “We’re not concerned about our ability to build the tool to do that. Our concern is ‘will the industry get there fast enough?’” Ferreiras declares.
Their investor agrees. Last year’s big joke in venture capital was that suddenly you couldn’t hear a startup pitch without “AI” being referenced. “There was a big wave where everything was AI. I think at this point it’s almost a bit implied,” says Fushman. But it’s corporations that already have plenty of data, and plenty of human jobs to obviate, that are Labelbox’s opportunity. “The bigger question is ‘when does that [AI] reality reach consumers, not just from the Googles and Amazons of the world, but the mainstream corporations?’”
Labelbox is willing to wait it out, or better yet, accelerate that arrival — even if it means eliminating jobs. That’s because the team believes the benefits to humanity will outweigh the transition troubles.
“For a colonoscopy or mammogram, you only have a certain number of people in the world who can do that. That limits how many of those can be performed. In the future, that could only be limited by the computational power provided so it could be exponentially cheaper,” says co-founder Brian Rieger. With Labelbox, tens of thousands of radiology exams can be quickly ingested to produce cancer-spotting algorithms that he says studies show can become more accurate than humans. Employment might get tougher to find, but hopefully life will get easier and cheaper too. Meanwhile, improving underwater pipeline inspections could protect the environment from its biggest threat: us.
“AI can solve such important problems in our society,” Sharma concludes. “We want to accelerate that by helping companies tell AI what to learn.”