About this document: I wrote this memo for investors when I was first raising money for our company, and did not originally plan to publish it for a broader audience. I changed my mind because I think the current risks to our business are primarily technical rather than structural,1 and on balance I believe companies in this space have a moral obligation to be transparent about our thinking, since we are all hoping to build a hugely disruptive technology that will have ramifications for all of humanity. If you are interested in reading some of my previous thoughts about likely areas of value creation in the robotics space, here is a post I wrote about general principles of what's working in AI, and an older post I wrote about different ideas for robotics businesses that people can build in the future.
Why K-Scale Wins
TL;DR: K-Scale Labs is a bet that the value-adding component of humanoid robots is in the brain, not the body.
When I was working on the Tesla Autopilot team, one of the big pushes I was involved in was removing the radar. This meant training a neural network that could predict the kinematics of other vehicles using just the front-facing camera feed, better than the radar could. We had an internal joke about how bad we were at our jobs - the hardware team was already done removing it, what was taking us so long? We all figured one day Elon was going to announce that the hardware team had done the laborious work of redesigning the car without a steering wheel, and all we had to do was make them drive.
I lead with this anecdote because it speaks to the fundamental shift that has happened in hardware value creation in the last five years with the advent of deep learning. In the automotive industry, what self-driving promised was the first low-marginal-cost, high-value-add feature on a car since the radio. A small team of software engineers could push an update and increase the value of every single car in a fleet by thousands of dollars.
It turned out that this problem was much harder than many people initially anticipated. But it provided some instructive lessons on how to build a company that does real-world AI. This document will provide a framework for how I think we can build a winning humanoid robot company, shaped by my experience working on the front lines of both AI research at FAIR and real-world AI at Tesla.
The software ecosystem is the moat
A popular assertion in the early days of AI companies was that you couldn’t build a company off of a good model alone. Sure, the assertion went, maybe you could get some early revenue, but once your competitors catch up with an almost-as-good model, all your margins are going to dry up. The only way to build a successful long-term company is to have some other sort of moat, like being deeply integrated with some enterprise software stack.
As we’ve seen the field evolve, this has largely turned out to be incorrect. Great deep learning systems have built-in network effects if they can figure out how to leverage user feedback to drive quality. The canonical example of this is Midjourney - there are thousands of image generation models, and yet Midjourney keeps churning out high-margin subscription revenue, because they figured out how to use user feedback to keep making their model better, while simultaneously doing great work to make the core product useful for their users.
At the same time, companies which have chosen to focus on verticals where they think they can build strong moats have struggled. This is because, in order to make a model work really well, you need to throw as much data as possible into it. Trying to build a bespoke model for X will almost always result in a worse model than a model built for X, Y, and Z. Consider a company building “Midjourney for Cat Photos”. Models like this do exist, and they’re almost all worse than just asking Midjourney to generate cat photos.
Oddly enough, if you were to ask any of the existing humanoid robot companies what their moat is, none of them would say the model. Tesla’s moat is their expertise in manufacturing and ability to be their own customer; 1X’s moat is their actuator design; Figure, Agility, Sanctuary, Unitree, and most of the others would probably say something about their hardware expertise as well.
The reason they all say this is pretty obvious - it seems really hard to build a good robot, and without a robot, it doesn’t matter how good your software is.2 A good hardware platform will ride the rising tide of AI improvements to become more and more valuable. In the parlance of a famous Gwern essay, the complements to the hardware are becoming commoditized, which means that whoever wins the hardware game is going to win big. That’s why it makes sense for a company like Figure to raise $600 million+. If they win hardware, then they will be the platform for embodied AGI. To summarize, this path looks like:
- You build the best robot hardware
- OpenAI / Anthropic / Character use your hardware to deploy embodied AI models
- You extract lots of value by owning the hardware chain
But what if they have it backwards? What if it is actually the hardware being commoditized, while the software is the moat? I think this is a much, much more likely scenario, fundamentally because the hardware is clearly not where the value is being created. We’ve had the hardware for humanoid robots basically figured out for the better part of two decades. Consider the following graph, showing the torque versus price for a selection of brushless motors.3 That outlier in the top left is hoverboard motors. In torque per dollar terms, hoverboard motors dwarf any of the actuators produced by Unitree, Tesla, Figure, 1X, Agility, or anyone else.
But sure, you might say, that's your view as a naive software engineer who doesn't really understand the hardware space. What about the awesome hardware demo that Tesla showed off when they first unveiled their robot? While I might not be able to convince the extreme skeptics, the key nuance to remember with humanoid robots is that the hardware is more or less a means to an end, with the end being a robot that knows how to do things for you. If you build great hardware for the sake of building great hardware, you end up with demos like this:
While it is certainly impressive that an actuator can lift a piano, it is fairly unlikely that any humanoid robot will ever actually benefit from this. You can take any off-the-shelf actuator and increase the mechanical advantage to make it lift a piano, but the mechanical properties that you would want from your humanoid robot actuators are things like low inertia, low friction, and backdrivability.4 This is one example, but in my opinion, many of the hardware decisions that have been made by robotics hardware companies are bad decisions, born from a lack of clarity about how they will incorporate the software layer to make the robot do useful things.
Zooming out and thinking solely about the business model, however, it seems to me that there is a huge hole in the business plan for the largest and best-funded companies in the space, and I am confused about how so many people seem to be betting on what feels like a losing strategy - it’s like betting that Compaq will be able to extract all the value from Microsoft because Windows is worthless without a physical computer. I suspect we are in for a big correction, and the thing that is going to trigger it is going to be something like Stable Diffusion for robots. If you could download a model that makes a humanoid robot do useful stuff, there would overnight be a hundred companies racing to make knockoff robots to run that model. You’ll have no idea who these companies are, but these robots are going to be insanely good. Unlike with electric cars, there aren’t mountains of regulatory barriers for foreign manufacturers to overcome.5 On a basic technological level, humanoid robots are a lot more like toys than cars.
So, if the hardware is easy, why can’t I buy a good, cheap humanoid robot yet?
Building great models is hard
Having been in machine learning for a while, I can confidently assert that the difference between a good and bad machine learning engineer basically comes down to their ability to minimize a ton of small failures. This is why it’s almost always more valuable to talk to someone in person about a paper than to read it - talking to them will reveal the hundreds of things that they did to get the final result in the paper. Nothing is more useless than a machine learning novice confidently asserting they reproduced a paper in an afternoon. If you look at the training logs for some of the best papers - papers which seem like simple, clean ideas - they will have hundreds of experiments for each good result.
This problem is even more evident in real-world AI. As evidence of this, look at three of the most exciting recent robotics papers: ALOHA from Stanford, UMI from Columbia, and DOBB-E from NYU. A common feature of these papers, which most people seemed to miss, was that these labs designed and built some part of the hardware that they used. The fact that these systems work so well is directly attributable to the fact that they all had a top-to-bottom understanding of how the system works.
There are some people right now who would rather focus just on building models and ignore any hardware at all. I think this is a mistake, as anyone who has ever used the Unitree or Hello Robot APIs can attest. It is orders of magnitude more difficult and expensive to make your hardware do useful stuff if you try to abstract the hardware away from the people who are building the models to make it work. This is effectively what killed Everyday Robots and what is killing Tesla’s Autopilot.
Thematically, however, we are trying to take advantage of the same inflection point. The current state of robotics, where we are starting to perform complicated chores from 50-300 data points, is very reminiscent of the early days of NLP. In 2017, the methods were already relatively well-understood, but the field was limited to academic benchmarks. It took a willingness to bet on model scaling laws to bring us the success we see today. Similarly, robotics academia today is limited by the constraints of the lab; researchers use whichever robot they can buy that is easiest to work with, and try to beat out other labs on some benchmarks.
Our goal is to consider the hardware and software together, creating a platform that everybody can easily build on, to unlock the same scaling laws for robotics that have worked in text, audio, and video. We’re betting that the bitter lesson of machine learning will work as well in robotics as it has in those fields.
In the current robotics landscape, the missing business is the company which co-designs the hardware and software, because that is the only way to build a useful robot, but focuses resources on the high-margin software business instead of the low-margin hardware business.
The Master Plan
- Build great robot hardware
- Open-source the hardware design
- Build the best dataset
- Build the best model
To me, this is the only plan that has any chance of success. Rather than trying to prevent any copycats from building your product, embrace the copycats. Unleash the creative forces of hundreds of manufacturers to drive the idiot index of humanoid robots down to one. Let any high school kid with a free summer make some money assembling humanoid robots from a kit bought off Alibaba. Make the same thing happen to humanoid robots that happened to electric skateboards, hoverboards, and 3D printers.
In a world where robots are actually valuable, there is no reason to think that the hardware will not be commoditized. This is exactly what happened with 3D printers and electric longboards. Believing the opposite requires believing that humanoid robots as a product category will be more similar to cars than to 3D printers. Unlike those products, however, there will be a high degree of differentiation in the software stack for humanoid robots. Without good models, the hardware is useless.
If we are able to solve the extremely difficult engineering problem of making a robot do something useful, then monetizing our models will be trivial. It can be as simple as selling signing keys to download a model update. If we can collect data from millions of robots, for a robot platform which we designed specifically for our models, we will have an insurmountable advantage in the race for embodied AI.
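To make the "signing keys" idea concrete, here is a minimal sketch of how a per-robot update key could be issued and checked. Everything in it is an assumption for illustration: the function names, the `robot_id` and `model_version` fields, and the key format are hypothetical. It also swaps in a symmetric HMAC from the Python standard library rather than the asymmetric signatures that "signing keys" would normally imply, purely to keep the example self-contained.

```python
import hashlib
import hmac


def issue_update_key(vendor_secret: bytes, robot_id: str, model_version: str) -> str:
    """Derive a key that unlocks one model version for one robot (hypothetical scheme)."""
    # Assumption: ids never contain ":", so the joined message is unambiguous.
    if ":" in robot_id or ":" in model_version:
        raise ValueError("robot_id and model_version must not contain ':'")
    message = f"{robot_id}:{model_version}".encode("utf-8")
    return hmac.new(vendor_secret, message, hashlib.sha256).hexdigest()


def is_valid_update_key(
    vendor_secret: bytes, robot_id: str, model_version: str, key: str
) -> bool:
    """Check a presented key on the download server before serving the update."""
    expected = issue_update_key(vendor_secret, robot_id, model_version)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode("utf-8"), key.encode("utf-8"))
```

In this sketch a key is bound to both the robot and the model version, so a key bought for one update would not unlock the next one. A real deployment would more likely use public-key signatures so that robots can verify updates without holding the vendor's secret.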
Failure modes
I think this plan has a 20% chance of success, where success means something like a million robots built. The failure modes that I see happening:
Failure to design decent robot hardware. If the hardware we build is useless, then we will never kick off the flywheel because there will be no incentive for anyone to build our robot. In other words, we need to design a hardware platform that can do something compelling enough to make other people want to build it. The Poppy Project is a good example of this failure mode - their robot never passed the threshold of being more than a novelty. This is essentially a cold start problem. I think this is an achievable target for several reasons. First, I don’t think this is a winner-take-all market. Even if a few players start selling robot hardware, I think this will make the appetite for an open-source version grow, not shrink, and with reasonable execution I suspect we can easily be the best open-source option. Second, humanoid robot hardware really isn’t that hard. Motors are largely commoditized, and modern 3D printers make rapid hardware iteration much more feasible. Third, our small team has the winning combination of experience both in shipping real-world AI products and working at the cutting edge of AI research, a very rare combination.
Failure to close the feedback loop. If we open-source the hardware and other people start building it, in order to turn this into a flywheel, we need to be able to close the loop in a way that each marginal robot built increases the rate of improvement of our models. This is analogous to Midjourney asking a user to choose the best photo from a selection of four photos. The problem with relying on other people to build the hardware is that we lose some control of the ecosystem, so we have to consider clever ways to maintain that control. Shipping a self-contained operating system that integrates with our own cloud platform is one avenue. At some point we can use our models and data as leverage to enforce that everyone is a good actor.
Failure to capture value from the ecosystem. This is the most important bet that the company is making, which is that most of the value in a humanoid robot will be in the software. There’s a risk that this bet is wrong, and that some hardware companies building our robot are able to capture more of the value than we are. There’s also a failure mode where someone comes along, takes our robot design, builds a better software stack for it, and pushes us out of the loop. This is where execution matters the most. We need to be better than any existing open-source humanoid robot. Also, before we have a decisive data advantage, we need to win by having the best models and software ecosystem.
What the world looks like if we succeed
I started K-Scale Labs with the mission of bringing humanity to the next level on the Kardashev scale within my lifetime. This means moving the growth rate of humanity’s energy consumption from 1% to 15%. At any other time in human history, this would have been a preposterous idea, but today it seems basically achievable - in a world where there is one generally-intelligent robot for every human, you just need 30% of those robots to make a copy of themselves once a year. In a future of real-world general intelligence, exponentials become very powerful.
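To give a feel for how much the growth rate matters, here is a small sketch of the compound-growth arithmetic behind the figures above. The function names are my own, and the model is deliberately simple: it assumes a constant annual rate and does not attempt to say how a 30% robot replication rate translates into a particular energy growth rate.

```python
import math


def years_to_multiply(factor: float, annual_growth: float) -> float:
    """Years of compound growth at `annual_growth` needed to grow by `factor`."""
    if factor <= 0 or annual_growth <= 0:
        raise ValueError("factor and annual_growth must be positive")
    return math.log(factor) / math.log(1.0 + annual_growth)


def fleet_after(initial: float, replication_fraction: float, years: int) -> float:
    """Fleet size if `replication_fraction` of robots each build one copy per year."""
    return initial * (1.0 + replication_fraction) ** years


if __name__ == "__main__":
    # Compare the two energy growth rates mentioned in the text.
    for rate in (0.01, 0.15):
        print(f"{rate:.0%} growth: doubling takes {years_to_multiply(2, rate):.1f} years")
    # Growth of a robot fleet at the 30% replication rate mentioned in the text.
    print(f"fleet multiple after 10 years at 30%: {fleet_after(1.0, 0.30, 10):.1f}x")
```

Under these assumptions, doubling takes roughly 70 years at 1% growth and roughly 5 years at 15%, and a fleet replicating at 30% per year grows by a bit under 14x in a decade.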
Success means something like ending scarcity. World hunger and poverty will be eradicated. Humanity will build planet-scale particle accelerators and general intelligence will spread out across the galaxy. So, pretty cool stuff.
Footnotes
1. By this I mean, we are now firmly in a position where if we can execute well on engineering, we will have a great business, regardless of what other companies in the space are doing.
2. Incidentally, when Jerry Pratt, the CTO of Figure, left to start a new company, his pitch was almost exactly premised on this argument, despite several concrete developments indicating that it is no longer as iron-clad as it might have seemed when Figure was getting started.
3. This is the type that is used by every actuator in every humanoid robot that exists today. There are variations on how to apply mechanical advantage, but the bulk of the price of the actuator comes from the copper and magnets that make up the actuator itself.
4. Except for cases when you don't actually intend to do much walking, and you want the robot to be able to stand in place for a long time without using any energy - although, then what is the point of having legs?
5. While true today, I think there are a few organizations out there who are very keen to get these regulatory barriers set up as quickly as possible, which is why I think it's so important to move quickly now.