The Data is the Specification: A Manifesto for Iteratively Solving Complex Problems


The process of developing self-driving delivery robots that can safely navigate urban environments has required us to solve many complex problems. In doing so, we have developed and refined a process for efficiently and effectively solving these problems. We want to share the generalized version of this process and the journey we took to get there.

We used to believe that solving complex problems was about forming a complete understanding of the problem and specifying its solution. The advent of machine learning has shown we can solve problems by instead optimizing directly against data, even without at first fully understanding the problem or the solution.

This way of thinking applies more generally, both in software development and in many other fields. We believe it is possible to achieve superhuman solutions by finding the easiest subset of problem cases, solving it, and repeating this process as many times as necessary.

From all of our collected experiences, we have extracted three core principles:

  1. Instead of coming up with a good general solution, it is better to focus on solving specific cases of the problem.
  2. Instead of trying to solve all problematic cases right away, it is better to address a proportion of the easiest cases and repeat the process multiple times.
  3. Instead of writing a good specification of a solution, it is better to curate a collection of good problem cases. This collection of problem cases then essentially becomes your specification.

In the early days at Starship Technologies, our robots were accompanied by a human in public areas, just as self-driving car companies currently put safety drivers behind the wheel. Our technology was not yet advanced enough to operate without supervision. By 2017 it had taken major steps forward, and our smart six-wheeled robots could operate in public spaces on their own. We were the first company in the world to accomplish this.

To allow a robot to drive unsupervised, you need to ensure the safety of both the robot and the people it interacts with. To that end, we started measuring and quantifying our safety metrics long before 2017. Our first results fell short of the targets we knew we would have to reach by a factor of 250. Clearly, we had some work to do.

Improving something by 250x is an audacious goal, but the best startups are, by their nature, audacious. Our process for dramatically improving safety was methodical and incremental. We made a list of occasions where the robot did not behave ideally. From this list, we took the worst 100 and tried to solve half of them by any means possible.
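The selection step described above can be sketched in a few lines of Python. This is only an illustration, not our production tooling: the `ProblemCase` fields, the severity and effort scores, and the `pick_round` helper are all hypothetical names introduced here. It assumes "the worst" means highest severity, and that the half to attack first is the easiest half, in line with the second principle.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProblemCase:
    """One recorded occasion where the system misbehaved (hypothetical schema)."""
    case_id: str
    severity: float      # higher means a worse outcome (assumed scoring)
    est_effort: float    # rough guess at the cost of fixing it (assumed)
    solved: bool = False


def pick_round(cases: List[ProblemCase], worst_n: int = 100,
               solve_fraction: float = 0.5) -> List[ProblemCase]:
    """Select the cases to attack in one improvement round.

    Take the `worst_n` most severe open cases, then keep the easiest
    `solve_fraction` of those, ordered by estimated effort.
    """
    open_cases = [c for c in cases if not c.solved]
    worst = sorted(open_cases, key=lambda c: c.severity, reverse=True)[:worst_n]
    easiest_first = sorted(worst, key=lambda c: c.est_effort)
    return easiest_first[: int(len(easiest_first) * solve_fraction)]


if __name__ == "__main__":
    # Synthetic data purely for demonstration.
    demo = [ProblemCase(f"case-{i}", severity=float(i), est_effort=float(i % 7))
            for i in range(300)]
    batch = pick_round(demo)
    print(len(batch), "cases selected for this round")
```

After a round, the cases that were fixed are marked as solved, the metrics are measured again, and the same function picks the next batch.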

This process of taking on the lowest-hanging fruit first produced marked improvements. After several months, our robots were performing twice as safely and efficiently as they were before we began the improvement process.

We repeated this process about ten times, and eventually reached our goal of 250x improvement. The journey included steps we would never have thought of in the beginning. Sometimes the solution was as easy as fixing simple but hard-to-find bugs that caused our robots to miss the edge of the sidewalk.
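To put these numbers in perspective, here is a small back-of-the-envelope calculation. It assumes, purely for illustration, that every round contributed an equal multiplicative factor; in reality the rounds surely differed, and "about ten" is itself approximate.

```python
import math

TARGET = 250.0   # overall improvement factor, from the text
ROUNDS = 10      # "about ten" rounds, from the text

# Equal-factor assumption: each round multiplies the metric by the 10th root of 250.
per_round = TARGET ** (1.0 / ROUNDS)

# Alternative assumption: every round doubles the metric, as the first one did.
doublings_needed = math.ceil(math.log2(TARGET))

print(f"equal per-round factor over {ROUNDS} rounds: {per_round:.2f}")
print(f"rounds needed at 2x per round: {doublings_needed}")
```

Under these assumptions, an average improvement of roughly 1.74x per round is enough to reach 250x in ten rounds, and eight rounds would suffice if every round doubled the result. No single round needs to be heroic; the compounding does the work.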

On other occasions, the solutions were not about software at all, but about changing the processes for the human operators who occasionally oversee robots over a video link in especially demanding places. At times the solution was to add new sensors to the robot, not because these sensors were superior in general capability, but because they solved a handful of real-life cases that didn’t have other solutions.

Our manifesto for iteratively solving complex problems is an extension of test-driven development, in which, instead of manually written test cases, you test against real historical data. Agile and Lean methodologies, along with data-driven decision making, also focus on the iterative application of the build-measure-learn loop, but we believe in even faster iteration cycles.
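As a minimal sketch of what testing against real historical data can look like, the snippet below replays recorded cases through a decision function and reports the ones that still fail. Everything here is hypothetical: the `RecordedCase` schema, the observation keys, and the toy `decide` policy are stand-ins invented for illustration and do not describe our actual system.

```python
from typing import Callable, Dict, List, NamedTuple


class RecordedCase(NamedTuple):
    case_id: str
    observation: Dict[str, float]   # hypothetical summary of logged sensor data
    expected_action: str            # what a reviewer decided should have happened


def decide(observation: Dict[str, float]) -> str:
    """Toy stand-in for the system under test: stop only when close to a curb."""
    if observation.get("curb_distance_m", float("inf")) < 0.3:
        return "stop"
    return "drive"


def replay(cases: List[RecordedCase],
           policy: Callable[[Dict[str, float]], str]) -> List[str]:
    """Return the ids of cases where the policy disagrees with the expected action."""
    return [c.case_id for c in cases if policy(c.observation) != c.expected_action]


# The curated collection of cases plays the role of the specification.
CASES = [
    RecordedCase("curb-001", {"curb_distance_m": 0.2}, "stop"),
    RecordedCase("open-002", {"curb_distance_m": 2.0}, "drive"),
    RecordedCase("ped-003", {"curb_distance_m": 1.5, "pedestrian_distance_m": 0.5}, "stop"),
]

if __name__ == "__main__":
    print("still failing:", replay(CASES, decide))
```

Here the toy policy handles the two curb cases but fails `ped-003`, because it ignores pedestrians. The list of failing cases is the work queue for the next round, and every newly recorded case extends the specification without anyone having to rewrite it.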

Both knowledge aggregation and solution creation can be improved by iterating directly against the data itself, as we suggest above. In doing so, you don’t need to write complete specifications, and you don’t lose vital data about what is really happening. Over time, working through the cases this way can build a well-rounded understanding of the problem.

We hope you find this manifesto useful in your own work. If you want to discuss any of it with us, or find out more about our problem-solving journey, get in touch with us.


Authors

Kristjan Korjus, Taivo Pungas, Rao Pärnpuu, Ahti Heinla