On craft and automation
excerpted from an email conversation about creating with LLM-enabled tools
I think there is probably a different kind of enjoyment that we could learn to feel when creating with such a heavy level of intermediation.
As I've gotten further into my (still pretty short) career, I've developed a deeper understanding of what some of my founder/leader friends call the "physics of business". In the same way that building software is about organizing software abstractions in just the right arrangement to make a system easy to reason about and improve, there's a skill to organizing people, economic incentives, communications, and networks of trust in a way that makes the whole people system (companies, partnerships) possible to debug and predict and improve.
Just as good software engineers seem to find deep craft in designing elegant software systems that hum and perform reliably, I've found that good people-systems-organizers can find similar craft in designing people systems / companies that hum along to do big things in the world. There is a kind of elegant physics to teams that are working really, really well, and that physics is often carefully balanced by one system designer or many.
So, I guess what I'm saying is that there may be a craft to building software at a distance, with things like LLM agents chugging along beneath your hands, but that feels to me like a distinct kind of joy from working directly in the medium of code. And just as we shouldn't expect the same people to enjoy both coding and managing all of the time, we shouldn't expect people to enjoy coding "by themselves" and coding "with a team of agents" equally. And I think that is OK. And I think the people like you and me who like the craft of working directly with the system will find ways to accelerate ourselves in our craft.
I was joking with a friend earlier today that, if you can't work as fast as the best "agent-using" engineers using just code autocomplete, your codebase probably needs to be better designed. I kind of do mean it, though: in well-written codebases you can say a lot with a few words. Similarly, good writers don't need an LLM to write hundreds of words quickly, because they know how to say the same thing in a dozen perfect words.
Maybe the question I'll leave here at the end is: how can we better organize our creative environments so we can say a lot with a few movements of our hands?
A founder friend of mine working on Flora told me about a mission statement I've always found really inspiring: he wants to allow artists to "speak beauty into existence." I love this phrase because all of the focus is on the precision of the words, and the ease with which the words can conjure ideas into being. There has got to be a way to get there by finding better ways to speak of beauty, rather than by mechanizing the means of beauty-production. It is a harder problem; language and culture move at the speed of people rather than of computers. But I think it is what really yields new understanding about the world, and where real progress as a culture is made.
I would love to see fewer developer tools pitch "set up in minutes" and more of them pitch "we stay available and when things break we will be observable and debuggable".
Tragically rare priorities for modern software infrastructure products.
Problems of search and problems of learning, as far and wide as the eye can see.
And all this land, for the taking, if you are bright-eyed enough to see it and to seize it.
AI systems with research taste
How might we design an AI system skilled at asking questions whose verifiable answers/rewards are maximally useful for RL?
This would be like a "good research question taste" model, which would allow us to assemble a training set of more sample-efficient questions with verifiable answers. Like a teacher who guides a student by asking the right questions at the right time, a system optimized for good research taste would help humanity advance the frontier of unsolved problems as efficiently as possible, especially when paired with "good problem-solving taste" AI systems.
Good question taste may be a much harder technical accomplishment than good problem-solving taste, but it feels far more fundamental to creating intelligence that meaningfully reduces the cost of new science.
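To make this slightly more concrete, here's a minimal sketch of one loop such a system could run, under assumptions of my own: the names (propose_questions, learning_progress) and the toy proposer/solver/verifier are all hypothetical, and scoring questions by how close the solver is to a 50% success rate is just one simple proxy for "maximally useful for RL", not something the note above commits to.

```python
import random

def propose_questions(proposer, n):
    """Sample n candidate questions from the question-taste model."""
    return [proposer() for _ in range(n)]

def learning_progress(solver, question, verify, trials=8):
    """One proxy for a question's training usefulness: estimate the
    solver's success rate p on the question and score it p * (1 - p),
    which peaks when the solver succeeds about half the time, i.e. at
    the frontier of its current ability."""
    successes = sum(verify(question, solver(question)) for _ in range(trials))
    p = successes / trials
    return p * (1 - p)

# Toy stand-ins so the sketch runs end to end: "questions" are integers,
# the solver guesses blindly, and the verifier checks exact equality.
proposer = lambda: random.randint(0, 10)
solver = lambda q: random.randint(0, 10)
verify = lambda q, answer: answer == q

candidates = propose_questions(proposer, n=20)
best = max(candidates, key=lambda q: learning_progress(solver, q, verify))
print("most training-useful candidate question:", best)
```

In a real system, the proposer would itself be trained on these scores, closing the loop between question taste and problem-solving taste.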
Far-flown ones, you children of the hawk's dream future, when you lean from a crag of the last planet on the ocean
Of the far stars, remember we also have known beauty.
— Robinson Jeffers
Early lessons about independent exploration
Someone recently asked for my advice on pursuing independent research and exploration. My answer ended up long and winding, so I thought it might be useful for others to read and contemplate. Though I'm mostly speaking from my experience in machine learning and human-computer interaction, I think my general takeaways apply to many fields.
Eventually, this will end up on my blog. In the meantime, here's a less polished thought dump.
The biggest lesson I’ve learned is that a research field is simply a community of people who share (1) a small set of problems they agree are important and interesting, and (2) a set of investigative methods to go after those problems and uncover new knowledge. This definition of a research field is broader (and, I’d argue, more accurate) than the version tied strictly to academia, at least if your main goal is to make a meaningful discovery or claim about the world that matters beyond your own curiosity.
Given that framing, one way to think about how to make use of an independent exploration period is to figure out what community you want to contribute knowledge to, learn where those people congregate, identify the problems they consider significant, and become familiar with how that community evaluates and integrates new ideas into their canon. You can then use that understanding to talk about problems of interest to you in a way that makes the community listen, and frame your solutions/ideas/discoveries in ways that have a high chance of nudging that community in a direction you believe is right.
For instance, my current work intersects two communities: the interpretability research community closer to ML academia, and the more commercially oriented “tools for thought” or “HCI for AI” community. When I talk to the former, I focus on how my work can help debug and improve model performance. When I address the latter, I try to get people excited about the idea of a totally new way to interact with information. Each community cares about different things, so I frame my work accordingly.
Finally, the way you share your ideas—through academic conferences, open-source releases, demos, or personal networking—will vary. In general, I've found it valuable to regularly talk about what I'm working on and to always reiterate why I'm working on it, both in public and with trusted friends, because that helps others figure out whether they consider themselves part of the same community, per the definition above, as you and your work.
Be lucid about what you want to understand or enable. Know your audience. Communicate clearly and regularly.
A good thinking tool shouldn't just hand users answers to their questions; it should also guide and enable them to discover and articulate more complex questions.
Asking more complex questions and discovering their answers leads to even more nuanced questions. Without one, the potential of the other in this pair is limited.
A related thought: while building tools to solve hard problems for humans, we should strive to also deepen people's engagement with those complex problems and their solutions, as a way to preserve human agency when working with increasingly capable aids for our work. Otherwise, we risk losing touch with, and therefore understanding of, critical decisions.
Scale xor Explore, a hypothesis.
In innovation ecosystems, efficient resource allocation requires that every resource in an organization go toward only one of two kinds of spend:
- Scale: Taking some working formula for solving a problem or producing something valuable, where there is "sign of life" and a way to scale production, and single-mindedly scaling it;
- Explore: Open-ended exploration to discover new signs of life in new regimes or transformative technologies.
These feel like two distinct modes of operating a single group of people. An organization is either doing (1) or (2), and any attempt to straddle them by doing something in-between will not do what you wish it would.
So, how to blend the benefits of both?
In larger organizations, each team must be in one mode or the other, but the organization as a whole can hold a portfolio of bets that combines both approaches at the sub-team level, trading off risk tolerance against upside. Some teams can work on category (1) efforts while others operate in category (2) mode.