Random Thoughts

Homebrew, Again

Over Memorial Day, I drove by the HP garage.

There's a plaque on Addison Avenue in Palo Alto that calls a one-car garage the birthplace of Silicon Valley — California Historical Landmark No. 976. In 1938, two Stanford students, Bill Hewlett and Dave Packard, built an audio oscillator there by hand rather than joining established firms in the East. You could walk past it without looking up.

It wasn't the only one. A few miles south, in another garage, Steve Wozniak and Steve Jobs built the Apple I in 1976 (technically, they did not design it in the garage; it was more of a place to hang out). Woz designed it while still drawing a paycheck from HP. He'd offered the design to HP first, HP passed, and he carried the boards down to the Homebrew Computer Club in Menlo Park to show the people who'd understand. For a while, the whole valley was a string of “garages”, joined by the conviction that the interesting machines were the ones you built yourself.

I think about those garages as I picture the modern version.


Personal Computer

Personal computing was ideological before it was a product. The idea underneath it was that you own your tools. You don't rent them. The 1984 ad, the runner hurling a sledgehammer through the screen, is the cultural statement of exactly that idea: the machine that frees you from the mainframe.

The center of gravity in computing follows the economics of compute.

When computing was expensive, it was centralized because a smaller machine wasn’t that useful. As the cost of compute fell exponentially, it decentralized: from the late '70s through the '90s, the center was the PC. From roughly 2000 to 2020, as connectivity improved, the smartphone emerged alongside the internet, yielding a hybrid architecture.

Then large language models slammed the center of gravity all the way back.


Worse than the mainframe

The LLM era risks becoming more centralized than the mainframe era ever was.

A bank in 1970 might have rented IBM’s iron, but much of its operating knowledge still lived close to home: its data, procedures, applications, and institutional logic. The machine was expensive and proprietary, but the business did not usually rent the very capacity to think through its own work, token by token, from a remote landlord.

Now consider the AI-native startup in 2026. It may own the brand, the workflow, the prompt scaffolding, and the customer relationship. However, the core capability (the model, the training recipe, the GPUs, and the serving stack) belongs to someone else. The crown jewel is metered, revocable, and repriced at the landlord’s discretion. That isn't tenancy. It's sharecropping.

And it isn't only the startups. When you, personally, open a window and talk to a frontier model, your $3,000 laptop is a glorified VT100. The AI capex boom even made owning a personal computer more expensive due to RAM and storage shortages.

The incumbents would like it to stay this way, and they tell a story for it: powerful AI is dangerous, so it should remain with careful, well-resourced, accountable actors. Some lobby to make that the law.

But centralization is just the opening phase of a technology. Capability emerged at scale, and a model that big doesn't fit on a desk yet, so it centralized the same way computing did in 1960: as an early, room-sized necessity. The models are still moving, the hardware is still moving, and we're very early in both.

The more completely a technology centralizes in its first phase, the more valuable decentralization becomes once the hardware catches up.


Restoring Forces

Three forces swing the pendulum back.

The hardware is coming

Most queries never needed the frontier. Summarize this, answer that, and transcribe the call; we could do these locally in the future. The Apple M3 Ultra can hold a quantized 600-billion-parameter model in memory today; with more advanced packaging, such as hybrid bonding, prosumer devices’ memory bandwidth will reach multiple TB/s by the early 2030s. By then, we should be able to run a trillion-parameter mixture-of-experts at reading speed. Then the model will ship the way software always did: a license on hardware you own. A model you run on your own device cannot be repriced or throttled by a remote landlord.
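
A back-of-envelope sketch of why memory bandwidth is the number to watch. It assumes decoding is bandwidth-bound, meaning each generated token streams every active weight from memory once, and it ignores the KV cache, compute limits, and batching. The function name and all the example figures (800 GB/s, 2 TB/s, 50 billion active parameters, 4-bit weights) are illustrative assumptions, not measurements.

```python
def decode_tokens_per_second(bandwidth_gb_s, active_params_billion, bits_per_param):
    """Rough upper bound on single-user decode speed.

    Assumes every generated token reads all active weights from memory once.
    Real throughput will be lower (KV cache traffic, compute, overheads).
    """
    bytes_per_token = active_params_billion * 1e9 * bits_per_param / 8
    return bandwidth_gb_s * 1e9 / bytes_per_token


# Assumed: a dense 70B model, 4-bit weights, ~800 GB/s of memory bandwidth.
dense_70b = decode_tokens_per_second(800, 70, 4)

# Assumed: a trillion-parameter mixture-of-experts that activates ~50B
# parameters per token, 4-bit weights, 2 TB/s of memory bandwidth.
moe_1t = decode_tokens_per_second(2000, 50, 4)

print(f"dense 70B bound: {dense_70b:.1f} tok/s")
print(f"1T MoE bound:    {moe_1t:.1f} tok/s")
```

Under these assumptions the bound depends on active parameters, not total parameters, which is why a mixture-of-experts is the plausible shape for a very large local model: total size sets how much memory you need, active size sets how fast it talks.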

The standard reply is that the cloud is just cheaper. A server loads the model once and serves thousands of requests from it. A laptop loads it once, serves only you, and idles the other 23 hours. Per token, the cloud wins.

However, that assumes one has bought the hardware for inference, which is not the case. The accelerators ship inside the phones and laptops people were buying anyway, bundled into the hardware and shared across everything those devices do. The cost is sunk and general-purpose. Once it's paid for and sitting there, one more local query costs you only the electricity to run it. The cloud, however well it batches, still meters every token. For anything the local model can handle, the marginal cost is essentially zero. The cloud wins on capability, at the frontier where local can't compete, not on cost.
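
To put rough numbers on "only the electricity", here is a minimal sketch. Both function names are made up for illustration, and every input (answer length, local speed, power draw, electricity price, metered price) is an assumption you should replace with your own; the hardware purchase is treated as sunk, as argued above.

```python
def local_query_cost_usd(output_tokens, tokens_per_second, watts, usd_per_kwh):
    """Electricity cost of generating one answer on hardware you already own."""
    seconds = output_tokens / tokens_per_second
    kwh = watts * seconds / 3_600_000  # watt-seconds (joules) to kWh
    return kwh * usd_per_kwh


def metered_query_cost_usd(output_tokens, usd_per_million_tokens):
    """Cost of the same answer when every token is billed."""
    return output_tokens * usd_per_million_tokens / 1_000_000


# Assumed: a 500-token answer, 20 tok/s locally, 40 W of extra draw,
# $0.20 per kWh, and a hypothetical metered price of $1 per million tokens.
local = local_query_cost_usd(500, 20, 40, 0.20)
metered = metered_query_cost_usd(500, 1.0)

print(f"local:   ${local:.6f}")
print(f"metered: ${metered:.6f}")
```

The comparison is sensitive to the inputs, especially power draw and the metered price, so treat it as a template for the argument and not as a result.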

Diminishing marginal utility

Scaling may continue technically after it stops compounding economically. Revenue is productivity that someone values enough to pay for. Google can't 10x ad revenue by writing 10x the code, human or machine. Code was never the only bottleneck. Review, deployment, distribution, demand, and physical operations do not scale as quickly as tokens. Past “good enough,” much of the frontier becomes hole-digging: visible on the leaderboard, invisible on the income statement. By 2026, even the rhetoric began to soften. The same executives who had us bracing for entry-level white-collar work to vanish in 2025 were now talking about output multipliers, human bottlenecks, and slower-than-expected economic effects. Capability has certainly risen, but downstream productivity gains are not yet visible.

Speed of light

Information in fiber travels at about two-thirds of c (the speed of light), so a round trip to a datacenter is a few milliseconds at best and tens of milliseconds coast-to-coast, a floor you cannot engineer below, before any compute runs. Take voice: human conversation runs on ~200ms turns. The new speech-to-speech models make the model itself near-instant, which helps local, not the cloud. Once the model is fast, the network round trip is the only latency left, and it's the part physics caps. The cloud has a better model, but it can never match zero network latency. This applies to robotics, autonomous vehicles, and AR/VR as well. We settled this once: native apps have continued to dominate web apps on mobile. When you ask for an essay, you don't care about 100ms. However, for a whole class of things, local will provide a better user experience, forever, whatever the cloud costs.
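
The floor is easy to compute. This sketch assumes a straight fiber run at two-thirds of c and nothing else: no routing detours, no switching, no queuing, no compute. The distances are round illustrative figures, and the function name is made up for the example.

```python
C_KM_PER_S = 299_792.458   # speed of light in vacuum
FIBER_FRACTION = 2 / 3     # approximate signal speed in fiber, as a fraction of c


def round_trip_floor_ms(distance_km):
    """Minimum network round trip over fiber of the given one-way length."""
    one_way_s = distance_km / (C_KM_PER_S * FIBER_FRACTION)
    return 2 * one_way_s * 1000


# Assumed distances: a nearby datacenter and a roughly coast-to-coast path.
nearby = round_trip_floor_ms(300)
coast_to_coast = round_trip_floor_ms(4000)

TURN_BUDGET_MS = 200  # the ~200ms conversational turn from the text
print(f"nearby:         {nearby:.1f} ms")
print(f"coast-to-coast: {coast_to_coast:.1f} ms "
      f"({coast_to_coast / TURN_BUDGET_MS:.0%} of a turn)")
```

Real paths are longer than the straight line and add equipment delay at every hop, so measured round trips sit above this floor, never below it.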

Force one says you'll be able to. Force two says it won't be worth renting anyway. Force three says for some things you'll have to.


Where we are

We are closer to 1975 than 1995. The Altair just shipped.

The tools exist and work the way the Altair did. Ollama and LM Studio run today. A 70B model runs on an M5 Max at maybe ~15 tokens per second, usable, not pleasant. Anything past 100B still hurts. The user experience isn't exactly seamless, and the person doing it is most likely a hobbyist with opinions about quantization who isn't shy about sharing them.

Mainstream personal AI, the Apple II of this era, the version normal people would actually use, is two or three hardware generations out.

But here's the tell. Every serious vendor has already hedged into local AI. At Computex this month, NVIDIA and Microsoft unveiled RTX Spark, an Arm-and-Blackwell chip with 128GB of unified memory that runs 120-billion-parameter models on a laptop, no datacenter in the loop. Satya Nadella's own pitch for it: "unmetered intelligence to every home and every desk with Windows." We will see what Apple releases tomorrow at WWDC. My bet is local + cloud hybrid AI.


The tools come home

Not every query will come home, but the default layer of intelligence shifts from a rented frontier service to an owned ambient substrate.

It goes back to hybrid, but a smarter hybrid than the 2010 cloud computing model. The cloud keeps the hard and patient work. Local takes everything that's easy enough or urgent enough.
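
One way to picture that split is as a routing rule. This is a toy sketch, not a description of any shipping system: the `Query` fields, the `route` function, the difficulty score, and both thresholds are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Query:
    difficulty: float   # hypothetical 0..1 estimate of how hard the task is
    deadline_ms: float  # how long the user or device can wait for a response


def route(query, local_ceiling=0.6, network_rtt_ms=40.0):
    """Send a query to the local model or the cloud.

    local_ceiling and network_rtt_ms are assumed values for illustration.
    """
    if query.deadline_ms < network_rtt_ms:
        return "local"   # urgent enough: the network alone would blow the deadline
    if query.difficulty <= local_ceiling:
        return "local"   # easy enough: the local model can handle it
    return "cloud"       # hard and patient work


print(route(Query(difficulty=0.2, deadline_ms=5_000)))   # easy
print(route(Query(difficulty=0.9, deadline_ms=20)))      # urgent
print(route(Query(difficulty=0.9, deadline_ms=60_000)))  # hard and patient
```

As local hardware improves, the ceiling rises and more of the traffic stays home; the cloud's share shrinks toward the work that is both hard and patient.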

The return does not happen along one axis. First, the stack disaggregates. Businesses stop renting the frontier lab’s crown jewel and begin owning more of the model layer themselves, even if they still serve it from the cloud. In fact, it's already happening. Cursor started as a wrapper over Anthropic's and OpenAI's APIs; now it trains and serves its own coding model. The latest was built by continuing to train on an open base (Kimi K2.5), and it is far cheaper than the API it used to resell.

Then it reaches the individual: personal AI living on your own devices, arriving after the businesses, the way institutions bought workstations a decade before the PC reached the home. Most people will probably get the bundled model, the one just there in the OS, because nobody is going to manage a GGUF file. Once the model is just there, cheap and assumed, the application layer blooms on top of it, the way the PC software explosion only came after the hardware was everywhere.

The mainframe always looks permanent, right up until a couple of hobbyists in a garage make it a footnote. The Altair was 1975, the Apple I a year later, and the same people who called the Altair a toy are calling local AI a toy right now for the same reason, and they'll be wrong the same way.

It will be homebrew, again.