{"id":"49fsxapt35agwyd","title":"How we’ll shape the web of 2026","slug":"how-well-shape-the-web-of-2026","summary":"The web of 2026 should be faster, more personal, and less bloated. The best work will use AI and motion carefully instead of making every page feel like a demo reel.","imageUrl":"https://briancrabtree.me/images/journal-how-well-shape-the-web-of-2026.webp","category":"UX Theory","date":"2025-12-13","featured":false,"likes":44,"author":"Brian Crabtree","content":"<h2>Architecting for Chaos</h2>\n\n<p>The transition we are undertaking is moving us away from comfortable, predictable request-response cycles. The old monolithic patterns, and even the \"simple\" microservices we spent the last ten years arguing about, are too slow and too rigid for this. To build a system that reacts to user intent in under 100 milliseconds, you cannot rely on a chain of synchronous HTTP calls. You need a nervous system.</p>\n\n<p>We are building highly distributed, event-driven architectures where the application state is a moving target, constantly mutated by inference engines. This is not optional. If your application waits for the user to click a button before it starts thinking, you have already lost. The system needs to be executing ahead of the interaction.</p>\n\n<h2>The Distributed Intelligence Layer</h2>\n\n<p>Let’s talk about the so-called AI Copilot. Marketing departments love this term because it sounds helpful and friendly. To an engineer, it is a nightmare of race conditions and consistency issues. This is not a single service you can containerize and forget. It is a diffuse layer of logic that sits between your backend and the user’s screen.</p>\n\n<p>This intelligence layer operates as a continuous feedback loop. It watches, it thinks, and it mutates. To make this work without turning your application into a sluggish mess, you need to be fanatical about four specific architectural pillars: the event stream, edge inference, the orchestration service, and the adaptive frontend.</p>\n\n<p>Everything is an event now. 
A mouse movement, a scroll depth change, a shift in device orientation, or a sudden drop in network quality. In the past, we ignored most of this noise. Now, it is the fuel for the inference engine. You need to capture high-volume, low-latency streams without choking the main thread.</p>\n\n<p>We are using tools like Apache Kafka or AWS Kinesis not because they are trendy, but because they handle backpressure. If you try to build this with a standard REST API, you will DDoS yourself. The critical piece that most junior engineers miss here is schema enforcement. You cannot just throw JSON blobs into a topic and hope for the best. You need strict typing using Avro or Protobuf. If the data quality in your stream degrades, your AI starts hallucinating, and your UI breaks. Garbage in, absolute chaos out.</p>\n\n<p>Inference is where I see the most bloat. You cannot round-trip to a centralized GPU cluster for every interaction. The latency will kill the perceived intelligence of the application. If the user hesitates over a \"Buy\" button and you want to offer a dynamic discount, you have milliseconds to decide. A round trip to <code>us-east-1</code> takes too long.</p>\n\n<p>We are pushing inference models out to the edge. We are using edge functions or, better yet, running them directly in the browser. WebAssembly has finally found its killer use case here. We run models with ONNX Runtime or TensorFlow.js and execute them on the client device. This keeps the compute cost off our cloud bill and keeps the data local, which is a massive win for privacy. However, this introduces a new constraint: bundle size. You cannot ship a 500 MB model to a mobile phone. You need quantized, lightweight models that do one thing well, rather than a massive general-purpose LLM that tries to write poetry when you just need it to sort a list.</p>\n\n<p>Somewhere, a decision has to be made. The orchestration service is the brain that sits on top of these inference results. 
It takes the prediction from the model, combines it with the user's profile from a low-latency key-value store like DynamoDB, and applies the hard business rules that keep the legal team happy.</p>\n\n<p>This orchestrator constructs what I call the personalization payload. It is a concise instruction set that tells the frontend how to mutate. It might say, \"Swap the hero image for Video B,\" or \"Change the call-to-action text to aggressive mode.\" This service is a single point of failure for the experience. It needs circuit breakers and aggressive bulkheads. If the AI service times out, the orchestrator must immediately fall back to a \"dumb\" default. A static site is better than a broken spinner. We also see a lot of Retrieval-Augmented Generation (RAG) here, grounding the AI responses in actual company data so we don't end up promising products that don't exist.</p>\n\n<p>The frontend is where the rubber meets the road, and usually, where the tires blow out. Modern frontend frameworks like React, Vue, or Svelte are powerful, but they struggle with the kind of dynamic, AI-driven mutations we are throwing at them. We are asking components to be \"intelligent,\" to subscribe to streams and rewrite their own DOM structure on the fly.</p>\n\n<p>The biggest technical hurdle here is hydration mismatch. If you are doing Server-Side Rendering (SSR) for performance, and you should be, the server paints one version of the world. Then the client-side JavaScript wakes up, runs an inference model, and decides the world should look different. If you aren't careful, the screen flickers, the layout shifts, and the user feels like they are fighting the interface. This is the Uncanny Valley of web design.</p>\n\n<p>To fight this, we are moving toward islands architecture and partial hydration. We keep the majority of the page static and lightweight, isolating the \"smart\" components into self-contained sandboxes. We minimize the JavaScript payload at all costs. 
The goal is to have the initial paint be instant, and to have the intelligence layer fade in gracefully without jarring the user. We are effectively hiding the complexity of a supercomputer behind a CSS transition.</p>\n\n<h2>The Unforgiving Baseline</h2>\n\n<p>None of this matters if the site is slow. I don't care how smart your AI is; if the Largest Contentful Paint takes three seconds, the user is gone. The challenge for senior engineers today is not just adopting these new technologies but resisting the urge to over-engineer them. It is tempting to add every bell and whistle, every sensor input, and every possible personalization vector.</p>\n\n<p>But the best systems are the ones that know when to shut up. We are building for intelligent adaptability, but we must build on a foundation of raw performance and accessibility. We need to respect the device, respect the battery life, and respect the user's data. This isn't just about writing code anymore; it is about architectural discipline in an era of infinite compute and infinite bloat. Keep it fast, keep it smart, and for the love of code, keep it simple. For a related angle I keep coming back to, see <a href=\"/journal/how-this-site-is-built/\">How This Site Is Built (Reference Stack)</a>.</p>","tags":["React Server Components","Next.js App Router","Kubernetes Operators","GraphQL Subscriptions","WebAssembly Modules"],"views":122}