LLMs and UIs - LLMs will become the UIs
The Demise of the Traditional UI
I'm sick and tired of dealing with these ever-changing UIs and UX patterns that do nothing but keep web devs miserable.
What is the goal of life? To write React web apps and deal with rerender issues? To argue about which JS/TS slop frameworks to use to make the 99th web app for your side project?
If the meaning of life is debugging useEffect dependencies, we've clearly gone wrong somewhere.
The goal of life is to present and absorb information. Every animal has been doing this to survive. Humans have been doing it for millennia. Clay tablets, bamboo sticks, paper, and screens are just mediums to convey information.
What's the best way to achieve this goal? This is the central question that has underpinned UI ever since we learned how to show graphics and text on a screen. Every medium and technique since then was developed to solve this problem. But we've strayed further and further from it as we built complex and bloated solutions to what's essentially a communication problem.
But now, LLMs have entered the picture. LLMs will be the last bridge before Brain-Computer Interfaces (BCIs) become prevalent. LLMs are so far the most effective method of presenting information to users. UIs were always just a band-aid between wetware and hardware. UI isn't about the visual elements on the machine but about the information transfer between humans, sometimes with machines as middlemen. Every UI design and paradigm we've developed is just our best guess at translating human intent into presentable information on screens and computer actions: buttons, forms, drop-downs. Most of them are clunky methods of thought and execution, with the added burden of making users learn variations of them as UIs constantly change.
LLMs change that paradigm entirely by eliminating most UI elements and instead focusing on conveying information to the user in the best way possible, using a combination of text, audio, and visuals - all while determining when and how to use each one appropriately.
LLMs are the ultimate dispensers of information - they can guesstimate user intent, format data for presentation, execute function/tool calls, and, with the right system prompts, be tailored to user needs. All of this can be presented in a simple chat interface with multi-modality, code execution, and, in some cases, file preview capabilities. Take Claude's chat interface: it has 4 default styles to choose from, which lets the LLM adapt its presentation of information to the user's preferences. Doing this with web frameworks and traditional UIs is an extremely tedious process with a lot of overhead.
Think of how you access information daily - websites, screen readers, mobile apps, and paper. Imagine a data analyst seeking sales insights. Instead of navigating complex menus and interfaces, they simply state: "Show me sales trends for Q4, but only for regions that underperformed." Immediately, the LLM interprets the request, retrieves the data (leveraging SQL), generates a relevant visualization, and provides key insights. No more need for dedicated visualization tools like Tableau or PowerBI – just a direct path from intent to actionable understanding.
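A hedged sketch of what that translation could look like under the hood - the `run_sql` tool name, its schema, and the table names are hypothetical, not any provider's or warehouse's actual API:

```typescript
// Hypothetical tool definition handed to the LLM (illustrative names only).
const runSqlTool = {
  name: "run_sql",
  description: "Run a read-only SQL query against the sales warehouse and return rows as JSON.",
  parameters: {
    type: "object",
    properties: {
      query: { type: "string", description: "A read-only SELECT statement." },
    },
    required: ["query"],
  },
};

// What the model might emit for "Show me sales trends for Q4, but only for
// regions that underperformed" - a structured call, not prose. The sales and
// targets tables are assumptions for the sake of the example.
const emittedCall = {
  tool: "run_sql",
  arguments: {
    query: `
      SELECT region, month, SUM(revenue) AS revenue
      FROM sales
      WHERE quarter = 'Q4'
        AND region IN (SELECT region FROM targets WHERE attainment < 1.0)
      GROUP BY region, month
      ORDER BY region, month;
    `,
  },
};

console.log(JSON.stringify(emittedCall, null, 2));
```

The analyst never sees the SQL unless they ask for it; they see the chart and the takeaway.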
Currently, widespread LLM-based UIs are impractical due to infrastructure limitations and the limited reach of small LLMs on devices. However, rapid advancements in chip design and LLM architecture are poised to overcome these constraints, making widespread adoption a near-future prospect.
The future of interfaces isn't about arranging pixels - it's about conversation with information that comes from a multitude of sources. Whether it's with a machine, a human, or a machine in the middle of two humans, LLMs will be the solution. Just as humans evolved from hieroglyphs to natural language, LLMs will enable a return to methods with less friction and overhead - freeing humans from having to think in clicks, swipes, and touches. This is why chatting with LLMs is such a powerful interface, and we're still early; there's plenty of room for optimization.
Humans communicate their intents to each other via numerous methods. If they're close to each other, they talk. If they're far away, they still talk, but using their phone. Humans understand concepts via conversations. Two humans conversing is the default mode of communication. These are obvious things.
Yet, this doesn't seem to be so obvious to most UI/UX designers.
They say UI/UX designers spend years perfecting their art. Meanwhile, my mom just had a more meaningful conversation with ChatGPT about her grocery list and shopping items than she's had with any website in the past decade.
An 'intelligence-first' paradigm built on LLMs is the natural progression of UI design. LLMs are fluent in natural language queries; websites aren't. We've forced people to think in terms of visual elements to communicate their intent instead of just using natural language. This is why Whisper and other Speech-To-Text (STT) systems are such game-changers for UI/UX.
No one wants to spend 10 minutes learning a website's unfamiliar UI/UX. This is evident in how conversion and click analytics on e-commerce websites show that a site needs to be intuitive before a user will actually purchase anything. You don't want Joe Shmuck, aged 55 and ready to spend $100, to spend a minute closing ads and pop-ups when he just wants to buy a functional backpack from your e-commerce website. Every visual element is overhead on the user's cognition and attention span. This is why OpenAI even rolled out that 1-800-CHATGPT feature - conversation via voice is the default mode for most people. Humans want their intents to be understood, and LLMs understand them better than clunky UIs built from visual elements do.
I think in the future, when any UI can be made using AIs, it's going to be just LLMs as the front-end in some chat-like app, with Web Components (see: https://developer.mozilla.org/en-US/docs/Web/API/Web_components) as a sort of 'extension' - the same way browser extensions exist for browsers right now.
One big advantage of Web Components (WCs) is that they're standardized and designed to be modular. You can write a simple extension for a chat-like interface for anything that requires dynamic interaction - options, forms, etc. - without being burdened by the additional complexity of a framework. It's a more intuitive approach: a WC works like a Lego piece you can use anywhere it's supported.
Web Components offer the following advantages:
- framework-agnostic (no more react vs vue drama)
- self-contained (they handle their own styling and behavior)
- reusable across different LLM interfaces
- standardized (browsers support them natively)
So what do they look like? Or rather, what can they look like?
Look at Claude Chat's interface. You see those code snippets displayed within the chat itself with proper syntax highlighting? How about that code execution sandbox that lets you play around with HTML/CSS/JS/React right in the interface? That's just a preview of what they can be. Another example is the widgets Perplexity uses to display data - charts for financial data, the Election 2024 Map, info cards, and scoreboards for sports matches. Granted, these are all built with React. But imagine a future where they don't need to be. That's the future we deserve - unburdened by what has been.
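To get a feel for how small such a piece can be, here's a minimal sketch of a hypothetical `product-grid` custom element. The tag name and its `products` attribute are made up; `customElements.define` and shadow DOM are the standard Web Components APIs:

```typescript
// A minimal, framework-free custom element. <product-grid> and its "products"
// attribute are hypothetical; the mechanism itself is plain Web Components.
class ProductGrid extends HTMLElement {
  connectedCallback() {
    const products = JSON.parse(this.getAttribute("products") ?? "[]");
    const shadow = this.attachShadow({ mode: "open" });
    shadow.innerHTML = `
      <style>
        .grid { display: grid; grid-template-columns: repeat(auto-fill, minmax(160px, 1fr)); gap: 8px; }
        .card { border: 1px solid #ddd; border-radius: 8px; padding: 8px; }
      </style>
      <div class="grid">
        ${products
          .map((p: { name: string; price: number }) => `<div class="card">${p.name} - $${p.price}</div>`)
          .join("")}
      </div>
    `;
  }
}

customElements.define("product-grid", ProductGrid);
```

When the LLM wants to show results, it just emits `<product-grid products='[...]'>` into the chat surface - no build step, no framework runtime, and it renders the same way in any host that supports custom elements.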
Using WCs, whether a user is in the LLM's chat interface or just in voice mode, the LLM only needs tool calling/function calls to select the appropriate WC for the context whenever needed. Let's go back to our earlier example about Joe Shmuck:
Current Flow:
"Alright Joe, first click this banner. No, not that one - that's an ad. Now scroll past these popups about newsletter subscriptions. See those tiny filter buttons? Yeah, click 'Bags' then 'Backpacks' then 'Price Range' then drag this slider to $100. Oh wait, you accidentally clicked 'Handbags'. Let's start over..."
Now, let's reimagine that with an LLM + WCs (using tool calls).
LLM + Web Components Flow:
Joe: "I need a sturdy backpack around $100"
LLM spawns product-grid component with pre-filtered results
LLM: "Here are some options. This North Face one's popular with hikers"
Joe: "That looks good. Does it fit a laptop?"
LLM swaps to product-detail component showing compartment specs
LLM: "Yes, fits up to 15-inch laptops. Want to see it in different colors?"
Joe: "Nah the black is fine. I'll take it"
LLM seamlessly handles payment through saved preferences.
LLM: "Perfect, you're all set. I'll keep you posted on the important stuff - when it ships, when it's out for delivery, and when it's at your door. No need to check tracking numbers or dig through emails. Anything else you need?"
See the difference? No cognitive overhead. No learning curve. Just pure intent-to-action. It's like having a knowledgeable sales assistant who actually knows what they're doing, instead of a maze of buttons, menus, and popups fighting for Joe's attention. This is the difference between an 'intelligence-first' paradigm and a 'framework-first' one.
The AI will handle everything else after the checkout. No more hunting down tracking numbers or playing shipping carrier roulette. Say goodbye to the endless spam of "your order has been received" emails. With this, Joe gets only the important updates, when they're important. The LLM becomes his personal shopping concierge, quietly tackling all the boring parts of online shopping. Joe stays perfectly in the loop, feeling confident and at ease with his purchase.
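Under the hood, Joe's whole exchange can be driven by a couple of tool definitions. Here's a hedged sketch - `search_products` and `render_component` are hypothetical names, not any real store's or provider's API:

```typescript
// Hypothetical tools the chat surface exposes to the LLM. Names and schemas are
// illustrative assumptions only.
const tools = [
  {
    name: "search_products",
    description: "Search the catalog and return matching products.",
    parameters: {
      type: "object",
      properties: {
        query: { type: "string" },
        category: { type: "string" },
        maxPrice: { type: "number" },
      },
      required: ["query"],
    },
  },
  {
    name: "render_component",
    description: "Render a Web Component inline in the chat with the given attributes.",
    parameters: {
      type: "object",
      properties: {
        tag: { type: "string", enum: ["product-grid", "product-detail", "checkout-summary"] },
        attributes: { type: "object" },
      },
      required: ["tag"],
    },
  },
];

// For Joe's "I need a sturdy backpack around $100", the model might chain:
const calls = [
  { tool: "search_products", arguments: { query: "sturdy backpack", category: "backpacks", maxPrice: 100 } },
  { tool: "render_component", arguments: { tag: "product-grid", attributes: { products: "(results from the search call)" } } },
];

console.log(JSON.stringify({ tools, calls }, null, 2));
```

The "UI" here is two declarative calls. Everything Joe sees is a component the LLM chose to render, not a page someone hand-assembled.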
No need for any goddamn CSS classes stacked on top of each other to the point of reaching 25+ classes in a single HTML element (I'm looking at you, Perplexity. You guys are the worst offenders in recent memory). No need for complex state management. No more JavaScript framework wars.
And We Will Be Happy.
Notice how, in the LLM + WCs example, the accessibility component of UI and UX has also been quietly solved? Let's consider another example with a focus on accessibility.
An elderly person (perhaps living alone) wanting to buy groceries online currently faces a gauntlet of complex menus and form validations. With an LLM-powered interface, they could simply say, "I need my usual groceries, but add some bananas this time." The LLM would intuitively understand their shopping history, preferences, and handle all the underlying complexity.
This approach is particularly powerful for vision-impaired users because the interaction model remains consistent. They are not relegated to a "special" version of the interface, nor do they have to contend with screen reader inconsistencies. Instead, the LLM adapts its responses and tool usage to the user's specific needs, ensuring they feel like they're using a full-fledged service, not a compromised alternative. Do you doubt this? Well, in some markets and local communities, there are still people who take orders via phone call or mail. The difference in the example I gave is that LLMs will handle those kinds of situations better than any human can - because LLMs can be consistent. They don't get emotional. They don't get stressed. Unless you want them to.
So if the future of the front-end systems will look like this, what would the future of the back-end look like?
The current state of back-end systems, at least those that are well-designed, is fine for handling LLMs. After all, an LLM can read JSON/XML/HTML responses just fine. For the most part, the back-end developers can enjoy their lives.
Nah, I'm kidding. At least with the latter.
Back-end systems will need to be optimized for real-time responsiveness at massive scale. Websites today already handle thousands of concurrent requests under heavy traffic; expect that to grow at least linearly (if we're lucky) with the volume of API calls driven by LLM tool calls/function calls, many of them involving multiple retries. Concurrency and parallelism will need to be prevalent.
Remember the example earlier regarding Joe's smooth backpack purchase facilitated by an LLM? Let's break down how this actually works today (the steps are sketched in code after the list). When Joe says, "Show me black backpacks under $100," the LLM doesn't just take that as a vague request. Instead, it:
- Parses the Intent: It translates that natural language request into structured API parameters, essentially understanding what the user really means in the language of a database or service.
- Makes API Calls: It then uses those parameters to make the correct calls to the product APIs, fetching the relevant information.
- Interprets Responses: It understands the JSON or XML responses from those APIs, extracting the data it needs.
- Translates to Conversation: Critically, it doesn’t just present that raw data to the user. It translates it back into a natural, conversational form, like "Here are some black backpacks under $100."
- Handles Errors: It can also handle error states and edge cases smoothly, like if no backpacks match that criteria, responding with something like, "I couldn’t find any exact matches, but here are some similar options."
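Strung together, those five steps look roughly like this. A sketch only - `searchProducts` is a stand-in for whatever product API the backend actually exposes:

```typescript
// A sketch of the five steps strung together. searchProducts is a stub standing
// in for a real product API, included only so the example runs.
type ProductQuery = { category: string; color?: string; maxPrice?: number };
type Product = { name: string; price: number; url: string };

async function searchProducts(q: ProductQuery): Promise<Product[]> {
  return [{ name: "North Face Borealis", price: 98, url: "https://example.com/borealis" }];
}

// Step 1: "Show me black backpacks under $100" parsed into structured parameters.
const parsed: ProductQuery = { category: "backpacks", color: "black", maxPrice: 100 };

async function answer(q: ProductQuery): Promise<string> {
  try {
    const results = await searchProducts(q); // Step 2: make the API call
    if (results.length === 0) {
      // Step 5: graceful handling of "no match"
      return "I couldn't find any exact matches, but here are some similar options.";
    }
    // Steps 3-4: interpret the response and translate it back into conversation
    const top = results.slice(0, 3).map((p) => `${p.name} ($${p.price})`).join(", ");
    return `Here are some black backpacks under $100: ${top}.`;
  } catch {
    return "The product service seems to be having trouble - want me to try again?";
  }
}

answer(parsed).then(console.log);
```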
Right now, most of this is handled by the AI labs that serve LLM inference via APIs, or by third-party LLM inference providers. In the future, these systems will need to be upgraded properly, with a focus on observability and Chaos Engineering (think fault tolerance and controlled system-wide slow-downs). Here are some of my guesses on which aspects of back-end systems will need improvements.
Real-Time Responsiveness
Backend systems will need immediate responsiveness, since natural conversations are inherently time-sensitive. Consider an e-commerce platform where an LLM simultaneously verifies inventory, pricing, and user preferences while maintaining a seamless conversational flow. This requires the following (a streaming sketch comes after the list):
- Streaming-first architectures capable of transmitting data in real-time.
- Event-driven systems that can manage asynchronous updates, such as shipping notifications for user orders.
- Advanced caching layers designed to anticipate subsequent LLM queries.
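As a rough illustration of "streaming-first", here's a sketch of a Server-Sent Events endpoint that flushes each partial result the moment its lookup resolves; the route, payloads, and lookups are hypothetical:

```typescript
import { createServer } from "node:http";

// Minimal Server-Sent Events endpoint: push partial results to the chat surface
// as soon as each backend lookup (inventory, pricing, preferences) resolves,
// instead of waiting for all of them.
const server = createServer(async (req, res) => {
  res.writeHead(200, {
    "Content-Type": "text/event-stream",
    "Cache-Control": "no-cache",
    Connection: "keep-alive",
  });

  const send = (event: string, data: unknown) =>
    res.write(`event: ${event}\ndata: ${JSON.stringify(data)}\n\n`);

  // Stubbed lookups; in reality these would hit inventory/pricing/preference services.
  const lookups: Array<[string, Promise<unknown>]> = [
    ["inventory", Promise.resolve({ sku: "backpack-42", inStock: true })],
    ["pricing", Promise.resolve({ sku: "backpack-42", price: 98 })],
    ["preferences", Promise.resolve({ userId: "joe", color: "black" })],
  ];

  // Each result streams out the moment it settles.
  await Promise.all(lookups.map(async ([name, p]) => send(name, await p)));
  res.end();
});

server.listen(3000);
```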
Tool Orchestration
The backend acts as a critical intermediary, enabling LLMs to leverage external services effectively. This is largely facilitated through the Tool Calling capabilities offered by LLM inference providers, where LLMs consult predefined catalogs of available tools (one possible catalog is sketched after the list). This process entails:
- Dynamic Tool Discovery: LLMs, through mechanisms provided by the inference provider, can automatically identify and access standardized APIs listed within the catalog. This eliminates the need for manual configuration, allowing the LLM to adapt to new functionalities dynamically.
- Function Catalog Lookup: The inference provider furnishes a function registry or catalog, allowing LLMs to understand the purpose and input requirements of each available tool. This empowers the LLM to choose the appropriate tool for a specific task, guided by the catalog's descriptions.
- Provider-Managed Middleware: LLM inference providers typically handle rate limiting, retry logic, and other operational considerations internally, streamlining the interaction between the LLM and the external services. This offloads the complexity of these tasks from the backend application, promoting smoother and more efficient operation.
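What such a catalog might look like from the backend's side, as a sketch - the registry shape here is an assumption, not any provider's actual format:

```typescript
// Hypothetical function catalog the backend publishes so an inference provider
// can surface the tools to the LLM.
type ToolSpec = {
  name: string;
  description: string;
  parameters: Record<string, unknown>; // JSON Schema for the arguments
  endpoint: string;                    // where the backend actually serves it
};

const catalog: ToolSpec[] = [
  {
    name: "check_inventory",
    description: "Return stock levels for a given SKU.",
    parameters: { type: "object", properties: { sku: { type: "string" } }, required: ["sku"] },
    endpoint: "/internal/inventory",
  },
  {
    name: "create_order",
    description: "Place an order for the current user.",
    parameters: { type: "object", properties: { sku: { type: "string" }, quantity: { type: "number" } }, required: ["sku"] },
    endpoint: "/internal/orders",
  },
];

// The provider-facing view strips internal details; the LLM only ever sees
// name, description, and the argument schema.
const providerView = catalog.map(({ name, description, parameters }) => ({ name, description, parameters }));
console.log(JSON.stringify(providerView, null, 2));
```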
Enhanced State Management
Conversational state presents challenges distinct from traditional session management. Systems must do the following (one possible state shape is sketched after the list):
- Maintain context across multiple turns of an interaction.
- Process partial updates without disrupting the conversation's flow.
- Manage user preferences and interaction history in a user-friendly and intuitive manner.
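One possible shape for that conversational state, sketched below; the field names are illustrative assumptions:

```typescript
// A sketch of conversational state that survives across turns.
type Turn = { role: "user" | "assistant" | "tool"; content: string; at: number };

interface ConversationState {
  userId: string;
  turns: Turn[];                        // full context across the interaction
  pending: Record<string, unknown>;     // partial updates (e.g. a half-filled order)
  preferences: Record<string, string>;  // "usual groceries", preferred card, etc.
}

// Apply a partial update without disturbing the rest of the conversation.
function applyPartialUpdate(state: ConversationState, key: string, value: unknown): ConversationState {
  return { ...state, pending: { ...state.pending, [key]: value } };
}

const state: ConversationState = {
  userId: "joe",
  turns: [{ role: "user", content: "I need a sturdy backpack around $100", at: Date.now() }],
  pending: {},
  preferences: { paymentMethod: "saved-card" },
};

console.log(applyPartialUpdate(state, "selectedSku", "backpack-42"));
```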
Fault Tolerance
LLMs exhibit human-like behavior: they retry and rephrase queries. Consequently, the backend systems sitting behind LLM-driven traffic must be robust and resilient (a deduplication sketch follows the list):
- Implement intelligent request deduplication.
- Ensure graceful degradation of service during downtime.
- Incorporate context-aware error handling, enabling the LLM to understand and relay error messages to users effectively.
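For the deduplication point specifically, a minimal sketch: key a short-lived cache on the normalized request so an LLM's retried or rephrased call collapses into a single execution. The 30-second window and the key scheme are assumptions:

```typescript
import { createHash } from "node:crypto";

// Dedupe retried tool calls: an LLM that retries or rephrases may fire the same
// logical request twice, so key a short-lived cache on the normalized request.
const recent = new Map<string, { at: number; response: unknown }>();
const WINDOW_MS = 30_000;

function dedupeKey(userId: string, tool: string, args: unknown): string {
  return createHash("sha256").update(`${userId}:${tool}:${JSON.stringify(args)}`).digest("hex");
}

async function callTool(userId: string, tool: string, args: unknown, run: () => Promise<unknown>) {
  const key = dedupeKey(userId, tool, args);
  const hit = recent.get(key);
  if (hit && Date.now() - hit.at < WINDOW_MS) return hit.response; // serve the earlier result
  const response = await run();                                    // otherwise actually execute
  recent.set(key, { at: Date.now(), response });
  return response;
}

// Usage: two identical "create_order" calls within 30s collapse into one execution.
callTool("joe", "create_order", { sku: "backpack-42" }, async () => ({ orderId: "A1" })).then(console.log);
```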
We've spent decades watching front-end devs fight over frameworks, back-end devs over-engineer simple CRUD operations, and UI/UX designers perfect the art of making simple tasks complicated. All while Joe Shmuck just wants to buy a damn backpack without fighting pop-ups and newsletter forms, grandma wants her groceries, and that blind user just wants equal access to the web.
Organizations, whether small or large, have spent countless hours masturbating to complexity while users just want shit that works. LLMs + Web Components are the simplest building blocks we need for good UI and UX. You need a framework? The best framework is a conversation. And the best user experience? It's the one where you don't need a UX designer to explain how it works. No more "innovative" filter systems. No more "intuitive" user journeys. No more accessibility overlays pretending to fix your broken UX. Just pure human intent translated into action.
Funny how we needed artificial intelligence to remind us how natural intelligence actually works.
Intent Is All You Need.