top of page

Building a Simple Streamlit AI Chatbot with OpenRouter Auto-Routing

  • Ken Munson
  • Jun 7
  • 6 min read

Executive Summary

In this mini-project, I built a small Streamlit-based AI assistant (an AI Chat Bot) and integrated it with OpenRouter so the application could dynamically route prompts across many large language models.


The app itself is intentionally simple:

  • A local Streamlit web interface

  • A text input box for prompts

  • An OpenRouter-backed model call

  • A cost/quality slider (to indicate what kind of model OpenRouter should choose)

  • Copy buttons for both the user question and model response

  • Visibility into which model OpenRouter actually selected (printed on screen at the end of each response).



The real value of the project was not the UI. The value was understanding how easy it is to move from a single-model API call to a multi-model routing architecture.

Instead of hardcoding one model provider and one model, the app now uses OpenRouter’s openrouter/auto router and lets OpenRouter decide which model to use based on the prompt and a configurable cost/quality tradeoff (the slider).


Phase 1: Starting with a Direct OpenAI API Call

The first version of the app was very basic.

It used:

  • Python

  • Streamlit

  • The OpenAI Python SDK

  • A local environment variable for the API key

  • A hardcoded model value

The basic flow was:

Streamlit UI → OpenAI SDK → OpenAI API → Model response

The initial application worked, but I quickly ran into a few useful learning moments.

The first was API key management. The app was reading OPENAI_API_KEY from the local environment, but that key did not match the keys visible in the OpenAI API platform projects. This caused quota-related errors even though the account appeared to have credits available.

That debugging step reinforced an important operational point:

The API key determines the project, billing context, and access path used by the application.

Once I created a clean API key in a dedicated project and made the environment variable persistent, the app worked correctly.


I seem to spend half my life dealing with API keys! There is some valuable scar tissue there however.


Phase 2: Verifying the Actual Model Used

The next lesson was model identity.

I initially asked the app:

What model am I talking to?

The model gave inconsistent answers, at one point claiming to be GPT-3.5 and later GPT-4.1.

That was a useful reminder:

A model’s self-description is not a reliable way to identify the model.

The reliable source is the API response metadata.

So I updated the app to display:

response.model

That gave direct visibility into the actual model returned by the API. This became especially important once OpenRouter was added, because OpenRouter may route openrouter/auto to different underlying models depending on the prompt and settings.


Phase 3: Moving from OpenAI Direct to OpenRouter

The OpenRouter integration went surprisingly well.

Instead of using the OpenAI SDK against OpenAI’s default endpoint, the app was updated to use OpenRouter’s OpenAI-compatible endpoint:

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.getenv("OPENROUTER_API_KEY"),
)

The model was changed from a specific OpenAI model to:

MODEL = "openrouter/auto"

That changed the architecture from:

Streamlit UI → OpenAI SDK → OpenAI model

to:

Streamlit UI → OpenAI SDK → OpenRouter endpoint → Auto-selected model

This is the key architectural shift.

The application no longer needs to know in advance whether the best model for a prompt should come from OpenAI, DeepSeek, Anthropic, Google, or another provider. It simply sends the prompt to OpenRouter and receives both the answer and the actual model selected.


Phase 4: Adding a Cost/Quality Tradeoff Slider

This is pretty cool I have to say. After the OpenRouter call was working, I added a Streamlit slider for the OpenRouter auto-router cost/quality tradeoff.

The slider runs from:

0  = favor highest quality (generally more expensive tokens)
10 = favor lowest cost

In the UI, this became a simple control:

OpenRouter cost/quality tradeoff: [0 -------- 10]

This turned the app into a small routing experiment.

With the slider set around 7, OpenRouter selected a stronger model for even simple prompts. With the slider moved to 9, OpenRouter selected a cheaper/faster model for straightforward questions.

That was the first moment the app became more than a chatbot wrapper.

It became a way to observe model routing behavior.


This is my attempt at controlling token cost now that the foundation labs have decided to IPO, which means they need to start being profitable. Of note, it has been my experience sending the same query to different models that the questions I ask, and see other people asking, don't really require the latest and greatest model. Admittedly I don't have a 100 agents refactoring some 200 file repo but that is the beauty of OpenRouter - choose the level of model you need for the task at hand.


Phase 5: Adding Copy Buttons and a Scrollable Response Box

The next usability improvement was adding copy buttons.

I wanted the app to feel a little more like a modern chat interface, so I added:

  • A copy button for the user’s question

  • A copy button for the assistant’s response

  • A scrollable response area for long answers

This required a small custom HTML component inside Streamlit.

A first implementation mostly worked, but long model outputs exposed a display issue: the response appeared to be too short. Copying the response into Notepad revealed that the full answer was actually present. The custom display box was simply clipping the visible content.

The fix was to make the response area scroll internally:

max-height: 520px;
overflow-y: auto;

That kept the page compact while still preserving the full response for reading and copying.

The final UI included:

Prompt input
Ask button
Copyable question box
Copyable assistant response box
Cost/quality slider
Actual selected model

Final Architecture

The final local architecture looks like this:

User
  ↓
Streamlit UI
  ↓
OpenAI Python SDK
  ↓
OpenRouter API endpoint
  ↓
OpenRouter auto-router
  ↓
Selected model provider
  ↓
Model response
  ↓
Streamlit display + copy controls

The application remains intentionally lightweight, but the model path is now flexible.

Instead of writing separate integrations for different providers, the app can use one OpenAI-compatible interface and let OpenRouter handle model selection.


What I Learned

1. API Keys Are Part of the Architecture (and a necessary evil).

This project started with a simple quota error, but the root cause was an environment variable pointing to the wrong key.

That reinforced a practical lesson:

Local environment variables are infrastructure.

If the wrong key is loaded, the application may run under the wrong project, wrong billing context, or wrong provider account.


Side bar: Concentrate on API hygiene. If you can avoid even displaying the key on a terminal, that is best (easy to do - just ask you LLM). Obviously you don't copy it into your source code - you use an env variable for that. And the number of API keys on GitHub is ridiculous so, may sure you key doesn't get copied up to GitHub. Ask you LLM to expand on methods to avoid this!!!


2. Never Ask the Model What Model It Is

The model’s answer to “what model are you?” was unreliable.

The correct approach was to inspect the API response metadata.

For OpenRouter, this became especially useful because the requested model was:

openrouter/auto

but the actual model selected could vary by prompt and routing settings.


3. OpenRouter Makes Model Experimentation Easier

The most interesting part of this project was how little code changed when moving from OpenAI direct to OpenRouter.

The main changes were:

base_url="https://openrouter.ai/api/v1"

and:

MODEL = "openrouter/auto"

That small change opened access to a multi-model routing layer.


4. Cost Control Needs to Be Visible

The cost/quality slider made the routing behavior tangible.

Instead of abstractly saying “use cheaper models when possible,” the app exposed the routing preference directly in the UI.

That made it possible to test the same prompt at different cost/quality settings and observe which model OpenRouter selected.


5. UI Details Matter Even in Small Tools

The copy buttons and scrollable response box were small additions, but they made the app much more usable.

For a local AI workbench, the ability to quickly copy both the prompt and response matters. It supports testing, documentation, reuse, and comparison across models.


Why This Matters

This was not a complex application.

That is partly the point.

In a very small amount of code, the app demonstrates several important AI engineering concepts:

  • API key isolation

  • Environment variable management

  • Provider abstraction

  • OpenAI-compatible SDK reuse

  • Dynamic model routing

  • Cost/quality tradeoff control

  • Response metadata inspection

  • Lightweight local UI development

  • Usability improvements for prompt/response workflows

The app is simple, but the architecture points toward a larger pattern:

Do not bind every AI application directly to one model.
Build a routing layer into the design.

That routing layer may be OpenRouter, a custom router, an internal enterprise gateway, or a policy-controlled model broker. The core idea is the same: separate the application experience from the model selection decision.


Final Reflection

The most valuable part of this mini-project was seeing how quickly a basic Streamlit app could become a model-routing workbench.

The UI stayed simple.

The code stayed readable.

But the application moved from:

Call this one model

to:

Route this prompt to an appropriate model based on cost and quality

That is a meaningful shift.

For small experiments, this makes model comparison easier. For larger systems, the same pattern becomes part of cost governance, reliability engineering, and AI platform design.

This project was a reminder that sometimes the most useful learning labs are not large systems. Sometimes they are small tools that make the invisible architecture visible.


GitHub Link:


Code for this mini-project is available on GitHub:

Comments


bottom of page