A chatbot reads. An assistant acts.

Post #2 in the Lumogis launch series. Post #1 argued that the AI should come to your data, not the other way around. The fair objection is that the model in your browser tab is smarter than anything you can run at home. True, and beside the point. This post is about what "smarter" leaves out.

You have done this. You open a chat product, you have a question about your own life, and the first thing you do is go find the file. You export the PDF, you paste the paragraph, you upload the spreadsheet. The model is brilliant and the model knows nothing, so you spend the first two minutes of every conversation being its research assistant.

That is the tell. A chatbot reads what you hand it. It has no standing relationship with your documents, your past conversations, or the names that recur across both. Each session starts cold, and you warm it up by hand.

An assistant is the other shape. It already has your archive indexed. You ask a question and it goes and finds the answer, opens the file that matches, and replies from what was actually written down. The difference is not the size of the weights. It is where the assistant runs and what it is allowed to touch.

The ceiling is not intelligence

Drop the smartest available model into a sandbox and it is still a stranger to your data. The session is stateless. Your mortgage, your insurance renewal, the email thread from last spring: none of it exists in that window until you carry it in.

Even "upload a file" does not change the relationship; it just relocates it. The bytes go to the vendor's storage, and the vendor indexes them on its side. Your corpus becomes a guest in their environment, retrieved by their retriever, reached by their tools. For a throwaway question that is a reasonable bargain. For a living archive, one where documents change, conversations accumulate, and credentials sit next to calendars and notifications, it is the wrong place to put any of it.

The fix is not a better model. It is moving the machinery that reads and acts to the same side of the wall as the data.

Where Lumogis puts the machinery

Lumogis runs as a control plane on your own hardware. Core is a FastAPI orchestrator. Postgres keeps metadata, permissions, and entity records. Qdrant keeps the embeddings for your documents and your past conversations. Ollama does the embedding by default and can run the model locally too if you want nothing leaving the box at all.

You point ingest at folders you already use. Core watches them, pulls out the text, chunks it, embeds it, and writes the vectors locally. The index of your life lives on your disk.

Then the chat loop stops being a recitation from training data. When the model supports tools, Core runs a bounded tool-calling loop: the model asks for a tool, Core runs it against your stores and your filesystem, hands back the result, and the model continues until it can answer. A permission check sits in front of every one of those calls. Read-only access is Ask mode, the default. Anything that writes is Do mode, turned on deliberately and per user. The assistant cannot quietly reach further than you let it.
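To make the shape of that loop concrete, here is a minimal sketch, not Core's actual code. Every name in it (Tool, ToolCall, allowed, run_loop, the max_steps default) is hypothetical and chosen for illustration; the only things taken from the description above are the bound on the loop and the Ask/Do check in front of each call.

```python
from dataclasses import dataclass
from typing import Callable

# Illustrative names only; none of these are Lumogis APIs.

@dataclass
class Tool:
    name: str
    writes: bool                     # True means the tool needs Do mode
    run: Callable[[dict], str]

@dataclass
class ToolCall:
    name: str
    args: dict

def allowed(tool: Tool, mode: str) -> bool:
    """Ask mode permits read-only tools; Do mode also permits writes."""
    return mode == "do" or not tool.writes

def run_loop(model_step, tools: dict[str, Tool], mode: str = "ask",
             max_steps: int = 4) -> str:
    """Drive the model until it answers or the step budget runs out.

    model_step(history) returns either a ToolCall or a final answer string.
    """
    history: list[tuple[str, str]] = []
    for _ in range(max_steps):
        step = model_step(history)
        if isinstance(step, str):
            return step                          # the model is done
        tool = tools.get(step.name)
        if tool is None:
            result = f"error: unknown tool {step.name}"
        elif not allowed(tool, mode):            # permission check on every call
            result = f"denied: {step.name} requires Do mode"
        else:
            result = tool.run(step.args)
        history.append((step.name, result))      # hand the result back
    return "stopped: step budget exhausted"
```

The denial is returned to the model as an ordinary tool result rather than raised, so in this sketch the model can explain the refusal instead of the request simply failing.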

Watching it work

Here is the whole path, using nothing but the open build.

Drop a PDF into the inbox folder. The watcher notices it, waits for the write to finish, and feeds it to the pipeline. Core extracts the text with pdfminer, falling back to OCR on scanned pages when that is switched on. If the file has not changed since last time, a content hash skips it, so re-ingesting a folder is cheap rather than wasteful. The text gets chunked into roughly 512-token pieces along sentence boundaries with a little overlap, each piece is embedded through Ollama, and the vectors land in Qdrant tagged with your user and the file's path. Names found in the text are pulled out and recorded in Postgres and a separate entities collection. No part of that touched a cloud.
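Two of those steps are easy to picture in code. The sketch below is an assumption-laden illustration, not the pipeline itself: it uses SHA-256 for the content hash, counts whitespace-separated words as a stand-in for real tokens, and overlaps by one whole sentence. The function names (content_hash, needs_ingest, chunk_sentences) are hypothetical.

```python
import hashlib
import re

def content_hash(data: bytes) -> str:
    """Hex digest of the file bytes (SHA-256 is an assumption here)."""
    return hashlib.sha256(data).hexdigest()

def needs_ingest(path: str, data: bytes, seen: dict[str, str]) -> bool:
    """Return True and record the hash when content is new or changed."""
    digest = content_hash(data)
    if seen.get(path) == digest:
        return False                 # unchanged since last time: skip it
    seen[path] = digest
    return True

def chunk_sentences(text: str, max_tokens: int = 512, overlap: int = 1) -> list[str]:
    """Pack whole sentences into chunks that target max_tokens words.

    The last `overlap` sentences of a chunk are repeated at the start of
    the next one. A chunk can exceed the target when a single sentence is
    longer than the budget on its own.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current: list[str] = []
    count = 0
    for sentence in sentences:
        n = len(sentence.split())
        if current and count + n > max_tokens:
            chunks.append(" ".join(current))
            current = current[-overlap:] if overlap else []
            count = sum(len(s.split()) for s in current)
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Splitting on sentence boundaries rather than at a fixed offset keeps each chunk readable on its own, which matters later when a chunk is quoted back as evidence.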

Now ask, in Lumogis Web: when does my home insurance renew?

Before the model even starts, Core can fold in your recent session memory and, if you have auto-RAG on, a few relevant chunks fetched under the same access rules as search. That is a convenience, not a crutch, and the loop runs fine without it.

The model calls search_files. Core embeds your question, searches Qdrant with dense vectors and BM25 fused together, optionally reranks the candidates with a cross-encoder, and returns the strongest chunks with their paths and scores, filtered to what you are allowed to see: your own scope, plus whatever household material you have chosen to share. The model picks the most promising hit and calls read_file on it. Core reads up to a few thousand characters from disk, but only if that path lives under an ingest root you configured. Ask for anything outside those roots and it is refused. If you named the insurer, the model can call query_entity, which looks the name up in Postgres, falls back to a semantic match in Qdrant, and reports everywhere that name has surfaced across your files and conversations.
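The post does not say how the dense and BM25 results are fused, so treat the following as one plausible approach rather than what search_files does. Reciprocal rank fusion is a common way to merge two ranked lists without having to reconcile their score scales. The function name rrf_fuse and the constant k = 60 are assumptions for illustration.

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60,
             top_n: int = 5) -> list[tuple[str, float]]:
    """Merge ranked lists of chunk ids with reciprocal rank fusion.

    Each list contributes 1 / (k + rank) for every id it contains, so an
    id that ranks well in both lists beats one that tops only a single list.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the order is stable.
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ordered[:top_n]


dense_hits = ["a", "b", "c"]   # hypothetical chunk ids from vector search
bm25_hits = ["b", "d", "a"]    # hypothetical chunk ids from keyword search
fused = rrf_fuse([dense_hits, bm25_hits])
```

Here chunk b comes out ahead of chunk a because it sits near the top of both lists, while c and d, each found by only one method, trail behind.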

Search, open, read, answer. The model supplied the reasoning. Your machine supplied the facts.

The same archive, through other agents

The loop above is not the only way in. Core publishes a curated surface over MCP at /mcp on the same port, so any agent that speaks it can reach your memory and search without you pasting context into its window. Point a client like Claude Desktop at it and call context.build, which runs document search and session retrieval locally, merges the results, and returns a context string capped to a budget along with its sources. Or reach for memory.search, entity.lookup, and the rest of the tools Core declares in its manifest at GET /capabilities.
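As a rough picture of what a call like context.build has to do, the sketch below merges document hits with session hits and packs the best ones into a capped string along with their sources. It is an illustration under stated assumptions, not the tool's implementation: the Hit type, the build_context name, the bracketed source prefix, and measuring the budget in characters are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    source: str    # a file path or a session id
    text: str
    score: float

def build_context(doc_hits: list[Hit], session_hits: list[Hit],
                  budget_chars: int = 200) -> tuple[str, list[str]]:
    """Merge both result sets by score and pack them under a size budget."""
    merged = sorted(doc_hits + session_hits, key=lambda h: h.score, reverse=True)
    parts: list[str] = []
    sources: list[str] = []
    used = 0
    for hit in merged:
        block = f"[{hit.source}] {hit.text}"
        cost = len(block) + (1 if parts else 0)   # +1 for the joining newline
        if used + cost > budget_chars:
            continue                              # too big; a smaller hit may still fit
        parts.append(block)
        if hit.source not in sources:
            sources.append(hit.source)
        used += cost
    return "\n".join(parts), sources
```

The calling agent only ever sees the returned string and its source list, which is the point: the cap is applied on your side, before anything is handed over.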

The outside model still never receives your archive. It receives what Core retrieved and what the agent asked for, scoped to your user, and nothing more.

If you want to grow the surface, the unified catalog at GET /api/v1/me/tools reports everything Core can see: the built-in tools, the MCP ones, and any healthy capability services you have registered alongside it. With the catalog on, the chat loop folds those extra tools in for the length of a single request, with bearer trust and a permission check on each call. Capability containers can bring their own write actions. Stock Core ships the read path and the Ask/Do gate rather than a fixed menu of automations, so the household decides what its assistant is permitted to do.

Why none of this travels

Three kinds of locality hold the whole thing together.

Your corpus is local. The chunks and embeddings are computed on your hardware and Qdrant never syncs to a vendor's index. A cloud product would have to make you upload the same material all over again, on its terms, to match what you already have.

Your credentials are local. Connector secrets, model keys, notification endpoints, calendar logins, all sit encrypted in Postgres. MCP tokens are minted and revoked per user. When a tool reaches out, it carries credentials Core resolved on the box, not a share of some pooled SaaS OAuth grant.

Your execution is local. search_files, read_file, and query_entity run inside Core against your own paths and stores. The MCP tools call the very same services. The optional capability services are containers you deploy next to Core, not infrastructure hidden in someone else's account.
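"Against your own paths" only means something if the boundary is enforced, and the rule described earlier, that read_file refuses anything outside a configured ingest root, is simple to sketch. The version below is an assumption about how such a check could look, not Core's code: is_under_roots and the max_chars default of 4000 are hypothetical, and this read_file is a stand-in that happens to share the tool's name.

```python
from pathlib import Path

def is_under_roots(candidate: str, roots: list[str]) -> bool:
    """True only if the fully resolved path sits inside one of the roots.

    Resolving first collapses ".." segments and follows symlinks, so a
    path cannot escape a root by traversal.
    """
    resolved = Path(candidate).resolve()
    for root in roots:
        if resolved.is_relative_to(Path(root).resolve()):   # Python 3.9+
            return True
    return False

def read_file(candidate: str, roots: list[str], max_chars: int = 4000) -> str:
    """Read a bounded excerpt, refusing paths outside the ingest roots."""
    if not is_under_roots(candidate, roots):
        raise PermissionError(f"{candidate} is outside the configured ingest roots")
    with open(candidate, encoding="utf-8", errors="replace") as handle:
        return handle.read(max_chars)
```

Checking the resolved path rather than the string the model supplied is the important design choice: a prefix comparison on the raw string would accept something like a root followed by "../" and a sibling directory.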

You can still send the final answer to a cloud model if you want the sharper weights. That is a clean, visible trade: only the excerpts Core assembled for that one turn cross your network, never the archive behind them. Or you keep the whole completion on the box with a local model. Either way the assistant's reach into your data stays at home.

The comparison that matters

Raw intelligence is the wrong scoreboard. A cloud chatbot with no tools is a brilliant generalist working from memory and whatever you remembered to paste. A modest local model with retrieval and a permissioned tool loop is a specialist wired into your files, your sessions, and your entity index, and it gets more useful every time you add to the pile, because the next question is already cheaper to answer.

That is the wager Lumogis makes. We are opening the AGPL build for people who would rather read the mechanism than the manifesto. Clone it, run Compose, drop a file in the inbox, and watch the orchestrator ingest it, search it, and work your question through the loop. The code paths in this post are the whole product.

Post #1 was "your data stays home." Post #2 is "your assistant works there too."

Lumogis is AGPL-3.0-only. Documentation: README, capabilities overview, architecture.
