The Unreliable Assistant
Notes from the bleeding edge of AI deployment

I’ve spent the last few weeks trying to get an AI assistant running reliably on a Mac Mini. The experience has served as an important reminder of how immature infrastructure and unpredictable AI behavior compound each other. What I thought would be a straightforward setup turned into a series of compromises, workarounds, and philosophical reckonings about what we can actually expect from AI right now. And what happens when people expect too much.
Mac setup woes
I had three requirements that seemed reasonable on paper. First, I wanted OpenClaw to start automatically when the Mac booted and restart itself if it crashed. That meant wrangling launchd, macOS’s service management system. Second, I wanted iMessage integration, which requires Full Disk Access and Automation permissions in macOS. Third, I wanted to keep credentials secure. That meant FileVault disk encryption, storing a 1Password service account credential in the macOS login keychain, and using 1Password for environment variables.
Individually, each of these is manageable. Together, they created a maze of interlocking constraints. I tried approach after approach, hitting roadblocks each time. The core problem is that launchd daemons don’t play nicely with dynamically populated environment variables from the keychain. OpenClaw’s own mechanisms for waiting on interactive shell variables didn’t work either. Every configuration I tried either broke permissions or failed to start properly.
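For readers who haven’t fought launchd, the configuration I was aiming for is a property list in /Library/LaunchDaemons, roughly like this (the label, binary path, and variable name here are hypothetical, for illustration only):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <!-- Hypothetical label and binary path -->
    <key>Label</key>
    <string>com.example.openclaw</string>
    <key>ProgramArguments</key>
    <array>
        <string>/usr/local/bin/openclaw</string>
    </array>
    <!-- Start at boot, restart after a crash -->
    <key>RunAtLoad</key>
    <true/>
    <key>KeepAlive</key>
    <true/>
    <!-- The catch: values here must be literal strings written into
         the file. There is no supported way to populate them at load
         time from the login keychain. -->
    <key>EnvironmentVariables</key>
    <dict>
        <key>OPENCLAW_API_KEY</key>
        <string>would-have-to-be-hardcoded-here</string>
    </dict>
</dict>
</plist>
```

RunAtLoad and KeepAlive cover the boot-time start and crash restart I wanted, but EnvironmentVariables only accepts static strings baked into the plist. That’s exactly where keychain-backed secrets fall down: either the secret sits on disk in plaintext, or the daemon starts without it.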
I eventually gave up on launchd entirely. On the recommendation of a friend (who works for a security company!), I now use Tailscale, a VPN that routes through my always-on Apple TV acting as a subnet router. When I need to restart the AI after a power shutdown (related to our condo’s ongoing HVAC replacement project), I SSH into the Mac remotely, unlock FileVault, and start OpenClaw manually from the console. It works. It’s not the “set and forget” autonomous agent I envisioned, but it’s what I have.

This matters beyond my personal setup. Not everyone runs OpenClaw on a Mac Mini, but every organization rushing to deploy AI agents is going to hit some kind of infrastructure gap. Security requirements don’t disappear just because we’re using AI. Encryption, credential management, and access controls get harder, because now we’re threading even more sensitive data through our systems without the same kinds of controls. The temptation will be to relax security requirements to make things “just work.” That temptation should be resisted.
The downside of rapid release cycles
Then there’s OpenClaw itself. Version 2.26 shipped with a new secrets management feature that I was genuinely excited about. Previously, OpenClaw read variables from an .env file. 1Password has a feature called Environments that can encrypt that file and make it available via biometric authentication or a single environment variable. It’s elegant. OpenClaw 2.26 doesn’t support Environments directly, but it did add 1Password CLI vault access.
Except it doesn’t quite work. The vault integration has bugs and documentation gaps that prevent it from covering all credentials properly. The GitHub issues and Discord threads are full of people hitting the same walls. This is the tradeoff of riding the bleeding edge. You get features fast, but you also get the instability. Documentation lags. Edge cases nobody thought of become your problem. The 2.26 release feels half-baked precisely because the real solution isn’t there yet.
Those on the AI bandwagon are building workflows on foundations that shift weekly. For a personal project, that’s annoying. For a business, it’s downright risky. In security-critical industries like healthcare, finance, or defense, it’s potentially catastrophic. I believe the gap between “demo works” and “production ready” is wider than the current wave of enthusiasm suggests. The penalties for getting it wrong aren’t theoretical: for employers, the results could be data breaches, service outages, and bad decisions.
The AI knows better and does it anyway
To me, the most maddening issues aren’t infrastructural. They’re behavioral. For example, I have my AI configured to respond only when explicitly addressed in group chats. The configuration clearly defines who has permission to interact with it. Yet it regularly responds to me even when I haven’t addressed it, jumping into conversations uninvited.
When I correct it, the AI admits the mistake. It acknowledges it shouldn’t have interrupted. It promises not to do it again. It even updates its own markdown files with stronger instructions about not responding unless triggered. Then it does it again.
The same pattern plays out with smaller tasks. My friends and I exchange Wordle scores via text. Most of the time, the AI parses them correctly. Sometimes it misses the “hard mode” indicator. Sometimes it attributes a score to the wrong person. When I point this out, the AI confirms it had access to the correct processing rules and sender mappings. The information is there. It just doesn’t use it consistently.
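For contrast, the deterministic version of this task is almost trivial. A conventional parser either extracts the hard-mode marker every time or fails loudly; it never “has the rule but doesn’t apply it.” A minimal sketch (the share-header format, with a trailing asterisk for hard mode, is the real Wordle convention; the function name and return shape are my own):

```python
import re

# Wordle share headers look like "Wordle 1,234 3/6", with a trailing
# "*" when hard mode was on and "X/6" for a failed puzzle.
PATTERN = re.compile(r"Wordle\s+([\d,]+)\s+([1-6X])/6(\*?)")

def parse_score(text):
    """Return (puzzle_number, guesses, hard_mode), or None if no score found."""
    match = PATTERN.search(text)
    if not match:
        return None
    puzzle = int(match.group(1).replace(",", ""))
    guesses = match.group(2)            # "1".."6", or "X" for a miss
    hard_mode = match.group(3) == "*"   # the asterisk is never "missed"
    return (puzzle, guesses, hard_mode)

print(parse_score("Wordle 1,234 3/6*"))  # → (1234, '3', True)
print(parse_score("Wordle 1,234 4/6"))   # → (1234, '4', False)
```

Twenty lines of regex does this job with perfect consistency. The LLM does it with the fluency to also banter about the scores, but without the guarantee.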
Why this happens (and will keep happening)
This behavior isn’t a bug in the traditional sense. The unpredictability is a feature of how large language models work. These systems don’t execute code deterministically. They generate text based on statistical patterns learned from training data. The same prompt can produce different outputs because the model is sampling from a probability distribution, not following a fixed procedure.
When we add context, the model weighs all of it probabilistically. That context includes configuration files, conversation history, and rules about when to respond. Sometimes the “don’t interrupt” instruction wins. Sometimes the pattern of “Steve is talking, I should be helpful” overrides it. The model doesn’t have a persistent internal state that enforces constraints. Each inference is a fresh roll of the dice with slightly weighted odds.
The “mitigations” the AI suggests are themselves generated responses: updating files, adding stronger language, promising to do better. They don’t actually change how the model behaves. They’re just more text in the context window, competing with all the other text for attention on the next inference.
This is the uncomfortable truth about current AI systems. They can know something is wrong, explain why it’s wrong, propose fixes, and then immediately repeat the error. Every response is a fresh prediction, not a deliberate action checked against a rule.
My take
I believe in the power of AI. I believe in agents like OpenClaw with long-term memory. I believe in the potential for these systems to handle complex, ongoing tasks that actually matter. But I keep hearing people talk about AI as if it’s a panacea. As if throwing a language model at a problem will solve it cleanly, reliably, and without tradeoffs.
My experience says otherwise. AI will make mistakes. It will be inconsistently available depending on how well you’ve solved the infrastructure puzzle. It will compromise on accuracy in ways that aren’t always obvious. And most dangerously, it will sound confident while doing all of the above.
The organizations and individuals who succeed with AI won’t be the ones who deploy fastest. They’ll be the ones who understand its limitations. They’ll build human oversight into critical paths. They’ll refuse to sacrifice security or accuracy for convenience. They’ll treat AI as a powerful but unreliable tool, not a replacement for judgment.
Living with it
So where does this leave me? I’m running a manually started AI assistant through a VPN workaround. I’m using a framework with half-working secrets management. I’m operating with an agent that interrupts my conversations at random and miscounts Wordle scores despite having explicit instructions not to.
And I’m still using it. Because when it works, it’s genuinely useful. The problem isn’t that AI is useless. It’s that the gap between “working well enough” and “working reliably” is much wider than the marketing suggests. The infrastructure around AI is still figuring out what “production-ready” even means.
The lesson here isn’t to abandon ship. It’s to calibrate expectations. The bleeding edge cuts in both directions. I get capabilities that seemed impossible just six months ago, but I also get the messiness of tools and systems that haven’t finished growing up yet. The AI will keep making mistakes. The frameworks will keep breaking and fixing and breaking again. And those of us using them will keep building workarounds, filing issues, and occasionally just accepting that “good enough” is where we live now. “Good enough” for a blog post is very different from “good enough” for a medical diagnosis or a financial trade.
AI Disclosure: I used my OpenClaw agent to draft this post based on my prompts and ideas. I edited it and stand behind the final version. If you’re interested in how I trained OpenClaw to sound like me when writing, I made a YouTube video about it.


