The world is watching as OpenAI prepares to release its "Operator" computer-use agent to the public, and DeepSeek has just upset the balance with R1, an open-source reasoning model that is as good as or better than o1. Still, o3 is around the corner.

Kurzweil, I never doubted you.
The best part about living in this moment in history is the perceptible rate of change in AI models and their benchmarks. Granted, there's now concern that OAI is cooking the benchmarks, specifically the math-focused FrontierMath. But for all these pictures, I'm still waiting for the agent.

❌ Source: trust me bro from X
I deserve an agent because...
It turns out that scrolling some websites some of the time is a good way to make yourself aware of things you ought to have already known. That is why I want a dang agent! I know how bad it is out there online, and it's a waste of my gosh darn time. So I had the bright idea of contracting a robot to scroll my timeline for me and record anything that actually, you know, matters. This is how I recently became aware of some new git tricks (apparently I'm a git boomer).
Another thing I spend a lot of time doing is manually checking my stock and option trades on my phone. Now, obviously there are ways you can set up alerts and conditions and generally use a fine-tooth comb to move toward agentic-ish trading. But my trade ideas can be expressed in two sentences, and conditions add another two, so why does it require no fewer than 10 taps to even begin investigating the possibility of executing my idea? Rolling my own trading dashboard soon?
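For what it's worth, here is roughly how small such an idea is once it's written down as data. This is a minimal Python sketch, not a trading system: `Quote`, `TradeIdea`, the ticker XYZ, and the thresholds are all made up for illustration, and no real broker API is involved.

```python
from dataclasses import dataclass
from typing import Callable


# Hypothetical quote shape; a real broker feed would look different.
@dataclass
class Quote:
    symbol: str
    price: float
    iv_rank: float  # implied-volatility rank, 0-100 (assumed field)


@dataclass
class TradeIdea:
    description: str                            # the "two sentences"
    conditions: list[Callable[[Quote], bool]]   # the "another two"

    def ready(self, quote: Quote) -> bool:
        # The idea is actionable only when every condition holds.
        return all(check(quote) for check in self.conditions)


# Illustrative idea with made-up thresholds.
idea = TradeIdea(
    description="Sell a put on XYZ if it dips while volatility is elevated.",
    conditions=[
        lambda q: q.symbol == "XYZ",
        lambda q: q.price < 95.0,
        lambda q: q.iv_rank > 50.0,
    ],
)

if __name__ == "__main__":
    quote = Quote(symbol="XYZ", price=94.0, iv_rank=60.0)
    print("worth a look" if idea.ready(quote) else "nothing to do")
```

The point is only that the whole idea fits in a dozen lines of data. Everything else (the taps, the screens) is overhead an agent ought to absorb.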
So far, the tech demos have been rather unimpressive, if I'm honest. I read a stat that something like 75% of the agents' clicks don't even land on the target element, but that's old data now, considering how fast the rate of change is compressing. OAI Operator is launching with the promise of helping you order Instacarts, Ubers, and concert tickets, or you can book a flight... uhh, okay.
Cron job
That's the promise, isn't it? The perfect cron job? One that understands you, what you meant, who you are, and what you actually wanted. Imagine the dependency graph on a single task of even medium complexity. Now it's watching at least three files or video feeds, and all you did was state in two sentences what you want. That's AI, baby.
Near term
I'm beginning to build out my own agentic systems on my home lab, and I'm trying to settle on a framework. I've been using LangChain for a while now, and I'm just starting on a kind of microservice approach to building them. More details on Twitter, 247b0t. Soon.