I'm building a company where humans and AI agents have real jobs
The shift I care about is not AI as a clever tool. It is a human-agent operating model where agents carry recurring responsibilities inside a real review system.

Most people still talk about AI agents as one of two things: assistants or theatre.
Either they are useful for a clever one-off task, or they are presented as if a prompt thread magically became a company. I am interested in neither version.
What I care about is something more operational.
I want to build companies where humans and AI agents have recurring jobs, clear boundaries, and visible review loops. Not pretend jobs. Not mascot titles. Real recurring work.
That does not mean the agents are "running the company" alone. It means the human-agent operating model is changing. The harder question — what agents should actually own — is where the real design work starts.
The actual team
Let me just show you what this looks like right now, because the specifics matter more than the concept.
I have five AI agents with defined roles inside Rightful Labs. Not personas. Not chat windows I named for fun. Agents with recurring responsibilities, memory files, daily logs, and review boundaries.
Jarvis — Chief of Staff. Jarvis is the coordination layer. Morning briefs, evening wraps, calendar awareness, email monitoring, task tracking across every workstream. When I wake up, Jarvis has already checked email, scanned the calendar, and prepared a summary of what needs my attention. Jarvis also reviews the other agents' work. If Dev claims a deployment is live, Jarvis verifies it. If APRIL drafts content, Jarvis fact-checks the claims before I see them.
Dev — CTO. Dev handles all code and technical architecture for arifkhan.net and the personal brand infrastructure. Blog updates, site performance, deployment pipelines. Dev does not just write code on command — Dev maintains the codebase, tracks technical debt, and proposes improvements.
APRIL — CMO. Content strategy, writing, editorial quality, brand design system. APRIL reads Scout's intel reports, reads my voice guide, and produces content briefs with hook variants. Every draft goes through Jarvis for fact-checking and then to me for voice and final approval. APRIL also maintains the content calendar and distribution strategy.
Zayd — STR Operations. Zayd handles the short-term rental business — Home Away. Listing optimization, pricing strategy, guest communication templates, market analysis. This is a real revenue-generating business, and Zayd's job is to keep it operationally tight.
Scout — Market Intelligence. Scout scans Reddit, Twitter, newsletters, research papers, and frontier lab blogs for signals that matter. AI trends, competitor moves, content opportunities. Scout delivers intel reports that feed into APRIL's content pipeline and my strategic decisions.
There is also Friday, who handles personal coordination — scheduling, travel logistics, personal admin. Friday operates from a separate machine because personal and business should not share the same workspace.
What "recurring responsibility" actually means
Here is the difference between a demo and a job.
A demo is: "Hey ChatGPT, write me a LinkedIn post." You get output. You use it or you don't. There is no memory, no context, no continuity. Tomorrow you start from scratch.
A job is: APRIL reads yesterday's performance data, checks Scout's morning intel report, reviews the content calendar, and produces today's engagement pack — already knowing the voice guide, the brand positioning, the source citation rules, and which topics we have covered recently. If I gave feedback on yesterday's draft, APRIL has already incorporated that into today's approach.
The difference is not intelligence. It is continuity.
And continuity requires infrastructure that most people skip. Let me walk through what that actually looks like.
The infrastructure of continuity
Memory files
Every agent has daily log files. `memory/2026-03-10.md`, `memory/2026-03-11.md`, and so on. These are raw records of what happened — decisions made, tasks completed, errors caught, context that might matter tomorrow.
On top of daily logs, there is a long-term memory file. This is curated — not everything makes it in. Only the decisions, lessons, and context that should persist across weeks and months.
When an agent wakes up for a new session, it reads today's log, yesterday's log, and loads the long-term memory. That is how it has context. Not magic. Files.
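
To make that concrete, here is a minimal sketch of what session wake-up could look like, assuming a Python harness around the file layout above. The directory and function names are mine, for illustration, not the actual system's.

```python
from datetime import date, timedelta
from pathlib import Path

# Assumed layout: memory/YYYY-MM-DD.md daily logs plus one curated
# long-term file. Names are illustrative, not the real system's.
MEMORY_DIR = Path("memory")
LONG_TERM = MEMORY_DIR / "long-term.md"

def load_context(today: date) -> str:
    """Assemble what an agent reads at the start of a new session:
    yesterday's log, today's log, and the curated long-term memory."""
    parts = []
    for day in (today - timedelta(days=1), today):
        log = MEMORY_DIR / f"{day.isoformat()}.md"
        if log.exists():
            parts.append(log.read_text())
    if LONG_TERM.exists():
        parts.append(LONG_TERM.read_text())
    return "\n\n---\n\n".join(parts)
```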
Daily logs and error tracking
This one matters more than people think. Every agent is required to log mistakes with the word "ERROR" or "LESSON" in the daily log. Not optional. Not when they feel like it. Always.
Why? Because a nightly process reads all agent daily logs and consolidates learnings into a shared file. If an agent does not log its mistake, the system cannot learn from it. The error just disappears.
I learned this the hard way. Early on, an agent made a judgment call I disagreed with, but there was no record of it. No log entry, no explanation of the reasoning. I could not even figure out what had happened, let alone prevent it from happening again. Now the rule is: if you make a mistake, write it down. The system needs to see its own failures to improve.
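
For a sense of the mechanics, here is roughly what that nightly consolidation could look like: scan every agent's daily log for ERROR and LESSON lines and append them to a shared learnings file. This is a sketch with assumed paths and file names, not the actual script.

```python
from datetime import date
from pathlib import Path

# Assumed layout: one memory/ directory per agent, one shared learnings file.
AGENTS = ("jarvis", "dev", "april", "zayd", "scout")
LEARNINGS = Path("shared/team-learnings.md")

def consolidate(today: date) -> None:
    """Pull every ERROR/LESSON line from today's logs into the shared file."""
    found = []
    for agent in AGENTS:
        log = Path(agent) / "memory" / f"{today.isoformat()}.md"
        if not log.exists():
            continue  # an agent that logged nothing contributes nothing
        for line in log.read_text().splitlines():
            if "ERROR" in line or "LESSON" in line:
                found.append(f"- [{agent}] {line.strip()}")
    if found:
        LEARNINGS.parent.mkdir(exist_ok=True)
        with LEARNINGS.open("a") as f:
            f.write(f"\n## {today.isoformat()}\n" + "\n".join(found) + "\n")
```

The point of the keyword rule is visible here: if a mistake is not written down with one of those markers, this pass never sees it.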
Heartbeats
Agents do not just respond when I talk to them. They run on a heartbeat — a periodic check-in that happens even when I am not actively working.
During a heartbeat, Jarvis might check for urgent emails, scan the calendar for upcoming events, check the weather if I have plans outside, or do background work like organizing memory files. If something important comes up, Jarvis reaches out. If nothing needs attention, it stays quiet.
This is what separates a tool from a coworker. A tool waits for you to pick it up. A coworker notices things and acts on them within their scope.
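
Mechanically, a heartbeat is nothing exotic: a scheduled check with a quiet default. Here is a sketch of the loop, assuming a simple Python daemon; the interval and the check functions are placeholders, stubbed out so the shape is clear.

```python
import time

HEARTBEAT_SECONDS = 30 * 60  # every half hour; the interval is an assumption

def check_urgent_email() -> list[str]:
    return []  # stub: would query the inbox and return alert strings

def check_upcoming_calendar() -> list[str]:
    return []  # stub: would scan the next few hours of events

def tidy_memory_files() -> None:
    pass  # stub: quiet background work, e.g. archiving old daily logs

def heartbeat_once() -> None:
    """One check-in: surface anything urgent, otherwise stay quiet."""
    alerts = check_urgent_email() + check_upcoming_calendar()
    if alerts:
        print("\n".join(alerts))  # stand-in for actually pinging the human
    else:
        tidy_memory_files()

if __name__ == "__main__":
    while True:
        heartbeat_once()
        time.sleep(HEARTBEAT_SECONDS)
```

The important design choice is the quiet default: the loop only escalates when a check returns something, which is what keeps a heartbeat from becoming noise.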
Shared context files
All agents share certain files: the content operations bible, the founder profile, the team learnings log. This means when Scout discovers a trending topic and writes it into the shared workspace, APRIL can pick it up in the next content cycle without me having to relay the information.
That sounds simple, but it is the difference between a team and a collection of disconnected tools. Information flows through the system, not just through me.
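
The mechanism can be as plain as agreed file paths. A sketch of that handoff with hypothetical names: Scout appends a signal to a shared intel file, and APRIL reads whatever is pending on its next cycle.

```python
from pathlib import Path

# Hypothetical shared workspace; the file names are illustrative.
SHARED = Path("shared")
INTEL = SHARED / "intel-report.md"

def scout_publish(topic: str, why_it_matters: str) -> None:
    """Scout drops a signal into the shared workspace."""
    SHARED.mkdir(exist_ok=True)
    with INTEL.open("a") as f:
        f.write(f"- {topic}: {why_it_matters}\n")

def april_read_intel() -> list[str]:
    """APRIL picks up every pending signal on its next content cycle."""
    if not INTEL.exists():
        return []
    return [line.removeprefix("- ") for line in INTEL.read_text().splitlines() if line]
```

No message bus, no relay through me. The file is the interface.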
Real jobs need real boundaries
The mistake people make here is to confuse role language with role design.
Giving an agent a title is easy. Designing a role is hard.
A real role needs:
- a recurring scope
- a definition of done
- a review boundary
- a memory of prior work
- a human who can approve, redirect, or kill the output
Without that, you do not have an operating model. You have branded prompting. And when those boundaries are unclear, the seams break first.
That is why I care so much about workflows, queues, review gates, and memory files. I do not want agents to sound impressive. I want them to reduce drag inside a real company.
Let me give you a concrete example. APRIL does not just "write content." APRIL's recurring scope is: maintain the content calendar, produce daily engagement packs with three hook variants per piece, ensure every factual claim has a verifiable source link, follow the brand voice guide, and deliver drafts to Slack for my review. The definition of done is not "draft written." It is "draft written, fact-checked by Jarvis, sources linked, delivered to the review queue."
That specificity is what turns a clever prompt into an operational role.
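
One way to hold that specificity is to treat the role as data the system can check rather than language in a prompt. Here is a sketch using APRIL as the example; the structure is my illustration, not a claim about the real config.

```python
from dataclasses import dataclass

@dataclass
class RoleSpec:
    """A role as checkable data: scope, done-ness, review, memory, owner."""
    name: str
    recurring_scope: list[str]
    definition_of_done: list[str]
    review_boundary: str   # who checks the output before it ships
    memory_path: str       # where prior work and context live
    human_owner: str       # who can approve, redirect, or kill the output

APRIL = RoleSpec(
    name="APRIL",
    recurring_scope=[
        "maintain the content calendar",
        "produce daily engagement packs, three hook variants per piece",
        "link a verifiable source for every factual claim",
        "follow the brand voice guide",
    ],
    definition_of_done=[
        "draft written",
        "fact-checked by Jarvis",
        "sources linked",
        "delivered to the review queue",
    ],
    review_boundary="Jarvis fact-check, then founder approval",
    memory_path="april/memory/",
    human_owner="founder",
)
```

If any field is empty, you do not have a role yet. You have a title.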
The human role becomes more important, not less
This is where a lot of the commentary goes wrong.
When people hear "agents have real jobs," they assume the human disappears. In practice, the opposite happens.
The more recurring work an agent takes on, the more important human judgment becomes:
- What should be delegated at all?
- What can move without review?
- What should never ship unchecked?
- What counts as a correction instead of a failure?
The future I am interested in is not human absence. It is humans doing work that actually requires a human.
Humans still hold narrative judgment, relationship risk, strategic direction, and final responsibility. Agents make the system wider, faster, and more persistent. They do not remove the need for someone who can say "no, that is wrong" and mean it.
I spend more time on judgment calls now than I did when I was managing everything manually. The difference is that the judgment calls are higher-leverage. Instead of deciding which email to send, I am deciding which categories of email an agent should handle. Instead of writing every blog post, I am reviewing drafts and teaching the system my voice. Instead of tracking every task, I am designing the tracking system itself.
My job shifted from operator to architect. That is harder, and some days I am not sure I am good at it yet. But it is the right direction.
The honest version
I am not going to pretend this is a finished system. It is not.
Some days this feels like the future. Jarvis catches things I would have missed. APRIL produces content briefs that are genuinely useful. The system moves faster than I could alone.
Other days it feels brittle. An agent misunderstands context and produces something off. A memory file gets too long and the important stuff gets buried. A review gate that should have caught something does not, because I was tired and skimmed instead of reading carefully.
I am maybe six weeks into this rebuild. The system is real — it runs every day, it produces real output, it handles real business operations. But it is also early. I am building the plane while flying it, and some days the turbulence shows. Masaya is one proof point — a real product inside this operating model that turns the thesis into something testable.
What keeps me going is this: the trajectory is right. Every week, the system gets a little more reliable. Every mistake creates a new rule or a better review gate. The agents are not getting dumber. They are getting more disciplined. And so am I.
I am not trying to prove that agents can look clever on command.
I am trying to prove that a company can be designed differently when agents are treated as recurring contributors inside a disciplined system of review, memory, and ownership.
That is a much harder claim. I am not sure I can prove it yet. But every week the evidence gets a little stronger, and I would rather chase a hard truth than a comfortable demo.
Key takeaways
- A role becomes real when the work is recurring, reviewable, and remembered. Without continuity infrastructure, you just have fancy prompting.
- Agents make human judgment more important, not less. The founder's job shifts from operator to architect — and that shift is not as clean as it sounds.
- The interesting shift is not tool use. It is operating design — memory files, daily logs, heartbeats, shared context, and review gates.
- Honesty matters: this system is real and running, but it is also early and sometimes brittle. The story is the attempt, not the mastery.