I asked the tool to clean my Downloads folder and it sent me a checklist instead.
In September 2024 I had a Downloads directory that looked like a junk drawer. Installers, PDFs, screenshots, three copies of the same invoice. I had been prompting well for a couple of months by then. I knew how to set up a request properly. So I gave the tool a real instruction.
I asked ChatGPT to help me clean up the files in my Downloads directory. It did not do that. It came back with the steps for how I could clean the directory up manually. I tried Gemini. Same shape. Numbered steps. Sort by date, group by extension, decide what to keep. The tool guided me on what to do, the way a help-desk employee would, rather than performing the action.
I closed the tab. I cleaned the folder by hand. I went back to work.
If a competent prompt was not the missing piece, what was?
A month earlier, in August 2024, I had cracked the prompting craft. I had stopped treating ChatGPT like a Google search bar and started supplying role, background, and both sides of the coin. Same transcript, same model, but the second answer told me which paragraph I needed to rewrite before showing the document to anyone.
So in September the prompt was not the issue. The instruction was scoped. The intent was clear. The tool still refused to act. There is a category most people have never been told to look for. It is not skill. It is not phrasing. It is tool class. Generative tools and agentic tools are not the same product with different branding. They are different machines.
Generative AI is the help-desk employee. Agentic AI is the consultant.
This is the analogy I keep coming back to.
A help-desk employee guides you on what to do. Polite, knowledgeable, often correct. You still have to do the thing. A consultant understands the request, analyses it, and performs the activity to deliver the outcome. Same building, often the same desk. Different role.
When you ask the help-desk for a consultant's deliverable, you do not get a refusal. You get a list of steps. The reply looks helpful. It is helpful, for the question the help-desk thought you asked. The mismatch is invisible until you read the answer and realise you still have to go do it yourself.
Most of my 2024 confusion came from asking the help-desk for a consultant's deliverable.
The vendors and the framework writers describe two different machines.
The difference between agentic AI and generative AI shows up in the source material. Anthropic's engineering write-up on building effective agents names the help-desk shape at the system level: "Workflows are systems where LLMs and tools are orchestrated through predefined code paths." A script someone else wrote, with the LLM filling in slots. The same write-up names the other side: "Agents, on the other hand, are systems where LLMs dynamically direct their own processes and tool usage, maintaining control..." That is the consultant shape. The system itself decides the next move.
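To make the two definitions concrete, here is a minimal Python sketch. Everything in it is hypothetical and for illustration only: the tool functions, the `state` dictionary, and `stub_model`, which stands in for a real LLM call. It is not how any vendor implements agents. It only shows where the decision about the next step lives.

```python
from typing import Callable, Dict, List

# Hypothetical tools: plain functions standing in for real file operations.
def list_files(state: Dict) -> str:
    state["seen"] = sorted(state["files"])
    return f"found {len(state['seen'])} files"

def remove_duplicates(state: Dict) -> str:
    before = len(state["files"])
    state["files"] = sorted(set(state["files"]))
    return f"removed {before - len(state['files'])} duplicates"

TOOLS: Dict[str, Callable[[Dict], str]] = {
    "list_files": list_files,
    "remove_duplicates": remove_duplicates,
}

def run_workflow(state: Dict) -> List[str]:
    """Workflow shape: the step order is fixed in code by the script's author."""
    return [list_files(state), remove_duplicates(state)]

def stub_model(state: Dict, history: List[str]) -> str:
    """Stand-in for an LLM call (an assumption): returns a tool name or 'done'."""
    if not history:
        return "list_files"
    if len(set(state["files"])) < len(state["files"]):
        return "remove_duplicates"
    return "done"

def run_agent(state: Dict,
              choose_next: Callable[[Dict, List[str]], str] = stub_model,
              max_steps: int = 5) -> List[str]:
    """Agent shape: the model picks each next step until it decides to stop."""
    history: List[str] = []
    for _ in range(max_steps):
        action = choose_next(state, history)
        if action == "done":
            break
        history.append(TOOLS[action](state))
    return history
```

On a folder with no duplicates, `run_workflow` still runs both steps because the script says so, while `run_agent` stops after listing because the chooser sees nothing left to do. The `max_steps` cap is a deliberate choice: a loop that directs itself needs a hard stop.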
Deloitte's TMT prediction on autonomous generative AI agents lands the same line in plain business language: "AI agents don't just interact. They more effectively reason and act on behalf of the user." A vendor and an analyst firm, two different documents, one consistent reading. The help-desk explains. The consultant acts.
Once you can see the seam, the September 2024 Downloads failure stops feeling like a model weakness. It looks like a category error. The right instruction was sent to the wrong class of tool. The key differences between the two are not vibes or marketing copy. They sit at the system level, in how the tool is wired to handle the gap between an instruction and a finished task.
The honest objection is that the consultant tools still trip over their own feet.
Agentic tools are new enough that the vendors themselves flag the rough edges. Anthropic, in its computer-use announcement, said of the capability that "it is still experimental, at times cumbersome and error-prone." That is not marketing softening. That is the company that built the tool telling you what to expect when you hand it autonomy.
So this post is not a sell. The point is not that agentic AI is finished. The point is that even an early consultant who occasionally fumbles is a different category of help than a help-desk that hands you a checklist. Your choice is not "perfect tool versus broken tool." It is "which class fits what I am trying to get done today."
October 2025: same instruction, same intent, the action happened.
In October 2025 I started using Claude as an agentic AI tool. I gave it an instruction in the same shape I had given ChatGPT and Gemini a year earlier. This time the tool did not return steps. For the first time, it performed the activity on my behalf from a single instruction instead of describing how I could do it. The action happened.
I sat there for a second longer than I needed to. The reply told me something the first class of tool had never been able to tell me: "I did it." That was the moment the analogy clicked into a working rule. Generative AI is like a help-desk employee who guides you on what needs to be done. Agentic AI is like a consultant who understands the request, analyses it, and performs the activity to deliver the outcome.
Same user. Same prompting craft. Different tool class. Different outcome.
The trap is letting the consultant act in places you have not earned the right to delegate.
An agentic tool that performs the action is exactly as dangerous as it is useful. The help-desk's checklist gave you a chance to read each step before you ran it. The consultant skips that checkpoint by design. That is the whole appeal, and that is the whole risk.
In a regulated workflow, in a customer-facing email, in anything where the cost of a wrong action is higher than the cost of a missed shortcut, the right move is to keep the consultant inside a sandbox you control. The clearest use cases for agentic AI are the ones where the action is cheap to undo. A messy folder. A repetitive web task. A throwaway script. When to use agentic AI instead of generative AI is a question of stakes, not capability. Stay behind the wheel everywhere else.
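For the messy-folder case, "cheap to undo" can be pictured as a small Python sketch. The function names `tidy_by_extension` and `undo` are hypothetical, and the sketch assumes a flat folder with no name collisions. It is an illustration of the idea, not what any agentic tool actually runs. Files are moved, never deleted, every move is recorded, and the default is a dry run that only returns the plan.

```python
import shutil
from pathlib import Path
from typing import List, Tuple

Move = Tuple[Path, Path]  # (original location, new location)

def tidy_by_extension(folder: Path, dry_run: bool = True) -> List[Move]:
    """Plan moving each file into a subfolder named after its extension.

    With dry_run=True (the default) nothing on disk changes; the plan is
    returned so a human can read it first, like the help-desk checklist.
    """
    moves: List[Move] = []
    for item in sorted(folder.iterdir()):
        if not item.is_file():
            continue
        bucket = item.suffix.lstrip(".").lower() or "no_extension"
        moves.append((item, folder / bucket / item.name))
    if not dry_run:
        for src, dst in moves:
            dst.parent.mkdir(exist_ok=True)
            shutil.move(str(src), str(dst))
    return moves

def undo(moves: List[Move]) -> None:
    """Reverse recorded moves and remove any subfolders left empty."""
    for src, dst in reversed(moves):
        if dst.exists():
            shutil.move(str(dst), str(src))
        if dst.parent.exists() and not any(dst.parent.iterdir()):
            dst.parent.rmdir()
```

A call such as `tidy_by_extension(Path.home() / "Downloads")` only prints nothing and changes nothing until `dry_run=False` is passed, and the returned list is what `undo` needs to put everything back. An action without that kind of return path belongs on the other side of the stakes line.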
The next time the answer is a checklist, ask whether you wanted a checklist.
If the reply is a list of steps and you wanted the steps run, that is not a tool failure. That is the wrong class of tool. The question worth sitting with is which of this week's tasks were a consultant's job all along.