
Evolutionary Prompting: Using modular prompts to improve AI agent performance

There’s a lot to be said about efficiency, whether you’re writing code or crafting prompts for an AI. The basic idea is simple: if you can break something complicated into reusable, modifiable parts, you can build better systems faster. In AI, that means rethinking how we write prompts for LLMs and AI agents.

Modular Prompts: The Object-Oriented Approach to Natural Language

Traditionally, prompts are written as one big block of text. Oftentimes, they read like a non-fiction outline of an essay or a natural language version of pseudocode: integrated, hard to modify, and prone to breaking when a single part is changed. But what if we treated prompts the same way we treat code? Think of it as applying object-oriented programming (OOP) principles to natural language. Instead of one monolithic instruction, we break the prompt into modules, each responsible for a specific function.

For instance, one module might set the tone or context. Another might define the output format. A third could include domain-specific instructions. This modularity makes the whole system more flexible and easier to debug and modify. When you need to change something, you don’t have to rewrite the entire prompt; you can simply update the relevant module. It becomes easier to maintain, similar to how OOP and microservices architecture revolutionized software engineering.
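To make this concrete, here’s a minimal sketch of modular prompt composition in Python. The module names (tone, output format, domain instructions) mirror the examples above; the PromptModule structure and compose_prompt helper are illustrative assumptions, not a prescribed framework.

```python
from dataclasses import dataclass

@dataclass
class PromptModule:
    """One self-contained piece of a prompt (hypothetical structure)."""
    name: str
    text: str

def compose_prompt(modules: list[PromptModule]) -> str:
    """Concatenate modules in order to form the full prompt."""
    return "\n\n".join(m.text for m in modules)

# Example modules; editing or swapping one doesn't touch the others.
tone = PromptModule("tone", "You are a concise, friendly support assistant.")
output_format = PromptModule("output_format", "Answer in at most three bullet points.")
domain = PromptModule("domain", "Only discuss billing and subscription questions.")

prompt = compose_prompt([tone, output_format, domain])
```

Changing the output-format module, for example, leaves the tone and domain modules untouched, which is exactly the kind of isolation that makes debugging a prompt tractable.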

Testing Ground: Building Robust Datasets

Of course, if you’re going to build anything reliable, you need a good set of tests. In our case, that means a robust dataset to evaluate how these modular prompts perform. The idea is to compile a diverse set of tasks that challenge different aspects of the prompts. This dataset becomes our yardstick. How does each module perform? Where is the language redundant or vague? What makes one configuration better than another?

A well-designed dataset can reveal flaws in even the most promising prompts. It forces us to ask: Are we using too many tokens? Is the output accurate? How fast is the model performing for specific tasks? In short, it helps us measure the fitness of our prompts in real-world scenarios.
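As a sketch of what that yardstick could look like, the snippet below scores a prompt against a small evaluation set on accuracy, token usage, and latency. The task schema, the crude substring accuracy check, and the call_llm/count_tokens helpers are all assumptions for illustration, not a fixed interface.

```python
import time

# A hypothetical evaluation set: each task pairs an input with a reference answer.
eval_tasks = [
    {"input": "Summarize: ...", "expected": "..."},
    {"input": "Classify sentiment: ...", "expected": "positive"},
]

def evaluate_prompt(prompt: str, tasks: list[dict], call_llm, count_tokens) -> dict:
    """Run a prompt over the dataset and collect the metrics we care about.

    `call_llm` and `count_tokens` are assumed helpers: one calls your model,
    the other counts tokens for cost accounting.
    """
    correct, total_tokens, start = 0, 0, time.time()
    for task in tasks:
        output = call_llm(prompt, task["input"])
        total_tokens += count_tokens(prompt) + count_tokens(output)
        correct += int(task["expected"].lower() in output.lower())  # crude accuracy proxy
    return {
        "accuracy": correct / len(tasks),
        "tokens": total_tokens,
        "seconds": time.time() - start,
    }
```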

Evolutionary Algorithms: Nature’s Way to Optimize

During my school years, AI, especially courses on neural networks and evolutionary/genetic algorithms, was my favorite subject. Being able to create a simple set of rules (or environments) and let nature take its course to find efficient and novel answers always gave me the illusion of being god-like in the digital world I created.

Imagine applying evolutionary algorithms (EAs) to our modular prompts. In nature, evolution isn’t about perfect design from the start; it’s about continuous improvement through mutation, crossover, and selection. We can mimic that process with our prompts using concepts borrowed from evolutionary algorithms in computer science.

Start with an initial population of prompt configurations, each a mix of modular components. Evaluate them on our dataset. Some will perform better than others, and the fittest prompts become candidates for the next generation. We then combine elements from the best-performing prompts, introduce slight mutations (perhaps a tweak suggested by an LLM) within the modules, run crossovers on module sequencing, and iterate.
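A bare-bones version of that loop might look like the sketch below. It assumes a `fitness` function that scores a list of modules against the dataset and a `mutate_module` helper that rewrites a single module (for example, by asking an LLM for a tighter phrasing); the generation count, survivor count, and mutation rate are arbitrary placeholders.

```python
import random

def crossover(parent_a: list, parent_b: list) -> list:
    """Single-point crossover on module sequences."""
    cut = random.randint(1, min(len(parent_a), len(parent_b)) - 1)
    return parent_a[:cut] + parent_b[cut:]

def evolve(population: list[list], fitness, mutate_module, generations: int = 20,
           survivors: int = 4, mutation_rate: float = 0.2) -> list:
    """Evolve lists of prompt modules toward higher fitness.

    `fitness` scores a module list; `mutate_module` rewrites one module.
    Both are assumed helpers supplied by the caller.
    """
    for _ in range(generations):
        # Selection: keep the best-scoring prompt configurations.
        population.sort(key=fitness, reverse=True)
        parents = population[:survivors]
        # Crossover + mutation to refill the population.
        children = []
        while len(children) < len(population) - survivors:
            child = crossover(*random.sample(parents, 2))
            children.append([mutate_module(m) if random.random() < mutation_rate else m
                             for m in child])
        population = parents + children
    return max(population, key=fitness)
```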

Over many successive generations, the EA naturally weeds out inefficiencies. Redundant language, overcomplicated structures, and wasted tokens are eliminated. The result is a prompt that’s leaner, faster, and more accurate. It’s evolution in action: an automated, iterative process that refines natural language the way evolution refines organisms.

Evolving for Efficiency and Clarity

The goal of this process is twofold: 1) reduce complexity (and thus token consumption) and 2) maximize output quality, meaning speed and accuracy. By constantly evolving our modular prompts, we ensure that every word counts. There’s no room for tautology or unnecessary verbiage. In the world of AI agents, every token has a cost, and every wasted token is a missed opportunity for performance.
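One way to encode that twofold goal is a scalar fitness that rewards accuracy while penalizing tokens and latency. The function below consumes the metrics dict from the evaluation sketch above; the weights are placeholders to tune against your own workload, not recommended values.

```python
def fitness(metrics: dict, token_weight: float = 0.001, latency_weight: float = 0.01) -> float:
    """Higher is better: reward accuracy, penalize token usage and latency.

    The weights are illustrative and should be tuned per workload.
    """
    return (metrics["accuracy"]
            - token_weight * metrics["tokens"]
            - latency_weight * metrics["seconds"])
```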

Imagine an AI agent that can tweak its own prompts in real time. It would assess its output against a robust dataset (especially when that dataset itself is dynamic and expands over time in a production environment), identify where it’s vague or verbose, and then mutate and recombine its prompt to be clearer and more direct. The agent becomes a self-improving system, learning not just from external feedback but from an internal evolutionary process. The outcome is an AI that’s both faster and more accurate: an agent optimized for performance and efficiency.
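A rough sketch of that self-improvement policy, assuming the agent logs incoming tasks and periodically re-runs the evolutionary loop sketched earlier against the grown dataset (`fitness_for` is an assumed factory that binds a fitness function to the current task log):

```python
def maybe_reevolve(current_modules: list, task_log: list[dict], evolve, fitness_for,
                   mutate_module, population_size: int = 8, retrain_every: int = 100) -> list:
    """Re-evolve the agent's prompt whenever enough new tasks have accumulated."""
    if task_log and len(task_log) % retrain_every == 0:
        # Seed the population with the current prompt plus lightly mutated variants.
        population = [current_modules] + [
            [mutate_module(m) for m in current_modules] for _ in range(population_size - 1)
        ]
        return evolve(population, fitness=fitness_for(task_log), mutate_module=mutate_module)
    return current_modules
```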

One potential issue is that evolutionary algorithms can be computationally expensive (hint: tokens), as they can easily run across thousands of generations with large populations, where each member may require multiple iterations for mutation and crossover using LLMs.

The Broader Implications for Prompting

This method of evolving natural language prompts has implications beyond merely optimizing AI agents. It provides a glimpse into the future of human-computer interaction. When we treat language like code, we unlock new avenues for creativity while balancing precision. It challenges the current norm that prompts are static forms of text and data. Instead, they become living documents, continuously refined through a process that mirrors natural evolution.

There is beauty in the simplicity of this idea. By modularizing our language and applying evolutionary algorithms, we create a system that’s inherently adaptable. It’s like having a codebase that writes itself, constantly evolving to meet new challenges and optimize performance. This helps AI agents become not only more accurate and faster but also more resource-efficient.

If prompts themselves become autonomously evolving components, our roles as AI agent builders and operators may shift from heavy prompt engineering to designing evolutionary mechanisms, setting goals, and providing the resources for evolutionary algorithms to run their course.
