LLM Agent Factory
AI agent generation system using RAG (Retrieval-Augmented Generation).
Description
LLM Agent Factory is an intelligent system that generates structured AI agent descriptions from user queries. It uses a RAG approach: it finds similar agents in the database, then generates a new agent with an LLM, adapted to your query.
Each generated agent contains:
- `agent_id` — unique identifier
- `display_name` — human-readable name
- `persona` — agent's character and expertise
- `description` — what the agent does and how it helps the user
- `role_id` — agent role (researcher, coder, tutor, etc.)
- `domain` — subject area
- `tools` — list of tools (web_search, code_interpreter, etc.)
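A generated record might look like the following. This is a hypothetical example: only the field names come from the schema above; all values are illustrative.

```python
# Hypothetical example of a generated agent record; the field names follow
# the schema above, the values are purely illustrative.
example_agent = {
    "agent_id": "python-code-reviewer-001",
    "display_name": "Python Code Reviewer",
    "persona": "A meticulous senior engineer who values readable, idiomatic Python.",
    "description": "Reviews Python code for bugs, style issues, and design problems.",
    "role_id": "coder",
    "domain": "software_engineering",
    "tools": ["code_interpreter", "web_search"],
}

print(sorted(example_agent))
```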
Installation
Basic Installation
```bash
# Clone repository
git clone https://huggingface.co/frontier-ai/llm-agent-factory

# Install dependencies
pip install -e .
```
Full Functionality (Recommended)
```bash
# With RAG generation support
pip install -e ".[rag]"
```
Requirements: Python >= 3.12
Configuration
Environment Variables
For LLM API access, create a .env file in the project root (or set environment variables):
```bash
# Copy the example file
cp env.example .env
```

Then edit `.env` and add your API credentials:

```
LLM_API_KEY=your-api-key-here
LLM_BASE_URL=https://api.openai.com/v1
LLM_MODEL=gpt-oss-120b
```
Note: The .env file is already in .gitignore and will not be committed to the repository.
Alternatively, you can pass API credentials via command-line arguments or in code (see usage examples below).
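In code, the same variables can be read straight from the environment. Below is a minimal sketch using only the standard library; the `load_llm_credentials` helper is hypothetical and not part of this package.

```python
import os

def load_llm_credentials():
    """Read LLM credentials from environment variables, with fallbacks.

    Hypothetical helper for illustration only; the package itself loads
    .env / environment variables through its own configuration layer.
    """
    return {
        "api_key": os.environ.get("LLM_API_KEY"),
        "base_url": os.environ.get("LLM_BASE_URL", "https://api.openai.com/v1"),
        "model": os.environ.get("LLM_MODEL", "gpt-oss-120b"),
    }

# Simulate a configured environment for the demo.
os.environ.setdefault("LLM_API_KEY", "your-api-key-here")
creds = load_llm_credentials()
print(creds["model"])
```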
Quick Start
1. Search Existing Agents
Find a suitable agent in the database (~18,000 agents):
```bash
# Interactive search mode
agent-search

# Or single query
agent-search -q "Python programming help" -k 5
```
2. Generate New Agent by Query
Create a unique agent using RAG:
```bash
# Interactive generation mode
agent-generate

# Or immediate generation
agent-generate "I need an agent for code review in Python"

# Generate multiple variants
agent-generate --agents 3 "customer support specialist"
```
Usage
Agent Search (Retrieval)
The system supports semantic agent search using embedding models and optional reranking.
Interactive Mode Commands
| Command | Description |
|---|---|
| `/switch <dataset>` | Switch dataset (`eng`, `all`) |
| `/topk <n>` | Change number of results |
| `/rerank` | Enable/disable reranking |
| `/stats` | Show statistics |
| `/help` | Show help |
| `/quit` | Exit |
Usage Examples
```bash
# Interactive search (English dataset by default)
agent-search

# Explicitly select the English dataset
agent-search -d eng

# Use all datasets together
agent-search -d all

# With reranking for better accuracy
agent-search --rerank -q "machine learning expert"

# Choose embedding model
agent-search --model bge-large -q "data analyst"

# Multilingual search
agent-search --model bge-m3 -d all -q "programming"
```
Agent Generation (RAG)
The RAG system combines retrieval of similar agents with LLM generation to create unique agents.
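The retrieve-then-generate flow can be sketched roughly as follows. This is a toy illustration of the control flow, not the package's actual implementation: `retrieve_fn` and `llm_fn` are placeholders, and the real prompt format may differ.

```python
def generate_agent(query, retrieve_fn, llm_fn, num_examples=5):
    """Sketch of a RAG flow: retrieve similar agents, then prompt the LLM.

    `retrieve_fn` and `llm_fn` are hypothetical stand-ins for the real
    retriever and LLM client used by the package.
    """
    examples = retrieve_fn(query)[:num_examples]
    prompt = "Create a new agent for the query below, using these examples.\n"
    for ex in examples:
        prompt += f"- {ex['display_name']}: {ex['description']}\n"
    prompt += f"Query: {query}\n"
    return llm_fn(prompt)

# Toy stand-ins to demonstrate the control flow.
fake_retrieve = lambda q: [{"display_name": "Py Tutor", "description": "Teaches Python"}]
fake_llm = lambda prompt: {"display_name": "Code Review Bot"}

agent = generate_agent("code review assistant", fake_retrieve, fake_llm)
print(agent["display_name"])
```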
Interactive Mode Commands
| Command | Description |
|---|---|
| `generate <query>` | Generate agent by query |
| `search <query>` | Only search for similar agents |
| `dataset <name>` | Switch dataset (`eng`, `all`) |
| `agents <N>` | Number of agents to generate (1-10) |
| `examples <N>` | Number of examples for context (1-20) |
| `format <type>` | Output format (`json`, `pretty`) |
| `stats` | Show configuration |
| `help` | Show help |
| `quit` | Exit |
Usage Examples
```bash
# Interactive mode
agent-generate

# Single generation
agent-generate "I need an agent that helps with code review"

# Generate multiple variants
agent-generate --agents 3 "customer support agent"

# With pretty formatting
agent-generate --format pretty "data analysis helper"

# English dataset (default)
agent-generate --dataset eng "programming assistant"

# More examples for context
agent-generate --examples 10 "medical diagnosis assistant"

# Configure LLM temperature
agent-generate --temperature 0.9 "creative writing assistant"
```
Programmatic Usage
Quick API (Recommended for Beginners)
```python
from retrieval import quick_search, quick_generate

# Search agents (Retrieval) - simplest way
results = quick_search("Python programming expert")
for result in results:
    print(f"{result.agent.display_name}: {result.agent.description}")

# Generate agent (RAG) - simplest way
agents = quick_generate(
    "code review assistant for Python",
    api_key="your-api-key",
)
print(agents[0]["display_name"])
```
Advanced Retrieval API
```python
from retrieval import AgentRetriever, RetrievalConfig, DatasetType

# Create configuration
config = RetrievalConfig(
    dataset_type=DatasetType.ENG,
    embedding_model="BAAI/bge-small-en-v1.5",
    top_k=5,
    use_reranker=True,
)

# Create retriever
retriever = AgentRetriever(config)
retriever.initialize()

# Search agents
results = retriever.search("I need help with Python programming")
for result in results:
    print(f"{result.rank}. {result.agent.display_name}")
    print(f"   Score: {result.score:.4f}")
    print(f"   {result.agent.description}\n")
```
Advanced RAG API
```python
from retrieval import AgentRAG, RAGConfig, LLMConfig, DatasetType

# Configure LLM
llm_config = LLMConfig(
    model="gpt-4",
    base_url="https://api.openai.com/v1",
    api_key="your-api-key",
    temperature=0.7,
)

# Create RAG configuration
config = RAGConfig.with_dataset(
    dataset_type=DatasetType.ENG,
    llm=llm_config,
    num_agents_to_return=1,
    num_retrieved_for_context=5,
)

# Create RAG system
rag = AgentRAG(config)
rag.initialize()

# Generate agent
agents = rag.generate("I need a code review assistant")
for agent in agents:
    print(f"Name: {agent['display_name']}")
    print(f"Description: {agent['description']}")
    print(f"Persona: {agent['persona']}")
```
Datasets
The system currently ships with a single agent dataset, exposed under two names:

| Dataset | Language | Agents | Description |
|---|---|---|---|
| `eng` | English | ~18,000 | Main English dataset (default) |
| `all` | English | ~18,000 | All datasets together (currently identical to `eng`) |
Configuration
Embedding Models
| Model | Dimensions | Description |
|---|---|---|
| `bge-small` | 384 | **Recommended** - balance of speed and quality |
| `bge-base` | 768 | High quality |
| `bge-large` | 1024 | Maximum quality |
| `bge-m3` | 1024 | Multilingual (100+ languages) |
| `minilm` | 384 | Fast, basic quality |
| `mpnet` | 768 | Medium speed/quality |
Reranker Models
| Model | Description |
|---|---|
| `bge-reranker-base` | Default, good balance |
| `bge-reranker-large` | More accurate |
| `bge-reranker-v2-m3` | Multilingual |
LLM Configuration
By default, an OpenAI-compatible API endpoint is used. You can configure:

- `--model` — model name
- `--url` — base API URL
- `--api-key` — API key
- `--temperature` — generation temperature (0.0-1.0)
Project Structure
```
LLM-Agent-Factory/
├── agents_database/       # Agent database (JSON files)
│   └── agents_eng.jsonl
├── config/                # Domain, role and tool configurations
│   ├── domain.json        # 692 domains
│   ├── role_id.json       # 36 agent roles
│   └── tool.json          # 10 tools
├── retrieval/             # Main RAG system module
│   ├── __init__.py
│   ├── cli.py             # CLI for search
│   ├── rag_cli.py         # CLI for generation
│   ├── config.py          # Configurations
│   ├── models.py          # Pydantic models
│   ├── data_loader.py     # Data loading
│   ├── embedder.py        # Embeddings
│   ├── retriever.py       # Agent search
│   ├── rag.py             # RAG generation
│   └── tests/             # Tests
├── pyproject.toml
└── README.md
```
Agent Roles (36 roles)
- `general` — general assistant
- `researcher` — researcher (with web_search)
- `coder` — programmer (with code_interpreter)
- `tutor` — teaching assistant
- `advisor` — consultant
- `critic` — critical analysis
- `fact_checker` — fact checking
- `summarizer` — summarization
- `translator` — translation
- `planner` — planning
- `coordinator` — coordination
- `evaluator` — evaluation
- and others...
Tools (10 types)
- `web_search` — internet search
- `code_interpreter` — code execution
- `file_search` — document search
- `vector_search` — semantic search
- `image_generation` — image generation
- `shell` — system commands
- `computer_use` — UI interaction
- `apply_patch` — code modification
- `function_calling` — external API calls
- `remote_mcp_servers` — external tool servers
Statistics
- Domains: 692 (from Wikipedia categories)
- Roles: 36 professional roles
- Agents: ~18,000 unique agents
- Languages: English
FAQ
Q: Which embedding model should I choose?
A: Start with `bge-small`. For high quality use `bge-large`. For multilingual use `bge-m3`.
Q: What is reranking and do I need it?
A: Reranking is two-stage retrieval: a fast first stage finds candidates, and a slower second stage re-scores them more accurately. It improves result quality at the cost of search speed.
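The two-stage idea can be sketched like this. It is a pure-Python toy example with made-up vectors and a placeholder rerank score table; the real system uses BGE embedding and reranker models.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

# Stage 1: fast candidate retrieval by embedding similarity (toy vectors).
query_vec = [1.0, 0.0, 0.5]
agents = {
    "python_tutor": [0.9, 0.1, 0.4],
    "chef_assistant": [0.0, 1.0, 0.1],
    "code_reviewer": [0.8, 0.0, 0.6],
}
candidates = sorted(agents, key=lambda a: cosine(query_vec, agents[a]), reverse=True)[:2]

# Stage 2: a cross-encoder reranker would score each (query, agent) pair
# jointly; here a fixed score table stands in for that model.
rerank_scores = {"python_tutor": 0.7, "code_reviewer": 0.95}
reranked = sorted(candidates, key=lambda a: rerank_scores[a], reverse=True)
print(reranked)
```

Stage 1 is cheap because agent embeddings are precomputed; stage 2 is slower because the reranker must score every surviving pair at query time.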
Q: What's the difference between agent-search and agent-generate?
A: `agent-search` (Retrieval) searches for existing agents in the database. `agent-generate` (RAG) uses an LLM to create new, unique agents based on your query and similar existing agents.
Q: Which dataset should I use?
A: Use `eng` for English queries (the default); `all` currently maps to the same data.
Q: How to configure my own LLM?
A: Use the `--model`, `--url`, and `--api-key` parameters in the CLI, or create an `LLMConfig` in code.
Q: Why does the first initialization take so long?
A: On the first run the embedding index is built (~1-2 minutes). It is cached in retrieval/.cache/, so subsequent runs start quickly.
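The caching pattern works roughly like this. It is a simplified sketch: `embed_fn` stands in for a real embedding model, and the actual on-disk layout inside `retrieval/.cache/` may differ.

```python
import hashlib
import json
import tempfile
from pathlib import Path

def get_embeddings(texts, cache_dir, embed_fn):
    """Compute embeddings once, then reuse them from a JSON cache file.

    Simplified sketch of the caching idea; `embed_fn` is a placeholder
    for a real embedding model.
    """
    key = hashlib.sha256("\n".join(texts).encode()).hexdigest()
    cache_file = Path(cache_dir) / f"{key}.json"
    if cache_file.exists():  # subsequent runs: fast path
        return json.loads(cache_file.read_text())
    vectors = [embed_fn(t) for t in texts]  # first run: slow path
    cache_file.write_text(json.dumps(vectors))
    return vectors

calls = []
def fake_embed(text):
    """Toy 'model' that records each invocation."""
    calls.append(text)
    return [float(len(text))]

with tempfile.TemporaryDirectory() as d:
    first = get_embeddings(["alpha", "beta"], d, fake_embed)
    second = get_embeddings(["alpha", "beta"], d, fake_embed)

print(len(calls))  # the model was only invoked on the first run
```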
Testing
```bash
# Run all tests
pytest retrieval/tests/ -v

# Only retrieval tests
pytest retrieval/tests/test_retriever.py -v

# Only RAG tests
pytest retrieval/tests/test_rag.py -v

# With coverage
pytest retrieval/tests/ --cov=retrieval --cov-report=term-missing
```
Support
If you have questions or issues, create an issue in the repository.
Model tree for frontier-ai/llm-agent-factory
Base model: Qwen/Qwen3-4B-Instruct-2507
Evaluation results (self-reported)

| Benchmark | Accuracy | Total Tokens | End-to-end Latency |
|---|---|---|---|
| MMLU | 82.3% | 0.8M | 1.62s |
| BIG-bench | 85.6% | 17.2M | 2.10s |
| BBH | 68.5% | 1.0M | 2.10s |
| MMLU | 82.0% | 1.6M | 1.77s |
| BIG-bench | 85.7% | 33.4M | 2.47s |
| BBH | 69.3% | 2.0M | 2.59s |
| MMLU | 82.3% | 2.3M | |