LLM Agent Factory
AI agent generation system using RAG (Retrieval-Augmented Generation).
Description
LLM Agent Factory is an intelligent system that generates structured AI agent descriptions from user queries. It uses a RAG approach: it finds similar agents in the database, then generates a new agent with an LLM, adapted to your query.
Each generated agent contains:
- `agent_id` — unique identifier
- `display_name` — human-readable name
- `persona` — agent's character and expertise
- `description` — what the agent does and how it helps the user
- `role_id` — agent role (researcher, coder, tutor, etc.)
- `domain` — subject area
- `tools` — list of tools (web_search, code_interpreter, etc.)
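A generated record might look like the following. This is a hypothetical example: only the field names come from the schema above; all values are illustrative.

```python
# Hypothetical example of a generated agent record; the field names follow
# the schema above, the values are purely illustrative.
example_agent = {
    "agent_id": "python-code-reviewer-001",
    "display_name": "Python Code Reviewer",
    "persona": "A meticulous senior engineer who values readable, idiomatic Python.",
    "description": "Reviews Python code for bugs, style issues, and design problems.",
    "role_id": "coder",
    "domain": "software_engineering",
    "tools": ["code_interpreter", "web_search"],
}

print(sorted(example_agent))
```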
Installation
Basic Installation
```bash
# Clone repository
git clone https://huggingface.co/frontier-ai/llm-agent-factory

# Install dependencies
pip install -e .
```
Full Functionality (Recommended)
```bash
# With RAG generation support
pip install -e ".[rag]"
```
Requirements: Python >= 3.12
Configuration
Environment Variables
For LLM API access, create a .env file in the project root (or set environment variables):
```bash
# Copy the example file
cp env.example .env
```

Then edit `.env` and add your API credentials:

```
LLM_API_KEY=your-api-key-here
LLM_BASE_URL=https://api.openai.com/v1
LLM_MODEL=gpt-oss-120b
```
Note: The .env file is already in .gitignore and will not be committed to the repository.
Alternatively, you can pass API credentials via command-line arguments or in code (see usage examples below).
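In code, the same variables can be read straight from the environment. Below is a minimal sketch using only the standard library; the `load_llm_credentials` helper is hypothetical and not part of this package.

```python
import os

def load_llm_credentials():
    """Read LLM credentials from environment variables, with fallbacks.

    Hypothetical helper for illustration only; the package itself loads
    .env / environment variables through its own configuration layer.
    """
    return {
        "api_key": os.environ.get("LLM_API_KEY"),
        "base_url": os.environ.get("LLM_BASE_URL", "https://api.openai.com/v1"),
        "model": os.environ.get("LLM_MODEL", "gpt-oss-120b"),
    }

# Simulate a configured environment for the demo.
os.environ.setdefault("LLM_API_KEY", "your-api-key-here")
creds = load_llm_credentials()
print(creds["model"])
```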
Quick Start
1. Search Existing Agents
Find a suitable agent in the database (~18,000 agents):
```bash
# Interactive search mode
agent-search

# Or single query
agent-search -q "Python programming help" -k 5
```
2. Generate New Agent by Query
Create a unique agent using RAG:
```bash
# Interactive generation mode
agent-generate

# Or immediate generation
agent-generate "I need an agent for code review in Python"

# Generate multiple variants
agent-generate --agents 3 "customer support specialist"
```
Usage
Agent Search (Retrieval)
The system supports semantic agent search using embedding models and optional reranking.
Interactive Mode Commands
| Command | Description |
|---|---|
| `/switch <dataset>` | Switch dataset (`eng`, `all`) |
| `/topk <n>` | Change number of results |
| `/rerank` | Enable/disable reranking |
| `/stats` | Show statistics |
| `/help` | Show help |
| `/quit` | Exit |
Usage Examples
```bash
# Interactive search (English dataset by default)
agent-search

# Explicitly select the English dataset
agent-search -d eng

# Use all datasets together
agent-search -d all

# With reranking for better accuracy
agent-search --rerank -q "machine learning expert"

# Choose embedding model
agent-search --model bge-large -q "data analyst"

# Multilingual search
agent-search --model bge-m3 -d all -q "programming"
```
Agent Generation (RAG)
The RAG system combines retrieval of similar agents with LLM generation to create unique agents.
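The retrieve-then-generate flow can be sketched roughly as follows. This is a toy illustration of the control flow, not the package's actual implementation: `retrieve_fn` and `llm_fn` are placeholders, and the real prompt format may differ.

```python
def generate_agent(query, retrieve_fn, llm_fn, num_examples=5):
    """Sketch of a RAG flow: retrieve similar agents, then prompt the LLM.

    `retrieve_fn` and `llm_fn` are hypothetical stand-ins for the real
    retriever and LLM client used by the package.
    """
    examples = retrieve_fn(query)[:num_examples]
    prompt = "Create a new agent for the query below, using these examples.\n"
    for ex in examples:
        prompt += f"- {ex['display_name']}: {ex['description']}\n"
    prompt += f"Query: {query}\n"
    return llm_fn(prompt)

# Toy stand-ins to demonstrate the control flow.
fake_retrieve = lambda q: [{"display_name": "Py Tutor", "description": "Teaches Python"}]
fake_llm = lambda prompt: {"display_name": "Code Review Bot"}

agent = generate_agent("code review assistant", fake_retrieve, fake_llm)
print(agent["display_name"])
```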
Interactive Mode Commands
| Command | Description |
|---|---|
| `generate <query>` | Generate agent by query |
| `search <query>` | Only search for similar agents |
| `dataset <name>` | Switch dataset (`eng`, `all`) |
| `agents <N>` | Number of agents to generate (1-10) |
| `examples <N>` | Number of examples for context (1-20) |
| `format <type>` | Output format (`json`, `pretty`) |
| `stats` | Show configuration |
| `help` | Show help |
| `quit` | Exit |
Usage Examples
```bash
# Interactive mode
agent-generate

# Single generation
agent-generate "I need an agent that helps with code review"

# Generate multiple variants
agent-generate --agents 3 "customer support agent"

# With pretty formatting
agent-generate --format pretty "data analysis helper"

# English dataset (default)
agent-generate --dataset eng "programming assistant"

# More examples for context
agent-generate --examples 10 "medical diagnosis assistant"

# Configure LLM temperature
agent-generate --temperature 0.9 "creative writing assistant"
```
Programmatic Usage
Quick API (Recommended for Beginners)
```python
from retrieval import quick_search, quick_generate

# Search agents (Retrieval) - simplest way
results = quick_search("Python programming expert")
for result in results:
    print(f"{result.agent.display_name}: {result.agent.description}")

# Generate agent (RAG) - simplest way
agents = quick_generate(
    "code review assistant for Python",
    api_key="your-api-key",
)
print(agents[0]["display_name"])
```
Advanced Retrieval API
```python
from retrieval import AgentRetriever, RetrievalConfig, DatasetType

# Create configuration
config = RetrievalConfig(
    dataset_type=DatasetType.ENG,
    embedding_model="BAAI/bge-small-en-v1.5",
    top_k=5,
    use_reranker=True,
)

# Create retriever
retriever = AgentRetriever(config)
retriever.initialize()

# Search agents
results = retriever.search("I need help with Python programming")
for result in results:
    print(f"{result.rank}. {result.agent.display_name}")
    print(f"   Score: {result.score:.4f}")
    print(f"   {result.agent.description}\n")
```
Advanced RAG API
```python
from retrieval import AgentRAG, RAGConfig, LLMConfig, DatasetType

# Configure LLM
llm_config = LLMConfig(
    model="gpt-4",
    base_url="https://api.openai.com/v1",
    api_key="your-api-key",
    temperature=0.7,
)

# Create RAG configuration
config = RAGConfig.with_dataset(
    dataset_type=DatasetType.ENG,
    llm=llm_config,
    num_agents_to_return=1,
    num_retrieved_for_context=5,
)

# Create RAG system
rag = AgentRAG(config)
rag.initialize()

# Generate agent
agents = rag.generate("I need a code review assistant")
for agent in agents:
    print(f"Name: {agent['display_name']}")
    print(f"Description: {agent['description']}")
    print(f"Persona: {agent['persona']}")
```
Datasets
The system currently ships with a single agent dataset, exposed under two names:

| Dataset | Language | Agents | Description |
|---|---|---|---|
| `eng` | English | ~18,000 | Main English dataset (default) |
| `all` | English | ~18,000 | All datasets together (currently identical to `eng`) |
Configuration
Embedding Models
| Model | Dimensions | Description |
|---|---|---|
| `bge-small` | 384 | **Recommended** - balance of speed and quality |
| `bge-base` | 768 | High quality |
| `bge-large` | 1024 | Maximum quality |
| `bge-m3` | 1024 | Multilingual (100+ languages) |
| `minilm` | 384 | Fast, basic quality |
| `mpnet` | 768 | Medium speed/quality |
Reranker Models
| Model | Description |
|---|---|
| `bge-reranker-base` | Default, good balance |
| `bge-reranker-large` | More accurate |
| `bge-reranker-v2-m3` | Multilingual |
LLM Configuration
By default, an OpenAI-compatible API endpoint is used. You can configure:

- `--model` — model name
- `--url` — base API URL
- `--api-key` — API key
- `--temperature` — generation temperature (0.0-1.0)
Project Structure
```
LLM-Agent-Factory/
├── agents_database/       # Agent database (JSON files)
│   └── agents_eng.jsonl
├── config/                # Domain, role and tool configurations
│   ├── domain.json        # 692 domains
│   ├── role_id.json       # 36 agent roles
│   └── tool.json          # 10 tools
├── retrieval/             # Main RAG system module
│   ├── __init__.py
│   ├── cli.py             # CLI for search
│   ├── rag_cli.py         # CLI for generation
│   ├── config.py          # Configurations
│   ├── models.py          # Pydantic models
│   ├── data_loader.py     # Data loading
│   ├── embedder.py        # Embeddings
│   ├── retriever.py       # Agent search
│   ├── rag.py             # RAG generation
│   └── tests/             # Tests
├── pyproject.toml
└── README.md
```
Agent Roles (36 roles)
- `general` — general assistant
- `researcher` — researcher (with web_search)
- `coder` — programmer (with code_interpreter)
- `tutor` — teaching assistant
- `advisor` — consultant
- `critic` — critical analysis
- `fact_checker` — fact checking
- `summarizer` — summarization
- `translator` — translation
- `planner` — planning
- `coordinator` — coordination
- `evaluator` — evaluation
- and others...
Tools (10 types)
- `web_search` — internet search
- `code_interpreter` — code execution
- `file_search` — document search
- `vector_search` — semantic search
- `image_generation` — image generation
- `shell` — system commands
- `computer_use` — UI interaction
- `apply_patch` — code modification
- `function_calling` — external API calls
- `remote_mcp_servers` — external tool servers
Statistics
- Domains: 692 (from Wikipedia categories)
- Roles: 36 professional roles
- Agents: ~18,000 unique agents
- Languages: English
FAQ
Q: Which embedding model should I choose?
A: Start with `bge-small`. For high quality use `bge-large`. For multilingual use `bge-m3`.
Q: What is reranking and do I need it?
A: Reranking is two-stage retrieval: a fast first stage finds candidates, and a slower second stage re-scores them more accurately. It improves result quality at the cost of search speed.
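The two-stage idea can be sketched like this. It is a pure-Python toy example with made-up vectors and a placeholder rerank score table; the real system uses BGE embedding and reranker models.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

# Stage 1: fast candidate retrieval by embedding similarity (toy vectors).
query_vec = [1.0, 0.0, 0.5]
agents = {
    "python_tutor": [0.9, 0.1, 0.4],
    "chef_assistant": [0.0, 1.0, 0.1],
    "code_reviewer": [0.8, 0.0, 0.6],
}
candidates = sorted(agents, key=lambda a: cosine(query_vec, agents[a]), reverse=True)[:2]

# Stage 2: a cross-encoder reranker would score each (query, agent) pair
# jointly; here a fixed score table stands in for that model.
rerank_scores = {"python_tutor": 0.7, "code_reviewer": 0.95}
reranked = sorted(candidates, key=lambda a: rerank_scores[a], reverse=True)
print(reranked)
```

Stage 1 is cheap because agent embeddings are precomputed; stage 2 is slower because the reranker must score every surviving pair at query time.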
Q: What's the difference between agent-search and agent-generate?
A: `agent-search` (Retrieval) searches for existing agents in the database. `agent-generate` (RAG) uses an LLM to create new, unique agents based on your query and similar existing agents.
Q: Which dataset should I use?
A: Use `eng` for English queries (the default); `all` currently maps to the same data.
Q: How to configure my own LLM?
A: Use the `--model`, `--url`, and `--api-key` parameters in the CLI, or create an `LLMConfig` in code.
Q: Why does the first initialization take so long?
A: On the first run the embedding index is built (~1-2 minutes). It is cached in retrieval/.cache/, so subsequent runs start quickly.
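The caching pattern works roughly like this. It is a simplified sketch: `embed_fn` stands in for a real embedding model, and the actual on-disk layout inside `retrieval/.cache/` may differ.

```python
import hashlib
import json
import tempfile
from pathlib import Path

def get_embeddings(texts, cache_dir, embed_fn):
    """Compute embeddings once, then reuse them from a JSON cache file.

    Simplified sketch of the caching idea; `embed_fn` is a placeholder
    for a real embedding model.
    """
    key = hashlib.sha256("\n".join(texts).encode()).hexdigest()
    cache_file = Path(cache_dir) / f"{key}.json"
    if cache_file.exists():  # subsequent runs: fast path
        return json.loads(cache_file.read_text())
    vectors = [embed_fn(t) for t in texts]  # first run: slow path
    cache_file.write_text(json.dumps(vectors))
    return vectors

calls = []
def fake_embed(text):
    """Toy 'model' that records each invocation."""
    calls.append(text)
    return [float(len(text))]

with tempfile.TemporaryDirectory() as d:
    first = get_embeddings(["alpha", "beta"], d, fake_embed)
    second = get_embeddings(["alpha", "beta"], d, fake_embed)

print(len(calls))  # the model was only invoked on the first run
```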
Testing
```bash
# Run all tests
pytest retrieval/tests/ -v

# Only retrieval tests
pytest retrieval/tests/test_retriever.py -v

# Only RAG tests
pytest retrieval/tests/test_rag.py -v

# With coverage
pytest retrieval/tests/ --cov=retrieval --cov-report=term-missing
```
Support
If you have questions or issues, create an issue in the repository.
Model tree for frontier-ai/llm-agent-factory
Base model: Qwen/Qwen3-4B-Instruct-2507
Evaluation results (self-reported)

| Benchmark | Accuracy | Total Tokens | End-to-end Latency |
|---|---|---|---|
| MMLU | 82.3% | 0.8M | 1.62s |
| BIG-bench | 85.6% | 17.2M | 2.10s |
| BBH | 68.5% | 1.0M | 2.10s |
| MMLU | 82.0% | 1.6M | 1.77s |
| BIG-bench | 85.7% | 33.4M | 2.47s |
| BBH | 69.3% | 2.0M | 2.59s |
| MMLU | 82.3% | 2.3M | |