πŸ“‹ Educational tool only. This product is for informational and educational purposes. Β© 2026 Reaves Labs and Learning, LLC
Interactive Tool

REVI Training Manual

AI Model Training Pipeline Designer – Plan, configure, and export your complete fine-tuning pipeline

Pipeline Steps
1. Data Collection
2. Data Cleaning
3. Formatting
4. LoRA Training
5. Evaluation

🗃 Data Collection Planner

Define where your training data comes from and how much you need. Quality starts here.

RLL V1 collected 337,000 raw conversations

📚 RLL Training Journey – V1 to V5

V1 – REVI-72B (The First Attempt)

337K raw conversations. Loss reached 1.087. Result: the model was POISONED by junk data. Identity collapsed under noise. Lesson: garbage in, garbage out – even at scale.

V2 – Dataset Rebuild

Rebuilt from scratch. 25K clean conversations from 337K raw. 600+ skills mapped. Zero junk tolerance. The clean dataset became the foundation for everything after.

V3 – Format Discovery

Trained on clean data, but the tool-call format broke. Discovered that the model's native template must be respected – you can't force alien formats onto a pretrained model.

V4 – Gate Breakthrough

100/100/100 on all eval gates using the native Qwen template + system prompt. But these were Qwen's weights, not ours. The format worked – now could LoRA preserve it?

V5 – Vertical Integration (Current)

Training on a RunPod A100. Uses Qwen's native <tool_call> XML format. If the format survives LoRA training, vertical integration is achieved – YOUR model, YOUR weights, YOUR format. Loss target: 0.037.

🛡 Data Curation Calculator

Configure quality filters. Watch your dataset shrink to gold. The RLL path: 337K raw became 25K clean (7.4% survival rate).

Stage | Count
Raw Input | 337,000
After Dedup | –
After Quality | –
Final Clean | –

Counts and the survival rate update as you configure the filters below.
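To make the funnel concrete, here is a minimal Python sketch of the dedup-then-quality pipeline. The quality_score scorer and the 0.7 cutoff are illustrative placeholders, not RLL's actual curation code.

```python
import hashlib

def curate(conversations: list[str], quality_score, min_quality: float = 0.7) -> list[str]:
    """Two-stage funnel: exact dedup, then a quality cutoff."""
    raw = len(conversations)
    # Stage 1: exact deduplication via a content hash.
    seen: set[str] = set()
    deduped: list[str] = []
    for text in conversations:
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            deduped.append(text)
    # Stage 2: drop anything scoring below the quality threshold.
    clean = [t for t in deduped if quality_score(t) >= min_quality]
    print(f"raw={raw}  after_dedup={len(deduped)}  final_clean={len(clean)}  "
          f"survival={len(clean) / max(raw, 1):.1%}")
    return clean
```

Real-world curation adds near-duplicate detection (e.g. MinHash) on top of exact hashing, but the funnel structure stays the same.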

Quality Filters

Each filter shows the approximate share of remaining data it removes (for example, "Removes ~15% of remaining data"). Tighten or relax each filter to trade dataset size against quality.

🌟 Data Quality Score

Configure your filters above to see your quality score.

📄 Format Template Configuration

Choose and configure the conversation format your model will learn. V3 lesson: always match the base model's native format.

Critical: use the base model's native format to avoid V3-style breakage.
Qwen uses native <tool_call> XML – the key V5 discovery.

Format Preview
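While the preview renders live in the tool, here is a minimal sketch of what a Qwen-style ChatML conversation with the native <tool_call> XML block looks like. The system prompt, tool name, and arguments are illustrative placeholders.

```
<|im_start|>system
You are REVI, a helpful assistant.<|im_end|>
<|im_start|>user
What's the weather in Paris right now?<|im_end|>
<|im_start|>assistant
<tool_call>
{"name": "get_weather", "arguments": {"city": "Paris"}}
</tool_call><|im_end|>
```

Training data must reproduce this template byte-for-byte, special tokens included; that is the V3 lesson in practice.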

📊 Dataset Composition

Default training mix: 30% / 25% / 15% / 15% / 15% across the five data categories (category labels are set in the tool).
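To reproduce a mix like this offline, here is a minimal Python sketch of weighted sampling across five buckets. The category names are hypothetical; only the 30/25/15/15/15 split comes from the defaults above.

```python
import random

# Hypothetical category names; the weights mirror the tool's default split.
MIX = {
    "conversations": 0.30,
    "tool_use": 0.25,
    "knowledge": 0.15,
    "safety": 0.15,
    "identity": 0.15,
}

def sample_mix(buckets: dict[str, list[str]], n: int, seed: int = 0) -> list[str]:
    """Draw n training examples according to the composition weights."""
    rng = random.Random(seed)
    sampled: list[str] = []
    for name, weight in MIX.items():
        sampled.extend(rng.choices(buckets[name], k=round(n * weight)))
    rng.shuffle(sampled)
    return sampled
```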

LoRA Parameter Configurator

Configure Low-Rank Adaptation parameters. Each setting has a direct impact on training quality, speed, and VRAM usage.

Rank (r): 32 – the trainable parameter count is computed live.
Alpha: 64 – the effective scale (alpha / r = 2.0) is computed live.
Remaining numeric defaults: 3, 4, 4, and 0.03 (see the tool for their labels).
Target modules: Ctrl/Cmd+click to select. More modules = more expressive adapter but more VRAM.
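For reference, here is roughly how these settings map onto Hugging Face's peft library. The target_modules list is an assumed set of common attention projections, and treating the 0.03 value as lora_dropout is a guess; substitute your actual selections from the tool.

```python
from peft import LoraConfig

# Sketch only: target_modules and the lora_dropout reading are assumptions.
config = LoraConfig(
    r=32,                    # rank: drives the trainable parameter count
    lora_alpha=64,           # effective scale = lora_alpha / r = 2.0
    lora_dropout=0.03,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
```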

💰 Training Cost Estimator

Metric | Value
Estimated VRAM Required | –
VRAM Available | –
VRAM Fit | –
Training Steps | –
Estimated Duration | –
Cost per Hour | –
Total Estimated Cost | –

Values populate once the LoRA and dataset settings above are configured.
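The step and cost math behind the estimator is simple enough to sanity-check by hand, as in this back-of-envelope Python sketch. All inputs here are illustrative (effective batch 16 from 4 x 4, ~2.5 s/step, an assumed $1.90/hr A100 rate), not measured RLL numbers.

```python
import math

def estimate_run(n_examples: int, batch_size: int, grad_accum: int,
                 epochs: int, secs_per_step: float, usd_per_hour: float):
    """Steps from the effective batch size, duration from seconds per
    step, cost from the hourly GPU rate."""
    steps_per_epoch = math.ceil(n_examples / (batch_size * grad_accum))
    total_steps = steps_per_epoch * epochs
    hours = total_steps * secs_per_step / 3600
    return total_steps, hours, hours * usd_per_hour

steps, hours, cost = estimate_run(25_000, 4, 4, 3, 2.5, 1.90)
print(f"{steps} steps, ~{hours:.1f} h, ~${cost:.2f}")
```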

🎯 Evaluation Gate Designer

Define pass/fail gates for your trained model. V4 achieved 100/100/100 across all gates. Set your thresholds and test categories.

Knowledge Retention Test – pass threshold 80%
Tool Format Compliance – pass threshold 95%
Safety / Refusal Accuracy – pass threshold 90%
Identity Preservation – pass threshold 85%

Add Custom Gate
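A gate harness can be as small as a threshold table. A minimal Python sketch, assuming each eval suite reports a 0-100 score per category (key names are illustrative):

```python
DEFAULT_GATES = {
    "knowledge_retention": 80.0,
    "tool_format_compliance": 95.0,
    "safety_refusal_accuracy": 90.0,
    "identity_preservation": 85.0,
}

def check_gates(scores: dict[str, float],
                gates: dict[str, float] = DEFAULT_GATES) -> bool:
    """Pass only if every gate meets or beats its threshold."""
    passed = True
    for name, threshold in gates.items():
        score = scores.get(name, 0.0)
        if score < threshold:
            print(f"FAIL {name}: {score:.1f} < {threshold:.0f}")
            passed = False
    return passed

# A V4-style run scores 100 everywhere and clears every gate.
assert check_gates({name: 100.0 for name in DEFAULT_GATES})
```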

📈 Loss Target Configuration

RLL V1: 1.087 (bad) | RLL V5 target: 0.037 (excellent)

Early Stopping
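A minimal early-stopping sketch, assuming eval loss is recorded at every evaluation step: stop once the loss target is reached, or once the recent evals plateau. The patience value is illustrative.

```python
def should_stop(eval_losses: list[float], target: float = 0.037,
                patience: int = 3) -> bool:
    """True when the target is hit or the last `patience` evals
    show no improvement over the best loss before them."""
    if eval_losses and eval_losses[-1] <= target:
        return True  # loss target reached (the V5 goal is 0.037)
    if len(eval_losses) <= patience:
        return False  # too little history to call a plateau
    best_before = min(eval_losses[:-patience])
    return min(eval_losses[-patience:]) >= best_before
```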
