Model Routing System
System Architecture Overview
The Provider Model Routing System is an intelligent, multi-layered routing infrastructure that enables multiple providers to offer the same AI models while automatically selecting the optimal provider based on real-time metrics, user preferences, and system health.
Core Components Structure
┌─────────────────────────────────────────────────────────────────┐
│ USER REQUEST (Model ID) │
└────────────────────────┬────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ RELAY HANDLER │
│ • Request validation │
│ • Channel selection with routing │
│ • Circuit breaker checking │
└────────────────────────┬────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ CHANNEL SERVICE │
│ get_routed_channel() │
│ ├── Try Provider Model Routing (STEP 1) │
│ └── Fallback to Legacy Channel Routing (STEP 2) │
└────────────────────────┬────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ MODEL ROUTING SERVICE │
│ │
│ route_request() │
│ ├─► 1. get_model_providers() [Provider Discovery] │
│ ├─► 2. load_user_preferences() [User Prefs Loading] │
│ ├─► 3. load_routing_config() [Model Config Loading] │
│ ├─► 4. score_providers() [Intelligent Scoring] │
│ └─► 5. select_provider() [Final Selection] │
└────────────────────────┬────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ ROUTING DECISION │
│ • Selected Provider + Channel │
│ • Fallback Providers (ordered) │
│ • Routing Reason & Score │
│ • Strategy Used │
└────────────────────────┬────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ CIRCUIT BREAKER CHECK │
│ should_allow_request() │
│ • Closed → Allow │
│ • Open → Try Fallback │
│ • Half-Open → Limited Allow │
└────────────────────────┬────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ EXECUTE REQUEST ON SELECTED CHANNEL │
└────────────────────────┬────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────────┐
│ METRICS RECORDING │
│ record_request() │
│ • Latency tracking │
│ • Success/failure counting │
│ • Token usage │
│ • Quality score calculation │
│ • Circuit breaker state update │
└──────────────────────────────────────────────────────────────────┘
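Conceptually, the CHANNEL SERVICE step reduces to "try provider model routing, otherwise fall back to legacy routing". A minimal sketch of that decision, ignoring the database pool, async plumbing, and error handling of the real get_routed_channel() shown later under Developer Experience; the helper names and simplified signature here are assumptions:

// Sketch only: STEP 1 tries provider model routing, STEP 2 falls back
// to legacy channel routing. Helper names and types are assumptions.
struct Channel { id: i64 }

fn try_provider_routing(model_id: &str, user_id: i64) -> Option<Channel> {
    // STEP 1: would invoke the Model Routing Service (route_request()).
    let _ = (model_id, user_id);
    None
}

fn legacy_channel_lookup(model_id: &str) -> Option<Channel> {
    // STEP 2: would pick a channel via the legacy routing tables.
    let _ = model_id;
    Some(Channel { id: 23 })
}

fn get_routed_channel(model_id: &str, user_id: i64) -> Option<Channel> {
    try_provider_routing(model_id, user_id).or_else(|| legacy_channel_lookup(model_id))
}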
Database Schema Architecture
1. provider_models (Source of Truth)
Provider-submitted model definitions
├── Model Info: model_id, model_name, description
├── Provider: provider_id, provider_name, channel_id
├── Pricing: pricing_prompt, pricing_completion, pricing_image
├── Specs: context_length, modality, supported_parameters
└── Status: status (0=pending, 1=approved, 2=rejected)
2. provider_model_metrics (Real-time Performance)
Live performance tracking per provider-model-channel
├── Cumulative: total_requests, successful_requests, failed_requests
├── Latency: avg/p50/p95/p99/min/max_latency_ms
├── Time Windows: last_hour, last_24h metrics
├── Circuit Breaker: circuit_state, consecutive_failures/successes
├── Quality: quality_score (0.0-1.0)
└── Token Throughput: total tokens, avg_tokens_per_second
3. model_routing_config (Per-Model Configuration)
Admin-configurable routing rules per model
├── Weights: latency_weight, success_rate_weight, price_weight
├── Strategy: default_strategy (performance/cost/balanced/round_robin)
├── Fallback: enable_auto_fallback, max_fallback_attempts
└── Circuit: failure_threshold, recovery_timeout_seconds
4. user_routing_preferences (User Preferences)
Per-user routing customization
├── Strategy: default_strategy
├── Providers: preferred_providers[], blocked_providers[]
├── Limits: max_price_per_million_tokens, min_success_rate, max_latency_ms
└── Requirements: require_streaming, require_function_calling
5. routing_decision_logs (Audit Trail)
Complete history of routing decisions
├── Decision: selected_provider_id, routing_strategy, routing_reason
├── Candidates: candidates_count, candidates_json
├── Fallback: fallback_providers[], is_fallback_request
└── Performance: routing_duration_us
Detailed Process Flow
Phase 1: Request Initiation
User/API Request
│
├─► Model ID: "deepseek-chat"
├─► User ID: 12345
└─► Optional: RoutingPreferences { strategy: "performance" }
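As a rough illustration only, the Phase 1 inputs could be assembled into a request value like the following; the struct shapes are assumptions based on the names used in this document:

// Hypothetical request shape for Phase 1 (names mirror this document, types assumed).
struct RoutingPreferences { strategy: String }

struct RoutingRequest {
    model_id: String,
    user_id: i64,
    preferences: Option<RoutingPreferences>,
}

fn example_request() -> RoutingRequest {
    RoutingRequest {
        model_id: "deepseek-chat".to_string(),
        user_id: 12345,
        preferences: Some(RoutingPreferences { strategy: "performance".to_string() }),
    }
}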
Phase 2: Provider Discovery
SELECT provider_id, channel_id, provider_name, pricing, metrics, quality_score
FROM provider_models pm
LEFT JOIN provider_model_metrics pmm ON (...)
LEFT JOIN channels c ON pm.channel_id = c.id
WHERE pm.model_id = 'deepseek-chat' AND pm.status = 1
ORDER BY quality_score DESC
Output: List of ProviderCandidate structs
ProviderCandidate {
provider_id: 5,
channel_id: 23,
provider_name: "Provider A",
price_per_million_prompt: 2.50,
price_per_million_completion: 10.00,
success_rate: 0.98,
avg_latency_ms: 450,
quality_score: 0.92,
circuit_state: Closed,
}
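The candidate literal above implies roughly the following type. The field types here are inferred from the example values, not taken from the actual source:

// Inferred shape of a provider candidate (field types are assumptions).
#[derive(Debug, Clone, PartialEq)]
enum CircuitState { Closed, Open, HalfOpen }

#[derive(Debug, Clone)]
struct ProviderCandidate {
    provider_id: i64,
    channel_id: i64,
    provider_name: String,
    price_per_million_prompt: f64,
    price_per_million_completion: f64,
    success_rate: f64,    // 0.0-1.0
    avg_latency_ms: u64,
    quality_score: f64,   // 0.0-1.0
    circuit_state: CircuitState,
}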
Phase 3: Configuration Loading
Model Config:
RoutingConfig {
canonical_model_id: "deepseek-chat",
latency_weight: 0.3,
success_rate_weight: 0.4,
price_weight: 0.2,
provider_priority_weight: 0.1,
default_strategy: "balanced",
}
User Preferences (merged with request prefs):
RoutingPreferences {
strategy: Performance,
prefer_providers: [5, 8],
avoid_providers: [3],
max_price: Some(15.0),
min_success_rate: Some(0.95),
}
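A minimal sketch of how per-request preferences might be merged over stored user preferences (request values win where present); only the field names come from the structs above, the merge rules themselves are an assumption:

#[derive(Clone, Default)]
struct RoutingPreferences {
    strategy: Option<String>,
    prefer_providers: Vec<i64>,
    avoid_providers: Vec<i64>,
    max_price: Option<f64>,
    min_success_rate: Option<f64>,
}

// Request-level preferences override user defaults where present.
fn merge_preferences(user: RoutingPreferences, request: RoutingPreferences) -> RoutingPreferences {
    RoutingPreferences {
        strategy: request.strategy.or(user.strategy),
        prefer_providers: if request.prefer_providers.is_empty() {
            user.prefer_providers
        } else {
            request.prefer_providers
        },
        avoid_providers: if request.avoid_providers.is_empty() {
            user.avoid_providers
        } else {
            request.avoid_providers
        },
        max_price: request.max_price.or(user.max_price),
        min_success_rate: request.min_success_rate.or(user.min_success_rate),
    }
}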
Phase 4: Intelligent Scoring
Strategy: Performance
score = success_rate * 0.4 + latency_score * 0.3 + quality_score * 0.1 + priority_bonus
Strategy: Cost
price_score = 1.0 - (avg_price / 100.0)
score = price_score * 0.6 + success_rate * 0.3 + quality_score * 0.1
Strategy: Balanced
perf_score = performance_score(candidate)
cost_score = cost_score(candidate)
score = perf_score * perf_weight + cost_score * cost_weight
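Read as code, the three formulas might look like the sketch below. The latency_score normalization is not defined in this document, so the 10-second ceiling used here is a placeholder assumption:

struct Candidate {
    success_rate: f64,          // 0.0-1.0
    avg_latency_ms: u64,
    quality_score: f64,         // 0.0-1.0
    avg_price_per_million: f64,
    priority_bonus: f64,
}

// Assumed normalization: 0 ms → 1.0, anything ≥ 10,000 ms → 0.0.
fn latency_score(avg_latency_ms: u64) -> f64 {
    (1.0 - avg_latency_ms as f64 / 10_000.0).clamp(0.0, 1.0)
}

fn performance_score(c: &Candidate) -> f64 {
    c.success_rate * 0.4 + latency_score(c.avg_latency_ms) * 0.3 + c.quality_score * 0.1 + c.priority_bonus
}

fn cost_score(c: &Candidate) -> f64 {
    let price_score = 1.0 - (c.avg_price_per_million / 100.0);
    price_score * 0.6 + c.success_rate * 0.3 + c.quality_score * 0.1
}

fn balanced_score(c: &Candidate, perf_weight: f64, cost_weight: f64) -> f64 {
    performance_score(c) * perf_weight + cost_score(c) * cost_weight
}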
Phase 5: Provider Selection
1. Filter by preferences:
   - Remove avoided providers
   - Check max_price threshold
   - Check min_success_rate
   - Check max_latency_ms
2. Boost preferred providers:
   - Apply a 50% score boost to preferred providers
3. Weighted random selection (see the sketch after this list):
   - Sort by score (descending)
   - Take the top 3 candidates
   - Pick one at random, weighted by score (prevents provider starvation)
4. Prepare fallback chain:
   - Remaining candidates become fallback providers (up to 3)
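A sketch of the top-3 weighted random pick from step 3, written without an RNG dependency: the caller supplies pick as a uniform random number in [0, 1). This illustrates the technique, not the actual implementation:

struct Scored { provider_id: i64, score: f64 }

// Sort by score, keep the top 3, then sample one proportionally to its score.
fn weighted_pick(mut candidates: Vec<Scored>, pick: f64) -> Option<Scored> {
    // Scores are finite, so partial_cmp can be unwrapped safely here.
    candidates.sort_by(|a, b| b.score.partial_cmp(&a.score).unwrap());
    candidates.truncate(3);
    if candidates.is_empty() {
        return None;
    }
    let total: f64 = candidates.iter().map(|c| c.score).sum();
    if total <= 0.0 {
        return Some(candidates.swap_remove(0)); // degenerate case: take the best
    }
    let mut threshold = pick * total;
    let mut chosen = candidates.len() - 1;
    for (i, c) in candidates.iter().enumerate() {
        threshold -= c.score;
        if threshold < 0.0 {
            chosen = i;
            break;
        }
    }
    Some(candidates.swap_remove(chosen))
}

Weighting by score keeps the highest-scoring provider most likely while still sending some traffic to the second and third candidates, which is what prevents provider starvation.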
Phase 6: Circuit Breaker Check
┌──────────────────────────────────────────────┐
│            Circuit State Machine             │
├──────────────────────────────────────────────┤
│                                              │
│   CLOSED ────────────────────► OPEN          │
│     ▲       (5 failures)         │           │
│     │                            │           │
│     │                      (60s timeout)     │
│     │                            │           │
│     │                            ▼           │
│     └──────── HALF-OPEN ◄────────┘           │
│             (3 successes)                    │
│                                              │
└──────────────────────────────────────────────┘
States:
• CLOSED: Normal operation (all requests pass)
• OPEN: Block all requests, try fallbacks
• HALF-OPEN: Allow limited test requests
Circuit Breaker Decision:
- If primary provider circuit is OPEN → Try fallback providers
- If all circuits OPEN → Fallback to legacy routing
- If circuit is CLOSED or HALF-OPEN → Proceed
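A compact sketch of the state machine above, using the thresholds from the diagram (5 consecutive failures, 60-second recovery timeout, 3 consecutive successes). The real service presumably persists the equivalent state in provider_model_metrics (circuit_state, consecutive_failures/successes); this standalone version only shows the transitions:

use std::time::{Duration, Instant};

#[derive(Debug, PartialEq)]
enum CircuitState { Closed, Open, HalfOpen }

struct CircuitBreaker {
    state: CircuitState,
    consecutive_failures: u32,
    consecutive_successes: u32,
    opened_at: Option<Instant>,
    failure_threshold: u32,
    recovery_timeout: Duration,
    success_threshold: u32,
}

impl CircuitBreaker {
    fn new() -> Self {
        CircuitBreaker {
            state: CircuitState::Closed,
            consecutive_failures: 0,
            consecutive_successes: 0,
            opened_at: None,
            failure_threshold: 5,                       // 5 failures → OPEN
            recovery_timeout: Duration::from_secs(60),  // 60s → HALF-OPEN
            success_threshold: 3,                       // 3 successes → CLOSED
        }
    }

    fn should_allow_request(&mut self) -> bool {
        match self.state {
            CircuitState::Closed => true,
            CircuitState::HalfOpen => true, // limited test traffic in practice
            CircuitState::Open => {
                // After the recovery timeout, move to HALF-OPEN and allow a probe.
                if self.opened_at.map_or(false, |t| t.elapsed() >= self.recovery_timeout) {
                    self.state = CircuitState::HalfOpen;
                    true
                } else {
                    false
                }
            }
        }
    }

    fn record_result(&mut self, success: bool) {
        if success {
            self.consecutive_failures = 0;
            self.consecutive_successes += 1;
            if self.state == CircuitState::HalfOpen
                && self.consecutive_successes >= self.success_threshold
            {
                self.state = CircuitState::Closed;
            }
        } else {
            self.consecutive_successes = 0;
            self.consecutive_failures += 1;
            if self.consecutive_failures >= self.failure_threshold {
                self.state = CircuitState::Open;
                self.opened_at = Some(Instant::now());
            }
        }
    }
}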
Phase 7: Request Execution
Request sent to selected channel:
Channel {
id: 23,
provider_id: 5,
base_url: "https://api.provider-a.com/v1",
key: "encrypted_key",
status: 1 (active)
}
Phase 8: Metrics Recording
After request completion:
ProviderMetricsService::record_request(
provider_id: 5,
model_id: "deepseek-chat",
channel_id: 23,
latency_ms: 450,
success: true,
prompt_tokens: 1500,
completion_tokens: 300,
)
Metrics Update Process:
- Record in memory buffer (fast)
- Periodic aggregation (every 60 seconds)
- Database update via the update_provider_metrics() SQL function
- Quality score recalculation
- Circuit breaker state evaluation
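A rough sketch of the buffer-then-flush pattern described above, assuming a hypothetical in-memory buffer drained by a background task roughly every 60 seconds; the actual ProviderMetricsService internals are not shown in this document:

use std::sync::Mutex;

// One buffered observation, matching the record_request() arguments above.
struct RequestRecord {
    provider_id: i64,
    model_id: String,
    channel_id: i64,
    latency_ms: u64,
    success: bool,
    prompt_tokens: u32,
    completion_tokens: u32,
}

#[derive(Default)]
struct MetricsBuffer {
    records: Mutex<Vec<RequestRecord>>,
}

impl MetricsBuffer {
    // Called on the request path: just push into memory (microseconds).
    fn record(&self, record: RequestRecord) {
        self.records.lock().unwrap().push(record);
    }

    // Called by a background task roughly every 60s: drain the buffer so the
    // caller can write one aggregated update per provider/model/channel.
    fn flush(&self) -> Vec<RequestRecord> {
        std::mem::take(&mut *self.records.lock().unwrap())
    }
}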
Routing Strategies Explained
1. Performance Strategy
Goal: Maximize speed and reliability
Scoring Formula:
score = success_rate × 0.4 + latency_score × 0.3 + quality_score × 0.1 + priority_bonus
Best for:
- Real-time applications
- Latency-sensitive workloads
- Production critical paths
Example:
Provider A: 98% success, 450ms → Score: 0.89
Provider B: 95% success, 800ms → Score: 0.78
Winner: Provider A
2. Cost Strategy
Goal: Minimize costs
Scoring Formula:
price_score = 1.0 - (avg_price / 100.0)
score = price_score × 0.6 + success_rate × 0.3 + quality_score × 0.1
Best for:
- Batch processing
- Development/testing
- Cost-conscious applications
Example:
Provider A: $5/M → Score: 0.92
Provider B: $12/M → Score: 0.78
Winner: Provider A (cheaper)
3. Balanced Strategy (Default)
Goal: Optimize all factors
Scoring Formula:
Combined = performance_score × perf_weight + cost_score × cost_weight
Best for:
- General purpose applications
- Mixed workloads
- Most production scenarios
4. Round-Robin Strategy
Goal: Equal distribution
Behavior:
- All providers get equal score
- Rotate through providers sequentially
- No performance consideration
Best for:
- Load distribution testing
- Provider evaluation
- Ensuring provider diversity
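Round-robin needs no scoring at all; a minimal sketch using an atomic counter (how rotation is actually tracked is an assumption):

use std::sync::atomic::{AtomicUsize, Ordering};

static NEXT_PROVIDER: AtomicUsize = AtomicUsize::new(0);

// Rotate through providers sequentially, ignoring performance and price.
fn round_robin_pick(provider_ids: &[i64]) -> Option<i64> {
    if provider_ids.is_empty() {
        return None;
    }
    let index = NEXT_PROVIDER.fetch_add(1, Ordering::Relaxed) % provider_ids.len();
    Some(provider_ids[index])
}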
Key Advantages
1. Intelligent Provider Selection
Real-time metrics-based routing
- Automatically routes to best-performing providers
- Adapts to changing provider performance
- No manual intervention required
Multi-dimensional scoring
- Considers latency, success rate, cost, and quality
- Configurable weights per model
- Strategy-based optimization
2. High Availability & Fault Tolerance
Circuit breaker pattern
Failed Provider → Circuit Opens → Automatic Fallback
↓
Health Recovery → Circuit Half-Opens → Test Requests
↓
Success → Circuit Closes → Full Traffic Restoration
Automatic fallback chains
- Up to 3 fallback providers per request
- Ordered by score
- Seamless failover on provider failure
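The fallback chain amounts to an ordered loop: try the selected provider, then each fallback in score order, up to three. A sketch, with execute_on standing in for the real channel call:

struct Provider { id: i64, channel_id: i64 }

// Placeholder for sending the request to a provider's channel.
fn execute_on(provider: &Provider) -> Result<String, String> {
    let _ = provider;
    Err("provider unavailable".to_string())
}

// Try the primary provider, then up to 3 fallbacks, in score order.
fn execute_with_fallbacks(primary: Provider, fallbacks: Vec<Provider>) -> Result<String, String> {
    let mut last_err = String::new();
    for provider in std::iter::once(primary).chain(fallbacks.into_iter().take(3)) {
        match execute_on(&provider) {
            Ok(response) => return Ok(response),
            Err(e) => last_err = e,
        }
    }
    Err(last_err)
}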
No single point of failure
- Multiple providers for same model
- Instant failover without retries
- Graceful degradation
3. Cost Optimization
Price-aware routing
- Cost strategy prioritizes cheaper providers
- Price thresholds per user
- Balance cost vs performance
Provider competition
- Multiple providers compete on price
- Market-driven pricing
- Automatic selection of best value
4. Performance Tracking
Comprehensive metrics
Latency: avg, p50, p95, p99, min, max
Success Rate: overall, last_hour, last_24h
Quality Score: calculated from success + latency + experience
Token Throughput: tokens/second tracking
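The exact quality-score formula is not documented here; as one hedged illustration, it could blend success rate, a latency factor, and a request-volume ("experience") term into the 0.0-1.0 range:

// Illustrative only: the weights and the experience term are assumptions.
fn quality_score(success_rate: f64, avg_latency_ms: u64, total_requests: u64) -> f64 {
    let latency_factor = (1.0 - avg_latency_ms as f64 / 10_000.0).clamp(0.0, 1.0);
    // More historical traffic → more confidence, saturating around 1,000 requests.
    let experience = (total_requests as f64 / 1_000.0).min(1.0);
    (success_rate * 0.6 + latency_factor * 0.3 + experience * 0.1).clamp(0.0, 1.0)
}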
Historical data
- All-time cumulative metrics
- Time-windowed metrics (hourly, daily)
- Trend analysis capability
5. User Empowerment
Customizable preferences
UserPreferences {
strategy: "performance", // Choose optimization goal
prefer_providers: [1, 5], // Favorite providers
avoid_providers: [3], // Blacklist problematic ones
max_price: 15.0, // Budget control
min_success_rate: 0.95, // Quality threshold
max_latency_ms: 5000, // Latency requirement
}
Per-request overrides
- Can override preferences per API call
- Flexible for different use cases
- Maintains user defaults
6. Provider Ecosystem Benefits
Fair provider exposure
- Weighted random selection prevents dominance
- Quality providers get more traffic
- New providers can compete
Transparent performance
- Real metrics visible to admin
- Quality score based on actual performance
- Accountability for providers
7. Operational Excellence
Complete audit trail
routing_decision_logs:
- Every routing decision logged
- Full candidate list with scores
- Debugging and analytics
- 7-day retention (configurable)
Admin control
• Manual circuit breaker control
• Per-model routing configuration
• Provider approval workflow
• Analytics dashboard
8. Scalability
Efficient data structures
- In-memory metrics buffering
- Periodic batch updates to database
- Minimal per-request overhead
Distributed-ready
- Stateless routing decisions
- Database-backed state
- Redis-compatible circuit breakers
9. Developer Experience
Simple API integration
// Automatic routing - just pass model ID
let channel = ChannelService::get_routed_channel(
&pool, "default", "deepseek-chat", user_id, None
).await?;
Simulation endpoint
POST /api/routing/simulate
{
"model_id": "deepseek-chat",
"preferences": { "strategy": "cost" }
}
10. Business Intelligence
Rich analytics
• Provider selection rates
• Strategy distribution
• Model usage patterns
• Cost analysis
• Performance trends
Performance Characteristics
Routing Decision Speed
- Average: < 10ms
- P99: < 50ms
- Includes: DB queries + scoring + selection
Metrics Update
- Memory buffer: ~1μs per record
- DB flush: Every 60s (async, non-blocking)
- Impact on request: Zero (async recording)
Database Queries
- Provider lookup: Single JOIN query with indexes
- Config loading: Cached or single query
- Metrics aggregation: Periodic batch operation
Security & Isolation
Data Isolation
- Provider models completely separate from legacy channels
- No mixing of provider_models and abilities tables
- Clear separation of routing logic
Access Control
- Providers can only manage their own models
- Admin approval required for model visibility
- User-level routing preferences isolated
API Key Management
- Encrypted channel keys
- Provider-owned API keys
- Rotation support via the provider_api_keys table
Future Enhancements
Planned Improvements
1. ML-based routing
   - Predict provider performance
   - Learn from user patterns
   - Adaptive weight tuning
2. Geographic routing
   - Provider location awareness
   - Latency-based geo selection
   - Regional failover
3. Advanced analytics
   - Provider comparison dashboards
   - Cost forecasting
   - Performance predictions
4. Enhanced fallback strategies
   - Intelligent retry with backoff
   - Cross-model fallbacks
   - Dynamic strategy switching
Configuration Examples
Example 1: High-Performance Setup
INSERT INTO model_routing_config VALUES (
'deepseek-chat',
0.35, -- latency_weight (high)
0.45, -- success_rate_weight (high)
0.10, -- price_weight (low)
0.10, -- provider_priority_weight
'performance'
);
Example 2: Cost-Optimized Setup
INSERT INTO model_routing_config VALUES (
'deepseek-chat-v3.1',
0.15, -- latency_weight (low)
0.35, -- success_rate_weight (medium)
0.40, -- price_weight (high)
0.10, -- provider_priority_weight
'cost'
);
Example 3: User Cost Control
UserRoutingPreferences {
default_strategy: "cost",
max_price_per_million_tokens: 10.0, // Max $10/M
min_success_rate: 0.90, // Must maintain 90%+
preferred_providers: [1, 5, 8], // Try these first
}