Model Routing System
System Architecture Overview
The Provider Model Routing System is an intelligent, multi-layered routing infrastructure that enables multiple providers to offer the same AI models while automatically selecting the optimal provider based on real-time metrics, user preferences, and system health.
Core Components Structure
┌─────────────────────────────────────────────────────────────────┐
│ USER REQUEST (Model ID) │
└────────────────────────┬────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ RELAY HANDLER │
│ • Request validation │
│ • Channel selection with routing │
│ • Circuit breaker checking │
└────────────────────────┬────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ CHANNEL SERVICE │
│ get_routed_channel() │
│ ├── Try Provider Model Routing (STEP 1) │
│ └── Fallback to Legacy Channel Routing (STEP 2) │
└────────────────────────┬────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ MODEL ROUTING SERVICE │
│ │
│ route_request() │
│ ├─► 1. get_model_providers() [Provider Discovery] │
│ ├─► 2. load_user_preferences() [User Prefs Loading] │
│ ├─► 3. load_routing_config() [Model Config Loading] │
│ ├─► 4. score_providers() [Intelligent Scoring] │
│ └─► 5. select_provider() [Final Selection] │
└────────────────────────┬────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ ROUTING DECISION │
│ • Selected Provider + Channel │
│ • Fallback Providers (ordered) │
│ • Routing Reason & Score │
│ • Strategy Used │
└────────────────────────┬────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ CIRCUIT BREAKER CHECK │
│ should_allow_request() │
│ • Closed → Allow │
│ • Open → Try Fallback │
│ • Half-Open → Limited Allow │
└────────────────────────┬────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ EXECUTE REQUEST ON SELECTED CHANNEL │
└────────────────────────┬────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────────┐
│ METRICS RECORDING │
│ record_request() │
│ • Latency tracking │
│ • Success/failure counting │
│ • Token usage │
│ • Quality score calculation │
│ • Circuit breaker state update │
└──────────────────────────────────────────────────────────────────┘
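Conceptually, the CHANNEL SERVICE step reduces to "try provider model routing, otherwise fall back to legacy routing". A minimal sketch of that decision, ignoring the database pool, async plumbing, and error handling of the real get_routed_channel() shown later under Developer Experience; the helper names and simplified signature here are assumptions:

// Sketch only: STEP 1 tries provider model routing, STEP 2 falls back
// to legacy channel routing. Helper names and types are assumptions.
struct Channel { id: i64 }

fn try_provider_routing(model_id: &str, user_id: i64) -> Option<Channel> {
    // STEP 1: would invoke the Model Routing Service (route_request()).
    let _ = (model_id, user_id);
    None
}

fn legacy_channel_lookup(model_id: &str) -> Option<Channel> {
    // STEP 2: would pick a channel via the legacy routing tables.
    let _ = model_id;
    Some(Channel { id: 23 })
}

fn get_routed_channel(model_id: &str, user_id: i64) -> Option<Channel> {
    try_provider_routing(model_id, user_id).or_else(|| legacy_channel_lookup(model_id))
}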
Database Schema Architecture
1. provider_models (Source of Truth)
Provider-submitted model definitions
├── Model Info: model_id, model_name, description
├── Provider: provider_id, provider_name, channel_id
├── Pricing: pricing_prompt, pricing_completion, pricing_image
├── Specs: context_length, modality, supported_parameters
└── Status: status (0=pending, 1=approved, 2=rejected)
2. provider_model_metrics (Real-time Performance)
Live performance tracking per provider-model-channel
├── Cumulative: total_requests, successful_requests, failed_requests
├── Latency: avg/p50/p95/p99/min/max_latency_ms
├── Time Windows: last_hour, last_24h metrics
├── Circuit Breaker: circuit_state, consecutive_failures/successes
├── Quality: quality_score (0.0-1.0)
└── Token Throughput: total tokens, avg_tokens_per_second
3. model_routing_config (Per-Model Configuration)
Admin-configurable routing rules per model
├── Weights: latency_weight, success_rate_weight, price_weight
├── Strategy: default_strategy (performance/cost/balanced/round_robin)
├── Fallback: enable_auto_fallback, max_fallback_attempts
└── Circuit: failure_threshold, recovery_timeout_seconds
4. user_routing_preferences (User Preferences)
Per-user routing customization
├── Strategy: default_strategy
├── Providers: preferred_providers[], blocked_providers[]
├── Limits: max_price_per_million_tokens, min_success_rate, max_latency_ms
└── Requirements: require_streaming, require_function_calling
5. routing_decision_logs (Audit Trail)
Complete history of routing decisions
├── Decision: selected_provider_id, routing_strategy, routing_reason
├── Candidates: candidates_count, candidates_json
├── Fallback: fallback_providers[], is_fallback_request
└── Performance: routing_duration_us
Detailed Process Flow
Phase 1: Request Initiation
User/API Request
│
├─► Model ID: "deepseek-chat"
├─► User ID: 12345
└─► Optional: RoutingPreferences { strategy: "performance" }
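As a rough illustration only, the Phase 1 inputs could be assembled into a request value like the following; the struct shapes are assumptions based on the names used in this document:

// Hypothetical request shape for Phase 1 (names mirror this document, types assumed).
struct RoutingPreferences { strategy: String }

struct RoutingRequest {
    model_id: String,
    user_id: i64,
    preferences: Option<RoutingPreferences>,
}

fn example_request() -> RoutingRequest {
    RoutingRequest {
        model_id: "deepseek-chat".to_string(),
        user_id: 12345,
        preferences: Some(RoutingPreferences { strategy: "performance".to_string() }),
    }
}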
Phase 2: Provider Discovery
SELECT provider_id, channel_id, provider_name, pricing, metrics, quality_score
FROM provider_models pm
LEFT JOIN provider_model_metrics pmm ON (...)
LEFT JOIN channels c ON pm.channel_id = c.id
WHERE pm.model_id = 'deepseek-chat' AND pm.status = 1
ORDER BY quality_score DESC
Output: List of ProviderCandidate structs
ProviderCandidate {
provider_id: 5,
channel_id: 23,
provider_name: "Provider A",
price_per_million_prompt: 2.50,
price_per_million_completion: 10.00,
success_rate: 0.98,
avg_latency_ms: 450,
quality_score: 0.92,
circuit_state: Closed,
}
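The candidate literal above implies roughly the following type. The field types here are inferred from the example values, not taken from the actual source:

// Inferred shape of a provider candidate (field types are assumptions).
#[derive(Debug, Clone, PartialEq)]
enum CircuitState { Closed, Open, HalfOpen }

#[derive(Debug, Clone)]
struct ProviderCandidate {
    provider_id: i64,
    channel_id: i64,
    provider_name: String,
    price_per_million_prompt: f64,
    price_per_million_completion: f64,
    success_rate: f64,    // 0.0-1.0
    avg_latency_ms: u64,
    quality_score: f64,   // 0.0-1.0
    circuit_state: CircuitState,
}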
Phase 3: Configuration Loading
Model Config:
RoutingConfig {
canonical_model_id: "deepseek-chat",
latency_weight: 0.3,
success_rate_weight: 0.4,
price_weight: 0.2,
provider_priority_weight: 0.1,
default_strategy: "balanced",
}
User Preferences (merged with request prefs):
RoutingPreferences {
strategy: Performance,
prefer_providers: [5, 8],
avoid_providers: [3],
max_price: Some(15.0),
min_success_rate: Some(0.95),
}
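A minimal sketch of how per-request preferences might be merged over stored user preferences (request values win where present); only the field names come from the structs above, the merge rules themselves are an assumption:

#[derive(Clone, Default)]
struct RoutingPreferences {
    strategy: Option<String>,
    prefer_providers: Vec<i64>,
    avoid_providers: Vec<i64>,
    max_price: Option<f64>,
    min_success_rate: Option<f64>,
}

// Request-level preferences override user defaults where present.
fn merge_preferences(user: RoutingPreferences, request: RoutingPreferences) -> RoutingPreferences {
    RoutingPreferences {
        strategy: request.strategy.or(user.strategy),
        prefer_providers: if request.prefer_providers.is_empty() {
            user.prefer_providers
        } else {
            request.prefer_providers
        },
        avoid_providers: if request.avoid_providers.is_empty() {
            user.avoid_providers
        } else {
            request.avoid_providers
        },
        max_price: request.max_price.or(user.max_price),
        min_success_rate: request.min_success_rate.or(user.min_success_rate),
    }
}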
Phase 4: Intelligent Scoring
Strategy: Performance
score = success_rate * 0.4 + latency_score * 0.3 + quality_score * 0.1 + priority_bonus
Strategy: Cost
price_score = 1.0 - (avg_price / 100.0)
score = price_score * 0.6 + success_rate * 0.3 + quality_score * 0.1
Strategy: Balanced
perf_score = performance_score(candidate)
cost_score = cost_score(candidate)
score = perf_score * perf_weight + cost_score * cost_weight
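Read as code, the three formulas might look like the sketch below. The latency_score normalization is not defined in this document, so the 10-second ceiling used here is a placeholder assumption:

struct Candidate {
    success_rate: f64,          // 0.0-1.0
    avg_latency_ms: u64,
    quality_score: f64,         // 0.0-1.0
    avg_price_per_million: f64,
    priority_bonus: f64,
}

// Assumed normalization: 0 ms → 1.0, anything ≥ 10,000 ms → 0.0.
fn latency_score(avg_latency_ms: u64) -> f64 {
    (1.0 - avg_latency_ms as f64 / 10_000.0).clamp(0.0, 1.0)
}

fn performance_score(c: &Candidate) -> f64 {
    c.success_rate * 0.4 + latency_score(c.avg_latency_ms) * 0.3 + c.quality_score * 0.1 + c.priority_bonus
}

fn cost_score(c: &Candidate) -> f64 {
    let price_score = 1.0 - (c.avg_price_per_million / 100.0);
    price_score * 0.6 + c.success_rate * 0.3 + c.quality_score * 0.1
}

fn balanced_score(c: &Candidate, perf_weight: f64, cost_weight: f64) -> f64 {
    performance_score(c) * perf_weight + cost_score(c) * cost_weight
}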
Phase 5: Provider Selection
1. Filter by preferences:
   - Remove avoided providers
   - Check max_price threshold
   - Check min_success_rate
   - Check max_latency_ms
2. Boost preferred providers:
   - Apply a 50% score boost to preferred providers
3. Weighted random selection (see the sketch after this list):
   - Sort by score (descending)
   - Take the top 3 candidates
   - Pick one at random, weighted by score (prevents provider starvation)
4. Prepare fallback chain:
   - Remaining candidates become fallback providers (up to 3)
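A sketch of the top-3 weighted random pick from step 3, written without an RNG dependency: the caller supplies pick as a uniform random number in [0, 1). This illustrates the technique, not the actual implementation:

struct Scored { provider_id: i64, score: f64 }

// Sort by score, keep the top 3, then sample one proportionally to its score.
fn weighted_pick(mut candidates: Vec<Scored>, pick: f64) -> Option<Scored> {
    // Scores are finite, so partial_cmp can be unwrapped safely here.
    candidates.sort_by(|a, b| b.score.partial_cmp(&a.score).unwrap());
    candidates.truncate(3);
    if candidates.is_empty() {
        return None;
    }
    let total: f64 = candidates.iter().map(|c| c.score).sum();
    if total <= 0.0 {
        return Some(candidates.swap_remove(0)); // degenerate case: take the best
    }
    let mut threshold = pick * total;
    let mut chosen = candidates.len() - 1;
    for (i, c) in candidates.iter().enumerate() {
        threshold -= c.score;
        if threshold < 0.0 {
            chosen = i;
            break;
        }
    }
    Some(candidates.swap_remove(chosen))
}

Weighting by score keeps the highest-scoring provider most likely while still sending some traffic to the second and third candidates, which is what prevents provider starvation.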
Phase 6: Circuit Breaker Check
┌──────────────────────────────────────────────┐
│            Circuit State Machine             │
├──────────────────────────────────────────────┤
│                                              │
│   CLOSED ────────────────────► OPEN          │
│     ▲       (5 failures)         │           │
│     │                            │           │
│     │                      (60s timeout)     │
│     │                            │           │
│     │                            ▼           │
│     └──────── HALF-OPEN ◄────────┘           │
│             (3 successes)                    │
│                                              │
└──────────────────────────────────────────────┘
States:
• CLOSED: Normal operation (all requests pass)
• OPEN: Block all requests, try fallbacks
• HALF-OPEN: Allow limited test requests
Circuit Breaker Decision:
- If primary provider circuit is OPEN → Try fallback providers
- If all circuits OPEN → Fallback to legacy routing
- If circuit is CLOSED or HALF-OPEN → Proceed
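A compact sketch of the state machine above, using the thresholds from the diagram (5 consecutive failures, 60-second recovery timeout, 3 consecutive successes). The real service presumably persists the equivalent state in provider_model_metrics (circuit_state, consecutive_failures/successes); this standalone version only shows the transitions:

use std::time::{Duration, Instant};

#[derive(Debug, PartialEq)]
enum CircuitState { Closed, Open, HalfOpen }

struct CircuitBreaker {
    state: CircuitState,
    consecutive_failures: u32,
    consecutive_successes: u32,
    opened_at: Option<Instant>,
    failure_threshold: u32,
    recovery_timeout: Duration,
    success_threshold: u32,
}

impl CircuitBreaker {
    fn new() -> Self {
        CircuitBreaker {
            state: CircuitState::Closed,
            consecutive_failures: 0,
            consecutive_successes: 0,
            opened_at: None,
            failure_threshold: 5,                       // 5 failures → OPEN
            recovery_timeout: Duration::from_secs(60),  // 60s → HALF-OPEN
            success_threshold: 3,                       // 3 successes → CLOSED
        }
    }

    fn should_allow_request(&mut self) -> bool {
        match self.state {
            CircuitState::Closed => true,
            CircuitState::HalfOpen => true, // limited test traffic in practice
            CircuitState::Open => {
                // After the recovery timeout, move to HALF-OPEN and allow a probe.
                if self.opened_at.map_or(false, |t| t.elapsed() >= self.recovery_timeout) {
                    self.state = CircuitState::HalfOpen;
                    true
                } else {
                    false
                }
            }
        }
    }

    fn record_result(&mut self, success: bool) {
        if success {
            self.consecutive_failures = 0;
            self.consecutive_successes += 1;
            if self.state == CircuitState::HalfOpen
                && self.consecutive_successes >= self.success_threshold
            {
                self.state = CircuitState::Closed;
            }
        } else {
            self.consecutive_successes = 0;
            self.consecutive_failures += 1;
            if self.consecutive_failures >= self.failure_threshold {
                self.state = CircuitState::Open;
                self.opened_at = Some(Instant::now());
            }
        }
    }
}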
Phase 7: Request Execution
Request sent to selected channel:
Channel {
id: 23,
provider_id: 5,
base_url: "https://api.provider-a.com/v1",
key: "encrypted_key",
status: 1 (active)
}
Phase 8: Metrics Recording
After request completion:
ProviderMetricsService::record_request(
provider_id: 5,
model_id: "deepseek-chat",
channel_id: 23,
latency_ms: 450,
success: true,
prompt_tokens: 1500,
completion_tokens: 300,
)
Metrics Update Process:
- Record in memory buffer (fast)
- Periodic aggregation (every 60 seconds)
- Database update via the update_provider_metrics() SQL function
- Quality score recalculation
- Circuit breaker state evaluation
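A rough sketch of the buffer-then-flush pattern described above, assuming a hypothetical in-memory buffer drained by a background task roughly every 60 seconds; the actual ProviderMetricsService internals are not shown in this document:

use std::sync::Mutex;

// One buffered observation, matching the record_request() arguments above.
struct RequestRecord {
    provider_id: i64,
    model_id: String,
    channel_id: i64,
    latency_ms: u64,
    success: bool,
    prompt_tokens: u32,
    completion_tokens: u32,
}

#[derive(Default)]
struct MetricsBuffer {
    records: Mutex<Vec<RequestRecord>>,
}

impl MetricsBuffer {
    // Called on the request path: just push into memory (microseconds).
    fn record(&self, record: RequestRecord) {
        self.records.lock().unwrap().push(record);
    }

    // Called by a background task roughly every 60s: drain the buffer so the
    // caller can write one aggregated update per provider/model/channel.
    fn flush(&self) -> Vec<RequestRecord> {
        std::mem::take(&mut *self.records.lock().unwrap())
    }
}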
Routing Strategies Explained
1. Performance Strategy
Goal: Maximize speed and reliability
Scoring Formula:
score = success_rate × 0.4 + latency_score × 0.3 + quality_score × 0.1 + priority_bonus
Best for:
- Real-time applications
- Latency-sensitive workloads
- Production critical paths
Example:
Provider A: 98% success, 450ms → Score: 0.89
Provider B: 95% success, 800ms → Score: 0.78
Winner: Provider A
2. Cost Strategy
Goal: Minimize costs
Scoring Formula:
price_score = 1.0 - (avg_price / 100.0)
score = price_score × 0.6 + success_rate × 0.3 + quality_score × 0.1
Best for:
- Batch processing
- Development/testing
- Cost-conscious applications
Example:
Provider A: $5/M → Score: 0.92
Provider B: $12/M → Score: 0.78
Winner: Provider A (cheaper)
3. Balanced Strategy (Default)
Goal: Optimize all factors
Scoring Formula:
Combined = performance_score × perf_weight + cost_score × cost_weight
Best for:
- General purpose applications
- Mixed workloads
- Most production scenarios
4. Round-Robin Strategy
Goal: Equal distribution
Behavior:
- All providers get equal score
- Rotate through providers sequentially
- No performance consideration
Best for:
- Load distribution testing
- Provider evaluation
- Ensuring provider diversity
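Round-robin needs no scoring at all; a minimal sketch using an atomic counter (how rotation is actually tracked is an assumption):

use std::sync::atomic::{AtomicUsize, Ordering};

static NEXT_PROVIDER: AtomicUsize = AtomicUsize::new(0);

// Rotate through providers sequentially, ignoring performance and price.
fn round_robin_pick(provider_ids: &[i64]) -> Option<i64> {
    if provider_ids.is_empty() {
        return None;
    }
    let index = NEXT_PROVIDER.fetch_add(1, Ordering::Relaxed) % provider_ids.len();
    Some(provider_ids[index])
}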
Key Advantages
1. Intelligent Provider Selection
Real-time metrics-based routing
- Automatically routes to best-performing providers
- Adapts to changing provider performance
- No manual intervention required
Multi-dimensional scoring
- Considers latency, success rate, cost, and quality
- Configurable weights per model
- Strategy-based optimization
2. High Availability & Fault Tolerance
Circuit breaker pattern
Failed Provider → Circuit Opens → Automatic Fallback
↓
Health Recovery → Circuit Half-Opens → Test Requests
↓
Success → Circuit Closes → Full Traffic Restoration
Automatic fallback chains
- Up to 3 fallback providers per request
- Ordered by score
- Seamless failover on provider failure
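The fallback chain amounts to an ordered loop: try the selected provider, then each fallback in score order, up to three. A sketch, with execute_on standing in for the real channel call:

struct Provider { id: i64, channel_id: i64 }

// Placeholder for sending the request to a provider's channel.
fn execute_on(provider: &Provider) -> Result<String, String> {
    let _ = provider;
    Err("provider unavailable".to_string())
}

// Try the primary provider, then up to 3 fallbacks, in score order.
fn execute_with_fallbacks(primary: Provider, fallbacks: Vec<Provider>) -> Result<String, String> {
    let mut last_err = String::new();
    for provider in std::iter::once(primary).chain(fallbacks.into_iter().take(3)) {
        match execute_on(&provider) {
            Ok(response) => return Ok(response),
            Err(e) => last_err = e,
        }
    }
    Err(last_err)
}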
No single point of failure
- Multiple providers for same model
- Instant failover without retries
- Graceful degradation
3. Cost Optimization
Price-aware routing
- Cost strategy prioritizes cheaper providers
- Price thresholds per user
- Balance cost vs performance
Provider competition
- Multiple providers compete on price
- Market-driven pricing
- Automatic selection of best value
4. Performance Tracking
Comprehensive metrics
Latency: avg, p50, p95, p99, min, max
Success Rate: overall, last_hour, last_24h
Quality Score: calculated from success + latency + experience
Token Throughput: tokens/second tracking
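The exact quality-score formula is not documented here; as one hedged illustration, it could blend success rate, a latency factor, and a request-volume ("experience") term into the 0.0-1.0 range:

// Illustrative only: the weights and the experience term are assumptions.
fn quality_score(success_rate: f64, avg_latency_ms: u64, total_requests: u64) -> f64 {
    let latency_factor = (1.0 - avg_latency_ms as f64 / 10_000.0).clamp(0.0, 1.0);
    // More historical traffic → more confidence, saturating around 1,000 requests.
    let experience = (total_requests as f64 / 1_000.0).min(1.0);
    (success_rate * 0.6 + latency_factor * 0.3 + experience * 0.1).clamp(0.0, 1.0)
}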
Historical data
- All-time cumulative metrics
- Time-windowed metrics (hourly, daily)
- Trend analysis capability
5. User Empowerment
Customizable preferences
UserPreferences {
strategy: "performance", // Choose optimization goal
prefer_providers: [1, 5], // Favorite providers
avoid_providers: [3], // Blacklist problematic ones
max_price: 15.0, // Budget control
min_success_rate: 0.95, // Quality threshold
max_latency_ms: 5000, // Latency requirement
}
Per-request overrides
- Can override preferences per API call
- Flexible for different use cases
- Maintains user defaults
6. Provider Ecosystem Benefits
Fair provider exposure
- Weighted random selection prevents dominance
- Quality providers get more traffic
- New providers can compete
Transparent performance
- Real metrics visible to admin
- Quality score based on actual performance
- Accountability for providers
7. Operational Excellence
Complete audit trail
routing_decision_logs:
- Every routing decision logged
- Full candidate list with scores
- Debugging and analytics
- 7-day retention (configurable)
Admin control
• Manual circuit breaker control
• Per-model routing configuration
• Provider approval workflow
• Analytics dashboard
8. Scalability
Efficient data structures
- In-memory metrics buffering
- Periodic batch updates to database
- Minimal per-request overhead
Distributed-ready
- Stateless routing decisions
- Database-backed state
- Redis-compatible circuit breakers
9. Developer Experience
Simple API integration
// Automatic routing - just pass model ID
let channel = ChannelService::get_routed_channel(
&pool, "default", "deepseek-chat", user_id, None
).await?;
Simulation endpoint
POST /api/routing/simulate
{
"model_id": "deepseek-chat",
"preferences": { "strategy": "cost" }
}
10. Business Intelligence
Rich analytics
• Provider selection rates
• Strategy distribution
• Model usage patterns
• Cost analysis
• Performance trends
Performance Characteristics
Routing Decision Speed
- Average: < 10ms
- P99: < 50ms
- Includes: DB queries + scoring + selection
Metrics Update
- Memory buffer: ~1μs per record
- DB flush: Every 60s (async, non-blocking)
- Impact on request: Zero (async recording)
Database Queries
- Provider lookup: Single JOIN query with indexes
- Config loading: Cached or single query
- Metrics aggregation: Periodic batch operation
Security & Isolation
Data Isolation
- Provider models completely separate from legacy channels
- No mixing of provider_models and abilities tables
- Clear separation of routing logic
Access Control
- Providers can only manage their own models
- Admin approval required for model visibility
- User-level routing preferences isolated
API Key Management
- Encrypted channel keys
- Provider-owned API keys
- Rotation support via the provider_api_keys table
Future Enhancements
Planned Improvements
1. ML-based routing
   - Predict provider performance
   - Learn from user patterns
   - Adaptive weight tuning
2. Geographic routing
   - Provider location awareness
   - Latency-based geo selection
   - Regional failover
3. Advanced analytics
   - Provider comparison dashboards
   - Cost forecasting
   - Performance predictions
4. Enhanced fallback strategies
   - Intelligent retry with backoff
   - Cross-model fallbacks
   - Dynamic strategy switching
Configuration Examples
Example 1: High-Performance Setup
INSERT INTO model_routing_config VALUES (
'deepseek-chat',
0.35, -- latency_weight (high)
0.45, -- success_rate_weight (high)
0.10, -- price_weight (low)
0.10, -- provider_priority_weight
'performance'
);
Example 2: Cost-Optimized Setup
INSERT INTO model_routing_config VALUES (
'deepseek-chat-v3.1',
0.15, -- latency_weight (low)
0.35, -- success_rate_weight (medium)
0.40, -- price_weight (high)
0.10, -- provider_priority_weight
'cost'
);
Example 3: User Cost Control
UserRoutingPreferences {
default_strategy: "cost",
max_price_per_million_tokens: 10.0, // Max $10/M
min_success_rate: 0.90, // Must maintain 90%+
preferred_providers: [1, 5, 8], // Try these first
}