The control plane is responsible for three main areas of functionality:

  1. Reliable Job Orchestration
  2. LLM Ops
  3. Observability

1. Reliable Job Orchestration

The control plane ensures efficient and dependable execution of jobs within the Inferable system:

  • Job Scheduling: Manages the queue of incoming jobs and allocates resources effectively.
  • Parallelization: Improves performance by running jobs in parallel where appropriate.
  • Fault Tolerance: Implements mechanisms to handle worker failures and ensure job completion.
  • State Management: Maintains the state of ongoing jobs and tasks, enabling resume capabilities in case of interruptions.
  • Load Balancing: Distributes jobs across available machines to improve utilization.
  • Job Prioritization: Applies prioritization schemes so that urgent or high-priority jobs are handled first.
  • Retries and Backoff: Manages automatic retries with exponential backoff for failed tasks or jobs.
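
The retry behavior described above can be illustrated with a minimal sketch. It is not Inferable's implementation: the function names (`backoff_delay`, `run_with_retries`), the default base delay, the cap, and the use of random jitter are all assumptions chosen for illustration.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Upper bound, in seconds, on the wait before retrying after
    failed attempt number `attempt` (0-based). Doubles each time, up to `cap`."""
    return min(cap, base * (2 ** attempt))


def run_with_retries(
    job: Callable[[], T],
    max_attempts: int = 5,
    base: float = 0.5,
    cap: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `job` until it succeeds or `max_attempts` calls have failed.

    Hypothetical helper: the defaults are illustrative, not Inferable settings.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return job()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Jitter: wait a random time up to the exponential bound so that
            # many failing jobs do not all retry at the same instant.
            sleep(random.uniform(0, backoff_delay(attempt, base, cap)))
    raise AssertionError("unreachable")
```

With these defaults the wait bound grows as 0.5 s, 1 s, 2 s, 4 s and so on until it reaches the 30 s cap. The `sleep` parameter exists only so that the waiting can be replaced in tests.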

2. LLM Ops

The control plane handles various aspects of Large Language Model operations:

  • Model Routing: Manages the routing of requests to appropriate model versions.
  • Inference Optimization: Implements techniques like dynamic batching to optimize inference throughput.
  • Resource Management: Allocates resources for LLM inference and schedules inference work across them.
  • Model Monitoring: Tracks model performance metrics in real time (e.g., latency, accuracy).
  • Knowledge Management: Manages the knowledge base of the LLM, including updates and versioning.
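
As an illustration of the dynamic batching mentioned above, the sketch below simulates one possible batching policy offline: a batch closes when it is full, or when the next request arrives too long after the batch's first request. The `Request` type, the `form_batches` function, and both thresholds are hypothetical and do not describe Inferable's actual behavior.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Request:
    id: str
    arrived_at: float  # arrival time in seconds


def form_batches(
    requests: List[Request],
    max_batch_size: int = 8,
    max_wait: float = 0.05,
) -> List[List[Request]]:
    """Group requests into batches in arrival order.

    The current batch is closed when it already holds `max_batch_size`
    requests, or when the next request arrived more than `max_wait`
    seconds after the first request in the batch.
    """
    batches: List[List[Request]] = []
    current: List[Request] = []
    for req in sorted(requests, key=lambda r: r.arrived_at):
        if current and (
            len(current) >= max_batch_size
            or req.arrived_at - current[0].arrived_at > max_wait
        ):
            batches.append(current)
            current = []
        current.append(req)
    if current:
        batches.append(current)
    return batches
```

The two thresholds trade throughput against latency: a larger `max_batch_size` improves throughput per inference call, while a smaller `max_wait` limits how long the first request in a batch is delayed.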

3. Observability

The control plane provides comprehensive observability into the Inferable system:

  • Metrics Collection: Gathers performance metrics from various parts of the system, including job execution times, resource utilization, and model performance.
  • Distributed Tracing: Implements tracing to track requests and jobs as they flow through different components of the system.
  • Alerting: Sets up and manages alerts for various system health and performance indicators.
  • Dashboarding: Provides interfaces for visualizing system performance and job statistics.
  • Performance Analysis: Offers tools for analyzing system performance and identifying bottlenecks.
  • Audit Trails: Maintains detailed audit logs for all significant operations within the system.
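
To make the tracing and metrics ideas above concrete, here is a minimal in-process sketch that records nested spans sharing one trace ID, along with their durations. The `Tracer` and `Span` classes are hypothetical; a real deployment would more likely use an established tracing library than this toy recorder.

```python
import time
import uuid
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Iterator, List, Optional


@dataclass
class Span:
    trace_id: str            # shared by every span in the same trace
    name: str
    parent: Optional[str]    # name of the enclosing span, if any
    duration_s: float = 0.0  # filled in when the span ends


class Tracer:
    """Toy single-threaded tracer that keeps finished spans in memory."""

    def __init__(self) -> None:
        self.spans: List[Span] = []
        self._stack: List[Span] = []

    @contextmanager
    def span(self, name: str) -> Iterator[Span]:
        # A top-level span starts a new trace; nested spans inherit its ID.
        trace_id = self._stack[0].trace_id if self._stack else uuid.uuid4().hex
        parent = self._stack[-1].name if self._stack else None
        current = Span(trace_id, name, parent)
        self._stack.append(current)
        start = time.perf_counter()
        try:
            yield current
        finally:
            current.duration_s = time.perf_counter() - start
            self._stack.pop()
            self.spans.append(current)
```

A job that calls a model could then be wrapped as `with tracer.span("job"): ...` with an inner `with tracer.span("model_call"): ...`, and the recorded durations could feed the execution-time metrics listed above.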

By managing these three areas, Inferable’s control plane keeps AI workloads running reliably and efficiently while providing visibility into system performance and behavior.