NeuroGraph Distributed Inference
A low-latency distributed neural network inference engine. It optimizes graph execution on heterogeneous clusters through a custom scheduler.
Metrics
Latency: ~12 ms
Throughput: 1.2k req/s
Distributed Inference
NeuroGraph splits neural network computational graphs across multiple nodes. By leveraging gRPC and custom memory orchestration, it allows for LLM inference on clusters of consumer-grade hardware.
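The graph-splitting idea can be sketched in a few lines of Go. This is an illustrative layer-wise partitioning, not NeuroGraph's actual API: the `Layer`, `Partition`, and node names are hypothetical, and the real engine would ship each partition to its worker over gRPC rather than return an in-memory plan.

```go
package main

import "fmt"

// Layer is a node in the computational graph; Params is a rough
// cost proxy (parameter count). Both names are illustrative.
type Layer struct {
	Name   string
	Params int
}

// Partition assigns layers to worker nodes round-robin — the
// simplest form of model parallelism. A production scheduler
// would balance by cost and link bandwidth instead.
func Partition(layers []Layer, nodes []string) map[string][]Layer {
	plan := make(map[string][]Layer)
	for i, l := range layers {
		n := nodes[i%len(nodes)]
		plan[n] = append(plan[n], l)
	}
	return plan
}

func main() {
	layers := []Layer{
		{"embed", 50}, {"attn_0", 30}, {"mlp_0", 60}, {"head", 50},
	}
	plan := Partition(layers, []string{"worker_1", "worker_2"})
	for node, ls := range plan {
		fmt.Println(node, "gets", len(ls), "layers")
	}
}
```

In this sketch each worker ends up with two layers; the per-partition subgraphs would then run as a pipeline, with activations streamed between nodes.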
Smart Scheduling
The engine includes a heat-aware scheduler that monitors GPU temperatures and VRAM usage in real-time, dynamically routing weights to the most efficient available node.
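The routing rule behind such a scheduler can be approximated as: among nodes under a thermal limit, pick the one with the most free VRAM. The sketch below assumes hypothetical `NodeStats` and `PickNode` names; NeuroGraph's real scheduler also tracks these metrics in real time rather than from a static snapshot.

```go
package main

import "fmt"

// NodeStats is a snapshot of one worker's GPU telemetry.
// Field names are illustrative, not NeuroGraph's schema.
type NodeStats struct {
	Name     string
	TempC    float64 // current GPU temperature in °C
	FreeVRAM int     // free VRAM in MiB
}

// PickNode returns the node with the most free VRAM among those
// at or below maxTempC, and false if every node is too hot.
func PickNode(nodes []NodeStats, maxTempC float64) (NodeStats, bool) {
	best := NodeStats{FreeVRAM: -1}
	for _, n := range nodes {
		if n.TempC <= maxTempC && n.FreeVRAM > best.FreeVRAM {
			best = n
		}
	}
	return best, best.FreeVRAM >= 0
}

func main() {
	nodes := []NodeStats{
		{"worker_1", 81, 9000}, // hottest node: excluded despite most VRAM
		{"worker_2", 64, 6000},
		{"worker_3", 70, 7500},
	}
	if n, ok := PickNode(nodes, 75); ok {
		fmt.Println("route weights to", n.Name)
	}
}
```

Here worker_1 is skipped even though it has the most free VRAM, which is the point of heat-aware routing: thermal throttling makes a hot GPU slower than its raw capacity suggests.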
AI Engine Stack
Go · gRPC · Protobuf · CUDA · TensorRT · Redis · Kubernetes · Prometheus · PyTorch
Node_Main (Main Orchestrator)
UPTIME: 342:12:08
CLUSTER_HEALTH: 100%
WORKERS: WORKER_1, WORKER_2, WORKER_3, WORKER_4