NeuroGraph Distributed Inference

A low-latency distributed neural network inference engine. It optimizes graph execution across heterogeneous clusters via a custom scheduler.

Metrics

Latency: ~12 ms
Throughput: 1.2k req/s

Distributed Inference

NeuroGraph splits neural network computational graphs across multiple nodes. By leveraging gRPC and custom memory orchestration, it allows for LLM inference on clusters of consumer-grade hardware.

Smart Scheduling

The engine includes a heat-aware scheduler that monitors GPU temperatures and VRAM usage in real time, dynamically routing weights to the most efficient available node.
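The routing decision described above can be sketched as a simple filter-then-rank pass: discard nodes without enough free VRAM for the weight shard, then pick the coolest survivor. The `NodeStats` fields and `pickNode` policy below are assumptions for illustration; the document only states that temperature and VRAM feed the decision.

```go
package main

import "fmt"

// NodeStats is a hypothetical telemetry snapshot for one worker.
type NodeStats struct {
	Name     string
	TempC    float64 // current GPU temperature in Celsius
	FreeVRAM uint64  // free VRAM in bytes
}

// pickNode returns the coolest node that can still fit need bytes
// of weights, or "" if no node qualifies. A minimal sketch of
// heat-aware routing, not NeuroGraph's actual policy.
func pickNode(nodes []NodeStats, need uint64) string {
	best := ""
	bestTemp := 1e9
	for _, n := range nodes {
		if n.FreeVRAM >= need && n.TempC < bestTemp {
			best, bestTemp = n.Name, n.TempC
		}
	}
	return best
}

func main() {
	nodes := []NodeStats{
		{"WORKER_1", 71, 2 << 30},
		{"WORKER_2", 58, 6 << 30},
		{"WORKER_3", 55, 1 << 30}, // coolest, but low on VRAM
		{"WORKER_4", 63, 8 << 30},
	}
	// A 4 GiB shard skips WORKER_3 and lands on WORKER_2.
	fmt.Println(pickNode(nodes, 4<<30))
}
```

A production scheduler would likely hysteresis-damp this choice so shards do not thrash between nodes as temperatures fluctuate.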

AI Engine Stack

Go · gRPC · Protobuf · CUDA · TensorRT · Redis · Kubernetes · Prometheus · PyTorch

[Cluster dashboard: Node_Main (Main Orchestrator), UPTIME 342:12:08, CLUSTER_HEALTH 100%; workers WORKER_1, WORKER_2, WORKER_3, WORKER_4]