Senior GPU Engineer
- $150k-$220k
- ID: 4297
- Posted: 06.11.25
Senior GPU Engineer – Decentralised Compute Infrastructure – Global Remote
Plexus are working with one of the most exciting teams in the decentralised computing space, with a focus on enhanced scalability, efficiency, cost and security.
Following a recent eight-figure raise, they are looking to onboard a hands-on Senior GPU Engineer into the team.
Responsibilities
- Design and manage multi-tenant GPU clusters using Kubernetes, Slurm, or similar platforms.
- Develop schedulers and resource-sharing tools to optimize GPU utilization and efficiency.
- Build autoscaling systems for training and inference workloads (Ray, Run:AI, Volcano, Kubeflow).
- Implement monitoring and observability for GPUs, network, and job performance (Prometheus, Grafana, OpenTelemetry).
- Profile and optimize compute throughput and cost efficiency (NCCL, CUDA, ROCm, GPUDirect, RDMA, InfiniBand).
- Collaborate on high-throughput I/O and data pipelines (Lustre, Ceph, S3, NVMe-oF, Alluxio).
Requirements
- 2+ years managing distributed GPU infrastructure in production.
- Strong experience with Kubernetes or Slurm, and Linux systems.
- Skilled in Python/Go/C++, automation, and infrastructure-as-code (Terraform, Helm).
- Familiar with CUDA/NCCL/ROCm, Ray/Run:AI/Volcano, and high-speed networking (InfiniBand, RoCE).
- Knowledge of AI storage systems (Lustre, Ceph, S3) and performance optimization.
- Excellent collaboration and communication skills across technical teams.
Offer/Benefits
- Up to $220k base
- Fully remote, work-from-anywhere
Sound like you, or someone you know? Please apply.
