Core Solution Features
- Slurm-based job orchestration
- On-demand scalable compute nodes
- FSx for Lustre high-performance storage
- Multi-account governance & guardrails
- Real-time monitoring & reporting
The customer’s advanced R&D workloads required massive compute scale and stronger operational controls.
On-premises HPC clusters could not expand fast enough to keep up with growing CFD, aero-acoustic, transient, and PowerFLOW simulation workloads.
High-memory and CPU-heavy engineering jobs suffered delays, which slowed product design cycles and extended experiment turnaround times.
Global R&D operations required a geographically distributed architecture with guardrails and governance, neither of which existed across the customer’s HPC estate.
Monitoring of job queues, resource utilization, and cost consumption was manual and reactive, causing inefficiencies and delayed insights.
End-to-end HPC migration and modernization using AWS ParallelCluster and AWS-native security, monitoring, and governance frameworks.
Set up a hardened multi-account architecture using AWS Control Tower, AWS Organizations, and Service Control Policies (SCPs), aligned with AWS Security Reference Architecture.
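As a rough illustration of what an SCP guardrail can look like, the sketch below builds a region-restriction policy document in Python. The region list, the exempted global services, and the helper name `build_region_guardrail` are assumptions made for the example; the case study does not state which guardrails were actually deployed.

```python
import json


def build_region_guardrail(approved_regions, exempt_actions):
    """Return an SCP document that denies requests outside approved regions.

    Both arguments are illustrative inputs, not values from the case study.
    """
    if not approved_regions:
        raise ValueError("at least one approved region is required")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyOutsideApprovedRegions",
                "Effect": "Deny",
                # Global services are exempted so they keep working everywhere.
                "NotAction": sorted(exempt_actions),
                "Resource": "*",
                "Condition": {
                    "StringNotEquals": {
                        "aws:RequestedRegion": sorted(approved_regions)
                    }
                },
            }
        ],
    }


# Hypothetical regions and exemptions, chosen only for the example.
policy = build_region_guardrail(
    approved_regions=["ap-south-1", "eu-west-1"],
    exempt_actions=["iam:*", "organizations:*", "sts:*", "support:*"],
)
print(json.dumps(policy, indent=2))
```

The resulting JSON could be attached to an organizational unit through AWS Organizations; in a Control Tower landing zone, comparable region restrictions are also available as managed controls.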
Migrated CFD DOE Optimization, aero-acoustic CFD, transient CFD, and PowerFLOW workloads to AWS HPC clusters orchestrated via Slurm on AWS ParallelCluster, with on-demand compute nodes and virtual workstations.
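To give a sense of how a simulation run might be handed to Slurm, here is a minimal sketch that renders a batch script from a few job parameters. The partition name, node counts, and the `./run_solver.sh` wrapper are hypothetical placeholders; the real queue layout and solver launch commands are not described in the case study.

```python
def render_sbatch(job_name, partition, nodes, tasks_per_node, walltime, command):
    """Render a Slurm batch script as a string.

    All parameter values are supplied by the caller; nothing here reflects
    the customer's actual cluster configuration.
    """
    if nodes < 1 or tasks_per_node < 1:
        raise ValueError("nodes and tasks_per_node must be positive")
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --partition={partition}",
        f"#SBATCH --nodes={nodes}",
        f"#SBATCH --ntasks-per-node={tasks_per_node}",
        f"#SBATCH --time={walltime}",
        # %x expands to the job name and %j to the job ID.
        "#SBATCH --output=%x-%j.out",
        "",
        f"srun {command}",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical job: 4 nodes in a queue named "cfd" running a wrapper script.
script = render_sbatch(
    job_name="transient-cfd",
    partition="cfd",
    nodes=4,
    tasks_per_node=36,
    walltime="08:00:00",
    command="./run_solver.sh case01",
)
print(script)
```

Saving the output to a file and passing it to `sbatch` would queue the job. With AWS ParallelCluster, compute nodes in the target queue are typically launched when jobs are pending and released once they sit idle.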
Integrated Amazon FSx for Lustre for high-throughput scratch storage with seamless Amazon S3 connectivity for persistent data.
Delivered Amazon CloudWatch dashboards and custom monitoring for CPU, memory, job performance, queue health, and cost visibility, with proactive alerts and automated reporting at user, queue, and cluster levels.
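One way queue-health metrics like these could be produced is to summarize `squeue` output into per-queue job counts shaped like CloudWatch custom metric entries. The sketch below assumes input from `squeue -h -o "%P %T"` (partition and job state per line); the metric name `JobCount`, the dimension names, and the sample queue names are assumptions, not details from the case study.

```python
from collections import Counter


def queue_metrics(squeue_output):
    """Convert 'partition state' lines into CloudWatch-style metric entries."""
    counts = Counter()
    for line in squeue_output.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        partition, state = parts
        counts[(partition, state)] += 1
    return [
        {
            "MetricName": "JobCount",  # hypothetical metric name
            "Dimensions": [
                {"Name": "Queue", "Value": partition},
                {"Name": "State", "Value": state},
            ],
            "Value": count,
            "Unit": "Count",
        }
        for (partition, state), count in sorted(counts.items())
    ]


# Sample text standing in for real squeue output.
sample = "cfd RUNNING\ncfd PENDING\ncfd PENDING\nacoustic RUNNING\n"
metrics = queue_metrics(sample)
for entry in metrics:
    print(entry)
```

A list in this shape can be passed as `MetricData` to the CloudWatch `PutMetricData` API under a custom namespace, after which dashboards and alarms can be defined on pending-job counts per queue.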
Provided performance benchmarking before migration, followed by ongoing optimization, troubleshooting, patching, and cost governance after migration.
The HPC migration delivered measurable impact across performance, scalability, and governance.
Run complex CFD and aero-acoustic simulations directly from optimized AWS HPC clusters using Slurm and virtual workstations.
Track job concurrency, queue utilization, CPU/memory performance, and simulation health using dashboards and alerts.
Monitor cluster spend, automate reports, and leverage Spot Instances and autoscaling to minimize compute costs.
Access automated reports covering cluster-level, queue-level, and user-level consumption patterns.
Slurm Job Scheduler
Amazon EC2
Amazon S3
Amazon CloudWatch
Amazon GuardDuty
AWS Security Hub
Amazon Inspector
AWS Control Tower
Amazon FSx for Lustre
AWS ParallelCluster
Dassault Systèmes PowerFLOW
CFD DOE
Learn how Apollo Tyres leveraged Tachyon to scale engineering simulations and drive innovation