Company Description:
About Sutherland:
Artificial Intelligence. Automation. Cloud engineering. Advanced analytics. For business leaders, these are key factors of success. For us, they’re our core expertise.
We work with iconic brands worldwide. We bring them a unique value proposition through market-leading technology and business process excellence.
We’ve created over 200 unique inventions under several patents across AI and other critical technologies. Leveraging our advanced products and platforms, we drive digital transformation, optimize critical business operations, reinvent experiences, and pioneer new solutions, all provided through a seamless “as a service” model.
For each company, we provide new keys for their businesses, the people they work with, and the customers they serve. We tailor proven and rapid formulas to fit their unique DNA. We bring together human expertise and artificial intelligence to develop digital chemistry. This unlocks new possibilities, transformative outcomes, and enduring relationships.
Sutherland
Unlocking digital performance. Delivering measurable results.
Job Description:
• Build and maintain platform automation for provisioning, deployment, patching, and remediation tasks.
• Enhance observability frameworks, implementing monitoring, logging, and alerting for Linux workloads.
• Design and implement health checks and SLIs/SLOs for availability and reliability tracking.
• Collaborate with application and DevOps teams to ensure services follow reliability best practices.
• Develop and maintain CI/CD pipelines using Jenkins, GitLab CI, or equivalent tools.
• Implement Infrastructure-as-Code (IaC) solutions using Terraform, Ansible, or CloudFormation.
• Participate in readiness reviews for new service releases.
• Conduct root cause analysis and lead post-incident reviews to drive reliability improvements.
• Partner with Information Security teams to enforce compliance, patching, and hardening standards.
• Optimize system performance and capacity across compute, storage, and container environments.
• Automate recurring operational tasks to enhance efficiency and reduce manual intervention.
Qualifications:
• Bachelor’s degree in Computer Science, Engineering, or related discipline (or equivalent experience).
• 7+ years of hands-on experience in large-scale Linux system engineering, reliability, or operations.
• 3+ years designing, implementing, and maintaining enterprise distributed systems.
• Expertise in Linux distributions and associated system services.
• Strong knowledge of cloud environments and hybrid infrastructure models.
• Proficiency in Bash, Python, and infrastructure automation.
• Hands-on experience with CI/CD, configuration management, and version control systems.
• Solid understanding of containerization and orchestration.
• Proven troubleshooting and performance tuning skills for distributed and containerized systems.
• Familiarity with observability tools (Prometheus, Grafana, ELK, Datadog).
• Strong grasp of networking, DNS, TLS, and load balancing concepts.
• Excellent communication and collaboration skills with cross-functional teams.
Additional Information:
• Cloud-native Linux deployments on AWS, Azure, or GCP.
• Experience with service mesh (Istio, Linkerd) and API gateways.
• Exposure to automation frameworks for security hardening (OpenSCAP, CIS benchmarks).
• Experience with log analytics and distributed tracing (Jaeger, OpenTelemetry).
• Familiarity with database performance tuning (MySQL, PostgreSQL, or MongoDB).
• Scripting for continuous compliance and infrastructure drift management.
• Experience in managing Linux-based container platforms and Kubernetes clusters at scale.
** The candidate should be willing to work core US shift schedules. **