Director of SaaS Platform Operations
Are you ready to be part of the future of healthcare? Are you able to think big, be bold and harness the power of digital and AI to take on longstanding life sciences challenges? Then Evinova, a new healthtech business and part of the AstraZeneca Group, might be for you! Transform billions of patients’ lives through technology, data and groundbreaking ways of working.
The Platform Operations team plays a critical role in ensuring the seamless delivery of our pharmaceutical clinical SaaS products by maintaining a highly reliable, secure, and scalable platform infrastructure hosted in AWS. This team is responsible for monitoring the health and performance of the platform, managing cloud resources, automating operations through infrastructure-as-code and CI/CD pipelines, and ensuring compliance with industry standards such as Good Clinical Practice (GCP). The team collaborates closely with engineering, product, and security teams to ensure platform capabilities meet business and customer needs while driving operational excellence.
As the Lead, you will oversee the day-to-day operations and long-term strategy of our platform, ensuring 24/7 availability, performance, and cost-efficiency. You will spearhead efforts to build a proactive, automated, and highly resilient operations environment, focusing on incident management, performance optimization, and disaster recovery. You will establish and enforce platform operational standards, drive compliance with pharmaceutical industry regulations, and act as a key partner for cross-functional teams during product launches and updates. By prioritizing mentorship, you will guide and grow a high-performing team, fostering a culture of collaboration, innovation, and accountability. The role requires a deep understanding of cloud-native technologies, AWS infrastructure, and the unique demands of regulated industries, all of which are foundational to delivering impactful solutions to the pharmaceutical industry.
Accountabilities
Operational Excellence
Monitor platform health using tools for observability, logging, and metrics (e.g., Prometheus, CloudWatch, Datadog).
Respond to and resolve incidents quickly, with a focus on minimizing downtime and ensuring service level agreements (SLAs) are met.
Conduct root cause analyses (RCA) and implement corrective measures for recurring issues.
Continuously analyze and optimize system performance to meet user demand and ensure cost efficiency.
Monitor application workloads and tune the system for scalability and responsiveness.
Implement and maintain backup and disaster recovery solutions.
Design for fault tolerance and redundancy to ensure high availability.
Platform Administration
Manage and optimize the underlying cloud environment (e.g., AWS, Azure, Google Cloud Platform).
Monitor resource utilization and optimize costs while maintaining required performance levels.
Maintain automation scripts and infrastructure-as-code (IaC) tools (e.g., Terraform, Ansible).
Ensure seamless deployment of updates and new features using CI/CD pipelines.
Forecast future resource needs and plan for scaling up or down to meet changing demands.
Security and Compliance
Enforce strict role-based access controls (RBAC) and manage user permissions.
Ensure systems are up to date with patches and security updates.
Work with the security team to identify and mitigate vulnerabilities in the infrastructure.
Implement controls and documentation processes to meet regulatory and industry standards (e.g., GCP, HIPAA, GDPR, SOC 2).
Automation and Tooling
Develop scripts and tools to automate manual tasks, such as provisioning, monitoring, and scaling resources.
Maintain and enhance IaC practices to ensure consistency, traceability, and rapid provisioning.
Collaboration and Enablement
Work closely with product, engineering, and security teams to ensure platform requirements are met.
Act as the operational liaison for product launches, updates, and incident resolutions.
Provide operational tools, guidelines, and frameworks for development teams to manage and deploy applications.
Reporting and Documentation
Track and report platform uptime, response times, and other key performance indicators (KPIs).
Maintain up-to-date documentation for processes, configurations, and troubleshooting procedures.
Continuous Improvement
Conduct post-mortem meetings for incidents to identify areas for improvement.
Use feedback from internal and external stakeholders to refine platform capabilities.
Continuously evaluate emerging technologies and practices to improve platform operations.
Vendor & Contract Management
Act as the primary point of contact for the outsourced team, ensuring service level agreements (SLAs) are met and deliverables align with the company’s operational requirements.
Regularly review the performance of the outsourced team, provide constructive feedback, and ensure their work meets the company's standards for quality, compliance, and efficiency.
Facilitate seamless collaboration between the contract team and internal teams, aligning their efforts with the broader platform and product goals.
Develop and maintain documentation and knowledge-sharing practices to ensure continuity of operations and enable a smooth transition to an in-house team in the future.
Ensure the contract team adheres to company policies, industry regulations (e.g., GCP), and security protocols, addressing any gaps proactively.
Essential Skills/Experience
Significant experience in SaaS platform operations or related fields, with substantial experience in a leadership role.
Technical degree or equivalent.
Experience running operations for a cloud-native SaaS platform.
Familiarity with operational requirements in regulated industries, particularly the pharmaceutical or healthcare sector, is highly preferred.
Demonstrated ability to lead post-incident reviews and implement solutions for system improvements.
Experience managing budgets, including cloud cost optimization and vendor contract negotiation.
Deep knowledge of AWS services, including EC2, S3, RDS, Lambda, CloudFormation, and CloudWatch.
Experience with monitoring and optimizing cloud infrastructure for scalability and cost-efficiency.
Proficiency with tools like Terraform, Ansible, or CloudFormation for automating resource provisioning and management.
Hands-on experience with CI/CD tools such as Jenkins, GitLab CI, or AWS CodePipeline.
Strong scripting skills in Python, Bash, or similar for automation and troubleshooting.
Familiarity with observability tools like Datadog, Prometheus, or the ELK stack.
Experience managing outsourced or contract teams, including monitoring performance and ensuring SLA adherence.
Ability to collaborate effectively with engineering, product, and security teams.
Strong written and verbal communication skills for reporting and documentation.
Proven track record of hiring, mentoring, and building high-performing operations teams.
Deep knowledge of cloud platforms (specifically AWS), microservices architecture, API design, and database management (SQL/NoSQL).
Strong understanding of security principles, including encryption, identity management, and compliance with regulations like GDPR or HIPAA.
AstraZeneca embraces diversity and equality of opportunity. We are committed to building an inclusive and diverse team representing all backgrounds, with as wide a range of perspectives as possible, and harnessing industry-leading skills. We believe that the more inclusive we are, the better our work will be. We welcome and consider applications to join our team from all qualified candidates, regardless of their characteristics. We comply with all applicable laws and regulations on non-discrimination in employment (and recruitment), as well as work authorization and employment eligibility verification requirements.