Principal Graph Data Engineer
AstraZeneca is a global, innovation-driven biopharmaceutical business that focuses on the discovery, development and commercialization of prescription medicines for some of the world's most serious diseases. But we're more than one of the world's leading pharmaceutical companies. At AstraZeneca, we're proud to have an outstanding workplace culture that encourages innovation and collaboration. Here, employees are empowered to express diverse perspectives and are made to feel valued, energized and rewarded for their ideas and creativity.
About the role
Are you interested in building data products that could benefit millions of patients? We are assembling a new team committed to building and analysing disease knowledge graphs, comprising diverse skillsets: data scientists, bioinformaticians, software engineers and machine learning engineers with an interest in graph and natural language processing. This team will transform our ability to surface key biological insights, improving our understanding of oncology, respiratory and cardiovascular disease, targets and drug response.
We are looking for a Backend Engineer with a focus on Knowledge Graphs and NLP to join our Data Science & AI team in Cambridge.
What you’ll do
We are working in collaboration with our scientists to help develop better drugs faster, choose the right treatment for a patient and run safer clinical trials.
Our team empowers our scientists from early development to the late stages in drug development, driving innovation and acting as a catalyst for the adoption of the latest advances in Artificial Intelligence and Data Science.
As part of the team, you will contribute to the design and development of data pipelines and machine learning products at scale to build and interrogate knowledge graphs, using cloud tools and your expertise with Spark, Docker, Python or Scala.
You will use your expertise in mining massive datasets using MapReduce, approximate nearest-neighbour search, locality-sensitive hashing and similar approaches to analyse and optimize graph structures.
You will help build deep learning algorithms running on distributed GPU clusters to infer new edges, recommend similar nodes and interpret novel biological hypotheses.
You will contribute to the development of APIs and backend services used to exploit the biological insights derived from knowledge graphs.
The algorithms you help develop will integrate knowledge from biomedical literature, public data sources, and proprietary clinical and pre-clinical multi-layered data to:
- Discover novel drug targets.
- Identify patterns defining distinct patient groups and biomarkers.
- Understand drug mechanism of action and safety.
- Build recommendation systems supporting decision-making in drug programs.
You will help the team educate the AZ scientific community to adopt a data-first culture, working alongside domain experts to design projects.
You will collaborate with the burgeoning Data Science community across AZ to benchmark and maximise impact, establishing best practices for data engineering, sharing code and peer insight.
You will maintain awareness of state-of-the-art applications of knowledge graphs for drug discovery, and influence the strategic decisions of the group. You will identify and lead external interactions with opinion leaders in the field, and grow our external reputation by publishing innovative methodologies and scientific discoveries.
What you’ll need
- MS in Computer Science or related quantitative field
- 2+ years of backend experience, as demonstrated by a product or academic publications
- You have coded and optimized pipelines in Spark
- You have working knowledge of packaging, containerization and orchestration tools such as Kubernetes
- You know how to scale server architectures, web services and distributed systems
- You have developed software as part of a team and are familiar with version control, CI/CD and tooling
- Strong software development skills, with proficiency in Python and Scala preferred
- Proven experience building large scale data processing pipelines
- Experience with Cloud infrastructure and services
- Creative, collaborative, & product focused
Bonus points if
- Proven experience or demonstrable deep technical skills in one or more of the following areas: machine learning, recommendation systems, pattern recognition, natural language processing or computer vision
- You have built fast and scalable REST, GraphQL or SPARQL APIs
The role will have no direct line reports, but task management responsibilities within projects or services may arise.
Department – Data & Analytics, S&EUIT
Science and Enabling Units IT is a global IT capability supporting Drug Research, Drug Development, Product & Portfolio Strategy, Medical Affairs, Finance, HR, Compliance, Legal and Global Business Services. We are organized around 7 key capability areas: Business Partnering, Solution Delivery, Architecture, Application Support, Data & Analytics, Change & Operations, operating out of sites across the US, UK, Sweden, India and Mexico.
Data & Analytics provides analytics and data insight services and solutions critical to the emerging Data & AI/ML strategy and mission of S&EUIT and AZ. D&A is organized into teams specializing in Information Architecture, Data Engineering, Data Visualisation, Knowledge Management, Data Science, Data Analysis and Information Governance.
AstraZeneca embraces diversity and equality of opportunity. We are committed to building an inclusive and diverse team representing all backgrounds, with as wide a range of perspectives as possible, and harnessing industry-leading skills. We believe that the more inclusive we are, the better our work will be. We welcome and consider applications to join our team from all qualified candidates, regardless of their characteristics. We comply with all applicable laws and regulations on non-discrimination in employment (and recruitment), as well as work authorisation and employment eligibility verification requirements.