Data Engineer, Production Informatics, Oncology (Boston, MA, USA)
At AstraZeneca, we are united by a common purpose: to push the boundaries of science to deliver life-changing medicines. Our work helps to make hearts healthier, helps people breathe easier, and helps more people survive cancer. Every single day, we make a difference by delivering potentially life-changing medicines to millions of people worldwide. We have exciting opportunities for people passionate about the power of data engineering, infrastructure, and architecture to enable our data scientists.
A significant investment in state-of-the-art data science and AI is at the forefront of our innovation program. This will include the assembly of teams to deliver a FAIR science data foundation, AI-augmented drug design, and data-science-led biological insights. As part of this initiative, you will support us in positioning AstraZeneca at the forefront of drug discovery, building systems to enrich and transform our data to discover and develop new medicines in the future.
Apply today to be part of something extraordinary.
Main Duties and Responsibilities
There is an exciting opportunity for a talented and motivated engineer, eager to bring data together in new ways, to join the group as Data Engineer. In this role, you will:
- Help us create AI/ML-ready datasets from petabytes of raw data and metadata
- Automate integration of different data sources into a coherent flow
- Develop and build systems and architectures for ETLs
- Perform system & data testing
- Build algorithms to support data normalization and result calculation
- Understand and apply FAIR data principles
- Ensure strong adherence to compliance & regulatory environments
Qualifications and Experience
- Master's-level degree in Computer Science, Engineering, or Bioinformatics, plus 5 years of relevant experience
- Good programming skills (experience in Python preferred)
- An ability to interact with various data sources, both structured and unstructured (e.g. HDFS, SQL, noSQL)
- Experience working across multiple scientific compute environments to create data workflows and pipelines (e.g. HPC, cloud, Unix/Linux systems)
- Experience with web-based frameworks (e.g. Django)
- Experience with data structure servers, caches, and message brokers (e.g. Redis)
- Expertise with biological/health data, especially NGS and other ‘omic technologies
- Experience modelling data and information for graph/network representation
- Experience working with metadata models, controlled vocabularies, and ontologies
- Ability to understand, map, integrate, and document complex data relationships and business rules
- Familiarity with data quality, cleaning, and masking techniques
Next Steps – Apply today!
To be considered for this exciting opportunity, please complete the full application on our website at your earliest convenience – it is the only way that our Recruiter and Hiring Manager can know that you feel well qualified for this opportunity. If you know someone who would be a great fit, please share this posting with them.
AstraZeneca embraces diversity and equality of opportunity. We are committed to building an inclusive and diverse team representing all backgrounds, with as wide a range of perspectives as possible, and harnessing industry-leading skills. We believe that the more inclusive we are, the better our work will be. We welcome and consider applications to join our team from all qualified candidates, regardless of their characteristics. We comply with all applicable laws and regulations on non-discrimination in employment (and recruitment), as well as work authorisation and employment eligibility verification requirements.