Senior Data Engineer
Indegene
Job Description
We are a technology-led healthcare solutions provider, driven by our purpose to enable healthcare organizations to be future-ready. We offer accelerated, global growth opportunities for talent that’s bold, industrious, and nimble. With Indegene, you gain a unique career experience that celebrates entrepreneurship and is guided by passion, innovation, collaboration, and empathy. To explore exciting opportunities at the convergence of healthcare and technology, check out www.careers.indegene.com.

Looking to jump-start your career? We understand how important the first few years of your career are: they create the foundation of your entire professional journey. At Indegene, we promise you a differentiated career experience. You will not only work at the exciting intersection of healthcare and technology but also be mentored by some of the most brilliant minds in the industry. We offer a global fast-track career where you can grow along with Indegene’s high-speed growth.

We are purpose-driven. We enable healthcare organizations to be future-ready, and our customer obsession is our driving force. We ensure that our customers achieve what they truly want. We are bold in our actions, nimble in our decision-making, and industrious in the way we work.
Role: Senior Data Engineer
Description:
You will be responsible for:
• Own the design, development, and optimization of scalable data pipelines using Databricks to enable reliable ingestion and transformation of large-scale datasets
• Collaborate with product owners, analytics teams, and business stakeholders to translate data and analytics requirements into robust Databricks-based solutions
• Design, implement, and maintain efficient data models (Lakehouse patterns) to support evolving analytical and business needs while ensuring performance and data integrity
• Build, enhance, and migrate complex ETL/ELT pipelines leveraging Databricks, Delta Lake, Spark, and cloud object storage
• Continuously improve performance, scalability, reliability, and cost efficiency of data pipelines through Spark optimization, cluster tuning, and best practices
• Extract, transform, and integrate data from heterogeneous data sources, including structured, semi-structured, and unstructured data
• Take ownership across the end-to-end data lifecycle including analysis, design, development, testing, deployment, monitoring, and production support
• Process and analyze large-scale datasets using PySpark and SQL to enable downstream analytics, reporting, and advanced use cases
• Implement and adhere to data governance, data quality, security, and compliance standards, ensuring proper documentation, lineage, and auditability within the Lakehouse
• Act as a technical owner for assigned data domains or pipelines, ensuring timely delivery and adherence to engineering standards
• Apply domain knowledge in pharma commercial analytics, including brand, customer, omnichannel, and content performance data
• Enable the creation of actionable KPIs and analytical datasets for marketing, brand, and digital operations teams
Must Have
Your impact: You will deliver cross-functional projects to the highest quality standards and mentor the team to build the next layer of leadership.
Desired Profile: We are seeking a dynamic and experienced Data Engineer to lead our talented team of data engineers and data analysts. In this role, you will be instrumental in shaping the architecture and infrastructure of our data systems, driving innovation, and ensuring the delivery of high-quality solutions. You will play a critical role in designing and implementing scalable data pipelines, optimizing data workflows, and leveraging advanced analytics techniques to drive business value. A pharma background and team management experience are preferred.
Requirements:
• 4+ years of overall experience in data engineering with strong hands-on ownership of data pipelines
• Proven experience in ETL/ELT development, data modeling, and modern data architectures
• Strong ability to work with stakeholders and translate business requirements into technical solutions
• Experience working with Life Sciences / Pharma data is a strong advantage
• 4+ years of hands-on experience with Databricks, including notebooks, jobs, workflows, Databricks SQL, and Delta Lake
• Strong expertise in Apache Spark (PySpark) for large-scale data processing
• Hands-on experience with cloud object storage (ADLS / S3 / GCS) integrated with Databricks
• Strong programming experience in Python for data engineering use cases
• Advanced SQL skills, including complex joins, window functions, and performance tuning
• Experience working with relational databases such as MS SQL Server, Oracle, or MySQL
• Understanding of data quality, governance, and security concepts in enterprise data platforms
• Excellent problem-solving, analytical, and communication skills, with the ability to independently deliver complex tasks
Good to Have
• Exposure to ML/AI
• Understanding of Gen-AI and Agentic AI