Job Summary:
TowneBank is seeking a forward-thinking Data Engineer to join its banking data team. In this role, you will design and optimize batch ETL workflows on a Databricks Lakehouse platform, ensuring high-quality data pipelines that support critical banking functions.

Responsibilities:
• Build and Integrate Data Pipelines: Design, integrate, and implement batch ETL processes for data from diverse source systems into our Databricks environment, contributing to the expansion and optimization of our cloud-based data lake (Lakehouse).
• Data Quality and Integrity: Ensure pipelines meet high standards of data quality and integrity, implementing rigorous validation, cleansing, and enrichment processes on large volumes of banking data. Maintain historical data for auditability and regulatory compliance, leveraging Delta Lake’s ACID features for versioning (a minimal sketch of this pattern follows this list).
• Performance Optimization: Optimize data processing performance on Databricks (e.g., efficient Spark SQL, partitioning techniques) and manage ETL job scheduling and dependencies to meet business SLAs for data timeliness.
• Governance and Compliance: Adhere to enterprise data governance policies and implement security best practices for sensitive financial data. Ensure compliance with banking regulations by enforcing access controls, encryption, and data lineage tracking across pipelines.
• Cross-Team Collaboration: Work closely with data architects, analysts, and business stakeholders to gather requirements and translate banking domain needs into scalable data solutions. Collaborate with BI, risk, and data science teams to support analytics and machine learning initiatives with robust data feeds.
• Continuous Improvement: Identify and implement improvements (including automating repeatable workflows) to enhance pipeline stability, efficiency, and future scalability. Keep the data platform up to date with industry best practices and emerging Databricks features.
• Regulatory Adherence: Adhere to applicable federal laws, rules, and regulations, including those related to Anti-Money Laundering (AML) and the Bank Secrecy Act (BSA).
• Other duties as assigned.
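To illustrate the Data Quality and Performance Optimization bullets above, here is a minimal sketch of one batch ETL step, assuming a Databricks cluster where PySpark and Delta Lake are available. The source path, table name, and column names (transaction_id, account_id, amount) are invented for illustration and are not details from this posting.

    # Hypothetical sketch only: paths, table names, and columns are assumptions.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("daily_transactions_etl").getOrCreate()

    # Ingest one daily batch of raw transaction files (source path is a placeholder).
    raw = spark.read.parquet("/mnt/raw/transactions/2024-01-15/")

    # Basic validation and cleansing: drop rows missing keys, flag negative amounts,
    # and stamp each row with its ingestion date.
    clean = (
        raw.dropna(subset=["transaction_id", "account_id"])
           .withColumn("is_suspect", F.col("amount") < 0)
           .withColumn("ingest_date", F.current_date())
    )

    # Append to a Delta table partitioned by ingestion date; Delta's transaction log
    # retains prior table versions, which supports the auditability point above.
    (
        clean.write.format("delta")
             .mode("append")
             .partitionBy("ingest_date")
             .saveAsTable("lakehouse.silver_transactions")
    )

A production pipeline would add richer validation rules, monitoring, and orchestration (for example a scheduled Databricks Job), but the partitioned Delta write and retained table history are the core of the pattern described in the Responsibilities.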
Qualifications:

Required:
• Bachelor’s degree in Computer Science or a related field (or equivalent practical experience).
• 3+ years of experience as a data engineer in complex, large-scale data environments, preferably in the cloud.
• Strong hands-on expertise with Databricks and the Apache Spark ecosystem (PySpark, Spark SQL) for building large-scale data pipelines.
• Experience working with Delta Lake tables and Lakehouse architectural patterns for data management.
• Experience using Delta Live Tables to build automated, declarative ETL pipelines on Databricks.
• Proficient in Python (including PySpark) for data processing tasks.
• Solid coding skills in SQL for complex querying and transformation of data (Scala or Java experience is a plus).
• Experience with at least one major cloud platform (AWS, Azure, or GCP) and its data services (e.g., S3, Azure Data Lake Storage, BigQuery).
• Familiarity with cloud-based ETL tools and infrastructure (e.g., Azure Data Factory, AWS Glue) for scalable storage and processing.
• Strong understanding of data modeling and data warehousing concepts, including designing relational schemas and dimensional models (OLTP/OLAP, star schemas, etc.) for analytics.
• Experience designing end-to-end data pipeline architectures, including orchestration and workflow scheduling.
• Familiarity with pipeline orchestration tools (Databricks Jobs, Apache Airflow, or Azure Data Factory) to automate and manage complex workflows.
• Hands-on experience implementing data quality checks (unit tests, data validation rules) and monitoring in ETL pipelines to ensure accuracy and consistency of data outputs.
• Knowledge of data governance standards and security best practices for managing sensitive data.
• Understanding of compliance requirements in banking (e.g., encryption, PII handling, auditing) and the ability to enforce data access controls and document data lineage.
• Experience using version control (Git) and CI/CD pipelines for code deployment.
• Comfortable with DevOps practices to package, test, and deploy data pipeline code in a controlled, repeatable manner.
• Strong problem-solving skills with an ability to troubleshoot complex data issues.
• Capable of translating business requirements into efficient, reliable ETL solutions and optimizing workflows for performance and cost-efficiency.

Preferred:
• Familiarity with the banking sector’s data and processes (e.g., retail banking transactions, investment trading data, fraud detection, risk analytics) is a strong plus.
• Exposure to real-time data streaming and event-driven architectures.
• Knowledge of Spark Structured Streaming or Kafka for ingesting and processing streaming data alongside batch workflows is a plus (see the sketch at the end of this posting).
• Experience building data pipelines for regulatory reporting or compliance use cases in finance.
• Familiarity with ensuring consistency, integrity, and timeliness of data in regulatory pipelines (e.g., for CCAR, AML, or Basel reporting) would set a candidate apart.
• Understanding of DataOps techniques (automated testing, monitoring, and CI/CD for data pipelines) or MLOps integration to support machine learning data requirements.
• Experience with tools and frameworks that improve the automation and reliability of data workflows is a plus.
• Relevant industry certifications can be an advantage, for example Databricks Certified Data Engineer or cloud platform certifications in data engineering.

Company:
TowneBank is a relationship-based enterprise that provides a full range of banking and financial services. Founded in 1999, the company is headquartered in Portsmouth, Virginia, USA, with a team of 1,001-5,000 employees. The company is currently a public company. TowneBank has a track record of offering H-1B sponsorship.
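For the streaming item under Preferred, the pattern referred to is roughly the one sketched below: a minimal example, assuming a Databricks runtime with the built-in Kafka source for Structured Streaming. The broker address, topic, table name, and checkpoint path are placeholders rather than details from TowneBank.

    # Hypothetical sketch only: broker, topic, paths, and table names are assumptions.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("card_events_stream").getOrCreate()

    # Read card-transaction events continuously from a Kafka topic.
    events = (
        spark.readStream.format("kafka")
             .option("kafka.bootstrap.servers", "broker1:9092")
             .option("subscribe", "card-transactions")
             .load()
    )

    # Kafka delivers key/value as binary; cast the payload to strings for downstream parsing.
    parsed = events.select(
        F.col("key").cast("string").alias("event_key"),
        F.col("value").cast("string").alias("payload"),
        F.col("timestamp").alias("event_time"),
    )

    # Land the stream in a Delta table; the checkpoint lets the query restart after a
    # failure without reprocessing or duplicating already-committed events.
    query = (
        parsed.writeStream.format("delta")
              .option("checkpointLocation", "/mnt/checkpoints/card_transactions")
              .outputMode("append")
              .toTable("lakehouse.bronze_card_events")
    )

Because the sink is a Delta table, this streaming feed can sit alongside the batch pipelines described above and be queried with the same Spark SQL and governance controls.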