Database Engineer (starting March '25)
Job description
We are seeking a skilled Data/Database Engineer to join our team and take ownership of managing, optimizing, and monitoring our data infrastructure. The ideal candidate will have experience working with both SQL and NoSQL databases, building and maintaining ETL pipelines, and ensuring seamless data operations. This role requires expertise in cloud platforms (preferably Google Cloud), containerization, and continuous integration/continuous deployment (CI/CD) practices. The engineer will also ensure data security and compliance with regulations such as GDPR/CCPA.
Your responsibilities include:
1/ Data Pipeline Management:
Design, implement, and maintain scalable ETL processes using PySpark and SparkNLP.
Manage data pipelines using GCP Workflows for scheduling and orchestrating jobs.
Ensure seamless integration and management of data systems to maintain continuous operation.
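For a concrete flavor of this work, here is a minimal ETL sketch in PySpark. The bucket paths, column names, and event schema are hypothetical, and a Spark runtime with the delta-spark package is assumed; SparkNLP stages would slot into the transform step.

```python
# Minimal ETL sketch: read raw events, clean them, append to a Delta table.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Delta-enabled session; these configs are unnecessary if the cluster
# already ships with Delta Lake preconfigured.
spark = (
    SparkSession.builder
    .appName("events-etl")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

# Extract: raw JSON landed by an upstream producer (hypothetical bucket).
raw = spark.read.json("gs://example-raw-bucket/events/")

# Transform: drop malformed rows, normalize timestamps, deduplicate.
clean = (
    raw.filter(F.col("event_id").isNotNull())
       .withColumn("event_ts", F.to_timestamp("event_ts"))
       .withColumn("event_date", F.to_date("event_ts"))
       .dropDuplicates(["event_id"])
)

# Load: append to a partitioned Delta table for downstream consumers.
(clean.write.format("delta")
      .mode("append")
      .partitionBy("event_date")
      .save("gs://example-lake-bucket/delta/events"))
```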
2/ Database Management:
a) SQL Databases:
Manage and optimize PostgreSQL databases for transactional and relational workloads.
Regularly optimize queries and indexes to ensure high-performance operations.
Implement automated backup and recovery solutions for PostgreSQL to prevent data loss.
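As an illustration of the backup point above, a minimal sketch of a scheduled pg_dump job. Host, role, database, and bucket names are hypothetical; pg_dump and gsutil are assumed to be on PATH, with credentials supplied via .pgpass and application default credentials.

```python
# Nightly PostgreSQL backup sketch using pg_dump's custom format, which
# pg_restore can later restore selectively.
import datetime
import subprocess

STAMP = datetime.datetime.now(datetime.timezone.utc).strftime("%Y%m%dT%H%M%SZ")
DUMP_FILE = f"/backups/appdb-{STAMP}.dump"

subprocess.run(
    [
        "pg_dump",
        "--host", "db.internal.example",   # hypothetical host
        "--username", "backup_user",       # hypothetical role
        "--format", "custom",              # compressed, pg_restore-friendly
        "--file", DUMP_FILE,
        "appdb",
    ],
    check=True,  # raise if the dump fails so the scheduler flags the run
)

# Ship the dump to object storage for off-host retention.
subprocess.run(
    ["gsutil", "cp", DUMP_FILE, "gs://example-backups/postgres/"],
    check=True,
)
```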
b) NoSQL Databases:
Manage and optimize large-scale NoSQL datasets using Delta Lake.
Ensure NoSQL infrastructure scalability to handle increasing data volumes.
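A minimal sketch of the routine Delta Lake maintenance this involves: compacting small files and purging stale ones. The table path is hypothetical, and delta-spark 2.0 or later is assumed, with the session configured as in the ETL sketch above.

```python
# Routine Delta table maintenance: compaction plus vacuum.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("delta-maintenance")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

table = DeltaTable.forPath(spark, "gs://example-lake-bucket/delta/events")

# Compact many small files into fewer large ones to keep scans fast.
table.optimize().executeCompaction()

# Drop files no longer referenced by the table, keeping 7 days (168 h)
# of history for time travel and concurrent readers.
table.vacuum(168)
```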
3/ Infrastructure & Deployment:
Deploy data applications on cloud platforms like Google Cloud.
Use Docker to containerize workloads and ensure consistency across development, testing, and production environments.
Leverage GCP services for deployment, scaling, and monitoring of data applications.
Set up and manage CI/CD pipelines using GitHub Actions to automate testing and deployment.
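For instance, a pipeline of this kind might run a smoke test like the following on every push (the GitHub Actions workflow YAML itself is omitted here). The DATABASE_URL secret and test layout are hypothetical; pytest and psycopg2 are assumed to be in the test dependencies.

```python
# CI smoke test sketch: verify the database behind the pipeline is reachable.
import os

import psycopg2
import pytest


@pytest.fixture
def conn():
    # CI injects DATABASE_URL as a secret; skip locally if it is absent.
    dsn = os.environ.get("DATABASE_URL")
    if not dsn:
        pytest.skip("DATABASE_URL not configured")
    connection = psycopg2.connect(dsn)
    yield connection
    connection.close()


def test_database_reachable(conn):
    with conn.cursor() as cur:
        cur.execute("SELECT 1")
        assert cur.fetchone() == (1,)
```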
4/ Monitoring & Performance Optimization:
Monitor data processing systems for latency, throughput, and error rates to ensure optimal performance.
Ensure data quality by regularly checking for consistency, completeness, and accuracy across databases and pipelines.
Implement centralized logging using Google Cloud Logging to aggregate logs from multiple sources.
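A sketch of how the quality-check and logging points above can meet: a completeness check on a PySpark DataFrame that reports through Google Cloud Logging. The column names and threshold are hypothetical; the google-cloud-logging package and application default credentials are assumed.

```python
# Data quality check that surfaces results in centralized Cloud Logging,
# where they can drive alerts.
import logging

import google.cloud.logging

# Route the standard logging module to Cloud Logging.
client = google.cloud.logging.Client()
client.setup_logging()

log = logging.getLogger("data-quality")


def check_completeness(df, column: str, max_null_ratio: float = 0.01) -> None:
    """Log an error if the null ratio of a PySpark DataFrame column
    exceeds the threshold; log success otherwise."""
    total = df.count()
    nulls = df.filter(df[column].isNull()).count()
    ratio = nulls / total if total else 0.0
    if ratio > max_null_ratio:
        log.error("completeness check failed: %s null ratio %.4f", column, ratio)
    else:
        log.info("completeness check passed: %s null ratio %.4f", column, ratio)
```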
5/ Security & Compliance:
Ensure the encryption of data both at rest and in transit.
Implement role-based access control (RBAC) to secure data and model endpoints.
Maintain compliance with regulations such as GDPR and CCPA, including detailed audit logging for model training and data access.
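To illustrate the RBAC point, a minimal sketch granting an analyst role read-only access on PostgreSQL. The role, schema, and connection string are hypothetical; psycopg2 and an owner-level connection are assumed.

```python
# Read-only RBAC sketch: analysts get SELECT on the analytics schema only.
import psycopg2

GRANTS = [
    "CREATE ROLE analyst_ro NOLOGIN",
    "GRANT USAGE ON SCHEMA analytics TO analyst_ro",
    "GRANT SELECT ON ALL TABLES IN SCHEMA analytics TO analyst_ro",
    # Make the grant apply to tables created later as well.
    "ALTER DEFAULT PRIVILEGES IN SCHEMA analytics "
    "GRANT SELECT ON TABLES TO analyst_ro",
]

# The connection context manager commits the transaction on success.
with psycopg2.connect("postgresql://admin@db.internal.example/appdb") as conn:
    with conn.cursor() as cur:
        for stmt in GRANTS:
            cur.execute(stmt)
```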
6/ Documentation & Communication:
Document API endpoints and data pipelines using tools like Swagger for ease of maintenance and onboarding.
Provide data flow diagrams, ETL process documentation, and data schema explanations.
Set up alerts using Google Cloud Monitoring and Slack for real-time issue notifications.
Generate and share performance reports to keep stakeholders informed and facilitate data-driven decision-making.
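As one way to satisfy the API documentation point above, a minimal sketch using FastAPI, which is an assumption here (one of several "tools like Swagger"): it generates an OpenAPI spec automatically and serves interactive Swagger UI docs at /docs. The endpoint and response model are hypothetical.

```python
# Self-documenting API endpoint sketch: FastAPI derives the OpenAPI spec
# from the route signature and the Pydantic response model.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="Data Platform API", version="0.1.0")


class PipelineStatus(BaseModel):
    pipeline: str
    state: str
    last_run: str


@app.get(
    "/pipelines/{name}/status",
    response_model=PipelineStatus,
    summary="Latest run status of a pipeline",
)
def pipeline_status(name: str) -> PipelineStatus:
    # Placeholder response; a real handler would query run metadata.
    return PipelineStatus(
        pipeline=name, state="succeeded", last_run="2025-01-01T00:00:00Z"
    )
```

Run under uvicorn, the raw spec is also available at /openapi.json for onboarding and client generation.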
Required Skills & Qualifications:
Minimum 5 years of experience with both SQL (PostgreSQL) and NoSQL (Delta Lake, Firestore, MongoDB) databases.
Experienced in Python and GCP; AWS experience is a plus.
Proficient in PySpark, SparkNLP, and data pipeline orchestration tools (e.g., GCP Workflows).
Expertise in containerization (Docker) and CI/CD pipelines (GitHub Actions).
Knowledge of performance metrics (latency, throughput, error rates) and data quality checks (consistency, completeness, accuracy).
Understanding of data encryption, access control (RBAC), and compliance with GDPR/CCPA.
Experience with API development (REST/GraphQL) and ML pipeline integration.
Strong scripting skills (Python/Bash) and experience with automation tools (Terraform, Ansible).
Familiar with monitoring tools (Prometheus, Grafana, ELK stack) and big data frameworks.
Excellent communication skills and the ability to document and report on technical processes.
D&M believes diversity drives innovation and is committed to creating an inclusive environment for all employees. We welcome candidates of all backgrounds, genders, and abilities to apply. Even if you don’t meet every requirement, if you’re excited about the role, we encourage you to go for it—you could be exactly who we need to help us create something amazing together!
About the company
Descartes & Mauss is the first end-to-end AI-powered SaaS platform automating the decision-making process. Founded in 2021, D&M models the future to build critical paths for companies to find growth and resilience.
By combining advanced data modelling capabilities with a creative methodology rooted in social sciences, D&M de-risks decision-making and gives managers back their capacity to act.
Since May 2024, we have successfully raised €5.5 million in funding from Elaia and Polytechnique Ventures. In the coming months, our goals are to expand our commercial reach and enhance our customer platform 🚀