About the Role
What You Do
- Support the development and maintenance of data pipelines, ingestion processes, and data transformations.
- Create and maintain SQL queries, Python scripts, and Spark-based workloads used for data processing and analytics.
- Assist in troubleshooting pipeline failures, data quality issues, and operational incidents.
- Work with senior engineers to implement schema mappings, transformation logic, and data validation rules.
- Ensure datasets meet expected schemas, data contracts, and quality standards.
- Support metadata management, dataset documentation, and lineage activities.
- Assist in maintaining data classification information according to company standards.
- Help automate repetitive operational and data management tasks to improve efficiency and reliability.
- Contribute to monitoring, alerting, and operational support for data pipelines and workflows.
- Participate in testing activities, including unit tests, transformation validation, and data quality checks.
- Follow established engineering standards, coding practices, and team development patterns.
- Learn and apply security, privacy, and compliance requirements when handling sensitive or regulated data.
- Collaborate with Data Governance, Security, and Compliance teams when required.
- Contribute to continuous improvement initiatives focused on data trust, reliability, and operational excellence.
Requirements and Qualifications
- Bachelor's degree in Computer Science, Computer Engineering, Information Systems, Data Science, Software Engineering, or a related field.
- Basic to intermediate English proficiency.
- Up to 2 years of experience in Data Engineering, Software Engineering, Data Analytics, or related areas.
- Knowledge of SQL and Python.
- Understanding of ETL/ELT concepts and data transformation processes.
- Familiarity with relational databases and data warehousing concepts.
- Basic knowledge of Spark, Databricks, or distributed data processing frameworks.
- Familiarity with Git and version control workflows.
- Basic understanding of cloud platforms such as AWS, Azure, or Google Cloud.
- Knowledge of automation concepts and scripting for operational efficiency.
- Basic understanding of data quality concepts and validation practices.
- Familiarity with data governance principles, including metadata, ownership, stewardship, and documentation.
- Basic knowledge of data classification concepts (Public, Internal, Confidential, Restricted).
- Understanding of data lineage and traceability concepts.
- Awareness of security best practices, including access management, secrets management, and least-privilege principles.
- Strong analytical, problem-solving, and communication skills.
- Willingness to learn new technologies and collaborate across teams.
Security, Compliance & Governance
- Follow company standards for handling sensitive and regulated data.
- Apply data classification requirements when creating or maintaining datasets and pipelines.
- Use approved authentication, authorization, and secrets management mechanisms.
- Avoid exposing sensitive information through logs, exports, test data, or documentation.
- Support auditability by maintaining documentation, metadata, and lineage information.
- Escalate security, privacy, or compliance concerns when requirements are unclear.
- Follow established governance processes and contribute to improving data trust across the organization.
How You Work
- Demonstrate curiosity and a continuous learning mindset.
- Write clean, readable, and maintainable code.
- Follow coding standards, testing practices, and development workflows.
- Communicate progress, blockers, and technical questions clearly.
- Participate in code reviews and knowledge-sharing activities.
- Take ownership of assigned tasks while escalating risks or uncertainties appropriately.
- Contribute positively to team collaboration and a culture of continuous improvement.
Nice to Have
- Experience with Databricks, dbt, or similar technologies.
- Familiarity with CI/CD tools such as GitHub Actions or Azure DevOps.
- Familiarity with APIs, JSON, event-driven architectures, or messaging systems.
- Exposure to vulnerability scanning, secret scanning, or secure development practices.
- Understanding of privacy regulations such as LGPD, GDPR, or similar frameworks.