Data Engineer (AWS)

Infosys Guadalajara, Mexico

Company

Infosys

Location

Guadalajara, Mexico

Type

Full Time

Job Description

Required Qualifications:

  • 5+ years of experience in data engineering using Python with a focus on AWS S3, EMR, Glue, Step Functions, Apache NiFi and Spark.
  • Proven track record of building scalable data pipelines in cloud environments.
  • Proficiency in flow design, processors, and data provenance in Apache NiFi.
  • Strong expertise in Spark, Hadoop, and distributed computing on AWS EMR.
  • In-depth knowledge of AWS services (S3, Glue, Redshift, RDS, Lambda, Step Functions).
  • Experience with data formats (JSON, CSV, Parquet, Avro) and transformation techniques.
  • Strong problem-solving skills and ability to troubleshoot complex data processing issues.
  • Excellent communication skills with the ability to document and explain technical details clearly.
Preferred Qualifications:

  • AWS Certified Solutions Architect or Data Analytics Specialty.
  • Experience with data governance frameworks and compliance requirements.
  • Familiarity with version control and CI/CD pipelines (GitLab, Jenkins).
Key Responsibilities:
Design & Develop Data Pipelines:
  • Architect and implement end-to-end data pipelines using AWS S3, EMR, Glue, Step Functions, Apache NiFi, and Spark.
  • Manage data ingestion processes from AWS S3, ensuring secure and efficient data transfer.
  • Implement initial data routing, validation, and transformations using Apache NiFi processors and Spark.
Data Processing & Transformation:
  • Integrate AWS EMR, Apache NiFi, and Spark to perform complex data transformations and analytics.
  • Optimize Spark jobs for processing large-scale datasets with a focus on performance and resource utilization.
  • Handle both historical and incremental data loads, ensuring data consistency and integrity.
Data Storage & Management:
  • Define and implement data storage strategies across S3, RDS, and Redshift, adhering to business requirements.
  • Manage data catalog creation and schema management using AWS Glue.
Automation & Orchestration:
  • Develop and manage workflows using Apache Airflow and AWS Step Functions to automate data processing tasks.
  • Implement monitoring, error handling, and retries within the orchestration framework.
Security & Compliance:
  • Ensure data security with encryption (AES-256, TLS) and IAM role-based access controls.
  • Implement data governance policies using AWS Glue Data Catalog to ensure compliance with regulatory requirements.
Performance Monitoring & Optimization:
  • Utilize AWS CloudWatch to monitor the performance of EMR clusters, NiFi flows and data storage.
  • Continuously optimize Spark job configurations and NiFi data flows for maximum throughput and minimal latency.

Date Posted

12/21/2024
