Data Engineer | SQL • Python • Kafka • AWS Data Platforms
I design and operate AWS-native data platforms that support analytics, real-time processing, and machine learning workloads. My work focuses on building reliable, observable, and cost-aware data systems that teams can realistically operate in production environments.
I specialize in building end-to-end data pipelines: from ingestion and event streaming to transformation, storage, and analytical datasets.
I design these systems around cloud-native architecture, event-driven patterns, and distributed data processing principles.
US Citizen — Open to Remote Roles and Relocation to the United States
I transitioned into Data Engineering from an engineering background by building complete production-style systems that include:
- Data ingestion pipelines
- Batch and streaming processing
- Analytical data modeling
- Observability and monitoring
- Infrastructure automation
My focus is on building data systems that are resilient, scalable, and understandable to the teams that operate them.
Core principle:
Data platforms must survive failures, scale predictably, and remain operable by real engineering teams.
- SQL for analytics and data transformation
- Python for data pipelines and automation
- Batch and Streaming Pipelines
- Event-Driven Data Architectures
- Data Modeling (Star Schema, Fact & Dimension tables)
- Incremental Data Processing
- ETL / ELT Pipeline Design
- Kafka Event Streaming
- Schema evolution and event versioning
- Reliable message ingestion patterns
- Event-driven decoupled architectures
- AWS Data Architecture
- Data Lakes and Data Warehouses
- Serverless data pipelines
- Data infrastructure automation
- Terraform (Infrastructure as Code)
- CloudFormation
- IAM least-privilege architecture
- Reproducible infrastructure environments
- Monitoring and metrics
- Pipeline reliability strategies
- Retry and failure handling
- Cost-aware architecture decisions
Tools: Python • SQL • Kafka • AWS (S3, Lambda, Redshift, Glue, Athena, DynamoDB) • Terraform • CloudFormation • Grafana • CloudWatch • Redis
Production-style data warehouse architecture designed to support analytics workloads and evolving datasets.
The system handles late-arriving events, incremental data ingestion, and analytical query optimization using star schema modeling.
- Star schema modeling for analytical performance
- Incremental ingestion pipelines to minimize compute costs
- Strategy for late-arriving transactional data
- Infrastructure defined using Terraform
- AWS Redshift Serverless
- Incremental batch pipelines
- Fact and dimension table modeling
- Reproducible infrastructure with IaC
- AWS Redshift
- S3
- Terraform
- SQL
- Data Modeling
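A minimal sketch of the stage-then-merge load pattern this design relies on, using the Redshift Data API via boto3. The workgroup, database, and table names are hypothetical placeholders, not the repository's actual schema:

```python
"""Idempotent incremental upsert into Redshift Serverless (sketch).

Assumes a staging table has already been loaded (e.g. via COPY from S3);
the delete-then-insert pattern keeps reprocessing of late-arriving rows
safe to repeat. All identifiers below are hypothetical.
"""
import boto3

redshift = boto3.client("redshift-data")

UPSERT_STATEMENTS = [
    # Remove any rows the new batch supersedes, keyed on the business key
    """DELETE FROM fact_orders
       USING stage_orders
       WHERE fact_orders.order_id = stage_orders.order_id""",
    # Then append the full batch, including late-arriving corrections
    "INSERT INTO fact_orders SELECT * FROM stage_orders",
]

def run_incremental_load() -> str:
    # batch_execute_statement runs the list as a single transaction
    resp = redshift.batch_execute_statement(
        WorkgroupName="analytics-wg",  # hypothetical Serverless workgroup
        Database="analytics",
        Sqls=UPSERT_STATEMENTS,
    )
    return resp["Id"]  # poll with describe_statement() for completion
```

Because the DELETE and INSERT run in one transaction, a failed or retried batch never leaves the fact table half-updated.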
Repository
https://github.com/tuni56/ecommerce-data-warehouse-redshift
Designed a serverless data lake architecture separating raw and curated datasets for scalable analytics workloads.
The system uses columnar storage and automated metadata discovery to enable efficient querying.
- Serverless-first architecture to eliminate idle compute
- Columnar data storage (Parquet)
- Automated schema discovery with Glue Crawlers
- Metadata catalog for discoverability
- Multi-layer S3 data lake
- AWS Glue catalog and ETL
- Athena for serverless querying
- Infrastructure automation
- AWS S3
- AWS Glue
- Athena
- Python
- CloudFormation
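A minimal sketch of querying the curated layer through Athena with boto3; the catalog database, results location, and table name are assumptions, and the Glue Crawlers are expected to have already registered the Parquet tables:

```python
"""Query the curated layer of the data lake through Athena (sketch)."""
import time
import boto3

athena = boto3.client("athena")

def run_query(sql: str) -> list:
    qid = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": "curated"},  # hypothetical catalog DB
        ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
    )["QueryExecutionId"]

    # Athena is asynchronous: poll until the execution reaches a terminal state
    while True:
        state = athena.get_query_execution(QueryExecutionId=qid)[
            "QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(1)

    if state != "SUCCEEDED":
        raise RuntimeError(f"query {qid} finished in state {state}")
    return athena.get_query_results(QueryExecutionId=qid)["ResultSet"]["Rows"]

rows = run_query("SELECT event_type, COUNT(*) FROM events GROUP BY 1")
```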
Repository
https://github.com/tuni56/serverless-aws-data-lake-with-kiro
Real-time event ingestion and processing pipeline designed to handle high-velocity event streams while maintaining reliability and observability.
The architecture demonstrates event-driven data ingestion patterns used in distributed systems and modern data platforms.
- Event-driven decoupling using Kafka
- Schema evolution using Schema Registry
- Monitoring-first system design
- Consumer reliability strategies
- Streaming ingestion pipeline
- Event routing and processing
- Operational monitoring dashboards
- Resilient message processing
- Kafka
- Python
- Redis
- Grafana
- Terraform
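A minimal sketch of the consumer-side reliability pattern, manual commits plus a dead-letter topic, written here with confluent-kafka; broker address, group id, and topic names are placeholders and may differ from the repository:

```python
"""Kafka consumer with manual commits and a dead-letter topic (sketch)."""
import json
from confluent_kafka import Consumer, Producer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "event-processor",
    "enable.auto.commit": False,   # commit only after the message is handled
    "auto.offset.reset": "earliest",
})
producer = Producer({"bootstrap.servers": "localhost:9092"})
consumer.subscribe(["events"])

def process(payload: bytes) -> None:
    event = json.loads(payload)    # domain logic stands in here
    print(event["type"])

while True:
    msg = consumer.poll(timeout=1.0)
    if msg is None or msg.error():
        continue
    try:
        process(msg.value())
    except Exception:
        # Poison messages go to a dead-letter topic so the partition keeps moving
        producer.produce("events.dlq", msg.value())
        producer.flush()
    consumer.commit(message=msg)   # at-least-once delivery semantics
```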
Repository
https://github.com/tuni56/real-time-event-driven-data-pipeline
Architecture designed to ingest and store multi-year IoT datasets while maintaining cost efficiency and long-term queryability.
- Storage lifecycle optimization
- Serverless ingestion architecture
- Queryable historical data storage
- Long-term data retention strategy
- Scalability
- Cost optimization
- Long-term data management
- AWS Serverless
- Kafka
- Data Lake Architecture
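A minimal sketch of the lifecycle-tiering idea behind the cost optimization, applied via boto3; the bucket name, prefix, and transition windows are assumptions, not the architecture's actual retention policy:

```python
"""Tier multi-year IoT telemetry to cheaper storage as it ages (sketch)."""
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="iot-telemetry-raw",  # hypothetical landing bucket
    LifecycleConfiguration={
        "Rules": [{
            "ID": "age-out-telemetry",
            "Status": "Enabled",
            "Filter": {"Prefix": "raw/"},
            "Transitions": [
                # Recent data stays hot; older data moves to colder tiers
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 180, "StorageClass": "GLACIER"},
            ],
        }]
    },
)
```

Data in Glacier tiers remains queryable after restore, so long-term retention does not sacrifice historical analysis.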
Repository
https://github.com/tuni56/iot-data-architecture-aws
Designed an automated cost-monitoring pipeline that ingests AWS Cost & Usage Reports and generates operational insights.
- Automated cost data ingestion
- Event-driven processing architecture
- Operational monitoring mindset
- Near real-time cost visibility
- AWS Lambda
- S3
- CloudWatch
- SNS
- Python
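A minimal sketch of the Lambda entry point for report ingestion, triggered by S3 object-created events; the topic ARN, cost column, and file format (gzipped CSV) are assumptions rather than the repository's actual configuration:

```python
"""Lambda handler for AWS Cost & Usage Report ingestion (sketch)."""
import csv
import gzip
import io
import json
import boto3

s3 = boto3.client("s3")
sns = boto3.client("sns")
TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:cost-alerts"  # hypothetical

def total_unblended_cost(raw: bytes) -> float:
    # Sum the unblended cost column across all line items in the report
    with gzip.open(io.BytesIO(raw), mode="rt") as fh:
        return sum(float(r["lineItem/UnblendedCost"]) for r in csv.DictReader(fh))

def handler(event, context):
    # Fired by S3 whenever a new CUR object lands in the delivery bucket
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        raw = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        total = total_unblended_cost(raw)
        # Publish a near-real-time summary for dashboards and alerting
        sns.publish(
            TopicArn=TOPIC_ARN,
            Message=json.dumps({"report": key, "total_usd": round(total, 2)}),
        )
```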
Repository
https://github.com/tuni56/AWS-Cost-Dashboard-Serverless-
Before focusing fully on data engineering, I worked on distributed backend systems built with:
- Java
- Spring Boot
- Microservices
- Messaging architectures
This background influences how I design data platforms that behave like production systems rather than isolated pipelines.
I am currently focused on:
- AWS-native Data Platform Architecture
- Event-driven Data Systems
- Infrastructure as Code for Data Platforms
- Observability-driven pipeline design
Actively pursuing Data Engineer / Data Platform Engineer roles on teams building modern data infrastructure.
Argentina (GMT-3)
Open to:
- Remote roles
- Relocation to the United States
US Citizen
LinkedIn
https://www.linkedin.com/in/rociobaigorria/
Email
rociomnbaigorria@gmail.com
Data systems are not just pipelines.
They are living distributed systems that must handle:
- failures
- scale changes
- operational pressure
- human operators
My goal is to build data platforms where information flows reliably and teams can make decisions with confidence.

