Data Engineer | SQL • Python • Kafka • AWS Data Platforms
I design and operate AWS-native data platforms that support analytics, real-time processing, and machine learning workloads. My work focuses on building reliable, observable, and cost-aware data systems that teams can realistically operate in production environments.
I specialize in building end-to-end data pipelines: from ingestion and event streaming to transformation, storage, and analytical datasets.
I design these systems around cloud-native architecture, event-driven patterns, and distributed data processing principles.
US Citizen — Open to Remote Roles and Relocation to the United States
I transitioned into Data Engineering from an engineering background by building complete production-style systems that include:
- Data ingestion pipelines
- Batch and streaming processing
- Analytical data modeling
- Observability and monitoring
- Infrastructure automation
My focus is on building data systems that are resilient, scalable, and understandable to the teams that operate them.
Core principle:
Data platforms must survive failures, scale predictably, and remain operable by real engineering teams.
- SQL for analytics and data transformation
- Python for data pipelines and automation
- Batch and Streaming Pipelines
- Event-Driven Data Architectures
- Data Modeling (Star Schema, Fact & Dimension tables)
- Incremental Data Processing
- ETL / ELT Pipeline Design
- Kafka Event Streaming
- Schema evolution and event versioning
- Reliable message ingestion patterns
- Event-driven decoupled architectures
- AWS Data Architecture
- Data Lakes and Data Warehouses
- Serverless data pipelines
- Data infrastructure automation
- Terraform (Infrastructure as Code)
- CloudFormation
- IAM least-privilege architecture
- Reproducible infrastructure environments
- Monitoring and metrics
- Pipeline reliability strategies
- Retry and failure handling
- Cost-aware architecture decisions
Tools: Python • SQL • Kafka • AWS (S3, Lambda, Redshift, Glue, Athena, DynamoDB) • Terraform • CloudFormation • Grafana • CloudWatch • Redis
Production-style data warehouse architecture designed to support analytics workloads and evolving datasets.
The system handles late-arriving events, incremental data ingestion, and analytical query optimization using star schema modeling.
- Star schema modeling for analytical performance
- Incremental ingestion pipelines to minimize compute costs
- Strategy for late-arriving transactional data
- Infrastructure defined using Terraform
- AWS Redshift Serverless
- Incremental batch pipelines
- Fact and dimension table modeling
- Reproducible infrastructure with IaC
- AWS Redshift
- S3
- Terraform
- SQL
- Data Modeling
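A minimal sketch of the stage-then-merge load pattern this design relies on, using the Redshift Data API via boto3. The workgroup, database, and table names are hypothetical placeholders, not the repository's actual schema:

```python
"""Idempotent incremental upsert into Redshift Serverless (sketch).

Assumes a staging table has already been loaded (e.g. via COPY from S3);
the delete-then-insert pattern keeps reprocessing of late-arriving rows
safe to repeat. All identifiers below are hypothetical.
"""
import boto3

redshift = boto3.client("redshift-data")

UPSERT_STATEMENTS = [
    # Remove any rows the new batch supersedes, keyed on the business key
    """DELETE FROM fact_orders
       USING stage_orders
       WHERE fact_orders.order_id = stage_orders.order_id""",
    # Then append the full batch, including late-arriving corrections
    "INSERT INTO fact_orders SELECT * FROM stage_orders",
]

def run_incremental_load() -> str:
    # batch_execute_statement runs the list as a single transaction
    resp = redshift.batch_execute_statement(
        WorkgroupName="analytics-wg",  # hypothetical Serverless workgroup
        Database="analytics",
        Sqls=UPSERT_STATEMENTS,
    )
    return resp["Id"]  # poll with describe_statement() for completion
```

Because the DELETE and INSERT run in one transaction, a failed or retried batch never leaves the fact table half-updated.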
Repository
https://github.com/tuni56/ecommerce-data-warehouse-redshift
Designed a serverless data lake architecture separating raw and curated datasets for scalable analytics workloads.
The system uses columnar storage and automated metadata discovery to enable efficient querying.
- Serverless-first architecture to eliminate idle compute
- Columnar data storage (Parquet)
- Automated schema discovery with Glue Crawlers
- Metadata catalog for discoverability
- Multi-layer S3 data lake
- AWS Glue catalog and ETL
- Athena for serverless querying
- Infrastructure automation
- AWS S3
- AWS Glue
- Athena
- Python
- CloudFormation
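A minimal sketch of querying the curated layer through Athena with boto3; the catalog database, results location, and table name are assumptions, and the Glue Crawlers are expected to have already registered the Parquet tables:

```python
"""Query the curated layer of the data lake through Athena (sketch)."""
import time
import boto3

athena = boto3.client("athena")

def run_query(sql: str) -> list:
    qid = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": "curated"},  # hypothetical catalog DB
        ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
    )["QueryExecutionId"]

    # Athena is asynchronous: poll until the execution reaches a terminal state
    while True:
        state = athena.get_query_execution(QueryExecutionId=qid)[
            "QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(1)

    if state != "SUCCEEDED":
        raise RuntimeError(f"query {qid} finished in state {state}")
    return athena.get_query_results(QueryExecutionId=qid)["ResultSet"]["Rows"]

rows = run_query("SELECT event_type, COUNT(*) FROM events GROUP BY 1")
```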
Repository
https://github.com/tuni56/serverless-aws-data-lake-with-kiro
Real-time event ingestion and processing pipeline designed to handle high-velocity event streams while maintaining reliability and observability.
The architecture demonstrates event-driven data ingestion patterns used in distributed systems and modern data platforms.
- Event-driven decoupling using Kafka
- Schema evolution using Schema Registry
- Monitoring-first system design
- Consumer reliability strategies
- Streaming ingestion pipeline
- Event routing and processing
- Operational monitoring dashboards
- Resilient message processing
- Kafka
- Python
- Redis
- Grafana
- Terraform
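A minimal sketch of the consumer-side reliability pattern, manual commits plus a dead-letter topic, written here with confluent-kafka; broker address, group id, and topic names are placeholders and may differ from the repository:

```python
"""Kafka consumer with manual commits and a dead-letter topic (sketch)."""
import json
from confluent_kafka import Consumer, Producer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "event-processor",
    "enable.auto.commit": False,   # commit only after the message is handled
    "auto.offset.reset": "earliest",
})
producer = Producer({"bootstrap.servers": "localhost:9092"})
consumer.subscribe(["events"])

def process(payload: bytes) -> None:
    event = json.loads(payload)    # domain logic stands in here
    print(event["type"])

while True:
    msg = consumer.poll(timeout=1.0)
    if msg is None or msg.error():
        continue
    try:
        process(msg.value())
    except Exception:
        # Poison messages go to a dead-letter topic so the partition keeps moving
        producer.produce("events.dlq", msg.value())
        producer.flush()
    consumer.commit(message=msg)   # at-least-once delivery semantics
```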
Repository
https://github.com/tuni56/real-time-event-driven-data-pipeline
Architecture designed to ingest and store multi-year IoT datasets while maintaining cost efficiency and long-term queryability.
- Storage lifecycle optimization
- Serverless ingestion architecture
- Queryable historical data storage
- Long-term data retention strategy
- Scalability
- Cost optimization
- Long-term data management
- AWS Serverless
- Kafka
- Data Lake Architecture
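A minimal sketch of the lifecycle-tiering idea behind the cost optimization, applied via boto3; the bucket name, prefix, and transition windows are assumptions, not the architecture's actual retention policy:

```python
"""Tier multi-year IoT telemetry to cheaper storage as it ages (sketch)."""
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="iot-telemetry-raw",  # hypothetical landing bucket
    LifecycleConfiguration={
        "Rules": [{
            "ID": "age-out-telemetry",
            "Status": "Enabled",
            "Filter": {"Prefix": "raw/"},
            "Transitions": [
                # Recent data stays hot; older data moves to colder tiers
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 180, "StorageClass": "GLACIER"},
            ],
        }]
    },
)
```

Data in Glacier tiers remains queryable after restore, so long-term retention does not sacrifice historical analysis.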
Repository
https://github.com/tuni56/iot-data-architecture-aws
Designed an automated cost-monitoring pipeline that ingests AWS Cost & Usage Reports and generates operational insights.
- Automated cost data ingestion
- Event-driven processing architecture
- Operational monitoring mindset
- Near real-time cost visibility
- AWS Lambda
- S3
- CloudWatch
- SNS
- Python
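A minimal sketch of the Lambda entry point for report ingestion, triggered by S3 object-created events; the topic ARN, cost column, and file format (gzipped CSV) are assumptions rather than the repository's actual configuration:

```python
"""Lambda handler for AWS Cost & Usage Report ingestion (sketch)."""
import csv
import gzip
import io
import json
import boto3

s3 = boto3.client("s3")
sns = boto3.client("sns")
TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:cost-alerts"  # hypothetical

def total_unblended_cost(raw: bytes) -> float:
    # Sum the unblended cost column across all line items in the report
    with gzip.open(io.BytesIO(raw), mode="rt") as fh:
        return sum(float(r["lineItem/UnblendedCost"]) for r in csv.DictReader(fh))

def handler(event, context):
    # Fired by S3 whenever a new CUR object lands in the delivery bucket
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        raw = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        total = total_unblended_cost(raw)
        # Publish a near-real-time summary for dashboards and alerting
        sns.publish(
            TopicArn=TOPIC_ARN,
            Message=json.dumps({"report": key, "total_usd": round(total, 2)}),
        )
```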
Repository
https://github.com/tuni56/AWS-Cost-Dashboard-Serverless-
Before focusing fully on data engineering, I worked on distributed backend systems built with:
- Java
- Spring Boot
- Microservices
- Messaging architectures
This background influences how I design data platforms that behave like production systems rather than isolated pipelines.
I am currently focused on:
- AWS-native Data Platform Architecture
- Event-driven Data Systems
- Infrastructure as Code for Data Platforms
- Observability-driven pipeline design
Actively pursuing Data Engineer / Data Platform Engineer roles on teams building modern data infrastructure.
Argentina (GMT-3)
Open to:
- Remote roles
- Relocation to the United States
US Citizen
LinkedIn
https://www.linkedin.com/in/rociobaigorria/
Email
rociomnbaigorria@gmail.com
Data systems are not just pipelines.
They are living distributed systems that must handle:
- failures
- scale changes
- operational pressure
- human operators
My goal is to build data platforms where information flows reliably and teams can make decisions with confidence.

