tuni56/README.md

Rocío Baigorria

Data Engineer | SQL • Python • Kafka • AWS Data Platforms

I design and operate AWS-native data platforms that support analytics, real-time processing, and machine learning workloads. My work focuses on building reliable, observable, and cost-aware data systems that teams can realistically operate in production environments.

I specialize in building end-to-end data pipelines: from ingestion and event streaming to transformation, storage, and analytical datasets.

I design these systems around cloud-native, event-driven, and distributed data processing principles.

US Citizen — Open to Remote Roles and Relocation to the United States


About Me

I transitioned into Data Engineering from a backend engineering background by building complete production-style systems that include:

  • Data ingestion pipelines
  • Batch and streaming processing
  • Analytical data modeling
  • Observability and monitoring
  • Infrastructure automation

My focus is on building data systems that are resilient, scalable, and understandable for the teams operating them.

Core principle:

Data platforms must survive failures, scale predictably, and remain operable by real engineering teams.


Core Data Engineering Skills

Data Engineering

  • SQL for analytics and data transformation
  • Python for data pipelines and automation
  • Batch and Streaming Pipelines
  • Event-Driven Data Architectures
  • Data Modeling (Star Schema, Fact & Dimension tables)
  • Incremental Data Processing
  • ETL / ELT Pipeline Design

Distributed Systems & Streaming

  • Kafka Event Streaming
  • Schema evolution and event versioning
  • Reliable message ingestion patterns
  • Event-driven decoupled architectures

Cloud Data Platforms

  • AWS Data Architecture
  • Data Lakes and Data Warehouses
  • Serverless data pipelines
  • Data infrastructure automation

Infrastructure & DevOps

  • Terraform (Infrastructure as Code)
  • CloudFormation
  • IAM least-privilege architecture
  • Reproducible infrastructure environments

Observability & Reliability

  • Monitoring and metrics
  • Pipeline reliability strategies
  • Retry and failure handling
  • Cost-aware architecture decisions

Tools: Python, SQL, Kafka, AWS (S3, Lambda, Redshift, Glue, Athena, DynamoDB), Terraform, CloudFormation, Grafana, CloudWatch, Redis


Selected Data Engineering Projects

Ecommerce Data Warehouse

AWS Redshift Serverless — Analytical Data Platform

Production-style data warehouse architecture designed to support analytics workloads and evolving datasets.

The system handles late-arriving events, incremental data ingestion, and analytical query optimization using star schema modeling.

Key Architecture Decisions

  • Star schema modeling for analytical performance
  • Incremental ingestion pipelines to minimize compute costs
  • Strategy for late-arriving transactional data
  • Infrastructure defined using Terraform
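
As an illustration of the incremental and late-arrival decisions above, here is a minimal sketch, not the repository's actual code. It assumes each row carries a hypothetical `event_time` (when the transaction happened) and `loaded_at` (when it landed in staging). Filtering on `loaded_at` means a late event is still picked up, and the set of affected event dates tells the pipeline which partitions to rebuild.

```python
from datetime import datetime


def plan_incremental_load(rows, last_watermark):
    """Pick rows loaded since the last run and the event dates they touch.

    `rows` is a list of dicts with hypothetical keys `event_time` and
    `loaded_at`; both are datetime objects.
    """
    # Filter on load time, not event time, so late events are never skipped.
    new_rows = [r for r in rows if r["loaded_at"] > last_watermark]
    # Late-arriving rows pull older event dates into the set to rebuild.
    affected_dates = sorted({r["event_time"].date() for r in new_rows})
    # Advance the watermark only as far as the data actually seen.
    next_watermark = max(
        (r["loaded_at"] for r in new_rows), default=last_watermark
    )
    return new_rows, affected_dates, next_watermark
```

Only the affected dates need to be recomputed on each run, which is one way to keep compute costs proportional to the volume of new data.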

Architecture Highlights

  • AWS Redshift Serverless
  • Incremental batch pipelines
  • Fact and dimension table modeling
  • Reproducible infrastructure with IaC
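
To make the fact and dimension split concrete, the following sketch separates flat order records into one dimension and one fact table. The field names (`customer_email`, `country`, `order_id`, `amount`) and the surrogate key scheme are assumptions for illustration only; the warehouse itself models this in SQL.

```python
def build_star(orders):
    """Split flat order records into a customer dimension and an order fact.

    All field names here are hypothetical examples.
    """
    dim_customer = {}  # natural key (email) -> dimension row
    fact_orders = []
    for order in orders:
        natural_key = order["customer_email"]
        if natural_key not in dim_customer:
            # Assign a surrogate key the first time a customer is seen.
            dim_customer[natural_key] = {
                "customer_sk": len(dim_customer) + 1,
                "email": natural_key,
                "country": order["country"],
            }
        # The fact row keeps measures plus a foreign key to the dimension.
        fact_orders.append(
            {
                "order_id": order["order_id"],
                "customer_sk": dim_customer[natural_key]["customer_sk"],
                "amount": order["amount"],
            }
        )
    return list(dim_customer.values()), fact_orders
```

Descriptive attributes live once in the dimension, so the fact table stays narrow and joins on a compact integer key.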

Tech Stack

  • AWS Redshift
  • S3
  • Terraform
  • SQL
  • Data Modeling

Repository
https://github.com/tuni56/ecommerce-data-warehouse-redshift


Serverless Data Lake Platform

AWS-native Analytics Data Lake

Designed a serverless data lake architecture separating raw and curated datasets for scalable analytics workloads.

The system uses columnar storage and automated metadata discovery to enable efficient querying.

Key Architecture Decisions

  • Serverless-first architecture to eliminate idle compute
  • Columnar data storage (Parquet)
  • Automated schema discovery with Glue Crawlers
  • Metadata catalog for discoverability
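
The raw and curated separation can be pictured as a key layout in S3. The sketch below is one possible convention, not the layout used in the repository: the prefixes, dataset names, and file naming are assumptions. It uses Hive-style `key=value` path segments, which Glue Crawlers and Athena can interpret as partition columns.

```python
from datetime import datetime


def raw_key(source, ingested_at, filename):
    """Key for an untouched source file, partitioned by ingestion date."""
    return f"raw/{source}/ingest_date={ingested_at:%Y-%m-%d}/{filename}"


def curated_key(dataset, event_time, part):
    """Key for a cleaned Parquet file, partitioned by event date."""
    return (
        f"curated/{dataset}/"
        f"year={event_time:%Y}/month={event_time:%m}/day={event_time:%d}/"
        f"part-{part:05d}.parquet"
    )


# Example with hypothetical names:
# raw_key("orders_api", datetime(2024, 3, 7), "dump.json")
# curated_key("orders", datetime(2024, 3, 7), 0)
```

Keeping raw files immutable under their own prefix means curated datasets can be rebuilt from source at any time if a transformation changes.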

Architecture Highlights

  • Multi-layer S3 data lake
  • AWS Glue catalog and ETL
  • Athena for serverless querying
  • Infrastructure automation

Tech Stack

  • AWS S3
  • AWS Glue
  • Athena
  • Python
  • CloudFormation

Repository
https://github.com/tuni56/serverless-aws-data-lake-with-kiro


Real-Time Event-Driven Data Pipeline

Kafka Streaming Architecture

Real-time event ingestion and processing pipeline designed to handle high-velocity event streams while maintaining reliability and observability.

The architecture demonstrates event-driven data ingestion patterns used in distributed systems and modern data platforms.

Key Architecture Decisions

  • Event-driven decoupling using Kafka
  • Schema evolution using Schema Registry
  • Monitoring-first system design
  • Consumer reliability strategies
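
Schema evolution on the consumer side can be sketched as a chain of "upcasters" that bring old event versions up to the latest shape. This is a plain-Python illustration of the idea and does not use the Schema Registry API; the `schema_version` field, the event fields, and the version history are all hypothetical.

```python
LATEST_VERSION = 3


def _v1_to_v2(event):
    # Hypothetical change: v2 added a currency field with a default.
    upgraded = dict(event)
    upgraded.setdefault("currency", "USD")
    upgraded["schema_version"] = 2
    return upgraded


def _v2_to_v3(event):
    # Hypothetical change: v3 renamed `user` to `user_id`.
    upgraded = dict(event)
    upgraded["user_id"] = upgraded.pop("user")
    upgraded["schema_version"] = 3
    return upgraded


UPCASTERS = {1: _v1_to_v2, 2: _v2_to_v3}


def upcast(event):
    """Return a copy of `event` upgraded step by step to LATEST_VERSION."""
    version = event["schema_version"]
    if version > LATEST_VERSION or (version < LATEST_VERSION and version not in UPCASTERS):
        raise ValueError(f"unsupported schema_version: {version}")
    while event["schema_version"] < LATEST_VERSION:
        event = UPCASTERS[event["schema_version"]](event)
    return dict(event)
```

Downstream code then only has to understand the latest version, while producers can be upgraded on their own schedule.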

Architecture Highlights

  • Streaming ingestion pipeline
  • Event routing and processing
  • Operational monitoring dashboards
  • Resilient message processing
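
One common pattern behind resilient message processing is bounded retries with exponential backoff, followed by a dead-letter store for messages that keep failing. The sketch below shows that pattern in isolation, independent of any Kafka client; the function name, parameters, and the in-memory `dead_letters` list are assumptions for illustration.

```python
import time


def process_with_retry(message, handler, dead_letters,
                       max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Run `handler(message)`, retrying with exponential backoff.

    After `max_attempts` failures the message is appended to
    `dead_letters` (a stand-in for a dead-letter topic) and None is returned.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return handler(message)
        except Exception as exc:  # broad on purpose: this is a sketch
            if attempt == max_attempts:
                dead_letters.append(
                    {"message": message, "error": repr(exc), "attempts": attempt}
                )
                return None
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
```

Parking a poison message instead of retrying forever keeps the consumer moving, and the recorded error makes the failure visible to whoever operates the pipeline.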

Tech Stack

  • Kafka
  • Python
  • Redis
  • Grafana
  • Terraform

Repository
https://github.com/tuni56/real-time-event-driven-data-pipeline


IoT Data Architecture on AWS

Scalable Sensor Data Ingestion

Architecture designed to ingest and store multi-year IoT datasets while maintaining cost efficiency and long-term queryability.

Key Architecture Decisions

  • Storage lifecycle optimization
  • Serverless ingestion architecture
  • Queryable historical data storage
  • Long-term data retention strategy
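
Storage lifecycle optimization on S3 is usually expressed as lifecycle rules that move aging objects to cheaper storage classes. The sketch below only builds the rule document, in the shape accepted by the S3 lifecycle configuration API; the prefix, day thresholds, and choice of storage classes are assumptions, not values taken from the repository.

```python
def build_lifecycle_rules(prefix, ia_after_days=30, archive_after_days=180,
                          expire_after_days=None):
    """Build an S3 lifecycle configuration dict for one key prefix.

    Thresholds and storage classes are illustrative defaults.
    """
    rule = {
        "ID": "tier-" + prefix.strip("/").replace("/", "-"),
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [
            # Recent data stays in STANDARD, then moves to infrequent access.
            {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
            # Older data moves to an archive tier.
            {"Days": archive_after_days, "StorageClass": "GLACIER_IR"},
        ],
    }
    if expire_after_days is not None:
        # Optional hard retention limit.
        rule["Expiration"] = {"Days": expire_after_days}
    return {"Rules": [rule]}
```

The resulting dict could be passed as the `LifecycleConfiguration` argument of boto3's `put_bucket_lifecycle_configuration`, or translated into the equivalent infrastructure-as-code resource.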

Focus Areas

  • Scalability
  • Cost optimization
  • Long-term data management

Tech Stack

  • AWS Serverless
  • Kafka
  • Data Lake Architecture

Repository
https://github.com/tuni56/iot-data-architecture-aws


AWS Serverless Cost Dashboard

Operational Data Pipeline for Cloud Cost Monitoring

Designed an automated cost monitoring data pipeline to ingest AWS Cost & Usage Reports and generate operational insights.

Highlights

  • Automated cost data ingestion
  • Event-driven processing architecture
  • Operational monitoring mindset
  • Near real-time cost visibility
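
The core of a cost pipeline like this is an aggregation step over report rows. The sketch below sums cost per service from CSV text and flags services above a budget. It assumes the report is a CSV with `lineItem/ProductCode` and `lineItem/UnblendedCost` columns; the function names and budget values are hypothetical and not taken from the repository.

```python
import csv
import io
from collections import defaultdict


def cost_by_service(csv_text):
    """Sum unblended cost per service code from report CSV text."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["lineItem/ProductCode"]] += float(row["lineItem/UnblendedCost"])
    return dict(totals)


def services_over_budget(totals, budgets):
    """Return the services whose total exceeds their budget, sorted by name.

    Services without a budget entry are never flagged.
    """
    return sorted(
        service
        for service, cost in totals.items()
        if cost > budgets.get(service, float("inf"))
    )
```

In an event-driven setup, a function like `services_over_budget` would run when a new report file lands, and its result would drive the alert.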

Tech Stack

  • AWS Lambda
  • S3
  • CloudWatch
  • SNS
  • Python

Repository
https://github.com/tuni56/AWS-Cost-Dashboard-Serverless-


Engineering Background

Before focusing fully on data engineering, I worked with distributed backend systems, including:

  • Java
  • Spring Boot
  • Microservices
  • Messaging architectures

This background shapes how I design data platforms: as production systems rather than isolated pipelines.


Current Focus

I am currently focused on:

  • AWS-native Data Platform Architecture
  • Event-driven Data Systems
  • Infrastructure as Code for Data Platforms
  • Observability-driven pipeline design

Actively pursuing Data Engineer / Data Platform Engineer roles in teams building modern data infrastructure.


Location

Argentina (GMT-3)

Open to:

  • Remote roles
  • Relocation to the United States

US Citizen


Contact

LinkedIn
https://www.linkedin.com/in/rociobaigorria/

Email
rociomnbaigorria@gmail.com


Engineering Philosophy

Data systems are not just pipelines.

They are living distributed systems that must handle:

  • failures
  • scale changes
  • operational pressure
  • human operators

My goal is to build data platforms where information flows reliably and teams can make decisions with confidence.



Pinned

  1. ecommerce-streaming-data-platform (Public)

    Real-time ecommerce streaming data platform using Kafka, AWS Route 53 routing, event-driven architecture, and observability with Grafana.

    Python

  2. serverless-aws-data-lake-with-kiro (Public)

    Cost-optimized serverless AWS data lake using S3, Glue, Athena, CloudFormation, and Kiro. Raw/curated architecture, Parquet, automated crawlers, and zero-idle compute.

    Python

  3. iot-data-architecture-aws (Public)

    Cost-effective AWS architecture for ingesting, storing, and querying 5 years of IoT sensor data using a serverless data lake approach.

  4. AWS-Cost-Dashboard-Serverless- (Public)

    AWS Cost Dashboard Serverless

    Python

  5. real-time-event-driven-data-pipeline (Public)

    Real-time event streaming pipeline with Kafka, Schema Registry, Kafka Streams, and production monitoring. Demonstrates advanced data engineering patterns at scale.

    Java

  6. datalake-analytics-pipeline (Public)

    Python