Senior Data Engineer | Real-Time & Batch Data Architect
- Passionate about building scalable data pipelines using Spark, Kafka, and modern data platforms.
- Experienced in both cloud-native architectures (AWS, Azure) and on-premises systems.
- Strong believer in the power of clean code, observability, and data quality.
- Design and develop ETL and ELT pipelines (batch + streaming).
- Implement data lakehouse architectures with Bronze / Silver / Gold layers (sketch below).
- Build real-time processing systems using Spark Structured Streaming & Kafka (sketch below).
- Automate workflows and scheduling with Airflow (sketch below).
- Optimize analytics databases (e.g., ClickHouse) for fast query performance (sketch below).
- Create CI/CD pipelines for data solutions (using GitHub Actions or similar).
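
A minimal sketch of the Bronze → Silver hop in a Delta lakehouse. The storage paths and columns (`event_id`, `event_ts`) are illustrative assumptions, not from a specific project, and it presumes the `delta-spark` package is installed:

```python
# Bronze -> Silver: promote raw events into a cleaned, deduplicated table.
# Paths and column names are placeholders, not a real project layout.
from pyspark.sql import SparkSession, functions as F

spark = (
    SparkSession.builder
    .appName("bronze-to-silver")
    # Enable Delta Lake (assumes delta-spark is on the classpath).
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

# Bronze: raw ingested events, stored as they arrived.
bronze = spark.read.format("delta").load("s3://lake/bronze/events")

# Silver: deduplicated, typed, and null-filtered records.
silver = (
    bronze
    .dropDuplicates(["event_id"])
    .withColumn("event_ts", F.to_timestamp("event_ts"))
    .filter(F.col("event_ts").isNotNull())
)

silver.write.format("delta").mode("overwrite").save("s3://lake/silver/events")
```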
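A minimal sketch of a Kafka → Spark Structured Streaming consumer with a basic schema check. The broker address, topic, schema, and console sink are placeholder assumptions, and it assumes the `spark-sql-kafka` connector is available:

```python
# Read a Kafka topic as a stream, parse JSON payloads, drop malformed rows.
# Broker address, topic name, and schema are illustrative placeholders.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, LongType

spark = SparkSession.builder.appName("kafka-stream").getOrCreate()

schema = StructType([
    StructField("event_id", StringType()),
    StructField("user_id", StringType()),
    StructField("ts", LongType()),
])

raw = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "events")
    .load()
)

# from_json yields nulls for malformed records; filtering them out is a
# simple stand-in for a fuller data quality check.
events = (
    raw.select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
       .select("e.*")
       .filter(F.col("event_id").isNotNull())
)

(
    events.writeStream
    .format("console")  # placeholder sink; a Delta table in practice
    .outputMode("append")
    .option("checkpointLocation", "/tmp/checkpoints/events")
    .start()
    .awaitTermination()
)
```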
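A minimal Airflow DAG sketch for the ingest → transform pattern. The DAG id, schedule, and task bodies are illustrative, written against the Airflow 2.x API:

```python
# Daily ingest -> transform DAG; task bodies are trivial placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def ingest():
    print("pull raw data from the source system")


def transform():
    print("clean the data and load it into the warehouse")


with DAG(
    dag_id="example_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # Airflow 2.4+; older versions use schedule_interval
    catchup=False,
) as dag:
    ingest_task = PythonOperator(task_id="ingest", python_callable=ingest)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)

    ingest_task >> transform_task
```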
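A minimal sketch of the ClickHouse materialized-view pattern, issued through the `clickhouse-driver` Python client. The host, table, and column names are assumptions for illustration:

```python
# Pre-aggregate events per minute so dashboards read small summary rows
# instead of scanning the raw table. Names are illustrative placeholders.
from clickhouse_driver import Client

client = Client(host="localhost")

client.execute("""
    CREATE TABLE IF NOT EXISTS events (
        event_id String,
        user_id  String,
        ts       DateTime
    ) ENGINE = MergeTree ORDER BY ts
""")

# The materialized view is populated on every insert into `events`,
# so per-minute counts are computed once at write time.
client.execute("""
    CREATE MATERIALIZED VIEW IF NOT EXISTS events_per_minute
    ENGINE = SummingMergeTree ORDER BY minute AS
    SELECT toStartOfMinute(ts) AS minute, count() AS events
    FROM events
    GROUP BY minute
""")
```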
| Domain | Technologies |
|---|---|
| Data Processing | PySpark, Spark SQL, Delta Lake |
| Streaming | Apache Kafka, Spark Structured Streaming |
| Workflow Orchestration | Apache Airflow |
| Data Storage | S3 / ADLS, Delta / Parquet |
| Analytics | ClickHouse, PostgreSQL |
| Cloud | AWS, Azure |
| CI/CD | GitHub Actions |
| Languages | Python, SQL |
Here are some of my key repositories (feel free to click and explore):
- [Real-Time Processing Pipeline] – A Kafka → Spark Streaming system with schema validation and data quality checks.
- [Lakehouse Architecture Demo] – A multi-layer (Bronze / Silver / Gold) data lakehouse built with Delta Lake.
- [Airflow Data Workflows] – End-to-end DAGs for ingestion, transformation, and orchestration.
- [Analytics in ClickHouse] – A real-time analytics setup using ClickHouse materialized views.
- [CI/CD for Data Jobs] – GitHub Actions workflows to test, build, and deploy data workloads.
- Email: omarmohammed271@gmail.com
- I love optimizing pipelines – every millisecond matters.
- Outside work: I enjoy reading about distributed systems and data infrastructure.
- Lifelong learner: currently exploring feature stores and ML data platforms.


