Senior Data Engineer | Real-Time & Batch Data Architect
- Passionate about building scalable data pipelines using Spark, Kafka, and modern data platforms.
- Experienced in both cloud-native architectures (AWS, Azure) and on-premises systems.
- Strong believer in the power of clean code, observability, and data quality.
- Design and develop ETL and ELT pipelines (batch + streaming).
- Implement data lakehouse architectures with Bronze / Silver / Gold layers (sketch below).
- Build real-time processing systems using Spark Structured Streaming & Kafka (sketch below).
- Automate workflows and scheduling with Airflow (sketch below).
- Optimize analytics databases (e.g., ClickHouse) for fast query performance (sketch below).
- Create CI/CD pipelines for data solutions (using GitHub Actions or similar).
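
A minimal sketch of the Bronze → Silver hop in a Delta lakehouse. The storage paths and columns (`event_id`, `event_ts`) are illustrative assumptions, not from a specific project, and it presumes the `delta-spark` package is installed:

```python
# Bronze -> Silver: promote raw events into a cleaned, deduplicated table.
# Paths and column names are placeholders, not a real project layout.
from pyspark.sql import SparkSession, functions as F

spark = (
    SparkSession.builder
    .appName("bronze-to-silver")
    # Enable Delta Lake (assumes delta-spark is on the classpath).
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

# Bronze: raw ingested events, stored as they arrived.
bronze = spark.read.format("delta").load("s3://lake/bronze/events")

# Silver: deduplicated, typed, and null-filtered records.
silver = (
    bronze
    .dropDuplicates(["event_id"])
    .withColumn("event_ts", F.to_timestamp("event_ts"))
    .filter(F.col("event_ts").isNotNull())
)

silver.write.format("delta").mode("overwrite").save("s3://lake/silver/events")
```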
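A minimal sketch of a Kafka → Spark Structured Streaming consumer with a basic schema check. The broker address, topic, schema, and console sink are placeholder assumptions, and it assumes the `spark-sql-kafka` connector is available:

```python
# Read a Kafka topic as a stream, parse JSON payloads, drop malformed rows.
# Broker address, topic name, and schema are illustrative placeholders.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, LongType

spark = SparkSession.builder.appName("kafka-stream").getOrCreate()

schema = StructType([
    StructField("event_id", StringType()),
    StructField("user_id", StringType()),
    StructField("ts", LongType()),
])

raw = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "events")
    .load()
)

# from_json yields nulls for malformed records; filtering them out is a
# simple stand-in for a fuller data quality check.
events = (
    raw.select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
       .select("e.*")
       .filter(F.col("event_id").isNotNull())
)

(
    events.writeStream
    .format("console")  # placeholder sink; a Delta table in practice
    .outputMode("append")
    .option("checkpointLocation", "/tmp/checkpoints/events")
    .start()
    .awaitTermination()
)
```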
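A minimal Airflow DAG sketch for the ingest → transform pattern. The DAG id, schedule, and task bodies are illustrative, written against the Airflow 2.x API:

```python
# Daily ingest -> transform DAG; task bodies are trivial placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def ingest():
    print("pull raw data from the source system")


def transform():
    print("clean the data and load it into the warehouse")


with DAG(
    dag_id="example_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # Airflow 2.4+; older versions use schedule_interval
    catchup=False,
) as dag:
    ingest_task = PythonOperator(task_id="ingest", python_callable=ingest)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)

    ingest_task >> transform_task
```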
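A minimal sketch of the ClickHouse materialized-view pattern, issued through the `clickhouse-driver` Python client. The host, table, and column names are assumptions for illustration:

```python
# Pre-aggregate events per minute so dashboards read small summary rows
# instead of scanning the raw table. Names are illustrative placeholders.
from clickhouse_driver import Client

client = Client(host="localhost")

client.execute("""
    CREATE TABLE IF NOT EXISTS events (
        event_id String,
        user_id  String,
        ts       DateTime
    ) ENGINE = MergeTree ORDER BY ts
""")

# The materialized view is populated on every insert into `events`,
# so per-minute counts are computed once at write time.
client.execute("""
    CREATE MATERIALIZED VIEW IF NOT EXISTS events_per_minute
    ENGINE = SummingMergeTree ORDER BY minute AS
    SELECT toStartOfMinute(ts) AS minute, count() AS events
    FROM events
    GROUP BY minute
""")
```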
| Domain | Technologies |
|---|---|
| Data Processing | PySpark, Spark SQL, Delta Lake |
| Streaming | Apache Kafka, Spark Structured Streaming |
| Workflow Orchestration | Apache Airflow |
| Data Storage | S3 / ADLS, Delta / Parquet |
| Analytics | ClickHouse, PostgreSQL |
| Cloud | AWS, Azure |
| CI/CD | GitHub Actions |
| Languages | Python, SQL |
Here are some of my key repositories (feel free to click and explore):
- [Real-Time Processing Pipeline] – A Kafka → Spark Streaming system with schema validation and data quality checks.
- [Lakehouse Architecture Demo] – A multi-layer (Bronze / Silver / Gold) data lakehouse built with Delta Lake.
- [Airflow Data Workflows] – End-to-end DAGs for ingestion, transformation, and orchestration.
- [Analytics in ClickHouse] – A real-time analytics setup using ClickHouse materialized views.
- [CI/CD for Data Jobs] – GitHub Actions workflows to test, build, and deploy data workloads.
- Email: omarmohammed271@gmail.com
- I love optimizing pipelines – every millisecond matters.
- Outside work: I enjoy reading about distributed systems and data infrastructure.
- Lifelong learner: currently exploring feature stores and ML data platforms.


