Real-time Bitcoin Analytics (LangChain & Neo4j)
Built a real-time graph analytics pipeline modeling 100K+ wallet relationships, with natural-language-to-Cypher query generation via LangChain + Mistral and an interactive Streamlit dashboard for exploring insights.
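Natural-language-to-Cypher generation generally works by grounding the LLM in the graph schema. The node labels, relationship types, and properties go into the prompt alongside the user's question, and the model's reply is cleaned down to a bare Cypher statement before it runs. The sketch below illustrates only that prompt-and-parse step in plain Python, under stated assumptions: the `Wallet` label, the `TRANSFERRED_TO` relationship, their properties, and both function names are hypothetical, and no LangChain or Mistral API is called.

```python
import re

# Hypothetical wallet-graph schema; the project's real labels and properties may differ.
SCHEMA = {
    "nodes": {"Wallet": ["address", "first_seen"]},
    "relationships": {"TRANSFERRED_TO": ("Wallet", "Wallet", ["amount_btc", "timestamp"])},
}


def build_cypher_prompt(schema: dict, question: str) -> str:
    """Render the schema and the question into a single text-to-Cypher prompt (hypothetical helper)."""
    lines = ["You translate questions into Cypher for a Neo4j graph.", "Schema:"]
    for label, props in schema["nodes"].items():
        lines.append(f"  (:{label} {{{', '.join(props)}}})")
    for rel, (src, dst, props) in schema["relationships"].items():
        lines.append(f"  (:{src})-[:{rel} {{{', '.join(props)}}}]->(:{dst})")
    lines += [
        "Use only the labels, relationship types, and properties above.",
        "Return a single read-only Cypher query and nothing else.",
        f"Question: {question}",
        "Cypher:",
    ]
    return "\n".join(lines)


def extract_cypher(reply: str) -> str:
    """Strip the optional Markdown code fence an LLM may wrap around its query (hypothetical helper)."""
    match = re.search(r"`{3}(?:cypher)?\s*(.*?)`{3}", reply, re.DOTALL | re.IGNORECASE)
    return (match.group(1) if match else reply).strip()
```

In a full pipeline, the prompt would be sent to Mistral (for example through LangChain) and `extract_cypher` applied to the reply before the query reaches Neo4j; those calls are left out here rather than guessed at.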
Data Scientist with 4+ years' experience in ML, analytics, and end‑to‑end software. I build AI‑integrated products that ship.
Craft. Iterate. Ship. ✦
A quick snapshot of who I am.
I'm currently pursuing a Master's in Data Science at the University of Maryland, College Park, specializing in machine learning and big data systems. With 3+ years of professional experience across ad tech, recommenders, and computer vision, I focus on building scalable, end‑to‑end data products that solve real‑world problems.
Impact snapshots from roles where I shipped, learned, and led.
Shipped PySpark/Airflow ETL pipelines over 500+ TB of SmartTV data (−30% latency), cut cloud costs by $200K/month by migrating to serverless Spark (AWS Glue) and EKS, and boosted ad-targeting precision by 20% for 10M+ users with real‑time inference.
Built hybrid recommenders for B2B e‑commerce (+15% engagement) and refactored the evaluation code, resolving 1000+ issues, cutting test runtime by 40%, and enabling weekly CI releases.
Developed ALPR (automatic license plate recognition) for non‑standard plates, reaching 94% accuracy with a MobileNet variant and gaining 4–7% under distortion and low light. Ran ablations (up to −12% when key modules were removed), authored reproducible documentation, and presented the work at 4 seminars.
A few things I've been designing, engineering, and polishing lately.
Optimized HVAC energy usage with Soft Actor-Critic in a custom simulated environment; deployed the policy via FastAPI using 8-hour weather forecasts.
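For reference, Soft Actor-Critic trains twin critics against an entropy-regularized Bellman target and updates the policy to maximize soft Q-value. The equations below are the standard twin-critic SAC formulation, not necessarily the exact variant used in this project: alpha is the entropy temperature, gamma the discount factor, d the episode-done flag, D the replay buffer, phi-bar the slowly updated target-critic weights, and the sampled action a-tilde is reparameterized so gradients flow through the policy. How the project defines the reward r for HVAC energy use is not stated here.

```latex
\begin{aligned}
y &= r + \gamma (1 - d)\Big(\min_{j=1,2} Q_{\bar\phi_j}(s', \tilde a') - \alpha \log \pi_\theta(\tilde a' \mid s')\Big),
  \quad \tilde a' \sim \pi_\theta(\cdot \mid s') \\
L(\phi_i) &= \mathbb{E}_{(s,a,r,s',d) \sim \mathcal{D}}\big[(Q_{\phi_i}(s,a) - y)^2\big], \quad i = 1, 2 \\
J(\theta) &= \mathbb{E}_{s \sim \mathcal{D},\, \tilde a \sim \pi_\theta(\cdot \mid s)}
  \Big[\min_{j=1,2} Q_{\phi_j}(s, \tilde a) - \alpha \log \pi_\theta(\tilde a \mid s)\Big]
\end{aligned}
```

Critics minimize L with the target y held fixed, and the policy maximizes J; a larger alpha pushes the controller toward more exploratory actions.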
Classified retinal diseases from fundus images using a VGG16-based CNN combined with tabular features; validated salience with statistical tests.
Predicted user purchase likelihood from behavioral features using Gaussian Naive Bayes, with a pipeline for preprocessing and evaluation.
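Gaussian Naive Bayes is compact enough to sketch without libraries. For each class it keeps a prior plus a per-feature mean and variance, and it scores a row by the log prior plus the sum of per-feature Gaussian log-likelihoods (the "naive" independence assumption). The hypothetical sketch below also standardizes features using training-set statistics and reports accuracy, mirroring the preprocess-then-evaluate pipeline described above; the toy data, the `eps` variance floor, and all names are assumptions, and the project itself may well have used a library implementation.

```python
import math
from collections import defaultdict


def standardize_fit(X):
    """Per-feature mean and std computed on training rows only, to avoid test-set leakage."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    # Fall back to 1.0 for constant features so standardization never divides by zero.
    stds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / n) or 1.0 for j in range(d)]
    return means, stds


def standardize_apply(X, means, stds):
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]


class GaussianNB:
    def __init__(self, eps=1e-9):
        self.eps = eps  # small variance floor (illustrative value)

    def fit(self, X, y):
        by_class = defaultdict(list)
        for row, label in zip(X, y):
            by_class[label].append(row)
        self.params = {}
        for label, rows in by_class.items():
            n = len(rows)
            cols = list(zip(*rows))
            means = [sum(c) / n for c in cols]
            variances = [sum((v - m) ** 2 for v in c) / n + self.eps for c, m in zip(cols, means)]
            # Store the log prior P(class) alongside the per-feature Gaussian parameters.
            self.params[label] = (math.log(n / len(X)), means, variances)
        return self

    def _log_joint(self, row, label):
        """log P(class) + sum over features of log N(x_j; mean_j, var_j)."""
        log_prior, means, variances = self.params[label]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
            for x, m, var in zip(row, means, variances)
        )

    def predict_proba(self, row):
        """Posterior P(class | row) via a numerically stable softmax over log joints."""
        logs = {label: self._log_joint(row, label) for label in self.params}
        top = max(logs.values())
        weights = {label: math.exp(v - top) for label, v in logs.items()}
        total = sum(weights.values())
        return {label: w / total for label, w in weights.items()}

    def predict(self, X):
        return [max(self.params, key=lambda label: self._log_joint(row, label)) for row in X]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

`predict_proba` returns a posterior over classes, which is the natural output when the goal is a purchase likelihood rather than a hard yes/no label.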
Clustered customer personas from marketing data using dimensionality reduction and K-Means to inform targeting and personalization.
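The clustering step can be illustrated with Lloyd's algorithm for K-Means, which alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points until assignments stop changing. This plain-Python sketch assumes the dimensionality reduction (for example PCA) and feature scaling have already been applied; the seeded random initialization, iteration cap, and names are illustrative assumptions rather than the project's settings.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm on already reduced and scaled features; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Initialize centroids at k data points sampled without replacement.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, labels
```

Each resulting cluster is a candidate persona, profiled by inspecting its members' original marketing features. In practice a library such as scikit-learn (`sklearn.decomposition.PCA`, `sklearn.cluster.KMeans`) would typically replace this hand-rolled loop.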
Real-time face detection and recognition using Dlib’s HOG detector and deep metric embeddings; built a CLI and camera pipeline.
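With deep metric embeddings, recognition usually reduces to nearest-neighbor search: each detected face is mapped to a fixed-length vector (128 dimensions for dlib's ResNet face-recognition model), and a query takes the identity of the closest known embedding only if the Euclidean distance falls under a threshold. The sketch below shows just that matching step in plain Python; the 0.6 default mirrors the threshold dlib suggests for its model, while the function name and the toy 3-dimensional vectors in the usage are illustrative assumptions.

```python
import math


def match_face(query, known, threshold=0.6):
    """Return (name, distance) for the closest known embedding, or (None, distance) if none is close enough."""
    best_name, best_dist = None, math.inf
    for name, embedding in known.items():
        dist = math.dist(query, embedding)  # Euclidean distance (Python 3.8+)
        if dist < best_dist:
            best_name, best_dist = name, dist
    if best_dist < threshold:
        return best_name, best_dist
    return None, best_dist  # treat as an unknown face
```

Lowering the threshold trades missed matches for fewer false identifications, so it is worth tuning on held-out faces from the target camera.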
Modeled California house prices with feature engineering, cross-validation, and hyperparameter tuning of tree-based regressors.
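Cross-validated hyperparameter tuning follows a simple loop: split the data into k folds, fit a model for each candidate setting on k−1 folds, score it on the held-out fold, and keep the setting with the lowest mean error. In this plain-Python sketch a deliberately tiny regression tree with a `max_depth` hyperparameter stands in for the project's tree-based regressors; the model, the depth grid, the MSE metric, and all names are assumptions, since the original models and search space are not stated.

```python
from statistics import mean


def _sse(values):
    """Sum of squared errors around the mean (the CART regression split criterion)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def fit_tree(X, y, max_depth):
    """Greedy regression tree with midpoint thresholds; returns nested dicts."""
    if max_depth == 0 or len(set(y)) == 1:
        return {"value": mean(y)}
    best = None  # (sse, feature index, threshold)
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        for t in ((a + b) / 2 for a, b in zip(values, values[1:])):
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            score = _sse(left) + _sse(right)
            if best is None or score < best[0]:
                best = (score, j, t)
    if best is None:  # every row identical: nothing to split on
        return {"value": mean(y)}
    _, j, t = best
    left_idx = [i for i, row in enumerate(X) if row[j] <= t]
    right_idx = [i for i, row in enumerate(X) if row[j] > t]
    return {
        "feature": j,
        "threshold": t,
        "left": fit_tree([X[i] for i in left_idx], [y[i] for i in left_idx], max_depth - 1),
        "right": fit_tree([X[i] for i in right_idx], [y[i] for i in right_idx], max_depth - 1),
    }


def predict_tree(node, row):
    while "value" not in node:
        node = node["left"] if row[node["feature"]] <= node["threshold"] else node["right"]
    return node["value"]


def cv_mse(X, y, max_depth, k=5):
    """Mean squared error over k round-robin folds; requires len(X) >= k."""
    fold_errors = []
    for f in range(k):
        test = [i for i in range(len(X)) if i % k == f]
        train = [i for i in range(len(X)) if i % k != f]
        tree = fit_tree([X[i] for i in train], [y[i] for i in train], max_depth)
        fold_errors.append(mean((predict_tree(tree, X[i]) - y[i]) ** 2 for i in test))
    return mean(fold_errors)


def grid_search(X, y, depths=(1, 2, 3, 4), k=5):
    """Pick the max_depth with the lowest cross-validated MSE; ties keep the first depth listed."""
    scores = {d: cv_mse(X, y, d, k) for d in depths}
    return min(scores, key=scores.get), scores
```

Round-robin folds keep the toy example deterministic; real data should be shuffled first, and scikit-learn provides equivalent utilities (`KFold`, `GridSearchCV`) for production use.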
A concise stack I love working with.