Hi, I'm Saransh Kumar.

Data Scientist with 4+ years' experience in ML, analytics, and end‑to‑end software. I build AI‑integrated products that ship.

Craft. Iterate. Ship. ✦

About

A quick snapshot of who I am.

I'm currently pursuing a Master's in Data Science at the University of Maryland, College Park, specializing in machine learning and big data systems. With 3+ years of professional experience across ad tech, recommenders, and computer vision, I focus on building scalable, end‑to‑end data products that solve real‑world problems.

Experience

Impact snapshots from roles where I shipped, learned, and led.

Apr 2022 – Aug 2024

Data Scientist • Samsung Research India

Shipped PySpark/Airflow ETL over 500+ TB of SmartTV data (−30% latency), cut cloud costs by $200K/mo via serverless Spark (Glue) + EKS migration, and boosted ad targeting precision by 20% for 10M+ users with real‑time inference.

PySpark Airflow AWS Kubernetes
Jul 2021 – Apr 2022

Junior Data Scientist • Wipro

Built hybrid recommenders for B2B e‑commerce (+15% engagement) and refactored evaluation to resolve 1000+ issues, cutting test runtime by 40% and enabling weekly CI releases.

Python ML Recommendation Systems CI/CD
Aug 2020 – May 2021

Research Data Science Intern • NIT Silchar

Developed automatic license plate recognition (ALPR) for non‑standard plates, reaching 94% accuracy with a MobileNet variant and a 4–7% gain under distortion and low light. Ran ablations (up to −12% without key modules), authored reproducible docs, and presented in 4 seminars.

Computer Vision MobileNet Research OpenCV

Projects

A few things I've been designing, engineering, and polishing lately.

LangChain Neo4j Streamlit

Real-time Bitcoin Analytics (LangChain & Neo4j)

Built a real-time graph analytics pipeline modeling 100K+ wallet relationships, with natural-language-to-Cypher query generation via LangChain + Mistral and an interactive Streamlit dashboard for insights.

View on GitHub →
SAC (RL) FastAPI Forecasting

ClimaSense — Smart HVAC Control

Optimized HVAC energy usage with Soft Actor-Critic in a custom simulated environment; deployed policy via FastAPI using 8-hour weather forecasts.

View on GitHub →
CNN VGG16 ODIR-5K

Ocular Disease Recognition

Classified retinal diseases from fundus images using a VGG16-based CNN with tabular features; validated salience via statistical tests.

View on GitHub →
GNB scikit-learn Python

Purchase Propensity Model

Predicted user purchase likelihood from behavioral features using Gaussian Naive Bayes, with a pipeline for preprocessing and evaluation.

View on GitHub →
K-Means PCA scikit-learn

Customer Segmentation

Clustered customer personas from marketing data using dimensionality reduction and K-Means to inform targeting and personalization.

View on GitHub →
Dlib HOG OpenCV Metric Learning

Face Recognition System

Real-time face detection and recognition using Dlib’s HOG detector and deep metric embeddings; built a CLI and camera pipeline.

View on GitHub →
Regression XGBoost scikit-learn

House Price Prediction

Modeled California house prices with feature engineering, cross-validation, and hyperparameter tuning of tree-based regressors.

View on GitHub →

Skills

A concise stack I love working with.

Data Science & Machine Learning

NumPy Pandas Scikit-learn TensorFlow PyTorch Keras OpenCV XGBoost NLP LLMs Transfer Learning Hyperparameter Tuning Regression Classification Clustering Time Series Computer Vision

Big Data & Cloud

PySpark Kafka Hadoop AWS GCP HDFS Airflow Kubernetes Snowflake ETL

Data Visualization

Matplotlib Seaborn Plotly Streamlit Tableau Power BI Excel

Web & Tools

FastAPI Flask REST APIs GraphQL React Vite Tailwind Docker Git GitHub Actions CI/CD Jupyter Notebook VS Code Linux

Languages

Python SQL Java C++ JavaScript HTML CSS Bash

Databases

MySQL PostgreSQL MongoDB Neo4j DynamoDB

Contact

Open to full‑time roles and collaborations.