Hi, I'm
MS Information Systems @ Santa Clara University · Sunnyvale, CA
7+ years of engineering experience · Dean's List & Merit Scholar
I'm a data-driven technologist with 7+ years of experience in the software industry and a recent graduate of Santa Clara University's Master's program in Information Systems. I pair a strong technical foundation with a growing focus on Data Analytics and Data Engineering, and I'm interested in building data-focused solutions that deliver meaningful insights.
I have hands-on experience with Python, SQL, Spark, Kafka, Snowflake, BigQuery, AWS, GCP, Tableau, and Looker Studio, along with project work in ETL pipelines, analytics, dashboards, and machine learning. I enjoy transforming raw data into clear, actionable insights that support smarter business decisions.
I'm also excited about the growing impact of Generative AI in data workflows, especially in areas like intelligent automation, NLP, and AI-assisted analytics. I'm motivated by roles where I can combine analytical thinking, engineering skills, and curiosity to create real business value.
Live web apps, data engineering pipelines, analytics dashboards, and machine learning systems
Production-grade Snowflake + Apache Iceberg data pipeline using a medallion architecture on Azure. Automates daily ingestion of 10K+ logistics records with dbt-powered transformations, Streamlit dashboards, and Cortex AI analytics.
Processed 19M+ flight records stored as Parquet files in Amazon S3 using PySpark and Spark SQL. Built a star-schema data warehouse in BigQuery with OLAP queries and automated incremental ETL updates. Interactive Tableau dashboards visualize routes, seasonality, delays, and KPIs for executive, operational, and marketing reporting.
Full-stack web app mapping global historical events to movie recommendations. Uses TF-IDF & CountVectorizer for content-based ML recommendations, Kafka for real-time streaming, Airflow for weekly TMDB data refreshes, MySQL for persistence, and a Node.js/EJS UI. Integrated an Ollama-powered AI chatbot (Mistral LLM) with SSE streaming for natural-language movie search, event timeline exploration, and conversational event creation. Fully containerized with Docker.
End-to-end real-time streaming pipeline using a Producer-Consumer architecture. Fetches live crypto prices from CoinCap API, publishes to Apache Kafka topics, persists to MongoDB, and visualizes trends in a Streamlit dashboard. Fully containerized with Docker Compose for one-command deployment.
Spark-based data pipeline for analyzing Indian Premier League cricket statistics on Databricks. Ingests data from Amazon S3, applies Spark transformation logic, executes SQL-based analytics, and generates visualizations from multi-season IPL data.
Academic research project under Prof. Manoochehr Ghiassi at SCU. Automated EV market news collection using Selenium, engineered a 68-feature NLP dataset, and trained an SVM model achieving ~72% accuracy on sentiment classification to predict EV market trends.
Built and trained a neural network to classify handwritten digits on the MNIST dataset (70,000 samples). Explored network architecture, activation functions, and hyperparameter tuning to maximize classification accuracy.
Interactive Tableau dashboard analyzing customer churn metrics. Identifies high-risk customer segments, key churn drivers, and retention opportunities through visual analytics and data storytelling for business stakeholders.
End-to-end sales analytics dashboard in Tableau. Surfaces revenue trends, top-performing products, regional breakdowns, and period-over-period KPIs for executive and business analyst audiences.
NLP-powered sentiment analysis on employee feedback to measure satisfaction and identify workplace sentiment patterns. Classifies text into positive, negative, and neutral categories to support data-driven HR decisions.
Binary classification model using decision trees to distinguish edible from poisonous mushrooms. Explores feature importance, tree depth tuning, and model evaluation for interpretable, explainable ML.
EDA on global billionaire wealth distribution. Analyzes geographic concentration, industry breakdown, net worth trends, and demographic patterns across the world's wealthiest individuals using Python data visualization.
Coursework: Data Analytics - Python · Object-Oriented Programming · Database Management System · Big Data Modeling & Analytics · Business Intelligence & Data Warehouse · Natural Language Processing · Dashboards with Tableau · Applied Cloud Computing
I'm actively seeking Data Analytics and Data Engineering roles. If you have an opportunity or just want to connect, I'd love to hear from you!