Samiksha Khare

Hi, I'm Samiksha Khare

MS Information Systems @ Santa Clara University  ·  Sunnyvale, CA
7+ years of engineering experience  ·  Dean's List & Merit Scholar

About Me

I'm a data-driven technologist with 7+ years of experience in the software industry and a recent Master's graduate in Information Systems from Santa Clara University. I pair a strong technical foundation with a growing focus on Data Analytics and Data Engineering, and I'm interested in building data-focused solutions that deliver meaningful insights.

I have hands-on experience with Python, SQL, Spark, Kafka, Snowflake, BigQuery, AWS, GCP, Tableau, and Looker Studio, along with project work in ETL pipelines, analytics, dashboards, and machine learning. I enjoy transforming raw data into clear, actionable insights that support smarter business decisions.

I'm also excited about the growing impact of Generative AI in data workflows, especially in areas like intelligent automation, NLP, and AI-assisted analytics. I'm motivated by roles where I can combine analytical thinking, engineering skills, and curiosity to create real business value.

Sunnyvale, CA · Open to Work · Merit Scholar
7+ Years of Engineering Experience
19M+ Flight Records Analyzed
12+ Data & ML Projects
Dean's List Honoree

Experience

Data Engineer Intern

Dassault Systèmes · Santa Clara, CA
June 2025 – Present
  • Built SQL-based data extraction and transformation queries on ERP systems to support manufacturing and supply chain reporting pipelines, reducing report preparation time by 30% and improving data accuracy for stakeholders through Tableau dashboards
  • Validated data pipelines by identifying schema inconsistencies, null values, and duplicate records across distributed data sources, improving overall data accuracy and downstream analytical reliability
SQL · ERP Systems · Tableau · Data Pipelines · Agile
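
The validation work in the second bullet above (schema inconsistencies, null values, duplicate records) can be illustrated with a minimal, standard-library sketch. This is not the production code: the field names in `EXPECTED_SCHEMA`, the `validate_records` helper, and the sample rows are all hypothetical and chosen only to show the three kinds of checks.

```python
from collections import Counter

# Hypothetical expected schema: field name -> required Python type.
EXPECTED_SCHEMA = {"order_id": int, "part_no": str, "quantity": int}


def validate_records(records, schema=EXPECTED_SCHEMA, key="order_id"):
    """Return a report of schema mismatches, null values, and duplicate keys."""
    report = {"schema_errors": [], "null_fields": [], "duplicate_keys": []}
    for index, record in enumerate(records):
        for field, expected_type in schema.items():
            if field not in record:
                report["schema_errors"].append((index, field, "missing"))
            elif record[field] is None:
                report["null_fields"].append((index, field))
            elif not isinstance(record[field], expected_type):
                report["schema_errors"].append((index, field, "wrong type"))
    # A key value that appears more than once marks a duplicate record.
    key_counts = Counter(r.get(key) for r in records if r.get(key) is not None)
    report["duplicate_keys"] = sorted(k for k, n in key_counts.items() if n > 1)
    return report


# Hypothetical sample rows, one per kind of problem.
sample = [
    {"order_id": 1, "part_no": "A-100", "quantity": 5},
    {"order_id": 1, "part_no": "A-100", "quantity": 5},    # duplicate key
    {"order_id": 2, "part_no": None, "quantity": 3},       # null value
    {"order_id": 3, "part_no": "B-200", "quantity": "7"},  # wrong type
    {"order_id": 4, "quantity": 1},                        # missing field
]
report = validate_records(sample)
print(report)
```

Returning a report instead of raising on the first problem keeps every issue visible in one pass, which suits a reporting pipeline where all bad rows need to be reviewed together.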

Data Engineer Intern

X, The Moonshot Factory (Alphabet Inc.) · Mountain View, CA
Jan – June 2025
  • Built a scalable GCP ETL pipeline using Apache Spark to ingest and process 50M+ multi-source records (APIs, CSV, NetCDF) into BigQuery, reducing manual data preparation by 40% and improving analytics readiness for real-time stakeholder consumption
  • Developed Looker Studio dashboards to surface key ocean data KPIs via SQL OLAP queries, accelerating stakeholder analysis by 30% and facilitating data-driven decisions for sustainability initiatives
GCP · Apache Spark · BigQuery · Python · Looker Studio · ETL

Research Assistant

Santa Clara University — Prof. Manoochehr Ghiassi
2024
  • Conducted sentiment-driven predictive analytics research on the electric vehicle (EV) market
  • Automated EV market news collection using Selenium; engineered a 68-feature NLP dataset
  • Trained an SVM baseline model achieving ~72% accuracy on sentiment classification
  • Contributed to academic research pipeline spanning data collection, feature engineering, and model evaluation
NLP · scikit-learn · SVM · Selenium · Python · Feature Engineering

Member of Technical Staff — Data Engineer

TIBCO Software · Pune, India
Sept 2019 – Sept 2022
  • Built high-throughput data pipelines for enterprise ModelOps workflows using medallion architecture on Kafka and JDBC channels, processing millions of records with data quality checks and transformations, reducing manual data preparation by 40%
  • Validated data flow, schema consistency, and integrity across Kafka, REST, JDBC, and file-based ingestion channels, ensuring end-to-end reliability of distributed data pipelines at production scale
  • Deployed scalable ModelOps microservices on AWS Elastic Kubernetes Service (EKS)
Apache Kafka · AWS EKS · Python · JDBC · Medallion Architecture · Data Pipelines

Data Analyst

Cognizant Technology Solutions · Pune, India
May 2015 – Sept 2019
  • Designed Power BI dashboards to visualize business KPIs and operational metrics, enabling stakeholders to track trends and make data-driven decisions across enterprise reporting workflows
  • Authored and optimized SQL queries to extract, cleanse, and aggregate large datasets from relational databases, supporting recurring business reporting and ad hoc analysis for cross-functional teams
Power BI · SQL · Data Analytics · Business Reporting · Relational Databases

Projects

Live web apps, data engineering pipelines, analytics dashboards, and machine learning systems

Flight Data Analysis & BI

Processed 19M+ Parquet flight records from Amazon S3 using PySpark and SparkSQL. Built a star-schema data warehouse in BigQuery with OLAP queries and automated incremental ETL updates. Interactive Tableau dashboards visualize routes, seasonality, delays, and KPIs for executive, operational, and marketing reporting.

19M+ records from Amazon S3
Star schema DWH · OLAP analysis
Apache Spark · PySpark · BigQuery · Tableau · Amazon S3 · SQL
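
To illustrate the star-schema idea behind this project, here is a small sketch that uses an in-memory SQLite database as a stand-in for BigQuery. The table names (`fact_flight`, `dim_airline`, `dim_date`), the columns, and the rows are hypothetical and are not the project's actual schema or data; the point is only the shape of the design, with one fact table joined to dimension tables and an OLAP-style aggregate on top.

```python
import sqlite3

# In-memory SQLite stands in for the warehouse; all names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_airline (airline_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE fact_flight (
    flight_id INTEGER PRIMARY KEY,
    airline_id INTEGER REFERENCES dim_airline(airline_id),
    date_id INTEGER REFERENCES dim_date(date_id),
    delay_minutes INTEGER
);
INSERT INTO dim_airline VALUES (1, 'Airline A'), (2, 'Airline B');
INSERT INTO dim_date VALUES (1, 2024, 1), (2, 2024, 2);
INSERT INTO fact_flight VALUES
    (1, 1, 1, 10), (2, 1, 1, 30), (3, 1, 2, 0), (4, 2, 1, 45), (5, 2, 2, 15);
""")

# OLAP-style rollup: flight count and average delay per airline and month.
rows = conn.execute("""
SELECT a.name, d.month, COUNT(*) AS flights, AVG(f.delay_minutes) AS avg_delay
FROM fact_flight AS f
JOIN dim_airline AS a ON a.airline_id = f.airline_id
JOIN dim_date AS d ON d.date_id = f.date_id
GROUP BY a.name, d.month
ORDER BY a.name, d.month
""").fetchall()

for row in rows:
    print(row)
```

Keeping measures such as `delay_minutes` in the fact table and descriptive attributes in the dimensions is what lets a dashboard slice the same facts by airline, month, or any other dimension without reshaping the data.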

Real-Time Movie Recommendation System

Full-stack web app mapping global historical events to movie recommendations. Uses TF-IDF & CountVectorizer for content-based ML recommendations, Kafka for real-time streaming, Airflow for weekly TMDB data refreshes, MySQL for persistence, and a Node.js/EJS UI. Integrated an Ollama-powered AI chatbot (Mistral LLM) with SSE streaming for natural-language movie search, event timeline exploration, and conversational event creation. Fully containerized with Docker.

TF-IDF content-based ML · top-5 recommendations
Kafka streaming · Airflow orchestration
Ollama AI chatbot · Mistral LLM · SSE streaming
scikit-learn · Kafka · Airflow · MySQL · Node.js · Docker · Ollama / Mistral LLM · Redis
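
The content-based recommendation step can be sketched in plain Python. The project itself uses scikit-learn's vectorizers; the version below is a simplified standard-library stand-in so the idea is visible end to end. The helper names (`tfidf_vectors`, `cosine`, `recommend`), the whitespace tokenizer, the smoothed IDF formula, and the sample titles and descriptions are assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Build one sparse TF-IDF vector (dict of term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in documents]
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    n_docs = len(documents)
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            # Term frequency times a smoothed inverse document frequency.
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(
        sum(w * w for w in b.values())
    )
    return dot / norm if norm else 0.0


def recommend(query, titles, descriptions, top_k=5):
    """Return up to top_k titles whose descriptions best match the query."""
    vectors = tfidf_vectors(descriptions + [query])
    query_vec = vectors[-1]
    scored = [(cosine(query_vec, vec), title) for vec, title in zip(vectors, titles)]
    return [title for score, title in sorted(scored, reverse=True)[:top_k] if score > 0]


# Hypothetical catalogue and event text.
titles = ["Moon Mission", "Harbor Attack", "Wall Street Crash"]
descriptions = [
    "astronauts race to land on the moon",
    "a naval base is attacked during the war",
    "bankers face the stock market crash",
]
print(recommend("moon landing astronauts", titles, descriptions, top_k=2))
```

Titles with zero similarity are dropped, so the list can be shorter than `top_k` when few descriptions share vocabulary with the event text.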

Real-Time Crypto Streaming Pipeline

End-to-end real-time streaming pipeline using a Producer-Consumer architecture. Fetches live crypto prices from CoinCap API, publishes to Apache Kafka topics, persists to MongoDB, and visualizes trends in a Streamlit dashboard. Fully containerized with Docker Compose for one-command deployment.

Real-time streaming · CoinCap API
Docker Compose one-command setup
Apache Kafka · MongoDB · Streamlit · Docker · Python
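
The producer-consumer pattern at the heart of this pipeline can be sketched without any external services. In the sketch below a `queue.Queue` stands in for a Kafka topic, a plain list stands in for MongoDB, and the price ticks are hypothetical sample values rather than live CoinCap data, so it shows the flow of the pattern and not the actual deployment.

```python
import queue
import threading

SENTINEL = None  # marks the end of the stream


def producer(topic, ticks):
    """Publish each price tick to the queue, then signal completion."""
    for tick in ticks:
        topic.put(tick)
    topic.put(SENTINEL)


def consumer(topic, store):
    """Read ticks until the sentinel arrives and persist them to the store."""
    while True:
        tick = topic.get()
        if tick is SENTINEL:
            break
        store.append(tick)


# Hypothetical sample ticks; the real pipeline fetches live prices from an API.
ticks = [
    {"asset": "bitcoin", "price_usd": 100.0},
    {"asset": "ethereum", "price_usd": 10.0},
    {"asset": "bitcoin", "price_usd": 101.5},
]
topic = queue.Queue()  # stand-in for a Kafka topic
store = []             # stand-in for a MongoDB collection

workers = [
    threading.Thread(target=producer, args=(topic, ticks)),
    threading.Thread(target=consumer, args=(topic, store)),
]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()

# Latest price per asset, as a dashboard might display it.
latest = {tick["asset"]: tick["price_usd"] for tick in store}
print(latest)
```

Decoupling the two sides through a queue means the producer never waits on storage and the consumer can be restarted or scaled independently, which is the same reason a message broker sits between the API fetcher and the database.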

IPL Data Pipeline (Spark)

Spark-based data pipeline for analyzing Indian Premier League cricket statistics on Databricks. Ingests data from Amazon S3, applies Spark transformation logic, executes SQL-based analytics, and generates visualizations from multi-season IPL data.

Spark ETL on Databricks
S3 data lake ingestion
Apache Spark · Databricks · Amazon S3 · SparkSQL · Python

EV Market Sentiment Analysis (Research)

Academic research project under Prof. Manoochehr Ghiassi at SCU. Automated EV market news collection using Selenium, engineered a 68-feature NLP dataset, and trained an SVM model achieving ~72% accuracy on sentiment classification to predict EV market trends.

68-feature engineered NLP dataset
SVM classifier · ~72% accuracy
NLP · SVM · scikit-learn · Selenium · Python · Feature Engineering

Neural Network — Digit Classification

Built and trained a neural network to classify handwritten digits on the MNIST dataset (70,000 samples). Explored network architecture, activation functions, and hyperparameter tuning to maximize classification accuracy.

MNIST · 70,000 samples
Architecture & hyperparameter tuning
Neural Networks · Python · scikit-learn · NumPy

Customer Churn Analysis

Interactive Tableau dashboard analyzing customer churn metrics. Identifies high-risk customer segments, key churn drivers, and retention opportunities through visual analytics and data storytelling for business stakeholders.

Interactive churn dashboard
Segment-level risk analysis
Tableau · Data Analytics · Business Intelligence

Sales Insights Dashboard

End-to-end sales analytics dashboard in Tableau. Surfaces revenue trends, top-performing products, regional breakdowns, and period-over-period KPIs for executive and business analyst audiences.

Revenue & KPI trends
Regional performance breakdown
Tableau · SQL · Data Visualization

Sentiment Analysis — Employee Satisfaction

NLP-powered sentiment analysis on employee feedback to measure satisfaction and identify workplace sentiment patterns. Classifies text into positive, negative, and neutral categories to support data-driven HR decisions.

NLP text classification
HR analytics use case
NLP · Python · NLTK · Pandas

Mushroom Classification — Decision Trees

Binary classification model using decision trees to distinguish edible from poisonous mushrooms. Explores feature importance, tree depth tuning, and model evaluation for interpretable, explainable ML.

Decision tree classification
Feature importance analysis
scikit-learn · Python · Decision Trees · Pandas

World Billionaires Statistics

EDA on global billionaire wealth distribution. Analyzes geographic concentration, industry breakdown, net worth trends, and demographic patterns across the world's wealthiest individuals using Python data visualization.

Global wealth & geographic patterns
Industry & demographic analysis
Python · Pandas · Matplotlib · Seaborn · EDA

Skills

Languages

Python · SQL · JavaScript · TypeScript · Java · HTML / CSS · Node.js

Data Engineering

Apache Spark · Apache Kafka · Apache Airflow · Apache Iceberg · dbt · ETL Pipelines · BigQuery · Amazon S3 · Databricks · Hadoop

Cloud & DevOps

Google Cloud Platform · AWS (S3, EKS) · Azure Blob Storage · Snowflake · Snowflake Dynamic Tables · Docker · Kubernetes · Jenkins · Git / GitHub

ML / AI

scikit-learn · TensorFlow · Neural Networks · NLP / TF-IDF · SVM · Prophet · Decision Trees · NLTK

Generative AI

LLMs · Ollama / Mistral LLM · Prompt Engineering · RAG (Retrieval-Augmented Generation) · AI Agents · Snowflake Cortex AI · LLM Pipelines · Agentic Workflows

Methodologies

Agile / Scrum · SDLC · CI/CD · Data Warehousing · Medallion Architecture · Data Modeling · RBAC · Microservices · Jira · Confluence

Analytics & BI

Tableau · Power BI · Looker Studio · Streamlit · Matplotlib · Seaborn · OLAP / Star Schema · Pandas · NumPy

Databases

MySQL · MongoDB · PostgreSQL · BigQuery · Redis

QA & Automation

Selenium · Cypress · Ranorex · TestNG · TestRail · ReadyAPI · HP ALM

Education

M.S. Information Systems

Santa Clara University — Leavey School of Business
Expected Dec 2025 · Dean's List 2024 & 2025 · Merit Scholarship

Coursework: Data Analytics - Python · Object-Oriented Programming · Database Management System · Big Data Modeling & Analytics · Business Intelligence & Data Warehouse · Natural Language Processing · Dashboards with Tableau · Applied Cloud Computing

Research Assistant, Prof. Ghiassi · SCU AI Collaborate Club · SCU Women in Tech · SCU Spotlight Feature

Bachelor of Engineering, Electronics & Communication

R.G.P.V University, India
July 2010 – June 2014

Dean's List

Spring 2024 & Spring 2025

Merit Scholarship

Santa Clara University

SCU Spotlight

Featured in the SCU Leavey School YouTube spotlight

Certifications

Industry Engagement

Databricks AI Summit 2025 — San Francisco
Snowflake Summit 2025 — San Francisco
Snowflake Dev Day 2025 & 2024 — San Francisco
Snowflake Data for Breakfast SF — Cortex AI & AI Agents
SCU AI Collaborate Club — Active Member
SCU Women in Tech — Active Member

Get In Touch

I'm actively seeking Data Analytics and Data Engineering roles. If you have an opportunity or just want to connect, I'd love to hear from you!