Why I Build AI Systems
I build AI systems that have to work outside notebooks: agentic retrieval stacks, evaluators for SWE
agents, cloud-native data platforms, and developer tooling with Docker-backed isolation. My strongest
work lives where language models, runtime evidence, and production constraints meet.
I use AI-assisted coding tools such as Codex and Claude Code as force multipliers, while staying hands-on
in architecture, implementation, debugging, and evaluation.
95% retrieval accuracy on agentic RAG
50% MRR lift from chunking and ranking tests
470K+ records processed in PySpark analytics pipelines
25K/min CDC throughput in containerized banking data pipeline
What I Build
AI Systems
Agentic RAG, LangGraph orchestration, prompt design, vector search, embedding evaluation, and LLM-driven
product features that need measurable behavior.
Data Platforms
PySpark pipelines, lakehouse patterns, Databricks, Snowflake, Kafka, Airflow, and cloud-native data
services across AWS, Azure, and GCP.
Developer Tooling
Preflight merge validation, Docker sandboxes, eval harnesses, runtime evidence collection, and
reproducible testing for AI-assisted engineering workflows.
Selected Stack
Python
SQL
LangGraph
LangChain
OpenAI API
AWS Bedrock
Vertex AI
Docker
Kubernetes
PySpark
Databricks
Snowflake
Kafka
Airflow
dbt
MLflow
Power BI
Tableau
Current North Star
Design AI workflows that are trustworthy under budget pressure: systems that can retrieve, reason,
validate, and stop themselves when the evidence says a path is weak.
Experience Snapshot
Graduate training at UIUC, production AI engineering in industry, and independent systems work focused on
evaluation, orchestration, and reliable execution.
Jul 2025 - Present
AI Engineering Intern | Data Science Research Services, UIUC
- Architected agentic RAG with LangGraph decision trees and Qwen3-32B, reaching 95% retrieval accuracy.
- Improved retrieval quality by 50% through A/B tests on chunking, hybrid search, and custom scoring.
- Built async ETL and ingestion pipelines across Azure, AWS Bedrock, MongoDB, and PostgreSQL vector stores.
- Created operational dashboards over 8,700+ events to translate system behavior into decisions.
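One of the ranking techniques commonly used for the hybrid-search experiments described above is reciprocal rank fusion, which merges a keyword ranking and a vector ranking into one list. The sketch below is illustrative only, not the production scoring logic; the constant k=60 comes from the original RRF formulation.

```python
def reciprocal_rank_fusion(result_lists, k=60):
    """Fuse ranked result lists (e.g. BM25 + vector search) into one ranking.

    Each input is a list of document IDs ordered best-first. k dampens the
    influence of top ranks; 60 is the conventional default.
    """
    scores = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Example: fuse a keyword ranking and a vector ranking over the same corpus.
fused = reciprocal_rank_fusion([["a", "b", "c"], ["b", "a", "d"]])
```

Documents ranked highly by both retrievers rise to the top even when neither retriever alone put them first.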
Apr 2023 - Jun 2024
Associate Data Scientist | Apptware Pvt. Ltd.
- Built PySpark and SQL pipelines over 470,000+ records and delivered decision-ready Power BI dashboards.
- Optimized low-latency inference pipelines on AWS Lambda, cutting speech-to-text (STT) latency by 80% for IVR workloads.
- Developed Pix2Pix and YOLO pipelines across healthcare and edge-device use cases.
- Deployed always-on data extraction and analytics services on AWS for 20+ enterprise clients.
Sep 2022 - Apr 2023
Data Science Intern | Apptware Pvt. Ltd.
- Built GPT-4 tool-calling workflows with SQL execution, using Chainlit and LangChain.
- Fine-tuned Falcon-7B and LLaMA with LoRA and QLoRA for applied NLP systems.
- Used OCR, topic models, and few-shot classification to structure unstructured enterprise documents.
Education
M.S. Information Science, UIUC (3.96 GPA)
B.E. Computer Engineering, SPPU (3.86 GPA)
Focus: distributed systems, AI design, and cloud architecture.
Current Focus
Building AI systems that can evaluate themselves under constraints: runtime evidence, early stopping,
trace quality, and budget-aware agent control.
Hands-On Engineering
I use Codex and Claude Code to move faster, but I still own the architecture, implementation, evaluation
setup, and systems tradeoffs directly.
RAG eval
Docker
AWS/GCP/Azure
Vector DBs
PySpark
Agentic systems
A/B testing
Kubernetes
Featured Project Architectures
A selection of systems where I combined AI components, data infrastructure, and explicit control over how
information moves through a workflow.
Featured architecture
Multi-Agent Insurance Processing System
- Designed a six-agent LangGraph workflow with OpenAI function calling and structured routing for insurance assistance.
- Separated supervisor logic, domain tools, RAG access, and observability to keep behavior auditable and reproducible.
- Used agent boundaries intentionally so the system could escalate, clarify, and constrain tool usage instead of hallucinating free-form behavior.
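The supervisor-plus-bounded-agents pattern described above can be sketched in plain Python (the real system uses LangGraph nodes and OpenAI function calling; the agent names and routing rule here are illustrative, not the project's actual node names):

```python
# Schematic supervisor routing: the supervisor inspects the request and
# dispatches to a bounded set of domain agents instead of free-form generation.

def claims_agent(state):
    state["response"] = f"claims: handling {state['request']}"
    return state

def policy_agent(state):
    state["response"] = f"policy: handling {state['request']}"
    return state

def clarify_agent(state):
    # Escalation path: ask the user rather than guess.
    state["response"] = "Could you clarify whether this concerns a claim or a policy?"
    return state

AGENTS = {"claims": claims_agent, "policy": policy_agent}

def supervisor(state):
    """Route to a known agent; anything unrecognized escalates to clarification."""
    request = state["request"].lower()
    for name, agent in AGENTS.items():
        if name in request:
            return agent(state)
    return clarify_agent(state)

result = supervisor({"request": "question about my claims status"})
```

The key design point is that routing is explicit and closed-world: an unmatched request lands in the clarification path instead of being answered speculatively.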
High-throughput data platform
Containerized Banking Data Architecture
- Built a CDC pipeline from PostgreSQL through Debezium, Kafka, MinIO/S3, Snowflake, dbt, and Airflow.
- Designed for continuous throughput of roughly 25,000 transactions and account updates per minute.
- Added Docker-based orchestration and GitHub Actions to keep local development and pipeline automation aligned.
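The Debezium leg of the pipeline above is driven by a Kafka Connect connector definition. The sketch below shows the general shape of such a registration payload; the hostnames, credentials, and table list are placeholders, not the project's actual configuration.

```python
import json

# Illustrative Debezium PostgreSQL source connector for Kafka Connect.
connector = {
    "name": "banking-postgres-cdc",
    "config": {
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "database.hostname": "postgres",
        "database.port": "5432",
        "database.user": "cdc_user",
        "database.password": "********",
        "database.dbname": "banking",
        "topic.prefix": "banking",
        "table.include.list": "public.transactions,public.accounts",
        "plugin.name": "pgoutput",
    },
}

payload = json.dumps(connector)
# Registration is typically a POST of this payload to the Kafka Connect
# REST endpoint, e.g. http://localhost:8083/connectors.
```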
Current Systems Work
The projects below reflect the direction I want to keep pushing: reliable AI systems, better evaluation
signals, and developer tools that act on evidence instead of hype.
Azure + lakehouse + CV
Automated Insurance Claim Audit Pipeline
- Built an end-to-end Azure Databricks pipeline using Bronze -> Silver -> Gold layers under a constrained 4GB cluster budget.
- Used PySpark, Hive Metastore, MLflow, and an Xception-based CV model that reached 94% accuracy.
- Translated multimodal insurance inputs into business-ready decisions with explicit rule handling and model monitoring hooks.
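The Bronze -> Silver -> Gold flow above can be illustrated with plain Python records; the real pipeline expresses the same steps as PySpark DataFrame transforms, and the field names here are made up for the example.

```python
# Bronze: raw ingested claims, exactly as landed.
bronze = [
    {"claim_id": "c1", "amount": "1200.50", "status": "open"},
    {"claim_id": "c2", "amount": "not_a_number", "status": "open"},
    {"claim_id": "c3", "amount": "300.00", "status": "closed"},
]

def to_silver(rows):
    """Silver: validate and type-cast; drop records that fail parsing."""
    silver = []
    for row in rows:
        try:
            silver.append({**row, "amount": float(row["amount"])})
        except ValueError:
            continue  # in a real pipeline, quarantined for review
    return silver

def to_gold(rows):
    """Gold: aggregate validated records into business-ready metrics."""
    open_claims = [r for r in rows if r["status"] == "open"]
    return {
        "open_claim_count": len(open_claims),
        "open_claim_total": sum(r["amount"] for r in open_claims),
    }

gold = to_gold(to_silver(bronze))
```

Each layer narrows the data contract: Bronze preserves everything, Silver guarantees types, and Gold exposes only decision-ready aggregates.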
Solo developer | Docker-backed validation
Preflight
Building a CLI-first merge validation system that provisions isolated sandboxes, deploys candidate code
changes, runs targeted checks, collects runtime evidence, and emits structured merge recommendations.
The goal is to block unsafe merges with localizable, evidence-backed findings rather than
static-analysis guesswork.
preflight run --sandbox-backend docker \
  --enable-load-probe \
  --enable-failure-injection
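The check-to-verdict step described above might aggregate results roughly as follows. This is a sketch under assumed names; the types and verdict rule are hypothetical, not Preflight's actual internals.

```python
from dataclasses import dataclass, field

@dataclass
class CheckResult:
    name: str
    passed: bool
    evidence: str  # e.g. a log excerpt or probe measurement

@dataclass
class MergeRecommendation:
    verdict: str
    findings: list = field(default_factory=list)

def recommend(results):
    """Block the merge if any check failed; attach evidence for each failure."""
    failures = [r for r in results if not r.passed]
    if failures:
        return MergeRecommendation(
            verdict="block",
            findings=[f"{r.name}: {r.evidence}" for r in failures],
        )
    return MergeRecommendation(verdict="allow")

rec = recommend([
    CheckResult("load-probe", True, "p95 latency 120ms"),
    CheckResult("failure-injection", False, "retry loop never terminates"),
])
```

Keeping the evidence string attached to each finding is what makes a "block" verdict localizable instead of a bare failure flag.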
Research implementation | official-harness aware
TracePop / SWE-Agent Evaluation
Working on early termination policies for SWE-agent trajectories under bounded test-time budgets.
Exploring cheap read-only prefixes, early-signal tests, LLM-as-judge signals, and non-LLM models using
trajectory features such as file exploration breadth, reasoning depth, no-progress patterns, and
ideation behavior.
Stage I: prefix culling -> stable pool -> revision -> trace compression
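A budget-aware termination policy over trajectory features, as described above, can be sketched as a simple rule. The feature names and thresholds here are illustrative assumptions, not the actual policy under study.

```python
def should_terminate(steps_used, budget, no_progress_steps, files_explored):
    """Stop a trajectory when the evidence says the path is weak.

    - hard stop at the step budget;
    - stop early on a thrashing signature: several consecutive
      no-progress steps despite broad file exploration.
    """
    if steps_used >= budget:
        return True
    if no_progress_steps >= 5 and files_explored > 20:
        return True
    return False

# A thrashing trajectory is cut well before its budget is exhausted.
cut_early = should_terminate(steps_used=30, budget=100,
                             no_progress_steps=7, files_explored=25)
```

A learned policy would replace the hand-set thresholds with a model fit on trajectory features, but the decision interface stays the same.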