John He

AI Engineer & Data Scientist

Specializing in LLM Agents, RAG Systems, and Scalable ETL Pipelines.

MS in Machine Learning and Data Science at Northwestern University

Work Experience

Sep 2025 - Dec 2025

Data Science Contractor (Capstone)

Videspan | Evanston, IL

Automated user query handling via multi-tool calling and RAG.

  • Deployed an agentic chatbot using a containerized, multi-service architecture (Docker Compose, FastAPI).
  • Engineered the agent's core conversational logic with stateful context management for MCP tool calling and elicitation.
  • Developed the system's multimodal RAG knowledge base with Qwen3 and LangGraph.

Sep 2025 - Nov 2025

Data Science Intern (Biostatistics)

Monopar Therapeutics | Wilmette, IL

Identified significant health indicators using Cox PH models and survival analysis.

  • Constructed predictive models from R&D datasets to investigate biomarker relationships.
  • Analyzed clinical time-to-event data using survival analysis and Generalized Linear Mixed Models.

Jun 2025 - Aug 2025

Data Science Intern (LLM)

Alexion, AstraZeneca Rare Disease | Wilmington, DE

Reduced manual review time by 90% (30 min → 3 min) using LangChain & HPC.

  • Architected an LLM pipeline using LangChain to identify relevant research articles and extract 20+ variables.
  • Engineered a scalable, end-to-end data pipeline on a Slurm-managed HPC cluster to ingest 300+ scientific articles.
  • Resolved a failing few-shot classifier by visualizing embeddings with t-SNE, diagnosing labeling drift and guiding a new SOP for 10+ categories.

Sep 2024 - Jun 2025

Data Science Contractor (Industry Practicum)

Azul 3D | Skokie, IL

Cut experimental sample size by 33% via stratified sampling.

  • Led a 3-person team to analyze 50+ material compositions, building a predictive framework to model reaction speeds.
  • Devised a cost-effective experimental design methodology using stratified sampling.

Sep 2021 - Mar 2025

Data Coordinator (Data Engineer)

Brigham and Women's Hospital | Boston, MA

Overhauled research data infrastructure, doubling accessible data volume.

  • Built 10+ automated ETL pipelines, cutting project setup time from months to weeks.
  • Developed a Python web scraper to download and curate a novel dataset of 10,000+ images.
  • Built a Python NLP pipeline to parse 15GB+ of clinical notes.

Skills

Programming: Python, R, SQL
AI / ML & LLMs: PyTorch, HuggingFace, LangChain, LangGraph, Pinecone, MCP, LoRA, scikit-learn, RAG
Cloud & Data Eng: AWS Glue, Redshift, S3, Spark, Snowflake
MLOps & Tools: Docker, FastAPI, Streamlit, dbt, Fivetran, ECS Fargate

Education

MS in Machine Learning and Data Science

Northwestern University

BA in Molecular Biology and Biochemistry

Middlebury College