John He

Aspiring Data Scientist

Masters student at Northwestern University studying Machine Learning and Data Science

About Me

  • I am a Data Intern at Alexion, the Rare Disease Unit of AstraZeneca.
  • I am a Machine Learning and Data Science graduate student at Northwestern University, expected to graduate in December 2025.
  • I worked for 3 years at Brigham and Women's Hospital (BWH) / Dana-Farber Cancer Institute (DFCI) in Radiation Oncology research.
  • I have a Bachelor of Arts in Molecular Biology and Biochemistry from Middlebury College.

Education

MS in Machine Learning and Data Science

Northwestern University

GPA: 3.97

Expected: December 2025

BA in Molecular Biology and Biochemistry

Middlebury College

magna cum laude, GPA: 3.64

Projects

Article Querying using RAG-based LLMs

Python

Overview: Used an agentic AI structure to explore the interaction between RAG and LLMs in the context of article searches.

Key Results: The system works, and I hope to improve query accuracy in future iterations.

MLDS Hackathon 2024 - Soccer Analytics

Python

Overview: Participated in the program's annual 48-hour hackathon

Key Results: Worked in a team of 4 to create initial lineup selection (random forest), player substitutions (collaborative filtering), and general recommendations (ChatGPT wrapper)

Video Game Analysis - In Progress

Python

Overview: Looking at trends in Steam's video game data

Key Results: Currently working on this!

Celebrity Image Webscraper

Python

Overview: Extracted images from Wikimedia Commons

Key Results: Assembled a database of images

Automated Cox Proportional Hazards Analysis

R

Overview: Automated univariate and multivariate Cox PH analyses across covariates

Key Results: Used significant covariates from univariate analyses for multivariate analyses

R Data Manipulation Tutorial

R

Overview: Provides an overview of R functions for data manipulation

Key Results: Used to help coworkers improve at working with data in their projects and studies

Work Experience

Real World Science Data Intern at Alexion, AstraZeneca

  • Leveraging the power of LLMs to automate some manual, text-heavy processes.

Data Coordinator at BWH/DFCI

  • Developed ETL workflows using R and Python to provide data efficiently to researchers, reducing data retrieval time from weeks to days
  • Conducted survival analyses using Cox Proportional Hazards models to assist physicians in research reports
  • Created a Python web scraper collecting 10,000+ facial images for ML model testing