February 2026 - Present · Portland, Maine
Research Assistant
Northeastern University
Building a vision-language model for automated interpretation of veterinary fine needle aspirate cytology, focused on mast cell tumor detection and grading. The end-to-end pipeline fine-tunes MedGemma 1.5 4B with QLoRA on a MedSigLIP encoder and is deployed on Databricks with MLflow and Unity Catalog.
- Designed an end-to-end VLM fine-tuning pipeline using MedGemma 1.5 4B with QLoRA (4-bit quantization + LoRA adapters) on a MedSigLIP vision encoder for multi-class mast cell tumor classification and cytologic interpretation; built in PyTorch with HuggingFace Transformers, TRL (SFTTrainer), PEFT, and bitsandbytes.
- Engineered a multi-channel image preprocessing pipeline merging 4 fluorescence/brightfield channels (bf_green, bf_violet, fl_uv, fl_blue) into pseudo-RGB inputs via per-image P1/P99 normalization.
- Analyzed ~8M single-channel cell images across 19 channels, mapped structured vs. unstructured pathology fields, and isolated 2,653 disease-relevant cases spanning 9 grade categories and 66 ground-truth annotated runs.
- Curated a 5-task VQA dataset (structured reporting, pathological process identification, key finding extraction, cell type classification, cytologic interpretation) by converting hierarchical pathologist dropdown annotations across 4 branches and 8 follow-up question types into natural-language Q&A pairs.
- Encoded reasoning from 4 cytologic grading systems (Camus, Paes, Kiupel, Patnaik) as chain-of-thought training signals; built interactive visualizations of the diagnostic decision tree to align the team on annotation schema.
- Deployed Databricks training infrastructure with automatic GPU detection across 4 hardware tiers (T4, A10G, A100, H100), pre-loaded image caching that eliminated S3 I/O during training, custom callbacks tracking token accuracy and validation loss, and MLflow + Unity Catalog for experiment tracking and model registry.
- Architected the Unity Catalog schema (catalogs, schemas, volumes for images, Q&A pairs, model artifacts); evaluated checkpoints with token-level accuracy, ROUGE-L, F1, precision, recall, and perplexity across train/validation/test splits.
MedGemma 1.5 4B · MedSigLIP · QLoRA · PyTorch · HuggingFace Transformers · TRL · PEFT · bitsandbytes · Databricks · Unity Catalog · MLflow · Spark
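
A minimal sketch of the per-image P1/P99 normalization named in the preprocessing bullet, assuming each channel arrives as a 2-D NumPy array. The function names and the 4-to-3 mixing weights (`DEFAULT_MIXING`) are hypothetical and only illustrate the idea; the channel-to-RGB mapping actually used in the project is not stated here.

```python
import numpy as np

# Channel names taken from the preprocessing bullet.
CHANNELS = ("bf_green", "bf_violet", "fl_uv", "fl_blue")

# Hypothetical 3x4 mixing matrix (rows: R, G, B; columns follow CHANNELS).
# Assumption for illustration only: R <- bf_green, G <- bf_violet,
# B <- average of the two fluorescence channels.
DEFAULT_MIXING = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
])


def normalize_p1_p99(channel):
    """Rescale one channel to [0, 1] using its own 1st and 99th percentiles."""
    channel = np.asarray(channel, dtype=np.float32)
    lo, hi = np.percentile(channel, [1, 99])
    if hi <= lo:
        # Flat image: no usable dynamic range, so return all zeros.
        return np.zeros_like(channel)
    return np.clip((channel - lo) / (hi - lo), 0.0, 1.0)


def merge_to_pseudo_rgb(channels, mixing=DEFAULT_MIXING):
    """Merge a dict of four (H, W) channels into one (H, W, 3) uint8 image."""
    # Normalize each channel independently, then stack to shape (H, W, 4).
    stack = np.stack([normalize_p1_p99(channels[name]) for name in CHANNELS], axis=-1)
    # Apply the mixing weights: (H, W, 4) @ (4, 3) -> (H, W, 3).
    rgb = stack @ np.asarray(mixing).T
    return (np.clip(rgb, 0.0, 1.0) * 255).astype(np.uint8)
```

Clipping at the 1st and 99th percentiles keeps a few very bright or very dark pixels from compressing the rest of the intensity range, and computing the percentiles per image means exposure differences between acquisitions do not carry over into the model inputs.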