Embodied Minds Lab, Harvard University & Kempner Institute

Visiting Undergraduate Research Assistant, Supervisors: Prof. Yilun Du & Dr. Ruojin Cai

Sep. 2025 - present

  • Developed an inverse generative modeling framework for scene understanding on CLEVR by training a relation-conditioned composable diffusion model that generates scenes from structured object-attribute and spatial-relation inputs; the model reaches 82% accuracy on single-relation scene generation.
  • Implemented an inference pipeline that inverts the generative model to predict discrete spatial relations between object pairs from images: each candidate relation is scored by its diffusion denoising (noise-prediction) energy, estimated with Monte Carlo sampling over timesteps and noise.
  • Investigating how to extend the inverse generative modeling framework to real-world scenes, using a DiT (Diffusion Transformer) architecture with cross-attention to better capture complex relations among objects.
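The relation-scoring step can be illustrated with a minimal sketch, assuming a DDPM-style ε-prediction denoiser conditioned on a relation label. Everything named here is hypothetical: `RELATIONS`, `PROTOTYPES`, and `toy_eps_model` stand in for the trained relation-conditioned model (the toy model is the exact noise predictor for data sitting on one prototype vector per relation), and the linear β schedule is the standard DDPM choice, not necessarily the one used in this project. The predicted relation is the candidate with the lowest Monte Carlo estimate of E over t and ε of ‖ε − ε_θ(x_t, t, c)‖².

```python
import numpy as np

# Hypothetical toy setup: each discrete relation maps to a fixed prototype
# "scene" vector. A real pipeline would use a trained conditional denoiser on images.
RELATIONS = ["left", "right", "front", "behind"]
DIM = 16
rng_proto = np.random.default_rng(0)
PROTOTYPES = {r: rng_proto.normal(size=DIM) for r in RELATIONS}

T = 1000
betas = np.linspace(1e-4, 0.02, T)      # standard DDPM linear schedule (assumed)
alpha_bars = np.cumprod(1.0 - betas)    # cumulative products \bar{alpha}_t


def toy_eps_model(x_t, t, relation):
    """Stand-in for eps_theta(x_t, t, c): exact noise predictor for data
    concentrated on PROTOTYPES[relation]."""
    ab = alpha_bars[t]
    return (x_t - np.sqrt(ab) * PROTOTYPES[relation]) / np.sqrt(1.0 - ab)


def denoising_energy(x0, relation, eps_model, n_samples, rng):
    """Monte Carlo estimate of E_{t, eps} ||eps - eps_model(x_t, t, c)||^2."""
    total = 0.0
    for _ in range(n_samples):
        t = rng.integers(0, T)                              # sample a timestep
        eps = rng.normal(size=x0.shape)                     # sample Gaussian noise
        ab = alpha_bars[t]
        x_t = np.sqrt(ab) * x0 + np.sqrt(1.0 - ab) * eps    # forward diffusion
        total += np.sum((eps - eps_model(x_t, t, relation)) ** 2)
    return total / n_samples


def predict_relation(x0, candidates, eps_model, n_samples=64, seed=0):
    """Score every candidate relation; return the lowest-energy one and all energies."""
    energies = {}
    for c in candidates:
        # Re-seed per candidate so all candidates see the same (t, eps) draws.
        rng = np.random.default_rng(seed)
        energies[c] = denoising_energy(x0, c, eps_model, n_samples, rng)
    return min(energies, key=energies.get), energies


# Usage: a slightly perturbed "front" scene should be classified as "front".
x0 = PROTOTYPES["front"] + 0.05 * np.random.default_rng(1).normal(size=DIM)
pred, energies = predict_relation(x0, RELATIONS, toy_eps_model)
print(pred)  # front
```

Re-seeding the generator for each candidate is a common-random-numbers choice: every relation is evaluated on identical timestep and noise draws, so differences in energy reflect the conditioning rather than sampling noise, which usually lets fewer Monte Carlo samples suffice for a stable ranking.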