Ethan He
Engineer at xAI focusing on media generation, world models, and VLMs.
Multi-turn video extension
Reference-to-video
Grok Imagine v1.0
Grok Imagine v0.9
𝕏
GitHub
Scholar
LinkedIn
Grokipedia
firstnamelastname42 at gmail
Open Source Projects
Cosmos: state-of-the-art generative world models
NeMo DFM: large-scale training and inference framework for diffusion models
Megatron-LM MoE: Scaling up mixture of experts
NeMo: scalable training framework for LLMs and transformers
LongVILA: Long-Context VLM for long videos (ICLR'25)
ActGPT: browser-use agent
Channel Pruning: Accelerating Very Deep Neural Networks (ICCV'17)
Epipolar Transformers: Accurate multi-camera pose understanding (CVPR'20)
AMC: AutoML for model compression (ECCV'18)
KL Loss: Accurate Object Detection (CVPR'19)
FSAF: single-shot object detection (CVPR'19)
Invited Talks
NVIDIA Cosmos: World Foundation Model Platform for Physical AI
Upcycling LLMs into MoE
Insights on Sora