Songyou Peng (彭崧猷)

I am a Senior Researcher/Postdoc at ETH Zurich, and also an incoming Research Scientist at Google Research.

I received my PhD from ETH Zurich and the Max Planck Institute for Intelligent Systems under the supervision of Marc Pollefeys and Andreas Geiger.

I was a research intern at Google Research with Tom Funkhouser, Meta Reality Labs Research with Michael Zollhoefer, Technical University of Munich with Daniel Cremers, and INRIA with Peter Sturm. I completed an Erasmus Mundus Master's in Computer Vision and Robotics (VIBOT) with distinction, and a Bachelor's in Automation at Xi'an Jiaotong University.

Email  |  CV  |  GitHub  |  Google Scholar  |  LinkedIn  |  Twitter

Research
Renovating Names in Open-Vocabulary Segmentation Benchmarks
Haiwen Huang, Songyou Peng, Dan Zhang, Andreas Geiger
arXiv, 2024
paper | project page

Wanna enhance your segmentation model or benchmark? Renovate names now!

Parameter-Efficient Orthogonal Finetuning via Butterfly Factorization
Weiyang Liu*, Zeju Qiu*, Yao Feng**, Yuliang Xiu**, Yuxuan Xue**, Longhui Yu**, Haiwen Feng, Zhen Liu, Juyeon Heo, Songyou Peng, Yandong Wen, Michael J. Black, Adrian Weller, Bernhard Schölkopf
(*/** equal contribution)
International Conference on Learning Representations (ICLR), 2024
paper | project page | code

BOFT (Orthogonal Butterfly) is a general finetuning technique that adapts foundation models to tasks in vision, NLP, math QA, and controllable generation.

NICER-SLAM: Neural Implicit Scene Encoding for RGB SLAM
Zihan Zhu*, Songyou Peng*, Viktor Larsson, Zhaopeng Cui, Martin R. Oswald, Andreas Geiger, Marc Pollefeys
International Conference on 3D Vision (3DV), 2024 (Oral, Best Paper Honorable Mention)
(* equal contribution)
paper | project page | video | code

RGB-only version of our NICE-SLAM, making it NICER.

FastHuman: Reconstructing High-Quality Clothed Human in Minutes
Lixiang Lin, Songyou Peng, Qijun Gan, Jianke Zhu
International Conference on 3D Vision (3DV), 2024 (Spotlight, top 8.2%)
paper | project page | code

Shape As Points (SAP) for fast human body reconstruction.

Neural Scene Representations for 3D Reconstruction and Scene Understanding
Songyou Peng
PhD Thesis, 2023
thesis | slides

Segment3D: Learning Fine-Grained Class-Agnostic 3D Segmentation without Manual Labels
Rui Huang, Songyou Peng, Ayça Takmaz, Federico Tombari, Marc Pollefeys, Shiji Song, Gao Huang, Francis Engelmann
arXiv, 2023
paper | project page | demo

A self-supervised segmentation approach that outperforms fully-supervised methods.

DiffDreamer: Towards Consistent Unsupervised Single-view Scene Extrapolation with Conditional Diffusion Models
Shengqu Cai, Eric R. Chan, Songyou Peng, Mohamad Shahbazi, Anton Obukhov, Luc Van Gool, Gordon Wetzstein
International Conference on Computer Vision (ICCV), 2023
paper | project page

A diffusion-model-based unsupervised framework capable of synthesizing novel views along a long camera trajectory.

OpenScene: 3D Scene Understanding with Open Vocabularies
Songyou Peng, Kyle Genova, Chiyu "Max" Jiang, Andrea Tagliasacchi, Marc Pollefeys, Thomas Funkhouser
Conference on Computer Vision and Pattern Recognition (CVPR), 2023
paper | project page | video | code

Zero-shot approach for novel 3D scene understanding tasks with open-vocabulary queries.

SDFStudio: A Unified Framework for Surface Reconstruction
Zehao Yu, Anpei Chen, Bozidar Antic, Songyou Peng, Apratim Bhattacharyya, Michael Niemeyer, Siyu Tang, Torsten Sattler, Andreas Geiger
Open Source Project, 2023
project page | code

We provide a unified framework and benchmark for neural implicit surface reconstruction.

MonoSDF: Exploring Monocular Geometric Cues for Neural Implicit Surface Reconstruction
Zehao Yu, Songyou Peng, Michael Niemeyer, Torsten Sattler, Andreas Geiger
Advances in Neural Information Processing Systems (NeurIPS), 2022
paper | project page

Monocular depth and normal cues significantly boost the performance of neural implicit surface reconstruction methods.

NICE-SLAM: Neural Implicit Scalable Encoding for SLAM
Zihan Zhu*, Songyou Peng*, Viktor Larsson, Weiwei Xu, Hujun Bao, Zhaopeng Cui, Martin R. Oswald, Marc Pollefeys
Conference on Computer Vision and Pattern Recognition (CVPR), 2022
(* equal contribution)
paper | project page | video | code

A neural implicit RGB-D SLAM system that scales to large scenes.

Shape As Points: A Differentiable Poisson Solver
Songyou Peng, Chiyu "Max" Jiang, Yiyi Liao, Michael Niemeyer, Marc Pollefeys, Andreas Geiger
Advances in Neural Information Processing Systems (NeurIPS), 2021 (Oral, top 0.6%)
paper | project page | video (6 min) | video (12 min) | podcast | code

An interpretable hybrid shape representation that yields high-quality watertight meshes at low inference times.

UNISURF: Unifying Neural Implicit Surfaces and Radiance Fields for Multi-View Reconstruction
Michael Oechsle, Songyou Peng, Andreas Geiger
International Conference on Computer Vision (ICCV), 2021 (Oral, top 3%)
paper | project page | video | teaser video | code

Our method reconstructs accurate surfaces without requiring input masks.

KiloNeRF: Speeding up Neural Radiance Fields with Thousands of Tiny MLPs
Christian Reiser, Songyou Peng, Yiyi Liao, Andreas Geiger
International Conference on Computer Vision (ICCV), 2021
paper | project page | blog | video | teaser video | code

Using thousands of tiny MLPs makes NeRF rendering over 2000x faster.

Dynamic Plane Convolutional Occupancy Networks
Stefan Lionar*, Daniil Emtsev*, Dusan Svilarkovic*, Songyou Peng
Winter Conference on Applications of Computer Vision (WACV), 2021
(* equal contribution)
paper | video | code

A student project from the 3D Vision course at ETH Zurich, for which I served as the advisor.

Convolutional Occupancy Networks
Songyou Peng, Michael Niemeyer, Lars Mescheder, Marc Pollefeys, Andreas Geiger
European Conference on Computer Vision (ECCV), 2020 (Spotlight, top 5%)
paper | project page | blog | video | teaser video | code
Most influential ECCV'20 papers #13

A flexible implicit representation for accurate large-scale 3D reconstruction.

DIST: Rendering Deep Implicit Signed Distance Function with Differentiable Sphere Tracing
Shaohui Liu, Yinda Zhang, Songyou Peng, Boxin Shi, Marc Pollefeys, Zhaopeng Cui
Conference on Computer Vision and Pattern Recognition (CVPR), 2020
paper | project page | teaser video | poster | code

A differentiable renderer for deep implicit signed distance functions.

Calibration Wizard: A Guidance System for Camera Calibration Based on Modelling Geometric and Corner Uncertainty
Songyou Peng and Peter Sturm
International Conference on Computer Vision (ICCV), 2019 (Oral, top 4.6%)
paper | video | poster | code

A novel system that interactively guides a user to take optimal calibration images.

Photometric Depth Super-Resolution
Bjoern Haefner*, Songyou Peng*, Alok Verma*, Yvain Queau, Daniel Cremers
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2019
(* equal contribution)
paper | project page

Recovers high-resolution depth maps with fine geometric details using photometric techniques.

PersEmoN: A Deep Network for Joint Analysis of Apparent Personality, Emotion and Their Relationship
Le Zhang, Songyou Peng, Stefan Winkler
IEEE Transactions on Affective Computing (TAFFC), 2019. In press.
paper | code

A journal extension of our ACM MM 2018 paper.

Give Me One Portrait Image, I Will Tell You Your Emotion and Personality
Songyou Peng, Le Zhang, Stefan Winkler, Marianne Winslett
ACM International Conference on Multimedia (ACM MM), 2018
paper | slides | code

Technical demo. A deep Siamese-like network predicts a person's Big Five personality traits and arousal-valence emotion from a single portrait photo.

Depth Super-Resolution Meets Uncalibrated Photometric Stereo
Songyou Peng, Bjoern Haefner, Yvain Queau, Daniel Cremers
International Conference on Computer Vision (ICCV) Workshops, 2017
paper | slides | code & data

A novel depth super-resolution approach for RGB-D sensors is presented.

This paper is part of my master's thesis and is subsumed by our TPAMI paper.

High Quality Shape from a RGB-D Camera using Photometric Stereo
Songyou Peng
M.Sc. Thesis, Technical University of Munich
Supervisors: Yvain Queau and Daniel Cremers
thesis | bibtex | poster

Mentored Students
I am fortunate to (co-)mentor some talented and highly motivated students, who have taught and inspired me:
  • Jan Ackermann (Ongoing): MSc student at ETH Zurich
    • Semester thesis: Continual Learning of 3D Gaussian Splatting

  • Gonca Yilmaz (Ongoing): MSc student at University of Zurich
    • Semester thesis: Open Vocabulary Segmentation from Multi-Modal Inputs (ICCVW'23)
    • Master thesis (Ongoing): OpenDAS: Open-Vocabulary Domain Adaption for Segmentation (ECCV'24 submission)

  • Weining Ren (2023): MSc student at ETH Zurich
    • Master thesis: NeRF On-the-go (CVPR'24)
    • → PhD student at The University of Hong Kong (HKU), advised by Kai Han

  • Lei Li (2023): MSc student at ETH Zurich

  • Mirlan Karimov (2023): MSc student at ETH Zurich
    • Master thesis: Interactive Preprocessing via Multi-Modal Prompting for NeRFs
    • → PhD student at Mercedes-Benz AG

  • Shengqu Cai (2022): MSc student at ETH Zurich

  • Zihan Zhu (2021): BSc student at Zhejiang University
    • Bachelor internship project: NICE-SLAM (CVPR'22)
    • Semester project: NICER-SLAM (3DV'24, Best Paper Honorable Mention)
    • → Direct doctorate student at ETH Zurich, advised by Marc Pollefeys

  • Pfister Severin (2021): MSc student at ETH Zurich
    • Master thesis: Online Implicit Reconstruction
    • → Consultant at McKinsey & Company

  • Weirong Chen (2020): MSc student at ETH Zurich
    • Semester thesis: Real-time 3D Reconstruction through Neural Implicit Representation
    • → PhD student at TU Munich, advised by Daniel Cremers and Andrea Vedaldi
Invited Talks


2D Magic in a 3D World
Imperial College London, hosted by Andrew Davison, 2024
Czech Technical University (CTU), hosted by Torsten Sattler, 2024
The University of Hong Kong (HKU), hosted by Kai Han, 2024
slides
Dive into Neural Explicit-Implicit 3D Representations and Their Applications
Symposium on Geometry Processing (SGP) Graduate School, 2023 (Invited Lecture)
slides
Learning to Reconstruct and Understand the 3D World
Microsoft Mixed Reality & AI Labs - Zurich, 2023
slides
Learning Neural Scene Representations for 3D Reconstruction and Understanding
Shanghai AI Lab, 2023
slides
OpenScene: 3D Scene Understanding with Open Vocabularies
Peking University, hosted by Baoquan Chen, 2023
Apple, 2023
Stability.ai, 2023
slides
How do NeRF and CLIP advance 3D Scene Reconstruction and Understanding?
Chinese University of Hong Kong (CUHK) Shenzhen, 2023
Bosch Center for Artificial Intelligence (BCAI), 2023
slides
Large-Scale 3D Scene Reconstruction with NeRF
Stanford University, hosted by Gordon Wetzstein, 2022
slides
Towards Practical Applications of NeRF
Adobe Research, hosted by Zexiang Xu, 2022
slides
Neural Scene Representations for 3D Reconstruction
University of Basel, 2022
slides
Shape As Points: A Differentiable Poisson Solver
Talking Papers Podcast, 2022
video | podcast
Shape As Points: A Differentiable Poisson Solver
Graphics And Mixed Environment Seminar (GAMES), 2021
slides | talk (in Chinese)
Towards Practical Applications of NeRF
Graphics And Mixed Environment Seminar (GAMES), 2021
slides | talk (in Chinese)
Selected Projects

3D Textured Shape Recovery with Learned Geometric Priors
Lei Li, Zhizheng Liu, Weining Ren, Liudi Yang, F. Wang, Marc Pollefeys, Songyou Peng
Shape Recovery from Partial Textured 3D Scans (SHARP), 2022
leaderboard | arxiv | code

1st place in reconstructing partial textured objects and 2nd overall.

A Deep Network for Arousal-Valence Emotion Prediction with Acoustic-Visual Cues
Songyou Peng, Le Zhang, Yutong Ban, Meng Fang, Stefan Winkler
IJCNN One-Minute Gradual (OMG) Emotion Behavior Challenge, 2018
leaderboard | arxiv | code

1st for vision-only arousal/valence prediction and 2nd for overall valence prediction.

A Hybrid SLAM and Object Recognition System for Pepper Robot
Songyou Peng*, Kaisar Kushibar*, Paola Ardon*
VIBOT Robotics Project, 2016
arxiv | video | code

Applies visual SLAM together with object recognition on the Pepper robot.

Teaching
Teaching Assistant (Lead), 3D Vision, Spring 2023
Teaching Assistant, Computer Vision, Fall 2022
Teaching Assistant (Lead), 3D Vision, Spring 2022
Teaching Assistant, Deep Learning for Computer Vision: Seminal Work, Spring 2022
Teaching Assistant, Deep Learning, Winter 2020/2021
Teaching Assistant, 3D Vision, Spring 2020
Teaching Assistant, Deep Learning for Computer Vision: Seminal Work, Spring 2020

Academic Services
  • Publicity Chair: 3DV'25
  • Area Chair: 3DV'24
  • Workshop Organizer: OpenSUN3D at ICCV'23
  • Conference Reviewer: CVPR, ICCV, ECCV, ICLR, NeurIPS, SIGGRAPH, SIGGRAPH Asia
  • Journal Reviewer: TPAMI, CVIU
