Research Demo

MAD: Modality-Adaptive Decoding for Mitigating Cross-Modal Hallucinations in Multimodal Large Language Models

Sangyun Chung, Se Yeon Kim, Youngchae Chee, and Yong Man Ro

CVPR 2026 / Code

Recursive Think-Answer Process for LLMs and VLMs

Byung-Kwan Lee*, Youngchae Chee*, Yong Man Ro (*equal contribution)

CVPR 2026 Findings / Project Page

Emotion-Coherent Reasoning for Multimodal LLMs via Emotional Rationale Verifier

Hyeongseop Rha, Jeong Hun Yeo, Yeonju Kim, Yong Man Ro

AAAI 2026

Integrating Language-Derived Appearance Elements with Visual Cues in Pedestrian Detection

Sungjune Park*, Hyunjun Kim*, Yong Man Ro (* equal contributor)

IEEE Transactions on Circuits and Systems for Video Technology

Causal Mode Multiplexer: A Novel Framework for Unbiased Multispectral Pedestrian Detection

Taeheon Kim*, Sebin Shin*, Youngjoon Yu, Hak Gu Kim, and Yong Man Ro (* equal contributor)

CVPR 2024

AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation

Jeongsoo Choi*, Se Jin Park*, Minsu Kim*, and Yong Man Ro (* equally contributed)

CVPR 2024

Exploring Phonetic Context-Aware Lip-Sync for Talking Face Generation

Se Jin Park, Minsu Kim, Jeongsoo Choi, and Yong Man Ro

ICASSP 2024

Improving Open Set Recognition via Visual Prompts Distilled from Common-Sense Knowledge

Seongyeop Kim, Hyung-Il Kim, and Yong Man Ro

AAAI 2024

DiffV2S: Diffusion-based Video-to-Speech Synthesis with Vision-guided Speaker Embedding

Jeongsoo Choi*, Joanna Hong*, and Yong Man Ro (* equally contributed)

ICCV 2023

Mitigating Dataset Bias in Image Captioning through CLIP Confounder-free Captioning Network

YeonJu Kim, Junho Kim, Byung-Kwan Lee, Sebin Shin, and Yong Man Ro

ICIP 2023

Watch or Listen: Robust Audio-Visual Speech Recognition with Visual Corruption Modeling and Reliability Scoring

Joanna Hong*, Minsu Kim*, Jeongsoo Choi, and Yong Man Ro (* equally contributed)

CVPR 2023

Lip-to-speech Synthesis in the Wild with Multi-task Learning

Minsu Kim*, Joanna Hong*, and Yong Man Ro (* equally contributed)

ICASSP 2023

VisageSynTalk: Unseen Speaker Video to Speech Synthesis via Speech Visage Feature Selection

Joanna Hong, Minsu Kim, Yong Man Ro

ECCV 2022

Weakly Paired Associative Learning for Sound-Image Representation

Sangmin Lee, Hyung-Il Kim, Yong Man Ro

CVPR 2022

Masking Adversarial Damage: Finding Adversarial Saliency for Robust and Sparse Network

Byung-Kwan Lee, Junho Kim, Yong Man Ro

CVPR 2022

Towards Versatile Pedestrian Detector with Multisensory Matching/ Multispectral Recalling

Jung Uk Kim, Sungjune Park, Yong Man Ro

AAAI 2022

Distinguishing Homophenes using Multi-head Visual-Audio Memory for Lip Reading

Minsu Kim, Jeong Hun Yeo, Yong Man Ro

AAAI 2022

SyncTalkFace: Talking Face Generation with Precise Lip-Syncing via Audio-Lip Memory

Se Jin Park, Minsu Kim, Joanna Hong, Jeongsoo Choi, Yong Man Ro

AAAI 2022

Lip to Speech Synthesis with Visual Context Attentional GAN

Minsu Kim, Joanna Hong, Yong Man Ro

NeurIPS 2021

Distilling Robust and Non-Robust Features in Adversarial Examples

Junho Kim, Byung-Kwan Lee, Yong Man Ro

NeurIPS 2021

Multi-modality Associative Bridging through Memory: Speech Sound Recollected from Face Video

Minsu Kim, Joanna Hong, Se Jin Park, Yong Man Ro

ICCV 2021

Robust Small-scale Pedestrian Detection with Cued Recall via Memory Learning

Jung Uk Kim, Sungjune Park, Yong Man Ro

ICCV 2021

Video Prediction Recalling Long-term Motion Context via Memory Alignment Learning

Sangmin Lee, Hak Gu Kim, Dae Hwi Choi, Hyung-Il Kim, Yong Man Ro

CVPR 2021

Video-Based Facial Expression Recognition with Appearance-Suppressed Dynamic Features for On-the-Fly Prediction

Wissam J. Baddar, Sangmin Lee, Yong Man Ro

IEEE Transactions on Affective Computing 2019

ICADX: Interpretable Computer Aided Diagnosis of Breast Masses

Seong Tae Kim, Hakmin Lee, Hak Gu Kim, Yong Man Ro

Medical Imaging 2018

Facial Expression Based Face Identification

Seong Tae Kim, Yong Man Ro

ICIP 2018

Ultra Fast CGH Calculation using Sparse FFT

Hak Gu Kim, Yong Man Ro

Optics Express 2017

Deep Learning based Recognition: DeepSensus, deep facial expression recognition

Wissam J. Baddar, Daehoe Kim, Yong Man Ro

MMM 2017

Free-view Generation for 3D Displays

Hak Gu Kim, Yong Man Ro

IEEE TCSVT 2016

Automatically Masking Faces for Privacy Protection First and Recognizing Enrolled Faces Later

2018.08.08

Measure of Visual Discomfort While Watching 3D TV

2018.08.08

Emotion TV: Emotion Measure While Watching TV Contents

2018.08.08

S3D Quality Analyzer

2018.08.08

Automatic Privacy Protection in Surveillance (Face Masking) and Real application to ATM Surveillance

2018.07.18

Facial Expression Recognition in Real-world Situation

2018.07.18

Digital Breast Tomosynthesis (DBT) Computer-Aided Detection (CAD)

2018.07.18
