Notice

[#219]   2020-12-03  [AAAI 2021]   Visual Comfort Aware-Reinforcement Learning for Depth Adjustment of Stereoscopic 3D Images (by Hak Gu Kim) has been accepted to AAAI 2021

Title: Visual Comfort Aware-Reinforcement Learning for Depth Adjustment of Stereoscopic 3D Images

Authors: Hak Gu Kim, Minho Park, Sangmin Lee, Seongyeop Kim, Yong Man Ro

 

Depth adjustment aims to enhance the visual experience of stereoscopic 3D (S3D) images, which involves improving both visual comfort and depth perception. For a human expert, depth adjustment is a sequence of iterative decisions: the expert repeatedly adjusts the depth until satisfied with both the level of visual comfort and the perceived depth. In this work, we present a novel deep reinforcement learning (DRL)-based approach for depth adjustment, named VCA-RL (Visual Comfort Aware Reinforcement Learning), to explicitly model this sequential decision making in depth editing operations. We formulate the depth adjustment process as a Markov decision process whose actions are camera movement operations that control the distance between the left and right cameras. Our agent is trained under the guidance of an objective visual comfort assessment metric to learn the optimal sequence of camera movement actions with respect to the perceptual aspects of stereoscopic viewing. With extensive experiments and user studies, we show the effectiveness of our VCA-RL model on three different S3D databases.
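
The sequential decision process lends itself to a compact sketch. Below is a toy tabular Q-learning loop over baseline-adjustment actions; the comfort and depth scores, action set, and state discretization are all hypothetical stand-ins for the paper's learned components, not the authors' implementation.

```python
import random
from collections import defaultdict

ACTIONS = [-1.0, -0.5, 0.0, 0.5, 1.0]  # hypothetical camera-baseline shifts (cm)

def comfort(b):
    """Stand-in for the objective visual comfort assessment metric."""
    return -abs(b - 3.0)  # pretend a 3.0 cm baseline is most comfortable

def depth_quality(b):
    """Stand-in for perceived-depth quality (larger baseline, more depth)."""
    return min(b, 5.0) / 5.0

def step(b, a):
    nb = min(max(b + a, 0.0), 8.0)
    return nb, comfort(nb) + depth_quality(nb)

Q = defaultdict(float)
for episode in range(500):
    b = random.uniform(0.0, 8.0)
    for _ in range(10):
        s = round(b, 1)
        a = random.choice(ACTIONS) if random.random() < 0.2 else \
            max(ACTIONS, key=lambda a_: Q[(s, a_)])
        nb, r = step(b, a)
        best_next = max(Q[(round(nb, 1), a_)] for a_ in ACTIONS)
        Q[(s, a)] += 0.1 * (r + 0.9 * best_next - Q[(s, a)])
        b = nb

# Greedy rollout standing in for the trained agent's adjustment sequence.
b = 7.0
for t in range(5):
    a = max(ACTIONS, key=lambda a_: Q[(round(b, 1), a_)])
    b, r = step(b, a)
    print(f"t={t}: action={a:+.1f}, baseline={b:.1f}, reward={r:.2f}")
```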

[#218]   2020-12-03  [AAAI 2021]   Towards a Better Understanding of VR Sickness: Physical Symptom Prediction for VR Contents (by Hak Gu Kim) has been accepted to AAAI 2021

Title: Towards a Better Understanding of VR Sickness: Physical Symptom Prediction for VR Contents

Authors: Hak Gu Kim, Sangmin Lee, Seongyeop Kim, Heoun-taek Lim, and Yong Man Ro

 

We address the black-box issue of VR sickness assessment (VRSA) by evaluating the levels of the physical symptoms of VR sickness. Even for VR contents that induce similar overall levels of VR sickness, the physical symptoms can vary depending on the characteristics of the contents. Most existing VRSA methods have focused on assessing an overall VR sickness score. To better understand VR sickness, it is necessary to predict and report the levels of its major symptoms rather than only the overall degree of sickness. In this paper, we predict the degrees of the main physical symptoms affecting the overall degree of VR sickness: disorientation, nausea, and oculomotor discomfort. In addition, we introduce a new large-scale dataset for VRSA that includes 360-degree videos with various frame rates, physiological signals, and subjective scores. On the VRSA benchmark and our newly collected dataset, our approach shows the potential not only to achieve the highest correlation with subjective scores, but also to better identify which symptoms are the main causes of VR sickness.
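
As a rough illustration of per-symptom prediction, here is a minimal multi-head regressor with one output per symptom; the encoder, feature dimensions, and input representation are placeholders we made up, not the paper's architecture.

```python
import torch
import torch.nn as nn

class SymptomPredictor(nn.Module):
    def __init__(self, feat_dim=128):
        super().__init__()
        # Placeholder encoder standing in for the paper's content/physiology encoders.
        self.encoder = nn.Sequential(nn.Linear(1024, feat_dim), nn.ReLU())
        self.heads = nn.ModuleDict({
            s: nn.Linear(feat_dim, 1)
            for s in ("disorientation", "nausea", "oculomotor")
        })

    def forward(self, x):
        h = self.encoder(x)
        # One score per symptom; an overall sickness score can be derived from these.
        return {name: head(h).squeeze(-1) for name, head in self.heads.items()}

model = SymptomPredictor()
scores = model(torch.randn(4, 1024))  # batch of 4 pre-extracted content features
print({k: v.shape for k, v in scores.items()})
```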

[#217]   2020-11-23  [IEEE CSVT]   CUA Loss: Class Uncertainty-Aware Gradient Modulation for Robust Object Detection (by Jung Uk Kim) has been accepted to IEEE Trans. on Circuits and Systems for Video Technology

Title: CUA Loss: Class Uncertainty-Aware Gradient Modulation for Robust Object Detection

Authors: Jung Uk Kim, Seong Tae Kim, Hong Joo Lee, Sangmin Lee, and Yong Man Ro


Recently, a wide range of research on object detection has shown breakthrough performance. However, in challenging environments, such as occlusion and small-object cases, object detectors still produce inaccurate or erroneous predictions. To cope with such conditions, most existing methods have proposed loss functions that guide the object detector by modulating the magnitude of its loss. However, when modulating the loss function, they depend heavily on the classification score of the object detector, and it is well known that deep neural networks tend to be overconfident in their predictions. In this paper, to alleviate the problem of object detectors relying heavily on their own predictions during training, we devise a novel loss function called class uncertainty-aware (CUA) loss. CUA loss considers predictive ambiguity as well as the classification score when modulating the loss: in addition to the classification score, it increases the loss gradient when the object detector outputs an uncertain prediction. Therefore, object detectors trained with CUA loss cope effectively with challenging environments where predictions are uncertain. With comprehensive experiments on three public datasets (i.e., PASCAL VOC, MS COCO, and Berkeley DeepDrive), we verified that our CUA loss enhances the accuracy of object detectors and outperforms previous state-of-the-art loss functions.
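
To make the gradient-modulation idea concrete, here is a toy loss in the spirit of CUA loss: a focal-style term down-weights confident samples while an entropy term up-weights ambiguous ones. The exact weighting below is our assumption, not the paper's formulation.

```python
import torch
import torch.nn.functional as F

def cua_like_loss(logits, targets, gamma=2.0, beta=1.0):
    log_probs = F.log_softmax(logits, dim=1)
    probs = log_probs.exp()
    ce = F.nll_loss(log_probs, targets, reduction="none")      # per-sample CE
    pt = probs.gather(1, targets.unsqueeze(1)).squeeze(1)      # true-class score
    entropy = -(probs * log_probs).sum(dim=1)                  # predictive ambiguity
    weight = (1 - pt) ** gamma * (1 + beta * entropy)          # up-weight uncertain samples
    return (weight * ce).mean()

loss = cua_like_loss(torch.randn(8, 20), torch.randint(0, 20, (8,)))
print(loss.item())
```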

[#216]   2020-10-16  [MMM 2021]   Robust Multispectral Pedestrian Detection via Uncertainty-Aware Cross-Modal Learning (by Sungjune Park) has been accepted to MMM 2021

Title: Robust Multispectral Pedestrian Detection via Uncertainty-Aware Cross-Modal Learning

Authors: Sungjune Park, Jung Uk Kim, Yeon Gyun Kim, Sang-Keun Moon and Yong Man Ro

 

With the development of deep neural networks, multispectral pedestrian detection has received great attention for exploiting the complementary properties of multiple modalities (e.g., color-visible and thermal). Previous works usually rely on network prediction scores when combining complementary modal information. However, it is widely known that deep neural networks are often overconfident, which limits performance. In this paper, we propose a novel uncertainty-aware cross-modal learning approach to alleviate this problem in multispectral pedestrian detection. First, we extract object region uncertainty, which represents the reliability of object region features in each modality. Then, we combine the modal object region features in consideration of this uncertainty. Second, we guide the classifier of the detection framework with soft target labels so that it is aware of the level of object region uncertainty in each modality. To verify the effectiveness of the proposed methods, we conduct extensive experiments with various detection frameworks on two public datasets (i.e., the KAIST Multispectral Pedestrian Dataset and CVC-14).
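
A minimal sketch of uncertainty-weighted fusion of two modal region features follows; the softmax-over-negated-uncertainty weighting is our illustrative choice, and the uncertainty values are assumed to be produced elsewhere in the detector.

```python
import torch

def fuse(feat_rgb, feat_thermal, u_rgb, u_thermal):
    # Lower uncertainty -> larger fusion weight (softmax over negated uncertainty).
    w = torch.softmax(torch.stack([-u_rgb, -u_thermal]), dim=0)
    return w[0].unsqueeze(-1) * feat_rgb + w[1].unsqueeze(-1) * feat_thermal

feat_rgb, feat_thermal = torch.randn(5, 256), torch.randn(5, 256)  # 5 region proposals
u_rgb, u_thermal = torch.rand(5), torch.rand(5)                    # per-region uncertainty
fused = fuse(feat_rgb, feat_thermal, u_rgb, u_thermal)
print(fused.shape)  # torch.Size([5, 256])
```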

[#215]   2020-10-16  [ICPR 2021]   Unsupervised Disentangling of Viewpoint and Residues Variations by Substituting Representations for Robust Face Recognition (by Minsu Kim) has been accepted to ICPR 2021

Title: Unsupervised Disentangling of Viewpoint and Residues Variations by Substituting Representations for Robust Face Recognition

Authors: Minsu Kim, Joanna Hong, Junho Kim, Hong Joo Lee, Yong Man Ro

 

It is well known that identity-unrelated variations (e.g., viewpoint or illumination) degrade the performance of face recognition methods. To handle this challenge, robust methods for disentangling identity and view representations have drawn attention in the machine learning area. However, existing methods learn discriminative features under manual supervision of such factors of variation. In this paper, we propose a novel disentangling framework through modeling three representations of identity, viewpoint, and residues (i.e., factors unrelated to identity and pose), which does not require supervision of the variations. By jointly modeling the three representations, we enhance the disentanglement of each representation and achieve robust face recognition performance. Further, the learned viewpoint representation can be utilized for pose estimation or for editing the pose of a facial image. Extensive quantitative and qualitative evaluations verify the effectiveness of our proposed method in disentangling the identity, viewpoint, and residues of facial images.
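
The substitution idea can be sketched in a few lines: encode two faces into identity/viewpoint/residue codes, swap the viewpoint codes, and decode. The linear encoder and decoder below are untrained placeholders; only the code swap reflects the method described above.

```python
import torch
import torch.nn as nn

enc = nn.Linear(512, 3 * 64)   # placeholder: one vector split into three codes
dec = nn.Linear(3 * 64, 512)   # placeholder decoder

def encode(x):
    ident, view, residue = enc(x).chunk(3, dim=-1)
    return ident, view, residue

x_a, x_b = torch.randn(1, 512), torch.randn(1, 512)  # two face features
id_a, view_a, res_a = encode(x_a)
id_b, view_b, res_b = encode(x_b)

# Reconstruct face A with face B's viewpoint: identity should be preserved.
x_a_in_b_pose = dec(torch.cat([id_a, view_b, res_a], dim=-1))
print(x_a_in_b_pose.shape)
```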

[#214]   2020-09-15  Spring 2021 Student Recruitment

We are recruiting Ph.D. students, M.S. students, and industry-sponsored scholarship students (KEPSI, EPSS, LGenius) for Spring 2021.

(http://admission.kaist.ac.kr/graduate/)

 

Research areas:

 - Deep learning (XAI, adversarial attack/defense, multimodal)

 - Machine learning with visual data

 - Computer vision (object segmentation/detection/classification)

 - Multimodal (vision-language) deep learning

 - Medical imaging / defense security

 

Ongoing research projects:

 - Explainable (interpretable) deep learning

 - Adversarial attack/defense in deep learning

 - Deep learning algorithms (detection/classification/segmentation) in computer vision

 - Multimodal deep learning

 

Recent lab research results - LINK

Recent overseas conference presentations on deep learning by our M.S./Ph.D. students - LINK

Recent international journal publications by our M.S./Ph.D. students - LINK

Please refer to the links above.

 

For admission inquiries, please email Prof. Yong Man Ro (ymro@kaist.ac.kr) to arrange a preliminary meeting.


[#213]   2020-08-24  [IEEE Access]   Dual-Branch Structured De-Striping Convolution Network Using Parametric Noise Model (by Jongho Lee) has been accepted to IEEE Access

Title: Dual-Branch Structured De-Striping Convolution Network Using Parametric Noise Model

Authors: Jongho Lee and Yong Man Ro

 

The stripe fixed-pattern noise (FPN) of infrared images significantly corrupts image quality, so that infrared imaging systems suffer from degraded observability and detectability during operation. Therefore, FPN de-striping, which eliminates stripe patterns without substantial loss of image information, remains a core technology in infrared image processing. In this paper, we propose a dual-branch structured FPN de-striping deep convolutional neural network that effectively extracts the structural features of the FPN while preserving the image details of a single infrared image. In addition, we establish a parametric FPN model through infrared image diagnosis experiments based on the physical principles of the infrared detector's signal response. We optimize each parameter of the FPN model using measured data acquired over a wide range of detector temperatures. Further, we generate the training data with our FPN model to ensure stable learning performance against various stripe patterns. We performed comparative experiments with state-of-the-art methods on both artificially corrupted and real corrupted infrared images, and our proposed method achieved outstanding de-striping results in both qualitative and quantitative evaluations.
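
As an illustration of parametric stripe-noise synthesis for training data, the toy generator below applies per-column gain and offset errors to a clean frame; the linear gain/offset form and the parameter ranges are our assumptions, not the calibrated model from the paper.

```python
import numpy as np

def add_stripe_fpn(img, gain_std=0.02, offset_std=2.0, rng=None):
    rng = rng or np.random.default_rng()
    _, w = img.shape
    gain = 1.0 + rng.normal(0.0, gain_std, size=(1, w))   # per-column gain error
    offset = rng.normal(0.0, offset_std, size=(1, w))     # per-column offset error
    return img * gain + offset                            # vertical stripe pattern

clean = np.full((240, 320), 100.0)   # stand-in for a clean IR frame
striped = add_stripe_fpn(clean)
# Column-wise structure: variation across columns is large, within columns ~0.
print(striped.std(axis=1).mean(), striped.std(axis=0).mean())
```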

[#212]   2020-07-31  [BMVC 2020]   Robust Ensemble Model Training via Random Layer Sampling Against Adversarial Attack (by Hakmin Lee and Hong Joo Lee) has been accepted to BMVC 2020

Title: Robust Ensemble Model Training via Random Layer Sampling Against Adversarial Attack

Authors: Hakmin Lee*, Hong Joo Lee*, Seong Tae Kim, and Yong Man Ro

* Both authors contributed equally to this work.


Deep neural networks have achieved substantial success in several computer vision areas, but they are vulnerable to adversarial examples that are imperceptible to humans, an important issue for security and medical applications. In this paper, we propose an ensemble model training framework with random layer sampling to improve the robustness of deep neural networks. In the proposed training framework, we generate various sampled models through random layer sampling and update the weights of each sampled model. After the ensemble models are trained, random layer sampling can hide the gradients efficiently and thereby avoid gradient-based attacks. To evaluate the proposed method, comprehensive comparative experiments were conducted on three datasets. Experimental results show that the proposed method improves adversarial robustness.
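
A simplified sketch of random layer sampling: each forward pass draws a random sub-network by keeping or skipping residual-style layers, so different sampled models are trained and gradients vary between passes. The architecture and keep probability are illustrative only.

```python
import random
import torch
import torch.nn as nn

class RLSBlock(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.layer = nn.Sequential(nn.Linear(dim, dim), nn.ReLU())

    def forward(self, x, keep=True):
        return x + self.layer(x) if keep else x   # identity when the layer is skipped

blocks = nn.ModuleList([RLSBlock(64) for _ in range(6)])

def sampled_forward(x, p_keep=0.8):
    mask = [random.random() < p_keep for _ in blocks]  # one sampled model
    for block, keep in zip(blocks, mask):
        x = block(x, keep)
    return x

out = sampled_forward(torch.randn(2, 64))
print(out.shape)
```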

[#211]   2020-07-03   [ECCV 2020]   SACA Net: Cybersickness Assessment of Individual Viewers for VR Content via Graph-based Symptom Relation Embedding (by Sangmin Lee) has been accepted to ECCV 2020

Title: SACA Net: Cybersickness Assessment of Individual Viewers for VR Content via Graph-based Symptom Relation Embedding

Authors: Sangmin Lee, Jung Uk Kim, Hak Gu Kim, Seongyeop Kim, and Yong Man Ro


Recently, cybersickness assessment for VR content has been in demand to deal with viewing safety issues. Assessing the physical symptoms of individual viewers is challenging but important for providing detailed and personalized guides for viewing safety. In this paper, we propose a novel symptom-aware cybersickness assessment network (SACA Net) that quantifies physical symptom levels to assess the cybersickness of individual viewers. SACA Net is designed to utilize the relational characteristics of symptoms for complementary effects among relevant symptoms. The proposed network consists of three main parts: a stimulus symptom context guider, a physiological symptom guider, and a symptom relation embedder. The stimulus symptom context guider and the physiological symptom guider extract symptom features from VR content and human physiology, respectively. The symptom relation embedder refines the stimulus-response symptom features to effectively predict cybersickness by embedding relational characteristics with a graph formulation. For validation, we utilize two public 360-degree video datasets that contain cybersickness scores and physiological signals. Experimental results show that the proposed method is effective in predicting human cybersickness from physical symptoms. Further, the latent relations among symptoms are interpretable by analyzing the relational weights in the proposed network.
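
The relation-embedding step can be pictured as a graph layer over symptom nodes with a learned adjacency; the single dense layer below is our sketch, not SACA Net itself.

```python
import torch
import torch.nn as nn

class SymptomGraphLayer(nn.Module):
    def __init__(self, n_symptoms, dim):
        super().__init__()
        self.adj = nn.Parameter(torch.randn(n_symptoms, n_symptoms))  # relational weights
        self.proj = nn.Linear(dim, dim)

    def forward(self, nodes):                   # nodes: (batch, n_symptoms, dim)
        rel = torch.softmax(self.adj, dim=-1)   # normalized symptom-to-symptom relations
        mixed = torch.einsum("ij,bjd->bid", rel, nodes)
        return torch.relu(self.proj(mixed)) + nodes

layer = SymptomGraphLayer(n_symptoms=16, dim=32)
refined = layer(torch.randn(4, 16, 32))
print(refined.shape)
# After training, inspecting softmax(layer.adj) exposes latent symptom relations.
```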

[#210]   2020-05-18   [IEEE ICIP]   8 papers have been accepted (Jung Uk, Seongyeop, Minsu, Joanna, Junho, Dae Hwi, Byeong Cheon) to IEEE ICIP 2020

1. Authors: Jung Uk Kim*, Sungjune Park*, Yong Man Ro (* equally contributed first authors)

Title: 'Towards Human-Like Interpretable Object Detection Via Spatial Relation Encoding'


2. Authors: Eun Sung Kim*, Jung Uk Kim*, Sangmin Lee, Sang-Keun Moon, Yong Man Ro (* equally contributed first authors)

Title: 'Class Incremental Learning With Task-Selection'


3. Authors: Seongyeop Kim, Sangmin Lee, Yong Man Ro

Title: 'Estimating Vr Sickness Caused By Camera Shake In Vr Videography'


4. Authors: Minsu Kim, Hong Joo Lee, Sangmin Lee, Yong Man Ro

Title: 'Robust Video Facial Authentication With Unsupervised Mode Disentanglement'


5. Authors: Joanna Hong, Jung Uk Kim, Sangmin Lee, Yong Man Ro

Title: 'Comprehensive Facial Expression Synthesis Using Human-Interpretable Language'


6. Authors: Junho Kim, Minsu Kim, Jung Uk Kim, Hong Joo Lee, Sangmin Lee, Joanna Hong, Yong Man Ro

Title: 'Learning Style Correlation For Elaborate Few-Shot Classification'


7. Authors: Dae Hwi Choi, Hong Joo Lee, Sangmin Lee, Jung Uk Kim, Yong Man Ro

Title: 'Fake Video Detection With Certainty-Based Attention Network'


8. Authors: Byeong Cheon Kim, Jung Uk Kim, Hakmin Lee, Yong Man Ro

Title: 'Revisiting Role Of Autoencoders In Adversarial Settings'

[#209]   2020-04-01  Fall 2020 Student Recruitment (government-funded, KAIST, industry-sponsored)

We are recruiting Ph.D. students (KAIST scholarship), M.S. students (government-funded and KAIST scholarship), and industry-sponsored scholarship students (KEPSI, EPSS, LGenius) for Fall 2020.

(http://admission.kaist.ac.kr/graduate/)


Research areas:

 - Deep learning (XAI, adversarial defense, multimodal)

 - Machine learning with visual data

 - Computer vision (object segmentation/detection/classification)

 - Multimodal (vision-language) deep learning

 - Medical imaging / defense security


Ongoing research projects:

 - Explainable (interpretable) deep learning

 - Adversarial defense in deep learning

 - Deep learning algorithms (detection/classification/segmentation) in computer vision

 - Multimodal deep learning


Recent lab research results - LINK

Recent overseas conference presentations on deep learning by our M.S./Ph.D. students - LINK

Recent international journal publications by our M.S./Ph.D. students - LINK

Please refer to the links above.


Prospective students (government-funded, KAIST scholarship, or advisor-preselected KAIST scholarship) should email Prof. Yong Man Ro (ymro@kaist.ac.kr) to arrange a preliminary meeting.

[#208]   2020-03-05  [IEEE CSVT]   Robust Video Frame Interpolation (by Minho, Hak Gu, and Sangmin) has been accepted to IEEE CSVT

Title: Robust Video Frame Interpolation with Exceptional Motion Map

Authors: Minho Park, Hak Gu Kim, Sangmin Lee, and Yong Man Ro


Video frame interpolation has increasingly attracted attention in the computer vision and video processing fields. When the motion patterns in a video are complex, large, and non-linear (exceptional motion), the generated intermediate frame is blurred and likely to contain large artifacts. In this paper, we propose a novel video frame interpolation method that takes exceptional motion patterns into account through an exceptional motion map, which contains the location and intensity of the exceptional motion. The proposed method consists of three parts: optical flow-based frame interpolation, exceptional motion detection, and frame refinement. The optical flow-based frame interpolation predicts an optical flow used to synthesize a pre-generated intermediate frame. The exceptional motion detection locates the position and intensity of complex and large motion from the current frame and the previous frame sequence. The frame refinement then focuses on the exceptional motion regions of the pre-generated intermediate frame by using the exceptional motion map. The proposed approach is therefore robust against exceptional motion, including complex and large motion. Experimental results showed that the proposed method achieved high performance on various public video datasets, and especially on videos with exceptional motion patterns.
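
The role of the exceptional motion map in refinement can be conveyed with a map-gated blend; the frames and map below are random placeholders, and only the weighting scheme reflects the idea (our reading, not the paper's exact refinement network).

```python
import torch

def refine_with_motion_map(pre_frame, refined_frame, motion_map):
    # motion_map in [0, 1]: 1 = exceptional motion, 0 = ordinary motion.
    return motion_map * refined_frame + (1 - motion_map) * pre_frame

pre = torch.rand(1, 3, 64, 64)     # intermediate frame from flow-based synthesis
ref = torch.rand(1, 3, 64, 64)     # output of the refinement network (placeholder)
mmap = torch.rand(1, 1, 64, 64)    # exceptional motion map (location + intensity)
out = refine_with_motion_map(pre, ref, mmap)
print(out.shape)
```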

[#207]   2020-02-27  [CVPR 2020]   Structure Boundary Preserving Segmentation (by Hong Joo Lee) has been accepted to CVPR 2020

Title: Structure Boundary Preserving Segmentation for Medical Image with Ambiguous Boundary

Authors: Hong Joo Lee, Jung Uk Kim, Sangmin Lee, Hak Gu Kim and Yong Man Ro


In this paper, we propose a novel image segmentation method to tackle two critical problems of medical images: (i) the ambiguity of structure boundaries in the medical image domain and (ii) the uncertainty of the segmented region without specialized domain knowledge. To solve these two problems in automatic medical segmentation, we propose a novel structure boundary preserving segmentation framework. To this end, a boundary key point selection algorithm is proposed, which estimates the key points on the structural boundary of the target object. Then, a boundary preserving block (BPB) with the boundary key point map is applied to predict the structure boundary of the target object. Further, to embed experts' knowledge in the fully automatic segmentation, we propose a novel shape boundary-aware evaluator (SBE) that uses the ground-truth structure information indicated by experts. The proposed SBE gives feedback to the segmentation network based on the structure boundary key points. The proposed method is general and flexible enough to be built on top of any deep learning-based segmentation network. We demonstrate that the proposed method surpasses state-of-the-art segmentation networks and improves the accuracy of three different segmentation network models on different types of medical image datasets.
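
A toy version of boundary key point selection, assuming a binary ground-truth mask: extract boundary pixels and keep K of them at evenly spaced indices. The paper's selection algorithm is more elaborate; this only conveys the notion of key points on a structure boundary.

```python
import numpy as np

def boundary_keypoints(mask, k=8):
    # Boundary = foreground pixels with at least one background 4-neighbor.
    padded = np.pad(mask, 1)
    nb_min = np.minimum.reduce([padded[:-2, 1:-1], padded[2:, 1:-1],
                                padded[1:-1, :-2], padded[1:-1, 2:]])
    boundary = np.argwhere((mask == 1) & (nb_min == 0))
    idx = np.linspace(0, len(boundary) - 1, num=k).astype(int)
    return boundary[idx]

mask = np.zeros((32, 32), dtype=int)
mask[8:24, 10:22] = 1                    # a rectangular "organ"
print(boundary_keypoints(mask, k=8))     # 8 (row, col) key points on its boundary
```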

[#206]   2020-01-28  [ICASSP 2020] Object detection considering classification and localization separation (by Jung Uk Kim) has been accepted to ICASSP 2020

Title: TOWARDS HIGH-PERFORMANCE OBJECT DETECTION: TASK-SPECIFIC DESIGN CONSIDERING CLASSIFICATION AND LOCALIZATION SEPARATION

Authors: Jung Uk Kim, Seong Tae Kim, Eun Sung Kim, Sang-Keun Moon, Yong Man Ro


Object detection performs two tasks (classification and localization) simultaneously. The two tasks share a similarity: both need robust features that effectively represent the visual appearance of objects. However, the two tasks also have different properties. First, classification mainly requires features from the discriminative parts of an object to determine the object category, whereas localization mainly requires features from the entire object region to draw a bounding box. Second, classification is translation invariant, whereas localization is translation variant. To increase the efficiency of object detection, it is necessary to design the network in consideration of the commonalities and differences between the two tasks. In this work, we modified the layers of existing object detection networks into three parts reflecting these characteristics: a lower-layer feature sharing part, a layer separation part, and a feature fusion part. As a result, the performance of the proposed method was noticeably improved by properly sharing, separating, and fusing the layers of existing object detection networks.
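
The three-part modification can be summarized in a skeleton head: shared lower layers, separated classification/localization branches, and a fusion step. Layer sizes and the fusion placement are our guesses; the paper modifies existing detectors rather than defining this exact module.

```python
import torch
import torch.nn as nn

class TaskSeparatedHead(nn.Module):
    def __init__(self, in_dim=256, n_classes=20):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU())   # feature sharing
        self.cls_branch = nn.Sequential(nn.Linear(256, 256), nn.ReLU())  # translation invariant
        self.loc_branch = nn.Sequential(nn.Linear(256, 256), nn.ReLU())  # translation variant
        self.fuse = nn.Linear(512, 256)                                  # feature fusion
        self.cls_out = nn.Linear(256, n_classes)
        self.box_out = nn.Linear(256, 4)

    def forward(self, roi_feat):
        h = self.shared(roi_feat)
        c, l = self.cls_branch(h), self.loc_branch(h)
        fused = torch.relu(self.fuse(torch.cat([c, l], dim=-1)))
        return self.cls_out(fused), self.box_out(fused)

head = TaskSeparatedHead()
cls_logits, boxes = head(torch.randn(10, 256))   # 10 region features
print(cls_logits.shape, boxes.shape)
```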

[#205]   2020-01-28   [ICASSP 2020]   Exceptional motion aware video frame interpolation (by Minho Park) has been accepted to ICASSP 2020

Title: VIDEO FRAME INTERPOLATION VIA EXCEPTIONAL MOTION-AWARE SYNTHESIS

Authors: Minho Park, Sangmin Lee, Yong Man Ro


In this paper, we propose a novel video frame interpolation method via exceptional motion-aware synthesis, in which an accurate optical flow can be estimated even under exceptional motion patterns. Specifically, we devise two deep learning modules: exceptional motion detection and frame interpolation with refined flow. The motion detection module detects the position and intensity of exceptional motion patterns in the current frame given the past frame sequence. The flow refinement module refines the pre-estimated optical flow for synthesizing the intermediate frame using the information about the exceptional motion. The proposed modules improve the quality of the synthesized intermediate frame by making the optical flow robust against exceptional motion cases. Experimental results showed that the proposed method outperforms state-of-the-art methods both qualitatively and quantitatively.
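
A conceptual, non-learned analogue of the exceptional motion detection module: flag pixels whose current flow magnitude deviates strongly from the statistics of past frames. The z-score rule is purely illustrative; the paper trains a detector instead.

```python
import numpy as np

def exceptional_motion_map(flow, past_flows, z_thresh=2.5):
    mags = [np.linalg.norm(f, axis=-1) for f in past_flows]   # past flow magnitudes
    mu, sigma = np.mean(mags, axis=0), np.std(mags, axis=0) + 1e-6
    z = (np.linalg.norm(flow, axis=-1) - mu) / sigma
    return np.clip(z / z_thresh, 0.0, 1.0)                    # intensity map in [0, 1]

past = [np.random.randn(64, 64, 2) for _ in range(4)]
current = np.random.randn(64, 64, 2) * 3.0                    # unusually large motion
print(exceptional_motion_map(current, past).mean())
```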

[#204]   2020-01-16   [MMM 2020]   Interactive VIdeo Search Tool (by Sungjune) has been published in the VBS of MMM 2020

Title: IVIST: Interactive VIdeo Search Tool in VBS 2020

Authors: Sungjune Park, Jaeyub Song, Minho Park, Yong Man Ro

 

This paper presents a new video retrieval tool, the Interactive VIdeo Search Tool (IVIST), which participates in the 2020 Video Browser Showdown (VBS). As a video retrieval tool, IVIST is equipped with practical, high-performing functionalities such as object detection, dominant-color finding, scene-text recognition, and text-image retrieval, built with various deep neural networks. With these functionalities, IVIST performs well in finding the videos users are looking for. Furthermore, thanks to its user-friendly interface, IVIST is easy to use even for novice users. Although IVIST was developed to participate in VBS, we hope that it will serve as a practical video retrieval tool in the future, dealing with actual video data on the Internet.
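
As an example of one listed functionality, dominant-color finding can be approximated with a small k-means over pixels; this is our illustration, since the paper does not spell out IVIST's implementation.

```python
import numpy as np

def dominant_colors(img, k=3, iters=10, rng=None):
    rng = rng or np.random.default_rng(0)
    pixels = img.reshape(-1, 3).astype(float)
    centers = pixels[rng.choice(len(pixels), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((pixels[:, None] - centers) ** 2).sum(-1), axis=1)
        centers = np.array([pixels[labels == i].mean(axis=0) if (labels == i).any()
                            else centers[i] for i in range(k)])
    return centers.astype(int)

img = np.random.randint(0, 256, size=(48, 64, 3))
print(dominant_colors(img))   # three RGB cluster centers
```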

[#203]   2020-01-16   [MMM 2020]   Facial Expression Sentence Generation (by Joanna) has been published in MMM 2020

Title: Face Tells Detailed Expression: Generating Comprehensive Facial Expression Sentence through Facial Action Units

Authors: Joanna Hong, Hong Joo Lee, Yelin Kim, Yong Man Ro

 

Human facial expression plays a key role in understanding social behavior. Many deep learning approaches address facial emotion recognition and automatic image captioning that consider human sentiments. However, most current deep learning models for facial expression analysis do not capture comprehensive, detailed information about a single face. In this paper, we introduce a text-based facial expression description built from several essential components that describe a facial expression comprehensively: gender, facial action units, and their corresponding intensities. We then propose a comprehensive facial expression sentence generation model along with a facial expression recognition model for a single facial image to verify the effectiveness of our text-based dataset. Experimental results show that the two proposed models support each other and improve each other's performance: the text-based facial expression description provides comprehensive semantic information to the facial emotion recognition model, while the visual information from the emotion recognition model guides the sentence generation to produce a proper, comprehensive description.
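
A toy illustration of composing an expression sentence from the three components (gender, action units, intensities); the AU names follow FACS, but the template wording is ours, not the paper's generation model.

```python
AU_NAMES = {1: "inner brow raiser", 6: "cheek raiser", 12: "lip corner puller"}
LEVELS = {1: "slightly", 3: "moderately", 5: "strongly"}

def describe(gender, aus):
    # aus: list of (action-unit id, intensity level) pairs.
    parts = [f"{LEVELS[lvl]} shows {AU_NAMES[au]}" for au, lvl in aus]
    return f"The {gender} " + ", and ".join(parts) + "."

print(describe("woman", [(6, 3), (12, 5)]))
# -> The woman moderately shows cheek raiser, and strongly shows lip corner puller.
```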