Kenneth Wong, namely Wenbin Wang (王文彬), is currently a final-year Ph.D. student in the Key Lab. of Intelligent Information Processing (IIP) at the Institute of Computing Technology (ICT), Chinese Academy of Sciences (CAS), advised by Prof. Xilin Chen and Prof. Ruiping Wang. His research interests include, but are not limited to, 2D/3D scene understanding, object detection, scene graph generation, and image captioning. Before this, he received his B.Eng. degree in Computer Science and Technology from Nankai University (NKU, 2013 - 2017).
“Those times when you get up early and you work hard; those times when you stay up late and you work hard; those times when you don’t feel like working, you’re too tired, you don’t want to push yourself, but you do it anyway. That is actually the dream. That’s the dream.”
Ph.D. in Computer Vision / Artificial Intelligence, 2017 - present
Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
B.Eng. in Computer Science and Technology, 2013 - 2017
Nankai University, Tianjin, China
[Jul. 23, 2021]: One paper was accepted to ICCV 2021!
[Aug. 10, 2020]: The code of our HetH (ECCV 2020) was released.
[Jul. 3, 2020]: One paper was accepted to ECCV 2020!
[Jun. 15 ~ 19, 2019]: I attended CVPR 2019 in Long Beach, CA, USA.
[Mar. 1, 2019]: One paper was accepted to CVPR 2019!
[Sept. 1, 2017]: I joined the Key Lab. of IIP, ICT, CAS as a Ph.D. student. Go Bears!
If an image tells a story, the scene graph and the image caption are its most popular narrators. Generally, a scene graph tends to be an omniscient generalist, while an image caption is more of a specialist that outlines the gist. Many previous studies have found that, as a generalist, a scene graph is not sufficient to serve downstream advanced intelligent tasks unless it can filter out trivial content and noise. In this respect, the image caption is a good teacher. To this end, we let the scene graph borrow this ability from the image caption.
A scene graph aims to faithfully reveal humans’ perception of image content. When humans analyze a scene, they usually prefer to describe the image gist first, namely the major objects and key relations in a scene graph, which contain the essential image content. This inherent human perceptive habit implies that there is a hierarchical structure in humans’ preferences during the scene parsing procedure. Therefore, we argue that a desirable scene graph should also be hierarchically constructed, and we introduce a new scheme for modeling scene graphs.
Relationships are the core of a scene graph, but their prediction is far from satisfactory because of their complex visual diversity. To alleviate this problem, we treat a relationship as an abstract object and explore both its significant visual patterns and its contextual information, which are two key aspects of object recognition. Our observations on current datasets reveal that there are close associations among relationships. Therefore, inspired by the successful application of context to object-oriented tasks, we construct a context for relationships in which all of them are gathered, so that recognition can benefit from their associations.
Posters & Publications
Scikit-Learn, Pandas, NumPy, Matplotlib
Common Machine Learning Models
PyTorch, TensorFlow, Git, VS Code, Jupyter