PointHop: An Explainable Machine Learning Method for Point Cloud Classification
Zhang, Min, You, Haoxuan, Kadam, Pranav, Liu, Shan, Kuo, C.-C. Jay
Published in arXiv.org (16.12.2019)
Paper
Journal Article
Understanding ME? Multimodal Evaluation for Fine-grained Visual Commonsense
Wang, Zhecan, You, Haoxuan, He, Yicheng, Li, Wenhao, Chang, Kai-Wei, Chang, Shih-Fu
Published in arXiv.org (23.10.2023)
Paper
CoBIT: A Contrastive Bi-directional Image-Text Generation Model
You, Haoxuan, Guo, Mandy, Wang, Zhecan, Chang, Kai-Wei, Baldridge, Jason, Yu, Jiahui
Published in arXiv.org (23.03.2023)
Paper
MM-Ego: Towards Building Egocentric Multimodal LLMs
Ye, Hanrong, Zhang, Haotian, Daxberger, Erik, Chen, Lin, Lin, Zongyu, Li, Yanghao, Zhang, Bowen, You, Haoxuan, Xu, Dan, Gan, Zhe, Lu, Jiasen, Yang, Yinfei
Published in arXiv.org (09.10.2024)
Paper
Find Someone Who: Visual Commonsense Understanding in Human-Centric Grounding
You, Haoxuan, Sun, Rui, Wang, Zhecan, Chang, Kai-Wei, Chang, Shih-Fu
Published in arXiv.org (14.12.2022)
Paper
Ferret: Refer and Ground Anything Anywhere at Any Granularity
You, Haoxuan, Zhang, Haotian, Gan, Zhe, Du, Xianzhi, Zhang, Bowen, Wang, Zirui, Cao, Liangliang, Chang, Shih-Fu, Yang, Yinfei
Published in arXiv.org (11.10.2023)
Paper
Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models
Zhang, Haotian, You, Haoxuan, Dufter, Philipp, Zhang, Bowen, Chen, Chen, Chen, Hong-You, Fu, Tsu-Jui, Wang, William Yang, Chang, Shih-Fu, Gan, Zhe, Yang, Yinfei
Published in arXiv.org (11.04.2024)
Paper
JourneyBench: A Challenging One-Stop Vision-Language Understanding Benchmark of Generated Images
Wang, Zhecan, Liu, Junzhang, Tang, Chia-Wei, Alomari, Hani, Sivakumar, Anushka, Sun, Rui, Li, Wenhao, Atabuzzaman, Md, Ayyubi, Hammad, You, Haoxuan, Ishmam, Alvi, Chang, Kai-Wei, Chang, Shih-Fu, Thomas, Chris
Published in arXiv.org (25.09.2024)
Paper
Graph-MLP: Node Classification without Message Passing in Graph
Hu, Yang, You, Haoxuan, Wang, Zhecan, Wang, Zhicheng, Zhou, Erjin, Gao, Yue
Published in arXiv.org (08.06.2021)
Paper
Multi-modality Latent Interaction Network for Visual Question Answering
Gao, Peng, You, Haoxuan, Zhang, Zhanpeng, Wang, Xiaogang, Li, Hongsheng
Published in arXiv.org (10.08.2019)
Paper
CLIP-TD: CLIP Targeted Distillation for Vision-Language Tasks
Wang, Zhecan, Codella, Noel, Chen, Yen-Chun, Zhou, Luowei, Yang, Jianwei, Dai, Xiyang, Xiao, Bin, You, Haoxuan, Chang, Shih-Fu, Lu, Yuan
Published in arXiv.org (28.12.2022)
Paper