Zhao, X., Xu, M., Silamu, W., & Li, Y. (2024). CLIP-Llama: A New Approach for Scene Text Recognition with a Pre-Trained Vision-Language Model and a Pre-Trained Language Model. Sensors (Basel, Switzerland), 24(22), 7371. https://doi.org/10.3390/s24227371
Chicago Style (17th ed.) Citation
Zhao, Xiaoqing, Miaomiao Xu, Wushour Silamu, and Yanbing Li. "CLIP-Llama: A New Approach for Scene Text Recognition with a Pre-Trained Vision-Language Model and a Pre-Trained Language Model." Sensors (Basel, Switzerland) 24, no. 22 (2024): 7371. https://doi.org/10.3390/s24227371.
MLA (9th ed.) Citation
Zhao, Xiaoqing, et al. "CLIP-Llama: A New Approach for Scene Text Recognition with a Pre-Trained Vision-Language Model and a Pre-Trained Language Model." Sensors (Basel, Switzerland), vol. 24, no. 22, 2024, p. 7371, https://doi.org/10.3390/s24227371.