X2-VLM: All-in-One Pre-Trained Model for Vision-Language Tasks

Article Properties
Journal Categories
Science
Mathematics
Instruments and machines
Electronic computers
Computer science
Technology
Electrical engineering
Electronics
Nuclear engineering
Electric apparatus and materials
Electric circuits
Electric networks
Engineering (General)
Civil engineering (General)
Mechanical engineering and machinery
References
Title (Publication Date)
ViLT: Vision-and-language transformer without convolution or region supervision (2021)
ViLBERT: Pretraining task-agnostic visiolinguistic representations for vision-and-language tasks (2019)
Multi-grained vision language pre-training: Aligning texts with visual concepts
OFA: Unifying architectures, tasks, and modalities through a simple sequence-to-sequence learning framework
Faster R-CNN: Towards real-time object detection with region proposal networks