When: June 20, 2023
Where: Vancouver, Canada
Foundation models have attracted great interest from both academia and industry. By its early definition, a foundation model is a large artificial intelligence model that is trained on a vast quantity of unlabeled data and can be adapted to a wide range of downstream tasks. Recent real-world applications further encourage using both labeled and unlabeled data, thereby generalizing the concept of the foundation model. This evolution is natural because, besides unlabeled data, many labeled datasets (from public or private sources) are large-scale and can also bring substantial benefit to downstream tasks. In this workshop, we advocate the generalized foundation model for two reasons: 1) by combining labeled and unlabeled data, it enlarges the potential benefit of large-scale pretraining, and 2) it is more flexible and efficient for downstream task adaptation. For example, UFO, a recent foundation model trained with labeled datasets, can be trimmed into a specific model for an already-seen sub-task without any adaptation cost.
Thus, in this workshop, we are interested in training the foundation model on both labeled and unlabeled data and evaluating its capacity on both seen and unseen downstream tasks. To this end, we will set up a two-phase competition consisting of seen and unseen downstream tasks. The first phase evaluates the deep representation ability of the foundation model on various seen tasks, e.g., face recognition, person / vehicle re-identification, and fine-grained image retrieval. This requires including multiple supervised deep representation learning tasks (besides the popular unsupervised pretraining tasks) when training the foundation model. Evaluating the already-trained foundation model on these same deep representation tasks reflects the multi-task learning effect, which is critical for large-scale and multi-source learning. The second phase evaluates the foundation model on various unseen downstream tasks, including detection (MS COCO), segmentation (ADE20K), and the Visual Task Adaptation Benchmark (VTAB). Note that each competitor is required to provide a single foundation model for both phases. Therefore, the foundation model should be competitive in both 1) direct deployment on the already-seen tasks and 2) adaptation to novel downstream tasks. These two capacities are of great value in real-world applications and are essential for the generalized foundation model.
This workshop particularly encourages exploring and investigating the mechanism of multi-task learning. The multi-task learning effect is critical for the generalized foundation model, because the model is trained with large-scale data from multiple sources and tasks. We consider a paper with insight into training strategy to be valuable, even if its result in the competition is low. Several existing works address the challenge of balancing multiple tasks. Kendall et al. propose a principled approach to tuning the weights of multiple loss functions by considering the homoscedastic uncertainty of each task. Dynamic Task Prioritization automatically prioritizes more difficult tasks by adaptively adjusting the mixing weight of each task's loss objective. Other works adopt gradient-based methods to combat the challenge. GradNorm automatically balances the training of different task losses in deep multi-task models by dynamically tuning their gradient magnitudes. Sener et al. explicitly cast multi-task learning as gradient-based multi-objective optimization, with the overall objective of finding a Pareto optimal solution that minimizes all task losses. Based on the observation that models with lower variance in the angles between task gradients perform better, Suteu et al. propose a novel gradient regularization that enforces nearly orthogonal gradients. To avoid interference between gradients from different losses, PCGrad projects a task's gradient onto the normal plane of the gradient of any other task with which it conflicts. All of this literature provides inspiration for the development of the foundation model.
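As an illustration of the gradient-based family mentioned above, the projection step used by PCGrad can be sketched in a few lines. This is a minimal sketch in plain Python, not a reference implementation: the function names (`dot`, `project_conflicting`, `combine`), the `seed` parameter, and the toy two-task gradients are assumptions made for this example, and a real training setup would operate on the flattened parameter gradients of each task loss.

```python
import random


def dot(u, v):
    """Inner product of two equal-length gradient vectors."""
    return sum(a * b for a, b in zip(u, v))


def project_conflicting(grads, seed=None):
    """Return a projected copy of each task gradient (PCGrad-style sketch).

    For task i, the other tasks are visited in random order. Whenever the
    current gradient has a negative inner product with another task's
    original gradient (a conflict), the conflicting component is removed by
    projecting onto the normal plane of that other gradient.
    """
    rng = random.Random(seed)  # hypothetical seed argument, for repeatability
    projected = []
    for i, g_i in enumerate(grads):
        g = list(g_i)
        others = [j for j in range(len(grads)) if j != i]
        rng.shuffle(others)
        for j in others:
            g_j = grads[j]
            inner = dot(g, g_j)
            norm_sq = dot(g_j, g_j)
            if inner < 0 and norm_sq > 0:
                scale = inner / norm_sq
                g = [a - scale * b for a, b in zip(g, g_j)]
        projected.append(g)
    return projected


def combine(grads):
    """Sum the per-task gradients into a single update direction."""
    return [sum(column) for column in zip(*grads)]


# Toy example: two tasks whose gradients conflict (inner product is -1).
task_grads = [[1.0, 0.0], [-1.0, 1.0]]
projected = project_conflicting(task_grads, seed=0)
update = combine(projected)
print(projected)  # [[0.5, 0.5], [0.0, 1.0]]
print(update)     # [0.5, 1.5]
```

In this toy case each projected gradient is orthogonal to the other task's original gradient, so neither task's update pushes directly against the other. Non-conflicting gradients are left unchanged.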
To sum up, this workshop advocates the development of the foundation model based on multi-source data (e.g., labeled and unlabeled data, different tasks). We will organize a competition to evaluate the foundation model on both seen and unseen downstream tasks, under both limited and unlimited model size conditions. We also encourage exploration and discussion of the multi-task learning mechanism.