
Second Workshop on Foundation Models

When: June 17, 2024 Where: Seattle, USA, Summit 434

Program

Keynote Speakers

Title: Should we be searching over Generative Neural Architectures?

Abstract: There has been impressive recent progress in unsupervised learning of Deep Nets and
Neural Architectures. In particular, we now have the ability to learn feature vector representations
which can exploit the enormous amount of unannotated data. But is this enough to overcome the
limitations of Deep Nets compared to the human visual system, namely their task-specific and
domain-specific nature and their difficulty in extending to out-of-distribution data and to tasks
they have not been trained to perform? We argue that generative models suggest a promising
mathematical strategy for overcoming these limitations, one which differs from discriminative methods
like Deep Nets, and we give examples of good performance on out-of-distribution data and on tasks that the
algorithms were not trained on.

Speaker Profile:
Alan Yuille is a Bloomberg Distinguished Professor of Cognitive Science and Computer Science at Johns Hopkins University.
He directs the research group on Compositional Cognition, Vision, and Learning. He is affiliated with the Center for Brains,
Minds and Machines, and the NSF Expedition in Computing, Visual Cortex On Silicon. Alan Yuille received the BA degree in
mathematics from the University of Cambridge in 1976. His PhD on theoretical physics, supervised by Prof. S.W. Hawking,
was approved in 1981. He was a research scientist in the Artificial Intelligence Laboratory at MIT and the Division of Applied
Sciences at Harvard University from 1982 to 1988. He served as an assistant and associate professor at Harvard until 1996.
He was a senior research scientist at the Smith-Kettlewell Eye Research Institute from 1996 to 2002. He was then a full professor
of Statistics at the University of California, Los Angeles, with joint appointments in computer science,
psychiatry, and psychology. He moved to Johns Hopkins University in January 2016. His research interests include computational
models of vision, mathematical models of cognition, medical image analysis, and artificial intelligence and neural networks.

Title: Neural Architecture Search

Abstract: Deep neural networks have achieved extraordinary success in recent years. However, finding
appropriate network architectures still involves extensive human effort and experience. As an alternative,
neural architecture search (NAS) was recently proposed to automatically discover suitable networks by searching
over a vast architecture space. It has rapidly become a research hotspot and achieved cutting-edge performance
in various computer vision tasks, ranging from image classification and segmentation to detection. Aiming at the
effectiveness and efficiency of neural architecture search, this talk briefly introduces the existing NAS methods
and covers some of the recent work and achievements of Professor Rongrong Ji’s research group.

Speaker Profile:
Rongrong Ji is a distinguished professor at Xiamen University and a recipient of the National Natural Science Fund for
Distinguished Young Scholars. His research falls in the fields of computer vision, multimedia analysis, and machine learning.
He has published 100+ papers in ACM/IEEE Transactions, including TPAMI and IJCV, as well as top-tier international conferences,
such as CVPR and NeurIPS. His publications have received over 10K citations on Google Scholar. He was the recipient of the first prize of
technology invention of the Ministry of Education in 2016, the first prize of the Fujian provincial science and technology award in
2018, and the science and technology award for youth of Fujian province in 2019. He has served as an area chair of top-tier international
conferences such as IEEE CVPR and ACM Multimedia. He is also the Vice Director of the Academic Working Committee of the Chinese
Society of Image and Graphics, and a member of the Artificial Intelligence Professional Construction Advisory Committee of the
Electronic Information Education Commission of the Ministry of Education.

Title: Dynamic Neural Networks

Abstract: In recent years, network architecture innovations have pushed forward the application of deep
learning in various areas. This talk will introduce a paradigm that improves the inference efficiency of deep
networks with dynamic architectures. Compared to the mainstream CNN backbones with static components,
dynamic models can change their depth, width, or parameters at the inference stage, conditioned on each input
sample, thus leading to substantially improved computational efficiency. The advantages of dynamic models
and possible future directions will be discussed.
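
As a rough illustration of input-conditioned depth, the sketch below runs a cascade of classifier stages and stops at the first stage whose prediction is confident enough. It is a toy example and not the speaker's method: the function names, the two hand-written stages, and the 0.9 threshold are all assumptions made for illustration.

```python
import math


def softmax(logits):
    """Convert a list of logits into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def early_exit_predict(x, stages, threshold=0.9):
    """Run stages in order and exit at the first confident prediction.

    Returns (predicted_class, number_of_stages_used). The last stage
    always answers, even if its confidence is below the threshold.
    """
    for depth, stage in enumerate(stages, start=1):
        probs = softmax(stage(x))
        confidence = max(probs)
        if confidence >= threshold or depth == len(stages):
            return probs.index(confidence), depth


# Hypothetical stages: each maps a scalar input to two class logits.
def shallow_stage(x):
    return [x, -x]


def deep_stage(x):
    return [3.0 * x, -3.0 * x]


stages = [shallow_stage, deep_stage]
easy = early_exit_predict(4.0, stages)  # confident at the first stage
hard = early_exit_predict(0.5, stages)  # needs the second stage
print(easy, hard)
```

In this toy setup the "easy" input uses one stage and the "hard" input uses two, which is the source of the computational savings the abstract describes: cost scales with the difficulty of each sample rather than being fixed.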

Speaker Profile:
Gao Huang is an Assistant Professor in the Department of Automation at Tsinghua University. Previously, he was a postdoc
researcher in the Department of Computer Science at Cornell University from 2015 to 2018. His research interests lie in
machine learning and computer vision, especially deep learning. He has authored about 50 papers in top-tier journals and
conferences (PAMI, CVPR, ICCV, ECCV, NeurIPS, ICML, ICLR, etc.), which have collected more than 16,000 citations. He is a recipient
of the CVPR Best Paper Award, for the invention of DenseNet.

Title: Deep (Convolution) Networks from First Principles

Abstract: In this talk, we offer an entirely “white box” interpretation of deep (convolution) networks from the
perspective of data compression (and group invariance). In particular, we show how modern deep layered
architectures, linear (convolution) operators and nonlinear activations, and even all parameters can be derived
from the principle of maximizing rate reduction (with group invariance). All layers, operators, and parameters
of the network are explicitly constructed via forward propagation, instead of learned via back propagation. All
components of the so-obtained network, called ReduNet, have precise optimization, geometric, and statistical
interpretations. There are also several nice surprises from this principled approach: it reveals a fundamental
tradeoff between invariance and sparsity for class separability; it reveals a fundamental connection between
deep networks and Fourier transform for group invariance – the computational advantage in the spectral
domain (why spiking neurons?); this approach also clarifies the mathematical role of forward propagation
(optimization) and backward propagation (variation). In particular, the so-obtained ReduNet is amenable to
fine-tuning via both forward and backward (stochastic) propagation, both for optimizing the same objective.
This is joint work with students Yaodong Yu, Ryan Chan, Haozhi Qi of Berkeley, Dr. Chong You now at Google
Research, and Professor John Wright of Columbia University.
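
For readers unfamiliar with the objective, the principle of maximizing rate reduction is commonly written as follows in the literature on this topic. The notation is an assumption, since the abstract does not fix any symbols: Z is a d x m matrix of m learned features, Pi_j is a diagonal m x m matrix encoding membership in class j, and epsilon is the coding precision. The talk itself may have used a different presentation.

```latex
% Assumed notation: Z \in \mathbb{R}^{d \times m}, \Pi = \{\Pi_j\}_{j=1}^{k}, \epsilon > 0.
\[
\Delta R(Z, \Pi, \epsilon)
  = \underbrace{\tfrac{1}{2}\log\det\!\Big(I + \tfrac{d}{m\epsilon^{2}}\, Z Z^{\top}\Big)}_{R(Z,\,\epsilon)}
  \;-\;
  \underbrace{\sum_{j=1}^{k} \tfrac{\operatorname{tr}(\Pi_j)}{2m}
      \log\det\!\Big(I + \tfrac{d}{\operatorname{tr}(\Pi_j)\,\epsilon^{2}}\, Z \Pi_j Z^{\top}\Big)}_{R_c(Z,\,\epsilon \mid \Pi)}
\]
```

The first term measures the coding rate of all features together and the second the average coding rate within each class, so maximizing the difference expands the whole feature set while compressing each class.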

Speaker Profile:
Yi Ma received the first prize of Excellent Student Scholarship from Tsinghua University in 1994 and the Regents Fellowship from
U.C. Berkeley from 1995 to 1996. His PhD research won the David Marr Best Paper Award with S. Soatto, J. Kosecka, and S. Sastry,
at the International Conference on Computer Vision (ICCV) in 1999. He also received honorable mention for the Longuet-Higgins
Best Paper Award with R. Vidal at the European Conference on Computer Vision (ECCV) in 2004, the Sang Uk Lee Best Student
Paper Award with his students Shankar Rao, Hossein Mobahi, and Allen Yang at the Asian Conference on Computer Vision (ACCV)
in 2009, and the second prize of the Best Paper Award of the IMA Journal on Information and Inference in 2015. Yi Ma was the
recipient of the Faculty Early Career Development (CAREER) Award from the National Science Foundation (NSF) in 2003. He was
also the recipient of the Young Investigator Program (YIP) Award from the Office of Naval Research (ONR) in 2005. He received
the Gold Star Award from Microsoft Corporation in 2009 and the Best Research Team of the Year Award from Microsoft Research
Asia in 2012. He has given over two dozen plenary talks at international conferences and workshops. He was on the Incomplete
List of Teachers Ranked as Excellent of the University of Illinois for Spring'01, Fall'02, and Spring'06. Yi Ma has been an IEEE Fellow since
2013, an ACM Fellow since 2017, and a SIAM Fellow since 2020. He has been ranked among the World's Highly Cited Researchers since 2016 by
Clarivate Analytics of Thomson Reuters and was ranked among the Top 50 of the World's Most Influential Authors in Computer Science by
Semantic Scholar, reported by Science Magazine in April 2016.

Title: Learning 3D environment representations through intelligent anticipation

Abstract: Embodied agents operating in unfamiliar indoor environments must explore efficiently and build
useful representations of the environment. We introduce the idea of predicting unobserved content in 3D
spaces to (1) learn agents that build maps efficiently and (2) learn transferrable representations that benefit
several downstream navigational tasks. First, we propose the idea of occupancy anticipation, where we infer
spatial occupancy for unobserved regions to map unfamiliar environments rapidly. For example, we can predict
whether there is a corridor outside the room and whether there is floor space behind the bed. Embodied
agents equipped with the ability to anticipate occupancy build 30% more accurate maps when compared to
prior work, and navigate more efficiently in the challenging Gibson and Matterport3D datasets. Our approach
won the 2020 Habitat PointNav Challenge. Next, we propose the self-supervised approach of environment
predictive coding to learn effective representations of observation sequences gathered by an embodied
agent. We learn these representations on video walkthroughs generated by other agents, and transfer the
representations to various geometric and semantic navigation tasks. Our approach improves the learning
efficiency of embodied agents by 2-4x compared to methods that only learn image-level representations,
and leads to better navigation performance.

Speaker Profile:
Kristen Grauman is a Full Professor in the Department of Computer Science at the University of Texas at Austin where she
leads the UT Computer Vision Group. Her research is in computer vision and machine learning. She is a Fellow of AAAI, an
Alfred P. Sloan Research Fellow, and a recipient of the Presidential Early Career Award for Scientists and Engineers, the 2013
Computers and Thought Award, and several best paper awards. Prof. Grauman serves as Associate Editor-in-Chief for the
IEEE Transactions on Pattern Analysis and Machine Intelligence. She was elected to the Academy of Distinguished Teachers
in 2017, and received her B.A. from Boston College and her Ph.D. from MIT. Within computer vision and machine learning,
Prof. Grauman's primary interests are visual recognition, image and video search, video analysis, first-person vision, embodied
and multi-modal perception, and interactive machine learning.

Title: Automated Architecture and Training Recipe Search of Computationally Efficient Deep Neural Networks

Speaker Profile:
Peter Vajda is a Research Scientist Manager at Facebook. He leads the Mobile Vision team's effort on efficient deep
learning for computer vision. From 2012 to 2014, he was a Visiting Assistant Professor in Professor Bernd Girod's group at
Stanford University, Stanford, USA. From September 2007 to January 2012, he was a research assistant and Ph.D. student in
Professor Ebrahimi's group at Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland.

Title: Examining Deep Neural Architectures in Practice

Abstract: The past decade has witnessed significant progress in deep learning. As probably the trickiest
hyperparameter in deep learning, the neural architecture has become the key to accelerating deep learning
computation. In this talk, I will introduce our recent work on neural architecture search. Beyond accuracy,
adversarial robustness is another important factor to be investigated. Further, rather than relying on a toy setting
for an efficient neural architecture, we need to keep the target of the searched neural architecture in mind. An
extension to graph neural architecture search is also included to broaden the boundary of the study of neural
architecture search.

Speaker Profile:
Chang Xu is a Senior Lecturer and ARC DECRA Fellow at the School of Computer Science, University of Sydney. He received the
Ph.D. degree from Peking University, China. His research interests lie in machine learning algorithms and related applications in
computer vision. He has published over 100 papers in prestigious journals and top-tier conferences. He has received several paper
awards, including the Distinguished Paper Award at IJCAI 2018. He has regularly served as a PC member or senior PC member for many
conferences, e.g. NeurIPS, ICML, ICLR, CVPR, ICCV, IJCAI and AAAI. He was recognized as a Top Ten Distinguished Senior PC
Member at IJCAI 2017.