I am currently offering one or two positions on research projects in multi-modal learning for visual and multimedia understanding. In particular, we will work with the latest large language models and rethink vision-based tasks, including image understanding and video understanding. Each position earns credit for a 3-point research course per semester. The projects will start in the Summer 2023 semester. A senior researcher will also participate, and we will jointly provide guidance.
Preferred background/experience: PyTorch, Detectron2, experience with LLMs, and a fundamental understanding of optimization, Transformers and CNNs, and word embeddings.