A paper by MCDS grad Siddha Ganju was recently accepted for publication by the Conference on Computer Vision and Pattern Recognition (CVPR 2017), one of the most prestigious annual conferences in the field of computer vision.
The paper, titled “What’s in a Question: Using Visual Questions as a Form of Supervision,” falls under Visual Question Answering (VQA), a subfield at the intersection of Natural Language Processing (NLP) and Computer Vision (CV). VQA systems answer questions posed in natural language by extracting information from images. Because it requires both NLP and CV techniques, VQA is considered a “hard task” in Artificial Intelligence.
Ganju’s paper focuses on extracting useful information from the question itself and on fine-tuning vision models. Using these techniques, Ganju and her team achieved a 7.1% improvement on the standard VQA benchmark.
Ganju credited MCDS Director and Professor Eric Nyberg’s Intelligent Information Systems course with first drawing her attention to how little research had been done at the intersection of language and images. She also emphasized the support of her co-authors, Professor Olga Russakovsky and Professor Abhinav Gupta, both faculty members in the Robotics Institute. “I’d like to sincerely thank Olga and Abhinav for nurturing me to achieve something that seemed impossible at first,” Ganju said.
Though she acknowledged that the work was challenging (“I greatly underestimated the amount of work that goes into research,” she said), Ganju encouraged future students not to lose heart.
“Now that my publication is part of this bunch, I can empathize with a neophyte who is now in my shoes,” she said.
Billed as “the premier annual computer vision event,” CVPR has been held annually since 1985. This year’s conference will be held July 21-26 in Honolulu, HI.
Ganju’s paper can be viewed here.