Title: Design and Development of Enhanced Multi-modal Deep Learning Frameworks for Visual Question Answering (VQA)

Type of Material: Thesis
Title: Design and Development of Enhanced Multi-modal Deep Learning Frameworks for Visual Question Answering (VQA)
Researcher: Manmadhan, Sruthy
Guide: Kovoor, Binsu C
Department: Department of Information Technology
Publisher: Cochin University of Science & Technology, Cochin
Place: Cochin
Year: 2022
Language: English
Subject: Computer Vision
Engineering and Technology
Information Technology
Natural Language Processing (NLP)
Visual Question Answering
Computer Science and Information Technology
Dissertation/Thesis Note: PhD; Department of Information Technology, Cochin University of Science & Technology, Cochin; 2022
Fulltext: Shodhganga

00000000ntm a2200000ua 4500
001456404
003IN-AhILN
0052024-10-09 16:39:39
008__241009t2022||||ii#||||g|m||||||||||eng||
035__|a(IN-AhILN)th_456404
040__|aCUST_682022|dIN-AhILN
041__|aeng
100__|aManmadhan, Sruthy|eResearcher
110__|aDepartment of Information Technology|bCochin University of Science & Technology, Cochin|dCochin|ein|0U-0253
245__|aDesign and Development of Enhanced Multi modal Deep Learning Frameworks for Visual Question Answering VQA
260__|aCochin|bCochin University of Science & Technology, Cochin|c2022
300__|axvi,240|dDVD
502__|cDepartment of Information Technology, Cochin University of Science & Technology, Cochin, Cochin|d2022|bPhD
518__|d2023|oDate of Award
518__|oDate of Registration|d2017
520__|aThis thesis studies a multi-modal AI task called Visual Question Answering (VQA). It spans two different areas of computer science research: Computer Vision (CV) and Natural Language Processing (NLP). Due to its expansive set of applications, including assistance to visually impaired people and surveillance data analysis, many researchers have been attracted to this AI-complete task over the last few years. Most existing works have concentrated on the multi-modal feature fusion phase of VQA, ignoring the effect of the individual input features. Thus, despite rapid improvements in VQA algorithm efficiency, there is still a substantial gap between the best methods and humans. The proposed research aims to design and develop deep learning models for the AI-complete task of Visual Question Answering with enhanced multi-modal representations, thereby reducing the gap between human and machine intelligence. The proposed research focuses on each task in the established three-phase pipeline of VQA: image and question representation, multi-modal feature fusion, and answer prediction.
650__|aComputer Science and Information Technology|2UGC
650__|aEngineering and Technology|2AIU
653__|aComputer Vision
653__|aEngineering and Technology
653__|aInformation Technology
653__|aNatural Language Processing (NLP)
653__|aVisual Question Answering
700__|eGuide|aKovoor, Binsu C
856__|uhttp://shodhganga.inflibnet.ac.in/handle/10603/510334|yShodhganga
905__|afromsg
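
Note: The abstract above refers to the established three-phase VQA pipeline (image and question representation, multi-modal fusion, answer prediction). The PyTorch sketch below is only a minimal, illustrative instance of that generic pipeline; all module choices and dimensions (GRU question encoder, pre-extracted 2048-d image features, element-wise-product fusion, answer classification over a fixed answer set) are assumptions for illustration and do not reproduce the enhanced frameworks proposed in the thesis.

# Minimal, illustrative sketch of the generic three-phase VQA pipeline:
# (1) image/question representation -> (2) multi-modal fusion -> (3) answer prediction.
# All dimensions and module choices are assumptions, not the thesis's models.
import torch
import torch.nn as nn


class SimpleVQA(nn.Module):
    def __init__(self, vocab_size=10000, embed_dim=300, hidden_dim=512,
                 img_feat_dim=2048, fused_dim=1024, num_answers=3000):
        super().__init__()
        # Phase 1a: question representation (word embedding + GRU encoder)
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.q_encoder = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        # Phase 1b: image representation (assumes pre-extracted pooled CNN
        # features, e.g. 2048-d, projected into a common space)
        self.img_proj = nn.Linear(img_feat_dim, fused_dim)
        self.q_proj = nn.Linear(hidden_dim, fused_dim)
        # Phase 3: answer prediction as classification over frequent answers
        self.classifier = nn.Sequential(
            nn.ReLU(),
            nn.Linear(fused_dim, num_answers),
        )

    def forward(self, img_feats, question_tokens):
        # img_feats: (B, img_feat_dim); question_tokens: (B, T) token ids
        _, h_n = self.q_encoder(self.embed(question_tokens))
        q_repr = self.q_proj(h_n.squeeze(0))   # (B, fused_dim)
        v_repr = self.img_proj(img_feats)      # (B, fused_dim)
        fused = q_repr * v_repr                # Phase 2: element-wise fusion
        return self.classifier(fused)          # (B, num_answers) logits


# Usage with random tensors standing in for real features and token ids.
model = SimpleVQA()
logits = model(torch.randn(4, 2048), torch.randint(1, 10000, (4, 14)))
print(logits.shape)  # torch.Size([4, 3000])

The element-wise product used here is a common simple fusion baseline; the thesis, by contrast, targets enhanced individual input representations and fusion beyond such baselines.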
