Type of Material: | Thesis |
Title: | Design and Development of Enhanced Multi modal Deep Learning Frameworks for Visual Question Answering VQA |
Researcher: | Manmadhan, Sruthy |
Guide: | Kovoor, Binsu C |
Department: | Department of Information Technology |
Publisher: | Cochin University of Science & Technology, Cochin |
Place: | Cochin |
Year: | 2022 |
Language: | English |
Subject: | Computer Vision | Engineering and Technology | Information Technology | Natural Language Processing (NLP) | Visual Question Answering | Computer Science and Information Technology |
Dissertation/Thesis Note: | PhD; Department of Information Technology, Cochin University of Science & Technology, Cochin, Cochin; 2022 |
Fulltext: | Shodhganga |
000 | 00000ntm a2200000ua 4500 | |
001 | 456404 | |
003 | IN-AhILN | |
005 | 2024-10-09 16:39:39 | |
008 | __ | 241009t2022||||ii#||||g|m||||||||||eng|| |
035 | __ | |a(IN-AhILN)th_456404 |
040 | __ | |aCUST_682022|dIN-AhILN |
041 | __ | |aeng |
100 | __ | |aManmadhan, Sruthy|eResearcher |
110 | __ | |aDepartment of Information Technology|bCochin University of Science & Technology, Cochin|dCochin|ein|0U-0253 |
245 | __ | |aDesign and Development of Enhanced Multi modal Deep Learning Frameworks for Visual Question Answering VQA |
260 | __ | |aCochin|bCochin University of Science & Technology, Cochin|c2022 |
300 | __ | |axvi,240|dDVD |
502 | __ | |cDepartment of Information Technology, Cochin University of Science & Technology, Cochin, Cochin|d2022|bPhD |
518 | __ | |d2023|oDate of Award |
518 | __ | |oDate of Registration|d2017 |
520 | __ | |aThis thesis studies a multi-modal AI task called Visual Question Answering (VQA). It spans two different areas of computer science research: Computer Vision (CV) and Natural Language Processing (NLP). Due to its expansive set of applications, including assistance to visually impaired people and surveillance data analysis, many researchers have been attracted to this AI-complete task over the last few years. Most of the existing works have given attention to the multi-modal feature fusion phase of VQA, ignoring the effect of individual input features. Thus, despite rapid improvements in VQA algorithm efficiency, there is still a substantial gap between the best methods and humans. The proposed research aims to design and develop deep learning models for the AI-complete task of Visual Question Answering with enhanced multi-modal representations, thereby reducing the gap between human and machine intelligence. The proposed research focuses on each task in the established three-phase pipeline of VQA: image and question
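The abstract refers to the established three-phase VQA pipeline (feature extraction for image and question, multi-modal fusion, answer prediction). A minimal, self-contained sketch of that pipeline is given below; the toy featurizers, element-wise-product fusion, and nearest-prototype classifier are hypothetical stand-ins for illustration, not the models developed in the thesis.

```python
# Illustrative three-phase VQA pipeline sketch.
# Phase 1: independent featurization of each modality;
# Phase 2: multi-modal fusion; Phase 3: answer prediction.
# All featurizers here are deliberately trivial placeholders.

def image_features(pixels, dim=8):
    """Phase 1a: fold a flat list of pixel intensities into a
    fixed-size, sum-normalized vector (stand-in for a CNN encoder)."""
    vec = [0.0] * dim
    for i, p in enumerate(pixels):
        vec[i % dim] += p
    total = sum(vec) or 1.0
    return [v / total for v in vec]

def question_features(question, vocab):
    """Phase 1b: bag-of-words count vector over a fixed vocabulary
    (stand-in for an RNN/transformer question encoder)."""
    words = question.lower().split()
    return [float(words.count(w)) for w in vocab]

def fuse(img_vec, q_vec):
    """Phase 2: element-wise product fusion, a common VQA baseline."""
    n = min(len(img_vec), len(q_vec))
    return [img_vec[i] * q_vec[i] for i in range(n)]

def predict(fused, answer_prototypes):
    """Phase 3: choose the answer whose prototype vector is nearest
    (stand-in for a learned answer classifier)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(answer_prototypes, key=lambda ans: sq_dist(fused, answer_prototypes[ans]))
```

In a real system each phase would be a trained neural module; the point of the sketch is only the data flow, and it mirrors the abstract's observation that the quality of the phase-1 representations bounds what fusion can recover.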
650 | __ | |aComputer Science and Information Technology|2UGC |
650 | __ | |aEngineering and Technology|2AIU |
653 | __ | |aComputer Vision |
653 | __ | |aEngineering and Technology |
653 | __ | |aInformation Technology |
653 | __ | |aNatural Language Processing (NLP) |
653 | __ | |aVisual Question Answering |
700 | __ | |eGuide|aKovoor, Binsu C |
856 | __ | |uhttp://shodhganga.inflibnet.ac.in/handle/10603/510334|yShodhganga |
905 | __ | |afromsg |