Type of Material: | Thesis |
Title: | Design and Development of Enhanced Multi modal Deep Learning Frameworks for Visual Question Answering VQA |
Researcher: | Manmadhan, Sruthy |
Guide: | Kovoor, Binsu C |
Department: | Department of Information Technology |
Publisher: | Cochin University of Science & Technology, Cochin |
Place: | Cochin |
Year: | 2022 |
Language: | English |
Subject: | Computer Vision | Engineering and Technology | Information Technology | Natural Language Processing (NLP) | Visual Question Answering | Computer Science and Information Technology |
Dissertation/Thesis Note: | PhD; Department of Information Technology, Cochin University of Science & Technology, Cochin, Cochin; 2022 |
Fulltext: | Shodhganga |
000 | 00000ntm a2200000ua 4500 | |
001 | 456404 | |
003 | IN-AhILN | |
005 | 2024-10-09 16:39:39 | |
008 | __ | 241009t2022||||ii#||||g|m||||||||||eng|| |
035 | __ | |a(IN-AhILN)th_456404 |
040 | __ | |aCUST_682022|dIN-AhILN |
041 | __ | |aeng |
100 | __ | |aManmadhan, Sruthy|eResearcher |
110 | __ | |aDepartment of Information Technology|bCochin University of Science & Technology, Cochin|dCochin|ein|0U-0253 |
245 | __ | |aDesign and Development of Enhanced Multi modal Deep Learning Frameworks for Visual Question Answering VQA |
260 | __ | |aCochin|bCochin University of Science & Technology, Cochin|c2022 |
300 | __ | |axvi,240|dDVD |
502 | __ | |cDepartment of Information Technology, Cochin University of Science & Technology, Cochin, Cochin|d2022|bPhD |
518 | __ | |d2023|oDate of Award |
518 | __ | |oDate of Registration|d2017 |
520 | __ | |aThis thesis studies a multi-modal AI task called Visual Question Answering (VQA). It spans two different areas of computer science research: Computer Vision (CV) and Natural Language Processing (NLP). Due to its expansive set of applications, including assistance to visually impaired people and surveillance data analysis, many researchers have been attracted to this AI-complete task over the last few years. Most of the existing works have given attention to the multi-modal feature fusion phase of VQA, ignoring the effect of individual input features. Thus, despite rapid improvements in VQA algorithm efficiency, there is still a substantial gap between the best methods and humans. The proposed research aims to design and develop deep learning models for the AI-complete task of Visual Question Answering with enhanced multi-modal representations, thereby reducing the gap between human and machine intelligence. The proposed research focuses on each task in the established three-phase pipeline of VQA: image and question
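The abstract refers to the established three-phase VQA pipeline (feature extraction for image and question, multi-modal fusion, answer prediction). A minimal, self-contained sketch of that pipeline is given below; the toy featurizers, element-wise-product fusion, and nearest-prototype classifier are hypothetical stand-ins for illustration, not the models developed in the thesis.

```python
# Illustrative three-phase VQA pipeline sketch.
# Phase 1: independent featurization of each modality;
# Phase 2: multi-modal fusion; Phase 3: answer prediction.
# All featurizers here are deliberately trivial placeholders.

def image_features(pixels, dim=8):
    """Phase 1a: fold a flat list of pixel intensities into a
    fixed-size, sum-normalized vector (stand-in for a CNN encoder)."""
    vec = [0.0] * dim
    for i, p in enumerate(pixels):
        vec[i % dim] += p
    total = sum(vec) or 1.0
    return [v / total for v in vec]

def question_features(question, vocab):
    """Phase 1b: bag-of-words count vector over a fixed vocabulary
    (stand-in for an RNN/transformer question encoder)."""
    words = question.lower().split()
    return [float(words.count(w)) for w in vocab]

def fuse(img_vec, q_vec):
    """Phase 2: element-wise product fusion, a common VQA baseline."""
    n = min(len(img_vec), len(q_vec))
    return [img_vec[i] * q_vec[i] for i in range(n)]

def predict(fused, answer_prototypes):
    """Phase 3: choose the answer whose prototype vector is nearest
    (stand-in for a learned answer classifier)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(answer_prototypes, key=lambda ans: sq_dist(fused, answer_prototypes[ans]))
```

In a real system each phase would be a trained neural module; the point of the sketch is only the data flow, and it mirrors the abstract's observation that the quality of the phase-1 representations bounds what fusion can recover.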
650 | __ | |aComputer Science and Information Technology|2UGC |
650 | __ | |aEngineering and Technology|2AIU |
653 | __ | |aComputer Vision |
653 | __ | |aEngineering and Technology |
653 | __ | |aInformation Technology |
653 | __ | |aNatural Language Processing (NLP) |
653 | __ | |aVisual Question Answering |
700 | __ | |eGuide|aKovoor, Binsu C |
856 | __ | |uhttp://shodhganga.inflibnet.ac.in/handle/10603/510334|yShodhganga |
905 | __ | |afromsg |