Applications of deep learning in large-scale object detection and semantic segmentation

Xiang, Wei

dc.contributor.advisor	Athitsos, Vassilis
dc.creator	Xiang, Wei
dc.date.accessioned	2019-02-27T19:04:57Z
dc.date.available	2019-02-27T19:04:57Z
dc.date.created	2018-05
dc.date.issued	2018-08-13
dc.date.submitted	May 2018
dc.identifier.uri	http://hdl.handle.net/10106/27836
dc.description.abstract	With the massive storage of multimedia data and the increasing computational power of mobile devices, developing scalable computer vision applications has become the primary motivation for both the research and industrial communities. Among these applications, object detection and semantic segmentation are two of the most popular topics and, in addition, serve as fundamental components of many computer vision systems on platforms such as mobile, healthcare, and autonomous driving. Inspired by this current and foreseeable trend, this thesis focuses on developing object detection and semantic segmentation models that are both effective and efficient, using large-scale, publicly available data sets sourced for various applications. In the last several years, object detection and semantic segmentation have received considerable attention in the literature and have been significantly advanced by the emergence of deep learning methods. In particular, by applying Convolutional Neural Networks (CNNs), researchers have leveraged unsupervised features in modeling, which has greatly simplified the tasks of classification and regression compared to relying solely on the hand-crafted features of traditional approaches. In object detection, however, many open research problems remain, such as integrating contextual information into existing models and the missing relationship between proposal scales and receptive field sizes for different CNNs. In this thesis, we study this relationship extensively and further demonstrate that our statistical results can be used as a guideline to design new detection models both heuristically and efficiently, with improved detection accuracy, particularly for small objects. 
In semantic segmentation, we investigate many of the state-of-the-art methods and find that current research has largely focused on using complicated backbones together with popular meta-architectures and designs, which in turn leads to overfitting and makes the models unsuitable for real-time tasks. To overcome this issue, we propose the Turbo Unified Network (ThunderNet), which builds on a minimal backbone followed by a pyramid pooling module and a customized, two-level lightweight decoder. Our experimental results show that ThunderNet remains one of the fastest models currently available while achieving accuracy comparable to a majority of methods in the literature. We also test ThunderNet on a GPU-powered embedded platform, the NVIDIA Jetson TX2; the results indicate that ThunderNet is sufficiently fast and accurate to meet the demands of embedded systems. Finally, this thesis also surveys joint calibration methods for RGB-D sensors. We summarize the related work and then present our quantitative evaluation results.
dc.format.mimetype	application/pdf
dc.language.iso	en_US
dc.subject	Computer vision
dc.subject	Object detection
dc.subject	Semantic segmentation
dc.subject	RGB-D calibration
dc.title	Applications of deep learning in large-scale object detection and semantic segmentation
dc.type	Thesis
dc.degree.department	Computer Science and Engineering
dc.degree.name	Doctor of Philosophy in Computer Science
dc.date.updated	2019-02-27T19:04:57Z
thesis.degree.department	Computer Science and Engineering
thesis.degree.grantor	The University of Texas at Arlington
thesis.degree.level	Doctoral
thesis.degree.name	Doctor of Philosophy in Computer Science
dc.type.material	text

Files in this item

Name: XIANG-DISSERTATION-2018.pdf
Size: 14.91Mb
Format: PDF
