Effective Sequence Models and Graph Neural Networks for Molecular Data Analysis

Yan, Chaochao

dc.contributor.advisor	Huang, Junzhou
dc.creator	Yan, Chaochao
dc.date.accessioned	2022-09-15T14:10:23Z
dc.date.available	2022-09-15T14:10:23Z
dc.date.created	2022-08
dc.date.issued	2022-08-16
dc.date.submitted	August 2022
dc.identifier.uri	http://hdl.handle.net/10106/30989
dc.description.abstract	Drug discovery is the process of discovering new candidate medications. New drugs are continually developed by the pharmaceutical industry to address increasing medical needs. Drug discovery involves a series of processes including target identification and validation, hit identification, lead generation and optimization, and finally the identification of a candidate for further development. Development further includes optimization of the chemical synthesis and formulation, toxicological studies in animals, clinical trials, and eventually regulatory approval. Both discovery and development are time-consuming and costly. Computer-aided drug discovery relies on modern computers to model drug molecules, which can speed up drug discovery and reduce costs. In this dissertation, we investigate two representative applications of drug discovery: molecule generation and retrosynthesis prediction. Since molecules can be represented as either sequences or graphs, different machine learning models (sequence models and graph neural networks) can be adapted for molecular modelling. With the rapid development of machine learning, many research works have tried to apply machine learning models to drug discovery. However, these methods are not efficient and effective enough for real-world applications. We propose to improve the efficiency of modern machine learning models for these drug discovery applications. In particular, we propose new techniques to improve current sequence models for molecule generation and graph models for retrosynthesis prediction. Extensive experiments demonstrate the efficiency and effectiveness of our methods. We first investigate variational autoencoder models for molecule sequence generation and propose a simple and effective solution to their posterior collapse problem. We then study retrosynthesis prediction and propose both template-free and template-based methods to overcome the disadvantages of existing methods.
dc.format.mimetype	application/pdf
dc.language.iso	en_US
dc.subject	Graph neural networks
dc.subject	Sequence models
dc.subject	Molecule generation
dc.subject	Retrosynthesis prediction
dc.title	Effective Sequence Models and Graph Neural Networks for Molecular Data Analysis
dc.type	Thesis
dc.degree.department	Computer Science and Engineering
dc.degree.name	Doctor of Philosophy in Computer Science
dc.date.updated	2022-09-15T14:10:23Z
thesis.degree.department	Computer Science and Engineering
thesis.degree.grantor	The University of Texas at Arlington
thesis.degree.level	Doctoral
thesis.degree.name	Doctor of Philosophy in Computer Science
dc.type.material	text

Files in this item

Name: YAN-DISSERTATION-2022.pdf
Size: 3.171Mb
Format: PDF
