dc.contributor.advisor: Huang, Junzhou
dc.creator: Yan, Chaochao
dc.date.accessioned: 2022-09-15T14:10:23Z
dc.date.available: 2022-09-15T14:10:23Z
dc.date.created: 2022-08
dc.date.issued: 2022-08-16
dc.date.submitted: August 2022
dc.identifier.uri: http://hdl.handle.net/10106/30989
dc.description.abstract: Drug discovery is the process of discovering new candidate medications. New drugs are continually developed by the pharmaceutical industry to address increasing medical needs. Drug discovery involves a series of steps including target identification and validation, hit identification, lead generation and optimization, and finally the identification of a candidate for further development. Development further includes optimization of the chemical synthesis and formulation, toxicological studies in animals, clinical trials, and eventually regulatory approval. Both the discovery and development stages are time-consuming and costly. Computer-aided drug discovery relies mainly on modern computers to model drug molecules, which can speed up the process of drug discovery and reduce costs. In this dissertation, we investigate two representative applications of drug discovery: molecule generation and retrosynthesis prediction. Since molecules can be represented as either sequences or graphs, different machine learning models (sequence models and graph neural networks) can be adapted for molecular modelling. With the rapid development of machine learning, many research works have tried to apply machine learning models to drug discovery. However, these methods are not yet efficient or effective enough for real-world applications. We propose to improve the efficiency of modern machine learning models for drug discovery applications. In particular, we propose new techniques to improve current sequence models for molecule generation and graph models for retrosynthesis prediction. Extensive experiments demonstrate the efficiency and effectiveness of our methods. We first investigate variational autoencoder models for molecule sequence generation and propose a simple and effective solution to the posterior collapse problem of these models. We then study retrosynthesis prediction and propose both template-free and template-based methods to overcome the disadvantages of existing methods.
dc.format.mimetype: application/pdf
dc.language.iso: en_US
dc.subject: Graph neural networks
dc.subject: Sequence models
dc.subject: Molecule generation
dc.subject: Retrosynthesis prediction
dc.title: Effective Sequence Models and Graph Neural Networks for Molecular Data Analysis
dc.type: Thesis
dc.degree.department: Computer Science and Engineering
dc.degree.name: Doctor of Philosophy in Computer Science
dc.date.updated: 2022-09-15T14:10:23Z
thesis.degree.department: Computer Science and Engineering
thesis.degree.grantor: The University of Texas at Arlington
thesis.degree.level: Doctoral
thesis.degree.name: Doctor of Philosophy in Computer Science
dc.type.material: text

