EVALUATION OF A FACTUAL CLAIM CLASSIFIER WITH AND WITHOUT USING ENTITIES AS FEATURES

Syed, Abu Ayub Ansari

dc.contributor.advisor	Li, Chengkai
dc.creator	Syed, Abu Ayub Ansari
dc.date.accessioned	2017-10-02T14:47:32Z
dc.date.available	2017-10-02T14:47:32Z
dc.date.created	2017-08
dc.date.issued	2017-08-14
dc.date.submitted	August 2017
dc.identifier.uri	http://hdl.handle.net/10106/26988
dc.description.abstract	Fact-checking in real time for events such as presidential debates is a challenging task. Fact-checking systems must classify facts, identify topics, and perform related tasks with high accuracy. The first task in fact-checking is to determine whether a sentence is check-worthy. The UTA IDIR Lab has deployed an automated fact-checking system named ClaimBuster, whose core functionality is identifying check-worthy factual sentences. Named entities are an important component of any textual data. To use them, they must be linked to labels such as person, location, and organization. If we want automated systems to read and understand natural language as we do, they must recognize the named entities mentioned in the text. The ClaimBuster project categorizes the sentences of the presidential debates into three types: check-worthy factual sentences (CFS), non-factual sentences (NFS), and unimportant factual sentences (UFS). This categorization lets us frame the supervised classification problem as a three-class problem (or as a two-class problem, by merging NFS and UFS). To identify check-worthy factual claims, ClaimBuster employs named entities as a feature along with sentiment, length, words (W), and part-of-speech (POS) tags in its classification models. In this work, I have evaluated three classification algorithms: the Naïve Bayes Classifier (NBC), the Support Vector Machine (SVM), and the Random Forest Classifier (RFC). The evaluation mainly compares the performance of these classifiers with and without named entities as a feature. I have also analyzed the mistakes that the classifiers make by comparing two sets of features at a time.
The analysis therefore consists of 18 experiments, covering three classifiers, two classification types, and three feature-set comparisons. The results show that named entities contribute very little to the classifier, and that their contribution is overshadowed by better-performing features such as the part-of-speech (POS) tags.
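The count of 18 experiments follows from the abstract's three factors (3 classifiers × 2 classification types × 3 feature-set comparisons). A minimal sketch of that grid and of the NFS/UFS merge is given below. The abstract does not name the three feature-set comparisons, so the comparison names here are placeholders, and the merged label name `NON-CFS` is likewise an assumption made for illustration, not the thesis's own terminology.

```python
from itertools import product

# Classifier abbreviations as given in the abstract.
CLASSIFIERS = ("NBC", "SVM", "RFC")

# The two problem framings described in the abstract.
CLASSIFICATION_TYPES = ("three-class", "two-class")

# Placeholder names: the abstract does not say which feature sets are compared.
FEATURE_COMPARISONS = ("comparison-1", "comparison-2", "comparison-3")


def merge_to_two_class(label):
    """Collapse NFS and UFS into one class, keeping CFS as-is.

    "NON-CFS" is a hypothetical name for the merged class.
    """
    return "CFS" if label == "CFS" else "NON-CFS"


def experiment_grid():
    """Enumerate every (classifier, classification type, comparison) combination."""
    return list(product(CLASSIFIERS, CLASSIFICATION_TYPES, FEATURE_COMPARISONS))


if __name__ == "__main__":
    grid = experiment_grid()
    print(len(grid), "experiments")
    for classifier, kind, comparison in grid:
        print(classifier, kind, comparison)
```

Enumerating the grid this way only illustrates how the experiment count arises; it does not reproduce the thesis's training or evaluation procedure.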
dc.format.mimetype	application/pdf
dc.language.iso	en_US
dc.subject	Named entity
dc.subject	Features
dc.subject	Classifiers
dc.title	EVALUATION OF A FACTUAL CLAIM CLASSIFIER WITH AND WITHOUT USING ENTITIES AS FEATURES
dc.type	Thesis
dc.degree.department	Computer Science and Engineering
dc.degree.name	Master of Science in Computer Science
dc.date.updated	2017-10-02T14:48:36Z
thesis.degree.department	Computer Science and Engineering
thesis.degree.grantor	The University of Texas at Arlington
thesis.degree.level	Masters
thesis.degree.name	Master of Science in Computer Science
dc.type.material	text

Files in this item

Name: SYED-THESIS-2017.pdf
Size: 1.709Mb
Format: PDF
