Show simple item record

dc.contributor.advisorDas, Gautam
dc.creatorRahman, Md Farhadur
dc.date.accessioned2019-02-06T21:26:03Z
dc.date.available2019-02-06T21:26:03Z
dc.date.created2018-08
dc.date.issued2019-02-06
dc.date.submittedAugust 2018
dc.identifier.urihttp://hdl.handle.net/10106/27663
dc.description.abstractIn recent years we have seen an increase in the popularity of many web applications. The functionality of these applications range from allowing users to interact using online social network, to assist users in their everyday activity such as selecting a hotel in an area, locating a nearby restaurant etc. Google Maps, WeChat, FourSquare, AirBnB, TripAdvisor, and Hotels.com are a few such examples. The backed database of these applications can be a rich source of information for the corresponding application domain. For example, using Google Maps a user can find the ratings, reviews, and price of a restaurant, using Zillow users compare the price distributions of houses in different areas of a city. The public query interfaces of these applications may be abstractly modeled as a kNN interface over the backend database that returns k results that are nearest to the user query, where k is typically a small number such as 10 or 50. Moreover, Access to the underlying database is further restricted by enforcing query rate limitation, i.e., they limit the number of requests from an IP address or API account for a certain time period. Because of these restrictions, it is quite impossible for a third-party to crawl the data from the backend database. In this dissertation, we present efficient techniques for exploratory analysis over web database. Specifically, we propose several algorithms that enable third-party applications to perform the analytics and mining of the backend database by using nothing but the restrictive, public-access, interface of the application. In addition, we also studied the problem of answering the exploratory queries with full access to the underlying dataset. First, we consider a special category of web application know as Location based services (LBS). In general, an LBS provides a public search interface over its backend database, taking as input a 2D query point and returning k tuples in the database that are closest to the query point. In this work, we propose several algorithms that enable approximate estimate of SUM and COUNT aggregates by using the public search interface provided by the LBS. Second, we enable density-based clustering over the backend database of an LBS using nothing but limited access to the kNN interface provided by the LBS. Our goal here is to mine from the LBS a cluster assignment function f(), such that for any tuple t in the database (which may or may not have been accessed), f() can produce the cluster assignment of t with high accuracy. Third, we investigate a novel problem on the privacy implications of database ranking, which has not been studied before. We show how privacy leakage (through the top-k interface) can be caused by a seemingly innocent design of the ranking function in ranked retrieval models. We introduce a comprehensive taxonomy of the problem space. Then, for each subspace of the problem, we develop a novel technique which either guarantees the successful inference of private attributes, or accomplishes the attack for a significant portion of real-world tuples. Finally, we study the problem of subspace skyline discovery over web database where the attributes are mostly categorical. Skyline queries are an effective tool in assisting users in data exploration. In accordance with common practice in traditional database query processing, we design solutions for two important practical instances of this problem, namely: (i) assuming that no indices exist on the underlying dataset, and (ii) assuming that indices exist on each individual attribute of the dataset.
dc.format.mimetypeapplication/pdf
dc.language.isoen_US
dc.subjectAlgorithms
dc.subjectExploratory query
dc.subjectLocation based services
dc.subjectLBS
dc.subjectkNN query
dc.subjectTop-k query
dc.subjectHidden database
dc.subjectAggregate estimation
dc.subjectSampling
dc.subjectClustering
dc.subjectGoogle Maps API
dc.subjectDensity based clustering
dc.subjectRanked based inference
dc.subjectPrivacy
dc.subjectPrivate attribute
dc.subjectSubspace skyline
dc.subjectSorted list
dc.subjectCategorical domain
dc.subjectTA algorithm
dc.subjectThreshold algorithm
dc.subjectSorting based algorithm
dc.subjectPartition based algorithm
dc.titleAlgorithms for Exploratory Queries over Web Database
dc.typeThesis
dc.degree.departmentComputer Science and Engineering
dc.degree.nameDoctor of Philosophy in Computer Science
dc.date.updated2019-02-06T21:26:04Z
thesis.degree.departmentComputer Science and Engineering
thesis.degree.grantorThe University of Texas at Arlington
thesis.degree.levelDoctoral
thesis.degree.nameDoctor of Philosophy in Computer Science
dc.type.materialtext


Files in this item

Thumbnail


This item appears in the following Collection(s)

Show simple item record