Algorithms for Exploratory Queries over Web Database

Rahman, Md Farhadur

dc.contributor.advisor	Das, Gautam
dc.creator	Rahman, Md Farhadur
dc.date.accessioned	2019-02-06T21:26:03Z
dc.date.available	2019-02-06T21:26:03Z
dc.date.created	2018-08
dc.date.issued	2019-02-06
dc.date.submitted	August 2018
dc.identifier.uri	http://hdl.handle.net/10106/27663
dc.description.abstract	In recent years we have seen an increase in the popularity of many web applications. The functionality of these applications range from allowing users to interact using online social network, to assist users in their everyday activity such as selecting a hotel in an area, locating a nearby restaurant etc. Google Maps, WeChat, FourSquare, AirBnB, TripAdvisor, and Hotels.com are a few such examples. The backed database of these applications can be a rich source of information for the corresponding application domain. For example, using Google Maps a user can find the ratings, reviews, and price of a restaurant, using Zillow users compare the price distributions of houses in different areas of a city. The public query interfaces of these applications may be abstractly modeled as a kNN interface over the backend database that returns k results that are nearest to the user query, where k is typically a small number such as 10 or 50. Moreover, Access to the underlying database is further restricted by enforcing query rate limitation, i.e., they limit the number of requests from an IP address or API account for a certain time period. Because of these restrictions, it is quite impossible for a third-party to crawl the data from the backend database. In this dissertation, we present efficient techniques for exploratory analysis over web database. Specifically, we propose several algorithms that enable third-party applications to perform the analytics and mining of the backend database by using nothing but the restrictive, public-access, interface of the application. In addition, we also studied the problem of answering the exploratory queries with full access to the underlying dataset. First, we consider a special category of web application know as Location based services (LBS). In general, an LBS provides a public search interface over its backend database, taking as input a 2D query point and returning k tuples in the database that are closest to the query point. In this work, we propose several algorithms that enable approximate estimate of SUM and COUNT aggregates by using the public search interface provided by the LBS. Second, we enable density-based clustering over the backend database of an LBS using nothing but limited access to the kNN interface provided by the LBS. Our goal here is to mine from the LBS a cluster assignment function f(), such that for any tuple t in the database (which may or may not have been accessed), f() can produce the cluster assignment of t with high accuracy. Third, we investigate a novel problem on the privacy implications of database ranking, which has not been studied before. We show how privacy leakage (through the top-k interface) can be caused by a seemingly innocent design of the ranking function in ranked retrieval models. We introduce a comprehensive taxonomy of the problem space. Then, for each subspace of the problem, we develop a novel technique which either guarantees the successful inference of private attributes, or accomplishes the attack for a significant portion of real-world tuples. Finally, we study the problem of subspace skyline discovery over web database where the attributes are mostly categorical. Skyline queries are an effective tool in assisting users in data exploration. In accordance with common practice in traditional database query processing, we design solutions for two important practical instances of this problem, namely: (i) assuming that no indices exist on the underlying dataset, and (ii) assuming that indices exist on each individual attribute of the dataset.
dc.format.mimetype	application/pdf
dc.language.iso	en_US
dc.subject	Algorithms
dc.subject	Exploratory query
dc.subject	Location based services
dc.subject	LBS
dc.subject	kNN query
dc.subject	Top-k query
dc.subject	Hidden database
dc.subject	Aggregate estimation
dc.subject	Sampling
dc.subject	Clustering
dc.subject	Google Maps API
dc.subject	Density based clustering
dc.subject	Ranked based inference
dc.subject	Privacy
dc.subject	Private attribute
dc.subject	Subspace skyline
dc.subject	Sorted list
dc.subject	Categorical domain
dc.subject	TA algorithm
dc.subject	Threshold algorithm
dc.subject	Sorting based algorithm
dc.subject	Partition based algorithm
dc.title	Algorithms for Exploratory Queries over Web Database
dc.type	Thesis
dc.degree.department	Computer Science and Engineering
dc.degree.name	Doctor of Philosophy in Computer Science
dc.date.updated	2019-02-06T21:26:04Z
thesis.degree.department	Computer Science and Engineering
thesis.degree.grantor	The University of Texas at Arlington
thesis.degree.level	Doctoral
thesis.degree.name	Doctor of Philosophy in Computer Science
dc.type.material	text

Files in this item

Name:: RAHMAN-DISSERTATION-2018.pdf
Size:: 9.316Mb
Format:: PDF

View/Open

This item appears in the following Collection(s)

Show simple item record

Algorithms for Exploratory Queries over Web Database

Files in this item

This item appears in the following Collection(s)

Related items

Models And Algorithms To Determine Cerebral Activation Using Near Infrared Spectroscopy ﻿

Building Energy-efficient Broadcast Trees Using A Genetic Algorithm In Wireless Sensor Networks ﻿

Integration Of Data Mining Algorithms And Control Charts For Multivariate And Autocorrelated Processes ﻿

Models And Algorithms To Determine Cerebral Activation Using Near Infrared Spectroscopy

Building Energy-efficient Broadcast Trees Using A Genetic Algorithm In Wireless Sensor Networks

Integration Of Data Mining Algorithms And Control Charts For Multivariate And Autocorrelated Processes