Algorithms for Exploratory Queries over Web Database
Rahman, Md farhadur
MetadataShow full item record
In recent years we have seen an increase in the popularity of many web applications. The functionality of these applications range from allowing users to interact using online social network, to assist users in their everyday activity such as selecting a hotel in an area, locating a nearby restaurant etc. Google Maps, WeChat, FourSquare, AirBnB, TripAdvisor, and Hotels.com are a few such examples. The backed database of these applications can be a rich source of information for the corresponding application domain. For example, using Google Maps a user can find the ratings, reviews, and price of a restaurant, using Zillow users compare the price distributions of houses in different areas of a city. The public query interfaces of these applications may be abstractly modeled as a kNN interface over the backend database that returns k results that are nearest to the user query, where k is typically a small number such as 10 or 50. Moreover, Access to the underlying database is further restricted by enforcing query rate limitation, i.e., they limit the number of requests from an IP address or API account for a certain time period. Because of these restrictions, it is quite impossible for a third-party to crawl the data from the backend database. In this dissertation, we present efficient techniques for exploratory analysis over web database. Specifically, we propose several algorithms that enable third-party applications to perform the analytics and mining of the backend database by using nothing but the restrictive, public-access, interface of the application. In addition, we also studied the problem of answering the exploratory queries with full access to the underlying dataset. First, we consider a special category of web application know as Location based services (LBS). In general, an LBS provides a public search interface over its backend database, taking as input a 2D query point and returning k tuples in the database that are closest to the query point. In this work, we propose several algorithms that enable approximate estimate of SUM and COUNT aggregates by using the public search interface provided by the LBS. Second, we enable density-based clustering over the backend database of an LBS using nothing but limited access to the kNN interface provided by the LBS. Our goal here is to mine from the LBS a cluster assignment function f(), such that for any tuple t in the database (which may or may not have been accessed), f() can produce the cluster assignment of t with high accuracy. Third, we investigate a novel problem on the privacy implications of database ranking, which has not been studied before. We show how privacy leakage (through the top-k interface) can be caused by a seemingly innocent design of the ranking function in ranked retrieval models. We introduce a comprehensive taxonomy of the problem space. Then, for each subspace of the problem, we develop a novel technique which either guarantees the successful inference of private attributes, or accomplishes the attack for a significant portion of real-world tuples. Finally, we study the problem of subspace skyline discovery over web database where the attributes are mostly categorical. Skyline queries are an effective tool in assisting users in data exploration. In accordance with common practice in traditional database query processing, we design solutions for two important practical instances of this problem, namely: (i) assuming that no indices exist on the underlying dataset, and (ii) assuming that indices exist on each individual attribute of the dataset.
Showing items related by title, author, creator and subject.
Allen, Monica Suresh (Electrical Engineering, 2007-08-23)The specific aims of this dissertation were to develop: (1) Correlations between experimental protocols and oxygenated, deoxygenated, and total hemoglobin concentrations in the brain (2) Mathematical models to associate ...
An, Min Kyung (Computer Science & Engineering, 2007-10-08)In wireless broadcast environment, data can be transmitted to several nodes by a single transmission. The nodes in that environment have limited energy resources; therefore we need to reduce the energy consumption when ...
Malalur, Sanjeev Sreenivasa Rao (Electrical Engineering, 2011-10-11)Starting with the concept of equivalent networks, a framework for analyzing the effect of linear dependence on training of a multi-layer perceptron is established. Detailed mathematical analyses are carried out to show ...