From news.cs.ucla.edu!not-for-mail Wed Aug 16 10:05:44 2000
Path: news.cs.ucla.edu!not-for-mail
From: susanna@kiwi.cs.ucla.edu (Susanna Reyn)
Newsgroups: grads
Subject: David Taylor's Final Public Defense
Date: 24 May 2000 09:09:26 -0700
Organization: University of California, Los Angeles
Lines: 38
Message-ID: <8ggurm$1l5$1@kiwi.cs.ucla.edu>
NNTP-Posting-Host: kiwi.cs.ucla.edu
X-Trace: news.cs.ucla.edu 959184569 2413 131.179.128.19 (24 May 2000 16:09:29 GMT)
X-Complaints-To: usenet@cs.ucla.edu
NNTP-Posting-Date: 24 May 2000 16:09:29 GMT
Keywords: FINAL PUBLIC DEFENSE
X-Newsreader: trn 4.0-test58 (13 May 97)
Xref: news.cs.ucla.edu grads:11761


Name:    David S. Taylor 
Date:    May 31, 2000
Time:    10:00 a.m. -12:00 noon
Place:   Boelter Hall 4760
Advisor: Prof. E. Koutsoupias

Title:   Indexing Schemes for Range Query Workloads

Abstract:

We investigate a fundamental problem in data storage systems, the
efficient retrieval of $d$-dimensional range queries from external
memory.  For multidimensional data this problem is inherently hard
because of the nature of secondary memory devices; data must be
grouped together for fast access.  This introduces locality of
reference issues absent from the internal memory problem. 

We consider the tradeoff between storage redundancy and access
overhead, using the indexing scheme measure of Hellerstein,
Koutsoupias, and Papadimitriou.  These parameters measure the size of
a database, and its speed in answering queries.

We give upper and lower bounds on the tradeoff required for arbitrary
2-dimensional workloads (point sets), and for random $d$-dimensional
workloads.  These bounds are tight with respect to the total size of
the workload, and exhibit a threshold behavior:  redundancy values
less than the threshold imply near worst case data access patterns,
while slightly higher values permit near optimal access performance.

We also study generalizations of indexing schemes for multiple disks,
fault tolerant replication, and alternate complexity measures.  In
each case, our results are promising, especially for multiple disks:
we show the surprising result that for some workloads, $D$ disks
increase access speeds substantially more than a factor of $D$.

----- End of forwarded message from David S. Taylor -----