From news.cs.ucla.edu!not-for-mail Wed Aug 16 10:05:44 2000 Path: news.cs.ucla.edu!not-for-mail From: susanna@kiwi.cs.ucla.edu (Susanna Reyn) Newsgroups: grads Subject: David Taylor's Final Public Defense Date: 24 May 2000 09:09:26 -0700 Organization: University of California, Los Angeles Lines: 38 Message-ID: <8ggurm$1l5$1@kiwi.cs.ucla.edu> NNTP-Posting-Host: kiwi.cs.ucla.edu X-Trace: news.cs.ucla.edu 959184569 2413 131.179.128.19 (24 May 2000 16:09:29 GMT) X-Complaints-To: usenet@cs.ucla.edu NNTP-Posting-Date: 24 May 2000 16:09:29 GMT Keywords: FINAL PUBLIC DEFENSE X-Newsreader: trn 4.0-test58 (13 May 97) Xref: news.cs.ucla.edu grads:11761 Name: David S. TaylorDate: May 31, 2000 Time: 10:00 a.m. -12:00 noon Place: Boelter Hall 4760 Advisor: Prof. E. Koutsoupias Title: Indexing Schemes for Range Query Workloads Abstract: We investigate a fundamental problem in data storage systems, the efficient retrieval of $d$-dimensional range queries from external memory. For multidimensional data this problem is inherently hard because of the nature of secondary memory devices; data must be grouped together for fast access. This introduces locality of reference issues absent from the internal memory problem. We consider the tradeoff between storage redundancy and access overhead, using the indexing scheme measure of Hellerstein, Koutsoupias, and Papadimitriou. These parameters measure the size of a database, and its speed in answering queries. We give upper and lower bounds on the tradeoff required for arbitrary 2-dimensional workloads (point sets), and for random $d$-dimensional workloads. These bounds are tight with respect to the total size of the workload, and exhibit a threshold behavior: redundancy values less than the threshold imply near worst case data access patterns, while slightly higher values permit near optimal access performance. We also study generalizations of indexing schemes for multiple disks, fault tolerant replication, and alternate complexity measures. In each case, our results are promising, especially for multiple disks: we show the surprising result that for some workloads, $D$ disks increase access speeds substantially more than a factor of $D$. ----- End of forwarded message from David S. Taylor -----