To subscribe to the db-ucla mailing list for seminar announcements, please visit this page
DB-UCLA Seminar
Time: 12:00 noon-1:00pm Fridays; Room: 4549 Boelter Hall
*To invite a guest speaker or to schedule a talk, contact Young Cha (cookie58z at gmail dot com)
Date | Speaker | Title |
---|---|---|
04/01 | ||
04/08 | Dr. Carlo Zaniolo | [dbucla] SMM: a Data Stream Management System for Knowledge Discovery |
04/15 | Shanu Sushmita | [snorg] Aggregated Search: Result Presentation and User Behavior. |
04/22 | ||
04/29 | ||
05/06 | Kai Zeng | [dbucla] Uncertainty Propagation in Complex Query Networks on Data Streams: A New Paradigm for Load Shedding |
05/13 | Jer-Ming Chen | [dbucla] Burrows-Wheeler Transform Database |
05/20 | Ryan R. Rosario | [snorg] Towards a Crawling Schedule for Web Forum Content |
05/27 | Michael Shindler | [dbucla] Clustering Large Datasets |
05/27 | Shi Gao | [dbucla] Trajectory Pattern Mining |
06/03 | Nikhilesh Tadiparthi & Priya Meenakshi | [dbucla] Processing of streaming XML Documents |
Date | Speaker | Title |
---|---|---|
01/07 | ||
01/14 | Dr. Junghoo Cho | [snorg] How to be a good student |
01/21 | ||
01/28 | ||
02/04 | Admission meeting | |
02/11 | Dr. Carlo Zaniolo | [dbucla] Bring Order to Data Models and Query Languages for Data Streams |
02/18 | Vladimir Braverman | [dbucla] Space-efficient algorithms for data streams |
02/25 | Barzan Mozafari | [dbucla] High-performance Pattern Detection and Discovery for Databases and Data Streams |
03/04 | Prospective student visit day | |
03/11 | ||
Date | Speaker | Title |
---|---|---|
10/01 | Barzan Mozafari | [dbucla] From Regular Expressions to Nested Words: Unifying Languages and Query Execution for Relational and XML Sequences |
10/08 | Mirjana Mazuran | [dbucla] Mining and querying Tree-based Association Rules from XML documents |
10/15 | Amruta Joshi | [snorg] Leveraging Keyword Context to Improve Retrieval Relevance |
10/22 | Kelvin T. Leung | [dbucla] IRMA: An Image Registration Meta-Algorithm --- Evaluating Alternative Algorithms with Multiple Metrics |
10/29 | Hamid Mousavi | [dbucla] Fast and Space-Efficient Computation of Equi-Depth Histograms for Data Streams |
11/05 | Nikolay Laptev | [dbucla] Extending Teradata UDFs to support SQL and Map Reduce programs |
11/12 | Veterans Day week | |
11/19 | Daniel Fabbri | [dbucla] PolicyReplay: Misconfiguration-Response Queries for Data Breach Reporting |
11/26 | Happy Thanksgiving! | |
12/03 | Finals week | |
On Scaling Analytics Realtime under Extreme Load Conditions: My Internship at Google
Speaker:
Nikolay Laptev
Abstract:
Supporting realtime data analytics is important for making timely decisions, but the extreme system-load requirements make it difficult. In this talk I will describe my internship experience on the Google Realtime Analytics team, where we tried to address this problem. The talk will discuss the load generation tool we developed, the realtime monitoring tool, the dynamic RPC adjustment, and finally anomaly detection. The load generation framework was used to deliberately overload the system and find its bottlenecks. The realtime monitoring tool was developed to see where the most packets were being dropped, and it helped in further analysis of the weak points of the system. The dynamic RPC adjustment (patent pending) was developed to address our scalability problem, and finally the anomaly detection implementation was supplied to alert both us and the user of any unexpected behavior in the website traffic (due to system overload or other anomalous activity). With the help of the above, Google Analytics Realtime was launched on 09-28-2011. This talk will also briefly discuss future plans for our lab and the Stream Mill Miner system that we are developing.
Recent Advances in Logic-Based Languages
Speaker:
Dr. Carlo Zaniolo
Abstract:
Logic-based query languages represented a cornerstone of the relational
database model introduced by E.F. Codd, and were then significantly extended
with the development of Datalog and other deductive database languages
that support rules and recursive queries. This line of research produced
elegant semantics and implementation techniques, and delivered the
enabling technology for the efficient implementation of recursive queries with
stratified negation, now supported by commercial DBMSs and specified by SQL
standards. After a lull of several years, the spread of web-age information
systems and applications has generated a vigorous resurgence of interest, leading
to a "Springtime for Datalog" according to Joseph M. Hellerstein [3]. It
was indeed in his lab at UCB that the idea of logic-based declarative specification
and design of Internet protocols and services first emerged [4]; more recent work
at UCB seeks to extend this approach and develop Datalog-based foundations for
parallel and distributed programming languages [3]. On the Semantic Web
front, a novelty of great interest is the introduction of Linear
Datalog± for efficiently expressing and supporting subsets of description
logic and reasoning for ontological queries [2]. Two other simple extensions
of Datalog of significant practical interest have been proposed very recently:
one is intended for complex graph queries, including page-rank queries [5], while
the other supports more powerful continuous queries on data streams [6].
Finally, novel challenges and opportunities are emerging at the implementation
level, making it possible to support recursive queries in a Map-Reduce environment [1].
This DBUCLA presentation will first summarize the tutorial I recently
presented at the WAIM 2011/MSRA Summer School, and then expand on [5].
My slides from the Summer School can be downloaded from
http://www.cs.ucla.edu/~zaniolo/talks11/.
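The recursive queries discussed above can be illustrated with a minimal semi-naive evaluation of the classic Datalog transitive-closure program; this is a generic textbook sketch (with made-up edge facts), not code from the talk:

```python
# Semi-naive evaluation of the recursive Datalog program
#   tc(X,Y) :- edge(X,Y).
#   tc(X,Y) :- tc(X,Z), edge(Z,Y).
# Only the newly derived tuples (the "delta") are joined with edge
# at each iteration, which is the key efficiency idea.

def transitive_closure(edges):
    tc = set(edges)          # base case: every edge is in the closure
    delta = set(edges)       # tuples derived in the previous round
    while delta:
        new = {(x, w) for (x, y) in delta
                      for (z, w) in edges if y == z} - tc
        tc |= new
        delta = new          # the next round joins only the new tuples
    return tc

print(sorted(transitive_closure({(1, 2), (2, 3), (3, 4)})))
```

The loop terminates once a round derives no new tuples, mirroring the fixpoint semantics of Datalog recursion.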
References
[1] F. N. Afrati, V. R. Borkar, M. J. Carey, N. Polyzotis, and J. D. Ullman. Map-
Reduce Extensions and Recursive Queries. In EDBT 2011, Invited Paper,
pages 1–8, 2011.
[2] G. Gottlob, G. Orsi, and A. Pieris. Ontological queries: Rewriting and opti-
mization. In ICDE 2011: Keynote Paper, pages 2–13, 2011.
[3] J. M. Hellerstein. Datalog redux: experience and conjecture. In PODS 2010:
Keynote Paper, pages 1–2, 2010.
[4] B. T. Loo, T. Condie, J. M. Hellerstein, P. Maniatis, T. Roscoe, and I. Stoica.
Implementing Declarative Overlays. In SOSP, pages 75–90, 2005.
[5] M. Mazuran, E. Serra, and C. Zaniolo. WEB Queries in Datalog with Frequency Support.
Submitted for publication.
[6] C. Zaniolo. The Logic of Query Languages for Data Streams. In Logic in
DataBases 2011. EDBT Workshop, 2011.
Social Network Analysis Using Topic Model
Speaker:
Young Cha
Title: Social Network Analysis Using Topic Model
Abstract:
In this paper, we discuss how we can extend probabilistic topic models to analyze the relationship graph of popular social-network data,
so that we can label or group the edges and the nodes in the graph based on their topic similarity. In particular, we first apply the well-known Latent Dirichlet Allocation (LDA)
model and its existing variants to the graph-labeling task and argue that the existing models do not handle popular nodes (the nodes with many incoming edges) in the graph very well.
We then propose possible extensions to this model to deal with popular nodes.
Our experiments show that the proposed extensions are very effective in labeling popular nodes, showing significant improvements over the existing methods.
Our proposed methods can be used for providing, for instance, more relevant recommendations for people to follow within a social network.
Computation of Biased Histograms over Fast Data Streams
Speaker:
Hamid Mousavi
Title: Computation of Biased Histograms over Fast Data Streams
Abstract:
In this talk, I will present our current work on designing biased histograms over fast data streams with sliding windows. In biased histograms, we need to have the size of bars/buckets exponentially decreasing (or increasing) toward some points in the distribution of data. The biased points are usually one or both ends of the data distribution.
Our algorithm, which uses a new sampling technique, is based on our previous work on designing equi-depth histograms over fast data streams. Both approaches take advantage of "Exponential Histograms" and split and merge the histogram's bars in order to keep the histogram up-to-date.
Toward More Powerful Query Languages & Systems for Continuous Analytics and Data Mining.
Speaker:
Dr. Carlo Zaniolo
Title: Toward More Powerful Query Languages & Systems for
Continuous Analytics and Data Mining.
Abstract:
Database oriented query languages for continuous queries offer many
benefits, including scalability and parallelizability. However, they are
impaired by severe limitations in their ability to express complex
applications, such as data stream mining. The UCLA Stream Mill system
addresses and overcomes these limitations by:
1) Extensibility constructs that achieve Turing completeness for SQL-based
languages. Friendly environments for advanced applications,
such as data stream mining, are thus achieved via these constructs and a
multi-layer architecture.
2) A powerful query language for searching complex patterns recognizable by
nested-word automata.
Current research focuses on reclaiming blocking constructs,
such as negation. Toward this goal, we use Datalog as a powerful
modelling tool for the analysis of continuous queries on data streams.
Scal(a)ing up Machine Learning and Graph-based Analytics
Speaker:
Tyson Condie
Title: Scal(a)ing up Machine Learning and Graph-based Analytics
Abstract:
Machine learning practitioners are increasingly interested in applying their algorithms to Big Data. Unfortunately, current high-level languages for data analytics (e.g., Hive, Pig, Sawzall, Scope) do not fully cover this domain. One key missing ingredient is the means to efficiently support iteration over the data. Zaharia et al., were the first to answer this call from a systems perspective with Spark. Spark adds the notion of a working set to data-parallel workflows and has published speed-ups of 30x over Hadoop MapReduce for many machine learning and graph algorithms.
Unfortunately, Spark does not cover the whole pipeline of Big Data analytics; at Yahoo!, it is common to compose Pig, MPI and direct MapReduce program modules into workflows. This fracturing of individual processing steps can be a major pain, e.g., for optimization, debugging, and code readability. Our prescription for this dilemma is a new DSL for data analytics called ScalOps. Like Pig, ScalOps combines the declarative style of SQL with the low-level procedural style of MapReduce. Like Spark, ScalOps can optimize its runtime—the Hyracks parallel-database engine—for repeated access to data collections. ScalOps is part of a broader research agenda to explore new abstractions for machine learning and graph-based analytics. In this talk, I will present example workflows from the machine learning domain expressed in ScalOps and their translation to Hyracks recursive query plans.
Optimizing Content Freshness of Relations Extracted From the Web Using Keyword Search AND Querying Uncertain Data with Aggregate Constraints
Speaker:
Mohan Yang
Title: Optimizing Content Freshness of Relations Extracted From the Web Using Keyword Search
Abstract: An increasing number of applications operate on data obtained from the Web. These applications typically maintain local copies of the web data to avoid network latency in data accesses. As the data on the Web evolves, it is critical that the local copy be kept up-to-date. Data freshness is one of the most important data quality issues, and has been extensively studied for various applications including web crawling. However, web crawling is focused on obtaining as many raw web pages as possible. Our applications, on the other hand, are interested in specific content from specific data sources. Knowing the content or the semantics of the data enables us to differentiate data items based on their importance and volatility, which are key factors that impact the design of the data synchronization strategy. In this work, we formulate the concept of content freshness, and present a novel approach that maintains content freshness with the least amount of web communication. Specifically, we assume data is accessible through a general keyword search interface, and we form keyword queries based on their selectivity, as well as their contribution to the content freshness of the local copy. Experiments show the effectiveness of our approach compared with several naive methods for keeping data fresh.
Title: Querying Uncertain Data with Aggregate Constraints
Abstract: Data uncertainty arises in many situations. A common approach to query processing over uncertain data is to sample many "possible worlds" from the uncertain data and to run queries against the possible worlds. However, sampling is not a trivial task, as a randomly sampled possible world may not satisfy known constraints imposed on the data. In this paper, we focus on an important category of constraints, the aggregate constraints. An aggregate constraint is placed on a set of records instead of on a single record, and a real-life system usually has a large number of aggregate constraints. It is a challenging task to find qualified possible worlds in this scenario, since tuple-by-tuple sampling is extremely inefficient because it rarely leads to a qualified possible world. In this paper, we introduce two approaches for querying uncertain data with aggregate constraints: constraint-aware sampling and MCMC sampling. Our experiments show that the new approaches lead to high quality query results with reasonable cost.
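The sampling difficulty described in the abstract can be sketched in a few lines. This toy example (with hypothetical tuples and probabilities) only illustrates why naive tuple-by-tuple sampling rarely yields a qualified possible world; it is not the paper's constraint-aware or MCMC method:

```python
import random

# Naive rejection sampling of "possible worlds" under an aggregate
# constraint: most randomly sampled worlds violate the constraint
# and must be thrown away.

def sample_world(tuples):
    """Each uncertain tuple is (value, probability of existing)."""
    return [v for v, p in tuples if random.random() < p]

def rejection_sample(tuples, constraint, max_tries=10000):
    """Draw worlds until one satisfies the aggregate constraint."""
    for _ in range(max_tries):
        world = sample_world(tuples)
        if constraint(world):
            return world
    return None

random.seed(0)
data = [(10, 0.5), (20, 0.5), (30, 0.5), (40, 0.5)]
# Aggregate constraint: the SUM over the sampled world must equal 50.
print(rejection_sample(data, lambda w: sum(w) == 50))
```

Here only 2 of the 16 equally likely worlds satisfy the SUM constraint, so most draws are wasted; with many constraints the acceptance rate collapses, which motivates the smarter sampling strategies of the paper.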
SMM: a Data Stream Management System for Knowledge Discovery
Speaker:
Professor Zaniolo
Abstract:
The problem of supporting data mining applications proved to be difficult for database management systems, and it is now proving to be very challenging for data stream management systems (DSMSs), where the limitations of SQL are made even more severe by the requirements of continuous queries. The major technical advances achieved separately on DSMSs and on data stream mining algorithms have failed to converge and produce powerful data stream mining systems. Such systems, however, are essential, since the traditional pull-based approach of cache mining is no longer applicable, and the push-based computing mode of data streams and their bursty traffic complicate application development. For instance, to write mining applications with quality of service (QoS) levels approaching those of DSMSs, a mining analyst would have to contend with many arduous tasks, such as support for data buffering, complex storage and retrieval methods, scheduling, fault-tolerance, synopsis-management, load shedding, and query optimization.
Our Stream Mill Miner (SMM) system solves these problems by providing a data stream mining workbench that combines the ease of specifying high-level mining tasks, as in Weka, with the performance and QoS guarantees of a DSMS. This is accomplished in three main steps. The first is an open and extensible DSMS architecture where KDD queries can be easily expressed as user-defined aggregates (UDAs)—our system combines that with the efficiency of synoptic data structures and mining-aware load shedding and optimizations. The second key component of SMM is its integrated library of fast mining algorithms that are light enough to be effective on data streams. The third advanced feature of SMM is a Mining Model Definition Language (MMDL) that allows users to define the flow of mining tasks, integrated with a simple box&arrow GUI, to shield the mining analyst from the complexities of lower-level queries. SMM is the first DSMS capable of online mining and this paper describes its architecture, design, and performance on mining queries.
Aggregated Search: Result Presentation and User Behavior.
Speaker:
Shanu Sushmita
Affiliation:
University of Glasgow
Abstract:
In order to facilitate information access, search engines now provide access to diverse data in a unified manner, called aggregated search. An aggregated search interface is designed to aggregate retrieval results from different information sources (image, video, maps, blogs, etc.) into a single result page. In other words, aggregated search provides users a richer media experience from only a small number of keywords or a phrase: without the user specifying the type of media, the results aim to cover everything relevant. This work explores aggregated search from the users' perspective, where the focus is on understanding and describing the phenomena related to the users' search process in the context of aggregated search. In building this understanding, the work presented here focuses on click behavior, information need, source relevance, and the dynamics of search intents. The understanding comes partly from conducting user studies and partly from analyzing search engine log data.
Uncertainty Propagation in Complex Query Networks on Data Streams: A New Paradigm for Load Shedding
Speaker:
Kai Zeng
Advisor:
Professor Zaniolo
Abstract:
Data Stream Management Systems (DSMS) are subject to bursty data arrivals.
When overloaded, a DSMS needs to shed some of the load while minimizing the accuracy loss.
Previous work on load shedding, when focusing on optimizing query accuracy, has serious limitations in its generality and its applicability to complex queries.
We propose and study a new load shedding paradigm that is based on modeling the distribution and propagation effect of the uncertainty.
This work solves the problem of optimal load shedding for the most general case of query networks and operators.
Our analytical conclusions are validated through extensive experiments.
Burrows-Wheeler Transform Database
Speaker:
Jer-Ming Chen
Advisor:
Professor Zaniolo
Abstract:
Imagine you have potentially 1 million documents to index, each roughly 3000 characters long; that is roughly 3 billion characters to index. Documents arrive at different times, just like your email. How would you index them with high performance while keeping the index updatable?
This is a SQL database implementation of the BWT using SQLite on the C# .NET platform. SQLite can store hundreds of TB, so storage size is not a limitation.
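For readers unfamiliar with the transform, here is a minimal textbook construction of the BWT via sorted rotations; production indexes build it from a suffix array instead of materializing all rotations, and this sketch is not the speaker's SQLite implementation:

```python
# Burrows-Wheeler Transform, naive O(n^2 log n) construction:
# sort all rotations of the string and take the last column.

def bwt(text, terminator="$"):
    s = text + terminator                      # unique end-of-string marker
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(r[-1] for r in rotations)   # last column of sorted matrix

print(bwt("banana"))  # → "annb$aa"
```

The transform groups identical characters together (note the run of a's), which is what makes BWT output highly compressible and searchable with FM-index techniques.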
Towards a Crawling Schedule for Web Forum Content
Speaker:
Ryan R. Rosario
Abstract:
There is much literature about crawling web pages to minimize staleness and maximize recency.
Web forum threads are similar to web pages, but pose more challenges because the content is feverishly created by an unpredictable set of users.
Forum threads and posts are created at the whim of users, and die off as the community loses interest in the topic,
as moderators close/lock the thread, and as past threads are flushed from databases.
The life cycle of a thread is different from the life cycle of a web page.
In this talk, I will discuss a recent crawling effort on offtopic.com, and discuss
my ideas on scheduling crawls for forums like offtopic.com to maximize recency and minimize staleness.
Clustering Large Datasets
Speaker:
Michael Shindler
Abstract:
Clustering is a common problem in many fields, and the k-means objective is a very popular formulation of it. We consider the case when the data to be clustered (via k-means) is too large to fit into main memory, and thus must be read and processed sequentially, using as little memory as possible. We show that, if we can read the data twice, we can achieve the same approximation bound that can be guaranteed for the normal (main memory) cases previously studied. If we are constrained by one read, we can achieve almost this -- only an additive epsilon factor worse. We demonstrate a considerable speed-up to the algorithm. Finally, we show that in addition to the theoretical guarantees, this achieves a great improvement in practice as well.
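As a rough illustration of processing data sequentially with little memory, here is the classic online (one-pass) k-means update on scalar data; this is the standard textbook variant with no approximation guarantee, not the speaker's algorithm:

```python
import random

# One-pass k-means: each arriving point updates its nearest center
# by a step that shrinks as the center absorbs more points, so each
# center tracks the running mean of the points assigned to it.

def online_kmeans(stream, k):
    it = iter(stream)
    centers = [next(it) for _ in range(k)]   # seed with the first k points
    counts = [1] * k
    for x in it:
        j = min(range(k), key=lambda i: (x - centers[i]) ** 2)
        counts[j] += 1
        centers[j] += (x - centers[j]) / counts[j]  # running-mean update
    return centers

random.seed(1)
data = [random.gauss(0, 1) for _ in range(500)] + \
       [random.gauss(10, 1) for _ in range(500)]
random.shuffle(data)
print(sorted(online_kmeans(data, 2)))
```

Memory usage is O(k) regardless of stream length, which is the constraint the talk's algorithms operate under while additionally providing approximation bounds.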
Space-efficient algorithms for data streams
Speaker:
Vladimir Braverman
Advisor:
Professor Rafail Ostrovsky
Abstract:
Streaming algorithms are an important area of theoretical computer science with many practical applications. We will define the basic model of data streams and explain some fundamental algorithms and methods that have shaped the area. We will also survey some of our recent results and discuss current challenges and open problems. In particular, we will present the paper "Zero-One Frequency Laws" (STOC 2010), where we investigate frequency-based functions and answer the main open question of Alon, Matias and Szegedy (STOC 1996).
High-performance Pattern Detection and Discovery for Databases and Data Streams
Speaker:
Barzan Mozafari
Advisor:
Professor Zaniolo
Abstract:
Many important applications such as decision support, fraud detection, stock market analysis, weather forecasting and online recommendation systems involve the detection and discovery of patterns, including spatio-temporal patterns, sequential patterns, frequent itemset patterns and trajectory patterns. The efficient search for such patterns and the discovery of characteristic patterns of interest pose difficult research challenges for both stored data and data streams. My dissertation has focused on these challenges, seeking to achieve the following objectives: (1) Minimal extensions to current query languages for patterns, (2) Optimization techniques for pattern query languages across diverse computing environments (databases, data streams, single and parallel processing), (3) Efficient algorithms for discovering interesting patterns in data streams, and (4) An integrated system that enables the expression, definition, mining, evaluation and deployment of different types of patterns from data streams.
Toward these objectives, I have sought integrated solutions to the research problems at three different levels: (i) the algorithmic level, (ii) the language level and (iii) the system level. This talk overviews the progress that I have made on these problems, and on closely related ones such as privacy-preserving data mining. Then I will describe a promising approach based on Kleene-closure expressions that we are currently pursuing for language extension, query optimization, and support for advanced spatio-temporal applications such as trajectory analysis.
From Regular Expressions to Nested Words: Unifying Languages and Query Execution for Relational and XML Sequences
Speaker:
Barzan Mozafari
Advisor:
Professor Zaniolo
Abstract:
There is growing interest in query language extensions for pattern
matching over event streams and stored database sequences, due
to the many important applications that such extensions make possible.
The push for such extensions has led DBMS vendors and
DSMS venture companies to propose Kleene-closure extensions of
SQL standards, building on seminal research that demonstrated the
effectiveness and amenability to efficient implementation of such
constructs. These extensions, however powerful, suffer from limitations
that severely impair their effectiveness in many real-world
applications. To overcome these problems, we have designed the
K*SQL language and system, based on our investigation of the
nested words, which are recent models that generalize both words
and trees.
K*SQL extends the existing relational sequence languages, and
also enables applications from other domains such as genomics,
software analysis, and XML processing. At the same time, K*SQL
remains extremely efficient, using our powerful optimizations for
pattern search over nested words. Furthermore, we show that other
sequence languages and XPath can be automatically translated into
K*SQL, allowing for K*SQL to be also used as a high-performance
query execution back-end for those languages. Therefore, K*SQL
is a unifying SQL-based engine for sequence and XML queries,
which provides novel optimization techniques for both.
Mining and querying Tree-based Association Rules from XML documents
Speaker:
Mirjana Mazuran
Affiliation:
Politecnico di Milano
Abstract:
Extracting information from semistructured documents is becoming more and more critical as the amount of digital information available on the internet grows. Documents are often so large that even the dataset returned as the answer to a query may be too big to convey interpretable knowledge. Tree-based Association Rules (TARs) are a new way to face the challenges of such a scenario. They are structure-preserving association rules mined from XML documents, and they provide approximate, intensional information on both their structure and contents. This mined knowledge is the gist of the original document and provides support for (i) query formulation on datasets whose structure is unknown and (ii) intensional query answering, that is, querying the TARs rather than the original document in order to provide quick, approximate answers.
IRMA: An Image Registration Meta-Algorithm ---
Evaluating Alternative Algorithms with Multiple Metrics
Speaker:
Kelvin T. Leung
Advisor:
Professor Parker
Abstract:
A common problem in scientific computing is the need to choose among a
number of different metrics or scoring criteria. To minimize error,
for example, one might choose among metrics like L1 error, L2 error,
etc. These choices can be difficult to make, because each criterion is
an objective that can be useful in its own right. Because they can
differ significantly from one another, ad hoc choices can have
significant consequences.
This problem is important in scientific imaging, as many different
measures of distance, similarity, and quality are useful for
images. In the image registration problem, one is given a pair of
images -- an input image and a reference image -- and a set of
registration algorithms for aligning them. The result of registration
is a transformation of the input image that resembles the reference
image, and the quality of this result is often the value of a metric
measuring the distance or similarity between these
images. Registration is a central problem in image analysis, and new
solutions provide new ways of retrieving images from databases.
In this talk, we describe a new meta-algorithm called Image
Registration Meta-Algorithm (IRMA) that evaluates results of multiple
metrics. Selection is done by a decision engine that automatically
stores metric computation results in a database, then mines these
results to evaluate which registration algorithms give results of
highest quality. To help facilitate our goal, we have built a
scientific model-base management system using Postgres on the LONI/CCB
grid computing facility and the LONI pipeline workflow infrastructure,
which allows scientists to catalog results such as provenance
information about the parameter settings used in the workflow,
and to perform "shotgun" experiments
to explore and further understand the space of alternatives in a more
organized fashion.
* This work supported by NIH grant 1U54RR021813 and the Center for
Computational Biology, an NIH National Center of Biomedical
Computing.
Fast and Space-Efficient Computation of Equi-Depth Histograms for Data Streams
Speaker:
Hamid Mousavi
Advisor:
Professor Zaniolo
Abstract:
Equi-depth histograms represent a fundamental synopsis
widely used in both database and data stream applications, as they
provide the cornerstone of many techniques such as query optimization,
approximate query answering, distribution fitting, and parallel database
partitioning. Equi-depth histograms try to partition a sequence of data
in a way that every part has the same number of data items.
In this paper, we present a new algorithm to estimate equi-depth
histograms for high speed data streams over sliding windows. While
many previous methods were based on quantile computations, we propose
a new method called BAr Splitting Histogram (BASH) that provides an
expected ϵ-approximate solution to compute the equi-depth histogram.
Extensive experiments show that BASH is at least four times faster than
one of the best existing approaches, while achieving the same accuracy
and using less memory. The experimental results also indicate that BASH
is more stable on data affected by frequent concept shifts.
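For reference, the exact offline equi-depth histogram that streaming methods such as BASH approximate can be computed by sorting the data and cutting it at quantile boundaries; this baseline sketch assumes the whole window fits in memory, which is precisely what a streaming algorithm cannot do:

```python
# Exact offline equi-depth histogram: sort the data and cut it into
# buckets holding (nearly) equal numbers of items. Streaming approaches
# approximate these boundaries without storing the whole window.

def equi_depth_boundaries(data, num_buckets):
    s = sorted(data)
    n = len(s)
    # boundary i is the value below which roughly i/num_buckets
    # of the data falls
    return [s[(i * n) // num_buckets] for i in range(1, num_buckets)]

print(equi_depth_boundaries(list(range(100)), 4))  # → [25, 50, 75]
```

An ε-approximate streaming method guarantees each reported boundary is within ε·n ranks of the exact boundary above, while using memory sublinear in the window size.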
Extending Teradata UDFs to support SQL and Map Reduce programs
Speaker:
Nikolay Laptev
Advisor:
Professor Zaniolo
Abstract:
Teradata is a massively parallel processing system running a shared nothing architecture.
The Teradata DBMS is linearly and predictably scalable in all dimensions of a database
system workload (data volume, breadth, number of users, complexity of queries). One of
the many features of the Teradata Database includes the support for User Defined Functions (UDFs).
A Teradata Database UDF is a program that operates on data stored in relational tables.
UDFs allow users to add their own extensions to the Teradata SQL language.
Teradata UDFs must be written in Java or C. This talk gives a high-level overview of our
extension of Teradata UDFs to support SQL, which maximizes programmer productivity and decreases debugging
time. Furthermore, the conversion of Hadoop MapReduce programs and of Java UDFs into native
Teradata UDFs is also discussed. The work presented was part of a summer project at Teradata.
PolicyReplay: Misconfiguration-Response Queries for Data Breach Reporting
Speaker:
Daniel Fabbri
Affiliation:
U of Michigan
Abstract:
Recent legislation has increased the requirements of organizations to report data breaches, or unauthorized access to data. While access control policies are used to restrict access to a database, these policies are complex and difficult to configure. As a result, misconfigurations sometimes allow users access to unauthorized data.
In this paper, we consider the problem of reporting data breaches after such a misconfiguration is detected.
To locate past SQL queries that may have revealed unauthorized information, we introduce the novel idea of a misconfiguration response (MR) query. The MR-query cleanly addresses the challenges of information propagation within the database by replaying the log of operations and returning all logged queries for which the result has changed due to the misconfiguration. A strawman implementation of the MR-query would go back in time and replay all the operations that occurred in the interim, with the correct policy. However, re-executing all operations is inefficient. Instead, we develop techniques to improve reporting efficiency by reducing the number of operations that must be re-executed and reducing the cost of replaying the operations. An extensive evaluation shows that our method can reduce the total runtime by up to an order of magnitude.