Speaker: Xin Zhou Title: Unifying the Processing of XML Streams and Relational Data Streams Time: 12:30-2:00 Room: BH 4549 Abstract: Relational data streams and XML streams have previously been two separate research foci, but supporting both in a single Data Stream Management System (DSMS) is very desirable from an application viewpoint. In this paper, we propose a simple approach to extend relational DSMSs to support both kinds of streams efficiently. In our Stream Mill system, XML streams expressed as SAX events can be easily transformed into relational streams, and vice versa. This enables close cooperation between their query languages, resulting in great power and flexibility. For instance, XQuery can call functions defined in our SQL-based Expressive Stream Language (ESL) using the logical/physical windows that have proved so useful on relational data streams. Many benefits are also gained at the system level, since relational DSMS techniques for load shedding, memory management, query scheduling, approximate query answering, and synopsis maintenance can now be applied to XML streams. Moreover, the many FSA-based optimization techniques developed for XPath and XQuery can be easily and efficiently incorporated into our system. Indeed, we show that YFilter, which can efficiently process multiple complex XML queries, can be easily integrated into Stream Mill via ESL user-defined and system-defined aggregates. This approach produces a powerful and flexible system in which relational and XML streams are unified and processed efficiently.
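As a rough illustration of how SAX events can be flattened into relational tuples, the sketch below uses Python's standard `xml.sax` incremental parser to turn each `<trade>` element into a `(sym, price)` tuple as chunks of XML arrive, then applies a count-based (physical) sliding-window average to the resulting tuple stream. The element names and the `RecordExtractor`, `xml_chunks_to_rows`, and `sliding_avg` helpers are hypothetical; this is not Stream Mill's or ESL's actual interface, only a minimal sketch assuming flat records whose fields are leaf elements.

```python
import xml.sax
from collections import deque


class RecordExtractor(xml.sax.ContentHandler):
    """Flatten SAX events for each `record_tag` element into one relational tuple.

    Hypothetical helper: assumes each record's fields are leaf child elements.
    """

    def __init__(self, record_tag, fields):
        super().__init__()
        self.record_tag, self.fields = record_tag, fields
        self.rows = []        # relational tuples emitted so far
        self.current = None   # field values of the record being built
        self.field = None     # name of the field element currently open
        self.text = []        # character chunks (SAX may split text)

    def startElement(self, name, attrs):
        if name == self.record_tag:
            self.current = {}
        elif self.current is not None and name in self.fields:
            self.field, self.text = name, []

    def characters(self, content):
        if self.field is not None:
            self.text.append(content)

    def endElement(self, name):
        if name == self.field:
            self.current[name] = "".join(self.text).strip()
            self.field = None
        elif name == self.record_tag and self.current is not None:
            # One closed record element becomes one tuple, in field order.
            self.rows.append(tuple(self.current.get(f) for f in self.fields))
            self.current = None


def xml_chunks_to_rows(chunks, record_tag, fields):
    """Feed XML to an incremental SAX parser chunk by chunk, as it would arrive."""
    handler = RecordExtractor(record_tag, fields)
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    for chunk in chunks:
        parser.feed(chunk)
    parser.close()
    return handler.rows


def sliding_avg(values, size):
    """Physical (count-based) window: average over the last `size` values."""
    window, out = deque(maxlen=size), []
    for v in values:
        window.append(v)
        out.append(sum(window) / len(window))
    return out


# Chunk boundaries may fall inside tags; the incremental parser handles that.
chunks = [b"<trades><trade><sym>IBM</sym><price>10</pr",
          b"ice></trade><trade><sym>IBM</sym><price>20</price></trade>",
          b"<trade><sym>HP</sym><price>30</price></trade></trades>"]
rows = xml_chunks_to_rows(chunks, "trade", ("sym", "price"))
print(rows)  # [('IBM', '10'), ('IBM', '20'), ('HP', '30')]
print(sliding_avg([float(p) for _, p in rows], 2))  # [10.0, 15.0, 25.0]
```

Once the XML has become ordinary tuples, any relational window operator applies to it unchanged, which is the kind of reuse the abstract describes.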
Speaker: Feng Qiu Title: Automatic Identification of User Interest for Personalized Search Time: 12:30-2:00 Room: BH 4549 Abstract: One hundred users, one hundred needs. As more and more topics are discussed on the web while our vocabulary remains relatively stable, it is increasingly difficult to let a search engine know what we want. Coping with ambiguous queries has long been an important part of Information Retrieval research, but it remains a challenging task. Personalized search has recently received significant attention in the web search community as a way to address this challenge, based on the premise that a user's general preferences may help the search engine disambiguate the true intention of a query. However, studies have shown that users are reluctant to provide any explicit input on their personal preferences. In this paper, we study how a search engine can learn a user's preferences automatically from her past click history and how it can use these preferences to personalize search results. Our experiments show that users' preferences can be learned accurately even from little click-history data, and that personalized search based on user preferences yields significant improvements over the best existing ranking mechanism in the literature.
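One simple way to picture "learn preferences from clicks, then re-rank" is sketched below, assuming each page already carries a topic-weight vector (the `page_topics` data and both function names are hypothetical). The user's preference is estimated as the normalized topic mix of the pages she clicked, and results are re-ranked by the preference-weighted sum of each page's topic weights. This illustrates the general idea only; it is not the specific learning or ranking method evaluated in the paper.

```python
from collections import Counter


def learn_preference(clicked_pages, page_topics):
    """Estimate a topic-preference vector as the normalized topic mix of clicked pages."""
    totals = Counter()
    for page in clicked_pages:
        totals.update(page_topics[page])  # adds this page's topic weights
    norm = sum(totals.values())
    return {topic: w / norm for topic, w in totals.items()} if norm else {}


def personalize(results, page_topics, preference):
    """Re-rank results by how well each page's topics match the user's preference."""
    def score(page):
        return sum(preference.get(t, 0.0) * w for t, w in page_topics[page].items())
    # sorted() is stable even with reverse=True, so ties keep their original order.
    return sorted(results, key=score, reverse=True)


# Hypothetical topic weights for a few pages.
page_topics = {
    "jaguar-car": {"autos": 0.9, "animals": 0.1},
    "jaguar-cat": {"animals": 1.0},
    "bmw-review": {"autos": 1.0},
}
pref = learn_preference(["bmw-review"], page_topics)  # {'autos': 1.0}
print(personalize(["jaguar-cat", "jaguar-car"], page_topics, pref))
# ['jaguar-car', 'jaguar-cat']
```

For the ambiguous query "jaguar", a single past click on a car review is enough to push the car page above the animal page in this toy setting.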
Speaker: Hyun Jin Moon Title: Support for Historical Queries and Schema Evolution in XML and Relational DBMS Time: Friday (Mar. 3) 12:30-2:00pm Room: BH4549 Abstract: The schema is the interface between the database and its applications: the database is organized under a schema, and application queries are written against that schema. For this reason, it is desirable for the schema to remain unchanged. In real-world scenarios, however, schemas change many times during their lifetime, posing a host of challenging schema evolution problems for information systems research. In this paper, we first consider the problem in archival information systems, i.e., systems that preserve the history of the database content and support temporal queries on that history. We discuss existing approaches to archival databases and temporal queries for the case where the schema has remained unchanged and only the database content has evolved over time. Then, we concentrate on the more difficult problem of supporting temporal queries when the schema has also evolved over time, resulting in multiple versions of the schema and multiple versions of the database under each schema version. To address this challenging problem, we propose an XML-based approach to represent the combined history of database schema and content, along with mapping techniques to translate queries between different schema versions. Next, we turn to the problem of schema evolution in current databases and explore the use of similar techniques to support a more gradual transition of the database and applications from the old schema to the current one. Our objective is to address these two independent but closely related problems within a unified framework.
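The query-translation step can be pictured with a deliberately small sketch. Assume (hypothetically) that each schema change is recorded as a map from attribute names in version i to their names in version i+1, with `None` marking a dropped attribute. Composing these maps translates the attributes referenced by a query written against an older version into a later schema. The `translate_attrs` function and the example history are illustrative only; real schema evolution also involves table splits, merges, and type changes, which this sketch ignores.

```python
def translate_attrs(attrs, rename_maps, src, dst):
    """Translate attribute names from schema version `src` to a later version `dst`.

    rename_maps[i] maps names in version i to names in version i+1; an attribute
    missing from the map keeps its name, and a value of None means it was dropped.
    """
    if dst < src:
        raise ValueError("this sketch only translates forward in time")
    current = list(attrs)
    for version in range(src, dst):
        step = rename_maps[version]
        # Once an attribute is dropped (None), it stays dropped.
        current = [None if a is None else step.get(a, a) for a in current]
    return current


# Hypothetical history: v0 -> v1 renames salary; v1 -> v2 renames dept, drops title.
maps = [{"salary": "pay"}, {"dept": "department", "title": None}]
print(translate_attrs(["name", "salary", "dept", "title"], maps, 0, 2))
# ['name', 'pay', 'department', None]
```

A `None` in the output flags a query reference that has no counterpart in the target version and would need special handling.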
Speaker: Yan-Nei Law Title: Models and Operators for Continuous Queries on Data Streams Time: Friday (Feb. 24) 12:30-2:00pm Room: BH4549 Abstract: A new generation of data-intensive applications is emerging for managing and querying information that, rather than residing in databases, flows continuously through the network in the form of massive data streams. Hence, there is much research on designing Data Stream Management Systems, and the approach favored by many research projects consists of extending database languages and technology to data streams. However, this new computational environment brings significant research challenges in areas such as query languages, query processing, and advanced applications. In this talk, we first focus on the limitations of relational languages in expressing continuous stream queries. A main limitation follows from the fact that only nonblocking operators can be used in continuous queries, which makes relational languages incomplete for these queries. To address this problem, we investigate user-defined aggregates natively defined in SQL itself, and prove that these make SQL (i) Turing-complete on stored data and (ii) complete on data streams. Furthermore, we illustrate the effectiveness of the proposed extensions on complex applications involving time-series queries and mining queries. For advanced applications, we focus on data-stream mining algorithms, which must be redesigned to make lighter demands on resources and to display greater adaptability than algorithms for stored data. We introduce ANNCAD, which uses a multi-resolution data representation to classify new test points using the nearest-neighbor principle. Its incremental property and very fast update speed make ANNCAD well suited to mining data streams. Our experiments show that ANNCAD is adaptive and works well in many applications, including image recognition and sensor surveying. We then study the problem of stream query processing.
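The multi-resolution idea behind the ANNCAD classifier described in this abstract can be illustrated with a simplified grid classifier, sketched under stated assumptions: points lie in the unit square, each training point updates label counts in one grid cell per resolution level, and a test point is labeled by the finest cell containing it that has a clear majority, falling back to coarser cells otherwise. The class name and parameters are hypothetical, and ANNCAD's actual block representation and decision rules may differ. The sketch only shows why updates are cheap (one counter per level) and how coarser levels stand in for more distant neighbors.

```python
from collections import Counter, defaultdict


class MultiResGridClassifier:
    """Simplified multi-resolution nearest-neighbor classifier (not ANNCAD itself).

    Level 0 is the finest grid (2**levels cells per dimension); level `levels`
    is a single cell covering the whole space [lo, hi)^d.
    """

    def __init__(self, levels=3, lo=0.0, hi=1.0):
        self.levels, self.lo, self.hi = levels, lo, hi
        # counts[level][cell] -> Counter of class labels seen in that cell
        self.counts = [defaultdict(Counter) for _ in range(levels + 1)]

    def _cell(self, point, level):
        n = 2 ** (self.levels - level)  # cells per dimension at this level
        scale = n / (self.hi - self.lo)
        return tuple(min(int((x - self.lo) * scale), n - 1) for x in point)

    def update(self, point, label):
        # Incremental: one counter increment per level, no retraining.
        for level in range(self.levels + 1):
            self.counts[level][self._cell(point, level)][label] += 1

    def predict(self, point):
        fallback = None
        for level in range(self.levels + 1):  # finest to coarsest
            votes = self.counts[level].get(self._cell(point, level))
            if not votes:
                continue  # empty cell: look at a coarser neighborhood
            ranked = votes.most_common(2)
            if len(ranked) == 1 or ranked[0][1] > ranked[1][1]:
                return ranked[0][0]  # clear majority at this resolution
            if fallback is None:
                fallback = ranked[0][0]  # tie: remember it, keep going coarser
        return fallback


clf = MultiResGridClassifier(levels=3)
clf.update((0.1, 0.1), "a")
clf.update((0.9, 0.9), "b")
print(clf.predict((0.15, 0.12)))  # a
print(clf.predict((0.85, 0.95)))  # b
```

Neither test point shares a finest-level cell with a training point, so both are resolved one level up, which is the coarse-to-fine fallback the multi-resolution representation enables.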
We propose a load-shedding technique for multi-way joins, called Msketch, which makes decisions based on the productivity of tuples rather than only on the content of the pair of streams being joined. A thorough study shows that Msketch outperforms existing algorithms. Finally, we propose general techniques for optimizing the accuracy of window aggregates, statistical aggregates, and mining queries in the presence of sampling. Our method incorporates prior knowledge into an error model that is used to reduce the uncertainty introduced by sampling. We also extend the method to adapt to concept shifts.
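The notion of shedding by tuple productivity can be sketched as follows, assuming a multi-way equi-join on a single shared key. A tuple's productivity is estimated as the number of join results it would produce, i.e. the product of its key's frequencies in the other streams' windows, and the least productive tuples are dropped first when a buffer exceeds its capacity. Msketch maintains its estimates with sketches; this hypothetical version uses exact `Counter`s for clarity, and the function names are illustrative only.

```python
from collections import Counter
from math import prod


def productivity(key, partner_counts):
    """Estimated output of one tuple in a multi-way equi-join on a shared key:
    the product of the key's frequencies in every other stream's window."""
    return prod(counts[key] for counts in partner_counts)  # missing key -> 0


def shed(window, partner_counts, capacity):
    """Keep the `capacity` most productive (key, payload) tuples from `window`.

    sorted() is stable, so tuples with equal productivity keep arrival order.
    """
    ranked = sorted(window, key=lambda t: productivity(t[0], partner_counts),
                    reverse=True)
    return ranked[:capacity]


# Hypothetical window of stream A, with key frequencies seen in streams B and C.
window_a = [("k1", "a1"), ("k2", "a2"), ("k3", "a3")]
counts_b = Counter({"k1": 5, "k2": 1, "k3": 2})
counts_c = Counter({"k1": 1, "k2": 10})
print(shed(window_a, [counts_b, counts_c], capacity=2))
# [('k2', 'a2'), ('k1', 'a1')]
```

Note that `k3` is dropped even though it is fairly frequent in B: it never appears in C, so in a three-way join it cannot produce any output. Looking at only one partner stream at a time would miss this.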
Speaker: Professors Arne and Ingeborg Solvberg Title: Conceptual Modelling for the Semantic Web Time: Friday (Feb. 10) 12:00-2:00pm Room: BH4549 Abstract: Over the last 10 years, the web has emerged as a primary vehicle for disseminating information among people and among businesses. Along with this comes an increased need for interoperability, e.g., to support value-chain-organised business. In technical terms, this translates to an increased need for enterprise modelling and information content modelling. For both of these purposes, ontologies are of central importance. The talk will present a view of what the central issues are. Ongoing research at NTNU in Trondheim on model management, semantic annotation of process models, and ontology alignment will be presented. The talk will be rounded off by a short overview of a new initiative to establish a field test laboratory for developing and testing mobile information services. The laboratory comprises a wireless broadband service covering a substantial part of downtown Trondheim. About the authors: Professors Arne and Ingeborg Solvberg of NTNU (Norwegian University of Science and Technology) in Trondheim, Norway (http://www.ntnu.no/indexe.php) are staying with UCLA's CS department during the spring of 2006. They are hosted by Wes Chu. Arne's research area is Information Systems (http://www.idi.ntnu.no/grupper/is/). Ingeborg's area is Digital Libraries (http://www.idi.ntnu.no/grupper/if/).