Detailed News for 2002


5/17: Dongwon Lee's Ph.D. Defense.

Name:    Dongwon Lee
Date:    Friday, May 17, 2002
Time:    2:00 - 4:00 pm
Place:   4760 Boelter Hall
Advisor: Prof. W. Chu

Title: Query Relaxation for XML Model

Abstract

XML (eXtensible Markup Language) is the new universal format for
structured documents and data on the World Wide Web.  As the Web
becomes a major means of disseminating and sharing information, and as
the amount of XML data grows substantially, there is an increasing
need to manage and query such data in novel yet efficient ways.
In this talk, I focus on one query processing technique, called
"Query Relaxation," in the context of the XML model.
Unlike relational databases, where the schema is relatively small and
fixed, the XML model allows varied or missing structures and values,
which makes it difficult for users to pose queries precisely and
completely. To address this problem, query relaxation enables a
system to automatically weaken a user query into a less restrictive
form when its answers are unsatisfactory, so that "approximate"
answers can be returned as well.
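To make the idea concrete, here is a minimal Python sketch of one possible relaxation policy over the standard `xml.etree.ElementTree` module (the `[.='text']` predicate needs Python 3.7+). The sample document, the step representation, and the helpers `to_xpath`, `relaxations`, and `relaxed_query` are hypothetical illustrations, not the framework presented in the talk. Two relaxations are assumed: turning a child edge (`/`) into a descendant edge (`//`), and dropping a value predicate. Weaker queries are tried breadth-first until some answers appear.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: the two books structure "author" differently.
DOC = ET.fromstring(
    "<library>"
    "<book><title>XML Basics</title><info><author>Lee</author></info></book>"
    "<book><title>Data Mining</title><author>Giuffrida</author></book>"
    "</library>"
)

# A query is a list of (axis, tag, predicate) steps; axis is "/" or "//".
# This representation is an assumption made for the sketch.
QUERY = [("/", "book", None), ("/", "author", ".='Lee'")]

def to_xpath(steps):
    """Render steps as an ElementTree path relative to the root element."""
    return "." + "".join(
        axis + tag + (f"[{pred}]" if pred else "") for axis, tag, pred in steps
    )

def relaxations(steps):
    """Yield every query that is exactly one relaxation weaker than steps."""
    for i, (axis, tag, pred) in enumerate(steps):
        if axis == "/":  # structural relaxation: child -> descendant
            yield steps[:i] + [("//", tag, pred)] + steps[i + 1:]
        if pred is not None:  # value relaxation: drop the predicate
            yield steps[:i] + [(axis, tag, None)] + steps[i + 1:]

def relaxed_query(root, steps, max_rounds=3):
    """Return (rounds of relaxation used, answers) for the first non-empty round."""
    frontier, seen = [steps], {tuple(steps)}
    for rounds in range(max_rounds + 1):
        answers = []
        for q in frontier:
            for elem in root.findall(to_xpath(q)):
                if all(elem is not a for a in answers):  # skip duplicates
                    answers.append(elem)
        if answers:
            return rounds, answers
        next_frontier = []
        for q in frontier:
            for r in relaxations(q):
                if tuple(r) not in seen:
                    seen.add(tuple(r))
                    next_frontier.append(r)
        frontier = next_frontier
    return None, []

rounds, answers = relaxed_query(DOC, QUERY)
print(rounds, [a.text for a in answers])  # 1 ['Lee', 'Giuffrida']
```

The exact query finds nothing because Lee's `author` element is nested under `info`. One round of relaxation returns Lee (via the `//` edge) and Giuffrida (via the dropped predicate); a real system would likely rank such approximate answers, for example by the cost of the relaxation that produced each one.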

To support query relaxation for XML, I first present a formal
framework in which users can express the precise semantics and
behavior of query relaxation. This framework can also serve as the
basis for designing and implementing a future relaxation-enabled
query language. Second, I describe a range of issues related to
supporting query relaxation in native XML engines, in particular the
notion of similarity between XML data trees based on tree edit
distance and the problem of estimating the selectivity of a set of
relaxed XML queries.  Lastly, I present issues involved in converting
data between the XML and relational models, a necessary step for
supporting query relaxation for XML on top of mature relational
database systems.
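The tree edit distance mentioned above can be illustrated with a small Python sketch: the classic recursion over ordered forests, with unit cost for inserting, deleting, or relabeling a node. The tuple representation and the names `tree`, `forest_distance`, and `tree_edit_distance` are hypothetical, and the cost model is an assumption; the dissertation may use different costs or a different algorithm. This memoized version is meant for small trees only; practical systems use dynamic-programming algorithms such as Zhang and Shasha's.

```python
from functools import lru_cache

# A tree is a (label, children) pair; children is a tuple of trees.
# A forest is a tuple of trees. Tuples keep everything hashable for caching.

def tree(label, *children):
    """Convenience constructor: tree("a", tree("b")) builds a(b)."""
    return (label, tuple(children))

def forest_size(forest):
    """Total number of nodes in an ordered forest."""
    return sum(1 + forest_size(children) for _, children in forest)

@lru_cache(maxsize=None)
def forest_distance(f, g):
    """Unit-cost edit distance between two ordered forests."""
    if not f and not g:
        return 0
    if not f:
        return forest_size(g)  # insert every node of g
    if not g:
        return forest_size(f)  # delete every node of f
    (lv, cv), (lw, cw) = f[-1], g[-1]  # rightmost root of each forest
    return min(
        # Delete root v: its children take its place in the forest.
        forest_distance(f[:-1] + cv, g) + 1,
        # Insert root w: its children take its place in the forest.
        forest_distance(f, g[:-1] + cw) + 1,
        # Map v to w (relabel if labels differ), then match what remains.
        forest_distance(cv, cw) + forest_distance(f[:-1], g[:-1])
        + (0 if lv == lw else 1),
    )

def tree_edit_distance(t1, t2):
    return forest_distance((t1,), (t2,))

t1 = tree("book", tree("title"), tree("author"))
t2 = tree("book", tree("title"), tree("editor"))
print(tree_edit_distance(t1, t2))  # 1 (relabel author -> editor)
```

A small distance suggests that two data trees are structurally similar, which is one way a relaxed query's answers could be compared with what the original query asked for.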

New Course Announcement

Date: 22 Feb 2002 12:57:55 -0800
From: "Junghoo (John) Cho" 
Newsgroups: -m, -h, -n, grads
Subject: [Course announcement] CS249 Advanced Topics in Information Systems

Hello everyone,

The following course will be offered in Spring 2002.

Course No: 249

Title: Advanced Topics in Information Management Systems

Catalog Description:

Study of underlying theories and technologies of information
management systems. Review of current literature in the area of
Web-related information systems.  Student presentation and classroom
discussion of selected papers.  Independent group projects proposed
by students.

Objective:

- Present a broad survey of database and information management systems.
- Provide the opportunity for students to develop the background
  necessary to carry out research in this field.

The course will cover material drawn from recent research papers.
Tentative topics include:

1. Web characterization
2. Page ranking
3. Combination of Information Retrieval and Database technology
4. Data extraction from the Web
5. Cache maintenance
6. XML databases

2/21: Giovanni Giuffrida's Ph.D. Defense.

Name:    Giovanni Giuffrida
Date:    Thursday, February 21, 2002
Time:    4:00 - 6:00 pm
Place:   4760 Boelter Hall
Advisor: Prof. W. Chu

Title: Data Mining of Large Relational Databases

Abstract

Knowledge Discovery from Databases and Data Mining (KDD/DM) is a young
multidisciplinary area that draws on statistics, machine learning
(ML), databases, and data visualization, among other fields. KDD/DM
has grown at a breakneck pace in recent years, driven by the needs of
an industry that has accumulated tremendous amounts of data over the
past decades but lacks the capability to extract relevant information
from such vast amounts of data effectively (and efficiently). The
relational model has clearly shown its strength in structuring and
retrieving data when the type of information we are looking for is
well known. So, while a question like "How much did my customers spend
on product X in region Y?" is a straightforward task for a relational
database system, the same does not hold true for a question like
"What are the reasons for the strong sales of product X in region Y?"
The industry has largely recognized the value of a system able to
"answer" the second type of question; the new wave of KDD/DM
applications addresses this need.

KDD/DM is largely rooted in machine learning and has consequently
inherited many legacies that do not necessarily fit the domain of
large databases. This is mostly due to the memory-bound nature of
machine learning algorithms, which was the de facto choice given the
small size of the databases traditionally used. KDD/DM has also grown
in a somewhat uncoordinated way, fueled by fast-growing commercial
interest and notable successes in the research community. Although
many tools and algorithms are now available in both commercial and
research environments, no real standards have yet been proposed. For
instance, there is no standard way of structuring the database, nor
is there a standard language for data mining. We believe that the
integration of data mining and databases has a lot to offer in
tackling some of these issues.

We address these issues by setting the following three objectives
for this dissertation:

- Prove that efficient and effective data mining can be achieved
  on top of standard DBMS. We do so by presenting an algorithm
  tightly integrated with a commercial DBMS. We apply this algorithm
  to a commercial dataset and compare it with other algorithms.

- Introduce a couple of general heuristics that help to reduce the
  search space when mining large datasets. We do this by presenting an
  algorithm to discover classification rules that implements such
  heuristics and comparing it with other mining tools over a large
  commercial dataset.

- Promote integration of data mining and statistics. We do this by
  presenting two applications to real marketing problems using real
  data. In one case we show that combining data mining and statistics
  achieves a better result than applying either one in isolation.  In
  the other case we show that, apart from groundwork costs, data
  mining models and statistical models are comparable in predictive
  performance for the application at hand.

Last modified: Tue May 7 09:50:25 PDT 2002