Combining Strategies for Extracting Relations from Text Collections
Eugene Agichtein
Eleazar Eskin
Luis Gravano
Proceedings of the 2000 ACM SIGMOD Workshop on Research Issues in
Data Mining and Knowledge Discovery (DMKD 2000).
To appear.
Abstract
Text documents often contain valuable structured data that
is hidden in regular English sentences.
This data is best exploited if available as a relational table
that we could use for answering precise queries or for running
data mining tasks.
Our Snowball system extracts these relations from document
collections starting with only a handful of user-provided example
tuples. Based on these tuples, Snowball generates patterns
that are used, in turn, to find more tuples.
In this paper, we introduce a new pattern and tuple generation
scheme for Snowball whose strengths and weaknesses differ from
those of our original system. We also show preliminary results on how we
can combine the two versions of Snowball to extract tuples more
accurately.