| Speaker: | Rosie Jones |
| | Language Technologies Institute, Carnegie Mellon University |
| Date: | Wednesday 5th January 2000 |
| Time: | 14:00--15:30 |
| Place: | E6A 357, Macquarie University |
Abstract:
We are developing a system that can be trained to extract symbolic knowledge from hypertext, using a variety of machine learning methods. By embedding bag-of-words text classifiers, first-order classifiers that use hyperlink information, and sentence-level information extractors in a web crawler, we are able to continuously and automatically augment a growing knowledge base from the contents of the World Wide Web. This knowledge base can then be used as the basis for answering questions about information contained in distributed web sources.
In this talk I will give an overview of the project and how its components fit together, discuss one of our research paradigms, which includes augmenting labeled training data with unlabeled data, and show some results for individual classifiers and extractors, as well as the question-answering capabilities of the resulting knowledge base.
More information on the Web->KB project can be found at http://www.cs.cmu.edu/~webkb/.
Bio:
Rosie Jones received her Bachelor of Science in Computer Science from Sydney University in 1994. She was a summer intern and then a research programmer at MRI, and was briefly enrolled in a Masters of Linguistics at Macquarie University in early 1995. She has been a PhD student at Carnegie Mellon University since 1995, where she is in the Language Technologies Institute. Her current research focusses on reducing the training data requirements of information extraction systems so that naive users can easily reconfigure them.
Enquiries: sals@mri.mq.edu.au
| Last modified: 19th December 1999 |