
Towards Clustering from Corpora by Using Pattern Oriented Approaches


Dana Avram Lupșa


Abstract




Clustering from untagged corpora is especially important for languages that lack a lexical hierarchy such as WordNet. The aim is to combine automatic noun clustering over unannotated corpora with supervised learning methods. This paper presents a study of automatic noun clustering on texts selected from corpora by a pattern-oriented filter. The patterns are designed for Romanian but can be extended to other languages. The paper also compares the results obtained without any pattern filter against those obtained when different patterns are applied as filters.
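The abstract does not list the patterns or the clustering algorithm, so the following is only a minimal sketch of the general pipeline under stated assumptions: sentences are first filtered by lexical patterns, then nouns are grouped by the contexts in which they occur in the surviving text. The two Romanian patterns ("X și alte Y", "Y precum X"), the supplied noun list (standing in for a tagger, since the corpus is untagged), the context window, the Jaccard similarity, and the single-link threshold are all hypothetical choices made for illustration, not the paper's method.

```python
import re
from itertools import combinations

# Hypothetical Romanian patterns, for illustration only (not the paper's list):
# "X și alte Y" ("X and other Y") and "Y precum X" ("Y such as X").
PATTERNS = [
    re.compile(r"\b(\w+) și alte (\w+)\b", re.IGNORECASE),
    re.compile(r"\b(\w+) precum (\w+)\b", re.IGNORECASE),
]

def pattern_filter(sentences, patterns=PATTERNS):
    """Keep only the sentences that match at least one pattern."""
    return [s for s in sentences if any(p.search(s) for p in patterns)]

def context_features(sentences, nouns, window=2):
    """Map each noun (assumed given, lowercase) to the words within `window` tokens of it."""
    feats = {n: set() for n in nouns}
    for s in sentences:
        tokens = re.findall(r"\w+", s.lower())
        for i, tok in enumerate(tokens):
            if tok in feats:
                lo, hi = max(0, i - window), i + window + 1
                # Collect neighbouring tokens, skipping the noun itself.
                feats[tok].update(t for j, t in enumerate(tokens[lo:hi], lo) if j != i)
    return feats

def jaccard(a, b):
    """Overlap of two context sets (0.0 when both are empty)."""
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster(feats, threshold=0.3):
    """Single-link clustering: join nouns whose context overlap reaches the threshold."""
    parent = {n: n for n in feats}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(feats, 2):
        if jaccard(feats[a], feats[b]) >= threshold:
            parent[find(a)] = find(b)
    groups = {}
    for n in feats:
        groups.setdefault(find(n), set()).add(n)
    return sorted(sorted(g) for g in groups.values())

sentences = [
    "Am văzut câini și alte animale în parc.",
    "Am văzut pisici și alte animale în curte.",
    "Cumpărăm fructe precum mere la piață.",
    "Cumpărăm fructe precum pere la piață.",
    "Vremea este frumoasă azi.",
]
nouns = ["câini", "pisici", "mere", "pere"]
filtered = pattern_filter(sentences)
print(cluster(context_features(filtered, nouns)))
# [['câini', 'pisici'], ['mere', 'pere']]
```

Running the same clustering on the unfiltered `sentences` list is one way to mirror, in miniature, the with/without-filter comparison the abstract mentions; on a real corpus the filter would change which contexts each noun accumulates.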

Additional Information

Author(s)

Lupsa, Dana