Difference between revisions of "Tagger"

From Glottopedia

Revision as of 17:54, 6 July 2007

Definition

A tagger is a device which assigns symbolic labels (tags) to linguistic units. The labels are taken from a predefined set of symbols (the tag-set).

Comments

In most cases, a tagger assigns tags representing morpho-syntactic information to single word-forms or tokens. But there are taggers which have been designed to identify the semantic role of noun phrases or prepositional phrases (sense tagging), and sometimes identifying the discourse structure of a text is considered a kind of tagging.

Conceptually, tagging can be considered a three-step process: (i) identification of the relevant units, (ii) assignment of all possible labels to the units (e.g. by lexical look-up, applying heuristics, etc.), and (iii) disambiguation.
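The three steps can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy lexicon, the tag names (DET, NOUN, VERB, AUX, PUNCT), the fallback heuristic for unknown forms and the two context rules are hypothetical and are not taken from any particular tagger.

```python
import re

# Hypothetical toy lexicon: word form -> all possible tags (assumed tag names).
LEXICON = {
    "the": {"DET"},
    "can": {"NOUN", "VERB", "AUX"},
    "rusts": {"VERB", "NOUN"},
}


def identify_units(text):
    """Step (i): split the text into tokens (words and punctuation marks)."""
    return re.findall(r"\w+|[^\w\s]", text)


def assign_candidates(tokens):
    """Step (ii): lexical look-up, with a heuristic fallback for unknown forms."""
    candidates = []
    for tok in tokens:
        tags = LEXICON.get(tok.lower())
        if tags is None:
            # Assumed heuristic: punctuation is PUNCT, other unknown forms are NOUN.
            tags = {"NOUN"} if tok[0].isalnum() else {"PUNCT"}
        candidates.append((tok, set(tags)))
    return candidates


def disambiguate(candidates):
    """Step (iii): choose one tag per token using two hand-written context rules."""
    result = []
    for tok, tags in candidates:
        prev = result[-1][1] if result else None
        if len(tags) > 1 and prev == "DET" and "NOUN" in tags:
            tag = "NOUN"  # after a determiner, prefer the noun reading
        elif len(tags) > 1 and prev == "NOUN" and "VERB" in tags:
            tag = "VERB"  # after a noun, prefer the verb reading
        else:
            tag = sorted(tags)[0]  # deterministic fallback, not linguistically motivated
        result.append((tok, tag))
    return result


def tag(text):
    """Run the three steps in sequence."""
    return disambiguate(assign_candidates(identify_units(text)))


print(tag("The can rusts."))
```

In this sketch the ambiguous forms "can" and "rusts" each receive several candidate tags in step (ii), and only step (iii) reduces them to a single tag using the left context.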

It is common practice to distinguish between rule-based and stochastic taggers, though some taggers combine rules and stochastic information.

State-of-the-art taggers achieve a precision of at least 95% for morpho-syntactic tagging.
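How such a figure is computed can be sketched as follows, assuming that "precision" here means the share of tokens that receive the same tag as in a manually annotated reference (gold standard). The function name and the example tags are hypothetical.

```python
def tagging_accuracy(predicted, gold):
    """Return the fraction of tokens whose predicted tag equals the gold tag."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must contain one tag per token")
    if not gold:
        raise ValueError("cannot score an empty sequence")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


# Hypothetical example: one wrong tag out of four tokens.
predicted = ["DET", "NOUN", "VERB", "NOUN"]
gold = ["DET", "NOUN", "VERB", "ADJ"]
print(tagging_accuracy(predicted, gold))
```

Under this reading, a figure of 95% corresponds to an average of one wrongly tagged token in every twenty.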

Subtypes

Other Languages