POS-Augmented Statistical Parsing Framework using Tree-Adjoining Grammar for Natural Language Processing

Authors

Pavan Kurariya, Prashant Chaudhary, Jahnavi Bodhankar and Lenali Singh, Centre for Development of Advanced Computing, India

Abstract

This paper presents a novel probabilistic parsing framework for Tree-Adjoining Grammar (TAG) that integrates part-of-speech (POS) information to enhance syntactic disambiguation and improve parsing accuracy. While TAG remains a linguistically expressive formalism for modelling complex syntactic phenomena such as long-distance dependencies and recursive structures, conventional statistical TAG parsers rely predominantly on lexical information, which limits their ability to resolve the structural ambiguities inherent in natural languages (NL). To address this limitation, we extend the probabilistic TAG formalism by conditioning derivation decisions jointly on lexical anchors and their associated POS tags. Our model supports both generative and discriminative formulations, incorporating POS-based feature representations into the derivation scoring mechanism. The training process is adapted to align POS-tagged lexical items with elementary tree structures, allowing the parser to learn syntactic patterns with greater accuracy and robustness. Empirical evaluations across multiple languages demonstrate that the POS-augmented approach yields significant gains in parsing accuracy, particularly in the presence of syntactic ambiguity. The POS-Augmented Statistical Parser was evaluated on a dataset of 12,000 sentences, resulting in a 30% reduction in parsing time compared to the conventional TAG parser. The integration of POS not only improves parsing speed but also provides structural advantages. In contrast, conventional TAG often struggles to fully capture the complexity of linguistic phenomena, especially in cross-linguistic transfer between English and Indian languages. The proposed framework offers a scalable and linguistically informed enhancement to TAG-based systems, bridging symbolic grammatical representations with data-driven statistical learning.
This POS-augmented approach offers a lightweight yet effective extension to existing TAG-based systems, enhancing their linguistic expressiveness and robustness for Natural Language Processing (NLP) applications.
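The idea of conditioning elementary-tree selection jointly on a lexical anchor and its POS tag can be illustrated with a minimal sketch. It is not the authors' model: the training format of `(word, pos, tree)` triples, the tree names, the function names `train` and `tree_prob`, and the linear interpolation weight `lam` are all assumptions made for illustration. The sketch estimates P(tree | word, POS) by relative frequency and falls back to P(tree | POS) for anchors not seen in training.

```python
from collections import Counter

def train(samples):
    """Count co-occurrences from (word, pos, tree) triples (hypothetical format)."""
    joint = Counter()     # counts of (word, pos, tree)
    context = Counter()   # counts of (word, pos)
    pos_tree = Counter()  # counts of (pos, tree)
    pos_only = Counter()  # counts of pos
    for word, pos, tree in samples:
        joint[(word, pos, tree)] += 1
        context[(word, pos)] += 1
        pos_tree[(pos, tree)] += 1
        pos_only[pos] += 1
    return joint, context, pos_tree, pos_only

def tree_prob(model, word, pos, tree, lam=0.7):
    """Estimate P(tree | word, pos), interpolated with P(tree | pos).

    lam is an assumed interpolation weight, not a value from the paper.
    """
    joint, context, pos_tree, pos_only = model
    # POS-only estimate; 0.0 if the tag itself was never seen.
    backoff = pos_tree[(pos, tree)] / pos_only[pos] if pos_only[pos] else 0.0
    if not context[(word, pos)]:
        # Unseen (word, pos) pair: rely on the POS tag alone.
        return backoff
    lexical = joint[(word, pos, tree)] / context[(word, pos)]
    return lam * lexical + (1 - lam) * backoff

# Toy data with hypothetical elementary-tree names.
samples = [
    ("saw", "VERB", "alpha_transitive"),
    ("saw", "VERB", "alpha_transitive"),
    ("saw", "NOUN", "alpha_np"),
    ("ate", "VERB", "alpha_transitive"),
    ("slept", "VERB", "alpha_intransitive"),
]
model = train(samples)
print(tree_prob(model, "saw", "VERB", "alpha_transitive"))
print(tree_prob(model, "ran", "VERB", "alpha_intransitive"))
```

In this toy setting the POS tag separates the noun and verb readings of "saw", and an unseen verb such as "ran" still receives a distribution over verb-anchored trees, which is the kind of disambiguation the abstract attributes to POS conditioning.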

Keywords

Natural Language Processing (NLP), Natural Languages (NL), Tree-Adjoining Grammar (TAG), Part-of-Speech (POS)

Full Text  Volume 15, Number 15