
Contextual Predictability of Texts for Texts Processing and Understanding

EasyChair Preprint 2398

16 pages. Date: January 17, 2020

Abstract

This paper is the first part of an investigation of contextual predictability models for Russian. It focuses on the linguistic and psychological interpretation of models, features, metrics, and feature sets. The aim is to identify how the implementation of contextual predictability procedures depends on the genre characteristics of a text (or text collection): scientific vs. fictional. We construct a model that predicts text elements and specify its features for texts of different genres and domains. We analyze various methods for studying contextual predictability, carry out a computational experiment on scientific and fictional texts, and verify its results with an experiment with informants (cloze tests) and with word embeddings (word2vec CBOW model). As a result, a text processing model is built and evaluated on the selected contextual predictability features and the experiments with informants.

Keyphrases: Contextual Predictability, Dice, Fiction texts, Informational Entropy, cloze test, computational linguistic, conditional probability, language model, scientific corpora, surprisal

BibTeX entry
BibTeX does not have the right entry for preprints. This is a hack for producing the correct reference:
@booklet{EasyChair:2398,
  author    = {Olga Krutchenko and Ekaterina Pronoza and Elena Yagunova and Viktor Timokhov and Alexander Ivanets},
  title     = {Contextual Predictability of Texts for Texts Processing and Understanding},
  howpublished = {EasyChair Preprint 2398},
  year      = {EasyChair, 2020}}