Modeling Non-Compositional Expressions using a Search Engine

EasyChair Preprint 418

6 pages. Date: August 9, 2018

Abstract

Non-compositional multi-word expressions pose major challenges for natural language processing applications. In this paper, we present a method for modeling non-compositional expressions based on the assumption that the meaning of an expression depends on its context. Context words can therefore be used to select documents and to separate documents in which the expression has different meanings. Deviation from a baseline is measured using serendipity (i.e., the pointwise effect size). We used this statistical measure to mark which patterns are over- and under-represented, and to decide whether the pattern under scrutiny belongs to the meaning selected by the context words. We used the Google search engine to obtain document frequency estimates. With these estimates, the serendipity measure closely mirrors some human intuitions about the preferred alternative.
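
The abstract does not state the formula for serendipity, so the following is only a minimal sketch of the general idea of comparing an observed document frequency with a baseline expectation. It assumes an independence baseline and uses a Pearson residual, (O - E) / sqrt(E), as a stand-in pointwise effect size; this is not necessarily the definition used in the paper. All function names, the threshold, and the counts are hypothetical.

```python
import math


def expected_joint(df_pattern, df_context, n_docs):
    """Expected number of documents containing both the pattern and the
    context words, assuming the two occur independently (an assumption)."""
    return df_pattern * df_context / n_docs


def pointwise_effect(observed, expected):
    """Pearson residual (O - E) / sqrt(E).

    Used here as a stand-in for a pointwise effect size; the paper's
    serendipity measure may be defined differently.
    """
    if expected <= 0:
        raise ValueError("expected frequency must be positive")
    return (observed - expected) / math.sqrt(expected)


def classify(observed, expected, threshold=2.0):
    """Label a pattern as over- or under-represented relative to the baseline.

    The threshold of 2.0 is an arbitrary illustrative choice.
    """
    score = pointwise_effect(observed, expected)
    if score > threshold:
        return "over-represented"
    if score < -threshold:
        return "under-represented"
    return "as expected"


if __name__ == "__main__":
    # Made-up document counts, standing in for search engine estimates.
    e = expected_joint(df_pattern=1000, df_context=2000, n_docs=1_000_000)
    print(e, classify(observed=10, expected=e))
```

In this sketch, a pattern that is over-represented in documents selected by the context words would be taken as belonging to the meaning those words select, and an under-represented pattern as not belonging to it.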

Keyphrases: Context Word, Frequency Machine, Natural Language Processing, Non-compositional, Serendipity, compositional meaning, compositional multi word expression, computational linguistic, conjunction fallacy, effect size, expected frequency, memory-based learning, multiword expressions, non compositional expression, non-compositional meaning, search engine, statistics

BibTeX entry
BibTeX does not have the right entry for preprints. This is a hack for producing the correct reference:
@booklet{EasyChair:418,
  author    = {Cheikh Bamba Dione and Christer Johansson},
  title     = {Modeling Non-Compositional Expressions using a Search Engine},
  doi       = {10.29007/4jl9},
  howpublished = {EasyChair Preprint 418},
  year      = {EasyChair, 2018}}