iKernels

Machine Learning and NLP group at Trento.

WebCrow

WebCrow is one of the most advanced crossword puzzle solvers. Its main idea is to solve crosswords using the Web as its primary source of knowledge. It was developed by the Artificial Intelligence Group at the University of Siena (Italy), led by Prof. Marco Gori.

In collaboration with the University of Siena and the Arabic Language Technologies group at the Qatar Computing Research Institute, we developed new deep natural language models for WebCrow. On this page, you can find our publications and the datasets we used in them.

About the CWDB Dataset

We compiled a crossword corpus by combining (i) crossword puzzles (CPs) downloaded from the Web and (ii) the clue database provided by Otsys.

We collected over 6.3 million clues, published by many different American editors. Although this is a very rich database, it contains many duplicates and non-standard clues, which introduce significant noise into the dataset. For this reason, we created a compressed dataset of 2 million unique, standard clues with their associated answers: we removed duplicates, fill-in-the-blank clues, and clues representing anagrams or linguistic games.
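The cleaning steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the procedure actually used to build CWDB: the (clue, answer) pair format, the function names, and the two regular-expression heuristics (a run of underscores marks a fill-in-the-blank clue; the words "anagram" or "scrambled" mark a word game) are all hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# Hypothetical record format: one (clue, answer) pair per clue.
Pair = Tuple[str, str]

# Assumed heuristic: two or more underscores mark a fill-in-the-blank clue.
BLANK_RE = re.compile(r"_{2,}")
# Assumed heuristic: these words signal an anagram or other word game.
GAME_RE = re.compile(r"\banagram\b|\bscrambled\b", re.IGNORECASE)


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so near-identical clues compare equal."""
    return " ".join(text.lower().split())


def is_standard(clue: str) -> bool:
    """True unless the clue looks like fill-in-the-blank or a word game."""
    return not BLANK_RE.search(clue) and not GAME_RE.search(clue)


def clean(pairs: Iterable[Pair]) -> List[Pair]:
    """Drop non-standard clues and duplicate (clue, answer) pairs, keeping the first copy."""
    seen = set()
    kept: List[Pair] = []
    for clue, answer in pairs:
        answer = answer.strip().upper()
        key = (normalize(clue), answer)
        if key in seen or not is_standard(clue):
            continue
        seen.add(key)
        kept.append((clue.strip(), answer))
    return kept


raw = [
    ("Capital of Italy", "rome"),
    ("capital  of italy", "ROME"),   # duplicate after normalization
    ("___ Paulo", "SAO"),            # fill-in-the-blank
    ("Anagram of 'evil'", "LIVE"),   # word game
]
print(clean(raw))  # [('Capital of Italy', 'ROME')]
```

Deduplicating on the (clue, answer) pair rather than on the clue alone keeps clues that legitimately have several answers; this choice is also an assumption of the sketch.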

Sources: www.crosswordgiant.com, www.otsys.com/clue

Contact person: Gianni Barlacchi - name dot surname at unitn dot it

CWDB v1.1 - June 2015 (ACL paper)

CWDB v1.0 - October 2014 (ECIR paper)

CWDB v0.1 - January 2014 (CoNLL paper) [Deprecated]

CWDB v0.1 it - December 2014 (CLiC-it paper)

References

Severyn, A., Nicosia, M., Barlacchi, G., and Moschitti, A. (2015) Distributional Neural Networks for Automatic Resolution of Crossword Puzzles. In ACL 2015.

Nicosia, M., Barlacchi, G., and Moschitti, A. (2015) Learning to Rank Aggregated Answers for Crossword Puzzles. In ECIR 2015.

Barlacchi, G., Nicosia, M., and Moschitti, A. (2014) Learning to Rank Answer Candidates for Automatic Resolution of Crossword Puzzles. In CoNLL 2014.

Barlacchi, G., Nicosia, M., and Moschitti, A. (2014) A Retrieval Model for Automatic Resolution of Crossword Puzzles in Italian Language. In CLiC-it 2014.

Citing the Dataset

The CWDB dataset is available for research purposes. If you use this dataset in a scientific publication, please cite the following paper:

Barlacchi, G., Nicosia, M., and Moschitti, A. (2014) Learning to Rank Answer Candidates for Automatic Resolution of Crossword Puzzles. In CoNLL 2014.

@inproceedings{DBLP:conf/conll/BarlacchiNM14,
  author    = {Gianni Barlacchi and
               Massimo Nicosia and
               Alessandro Moschitti},
  title     = {Learning to Rank Answer Candidates for Automatic Resolution of Crossword Puzzles},
  booktitle = {Proceedings of the Eighteenth Conference on Computational Natural Language Learning, CoNLL 2014, Baltimore, Maryland, USA, June 26-27, 2014},
  pages     = {39--48},
  year      = {2014},
  crossref  = {DBLP:conf/conll/2014},
  url       = {http://aclweb.org/anthology/W/W14/W14-1605.pdf},
  timestamp = {Wed, 21 Jan 2015 17:09:29 +0100},
  biburl    = {http://dblp.uni-trier.de/rec/bib/conf/conll/BarlacchiNM14},
  bibsource = {dblp computer science bibliography, http://dblp.org}
}