Blog

Evaluation Quirks, Pitfalls and Some Recommendations: A Little Survey

A collection of funny evaluation quirks and some general guidance for classification evaluation.

An ACL best paper seems to have “disproven” Chomsky’s claim that LLMs can model all languages with “equal facility”. I argue that the story is more nuanced.

What’s in a %&!$# vector? Explaining semantic similarity

We check out two interesting methods for interpretability in semantic search.

How to hack an AMR Parsing evaluation – and what to do about it

We score 100 points on a popular NLP parsing benchmark with a simple hack! We also see how to evaluate such parsers more properly and safely.