Researcher, Ph.D.
Evaluation Quirks, Pitfalls and Some Recommendations: A Little Survey
A collection of funny evaluation quirks and some general guidance for classification evaluation.
An ACL best paper seems to have “disproven” Chomsky’s claim that LLMs can model all languages with “equal facility”. I argue that the story is more nuanced.
What’s in a %&!$# vector? Explaining semantic similarity
We check out two interesting methods for interpretability in semantic search.
How to hack an AMR Parsing evaluation – and what to do about it
We score 100 points on a popular NLP parsing benchmark with a simple hack! We also see how to evaluate such parsers more properly and safely.