Tianxiao

MongoDB

Software engineer at MongoDB. I work mostly on vector search using Lucene.

Sessions

When BM25 Scores Disagree: A Corpus-Independent Alternative

Talk
7. May 2026, 16:00 - 16:45
Main Stage
In distributed search, BM25 can rank the same query differently on different nodes, because its IDF and average-document-length terms depend on each node's local corpus state. StableTfl replaces both with a term-length rarity heuristic, eliminating all corpus dependency. On 22 BEIR datasets, it retains about 90% of BM25's NDCG@10 while guaranteeing identical rankings across nodes.
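
To make the corpus dependency concrete, here is a minimal sketch of textbook Okapi BM25 with a Lucene-style IDF, scored against two hypothetical nodes whose document frequencies differ. The names `NodeStats`, `bm25`, and `rank`, the toy documents, and all statistics are assumptions for illustration, not taken from the talk. The sketch shows only the BM25 side of the problem and does not attempt to reproduce StableTfl.

```python
import math
from dataclasses import dataclass, field


@dataclass
class NodeStats:
    """Corpus statistics as seen by one node (hypothetical values)."""
    n_docs: int
    avgdl: float
    df: dict = field(default_factory=dict)  # term -> document frequency


def bm25(query, doc_tf, doc_len, stats, k1=1.2, b=0.75):
    """Textbook BM25 score of one document under one node's statistics."""
    score = 0.0
    for term in query:
        tf = doc_tf.get(term, 0)
        if tf == 0:
            continue
        df = stats.df.get(term, 0)
        # IDF depends on the node's document count and document frequency.
        idf = math.log(1 + (stats.n_docs - df + 0.5) / (df + 0.5))
        # Length normalization depends on the node's average document length.
        norm = tf + k1 * (1 - b + b * doc_len / stats.avgdl)
        score += idf * tf * (k1 + 1) / norm
    return score


def rank(query, docs, stats):
    """Return document names ordered by descending BM25 score."""
    return sorted(
        docs,
        key=lambda name: bm25(query, docs[name][0], docs[name][1], stats),
        reverse=True,
    )


# Two toy documents: (term frequencies, document length).
docs = {
    "A": ({"vector": 2}, 100),
    "B": ({"search": 2}, 100),
}
query = ["vector", "search"]

# Same corpus size and average length, but opposite term rarity per node.
node1 = NodeStats(n_docs=1000, avgdl=100.0, df={"vector": 10, "search": 500})
node2 = NodeStats(n_docs=1000, avgdl=100.0, df={"vector": 500, "search": 10})

ranking_node1 = rank(query, docs, node1)
ranking_node2 = rank(query, docs, node2)
print(ranking_node1, ranking_node2)
```

On the first node "vector" is the rarer term, so document A ranks first; on the second node "search" is rarer, so the order flips even though the documents and the query are unchanged.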