LAVA: Data Valuation without Pre-Specified Learning Algorithms

We introduce LAVA, a framework for valuing training data without reference to any pre-specified learning algorithm. We derive a proxy for validation performance based on a class-wise Wasserstein distance between the training and validation sets, and show that, under certain Lipschitz conditions, this distance gives an upper bound on the validation performance of any given model. A sensitivity analysis of the distance then yields a value for each individual data point, and these values can be read directly from the output of the optimal transport solver, so no repeated training runs are needed. We evaluate LAVA across diverse settings. Because it does not depend on a learning algorithm, it supports data valuation for tasks such as data acquisition and pricing.
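The idea of reading pointwise values from solver outputs can be illustrated with a minimal, dependency-free sketch. Everything in it is an assumption made for illustration and is not taken from the text above: the function names are hypothetical, an entropic Sinkhorn solver stands in for whatever optimal transport solver is actually used, a fixed label-mismatch penalty stands in for the class-wise label distance, and the calibration step (each point's dual potential minus the mean of the others) is one plausible way to turn dual potentials into per-point sensitivities.

```python
import math

def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))

def pairwise_cost(train, val, label_penalty=1.0):
    """Squared feature distance plus a fixed penalty when labels differ.

    ASSUMPTION: the fixed penalty is a simplified stand-in for a
    class-wise label distance; it is not the paper's cost function.
    """
    return [[(xt - xv) ** 2 + (label_penalty if yt != yv else 0.0)
             for xv, yv in val]
            for xt, yt in train]

def sinkhorn_potentials(cost, eps=0.05, iters=500):
    """Log-domain Sinkhorn between uniform marginals.

    Returns the dual potentials (f, g) of the entropic OT problem.
    ASSUMPTION: an entropic solver is swapped in for illustration.
    """
    n, m = len(cost), len(cost[0])
    log_a, log_b = -math.log(n), -math.log(m)
    f, g = [0.0] * n, [0.0] * m
    for _ in range(iters):
        f = [-eps * logsumexp([log_b + (g[j] - cost[i][j]) / eps
                               for j in range(m)])
             for i in range(n)]
        g = [-eps * logsumexp([log_a + (f[i] - cost[i][j]) / eps
                               for i in range(n)])
             for j in range(m)]
    return f, g

def calibrated_gradients(f):
    """Each training potential minus the mean of all other potentials.

    ASSUMPTION: this calibration is one plausible sensitivity measure;
    it removes the arbitrary constant shift in the dual potentials.
    """
    n = len(f)
    total = sum(f)
    return [fi - (total - fi) / (n - 1) for fi in f]

# Toy 1-D data as (feature, label); the last training point is far
# from every validation point.
train = [(0.0, 0), (0.1, 0), (1.0, 1), (1.1, 1), (5.0, 0)]
val = [(0.05, 0), (0.0, 0), (1.05, 1), (1.0, 1)]

f, _ = sinkhorn_potentials(pairwise_cost(train, val))
values = calibrated_gradients(f)
most_suspect = max(range(len(values)), key=lambda i: values[i])
print(most_suspect, [round(v, 3) for v in values])
```

Under this sign convention, a larger value suggests that up-weighting the point would increase the distance to the validation set, so the far-away point at index 4 should rank as the least useful. The solver is run once, and no model is trained at any step.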

Authors

Hoang Anh Just

Feiyang Kang

Tianhao Wang

Yi Zeng

Myeongseob Ko

Ming Jin

Ruoxi Jia

Published

January 1, 2023