Value Smoothing via Latent Embedding Similarity
Designed an experiment that modifies the reward structure of reinforcement learning algorithms to improve learning in environments with sparse rewards.
The approach uses similarity in embedding space to teach a model that an output receiving a negative reward can still be "almost correct" — smoothing the value landscape around near-correct states.
Experiments showed the algorithm outperforming standard approaches in sparse-reward environments.
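The original implementation isn't shown, but the core idea can be sketched as a reward-shaping function: when the environment gives no positive signal, add a bonus proportional to how close the agent's output lies to a target in embedding space. This is a minimal illustration assuming cosine similarity as the distance metric and a hypothetical `beta` scaling coefficient — both are assumptions, not details from the writeup.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity between two embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def shaped_reward(env_reward: float,
                  output_embedding: np.ndarray,
                  target_embedding: np.ndarray,
                  beta: float = 0.5) -> float:
    """Smooth a sparse reward signal.

    If the environment already rewarded the agent, pass that through
    unchanged; otherwise add a partial reward proportional to the
    embedding-space similarity between the output and the target,
    so near-correct outputs score above clearly wrong ones.
    """
    if env_reward > 0:
        return env_reward  # keep true successes untouched
    similarity = cosine_similarity(output_embedding, target_embedding)
    return env_reward + beta * similarity

# Toy usage: a near-miss output earns a larger shaped reward than a far miss.
target = np.array([1.0, 0.0, 0.0])
near_miss = np.array([0.9, 0.1, 0.0])
far_miss = np.array([0.0, 0.0, 1.0])
print(shaped_reward(0.0, near_miss, target) > shaped_reward(0.0, far_miss, target))
```

In this sketch the bonus only fires on non-positive rewards, so the shaping fills in the flat regions of a sparse reward landscape without altering genuine successes.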
