Value smoothing using similarity-based on latent embeddings
Published:
Created an experiment to modify reward structure of reinforcement learning algorithms to enhance the learning capabilities in environments with sparse rewards. And experimentally showed that the algorithm performs better that standard in environments with sparce rewards.
This used similarity in embedding space to teach a model how to understand when the output is a negative reward, but “almost correct”.
