Selective Preference Optimization via Token-Level Reward Function Estimation
| Publication type: | In proceedings |
| Citation: | yang:2025 |
| Publication status: | Accepted |
| Book title: | Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing (EMNLP) |
| Year: | In Press |
| URL: | https://arxiv.org/abs/2408.135... |
| Keywords: | |
| Authors | |
| Added by: | [PRT] |
| Total score: | 0 |
