Aldeman: New York City families have a wide variety of district and charter schools where literacy rates are soaring despite ...
This is the repo for the Layer_Gradient project, in which we try to understand the layer-wise gradient behaviors when LLMs are finetuned on Fast vs. Slow Thinking. What makes a difference in the ...
In this paper, we propose a new technique of prompt optimization with smaller language models, called GReaTer that leverages gradient information over task-specific reasoning. Compared to text-based ...