This model was fine-tuned with GRPO (Group Relative Policy Optimization) using QLoRA, which trains low-rank adapters on top of a quantized base model.
Try it out with different prompts and generation parameters!
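GRPO does not need a separate learned value model. For each prompt it samples a group of completions, scores each one with a reward, and uses each completion's reward relative to the rest of its group as its advantage. Below is a minimal sketch of that normalization step. It assumes one scalar reward per completion. The function name is illustrative and does not come from this project. The full GRPO objective (clipped policy ratio, KL penalty) is left out.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-6):
    """Illustrative helper (hypothetical name): GRPO-style advantages.

    Each completion sampled for the same prompt is scored against its
    siblings: A_i = (r_i - mean(r)) / (std(r) + eps).
    eps avoids division by zero when every reward in the group is equal.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; implementations differ here
    return [(r - mean) / (std + eps) for r in rewards]


# Four completions for one prompt: two judged correct (1.0), two not (0.0).
print(group_relative_advantages([1.0, 0.0, 1.0, 0.0]))
```

Completions that score above their group's average get positive advantages and are reinforced. Completions below the average get negative advantages. If every completion in a group earns the same reward, all advantages are zero and that group contributes no learning signal.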
Sampling toggle: disable sampling for deterministic (greedy) output, or enable it for more varied responses.
Comparison toggle: show responses from both the base model and the fine-tuned model.
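The page does not show how the sampling toggle is wired internally. If it maps to the usual decoding choice (in Hugging Face `transformers`, the `do_sample` argument of `generate`), the difference comes down to how each next token is chosen from the model's logits. Below is a minimal, library-free sketch. `next_token` is a hypothetical helper and is not part of this app.

```python
import math
import random


def next_token(logits, do_sample, temperature=1.0, rng=None):
    """Hypothetical helper: choose the next token id from raw logits.

    do_sample=False -> greedy decoding (argmax), so output is deterministic.
    do_sample=True  -> draw from softmax(logits / temperature).
    """
    if not do_sample:
        # Greedy: always take the highest-scoring token.
        return max(range(len(logits)), key=lambda i: logits[i])
    if temperature <= 0:
        raise ValueError("temperature must be positive when sampling")
    rng = rng or random.Random()
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    # random.choices normalizes the weights, giving softmax probabilities.
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


logits = [0.1, 2.0, 0.5]
print(next_token(logits, do_sample=False))  # greedy pick
print(next_token(logits, do_sample=True, rng=random.Random(0)))  # sampled pick
```

With sampling off, the same prompt produces the same response on every run. That makes base-vs-fine-tuned comparisons easier to read. Lowering the temperature while sampling pushes the output toward the greedy choice.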
Example Prompts
Try these example prompts to test the model:
Technical Questions:
"What is machine learning?"
"What is deep learning?"
"What is the difference between supervised and unsupervised learning?"
Creative Writing:
"Write a short story about a robot learning to paint."
"Write a story about a time-traveling smartphone."
"Write a fairy tale about a computer learning to dream."