Phi-2 Fine-tuned with GRPO and QLoRA

This Phi-2 model has been fine-tuned with GRPO (Group Relative Policy Optimization) using QLoRA, which trains low-rank adapters on top of a 4-bit quantized copy of the base model. Try it out with different prompts and generation parameters!
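To make the two training techniques named above more concrete, here is a minimal plain-Python sketch of their core ideas. It is an illustration under stated assumptions, not the training code behind this model. GRPO drops PPO's value network and scores each sampled completion relative to the other completions generated for the same prompt, then plugs that advantage into a PPO-style clipped objective; the KL penalty against a reference policy that GRPO also uses is omitted here. For QLoRA, real implementations quantize the frozen base weights to 4-bit NormalFloat (NF4) with double quantization via bitsandbytes; the toy symmetric absmax 4-bit quantizer below is a simplified stand-in, not NF4. All function names (`group_relative_advantages`, `clipped_surrogate`, `quantize_absmax_4bit`, `lora_forward`) are hypothetical.

```python
import math
from statistics import mean, pstdev

# --- GRPO: group-relative advantages and the clipped surrogate (KL term omitted) ---

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize the rewards of G completions sampled for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std here; some implementations use the sample std
    return [(r - mu) / (sigma + eps) for r in rewards]

def clipped_surrogate(logp_new, logp_old, advantage, clip_eps=0.2):
    """PPO-style clipped objective for one token; training maximizes its average."""
    ratio = math.exp(logp_new - logp_old)  # pi_new / pi_old for this token
    clipped = min(max(ratio, 1 - clip_eps), 1 + clip_eps)
    return min(ratio * advantage, clipped * advantage)  # pessimistic bound

# --- QLoRA-style layer: frozen quantized base weight plus a trainable low-rank update ---

def quantize_absmax_4bit(W):
    """Toy symmetric 4-bit quantizer (stand-in for NF4): ints in [-7, 7] plus one scale."""
    absmax = max(abs(v) for row in W for v in row)
    scale = absmax / 7 if absmax > 0 else 1.0
    W_q = [[max(-7, min(7, round(v / scale))) for v in row] for row in W]
    return W_q, scale

def matvec(M, x):
    """Multiply a matrix stored as a list of rows by a vector."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def lora_forward(W_q, scale, A, B, x, alpha, r):
    """y = dequant(W_q) x + (alpha / r) * B (A x); only A and B would be trained."""
    W = [[v * scale for v in row] for row in W_q]  # dequantize the frozen base on the fly
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))  # A has shape r x d_in, B has shape d_out x r
    return [b + (alpha / r) * d for b, d in zip(base, delta)]
```

For rewards `[1.0, 0.0, 0.0, 1.0]` (two completions judged correct, two incorrect), `group_relative_advantages` returns roughly +1 for the correct completions and -1 for the incorrect ones, so the policy is pushed toward the better half of its own samples. In LoRA, `B` is initialized to zeros, so `lora_forward` initially reproduces the dequantized base layer exactly and training only moves it through the low-rank update.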


Sampling toggle: enable sampling for varied output, or disable it for deterministic output.

Comparison toggle: show responses from both the base and the fine-tuned models.
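The sketch below shows, in plain Python, what the sampling toggle and the comparison toggle typically do. It assumes the common setup where disabling sampling means greedy decoding (always taking the highest-scoring token) and enabling it means temperature-scaled sampling with nucleus (top-p) filtering; in Hugging Face Transformers the corresponding switch is the `do_sample` argument of `generate`. This page does not label which generation parameters the demo exposes or their defaults, so the names `next_token`, `compare`, `base_generate`, and `tuned_generate`, and the defaults `temperature=0.7` and `top_p=0.9`, are hypothetical.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a lower temperature sharpens the distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def next_token(logits, do_sample=False, temperature=0.7, top_p=0.9, rng=None):
    """Pick the next token id from raw logits (hypothetical helper)."""
    if not do_sample:
        # Sampling disabled: greedy argmax, so the same prompt always yields the same output.
        return max(range(len(logits)), key=lambda i: logits[i])
    rng = rng or random.Random()
    probs = softmax(logits, temperature)
    # Nucleus (top-p) filtering: keep the most likely tokens until their mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    # Sample among the kept tokens in proportion to their probabilities.
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]

def compare(prompt, base_generate, tuned_generate, show_both=True):
    """Comparison toggle: optionally run the same prompt through the base model too."""
    result = {"fine_tuned": tuned_generate(prompt)}
    if show_both:
        result["base"] = base_generate(prompt)
    return result
```

Because greedy decoding ignores temperature and top-p, those settings only affect the output when sampling is enabled. For a fair side-by-side comparison, it also helps to run both models with the same decoding settings (and, when sampling, the same random seed).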


Example Prompts

Try these example prompts to test the model:

  1. Technical Questions:

    • "What is machine learning?"
    • "What is deep learning?"
    • "What is the difference between supervised and unsupervised learning?"
  2. Creative Writing:

    • "Write a short story about a robot learning to paint."
    • "Write a story about a time-traveling smartphone."
    • "Write a fairy tale about a computer learning to dream."
    • "Create a story about an AI becoming an artist."
  3. Technical Explanations:

    • "How does neural network training work?"
    • "Explain quantum computing in simple terms."
    • "What is transfer learning?"
  4. Creative Tasks:

    • "Write a poem about artificial intelligence."
    • "Write a poem about the future of technology."
    • "Create a story about a robot learning to dream."