zabirauf || Zohaib
Topic

LLM-Training

posts
Feb 8, 2025

DeepSeek-R1: A Peek Under the Hood

DeepSeek-R1 uses cost-effective reinforcement learning to unlock emergent reasoning. By rewarding correct, verifiable steps, it refines its logic and answers, showing how systematic feedback can reduce data needs and boost performance. Here I discuss my understanding of the research paper.

6 min read
Feb 15, 2024

A beginner's guide to fine-tuning an LLM using LoRA

Discover how to create a synthetic dataset, select the right evaluation metrics, and fine-tune your model using LoRA for a narrow scenario. Plus, learn how to serve your model efficiently using llama.cpp on Mac/Linux.

8 min read