Lesson 3.3: Self-Refinement & Iteration

Duration: 60 minutes

Introduction

First drafts are rarely perfect. The key to high-quality outputs is iteration: generating, critiquing, and refining. In this lesson, you’ll learn how to make LLMs critique and improve their own outputs, creating a systematic refinement process that catches errors and gaps a single pass tends to miss.

Why Self-Refinement Works

Multiple Perspectives

Generation and critique use different cognitive modes

Error Detection

Critique mode catches mistakes generation mode missed

Incremental Improvement

Each iteration builds on previous improvements

Quality Assurance

Built-in verification before final output

Self-Consistency Methods

Generate multiple reasoning paths and use majority voting.

The Self-Consistency Pattern

Generate Multiple Solutions

Create 3-5 independent solutions to the same problem

Compare Answers

Check if solutions agree or differ

Use Majority Vote

Select the answer that appears most frequently

Verify Consensus

If no consensus, investigate discrepancies
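The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `majority_vote` and the `min_share` threshold are assumptions introduced here, and the answers are assumed to be already normalized to comparable strings.

```python
from collections import Counter


def majority_vote(answers, min_share=0.5):
    """Return (winner, share), or (None, share) when no answer clears min_share."""
    if not answers:
        raise ValueError("need at least one answer")
    counts = Counter(answers)
    winner, votes = counts.most_common(1)[0]
    share = votes / len(answers)
    # A bare plurality counts as "no consensus" so the caller investigates.
    if share <= min_share:
        return None, share
    return winner, share


print(majority_vote(["28%", "28%", "28%"]))  # ('28%', 1.0)
print(majority_vote(["5 min", "5 min", "100 min"]))
print(majority_vote(["A", "B", "C"]))
```

Returning `None` instead of an arbitrary winner keeps the "Verify Consensus" step explicit: a three-way split is a signal to look at the reasoning, not a result.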

Example: Math Problem

Problem: A store offers 20% off, then an additional 10% off the discounted price. 
What's the total discount on a $100 item?

Generate 3 independent solutions:

Solution 1:
- First discount: $100 × 0.20 = $20
- Price after first discount: $100 - $20 = $80
- Second discount: $80 × 0.10 = $8
- Final price: $80 - $8 = $72
- Total discount: $100 - $72 = $28 (28%)

Solution 2:
- Combined discount: 1 - (0.80 × 0.90) = 1 - 0.72 = 0.28
- Total discount: 28%
- Final price: $100 × 0.72 = $72

Solution 3:
- First discount: 20% off = $80
- Second discount: 10% off $80 = $8 off
- Total saved: $20 + $8 = $28
- Total discount: 28%

Consensus: All three solutions agree on 28% total discount and $72 final price.
Answer: 28% total discount, final price $72
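Solution 2's formula generalizes to any chain of successive discounts. The sketch below is illustrative only; `combined_discount` is a hypothetical helper name, and it assumes whole-number percentages so that exact fractions can be used.

```python
from fractions import Fraction


def combined_discount(*percent_off):
    """Total percent off when each discount applies to the already-reduced price."""
    remaining = Fraction(1)
    for p in percent_off:
        remaining *= 1 - Fraction(p, 100)  # fraction of the price still owed
    return float((1 - remaining) * 100)


print(combined_discount(20, 10))  # 28.0
```

The result is 28%, not 30%, because the second discount applies to the reduced price of $80 rather than to the original $100.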

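Research Finding: Self-consistency has been reported to improve accuracy on reasoning tasks compared to single-path solutions, in some cases by double-digit percentage points. The size of the gain varies with the model and the benchmark.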

When Solutions Disagree

Problem: If 5 machines make 5 widgets in 5 minutes, how long does it take 100 machines 
to make 100 widgets?

Solution 1:
- 5 machines make 5 widgets in 5 minutes
- So 100 machines make 100 widgets in 5 minutes
Answer: 5 minutes

Solution 2:
- 5 machines make 5 widgets in 5 minutes
- Rate: 1 widget per machine per 5 minutes
- 100 machines × 1 widget each = 100 widgets in 5 minutes
Answer: 5 minutes

Solution 3:
- 5 machines make 5 widgets in 5 minutes
- 100 machines make 20× more = 100 widgets
- Time needed: 5 minutes × (100/5) = 100 minutes
Answer: 100 minutes ❌

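Analysis: Solutions 1 and 2 agree (5 minutes), so majority vote selects 5 minutes. Solution 3 
scaled the time with the number of widgets but ignored that the number of machines scales too.
A majority is evidence, not proof, so confirm the selected answer independently.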

Verification: Each machine makes 1 widget in 5 minutes, so 100 machines make 100 widgets 
in 5 minutes simultaneously.
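The verification step can also be done as an explicit rate calculation, which makes Solution 3's mistake easy to see. This is a small sketch; the function name and parameter names are assumptions made for illustration.

```python
from fractions import Fraction


def minutes_needed(machines, widgets, ref_machines=5, ref_widgets=5, ref_minutes=5):
    """Time for `machines` to make `widgets`, given one reference observation."""
    # Rate per machine per minute: 5 widgets / (5 machines * 5 minutes) = 1/5.
    rate = Fraction(ref_widgets, ref_machines * ref_minutes)
    return Fraction(widgets) / (machines * rate)


print(minutes_needed(100, 100))  # 5
```

Because both the widget count and the machine count grow by the same factor of 20, the factor cancels and the time stays at 5 minutes.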

Iterative Improvement

The Generate → Critique → Refine cycle.

Three-Step Refinement Pattern

Step 1: Generate Initial Response
[Create first draft]

Step 2: Critique the Response
Identify weaknesses:
- What's missing?
- What could be clearer?
- What's incorrect?
- What could be improved?

Step 3: Generate Improved Version
[Create refined version addressing critiques]
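As a rough sketch, the three steps can be wired into a loop. The callables `generate`, `critique`, and `revise` are hypothetical stand-ins for model calls (they are not part of any particular library), and the toy versions below exist only so the loop runs without a model.

```python
from typing import Callable, List, Tuple


def refine(
    task: str,
    generate: Callable[[str], str],
    critique: Callable[[str, str], List[str]],
    revise: Callable[[str, str, List[str]], str],
    max_rounds: int = 3,
) -> Tuple[str, List[List[str]]]:
    """Run generate -> critique -> refine until no issues remain or rounds run out."""
    draft = generate(task)
    history: List[List[str]] = []
    for _ in range(max_rounds):
        issues = critique(task, draft)
        history.append(issues)
        if not issues:  # nothing left to fix, stop early
            break
        draft = revise(task, draft, issues)
    return draft, history


# Toy stand-ins (assumptions for illustration only).
def toy_generate(task):
    return "thanks for the offer"


def toy_critique(task, draft):
    issues = []
    if not draft[0].isupper():
        issues.append("capitalize the first letter")
    if not draft.endswith("."):
        issues.append("end with a period")
    return issues


def toy_revise(task, draft, issues):
    if "capitalize the first letter" in issues:
        draft = draft[0].upper() + draft[1:]
    if "end with a period" in issues:
        draft += "."
    return draft


final, log = refine("decline a job offer", toy_generate, toy_critique, toy_revise)
print(final)  # Thanks for the offer.
print(log)
```

Capping the loop with `max_rounds` and stopping as soon as the critique returns no issues keeps the cycle from running indefinitely; the returned history records what each critique found.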

Example: Email Writing

Task: Write a professional email declining a job offer.

STEP 1: Initial Draft

Subject: Job Offer

Hi,

Thanks for the offer but I've decided to go with another company. 

Best,
John

STEP 2: Critique

Issues identified:
- Subject line too vague
- Lacks gratitude and professionalism
- Too brief and abrupt
- Doesn't maintain relationship
- No specific reason given
- Missing proper closing

STEP 3: Refined Version

Subject: Re: Software Engineer Position - Decision

Dear [Hiring Manager Name],

Thank you so much for offering me the Software Engineer position at [Company Name]. 
I truly appreciate the time you and your team invested in the interview process and 
the opportunity to learn more about your innovative work.

After careful consideration, I have decided to accept a position with another organization 
that aligns more closely with my current career goals. This was a difficult decision, as 
I was impressed by [Company Name]'s mission and the talented team I met.

I hope we can stay in touch, and I wish you and the team continued success.

Warm regards,
John Smith

Multi-Iteration Refinement

Task: Explain quantum computing to a 10-year-old.

ITERATION 1:
Quantum computers use quantum mechanics to process information using qubits instead of bits, 
leveraging superposition and entanglement for parallel computation.

Critique 1:
- Too technical (quantum mechanics, qubits, superposition, entanglement)
- No relatable examples
- Assumes too much prior knowledge

ITERATION 2:
Quantum computers are super powerful computers that work differently from regular computers. 
They can solve really hard problems much faster.

Critique 2:
- Better but still vague
- Doesn't explain HOW they're different
- Needs concrete example
- Could use analogy

ITERATION 3:
Imagine you're in a maze trying to find the exit. A regular computer would try one path at 
a time until it finds the way out. A quantum computer is like having a magical ability to 
try ALL paths at the same time! That's why it can solve really tricky puzzles much faster 
than regular computers. Scientists use them for things like creating new medicines and 
understanding space.

Critique 3:
- Clear analogy (maze)
- Age-appropriate language
- Concrete examples (medicines, space)
- Explains the key difference (parallel vs sequential)
- Good! Minor improvement: could add one more relatable example

FINAL VERSION:
Imagine you're in a maze trying to find the exit. A regular computer would try one path at 
a time until it finds the way out. A quantum computer is like having a magical ability to 
try ALL paths at the same time! That's why it can solve really tricky puzzles much faster.

Scientists use quantum computers for amazing things like:
- Creating new medicines to help sick people
- Understanding how stars and planets work
- Making super-secure passwords that no one can crack

It's like having a super-brain that can think about many things at once!

Self-Critique Prompting

Make the model evaluate its own output.

The Self-Critique Pattern

Task: [TASK DESCRIPTION]

Step 1: Generate initial response
[RESPONSE]

Step 2: Critique your response
Evaluate based on:
- Accuracy: Is the information correct?
- Completeness: Is anything missing?
- Clarity: Is it easy to understand?
- Relevance: Does it address the task?

Identified issues:
[LIST ISSUES]

Step 3: Generate improved response
[IMPROVED RESPONSE]
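One way to reuse the pattern is to build the critique prompt from the four criteria programmatically. The sketch below is an assumption about how such a helper might look; `build_critique_prompt` and `CRITERIA` are hypothetical names, and the wording of the instructions is only one possible phrasing.

```python
# Criteria taken from the pattern above; the dict itself is an illustrative choice.
CRITERIA = {
    "Accuracy": "Is the information correct?",
    "Completeness": "Is anything missing?",
    "Clarity": "Is it easy to understand?",
    "Relevance": "Does it address the task?",
}


def build_critique_prompt(task, response, criteria=CRITERIA):
    """Assemble a Step 2 critique prompt for an existing response."""
    lines = [
        f"Task: {task}",
        "",
        "Response:",
        response,
        "",
        "Critique the response above. Evaluate based on:",
    ]
    lines += [f"- {name}: {question}" for name, question in criteria.items()]
    lines += ["", "List each issue on its own line, then write an improved response."]
    return "\n".join(lines)


print(build_critique_prompt("Summarize the report", "The report is about sales."))
```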

Example: Code Review

Task: Write a Python function to check if a string is a valid email address.

STEP 1: Initial Code

def is_valid_email(email):
    return '@' in email and '.' in email

STEP 2: Self-Critique

Accuracy Issues:
- Too simplistic - a string such as "@." would pass but isn't valid
- Doesn't check for proper format
- Doesn't validate domain
- Allows multiple @ symbols

Completeness Issues:
- No input validation (None, empty string)
- No docstring
- No error handling
- Missing edge cases

Clarity Issues:
- Function name is good
- But logic is too simple to be clear about what it actually validates

STEP 3: Improved Code

import re

def is_valid_email(email):
    """
    Validate email address format.
    
    Args:
        email (str): Email address to validate
        
    Returns:
        bool: True if valid email format, False otherwise
        
    Examples:
        >>> is_valid_email("user@example.com")
        True
        >>> is_valid_email("invalid.email")
        False
    """
    if not email or not isinstance(email, str):
        return False
    
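    # Simplified pattern; much looser than full RFC 5322 validation
    pattern = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
    
    # fullmatch rejects a trailing newline, which match() with $ would accept
    return bool(re.fullmatch(pattern, email))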
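A quick way to close the loop on a self-critique is to turn each critique item into a test case. The sketch below restates the validator compactly (using `re.fullmatch`) so it runs on its own; the specific test strings are illustrative assumptions.

```python
import re

# Same simplified pattern as the improved code above.
PATTERN = re.compile(r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$')


def is_valid_email(email):
    if not email or not isinstance(email, str):
        return False
    return bool(PATTERN.fullmatch(email))


# Each case maps back to an issue raised in the critique.
cases = {
    "user@example.com": True,   # ordinary address
    "invalid.email": False,     # no @ at all
    "a@b@example.com": False,   # multiple @ symbols
    "@.": False,                # passed the naive check
    "": False,                  # empty string
    None: False,                # non-string input
}
for value, expected in cases.items():
    assert is_valid_email(value) is expected, value

# Known gap: the simplified pattern still accepts consecutive dots in the domain.
print(is_valid_email("a@b..com"))  # True
```

The last line is a reminder that a further critique round would still find issues; a regular expression of this size checks format only and cannot confirm that an address exists.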

Verification Strategies

Fact-Checking Pattern

Statement: [CLAIM TO VERIFY]

Step 1: Identify verifiable facts
[LIST FACTS]

Step 2: Check each fact
Fact 1: [FACT]
Verification: [TRUE/FALSE/UNCERTAIN]
Source/Reasoning: [EXPLANATION]

Step 3: Overall assessment
[VERIFIED/PARTIALLY VERIFIED/UNVERIFIED]

Example: Fact Verification

Statement: "The Great Wall of China is visible from space and is over 13,000 miles long."

Step 1: Identify verifiable facts
Fact A: The Great Wall is visible from space
Fact B: The Great Wall is over 13,000 miles long

Step 2: Check each fact

Fact A: Visible from space
Verification: FALSE
Reasoning: This is a common myth. The Great Wall is not visible from space with the naked eye. 
NASA astronauts have confirmed this. While it's visible from low Earth orbit with aid, it's 
not visible from the Moon or deep space.

Fact B: Over 13,000 miles long
Verification: TRUE
Reasoning: According to archaeological surveys, the Great Wall (including all branches) 
measures approximately 13,171 miles (21,196 km). This was confirmed by Chinese surveys 
in 2012.

Step 3: Overall assessment
PARTIALLY VERIFIED: The length claim is accurate, but the visibility from space claim is false.

Corrected statement: "The Great Wall of China is over 13,000 miles long, making it one of 
the longest structures ever built, though contrary to popular belief, it is not visible 
from space with the naked eye."
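The per-fact verdicts and the overall assessment can be represented as a small data structure. This is a sketch under one stated assumption: the roll-up rule (all true gives VERIFIED, some true gives PARTIALLY VERIFIED, otherwise UNVERIFIED) is an interpretation of the pattern, not something the pattern itself specifies. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    TRUE = "TRUE"
    FALSE = "FALSE"
    UNCERTAIN = "UNCERTAIN"


@dataclass
class FactCheck:
    fact: str
    verdict: Verdict
    reasoning: str


def overall_assessment(checks):
    """Roll individual verdicts up into a single label (assumed rule, see lead-in)."""
    if not checks:
        raise ValueError("no facts to assess")
    verdicts = [c.verdict for c in checks]
    if all(v is Verdict.TRUE for v in verdicts):
        return "VERIFIED"
    if any(v is Verdict.TRUE for v in verdicts):
        return "PARTIALLY VERIFIED"
    return "UNVERIFIED"


great_wall = [
    FactCheck("Visible from space", Verdict.FALSE, "Common myth"),
    FactCheck("Over 13,000 miles long", Verdict.TRUE, "2012 survey figure"),
]
print(overall_assessment(great_wall))  # PARTIALLY VERIFIED
```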

Logic Verification

Argument: [LOGICAL ARGUMENT]

Step 1: Identify premises and conclusion
Premises:
1. [PREMISE 1]
2. [PREMISE 2]
Conclusion: [CONCLUSION]

Step 2: Check logical validity
Does the conclusion follow from the premises?
[ANALYSIS]

Step 3: Check soundness
Are the premises true?
[VERIFICATION]

Step 4: Final assessment
[VALID/INVALID, SOUND/UNSOUND]
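For simple propositional arguments, Step 2 (validity) can be checked mechanically with a truth table: an argument is valid if no assignment makes every premise true and the conclusion false. The sketch below is illustrative; `is_valid` and `implies` are names introduced here, and the approach covers only propositional logic. Step 3 (soundness) still requires checking the premises against the facts.

```python
from itertools import product


def implies(a, b):
    return (not a) or b


def is_valid(premises, conclusion, variables):
    """True if no truth assignment makes all premises true and the conclusion false."""
    for values in product([True, False], repeat=len(variables)):
        env = dict(zip(variables, values))
        if all(p(env) for p in premises) and not conclusion(env):
            return False  # found a counterexample
    return True


# Modus ponens: if p then q; p; therefore q.
modus_ponens = is_valid(
    [lambda e: implies(e["p"], e["q"]), lambda e: e["p"]],
    lambda e: e["q"],
    ["p", "q"],
)

# Affirming the consequent: if p then q; q; therefore p.
affirming_consequent = is_valid(
    [lambda e: implies(e["p"], e["q"]), lambda e: e["q"]],
    lambda e: e["p"],
    ["p", "q"],
)

print(modus_ponens, affirming_consequent)  # True False
```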

When Refinement Helps Most

Quality-Critical Outputs

Use when: The output will be published, presented, or used for important decisions
Example: Business proposals, research papers, legal documents

Complex Creative Tasks

Use when: The task requires creativity and multiple iterations improve quality
Example: Marketing copy, story writing, design briefs

Technical Accuracy Required

Use when: Errors could have serious consequences
Example: Code, medical information, financial advice

Ambiguous Requirements

Use when: The initial requirements weren’t perfectly clear
Example: First attempt reveals misunderstandings that need correction

Advanced Refinement Techniques

Targeted Refinement

Focus refinement on specific aspects.

Initial Output: [CONTENT]

Refinement Focus: [SPECIFIC ASPECT]

Critique focused on [ASPECT]:
[TARGETED CRITIQUE]

Refined version (improving [ASPECT]):
[IMPROVED CONTENT]

Example:

Initial: "Our product is good and customers like it."

Refinement Focus: Specificity and evidence

Critique: Too vague. "Good" is subjective. "Customers like it" needs evidence.

Refined: "Our product has a 4.8/5 star rating from over 10,000 customers, with 94% 
reporting they would recommend it to others."

Comparative Refinement

Generate multiple versions and select the best.

Task: [TASK]

Version A: [APPROACH 1]
Version B: [APPROACH 2]
Version C: [APPROACH 3]

Comparison:
- Version A: [STRENGTHS/WEAKNESSES]
- Version B: [STRENGTHS/WEAKNESSES]
- Version C: [STRENGTHS/WEAKNESSES]

Best version: [SELECTION]
Reasoning: [WHY]

Final output (combining best elements):
[OPTIMIZED VERSION]

Stakeholder-Focused Refinement

Refine based on different stakeholder perspectives.

Initial Output: [CONTENT]

Perspective 1: [STAKEHOLDER TYPE]
Concerns: [ISSUES FROM THIS PERSPECTIVE]
Refinement: [ADJUSTMENTS]

Perspective 2: [STAKEHOLDER TYPE]
Concerns: [ISSUES FROM THIS PERSPECTIVE]
Refinement: [ADJUSTMENTS]

Balanced Final Version:
[CONTENT ADDRESSING ALL PERSPECTIVES]

Best Practices

Specific Criteria

Define clear criteria for what makes a good output

Multiple Iterations

Don’t stop at one refinement—iterate 2-3 times for critical content

Objective Critique

Focus on concrete issues, not vague “could be better”

Track Changes

Document what changed and why in each iteration
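For the "Track Changes" practice, one lightweight option is to record a diff between consecutive iterations using Python's standard `difflib` module. The helper name `diff_iterations` and the sample text are assumptions for illustration; the "why" behind each change still has to be written down by whoever produced the critique.

```python
import difflib


def diff_iterations(before, after, n_before=1, n_after=2):
    """Return unified-diff lines showing what changed between two iterations."""
    return list(
        difflib.unified_diff(
            before.splitlines(),
            after.splitlines(),
            fromfile=f"iteration {n_before}",
            tofile=f"iteration {n_after}",
            lineterm="",
        )
    )


v1 = "We got your complaint.\nWe'll let you know."
v2 = "Thank you for contacting us.\nWe'll update you within 24 hours."

changes = diff_iterations(v1, v2)
print("\n".join(changes))
```

Lines prefixed with `-` were removed and lines prefixed with `+` were added, which gives a compact record to store next to the critique that motivated the change.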

Practice Exercises

Exercise 1: Email Refinement

Refine this customer service email through 2 iterations. Initial: “We got your complaint. The problem is being looked at. We’ll let you know.”

Sample Solution

ITERATION 1 - Critique:
- Too informal and brief
- Lacks empathy
- No timeline
- Passive voice
- No specific action mentioned

ITERATION 1 - Refined:
"Thank you for contacting us about your issue. We're currently investigating the problem 
and will update you soon."

ITERATION 2 - Critique:
- Better but still vague ("soon")
- Could be more empathetic
- Should acknowledge specific issue
- Could offer interim solution

ITERATION 2 - Final:
"Thank you for bringing this to our attention. I sincerely apologize for the inconvenience 
you've experienced with [specific issue]. Our technical team is actively investigating and 
we'll have an update for you within 24 hours. In the meantime, [interim solution if applicable]. 
Please don't hesitate to reach out if you have any questions."

Exercise 2: Code Refinement

Refine this function through self-critique.

def calculate(a, b, op):
    if op == '+':
        return a + b
    elif op == '-':
        return a - b
    elif op == '*':
        return a * b
    elif op == '/':
        return a / b

Sample Solution

# CRITIQUE:
# - No input validation
# - Division by zero not handled
# - No docstring
# - Limited operators
# - No type hints
# - No error handling for invalid operators

# REFINED VERSION:
def calculate(a: float, b: float, operation: str) -> float:
    """
    Perform basic arithmetic operations.
    
    Args:
        a: First number
        b: Second number
        operation: Operation to perform ('+', '-', '*', '/')
        
    Returns:
        Result of the operation
        
    Raises:
        ValueError: If operation is invalid or division by zero
        TypeError: If inputs are not numbers
    """
    # Input validation
    if not isinstance(a, (int, float)) or not isinstance(b, (int, float)):
        raise TypeError("Both operands must be numbers")
    
    # Reject division by zero before dispatching
    if operation == '/' and b == 0:
        raise ValueError("Division by zero")
    
    # Map each supported operator to its implementation
    operations = {
        '+': lambda x, y: x + y,
        '-': lambda x, y: x - y,
        '*': lambda x, y: x * y,
        '/': lambda x, y: x / y,
    }
    
    if operation not in operations:
        raise ValueError(f"Invalid operation: {operation}")
    
    return operations[operation](a, b)

Exercise 3: Explanation Refinement

Refine this technical explanation for a non-technical audience. Initial: “The API uses RESTful architecture with JSON payloads over HTTPS, implementing OAuth 2.0 for authentication.”

Sample Solution

ITERATION 1 - Critique:
- Too many technical terms
- No context for why this matters
- Assumes knowledge of REST, JSON, OAuth
- Doesn't explain benefits

ITERATION 1 - Refined:
"The API is a way for different software programs to communicate securely. It uses 
industry-standard methods to keep your data safe."

ITERATION 2 - Critique:
- Better but too vague
- Could use analogy
- Should mention specific benefits
- "Industry-standard" is still jargon

ITERATION 2 - Final:
"Think of our API as a secure messenger between different apps. When one app needs 
information from another, the API delivers it safely—like a courier with a locked briefcase. 
This means your data stays private, and different tools you use can work together seamlessly."

Real-World Application: Content Quality System

You are a content quality assistant. Help refine content through systematic critique.

Content: [INPUT CONTENT]
Purpose: [INTENDED USE]
Audience: [TARGET AUDIENCE]

STEP 1: Initial Assessment
Rate the content (1-10) on:
- Clarity: [SCORE] - [REASONING]
- Accuracy: [SCORE] - [REASONING]
- Completeness: [SCORE] - [REASONING]
- Engagement: [SCORE] - [REASONING]
- Appropriateness: [SCORE] - [REASONING]

STEP 2: Detailed Critique
Identify specific issues:
- Clarity issues: [LIST]
- Factual concerns: [LIST]
- Missing elements: [LIST]
- Tone/style issues: [LIST]
- Structural problems: [LIST]

STEP 3: Prioritized Improvements
High priority (must fix):
1. [ISSUE] - [WHY CRITICAL]

Medium priority (should fix):
1. [ISSUE] - [WHY IMPORTANT]

Low priority (nice to have):
1. [ISSUE] - [WHY BENEFICIAL]

STEP 4: Refined Version
[IMPROVED CONTENT]

STEP 5: Verification
Confirm improvements address all high-priority issues:
✓ [ISSUE] - [HOW ADDRESSED]
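If the model's output for this prompt is parsed into structured form, Step 5 can be checked in code rather than by eye. The sketch below assumes such parsing has already happened; the `Issue` and `Assessment` classes, their fields, and the sample scores are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Issue:
    description: str
    priority: str  # "high", "medium", or "low"
    addressed: bool = False


@dataclass
class Assessment:
    scores: dict  # criterion name -> score from 1 to 10
    issues: list = field(default_factory=list)

    def average_score(self):
        return sum(self.scores.values()) / len(self.scores)

    def unresolved_high_priority(self):
        return [i for i in self.issues if i.priority == "high" and not i.addressed]

    def passes_verification(self):
        # Step 5: every high-priority issue must be addressed.
        return not self.unresolved_high_priority()


assessment = Assessment(
    scores={"Clarity": 6, "Accuracy": 8, "Completeness": 5,
            "Engagement": 7, "Appropriateness": 9},
    issues=[
        Issue("Unsupported statistic in opening paragraph", "high"),
        Issue("Long sentences in section 2", "medium"),
    ],
)

print(assessment.average_score())        # 7.0
print(assessment.passes_verification())  # False

assessment.issues[0].addressed = True
print(assessment.passes_verification())  # True
```

Gating on high-priority issues only mirrors the prompt: medium and low items are tracked but do not block the refined version.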

Key Takeaways

Use self-consistency (multiple paths + majority vote) for critical reasoning

Apply Generate → Critique → Refine cycle for quality improvement

Make critiques specific and actionable, not vague

Iterate 2-3 times for important outputs

Verify facts and logic before finalizing

Focus refinement on specific aspects when needed

Next Steps

You’ve mastered self-refinement. Now learn to combine multiple reasoning paths through ensembling for even more robust solutions.

Next: Lesson 3.4 - Ensembling & Multi-Path Reasoning

Combine multiple approaches for robust solutions
