Duration: 60 minutes

Introduction

Information extraction transforms unstructured text into structured data. Whether you’re parsing resumes, analyzing contracts, or extracting entities from news articles, the prompt pattern you choose largely determines how consistent and usable the results are. In this lesson, you’ll work through techniques that progress from simple to complex extraction tasks.

Why Information Extraction Matters

Business Value

Automate data entry, document processing, and content analysis at scale

Accuracy Critical

Missing or incorrect extractions can have serious downstream consequences

Named Entity Recognition (NER)

Simple Entity Extraction

Start with the basics—extracting specific entity types.
Extract all person names from the following text.

Text: "Sarah Chen met with Dr. James Wilson and Maria Rodriguez to discuss the project."

Person names:
Output:
- Sarah Chen
- James Wilson
- Maria Rodriguez

Multi-Category NER

Extract multiple entity types simultaneously.
Extract entities from the text and categorize them.

Text: "For Tom Jenkins, CEO of the European Tourism Organisation, 2024 represents a shift in travel patterns across Europe."

Format your response as:
- Person: [names]
- Organization: [organizations]
- Date: [dates]
- Location: [locations]
Output:
- Person: Tom Jenkins
- Organization: European Tourism Organisation
- Date: 2024
- Location: Europe
Pro Tip: Explicitly specify the output format to ensure consistency across extractions.
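
A fixed output format also makes the response easy to consume in code. The following is a minimal sketch, not part of the original lesson: `parse_entity_list` is a hypothetical helper, and it assumes the model followed the "- Category: values" format and separated multiple values with commas.

```python
def parse_entity_list(output: str) -> dict[str, list[str]]:
    """Parse lines such as '- Person: Tom Jenkins' into {'Person': ['Tom Jenkins']}."""
    entities: dict[str, list[str]] = {}
    for line in output.splitlines():
        line = line.strip()
        if not line.startswith("- ") or ":" not in line:
            continue  # skip anything that does not match the requested format
        category, _, values = line[2:].partition(":")
        # Assumption: multiple values within a category are comma-separated.
        entities[category.strip()] = [v.strip() for v in values.split(",") if v.strip()]
    return entities


model_output = """\
- Person: Tom Jenkins
- Organization: European Tourism Organisation
- Date: 2024
- Location: Europe"""

print(parse_entity_list(model_output))
```

Lines that do not match the format are skipped rather than guessed at, so a malformed response shows up as missing categories instead of wrong data.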

NER with Context

Sometimes you need more than just the entity—you need its role or relationship.
Extract people mentioned in the text along with their roles.

Text: "The research was led by Dr. Emily Watson, with contributions from graduate student Michael Park and advisor Professor Linda Chen."

Format: Name | Role
Output:
Dr. Emily Watson | Research Lead
Michael Park | Graduate Student
Professor Linda Chen | Advisor
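
The "Name | Role" format maps directly onto rows of a small record type. As a rough sketch (the `PersonRole` type and `parse_name_role` function are illustrative names, not something the lesson defines), the output above could be read like this:

```python
from typing import NamedTuple


class PersonRole(NamedTuple):
    name: str
    role: str


def parse_name_role(output: str) -> list[PersonRole]:
    """Turn 'Name | Role' lines into PersonRole rows, ignoring malformed lines."""
    rows = []
    for line in output.splitlines():
        parts = [part.strip() for part in line.split("|")]
        if len(parts) == 2 and all(parts):
            rows.append(PersonRole(*parts))
    return rows


model_output = """\
Dr. Emily Watson | Research Lead
Michael Park | Graduate Student
Professor Linda Chen | Advisor"""

for person in parse_name_role(model_output):
    print(f"{person.name} -> {person.role}")
```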

Relation Extraction

Extract not just entities, but the relationships between them.
Identify relationships between entities in the following text.

Text: "Apple acquired the startup founded by Jane Smith for $500 million. Smith will join Apple as VP of Innovation."

Extract:
1. Acquisition: [Acquirer] acquired [Target] for [Amount]
2. Employment: [Person] joined [Company] as [Role]
Output:
1. Acquisition: Apple acquired startup (founded by Jane Smith) for $500 million
2. Employment: Jane Smith joined Apple as VP of Innovation
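
Because the prompt fixes a sentence template for each relation, the output can be turned into structured records with one pattern per template. This is a sketch under that assumption only; the `PATTERNS` table and `parse_relations` function are hypothetical, and a response that strays from the templates simply yields no record.

```python
import re

# One regular expression per relation template from the prompt.
PATTERNS = {
    "acquisition": re.compile(
        r"Acquisition:\s*(?P<acquirer>.+?) acquired (?P<target>.+) for (?P<amount>.+)"
    ),
    "employment": re.compile(
        r"Employment:\s*(?P<person>.+?) joined (?P<company>.+?) as (?P<role>.+)"
    ),
}


def parse_relations(output: str) -> list[dict[str, str]]:
    """Return one dict per recognised relation line."""
    relations = []
    for line in output.splitlines():
        for kind, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                relations.append({"type": kind, **match.groupdict()})
    return relations


model_output = """\
1. Acquisition: Apple acquired startup (founded by Jane Smith) for $500 million
2. Employment: Jane Smith joined Apple as VP of Innovation"""

for relation in parse_relations(model_output):
    print(relation)
```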

Template-Based Extraction

Structured Field Extraction

Extract specific fields from semi-structured text.
Extract the following information from the job posting:

Job Posting: "Senior Software Engineer needed at TechCorp. Requirements: 5+ years Python experience, BS in Computer Science. Salary: $120k-$150k. Location: Remote (US only). Apply by March 15, 2024."

Extract:
- Job Title:
- Company:
- Experience Required:
- Education Required:
- Salary Range:
- Location:
- Application Deadline:
Output:
- Job Title: Senior Software Engineer
- Company: TechCorp
- Experience Required: 5+ years Python experience
- Education Required: BS in Computer Science
- Salary Range: $120k-$150k
- Location: Remote (US only)
- Application Deadline: March 15, 2024
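
When the same template is reused across many documents, the prompt can be generated from a list of field names. The sketch below is one possible way to do that; `build_extraction_prompt` and its parameters are illustrative names, and the wording simply mirrors the prompt shown above.

```python
def build_extraction_prompt(source_label: str, text: str, fields: list[str]) -> str:
    """Assemble a field-extraction prompt in the same layout as the example above."""
    field_lines = "\n".join(f"- {field}:" for field in fields)
    return (
        f"Extract the following information from the {source_label.lower()}:\n\n"
        f'{source_label}: "{text}"\n\n'
        f"Extract:\n{field_lines}"
    )


prompt = build_extraction_prompt(
    "Job Posting",
    "Senior Software Engineer needed at TechCorp. Salary: $120k-$150k.",
    ["Job Title", "Company", "Salary Range"],
)
print(prompt)
```

Keeping the field list in one place means the prompt and any downstream parser can share it, which reduces the chance of the two drifting apart.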

Progressive Extraction (Simple → Complex)

Build complexity gradually for better accuracy.

Step 1: Extract Basic Info

Start with obvious, easy-to-identify information

Step 2: Extract Relationships

Identify connections between extracted entities

Step 3: Infer Implicit Info

Derive information that’s implied but not stated
Example:
Analyze this customer review in three steps:

Review: "I ordered the blue XL shirt on Monday. It arrived Thursday but was the wrong size. Customer service sent a replacement the same day, which arrived Saturday. Much better experience the second time!"

Step 1 - Extract explicit facts:
- Product:
- Order date:
- Delivery dates:
- Issue:
- Resolution:

Step 2 - Identify timeline:
- Total resolution time:
- Response speed:

Step 3 - Infer sentiment:
- Initial experience:
- Resolution experience:
- Overall satisfaction:
Output:
Step 1 - Extract explicit facts:
- Product: Blue XL shirt
- Order date: Monday
- Delivery dates: Thursday (first), Saturday (second)
- Issue: Wrong size
- Resolution: Replacement sent same day

Step 2 - Identify timeline:
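- Total resolution time: 5 days from order to correct item (Monday to Saturday); 2 days from the wrong delivery to the replacement (Thursday to Saturday)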
- Response speed: Same-day replacement shipment

Step 3 - Infer sentiment:
- Initial experience: Negative (wrong size)
- Resolution experience: Positive (fast response)
- Overall satisfaction: Positive (problem resolved quickly)
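
Derived values such as the Step 2 timeline are worth checking independently, since they involve arithmetic rather than copying text. A small sketch, assuming all events in the review fall within a single week (the review does not say otherwise); `days_between` is a hypothetical helper:

```python
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def days_between(start: str, end: str) -> int:
    """Days from one weekday to the next occurrence of another (0 if they match)."""
    return (WEEKDAYS.index(end) - WEEKDAYS.index(start)) % 7


order_to_first_delivery = days_between("Monday", "Thursday")
issue_to_replacement = days_between("Thursday", "Saturday")
order_to_replacement = days_between("Monday", "Saturday")

print(order_to_first_delivery)  # 3
print(issue_to_replacement)     # 2
print(order_to_replacement)     # 5
```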

Structured Output Extraction

JSON Format Extraction

Request data in JSON format for easy integration.
Extract information from the following text and format as JSON.

Text: "Meeting scheduled for January 15, 2024 at 2:00 PM in Conference Room B. Attendees: Alice Johnson (host), Bob Smith, Carol White. Agenda: Q1 budget review."

JSON format:
{
  "date": "",
  "time": "",
  "location": "",
  "host": "",
  "attendees": [],
  "agenda": ""
}
Output:
{
  "date": "2024-01-15",
  "time": "14:00",
  "location": "Conference Room B",
  "host": "Alice Johnson",
  "attendees": ["Alice Johnson", "Bob Smith", "Carol White"],
  "agenda": "Q1 budget review"
}
Best Practice: Provide the JSON schema in your prompt to ensure consistent structure.
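
Even with a schema in the prompt, it is safer to check the response before using it. The following sketch uses only Python's standard `json` module; `parse_meeting` and `EXPECTED_KEYS` are illustrative names, and the checks shown (exact key set, list-typed attendees) are one reasonable choice rather than a complete validation.

```python
import json

EXPECTED_KEYS = {"date", "time", "location", "host", "attendees", "agenda"}


def parse_meeting(raw: str) -> dict:
    """Parse the model's JSON and reject responses that deviate from the schema."""
    data = json.loads(raw)  # raises json.JSONDecodeError on malformed JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = EXPECTED_KEYS - set(data)
    unexpected = set(data) - EXPECTED_KEYS
    if missing or unexpected:
        raise ValueError(f"missing={sorted(missing)}, unexpected={sorted(unexpected)}")
    if not isinstance(data["attendees"], list):
        raise ValueError("attendees must be a list")
    return data


model_output = """
{
  "date": "2024-01-15",
  "time": "14:00",
  "location": "Conference Room B",
  "host": "Alice Johnson",
  "attendees": ["Alice Johnson", "Bob Smith", "Carol White"],
  "agenda": "Q1 budget review"
}
"""

meeting = parse_meeting(model_output)
print(meeting["host"], len(meeting["attendees"]))
```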

Table Extraction

Convert unstructured text into tabular format.
Extract product information and format as a table.

Text: "We offer three plans: Basic at $10/month with 10GB storage, Pro at $25/month with 100GB storage and priority support, and Enterprise at $50/month with unlimited storage, priority support, and dedicated account manager."

Create a table with columns: Plan | Price | Storage | Support | Account Manager
Output:
| Plan       | Price/Month | Storage    | Support          | Account Manager |
|------------|-------------|------------|------------------|-----------------|
| Basic      | $10         | 10GB       | Standard         | No              |
| Pro        | $25         | 100GB      | Priority         | No              |
| Enterprise | $50         | Unlimited  | Priority         | Yes             |
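
A pipe-delimited table like the one above can be converted into records with plain string handling. This is a minimal sketch that assumes a well-formed table with a single header row and no escaped pipe characters inside cells; `parse_markdown_table` is a hypothetical helper.

```python
def parse_markdown_table(table: str) -> list[dict[str, str]]:
    """Convert a pipe-delimited table into a list of {column: value} dicts."""
    rows = []
    for line in table.strip().splitlines():
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        # Skip the separator row, which contains only dashes and colons.
        if all(cell and set(cell).issubset(set("-:")) for cell in cells):
            continue
        rows.append(cells)
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


model_output = """
| Plan       | Price/Month | Storage    | Support          | Account Manager |
|------------|-------------|------------|------------------|-----------------|
| Basic      | $10         | 10GB       | Standard         | No              |
| Pro        | $25         | 100GB      | Priority         | No              |
| Enterprise | $50         | Unlimited  | Priority         | Yes             |
"""

plans = parse_markdown_table(model_output)
print(plans[2])
```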

Advanced Extraction Techniques

Conditional Extraction

Extract different information based on document type.
Identify the document type, then extract relevant information.

Document: "INVOICE #12345. Date: Jan 10, 2024. Bill To: Acme Corp. Items: Widget A ($100), Widget B ($150). Total: $250."

Step 1 - Document Type:
Step 2 - Extract based on type:
(If invoice: invoice number, date, customer, items, total)
(If receipt: transaction ID, date, merchant, items, total)
(If quote: quote number, date, customer, items, validity period)
Output:
Step 1 - Document Type: Invoice

Step 2 - Extract based on type:
- Invoice Number: 12345
- Date: January 10, 2024
- Customer: Acme Corp
- Items: Widget A ($100), Widget B ($150)
- Total: $250
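
The same conditional logic can also be split across two calls: classify first, then build a type-specific extraction prompt in code. The sketch below shows that variant, which is a different arrangement from the single prompt above; `FIELDS_BY_TYPE` and `build_step2_prompt` are illustrative names, and the field lists are copied from the prompt.

```python
FIELDS_BY_TYPE = {
    "invoice": ["invoice number", "date", "customer", "items", "total"],
    "receipt": ["transaction ID", "date", "merchant", "items", "total"],
    "quote": ["quote number", "date", "customer", "items", "validity period"],
}


def build_step2_prompt(doc_type: str, document: str) -> str:
    """Build the extraction prompt for a document whose type is already known."""
    fields = FIELDS_BY_TYPE.get(doc_type.strip().lower())
    if fields is None:
        raise ValueError(f"Unsupported document type: {doc_type!r}")
    field_lines = "\n".join(f"- {field}:" for field in fields)
    return (
        f"Extract the following fields from this {doc_type.strip().lower()}.\n\n"
        f'Document: "{document}"\n\n'
        f"Extract:\n{field_lines}"
    )


step2 = build_step2_prompt("Invoice", "INVOICE #12345. Date: Jan 10, 2024. Total: $250.")
print(step2)
```

Raising on an unknown type keeps an unexpected classification from silently falling through to the wrong field list.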

Multi-Document Extraction

Extract and aggregate information from multiple sources.
Extract key information from these three customer feedback entries and summarize common themes.

Feedback 1: "Love the new dashboard! Much easier to navigate."
Feedback 2: "Dashboard is great, but mobile app needs work."
Feedback 3: "New dashboard is intuitive. Please add dark mode."

Extract:
1. Mentioned features:
2. Positive feedback:
3. Improvement requests:
4. Common themes:
Output:
1. Mentioned features: Dashboard, Mobile app, Dark mode
2. Positive feedback: Dashboard is easier to navigate, intuitive, great
3. Improvement requests: Mobile app improvements, Dark mode addition
4. Common themes: Dashboard redesign is well-received; users want mobile and dark mode enhancements
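
For larger batches, an alternative is to extract from each entry separately and aggregate in code. As a rough illustration, the per-entry records below are hand-written from the three feedback entries above (they are not model output), and `common_features` is a hypothetical helper that counts how many entries mention each feature.

```python
from collections import Counter

# Hand-written per-entry extractions, assumed to come from one call per feedback entry.
per_feedback = [
    {"features": ["dashboard"]},
    {"features": ["dashboard", "mobile app"]},
    {"features": ["dashboard", "dark mode"]},
]


def common_features(entries: list[dict], min_mentions: int = 2) -> list[str]:
    """Return features mentioned in at least `min_mentions` separate entries."""
    counts = Counter(
        feature for entry in entries for feature in set(entry["features"])
    )
    return [feature for feature, n in counts.most_common() if n >= min_mentions]


print(common_features(per_feedback))  # ['dashboard']
```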

Handling Extraction Challenges

Dealing with Missing Information

Extract the following fields. If information is not present, indicate "Not specified".

Text: "John Doe applied for the position. He has 5 years of experience."

Extract:
- Name:
- Position:
- Experience:
- Education:
- Salary Expectation:
Output:
- Name: John Doe
- Position: Not specified
- Experience: 5 years
- Education: Not specified
- Salary Expectation: Not specified
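
An agreed placeholder such as "Not specified" lets code distinguish a missing value from an extracted one. A minimal sketch, assuming the response follows the "- Field: value" layout shown above; `parse_fields` and `MISSING_SENTINEL` are illustrative names:

```python
from typing import Optional

MISSING_SENTINEL = "Not specified"


def parse_fields(output: str) -> dict[str, Optional[str]]:
    """Parse '- Field: value' lines, mapping the missing-value placeholder to None."""
    record: dict[str, Optional[str]] = {}
    for line in output.splitlines():
        line = line.strip()
        if not line.startswith("- "):
            continue
        name, _, value = line[2:].partition(":")
        value = value.strip()
        is_missing = value.lower() in {"", MISSING_SENTINEL.lower()}
        record[name.strip()] = None if is_missing else value
    return record


model_output = """\
- Name: John Doe
- Position: Not specified
- Experience: 5 years
- Education: Not specified
- Salary Expectation: Not specified"""

record = parse_fields(model_output)
print([name for name, value in record.items() if value is None])
```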

Handling Ambiguous Information

Extract information and note any ambiguities.

Text: "The meeting is next Tuesday at 3."

Extract:
- Date:
- Time:
- Ambiguities:
Output:
- Date: Next Tuesday (specific date depends on current date)
- Time: 3:00 (AM/PM not specified; likely PM if the meeting falls within business hours)
- Ambiguities: Exact date unclear without current date; AM/PM not specified for time
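
Once a reference date is known, a relative date can be resolved in code. The sketch below is one interpretation only: it assumes "next Tuesday" means the first Tuesday strictly after the reference date (some speakers mean the Tuesday of the following week), and the reference date used is arbitrary. `next_weekday` is a hypothetical helper built on the standard `datetime` module.

```python
from datetime import date, timedelta

TUESDAY = 1  # date.weekday() numbering: Monday is 0


def next_weekday(reference: date, weekday: int) -> date:
    """Return the first date strictly after `reference` that falls on `weekday`."""
    days_ahead = (weekday - reference.weekday() - 1) % 7 + 1
    return reference + timedelta(days=days_ahead)


# Arbitrary reference date for illustration: Wednesday, 10 January 2024.
print(next_weekday(date(2024, 1, 10), TUESDAY))  # 2024-01-16
```

Supplying the current date in the prompt, or resolving relative dates after extraction as shown here, removes the first ambiguity; the AM/PM question still needs a stated convention or a follow-up.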

Validation and Confidence

Extract information and provide confidence levels (High/Medium/Low).

Text: "Dr. Smith mentioned the patient might need surgery, possibly next month."

Extract:
- Doctor: [name] (Confidence: )
- Recommendation: [action] (Confidence: )
- Timeline: [when] (Confidence: )
Output:
- Doctor: Dr. Smith (Confidence: High - explicitly stated)
- Recommendation: Surgery (Confidence: Medium - "might need" indicates uncertainty)
- Timeline: Next month (Confidence: Low - "possibly" indicates speculation)
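
Confidence labels become useful when they drive a decision, for example routing uncertain fields to a human reviewer. This is a sketch under the assumption that the response keeps the "(Confidence: Level - reason)" layout; `fields_needing_review` is a hypothetical helper, and the labels are best treated as a triage signal rather than a measured probability.

```python
import re

LINE = re.compile(
    r"^- (?P<field>[^:]+): (?P<value>.+?) "
    r"\(Confidence: (?P<level>High|Medium|Low)\b[^)]*\)$"
)
RANK = {"Low": 0, "Medium": 1, "High": 2}


def fields_needing_review(output: str, minimum: str = "High") -> list[str]:
    """Return the names of fields whose confidence is below `minimum`."""
    flagged = []
    for line in output.splitlines():
        match = LINE.match(line.strip())
        if match and RANK[match["level"]] < RANK[minimum]:
            flagged.append(match["field"])
    return flagged


model_output = """\
- Doctor: Dr. Smith (Confidence: High - explicitly stated)
- Recommendation: Surgery (Confidence: Medium - "might need" indicates uncertainty)
- Timeline: Next month (Confidence: Low - "possibly" indicates speculation)"""

print(fields_needing_review(model_output))            # ['Recommendation', 'Timeline']
print(fields_needing_review(model_output, "Medium"))  # ['Timeline']
```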

Best Practices

Start Simple

Begin with basic extraction, then add complexity

Specify Format

Clearly define the output structure you need

Handle Missing Data

Explicitly instruct how to handle absent information

Validate Extractions

Request confidence levels for critical extractions

Practice Exercises

Exercise 1: Resume Parsing

Extract structured information from a resume.
Resume: "Jane Smith, jane@email.com, (555) 123-4567. Education: BS Computer Science, MIT, 2018. Experience: Software Engineer at Google (2018-2021), Senior Engineer at Meta (2021-present). Skills: Python, Java, React, AWS."

Extract and format as JSON with fields: name, contact, education, experience, skills.
Extract the following information from the resume and format as JSON:

{
  "name": "",
  "contact": {
    "email": "",
    "phone": ""
  },
  "education": [
    {
      "degree": "",
      "institution": "",
      "year": ""
    }
  ],
  "experience": [
    {
      "title": "",
      "company": "",
      "period": ""
    }
  ],
  "skills": []
}

Resume: [INSERT RESUME TEXT]

Exercise 2: Contract Analysis

Extract key terms from a service agreement.
Extract the following key terms from the service agreement:

Agreement: [INSERT TEXT]

Extract:
- Parties: [Party A] and [Party B]
- Service Description:
- Contract Duration:
- Payment Terms:
- Termination Conditions:
- Liability Limitations:

For each field, quote the relevant text and provide your interpretation.

Exercise 3: News Article Extraction

Extract structured data from a news article.
Extract information from the news article:

Article: [INSERT ARTICLE]

Extract:
- Main Event:
- Key People: (Name | Role)
- Organizations Mentioned:
- Locations:
- Dates/Timeline:
- Quotes: (Speaker | Quote)
- Impact/Significance:

Real-World Application: Customer Support Ticket Parser

Build a system to extract structured data from support tickets:
Parse the following customer support ticket and extract structured information.

Ticket: "Subject: Login Issues. From: john@company.com. Priority: High. I can't log into my account since yesterday. I've tried resetting my password twice but the reset emails aren't arriving. My account email is john@company.com and my user ID is JD12345. This is blocking my work. Please help ASAP!"

Extract:
{
  "ticket_info": {
    "subject": "",
    "from": "",
    "priority": "",
    "user_id": ""
  },
  "issue": {
    "category": "",
    "description": "",
    "started": "",
    "attempts_to_resolve": []
  },
  "urgency_indicators": [],
  "required_action": ""
}
Output:
{
  "ticket_info": {
    "subject": "Login Issues",
    "from": "john@company.com",
    "priority": "High",
    "user_id": "JD12345"
  },
  "issue": {
    "category": "Authentication/Login",
    "description": "Unable to log in; password reset emails not received",
    "started": "Yesterday",
    "attempts_to_resolve": [
      "Tried resetting password twice (reset emails never arrived)"
    ]
  },
  "urgency_indicators": [
    "High priority",
    "Blocking work",
    "ASAP request"
  ],
  "required_action": "Investigate email delivery and account access issues for user JD12345"
}
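
Before a parsed ticket is handed to a ticketing system, its structure can be checked against the template in the prompt. The following is a minimal sketch rather than a full parser: `validate_ticket`, `TOP_LEVEL`, and `NESTED` are illustrative names, only key presence is checked, and calling the model is left out entirely.

```python
import json

TOP_LEVEL = {"ticket_info", "issue", "urgency_indicators", "required_action"}
NESTED = {
    "ticket_info": {"subject", "from", "priority", "user_id"},
    "issue": {"category", "description", "started", "attempts_to_resolve"},
}


def validate_ticket(raw: str) -> dict:
    """Parse the model's JSON and report every missing key in one error."""
    ticket = json.loads(raw)
    if not isinstance(ticket, dict):
        raise ValueError("expected a JSON object")
    problems = [f"missing key: {key}" for key in sorted(TOP_LEVEL - set(ticket))]
    for section, keys in NESTED.items():
        present = ticket.get(section, {})
        if not isinstance(present, dict):
            problems.append(f"{section} must be an object")
            continue
        problems += [f"missing key: {section}.{key}" for key in sorted(keys - set(present))]
    if problems:
        raise ValueError("; ".join(problems))
    return ticket


incomplete = '{"ticket_info": {"subject": "Login Issues"}, "issue": {}}'
try:
    validate_ticket(incomplete)
except ValueError as error:
    print(error)
```

Collecting all problems before raising makes it easier to log what went wrong or to send a single corrective follow-up prompt.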

Key Takeaways

Use progressive extraction: simple → complex
Specify exact output format (JSON, table, list)
Handle missing information explicitly
Extract relationships, not just entities
Validate critical extractions with confidence levels

Next Steps

You’ve mastered extracting information from text. Now learn to generate new content with specific constraints and attributes.

Next: Lesson 2.3 - Text Generation Prompts

Create content with constraints and attributes