
How to parse an email reply for an LLM to read.

Raw email bodies are a mess of quoted history, signatures, and disclaimers. Here's how to hand the model only the content it needs.

An incoming email reply is often mostly noise: quoted history from every prior message, a five-line signature, a corporate 'this email and any attachments are confidential' disclaimer, an unsubscribe footer. Handing all of that to an LLM is expensive, and it produces worse output because the model gets confused about which part it should respond to.

The job is to extract just the new content. There are three common approaches, listed from hackiest to cleanest: regex heuristics, the talon library, and a service that does it for you.

1. Regex heuristics

The fastest fix is to split on common quote markers. Gmail-style replies introduce the quoted history with an attribution line of the form 'On Mon, ... wrote:'. Outlook uses a 'From: ... Sent: ... To:' block. Cut everything from the first marker down and you're mostly done.

In practice this catches the straightforward replies and fails silently on the rest. The failure cases are ugly: top-posted HTML replies with no text marker, bottom-posted messages with new content interleaved between quoted passages, and replies in other languages, where the client localizes the 'On ... wrote:' line. Fine for prototyping, insufficient for production.

heuristic.py
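import re

# Lines where quoted history begins. Deliberately no re.DOTALL: with it,
# ".*" runs across lines, so any new line starting "On " would match.
QUOTE_MARKERS = [
    r"\nOn [^\n]*(?:\n[^\n]*)?wrote:",  # Gmail style; header may wrap once
    r"\nFrom:[^\n]*\nSent:",            # Outlook style
    r"\n-+ ?Original Message ?-+",
    r"\n_{5,}",  # separator lines
]

def extract_reply(body: str) -> str:
    # Normalize CRLF endings; the leading "\n" lets a marker on line 1 match.
    body = "\n" + body.replace("\r\n", "\n")
    for marker in QUOTE_MARKERS:
        match = re.search(marker, body)
        if match:
            body = body[:match.start()]  # drop the marker and everything after
    # Strip a trailing signature after the conventional "-- " delimiter.
    body = re.sub(r"\n-- ?\n.*$", "", body, flags=re.DOTALL)
    return body.strip()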
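Two of the failure cases above can be narrowed without a library. Here is a minimal sketch that assumes Gmail-style plain-text quoting, where quoted lines start with '>'. Instead of cutting at the first marker, it drops localized 'On ... wrote:' attribution lines and '>'-quoted lines and keeps everything else, so new text interleaved between quotes survives. The German, French, and Spanish patterns are illustrative approximations of those clients' attribution lines, not a complete list.

```python
import re

# Gmail-style attribution lines in a few languages. Illustrative, not
# exhaustive: clients vary the date format, and a header that wraps onto
# a second line will not match these single-line patterns.
ATTRIBUTION_PATTERNS = [
    re.compile(r"^On .+ wrote\s*:$"),     # English
    re.compile(r"^Am .+ schrieb .+:$"),   # German
    re.compile(r"^Le .+ a écrit\s*:$"),   # French
    re.compile(r"^El .+ escribió\s*:$"),  # Spanish
]


def strip_quoted_lines(body: str) -> str:
    """Drop attribution lines and '>'-quoted lines, keep everything else.

    Unlike cutting at the first marker, this keeps new text that sits
    between quoted passages in a bottom-posted (interleaved) reply.
    """
    kept = []
    for line in body.replace("\r\n", "\n").split("\n"):
        stripped = line.strip()
        if any(p.match(stripped) for p in ATTRIBUTION_PATTERNS):
            continue  # "On ... wrote:" style header
        if stripped.startswith(">"):
            continue  # quoted line
        kept.append(line)
    return "\n".join(kept).strip()


top_posted = (
    "Yes, Tuesday works.\n\n"
    "On Mon, Jan 1, 2024 at 9:00 AM Bob <bob@example.com> wrote:\n"
    "> Can you do Tuesday?\n"
)
inline = (
    "On Mon, Jan 1, 2024 at 9:00 AM Bob <bob@example.com> wrote:\n"
    "> Can you do Tuesday?\n"
    "Yes.\n"
    "> What time?\n"
    "3pm.\n"
)
print(strip_quoted_lines(top_posted))  # Yes, Tuesday works.
print(strip_quoted_lines(inline))      # "Yes." and "3pm." on two lines
```

The trade-off: this does nothing for Outlook-style replies, whose quoted history usually isn't '>'-prefixed, and it drops any new line that happens to start with '>'. Calling it before extract_reply, not after, lets the two complement each other, since extract_reply would otherwise cut an interleaved reply at its 'On ... wrote:' line.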

2. Use talon (Mailgun's library)

talon is an open-source library that handles far more edge cases: multilingual markers, HTML replies, and signature detection trained on a real dataset. It's Python-only and largely unmaintained, but still one of the strongest free options for serious parsing.

Install with pip install talon. Call quotations.extract_from to strip quoted history from the body, and call talon.init() once at startup before using the machine-learning signature extractor.

with_talon.py
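import talon
from talon import quotations, signature

# Load talon's ML signature classifiers once, at startup.
talon.init()

def clean(body: str, from_addr: str) -> str:
    # Strip quoted history from a plain-text body.
    reply = quotations.extract_from(body, "text/plain")
    # ML-based signature detection; returns (text, signature or None).
    text, _sig = signature.extract(reply, sender=from_addr)
    return text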

3. Let Loomal do it

If you're using Loomal as the agent's mailbox, every message comes with an extractedText field. Quoted history stripped, signatures removed, normalized for LLM consumption. No library, no heuristic — just use it.

This is the cleanest option because Loomal maintains the extraction pipeline centrally; when a new email client adds a weird marker, the fix happens once and every agent benefits.

loomal.py
import os
import requests

res = requests.get(
    "https://api.loomal.ai/v0/messages",
    headers={"Authorization": f"Bearer {os.environ['LOOMAL_API_KEY']}"},
    params={"labels": "unread", "limit": 10},
    timeout=5,
)
res.raise_for_status()  # surface HTTP errors (e.g. a bad key) instead of a KeyError

for msg in res.json()["messages"]:
    # msg["extractedText"] is clean, LLM-ready
    answer = llm.generate(msg["extractedText"])  # `llm`: your model client

Pick by scale

For a weekend project, regex is fine. For anything meaningful, talon is worth the install. For production agents that need the extraction to just work and not be a source of mystery bugs, use a provider that handles it — Loomal is one option, but the main point is that extraction is not where you should be spending engineering cycles.

FAQ

How does Loomal handle HTML emails?

extractedText is derived from the plain-text alternative when present; when only HTML is sent, Loomal converts to text with a Markdown-like flattening. Tables and lists come through reasonably; inline images don't.
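Loomal's converter itself isn't shown here. To illustrate what Markdown-like flattening involves, and how it also addresses the HTML failure case from the regex section, here is a minimal sketch using only Python's standard-library html.parser. It is not Loomal's implementation. It assumes quoted history lives inside blockquote elements, which is common but not universal; the attribution line may sit outside the blockquote, so running the plain-text quote markers over the result is still worthwhile.

```python
import re
from html.parser import HTMLParser


class _Flattener(HTMLParser):
    """Collects text from HTML, skipping <blockquote>, <script>, <style>."""

    BLOCK_TAGS = {"p", "div", "br", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}
    SKIP_TAGS = {"blockquote", "script", "style"}

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # decode &amp; and friends
        self.parts: list[str] = []
        self.skip_depth = 0  # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self.skip_depth += 1
        elif self.skip_depth == 0:
            if tag == "li":
                self.parts.append("\n- ")  # list item -> Markdown bullet
            elif tag in self.BLOCK_TAGS:
                self.parts.append("\n")    # block element -> line break

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0:
            # Collapse source whitespace; line breaks come only from tags.
            self.parts.append(re.sub(r"\s+", " ", data))


def html_to_text(html: str) -> str:
    parser = _Flattener()
    parser.feed(html)
    parser.close()
    lines = "".join(parser.parts).split("\n")
    # Normalize spacing within each line and drop empty lines.
    return "\n".join(" ".join(line.split()) for line in lines if line.strip())


html = (
    "<div>Sounds good.<br>See you then.</div>"
    "<ul><li>Tuesday</li><li>3pm</li></ul>"
    "<blockquote>Can you do <b>Tuesday</b>?</blockquote>"
)
print(html_to_text(html))
```

Lists become "- " bullets and block elements become line breaks, which is roughly what "Markdown-like" means here. Tables only get one line per row, and images are dropped entirely because they carry no text data.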

Can I get the raw body if I need it?

Yes — the message object includes both text and html fields in addition to extractedText. Use the raw fields when you need exact fidelity (e.g., for forwarding).

What about agent-to-agent messages with structured JSON payloads?

For JSON payloads embedded in the body, skip extractedText and parse the raw text field directly. Consider a structured header like X-Agent-Data to signal to the receiver that the body is machine-readable.
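Here is a minimal sketch of that routing decision, using the text and extractedText fields described above. This page doesn't say how headers are exposed on the message object, so the headers field below is a hypothetical stand-in. Treating X-Agent-Data as a presence flag is likewise an assumption.

```python
import json
from typing import Any


def body_for_agent(msg: dict) -> Any:
    """Parsed JSON for machine-readable messages, extractedText otherwise."""
    # Hypothetical: assumes headers arrive as a dict under msg["headers"].
    headers = msg.get("headers") or {}
    # Header names are case-insensitive, so compare in lower case.
    if any(name.lower() == "x-agent-data" for name in headers):
        # Machine-readable body: parse the raw text, not extractedText.
        return json.loads(msg["text"])
    return msg["extractedText"]


agent_msg = {
    "text": '{"task": "refund", "order": 42}',
    "extractedText": "",
    "headers": {"X-Agent-Data": "1"},
}
human_msg = {"text": "Hi\n> old thread", "extractedText": "Hi"}
print(body_for_agent(agent_msg))  # {'task': 'refund', 'order': 42}
print(body_for_agent(human_msg))  # Hi
```

If the header is present but the body doesn't parse, json.loads raises json.JSONDecodeError. Letting that surface is usually safer than quietly passing a half-parsed message to the model.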


Last updated: 2026-04-15