Same model. Same prompt.
Different tradecraft.
The model isn't the problem; the method is. Below are three real investigation prompts, each run two ways: once by a vanilla LLM working from instinct, and once by the same class of model with an OSINT Tradecraft skill loaded. The left column is what gets a case thrown out. The right column is what holds up.
Person-of-interest workup from an email
A search summary dressed as an investigation
Runs a couple of generic searches, lists two LinkedIn profiles that might be him, and pads the rest with a 'public records report' it never actually saw. No order of operations, no way to tell what's confirmed from what's assumed.
A pivot chain with provenance on every line
Applies seed-discovery-from-email: confirms the address is live without tipping the target, pivots to username candidates, enumerates platforms, and grades each lead. Stops cold at anything requiring a permissible purpose it doesn't have.
Domain attribution for a scam site
A WHOIS dump and a shrug
Pulls a WHOIS record hidden behind a privacy proxy, notes the registrar, and suggests you 'email the abuse contact.' Never pivots through the infrastructure that actually ties the operation together.
An infrastructure graph and a named hypothesis
Chains domain-and-whois-research → ssl-certificate-pivoting → dns-history-and-passive-dns → ip-and-asn-attribution. Surfaces a reused TLS cert and a sibling domain on the same origin, then maps it to a known refund-phishing kit.
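The reused-certificate pivot can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the skill's implementation: `group_by_cert`, the `(domain, fingerprint)` observation tuples, and any example values are hypothetical, and in a real case the observations would come from certificate transparency logs or scan data the investigator has already collected.

```python
from collections import defaultdict


def group_by_cert(observations):
    """Group domains by TLS certificate fingerprint.

    `observations` is an iterable of (domain, fingerprint) pairs
    (a hypothetical input shape). Only fingerprints seen on more than
    one domain are returned, since those are the pivot candidates.
    """
    by_cert = defaultdict(set)
    for domain, fingerprint in observations:
        # Normalise case so "AB12" and "ab12" count as the same certificate.
        by_cert[fingerprint.lower()].add(domain.lower())
    return {
        fp: sorted(domains)
        for fp, domains in by_cert.items()
        if len(domains) > 1
    }
```

A shared fingerprint is a lead, not proof of common ownership: shared hosting and CDN certificates can link unrelated sites, so each match still needs corroboration from the other pivots in the chain.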
Preserving web evidence for court
A screenshot and crossed fingers
Tells you to take a screenshot and maybe 'save the URL.' Produces an artifact a defense attorney can challenge in thirty seconds — no hash, no capture metadata, nothing tying the image to the live page at a moment in time.
A defensible, authenticated capture
Applies the web-evidence-preservation methodology: full-page and DOM capture, SHA-256 hashing, recorded capture time and method, and a logged chain of custody — plus an independent archive so the record exists in two places.
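As a rough sketch of what the hashing and capture-metadata steps involve, the Python below builds a record for bytes that have already been captured. The function names, field names, and the `examiner` parameter are assumptions made for illustration; the skill's actual record format is not shown in this page.

```python
import hashlib
from datetime import datetime, timezone


def build_capture_record(url, content, method, examiner):
    """Describe a captured artifact (hypothetical record layout).

    `content` is the raw bytes of the capture, for example a saved DOM
    snapshot or a full-page image.
    """
    captured_at = datetime.now(timezone.utc).isoformat()
    return {
        "url": url,
        "sha256": hashlib.sha256(content).hexdigest(),
        "size_bytes": len(content),
        "captured_at_utc": captured_at,
        "capture_method": method,
        # The custody log starts with the capture event itself.
        "custody_log": [
            {"actor": examiner, "action": "captured", "at_utc": captured_at}
        ],
    }


def verify_capture(record, content):
    """Return True if `content` still matches the hash recorded at capture."""
    return hashlib.sha256(content).hexdigest() == record["sha256"]
```

Re-running `verify_capture` each time the file changes hands gives a simple integrity check to log alongside the custody entries.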
The same three failures, every time.
Look across the left columns and the vanilla failures rhyme: the model invents sources, dead-ends where a professional would pivot, and ignores the rules that make work admissible. A skill doesn't make the model smarter; it makes it disciplined.
Every finding gets a source, a timestamp, and a confidence grade — or it doesn't get reported.
The skill knows the next move: cert transparency, passive DNS, handle enumeration. It doesn't stop at the first wall.
Jurisdictional limits and stop-points are baked in, so the work is defensible before anyone challenges it.
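The "source, timestamp, and confidence grade" rule can be expressed as a simple gate. This Python sketch is an assumption-laden illustration: the `Finding` fields and the three grade labels are hypothetical, chosen only to show the idea of refusing to report an ungrounded claim.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical confidence scale; the skill's real grading scheme may differ.
GRADES = {"confirmed", "probable", "possible"}


@dataclass
class Finding:
    claim: str
    source: Optional[str] = None
    observed_at_utc: Optional[str] = None
    confidence: Optional[str] = None


def is_reportable(finding):
    """A finding is reportable only with a source, a timestamp, and a known grade."""
    return (
        bool(finding.source)
        and bool(finding.observed_at_utc)
        and finding.confidence in GRADES
    )
```

Filtering a draft report through `is_reportable` drops anything that lacks provenance rather than letting it through as an unmarked assumption.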
Get the right column on your agent.
Start with four free skills. Run them against a real case, then upgrade to a bundle when your agent earns it.
