"The money's gone, the chain is anonymous, it can't be traced" — a common misconception. Blockchains are precisely public and traceable. This piece first builds a minimal viable on-chain tracer in Python: pull an address's USDT transfers, BFS along the flow, build the fund graph, and identify potential landing points; then upgrades from this toy script toward taint analysis, address clustering, and cross-chain handling.
The walkthrough uses Ethereum USDT (ERC20); TRC20 notes are at the end.
1. Goal and approach
Given a starting address (a theft address or suspicious recipient), we want to:
- pull its USDT outbound records;
- trace hop by hop along "outbound → next address" (breadth-first search);
- label addresses (exchange / contract / ordinary);
- build a directed graph and visualize the flow.
This pull → trace → label → visualize loop is a simplified version of the four-step method Delta & Capital uses in real forensics — same principles, differing only in data scale and model precision.
2. Prerequisites
pip install requests networkx matplotlib
- An Etherscan API key (free).
- Ethereum USDT contract address: 0xdAC17F958D2ee523a2206206994597C13D831ec7
3. Pulling an address's USDT transfers
Etherscan's tokentx endpoint returns an address's ERC20 transfer details:
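import requests

ETHERSCAN_API = "YOUR_API_KEY"  # replace with your own key
USDT = "0xdAC17F958D2ee523a2206206994597C13D831ec7"
USDT_DECIMALS = 6

def get_usdt_transfers(address, page=1, offset=100):
    """Fetch one page of an address's USDT transfers, sorted oldest first."""
    # Note: Etherscan also offers a V2 endpoint (https://api.etherscan.io/v2/api
    # plus a chainid parameter); check the current docs if this URL is rejected.
    url = "https://api.etherscan.io/api"
    params = {
        "module": "account",
        "action": "tokentx",
        "contractaddress": USDT,
        "address": address,
        "page": page,
        "offset": offset,
        "sort": "asc",
        "apikey": ETHERSCAN_API,
    }
    resp = requests.get(url, params=params, timeout=20).json()
    # Any status other than "1" (no transfers, but also errors such as rate
    # limiting) is treated as "no data" here; a real tool should tell them apart.
    if resp.get("status") != "1":
        return []
    return resp["result"]

def get_outgoing(address):
    """Keep only the transfers sent by this address (from == address)."""
    txs = get_usdt_transfers(address)
    out = []
    for t in txs:
        if t["from"].lower() == address.lower():
            out.append({
                "hash": t["hash"],
                "to": t["to"].lower(),
                # raw integer amount -> USDT (6 decimals)
                "value": int(t["value"]) / 10 ** USDT_DECIMALS,
                "timeStamp": int(t["timeStamp"]),
            })
    return out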
4. BFS hop-by-hop tracing and the fund graph
import networkx as nx
from collections import deque

def trace_funds(start_address, max_depth=2, min_value=100):
    """
    Trace the fund flow from a starting address with BFS.
    max_depth: how many hops to follow
    min_value: ignore transfers below this amount (USDT) to reduce noise
    """
    g = nx.DiGraph()
    visited = set()
    queue = deque([(start_address.lower(), 0)])
    while queue:
        addr, depth = queue.popleft()
        # skip addresses already expanded, and stop expanding at max_depth
        if addr in visited or depth >= max_depth:
            continue
        visited.add(addr)
        for tx in get_outgoing(addr):
            if tx["value"] < min_value:
                continue
            # accumulate the total transferred between the same pair of addresses
            if g.has_edge(addr, tx["to"]):
                g[addr][tx["to"]]["value"] += tx["value"]
            else:
                g.add_edge(addr, tx["to"], value=tx["value"])
            queue.append((tx["to"], depth + 1))
    return g

graph = trace_funds("0xYOUR_START_ADDRESS", max_depth=2, min_value=500)
print(f"Found {graph.number_of_nodes()} addresses and {graph.number_of_edges()} fund flows")
5. Labeling addresses: finding the landing points
Funds usually end up at exchanges or contracts. Use a known hot-wallet table plus contract detection to label:
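# Example: known exchange hot wallets (in practice, maintain a far more complete label library)
KNOWN_LABELS = {
    "0x28c6c06298d514db089934071355e5743bf21d60": "Binance hot wallet",
    "0x21a31ee1afc51d94c2efccaa2092ad1028285549": "Binance hot wallet",
    # ... extend from open label sources (e.g. Etherscan labels, community datasets)
}

def is_contract(address):
    """Use eth_getCode to check whether an address is a contract."""
    url = "https://api.etherscan.io/api"
    params = {
        "module": "proxy", "action": "eth_getCode",
        "address": address, "tag": "latest",
        "apikey": ETHERSCAN_API,
    }
    code = requests.get(url, params=params, timeout=20).json().get("result", "0x")
    # Contract bytecode is a hex string longer than "0x"; anything else
    # (empty code, or an error message returned in "result") is not a contract.
    return isinstance(code, str) and code.startswith("0x") and len(code) > 2

def label_address(address):
    address = address.lower()  # keys in KNOWN_LABELS are lowercase
    if address in KNOWN_LABELS:
        return KNOWN_LABELS[address]  # exchange: a reachable landing point
    if is_contract(address):
        return "contract"
    return "ordinary address"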
Once funds enter a KYC exchange address, you have a reachable landing point — usually the key to judicial coordination and freezing. KNOWN_LABELS here is a toy; in production Delta & Capital maintains a continuously updated label library with tens of millions of entities, each with source and confidence — that is what turns "an unknown address" into "a given exchange / a given scam ring" quickly.
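To make "finding the landing points" concrete, here is a minimal sketch that sums the inflows into labeled addresses. It works on plain (from, to, value) tuples so that it runs without any dependencies; the function name find_landing_points and the sample addresses are hypothetical and used only for illustration.

```python
from collections import defaultdict

def find_landing_points(edges, known_labels):
    """Sum inflows into labeled addresses.

    edges: iterable of (from_addr, to_addr, value) tuples
    known_labels: dict mapping lowercase address -> label
    Returns {address: (label, total_inflow)}.
    """
    totals = defaultdict(float)
    for _src, dst, value in edges:
        dst = dst.lower()
        if dst in known_labels:
            totals[dst] += value
    return {addr: (known_labels[addr], total) for addr, total in totals.items()}

# Made-up addresses and amounts, for illustration only
labels = {"0xexchange": "Exchange hot wallet"}
edges = [
    ("0xthief", "0xhop1", 1000.0),
    ("0xhop1", "0xExchange", 600.0),
    ("0xhop1", "0xhop2", 400.0),
    ("0xhop2", "0xexchange", 250.0),
]
landing = find_landing_points(edges, labels)
for addr, (label, total) in landing.items():
    print(f"{addr}  {label}  inflow={total:,.2f} USDT")
```

With the networkx graph built earlier, the edge tuples can be produced with ((u, v, d["value"]) for u, v, d in graph.edges(data=True)).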
6. Visualizing the flow
import matplotlib.pyplot as plt

def draw_graph(g):
    pos = nx.spring_layout(g, k=0.6, seed=42)
    # shortened address plus its label; label_address may call the API once per node
    labels = {n: f"{n[:6]}...{n[-4:]} {label_address(n)}" for n in g.nodes()}
    edge_labels = {(u, v): f"{d['value']:,.0f}" for u, v, d in g.edges(data=True)}
    plt.figure(figsize=(14, 10))
    nx.draw(g, pos, labels=labels, node_color="#cfe8ff",
            node_size=2200, font_size=8, arrows=True, arrowsize=18)
    nx.draw_networkx_edge_labels(g, pos, edge_labels=edge_labels, font_size=7)
    plt.title("USDT fund flow graph")
    plt.axis("off")
    plt.tight_layout()
    plt.savefig("fund_flow.png", dpi=150)
    print("Exported fund_flow.png")

draw_graph(graph)
For large graphs, export gexf and use Gephi for professional visualization:
nx.write_gexf(graph, "fund_flow.gexf")
7. TRC20 (TRON) notes
For TRON USDT, switch the data source to Tronscan / TronGrid:
def get_trc20_transfers(address, limit=50):
    url = "https://apilist.tronscanapi.com/api/token_trc20/transfers"
    params = {
        "relatedAddress": address,
        "limit": limit,
        "start": 0,
        "contract_address": "TR7NHqjeKQxGTCi8q8ZY4pL8otSzgjLj6t",  # TRC20 USDT
    }
    return requests.get(url, params=params, timeout=20).json().get("token_transfers", [])
Field names differ from Ethereum (note from_address / to_address / quant); the BFS, labeling, and visualization logic all carry over.
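The field mapping can be sketched as a small adapter that turns a Tronscan-style record into the shape the tracer already expects. This is a sketch under assumptions: it relies only on the three field names mentioned above (from_address, to_address, quant), treats quant as the raw integer amount, and the helper names and sample records are hypothetical.

```python
TRC20_USDT_DECIMALS = 6  # USDT on TRON also uses 6 decimals

def normalize_trc20(record):
    """Map one Tronscan-style transfer record to the tracer's field names."""
    return {
        "from": record["from_address"],
        "to": record["to_address"],
        # quant is assumed to be the raw integer amount (often sent as a string)
        "value": int(record["quant"]) / 10 ** TRC20_USDT_DECIMALS,
    }

def outgoing_trc20(records, address):
    """Keep only transfers sent by `address`.

    TRON base58 addresses are case-sensitive, so compare them exactly
    instead of lowercasing as in the Ethereum code.
    """
    return [normalize_trc20(r) for r in records if r["from_address"] == address]

# Made-up records, for illustration only
sample = [
    {"from_address": "TAlice", "to_address": "TBob", "quant": "1500000"},
    {"from_address": "TBob", "to_address": "TAlice", "quant": "2000000"},
]
out = outgoing_trc20(sample, "TAlice")
print(out)
```

The one behavioral difference worth carrying into the BFS is the address comparison: the .lower() normalization used for Ethereum would corrupt TRON addresses.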
8. Gotchas
- Rate limits: free APIs allow ~5 req/s — add time.sleep in loops or exponential backoff.
- Pagination: a single address can have thousands of transfers — page until exhausted.
- Dedup & cycle guards: keep a visited set in BFS to avoid looping between addresses forever.
- Noise filtering: filter dust with min_value. Mixers and bridges break the path; this is the real-world difficulty of tracing, and reconnecting the trail requires cross-chain capability.
- Compliance boundary: analysis supplies leads; actual freezing and clawback require lawyers and judicial authorities.
9. Advanced I: taint analysis
BFS only answers "where the money went". Real forensics must answer a harder question: how much of a downstream address's balance is actually the dirty money? That is taint analysis. Three common models:
- Poison: any touch of dirty funds taints the whole address. High false positives, but nothing missed.
- Haircut (pro-rata dilution): taint dilutes by inbound share — the most common, most balanced.
- FIFO / LIFO: match inbound and outbound in order, transaction by transaction — precise but complex.
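The difference between the models is easiest to see on a single address. The following sketch applies each model to one withdrawal from an address that received a clean deposit followed by a dirty one; the function names and amounts are made up for illustration, and real engines track many deposits and withdrawals over time.

```python
def poison(deposits, withdrawal):
    """Any dirty deposit taints the whole withdrawal."""
    return withdrawal if any(dirty for _, dirty in deposits) else 0.0

def haircut(deposits, withdrawal):
    """The withdrawal carries the same dirty share as the address's total inflow."""
    total = sum(amount for amount, _ in deposits)
    dirty = sum(amount for amount, is_dirty in deposits if is_dirty)
    return withdrawal * dirty / total if total else 0.0

def fifo(deposits, withdrawal):
    """The withdrawal consumes deposits in arrival order, oldest first."""
    remaining, tainted = withdrawal, 0.0
    for amount, is_dirty in deposits:
        if remaining <= 0:
            break
        used = min(amount, remaining)
        if is_dirty:
            tainted += used
        remaining -= used
    return tainted

def lifo(deposits, withdrawal):
    """Same as FIFO, but the newest deposits are consumed first."""
    return fifo(list(reversed(deposits)), withdrawal)

# (amount, is_dirty) in arrival order: 600 clean first, then 400 dirty
deposits = [(600.0, False), (400.0, True)]
withdrawal = 500.0

for model in (poison, haircut, fifo, lifo):
    print(f"{model.__name__:8s} tainted amount: {model(deposits, withdrawal):.1f}")
```

For the same 500 USDT withdrawal the models report 500, 200, 0, and 400 tainted USDT respectively, which is why the choice of model has to be stated alongside any conclusion.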
A minimal haircut implementation (built on the earlier graph):
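import networkx as nx

def haircut_taint(g, source):
    """
    Haircut-style propagation: each address passes its taint downstream in
    proportion to each recipient's share of its outbound value.
    Returns {address: share of the source's traced funds that reached it}.
    Note: this is a flow share, not the dirty fraction of an address's balance.
    The graph must be acyclic: break cycles first, or process transfers in
    block-time order.
    """
    taint = {n: 0.0 for n in g.nodes()}
    taint[source] = 1.0
    # topological_sort raises on cyclic graphs; real cases process by transaction time
    for n in nx.topological_sort(g):
        out_total = sum(d["value"] for _, _, d in g.out_edges(n, data=True))
        if out_total == 0:
            continue
        for _, v, d in g.out_edges(n, data=True):
            taint[v] += taint[n] * (d["value"] / out_total)
    return taint

taint = haircut_taint(graph, "0xYOUR_START_ADDRESS".lower())
dirty = sorted(taint.items(), key=lambda x: x[1], reverse=True)[:10]
for addr, ratio in dirty:
    print(f"{addr} tainted share {ratio:.2%} label {label_address(addr)}")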
With taint ratios you can distinguish "clean funds passing through" from "addresses genuinely carrying dirty money" — a key step in evidence reports. Delta & Capital's engine switches between Haircut / FIFO / Poison per case and attaches a verifiable computation trail to its conclusions.
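The topological-sort version above cannot handle cycles. One way around that, sketched below, is to replay individual transfers in timestamp order while tracking each address's balance and tainted amount. This is a simplified model under stated assumptions: every address except the source starts with a zero balance, funds arriving from outside the observed transfers count as clean, and the function name and sample data are hypothetical.

```python
from collections import defaultdict

def haircut_by_time(transfers, source, source_balance):
    """Replay transfers chronologically with pro-rata (haircut) dilution.

    transfers: iterable of (timestamp, from_addr, to_addr, value)
    Returns {address: (balance, tainted_amount)} after the replay.
    """
    balance = defaultdict(float)
    tainted = defaultdict(float)
    balance[source] = source_balance
    tainted[source] = source_balance  # the source's funds are fully dirty
    for _ts, src, dst, value in sorted(transfers, key=lambda t: t[0]):
        # dirty fraction of the sender's balance at the time of the transfer
        ratio = tainted[src] / balance[src] if balance[src] > 0 else 0.0
        # only the part covered by the tracked balance can carry taint
        moved_taint = min(value, balance[src]) * ratio
        balance[src] = max(balance[src] - value, 0.0)
        tainted[src] -= moved_taint
        balance[dst] += value
        tainted[dst] += moved_taint
    return {addr: (balance[addr], tainted[addr]) for addr in balance}

# Made-up transfers, including a cycle B -> D -> B
transfers = [
    (1, "C", "B", 1000.0),  # clean funds from an otherwise unobserved address
    (2, "A", "B", 1000.0),  # the dirty funds arrive
    (3, "B", "D", 500.0),
    (4, "D", "B", 200.0),   # part of it flows back
]
result = haircut_by_time(transfers, source="A", source_balance=1000.0)
for addr, (bal, dirty) in sorted(result.items()):
    print(f"{addr}: balance={bal:.0f} tainted={dirty:.0f}")
```

Because the replay works on balances rather than on an aggregated graph, the tainted total is conserved (here 850 at B plus 150 at D) and the returned ratio tainted/balance is the dirty fraction of each address's holdings.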
10. Advanced II: address clustering and entity recognition
A single address means little; the value is merging hundreds of addresses into one entity (an exchange, a scam ring, an OTC desk). Common heuristics:
- Common-input-ownership: on UTXO chains (BTC), multiple inputs of one transaction usually belong to one owner.
- Deposit-address clustering: exchanges assign each user a deposit address that periodically sweeps to the same hot wallet — reverse it to identify the exchange.
- Gas funding relations: on Ethereum, a new address's first ETH (gas) often comes from the same funder — a same-owner lead.
- Behavioral fingerprints: timing distributions, amount habits, contract-interaction patterns.
A same-owner lead example based on gas funding:
def first_funder(address):
    """Find the address that sent this address its first ETH, a common same-owner clustering lead."""
    url = "https://api.etherscan.io/api"
    params = {
        "module": "account", "action": "txlist",
        "address": address, "startblock": 0, "endblock": 99999999,
        "page": 1, "offset": 10, "sort": "asc",
        "apikey": ETHERSCAN_API,
    }
    txs = requests.get(url, params=params, timeout=20).json().get("result", [])
    if not isinstance(txs, list):  # on errors "result" can be a message string
        return None
    for t in txs:
        if t["to"].lower() == address.lower() and int(t["value"]) > 0:
            return t["from"].lower()  # source of the first incoming ETH
    return None
Stack these leads and scattered addresses collapse into entities, upgrading the fund graph from address-level to entity-level.
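Stacking leads can be sketched with a union-find (disjoint-set) structure: every lead is a pair of addresses believed to share an owner, and merging the pairs yields entity-level clusters. The class, function, and sample addresses below are hypothetical and only illustrate the mechanics.

```python
class UnionFind:
    """Minimal disjoint-set structure over hashable items."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a

def cluster(links):
    """Merge (addr_a, addr_b) same-owner leads into sorted clusters."""
    uf = UnionFind()
    for a, b in links:
        uf.union(a, b)
    groups = {}
    for addr in list(uf.parent):
        groups.setdefault(uf.find(addr), set()).add(addr)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])

# Made-up leads: (address, first funder) and (deposit address, hot wallet) pairs
links = [
    ("0xa", "0xfunder"),
    ("0xb", "0xfunder"),
    ("0xc", "0xhot"),
    ("0xd", "0xhot"),
    ("0xb", "0xe"),
]
clusters = cluster(links)
for members in clusters:
    print(members)
```

One caveat: a shared funder that is itself a service (for example an exchange hot wallet paying out withdrawals) would merge unrelated users, so labeled service addresses are usually excluded before clustering on funding links.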
11. Advanced III: handling bridges and mixers
Ordinary scripts go dark at bridges and mixers — exactly the dividing line of professional capability:
- Bridges: funds lock on chain A and mint on chain B. Tracing must collide A-side deposit events with B-side withdrawals by amount + time window + recipient to reconnect the break.
- Mixers (e.g. fixed-denomination): same denominations, heavy in/out — transfers alone cannot be paired. Combine timing analysis, gas habits, related addresses, and denomination combinations for probabilistic inference — likelihood, not certainty.
- OTC / underground banking off-ramps: the chain ends there; what follows depends on off-chain intelligence and judicial cooperation.
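The bridge "collision" described above can be sketched as a brute-force match on amount, time window, and recipient. Everything here is an illustrative assumption: the event dictionaries, the 30-minute window, the 1% fee tolerance, and the two-level score are hypothetical and not calibrated against any real bridge.

```python
def match_bridge_events(deposits, withdrawals, window=1800, tolerance=0.01):
    """Pair chain-A deposits with chain-B withdrawals.

    Each event is a dict with keys: id, ts (unix seconds), amount, recipient.
    Returns (deposit_id, withdrawal_id, score) candidates, best first.
    The score is a crude heuristic, not a probability.
    """
    matches = []
    for d in deposits:
        if d["amount"] <= 0:
            continue
        for w in withdrawals:
            delay = w["ts"] - d["ts"]
            if not 0 <= delay <= window:   # withdrawal must follow the deposit
                continue
            if w["amount"] > d["amount"]:  # cannot receive more than was sent
                continue
            fee_ratio = (d["amount"] - w["amount"]) / d["amount"]
            if fee_ratio > tolerance:      # difference too large to be a fee
                continue
            score = 1.0 if w["recipient"] == d["recipient"] else 0.5
            matches.append((d["id"], w["id"], score))
    return sorted(matches, key=lambda m: m[2], reverse=True)

# Made-up events, for illustration only
deposits = [{"id": "depA1", "ts": 1000, "amount": 5000.0, "recipient": "0xabc"}]
withdrawals = [
    {"id": "wB1", "ts": 1300, "amount": 4995.0, "recipient": "0xabc"},
    {"id": "wB2", "ts": 1200, "amount": 4990.0, "recipient": "0xdef"},
    {"id": "wB3", "ts": 9000, "amount": 4995.0, "recipient": "0xabc"},
]
candidates = match_bridge_events(deposits, withdrawals)
for dep_id, wd_id, score in candidates:
    print(dep_id, "->", wd_id, "score", score)
```

Keeping every candidate with a score, rather than picking a single winner, matches the idea of annotating path segments with confidence instead of issuing a verdict at the break-point.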
Professional teams don't quit at break-points — they annotate each path segment with confidence and cross-validate with multi-source data. Bridge and mixer handling is one of Delta & Capital's core capabilities: no premature verdicts, but confidence-weighted multi-path inference that reconnects as much of the trail as possible.
12. From toy script to production tracing: Delta & Capital's engineering practice
This article's script explains the principles, but real cases (thousands of addresses, multiple chains, nested mixing) demand engineering upgrades. Delta & Capital's forensic stack typically includes:
- Multi-source data foundation: self-hosted archive nodes + The Graph subgraphs + multi-chain explorer APIs, with millisecond historical-state queries and no single rate-limited dependency.
- Label library at tens-of-millions scale: continuously maintained entity labels for exchanges, mixers, scams, sanctions, OTC — each with source and confidence.
- Multi-model taint engine: switchable Haircut / FIFO / Poison, with cross-chain continuation and bridge collision.
- Entity graph & visualization: clustering addresses into entities and producing standardized evidence-chain reports usable for police reports, inquiries, and litigation.
- Compliance & judicial coordination: analysis only supplies leads — Delta & Capital's licensed legal experts also interface with exchange compliance and assist with police reports and judicial freezes, turning technical conclusions into an executable recovery path.
In one line: a script shows you where the money went; turning "seeing" into "recovering" takes data, models, label libraries, and judicial coordination combined.
13. Summary
In under a hundred lines of Python you can build a minimal on-chain fund tracer. It is no match for professional platforms, but it teaches what tracing really is: a public ledger + graph traversal + address labels. The hard parts were never "can it be examined" — they are cross-chain break-points, mixers, and whether a cooperative landing platform can be reached. That is why real recovery is rarely a one-person, one-script job.
Risk & compliance notice: the code here is for learning, research, and lawful self-protection/forensic scenarios only — never for violating others' privacy or any illegal purpose. If assets are stolen, report to police immediately and pursue rights lawfully.