Building a Gang Network Intelligence Graph with Open Source Tools
One of the intelligence feeds we maintain at FLLC tracks gang network associations — not for targeting, but for situational awareness, threat landscape mapping, and supporting lawful investigations. This post describes the methodology using entirely open-source tools and publicly available information.
Important: Everything described here uses publicly available data. No private data is accessed. This is lawful OSINT.
Why Graph Analysis for Gang Networks
Gang networks are relationship graphs. The interesting intelligence isn't in individual records; it's in the connections between them. Row-oriented database queries make clusters, bridges, and hierarchical structures hard to see. Graph analysis surfaces them directly.
Consider a person who appears in three separate court cases as a co-defendant, in a social media post tagged at a known location, and in a property record for an address linked to another case. Those connections only become visible when you model the data as a graph.
Data Sources We Use
Court Records
- PACER (federal cases): public docket access (account required, per-page fees apply)
- State court portals: most states offer a public case lookup
- Booking records where publicly available
Public Social Media
- Archived posts from public accounts
- Geo-tagged content aggregated from public APIs
- Platform pages for entities with public presence
Property and Business Records
- Secretary of state business filings (most are public)
- Property tax records (county assessor portals)
- UCC filings
News and Journalism
- Court reporting from local papers
- Law enforcement press releases
- Academic crime research datasets
The Toolchain
# Core libraries
import networkx as nx # Graph construction and analysis
import pandas as pd # Data wrangling
from pyvis.network import Network # Interactive visualization
import spacy # Named entity extraction
Step 1: Entity Extraction
Raw text from court documents gets processed through spaCy's NER pipeline to extract:
- Person names (PERSON)
- Organizations (ORG)
- Locations (GPE, LOC)
- Dates (DATE)
Each extracted entity becomes a potential node in the graph.
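Before extracted entities become nodes, repeated mentions need to be grouped. The following is a minimal sketch of that grouping step, not the pipeline actually used for the feed. It assumes mentions arrive as (text, label, source) tuples, the shape you would get from spaCy's `ent.text` and `ent.label_` plus a document id. The `normalize` and `collect_candidates` helpers and all sample values are hypothetical placeholders, and the normalization is deliberately naive.

```python
import re

# Labels the post keeps from spaCy's NER output.
KEEP_LABELS = {"PERSON", "ORG", "GPE", "LOC", "DATE"}


def normalize(text):
    """Lowercase and collapse whitespace (hypothetical, deliberately naive)."""
    return re.sub(r"\s+", " ", text).strip().lower()


def collect_candidates(mentions):
    """Group (text, label, source) mentions into candidate nodes.

    Returns a dict keyed by (label, normalized_text). Each value holds the
    first surface form seen and the set of sources that mention it.
    """
    candidates = {}
    for text, label, source in mentions:
        if label not in KEEP_LABELS:
            continue  # ignore entity types the graph does not model
        key = (label, normalize(text))
        node = candidates.setdefault(key, {"name": text.strip(), "sources": set()})
        node["sources"].add(source)
    return candidates


# Placeholder mentions, shaped like (ent.text, ent.label_, document id).
mentions = [
    ("Person A", "PERSON", "doc-001"),
    ("person  a", "PERSON", "doc-002"),
    ("Example Holdings LLC", "ORG", "doc-001"),
    ("$5,000", "MONEY", "doc-001"),
]

candidates = collect_candidates(mentions)
for key, node in sorted(candidates.items()):
    print(key, node["name"], sorted(node["sources"]))
```

Matching on a normalized string is only a starting point. Two different people can share a name, so candidates grouped this way still need the verification described under Limitations and Ethics.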
Step 2: Relationship Construction
Edges are built from co-occurrence evidence:
- Co-defendants in the same case → strong edge
- Shared address or phone → medium edge
- Tagged together in social content → medium edge
- Mentioned in the same document → weak edge
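G = nx.Graph()
# Add nodes with attributes (one node per resolved entity)
G.add_node(entity_id, name=name, type=entity_type, sources=source_list)
# Add edges with evidence weight. Calling add_edge again for the same pair
# overwrites these attributes, so merge evidence for a pair before adding it.
G.add_edge(entity_a, entity_b, weight=confidence, source=evidence_ref)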
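To make the edge rules concrete, here is a small sketch that turns co-occurrence groups into weighted edges while keeping every piece of evidence for a pair. The numeric tier weights, the `add_evidence` helper, and the placeholder ids (`P1`, `case-001`, and so on) are assumptions for illustration; the post does not state the actual confidence values.

```python
from itertools import combinations

import networkx as nx

# Assumed numeric weights for the strong / medium / weak tiers.
TIER_WEIGHT = {
    "co_defendant": 1.0,    # strong
    "shared_address": 0.6,  # medium
    "co_tagged": 0.6,       # medium
    "same_document": 0.3,   # weak
}


def add_evidence(G, members, kind, ref):
    """Link every pair in `members`, accumulating evidence per pair."""
    for a, b in combinations(sorted(set(members)), 2):
        if not G.has_edge(a, b):
            G.add_edge(a, b, weight=0.0, evidence=[])
        edge = G[a][b]
        edge["evidence"].append((kind, ref))
        # Keep the strongest tier seen so far instead of overwriting it.
        edge["weight"] = max(edge["weight"], TIER_WEIGHT[kind])


G = nx.Graph()
add_evidence(G, ["P1", "P2", "P3"], "co_defendant", "case-001")
add_evidence(G, ["P3", "P4"], "same_document", "doc-017")
add_evidence(G, ["P3", "P4"], "shared_address", "record-042")

print(G["P3"]["P4"])
```

Taking the maximum tier is one possible design choice. Summing or otherwise combining weights would reward repeated evidence, at the cost of letting many weak mentions look like one strong link.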
Step 3: Cluster Analysis
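# Detect communities using the Louvain algorithm (reads the "weight" edge attribute by default)
from networkx.algorithms import community
communities = community.louvain_communities(G, seed=42)
# Find bridge nodes (high betweenness centrality). Computed unweighted here:
# networkx treats a weight as a path distance, so confidence scores should not be passed as-is.
betweenness = nx.betweenness_centrality(G)
bridges = sorted(betweenness.items(), key=lambda x: x[1], reverse=True)[:10]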
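High-betweenness nodes are analytically significant: they connect otherwise separate clusters and are often the most intelligence-valuable individuals in the network. Treat a high score as a lead to verify rather than a finding, because an imperfect name match that merges two people into one node also produces an apparent bridge.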
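A toy graph shows why betweenness picks out connectors. This sketch builds two tight triangles joined only through a single node, `X`, then runs the same two calls used above. The node names are placeholders and the graph is synthetic.

```python
import networkx as nx
from networkx.algorithms import community

G = nx.Graph()
# Two tightly connected groups of three.
G.add_edges_from([("A1", "A2"), ("A2", "A3"), ("A1", "A3")])
G.add_edges_from([("B1", "B2"), ("B2", "B3"), ("B1", "B3")])
# X is the only link between the two groups.
G.add_edges_from([("A1", "X"), ("X", "B1")])

communities = community.louvain_communities(G, seed=42)
betweenness = nx.betweenness_centrality(G)

# The node that sits on the most shortest paths between other nodes.
top_node, top_score = max(betweenness.items(), key=lambda kv: kv[1])

print("communities:", [sorted(c) for c in communities])
print("top bridge candidate:", top_node)
```

Every shortest path between the two triangles has to pass through `X`, so it scores highest even though it has only two connections. Degree alone would not flag it.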
What the Current Dataset Shows
Our live feed currently tracks 349 network associations drawn from publicly documented sources. The patterns we watch for:
- Network fragmentation after key arrests
- Recruitment patterns visible in social media engagement changes
- Geographic mobility shown by booking records in multiple jurisdictions
- Business front associations through corporate filing analysis
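As an illustration of the first pattern, one simple way to quantify fragmentation is to compare connected components before and after a node drops out of the graph. This is a sketch of that idea under assumed inputs, not a description of how the feed measures it. The `fragmentation` helper and the synthetic graph are hypothetical.

```python
import networkx as nx


def fragmentation(G, removed):
    """Compare connected components before and after dropping `removed` nodes."""
    H = G.copy()  # leave the original graph untouched
    H.remove_nodes_from(removed)
    sizes = sorted((len(c) for c in nx.connected_components(H)), reverse=True)
    return {
        "components_before": nx.number_connected_components(G),
        "components_after": len(sizes),
        "largest_component_after": sizes[0] if sizes else 0,
    }


# Synthetic graph: two triangles joined only through node X.
G = nx.Graph()
G.add_edges_from([("A1", "A2"), ("A2", "A3"), ("A1", "A3")])
G.add_edges_from([("B1", "B2"), ("B2", "B3"), ("B1", "B3")])
G.add_edges_from([("A1", "X"), ("X", "B1")])

result = fragmentation(G, ["X"])
print(result)
```

In practice the comparison would run on dated snapshots of the graph, since edges built from older records do not disappear the moment a node does.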
Limitations and Ethics
This methodology has hard limits:
- Public data only — no hacking, no social engineering, no scraping that violates ToS
- Verification required — name matching is imperfect; all high-confidence assertions require multiple independent sources
- No individual targeting — this work informs threat landscape understanding, not individual surveillance
- Legal review — any intelligence supporting actual law enforcement work goes through legal counsel
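The verification rule can be enforced in code as a gate on each assertion. The sketch below treats "independent" as "from distinct source types" and uses a threshold of two. Both are assumptions made for illustration, as are the `is_corroborated` helper and the placeholder references.

```python
def is_corroborated(evidence, min_source_types=2):
    """Return True when evidence spans at least `min_source_types` distinct source types.

    `evidence` is a list of (source_type, reference) pairs.
    """
    source_types = {source_type for source_type, _ref in evidence}
    return len(source_types) >= min_source_types


# Two court records count as one source type under this assumption.
evidence_a = [("court", "case-001"), ("court", "case-002")]
# A court record plus a property record counts as two.
evidence_b = [("court", "case-001"), ("property", "record-042")]

print(is_corroborated(evidence_a))
print(is_corroborated(evidence_b))
```

Distinct source types are a rough proxy for independence. A news article that only repeats a court filing adds a second type without adding a second observation, so a check like this supports human review but does not replace it.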