licence-normaliser
Comprehensive license normalisation with a three-level hierarchy.
licence-normaliser is a comprehensive license normalisation library that
maps any license representation (SPDX tokens, URLs, prose descriptions) to a
canonical three-level hierarchy.
Features
Three-level hierarchy - LicenseFamily → LicenseName → LicenseVersion.
Wide format support - SPDX tokens, URLs, prose descriptions.
Creative Commons support - Full CC family with versions and IGO variants.
Publisher-specific licenses - Springer, Nature, Elsevier, Wiley, ACS, and more.
File-driven data - Add aliases, URLs, and patterns by editing JSON files. No Python code changes required for new synonyms.
Pluggable parsers - Drop in a new parser class to ingest any external license registry. Parsers implement plugin interfaces (RegistryPlugin, URLPlugin, etc.).
Strict mode - Raise LicenseNotFoundError instead of silently returning "unknown".
Caching - LRU caching for performance.
CLI - Command-line interface with --strict and --explain support.
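To illustrate the file-driven idea from the list above, here is a minimal sketch of alias data being loaded from JSON and used for lookup. The JSON layout (a flat alias-to-key mapping) and the helper `load_aliases` are assumptions made for illustration; the real aliases.json schema may differ.

```python
import json

# Hypothetical schema: a flat mapping of alias -> canonical key.
# The real aliases.json layout may differ.
ALIASES_JSON = """
{
  "CC BY-NC-ND 4.0": "cc-by-nc-nd-4.0",
  "MIT License": "mit"
}
"""


def load_aliases(text: str) -> dict[str, str]:
    """Parse alias data and lower-case the keys for case-insensitive lookup."""
    raw = json.loads(text)
    return {alias.lower(): target for alias, target in raw.items()}


aliases = load_aliases(ALIASES_JSON)

# A new synonym would be one more JSON entry; the lookup code stays unchanged.
print(aliases.get("mit license"))
print(aliases.get("cc by-nc-nd 4.0"))
```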
Hierarchy
The library uses a three-level hierarchy:
LicenseFamily - broad bucket: "cc", "osi", "copyleft", "publisher-tdm", …
LicenseName - version-free: "cc-by", "cc-by-nc-nd", "mit", "wiley-tdm"
LicenseVersion - fully resolved: "cc-by-3.0", "cc-by-nc-nd-4.0"
Installation
With uv:
uv pip install licence-normaliser
Or with pip:
pip install licence-normaliser
Quick start
from licence_normaliser import normalise_license
v = normalise_license("CC BY-NC-ND 4.0")
str(v) # "cc-by-nc-nd-4.0" ← LicenseVersion
str(v.license) # "cc-by-nc-nd" ← LicenseName
str(v.license.family) # "cc" ← LicenseFamily
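The attribute chain above follows the three-level hierarchy. As a rough mental model (not the library's implementation), the levels can be pictured as nested value objects. The class names `FamilySketch`, `NameSketch`, and `VersionSketch` below are hypothetical stand-ins.

```python
from dataclasses import dataclass


# Hypothetical stand-ins, not the library's real classes.
@dataclass(frozen=True)
class FamilySketch:
    key: str  # broad bucket, e.g. "cc"

    def __str__(self) -> str:
        return self.key


@dataclass(frozen=True)
class NameSketch:
    key: str  # version-free, e.g. "cc-by-nc-nd"
    family: FamilySketch

    def __str__(self) -> str:
        return self.key


@dataclass(frozen=True)
class VersionSketch:
    key: str  # fully resolved, e.g. "cc-by-nc-nd-4.0"
    license: NameSketch

    def __str__(self) -> str:
        return self.key


cc = FamilySketch("cc")
by_nc_nd = NameSketch("cc-by-nc-nd", cc)
v = VersionSketch("cc-by-nc-nd-4.0", by_nc_nd)

# Walk from the most specific level up to the broadest one.
print(v, v.license, v.license.family)
```

Each level holds a reference to its parent, so the most specific result is enough to recover the other two.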
Strict mode
By default, unresolvable inputs return an "unknown" result. Pass
strict=True to raise LicenseNotFoundError instead:
from licence_normaliser import normalise_license
from licence_normaliser.exceptions import LicenseNotFoundError
# Silent fallback (default)
v = normalise_license("some-unknown-string")
v.family.key # "unknown"
# Strict: raises on unresolvable input
try:
    v = normalise_license("some-unknown-string", strict=True)
except LicenseNotFoundError as exc:
    print(exc.raw)      # original input
    print(exc.cleaned)  # cleaned form that failed lookup
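The difference between the two modes can be sketched with a toy resolver. Everything here is an assumption for illustration: `NotFoundSketch` stands in for LicenseNotFoundError, `KNOWN` is an invented lookup table, and the cleaning step (lower-casing and collapsing whitespace) is only a guess at what "cleaned" might mean.

```python
class NotFoundSketch(LookupError):
    """Hypothetical stand-in for LicenseNotFoundError."""

    def __init__(self, raw: str, cleaned: str) -> None:
        super().__init__(f"no license matches {raw!r} (cleaned: {cleaned!r})")
        self.raw = raw          # original input
        self.cleaned = cleaned  # form that failed lookup


KNOWN = {"mit": "mit", "apache-2.0": "apache-2.0"}  # toy lookup table


def resolve(raw: str, strict: bool = False) -> str:
    # Assumed cleaning step: lower-case and collapse whitespace.
    cleaned = " ".join(raw.lower().split())
    if cleaned in KNOWN:
        return KNOWN[cleaned]
    if strict:
        raise NotFoundSketch(raw, cleaned)
    return "unknown"  # silent fallback


print(resolve("  MIT "))
print(resolve("Some-Unknown-String"))
try:
    resolve("Some-Unknown-String", strict=True)
except NotFoundSketch as exc:
    print(exc.raw, "->", exc.cleaned)
```

Strict mode suits pipelines where an unrecognised license should stop processing rather than flow downstream as "unknown".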
Trace / Explain
Set ENABLE_LICENCE_NORMALISER_TRACE=1 or pass trace=True to get
resolution traces showing how the license was matched:
from licence_normaliser import normalise_license
# Via function
v = normalise_license("cc by-nc-nd 3.0 igo", trace=True)
print(v.explain())
# Via class
from licence_normaliser import LicenseNormaliser
ln = LicenseNormaliser(trace=True)
v = ln.normalise_license("MIT")
print(v.explain())
The output shows the resolution pipeline (alias → registry → url → prose → fallback) and which source file and line produced the match:
Input: 'cc by-nc-nd 3.0 igo' → 'cc by-nc-nd 3.0 igo'
[✓] alias: 'cc by-nc-nd 3.0 igo' → 'cc-by-nc-nd-3.0-igo' (line 139 in aliases.json)
Result:
version_key: 'cc-by-nc-nd-3.0-igo'
name_key: 'cc-by-nc-nd'
family_key: 'cc'
The trace can also be accessed via v._trace for programmatic use.
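A staged pipeline with tracing could look roughly like the sketch below. It follows the stage order named above (alias, registry, url, prose, fallback), but the stage functions, the lookup tables, and `resolve_with_trace` are invented for illustration and do not reflect the library's internals.

```python
from typing import Callable, Optional

# Toy lookup tables; all contents are invented for this sketch.
ALIASES = {"cc by 4.0": "cc-by-4.0"}
REGISTRY = {"mit": "mit"}
URLS = {"https://creativecommons.org/licenses/by/4.0/": "cc-by-4.0"}

Stage = Callable[[str], Optional[str]]

# Stages are tried in order; the first one that returns a key wins.
STAGES: list[tuple[str, Stage]] = [
    ("alias", ALIASES.get),
    ("registry", REGISTRY.get),
    ("url", URLS.get),
    ("prose", lambda s: "mit" if "mit license" in s else None),
]


def resolve_with_trace(raw: str) -> tuple[str, list[str]]:
    cleaned = raw.strip().lower()
    trace = [f"Input: {raw!r} -> {cleaned!r}"]
    for name, stage in STAGES:
        hit = stage(cleaned)
        if hit is not None:
            trace.append(f"[+] {name}: {cleaned!r} -> {hit!r}")
            return hit, trace
        trace.append(f"[ ] {name}: no match")
    trace.append("[+] fallback: 'unknown'")
    return "unknown", trace


key, trace = resolve_with_trace("Released under the MIT License")
print("\n".join(trace))
```

Recording one line per stage, including misses, makes it easy to see why an input fell through to a later stage.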
Batch normalisation
from licence_normaliser import normalise_licenses
results = normalise_licenses(["MIT", "Apache-2.0", "CC BY 4.0"])
for r in results:
    print(r.key)
# Strict batch - raises on first unresolvable
results = normalise_licenses(["MIT", "Apache-2.0"], strict=True)
Custom plugins
The LicenseNormaliser class lets you inject custom plugin classes for
specialised use cases:
from licence_normaliser import LicenseNormaliser
from licence_normaliser.parsers.alias import AliasParser
from licence_normaliser.parsers.spdx import SPDXParser
# Use only SPDX + Alias plugins (no CC, no publisher URLs)
ln = LicenseNormaliser(
    registry=[SPDXParser],
    alias=[AliasParser],
    family=[AliasParser],
    name=[AliasParser],
    cache=True,
    cache_maxsize=8192,
)
# MIT resolves via SPDX parser
assert str(ln.normalise_license("MIT")) == "mit"
# CC BY resolves via Alias
assert str(ln.normalise_license("CC BY-NC-ND 4.0")) == "cc-by-nc-nd-4.0"
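The general idea of passing plugin classes to a normaliser can be sketched as follows. This is a hypothetical shape only: the `RegistrySource` protocol, its `load` method, and `TinyNormaliser` are assumptions, and the real RegistryPlugin interface may look quite different.

```python
from typing import Protocol


class RegistrySource(Protocol):
    """Hypothetical plugin shape; the real RegistryPlugin may differ."""

    def load(self) -> dict[str, str]: ...


class SpdxLike:
    def load(self) -> dict[str, str]:
        return {"mit": "mit", "apache-2.0": "apache-2.0"}


class InHouse:
    def load(self) -> dict[str, str]:
        return {"acme-internal": "acme-internal-1.0"}


class TinyNormaliser:
    def __init__(self, registry: list[type[RegistrySource]]) -> None:
        self._table: dict[str, str] = {}
        for plugin_cls in registry:
            # Instantiate each plugin class; earlier plugins win on key clashes.
            for key, value in plugin_cls().load().items():
                self._table.setdefault(key, value)

    def normalise(self, raw: str) -> str:
        return self._table.get(raw.strip().lower(), "unknown")


tiny = TinyNormaliser(registry=[SpdxLike, InHouse])
print(tiny.normalise("MIT"), tiny.normalise("ACME-Internal"))
```

Accepting classes rather than instances lets the normaliser decide when to construct each plugin, which is one plausible reason for the class-based arguments shown above.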
Note
Explicit plugin passing is optional: LicenseNormaliser()
loads the default plugins automatically. Use the pattern above only if you
need custom plugins or want to reduce the number of plugins loaded.
For caching, LicenseNormaliser wraps the resolution method
with lru_cache.
Disable it by passing cache=False for debugging:
from licence_normaliser import LicenseNormaliser
ln = LicenseNormaliser(cache=False)
result = ln.normalise_license("MIT")
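How an optional LRU cache around a resolution method behaves can be sketched with the standard library alone. `CachedResolver` and its `calls` counter are hypothetical; only `functools.lru_cache` is a real API, and the parameter names simply echo `cache` and `cache_maxsize` from the example above.

```python
from functools import lru_cache


class CachedResolver:
    """Hypothetical sketch: optionally wrap a resolve method in lru_cache."""

    def __init__(self, cache: bool = True, cache_maxsize: int = 8192) -> None:
        self.calls = 0  # counts real (uncached) resolutions
        if cache:
            # Wrapping the bound method keeps the cache per instance.
            self.resolve = lru_cache(maxsize=cache_maxsize)(self._resolve)
        else:
            self.resolve = self._resolve

    def _resolve(self, raw: str) -> str:
        self.calls += 1
        return raw.strip().lower()


cached = CachedResolver()
cached.resolve("MIT")
cached.resolve("MIT")  # served from the cache, _resolve is not called again
print(cached.calls)

uncached = CachedResolver(cache=False)
uncached.resolve("MIT")
uncached.resolve("MIT")  # every call does the work again
print(uncached.calls)
```

With the cache disabled, every call runs the full resolution, which is convenient when stepping through the pipeline in a debugger.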
Update data (CLI)
licence-normaliser update-data --force
# Fetches fresh SPDX, OpenDefinition, OSI, CreativeCommons, and ScanCode JSONs
Integration tests (public API only)
All integration tests live in
src/licence_normaliser/tests/test_integration.py
and only import the public API.
CLI usage
Normalise a single license:
licence-normaliser normalise "MIT"
# Output: mit
licence-normaliser normalise --full "CC BY 4.0"
# Output:
# Key: cc-by-4.0
# URL: https://creativecommons.org/licenses/by/4.0/
# License: cc-by
# Family: cc
licence-normaliser normalise --strict "totally-unknown"
# Exits with code 1 and prints an error
Batch normalise:
licence-normaliser batch MIT "Apache-2.0" "CC BY 4.0"
licence-normaliser batch --strict MIT "Apache-2.0"
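The exit-code behaviour of a strict flag can be sketched with argparse. This toy `main` is not the library's CLI: the program name, the lookup table, and the error message are invented, and it only shows how a non-zero return code can signal an unresolvable input to the calling shell.

```python
import argparse

KNOWN = {"mit": "mit", "apache-2.0": "apache-2.0"}  # toy lookup table


def main(argv: list[str]) -> int:
    parser = argparse.ArgumentParser(prog="normalise-sketch")
    parser.add_argument("--strict", action="store_true")
    parser.add_argument("licenses", nargs="+")
    args = parser.parse_args(argv)

    for raw in args.licenses:
        key = KNOWN.get(raw.strip().lower())
        if key is None:
            if args.strict:
                print(f"error: cannot resolve {raw!r}")
                return 1  # non-zero exit code signals failure to the shell
            key = "unknown"
        print(key)
    return 0


print(main(["MIT", "Apache-2.0"]))
print(main(["--strict", "totally-unknown"]))
```

In a real entry point the return value would typically be passed to `sys.exit`.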
Exceptions
from licence_normaliser.exceptions import (
    LicenseNormaliserError,  # base class
    LicenseNotFoundError,    # raised by strict mode
)
Testing
All tests run inside Docker:
make test
To test a specific Python version:
make test-env ENV=py312
License
MIT