licence-normaliser
******************

Comprehensive license normalisation with a three-level hierarchy.

"licence-normaliser" is a comprehensive license normalisation library
that maps any license representation (SPDX tokens, URLs, prose
descriptions) to a canonical three-level hierarchy.


Features
========

* **Three-level hierarchy** - LicenseFamily → LicenseName →
  LicenseVersion.

* **Wide format support** - SPDX tokens, URLs, prose descriptions.

* **Creative Commons support** - Full CC family with versions and IGO
  variants.

* **Publisher-specific licenses** - Springer, Nature, Elsevier, Wiley,
  ACS, and more.

* **File-driven data** - Add aliases, URLs, and patterns by editing
  JSON files. No Python code changes are required for new synonyms.

* **Pluggable parsers** - Drop in a new parser class to ingest any
  external license registry. Parsers implement plugin interfaces
  ("RegistryPlugin", "URLPlugin", etc.).

* **Strict mode** - Raise "LicenseNotFoundError" instead of silently
  returning ""unknown"".

* **Caching** - LRU caching for performance.

* **CLI** - Command-line interface with "--strict" and "--explain"
  support.


Hierarchy
=========

The library uses a three-level hierarchy:

1. **LicenseFamily** - broad bucket: ""cc"", ""osi"", ""copyleft"",
   ""publisher-tdm"", ...

2. **LicenseName** - version-free: ""cc-by"", ""cc-by-nc-nd"",
   ""mit"", ""wiley-tdm""

3. **LicenseVersion** - fully resolved: ""cc-by-3.0"",
   ""cc-by-nc-nd-4.0""


Installation
============

With "uv":

   uv pip install licence-normaliser

Or with "pip":

   pip install licence-normaliser


Quick start
===========

   from licence_normaliser import normalise_license

   v = normalise_license("CC BY-NC-ND 4.0")
   str(v)                 # "cc-by-nc-nd-4.0"  ← LicenseVersion
   str(v.license)         # "cc-by-nc-nd"      ← LicenseName
   str(v.license.family)  # "cc"               ← LicenseFamily


Strict mode
===========

By default, unresolvable inputs return an ""unknown"" result. Pass
"strict=True" to raise "LicenseNotFoundError" instead:

   from licence_normaliser import normalise_license
   from licence_normaliser.exceptions import LicenseNotFoundError

   # Silent fallback (default)
   v = normalise_license("some-unknown-string")
   v.family.key  # "unknown"

   # Strict: raises on unresolvable input
   try:
       v = normalise_license("some-unknown-string", strict=True)
   except LicenseNotFoundError as exc:
       print(exc.raw)      # original input
       print(exc.cleaned)  # cleaned form that failed lookup


Trace / Explain
===============

Set "ENABLE_LICENCE_NORMALISER_TRACE=1" or pass "trace=True" to get a
resolution trace showing how the license was matched:

   from licence_normaliser import normalise_license

   # Via function
   v = normalise_license("cc by-nc-nd 3.0 igo", trace=True)
   print(v.explain())

   # Via class
   from licence_normaliser import LicenseNormaliser

   ln = LicenseNormaliser(trace=True)
   v = ln.normalise_license("MIT")
   print(v.explain())

The output shows the resolution pipeline (alias → registry → url →
prose → fallback) and which source file and line matched:

   Input: 'cc by-nc-nd 3.0 igo' → 'cc by-nc-nd 3.0 igo'
     [✓] alias: 'cc by-nc-nd 3.0 igo' → 'cc-by-nc-nd-3.0-igo' (line 139 in aliases.json)
   Result:
     version_key: 'cc-by-nc-nd-3.0-igo'
     name_key: 'cc-by-nc-nd'
     family_key: 'cc'

The trace can also be accessed via "v._trace" for programmatic use.

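
The pipeline order named above (alias → registry → url → prose →
fallback) can be pictured with a small, self-contained sketch. This is
not the library's implementation: the lookup tables, the "Resolution"
class and the "resolve" function are hypothetical names invented for
illustration, and the real package loads its data from JSON files. The
sketch only shows the general idea of trying each stage in order,
recording what happened, and stopping at the first hit.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical in-memory tables; the real library reads JSON data files.
ALIASES = {"cc by-nc-nd 3.0 igo": "cc-by-nc-nd-3.0-igo"}
REGISTRY = {"mit": "mit", "apache-2.0": "apache-2.0"}
URLS = {"https://creativecommons.org/licenses/by/4.0/": "cc-by-4.0"}
PROSE = [(re.compile(r"creative commons attribution 4\.0"), "cc-by-4.0")]


def _match_prose(cleaned: str) -> Optional[str]:
    """Return the key of the first prose pattern found in the input."""
    for pattern, key in PROSE:
        if pattern.search(cleaned):
            return key
    return None


# Stages are tried in this order; each maps a cleaned string to a key or None.
STAGES = [
    ("alias", ALIASES.get),
    ("registry", REGISTRY.get),
    ("url", URLS.get),
    ("prose", _match_prose),
]


@dataclass
class Resolution:
    raw: str
    cleaned: str
    key: str = "unknown"
    steps: list = field(default_factory=list)  # (stage name, matched?)

    def explain(self) -> str:
        """Render the recorded steps as a short human-readable trace."""
        lines = [f"Input: {self.raw!r} -> {self.cleaned!r}"]
        lines += [f"  [{'✓' if hit else ' '}] {stage}" for stage, hit in self.steps]
        lines.append(f"Result: {self.key!r}")
        return "\n".join(lines)


def resolve(raw: str) -> Resolution:
    """Try each stage in order and stop at the first one that matches."""
    cleaned = " ".join(raw.lower().split())  # lowercase, collapse whitespace
    res = Resolution(raw=raw, cleaned=cleaned)
    for stage, lookup in STAGES:
        key = lookup(cleaned)
        res.steps.append((stage, key is not None))
        if key is not None:
            res.key = key
            return res
    res.steps.append(("fallback", True))  # nothing matched: stay "unknown"
    return res


if __name__ == "__main__":
    print(resolve("MIT").explain())
```

Recording every attempted stage, and not only the one that matched,
is what makes a trace useful when an input unexpectedly falls through
to the fallback.
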
Batch normalisation
===================

   from licence_normaliser import normalise_licenses

   results = normalise_licenses(["MIT", "Apache-2.0", "CC BY 4.0"])
   for r in results:
       print(r.key)

   # Strict batch - raises on the first unresolvable input
   results = normalise_licenses(["MIT", "Apache-2.0"], strict=True)


Custom plugins
==============

The "LicenseNormaliser" class lets you inject custom plugin classes
for specialised use cases:

   from licence_normaliser import LicenseNormaliser
   from licence_normaliser.parsers.alias import AliasParser
   from licence_normaliser.parsers.spdx import SPDXParser

   # Use only SPDX + Alias plugins (no CC, no publisher URLs)
   ln = LicenseNormaliser(
       registry=[SPDXParser],
       alias=[AliasParser],
       family=[AliasParser],
       name=[AliasParser],
       cache=True,
       cache_maxsize=8192,
   )

   # MIT resolves via the SPDX parser
   assert str(ln.normalise_license("MIT")) == "mit"

   # CC BY resolves via the Alias parser
   assert str(ln.normalise_license("CC BY-NC-ND 4.0")) == "cc-by-nc-nd-4.0"

Note:

  Passing plugins explicitly is optional; "LicenseNormaliser()" loads
  the defaults automatically. Use the pattern above only if you need
  custom plugins or want to reduce the number of plugins loaded.

For caching, "LicenseNormaliser" wraps the resolution method with
"lru_cache". Disable it by passing "cache=False", for example when
debugging:

   from licence_normaliser import LicenseNormaliser

   ln = LicenseNormaliser(cache=False)
   result = ln.normalise_license("MIT")


Update data (CLI)
=================

   licence-normaliser update-data --force
   # Fetches fresh SPDX, OpenDefinition, OSI, CreativeCommons, and ScanCode JSONs


Integration tests (public API only)
===================================

All integration tests live in
"src/licence_normaliser/tests/test_integration.py" and only import the
public API.

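
As an illustration of what a public-API-only test might look like,
here is a hedged sketch. The class and method names are hypothetical
and the contents of the actual test file are not reproduced here; the
expected values are copied from the Quick start and Strict mode
sections above. It is written with the standard-library "unittest"
module and is skipped when the package is not installed, so the
project's own tests may well be organised differently.

```python
import importlib.util
import unittest

# True only when the package is importable in the current environment.
HAS_LIB = importlib.util.find_spec("licence_normaliser") is not None


@unittest.skipUnless(HAS_LIB, "licence-normaliser is not installed")
class PublicApiSketch(unittest.TestCase):
    """Hypothetical tests that touch nothing but the documented public API."""

    def test_three_level_hierarchy(self):
        from licence_normaliser import normalise_license

        v = normalise_license("CC BY-NC-ND 4.0")
        self.assertEqual(str(v), "cc-by-nc-nd-4.0")    # LicenseVersion
        self.assertEqual(str(v.license), "cc-by-nc-nd")  # LicenseName
        self.assertEqual(str(v.license.family), "cc")    # LicenseFamily

    def test_strict_mode_raises(self):
        from licence_normaliser import normalise_license
        from licence_normaliser.exceptions import LicenseNotFoundError

        with self.assertRaises(LicenseNotFoundError):
            normalise_license("some-unknown-string", strict=True)


def run_sketch() -> unittest.TestResult:
    """Load and run the sketch tests, returning the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(PublicApiSketch)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    run_sketch()
```

Importing only from "licence_normaliser" and
"licence_normaliser.exceptions" keeps such tests independent of the
private modules, so internal refactoring should not break them.
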
CLI usage
=========

Normalise a single license:

   licence-normaliser normalise "MIT"
   # Output: mit

   licence-normaliser normalise --full "CC BY 4.0"
   # Output:
   # Key: cc-by-4.0
   # URL: https://creativecommons.org/licenses/by/4.0/
   # License: cc-by
   # Family: cc

   licence-normaliser normalise --strict "totally-unknown"
   # Exits with code 1 and prints an error

Batch normalise:

   licence-normaliser batch MIT "Apache-2.0" "CC BY 4.0"
   licence-normaliser batch --strict MIT "Apache-2.0"


Exceptions
==========

   from licence_normaliser.exceptions import (
       LicenseNormaliserError,  # base class
       LicenseNotFoundError,    # raised by strict mode
   )


Testing
=======

All tests run inside Docker:

   make test

To test a specific Python version:

   make test-env ENV=py312


License
=======

MIT


Author
======

Artur Barseghyan


Project documentation
=====================

Contents:

Table of Contents
^^^^^^^^^^^^^^^^^

* licence-normaliser

  * Features
  * Hierarchy
  * Installation
  * Quick start
  * Strict mode
  * Trace / Explain
  * Batch normalisation
  * Custom plugins
  * Update data (CLI)
  * Integration tests (public API only)
  * CLI usage
  * Exceptions
  * Testing
  * License
  * Author
  * Project documentation

* Contributor guidelines

  * Developer prerequisites
  * Code standards
  * Virtual environment
  * Installation
  * Testing
  * Adding new normalisation rules
  * Releases
  * Adding tests
  * Pull requests
  * Questions
  * Issues

* Security Policy

  * Reporting a Vulnerability
  * Supported Versions

* Release history and notes

  * 0.3.2
  * 0.3.1
  * 0.3
  * 0.2
  * 0.1.1
  * 0.1

* Package

  * Indices and tables

* Project source-tree

  * README.rst
  * CONTRIBUTING.rst
  * AGENTS.md
  * conftest.py
  * docker-compose.yml
  * pyproject.toml
  * scripts/README.rst
  * scripts/__init__.py
  * scripts/check_missing_aliases.py
  * scripts/compare_datasets.py
  * scripts/test_name_inference.py
  * src/licence_normaliser/__init__.py
  * src/licence_normaliser/_cache.py
  * src/licence_normaliser/_core.py
  * src/licence_normaliser/_models.py
  * src/licence_normaliser/_normaliser.py
  * src/licence_normaliser/_trace.py
  * src/licence_normaliser/cli/__init__.py
  * src/licence_normaliser/cli/_main.py
  * src/licence_normaliser/data/README.rst
  * src/licence_normaliser/data/aliases/aliases.json
  * src/licence_normaliser/data/prose/prose_patterns.json
  * src/licence_normaliser/data/publishers/publishers.json
  * src/licence_normaliser/data/urls/url_map.json
  * src/licence_normaliser/defaults.py
  * src/licence_normaliser/exceptions.py
  * src/licence_normaliser/parsers/__init__.py
  * src/licence_normaliser/parsers/alias.py
  * src/licence_normaliser/parsers/creativecommons.py
  * src/licence_normaliser/parsers/opendefinition.py
  * src/licence_normaliser/parsers/osi.py
  * src/licence_normaliser/parsers/prose.py
  * src/licence_normaliser/parsers/publisher.py
  * src/licence_normaliser/parsers/scancode_licensedb.py
  * src/licence_normaliser/parsers/spdx.py
  * src/licence_normaliser/plugins.py
  * src/licence_normaliser/tests/__init__.py
  * src/licence_normaliser/tests/conftest.py
  * src/licence_normaliser/tests/test_aliases.py
  * src/licence_normaliser/tests/test_cache.py
  * src/licence_normaliser/tests/test_cli.py
  * src/licence_normaliser/tests/test_core.py
  * src/licence_normaliser/tests/test_exceptions.py
  * src/licence_normaliser/tests/test_integration.py
  * src/licence_normaliser/tests/test_models.py
  * src/licence_normaliser/tests/test_prose.py
  * src/licence_normaliser/tests/test_publisher.py