mtg_python_deckbuilder/RELEASE_NOTES_TEMPLATE.md
mwisnowski 03e2846882
Some checks are pending
CI / build (push) Waiting to run
feat: implement theme stripping system with THEME_MIN_CARDS config (#55)
* feat: implement theme stripping system with THEME_MIN_CARDS config

* fix: call build_catalog directly to avoid argparse conflicts in CI
2026-03-19 15:27:17 -07:00

3 KiB

MTG Python Deckbuilder

[Unreleased]

Added

  • Theme Editorial Quality & Standards: Complete editorial system for theme catalog curation
    • Editorial Metadata Fields: description_source (tracks provenance: official/inferred/custom) and popularity_pinned (manual tier override)
    • Heuristics Externalization: Theme classification rules moved to config/themes/editorial_heuristics.yml for maintainability
    • Enhanced Quality Scoring: Four-tier system (Excellent/Good/Fair/Poor) with 0.0-1.0 numerical scores based on uniqueness, duplication, description quality, and metadata completeness
    • CLI Linter: validate_theme_catalog.py --lint flag with configurable thresholds for duplication and quality warnings, provides actionable improvement suggestions
    • Editorial Documentation: Comprehensive guide at docs/theme_editorial_guide.md covering quality scoring, best practices, linter usage, and workflow examples
  • Theme Stripping Configuration: Configurable minimum card threshold for theme retention
    • THEME_MIN_CARDS Setting: Environment variable (default: 5) to strip themes with too few cards from catalogs and card metadata
    • Analysis Tooling: analyze_theme_distribution.py script to visualize theme distribution and identify stripping candidates
    • Core Threshold Logic: theme_stripper.py module with functions to identify and filter low-card-count themes
    • Catalog Stripping: Automated removal of low-card themes from YAML catalog with backup/logging via strip_catalog_themes.py script

Changed

  • Build Process Modernization: Theme catalog generation now reads from parquet files instead of obsolete CSV format
    • Updated build_theme_catalog.py and extract_themes.py to use parquet data (matches rest of codebase)
    • Removed silent CSV exception handling (build now fails loudly if parquet read fails)
    • Added THEME_MIN_CARDS filtering directly in build pipeline (themes below threshold excluded during generation)
    • theme_list.json now auto-generated from stripped parquet data after theme stripping
    • Eliminated manual JSON stripping step (JSON is derived artifact, not source of truth)
  • Parquet Theme Stripping: Strip low-card themes directly from card data files
    • Added strip_parquet_themes.py script with dry-run, verbose, and backup modes
    • Added parquet manipulation functions to theme_stripper.py: backup, filter, update, and strip operations
    • Handles multiple themeTags formats: numpy arrays, lists, and comma/pipe-separated strings
    • Stripped 97 theme tag occurrences from 30,674 cards in all_cards.parquet
    • Updated stripped_themes.yml log with 520 themes stripped from parquet source
    • Automatic integration: Theme stripping now runs automatically in run_tagging() after tagging completes (when THEME_MIN_CARDS > 1, default: 5)
    • Integrated into web UI setup, CLI tagging, and CI/CD workflows (build-similarity-cache)

Fixed

No unreleased changes yet

Removed

No unreleased changes yet