mirror of
https://github.com/mwisnowski/mtg_python_deckbuilder.git
synced 2026-03-24 14:06:31 +01:00
# MTG Python Deckbuilder

## [Unreleased]

### Added

- **Theme Editorial Quality & Standards**: Complete editorial system for theme catalog curation
  - **Editorial Metadata Fields**: `description_source` (tracks provenance: official/inferred/custom) and `popularity_pinned` (manual tier override)
  - **Heuristics Externalization**: Theme classification rules moved to `config/themes/editorial_heuristics.yml` for maintainability
  - **Enhanced Quality Scoring**: Four-tier system (Excellent/Good/Fair/Poor) with numerical scores from 0.0 to 1.0 based on uniqueness, duplication, description quality, and metadata completeness
  - **CLI Linter**: `validate_theme_catalog.py --lint` flag with configurable thresholds for duplication and quality warnings; provides actionable improvement suggestions
  - **Editorial Documentation**: Comprehensive guide at `docs/theme_editorial_guide.md` covering quality scoring, best practices, linter usage, and workflow examples
- **Theme Stripping Configuration**: Configurable minimum card threshold for theme retention
  - **`THEME_MIN_CARDS` Setting**: Environment variable (default: 5) to strip themes with too few cards from catalogs and card metadata
  - **Analysis Tooling**: `analyze_theme_distribution.py` script to visualize theme distribution and identify stripping candidates
  - **Core Threshold Logic**: `theme_stripper.py` module with functions to identify and filter low-card-count themes
  - **Catalog Stripping**: Automated removal of low-card themes from YAML catalog with backup/logging via `strip_catalog_themes.py` script
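
The threshold behavior described above can be sketched roughly as follows. This is a minimal illustration, not the actual contents of `theme_stripper.py`: only the `THEME_MIN_CARDS` variable name and its default of 5 come from the notes above, while every function name and the in-memory card mapping are hypothetical.

```python
import os
from collections import Counter


def get_min_cards(default=5):
    """Read THEME_MIN_CARDS from the environment; fall back to the default if unset or invalid."""
    raw = os.environ.get("THEME_MIN_CARDS", "")
    try:
        return int(raw)
    except ValueError:
        return default


def count_theme_cards(cards):
    """Count how many cards carry each theme (hypothetical helper).

    `cards` maps a card name to its list of theme tags.
    """
    counts = Counter()
    for tags in cards.values():
        counts.update(set(tags))  # count each theme at most once per card
    return counts


def find_low_card_themes(counts, min_cards):
    """Return the themes that appear on fewer than `min_cards` cards."""
    return {theme for theme, n in counts.items() if n < min_cards}


def strip_themes(cards, to_strip):
    """Return a copy of `cards` with the given themes removed from every card."""
    return {
        name: [tag for tag in tags if tag not in to_strip]
        for name, tags in cards.items()
    }


# Illustrative data only; real card data lives in the parquet files.
sample_cards = {
    "Card A": ["Tokens", "Lifegain"],
    "Card B": ["Tokens"],
    "Card C": ["Tokens", "Rare Theme"],
}
low = find_low_card_themes(count_theme_cards(sample_cards), min_cards=2)
stripped = strip_themes(sample_cards, low)
```

With the sample data and a threshold of 2, only `Tokens` survives, since the other two themes each appear on a single card. In practice the threshold would come from `get_min_cards()` rather than a literal.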

### Changed

- **Build Process Modernization**: Theme catalog generation now reads from parquet files instead of the obsolete CSV format
  - Updated `build_theme_catalog.py` and `extract_themes.py` to use parquet data (matches the rest of the codebase)
  - Removed silent CSV exception handling (build now fails loudly if the parquet read fails)
  - Added `THEME_MIN_CARDS` filtering directly in the build pipeline (themes below the threshold are excluded during generation)
  - `theme_list.json` is now auto-generated from stripped parquet data after theme stripping
  - Eliminated the manual JSON stripping step (JSON is a derived artifact, not the source of truth)
- **Parquet Theme Stripping**: Strip low-card themes directly from card data files
  - Added `strip_parquet_themes.py` script with dry-run, verbose, and backup modes
  - Added parquet manipulation functions to `theme_stripper.py`: backup, filter, update, and strip operations
  - Handles multiple `themeTags` formats: numpy arrays, lists, and comma/pipe-separated strings
  - Stripped 97 theme tag occurrences from 30,674 cards in `all_cards.parquet`
  - Updated `stripped_themes.yml` log with 520 themes stripped from parquet source
  - **Automatic integration**: Theme stripping now runs automatically in `run_tagging()` after tagging completes (when `THEME_MIN_CARDS` > 1, default: 5)
  - Integrated into web UI setup, CLI tagging, and CI/CD workflows (build-similarity-cache)
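
As a rough sketch of what handling the different `themeTags` formats might look like, the snippet below normalizes a value to a plain list of strings before filtering. It is not taken from `theme_stripper.py`. The function names are hypothetical, array-like values are detected by their `tolist()` method so the example runs without numpy installed, and it assumes theme names never contain a comma or pipe.

```python
import re


def normalize_theme_tags(value):
    """Return theme tags as a list of non-empty, whitespace-trimmed strings.

    Accepts None, a delimited string, a list/tuple, or any array-like object
    exposing `tolist()` (numpy arrays do).
    """
    # Assumption: a missing value shows up as None or as a float NaN.
    if value is None or isinstance(value, float):
        return []
    if isinstance(value, str):
        parts = re.split(r"[,|]", value)  # comma- or pipe-separated
    elif hasattr(value, "tolist"):
        parts = value.tolist()
    else:
        parts = list(value)
    return [str(p).strip() for p in parts if str(p).strip()]


def strip_tags(value, to_strip):
    """Normalize `value`, then drop every tag listed in `to_strip`."""
    return [tag for tag in normalize_theme_tags(value) if tag not in to_strip]


# Illustrative call: a pipe-separated string with one low-card theme removed.
remaining = strip_tags("Tokens|Rare Theme", {"Rare Theme"})
```

Normalizing first keeps the filtering logic in one place, at the cost of always returning a list even when the input was a string. Code that writes back to parquet would need to decide which format to store.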

### Fixed

_No unreleased changes yet_

### Removed

_No unreleased changes yet_