Merge pull request #32 from mwisnowski/feature/tagging-refinement

Feature/tagging refinement
This commit is contained in:
mwisnowski 2025-10-13 10:13:29 -07:00 committed by GitHub
commit 4c79a7b45b
40 changed files with 5632 additions and 2789 deletions

View file

@ -92,6 +92,12 @@ WEB_AUTO_REFRESH_DAYS=7 # dockerhub: WEB_AUTO_REFRESH_DAYS="7"
WEB_TAG_PARALLEL=1 # dockerhub: WEB_TAG_PARALLEL="1"
WEB_TAG_WORKERS=2 # dockerhub: WEB_TAG_WORKERS="4"
WEB_AUTO_ENFORCE=0 # dockerhub: WEB_AUTO_ENFORCE="0"
# Tagging Refinement Feature Flags
TAG_NORMALIZE_KEYWORDS=1 # dockerhub: TAG_NORMALIZE_KEYWORDS="1" # Normalize keywords & filter specialty mechanics
TAG_PROTECTION_GRANTS=1 # dockerhub: TAG_PROTECTION_GRANTS="1" # Protection tag only for cards granting shields
TAG_METADATA_SPLIT=1 # dockerhub: TAG_METADATA_SPLIT="1" # Separate metadata tags from themes in CSVs
# DFC_COMPAT_SNAPSHOT=0 # 1=write legacy unmerged MDFC snapshots alongside merged catalogs (deprecated compatibility workflow)
# WEB_CUSTOM_EXPORT_BASE= # Custom basename for exports (optional).
# THEME_CATALOG_YAML_SCAN_INTERVAL_SEC=2.0 # Poll for YAML changes (dev)

View file

@ -9,16 +9,69 @@ This format follows Keep a Changelog principles and aims for Semantic Versioning
## [Unreleased]
### Summary
- _No unreleased changes yet_
- Card tagging system improvements split metadata from gameplay themes for cleaner deck building experience
- Keyword normalization reduces specialty keyword noise by 96% while maintaining theme catalog quality
- Protection tag now focuses on cards that grant shields to others, not just those with inherent protection
- Web UI improvements: faster polling, fixed progress display, and theme refresh stability
- **Protection System Overhaul**: Comprehensive enhancement to protection card detection, classification, and deck building
- Fine-grained scope metadata distinguishes self-protection from board-wide effects ("Your Permanents: Hexproof" vs "Self: Hexproof")
- Enhanced grant detection with Equipment/Aura patterns, phasing support, and complex trigger handling
- Intelligent deck builder filtering includes board-relevant protection while excluding self-only and type-specific cards
- Tiered pool limiting focuses on high-quality staples while maintaining variety across builds
- Improved scope tagging for cards with keyword-only protection effects (no grant text, just inherent keywords)
- **Tagging Module Refactoring**: Large-scale refactor to improve code quality and maintainability
- Centralized regex patterns, extracted reusable utilities, decomposed complex functions
- Improved code organization and readability while maintaining 100% tagging accuracy
### Added
- _None_
- Metadata partition system separates diagnostic tags from gameplay themes in card data
- Keyword normalization system with smart filtering of one-off specialty mechanics
- Allowlist preserves important keywords like Flying, Myriad, and Transform
- Protection grant detection identifies cards that give Hexproof, Ward, or Indestructible to other permanents
- Automatic tagging for creature-type-specific protection (e.g., "Knights Gain Protection")
- New `metadataTags` column in card data for bracket annotations and internal diagnostics
- Static phasing keyword detection from keywords field (catches creatures like Breezekeeper)
- "Other X you control have Y" protection pattern for static ability grants
- "Enchanted creature has phasing" pattern detection
- Chosen type blanket phasing patterns
- Complex trigger phasing patterns (reactive, consequent, end-of-turn)
- Protection scope filtering in deck builder (feature flag: `TAG_PROTECTION_SCOPE`) intelligently selects board-relevant protection
- Phasing cards with "Your Permanents:" or "Targeted:" metadata now tagged as Protection and included in protection pool
- Metadata tags temporarily visible in card hover previews for debugging (shows scope like "Your Permanents: Hexproof")
- Web-slinging tagger function to identify cards with web-slinging mechanics
### Changed
- _None_
- Card tags now split between themes (for deck building) and metadata (for diagnostics)
- Keywords now consolidate variants (e.g., "Commander ninjutsu" becomes "Ninjutsu")
- Setup progress polling reduced from 3s to 5-10s intervals for better performance
- Theme catalog streamlined from 753 to 736 themes (-2.3%) with improved quality
- Protection tag refined to focus on 329 cards that grant shields (down from 1,166 with inherent effects)
- Protection tag renamed to "Protective Effects" throughout web interface to avoid confusion with the Magic keyword "protection"
- Theme catalog automatically excludes metadata tags from theme suggestions
- Grant detection now strips reminder text before pattern matching to avoid false positives
- Deck builder protection phase now filters by scope metadata: includes "Your Permanents:", excludes "Self:" protection
- Protection card selection now randomized per build for variety (using seeded RNG when deterministic mode enabled)
- Protection pool now limited to ~40-50 high-quality cards (tiered selection: top 3x target + random 10-20 extras)
- Tagging module imports standardized with consistent organization and centralized constants
### Fixed
- _None_
- Setup progress now shows 100% completion instead of getting stuck at 99%
- Theme catalog no longer continuously regenerates after setup completes
- Health indicator polling optimized to reduce server load
- Protection detection now correctly excludes creatures with only inherent keywords
- Dive Down and Glint no longer falsely identified as granting to opponents (reminder text fix)
- Drogskol Captain and Haytham Kenway now correctly get "Your Permanents" scope tags
- 7 cards with static Phasing keyword now properly detected (Breezekeeper, Teferi's Drake, etc.)
- Type-specific protection grants (e.g., "Knights Gain Indestructible") now correctly excluded from general protection pool
- Protection scope filter now properly prioritizes exclusions over inclusions (fixes Knight Exemplar in non-Knight decks)
- Inherent protection cards (Aysen Highway, Phantom Colossus, etc.) now correctly get "Self: Protection" metadata tags
- Scope tagging now applies to ALL cards with protection effects, not just grant cards
- Cloak of Invisibility and Teferi's Curse now get "Your Permanents: Phasing" tags
- Shimmer now gets "Blanket: Phasing" tag for chosen type effect
- King of the Oathbreakers now gets "Self: Phasing" tag for reactive trigger
- Cards with static keywords (Protection, Hexproof, Ward, Indestructible) in their keywords field now get proper scope metadata tags
- Cards with X in their mana cost now properly identified and tagged with "X Spells" theme for better deck building accuracy
- Card tagging system enhanced with smarter pattern detection and more consistent categorization
## [2.5.2] - 2025-10-08
### Summary

View file

@ -99,15 +99,51 @@ Execute saved configs without manual input.
### Initial Setup
Refresh data and caches when formats shift.
- Runs card downloads, CSV regeneration, tagging, and commander catalog rebuilds.
- Runs card downloads, CSV regeneration, smart tagging (keywords + protection grants), and commander catalog rebuilds.
- Controlled by `SHOW_SETUP=1` (on by default in compose).
- Force a rebuild manually:
- **Force a full rebuild (setup + tagging)**:
```powershell
docker compose run --rm --entrypoint bash web -lc "python -m code.file_setup.setup"
# Docker:
docker compose run --rm web python -c "from code.file_setup.setup import initial_setup; from code.tagging.tagger import run_tagging; initial_setup(); run_tagging()"
# Local (with venv activated):
python -c "from code.file_setup.setup import initial_setup; from code.tagging.tagger import run_tagging; initial_setup(); run_tagging()"
# With parallel processing (faster):
python -c "from code.file_setup.setup import initial_setup; from code.tagging.tagger import run_tagging; initial_setup(); run_tagging(parallel=True)"
# With parallel processing and custom worker count:
python -c "from code.file_setup.setup import initial_setup; from code.tagging.tagger import run_tagging; initial_setup(); run_tagging(parallel=True, max_workers=4)"
```
- Rebuild only the commander catalog:
- **Rebuild only CSVs without tagging**:
```powershell
docker compose run --rm --entrypoint bash web -lc "python -m code.scripts.refresh_commander_catalog"
# Docker:
docker compose run --rm web python -c "from code.file_setup.setup import initial_setup; initial_setup()"
# Local:
python -c "from code.file_setup.setup import initial_setup; initial_setup()"
```
- **Run only tagging (CSVs must exist)**:
```powershell
# Docker:
docker compose run --rm web python -c "from code.tagging.tagger import run_tagging; run_tagging()"
# Local:
python -c "from code.tagging.tagger import run_tagging; run_tagging()"
# With parallel processing (faster):
python -c "from code.tagging.tagger import run_tagging; run_tagging(parallel=True)"
# With parallel processing and custom worker count:
python -c "from code.tagging.tagger import run_tagging; run_tagging(parallel=True, max_workers=4)"
```
- **Rebuild only the commander catalog**:
```powershell
# Docker:
docker compose run --rm web python -m code.scripts.refresh_commander_catalog
# Local:
python -m code.scripts.refresh_commander_catalog
```
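- **Toggle the new tagging flags for a one-off run** (a hedged example; any value other than `0`, `false`, `off`, or `disabled` enables a flag):
```powershell
# Docker: -e overrides the .env defaults for this run only
docker compose run --rm -e TAG_NORMALIZE_KEYWORDS=0 -e TAG_PROTECTION_GRANTS=0 web python -c "from code.tagging.tagger import run_tagging; run_tagging()"
# Local (PowerShell): set variables in the shell before invoking Python
$env:TAG_METADATA_SPLIT = "0"; python -c "from code.tagging.tagger import run_tagging; run_tagging()"
```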
### Owned Library

View file

@ -1,26 +1,67 @@
# MTG Python Deckbuilder ${VERSION}
## [Unreleased]
### Summary
- Builder responsiveness upgrades: smarter HTMX caching, shared debounce helpers, and virtualization hints keep long card lists responsive.
- Commander catalog now ships skeleton placeholders, lazy commander art loading, and cached default results for faster repeat visits.
- Deck summary streams via an HTMX fragment while virtualization powers summary lists without loading every row up front.
- Mana analytics load on demand with collapsible sections and interactive chart tooltips that support click-to-pin comparisons.
- Card tagging system improvements split metadata from gameplay themes for cleaner deck building experience
- Keyword normalization reduces specialty keyword noise by 96% while maintaining theme catalog quality
- Protection tag now focuses on cards that grant shields to others, not just those with inherent protection
- Web UI improvements: faster polling, fixed progress display, and theme refresh stability
- Comprehensive enhancement to protection card detection, classification, and deck building
- Fine-grained scope metadata distinguishes self-protection from board-wide effects ("Your Permanents: Hexproof" vs "Self: Hexproof")
- Enhanced grant detection with Equipment/Aura patterns, phasing support, and complex trigger handling
- Intelligent deck builder filtering includes board-relevant protection while excluding self-only and type-specific cards
- Tiered pool limiting focuses on high-quality staples while maintaining variety across builds
- Improved scope tagging for cards with keyword-only protection effects (no grant text, just inherent keywords)
- Large-scale refactor to improve code quality and maintainability
- Centralized regex patterns, extracted reusable utilities, decomposed complex functions
- Improved code organization and readability while maintaining 100% tagging accuracy
### Added
- Skeleton placeholders accept `data-skeleton-label` microcopy and only surface after ~400ms across the build wizard, stage navigator, and alternatives panel.
- Must-have toggle API (`/build/must-haves/toggle`), telemetry ingestion route (`/telemetry/events`), and structured logging helpers capture include/exclude beacons.
- Commander catalog results wrap in a deferred skeleton list while commander art lazy-loads via a new `IntersectionObserver` helper in `code/web/static/app.js`.
- Collapsible accordions for Mana Overview and Test Hand sections defer heavy analytics until they are expanded.
- Click-to-pin chart tooltips keep comparisons anchored and add copy-friendly working buttons.
- Virtualized card lists automatically render only visible items once 12+ cards are present.
- Metadata partition system separates diagnostic tags from gameplay themes in card data
- Keyword normalization system with smart filtering of one-off specialty mechanics
- Allowlist preserves important keywords like Flying, Myriad, and Transform
- Protection grant detection identifies cards that give Hexproof, Ward, or Indestructible to other permanents
- Automatic tagging for creature-type-specific protection (e.g., "Knights Gain Protection")
- New `metadataTags` column in card data for bracket annotations and internal diagnostics
- Static phasing keyword detection from keywords field (catches creatures like Breezekeeper)
- "Other X you control have Y" protection pattern for static ability grants
- "Enchanted creature has phasing" pattern detection
- Chosen type blanket phasing patterns
- Complex trigger phasing patterns (reactive, consequent, end-of-turn)
- Protection scope filtering in deck builder (feature flag: `TAG_PROTECTION_SCOPE`) intelligently selects board-relevant protection
- Phasing cards with "Your Permanents:" or "Targeted:" metadata now tagged as Protection and included in protection pool
- Metadata tags temporarily visible in card hover previews for debugging (shows scope like "Your Permanents: Hexproof")
- Web-slinging tagger function to identify cards with web-slinging mechanics
### Changed
- Commander search and theme picker now share an intelligent debounce to prevent redundant requests while typing.
- Card grids adopt modern containment rules to minimize layout recalculations on large decks.
- Include/exclude buttons respond immediately with optimistic updates, reconciling gracefully if the server disagrees.
- Frequently accessed views, like the commander catalog default, now pull from an in-memory cache for sub-200ms reloads.
- Deck review loads in focused chunks, keeping the initial page lean while analytics stream progressively.
- Chart hover zones expand to full column width for easier interaction.
- Card tags now split between themes (for deck building) and metadata (for diagnostics)
- Keywords now consolidate variants (e.g., "Commander ninjutsu" becomes "Ninjutsu")
- Setup progress polling reduced from 3s to 5-10s intervals for better performance
- Theme catalog streamlined from 753 to 736 themes (-2.3%) with improved quality
- Protection tag refined to focus on 329 cards that grant shields (down from 1,166 with inherent effects)
- Protection tag renamed to "Protective Effects" throughout web interface to avoid confusion with the Magic keyword "protection"
- Theme catalog automatically excludes metadata tags from theme suggestions
- Grant detection now strips reminder text before pattern matching to avoid false positives
- Deck builder protection phase now filters by scope metadata: includes "Your Permanents:", excludes "Self:" protection
- Protection card selection now randomized per build for variety (using seeded RNG when deterministic mode enabled)
- Protection pool now limited to ~40-50 high-quality cards (tiered selection: top 3x target + random 10-20 extras)
- Tagging module imports standardized with consistent organization and centralized constants
### Fixed
- _None_
- Setup progress now shows 100% completion instead of getting stuck at 99%
- Theme catalog no longer continuously regenerates after setup completes
- Health indicator polling optimized to reduce server load
- Protection detection now correctly excludes creatures with only inherent keywords
- Dive Down and Glint no longer falsely identified as granting to opponents (reminder text fix)
- Drogskol Captain and Haytham Kenway now correctly get "Your Permanents" scope tags
- 7 cards with static Phasing keyword now properly detected (Breezekeeper, Teferi's Drake, etc.)
- Type-specific protection grants (e.g., "Knights Gain Indestructible") now correctly excluded from general protection pool
- Protection scope filter now properly prioritizes exclusions over inclusions (fixes Knight Exemplar in non-Knight decks)
- Inherent protection cards (Aysen Highway, Phantom Colossus, etc.) now correctly get "Self: Protection" metadata tags
- Scope tagging now applies to ALL cards with protection effects, not just grant cards
- Cloak of Invisibility and Teferi's Curse now get "Your Permanents: Phasing" tags
- Shimmer now gets "Blanket: Phasing" tag for chosen type effect
- King of the Oathbreakers now gets "Self: Phasing" tag for reactive trigger
- Cards with static keywords (Protection, Hexproof, Ward, Indestructible) in their keywords field now get proper scope metadata tags
- Cards with X in their mana cost now properly identified and tagged with "X Spells" theme for better deck building accuracy
- Card tagging system enhanced with smarter pattern detection and more consistent categorization

View file

@ -1,5 +0,0 @@
import urllib.request, json
raw = urllib.request.urlopen("http://localhost:8000/themes/metrics").read().decode()
js=json.loads(raw)
print('example_enforcement_active=', js.get('preview',{}).get('example_enforcement_active'))
print('example_enforce_threshold_pct=', js.get('preview',{}).get('example_enforce_threshold_pct'))

View file

@ -1 +0,0 @@
=\ 1\; & \c:/Users/Matt/mtg_python/mtg_python_deckbuilder/.venv/Scripts/python.exe\ code/scripts/build_theme_catalog.py --output config/themes/theme_list_tmp.json

View file

@ -1,3 +0,0 @@
from code.web.services import orchestrator
orchestrator._ensure_setup_ready(print, force=False)
print('DONE')

View file

@ -1759,6 +1759,7 @@ class DeckBuilder(
entry['Synergy'] = synergy
else:
# If no tags passed attempt enrichment from filtered pool first, then full snapshot
metadata_tags: list[str] = []
if not tags:
# Use filtered pool (_combined_cards_df) instead of unfiltered (_full_cards_df)
# This ensures exclude filtering is respected during card enrichment
@ -1774,6 +1775,13 @@ class DeckBuilder(
# tolerate comma separated
parts = [p.strip().strip("'\"") for p in raw_tags.split(',')]
tags = [p for p in parts if p]
# M5: Extract metadata tags for web UI display
raw_meta = row_match.iloc[0].get('metadataTags', [])
if isinstance(raw_meta, list):
metadata_tags = [str(t).strip() for t in raw_meta if str(t).strip()]
elif isinstance(raw_meta, str) and raw_meta.strip():
parts = [p.strip().strip("'\"") for p in raw_meta.split(',')]
metadata_tags = [p for p in parts if p]
except Exception:
pass
# Enrich missing type and mana_cost for accurate categorization
@ -1811,6 +1819,7 @@ class DeckBuilder(
'Mana Value': mana_value,
'Creature Types': creature_types,
'Tags': tags,
'MetadataTags': metadata_tags, # M5: Store metadata tags for web UI
'Commander': is_commander,
'Count': 1,
'Role': (role or ('commander' if is_commander else None)),
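The list-or-string handling for `metadataTags` above mirrors the existing `themeTags` parsing; here is a self-contained sketch of that normalization (the helper name is illustrative, not part of `DeckBuilder`):

```python
# Accepts either a real list or the comma-separated string form stored in the CSVs.
def normalize_metadata_tags(raw_meta) -> list[str]:
    if isinstance(raw_meta, list):
        return [str(t).strip() for t in raw_meta if str(t).strip()]
    if isinstance(raw_meta, str) and raw_meta.strip():
        parts = [p.strip().strip("'\"") for p in raw_meta.split(',')]
        return [p for p in parts if p]
    return []

print(normalize_metadata_tags("'Self: Hexproof', 'Your Permanents: Ward'"))
# -> ['Self: Hexproof', 'Your Permanents: Ward']
print(normalize_metadata_tags(['Blanket: Phasing']))
# -> ['Blanket: Phasing']
```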

View file

@ -438,7 +438,7 @@ DEFAULT_REMOVAL_COUNT: Final[int] = 10 # Default number of spot removal spells
DEFAULT_WIPES_COUNT: Final[int] = 2 # Default number of board wipes
DEFAULT_CARD_ADVANTAGE_COUNT: Final[int] = 10 # Default number of card advantage pieces
DEFAULT_PROTECTION_COUNT: Final[int] = 8 # Default number of protection spells
DEFAULT_PROTECTION_COUNT: Final[int] = 8 # Default number of protective effects (hexproof, indestructible, protection, ward, etc.)
# Deck composition prompts
DECK_COMPOSITION_PROMPTS: Final[Dict[str, str]] = {
@ -450,7 +450,7 @@ DECK_COMPOSITION_PROMPTS: Final[Dict[str, str]] = {
'removal': 'Enter desired number of spot removal spells (default: 10):',
'wipes': 'Enter desired number of board wipes (default: 2):',
'card_advantage': 'Enter desired number of card advantage pieces (default: 10):',
'protection': 'Enter desired number of protection spells (default: 8):',
'protection': 'Enter desired number of protective effects (default: 8):',
'max_deck_price': 'Enter maximum total deck price in dollars (default: 400.0):',
'max_card_price': 'Enter maximum price per card in dollars (default: 20.0):'
}
@ -511,7 +511,7 @@ DEFAULT_THEME_TAGS = [
'Combat Matters', 'Control', 'Counters Matter', 'Energy',
'Enter the Battlefield', 'Equipment', 'Exile Matters', 'Infect',
'Interaction', 'Lands Matter', 'Leave the Battlefield', 'Legends Matter',
'Life Matters', 'Mill', 'Monarch', 'Protection', 'Ramp', 'Reanimate',
'Life Matters', 'Mill', 'Monarch', 'Protective Effects', 'Ramp', 'Reanimate',
'Removal', 'Sacrifice Matters', 'Spellslinger', 'Stax', 'Superfriends',
'Theft', 'Token Creation', 'Tokens Matter', 'Voltron', 'X Spells'
]

View file

@ -539,6 +539,10 @@ class SpellAdditionMixin:
"""Add protection spells to the deck.
Selects cards tagged as 'protection', prioritizing by EDHREC rank and mana value.
Avoids duplicates and commander card.
M5: When TAG_PROTECTION_SCOPE is enabled, filters to include only cards that
protect your board (Your Permanents:, {Type} Gain) and excludes self-only or
opponent protection cards.
"""
target = self.ideal_counts.get('protection', 0)
if target <= 0 or self._combined_cards_df is None:
@ -546,14 +550,88 @@ class SpellAdditionMixin:
already = {n.lower() for n in self.card_library.keys()}
df = self._combined_cards_df.copy()
df['_ltags'] = df.get('themeTags', []).apply(bu.normalize_tag_cell)
pool = df[df['_ltags'].apply(lambda tags: any('protection' in t for t in tags))]
# M5: Apply scope-based filtering if enabled
import settings as s
if getattr(s, 'TAG_PROTECTION_SCOPE', True):
# Check metadata tags for scope information
df['_meta_tags'] = df.get('metadataTags', []).apply(bu.normalize_tag_cell)
def is_board_relevant_protection(row):
"""Check if protection card helps protect your board.
Includes:
- Cards with "Your Permanents:" metadata (board-wide protection)
- Cards with "Blanket:" metadata (affects all permanents)
- Cards with "Targeted:" metadata (can target your stuff)
- Legacy cards without metadata tags
Excludes:
- "Self:" protection (only protects itself)
- "Opponent Permanents:" protection (helps opponents)
- Type-specific grants like "Knights Gain" (too narrow, handled by kindred synergies)
"""
theme_tags = row.get('_ltags', [])
meta_tags = row.get('_meta_tags', [])
# First check if it has general protection tag
has_protection = any('protection' in t for t in theme_tags)
if not has_protection:
return False
# INCLUDE: Board-relevant scopes
# "Your Permanents:", "Blanket:", "Targeted:"
has_board_scope = any(
'your permanents:' in t or 'blanket:' in t or 'targeted:' in t
for t in meta_tags
)
# EXCLUDE: Self-only, opponent protection, or type-specific grants
# Check for type-specific grants FIRST (highest priority exclusion)
has_type_specific = any(
' gain ' in t.lower() # "Knights Gain", "Treefolk Gain", etc.
for t in meta_tags
)
has_excluded_scope = any(
'self:' in t or
'opponent permanents:' in t
for t in meta_tags
)
# Include if board-relevant, or if no scope tags (legacy cards)
# ALWAYS exclude type-specific grants (too narrow for general protection)
if meta_tags:
# Has metadata - use it for filtering
# Exclude if type-specific OR self/opponent
if has_type_specific or has_excluded_scope:
return False
# Otherwise include if board-relevant
return has_board_scope
else:
# No metadata - legacy card, include by default
return True
pool = df[df.apply(is_board_relevant_protection, axis=1)]
# Log scope filtering stats
original_count = len(df[df['_ltags'].apply(lambda tags: any('protection' in t for t in tags))])
filtered_count = len(pool)
if original_count > filtered_count:
self.output_func(f"Protection scope filter: {filtered_count}/{original_count} cards (excluded {original_count - filtered_count} self-only/opponent cards)")
else:
# Legacy behavior: include all cards with 'protection' tag
pool = df[df['_ltags'].apply(lambda tags: any('protection' in t for t in tags))]
pool = pool[~pool['type'].fillna('').str.contains('Land', case=False, na=False)]
commander_name = getattr(self, 'commander', None)
if commander_name:
pool = pool[pool['name'] != commander_name]
pool = self._apply_bracket_pre_filters(pool)
pool = bu.sort_by_priority(pool, ['edhrecRank','manaValue'])
self._debug_dump_pool(pool, 'protection')
try:
if str(os.getenv('DEBUG_SPELL_POOLS', '')).strip().lower() in {"1","true","yes","on"}:
names = pool['name'].astype(str).head(30).tolist()
@ -580,6 +658,48 @@ class SpellAdditionMixin:
if existing >= target and to_add == 0:
return
target = to_add if existing < target else to_add
# M5: Limit pool size to manageable tier-based selection
# Strategy: Top tier (3x target) + random deeper selection
# This keeps the pool focused on high-quality options (~50-70 cards typical)
original_pool_size = len(pool)
if len(pool) > 0 and target > 0:
try:
# Tier 1: Top quality cards (3x target count)
tier1_size = min(3 * target, len(pool))
tier1 = pool.head(tier1_size).copy()
# Tier 2: Random additional cards from remaining pool (10-20 cards)
if len(pool) > tier1_size:
remaining_pool = pool.iloc[tier1_size:].copy()
tier2_size = min(
self.rng.randint(10, 20) if hasattr(self, 'rng') and self.rng else 15,
len(remaining_pool)
)
if hasattr(self, 'rng') and self.rng and len(remaining_pool) > tier2_size:
# Use random.sample() to select random indices from the remaining pool
tier2_indices = self.rng.sample(range(len(remaining_pool)), tier2_size)
tier2 = remaining_pool.iloc[tier2_indices]
else:
tier2 = remaining_pool.head(tier2_size)
pool = tier1._append(tier2, ignore_index=True)
else:
pool = tier1
if len(pool) != original_pool_size:
self.output_func(f"Protection pool limited: {len(pool)}/{original_pool_size} cards (tier1: {tier1_size}, tier2: {len(pool) - tier1_size})")
except Exception as e:
self.output_func(f"Warning: Pool limiting failed, using full pool: {e}")
# Shuffle pool for variety across builds (using seeded RNG for determinism)
try:
if hasattr(self, 'rng') and self.rng is not None:
pool_list = pool.to_dict('records')
self.rng.shuffle(pool_list)
import pandas as pd
pool = pd.DataFrame(pool_list)
except Exception:
pass
added = 0
added_names: List[str] = []
for _, r in pool.iterrows():
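The tiered pool limiting above is easier to follow outside the DataFrame plumbing. Here is a minimal, self-contained sketch of the same idea (top 3x-target cards plus 10-20 random extras, then a shuffle); the function name and plain-list pool are illustrative, not the builder's actual API.

```python
# Sketch of the tiered protection-pool limiting described above.
# Reproducible when a seeded RNG is supplied (deterministic mode).
import random

def limit_protection_pool(pool, target, rng=None):
    rng = rng or random.Random()
    tier1 = pool[: min(3 * target, len(pool))]          # top-quality tier
    remaining = pool[len(tier1):]
    extras = rng.sample(remaining, min(rng.randint(10, 20), len(remaining))) if remaining else []
    combined = tier1 + extras
    rng.shuffle(combined)                                # variety across builds
    return combined

# With a seeded RNG, the same pool and target always yield the same selection:
cards = [f"card_{i:03d}" for i in range(100)]
print(limit_protection_pool(cards, target=8, rng=random.Random(42))[:5])
```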

View file

@ -878,7 +878,7 @@ class ReportingMixin:
headers = [
"Name","Count","Type","ManaCost","ManaValue","Colors","Power","Toughness",
"Role","SubRole","AddedBy","TriggerTag","Synergy","Tags","Text","DFCNote","Owned"
"Role","SubRole","AddedBy","TriggerTag","Synergy","Tags","MetadataTags","Text","DFCNote","Owned"
]
header_suffix: List[str] = []
@ -946,6 +946,9 @@ class ReportingMixin:
role = info.get('Role', '') or ''
tags = info.get('Tags', []) or []
tags_join = '; '.join(tags)
# M5: Include metadata tags in export
metadata_tags = info.get('MetadataTags', []) or []
metadata_tags_join = '; '.join(metadata_tags)
text_field = ''
colors = ''
power = ''
@ -1014,6 +1017,7 @@ class ReportingMixin:
info.get('TriggerTag') or '',
info.get('Synergy') if info.get('Synergy') is not None else '',
tags_join,
metadata_tags_join, # M5: Include metadata tags
text_field[:800] if isinstance(text_field, str) else str(text_field)[:800],
dfc_note,
owned_flag

View file

@ -2,7 +2,23 @@
This module provides the main setup functionality for the MTG Python Deckbuilder
application. It handles initial setup tasks such as downloading card data,
creating color-filtered card lists, and generating commander-eligible card lists.
creating color-filtered card lists, and generating commander-eligible card lists.
logger.info(f'Downloading latest card data for {color} cards')
download_cards_csv(MTGJSON_API_URL, f'{CSV_DIRECTORY}/cards.csv')
logger.info('Loading and processing card data')
try:
df = pd.read_csv(f'{CSV_DIRECTORY}/cards.csv', low_memory=False)
except pd.errors.ParserError as e:
logger.warning(f'CSV parsing error encountered: {e}. Retrying with error handling...')
df = pd.read_csv(
f'{CSV_DIRECTORY}/cards.csv',
low_memory=False,
on_bad_lines='warn', # Warn about malformed rows but continue
encoding_errors='replace' # Replace bad encoding chars
)
logger.info('Successfully loaded card data with error handling (some rows may have been skipped)')
logger.info(f'Regenerating {color} cards CSV')
Key Features:
- Initial setup and configuration
@ -197,7 +213,17 @@ def regenerate_csvs_all() -> None:
download_cards_csv(MTGJSON_API_URL, f'{CSV_DIRECTORY}/cards.csv')
logger.info('Loading and processing card data')
df = pd.read_csv(f'{CSV_DIRECTORY}/cards.csv', low_memory=False)
try:
df = pd.read_csv(f'{CSV_DIRECTORY}/cards.csv', low_memory=False)
except pd.errors.ParserError as e:
logger.warning(f'CSV parsing error encountered: {e}. Retrying with error handling...')
df = pd.read_csv(
f'{CSV_DIRECTORY}/cards.csv',
low_memory=False,
on_bad_lines='warn', # Warn about malformed rows but continue
encoding_errors='replace' # Replace bad encoding chars
)
logger.info(f'Successfully loaded card data with error handling (some rows may have been skipped)')
logger.info('Regenerating color identity sorted files')
save_color_filtered_csvs(df, CSV_DIRECTORY)
@ -234,7 +260,12 @@ def regenerate_csv_by_color(color: str) -> None:
download_cards_csv(MTGJSON_API_URL, f'{CSV_DIRECTORY}/cards.csv')
logger.info('Loading and processing card data')
df = pd.read_csv(f'{CSV_DIRECTORY}/cards.csv', low_memory=False)
df = pd.read_csv(
f'{CSV_DIRECTORY}/cards.csv',
low_memory=False,
on_bad_lines='skip', # Skip malformed rows (MTGJSON CSV has escaping issues)
encoding_errors='replace' # Replace bad encoding chars
)
logger.info(f'Regenerating {color} cards CSV')
# Use shared utilities to base-filter once then slice color, honoring bans

View file

@ -0,0 +1,203 @@
"""
Full audit of Protection-tagged cards with kindred metadata support (M2 Phase 2).
Created: October 8, 2025
Purpose: Audit and validate Protection tag precision after implementing grant detection.
Can be re-run periodically to check tagging quality.
This script audits ALL Protection-tagged cards and categorizes them:
- Grant: Gives broad protection to other permanents YOU control
- Kindred: Gives protection to specific creature types (metadata tags)
- Mixed: Both broad and kindred/inherent
- Inherent: Only has protection itself
- ConditionalSelf: Only conditionally grants to itself
- Opponent: Grants to opponent's permanents
- Neither: False positive
Outputs:
- m2_audit_v2.json: Full analysis with summary
- m2_audit_v2_grant.csv: Cards for main Protection tag
- m2_audit_v2_kindred.csv: Cards for kindred metadata tags
- m2_audit_v2_mixed.csv: Cards with both broad and kindred grants
- m2_audit_v2_conditional.csv: Conditional self-grants (exclude)
- m2_audit_v2_inherent.csv: Inherent protection only (exclude)
- m2_audit_v2_opponent.csv: Opponent grants (exclude)
- m2_audit_v2_neither.csv: False positives (exclude)
- m2_audit_v2_all.csv: All cards combined
"""
import sys
from pathlib import Path
import pandas as pd
import json
# Add project root to path
project_root = Path(__file__).parent.parent.parent
sys.path.insert(0, str(project_root))
from code.tagging.protection_grant_detection import (
categorize_protection_card,
get_kindred_protection_tags,
is_granting_protection,
)
def load_all_cards():
"""Load all cards from color/identity CSV files."""
csv_dir = project_root / 'csv_files'
# Get all color/identity CSVs (not the raw cards.csv)
csv_files = list(csv_dir.glob('*_cards.csv'))
csv_files = [f for f in csv_files if f.stem not in ['cards', 'testdata']]
all_cards = []
for csv_file in csv_files:
try:
df = pd.read_csv(csv_file)
all_cards.append(df)
except Exception as e:
print(f"Warning: Could not load {csv_file.name}: {e}")
# Combine all DataFrames
combined = pd.concat(all_cards, ignore_index=True)
# Drop duplicates (cards appear in multiple color files)
combined = combined.drop_duplicates(subset=['name'], keep='first')
return combined
def audit_all_protection_cards():
"""Audit all Protection-tagged cards."""
print("Loading all cards...")
df = load_all_cards()
print(f"Total cards loaded: {len(df)}")
# Filter to Protection-tagged cards (column is 'themeTags' in color CSVs)
df_prot = df[df['themeTags'].str.contains('Protection', case=False, na=False)].copy()
print(f"Protection-tagged cards: {len(df_prot)}")
# Categorize each card
categories = []
grants_list = []
kindred_tags_list = []
for idx, row in df_prot.iterrows():
name = row['name']
text = str(row.get('text', '')).replace('\\n', '\n') # Convert escaped newlines to real newlines
keywords = str(row.get('keywords', ''))
card_type = str(row.get('type', ''))
# Categorize with kindred exclusion enabled
category = categorize_protection_card(name, text, keywords, card_type, exclude_kindred=True)
# Check if it grants broadly
grants_broad = is_granting_protection(text, keywords, exclude_kindred=True)
# Get kindred tags
kindred_tags = get_kindred_protection_tags(text)
categories.append(category)
grants_list.append(grants_broad)
kindred_tags_list.append(', '.join(sorted(kindred_tags)) if kindred_tags else '')
df_prot['category'] = categories
df_prot['grants_broad'] = grants_list
df_prot['kindred_tags'] = kindred_tags_list
# Generate summary (convert numpy types to native Python for JSON serialization)
summary = {
'total': int(len(df_prot)),
'categories': {k: int(v) for k, v in df_prot['category'].value_counts().to_dict().items()},
'grants_broad_count': int(df_prot['grants_broad'].sum()),
'kindred_cards_count': int((df_prot['kindred_tags'] != '').sum()),
}
# Calculate keep vs remove
keep_categories = {'Grant', 'Mixed'}
kindred_only = df_prot[df_prot['category'] == 'Kindred']
keep_count = len(df_prot[df_prot['category'].isin(keep_categories)])
remove_count = len(df_prot[~df_prot['category'].isin(keep_categories | {'Kindred'})])
summary['keep_main_tag'] = keep_count
summary['kindred_metadata'] = len(kindred_only)
summary['remove'] = remove_count
summary['precision_estimate'] = round((keep_count / len(df_prot)) * 100, 1) if len(df_prot) > 0 else 0
# Print summary
print(f"\n{'='*60}")
print("AUDIT SUMMARY")
print(f"{'='*60}")
print(f"Total Protection-tagged cards: {summary['total']}")
print(f"\nCategories:")
for cat, count in sorted(summary['categories'].items()):
pct = (count / summary['total']) * 100
print(f" {cat:20s} {count:4d} ({pct:5.1f}%)")
print(f"\n{'='*60}")
print(f"Main Protection tag: {keep_count:4d} ({keep_count/len(df_prot)*100:5.1f}%)")
print(f"Kindred metadata only: {len(kindred_only):4d} ({len(kindred_only)/len(df_prot)*100:5.1f}%)")
print(f"Remove: {remove_count:4d} ({remove_count/len(df_prot)*100:5.1f}%)")
print(f"{'='*60}")
print(f"Precision estimate: {summary['precision_estimate']}%")
print(f"{'='*60}\n")
# Export results
output_dir = project_root / 'logs' / 'roadmaps' / 'source' / 'tagging_refinement'
output_dir.mkdir(parents=True, exist_ok=True)
# Export JSON summary
with open(output_dir / 'm2_audit_v2.json', 'w') as f:
json.dump({
'summary': summary,
'cards': df_prot[['name', 'type', 'category', 'grants_broad', 'kindred_tags', 'keywords', 'text']].to_dict(orient='records')
}, f, indent=2)
# Export CSVs by category
export_cols = ['name', 'type', 'category', 'grants_broad', 'kindred_tags', 'keywords', 'text']
# Grant category
df_grant = df_prot[df_prot['category'] == 'Grant']
df_grant[export_cols].to_csv(output_dir / 'm2_audit_v2_grant.csv', index=False)
print(f"Exported {len(df_grant)} Grant cards to m2_audit_v2_grant.csv")
# Kindred category
df_kindred = df_prot[df_prot['category'] == 'Kindred']
df_kindred[export_cols].to_csv(output_dir / 'm2_audit_v2_kindred.csv', index=False)
print(f"Exported {len(df_kindred)} Kindred cards to m2_audit_v2_kindred.csv")
# Mixed category
df_mixed = df_prot[df_prot['category'] == 'Mixed']
df_mixed[export_cols].to_csv(output_dir / 'm2_audit_v2_mixed.csv', index=False)
print(f"Exported {len(df_mixed)} Mixed cards to m2_audit_v2_mixed.csv")
# ConditionalSelf category
df_conditional = df_prot[df_prot['category'] == 'ConditionalSelf']
df_conditional[export_cols].to_csv(output_dir / 'm2_audit_v2_conditional.csv', index=False)
print(f"Exported {len(df_conditional)} ConditionalSelf cards to m2_audit_v2_conditional.csv")
# Inherent category
df_inherent = df_prot[df_prot['category'] == 'Inherent']
df_inherent[export_cols].to_csv(output_dir / 'm2_audit_v2_inherent.csv', index=False)
print(f"Exported {len(df_inherent)} Inherent cards to m2_audit_v2_inherent.csv")
# Opponent category
df_opponent = df_prot[df_prot['category'] == 'Opponent']
df_opponent[export_cols].to_csv(output_dir / 'm2_audit_v2_opponent.csv', index=False)
print(f"Exported {len(df_opponent)} Opponent cards to m2_audit_v2_opponent.csv")
# Neither category
df_neither = df_prot[df_prot['category'] == 'Neither']
df_neither[export_cols].to_csv(output_dir / 'm2_audit_v2_neither.csv', index=False)
print(f"Exported {len(df_neither)} Neither cards to m2_audit_v2_neither.csv")
# All cards
df_prot[export_cols].to_csv(output_dir / 'm2_audit_v2_all.csv', index=False)
print(f"Exported {len(df_prot)} total cards to m2_audit_v2_all.csv")
print(f"\nAll files saved to: {output_dir}")
return df_prot, summary
if __name__ == '__main__':
df_results, summary = audit_all_protection_cards()

View file

@ -1,6 +1,7 @@
from __future__ import annotations
# Standard library imports
import os
from typing import Dict, List, Optional
# ----------------------------------------------------------------------------------
@ -98,4 +99,20 @@ CSV_DIRECTORY: str = 'csv_files'
FILL_NA_COLUMNS: Dict[str, Optional[str]] = {
'colorIdentity': 'Colorless', # Default color identity for cards without one
'faceName': None # Use card's name column value when face name is not available
}
}
# ----------------------------------------------------------------------------------
# TAGGING REFINEMENT FEATURE FLAGS (M1-M5)
# ----------------------------------------------------------------------------------
# M1: Enable keyword normalization and singleton pruning (completed)
TAG_NORMALIZE_KEYWORDS = os.getenv('TAG_NORMALIZE_KEYWORDS', '1').lower() not in ('0', 'false', 'off', 'disabled')
# M2: Enable protection grant detection (completed)
TAG_PROTECTION_GRANTS = os.getenv('TAG_PROTECTION_GRANTS', '1').lower() not in ('0', 'false', 'off', 'disabled')
# M3: Enable metadata/theme partition (completed)
TAG_METADATA_SPLIT = os.getenv('TAG_METADATA_SPLIT', '1').lower() not in ('0', 'false', 'off', 'disabled')
# M5: Enable protection scope filtering in deck builder (completed - Phase 1-3, in progress Phase 4+)
TAG_PROTECTION_SCOPE = os.getenv('TAG_PROTECTION_SCOPE', '1').lower() not in ('0', 'false', 'off', 'disabled')
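A quick sketch of the flag semantics above (the helper name is hypothetical; the settings module inlines this expression): any value other than `0`, `false`, `off`, or `disabled`, compared case-insensitively, counts as enabled. Because these constants are evaluated once at import time, the environment variables must be set before `settings` is first imported.

```python
# Hypothetical helper mirroring the expression used for each flag above.
import os

def _flag_enabled(name: str, default: str = '1') -> bool:
    return os.getenv(name, default).lower() not in ('0', 'false', 'off', 'disabled')

os.environ['TAG_METADATA_SPLIT'] = 'no'     # "no" is not in the disable list...
print(_flag_enabled('TAG_METADATA_SPLIT'))  # ...so this prints True
os.environ['TAG_METADATA_SPLIT'] = 'off'
print(_flag_enabled('TAG_METADATA_SPLIT'))  # False
```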

View file

@ -1,9 +1,11 @@
from __future__ import annotations
# Standard library imports
import json
from pathlib import Path
from typing import Dict, Iterable, Set
# Third-party imports
import pandas as pd
def _ensure_norm_series(df: pd.DataFrame, source_col: str, norm_col: str) -> pd.Series:

View file

@ -1,9 +1,11 @@
from __future__ import annotations
# Standard library imports
import json
from pathlib import Path
from typing import List, Optional
import json
# Third-party imports
from pydantic import BaseModel, Field

View file

@ -1,14 +1,17 @@
from __future__ import annotations
import json
# Standard library imports
import ast
import json
from collections import defaultdict
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, List, Set, DefaultDict
from collections import defaultdict
from typing import DefaultDict, Dict, List, Set
# Third-party imports
import pandas as pd
# Local application imports
from settings import CSV_DIRECTORY, SETUP_COLORS

View file

@ -73,6 +73,132 @@ def load_merge_summary() -> Dict[str, Any]:
return {"updated_at": None, "colors": {}}
def _merge_tag_columns(work_df: pd.DataFrame, group_sorted: pd.DataFrame, primary_idx: int) -> None:
"""Merge list columns (themeTags, roleTags) into union values.
Args:
work_df: Working DataFrame to update
group_sorted: Sorted group of faces for a multi-face card
primary_idx: Index of primary face to update
"""
for column in _LIST_UNION_COLUMNS:
if column in group_sorted.columns:
union_values = _merge_object_lists(group_sorted[column])
work_df.at[primary_idx, column] = union_values
if "keywords" in group_sorted.columns:
keyword_union = _merge_keywords(group_sorted["keywords"])
work_df.at[primary_idx, "keywords"] = _join_keywords(keyword_union)
def _build_face_payload(face_row: pd.Series) -> Dict[str, Any]:
"""Build face metadata payload from a single face row.
Args:
face_row: Single face row from grouped DataFrame
Returns:
Dictionary containing face metadata
"""
text_val = face_row.get("text") or face_row.get("oracleText") or ""
mana_cost_val = face_row.get("manaCost", face_row.get("mana_cost", "")) or ""
mana_value_raw = face_row.get("manaValue", face_row.get("mana_value", ""))
try:
if mana_value_raw in (None, ""):
mana_value_val = None
else:
mana_value_val = float(mana_value_raw)
if math.isnan(mana_value_val):
mana_value_val = None
except Exception:
mana_value_val = None
type_val = face_row.get("type", "") or ""
return {
"face": str(face_row.get("faceName") or face_row.get("name") or ""),
"side": str(face_row.get("side") or ""),
"layout": str(face_row.get("layout") or ""),
"themeTags": _merge_object_lists([face_row.get("themeTags", [])]),
"roleTags": _merge_object_lists([face_row.get("roleTags", [])]),
"type": str(type_val),
"text": str(text_val),
"mana_cost": str(mana_cost_val),
"mana_value": mana_value_val,
"produces_mana": _text_produces_mana(text_val),
"is_land": 'land' in str(type_val).lower(),
}
def _build_merge_detail(name: str, group_sorted: pd.DataFrame, faces_payload: List[Dict[str, Any]]) -> Dict[str, Any]:
"""Build detailed merge information for a multi-face card group.
Args:
name: Card name
group_sorted: Sorted group of faces
faces_payload: List of face metadata dictionaries
Returns:
Dictionary containing merge details
"""
layout_set = sorted({f.get("layout", "") for f in faces_payload if f.get("layout")})
removed_faces = faces_payload[1:] if len(faces_payload) > 1 else []
return {
"name": name,
"total_faces": len(group_sorted),
"dropped_faces": max(len(group_sorted) - 1, 0),
"layouts": layout_set,
"primary_face": faces_payload[0] if faces_payload else {},
"removed_faces": removed_faces,
"theme_tags": sorted({tag for face in faces_payload for tag in face.get("themeTags", [])}),
"role_tags": sorted({tag for face in faces_payload for tag in face.get("roleTags", [])}),
"faces": faces_payload,
}
def _log_merge_summary(color: str, merged_count: int, drop_count: int, multi_face_count: int, logger) -> None:
"""Log merge summary with structured and human-readable formats.
Args:
color: Color being processed
merged_count: Number of card groups merged
drop_count: Number of face rows dropped
multi_face_count: Total multi-face rows processed
logger: Logger instance
"""
try:
logger.info(
"dfc_merge_summary %s",
json.dumps(
{
"event": "dfc_merge_summary",
"color": color,
"groups_merged": merged_count,
"faces_dropped": drop_count,
"multi_face_rows": multi_face_count,
},
sort_keys=True,
),
)
except Exception:
logger.info(
"dfc_merge_summary event=%s groups=%d dropped=%d rows=%d",
color,
merged_count,
drop_count,
multi_face_count,
)
logger.info(
"Merged %d multi-face card groups for %s (dropped %d extra faces)",
merged_count,
color,
drop_count,
)
def merge_multi_face_rows(
df: pd.DataFrame,
color: str,
@ -93,7 +219,6 @@ def merge_multi_face_rows(
return df
work_df = df.copy()
layout_series = work_df["layout"].fillna("").astype(str).str.lower()
multi_mask = layout_series.isin(_MULTI_FACE_LAYOUTS)
@ -110,66 +235,15 @@ def merge_multi_face_rows(
group_sorted = _sort_faces(group)
primary_idx = group_sorted.index[0]
faces_payload: List[Dict[str, Any]] = []
for column in _LIST_UNION_COLUMNS:
if column in group_sorted.columns:
union_values = _merge_object_lists(group_sorted[column])
work_df.at[primary_idx, column] = union_values
_merge_tag_columns(work_df, group_sorted, primary_idx)
if "keywords" in group_sorted.columns:
keyword_union = _merge_keywords(group_sorted["keywords"])
work_df.at[primary_idx, "keywords"] = _join_keywords(keyword_union)
for _, face_row in group_sorted.iterrows():
text_val = face_row.get("text") or face_row.get("oracleText") or ""
mana_cost_val = face_row.get("manaCost", face_row.get("mana_cost", "")) or ""
mana_value_raw = face_row.get("manaValue", face_row.get("mana_value", ""))
try:
if mana_value_raw in (None, ""):
mana_value_val = None
else:
mana_value_val = float(mana_value_raw)
if math.isnan(mana_value_val):
mana_value_val = None
except Exception:
mana_value_val = None
type_val = face_row.get("type", "") or ""
faces_payload.append(
{
"face": str(face_row.get("faceName") or face_row.get("name") or ""),
"side": str(face_row.get("side") or ""),
"layout": str(face_row.get("layout") or ""),
"themeTags": _merge_object_lists([face_row.get("themeTags", [])]),
"roleTags": _merge_object_lists([face_row.get("roleTags", [])]),
"type": str(type_val),
"text": str(text_val),
"mana_cost": str(mana_cost_val),
"mana_value": mana_value_val,
"produces_mana": _text_produces_mana(text_val),
"is_land": 'land' in str(type_val).lower(),
}
)
for idx in group_sorted.index[1:]:
drop_indices.append(idx)
faces_payload = [_build_face_payload(row) for _, row in group_sorted.iterrows()]
drop_indices.extend(group_sorted.index[1:])
merged_count += 1
layout_set = sorted({f.get("layout", "") for f in faces_payload if f.get("layout")})
removed_faces = faces_payload[1:] if len(faces_payload) > 1 else []
merge_details.append(
{
"name": name,
"total_faces": len(group_sorted),
"dropped_faces": max(len(group_sorted) - 1, 0),
"layouts": layout_set,
"primary_face": faces_payload[0] if faces_payload else {},
"removed_faces": removed_faces,
"theme_tags": sorted({tag for face in faces_payload for tag in face.get("themeTags", [])}),
"role_tags": sorted({tag for face in faces_payload for tag in face.get("roleTags", [])}),
"faces": faces_payload,
}
)
merge_details.append(_build_merge_detail(name, group_sorted, faces_payload))
if drop_indices:
work_df = work_df.drop(index=drop_indices)
@ -192,38 +266,10 @@ def merge_multi_face_rows(
logger.warning("Failed to record DFC merge summary for %s: %s", color, exc)
if logger is not None:
try:
logger.info(
"dfc_merge_summary %s",
json.dumps(
{
"event": "dfc_merge_summary",
"color": color,
"groups_merged": merged_count,
"faces_dropped": len(drop_indices),
"multi_face_rows": int(multi_mask.sum()),
},
sort_keys=True,
),
)
except Exception:
logger.info(
"dfc_merge_summary event=%s groups=%d dropped=%d rows=%d",
color,
merged_count,
len(drop_indices),
int(multi_mask.sum()),
)
logger.info(
"Merged %d multi-face card groups for %s (dropped %d extra faces)",
merged_count,
color,
len(drop_indices),
)
_log_merge_summary(color, merged_count, len(drop_indices), int(multi_mask.sum()), logger)
_persist_merge_summary(color, summary_payload, logger)
# Reset index to keep downstream expectations consistent.
return work_df.reset_index(drop=True)

View file

@ -0,0 +1,213 @@
"""
Phasing Scope Detection Module
Detects the scope of phasing effects with multiple dimensions:
- Targeted: Phasing (any targeting effect)
- Self: Phasing (phases itself out)
- Your Permanents: Phasing (phases your permanents out)
- Opponent Permanents: Phasing (phases opponent permanents - removal)
- Blanket: Phasing (phases all permanents out)
Cards can have multiple scope tags (e.g., Targeted + Your Permanents).
Refactored in M2: Create Scope Detection Utilities to use generic scope detection.
"""
# Standard library imports
import re
from typing import Set
# Local application imports
from . import scope_detection_utils as scope_utils
from code.logging_util import get_logger
logger = get_logger(__name__)
# Phasing scope pattern definitions
def _get_phasing_scope_patterns() -> scope_utils.ScopePatterns:
"""
Build scope patterns for phasing abilities.
Returns:
ScopePatterns object with compiled patterns
"""
# Targeting patterns (special for phasing - detects "target...phases out")
targeting_patterns = [
re.compile(r'target\s+(?:\w+\s+)*(?:creature|permanent|artifact|enchantment|nonland\s+permanent)s?(?:[^.]*)?phases?\s+out', re.IGNORECASE),
re.compile(r'target\s+player\s+controls[^.]*phases?\s+out', re.IGNORECASE),
]
# Self-reference patterns
self_patterns = [
re.compile(r'this\s+(?:creature|permanent|artifact|enchantment)\s+phases?\s+out', re.IGNORECASE),
re.compile(r'~\s+phases?\s+out', re.IGNORECASE),
# Triggered self-phasing (King of the Oathbreakers)
re.compile(r'whenever.*(?:becomes\s+the\s+target|becomes\s+target).*(?:it|this\s+creature)\s+phases?\s+out', re.IGNORECASE),
# Consequent self-phasing (Cyclonus: "connive. Then...phase out")
re.compile(r'(?:then|,)\s+(?:it|this\s+creature)\s+phases?\s+out', re.IGNORECASE),
# At end of turn/combat self-phasing
re.compile(r'(?:at\s+(?:the\s+)?end\s+of|after).*(?:it|this\s+creature)\s+phases?\s+out', re.IGNORECASE),
]
# Opponent patterns
opponent_patterns = [
re.compile(r'target\s+(?:\w+\s+)*(?:creature|permanent)\s+an?\s+opponents?\s+controls?\s+phases?\s+out', re.IGNORECASE),
# Unqualified targets (can target opponents' stuff if no "you control" restriction)
re.compile(r'(?:up\s+to\s+)?(?:one\s+|x\s+|that\s+many\s+)?(?:other\s+)?(?:another\s+)?target\s+(?:\w+\s+)*(?:creature|permanent|artifact|enchantment|nonland\s+permanent)s?(?:[^.]*)?phases?\s+out', re.IGNORECASE),
re.compile(r'target\s+(?:\w+\s+)*(?:creature|permanent|artifact|enchantment|land|nonland\s+permanent)(?:,|\s+and)?\s+(?:then|and)?\s+it\s+phases?\s+out', re.IGNORECASE),
]
# Your permanents patterns
your_patterns = [
# Explicit "you control"
re.compile(r'(?:target\s+)?(?:creatures?|permanents?|nonland\s+permanents?)\s+you\s+control\s+phases?\s+out', re.IGNORECASE),
re.compile(r'(?:target\s+)?(?:other\s+)?(?:creatures?|permanents?)\s+you\s+control\s+phases?\s+out', re.IGNORECASE),
re.compile(r'permanents?\s+you\s+control\s+phase\s+out', re.IGNORECASE),
re.compile(r'(?:any|up\s+to)\s+(?:number\s+of\s+)?(?:target\s+)?(?:other\s+)?(?:creatures?|permanents?|nonland\s+permanents?)\s+you\s+control\s+phases?\s+out', re.IGNORECASE),
re.compile(r'all\s+(?:creatures?|permanents?)\s+you\s+control\s+phase\s+out', re.IGNORECASE),
re.compile(r'each\s+(?:creature|permanent)\s+you\s+control\s+phases?\s+out', re.IGNORECASE),
# Pronoun reference to "you control" context
re.compile(r'(?:creatures?|permanents?|planeswalkers?)\s+you\s+control[^.]*(?:those|the)\s+(?:creatures?|permanents?|planeswalkers?)\s+phase\s+out', re.IGNORECASE),
re.compile(r'creature\s+you\s+control[^.]*(?:it)\s+phases?\s+out', re.IGNORECASE),
re.compile(r'you\s+control.*those\s+(?:creatures?|permanents?|planeswalkers?)\s+phase\s+out', re.IGNORECASE),
# Equipment/Aura
re.compile(r'equipped\s+(?:creature|permanent)\s+(?:gets\s+[^.]*\s+and\s+)?phases?\s+out', re.IGNORECASE),
re.compile(r'enchanted\s+(?:creature|permanent)\s+(?:gets\s+[^.]*\s+and\s+)?phases?\s+out', re.IGNORECASE),
re.compile(r'enchanted\s+(?:creature|permanent)\s+(?:has|gains?)\s+phasing', re.IGNORECASE),
re.compile(r'(?:equipped|enchanted)\s+(?:creature|permanent)[^.]*,?\s+(?:then\s+)?that\s+(?:creature|permanent)\s+phases?\s+out', re.IGNORECASE),
# Target controlled by specific player
re.compile(r'(?:each|target)\s+(?:creature|permanent)\s+target\s+player\s+controls\s+phases?\s+out', re.IGNORECASE),
]
# Blanket patterns
blanket_patterns = [
re.compile(r'all\s+(?:nontoken\s+)?(?:creatures?|permanents?)(?:\s+of\s+that\s+type)?\s+(?:[^.]*\s+)?phase\s+out', re.IGNORECASE),
re.compile(r'each\s+(?:creature|permanent)\s+(?:[^.]*\s+)?phases?\s+out', re.IGNORECASE),
# Type-specific blanket (Shimmer)
re.compile(r'each\s+(?:land|creature|permanent|artifact|enchantment)\s+of\s+the\s+chosen\s+type\s+has\s+phasing', re.IGNORECASE),
re.compile(r'(?:lands?|creatures?|permanents?|artifacts?|enchantments?)\s+of\s+the\s+chosen\s+type\s+(?:have|has)\s+phasing', re.IGNORECASE),
# Pronoun reference to "all creatures"
re.compile(r'all\s+(?:nontoken\s+)?(?:creatures?|permanents?)[^.]*,?\s+(?:then\s+)?(?:those|the)\s+(?:creatures?|permanents?)\s+phase\s+out', re.IGNORECASE),
]
return scope_utils.ScopePatterns(
opponent=opponent_patterns,
self_ref=self_patterns,
your_permanents=your_patterns,
blanket=blanket_patterns,
targeted=targeting_patterns
)
def get_phasing_scope_tags(text: str, card_name: str, keywords: str = '') -> Set[str]:
"""
Get all phasing scope metadata tags for a card.
A card can have multiple scope tags:
- "Targeted: Phasing" - Uses targeting
- "Self: Phasing" - Phases itself out
- "Your Permanents: Phasing" - Phases your permanents
- "Opponent Permanents: Phasing" - Phases opponent permanents (removal)
- "Blanket: Phasing" - Phases all permanents
Args:
text: Card text
card_name: Card name
keywords: Card keywords (to check for static "Phasing" ability)
Returns:
Set of metadata tags
"""
if not card_name:
return set()
text_lower = text.lower() if text else ''
keywords_lower = keywords.lower() if keywords else ''
tags = set()
# Check for static "Phasing" keyword ability (self-phasing)
# Only add Self tag if card doesn't grant phasing to others
if 'phasing' in keywords_lower:
# Define patterns for checking if card grants phasing to others
grants_pattern = [re.compile(
r'(other|target|each|all|enchanted|equipped|creatures? you control|permanents? you control).*phas',
re.IGNORECASE
)]
is_static = scope_utils.check_static_keyword_legacy(
keywords=keywords,
static_keyword='phasing',
text=text,
grant_patterns=grants_pattern
)
if is_static:
tags.add('Self: Phasing')
return tags # Early return - static keyword only
# Check if phasing is mentioned in text
if 'phas' not in text_lower:
return tags
# Build phasing patterns and detect scopes
patterns = _get_phasing_scope_patterns()
# Detect all scopes (phasing can have multiple)
scopes = scope_utils.detect_multi_scope(
text=text,
card_name=card_name,
ability_keyword='phas', # Use 'phas' to catch both 'phase' and 'phasing'
patterns=patterns,
check_grant_verbs=False # Phasing doesn't need grant verb checking
)
# Format scope tags with "Phasing" ability name
for scope in scopes:
if scope == "Targeted":
tags.add("Targeted: Phasing")
else:
tags.add(scope_utils.format_scope_tag(scope, "Phasing"))
logger.debug(f"Card '{card_name}': detected {scope}: Phasing")
return tags
def has_phasing(text: str) -> bool:
"""
Quick check if card text contains phasing keywords.
Args:
text: Card text
Returns:
True if phasing keyword found
"""
if not text:
return False
text_lower = text.lower()
# Check for phasing keywords
phasing_keywords = [
'phase out',
'phases out',
'phasing',
'phase in',
'phases in',
]
return any(keyword in text_lower for keyword in phasing_keywords)
def is_removal_phasing(tags: Set[str]) -> bool:
"""
Check if phasing effect acts as removal (targets opponent permanents).
Args:
tags: Set of phasing scope tags
Returns:
True if this is removal-style phasing
"""
return "Opponent Permanents: Phasing" in tags

View file

@ -0,0 +1,551 @@
"""
Protection grant detection implementation for M2.
This module provides helpers to distinguish cards that grant protection effects
from cards that have inherent protection effects.
Usage in tagger.py:
from code.tagging.protection_grant_detection import is_granting_protection
if is_granting_protection(text, keywords):
# Tag as Protection
"""
import re
from typing import List, Pattern, Set
from . import regex_patterns as rgx
from . import tag_utils
from .tag_constants import CONTEXT_WINDOW_SIZE, CREATURE_TYPES, PROTECTION_KEYWORDS
# Pre-compile kindred detection patterns at module load for performance
# Pattern: (compiled_regex, tag_name_template)
def _build_kindred_patterns() -> List[tuple[Pattern, str]]:
"""Build pre-compiled kindred patterns for all creature types.
Returns:
List of tuples containing (compiled_pattern, tag_name)
"""
patterns = []
for creature_type in CREATURE_TYPES:
creature_lower = creature_type.lower()
creature_escaped = re.escape(creature_lower)
tag_name = f"{creature_type}s Gain Protection"
pattern_templates = [
rf'\bother {creature_escaped}s?\b.*\b(have|gain)\b',
rf'\b{creature_escaped} creatures?\b.*\b(have|gain)\b',
rf'\btarget {creature_escaped}\b.*\bgains?\b',
]
for pattern_str in pattern_templates:
try:
compiled = re.compile(pattern_str, re.IGNORECASE)
patterns.append((compiled, tag_name))
except re.error:
# Skip patterns that fail to compile
pass
return patterns
KINDRED_PATTERNS: List[tuple[Pattern, str]] = _build_kindred_patterns()
# Grant verb patterns - cards that give protection to other permanents
# These patterns look for grant verbs that affect OTHER permanents, not self
# M5: Added phasing support
# Pre-compiled at module load for performance
GRANT_VERB_PATTERNS: List[Pattern] = [
re.compile(r'\bgain[s]?\b.*\b(hexproof|shroud|indestructible|ward|protection|phasing)\b', re.IGNORECASE),
re.compile(r'\bgive[s]?\b.*\b(hexproof|shroud|indestructible|ward|protection|phasing)\b', re.IGNORECASE),
re.compile(r'\bgrant[s]?\b.*\b(hexproof|shroud|indestructible|ward|protection|phasing)\b', re.IGNORECASE),
re.compile(r'\bhave\b.*\b(hexproof|shroud|indestructible|ward|protection|phasing)\b', re.IGNORECASE), # "have hexproof" static grants
re.compile(r'\bget[s]?\b.*\+.*\b(hexproof|shroud|indestructible|ward|protection|phasing)\b', re.IGNORECASE), # "gets +X/+X and has hexproof" direct
re.compile(r'\bget[s]?\b.*\+.*\band\b.*\b(gain[s]?|have)\b.*\b(hexproof|shroud|indestructible|ward|protection|phasing)\b', re.IGNORECASE), # "gets +X/+X and gains hexproof"
re.compile(r'\bphases? out\b', re.IGNORECASE), # M5: Direct phasing triggers (e.g., "it phases out")
]
# Self-reference patterns that should NOT count as granting
# Reminder text and keyword lines only
# M5: Added phasing support
# Pre-compiled at module load for performance
SELF_REFERENCE_PATTERNS: List[Pattern] = [
re.compile(r'^\s*(hexproof|shroud|indestructible|ward|protection|phasing)', re.IGNORECASE), # Start of text (keyword ability)
re.compile(r'\([^)]*\b(hexproof|shroud|indestructible|ward|protection|phasing)[^)]*\)', re.IGNORECASE), # Reminder text in parens
]
# Conditional self-grant patterns - activated/triggered abilities that grant to self
# Pre-compiled at module load for performance
CONDITIONAL_SELF_GRANT_PATTERNS: List[Pattern] = [
# Activated abilities
re.compile(r'\{[^}]*\}.*:.*\bthis (creature|permanent|artifact|enchantment)\b.*\bgain[s]?\b.*\b(hexproof|shroud|indestructible|ward|protection)\b', re.IGNORECASE),
re.compile(r'discard.*:.*\bthis (creature|permanent|artifact|enchantment)\b.*\bgain[s]?\b', re.IGNORECASE),
re.compile(r'\{t\}.*:.*\bthis (creature|permanent|artifact|enchantment)\b.*\bgain[s]?\b', re.IGNORECASE),
re.compile(r'sacrifice.*:.*\bthis (creature|permanent|artifact|enchantment)\b.*\bgain[s]?\b', re.IGNORECASE),
re.compile(r'pay.*life.*:.*\bthis (creature|permanent|artifact|enchantment)\b.*\bgain[s]?\b', re.IGNORECASE),
# Triggered abilities that grant to self only
re.compile(r'whenever.*\b(this creature|this permanent|it)\b.*\bgain[s]?\b.*\b(hexproof|shroud|indestructible|ward|protection)\b', re.IGNORECASE),
re.compile(r'whenever you (cast|play|attack|cycle|discard|commit).*\b(this creature|this permanent|it)\b.*\bgain[s]?\b.*\b(hexproof|shroud|indestructible|ward|protection)\b', re.IGNORECASE),
re.compile(r'at the beginning.*\b(this creature|this permanent|it)\b.*\bgain[s]?\b.*\b(hexproof|shroud|indestructible|ward|protection)\b', re.IGNORECASE),
re.compile(r'whenever.*\b(this creature|this permanent)\b (attacks|enters|becomes).*\b(this creature|this permanent|it)\b.*\bgain[s]?\b', re.IGNORECASE),
# Named self-references (e.g., "Pristine Skywise gains")
re.compile(r'whenever you cast.*[A-Z][a-z]+.*gains.*\b(hexproof|shroud|indestructible|ward|protection)\b', re.IGNORECASE),
re.compile(r'whenever you.*[A-Z][a-z]+.*gains.*\b(hexproof|shroud|indestructible|ward|protection)\b', re.IGNORECASE),
# Static conditional abilities (as long as, if you control X)
re.compile(r'as long as.*\b(this creature|this permanent|it|has)\b.*(has|gains?).*\b(hexproof|shroud|indestructible|ward|protection)\b', re.IGNORECASE),
]
# Mass grant patterns - affects multiple creatures YOU control
# Pre-compiled at module load for performance
MASS_GRANT_PATTERNS: List[Pattern] = [
re.compile(r'creatures you control (have|gain|get)', re.IGNORECASE),
re.compile(r'other .* you control (have|gain|get)', re.IGNORECASE),
re.compile(r'(artifacts?|enchantments?|permanents?) you control (have|gain|get)', re.IGNORECASE), # Artifacts you control have...
re.compile(r'other (creatures?|artifacts?|enchantments?) (have|gain|get)', re.IGNORECASE), # Other creatures have...
re.compile(r'all (creatures?|slivers?|permanents?) (have|gain|get)', re.IGNORECASE), # All creatures/slivers have...
]
# Targeted grant patterns - must specify "you control"
# Pre-compiled at module load for performance
TARGETED_GRANT_PATTERNS: List[Pattern] = [
re.compile(r'target .* you control (gains?|gets?|has)', re.IGNORECASE),
re.compile(r'equipped creature (gains?|gets?|has)', re.IGNORECASE),
re.compile(r'enchanted enchantment (gains?|gets?|has)', re.IGNORECASE),
]
# Exclusion patterns - cards that remove or prevent protection
# Pre-compiled at module load for performance
EXCLUSION_PATTERNS: List[Pattern] = [
re.compile(r"can't have (hexproof|indestructible|ward|shroud)", re.IGNORECASE),
re.compile(r"lose[s]? (hexproof|indestructible|ward|shroud|protection)", re.IGNORECASE),
re.compile(r"without (hexproof|indestructible|ward|shroud)", re.IGNORECASE),
re.compile(r"protection from.*can't", re.IGNORECASE),
]
# Opponent grant patterns - grants to opponent's permanents (EXCLUDE these)
# NOTE: "all creatures" and "all permanents" are BLANKET effects (help you too),
# not opponent grants. Only exclude effects that ONLY help opponents.
# Pre-compiled at module load for performance
OPPONENT_GRANT_PATTERNS: List[Pattern] = [
rgx.TARGET_OPPONENT,
rgx.EACH_OPPONENT,
rgx.OPPONENT_CONTROL,
re.compile(r'opponent.*permanents?.*have', re.IGNORECASE), # opponent's permanents have
]
# Blanket grant patterns - affects all permanents regardless of controller
# These are VALID protection grants that should be tagged (Blanket scope in M5)
# Pre-compiled at module load for performance
BLANKET_GRANT_PATTERNS: List[Pattern] = [
re.compile(r'\ball creatures? (have|gain|get)\b', re.IGNORECASE), # All creatures gain hexproof
re.compile(r'\ball permanents? (have|gain|get)\b', re.IGNORECASE), # All permanents gain indestructible
re.compile(r'\beach creature (has|gains?|gets?)\b', re.IGNORECASE), # Each creature gains ward
rgx.EACH_PLAYER, # Each player gains hexproof (very rare but valid blanket)
]
# Kindred-specific grant patterns for metadata tagging
KINDRED_GRANT_PATTERNS = {
'Knights Gain Protection': [
r'knight[s]? you control.*\b(hexproof|shroud|indestructible|ward|protection)\b',
r'other knight[s]?.*\b(hexproof|shroud|indestructible|ward|protection)\b',
],
'Merfolk Gain Protection': [
r'merfolk you control.*\b(hexproof|shroud|indestructible|ward|protection)\b',
r'other merfolk.*\b(hexproof|shroud|indestructible|ward|protection)\b',
],
'Zombies Gain Protection': [
r'zombie[s]? you control.*\b(hexproof|shroud|indestructible|ward|protection)\b',
r'other zombie[s]?.*\b(hexproof|shroud|indestructible|ward|protection)\b',
r'target.*zombie.*\bgain[s]?\b.*\b(hexproof|shroud|indestructible|ward|protection)\b',
],
'Vampires Gain Protection': [
r'vampire[s]? you control.*\b(hexproof|shroud|indestructible|ward|protection)\b',
r'other vampire[s]?.*\b(hexproof|shroud|indestructible|ward|protection)\b',
],
'Elves Gain Protection': [
r'el(f|ves) you control.*\b(hexproof|shroud|indestructible|ward|protection)\b',
r'other el(f|ves).*\b(hexproof|shroud|indestructible|ward|protection)\b',
],
'Dragons Gain Protection': [
r'dragon[s]? you control.*\b(hexproof|shroud|indestructible|ward|protection)\b',
r'other dragon[s]?.*\b(hexproof|shroud|indestructible|ward|protection)\b',
],
'Goblins Gain Protection': [
r'goblin[s]? you control.*\b(hexproof|shroud|indestructible|ward|protection)\b',
r'other goblin[s]?.*\b(hexproof|shroud|indestructible|ward|protection)\b',
],
'Slivers Gain Protection': [
r'sliver[s]? you control.*\b(hexproof|shroud|indestructible|ward|protection)\b',
r'all sliver[s]?.*\b(hexproof|shroud|indestructible|ward|protection)\b',
r'other sliver[s]?.*\b(hexproof|shroud|indestructible|ward|protection)\b',
],
'Artifacts Gain Protection': [
r'artifact[s]? you control (have|gain).*\b(hexproof|shroud|indestructible|ward|protection)\b',
r'other artifact[s]? (have|gain).*\b(hexproof|shroud|indestructible|ward|protection)\b',
],
'Enchantments Gain Protection': [
r'enchantment[s]? you control (have|gain).*\b(hexproof|shroud|indestructible|ward|protection)\b',
r'other enchantment[s]? (have|gain).*\b(hexproof|shroud|indestructible|ward|protection)\b',
],
}
def get_kindred_protection_tags(text: str) -> Set[str]:
"""
Identify kindred-specific protection grants for metadata tagging.
Returns a set of metadata tag names like:
- "Knights Gain Hexproof"
- "Spiders Gain Ward"
- "Artifacts Gain Indestructible"
Uses both predefined patterns and dynamic creature type detection,
with specific ability detection (hexproof, ward, indestructible, shroud, protection).
IMPORTANT: Only tags the specific abilities that appear inside the matched grant phrase
(not merely anywhere in the text) to avoid false positives like Svyelun.
"""
if not text:
return set()
text_lower = text.lower()
tags = set()
# Only proceed if protective abilities are present (performance optimization)
protective_abilities = ['hexproof', 'shroud', 'indestructible', 'ward', 'protection']
if not any(keyword in text_lower for keyword in protective_abilities):
return tags
for tag_base, patterns in KINDRED_GRANT_PATTERNS.items():
for pattern in patterns:
pattern_compiled = re.compile(pattern, re.IGNORECASE) if isinstance(pattern, str) else pattern
match = pattern_compiled.search(text_lower)
if match:
creature_type = tag_base.split(' Gain ')[0]
# Get the matched text to check which abilities are in this specific grant
matched_text = match.group(0)
# Only tag abilities that appear in the matched phrase
if 'hexproof' in matched_text:
tags.add(f"{creature_type} Gain Hexproof")
if 'shroud' in matched_text:
tags.add(f"{creature_type} Gain Shroud")
if 'indestructible' in matched_text:
tags.add(f"{creature_type} Gain Indestructible")
if 'ward' in matched_text:
tags.add(f"{creature_type} Gain Ward")
if 'protection' in matched_text:
tags.add(f"{creature_type} Gain Protection")
break # Found match for this kindred type, move to next
# Use pre-compiled patterns for all creature types
for compiled_pattern, tag_template in KINDRED_PATTERNS:
match = compiled_pattern.search(text_lower)
if match:
creature_type = tag_template.split(' Gain ')[0]
# Get the matched text to check which abilities are in this specific grant
matched_text = match.group(0)
# Only tag abilities that appear in the matched phrase
if 'hexproof' in matched_text:
tags.add(f"{creature_type} Gain Hexproof")
if 'shroud' in matched_text:
tags.add(f"{creature_type} Gain Shroud")
if 'indestructible' in matched_text:
tags.add(f"{creature_type} Gain Indestructible")
if 'ward' in matched_text:
tags.add(f"{creature_type} Gain Ward")
if 'protection' in matched_text:
tags.add(f"{creature_type} Gain Protection")
# Don't break - a card could grant to multiple creature types
return tags
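# Illustrative sketch with a hypothetical Knight anthem. Expected result assumes the
# patterns above: only the ability named inside the matched grant phrase is tagged.
_knight_anthem = "Other Knights you control get +1/+1 and have ward {1}."
_kindred_tags = get_kindred_protection_tags(_knight_anthem)  # expected: {"Knights Gain Ward"}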
def is_opponent_grant(text: str) -> bool:
"""
Check if card grants protection to opponent's permanents ONLY.
Returns True if this grants ONLY to opponents (should be excluded from Protection tag).
Does NOT exclude blanket effects like "all creatures gain hexproof" which help you too.
"""
if not text:
return False
text_lower = text.lower()
# Remove reminder text (in parentheses) to avoid false positives
# Reminder text often mentions "opponents control" for hexproof/shroud explanations
text_no_reminder = tag_utils.strip_reminder_text(text_lower)
for pattern in OPPONENT_GRANT_PATTERNS:
match = pattern.search(text_no_reminder)
if match:
# Must be in context of granting protection
if any(prot in text_lower for prot in ['hexproof', 'shroud', 'indestructible', 'ward', 'protection']):
context = tag_utils.extract_context_window(
text_no_reminder, match.start(), match.end(),
window_size=CONTEXT_WINDOW_SIZE, include_before=True
)
# If "you control" appears in the context, it's limiting to YOUR permanents, not opponents
if 'you control' not in context:
return True
return False
def has_conditional_self_grant(text: str) -> bool:
"""
Check if card has any conditional self-grant patterns.
This does NOT check if it ALSO grants to others.
"""
if not text:
return False
text_lower = text.lower()
for pattern in CONDITIONAL_SELF_GRANT_PATTERNS:
if pattern.search(text_lower):
return True
return False
def is_conditional_self_grant(text: str) -> bool:
"""
Check if card only conditionally grants protection to itself.
Examples:
- "{B}, Discard a card: This creature gains hexproof until end of turn."
- "Whenever you cast a noncreature spell, untap this creature. It gains protection..."
- "Whenever this creature attacks, it gains indestructible until end of turn."
These should be excluded as they don't provide protection to OTHER permanents.
"""
if not text:
return False
text_lower = text.lower()
found_conditional_self = has_conditional_self_grant(text)
if not found_conditional_self:
return False
# If we found a conditional self-grant, check if there's ALSO a grant to others
other_grant_patterns = [
rgx.OTHER_CREATURES,
re.compile(r'creatures you control (have|gain)', re.IGNORECASE),
re.compile(r'target (creature|permanent) you control gains', re.IGNORECASE),
re.compile(r'another target (creature|permanent)', re.IGNORECASE),
re.compile(r'equipped creature (has|gains)', re.IGNORECASE),
re.compile(r'enchanted creature (has|gains)', re.IGNORECASE),
re.compile(r'target legendary', re.IGNORECASE),
re.compile(r'permanents you control gain', re.IGNORECASE),
]
has_other_grant = any(pattern.search(text_lower) for pattern in other_grant_patterns)
# Return True only if it's ONLY conditional self-grants (no other grants)
return not has_other_grant
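# Illustrative sketch (hypothetical texts): a self-only pump is treated as a conditional
# self-grant, while the same trigger that also covers your board is not.
_self_only = "{1}{W}: This creature gains indestructible until end of turn."
_board_wide = "Whenever this creature attacks, it and other creatures you control gain indestructible until end of turn."
_self_only_result = is_conditional_self_grant(_self_only)    # expected: True (excluded from Protection)
_board_wide_result = is_conditional_self_grant(_board_wide)  # expected: False (also grants to others)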
def _should_exclude_token_creation(text_lower: str) -> bool:
"""Check if card only creates tokens with protection (not granting to existing permanents).
Args:
text_lower: Lowercased card text
Returns:
True if card only creates tokens, False if it also grants
"""
token_with_protection = re.compile(r'create.*token.*with.*(hexproof|shroud|indestructible|ward|protection)', re.IGNORECASE)
if token_with_protection.search(text_lower):
has_grant_to_others = any(pattern.search(text_lower) for pattern in MASS_GRANT_PATTERNS)
return not has_grant_to_others
return False
def _should_exclude_kindred_only(text: str, text_lower: str, exclude_kindred: bool) -> bool:
"""Check if card only grants to specific kindred types.
Args:
text: Original card text
text_lower: Lowercased card text
exclude_kindred: Whether to exclude kindred-specific grants
Returns:
True if card only has kindred grants, False if it has broad grants
"""
if not exclude_kindred:
return False
kindred_tags = get_kindred_protection_tags(text)
if not kindred_tags:
return False
broad_only_patterns = [
re.compile(r'\bcreatures you control (have|gain)\b(?!.*(knight|merfolk|zombie|elf|dragon|goblin|sliver))', re.IGNORECASE),
re.compile(r'\bpermanents you control (have|gain)\b', re.IGNORECASE),
re.compile(r'\beach (creature|permanent) you control', re.IGNORECASE),
re.compile(r'\ball (creatures?|permanents?)', re.IGNORECASE),
]
has_broad_grant = any(pattern.search(text_lower) for pattern in broad_only_patterns)
return not has_broad_grant
def _check_pattern_grants(text_lower: str, pattern_list: List[Pattern]) -> bool:
"""Check if text contains protection grants matching pattern list.
Args:
text_lower: Lowercased card text
pattern_list: List of grant patterns to check
Returns:
True if protection grant found, False otherwise
"""
for pattern in pattern_list:
match = pattern.search(text_lower)
if match:
context = tag_utils.extract_context_window(text_lower, match.start(), match.end())
if any(prot in context for prot in PROTECTION_KEYWORDS):
return True
return False
def _has_inherent_protection_only(text_lower: str, keywords: str, found_grant: bool) -> bool:
"""Check if card only has inherent protection without granting.
Args:
text_lower: Lowercased card text
keywords: Card keywords
found_grant: Whether a grant pattern was found
Returns:
True if card only has inherent protection, False otherwise
"""
if not keywords:
return False
keywords_lower = keywords.lower()
has_inherent = any(k in keywords_lower for k in PROTECTION_KEYWORDS)
if not has_inherent or found_grant:
return False
stat_only_pattern = re.compile(r'(get[s]?|gain[s]?)\s+[+\-][0-9X]+/[+\-][0-9X]+', re.IGNORECASE)
has_stat_only = bool(stat_only_pattern.search(text_lower))
mentions_other_without_prot = False
if 'other' in text_lower:
other_idx = text_lower.find('other')
remaining_text = text_lower[other_idx:]
mentions_other_without_prot = not any(prot in remaining_text for prot in PROTECTION_KEYWORDS)
return has_stat_only or mentions_other_without_prot
def is_granting_protection(text: str, keywords: str, exclude_kindred: bool = False) -> bool:
"""
Determine if a card grants protection effects to other permanents.
Returns True if the card gives/grants protection to other cards unconditionally.
Returns False if:
- Card only has inherent protection
- Card only conditionally grants to itself
- Card grants to opponent's permanents
- Card grants only to specific kindred types (when exclude_kindred=True)
- Card creates tokens with protection (not granting to existing permanents)
- Card only modifies non-protection stats of other permanents
Args:
text: Card text to analyze
keywords: Card keywords (comma-separated)
exclude_kindred: If True, exclude kindred-specific grants
Returns:
True if card grants broad protection, False otherwise
"""
if not text:
return False
text_lower = text.lower()
# Early exclusion checks
if is_opponent_grant(text):
return False
if is_conditional_self_grant(text):
return False
if any(pattern.search(text_lower) for pattern in EXCLUSION_PATTERNS):
return False
if _should_exclude_token_creation(text_lower):
return False
if _should_exclude_kindred_only(text, text_lower, exclude_kindred):
return False
found_grant = False
if _check_pattern_grants(text_lower, BLANKET_GRANT_PATTERNS):
found_grant = True
elif _check_pattern_grants(text_lower, MASS_GRANT_PATTERNS):
found_grant = True
elif _check_pattern_grants(text_lower, TARGETED_GRANT_PATTERNS):
found_grant = True
elif any(pattern.search(text_lower) for pattern in GRANT_VERB_PATTERNS):
found_grant = True
if _has_inherent_protection_only(text_lower, keywords, found_grant):
return False
return found_grant
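# Illustrative sketch (hypothetical text and keywords): a broad anthem counts as a grant,
# while a lone keyword with only reminder text does not.
_anthem_text = "Creatures you control have hexproof."
_keyword_text = "Hexproof (This creature can't be the target of spells or abilities your opponents control.)"
_anthem_grants = is_granting_protection(_anthem_text, "")            # expected: True
_keyword_grants = is_granting_protection(_keyword_text, "Hexproof")  # expected: False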
def categorize_protection_card(name: str, text: str, keywords: str, card_type: str, exclude_kindred: bool = False) -> str:
"""
Categorize a Protection-tagged card for audit purposes.
Args:
name: Card name
text: Card text
keywords: Card keywords
card_type: Card type line
exclude_kindred: If True, kindred-specific grants are categorized as metadata, not Grant
Returns:
'Grant' - gives broad protection to others
'Kindred' - gives kindred-specific protection (metadata tag)
'Inherent' - has protection itself
'ConditionalSelf' - only conditionally grants to itself
'Opponent' - grants to opponent's permanents
'Mixed' - combines broad grants with inherent protection or conditional self-grants
'Neither' - false positive
"""
keywords_lower = keywords.lower() if keywords else ''
if is_opponent_grant(text):
return 'Opponent'
if is_conditional_self_grant(text):
return 'ConditionalSelf'
has_cond_self = has_conditional_self_grant(text)
has_inherent = any(k in keywords_lower for k in PROTECTION_KEYWORDS)
kindred_tags = get_kindred_protection_tags(text)
if kindred_tags and exclude_kindred:
grants_broad = is_granting_protection(text, keywords, exclude_kindred=True)
if grants_broad and has_inherent:
# Has inherent + kindred + broad grants
return 'Mixed'
elif grants_broad:
# Has kindred + broad grants (but no inherent)
# This is just Grant with kindred metadata tags
return 'Grant'
elif has_inherent:
# Has inherent + kindred only (not broad)
# This is still just Kindred category (inherent is separate from granting)
return 'Kindred'
else:
# Only kindred grants, no inherent or broad
return 'Kindred'
grants_protection = is_granting_protection(text, keywords, exclude_kindred=exclude_kindred)
# Categorize based on what it does
if grants_protection and has_cond_self:
# Has conditional self-grant + grants to others = Mixed
return 'Mixed'
elif grants_protection and has_inherent:
return 'Mixed' # Has inherent + grants broadly
elif grants_protection:
return 'Grant' # Only grants broadly
elif has_inherent:
return 'Inherent' # Only has inherent
else:
return 'Neither' # False positive
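# Illustrative audit sketch with invented card names and texts: a pure anthem categorizes
# as 'Grant', a keyword-only creature as 'Inherent'.
_anthem_category = categorize_protection_card(
    "Shield Anthem", "Creatures you control have hexproof.", "", "Enchantment"
)  # expected: 'Grant'
_keyword_category = categorize_protection_card(
    "Stone Sentry", "Defender, indestructible", "Defender, Indestructible", "Creature"
)  # expected: 'Inherent'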

View file

@@ -0,0 +1,169 @@
"""
Protection Scope Detection Module
Detects the scope of protection effects (Self, Your Permanents, Blanket, Opponent Permanents)
to enable intelligent filtering in deck building.
Part of M5: Protection Effect Granularity milestone.
Refactored in M2: Create Scope Detection Utilities to use generic scope detection.
"""
# Standard library imports
import re
from typing import Optional, Set
# Local application imports
from code.logging_util import get_logger
from . import scope_detection_utils as scope_utils
from .tag_constants import PROTECTION_ABILITIES
logger = get_logger(__name__)
# Protection scope pattern definitions
def _get_protection_scope_patterns(ability: str) -> scope_utils.ScopePatterns:
"""
Build scope patterns for protection abilities.
Args:
ability: Ability keyword (e.g., "hexproof", "ward")
Returns:
ScopePatterns object with compiled patterns
"""
ability_lower = ability.lower()
# Opponent patterns: grants protection TO opponent's permanents
# Note: Must distinguish from hexproof reminder text "opponents control [spells/abilities]"
opponent_patterns = [
re.compile(r'creatures?\s+(?:your\s+)?opponents?\s+control\s+(?:have|gain)', re.IGNORECASE),
re.compile(r'permanents?\s+(?:your\s+)?opponents?\s+control\s+(?:have|gain)', re.IGNORECASE),
re.compile(r'each\s+creature\s+an?\s+opponent\s+controls?\s+(?:has|gains?)', re.IGNORECASE),
]
# Self-reference patterns
self_patterns = [
# Tilde (~) - strong self-reference indicator
re.compile(r'~\s+(?:has|gains?)\s+' + ability_lower, re.IGNORECASE),
re.compile(r'~\s+is\s+' + ability_lower, re.IGNORECASE),
# "this creature/permanent" pronouns
re.compile(r'this\s+(?:creature|permanent|artifact|enchantment)\s+(?:has|gains?)\s+' + ability_lower, re.IGNORECASE),
# Starts with ability (likely self)
re.compile(r'^(?:has|gains?)\s+' + ability_lower, re.IGNORECASE),
]
# Your permanents patterns
your_patterns = [
re.compile(r'(?:other\s+)?(?:creatures?|permanents?|artifacts?|enchantments?)\s+you\s+control', re.IGNORECASE),
re.compile(r'your\s+(?:creatures?|permanents?|artifacts?|enchantments?)', re.IGNORECASE),
re.compile(r'each\s+(?:creature|permanent)\s+you\s+control', re.IGNORECASE),
re.compile(r'other\s+\w+s?\s+you\s+control', re.IGNORECASE), # "Other Merfolk you control", etc.
# "Other X you control...have Y" pattern for static grants
re.compile(r'other\s+(?:\w+\s+)?(?:creatures?|permanents?)\s+you\s+control\s+(?:get\s+[^.]*\s+and\s+)?have\s+' + ability_lower, re.IGNORECASE),
re.compile(r'other\s+\w+s?\s+you\s+control\s+(?:get\s+[^.]*\s+and\s+)?have\s+' + ability_lower, re.IGNORECASE), # "Other Knights you control...have"
re.compile(r'equipped\s+(?:creature|permanent)\s+(?:gets\s+[^.]*\s+and\s+)?(?:has|gains?)\s+(?:[^.]*\s+and\s+)?' + ability_lower, re.IGNORECASE), # Equipment
re.compile(r'enchanted\s+(?:creature|permanent)\s+(?:gets\s+[^.]*\s+and\s+)?(?:has|gains?)\s+(?:[^.]*\s+and\s+)?' + ability_lower, re.IGNORECASE), # Aura
re.compile(r'target\s+(?:\w+\s+)?(?:creature|permanent)\s+(?:gets\s+[^.]*\s+and\s+)?(?:gains?)\s+' + ability_lower, re.IGNORECASE), # Target
]
# Blanket patterns (no ownership qualifier)
# Note: Abilities can be listed with "and" (e.g., "gain hexproof and indestructible")
blanket_patterns = [
re.compile(r'all\s+(?:creatures?|permanents?)\s+(?:have|gain)\s+(?:[^.]*\s+and\s+)?' + ability_lower, re.IGNORECASE),
re.compile(r'each\s+(?:creature|permanent)\s+(?:has|gains?)\s+(?:[^.]*\s+and\s+)?' + ability_lower, re.IGNORECASE),
re.compile(r'(?:creatures?|permanents?)\s+(?:have|gain)\s+(?:[^.]*\s+and\s+)?' + ability_lower, re.IGNORECASE),
]
return scope_utils.ScopePatterns(
opponent=opponent_patterns,
self_ref=self_patterns,
your_permanents=your_patterns,
blanket=blanket_patterns
)
def detect_protection_scope(text: str, card_name: str, ability: str, keywords: Optional[str] = None) -> Optional[str]:
"""
Detect the scope of a protection effect.
Detection priority order (prevents misclassification):
0. Static keyword → "Self"
1. Opponent ownership → "Opponent Permanents"
2. Self-reference → "Self"
3. Your ownership → "Your Permanents"
4. No ownership qualifier → "Blanket"
Args:
text: Card text (lowercase for pattern matching)
card_name: Card name (for self-reference detection)
ability: Ability type (Ward, Hexproof, etc.)
keywords: Optional keywords field for static keyword detection
Returns:
Scope prefix or None: "Self", "Your Permanents", "Blanket", "Opponent Permanents"
"""
if not text or not ability:
return None
# Build patterns for this ability
patterns = _get_protection_scope_patterns(ability)
# Use generic scope detection with grant verb checking AND keywords
return scope_utils.detect_scope(
text=text,
card_name=card_name,
ability_keyword=ability,
patterns=patterns,
allow_multiple=False,
check_grant_verbs=True,
keywords=keywords
)
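# Illustrative sketch (hypothetical texts and card names): the ownership wording in the
# grant decides which scope wins under the priority order above.
_your_scope = detect_protection_scope(
    "creatures you control have hexproof", "Shield Anthem", "hexproof"
)  # expected: "Your Permanents"
_blanket_scope = detect_protection_scope(
    "all creatures have hexproof", "Open Skies Edict", "hexproof"
)  # expected: "Blanket"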
def get_protection_scope_tags(text: str, card_name: str, keywords: Optional[str] = None) -> Set[str]:
"""
Get all protection scope metadata tags for a card.
A card can have multiple protection scopes (e.g., self-hexproof + grants ward to others).
Args:
text: Card text
card_name: Card name
keywords: Optional keywords field for static keyword detection
Returns:
Set of metadata tags like {"Self: Indestructible", "Your Permanents: Ward"}
"""
if not text or not card_name:
return set()
scope_tags = set()
# Check each protection ability
for ability in PROTECTION_ABILITIES:
scope = detect_protection_scope(text, card_name, ability, keywords)
if scope:
# Format: "{Scope}: {Ability}"
tag = f"{scope}: {ability}"
scope_tags.add(tag)
logger.debug(f"Card '{card_name}': detected scope tag '{tag}'")
return scope_tags
def has_any_protection(text: str) -> bool:
"""
Quick check if card text contains any protection keywords.
Args:
text: Card text
Returns:
True if any protection keyword found
"""
if not text:
return False
text_lower = text.lower()
return any(ability.lower() in text_lower for ability in PROTECTION_ABILITIES)
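# Illustrative sketch (hypothetical card): a commander with its own ward that also shields
# your board should yield two scope tags. The expected result assumes PROTECTION_ABILITIES
# lists "Ward" and "Hexproof" as in tag_constants.
_warden_tags = get_protection_scope_tags(
    "Ward {2}\nOther creatures you control have hexproof.",
    "Veiled Warden",
    keywords="Ward",
)  # expected: {"Self: Ward", "Your Permanents: Hexproof"}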

View file

@@ -0,0 +1,455 @@
"""
Centralized regex patterns for MTG card tagging.
All patterns compiled with re.IGNORECASE for case-insensitive matching.
Organized by semantic category for maintainability and reusability.
Usage:
from code.tagging import regex_patterns as rgx
mask = df['text'].str.contains(rgx.YOU_CONTROL, na=False)
if rgx.GRANT_HEXPROOF.search(text):
...
# Or use builder functions
pattern = rgx.ownership_pattern('creature', 'you')
mask = df['text'].str.contains(pattern, na=False)
"""
import re
from typing import List, Optional, Pattern
# =============================================================================
# OWNERSHIP & CONTROLLER PATTERNS
# =============================================================================
YOU_CONTROL: Pattern = re.compile(r'you control', re.IGNORECASE)
THEY_CONTROL: Pattern = re.compile(r'they control', re.IGNORECASE)
OPPONENT_CONTROL: Pattern = re.compile(r'opponent[s]? control', re.IGNORECASE)
CREATURE_YOU_CONTROL: Pattern = re.compile(r'creature[s]? you control', re.IGNORECASE)
PERMANENT_YOU_CONTROL: Pattern = re.compile(r'permanent[s]? you control', re.IGNORECASE)
ARTIFACT_YOU_CONTROL: Pattern = re.compile(r'artifact[s]? you control', re.IGNORECASE)
ENCHANTMENT_YOU_CONTROL: Pattern = re.compile(r'enchantment[s]? you control', re.IGNORECASE)
# =============================================================================
# GRANT VERB PATTERNS
# =============================================================================
GAIN: Pattern = re.compile(r'\bgain[s]?\b', re.IGNORECASE)
HAS: Pattern = re.compile(r'\bhas\b', re.IGNORECASE)
HAVE: Pattern = re.compile(r'\bhave\b', re.IGNORECASE)
GET: Pattern = re.compile(r'\bget[s]?\b', re.IGNORECASE)
GRANT_VERBS: List[str] = ['gain', 'gains', 'has', 'have', 'get', 'gets']
# =============================================================================
# TARGETING PATTERNS
# =============================================================================
TARGET_PLAYER: Pattern = re.compile(r'target player', re.IGNORECASE)
TARGET_OPPONENT: Pattern = re.compile(r'target opponent', re.IGNORECASE)
TARGET_CREATURE: Pattern = re.compile(r'target creature', re.IGNORECASE)
TARGET_PERMANENT: Pattern = re.compile(r'target permanent', re.IGNORECASE)
TARGET_ARTIFACT: Pattern = re.compile(r'target artifact', re.IGNORECASE)
TARGET_ENCHANTMENT: Pattern = re.compile(r'target enchantment', re.IGNORECASE)
EACH_PLAYER: Pattern = re.compile(r'each player', re.IGNORECASE)
EACH_OPPONENT: Pattern = re.compile(r'each opponent', re.IGNORECASE)
TARGET_YOU_CONTROL: Pattern = re.compile(r'target .* you control', re.IGNORECASE)
# =============================================================================
# PROTECTION ABILITY PATTERNS
# =============================================================================
HEXPROOF: Pattern = re.compile(r'\bhexproof\b', re.IGNORECASE)
SHROUD: Pattern = re.compile(r'\bshroud\b', re.IGNORECASE)
INDESTRUCTIBLE: Pattern = re.compile(r'\bindestructible\b', re.IGNORECASE)
WARD: Pattern = re.compile(r'\bward\b', re.IGNORECASE)
PROTECTION_FROM: Pattern = re.compile(r'protection from', re.IGNORECASE)
PROTECTION_ABILITIES: List[str] = ['hexproof', 'shroud', 'indestructible', 'ward', 'protection']
CANT_HAVE_PROTECTION: Pattern = re.compile(r"can't have (hexproof|indestructible|ward|shroud)", re.IGNORECASE)
LOSE_PROTECTION: Pattern = re.compile(r"lose[s]? (hexproof|indestructible|ward|shroud|protection)", re.IGNORECASE)
# =============================================================================
# CARD DRAW PATTERNS
# =============================================================================
DRAW_A_CARD: Pattern = re.compile(r'draw[s]? (?:a|one) card', re.IGNORECASE)
DRAW_CARDS: Pattern = re.compile(r'draw[s]? (?:two|three|four|five|x|\d+) card', re.IGNORECASE)
DRAW: Pattern = re.compile(r'\bdraw[s]?\b', re.IGNORECASE)
# =============================================================================
# TOKEN CREATION PATTERNS
# =============================================================================
CREATE_TOKEN: Pattern = re.compile(r'create[s]?.*token', re.IGNORECASE)
PUT_TOKEN: Pattern = re.compile(r'put[s]?.*token', re.IGNORECASE)
CREATE_TREASURE: Pattern = re.compile(r'create.*treasure token', re.IGNORECASE)
CREATE_FOOD: Pattern = re.compile(r'create.*food token', re.IGNORECASE)
CREATE_CLUE: Pattern = re.compile(r'create.*clue token', re.IGNORECASE)
CREATE_BLOOD: Pattern = re.compile(r'create.*blood token', re.IGNORECASE)
# =============================================================================
# COUNTER PATTERNS
# =============================================================================
PLUS_ONE_COUNTER: Pattern = re.compile(r'\+1/\+1 counter', re.IGNORECASE)
MINUS_ONE_COUNTER: Pattern = re.compile(r'\-1/\-1 counter', re.IGNORECASE)
LOYALTY_COUNTER: Pattern = re.compile(r'loyalty counter', re.IGNORECASE)
PROLIFERATE: Pattern = re.compile(r'\bproliferate\b', re.IGNORECASE)
ONE_OR_MORE_COUNTERS: Pattern = re.compile(r'one or more counter', re.IGNORECASE)
ONE_OR_MORE_PLUS_ONE_COUNTERS: Pattern = re.compile(r'one or more \+1/\+1 counter', re.IGNORECASE)
IF_HAD_COUNTERS: Pattern = re.compile(r'if it had counter', re.IGNORECASE)
WITH_COUNTERS_ON_THEM: Pattern = re.compile(r'with counter[s]? on them', re.IGNORECASE)
# =============================================================================
# SACRIFICE & REMOVAL PATTERNS
# =============================================================================
SACRIFICE: Pattern = re.compile(r'sacrifice[s]?', re.IGNORECASE)
SACRIFICED: Pattern = re.compile(r'sacrificed', re.IGNORECASE)
DESTROY: Pattern = re.compile(r'destroy[s]?', re.IGNORECASE)
EXILE: Pattern = re.compile(r'exile[s]?', re.IGNORECASE)
EXILED: Pattern = re.compile(r'exiled', re.IGNORECASE)
SACRIFICE_DRAW: Pattern = re.compile(r'sacrifice (?:a|an) (?:artifact|creature|permanent)(?:[^,]*),?[^,]*draw', re.IGNORECASE)
SACRIFICE_COLON_DRAW: Pattern = re.compile(r'sacrifice [^:]+: draw', re.IGNORECASE)
SACRIFICED_COMMA_DRAW: Pattern = re.compile(r'sacrificed[^,]+, draw', re.IGNORECASE)
EXILE_RETURN_BATTLEFIELD: Pattern = re.compile(r'exile.*return.*to the battlefield', re.IGNORECASE)
# =============================================================================
# DISCARD PATTERNS
# =============================================================================
DISCARD_A_CARD: Pattern = re.compile(r'discard (?:a|one|two|three|x) card', re.IGNORECASE)
DISCARD_YOUR_HAND: Pattern = re.compile(r'discard your hand', re.IGNORECASE)
YOU_DISCARD: Pattern = re.compile(r'you discard', re.IGNORECASE)
# Discard triggers
WHENEVER_YOU_DISCARD: Pattern = re.compile(r'whenever you discard', re.IGNORECASE)
IF_YOU_DISCARDED: Pattern = re.compile(r'if you discarded', re.IGNORECASE)
WHEN_YOU_DISCARD: Pattern = re.compile(r'when you discard', re.IGNORECASE)
FOR_EACH_DISCARDED: Pattern = re.compile(r'for each card you discarded', re.IGNORECASE)
# Opponent discard
TARGET_PLAYER_DISCARDS: Pattern = re.compile(r'target player discards', re.IGNORECASE)
TARGET_OPPONENT_DISCARDS: Pattern = re.compile(r'target opponent discards', re.IGNORECASE)
EACH_PLAYER_DISCARDS: Pattern = re.compile(r'each player discards', re.IGNORECASE)
EACH_OPPONENT_DISCARDS: Pattern = re.compile(r'each opponent discards', re.IGNORECASE)
THAT_PLAYER_DISCARDS: Pattern = re.compile(r'that player discards', re.IGNORECASE)
# Discard cost
ADDITIONAL_COST_DISCARD: Pattern = re.compile(r'as an additional cost to (?:cast this spell|activate this ability),? discard (?:a|one) card', re.IGNORECASE)
ADDITIONAL_COST_DISCARD_SHORT: Pattern = re.compile(r'as an additional cost,? discard (?:a|one) card', re.IGNORECASE)
MADNESS: Pattern = re.compile(r'\bmadness\b', re.IGNORECASE)
# =============================================================================
# DAMAGE & LIFE LOSS PATTERNS
# =============================================================================
DEALS_ONE_DAMAGE: Pattern = re.compile(r'deals\s+1\s+damage', re.IGNORECASE)
EXACTLY_ONE_DAMAGE: Pattern = re.compile(r'exactly\s+1\s+damage', re.IGNORECASE)
LOSES_ONE_LIFE: Pattern = re.compile(r'loses\s+1\s+life', re.IGNORECASE)
# =============================================================================
# COST REDUCTION PATTERNS
# =============================================================================
COST_LESS: Pattern = re.compile(r'cost[s]? \{[\d\w]\} less', re.IGNORECASE)
COST_LESS_TO_CAST: Pattern = re.compile(r'cost[s]? less to cast', re.IGNORECASE)
WITH_X_IN_COST: Pattern = re.compile(r'with \{[xX]\} in (?:its|their)', re.IGNORECASE)
AFFINITY_FOR: Pattern = re.compile(r'affinity for', re.IGNORECASE)
SPELLS_COST: Pattern = re.compile(r'spells cost', re.IGNORECASE)
SPELLS_YOU_CAST_COST: Pattern = re.compile(r'spells you cast cost', re.IGNORECASE)
# =============================================================================
# MONARCH & INITIATIVE PATTERNS
# =============================================================================
BECOME_MONARCH: Pattern = re.compile(r'becomes? the monarch', re.IGNORECASE)
IS_MONARCH: Pattern = re.compile(r'is the monarch', re.IGNORECASE)
WAS_MONARCH: Pattern = re.compile(r'was the monarch', re.IGNORECASE)
YOU_ARE_MONARCH: Pattern = re.compile(r"you are the monarch|you're the monarch", re.IGNORECASE)
YOU_BECOME_MONARCH: Pattern = re.compile(r'you become the monarch', re.IGNORECASE)
CANT_BECOME_MONARCH: Pattern = re.compile(r"can't become the monarch", re.IGNORECASE)
# =============================================================================
# KEYWORD ABILITY PATTERNS
# =============================================================================
PARTNER_BASIC: Pattern = re.compile(r'\bpartner\b(?!\s*(?:with|[-—–]))', re.IGNORECASE)
PARTNER_WITH: Pattern = re.compile(r'partner with', re.IGNORECASE)
PARTNER_SURVIVORS: Pattern = re.compile(r'Partner\s*[-—–]\s*Survivors', re.IGNORECASE)
PARTNER_FATHER_SON: Pattern = re.compile(r'Partner\s*[-—–]\s*Father\s*&\s*Son', re.IGNORECASE)
FLYING: Pattern = re.compile(r'\bflying\b', re.IGNORECASE)
VIGILANCE: Pattern = re.compile(r'\bvigilance\b', re.IGNORECASE)
TRAMPLE: Pattern = re.compile(r'\btrample\b', re.IGNORECASE)
HASTE: Pattern = re.compile(r'\bhaste\b', re.IGNORECASE)
LIFELINK: Pattern = re.compile(r'\blifelink\b', re.IGNORECASE)
DEATHTOUCH: Pattern = re.compile(r'\bdeathtouch\b', re.IGNORECASE)
DOUBLE_STRIKE: Pattern = re.compile(r'double strike', re.IGNORECASE)
FIRST_STRIKE: Pattern = re.compile(r'first strike', re.IGNORECASE)
MENACE: Pattern = re.compile(r'\bmenace\b', re.IGNORECASE)
REACH: Pattern = re.compile(r'\breach\b', re.IGNORECASE)
UNDYING: Pattern = re.compile(r'\bundying\b', re.IGNORECASE)
PERSIST: Pattern = re.compile(r'\bpersist\b', re.IGNORECASE)
PHASING: Pattern = re.compile(r'\bphasing\b', re.IGNORECASE)
FLASH: Pattern = re.compile(r'\bflash\b', re.IGNORECASE)
TOXIC: Pattern = re.compile(r'toxic\s*\d+', re.IGNORECASE)
# =============================================================================
# RETURN TO BATTLEFIELD PATTERNS
# =============================================================================
RETURN_TO_BATTLEFIELD: Pattern = re.compile(r'return.*to the battlefield', re.IGNORECASE)
RETURN_IT_TO_BATTLEFIELD: Pattern = re.compile(r'return it to the battlefield', re.IGNORECASE)
RETURN_THAT_CARD_TO_BATTLEFIELD: Pattern = re.compile(r'return that card to the battlefield', re.IGNORECASE)
RETURN_THEM_TO_BATTLEFIELD: Pattern = re.compile(r'return them to the battlefield', re.IGNORECASE)
RETURN_THOSE_CARDS_TO_BATTLEFIELD: Pattern = re.compile(r'return those cards to the battlefield', re.IGNORECASE)
RETURN_TO_HAND: Pattern = re.compile(r'return.*to.*hand', re.IGNORECASE)
RETURN_YOU_CONTROL_TO_HAND: Pattern = re.compile(r'return target.*you control.*to.*hand', re.IGNORECASE)
# =============================================================================
# SCOPE & QUALIFIER PATTERNS
# =============================================================================
OTHER_CREATURES: Pattern = re.compile(r'other creature[s]?', re.IGNORECASE)
ALL_CREATURES: Pattern = re.compile(r'\ball creature[s]?\b', re.IGNORECASE)
ALL_PERMANENTS: Pattern = re.compile(r'\ball permanent[s]?\b', re.IGNORECASE)
ALL_SLIVERS: Pattern = re.compile(r'\ball sliver[s]?\b', re.IGNORECASE)
EQUIPPED_CREATURE: Pattern = re.compile(r'equipped creature', re.IGNORECASE)
ENCHANTED_CREATURE: Pattern = re.compile(r'enchanted creature', re.IGNORECASE)
ENCHANTED_PERMANENT: Pattern = re.compile(r'enchanted permanent', re.IGNORECASE)
ENCHANTED_ENCHANTMENT: Pattern = re.compile(r'enchanted enchantment', re.IGNORECASE)
# =============================================================================
# COMBAT PATTERNS
# =============================================================================
ATTACK: Pattern = re.compile(r'\battack[s]?\b', re.IGNORECASE)
ATTACKS: Pattern = re.compile(r'\battacks\b', re.IGNORECASE)
BLOCK: Pattern = re.compile(r'\bblock[s]?\b', re.IGNORECASE)
BLOCKS: Pattern = re.compile(r'\bblocks\b', re.IGNORECASE)
COMBAT_DAMAGE: Pattern = re.compile(r'combat damage', re.IGNORECASE)
WHENEVER_ATTACKS: Pattern = re.compile(r'whenever .* attacks', re.IGNORECASE)
WHEN_ATTACKS: Pattern = re.compile(r'when .* attacks', re.IGNORECASE)
# =============================================================================
# TYPE LINE PATTERNS
# =============================================================================
INSTANT: Pattern = re.compile(r'\bInstant\b', re.IGNORECASE)
SORCERY: Pattern = re.compile(r'\bSorcery\b', re.IGNORECASE)
ARTIFACT: Pattern = re.compile(r'\bArtifact\b', re.IGNORECASE)
ENCHANTMENT: Pattern = re.compile(r'\bEnchantment\b', re.IGNORECASE)
CREATURE: Pattern = re.compile(r'\bCreature\b', re.IGNORECASE)
PLANESWALKER: Pattern = re.compile(r'\bPlaneswalker\b', re.IGNORECASE)
LAND: Pattern = re.compile(r'\bLand\b', re.IGNORECASE)
AURA: Pattern = re.compile(r'\bAura\b', re.IGNORECASE)
EQUIPMENT: Pattern = re.compile(r'\bEquipment\b', re.IGNORECASE)
VEHICLE: Pattern = re.compile(r'\bVehicle\b', re.IGNORECASE)
SAGA: Pattern = re.compile(r'\bSaga\b', re.IGNORECASE)
NONCREATURE: Pattern = re.compile(r'noncreature', re.IGNORECASE)
# =============================================================================
# PATTERN BUILDER FUNCTIONS
# =============================================================================
def ownership_pattern(subject: str, owner: str = "you") -> Pattern:
"""
Build ownership pattern like 'creatures you control', 'permanents opponent controls'.
Args:
subject: The card type (e.g., 'creature', 'permanent', 'artifact')
owner: Controller ('you', 'opponent', 'they', etc.)
Returns:
Compiled regex pattern
Examples:
>>> ownership_pattern('creature', 'you')
# Matches "creatures you control"
>>> ownership_pattern('artifact', 'opponent')
# Matches "artifacts opponent controls"
"""
pattern = fr'{subject}[s]?\s+{owner}\s+control[s]?'
return re.compile(pattern, re.IGNORECASE)
def grant_pattern(subject: str, verb: str, ability: str) -> Pattern:
"""
Build grant pattern like 'creatures you control gain hexproof'.
Args:
subject: What gains the ability ('creatures you control', 'target creature', etc.)
verb: Grant verb ('gain', 'has', 'get', etc.)
ability: Ability granted ('hexproof', 'flying', 'ward', etc.)
Returns:
Compiled regex pattern
Examples:
>>> grant_pattern('creatures you control', 'gain', 'hexproof')
# Matches "creatures you control gain hexproof"
"""
pattern = fr'{subject}\s+{verb}[s]?\s+{ability}'
return re.compile(pattern, re.IGNORECASE)
def token_creation_pattern(quantity: str, token_type: str) -> Pattern:
"""
Build token creation pattern like 'create two 1/1 Soldier tokens'.
Args:
quantity: Number word or variable ('one', 'two', 'x', etc.)
token_type: Token name ('treasure', 'food', 'soldier', etc.)
Returns:
Compiled regex pattern
Examples:
>>> token_creation_pattern('two', 'treasure')
# Matches "create two Treasure tokens"
"""
pattern = fr'create[s]?\s+(?:{quantity})\s+.*{token_type}\s+token'
return re.compile(pattern, re.IGNORECASE)
def kindred_grant_pattern(tribe: str, ability: str) -> Pattern:
"""
Build kindred grant pattern like 'knights you control gain protection'.
Args:
tribe: Creature type ('knight', 'elf', 'zombie', etc.)
ability: Ability granted ('hexproof', 'protection', etc.)
Returns:
Compiled regex pattern
Examples:
>>> kindred_grant_pattern('knight', 'hexproof')
# Matches "Knights you control gain hexproof"
"""
pattern = fr'{tribe}[s]?\s+you\s+control.*\b{ability}\b'
return re.compile(pattern, re.IGNORECASE)
def targeting_pattern(target: str, subject: Optional[str] = None) -> Pattern:
"""
Build targeting pattern like 'target creature you control'.
Args:
target: What is targeted ('player', 'opponent', 'creature', etc.)
subject: Optional qualifier ('you control', 'opponent controls', etc.)
Returns:
Compiled regex pattern
Examples:
>>> targeting_pattern('creature', 'you control')
# Matches "target creature you control"
>>> targeting_pattern('opponent')
# Matches "target opponent"
"""
if subject:
pattern = fr'target\s+{target}\s+{subject}'
else:
pattern = fr'target\s+{target}'
return re.compile(pattern, re.IGNORECASE)
# =============================================================================
# MODULE EXPORTS
# =============================================================================
__all__ = [
# Ownership
'YOU_CONTROL', 'THEY_CONTROL', 'OPPONENT_CONTROL',
'CREATURE_YOU_CONTROL', 'PERMANENT_YOU_CONTROL', 'ARTIFACT_YOU_CONTROL',
'ENCHANTMENT_YOU_CONTROL',
# Grant verbs
'GAIN', 'HAS', 'HAVE', 'GET', 'GRANT_VERBS',
# Targeting
'TARGET_PLAYER', 'TARGET_OPPONENT', 'TARGET_CREATURE', 'TARGET_PERMANENT',
'TARGET_ARTIFACT', 'TARGET_ENCHANTMENT', 'EACH_PLAYER', 'EACH_OPPONENT',
'TARGET_YOU_CONTROL',
# Protection abilities
'HEXPROOF', 'SHROUD', 'INDESTRUCTIBLE', 'WARD', 'PROTECTION_FROM',
'PROTECTION_ABILITIES', 'CANT_HAVE_PROTECTION', 'LOSE_PROTECTION',
# Draw
'DRAW_A_CARD', 'DRAW_CARDS', 'DRAW',
# Tokens
'CREATE_TOKEN', 'PUT_TOKEN',
'CREATE_TREASURE', 'CREATE_FOOD', 'CREATE_CLUE', 'CREATE_BLOOD',
# Counters
'PLUS_ONE_COUNTER', 'MINUS_ONE_COUNTER', 'LOYALTY_COUNTER', 'PROLIFERATE',
'ONE_OR_MORE_COUNTERS', 'ONE_OR_MORE_PLUS_ONE_COUNTERS', 'IF_HAD_COUNTERS', 'WITH_COUNTERS_ON_THEM',
# Removal
'SACRIFICE', 'SACRIFICED', 'DESTROY', 'EXILE', 'EXILED',
'SACRIFICE_DRAW', 'SACRIFICE_COLON_DRAW', 'SACRIFICED_COMMA_DRAW',
'EXILE_RETURN_BATTLEFIELD',
# Discard
'DISCARD_A_CARD', 'DISCARD_YOUR_HAND', 'YOU_DISCARD',
'WHENEVER_YOU_DISCARD', 'IF_YOU_DISCARDED', 'WHEN_YOU_DISCARD', 'FOR_EACH_DISCARDED',
'TARGET_PLAYER_DISCARDS', 'TARGET_OPPONENT_DISCARDS', 'EACH_PLAYER_DISCARDS',
'EACH_OPPONENT_DISCARDS', 'THAT_PLAYER_DISCARDS',
'ADDITIONAL_COST_DISCARD', 'ADDITIONAL_COST_DISCARD_SHORT', 'MADNESS',
# Damage & Life Loss
'DEALS_ONE_DAMAGE', 'EXACTLY_ONE_DAMAGE', 'LOSES_ONE_LIFE',
# Cost reduction
'COST_LESS', 'COST_LESS_TO_CAST', 'WITH_X_IN_COST', 'AFFINITY_FOR', 'SPELLS_COST', 'SPELLS_YOU_CAST_COST',
# Monarch
'BECOME_MONARCH', 'IS_MONARCH', 'WAS_MONARCH', 'YOU_ARE_MONARCH',
'YOU_BECOME_MONARCH', 'CANT_BECOME_MONARCH',
# Keywords
'PARTNER_BASIC', 'PARTNER_WITH', 'PARTNER_SURVIVORS', 'PARTNER_FATHER_SON',
'FLYING', 'VIGILANCE', 'TRAMPLE', 'HASTE', 'LIFELINK', 'DEATHTOUCH',
'DOUBLE_STRIKE', 'FIRST_STRIKE', 'MENACE', 'REACH',
'UNDYING', 'PERSIST', 'PHASING', 'FLASH', 'TOXIC',
# Return
'RETURN_TO_BATTLEFIELD', 'RETURN_IT_TO_BATTLEFIELD', 'RETURN_THAT_CARD_TO_BATTLEFIELD',
'RETURN_THEM_TO_BATTLEFIELD', 'RETURN_THOSE_CARDS_TO_BATTLEFIELD',
'RETURN_TO_HAND', 'RETURN_YOU_CONTROL_TO_HAND',
# Scope
'OTHER_CREATURES', 'ALL_CREATURES', 'ALL_PERMANENTS', 'ALL_SLIVERS',
'EQUIPPED_CREATURE', 'ENCHANTED_CREATURE', 'ENCHANTED_PERMANENT', 'ENCHANTED_ENCHANTMENT',
# Combat
'ATTACK', 'ATTACKS', 'BLOCK', 'BLOCKS', 'COMBAT_DAMAGE',
'WHENEVER_ATTACKS', 'WHEN_ATTACKS',
# Type line
'INSTANT', 'SORCERY', 'ARTIFACT', 'ENCHANTMENT', 'CREATURE', 'PLANESWALKER', 'LAND',
'AURA', 'EQUIPMENT', 'VEHICLE', 'SAGA', 'NONCREATURE',
# Builders
'ownership_pattern', 'grant_pattern', 'token_creation_pattern',
'kindred_grant_pattern', 'targeting_pattern',
]
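# Illustrative usage sketch (demo only, guarded so it never runs on import). The sample
# rows are invented; assumes pandas is installed, matching the docstring example above.
if __name__ == "__main__":
    import pandas as pd
    _df = pd.DataFrame({"text": ["Creatures you control gain hexproof.", "Draw a card."]})
    _anthem_mask = _df["text"].str.contains(CREATURE_YOU_CONTROL, na=False)  # [True, False]
    _knight_ward = kindred_grant_pattern("knight", "ward")  # matches "Knights you control ... ward"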

View file

@@ -0,0 +1,420 @@
"""
Scope Detection Utilities
Generic utilities for detecting the scope of card abilities (protection, phasing, etc.).
Provides reusable pattern-matching logic to avoid duplication across modules.
Created as part of M2: Create Scope Detection Utilities milestone.
"""
# Standard library imports
import re
from dataclasses import dataclass
from typing import List, Optional, Set, Union
# Local application imports
from . import regex_patterns as rgx
from . import tag_utils
from code.logging_util import get_logger
logger = get_logger(__name__)
@dataclass
class ScopePatterns:
"""
Pattern collections for scope detection.
Attributes:
opponent: Patterns that indicate opponent ownership
self_ref: Patterns that indicate self-reference
your_permanents: Patterns that indicate "you control"
blanket: Patterns that indicate no ownership qualifier
targeted: Patterns that indicate targeting (optional)
"""
opponent: List[re.Pattern]
self_ref: List[re.Pattern]
your_permanents: List[re.Pattern]
blanket: List[re.Pattern]
targeted: Optional[List[re.Pattern]] = None
def detect_scope(
text: str,
card_name: str,
ability_keyword: str,
patterns: ScopePatterns,
allow_multiple: bool = False,
check_grant_verbs: bool = False,
keywords: Optional[str] = None,
) -> Union[Optional[str], Set[str]]:
"""
Generic scope detection with priority ordering.
Detection priority (prevents misclassification):
0. Static keyword (in keywords field or simple list) → "Self"
1. Opponent ownership → "Opponent Permanents"
2. Self-reference → "Self"
3. Your ownership → "Your Permanents"
4. No ownership qualifier → "Blanket"
Args:
text: Card text
card_name: Card name (for self-reference detection)
ability_keyword: Ability keyword to look for (e.g., "hexproof", "phasing")
patterns: ScopePatterns object with pattern collections
allow_multiple: If True, returns Set[str] instead of single scope
check_grant_verbs: If True, checks for grant verbs before assuming "Self"
keywords: Optional keywords field from card data (for static keyword detection)
Returns:
Scope string or None: "Self", "Your Permanents", "Blanket", "Opponent Permanents"
If allow_multiple=True, returns Set[str] with all matching scopes
"""
if not text or not ability_keyword:
return set() if allow_multiple else None
text_lower = text.lower()
ability_lower = ability_keyword.lower()
card_name_lower = card_name.lower() if card_name else ''
# Check if ability is mentioned in text
if ability_lower not in text_lower:
return set() if allow_multiple else None
# Priority 0: Check if this is a static keyword ability
# Static keywords appear in the keywords field or as simple comma-separated lists
# without grant verbs (e.g., "Flying, first strike, protection from black")
if check_static_keyword(ability_keyword, keywords, text):
if allow_multiple:
return {"Self"}
else:
return "Self"
if allow_multiple:
scopes = set()
else:
scopes = None
# Priority 1: Opponent ownership
for pattern in patterns.opponent:
if pattern.search(text_lower):
if allow_multiple:
scopes.add("Opponent Permanents")
break
else:
return "Opponent Permanents"
# Priority 2: Self-reference
is_self = _check_self_reference(text_lower, card_name_lower, ability_lower, patterns.self_ref)
# If check_grant_verbs is True, verify we don't have grant patterns before assuming Self
if is_self and check_grant_verbs:
has_grant_pattern = _has_grant_verbs(text_lower)
if not has_grant_pattern:
if allow_multiple:
scopes.add("Self")
else:
return "Self"
elif is_self:
if allow_multiple:
scopes.add("Self")
else:
return "Self"
# Priority 3: Your ownership
for pattern in patterns.your_permanents:
if pattern.search(text_lower):
if allow_multiple:
scopes.add("Your Permanents")
break
else:
return "Your Permanents"
# Priority 4: Blanket (no ownership qualifier)
for pattern in patterns.blanket:
if pattern.search(text_lower):
# Double-check no ownership was missed
if not rgx.YOU_CONTROL.search(text_lower) and 'opponent' not in text_lower:
if allow_multiple:
scopes.add("Blanket")
break
else:
return "Blanket"
return scopes if allow_multiple else None
def detect_multi_scope(
text: str,
card_name: str,
ability_keyword: str,
patterns: ScopePatterns,
check_grant_verbs: bool = False,
keywords: Optional[str] = None,
) -> Set[str]:
"""
Detect multiple scopes for cards with multiple effects.
Some cards grant abilities to multiple scopes:
- Self-hexproof + grants ward to others
- Target phasing + your permanents phasing
Args:
text: Card text
card_name: Card name
ability_keyword: Ability keyword to look for
patterns: ScopePatterns object
check_grant_verbs: If True, checks for grant verbs before assuming "Self"
keywords: Optional keywords field for static keyword detection
Returns:
Set of scope strings
"""
scopes = set()
if not text or not ability_keyword:
return scopes
text_lower = text.lower()
ability_lower = ability_keyword.lower()
card_name_lower = card_name.lower() if card_name else ''
# Check for static keyword first
if check_static_keyword(ability_keyword, keywords, text):
scopes.add("Self")
# For static keywords, we usually don't have multiple scopes
# But continue checking in case there are additional effects
# Check if ability is mentioned
if ability_lower not in text_lower:
return scopes
# Check opponent patterns
if any(pattern.search(text_lower) for pattern in patterns.opponent):
scopes.add("Opponent Permanents")
# Check self-reference
is_self = _check_self_reference(text_lower, card_name_lower, ability_lower, patterns.self_ref)
if is_self:
if check_grant_verbs:
has_grant_pattern = _has_grant_verbs(text_lower)
if not has_grant_pattern:
scopes.add("Self")
else:
scopes.add("Self")
# Check your permanents
if any(pattern.search(text_lower) for pattern in patterns.your_permanents):
scopes.add("Your Permanents")
# Check blanket (no ownership)
has_blanket = any(pattern.search(text_lower) for pattern in patterns.blanket)
no_ownership = not rgx.YOU_CONTROL.search(text_lower) and 'opponent' not in text_lower
if has_blanket and no_ownership:
scopes.add("Blanket")
# Optional: Check for targeting
if patterns.targeted:
if any(pattern.search(text_lower) for pattern in patterns.targeted):
scopes.add("Targeted")
return scopes
def _check_self_reference(
text_lower: str,
card_name_lower: str,
ability_lower: str,
self_patterns: List[re.Pattern]
) -> bool:
"""
Check if text contains self-reference patterns.
Args:
text_lower: Lowercase card text
card_name_lower: Lowercase card name
ability_lower: Lowercase ability keyword
self_patterns: List of self-reference patterns
Returns:
True if self-reference found
"""
# Check provided self patterns
for pattern in self_patterns:
if pattern.search(text_lower):
return True
# Check for card name reference (if provided)
if card_name_lower:
card_name_escaped = re.escape(card_name_lower)
card_name_pattern = re.compile(rf'\b{card_name_escaped}\b', re.IGNORECASE)
if card_name_pattern.search(text_lower):
# Make sure it's in a self-ability context
self_context_patterns = [
re.compile(rf'\b{card_name_escaped}\s+(?:has|gains?)\s+{ability_lower}', re.IGNORECASE),
re.compile(rf'\b{card_name_escaped}\s+is\s+{ability_lower}', re.IGNORECASE),
]
for pattern in self_context_patterns:
if pattern.search(text_lower):
return True
return False
def _has_grant_verbs(text_lower: str) -> bool:
"""
Check if text contains grant verb patterns.
Used to distinguish inherent abilities from granted abilities.
Args:
text_lower: Lowercase card text
Returns:
True if grant verbs found
"""
grant_patterns = [
re.compile(r'(?:have|gain|grant|give|get)[s]?\s+', re.IGNORECASE),
rgx.OTHER_CREATURES,
rgx.CREATURE_YOU_CONTROL,
rgx.PERMANENT_YOU_CONTROL,
rgx.EQUIPPED_CREATURE,
rgx.ENCHANTED_CREATURE,
rgx.TARGET_CREATURE,
]
return any(pattern.search(text_lower) for pattern in grant_patterns)
def format_scope_tag(scope: str, ability: str) -> str:
"""
Format a scope and ability into a metadata tag.
Args:
scope: Scope string (e.g., "Self", "Your Permanents")
ability: Ability name (e.g., "Hexproof", "Phasing")
Returns:
Formatted tag string (e.g., "Self: Hexproof")
"""
return f"{scope}: {ability}"
def has_keyword(text: str, keywords: List[str]) -> bool:
"""
Quick check if card text contains any of the specified keywords.
Args:
text: Card text
keywords: List of keywords to search for
Returns:
True if any keyword found
"""
if not text:
return False
text_lower = text.lower()
return any(keyword.lower() in text_lower for keyword in keywords)
def check_static_keyword(
ability_keyword: str,
keywords: Optional[str] = None,
text: Optional[str] = None
) -> bool:
"""
Check if card has ability as a static keyword (not granted to others).
A static keyword is one that appears:
1. In the keywords field, OR
2. As a simple comma-separated list without grant verbs
(e.g., "Flying, first strike, protection from black")
Args:
ability_keyword: Ability to check (e.g., "Protection", "Hexproof")
keywords: Optional keywords field from card data
text: Optional card text for fallback detection
Returns:
True if ability appears as static keyword
"""
ability_lower = ability_keyword.lower()
# Check keywords field first (most reliable)
if keywords:
keywords_lower = keywords.lower()
if ability_lower in keywords_lower:
return True
# Fallback: Check if ability appears in simple comma-separated keyword list
# Pattern: starts with keywords (Flying, First strike, etc.) without grant verbs
# Example: "Flying, first strike, vigilance, trample, haste, protection from black"
if text:
text_lower = text.lower()
# Check if ability appears in text but WITHOUT grant verbs
if ability_lower in text_lower:
# Look for grant verbs that would indicate this is NOT a static keyword
grant_verbs = ['have', 'has', 'gain', 'gains', 'get', 'gets', 'grant', 'grants', 'give', 'gives']
# Find the position of the ability in text
ability_pos = text_lower.find(ability_lower)
# Check the 50 characters before the ability for grant verbs
# This catches patterns like "creatures gain protection" or "has hexproof"
context_before = text_lower[max(0, ability_pos - 50):ability_pos]
# If no grant verbs found nearby, it's likely a static keyword
if not any(verb in context_before for verb in grant_verbs):
# Additional check: is it part of a comma-separated list?
# This helps with "Flying, first strike, protection from X" patterns
context_before_30 = text_lower[max(0, ability_pos - 30):ability_pos]
if ',' in context_before_30 or ability_pos < 10:
return True
return False
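# Illustrative sketch (hypothetical texts): keyword-line protection counts as static,
# granted protection does not.
_static_line = check_static_keyword(
    "Protection", keywords=None, text="Flying, first strike, protection from black"
)  # expected: True
_granted_line = check_static_keyword(
    "Protection", keywords=None, text="Creatures you control gain protection from red"
)  # expected: False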
def check_static_keyword_legacy(
keywords: str,
static_keyword: str,
text: str,
grant_patterns: Optional[List[re.Pattern]] = None
) -> bool:
"""
LEGACY: Check if card has static keyword without granting it to others.
Used for abilities like "Phasing" that can be both static and granted.
Args:
keywords: Card keywords field
static_keyword: Keyword to search for (e.g., "phasing")
text: Card text
grant_patterns: Optional patterns to check for granting language
Returns:
True if static keyword found and not granted to others
"""
if not keywords:
return False
keywords_lower = keywords.lower()
if static_keyword.lower() not in keywords_lower:
return False
# If grant patterns provided, check if card grants to others
if grant_patterns:
text_no_reminder = tag_utils.strip_reminder_text(text.lower()) if text else ''
grants_to_others = any(pattern.search(text_no_reminder) for pattern in grant_patterns)
# Only return True if NOT granting to others
return not grants_to_others
return True
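# Illustrative sketch: a minimal, hypothetical ScopePatterns bundle is enough to exercise
# the shared detector; real callers (protection, phasing) build richer pattern lists.
_demo_patterns = ScopePatterns(
    opponent=[re.compile(r'creatures?\s+opponents?\s+control\s+(?:have|gain)', re.IGNORECASE)],
    self_ref=[re.compile(r'this creature (?:has|gains?) hexproof', re.IGNORECASE)],
    your_permanents=[re.compile(r'creatures?\s+you\s+control\s+(?:have|gain)', re.IGNORECASE)],
    blanket=[re.compile(r'all creatures?\s+(?:have|gain)', re.IGNORECASE)],
    targeted=[re.compile(r'target creature gains', re.IGNORECASE)],
)
_anthem_scopes = detect_multi_scope(
    "Creatures you control gain hexproof.", "Demo Anthem", "hexproof", _demo_patterns
)  # expected: {"Your Permanents"}
_trick_scopes = detect_multi_scope(
    "Target creature gains hexproof until end of turn.", "Demo Trick", "hexproof", _demo_patterns
)  # expected: {"Targeted"}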

View file

@@ -1,13 +1,59 @@
from typing import Dict, List, Final
"""
Tag Constants Module
Centralized constants for card tagging and theme detection across the MTG deckbuilder.
This module contains all shared constants used by the tagging system including:
- Card types and creature types
- Pattern groups and regex fragments
- Tag groupings and relationships
- Protection and ability keywords
- Magic numbers and thresholds
"""
from typing import Dict, Final, List
# =============================================================================
# TABLE OF CONTENTS
# =============================================================================
# 1. TRIGGERS & BASIC PATTERNS
# 2. TAG GROUPS & RELATIONSHIPS
# 3. PATTERN GROUPS & REGEX FRAGMENTS
# 4. PHRASE GROUPS
# 5. COUNTER TYPES
# 6. CREATURE TYPES
# 7. NON-CREATURE TYPES & SPECIAL TYPES
# 8. PROTECTION & ABILITY KEYWORDS
# 9. TOKEN TYPES
# 10. MAGIC NUMBERS & THRESHOLDS
# 11. DATAFRAME COLUMN REQUIREMENTS
# 12. TYPE-TAG MAPPINGS
# 13. DRAW-RELATED CONSTANTS
# 14. EQUIPMENT-RELATED CONSTANTS
# 15. AURA & VOLTRON CONSTANTS
# 16. LANDS MATTER PATTERNS
# 17. SACRIFICE & GRAVEYARD PATTERNS
# 18. CREATURE-RELATED PATTERNS
# 19. TOKEN-RELATED PATTERNS
# 20. REMOVAL & DESTRUCTION PATTERNS
# 21. SPELL-RELATED PATTERNS
# 22. MISC PATTERNS & EXCLUSIONS
# =============================================================================
# 1. TRIGGERS & BASIC PATTERNS
# =============================================================================
TRIGGERS: List[str] = ['when', 'whenever', 'at']
NUM_TO_SEARCH: List[str] = ['a', 'an', 'one', '1', 'two', '2', 'three', '3', 'four','4', 'five', '5',
'six', '6', 'seven', '7', 'eight', '8', 'nine', '9', 'ten', '10',
'x','one or more']
NUM_TO_SEARCH: List[str] = [
'a', 'an', 'one', '1', 'two', '2', 'three', '3', 'four', '4', 'five', '5',
'six', '6', 'seven', '7', 'eight', '8', 'nine', '9', 'ten', '10',
'x', 'one or more'
]
# =============================================================================
# 2. TAG GROUPS & RELATIONSHIPS
# =============================================================================
# Constants for common tag groupings
TAG_GROUPS: Dict[str, List[str]] = {
"Cantrips": ["Cantrips", "Card Draw", "Spellslinger", "Spells Matter"],
"Tokens": ["Token Creation", "Tokens Matter"],
@ -19,8 +65,11 @@ TAG_GROUPS: Dict[str, List[str]] = {
"Spells": ["Spellslinger", "Spells Matter"]
}
# Common regex patterns
PATTERN_GROUPS: Dict[str, str] = {
# =============================================================================
# 3. PATTERN GROUPS & REGEX FRAGMENTS
# =============================================================================
PATTERN_GROUPS: Dict[str, str] = {
"draw": r"draw[s]? a card|draw[s]? one card",
"combat": r"attack[s]?|block[s]?|combat damage",
"tokens": r"create[s]? .* token|put[s]? .* token",
@ -30,7 +79,10 @@ PATTERN_GROUPS: Dict[str, str] = {
"cost_reduction": r"cost[s]? \{[\d\w]\} less|affinity for|cost[s]? less to cast|chosen type cost|copy cost|from exile cost|from exile this turn cost|from your graveyard cost|has undaunted|have affinity for artifacts|other than your hand cost|spells cost|spells you cast cost|that target .* cost|those spells cost|you cast cost|you pay cost"
}
# Common phrase groups (lists) used across taggers
# =============================================================================
# 4. PHRASE GROUPS
# =============================================================================
PHRASE_GROUPS: Dict[str, List[str]] = {
# Variants for monarch wording
"monarch": [
@ -52,11 +104,15 @@ PHRASE_GROUPS: Dict[str, List[str]] = {
r"return .* to the battlefield"
]
}
# Common action patterns
CREATE_ACTION_PATTERN: Final[str] = r"create|put"
# Creature/Counter types
COUNTER_TYPES: List[str] = [r'\+0/\+1', r'\+0/\+2', r'\+1/\+0', r'\+1/\+2', r'\+2/\+0', r'\+2/\+2',
# =============================================================================
# 5. COUNTER TYPES
# =============================================================================
COUNTER_TYPES: List[str] = [
r'\+0/\+1', r'\+0/\+2', r'\+1/\+0', r'\+1/\+2', r'\+2/\+0', r'\+2/\+2',
'-0/-1', '-0/-2', '-1/-0', '-1/-2', '-2/-0', '-2/-2',
'Acorn', 'Aegis', 'Age', 'Aim', 'Arrow', 'Arrowhead', 'Awakening',
'Bait', 'Blaze', 'Blessing', 'Blight', 'Blood', 'Bloodline',
@ -90,9 +146,15 @@ COUNTER_TYPES: List[str] = [r'\+0/\+1', r'\+0/\+2', r'\+1/\+0', r'\+1/\+2', r'\+
'Task', 'Ticket', 'Tide', 'Time', 'Tower', 'Training', 'Trap',
'Treasure', 'Unity', 'Unlock', 'Valor', 'Velocity', 'Verse',
'Vitality', 'Void', 'Volatile', 'Vortex', 'Vow', 'Voyage', 'Wage',
'Winch', 'Wind', 'Wish']
'Winch', 'Wind', 'Wish'
]
CREATURE_TYPES: List[str] = ['Advisor', 'Aetherborn', 'Alien', 'Ally', 'Angel', 'Antelope', 'Ape', 'Archer', 'Archon', 'Armadillo',
# =============================================================================
# 6. CREATURE TYPES
# =============================================================================
CREATURE_TYPES: List[str] = [
'Advisor', 'Aetherborn', 'Alien', 'Ally', 'Angel', 'Antelope', 'Ape', 'Archer', 'Archon', 'Armadillo',
'Army', 'Artificer', 'Assassin', 'Assembly-Worker', 'Astartes', 'Atog', 'Aurochs', 'Automaton',
'Avatar', 'Azra', 'Badger', 'Balloon', 'Barbarian', 'Bard', 'Basilisk', 'Bat', 'Bear', 'Beast', 'Beaver',
'Beeble', 'Beholder', 'Berserker', 'Bird', 'Blinkmoth', 'Boar', 'Brainiac', 'Bringer', 'Brushwagg',
@ -122,9 +184,15 @@ CREATURE_TYPES: List[str] = ['Advisor', 'Aetherborn', 'Alien', 'Ally', 'Angel',
'Thopter', 'Thrull', 'Tiefling', 'Time Lord', 'Toy', 'Treefolk', 'Trilobite', 'Triskelavite', 'Troll',
'Turtle', 'Tyranid', 'Unicorn', 'Urzan', 'Vampire', 'Varmint', 'Vedalken', 'Volver', 'Wall', 'Walrus',
'Warlock', 'Warrior', 'Wasp', 'Weasel', 'Weird', 'Werewolf', 'Whale', 'Wizard', 'Wolf', 'Wolverine', 'Wombat',
'Worm', 'Wraith', 'Wurm', 'Yeti', 'Zombie', 'Zubera']
'Worm', 'Wraith', 'Wurm', 'Yeti', 'Zombie', 'Zubera'
]
NON_CREATURE_TYPES: List[str] = ['Legendary', 'Creature', 'Enchantment', 'Artifact',
# =============================================================================
# 7. NON-CREATURE TYPES & SPECIAL TYPES
# =============================================================================
NON_CREATURE_TYPES: List[str] = [
'Legendary', 'Creature', 'Enchantment', 'Artifact',
'Battle', 'Sorcery', 'Instant', 'Land', '-', '',
'Blood', 'Clue', 'Food', 'Gold', 'Incubator',
'Junk', 'Map', 'Powerstone', 'Treasure',
@ -136,23 +204,66 @@ NON_CREATURE_TYPES: List[str] = ['Legendary', 'Creature', 'Enchantment', 'Artifa
'Shrine',
'Plains', 'Island', 'Swamp', 'Forest', 'Mountain',
'Cave', 'Desert', 'Gate', 'Lair', 'Locus', 'Mine',
'Power-Plant', 'Sphere', 'Tower', 'Urza\'s']
'Power-Plant', 'Sphere', 'Tower', 'Urza\'s'
]
OUTLAW_TYPES: List[str] = ['Assassin', 'Mercenary', 'Pirate', 'Rogue', 'Warlock']
ENCHANTMENT_TOKENS: List[str] = ['Cursed Role', 'Monster Role', 'Royal Role', 'Sorcerer Role',
'Virtuous Role', 'Wicked Role', 'Young Hero Role', 'Shard']
ARTIFACT_TOKENS: List[str] = ['Blood', 'Clue', 'Food', 'Gold', 'Incubator',
'Junk','Map','Powerstone', 'Treasure']
# =============================================================================
# 8. PROTECTION & ABILITY KEYWORDS
# =============================================================================
PROTECTION_ABILITIES: List[str] = [
'Protection',
'Ward',
'Hexproof',
'Shroud',
'Indestructible'
]
PROTECTION_KEYWORDS: Final[frozenset] = frozenset({
'hexproof',
'shroud',
'indestructible',
'ward',
'protection from',
'protection',
})
# =============================================================================
# 9. TOKEN TYPES
# =============================================================================
ENCHANTMENT_TOKENS: List[str] = [
'Cursed Role', 'Monster Role', 'Royal Role', 'Sorcerer Role',
'Virtuous Role', 'Wicked Role', 'Young Hero Role', 'Shard'
]
ARTIFACT_TOKENS: List[str] = [
'Blood', 'Clue', 'Food', 'Gold', 'Incubator',
'Junk', 'Map', 'Powerstone', 'Treasure'
]
# =============================================================================
# 10. MAGIC NUMBERS & THRESHOLDS
# =============================================================================
CONTEXT_WINDOW_SIZE: Final[int] = 70 # Characters to examine around a regex match
# =============================================================================
# 11. DATAFRAME COLUMN REQUIREMENTS
# =============================================================================
# Constants for DataFrame validation and processing
REQUIRED_COLUMNS: List[str] = [
'name', 'faceName', 'edhrecRank', 'colorIdentity', 'colors',
'manaCost', 'manaValue', 'type', 'creatureTypes', 'text',
'power', 'toughness', 'keywords', 'themeTags', 'layout', 'side'
]
# Mapping of card types to their corresponding theme tags
# =============================================================================
# 12. TYPE-TAG MAPPINGS
# =============================================================================
TYPE_TAG_MAPPING: Dict[str, List[str]] = {
'Artifact': ['Artifacts Matter'],
'Battle': ['Battles Matter'],
@ -166,7 +277,10 @@ TYPE_TAG_MAPPING: Dict[str, List[str]] = {
'Sorcery': ['Spells Matter', 'Spellslinger']
}
# Constants for draw-related functionality
# =============================================================================
# 13. DRAW-RELATED CONSTANTS
# =============================================================================
DRAW_RELATED_TAGS: List[str] = [
'Card Draw', # General card draw effects
'Conditional Draw', # Draw effects with conditions/triggers
@ -175,16 +289,18 @@ DRAW_RELATED_TAGS: List[str] = [
'Loot', # Draw + discard effects
'Replacement Draw', # Effects that modify or replace draws
'Sacrifice to Draw', # Draw effects requiring sacrificing permanents
'Unconditional Draw' # Pure card draw without conditions
]
# Text patterns that exclude cards from being tagged as unconditional draw
DRAW_EXCLUSION_PATTERNS: List[str] = [
'annihilator', # Eldrazi mechanic that can match 'draw' patterns
'ravenous', # Keyword that can match 'draw' patterns
]
# Equipment-related constants
# =============================================================================
# 14. EQUIPMENT-RELATED CONSTANTS
# =============================================================================
EQUIPMENT_EXCLUSIONS: List[str] = [
'Bruenor Battlehammer', # Equipment cost reduction
'Nazahn, Revered Bladesmith', # Equipment tutor
@ -223,7 +339,10 @@ EQUIPMENT_TEXT_PATTERNS: List[str] = [
'unequip', # Equipment removal
]
# Aura-related constants
# =============================================================================
# 15. AURA & VOLTRON CONSTANTS
# =============================================================================
AURA_SPECIFIC_CARDS: List[str] = [
'Ardenn, Intrepid Archaeologist', # Aura movement
'Calix, Guided By Fate', # Create duplicate Auras
@ -267,7 +386,10 @@ VOLTRON_PATTERNS: List[str] = [
'reconfigure'
]
# Constants for lands matter functionality
# =============================================================================
# 16. LANDS MATTER PATTERNS
# =============================================================================
LANDS_MATTER_PATTERNS: Dict[str, List[str]] = {
'land_play': [
'play a land',
@ -849,4 +971,110 @@ TOPDECK_EXCLUSION_PATTERNS: List[str] = [
'from the top of their library',
'look at the top card of target player\'s library',
'reveal the top card of target player\'s library'
]
# ==============================================================================
# Keyword Normalization (M1 - Tagging Refinement)
# ==============================================================================
# Keyword normalization map: variant -> canonical
# Maps Commander-specific and variant keywords to their canonical forms
KEYWORD_NORMALIZATION_MAP: Dict[str, str] = {
# Commander variants
'Commander ninjutsu': 'Ninjutsu',
'Commander Ninjutsu': 'Ninjutsu',
# Partner variants (already excluded but mapped for reference)
'Partner with': 'Partner',
'Choose a Background': 'Choose a Background', # Keep distinct
"Doctor's Companion": "Doctor's Companion", # Keep distinct
# Case normalization for common keywords (most are already correct)
'flying': 'Flying',
'trample': 'Trample',
'vigilance': 'Vigilance',
'haste': 'Haste',
'deathtouch': 'Deathtouch',
'lifelink': 'Lifelink',
'menace': 'Menace',
'reach': 'Reach',
}
# Keywords that should never appear in theme tags
# Already excluded during keyword tagging, but documented here
KEYWORD_EXCLUSION_SET: set[str] = {
'partner', # Already excluded in tag_for_keywords
}
# Keyword allowlist - keywords that should survive singleton pruning
# Seeded from top keywords and theme whitelist
KEYWORD_ALLOWLIST: set[str] = {
# Evergreen keywords (top 50 from baseline)
'Flying', 'Enchant', 'Trample', 'Vigilance', 'Haste', 'Equip', 'Flash',
'Mill', 'Scry', 'Transform', 'Cycling', 'First strike', 'Reach', 'Menace',
'Lifelink', 'Treasure', 'Defender', 'Deathtouch', 'Kicker', 'Flashback',
'Protection', 'Surveil', 'Landfall', 'Crew', 'Ward', 'Morph', 'Devoid',
'Investigate', 'Fight', 'Food', 'Partner', 'Double strike', 'Indestructible',
'Threshold', 'Proliferate', 'Convoke', 'Hexproof', 'Cumulative upkeep',
'Goad', 'Delirium', 'Prowess', 'Suspend', 'Affinity', 'Madness', 'Manifest',
'Amass', 'Domain', 'Unearth', 'Explore', 'Changeling',
# Additional important mechanics
'Myriad', 'Cascade', 'Storm', 'Dredge', 'Delve', 'Escape', 'Mutate',
'Ninjutsu', 'Overload', 'Rebound', 'Retrace', 'Bloodrush', 'Cipher',
'Extort', 'Evolve', 'Undying', 'Persist', 'Wither', 'Infect', 'Annihilator',
'Exalted', 'Phasing', 'Shadow', 'Horsemanship', 'Banding', 'Rampage',
'Shroud', 'Split second', 'Totem armor', 'Living weapon', 'Undaunted',
'Improvise', 'Surge', 'Emerge', 'Escalate', 'Meld', 'Partner', 'Afflict',
'Aftermath', 'Embalm', 'Eternalize', 'Exert', 'Fabricate', 'Improvise',
'Assist', 'Jump-start', 'Mentor', 'Riot', 'Spectacle', 'Addendum',
'Afterlife', 'Adapt', 'Enrage', 'Ascend', 'Learn', 'Boast', 'Foretell',
'Squad', 'Encore', 'Daybound', 'Nightbound', 'Disturb', 'Cleave', 'Training',
'Reconfigure', 'Blitz', 'Casualty', 'Connive', 'Hideaway', 'Prototype',
'Read ahead', 'Living metal', 'More than meets the eye', 'Ravenous',
'Squad', 'Toxic', 'For Mirrodin!', 'Backup', 'Bargain', 'Craft', 'Freerunning',
'Plot', 'Spree', 'Offspring', 'Bestow', 'Monstrosity', 'Tribute',
# Partner mechanics (distinct types)
'Choose a Background', "Doctor's Companion",
# Token types (frequently used)
'Blood', 'Clue', 'Food', 'Gold', 'Treasure', 'Powerstone',
# Common ability words
'Landfall', 'Raid', 'Revolt', 'Threshold', 'Metalcraft', 'Morbid',
'Bloodthirst', 'Battalion', 'Channel', 'Grandeur', 'Kinship', 'Sweep',
'Radiance', 'Join forces', 'Fateful hour', 'Inspired', 'Heroic',
'Constellation', 'Strive', 'Prowess', 'Ferocious', 'Formidable', 'Renown',
'Tempting offer', 'Will of the council', 'Parley', 'Adamant', 'Devotion',
}
# ==============================================================================
# Metadata Tag Classification (M3 - Tagging Refinement)
# ==============================================================================
# Metadata tag prefixes - tags starting with these are classified as metadata
METADATA_TAG_PREFIXES: List[str] = [
'Applied:',
'Bracket:',
'Diagnostic:',
'Internal:',
]
# Specific metadata tags (full match) - additional tags to classify as metadata
# These are typically diagnostic, bracket-related, or internal annotations
METADATA_TAG_ALLOWLIST: set[str] = {
# Bracket annotations
'Bracket: Game Changer',
'Bracket: Staple',
'Bracket: Format Warping',
# Cost reduction diagnostics (from Applied: namespace)
'Applied: Cost Reduction',
# Kindred-specific protection metadata (from M2)
# Format: "{CreatureType}s Gain Protection"
# These are auto-generated for kindred-specific protection grants
# Example: "Knights Gain Protection", "Frogs Gain Protection"
# Note: These are dynamically generated, so classify_tag matches them by pattern rather than listing them here
}

View file

@ -13,18 +13,11 @@ The module is designed to work with pandas DataFrames containing card data and p
vectorized operations for efficient processing of large card collections.
"""
from __future__ import annotations
# Standard library imports
import re
from typing import List, Set, Union, Any, Tuple
from functools import lru_cache
from typing import Any, List, Set, Tuple, Union
import numpy as np
# Third-party imports
import pandas as pd
# Local application imports
from . import tag_constants
@ -58,7 +51,6 @@ def _ensure_norm_series(df: pd.DataFrame, source_col: str, norm_col: str) -> pd.
"""
if norm_col in df.columns:
return df[norm_col]
# Create normalized string series
series = df[source_col].fillna('') if source_col in df.columns else pd.Series([''] * len(df), index=df.index)
series = series.astype(str)
df[norm_col] = series
@ -120,8 +112,6 @@ def create_type_mask(df: pd.DataFrame, type_text: Union[str, List[str]], regex:
if len(df) == 0:
return pd.Series([], dtype=bool)
# Use normalized cached series
type_series = _ensure_norm_series(df, 'type', '__type_s')
if regex:
@ -160,8 +150,6 @@ def create_text_mask(df: pd.DataFrame, type_text: Union[str, List[str]], regex:
if len(df) == 0:
return pd.Series([], dtype=bool)
# Use normalized cached series
text_series = _ensure_norm_series(df, 'text', '__text_s')
if regex:
@ -192,10 +180,7 @@ def create_keyword_mask(df: pd.DataFrame, type_text: Union[str, List[str]], rege
TypeError: If type_text is not a string or list of strings
ValueError: If required 'keywords' column is missing from DataFrame
"""
# Validate required columns
validate_dataframe_columns(df, {'keywords'})
# Handle empty DataFrame case
if len(df) == 0:
return pd.Series([], dtype=bool)
@ -206,8 +191,6 @@ def create_keyword_mask(df: pd.DataFrame, type_text: Union[str, List[str]], rege
type_text = [type_text]
elif not isinstance(type_text, list):
raise TypeError("type_text must be a string or list of strings")
# Use normalized cached series for keywords
keywords = _ensure_norm_series(df, 'keywords', '__keywords_s')
if regex:
@ -245,8 +228,6 @@ def create_name_mask(df: pd.DataFrame, type_text: Union[str, List[str]], regex:
if len(df) == 0:
return pd.Series([], dtype=bool)
# Use normalized cached series
name_series = _ensure_norm_series(df, 'name', '__name_s')
if regex:
@ -324,21 +305,14 @@ def create_tag_mask(df: pd.DataFrame, tag_patterns: Union[str, List[str]], colum
Boolean Series indicating matching rows
Examples:
# Match cards with draw-related tags
>>> mask = create_tag_mask(df, ['Card Draw', 'Conditional Draw'])
>>> mask = create_tag_mask(df, 'Unconditional Draw')
"""
if isinstance(tag_patterns, str):
tag_patterns = [tag_patterns]
# Handle empty DataFrame case
if len(df) == 0:
return pd.Series([], dtype=bool)
# Create mask for each pattern
masks = [df[column].apply(lambda x: any(pattern in tag for tag in x)) for pattern in tag_patterns]
# Combine masks with OR
return pd.concat(masks, axis=1).any(axis=1)
def validate_dataframe_columns(df: pd.DataFrame, required_columns: Set[str]) -> None:
@ -365,11 +339,7 @@ def apply_tag_vectorized(df: pd.DataFrame, mask: pd.Series[bool], tags: Union[st
"""
if not isinstance(tags, list):
tags = [tags]
# Get current tags for masked rows
current_tags = df.loc[mask, 'themeTags']
# Add new tags
df.loc[mask, 'themeTags'] = current_tags.apply(lambda x: sorted(list(set(x + tags))))
def apply_rules(df: pd.DataFrame, rules: List[dict]) -> None:
@ -463,7 +433,6 @@ def create_numbered_phrase_mask(
numbers = tag_constants.NUM_TO_SEARCH
# Normalize verbs to list
verbs = [verb] if isinstance(verb, str) else verb
# Build patterns
if noun:
patterns = [fr"{v}\s+{num}\s+{noun}" for v in verbs for num in numbers]
else:
@ -490,13 +459,8 @@ def create_mass_damage_mask(df: pd.DataFrame) -> pd.Series[bool]:
Returns:
Boolean Series indicating which cards have mass damage effects
"""
# Create patterns for numeric damage
number_patterns = [create_damage_pattern(i) for i in range(1, 21)]
# Add X damage pattern
number_patterns.append(create_damage_pattern('X'))
# Add patterns for damage targets
target_patterns = [
'to each creature',
'to all creatures',
@ -504,9 +468,385 @@ def create_mass_damage_mask(df: pd.DataFrame) -> pd.Series[bool]:
'to each opponent',
'to everything'
]
# Create masks
damage_mask = create_text_mask(df, number_patterns)
target_mask = create_text_mask(df, target_patterns)
return damage_mask & target_mask
# ==============================================================================
# Keyword Normalization (M1 - Tagging Refinement)
# ==============================================================================
def normalize_keywords(
raw: Union[List[str], Set[str], Tuple[str, ...]],
allowlist: Set[str],
frequency_map: dict[str, int]
) -> list[str]:
"""Normalize keyword strings for theme tagging.
Applies normalization rules:
1. Case normalization (via normalization map)
2. Canonical mapping (e.g., "Commander Ninjutsu" -> "Ninjutsu")
3. Singleton pruning (unless allowlisted)
4. Deduplication
5. Exclusion of blacklisted keywords
Args:
raw: Iterable of raw keyword strings
allowlist: Set of keywords that should survive singleton pruning
frequency_map: Dict mapping keywords to their occurrence count
Returns:
Deduplicated list of normalized keywords
Raises:
ValueError: If raw is not iterable or is a bare string/bytes value
Examples:
>>> normalize_keywords(
... ['Commander Ninjutsu', 'Flying', 'Allons-y!'],
... {'Flying', 'Ninjutsu'},
... {'Commander Ninjutsu': 2, 'Flying': 100, 'Allons-y!': 1}
... )
['Flying', 'Ninjutsu'] # 'Allons-y!' pruned as singleton
"""
if not hasattr(raw, '__iter__') or isinstance(raw, (str, bytes)):
raise ValueError(f"raw must be iterable, got {type(raw)}")
normalized_keywords: set[str] = set()
for keyword in raw:
if not isinstance(keyword, str):
continue
keyword = keyword.strip()
if not keyword:
continue
if keyword.lower() in tag_constants.KEYWORD_EXCLUSION_SET:
continue
normalized = tag_constants.KEYWORD_NORMALIZATION_MAP.get(keyword, keyword)
frequency = frequency_map.get(keyword, 0)
is_singleton = frequency == 1
is_allowlisted = normalized in allowlist or keyword in allowlist
# Prune singletons that aren't allowlisted
if is_singleton and not is_allowlisted:
continue
normalized_keywords.add(normalized)
return sorted(list(normalized_keywords))
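# Illustrative sketch (not from the repository): one plausible way to drive
# normalize_keywords across a whole card pool. The frequency map is built once
# with collections.Counter, then each card's keyword list is normalized against
# it. The card names and the "Allons-y!" one-off keyword are invented examples.
def _example_normalize_pool() -> None:
    from collections import Counter

    pool = {
        "Card A": ["Flying", "Commander Ninjutsu"],
        "Card B": ["Flying", "Myriad"],
        "Card C": ["Allons-y!"],
    }
    frequency_map = Counter(kw for kws in pool.values() for kw in kws)
    normalized = {
        name: normalize_keywords(kws, tag_constants.KEYWORD_ALLOWLIST, frequency_map)
        for name, kws in pool.items()
    }
    # 'Commander Ninjutsu' is mapped to 'Ninjutsu'; 'Myriad' survives singleton
    # pruning via the allowlist; 'Allons-y!' is pruned as an unlisted one-off.
    assert normalized["Card A"] == ["Flying", "Ninjutsu"]
    assert normalized["Card B"] == ["Flying", "Myriad"]
    assert normalized["Card C"] == []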
# ==============================================================================
# M3: Metadata vs Theme Tag Classification
# ==============================================================================
def classify_tag(tag: str) -> str:
"""Classify a tag as either 'metadata' or 'theme'.
Metadata tags are diagnostic, bracket-related, or internal annotations that
should not appear in theme catalogs or player-facing tag lists. Theme tags
represent gameplay mechanics and deck archetypes.
Classification rules (in order of precedence):
1. Prefix match: Tags starting with METADATA_TAG_PREFIXES -> metadata
2. Exact match: Tags in METADATA_TAG_ALLOWLIST -> metadata
3. Kindred pattern: "{Type}s Gain Protection" -> metadata
4. Default: All other tags -> theme
Args:
tag: Tag string to classify
Returns:
"metadata" or "theme"
Examples:
>>> classify_tag("Applied: Cost Reduction")
'metadata'
>>> classify_tag("Bracket: Game Changer")
'metadata'
>>> classify_tag("Knights Gain Protection")
'metadata'
>>> classify_tag("Card Draw")
'theme'
>>> classify_tag("Spellslinger")
'theme'
"""
# Prefix-based classification
for prefix in tag_constants.METADATA_TAG_PREFIXES:
if tag.startswith(prefix):
return "metadata"
# Exact match classification
if tag in tag_constants.METADATA_TAG_ALLOWLIST:
return "metadata"
# Kindred protection metadata patterns: "{Type} Gain {Ability}"
# Covers all protective abilities: Protection, Ward, Hexproof, Shroud, Indestructible
# Examples: "Knights Gain Protection", "Spiders Gain Ward", "Merfolk Gain Ward"
# Note: Checks for " Gain " pattern since some creature types like "Merfolk" don't end in 's'
kindred_abilities = ["Protection", "Ward", "Hexproof", "Shroud", "Indestructible"]
for ability in kindred_abilities:
if " Gain " in tag and tag.endswith(ability):
return "metadata"
# Protection scope metadata patterns (M5): "{Scope}: {Ability}"
# Indicates whether protection applies to self, your permanents, all permanents, or opponent's permanents
# Examples: "Self: Hexproof", "Your Permanents: Ward", "Blanket: Indestructible"
# These enable deck builder to filter for board-relevant protection vs self-only
protection_scopes = ["Self:", "Your Permanents:", "Blanket:", "Opponent Permanents:"]
for scope in protection_scopes:
if tag.startswith(scope):
return "metadata"
# Phasing scope metadata patterns: "{Scope}: Phasing"
# Indicates whether phasing applies to self, your permanents, all permanents, or opponents
# Examples: "Self: Phasing", "Your Permanents: Phasing", "Blanket: Phasing",
# "Targeted: Phasing", "Opponent Permanents: Phasing"
# Similar to protection scopes, enables filtering for board-relevant phasing
# Opponent Permanents: Phasing also triggers Removal tag (removal-style phasing)
if tag in ["Self: Phasing", "Your Permanents: Phasing", "Blanket: Phasing",
"Targeted: Phasing", "Opponent Permanents: Phasing"]:
return "metadata"
# Default: treat as theme tag
return "theme"
# --- Text Processing Helpers (M0.6) ---------------------------------------------------------
def strip_reminder_text(text: str) -> str:
"""Remove reminder text (content in parentheses) from card text.
Reminder text often contains keywords and patterns that can cause false positives
in pattern matching. This function strips all parenthetical content to focus on
the actual game text.
Args:
text: Card text possibly containing reminder text in parentheses
Returns:
Text with all parenthetical content removed
Example:
>>> strip_reminder_text("Hexproof (This creature can't be the target of spells)")
"Hexproof "
"""
if not text:
return text
return re.sub(r'\([^)]*\)', '', text)
def extract_context_window(text: str, match_start: int, match_end: int,
window_size: Union[int, None] = None, include_before: bool = False) -> str:
"""Extract a context window around a regex match for validation.
When pattern matching finds a potential match, we often need to examine
the surrounding text to validate the match or check for additional keywords.
This function extracts a window of text around the match position.
Args:
text: Full text to extract context from
match_start: Start position of the regex match
match_end: End position of the regex match
window_size: Number of characters to include after the match.
If None, uses CONTEXT_WINDOW_SIZE from tag_constants (default: 70).
To include context before the match, use include_before=True.
include_before: If True, includes window_size characters before the match
in addition to after. If False (default), only includes after.
Returns:
Substring of text containing the match plus surrounding context
Example:
>>> text = "Creatures you control have hexproof and vigilance"
>>> match = re.search(r'creatures you control', text)
>>> extract_context_window(text, match.start(), match.end(), window_size=30)
'Creatures you control have hexproof and '
"""
if not text:
return text
if window_size is None:
window_size = tag_constants.CONTEXT_WINDOW_SIZE
# Calculate window boundaries
if include_before:
context_start = max(0, match_start - window_size)
else:
context_start = match_start
context_end = min(len(text), match_end + window_size)
return text[context_start:context_end]
# --- Enhanced Tagging Utilities (M3.5/M3.6) ----------------------------------------------------
def build_combined_mask(
df: pd.DataFrame,
text_patterns: Union[str, List[str], None] = None,
type_patterns: Union[str, List[str], None] = None,
keyword_patterns: Union[str, List[str], None] = None,
name_list: Union[List[str], None] = None,
exclusion_patterns: Union[str, List[str], None] = None,
combine_with_or: bool = True
) -> pd.Series[bool]:
"""Build a combined boolean mask from multiple pattern types.
This utility reduces boilerplate when creating complex masks by combining
text, type, keyword, and name patterns into a single mask. Patterns are
combined with OR by default, but can be combined with AND.
Args:
df: DataFrame to search
text_patterns: Patterns to match in 'text' column
type_patterns: Patterns to match in 'type' column
keyword_patterns: Patterns to match in 'keywords' column
name_list: List of exact card names to match
exclusion_patterns: Text patterns to exclude from final mask
combine_with_or: If True, combine masks with OR (default).
If False, combine with AND (requires all conditions)
Returns:
Boolean Series combining all specified patterns
Example:
>>> # Match cards with Flying or Haste keywords, excluding cards whose text mentions "Creature"
>>> mask = build_combined_mask(
... df,
... keyword_patterns=['Flying', 'Haste'],
... exclusion_patterns='Creature'
... )
"""
if combine_with_or:
result = pd.Series([False] * len(df), index=df.index)
else:
result = pd.Series([True] * len(df), index=df.index)
masks = []
if text_patterns is not None:
masks.append(create_text_mask(df, text_patterns))
if type_patterns is not None:
masks.append(create_type_mask(df, type_patterns))
if keyword_patterns is not None:
masks.append(create_keyword_mask(df, keyword_patterns))
if name_list is not None:
masks.append(create_name_mask(df, name_list))
if masks:
if combine_with_or:
for mask in masks:
result |= mask
else:
for mask in masks:
result &= mask
if exclusion_patterns is not None:
exclusion_mask = create_text_mask(df, exclusion_patterns)
result &= ~exclusion_mask
return result
def tag_with_logging(
df: pd.DataFrame,
mask: pd.Series[bool],
tags: Union[str, List[str]],
log_message: str,
color: str = '',
logger=None
) -> int:
"""Apply tags with standardized logging.
This utility wraps the common pattern of applying tags and logging the count.
It provides consistent formatting for log messages across the tagging module.
Args:
df: DataFrame to modify
mask: Boolean mask indicating which rows to tag
tags: Tag(s) to apply
log_message: Description of what's being tagged (e.g., "flying creatures")
color: Color identifier for context (optional)
logger: Logger instance to use (optional, uses print if None)
Returns:
Count of cards tagged
Example:
>>> count = tag_with_logging(
... df,
... flying_mask,
... 'Flying',
... 'creatures with flying ability',
... color='blue',
... logger=logger
... )
# Logs: "Tagged 42 blue creatures with flying ability"
"""
count = mask.sum()
if count > 0:
apply_tag_vectorized(df, mask, tags)
color_part = f'{color} ' if color else ''
full_message = f'Tagged {count} {color_part}{log_message}'
if logger:
logger.info(full_message)
else:
print(full_message)
return count
def tag_with_rules_and_logging(
df: pd.DataFrame,
rules: List[dict],
summary_message: str,
color: str = '',
logger=None
) -> int:
"""Apply multiple tag rules with summarized logging.
This utility combines apply_rules with logging, providing a summary of
all cards affected across multiple rules.
Args:
df: DataFrame to modify
rules: List of rule dicts (each with 'mask' and 'tags')
summary_message: Overall description (e.g., "card draw effects")
color: Color identifier for context (optional)
logger: Logger instance to use (optional)
Returns:
Total count of unique cards affected by any rule
Example:
>>> rules = [
... {'mask': flying_mask, 'tags': ['Flying']},
... {'mask': haste_mask, 'tags': ['Haste', 'Aggro']}
... ]
>>> count = tag_with_rules_and_logging(
... df, rules, 'evasive creatures', color='red', logger=logger
... )
"""
affected = pd.Series([False] * len(df), index=df.index)
for rule in rules:
mask = rule.get('mask')
if callable(mask):
mask = mask(df)
if mask is not None and mask.any():
tags = rule.get('tags', [])
apply_tag_vectorized(df, mask, tags)
affected |= mask
count = affected.sum()
color_part = f'{color} ' if color else ''
full_message = f'Tagged {count} {color_part}{summary_message}'
if logger:
logger.info(full_message)
else:
print(full_message)
return count

File diff suppressed because it is too large

View file

@ -4,7 +4,7 @@ from pathlib import Path
import pytest
from headless_runner import _resolve_additional_theme_inputs, _parse_theme_list
from code.headless_runner import resolve_additional_theme_inputs as _resolve_additional_theme_inputs, _parse_theme_list
def _write_catalog(path: Path) -> None:

View file

@ -0,0 +1,182 @@
"""Tests for keyword normalization (M1 - Tagging Refinement)."""
from __future__ import annotations
import pytest
from code.tagging import tag_utils, tag_constants
class TestKeywordNormalization:
"""Test suite for normalize_keywords function."""
def test_canonical_mappings(self):
"""Test that variant keywords map to canonical forms."""
raw = ['Commander Ninjutsu', 'Flying', 'Trample']
allowlist = tag_constants.KEYWORD_ALLOWLIST
frequency_map = {
'Commander Ninjutsu': 2,
'Flying': 100,
'Trample': 50
}
result = tag_utils.normalize_keywords(raw, allowlist, frequency_map)
assert 'Ninjutsu' in result
assert 'Flying' in result
assert 'Trample' in result
assert 'Commander Ninjutsu' not in result
def test_singleton_pruning(self):
"""Test that singleton keywords are pruned unless allowlisted."""
raw = ['Allons-y!', 'Flying', 'Take 59 Flights of Stairs']
allowlist = {'Flying'} # Only Flying is allowlisted
frequency_map = {
'Allons-y!': 1,
'Flying': 100,
'Take 59 Flights of Stairs': 1
}
result = tag_utils.normalize_keywords(raw, allowlist, frequency_map)
assert 'Flying' in result
assert 'Allons-y!' not in result
assert 'Take 59 Flights of Stairs' not in result
def test_case_normalization(self):
"""Test that keywords are normalized to proper case."""
raw = ['flying', 'TRAMPLE', 'vigilance']
allowlist = {'Flying', 'Trample', 'Vigilance'}
frequency_map = {
'flying': 100,
'TRAMPLE': 50,
'vigilance': 75
}
result = tag_utils.normalize_keywords(raw, allowlist, frequency_map)
# Case normalization happens via the map
# If not in map, original case is preserved
assert len(result) == 3
def test_partner_exclusion(self):
"""Test that partner keywords remain excluded."""
raw = ['Partner', 'Flying', 'Trample']
allowlist = {'Flying', 'Trample'}
frequency_map = {
'Partner': 50,
'Flying': 100,
'Trample': 50
}
result = tag_utils.normalize_keywords(raw, allowlist, frequency_map)
assert 'Flying' in result
assert 'Trample' in result
assert 'Partner' not in result # Excluded
assert 'partner' not in result
def test_empty_input(self):
"""Test that empty input returns empty list."""
result = tag_utils.normalize_keywords([], set(), {})
assert result == []
def test_whitespace_handling(self):
"""Test that whitespace is properly stripped."""
raw = [' Flying ', 'Trample ', ' Vigilance']
allowlist = {'Flying', 'Trample', 'Vigilance'}
frequency_map = {
'Flying': 100,
'Trample': 50,
'Vigilance': 75
}
result = tag_utils.normalize_keywords(raw, allowlist, frequency_map)
assert 'Flying' in result
assert 'Trample' in result
assert 'Vigilance' in result
def test_deduplication(self):
"""Test that duplicate keywords are deduplicated."""
raw = ['Flying', 'Flying', 'Trample', 'Flying']
allowlist = {'Flying', 'Trample'}
frequency_map = {
'Flying': 100,
'Trample': 50
}
result = tag_utils.normalize_keywords(raw, allowlist, frequency_map)
assert result.count('Flying') == 1
assert result.count('Trample') == 1
def test_non_string_entries_skipped(self):
"""Test that non-string entries are safely skipped."""
raw = ['Flying', None, 123, 'Trample', '']
allowlist = {'Flying', 'Trample'}
frequency_map = {
'Flying': 100,
'Trample': 50
}
result = tag_utils.normalize_keywords(raw, allowlist, frequency_map)
assert 'Flying' in result
assert 'Trample' in result
assert len(result) == 2
def test_invalid_input_raises_error(self):
"""Test that non-iterable input raises ValueError."""
with pytest.raises(ValueError, match="raw must be iterable"):
tag_utils.normalize_keywords("not-a-list", set(), {})
def test_allowlist_preserves_singletons(self):
"""Test that allowlisted keywords survive even if they're singletons."""
raw = ['Myriad', 'Flying', 'Cascade']
allowlist = {'Flying', 'Myriad', 'Cascade'} # All allowlisted
frequency_map = {
'Myriad': 1, # Singleton
'Flying': 100,
'Cascade': 1 # Singleton
}
result = tag_utils.normalize_keywords(raw, allowlist, frequency_map)
assert 'Myriad' in result # Preserved despite being singleton
assert 'Flying' in result
assert 'Cascade' in result # Preserved despite being singleton
class TestKeywordIntegration:
"""Integration tests for keyword normalization in tagging flow."""
def test_normalization_preserves_evergreen_keywords(self):
"""Test that common evergreen keywords are always preserved."""
evergreen = ['Flying', 'Trample', 'Vigilance', 'Haste', 'Deathtouch', 'Lifelink']
allowlist = tag_constants.KEYWORD_ALLOWLIST
frequency_map = {kw: 100 for kw in evergreen} # All common
result = tag_utils.normalize_keywords(evergreen, allowlist, frequency_map)
for kw in evergreen:
assert kw in result
def test_crossover_keywords_pruned(self):
"""Test that crossover-specific singletons are pruned."""
crossover_singletons = [
'Gae Bolg', # Final Fantasy
'Psychic Defense', # Warhammer 40K
'Allons-y!', # Doctor Who
'Flying' # Evergreen (control)
]
allowlist = {'Flying'} # Only Flying allowed
frequency_map = {
'Gae Bolg': 1,
'Psychic Defense': 1,
'Allons-y!': 1,
'Flying': 100
}
result = tag_utils.normalize_keywords(crossover_singletons, allowlist, frequency_map)
assert result == ['Flying'] # Only evergreen survived

View file

@ -0,0 +1,300 @@
"""Tests for M3 metadata/theme tag partition functionality.
Tests cover:
- Tag classification (metadata vs theme)
- Column creation and data migration
- Feature flag behavior
- Compatibility with missing columns
- CSV read/write with new schema
"""
import pandas as pd
import pytest
from code.tagging import tag_utils
from code.tagging.tagger import _apply_metadata_partition
class TestTagClassification:
"""Tests for classify_tag function."""
def test_prefix_based_metadata(self):
"""Metadata tags identified by prefix."""
assert tag_utils.classify_tag("Applied: Cost Reduction") == "metadata"
assert tag_utils.classify_tag("Bracket: Game Changer") == "metadata"
assert tag_utils.classify_tag("Diagnostic: Test") == "metadata"
assert tag_utils.classify_tag("Internal: Debug") == "metadata"
def test_exact_match_metadata(self):
"""Metadata tags identified by exact match."""
assert tag_utils.classify_tag("Bracket: Game Changer") == "metadata"
assert tag_utils.classify_tag("Bracket: Staple") == "metadata"
def test_kindred_protection_metadata(self):
"""Kindred protection tags are metadata."""
assert tag_utils.classify_tag("Knights Gain Protection") == "metadata"
assert tag_utils.classify_tag("Frogs Gain Protection") == "metadata"
assert tag_utils.classify_tag("Zombies Gain Protection") == "metadata"
def test_theme_classification(self):
"""Regular gameplay tags are themes."""
assert tag_utils.classify_tag("Card Draw") == "theme"
assert tag_utils.classify_tag("Spellslinger") == "theme"
assert tag_utils.classify_tag("Tokens Matter") == "theme"
assert tag_utils.classify_tag("Ramp") == "theme"
assert tag_utils.classify_tag("Protection") == "theme"
def test_edge_cases(self):
"""Edge cases in tag classification."""
# Empty string
assert tag_utils.classify_tag("") == "theme"
# Similar but not exact matches
assert tag_utils.classify_tag("Apply: Something") == "theme" # Wrong prefix
assert tag_utils.classify_tag("Knights Have Protection") == "theme" # Not "Gain"
# Case sensitivity
assert tag_utils.classify_tag("applied: Cost Reduction") == "theme" # Lowercase
class TestMetadataPartition:
"""Tests for _apply_metadata_partition function."""
def test_basic_partition(self, monkeypatch):
"""Basic partition splits tags correctly."""
monkeypatch.setenv('TAG_METADATA_SPLIT', '1')
df = pd.DataFrame({
'name': ['Card A', 'Card B'],
'themeTags': [
['Card Draw', 'Applied: Cost Reduction'],
['Spellslinger', 'Bracket: Game Changer', 'Tokens Matter']
]
})
df_out, diag = _apply_metadata_partition(df)
# Check theme tags
assert df_out.loc[0, 'themeTags'] == ['Card Draw']
assert df_out.loc[1, 'themeTags'] == ['Spellslinger', 'Tokens Matter']
# Check metadata tags
assert df_out.loc[0, 'metadataTags'] == ['Applied: Cost Reduction']
assert df_out.loc[1, 'metadataTags'] == ['Bracket: Game Changer']
# Check diagnostics
assert diag['enabled'] is True
assert diag['rows_with_tags'] == 2
assert diag['metadata_tags_moved'] == 2
assert diag['theme_tags_kept'] == 3
def test_empty_tags(self, monkeypatch):
"""Handles empty tag lists."""
monkeypatch.setenv('TAG_METADATA_SPLIT', '1')
df = pd.DataFrame({
'name': ['Card A', 'Card B'],
'themeTags': [[], ['Card Draw']]
})
df_out, diag = _apply_metadata_partition(df)
assert df_out.loc[0, 'themeTags'] == []
assert df_out.loc[0, 'metadataTags'] == []
assert df_out.loc[1, 'themeTags'] == ['Card Draw']
assert df_out.loc[1, 'metadataTags'] == []
assert diag['rows_with_tags'] == 1
def test_all_metadata_tags(self, monkeypatch):
"""Handles rows with only metadata tags."""
monkeypatch.setenv('TAG_METADATA_SPLIT', '1')
df = pd.DataFrame({
'name': ['Card A'],
'themeTags': [['Applied: Cost Reduction', 'Bracket: Game Changer']]
})
df_out, diag = _apply_metadata_partition(df)
assert df_out.loc[0, 'themeTags'] == []
assert df_out.loc[0, 'metadataTags'] == ['Applied: Cost Reduction', 'Bracket: Game Changer']
assert diag['metadata_tags_moved'] == 2
assert diag['theme_tags_kept'] == 0
def test_all_theme_tags(self, monkeypatch):
"""Handles rows with only theme tags."""
monkeypatch.setenv('TAG_METADATA_SPLIT', '1')
df = pd.DataFrame({
'name': ['Card A'],
'themeTags': [['Card Draw', 'Ramp', 'Spellslinger']]
})
df_out, diag = _apply_metadata_partition(df)
assert df_out.loc[0, 'themeTags'] == ['Card Draw', 'Ramp', 'Spellslinger']
assert df_out.loc[0, 'metadataTags'] == []
assert diag['metadata_tags_moved'] == 0
assert diag['theme_tags_kept'] == 3
def test_feature_flag_disabled(self, monkeypatch):
"""Feature flag disables partition."""
monkeypatch.setenv('TAG_METADATA_SPLIT', '0')
df = pd.DataFrame({
'name': ['Card A'],
'themeTags': [['Card Draw', 'Applied: Cost Reduction']]
})
df_out, diag = _apply_metadata_partition(df)
# Should not create metadataTags column
assert 'metadataTags' not in df_out.columns
# Should not modify themeTags
assert df_out.loc[0, 'themeTags'] == ['Card Draw', 'Applied: Cost Reduction']
# Should indicate disabled
assert diag['enabled'] is False
def test_missing_theme_tags_column(self, monkeypatch):
"""Handles missing themeTags column gracefully."""
monkeypatch.setenv('TAG_METADATA_SPLIT', '1')
df = pd.DataFrame({
'name': ['Card A'],
'other_column': ['value']
})
df_out, diag = _apply_metadata_partition(df)
# Should return unchanged
assert 'themeTags' not in df_out.columns
assert 'metadataTags' not in df_out.columns
# Should indicate error
assert diag['enabled'] is True
assert 'error' in diag
def test_non_list_tags(self, monkeypatch):
"""Handles non-list values in themeTags."""
monkeypatch.setenv('TAG_METADATA_SPLIT', '1')
df = pd.DataFrame({
'name': ['Card A', 'Card B', 'Card C'],
'themeTags': [['Card Draw'], None, 'not a list']
})
df_out, diag = _apply_metadata_partition(df)
# Only first row should be processed
assert df_out.loc[0, 'themeTags'] == ['Card Draw']
assert df_out.loc[0, 'metadataTags'] == []
assert diag['rows_with_tags'] == 1
def test_kindred_protection_partition(self, monkeypatch):
"""Kindred protection tags are moved to metadata."""
monkeypatch.setenv('TAG_METADATA_SPLIT', '1')
df = pd.DataFrame({
'name': ['Card A'],
'themeTags': [['Protection', 'Knights Gain Protection', 'Card Draw']]
})
df_out, diag = _apply_metadata_partition(df)
assert 'Protection' in df_out.loc[0, 'themeTags']
assert 'Card Draw' in df_out.loc[0, 'themeTags']
assert 'Knights Gain Protection' in df_out.loc[0, 'metadataTags']
def test_diagnostics_structure(self, monkeypatch):
"""Diagnostics contain expected fields."""
monkeypatch.setenv('TAG_METADATA_SPLIT', '1')
df = pd.DataFrame({
'name': ['Card A'],
'themeTags': [['Card Draw', 'Applied: Cost Reduction']]
})
df_out, diag = _apply_metadata_partition(df)
# Check required diagnostic fields
assert 'enabled' in diag
assert 'total_rows' in diag
assert 'rows_with_tags' in diag
assert 'metadata_tags_moved' in diag
assert 'theme_tags_kept' in diag
assert 'unique_metadata_tags' in diag
assert 'unique_theme_tags' in diag
assert 'most_common_metadata' in diag
assert 'most_common_themes' in diag
# Check types
assert isinstance(diag['most_common_metadata'], list)
assert isinstance(diag['most_common_themes'], list)
class TestCSVCompatibility:
"""Tests for CSV read/write with new schema."""
def test_csv_roundtrip_with_metadata(self, tmp_path, monkeypatch):
"""CSV roundtrip preserves both columns."""
monkeypatch.setenv('TAG_METADATA_SPLIT', '1')
csv_path = tmp_path / "test_cards.csv"
# Create initial dataframe
df = pd.DataFrame({
'name': ['Card A'],
'themeTags': [['Card Draw', 'Ramp']],
'metadataTags': [['Applied: Cost Reduction']]
})
# Write to CSV
df.to_csv(csv_path, index=False)
# Read back
df_read = pd.read_csv(
csv_path,
converters={'themeTags': pd.eval, 'metadataTags': pd.eval}
)
# Verify data preserved
assert df_read.loc[0, 'themeTags'] == ['Card Draw', 'Ramp']
assert df_read.loc[0, 'metadataTags'] == ['Applied: Cost Reduction']
def test_csv_backward_compatible(self, tmp_path, monkeypatch):
"""Can read old CSVs without metadataTags."""
monkeypatch.setenv('TAG_METADATA_SPLIT', '1')
csv_path = tmp_path / "old_cards.csv"
# Create old-style CSV without metadataTags
df = pd.DataFrame({
'name': ['Card A'],
'themeTags': [['Card Draw', 'Applied: Cost Reduction']]
})
df.to_csv(csv_path, index=False)
# Read back
df_read = pd.read_csv(csv_path, converters={'themeTags': pd.eval})
# Should read successfully
assert 'themeTags' in df_read.columns
assert 'metadataTags' not in df_read.columns
assert df_read.loc[0, 'themeTags'] == ['Card Draw', 'Applied: Cost Reduction']
# Apply partition
df_partitioned, _ = _apply_metadata_partition(df_read)
# Should now have both columns
assert 'themeTags' in df_partitioned.columns
assert 'metadataTags' in df_partitioned.columns
assert df_partitioned.loc[0, 'themeTags'] == ['Card Draw']
assert df_partitioned.loc[0, 'metadataTags'] == ['Applied: Cost Reduction']
if __name__ == "__main__":
pytest.main([__file__, "-v"])
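# Illustrative sketch (not from the repository): the production
# _apply_metadata_partition lives in code.tagging.tagger and is not shown in
# this diff. The function below only makes the contract exercised by these
# tests concrete: gate on TAG_METADATA_SPLIT, split each list-valued themeTags
# row via tag_utils.classify_tag, and report diagnostics. The default flag
# value and the top-5 cutoff for most_common are assumptions.
import os
from collections import Counter
from typing import Any, Dict, Tuple


def _sketch_apply_metadata_partition(df: pd.DataFrame) -> Tuple[pd.DataFrame, Dict[str, Any]]:
    if os.getenv("TAG_METADATA_SPLIT", "1") != "1":
        return df, {"enabled": False}
    diag: Dict[str, Any] = {"enabled": True, "total_rows": len(df)}
    if "themeTags" not in df.columns:
        diag["error"] = "themeTags column missing"
        return df, diag
    meta_counts: Counter = Counter()
    theme_counts: Counter = Counter()
    rows_with_tags = 0
    theme_col, meta_col = [], []
    for tags in df["themeTags"]:
        if not isinstance(tags, list):
            # Leave malformed rows untouched and give them an empty metadata list.
            theme_col.append(tags)
            meta_col.append([])
            continue
        if tags:
            rows_with_tags += 1
        themes = [t for t in tags if tag_utils.classify_tag(t) == "theme"]
        metadata = [t for t in tags if tag_utils.classify_tag(t) == "metadata"]
        theme_counts.update(themes)
        meta_counts.update(metadata)
        theme_col.append(themes)
        meta_col.append(metadata)
    out = df.copy()
    out["themeTags"] = theme_col
    out["metadataTags"] = meta_col
    diag.update({
        "rows_with_tags": rows_with_tags,
        "metadata_tags_moved": sum(meta_counts.values()),
        "theme_tags_kept": sum(theme_counts.values()),
        "unique_metadata_tags": len(meta_counts),
        "unique_theme_tags": len(theme_counts),
        "most_common_metadata": meta_counts.most_common(5),
        "most_common_themes": theme_counts.most_common(5),
    })
    return out, diag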

View file

@ -0,0 +1,169 @@
"""
Tests for protection grant detection (M2).
Tests the ability to distinguish between cards that grant protection
and cards that have inherent protection.
"""
import pytest
from code.tagging.protection_grant_detection import (
is_granting_protection,
categorize_protection_card
)
class TestGrantDetection:
"""Test grant verb detection."""
def test_gains_hexproof(self):
"""Cards with 'gains hexproof' should be detected as granting."""
text = "Target creature gains hexproof until end of turn."
assert is_granting_protection(text, "")
def test_gives_indestructible(self):
"""Cards with 'gives indestructible' should be detected as granting."""
text = "This creature gives target creature indestructible."
assert is_granting_protection(text, "")
def test_creatures_you_control_have(self):
"""Mass grant pattern should be detected."""
text = "Creatures you control have hexproof."
assert is_granting_protection(text, "")
def test_equipped_creature_gets(self):
"""Equipment grant pattern should be detected."""
text = "Equipped creature gets +2/+2 and has indestructible."
assert is_granting_protection(text, "")
class TestInherentDetection:
"""Test inherent protection detection."""
def test_creature_with_hexproof_keyword(self):
"""Creature with hexproof keyword should not be detected as granting."""
text = "Hexproof (This creature can't be the target of spells or abilities.)"
keywords = "Hexproof"
assert not is_granting_protection(text, keywords)
def test_indestructible_artifact(self):
"""Artifact with indestructible keyword should not be detected as granting."""
text = "Indestructible"
keywords = "Indestructible"
assert not is_granting_protection(text, keywords)
def test_ward_creature(self):
"""Creature with Ward should not be detected as granting (unless it grants to others)."""
text = "Ward {2}"
keywords = "Ward"
assert not is_granting_protection(text, keywords)
class TestMixedCases:
"""Test cards that both grant and have protection."""
def test_creature_with_self_grant(self):
"""Creature that grants itself protection should be detected."""
text = "This creature gains indestructible until end of turn."
keywords = ""
assert is_granting_protection(text, keywords)
def test_equipment_with_inherent_and_grant(self):
"""Equipment with indestructible that grants protection."""
text = "Indestructible. Equipped creature has hexproof."
keywords = "Indestructible"
# Should be detected as granting because of "has hexproof"
assert is_granting_protection(text, keywords)
class TestExclusions:
"""Test exclusion patterns."""
def test_cant_have_hexproof(self):
"""Cards that prevent protection should not be tagged."""
text = "Creatures your opponents control can't have hexproof."
assert not is_granting_protection(text, "")
def test_loses_indestructible(self):
"""Cards that remove protection should not be tagged."""
text = "Target creature loses indestructible until end of turn."
assert not is_granting_protection(text, "")
class TestEdgeCases:
"""Test edge cases and special patterns."""
def test_protection_from_color(self):
"""Protection from [quality] in keywords without grant text."""
text = "Protection from red"
keywords = "Protection from red"
assert not is_granting_protection(text, keywords)
def test_empty_text(self):
"""Empty text should return False."""
assert not is_granting_protection("", "")
def test_none_text(self):
"""None text should return False."""
assert not is_granting_protection(None, "")
class TestCategorization:
"""Test full card categorization."""
def test_shell_shield_is_grant(self):
"""Shell Shield grants hexproof - should be Grant."""
text = "Target creature gets +0/+3 and gains hexproof until end of turn."
cat = categorize_protection_card("Shell Shield", text, "", "Instant")
assert cat == "Grant"
def test_geist_of_saint_traft_is_mixed(self):
"""Geist has hexproof and creates tokens - Mixed."""
text = "Hexproof. Whenever this attacks, create a token."
keywords = "Hexproof"
cat = categorize_protection_card("Geist", text, keywords, "Creature")
# Has hexproof keyword, so inherent
assert cat in ("Inherent", "Mixed")
def test_darksteel_brute_is_inherent(self):
"""Darksteel Brute has indestructible - should be Inherent."""
text = "Indestructible"
keywords = "Indestructible"
cat = categorize_protection_card("Darksteel Brute", text, keywords, "Artifact")
assert cat == "Inherent"
def test_scion_of_oona_is_grant(self):
"""Scion of Oona grants shroud to other faeries - should be Grant."""
text = "Other Faeries you control have shroud."
keywords = "Flying, Flash"
cat = categorize_protection_card("Scion of Oona", text, keywords, "Creature")
assert cat == "Grant"
class TestRealWorldCards:
"""Test against actual card samples from baseline audit."""
def test_bulwark_ox(self):
"""Bulwark Ox - grants hexproof and indestructible."""
text = "Sacrifice: Creatures you control with counters gain hexproof and indestructible"
assert is_granting_protection(text, "")
def test_bloodsworn_squire(self):
"""Bloodsworn Squire - grants itself indestructible."""
text = "This creature gains indestructible until end of turn"
assert is_granting_protection(text, "")
def test_kaldra_compleat(self):
"""Kaldra Compleat - equipment with indestructible that grants."""
text = "Indestructible. Equipped creature gets +5/+5 and has indestructible"
keywords = "Indestructible"
assert is_granting_protection(text, keywords)
def test_ward_sliver(self):
"""Ward Sliver - grants protection to all slivers."""
text = "All Slivers have protection from the chosen color"
assert is_granting_protection(text, "")
def test_rebbec(self):
"""Rebbec - grants protection to artifacts."""
text = "Artifacts you control have protection from each mana value"
assert is_granting_protection(text, "")

View file

@ -170,7 +170,7 @@ def _step5_summary_placeholder_html(token: int, *, message: str | None = None) -
return (
f'<div id="deck-summary" data-summary '
f'hx-get="/build/step5/summary?token={token}" '
'hx-trigger="load, step5:refresh from:body" hx-swap="outerHTML">'
'hx-trigger="step5:refresh from:body" hx-swap="outerHTML">'
f'<div class="muted" style="margin-top:1rem;">{_esc(text)}</div>'
'</div>'
)

View file

@ -159,11 +159,18 @@ def _read_csv_summary(csv_path: Path) -> Tuple[dict, Dict[str, int], Dict[str, i
# Type counts/cards (exclude commander entry from distribution)
if not is_commander:
type_counts[cat] = type_counts.get(cat, 0) + cnt
# M5: Extract metadata tags column if present
metadata_tags_raw = ''
metadata_idx = headers.index('MetadataTags') if 'MetadataTags' in headers else -1
if metadata_idx >= 0 and metadata_idx < len(row):
metadata_tags_raw = row[metadata_idx] or ''
metadata_tags_list = [t.strip() for t in metadata_tags_raw.split(';') if t.strip()]
type_cards.setdefault(cat, []).append({
'name': name,
'count': cnt,
'role': role,
'tags': tags_list,
'metadata_tags': metadata_tags_list, # M5: Include metadata tags
})
# Curve

View file

@ -900,7 +900,7 @@ def ideal_labels() -> Dict[str, str]:
'removal': 'Spot Removal',
'wipes': 'Board Wipes',
'card_advantage': 'Card Advantage',
'protection': 'Protection',
'protection': 'Protective Effects',
}
@ -1181,6 +1181,9 @@ def _ensure_setup_ready(out, force: bool = False) -> None:
# Only flip phase if previous run finished
if st.get('phase') in {'themes','themes-fast'}:
st['phase'] = 'done'
# Also ensure percent is 100 when done
if st.get('finished_at'):
st['percent'] = 100
with open(status_path, 'w', encoding='utf-8') as _wf:
json.dump(st, _wf)
except Exception:
@ -1463,16 +1466,17 @@ def _ensure_setup_ready(out, force: bool = False) -> None:
except Exception:
pass
# Unconditional fallback: if (for any reason) no theme export ran above, perform a fast-path export now.
# This guarantees that clicking Run Setup/Tagging always leaves themes current even when tagging wasn't needed.
# Conditional fallback: only run theme export if refresh_needed was True but somehow no export performed.
# This avoids repeated exports when setup is already complete and _ensure_setup_ready is called again.
try:
if not theme_export_performed:
if not theme_export_performed and refresh_needed:
_refresh_theme_catalog(out, force=False, fast_path=True)
except Exception:
pass
else: # If export just ran (either earlier or via fallback), ensure enrichment ran (safety double-call guard inside helper)
try:
_run_theme_metadata_enrichment(out)
if theme_export_performed or refresh_needed:
_run_theme_metadata_enrichment(out)
except Exception:
pass
@ -1907,7 +1911,7 @@ def _make_stages(b: DeckBuilder) -> List[Dict[str, Any]]:
("removal", "Confirm Removal", "add_removal"),
("wipes", "Confirm Board Wipes", "add_board_wipes"),
("card_advantage", "Confirm Card Advantage", "add_card_advantage"),
("protection", "Confirm Protection", "add_protection"),
("protection", "Confirm Protective Effects", "add_protection"),
]
any_granular = any(callable(getattr(b, rn, None)) for _key, _label, rn in spell_categories)
if any_granular:

View file

@ -309,7 +309,8 @@
.catch(function(){ /* noop */ });
} catch(e) {}
}
setInterval(pollStatus, 3000);
// Poll every 10 seconds instead of 3 to reduce server load (only for header indicator)
setInterval(pollStatus, 10000);
pollStatus();
// Health indicator poller
@ -1011,6 +1012,7 @@
var role = (attr('data-role')||'').trim();
var reasonsRaw = attr('data-reasons')||'';
var tagsRaw = attr('data-tags')||'';
var metadataTagsRaw = attr('data-metadata-tags')||''; // M5: Extract metadata tags
var reasonsRaw = attr('data-reasons')||'';
var roleEl = panel.querySelector('.hcp-role');
var hasFlip = !!card.querySelector('.dfc-toggle');
@ -1115,6 +1117,14 @@
tagsEl.style.display = 'none';
} else {
var tagText = allTags.map(displayLabel).join(', ');
// M5: Temporarily append metadata tags for debugging
if(metadataTagsRaw && metadataTagsRaw.trim()){
var metaTags = metadataTagsRaw.split(',').map(function(t){return t.trim();}).filter(Boolean);
if(metaTags.length){
var metaText = metaTags.map(displayLabel).join(', ');
tagText = tagText ? (tagText + ' | META: ' + metaText) : ('META: ' + metaText);
}
}
tagsEl.textContent = tagText;
tagsEl.style.display = tagText ? '' : 'none';
}

View file

@ -462,11 +462,12 @@
<!-- controls now above -->
{% if allow_must_haves %}
{% include "partials/include_exclude_summary.html" with oob=False %}
{% set oob = False %}
{% include "partials/include_exclude_summary.html" %}
{% endif %}
<div id="deck-summary" data-summary
hx-get="/build/step5/summary?token={{ summary_token }}"
hx-trigger="load, step5:refresh from:body"
hx-trigger="load once, step5:refresh from:body"
hx-swap="outerHTML">
<div class="muted" style="margin-top:1rem;">
{% if summary_ready %}Loading deck summary…{% else %}Deck summary will appear after the build completes.{% endif %}

View file

@ -74,7 +74,7 @@
{% set owned = (owned_set is defined and c.name and (c.name|lower in owned_set)) %}
<span class="count">{{ cnt }}</span>
<span class="times">x</span>
<span class="name dfc-anchor" title="{{ c.name }}" data-card-name="{{ c.name }}" data-count="{{ cnt }}" data-role="{{ c.role }}" data-tags="{{ (c.tags|map('trim')|join(', ')) if c.tags else '' }}"{% if overlaps %} data-overlaps="{{ overlaps|join(', ') }}"{% endif %}>{{ c.name }}</span>
<span class="name dfc-anchor" title="{{ c.name }}" data-card-name="{{ c.name }}" data-count="{{ cnt }}" data-role="{{ c.role }}" data-tags="{{ (c.tags|map('trim')|join(', ')) if c.tags else '' }}"{% if c.metadata_tags %} data-metadata-tags="{{ (c.metadata_tags|map('trim')|join(', ')) }}"{% endif %}{% if overlaps %} data-overlaps="{{ overlaps|join(', ') }}"{% endif %}>{{ c.name }}</span>
<span class="flip-slot" aria-hidden="true">
{% if c.dfc_land %}
<span class="dfc-land-chip {% if c.dfc_adds_extra_land %}extra{% else %}counts{% endif %}" title="{{ c.dfc_note or 'Modal double-faced land' }}">DFC land{% if c.dfc_adds_extra_land %} +1{% endif %}</span>

View file

@ -127,7 +127,8 @@
.then(update)
.catch(function(){});
}
setInterval(poll, 3000);
// Poll every 5 seconds instead of 3 to reduce server load
setInterval(poll, 5000);
poll();
})();
</script>

File diff suppressed because it is too large

View file

@ -99,6 +99,12 @@ services:
WEB_AUTO_REFRESH_DAYS: "7" # Refresh cards.csv if older than N days; 0=never
WEB_TAG_PARALLEL: "1" # 1=parallelize tagging
WEB_TAG_WORKERS: "4" # Worker count when parallel tagging
# Tagging Refinement Feature Flags
TAG_NORMALIZE_KEYWORDS: "1" # 1=normalize keywords & filter specialty mechanics (recommended)
TAG_PROTECTION_GRANTS: "1" # 1=Protection tag only for cards granting shields (recommended)
TAG_METADATA_SPLIT: "1" # 1=separate metadata tags from themes in CSVs (recommended)
THEME_CATALOG_MODE: "merge" # Use merged Phase B catalog builder (with YAML export)
THEME_YAML_FAST_SKIP: "0" # 1=allow skipping per-theme YAML on fast path (rare; default always export)
# Live YAML scan interval in seconds for change detection (dev convenience)

View file

@ -101,6 +101,12 @@ services:
WEB_AUTO_REFRESH_DAYS: "7" # Refresh cards.csv if older than N days; 0=never
WEB_TAG_PARALLEL: "1" # 1=parallelize tagging
WEB_TAG_WORKERS: "4" # Worker count when parallel tagging
# Tagging Refinement Feature Flags
TAG_NORMALIZE_KEYWORDS: "1" # 1=normalize keywords & filter specialty mechanics (recommended)
TAG_PROTECTION_GRANTS: "1" # 1=Protection tag only for cards granting shields (recommended)
TAG_METADATA_SPLIT: "1" # 1=separate metadata tags from themes in CSVs (recommended)
THEME_CATALOG_MODE: "merge" # Use merged Phase B catalog builder (with YAML export)
THEME_YAML_FAST_SKIP: "0" # 1=allow skipping per-theme YAML on fast path (rare; default always export)
# Live YAML scan interval in seconds for change detection (dev convenience)