maintenance: cleaned up, consolidated, and refined codebase for tagging

matt 2025-10-12 21:26:37 -07:00
parent f2863ef362
commit 0dd5b4cf64
20 changed files with 3191 additions and 2816 deletions


@ -19,6 +19,9 @@ This format follows Keep a Changelog principles and aims for Semantic Versioning
- Intelligent deck builder filtering includes board-relevant protection while excluding self-only and type-specific cards - Intelligent deck builder filtering includes board-relevant protection while excluding self-only and type-specific cards
- Tiered pool limiting focuses on high-quality staples while maintaining variety across builds - Tiered pool limiting focuses on high-quality staples while maintaining variety across builds
- Improved scope tagging for cards with keyword-only protection effects (no grant text, just inherent keywords) - Improved scope tagging for cards with keyword-only protection effects (no grant text, just inherent keywords)
- **Tagging Module Refactoring**: Large-scale refactor to improve code quality and maintainability
- Centralized regex patterns, extracted reusable utilities, decomposed complex functions
- Improved code organization and readability while maintaining 100% tagging accuracy
### Added ### Added
- Metadata partition system separates diagnostic tags from gameplay themes in card data - Metadata partition system separates diagnostic tags from gameplay themes in card data
@ -42,11 +45,13 @@ This format follows Keep a Changelog principles and aims for Semantic Versioning
- Setup progress polling reduced from 3s to 5-10s intervals for better performance - Setup progress polling reduced from 3s to 5-10s intervals for better performance
- Theme catalog streamlined from 753 to 736 themes (-2.3%) with improved quality - Theme catalog streamlined from 753 to 736 themes (-2.3%) with improved quality
- Protection tag refined to focus on 329 cards that grant shields (down from 1,166 with inherent effects) - Protection tag refined to focus on 329 cards that grant shields (down from 1,166 with inherent effects)
- Protection tag renamed to "Protective Effects" throughout web interface to avoid confusion with the Magic keyword "protection"
- Theme catalog automatically excludes metadata tags from theme suggestions - Theme catalog automatically excludes metadata tags from theme suggestions
- Grant detection now strips reminder text before pattern matching to avoid false positives - Grant detection now strips reminder text before pattern matching to avoid false positives
- Deck builder protection phase now filters by scope metadata: includes "Your Permanents:", excludes "Self:" protection - Deck builder protection phase now filters by scope metadata: includes "Your Permanents:", excludes "Self:" protection
- Protection card selection now randomized per build for variety (using seeded RNG when deterministic mode enabled) - Protection card selection now randomized per build for variety (using seeded RNG when deterministic mode enabled)
- Protection pool now limited to ~40-50 high-quality cards (tiered selection: top 3x target + random 10-20 extras) - Protection pool now limited to ~40-50 high-quality cards (tiered selection: top 3x target + random 10-20 extras)
- Tagging module imports standardized with consistent organization and centralized constants
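
The tiered selection described in the entries above (top 3x target plus 10-20 random extras, seeded when deterministic mode is on) could look roughly like the sketch below. It assumes the candidate list is already sorted best-first. The name `limit_protection_pool`, its parameters, and the sample data are hypothetical and not taken from the commit.

```python
import random
from typing import List, Optional, Sequence


def limit_protection_pool(
    ranked_cards: Sequence[str],
    target_count: int,
    seed: Optional[int] = None,
    min_extras: int = 10,
    max_extras: int = 20,
) -> List[str]:
    """Keep the top 3x-target cards plus a random handful of lower-ranked extras.

    Hypothetical helper: assumes ranked_cards is sorted best-first.
    """
    # A seeded RNG makes the random extras repeatable across builds.
    rng = random.Random(seed)
    top_n = min(len(ranked_cards), target_count * 3)
    top_tier = list(ranked_cards[:top_n])
    remainder = list(ranked_cards[top_n:])
    # Draw between min_extras and max_extras cards from the lower tier.
    extras_wanted = rng.randint(min_extras, max_extras)
    extras = rng.sample(remainder, min(extras_wanted, len(remainder)))
    return top_tier + extras


cards = [f"Card {i}" for i in range(200)]
pool = limit_protection_pool(cards, target_count=8, seed=42)
```

With a target of 8, the top tier holds 24 cards and the extras add 10 to 20 more, so repeated builds share the staples but differ in the tail unless the same seed is supplied.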
### Fixed ### Fixed
- Setup progress now shows 100% completion instead of getting stuck at 99% - Setup progress now shows 100% completion instead of getting stuck at 99%
@ -63,6 +68,9 @@ This format follows Keep a Changelog principles and aims for Semantic Versioning
- Cloak of Invisibility, Teferi's Curse now get "Your Permanents: Phasing" tags - Cloak of Invisibility, Teferi's Curse now get "Your Permanents: Phasing" tags
- Shimmer now gets "Blanket: Phasing" tag for chosen type effect - Shimmer now gets "Blanket: Phasing" tag for chosen type effect
- King of the Oathbreakers now gets "Self: Phasing" tag for reactive trigger - King of the Oathbreakers now gets "Self: Phasing" tag for reactive trigger
- Cards with static keywords (Protection, Hexproof, Ward, Indestructible) in their keywords field now get proper scope metadata tags
- Cards with X in their mana cost now properly identified and tagged with "X Spells" theme for better deck building accuracy
- Card tagging system enhanced with smarter pattern detection and more consistent categorization
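
The reactive-trigger fix for self-phasing can be illustrated with one of the regular expressions this commit adds to the phasing scope module. The pattern is copied from that module. The `has_reactive_self_phasing` helper and both sample rules texts are illustrative assumptions, not the project's API or exact card wording.

```python
import re

# Pattern copied from the self-phasing patterns added in this commit.
REACTIVE_SELF_PHASING = re.compile(
    r'whenever.*(?:becomes\s+the\s+target|becomes\s+target).*(?:it|this\s+creature)\s+phases?\s+out',
    re.IGNORECASE,
)


def has_reactive_self_phasing(text: str) -> bool:
    """Return True when the text phases the card out in response to being targeted."""
    return REACTIVE_SELF_PHASING.search(text) is not None


# Illustrative rules text, written for this example.
sample = "Whenever this creature becomes the target of a spell, it phases out."
unrelated = "Target creature you control phases out."
```

Only `sample` matches, because the pattern requires both the "whenever ... becomes the target" trigger and a trailing "phases out" clause.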
## [2.5.2] - 2025-10-08 ## [2.5.2] - 2025-10-08
### Summary ### Summary


@ -101,13 +101,49 @@ Execute saved configs without manual input.
Refresh data and caches when formats shift. Refresh data and caches when formats shift.
- Runs card downloads, CSV regeneration, smart tagging (keywords + protection grants), and commander catalog rebuilds. - Runs card downloads, CSV regeneration, smart tagging (keywords + protection grants), and commander catalog rebuilds.
- Controlled by `SHOW_SETUP=1` (on by default in compose). - Controlled by `SHOW_SETUP=1` (on by default in compose).
- Force a rebuild manually: - **Force a full rebuild (setup + tagging)**:
```powershell ```powershell
docker compose run --rm --entrypoint bash web -lc "python -m code.file_setup.setup" # Docker:
docker compose run --rm web python -c "from code.file_setup.setup import initial_setup; from code.tagging.tagger import run_tagging; initial_setup(); run_tagging()"
# Local (with venv activated):
python -c "from code.file_setup.setup import initial_setup; from code.tagging.tagger import run_tagging; initial_setup(); run_tagging()"
# With parallel processing (faster):
python -c "from code.file_setup.setup import initial_setup; from code.tagging.tagger import run_tagging; initial_setup(); run_tagging(parallel=True)"
# With parallel processing and custom worker count:
python -c "from code.file_setup.setup import initial_setup; from code.tagging.tagger import run_tagging; initial_setup(); run_tagging(parallel=True, max_workers=4)"
``` ```
- Rebuild only the commander catalog: - **Rebuild only CSVs without tagging**:
```powershell ```powershell
docker compose run --rm --entrypoint bash web -lc "python -m code.scripts.refresh_commander_catalog" # Docker:
docker compose run --rm web python -c "from code.file_setup.setup import initial_setup; initial_setup()"
# Local:
python -c "from code.file_setup.setup import initial_setup; initial_setup()"
```
- **Run only tagging (CSVs must exist)**:
```powershell
# Docker:
docker compose run --rm web python -c "from code.tagging.tagger import run_tagging; run_tagging()"
# Local:
python -c "from code.tagging.tagger import run_tagging; run_tagging()"
# With parallel processing (faster):
python -c "from code.tagging.tagger import run_tagging; run_tagging(parallel=True)"
# With parallel processing and custom worker count:
python -c "from code.tagging.tagger import run_tagging; run_tagging(parallel=True, max_workers=4)"
```
- **Rebuild only the commander catalog**:
```powershell
# Docker:
docker compose run --rm web python -m code.scripts.refresh_commander_catalog
# Local:
python -m code.scripts.refresh_commander_catalog
``` ```
### Owned Library ### Owned Library


@ -1,61 +1,66 @@
# MTG Python Deckbuilder ${VERSION} # MTG Python Deckbuilder ${VERSION}
## [Unreleased] ## [Unreleased]
### Summary ### Summary
- Card tagging improvements separate gameplay themes from internal metadata for cleaner deck building - Card tagging system improvements split metadata from gameplay themes for cleaner deck building experience
- Keyword cleanup reduces specialty keyword noise by 96% while keeping important mechanics - Keyword normalization reduces specialty keyword noise by 96% while maintaining theme catalog quality
- Protection tag now highlights cards that grant shields to your board, not just inherent protection - Protection tag now focuses on cards that grant shields to others, not just those with inherent protection
- **Protection System Overhaul**: Smarter card detection, scope-aware filtering, and focused pool selection deliver consistent, high-quality protection card recommendations - Web UI improvements: faster polling, fixed progress display, and theme refresh stability
- Deck builder distinguishes between board-wide protection and self-only effects using fine-grained metadata - **Protection System Overhaul**: Comprehensive enhancement to protection card detection, classification, and deck building
- Intelligent pool limiting focuses on high-quality staples while maintaining variety across builds - Fine-grained scope metadata distinguishes self-protection from board-wide effects ("Your Permanents: Hexproof" vs "Self: Hexproof")
- Scope-aware filtering automatically excludes self-protection and type-specific cards that don't match your deck - Enhanced grant detection with Equipment/Aura patterns, phasing support, and complex trigger handling
- Enhanced detection handles Equipment, Auras, phasing effects, and complex triggers correctly - Intelligent deck builder filtering includes board-relevant protection while excluding self-only and type-specific cards
- Web UI responsiveness upgrades with smarter caching and streamlined loading - Tiered pool limiting focuses on high-quality staples while maintaining variety across builds
- Improved scope tagging for cards with keyword-only protection effects (no grant text, just inherent keywords)
- **Tagging Module Refactoring**: Large-scale refactor to improve code quality and maintainability
- Centralized regex patterns, extracted reusable utilities, decomposed complex functions
- Improved code organization and readability while maintaining 100% tagging accuracy
### Added ### Added
- Metadata partition keeps internal tags separate from gameplay themes - Metadata partition system separates diagnostic tags from gameplay themes in card data
- Keyword normalization filters out one-off specialty mechanics while keeping evergreen abilities - Keyword normalization system with smart filtering of one-off specialty mechanics
- Protection grant detection identifies cards that give Hexproof, Ward, or other shields to your permanents - Allowlist preserves important keywords like Flying, Myriad, and Transform
- Creature-type-specific protection automatically tagged (e.g., "Knights Gain Protection" for tribal strategies) - Protection grant detection identifies cards that give Hexproof, Ward, or Indestructible to other permanents
- Protection scope filtering (feature flag: `TAG_PROTECTION_SCOPE`) automatically excludes self-only protection like Svyelun - Automatic tagging for creature-type-specific protection (e.g., "Knights Gain Protection")
- Phasing cards with protective effects now included in protection pool (e.g., cards that phase out your permanents) - New `metadataTags` column in card data for bracket annotations and internal diagnostics
- Debug mode: Hover over cards to see metadata tags showing protection scope (e.g., "Your Permanents: Hexproof") - Static phasing keyword detection from keywords field (catches creatures like Breezekeeper)
- Skeleton placeholders with smart timing across build wizard and commander catalog - "Other X you control have Y" protection pattern for static ability grants
- Must-have toggle API with telemetry tracking for include/exclude interactions - "Enchanted creature has phasing" pattern detection
- Commander catalog lazy-loads art and caches frequently accessed views - Chosen type blanket phasing patterns
- Collapsible sections for mana analytics defer loading until expanded - Complex trigger phasing patterns (reactive, consequent, end-of-turn)
- Click-to-pin chart tooltips for easier card comparisons - Protection scope filtering in deck builder (feature flag: `TAG_PROTECTION_SCOPE`) intelligently selects board-relevant protection
- Virtualized card lists handle large decks smoothly - Phasing cards with "Your Permanents:" or "Targeted:" metadata now tagged as Protection and included in protection pool
- Metadata tags temporarily visible in card hover previews for debugging (shows scope like "Your Permanents: Hexproof")
### Changed ### Changed
- Card tags now split between themes (for deck building) and metadata (for diagnostics) - Card tags now split between themes (for deck building) and metadata (for diagnostics)
- Keywords consolidate variants (e.g., "Commander ninjutsu" → "Ninjutsu") for consistent theme matching - Keywords now consolidate variants (e.g., "Commander ninjutsu" becomes "Ninjutsu")
- Protection tag refined to focus on shield-granting cards (329 cards vs 1,166 previously) - Setup progress polling reduced from 3s to 5-10s intervals for better performance
- Deck builder protection phase filters by scope: includes "Your Permanents:", excludes "Self:" protection - Theme catalog streamlined from 753 to 736 themes (-2.3%) with improved quality
- Protection card selection randomized for variety across builds (deterministic when using seeded mode) - Protection tag refined to focus on 329 cards that grant shields (down from 1,166 with inherent effects)
- Theme catalog streamlined with improved quality (736 themes, down 2.3%) - Protection tag renamed to "Protective Effects" throughout web interface to avoid confusion with the Magic keyword "protection"
- Theme catalog automatically excludes metadata tags from suggestions - Theme catalog automatically excludes metadata tags from theme suggestions
- Commander search and theme picker share intelligent debounce to prevent redundant requests - Grant detection now strips reminder text before pattern matching to avoid false positives
- Include/exclude buttons respond immediately with optimistic updates - Deck builder protection phase now filters by scope metadata: includes "Your Permanents:", excludes "Self:" protection
- Commander catalog default view loads from cache for sub-200ms response times - Protection card selection now randomized per build for variety (using seeded RNG when deterministic mode enabled)
- Deck review loads in focused chunks for faster initial page loads - Protection pool now limited to ~40-50 high-quality cards (tiered selection: top 3x target + random 10-20 extras)
- Chart hover zones expanded for easier interaction - Tagging module imports standardized with consistent organization and centralized constants
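
The scope filtering in the protection phase (include "Your Permanents:", exclude "Self:") might be sketched as below. This is a guess at the shape of the check, assuming metadata tags are plain strings with a scope prefix and that any exclusion wins over any inclusion. `is_board_relevant` and the two prefix constants are hypothetical names.

```python
from typing import Iterable

# Assumed scope prefixes, based on the tag examples quoted in the changelog.
INCLUDE_PREFIXES = ("Your Permanents:",)
EXCLUDE_PREFIXES = ("Self:",)


def is_board_relevant(metadata_tags: Iterable[str]) -> bool:
    """Decide whether a card belongs in the protection pool (hypothetical rule)."""
    tags = list(metadata_tags)
    # Exclusions are checked first so they take priority over inclusions.
    if any(tag.startswith(EXCLUDE_PREFIXES) for tag in tags):
        return False
    return any(tag.startswith(INCLUDE_PREFIXES) for tag in tags)
```

For example, `is_board_relevant(["Your Permanents: Hexproof"])` is true, while a card tagged only `"Self: Hexproof"` is filtered out.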
### Fixed ### Fixed
### Fixed - Setup progress now shows 100% completion instead of getting stuck at 99%
- Setup progress correctly displays 100% upon completion - Theme catalog no longer continuously regenerates after setup completes
- Theme catalog refresh stability improved after initial setup - Health indicator polling optimized to reduce server load
- Server polling optimized for reduced load - Protection detection now correctly excludes creatures with only inherent keywords
- Protection detection accurately filters inherent vs granted effects - Dive Down, Glint no longer falsely identified as granting to opponents (reminder text fix)
- Protection scope detection improvements for 11+ cards: - Drogskol Captain, Haytham Kenway now correctly get "Your Permanents" scope tags
- Dive Down, Glint no longer falsely marked as opponent grants (reminder text now stripped) - 7 cards with static Phasing keyword now properly detected (Breezekeeper, Teferi's Drake, etc.)
- Drogskol Captain and similar cards with "Other X you control have Y" patterns now tagged correctly - Type-specific protection grants (e.g., "Knights Gain Indestructible") now correctly excluded from general protection pool
- 7 cards with static Phasing keyword now detected (Breezekeeper, Teferi's Drake, etc.) - Protection scope filter now properly prioritizes exclusions over inclusions (fixes Knight Exemplar in non-Knight decks)
- Cloak of Invisibility and Teferi's Curse now get "Your Permanents: Phasing" tags - Inherent protection cards (Aysen Highway, Phantom Colossus, etc.) now correctly get "Self: Protection" metadata tags
- Shimmer now gets "Blanket: Phasing" for chosen type effect - Scope tagging now applies to ALL cards with protection effects, not just grant cards
- King of the Oathbreakers reactive trigger now properly detected - Cloak of Invisibility, Teferi's Curse now get "Your Permanents: Phasing" tags
- Type-specific protection (Knight Exemplar, Timber Protector) no longer added to non-matching decks - Shimmer now gets "Blanket: Phasing" tag for chosen type effect
- Deck builder correctly excludes "Self:" protection cards (e.g., Svyelun) from protection pool - King of the Oathbreakers now gets "Self: Phasing" tag for reactive trigger
- Inherent protection cards (Aysen Highway, Phantom Colossus) now correctly receive scope metadata tags - Cards with static keywords (Protection, Hexproof, Ward, Indestructible) in their keywords field now get proper scope metadata tags
- Protection pool now intelligently limited to focus on high-quality, relevant cards for your deck - Cards with X in their mana cost now properly identified and tagged with "X Spells" theme for better deck building accuracy
- Card tagging system enhanced with smarter pattern detection and more consistent categorization


@ -1 +0,0 @@
=\ 1\; & \c:/Users/Matt/mtg_python/mtg_python_deckbuilder/.venv/Scripts/python.exe\ code/scripts/build_theme_catalog.py --output config/themes/theme_list_tmp.json


@ -438,7 +438,7 @@ DEFAULT_REMOVAL_COUNT: Final[int] = 10 # Default number of spot removal spells
DEFAULT_WIPES_COUNT: Final[int] = 2 # Default number of board wipes DEFAULT_WIPES_COUNT: Final[int] = 2 # Default number of board wipes
DEFAULT_CARD_ADVANTAGE_COUNT: Final[int] = 10 # Default number of card advantage pieces DEFAULT_CARD_ADVANTAGE_COUNT: Final[int] = 10 # Default number of card advantage pieces
DEFAULT_PROTECTION_COUNT: Final[int] = 8 # Default number of protection spells DEFAULT_PROTECTION_COUNT: Final[int] = 8 # Default number of protective effects (hexproof, indestructible, protection, ward, etc.)
# Deck composition prompts # Deck composition prompts
DECK_COMPOSITION_PROMPTS: Final[Dict[str, str]] = { DECK_COMPOSITION_PROMPTS: Final[Dict[str, str]] = {
@ -450,7 +450,7 @@ DECK_COMPOSITION_PROMPTS: Final[Dict[str, str]] = {
'removal': 'Enter desired number of spot removal spells (default: 10):', 'removal': 'Enter desired number of spot removal spells (default: 10):',
'wipes': 'Enter desired number of board wipes (default: 2):', 'wipes': 'Enter desired number of board wipes (default: 2):',
'card_advantage': 'Enter desired number of card advantage pieces (default: 10):', 'card_advantage': 'Enter desired number of card advantage pieces (default: 10):',
'protection': 'Enter desired number of protection spells (default: 8):', 'protection': 'Enter desired number of protective effects (default: 8):',
'max_deck_price': 'Enter maximum total deck price in dollars (default: 400.0):', 'max_deck_price': 'Enter maximum total deck price in dollars (default: 400.0):',
'max_card_price': 'Enter maximum price per card in dollars (default: 20.0):' 'max_card_price': 'Enter maximum price per card in dollars (default: 20.0):'
} }
@ -511,7 +511,7 @@ DEFAULT_THEME_TAGS = [
'Combat Matters', 'Control', 'Counters Matter', 'Energy', 'Combat Matters', 'Control', 'Counters Matter', 'Energy',
'Enter the Battlefield', 'Equipment', 'Exile Matters', 'Infect', 'Enter the Battlefield', 'Equipment', 'Exile Matters', 'Infect',
'Interaction', 'Lands Matter', 'Leave the Battlefield', 'Legends Matter', 'Interaction', 'Lands Matter', 'Leave the Battlefield', 'Legends Matter',
'Life Matters', 'Mill', 'Monarch', 'Protection', 'Ramp', 'Reanimate', 'Life Matters', 'Mill', 'Monarch', 'Protective Effects', 'Ramp', 'Reanimate',
'Removal', 'Sacrifice Matters', 'Spellslinger', 'Stax', 'Superfriends', 'Removal', 'Sacrifice Matters', 'Spellslinger', 'Stax', 'Superfriends',
'Theft', 'Token Creation', 'Tokens Matter', 'Voltron', 'X Spells' 'Theft', 'Token Creation', 'Tokens Matter', 'Voltron', 'X Spells'
] ]


@ -1,9 +1,11 @@
from __future__ import annotations from __future__ import annotations
# Standard library imports
import json import json
from pathlib import Path from pathlib import Path
from typing import Dict, Iterable, Set from typing import Dict, Iterable, Set
# Third-party imports
import pandas as pd import pandas as pd
def _ensure_norm_series(df: pd.DataFrame, source_col: str, norm_col: str) -> pd.Series: def _ensure_norm_series(df: pd.DataFrame, source_col: str, norm_col: str) -> pd.Series:


@ -1,9 +1,11 @@
from __future__ import annotations from __future__ import annotations
# Standard library imports
import json
from pathlib import Path from pathlib import Path
from typing import List, Optional from typing import List, Optional
import json # Third-party imports
from pydantic import BaseModel, Field from pydantic import BaseModel, Field


@ -1,14 +1,17 @@
from __future__ import annotations from __future__ import annotations
import json # Standard library imports
import ast import ast
import json
from collections import defaultdict
from dataclasses import dataclass from dataclasses import dataclass
from pathlib import Path from pathlib import Path
from typing import Dict, List, Set, DefaultDict from typing import DefaultDict, Dict, List, Set
from collections import defaultdict
# Third-party imports
import pandas as pd import pandas as pd
# Local application imports
from settings import CSV_DIRECTORY, SETUP_COLORS from settings import CSV_DIRECTORY, SETUP_COLORS


@ -73,6 +73,132 @@ def load_merge_summary() -> Dict[str, Any]:
return {"updated_at": None, "colors": {}} return {"updated_at": None, "colors": {}}
def _merge_tag_columns(work_df: pd.DataFrame, group_sorted: pd.DataFrame, primary_idx: int) -> None:
"""Merge list columns (themeTags, roleTags) into union values.
Args:
work_df: Working DataFrame to update
group_sorted: Sorted group of faces for a multi-face card
primary_idx: Index of primary face to update
"""
for column in _LIST_UNION_COLUMNS:
if column in group_sorted.columns:
union_values = _merge_object_lists(group_sorted[column])
work_df.at[primary_idx, column] = union_values
if "keywords" in group_sorted.columns:
keyword_union = _merge_keywords(group_sorted["keywords"])
work_df.at[primary_idx, "keywords"] = _join_keywords(keyword_union)
def _build_face_payload(face_row: pd.Series) -> Dict[str, Any]:
"""Build face metadata payload from a single face row.
Args:
face_row: Single face row from grouped DataFrame
Returns:
Dictionary containing face metadata
"""
text_val = face_row.get("text") or face_row.get("oracleText") or ""
mana_cost_val = face_row.get("manaCost", face_row.get("mana_cost", "")) or ""
mana_value_raw = face_row.get("manaValue", face_row.get("mana_value", ""))
try:
if mana_value_raw in (None, ""):
mana_value_val = None
else:
mana_value_val = float(mana_value_raw)
if math.isnan(mana_value_val):
mana_value_val = None
except Exception:
mana_value_val = None
type_val = face_row.get("type", "") or ""
return {
"face": str(face_row.get("faceName") or face_row.get("name") or ""),
"side": str(face_row.get("side") or ""),
"layout": str(face_row.get("layout") or ""),
"themeTags": _merge_object_lists([face_row.get("themeTags", [])]),
"roleTags": _merge_object_lists([face_row.get("roleTags", [])]),
"type": str(type_val),
"text": str(text_val),
"mana_cost": str(mana_cost_val),
"mana_value": mana_value_val,
"produces_mana": _text_produces_mana(text_val),
"is_land": 'land' in str(type_val).lower(),
}
def _build_merge_detail(name: str, group_sorted: pd.DataFrame, faces_payload: List[Dict[str, Any]]) -> Dict[str, Any]:
"""Build detailed merge information for a multi-face card group.
Args:
name: Card name
group_sorted: Sorted group of faces
faces_payload: List of face metadata dictionaries
Returns:
Dictionary containing merge details
"""
layout_set = sorted({f.get("layout", "") for f in faces_payload if f.get("layout")})
removed_faces = faces_payload[1:] if len(faces_payload) > 1 else []
return {
"name": name,
"total_faces": len(group_sorted),
"dropped_faces": max(len(group_sorted) - 1, 0),
"layouts": layout_set,
"primary_face": faces_payload[0] if faces_payload else {},
"removed_faces": removed_faces,
"theme_tags": sorted({tag for face in faces_payload for tag in face.get("themeTags", [])}),
"role_tags": sorted({tag for face in faces_payload for tag in face.get("roleTags", [])}),
"faces": faces_payload,
}
def _log_merge_summary(color: str, merged_count: int, drop_count: int, multi_face_count: int, logger) -> None:
"""Log merge summary with structured and human-readable formats.
Args:
color: Color being processed
merged_count: Number of card groups merged
drop_count: Number of face rows dropped
multi_face_count: Total multi-face rows processed
logger: Logger instance
"""
try:
logger.info(
"dfc_merge_summary %s",
json.dumps(
{
"event": "dfc_merge_summary",
"color": color,
"groups_merged": merged_count,
"faces_dropped": drop_count,
"multi_face_rows": multi_face_count,
},
sort_keys=True,
),
)
except Exception:
logger.info(
"dfc_merge_summary event=%s groups=%d dropped=%d rows=%d",
color,
merged_count,
drop_count,
multi_face_count,
)
logger.info(
"Merged %d multi-face card groups for %s (dropped %d extra faces)",
merged_count,
color,
drop_count,
)
def merge_multi_face_rows( def merge_multi_face_rows(
df: pd.DataFrame, df: pd.DataFrame,
color: str, color: str,
@ -93,7 +219,6 @@ def merge_multi_face_rows(
return df return df
work_df = df.copy() work_df = df.copy()
layout_series = work_df["layout"].fillna("").astype(str).str.lower() layout_series = work_df["layout"].fillna("").astype(str).str.lower()
multi_mask = layout_series.isin(_MULTI_FACE_LAYOUTS) multi_mask = layout_series.isin(_MULTI_FACE_LAYOUTS)
@ -110,66 +235,15 @@ def merge_multi_face_rows(
group_sorted = _sort_faces(group) group_sorted = _sort_faces(group)
primary_idx = group_sorted.index[0] primary_idx = group_sorted.index[0]
faces_payload: List[Dict[str, Any]] = []
for column in _LIST_UNION_COLUMNS: _merge_tag_columns(work_df, group_sorted, primary_idx)
if column in group_sorted.columns:
union_values = _merge_object_lists(group_sorted[column])
work_df.at[primary_idx, column] = union_values
if "keywords" in group_sorted.columns: faces_payload = [_build_face_payload(row) for _, row in group_sorted.iterrows()]
keyword_union = _merge_keywords(group_sorted["keywords"])
work_df.at[primary_idx, "keywords"] = _join_keywords(keyword_union)
for _, face_row in group_sorted.iterrows():
text_val = face_row.get("text") or face_row.get("oracleText") or ""
mana_cost_val = face_row.get("manaCost", face_row.get("mana_cost", "")) or ""
mana_value_raw = face_row.get("manaValue", face_row.get("mana_value", ""))
try:
if mana_value_raw in (None, ""):
mana_value_val = None
else:
mana_value_val = float(mana_value_raw)
if math.isnan(mana_value_val):
mana_value_val = None
except Exception:
mana_value_val = None
type_val = face_row.get("type", "") or ""
faces_payload.append(
{
"face": str(face_row.get("faceName") or face_row.get("name") or ""),
"side": str(face_row.get("side") or ""),
"layout": str(face_row.get("layout") or ""),
"themeTags": _merge_object_lists([face_row.get("themeTags", [])]),
"roleTags": _merge_object_lists([face_row.get("roleTags", [])]),
"type": str(type_val),
"text": str(text_val),
"mana_cost": str(mana_cost_val),
"mana_value": mana_value_val,
"produces_mana": _text_produces_mana(text_val),
"is_land": 'land' in str(type_val).lower(),
}
)
for idx in group_sorted.index[1:]:
drop_indices.append(idx)
drop_indices.extend(group_sorted.index[1:])
merged_count += 1 merged_count += 1
layout_set = sorted({f.get("layout", "") for f in faces_payload if f.get("layout")}) merge_details.append(_build_merge_detail(name, group_sorted, faces_payload))
removed_faces = faces_payload[1:] if len(faces_payload) > 1 else []
merge_details.append(
{
"name": name,
"total_faces": len(group_sorted),
"dropped_faces": max(len(group_sorted) - 1, 0),
"layouts": layout_set,
"primary_face": faces_payload[0] if faces_payload else {},
"removed_faces": removed_faces,
"theme_tags": sorted({tag for face in faces_payload for tag in face.get("themeTags", [])}),
"role_tags": sorted({tag for face in faces_payload for tag in face.get("roleTags", [])}),
"faces": faces_payload,
}
)
if drop_indices: if drop_indices:
work_df = work_df.drop(index=drop_indices) work_df = work_df.drop(index=drop_indices)
@ -192,38 +266,10 @@ def merge_multi_face_rows(
logger.warning("Failed to record DFC merge summary for %s: %s", color, exc) logger.warning("Failed to record DFC merge summary for %s: %s", color, exc)
if logger is not None: if logger is not None:
try: _log_merge_summary(color, merged_count, len(drop_indices), int(multi_mask.sum()), logger)
logger.info(
"dfc_merge_summary %s",
json.dumps(
{
"event": "dfc_merge_summary",
"color": color,
"groups_merged": merged_count,
"faces_dropped": len(drop_indices),
"multi_face_rows": int(multi_mask.sum()),
},
sort_keys=True,
),
)
except Exception:
logger.info(
"dfc_merge_summary event=%s groups=%d dropped=%d rows=%d",
color,
merged_count,
len(drop_indices),
int(multi_mask.sum()),
)
logger.info(
"Merged %d multi-face card groups for %s (dropped %d extra faces)",
merged_count,
color,
len(drop_indices),
)
_persist_merge_summary(color, summary_payload, logger) _persist_merge_summary(color, summary_payload, logger)
# Reset index to keep downstream expectations consistent.
return work_df.reset_index(drop=True) return work_df.reset_index(drop=True)
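
The mana value handling inside `_build_face_payload` can be read in isolation as a small normalization step. The standalone function below is a sketch under that reading. `normalize_mana_value` is a hypothetical name, and it narrows the caught exceptions to `TypeError` and `ValueError`, which differs from the broad `except Exception` in the diff.

```python
import math
from typing import Any, Optional


def normalize_mana_value(raw: Any) -> Optional[float]:
    """Convert a raw mana value cell to a float, or None when missing or invalid."""
    # Missing cells may arrive as None or as an empty string.
    if raw is None or raw == "":
        return None
    try:
        value = float(raw)
    except (TypeError, ValueError):
        return None
    # NaN marks an empty numeric cell in pandas, so treat it as missing.
    return None if math.isnan(value) else value
```

For example, `normalize_mana_value("3")` yields `3.0`, while `""`, `None`, `float("nan")`, and non-numeric strings all yield `None`.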


@ -9,15 +9,97 @@ Detects the scope of phasing effects with multiple dimensions:
- Blanket: Phasing (phases all permanents out) - Blanket: Phasing (phases all permanents out)
Cards can have multiple scope tags (e.g., Targeted + Your Permanents). Cards can have multiple scope tags (e.g., Targeted + Your Permanents).
Refactored in M2: Create Scope Detection Utilities to use generic scope detection.
""" """
# Standard library imports
import re import re
from typing import Set from typing import Set
# Local application imports
from . import scope_detection_utils as scope_utils
from code.logging_util import get_logger from code.logging_util import get_logger
logger = get_logger(__name__) logger = get_logger(__name__)
# Phasing scope pattern definitions
def _get_phasing_scope_patterns() -> scope_utils.ScopePatterns:
"""
Build scope patterns for phasing abilities.
Returns:
ScopePatterns object with compiled patterns
"""
# Targeting patterns (special for phasing - detects "target...phases out")
targeting_patterns = [
re.compile(r'target\s+(?:\w+\s+)*(?:creature|permanent|artifact|enchantment|nonland\s+permanent)s?(?:[^.]*)?phases?\s+out', re.IGNORECASE),
re.compile(r'target\s+player\s+controls[^.]*phases?\s+out', re.IGNORECASE),
]
# Self-reference patterns
self_patterns = [
re.compile(r'this\s+(?:creature|permanent|artifact|enchantment)\s+phases?\s+out', re.IGNORECASE),
re.compile(r'~\s+phases?\s+out', re.IGNORECASE),
# Triggered self-phasing (King of the Oathbreakers)
re.compile(r'whenever.*(?:becomes\s+the\s+target|becomes\s+target).*(?:it|this\s+creature)\s+phases?\s+out', re.IGNORECASE),
# Consequent self-phasing (Cyclonus: "connive. Then...phase out")
re.compile(r'(?:then|,)\s+(?:it|this\s+creature)\s+phases?\s+out', re.IGNORECASE),
# At end of turn/combat self-phasing
re.compile(r'(?:at\s+(?:the\s+)?end\s+of|after).*(?:it|this\s+creature)\s+phases?\s+out', re.IGNORECASE),
]
# Opponent patterns
opponent_patterns = [
re.compile(r'target\s+(?:\w+\s+)*(?:creature|permanent)\s+an?\s+opponents?\s+controls?\s+phases?\s+out', re.IGNORECASE),
# Unqualified targets (can target opponents' stuff if no "you control" restriction)
re.compile(r'(?:up\s+to\s+)?(?:one\s+|x\s+|that\s+many\s+)?(?:other\s+)?(?:another\s+)?target\s+(?:\w+\s+)*(?:creature|permanent|artifact|enchantment|nonland\s+permanent)s?(?:[^.]*)?phases?\s+out', re.IGNORECASE),
re.compile(r'target\s+(?:\w+\s+)*(?:creature|permanent|artifact|enchantment|land|nonland\s+permanent)(?:,|\s+and)?\s+(?:then|and)?\s+it\s+phases?\s+out', re.IGNORECASE),
]
# Your permanents patterns
your_patterns = [
# Explicit "you control"
re.compile(r'(?:target\s+)?(?:creatures?|permanents?|nonland\s+permanents?)\s+you\s+control\s+phases?\s+out', re.IGNORECASE),
re.compile(r'(?:target\s+)?(?:other\s+)?(?:creatures?|permanents?)\s+you\s+control\s+phases?\s+out', re.IGNORECASE),
re.compile(r'permanents?\s+you\s+control\s+phase\s+out', re.IGNORECASE),
re.compile(r'(?:any|up\s+to)\s+(?:number\s+of\s+)?(?:target\s+)?(?:other\s+)?(?:creatures?|permanents?|nonland\s+permanents?)\s+you\s+control\s+phases?\s+out', re.IGNORECASE),
re.compile(r'all\s+(?:creatures?|permanents?)\s+you\s+control\s+phase\s+out', re.IGNORECASE),
re.compile(r'each\s+(?:creature|permanent)\s+you\s+control\s+phases?\s+out', re.IGNORECASE),
# Pronoun reference to "you control" context
re.compile(r'(?:creatures?|permanents?|planeswalkers?)\s+you\s+control[^.]*(?:those|the)\s+(?:creatures?|permanents?|planeswalkers?)\s+phase\s+out', re.IGNORECASE),
re.compile(r'creature\s+you\s+control[^.]*(?:it)\s+phases?\s+out', re.IGNORECASE),
re.compile(r'you\s+control.*those\s+(?:creatures?|permanents?|planeswalkers?)\s+phase\s+out', re.IGNORECASE),
# Equipment/Aura
re.compile(r'equipped\s+(?:creature|permanent)\s+(?:gets\s+[^.]*\s+and\s+)?phases?\s+out', re.IGNORECASE),
re.compile(r'enchanted\s+(?:creature|permanent)\s+(?:gets\s+[^.]*\s+and\s+)?phases?\s+out', re.IGNORECASE),
re.compile(r'enchanted\s+(?:creature|permanent)\s+(?:has|gains?)\s+phasing', re.IGNORECASE),
re.compile(r'(?:equipped|enchanted)\s+(?:creature|permanent)[^.]*,?\s+(?:then\s+)?that\s+(?:creature|permanent)\s+phases?\s+out', re.IGNORECASE),
# Target controlled by specific player
re.compile(r'(?:each|target)\s+(?:creature|permanent)\s+target\s+player\s+controls\s+phases?\s+out', re.IGNORECASE),
]
# Blanket patterns
blanket_patterns = [
re.compile(r'all\s+(?:nontoken\s+)?(?:creatures?|permanents?)(?:\s+of\s+that\s+type)?\s+(?:[^.]*\s+)?phase\s+out', re.IGNORECASE),
re.compile(r'each\s+(?:creature|permanent)\s+(?:[^.]*\s+)?phases?\s+out', re.IGNORECASE),
# Type-specific blanket (Shimmer)
re.compile(r'each\s+(?:land|creature|permanent|artifact|enchantment)\s+of\s+the\s+chosen\s+type\s+has\s+phasing', re.IGNORECASE),
re.compile(r'(?:lands?|creatures?|permanents?|artifacts?|enchantments?)\s+of\s+the\s+chosen\s+type\s+(?:have|has)\s+phasing', re.IGNORECASE),
# Pronoun reference to "all creatures"
re.compile(r'all\s+(?:nontoken\s+)?(?:creatures?|permanents?)[^.]*,?\s+(?:then\s+)?(?:those|the)\s+(?:creatures?|permanents?)\s+phase\s+out', re.IGNORECASE),
]
return scope_utils.ScopePatterns(
opponent=opponent_patterns,
self_ref=self_patterns,
your_permanents=your_patterns,
blanket=blanket_patterns,
targeted=targeting_patterns
)
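The definition of `scope_utils.ScopePatterns` is not part of this diff. As a minimal sketch, assuming it is a plain container holding one list of compiled patterns per scope (the field names below are taken from the keyword arguments above; the `matches_any` helper is hypothetical), the grouping could look like this:

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class ScopePatterns:
    """Hypothetical stand-in for scope_utils.ScopePatterns (real class not shown in this diff)."""
    opponent: List[re.Pattern] = field(default_factory=list)
    self_ref: List[re.Pattern] = field(default_factory=list)
    your_permanents: List[re.Pattern] = field(default_factory=list)
    blanket: List[re.Pattern] = field(default_factory=list)
    targeted: List[re.Pattern] = field(default_factory=list)


def matches_any(patterns: List[re.Pattern], text: str) -> bool:
    """Illustrative helper: True if any compiled pattern matches the text."""
    return any(p.search(text) for p in patterns)


# Two of the patterns from the function above, compiled once and reused.
demo = ScopePatterns(
    self_ref=[re.compile(r'this\s+(?:creature|permanent|artifact|enchantment)\s+phases?\s+out', re.IGNORECASE)],
    your_permanents=[re.compile(r'each\s+(?:creature|permanent)\s+you\s+control\s+phases?\s+out', re.IGNORECASE)],
)

self_hit = matches_any(demo.self_ref, "At the beginning of your upkeep, this creature phases out.")
your_hit = matches_any(demo.your_permanents, "Each creature you control phases out.")
```

Keeping the compiled patterns in one object means the detection helper can iterate over scopes without knowing anything about phasing specifically.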
def get_phasing_scope_tags(text: str, card_name: str, keywords: str = '') -> Set[str]:
"""
Get all phasing scope metadata tags for a card.
@ -47,121 +129,46 @@ def get_phasing_scope_tags(text: str, card_name: str, keywords: str = '') -> Set
# Check for static "Phasing" keyword ability (self-phasing)
# Only add Self tag if card doesn't grant phasing to others
if 'phasing' in keywords_lower:
# Remove reminder text to avoid false positives # Define patterns for checking if card grants phasing to others
text_no_reminder = re.sub(r'\([^)]*\)', '', text_lower) grants_pattern = [re.compile(
# Check if card grants phasing to others (has granting language in main text)
# Look for patterns like "enchanted creature has", "other X have", "target", etc.
grants_to_others = bool(re.search(
r'(other|target|each|all|enchanted|equipped|creatures? you control|permanents? you control).*phas', r'(other|target|each|all|enchanted|equipped|creatures? you control|permanents? you control).*phas',
text_no_reminder re.IGNORECASE
)) )]
# If no granting language, it's just self-phasing is_static = scope_utils.check_static_keyword_legacy(
if not grants_to_others: keywords=keywords,
static_keyword='phasing',
text=text,
grant_patterns=grants_pattern
)
if is_static:
tags.add('Self: Phasing') tags.add('Self: Phasing')
return tags # Early return - static keyword only
# Check if phasing is mentioned in text (including "has phasing", "gain phasing", etc.) # Check if phasing is mentioned in text
if 'phas' not in text_lower: # Changed from 'phase' to 'phas' to catch "phasing" too if 'phas' not in text_lower:
return tags
# Check for targeting (any "target" + phasing) # Build phasing patterns and detect scopes
# Targeting detection - must have target AND phase in same sentence/clause patterns = _get_phasing_scope_patterns()
targeting_patterns = [
r'target\s+(?:\w+\s+)*(?:creature|permanent|artifact|enchantment|nonland\s+permanent)s?(?:[^.]*)?phases?\s+out',
r'target\s+player\s+controls[^.]*phases?\s+out',
]
is_targeted = any(re.search(pattern, text_lower) for pattern in targeting_patterns) # Detect all scopes (phasing can have multiple)
scopes = scope_utils.detect_multi_scope(
text=text,
card_name=card_name,
ability_keyword='phas', # Use 'phas' to catch both 'phase' and 'phasing'
patterns=patterns,
check_grant_verbs=False # Phasing doesn't need grant verb checking
)
if is_targeted: # Format scope tags with "Phasing" ability name
tags.add("Targeted: Phasing") for scope in scopes:
logger.debug(f"Card '{card_name}': detected Targeted: Phasing") if scope == "Targeted":
tags.add("Targeted: Phasing")
# Check for self-phasing else:
self_patterns = [ tags.add(scope_utils.format_scope_tag(scope, "Phasing"))
r'this\s+(?:creature|permanent|artifact|enchantment)\s+phases?\s+out', logger.debug(f"Card '{card_name}': detected {scope}: Phasing")
r'~\s+phases?\s+out',
rf'\b{re.escape(card_name.lower())}\s+phases?\s+out',
# NEW: Triggered self-phasing (King of the Oathbreakers: "it phases out" as reactive protection)
r'whenever.*(?:becomes\s+the\s+target|becomes\s+target).*(?:it|this\s+creature)\s+phases?\s+out',
# NEW: Consequent self-phasing (Cyclonus: "connive. Then...phase out")
r'(?:then|,)\s+(?:it|this\s+creature)\s+phases?\s+out',
# NEW: At end of turn/combat self-phasing
r'(?:at\s+(?:the\s+)?end\s+of|after).*(?:it|this\s+creature)\s+phases?\s+out',
]
if any(re.search(pattern, text_lower) for pattern in self_patterns):
tags.add("Self: Phasing")
logger.debug(f"Card '{card_name}': detected Self: Phasing")
# Check for opponent permanent phasing (removal effect)
opponent_patterns = [
r'target\s+(?:\w+\s+)*(?:creature|permanent)\s+an?\s+opponents?\s+controls?\s+phases?\s+out',
]
# Check for unqualified targets (can target opponents' stuff)
# More flexible to handle various phasing patterns
unqualified_target_patterns = [
r'(?:up\s+to\s+)?(?:one\s+|x\s+|that\s+many\s+)?(?:other\s+)?(?:another\s+)?target\s+(?:\w+\s+)*(?:creature|permanent|artifact|enchantment|nonland\s+permanent)s?(?:[^.]*)?phases?\s+out',
r'target\s+(?:\w+\s+)*(?:creature|permanent|artifact|enchantment|land|nonland\s+permanent)(?:,|\s+and)?\s+(?:then|and)?\s+it\s+phases?\s+out',
]
has_opponent_specific = any(re.search(pattern, text_lower) for pattern in opponent_patterns)
has_unqualified_target = any(re.search(pattern, text_lower) for pattern in unqualified_target_patterns)
# If unqualified AND not restricted to "you control", can target opponents
if has_opponent_specific or (has_unqualified_target and 'you control' not in text_lower):
tags.add("Opponent Permanents: Phasing")
logger.debug(f"Card '{card_name}': detected Opponent Permanents: Phasing")
# Check for your permanents phasing
your_patterns = [
# Explicit "you control"
r'(?:target\s+)?(?:creatures?|permanents?|nonland\s+permanents?)\s+you\s+control\s+phases?\s+out',
r'(?:target\s+)?(?:other\s+)?(?:creatures?|permanents?)\s+you\s+control\s+phases?\s+out',
r'permanents?\s+you\s+control\s+phase\s+out',
r'(?:any|up\s+to)\s+(?:number\s+of\s+)?(?:target\s+)?(?:other\s+)?(?:creatures?|permanents?|nonland\s+permanents?)\s+you\s+control\s+phases?\s+out',
r'all\s+(?:creatures?|permanents?)\s+you\s+control\s+phase\s+out',
r'each\s+(?:creature|permanent)\s+you\s+control\s+phases?\s+out',
# Pronoun reference to "you control" context
r'(?:creatures?|permanents?|planeswalkers?)\s+you\s+control[^.]*(?:those|the)\s+(?:creatures?|permanents?|planeswalkers?)\s+phase\s+out',
r'creature\s+you\s+control[^.]*(?:it)\s+phases?\s+out',
# "Those permanents" referring back to controlled permanents (across sentence boundaries)
r'you\s+control.*those\s+(?:creatures?|permanents?|planeswalkers?)\s+phase\s+out',
# Equipment/Aura (beneficial to your permanents)
r'equipped\s+(?:creature|permanent)\s+(?:gets\s+[^.]*\s+and\s+)?phases?\s+out',
r'enchanted\s+(?:creature|permanent)\s+(?:gets\s+[^.]*\s+and\s+)?phases?\s+out',
r'enchanted\s+(?:creature|permanent)\s+(?:has|gains?)\s+phasing', # NEW: "has phasing" for Cloak of Invisibility, Teferi's Curse
# Pronoun reference after equipped/enchanted creature mentioned
r'(?:equipped|enchanted)\s+(?:creature|permanent)[^.]*,?\s+(?:then\s+)?that\s+(?:creature|permanent)\s+phases?\s+out',
# Target controlled by specific player
r'(?:each|target)\s+(?:creature|permanent)\s+target\s+player\s+controls\s+phases?\s+out',
]
if any(re.search(pattern, text_lower) for pattern in your_patterns):
tags.add("Your Permanents: Phasing")
logger.debug(f"Card '{card_name}': detected Your Permanents: Phasing")
# Check for blanket phasing (all permanents, no ownership)
blanket_patterns = [
r'all\s+(?:nontoken\s+)?(?:creatures?|permanents?)(?:\s+of\s+that\s+type)?\s+(?:[^.]*\s+)?phase\s+out',
r'each\s+(?:creature|permanent)\s+(?:[^.]*\s+)?phases?\s+out',
# NEW: Type-specific blanket (Shimmer: "Each land of the chosen type has phasing")
r'each\s+(?:land|creature|permanent|artifact|enchantment)\s+of\s+the\s+chosen\s+type\s+has\s+phasing',
r'(?:lands?|creatures?|permanents?|artifacts?|enchantments?)\s+of\s+the\s+chosen\s+type\s+(?:have|has)\s+phasing',
# Pronoun reference to "all creatures"
r'all\s+(?:nontoken\s+)?(?:creatures?|permanents?)[^.]*,?\s+(?:then\s+)?(?:those|the)\s+(?:creatures?|permanents?)\s+phase\s+out',
]
# Only blanket if no specific ownership mentioned
has_blanket_pattern = any(re.search(pattern, text_lower) for pattern in blanket_patterns)
no_ownership = 'you control' not in text_lower and 'target player controls' not in text_lower and 'opponent' not in text_lower
if has_blanket_pattern and no_ownership:
tags.add("Blanket: Phasing")
logger.debug(f"Card '{card_name}': detected Blanket: Phasing")
return tags
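The body of `scope_utils.check_static_keyword_legacy` is not shown in this diff. A rough sketch of what it might do, assuming it simply wraps the inline logic removed above (strip reminder text, then look for granting language), is given below. The function name `is_static_keyword_only` is hypothetical.

```python
import re
from typing import List

# Same reminder-text regex the removed inline code used.
REMINDER_TEXT = re.compile(r'\([^)]*\)')


def is_static_keyword_only(keywords: str, static_keyword: str, text: str,
                           grant_patterns: List[re.Pattern]) -> bool:
    """Hypothetical helper: True when the card has the keyword but grants it to nothing else."""
    if static_keyword not in keywords.lower():
        return False
    # Reminder text is removed first so its wording cannot look like a grant.
    main_text = REMINDER_TEXT.sub('', text.lower())
    return not any(p.search(main_text) for p in grant_patterns)


grants = [re.compile(
    r'(other|target|each|all|enchanted|equipped|creatures? you control|permanents? you control).*phas',
    re.IGNORECASE,
)]

reminder_card = ("Phasing (This phases in or out before you untap during each of your "
                 "untap steps. While it's phased out, it's treated as though it doesn't exist.)")
aura_card = "Enchanted creature has phasing."

static_only = is_static_keyword_only("Phasing", "phasing", reminder_card, grants)
grants_to_other = is_static_keyword_only("Phasing", "phasing", aura_card, grants)
```

Without the reminder-text strip, the phrase "each of your untap steps ... phased out" would match the grant pattern, which is the false positive the original comment warned about.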


@ -10,126 +10,135 @@ Usage in tagger.py:
if is_granting_protection(text, keywords):
# Tag as Protection
"""
import re
from typing import Set, List, Pattern from typing import List, Pattern, Set
from . import regex_patterns as rgx
from code.tagging.tag_constants import CREATURE_TYPES from . import tag_utils
from .tag_constants import CONTEXT_WINDOW_SIZE, CREATURE_TYPES, PROTECTION_KEYWORDS
# Pre-compile kindred detection patterns at module load for performance
# Pattern: (compiled_regex, tag_name_template)
KINDRED_PATTERNS: List[tuple[Pattern, str]] = [] def _build_kindred_patterns() -> List[tuple[Pattern, str]]:
"""Build pre-compiled kindred patterns for all creature types.
def _init_kindred_patterns():
"""Initialize pre-compiled kindred patterns for all creature types.""" Returns:
global KINDRED_PATTERNS List of tuples containing (compiled_pattern, tag_name)
if KINDRED_PATTERNS: """
return # Already initialized patterns = []
for creature_type in CREATURE_TYPES:
creature_lower = creature_type.lower()
creature_escaped = re.escape(creature_lower)
tag_name = f"{creature_type}s Gain Protection"
pattern_templates = [
# Create 3 patterns per type rf'\bother {creature_escaped}s?\b.*\b(have|gain)\b',
patterns_to_compile = [ rf'\b{creature_escaped} creatures?\b.*\b(have|gain)\b',
(rf'\bother {creature_escaped}s?\b.*\b(have|gain)\b', tag_name), rf'\btarget {creature_escaped}\b.*\bgains?\b',
(rf'\b{creature_escaped} creatures?\b.*\b(have|gain)\b', tag_name),
(rf'\btarget {creature_escaped}\b.*\bgains?\b', tag_name),
]
for pattern_str, tag in patterns_to_compile: for pattern_str in pattern_templates:
try:
compiled = re.compile(pattern_str, re.IGNORECASE)
KINDRED_PATTERNS.append((compiled, tag)) patterns.append((compiled, tag_name))
except re.error:
# Skip patterns that fail to compile
pass
return patterns
KINDRED_PATTERNS: List[tuple[Pattern, str]] = _build_kindred_patterns()
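The change above replaces a lazily initialized global with a pure builder whose result is assigned once at import time. A small self-contained sketch of the same idea follows; the two-entry creature list is a made-up sample, not the real `CREATURE_TYPES` constant.

```python
import re
from typing import List, Tuple

# Hypothetical two-entry sample; the real CREATURE_TYPES lives in tag_constants.
SAMPLE_CREATURE_TYPES = ["Knight", "Merfolk"]


def build_kindred_patterns(creature_types: List[str]) -> List[Tuple[re.Pattern, str]]:
    """Compile three grant patterns per creature type, paired with the tag name."""
    patterns = []
    for creature_type in creature_types:
        escaped = re.escape(creature_type.lower())
        tag_name = f"{creature_type}s Gain Protection"
        for template in (
            rf'\bother {escaped}s?\b.*\b(have|gain)\b',
            rf'\b{escaped} creatures?\b.*\b(have|gain)\b',
            rf'\btarget {escaped}\b.*\bgains?\b',
        ):
            patterns.append((re.compile(template, re.IGNORECASE), tag_name))
    return patterns


# Built once at import time, so callers never need an init guard.
SAMPLE_PATTERNS = build_kindred_patterns(SAMPLE_CREATURE_TYPES)

text = "other knights you control have hexproof."
matched_tags = {tag for pattern, tag in SAMPLE_PATTERNS if pattern.search(text)}
```

Because the builder returns a new list instead of mutating a global, it can be called with any creature list, which should make it easier to test in isolation.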
# Grant verb patterns - cards that give protection to other permanents
# These patterns look for grant verbs that affect OTHER permanents, not self
# M5: Added phasing support
GRANT_VERB_PATTERNS = [ # Pre-compiled at module load for performance
r'\bgain[s]?\b.*\b(hexproof|shroud|indestructible|ward|protection|phasing)\b', GRANT_VERB_PATTERNS: List[Pattern] = [
r'\bgive[s]?\b.*\b(hexproof|shroud|indestructible|ward|protection|phasing)\b', re.compile(r'\bgain[s]?\b.*\b(hexproof|shroud|indestructible|ward|protection|phasing)\b', re.IGNORECASE),
r'\bgrant[s]?\b.*\b(hexproof|shroud|indestructible|ward|protection|phasing)\b', re.compile(r'\bgive[s]?\b.*\b(hexproof|shroud|indestructible|ward|protection|phasing)\b', re.IGNORECASE),
r'\bhave\b.*\b(hexproof|shroud|indestructible|ward|protection|phasing)\b', # "have hexproof" static grants re.compile(r'\bgrant[s]?\b.*\b(hexproof|shroud|indestructible|ward|protection|phasing)\b', re.IGNORECASE),
r'\bget[s]?\b.*\+.*\b(hexproof|shroud|indestructible|ward|protection|phasing)\b', # "gets +X/+X and has hexproof" direct re.compile(r'\bhave\b.*\b(hexproof|shroud|indestructible|ward|protection|phasing)\b', re.IGNORECASE), # "have hexproof" static grants
r'\bget[s]?\b.*\+.*\band\b.*\b(gain[s]?|have)\b.*\b(hexproof|shroud|indestructible|ward|protection|phasing)\b', # "gets +X/+X and gains hexproof" re.compile(r'\bget[s]?\b.*\+.*\b(hexproof|shroud|indestructible|ward|protection|phasing)\b', re.IGNORECASE), # "gets +X/+X and has hexproof" direct
r'\bphases? out\b', # M5: Direct phasing triggers (e.g., "it phases out") re.compile(r'\bget[s]?\b.*\+.*\band\b.*\b(gain[s]?|have)\b.*\b(hexproof|shroud|indestructible|ward|protection|phasing)\b', re.IGNORECASE), # "gets +X/+X and gains hexproof"
re.compile(r'\bphases? out\b', re.IGNORECASE), # M5: Direct phasing triggers (e.g., "it phases out")
]
# Self-reference patterns that should NOT count as granting
# Reminder text and keyword lines only
# M5: Added phasing support
SELF_REFERENCE_PATTERNS = [ # Pre-compiled at module load for performance
r'^\s*(hexproof|shroud|indestructible|ward|protection|phasing)', # Start of text (keyword ability) SELF_REFERENCE_PATTERNS: List[Pattern] = [
r'\([^)]*\b(hexproof|shroud|indestructible|ward|protection|phasing)[^)]*\)', # Reminder text in parens re.compile(r'^\s*(hexproof|shroud|indestructible|ward|protection|phasing)', re.IGNORECASE), # Start of text (keyword ability)
re.compile(r'\([^)]*\b(hexproof|shroud|indestructible|ward|protection|phasing)[^)]*\)', re.IGNORECASE), # Reminder text in parens
]
# Conditional self-grant patterns - activated/triggered abilities that grant to self
CONDITIONAL_SELF_GRANT_PATTERNS = [ # Pre-compiled at module load for performance
CONDITIONAL_SELF_GRANT_PATTERNS: List[Pattern] = [
# Activated abilities
r'\{[^}]*\}.*:.*\bthis (creature|permanent|artifact|enchantment)\b.*\bgain[s]?\b.*\b(hexproof|shroud|indestructible|ward|protection)\b', re.compile(r'\{[^}]*\}.*:.*\bthis (creature|permanent|artifact|enchantment)\b.*\bgain[s]?\b.*\b(hexproof|shroud|indestructible|ward|protection)\b', re.IGNORECASE),
r'discard.*:.*\bthis (creature|permanent|artifact|enchantment)\b.*\bgain[s]?\b', re.compile(r'discard.*:.*\bthis (creature|permanent|artifact|enchantment)\b.*\bgain[s]?\b', re.IGNORECASE),
r'\{t\}.*:.*\bthis (creature|permanent|artifact|enchantment)\b.*\bgain[s]?\b', re.compile(r'\{t\}.*:.*\bthis (creature|permanent|artifact|enchantment)\b.*\bgain[s]?\b', re.IGNORECASE),
r'sacrifice.*:.*\bthis (creature|permanent|artifact|enchantment)\b.*\bgain[s]?\b', re.compile(r'sacrifice.*:.*\bthis (creature|permanent|artifact|enchantment)\b.*\bgain[s]?\b', re.IGNORECASE),
r'pay.*life.*:.*\bthis (creature|permanent|artifact|enchantment)\b.*\bgain[s]?\b', re.compile(r'pay.*life.*:.*\bthis (creature|permanent|artifact|enchantment)\b.*\bgain[s]?\b', re.IGNORECASE),
# Triggered abilities that grant to self only
r'whenever.*\b(this creature|this permanent|it)\b.*\bgain[s]?\b.*\b(hexproof|shroud|indestructible|ward|protection)\b', re.compile(r'whenever.*\b(this creature|this permanent|it)\b.*\bgain[s]?\b.*\b(hexproof|shroud|indestructible|ward|protection)\b', re.IGNORECASE),
r'whenever you (cast|play|attack|cycle|discard|commit).*\b(this creature|this permanent|it)\b.*\bgain[s]?\b.*\b(hexproof|shroud|indestructible|ward|protection)\b', re.compile(r'whenever you (cast|play|attack|cycle|discard|commit).*\b(this creature|this permanent|it)\b.*\bgain[s]?\b.*\b(hexproof|shroud|indestructible|ward|protection)\b', re.IGNORECASE),
r'at the beginning.*\b(this creature|this permanent|it)\b.*\bgain[s]?\b.*\b(hexproof|shroud|indestructible|ward|protection)\b', re.compile(r'at the beginning.*\b(this creature|this permanent|it)\b.*\bgain[s]?\b.*\b(hexproof|shroud|indestructible|ward|protection)\b', re.IGNORECASE),
r'whenever.*\b(this creature|this permanent)\b (attacks|enters|becomes).*\b(this creature|this permanent|it)\b.*\bgain[s]?\b', re.compile(r'whenever.*\b(this creature|this permanent)\b (attacks|enters|becomes).*\b(this creature|this permanent|it)\b.*\bgain[s]?\b', re.IGNORECASE),
# Named self-references (e.g., "Pristine Skywise gains")
r'whenever you cast.*[A-Z][a-z]+.*gains.*\b(hexproof|shroud|indestructible|ward|protection)\b', re.compile(r'whenever you cast.*[A-Z][a-z]+.*gains.*\b(hexproof|shroud|indestructible|ward|protection)\b', re.IGNORECASE),
r'whenever you.*[A-Z][a-z]+.*gains.*\b(hexproof|shroud|indestructible|ward|protection)\b', re.compile(r'whenever you.*[A-Z][a-z]+.*gains.*\b(hexproof|shroud|indestructible|ward|protection)\b', re.IGNORECASE),
# Static conditional abilities (as long as, if you control X)
r'as long as.*\b(this creature|this permanent|it|has)\b.*(has|gains?).*\b(hexproof|shroud|indestructible|ward|protection)\b', re.compile(r'as long as.*\b(this creature|this permanent|it|has)\b.*(has|gains?).*\b(hexproof|shroud|indestructible|ward|protection)\b', re.IGNORECASE),
]
# Mass grant patterns - affects multiple creatures YOU control
MASS_GRANT_PATTERNS = [ # Pre-compiled at module load for performance
r'creatures you control (have|gain|get)', MASS_GRANT_PATTERNS: List[Pattern] = [
r'other .* you control (have|gain|get)', re.compile(r'creatures you control (have|gain|get)', re.IGNORECASE),
r'(artifacts?|enchantments?|permanents?) you control (have|gain|get)', # Artifacts you control have... re.compile(r'other .* you control (have|gain|get)', re.IGNORECASE),
r'other (creatures?|artifacts?|enchantments?) (have|gain|get)', # Other creatures have... re.compile(r'(artifacts?|enchantments?|permanents?) you control (have|gain|get)', re.IGNORECASE), # Artifacts you control have...
r'all (creatures?|slivers?|permanents?) (have|gain|get)', # All creatures/slivers have... re.compile(r'other (creatures?|artifacts?|enchantments?) (have|gain|get)', re.IGNORECASE), # Other creatures have...
re.compile(r'all (creatures?|slivers?|permanents?) (have|gain|get)', re.IGNORECASE), # All creatures/slivers have...
]
# Targeted grant patterns - must specify "you control"
TARGETED_GRANT_PATTERNS = [ # Pre-compiled at module load for performance
r'target .* you control (gains?|gets?|has)', TARGETED_GRANT_PATTERNS: List[Pattern] = [
r'equipped creature (gains?|gets?|has)', re.compile(r'target .* you control (gains?|gets?|has)', re.IGNORECASE),
r'enchanted creature (gains?|gets?|has)', re.compile(r'equipped creature (gains?|gets?|has)', re.IGNORECASE),
re.compile(r'enchanted creature (gains?|gets?|has)', re.IGNORECASE),
]
# Exclusion patterns - cards that remove or prevent protection
EXCLUSION_PATTERNS = [ # Pre-compiled at module load for performance
r"can't have (hexproof|indestructible|ward|shroud)", EXCLUSION_PATTERNS: List[Pattern] = [
r"lose[s]? (hexproof|indestructible|ward|shroud|protection)", re.compile(r"can't have (hexproof|indestructible|ward|shroud)", re.IGNORECASE),
r"without (hexproof|indestructible|ward|shroud)", re.compile(r"lose[s]? (hexproof|indestructible|ward|shroud|protection)", re.IGNORECASE),
r"protection from.*can't", re.compile(r"without (hexproof|indestructible|ward|shroud)", re.IGNORECASE),
re.compile(r"protection from.*can't", re.IGNORECASE),
]
# Opponent grant patterns - grants to opponent's permanents (EXCLUDE these)
# NOTE: "all creatures" and "all permanents" are BLANKET effects (help you too),
# not opponent grants. Only exclude effects that ONLY help opponents.
OPPONENT_GRANT_PATTERNS = [ # Pre-compiled at module load for performance
r'target opponent', OPPONENT_GRANT_PATTERNS: List[Pattern] = [
r'each opponent', rgx.TARGET_OPPONENT,
r'opponents? control', # creatures your opponents control rgx.EACH_OPPONENT,
r'opponent.*permanents?.*have', # opponent's permanents have rgx.OPPONENT_CONTROL,
re.compile(r'opponent.*permanents?.*have', re.IGNORECASE), # opponent's permanents have
]
# Blanket grant patterns - affects all permanents regardless of controller
# These are VALID protection grants that should be tagged (Blanket scope in M5)
BLANKET_GRANT_PATTERNS = [ # Pre-compiled at module load for performance
r'\ball creatures? (have|gain|get)\b', # All creatures gain hexproof BLANKET_GRANT_PATTERNS: List[Pattern] = [
r'\ball permanents? (have|gain|get)\b', # All permanents gain indestructible re.compile(r'\ball creatures? (have|gain|get)\b', re.IGNORECASE), # All creatures gain hexproof
r'\beach creature (has|gains?|gets?)\b', # Each creature gains ward re.compile(r'\ball permanents? (have|gain|get)\b', re.IGNORECASE), # All permanents gain indestructible
r'\beach player\b', # Each player gains hexproof (very rare but valid blanket) re.compile(r'\beach creature (has|gains?|gets?)\b', re.IGNORECASE), # Each creature gains ward
rgx.EACH_PLAYER, # Each player gains hexproof (very rare but valid blanket)
]
# Kindred-specific grant patterns for metadata tagging
@ -178,16 +187,6 @@ KINDRED_GRANT_PATTERNS = {
],
}
# Protection keyword patterns for inherent check
PROTECTION_KEYWORDS = {
'hexproof',
'shroud',
'indestructible',
'ward',
'protection from',
'protection',
}
def get_kindred_protection_tags(text: str) -> Set[str]:
"""
@ -207,9 +206,6 @@ def get_kindred_protection_tags(text: str) -> Set[str]:
if not text:
return set()
# Initialize pre-compiled patterns if needed
_init_kindred_patterns()
text_lower = text.lower()
tags = set()
@ -217,13 +213,11 @@ def get_kindred_protection_tags(text: str) -> Set[str]:
protective_abilities = ['hexproof', 'shroud', 'indestructible', 'ward', 'protection']
if not any(keyword in text_lower for keyword in protective_abilities):
return tags
# Check predefined patterns (specific kindred types we track)
for tag_base, patterns in KINDRED_GRANT_PATTERNS.items():
for pattern in patterns:
match = re.search(pattern, text_lower, re.IGNORECASE) pattern_compiled = re.compile(pattern, re.IGNORECASE) if isinstance(pattern, str) else pattern
match = pattern_compiled.search(text_lower)
if match:
# Extract creature type from tag_base (e.g., "Knights" from "Knights Gain Protection")
creature_type = tag_base.split(' Gain ')[0]
# Get the matched text to check which abilities are in this specific grant
matched_text = match.group(0)
@ -244,7 +238,6 @@ def get_kindred_protection_tags(text: str) -> Set[str]:
for compiled_pattern, tag_template in KINDRED_PATTERNS:
match = compiled_pattern.search(text_lower)
if match:
# Extract creature type from tag_template (e.g., "Knights" from "Knights Gain Protection")
creature_type = tag_template.split(' Gain ')[0]
# Get the matched text to check which abilities are in this specific grant
matched_text = match.group(0)
@ -278,18 +271,16 @@ def is_opponent_grant(text: str) -> bool:
# Remove reminder text (in parentheses) to avoid false positives
# Reminder text often mentions "opponents control" for hexproof/shroud explanations
text_no_reminder = re.sub(r'\([^)]*\)', '', text_lower) text_no_reminder = tag_utils.strip_reminder_text(text_lower)
# Check for opponent-specific grant patterns in the main text (not reminder)
for pattern in OPPONENT_GRANT_PATTERNS:
match = re.search(pattern, text_no_reminder, re.IGNORECASE) match = pattern.search(text_no_reminder)
if match:
# Must be in context of granting protection
if any(prot in text_lower for prot in ['hexproof', 'shroud', 'indestructible', 'ward', 'protection']):
# Check the context around the match context = tag_utils.extract_context_window(
context_start = max(0, match.start() - 30) text_no_reminder, match.start(), match.end(),
context_end = min(len(text_no_reminder), match.end() + 70) window_size=CONTEXT_WINDOW_SIZE, include_before=True
context = text_no_reminder[context_start:context_end] )
# If "you control" appears in the context, it's limiting to YOUR permanents, not opponents
if 'you control' not in context:
@ -307,10 +298,8 @@ def has_conditional_self_grant(text: str) -> bool:
return False
text_lower = text.lower()
# Check for conditional self-grant patterns (activated/triggered abilities)
for pattern in CONDITIONAL_SELF_GRANT_PATTERNS:
if re.search(pattern, text_lower, re.IGNORECASE): if pattern.search(text_lower):
return True
return False
@ -331,30 +320,121 @@ def is_conditional_self_grant(text: str) -> bool:
return False
text_lower = text.lower()
# Check if it has conditional self-grant patterns
found_conditional_self = has_conditional_self_grant(text)
if not found_conditional_self:
return False
# If we found a conditional self-grant, check if there's ALSO a grant to others
# Look for patterns that grant to creatures besides itself other_grant_patterns = [
has_other_grant = any(re.search(pattern, text_lower, re.IGNORECASE) for pattern in [ rgx.OTHER_CREATURES,
r'other creatures', re.compile(r'creatures you control (have|gain)', re.IGNORECASE),
r'creatures you control (have|gain)', re.compile(r'target (creature|permanent) you control gains', re.IGNORECASE),
r'target (creature|permanent) you control gains', re.compile(r'another target (creature|permanent)', re.IGNORECASE),
r'another target (creature|permanent)', re.compile(r'equipped creature (has|gains)', re.IGNORECASE),
r'equipped creature (has|gains)', re.compile(r'enchanted creature (has|gains)', re.IGNORECASE),
r'enchanted creature (has|gains)', re.compile(r'target legendary', re.IGNORECASE),
r'target legendary', re.compile(r'permanents you control gain', re.IGNORECASE),
r'permanents you control gain', ]
]) has_other_grant = any(pattern.search(text_lower) for pattern in other_grant_patterns)
# Return True only if it's ONLY conditional self-grants (no other grants)
return not has_other_grant
def _should_exclude_token_creation(text_lower: str) -> bool:
"""Check if card only creates tokens with protection (not granting to existing permanents).
Args:
text_lower: Lowercased card text
Returns:
True if card only creates tokens, False if it also grants
"""
token_with_protection = re.compile(r'create.*token.*with.*(hexproof|shroud|indestructible|ward|protection)', re.IGNORECASE)
if token_with_protection.search(text_lower):
has_grant_to_others = any(pattern.search(text_lower) for pattern in MASS_GRANT_PATTERNS)
return not has_grant_to_others
return False
def _should_exclude_kindred_only(text: str, text_lower: str, exclude_kindred: bool) -> bool:
"""Check if card only grants to specific kindred types.
Args:
text: Original card text
text_lower: Lowercased card text
exclude_kindred: Whether to exclude kindred-specific grants
Returns:
True if card only has kindred grants, False if it has broad grants
"""
if not exclude_kindred:
return False
kindred_tags = get_kindred_protection_tags(text)
if not kindred_tags:
return False
broad_only_patterns = [
re.compile(r'\bcreatures you control (have|gain)\b(?!.*(knight|merfolk|zombie|elf|dragon|goblin|sliver))', re.IGNORECASE),
re.compile(r'\bpermanents you control (have|gain)\b', re.IGNORECASE),
re.compile(r'\beach (creature|permanent) you control', re.IGNORECASE),
re.compile(r'\ball (creatures?|permanents?)', re.IGNORECASE),
]
has_broad_grant = any(pattern.search(text_lower) for pattern in broad_only_patterns)
return not has_broad_grant
+def _check_pattern_grants(text_lower: str, pattern_list: List[Pattern]) -> bool:
+    """Check if text contains protection grants matching pattern list.
+    Args:
+        text_lower: Lowercased card text
+        pattern_list: List of grant patterns to check
+    Returns:
+        True if protection grant found, False otherwise
+    """
+    for pattern in pattern_list:
+        match = pattern.search(text_lower)
+        if match:
+            context = tag_utils.extract_context_window(text_lower, match.start(), match.end())
+            if any(prot in context for prot in PROTECTION_KEYWORDS):
+                return True
+    return False
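`tag_utils.extract_context_window` is not shown in this diff. Assuming it mirrors the inline logic it replaces (a window from the match start to 70 characters past the match end), the check can be sketched as follows. The stand-in function, its `after` parameter, the keyword list contents, and the sample pattern are all assumptions made for illustration.

```python
import re
from typing import List

# Assumed contents; the real PROTECTION_KEYWORDS constant is defined elsewhere.
PROTECTION_KEYWORDS = ['hexproof', 'shroud', 'indestructible', 'ward', 'protection']


def extract_context_window(text: str, start: int, end: int, after: int = 70) -> str:
    """Hypothetical stand-in: slice from the match start to `after` chars past its end."""
    return text[start:min(len(text), end + after)]


def check_pattern_grants(text_lower: str, pattern_list: List[re.Pattern]) -> bool:
    """Return True if any pattern matches with a protection keyword inside its window."""
    for pattern in pattern_list:
        match = pattern.search(text_lower)
        if match:
            context = extract_context_window(text_lower, match.start(), match.end())
            if any(prot in context for prot in PROTECTION_KEYWORDS):
                return True
    return False


# Illustrative pattern list, not the project's MASS_GRANT_PATTERNS.
sample_patterns = [re.compile(r'creatures you control (have|gain)', re.IGNORECASE)]

near = check_pattern_grants("creatures you control gain hexproof until end of turn.", sample_patterns)
# The keyword sits more than 70 characters past the match, so it falls outside the window.
far = check_pattern_grants("creatures you control gain flying. " + "x" * 80 + " hexproof", sample_patterns)
```

The window keeps a grant phrase and a protection keyword from being paired when they merely appear in the same card text but far apart.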
+def _has_inherent_protection_only(text_lower: str, keywords: str, found_grant: bool) -> bool:
+    """Check if card only has inherent protection without granting.
+    Args:
+        text_lower: Lowercased card text
+        keywords: Card keywords
+        found_grant: Whether a grant pattern was found
+    Returns:
+        True if card only has inherent protection, False otherwise
+    """
+    if not keywords:
+        return False
+    keywords_lower = keywords.lower()
+    has_inherent = any(k in keywords_lower for k in PROTECTION_KEYWORDS)
+    if not has_inherent or found_grant:
+        return False
+    stat_only_pattern = re.compile(r'(get[s]?|gain[s]?)\s+[+\-][0-9X]+/[+\-][0-9X]+', re.IGNORECASE)
+    has_stat_only = bool(stat_only_pattern.search(text_lower))
+    mentions_other_without_prot = False
+    if 'other' in text_lower:
+        other_idx = text_lower.find('other')
+        remaining_text = text_lower[other_idx:]
+        mentions_other_without_prot = not any(prot in remaining_text for prot in PROTECTION_KEYWORDS)
+    return has_stat_only or mentions_other_without_prot
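The stat-only check hinges on one regex that recognizes power/toughness modifiers such as `+1/+1` or `-X/-X`. A small sketch of what it does and does not match, using the pattern from `_has_inherent_protection_only` with made-up sample texts:

```python
import re

# Pattern copied from _has_inherent_protection_only; sample strings are illustrative.
STAT_ONLY = re.compile(r'(get[s]?|gain[s]?)\s+[+\-][0-9X]+/[+\-][0-9X]+', re.IGNORECASE)

# A plain stat buff matches.
buff = bool(STAT_ONLY.search("other creatures you control get +1/+1."))
# IGNORECASE lets the X in the character class match a lowercased "x".
variable_debuff = bool(STAT_ONLY.search("target creature gets -x/-x until end of turn."))
# A keyword grant has no power/toughness modifier, so it does not match.
keyword_grant = bool(STAT_ONLY.search("target creature gains hexproof."))
```

Because the function receives lowercased text, the `re.IGNORECASE` flag is what allows `-x/-x` to match the uppercase `X` in the pattern.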
 def is_granting_protection(text: str, keywords: str, exclude_kindred: bool = False) -> bool:
     """
     Determine if a card grants protection effects to other permanents.
@@ -381,117 +461,32 @@ def is_granting_protection(text: str, keywords: str, exclude_kindred: bool = Fal
     text_lower = text.lower()
-    # EXCLUDE: Opponent grants
+    # Early exclusion checks
     if is_opponent_grant(text):
         return False
-    # EXCLUDE: Conditional self-grants only
     if is_conditional_self_grant(text):
         return False
-    # EXCLUDE: Cards that remove protection
-    for pattern in EXCLUSION_PATTERNS:
-        if re.search(pattern, text_lower, re.IGNORECASE):
-            return False
-    # EXCLUDE: Token creation with protection (not granting to existing permanents)
-    if re.search(r'create.*token.*with.*(hexproof|shroud|indestructible|ward|protection)', text_lower, re.IGNORECASE):
-        # Check if there's ALSO granting to other permanents
-        has_grant_to_others = any(re.search(pattern, text_lower, re.IGNORECASE) for pattern in MASS_GRANT_PATTERNS)
-        if not has_grant_to_others:
-            return False
-    # EXCLUDE: Kindred-specific grants if requested
-    if exclude_kindred:
-        kindred_tags = get_kindred_protection_tags(text)
-        if kindred_tags:
-            # If we detected kindred tags, check if there's ALSO a non-kindred grant
-            # Look for grant patterns that explicitly grant to ALL creatures/permanents broadly
-            has_broad_grant = False
-            # Patterns that indicate truly broad grants (not type-specific)
-            broad_only_patterns = [
-                r'\bcreatures you control (have|gain)\b(?!.*(knight|merfolk|zombie|elf|dragon|goblin|sliver))',  # Only if not followed by type
-                r'\bpermanents you control (have|gain)\b',
-                r'\beach (creature|permanent) you control',
-                r'\ball (creatures?|permanents?)',
-            ]
-            for pattern in broad_only_patterns:
-                if re.search(pattern, text_lower, re.IGNORECASE):
-                    has_broad_grant = True
-                    break
-            if not has_broad_grant:
-                return False  # Only kindred grants, exclude
-    # Check if card has inherent protection keywords
-    has_inherent = False
-    if keywords:
-        keywords_lower = keywords.lower()
-        has_inherent = any(k in keywords_lower for k in PROTECTION_KEYWORDS)
-    # Check for explicit grants with protection keywords
+    if any(pattern.search(text_lower) for pattern in EXCLUSION_PATTERNS):
+        return False
+    if _should_exclude_token_creation(text_lower):
+        return False
+    if _should_exclude_kindred_only(text, text_lower, exclude_kindred):
+        return False
     found_grant = False
-    # Blanket grant patterns (all creatures gain hexproof) - these are VALID grants
-    for pattern in BLANKET_GRANT_PATTERNS:
-        match = re.search(pattern, text_lower, re.IGNORECASE)
-        if match:
-            # Check if protection keyword appears nearby
-            context_start = match.start()
-            context_end = min(len(text_lower), match.end() + 70)
-            context = text_lower[context_start:context_end]
-            if any(prot in context for prot in PROTECTION_KEYWORDS):
-                found_grant = True
-                break
-    # Mass grant patterns (creatures you control have/gain)
-    if not found_grant:
-        for pattern in MASS_GRANT_PATTERNS:
-            match = re.search(pattern, text_lower, re.IGNORECASE)
-            if match:
-                # Check if protection keyword appears in the same sentence or nearby (within 70 chars AFTER the match)
-                # This ensures we're looking at "creatures you control HAVE hexproof" not just having both phrases
-                context_start = match.start()
-                context_end = min(len(text_lower), match.end() + 70)
-                context = text_lower[context_start:context_end]
-                if any(prot in context for prot in PROTECTION_KEYWORDS):
-                    found_grant = True
-                    break
-    # Targeted grant patterns (target creature gains)
-    if not found_grant:
-        for pattern in TARGETED_GRANT_PATTERNS:
-            match = re.search(pattern, text_lower, re.IGNORECASE)
-            if match:
-                # Check if protection keyword appears after the grant verb (within 70 chars)
-                context_start = match.start()
-                context_end = min(len(text_lower), match.end() + 70)
-                context = text_lower[context_start:context_end]
-                if any(prot in context for prot in PROTECTION_KEYWORDS):
-                    found_grant = True
-                    break
-    # Grant verb patterns (creature gains/gets hexproof)
-    if not found_grant:
-        for pattern in GRANT_VERB_PATTERNS:
-            if re.search(pattern, text_lower, re.IGNORECASE):
-                found_grant = True
-                break
-    # If we have inherent protection and the ONLY text is about stats (no grant words), exclude
-    if has_inherent and not found_grant:
-        # Check if text only talks about other stats (power/toughness, +X/+X)
-        has_stat_only = bool(re.search(r'(get[s]?|gain[s]?)\s+[+\-][0-9X]+/[+\-][0-9X]+', text_lower))
-        # Check if text mentions "other" without protection keywords
-        mentions_other_without_prot = 'other' in text_lower and not any(prot in text_lower for prot in PROTECTION_KEYWORDS if prot in text_lower[text_lower.find('other'):])
-        if has_stat_only or mentions_other_without_prot:
-            return False
+    if _check_pattern_grants(text_lower, BLANKET_GRANT_PATTERNS):
+        found_grant = True
+    elif _check_pattern_grants(text_lower, MASS_GRANT_PATTERNS):
+        found_grant = True
+    elif _check_pattern_grants(text_lower, TARGETED_GRANT_PATTERNS):
+        found_grant = True
+    elif any(pattern.search(text_lower) for pattern in GRANT_VERB_PATTERNS):
+        found_grant = True
+    if _has_inherent_protection_only(text_lower, keywords, found_grant):
+        return False
     return found_grant
@@ -516,25 +511,14 @@ def categorize_protection_card(name: str, text: str, keywords: str, card_type: s
         'Neither' - false positive
     """
     keywords_lower = keywords.lower() if keywords else ''
-    # Check for opponent grants first
     if is_opponent_grant(text):
         return 'Opponent'
-    # Check for conditional self-grants (ONLY self, no other grants)
     if is_conditional_self_grant(text):
         return 'ConditionalSelf'
-    # Check if it has conditional self-grant (may also have other grants)
     has_cond_self = has_conditional_self_grant(text)
-    # Check if it has inherent protection
     has_inherent = any(k in keywords_lower for k in PROTECTION_KEYWORDS)
-    # Check for kindred-specific grants
     kindred_tags = get_kindred_protection_tags(text)
     if kindred_tags and exclude_kindred:
-        # Check if there's ALSO a broad grant (excluding kindred)
         grants_broad = is_granting_protection(text, keywords, exclude_kindred=True)
         if grants_broad and has_inherent:
@@ -551,8 +535,6 @@ def categorize_protection_card(name: str, text: str, keywords: str, card_type: s
         else:
             # Only kindred grants, no inherent or broad
             return 'Kindred'
-    # Check if it grants protection broadly (not kindred-specific)
     grants_protection = is_granting_protection(text, keywords, exclude_kindred=exclude_kindred)
     # Categorize based on what it does


@@ -5,39 +5,99 @@ Detects the scope of protection effects (Self, Your Permanents, Blanket, Opponen
 to enable intelligent filtering in deck building.
 Part of M5: Protection Effect Granularity milestone.
+Refactored in M2: Create Scope Detection Utilities to use generic scope detection.
 """
+# Standard library imports
 import re
 from typing import Optional, Set
+# Local application imports
 from code.logging_util import get_logger
+from . import scope_detection_utils as scope_utils
+from .tag_constants import PROTECTION_ABILITIES
 logger = get_logger(__name__)
-# Protection abilities to detect
-PROTECTION_ABILITIES = [
-    'Protection',
-    'Ward',
-    'Hexproof',
-    'Shroud',
-    'Indestructible'
-]
+# Protection scope pattern definitions
+def _get_protection_scope_patterns(ability: str) -> scope_utils.ScopePatterns:
+    """
+    Build scope patterns for protection abilities.
+    Args:
+        ability: Ability keyword (e.g., "hexproof", "ward")
+    Returns:
+        ScopePatterns object with compiled patterns
+    """
+    ability_lower = ability.lower()
+    # Opponent patterns: grants protection TO opponent's permanents
+    # Note: Must distinguish from hexproof reminder text "opponents control [spells/abilities]"
+    opponent_patterns = [
+        re.compile(r'creatures?\s+(?:your\s+)?opponents?\s+control\s+(?:have|gain)', re.IGNORECASE),
+        re.compile(r'permanents?\s+(?:your\s+)?opponents?\s+control\s+(?:have|gain)', re.IGNORECASE),
+        re.compile(r'each\s+creature\s+an?\s+opponent\s+controls?\s+(?:has|gains?)', re.IGNORECASE),
+    ]
+    # Self-reference patterns
+    self_patterns = [
+        # Tilde (~) - strong self-reference indicator
+        re.compile(r'~\s+(?:has|gains?)\s+' + ability_lower, re.IGNORECASE),
+        re.compile(r'~\s+is\s+' + ability_lower, re.IGNORECASE),
+        # "this creature/permanent" pronouns
+        re.compile(r'this\s+(?:creature|permanent|artifact|enchantment)\s+(?:has|gains?)\s+' + ability_lower, re.IGNORECASE),
+        # Starts with ability (likely self)
+        re.compile(r'^(?:has|gains?)\s+' + ability_lower, re.IGNORECASE),
+    ]
+    # Your permanents patterns
+    your_patterns = [
+        re.compile(r'(?:other\s+)?(?:creatures?|permanents?|artifacts?|enchantments?)\s+you\s+control', re.IGNORECASE),
+        re.compile(r'your\s+(?:creatures?|permanents?|artifacts?|enchantments?)', re.IGNORECASE),
+        re.compile(r'each\s+(?:creature|permanent)\s+you\s+control', re.IGNORECASE),
+        re.compile(r'other\s+\w+s?\s+you\s+control', re.IGNORECASE),  # "Other Merfolk you control", etc.
+        # "Other X you control...have Y" pattern for static grants
+        re.compile(r'other\s+(?:\w+\s+)?(?:creatures?|permanents?)\s+you\s+control\s+(?:get\s+[^.]*\s+and\s+)?have\s+' + ability_lower, re.IGNORECASE),
+        re.compile(r'other\s+\w+s?\s+you\s+control\s+(?:get\s+[^.]*\s+and\s+)?have\s+' + ability_lower, re.IGNORECASE),  # "Other Knights you control...have"
+        re.compile(r'equipped\s+(?:creature|permanent)\s+(?:gets\s+[^.]*\s+and\s+)?(?:has|gains?)\s+(?:[^.]*\s+and\s+)?' + ability_lower, re.IGNORECASE),  # Equipment
+        re.compile(r'enchanted\s+(?:creature|permanent)\s+(?:gets\s+[^.]*\s+and\s+)?(?:has|gains?)\s+(?:[^.]*\s+and\s+)?' + ability_lower, re.IGNORECASE),  # Aura
+        re.compile(r'target\s+(?:\w+\s+)?(?:creature|permanent)\s+(?:gets\s+[^.]*\s+and\s+)?(?:gains?)\s+' + ability_lower, re.IGNORECASE),  # Target
+    ]
+    # Blanket patterns (no ownership qualifier)
+    # Note: Abilities can be listed with "and" (e.g., "gain hexproof and indestructible")
+    blanket_patterns = [
+        re.compile(r'all\s+(?:creatures?|permanents?)\s+(?:have|gain)\s+(?:[^.]*\s+and\s+)?' + ability_lower, re.IGNORECASE),
+        re.compile(r'each\s+(?:creature|permanent)\s+(?:has|gains?)\s+(?:[^.]*\s+and\s+)?' + ability_lower, re.IGNORECASE),
+        re.compile(r'(?:creatures?|permanents?)\s+(?:have|gain)\s+(?:[^.]*\s+and\s+)?' + ability_lower, re.IGNORECASE),
+    ]
+    return scope_utils.ScopePatterns(
+        opponent=opponent_patterns,
+        self_ref=self_patterns,
+        your_permanents=your_patterns,
+        blanket=blanket_patterns
+    )
-def detect_protection_scope(text: str, card_name: str, ability: str) -> Optional[str]:
+def detect_protection_scope(text: str, card_name: str, ability: str, keywords: Optional[str] = None) -> Optional[str]:
     """
     Detect the scope of a protection effect.
     Detection priority order (prevents misclassification):
+    0. Static keyword "Self"
     1. Opponent ownership "Opponent Permanents"
-    2. Your ownership "Your Permanents"
-    3. Self-reference "Self"
+    2. Self-reference "Self"
+    3. Your ownership "Your Permanents"
     4. No ownership qualifier "Blanket"
     Args:
         text: Card text (lowercase for pattern matching)
         card_name: Card name (for self-reference detection)
         ability: Ability type (Ward, Hexproof, etc.)
+        keywords: Optional keywords field for static keyword detection
     Returns:
         Scope prefix or None: "Self", "Your Permanents", "Blanket", "Opponent Permanents"
@@ -45,120 +105,22 @@ def detect_protection_scope(text: str, card_name: str, ability: str) -> Optional
     if not text or not ability:
         return None
-    text_lower = text.lower()
-    ability_lower = ability.lower()
-    card_name_lower = card_name.lower()
-    # Check if ability is mentioned in text
-    if ability_lower not in text_lower:
-        return None
-    # Priority 1: Opponent ownership (grants protection TO opponent's permanents)
-    # Note: Must distinguish from hexproof reminder text "opponents control [spells/abilities]"
-    # Only match when "opponents control" refers to creatures/permanents, not spells
-    opponent_patterns = [
-        r'creatures?\s+(?:your\s+)?opponents?\s+control\s+(?:have|gain)',
-        r'permanents?\s+(?:your\s+)?opponents?\s+control\s+(?:have|gain)',
-        r'each\s+creature\s+an?\s+opponent\s+controls?\s+(?:has|gains?)'
-    ]
-    for pattern in opponent_patterns:
-        if re.search(pattern, text_lower):
-            return "Opponent Permanents"
-    # Priority 2: Check for self-reference BEFORE "Your Permanents"
-    # This prevents tilde (~) from being caught by creature type patterns
-    # Check for tilde (~) - strong self-reference indicator
-    tilde_patterns = [
-        r'~\s+(?:has|gains?)\s+' + ability_lower,
-        r'~\s+is\s+' + ability_lower
-    ]
-    for pattern in tilde_patterns:
-        if re.search(pattern, text_lower):
-            return "Self"
-    # Check for "this creature/permanent" pronouns
-    this_patterns = [
-        r'this\s+(?:creature|permanent|artifact|enchantment)\s+(?:has|gains?)\s+' + ability_lower,
-        r'^(?:has|gains?)\s+' + ability_lower  # Starts with ability (likely self)
-    ]
-    for pattern in this_patterns:
-        if re.search(pattern, text_lower):
-            return "Self"
-    # Check for card name (replace special characters for matching)
-    card_name_escaped = re.escape(card_name_lower)
-    if re.search(rf'\b{card_name_escaped}\b', text_lower):
-        # Make sure it's in a self-protection context
-        # e.g., "Svyelun has indestructible" not "Svyelun and other Merfolk"
-        self_context_patterns = [
-            rf'\b{card_name_escaped}\s+(?:has|gains?)\s+{ability_lower}',
-            rf'\b{card_name_escaped}\s+is\s+{ability_lower}'
-        ]
-        for pattern in self_context_patterns:
-            if re.search(pattern, text_lower):
-                return "Self"
-    # NEW: If no grant patterns found at all, assume inherent protection (Self)
-    # This catches cards where protection is in the keywords field but not explained in text
-    # e.g., "Protection from creatures" as a keyword line
-    # Check if we have the ability keyword but no grant patterns
-    has_grant_pattern = any(re.search(pattern, text_lower) for pattern in [
-        r'(?:have|gain|grant|give|get)[s]?\s+',
-        r'other\s+',
-        r'creatures?\s+you\s+control',
-        r'permanents?\s+you\s+control',
-        r'equipped',
-        r'enchanted',
-        r'target'
-    ])
-    if not has_grant_pattern:
-        # No grant verbs found - likely inherent protection
-        return "Self"
-    # Priority 3: Your ownership (most common)
-    # Note: "Other [Type]" patterns included for type-specific grants
-    # Note: "equipped creature", "target creature", etc. are permanents you control
-    your_patterns = [
-        r'(?:other\s+)?(?:creatures?|permanents?|artifacts?|enchantments?)\s+you\s+control',
-        r'your\s+(?:creatures?|permanents?|artifacts?|enchantments?)',
-        r'each\s+(?:creature|permanent)\s+you\s+control',
-        r'other\s+\w+s?\s+you\s+control',  # "Other Merfolk you control", etc.
-        # NEW: "Other X you control...have Y" pattern for static grants
-        r'other\s+(?:\w+\s+)?(?:creatures?|permanents?)\s+you\s+control\s+(?:get\s+[^.]*\s+and\s+)?have\s+' + ability_lower,
-        r'other\s+\w+s?\s+you\s+control\s+(?:get\s+[^.]*\s+and\s+)?have\s+' + ability_lower,  # "Other Knights you control...have"
-        r'equipped\s+(?:creature|permanent)\s+(?:gets\s+[^.]*\s+and\s+)?(?:has|gains?)\s+(?:[^.]*\s+and\s+)?' + ability_lower,  # Equipment
-        r'enchanted\s+(?:creature|permanent)\s+(?:gets\s+[^.]*\s+and\s+)?(?:has|gains?)\s+(?:[^.]*\s+and\s+)?' + ability_lower,  # Aura
-        r'target\s+(?:\w+\s+)?(?:creature|permanent)\s+(?:gets\s+[^.]*\s+and\s+)?(?:gains?)\s+' + ability_lower  # Target (with optional adjective)
-    ]
-    for pattern in your_patterns:
-        if re.search(pattern, text_lower):
-            return "Your Permanents"
-    # Priority 4: Blanket (no ownership qualifier)
-    # Only apply if we have protection keyword but no ownership context
-    # Note: Abilities can be listed with "and" (e.g., "gain hexproof and indestructible")
-    blanket_patterns = [
-        r'all\s+(?:creatures?|permanents?)\s+(?:have|gain)\s+(?:[^.]*\s+and\s+)?' + ability_lower,
-        r'each\s+(?:creature|permanent)\s+(?:has|gains?)\s+(?:[^.]*\s+and\s+)?' + ability_lower,
-        r'(?:creatures?|permanents?)\s+(?:have|gain)\s+(?:[^.]*\s+and\s+)?' + ability_lower
-    ]
-    for pattern in blanket_patterns:
-        if re.search(pattern, text_lower):
-            # Double-check no ownership was missed
-            if 'you control' not in text_lower and 'opponent' not in text_lower:
-                return "Blanket"
-    return None
+    # Build patterns for this ability
+    patterns = _get_protection_scope_patterns(ability)
+    # Use generic scope detection with grant verb checking AND keywords
+    return scope_utils.detect_scope(
+        text=text,
+        card_name=card_name,
+        ability_keyword=ability,
+        patterns=patterns,
+        allow_multiple=False,
+        check_grant_verbs=True,
+        keywords=keywords
+    )
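The body of `scope_utils.detect_scope` is not part of this diff, so its behavior can only be inferred from the docstring's priority list. The sketch below is a hypothetical first-match-wins classifier that follows priorities 1 through 4; it omits the static-keyword step and the grant-verb check. The `ScopePatterns` dataclass here only mirrors the field names used in the call above, and `first_matching_scope` is an invented name.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ScopePatterns:
    """Hypothetical container; field names mirror those used in the diff."""
    opponent: List[re.Pattern]
    self_ref: List[re.Pattern]
    your_permanents: List[re.Pattern]
    blanket: List[re.Pattern]


def first_matching_scope(text: str, patterns: ScopePatterns) -> Optional[str]:
    """Return the label of the first pattern group that matches, in priority order."""
    ordered = [
        ("Opponent Permanents", patterns.opponent),
        ("Self", patterns.self_ref),
        ("Your Permanents", patterns.your_permanents),
        ("Blanket", patterns.blanket),
    ]
    for label, group in ordered:
        if any(p.search(text) for p in group):
            return label
    return None


# A reduced pattern set for "hexproof", taken from the patterns in the diff.
hexproof = ScopePatterns(
    opponent=[re.compile(r'creatures?\s+(?:your\s+)?opponents?\s+control\s+(?:have|gain)', re.IGNORECASE)],
    self_ref=[re.compile(r'~\s+(?:has|gains?)\s+hexproof', re.IGNORECASE)],
    your_permanents=[re.compile(r'each\s+(?:creature|permanent)\s+you\s+control', re.IGNORECASE)],
    blanket=[re.compile(r'all\s+(?:creatures?|permanents?)\s+(?:have|gain)\s+(?:[^.]*\s+and\s+)?hexproof', re.IGNORECASE)],
)

opponent_scope = first_matching_scope("creatures your opponents control have hexproof.", hexproof)
self_scope = first_matching_scope("~ has hexproof.", hexproof)
your_scope = first_matching_scope("each creature you control gains hexproof.", hexproof)
blanket_scope = first_matching_scope("all creatures gain hexproof.", hexproof)
no_scope = first_matching_scope("draw a card.", hexproof)
```

Ordering the groups in one list keeps the priority explicit, which is the property the docstring calls out as preventing misclassification.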
-def get_protection_scope_tags(text: str, card_name: str) -> Set[str]:
+def get_protection_scope_tags(text: str, card_name: str, keywords: Optional[str] = None) -> Set[str]:
     """
     Get all protection scope metadata tags for a card.
@@ -167,6 +129,7 @@ def get_protection_scope_tags(text: str, card_name: str) -> Set[str]:
     Args:
         text: Card text
         card_name: Card name
+        keywords: Optional keywords field for static keyword detection
     Returns:
         Set of metadata tags like {"Self: Indestructible", "Your Permanents: Ward"}
@@ -178,7 +141,7 @@ def get_protection_scope_tags(text: str, card_name: str) -> Set[str]:
     # Check each protection ability
     for ability in PROTECTION_ABILITIES:
-        scope = detect_protection_scope(text, card_name, ability)
+        scope = detect_protection_scope(text, card_name, ability, keywords)
         if scope:
             # Format: "{Scope}: {Ability}"


@@ -0,0 +1,455 @@
"""
Centralized regex patterns for MTG card tagging.
All patterns compiled with re.IGNORECASE for case-insensitive matching.
Organized by semantic category for maintainability and reusability.
Usage:
    from code.tagging import regex_patterns as rgx
    mask = df['text'].str.contains(rgx.YOU_CONTROL, na=False)
    if rgx.GRANT_HEXPROOF.search(text):
        ...
    # Or use builder functions
    pattern = rgx.ownership_pattern('creature', 'you')
    mask = df['text'].str.contains(pattern, na=False)
"""
import re
from typing import Pattern, List
# =============================================================================
# OWNERSHIP & CONTROLLER PATTERNS
# =============================================================================
YOU_CONTROL: Pattern = re.compile(r'you control', re.IGNORECASE)
THEY_CONTROL: Pattern = re.compile(r'they control', re.IGNORECASE)
OPPONENT_CONTROL: Pattern = re.compile(r'opponent[s]? control', re.IGNORECASE)
CREATURE_YOU_CONTROL: Pattern = re.compile(r'creature[s]? you control', re.IGNORECASE)
PERMANENT_YOU_CONTROL: Pattern = re.compile(r'permanent[s]? you control', re.IGNORECASE)
ARTIFACT_YOU_CONTROL: Pattern = re.compile(r'artifact[s]? you control', re.IGNORECASE)
ENCHANTMENT_YOU_CONTROL: Pattern = re.compile(r'enchantment[s]? you control', re.IGNORECASE)
# =============================================================================
# GRANT VERB PATTERNS
# =============================================================================
GAIN: Pattern = re.compile(r'\bgain[s]?\b', re.IGNORECASE)
HAS: Pattern = re.compile(r'\bhas\b', re.IGNORECASE)
HAVE: Pattern = re.compile(r'\bhave\b', re.IGNORECASE)
GET: Pattern = re.compile(r'\bget[s]?\b', re.IGNORECASE)
GRANT_VERBS: List[str] = ['gain', 'gains', 'has', 'have', 'get', 'gets']
# =============================================================================
# TARGETING PATTERNS
# =============================================================================
TARGET_PLAYER: Pattern = re.compile(r'target player', re.IGNORECASE)
TARGET_OPPONENT: Pattern = re.compile(r'target opponent', re.IGNORECASE)
TARGET_CREATURE: Pattern = re.compile(r'target creature', re.IGNORECASE)
TARGET_PERMANENT: Pattern = re.compile(r'target permanent', re.IGNORECASE)
TARGET_ARTIFACT: Pattern = re.compile(r'target artifact', re.IGNORECASE)
TARGET_ENCHANTMENT: Pattern = re.compile(r'target enchantment', re.IGNORECASE)
EACH_PLAYER: Pattern = re.compile(r'each player', re.IGNORECASE)
EACH_OPPONENT: Pattern = re.compile(r'each opponent', re.IGNORECASE)
TARGET_YOU_CONTROL: Pattern = re.compile(r'target .* you control', re.IGNORECASE)
# =============================================================================
# PROTECTION ABILITY PATTERNS
# =============================================================================
HEXPROOF: Pattern = re.compile(r'\bhexproof\b', re.IGNORECASE)
SHROUD: Pattern = re.compile(r'\bshroud\b', re.IGNORECASE)
INDESTRUCTIBLE: Pattern = re.compile(r'\bindestructible\b', re.IGNORECASE)
WARD: Pattern = re.compile(r'\bward\b', re.IGNORECASE)
PROTECTION_FROM: Pattern = re.compile(r'protection from', re.IGNORECASE)
PROTECTION_ABILITIES: List[str] = ['hexproof', 'shroud', 'indestructible', 'ward', 'protection']
CANT_HAVE_PROTECTION: Pattern = re.compile(r"can't have (hexproof|indestructible|ward|shroud)", re.IGNORECASE)
LOSE_PROTECTION: Pattern = re.compile(r"lose[s]? (hexproof|indestructible|ward|shroud|protection)", re.IGNORECASE)
# =============================================================================
# CARD DRAW PATTERNS
# =============================================================================
DRAW_A_CARD: Pattern = re.compile(r'draw[s]? (?:a|one) card', re.IGNORECASE)
DRAW_CARDS: Pattern = re.compile(r'draw[s]? (?:two|three|four|five|x|\d+) card', re.IGNORECASE)
DRAW: Pattern = re.compile(r'\bdraw[s]?\b', re.IGNORECASE)
# =============================================================================
# TOKEN CREATION PATTERNS
# =============================================================================
CREATE_TOKEN: Pattern = re.compile(r'create[s]?.*token', re.IGNORECASE)
PUT_TOKEN: Pattern = re.compile(r'put[s]?.*token', re.IGNORECASE)
CREATE_TREASURE: Pattern = re.compile(r'create.*treasure token', re.IGNORECASE)
CREATE_FOOD: Pattern = re.compile(r'create.*food token', re.IGNORECASE)
CREATE_CLUE: Pattern = re.compile(r'create.*clue token', re.IGNORECASE)
CREATE_BLOOD: Pattern = re.compile(r'create.*blood token', re.IGNORECASE)
# =============================================================================
# COUNTER PATTERNS
# =============================================================================
PLUS_ONE_COUNTER: Pattern = re.compile(r'\+1/\+1 counter', re.IGNORECASE)
MINUS_ONE_COUNTER: Pattern = re.compile(r'\-1/\-1 counter', re.IGNORECASE)
LOYALTY_COUNTER: Pattern = re.compile(r'loyalty counter', re.IGNORECASE)
PROLIFERATE: Pattern = re.compile(r'\bproliferate\b', re.IGNORECASE)
ONE_OR_MORE_COUNTERS: Pattern = re.compile(r'one or more counter', re.IGNORECASE)
ONE_OR_MORE_PLUS_ONE_COUNTERS: Pattern = re.compile(r'one or more \+1/\+1 counter', re.IGNORECASE)
IF_HAD_COUNTERS: Pattern = re.compile(r'if it had counter', re.IGNORECASE)
WITH_COUNTERS_ON_THEM: Pattern = re.compile(r'with counter[s]? on them', re.IGNORECASE)
# =============================================================================
# SACRIFICE & REMOVAL PATTERNS
# =============================================================================
SACRIFICE: Pattern = re.compile(r'sacrifice[s]?', re.IGNORECASE)
SACRIFICED: Pattern = re.compile(r'sacrificed', re.IGNORECASE)
DESTROY: Pattern = re.compile(r'destroy[s]?', re.IGNORECASE)
EXILE: Pattern = re.compile(r'exile[s]?', re.IGNORECASE)
EXILED: Pattern = re.compile(r'exiled', re.IGNORECASE)
SACRIFICE_DRAW: Pattern = re.compile(r'sacrifice (?:a|an) (?:artifact|creature|permanent)(?:[^,]*),?[^,]*draw', re.IGNORECASE)
SACRIFICE_COLON_DRAW: Pattern = re.compile(r'sacrifice [^:]+: draw', re.IGNORECASE)
SACRIFICED_COMMA_DRAW: Pattern = re.compile(r'sacrificed[^,]+, draw', re.IGNORECASE)
EXILE_RETURN_BATTLEFIELD: Pattern = re.compile(r'exile.*return.*to the battlefield', re.IGNORECASE)
# =============================================================================
# DISCARD PATTERNS
# =============================================================================
DISCARD_A_CARD: Pattern = re.compile(r'discard (?:a|one|two|three|x) card', re.IGNORECASE)
DISCARD_YOUR_HAND: Pattern = re.compile(r'discard your hand', re.IGNORECASE)
YOU_DISCARD: Pattern = re.compile(r'you discard', re.IGNORECASE)
# Discard triggers
WHENEVER_YOU_DISCARD: Pattern = re.compile(r'whenever you discard', re.IGNORECASE)
IF_YOU_DISCARDED: Pattern = re.compile(r'if you discarded', re.IGNORECASE)
WHEN_YOU_DISCARD: Pattern = re.compile(r'when you discard', re.IGNORECASE)
FOR_EACH_DISCARDED: Pattern = re.compile(r'for each card you discarded', re.IGNORECASE)
# Opponent discard
TARGET_PLAYER_DISCARDS: Pattern = re.compile(r'target player discards', re.IGNORECASE)
TARGET_OPPONENT_DISCARDS: Pattern = re.compile(r'target opponent discards', re.IGNORECASE)
EACH_PLAYER_DISCARDS: Pattern = re.compile(r'each player discards', re.IGNORECASE)
EACH_OPPONENT_DISCARDS: Pattern = re.compile(r'each opponent discards', re.IGNORECASE)
THAT_PLAYER_DISCARDS: Pattern = re.compile(r'that player discards', re.IGNORECASE)
# Discard cost
ADDITIONAL_COST_DISCARD: Pattern = re.compile(r'as an additional cost to (?:cast this spell|activate this ability),? discard (?:a|one) card', re.IGNORECASE)
ADDITIONAL_COST_DISCARD_SHORT: Pattern = re.compile(r'as an additional cost,? discard (?:a|one) card', re.IGNORECASE)
MADNESS: Pattern = re.compile(r'\bmadness\b', re.IGNORECASE)
# =============================================================================
# DAMAGE & LIFE LOSS PATTERNS
# =============================================================================
DEALS_ONE_DAMAGE: Pattern = re.compile(r'deals\s+1\s+damage', re.IGNORECASE)
EXACTLY_ONE_DAMAGE: Pattern = re.compile(r'exactly\s+1\s+damage', re.IGNORECASE)
LOSES_ONE_LIFE: Pattern = re.compile(r'loses\s+1\s+life', re.IGNORECASE)
# =============================================================================
# COST REDUCTION PATTERNS
# =============================================================================
COST_LESS: Pattern = re.compile(r'cost[s]? \{[\d\w]\} less', re.IGNORECASE)
COST_LESS_TO_CAST: Pattern = re.compile(r'cost[s]? less to cast', re.IGNORECASE)
WITH_X_IN_COST: Pattern = re.compile(r'with \{[xX]\} in (?:its|their)', re.IGNORECASE)
AFFINITY_FOR: Pattern = re.compile(r'affinity for', re.IGNORECASE)
SPELLS_COST: Pattern = re.compile(r'spells cost', re.IGNORECASE)
SPELLS_YOU_CAST_COST: Pattern = re.compile(r'spells you cast cost', re.IGNORECASE)
# =============================================================================
# MONARCH & INITIATIVE PATTERNS
# =============================================================================
BECOME_MONARCH: Pattern = re.compile(r'becomes? the monarch', re.IGNORECASE)
IS_MONARCH: Pattern = re.compile(r'is the monarch', re.IGNORECASE)
WAS_MONARCH: Pattern = re.compile(r'was the monarch', re.IGNORECASE)
YOU_ARE_MONARCH: Pattern = re.compile(r"you are the monarch|you're the monarch", re.IGNORECASE)
YOU_BECOME_MONARCH: Pattern = re.compile(r'you become the monarch', re.IGNORECASE)
CANT_BECOME_MONARCH: Pattern = re.compile(r"can't become the monarch", re.IGNORECASE)
# =============================================================================
# KEYWORD ABILITY PATTERNS
# =============================================================================
PARTNER_BASIC: Pattern = re.compile(r'\bpartner\b(?!\s*(?:with|[-—–]))', re.IGNORECASE)
PARTNER_WITH: Pattern = re.compile(r'partner with', re.IGNORECASE)
PARTNER_SURVIVORS: Pattern = re.compile(r'Partner\s*[-—–]\s*Survivors', re.IGNORECASE)
PARTNER_FATHER_SON: Pattern = re.compile(r'Partner\s*[-—–]\s*Father\s*&\s*Son', re.IGNORECASE)
FLYING: Pattern = re.compile(r'\bflying\b', re.IGNORECASE)
VIGILANCE: Pattern = re.compile(r'\bvigilance\b', re.IGNORECASE)
TRAMPLE: Pattern = re.compile(r'\btrample\b', re.IGNORECASE)
HASTE: Pattern = re.compile(r'\bhaste\b', re.IGNORECASE)
LIFELINK: Pattern = re.compile(r'\blifelink\b', re.IGNORECASE)
DEATHTOUCH: Pattern = re.compile(r'\bdeathtouch\b', re.IGNORECASE)
DOUBLE_STRIKE: Pattern = re.compile(r'double strike', re.IGNORECASE)
FIRST_STRIKE: Pattern = re.compile(r'first strike', re.IGNORECASE)
MENACE: Pattern = re.compile(r'\bmenace\b', re.IGNORECASE)
REACH: Pattern = re.compile(r'\breach\b', re.IGNORECASE)
UNDYING: Pattern = re.compile(r'\bundying\b', re.IGNORECASE)
PERSIST: Pattern = re.compile(r'\bpersist\b', re.IGNORECASE)
PHASING: Pattern = re.compile(r'\bphasing\b', re.IGNORECASE)
FLASH: Pattern = re.compile(r'\bflash\b', re.IGNORECASE)
TOXIC: Pattern = re.compile(r'toxic\s*\d+', re.IGNORECASE)
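`PARTNER_BASIC` uses a negative lookahead so that plain "Partner" is kept distinct from "Partner with" and the dash-separated variants. The sketch below copies the two patterns from this module; the `partner_kind` helper and the sample strings are hypothetical, added only to show how the lookahead behaves.

```python
import re
from typing import Optional

# Patterns copied from the keyword ability section above.
PARTNER_BASIC = re.compile(r'\bpartner\b(?!\s*(?:with|[-—–]))', re.IGNORECASE)
PARTNER_WITH = re.compile(r'partner with', re.IGNORECASE)


def partner_kind(text: str) -> Optional[str]:
    """Hypothetical helper: classify which partner wording a text uses."""
    if PARTNER_WITH.search(text):
        return "partner with"
    if PARTNER_BASIC.search(text):
        return "partner"
    return None


plain = partner_kind("Flying\nPartner")
paired = partner_kind("Partner with Example Card")
# The dash variant is rejected by PARTNER_BASIC and is left to its own dedicated pattern.
dashed = partner_kind("Partner—Survivors")
```

Without the lookahead, `\bpartner\b` alone would match all three wordings, and the variants would need to be filtered out after the fact.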
# =============================================================================
# RETURN TO BATTLEFIELD PATTERNS
# =============================================================================
RETURN_TO_BATTLEFIELD: Pattern = re.compile(r'return.*to the battlefield', re.IGNORECASE)
RETURN_IT_TO_BATTLEFIELD: Pattern = re.compile(r'return it to the battlefield', re.IGNORECASE)
RETURN_THAT_CARD_TO_BATTLEFIELD: Pattern = re.compile(r'return that card to the battlefield', re.IGNORECASE)
RETURN_THEM_TO_BATTLEFIELD: Pattern = re.compile(r'return them to the battlefield', re.IGNORECASE)
RETURN_THOSE_CARDS_TO_BATTLEFIELD: Pattern = re.compile(r'return those cards to the battlefield', re.IGNORECASE)
RETURN_TO_HAND: Pattern = re.compile(r'return.*to.*hand', re.IGNORECASE)
RETURN_YOU_CONTROL_TO_HAND: Pattern = re.compile(r'return target.*you control.*to.*hand', re.IGNORECASE)
# =============================================================================
# SCOPE & QUALIFIER PATTERNS
# =============================================================================
OTHER_CREATURES: Pattern = re.compile(r'other creature[s]?', re.IGNORECASE)
ALL_CREATURES: Pattern = re.compile(r'\ball creature[s]?\b', re.IGNORECASE)
ALL_PERMANENTS: Pattern = re.compile(r'\ball permanent[s]?\b', re.IGNORECASE)
ALL_SLIVERS: Pattern = re.compile(r'\ball sliver[s]?\b', re.IGNORECASE)
EQUIPPED_CREATURE: Pattern = re.compile(r'equipped creature', re.IGNORECASE)
ENCHANTED_CREATURE: Pattern = re.compile(r'enchanted creature', re.IGNORECASE)
ENCHANTED_PERMANENT: Pattern = re.compile(r'enchanted permanent', re.IGNORECASE)
ENCHANTED_ENCHANTMENT: Pattern = re.compile(r'enchanted enchantment', re.IGNORECASE)
# =============================================================================
# COMBAT PATTERNS
# =============================================================================
ATTACK: Pattern = re.compile(r'\battack[s]?\b', re.IGNORECASE)
ATTACKS: Pattern = re.compile(r'\battacks\b', re.IGNORECASE)
BLOCK: Pattern = re.compile(r'\bblock[s]?\b', re.IGNORECASE)
BLOCKS: Pattern = re.compile(r'\bblocks\b', re.IGNORECASE)
COMBAT_DAMAGE: Pattern = re.compile(r'combat damage', re.IGNORECASE)
WHENEVER_ATTACKS: Pattern = re.compile(r'whenever .* attacks', re.IGNORECASE)
WHEN_ATTACKS: Pattern = re.compile(r'when .* attacks', re.IGNORECASE)
# =============================================================================
# TYPE LINE PATTERNS
# =============================================================================
INSTANT: Pattern = re.compile(r'\bInstant\b', re.IGNORECASE)
SORCERY: Pattern = re.compile(r'\bSorcery\b', re.IGNORECASE)
ARTIFACT: Pattern = re.compile(r'\bArtifact\b', re.IGNORECASE)
ENCHANTMENT: Pattern = re.compile(r'\bEnchantment\b', re.IGNORECASE)
CREATURE: Pattern = re.compile(r'\bCreature\b', re.IGNORECASE)
PLANESWALKER: Pattern = re.compile(r'\bPlaneswalker\b', re.IGNORECASE)
LAND: Pattern = re.compile(r'\bLand\b', re.IGNORECASE)
AURA: Pattern = re.compile(r'\bAura\b', re.IGNORECASE)
EQUIPMENT: Pattern = re.compile(r'\bEquipment\b', re.IGNORECASE)
VEHICLE: Pattern = re.compile(r'\bVehicle\b', re.IGNORECASE)
SAGA: Pattern = re.compile(r'\bSaga\b', re.IGNORECASE)
NONCREATURE: Pattern = re.compile(r'noncreature', re.IGNORECASE)
# =============================================================================
# PATTERN BUILDER FUNCTIONS
# =============================================================================
def ownership_pattern(subject: str, owner: str = "you") -> Pattern:
"""
Build ownership pattern like 'creatures you control', 'permanents opponent controls'.
Args:
subject: The card type (e.g., 'creature', 'permanent', 'artifact')
owner: Controller ('you', 'opponent', 'they', etc.)
Returns:
Compiled regex pattern
Examples:
>>> ownership_pattern('creature', 'you')
# Matches "creatures you control"
>>> ownership_pattern('artifact', 'opponent')
# Matches "artifacts opponent controls"
"""
pattern = fr'{subject}[s]?\s+{owner}\s+control[s]?'
return re.compile(pattern, re.IGNORECASE)
def grant_pattern(subject: str, verb: str, ability: str) -> Pattern:
"""
Build grant pattern like 'creatures you control gain hexproof'.
Args:
subject: What gains the ability ('creatures you control', 'target creature', etc.)
verb: Grant verb ('gain', 'has', 'get', etc.)
ability: Ability granted ('hexproof', 'flying', 'ward', etc.)
Returns:
Compiled regex pattern
Examples:
>>> grant_pattern('creatures you control', 'gain', 'hexproof')
# Matches "creatures you control gain hexproof"
"""
pattern = fr'{subject}\s+{verb}[s]?\s+{ability}'
return re.compile(pattern, re.IGNORECASE)
def token_creation_pattern(quantity: str, token_type: str) -> Pattern:
"""
Build token creation pattern like 'create two 1/1 Soldier tokens'.
Args:
quantity: Number word or variable ('one', 'two', 'x', etc.)
token_type: Token name ('treasure', 'food', 'soldier', etc.)
Returns:
Compiled regex pattern
Examples:
>>> token_creation_pattern('two', 'treasure')
# Matches "create two Treasure tokens"
"""
pattern = fr'create[s]?\s+(?:{quantity})\s+.*{token_type}\s+token'
return re.compile(pattern, re.IGNORECASE)
def kindred_grant_pattern(tribe: str, ability: str) -> Pattern:
"""
Build kindred grant pattern like 'knights you control gain protection'.
Args:
tribe: Creature type ('knight', 'elf', 'zombie', etc.)
ability: Ability granted ('hexproof', 'protection', etc.)
Returns:
Compiled regex pattern
Examples:
>>> kindred_grant_pattern('knight', 'hexproof')
# Matches "Knights you control gain hexproof"
"""
pattern = fr'{tribe}[s]?\s+you\s+control.*\b{ability}\b'
return re.compile(pattern, re.IGNORECASE)
def targeting_pattern(target: str, subject: str = None) -> Pattern:
"""
Build targeting pattern like 'target creature you control'.
Args:
target: What is targeted ('player', 'opponent', 'creature', etc.)
subject: Optional qualifier ('you control', 'opponent controls', etc.)
Returns:
Compiled regex pattern
Examples:
>>> targeting_pattern('creature', 'you control')
# Matches "target creature you control"
>>> targeting_pattern('opponent')
# Matches "target opponent"
"""
if subject:
pattern = fr'target\s+{target}\s+{subject}'
else:
pattern = fr'target\s+{target}'
return re.compile(pattern, re.IGNORECASE)
# =============================================================================
# MODULE EXPORTS
# =============================================================================
__all__ = [
# Ownership
'YOU_CONTROL', 'THEY_CONTROL', 'OPPONENT_CONTROL',
'CREATURE_YOU_CONTROL', 'PERMANENT_YOU_CONTROL', 'ARTIFACT_YOU_CONTROL',
'ENCHANTMENT_YOU_CONTROL',
# Grant verbs
'GAIN', 'HAS', 'HAVE', 'GET', 'GRANT_VERBS',
# Targeting
'TARGET_PLAYER', 'TARGET_OPPONENT', 'TARGET_CREATURE', 'TARGET_PERMANENT',
'TARGET_ARTIFACT', 'TARGET_ENCHANTMENT', 'EACH_PLAYER', 'EACH_OPPONENT',
'TARGET_YOU_CONTROL',
# Protection abilities
'HEXPROOF', 'SHROUD', 'INDESTRUCTIBLE', 'WARD', 'PROTECTION_FROM',
'PROTECTION_ABILITIES', 'CANT_HAVE_PROTECTION', 'LOSE_PROTECTION',
# Draw
'DRAW_A_CARD', 'DRAW_CARDS', 'DRAW',
# Tokens
'CREATE_TOKEN', 'PUT_TOKEN',
'CREATE_TREASURE', 'CREATE_FOOD', 'CREATE_CLUE', 'CREATE_BLOOD',
# Counters
'PLUS_ONE_COUNTER', 'MINUS_ONE_COUNTER', 'LOYALTY_COUNTER', 'PROLIFERATE',
'ONE_OR_MORE_COUNTERS', 'ONE_OR_MORE_PLUS_ONE_COUNTERS', 'IF_HAD_COUNTERS', 'WITH_COUNTERS_ON_THEM',
# Removal
'SACRIFICE', 'SACRIFICED', 'DESTROY', 'EXILE', 'EXILED',
'SACRIFICE_DRAW', 'SACRIFICE_COLON_DRAW', 'SACRIFICED_COMMA_DRAW',
'EXILE_RETURN_BATTLEFIELD',
# Discard
'DISCARD_A_CARD', 'DISCARD_YOUR_HAND', 'YOU_DISCARD',
'WHENEVER_YOU_DISCARD', 'IF_YOU_DISCARDED', 'WHEN_YOU_DISCARD', 'FOR_EACH_DISCARDED',
'TARGET_PLAYER_DISCARDS', 'TARGET_OPPONENT_DISCARDS', 'EACH_PLAYER_DISCARDS',
'EACH_OPPONENT_DISCARDS', 'THAT_PLAYER_DISCARDS',
'ADDITIONAL_COST_DISCARD', 'ADDITIONAL_COST_DISCARD_SHORT', 'MADNESS',
# Damage & Life Loss
'DEALS_ONE_DAMAGE', 'EXACTLY_ONE_DAMAGE', 'LOSES_ONE_LIFE',
# Cost reduction
'COST_LESS', 'COST_LESS_TO_CAST', 'WITH_X_IN_COST', 'AFFINITY_FOR', 'SPELLS_COST', 'SPELLS_YOU_CAST_COST',
# Monarch
'BECOME_MONARCH', 'IS_MONARCH', 'WAS_MONARCH', 'YOU_ARE_MONARCH',
'YOU_BECOME_MONARCH', 'CANT_BECOME_MONARCH',
# Keywords
'PARTNER_BASIC', 'PARTNER_WITH', 'PARTNER_SURVIVORS', 'PARTNER_FATHER_SON',
'FLYING', 'VIGILANCE', 'TRAMPLE', 'HASTE', 'LIFELINK', 'DEATHTOUCH',
'DOUBLE_STRIKE', 'FIRST_STRIKE', 'MENACE', 'REACH',
'UNDYING', 'PERSIST', 'PHASING', 'FLASH', 'TOXIC',
# Return
'RETURN_TO_BATTLEFIELD', 'RETURN_IT_TO_BATTLEFIELD', 'RETURN_THAT_CARD_TO_BATTLEFIELD',
'RETURN_THEM_TO_BATTLEFIELD', 'RETURN_THOSE_CARDS_TO_BATTLEFIELD',
'RETURN_TO_HAND', 'RETURN_YOU_CONTROL_TO_HAND',
# Scope
'OTHER_CREATURES', 'ALL_CREATURES', 'ALL_PERMANENTS', 'ALL_SLIVERS',
'EQUIPPED_CREATURE', 'ENCHANTED_CREATURE', 'ENCHANTED_PERMANENT', 'ENCHANTED_ENCHANTMENT',
# Combat
'ATTACK', 'ATTACKS', 'BLOCK', 'BLOCKS', 'COMBAT_DAMAGE',
'WHENEVER_ATTACKS', 'WHEN_ATTACKS',
# Type line
'INSTANT', 'SORCERY', 'ARTIFACT', 'ENCHANTMENT', 'CREATURE', 'PLANESWALKER', 'LAND',
'AURA', 'EQUIPMENT', 'VEHICLE', 'SAGA', 'NONCREATURE',
# Builders
'ownership_pattern', 'grant_pattern', 'token_creation_pattern',
'kindred_grant_pattern', 'targeting_pattern',
]
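
A minimal usage sketch for the builder functions above, assuming they behave as their docstrings describe. Three of the builders are re-declared locally (condensed from the definitions above) so the snippet runs on its own; the sample texts are made up for illustration and are not taken from card data. One point worth noting: arguments are interpolated into the regex unescaped, so a caller passing literal text that contains metacharacters may want to run it through `re.escape` first.

```python
import re
from typing import Optional, Pattern


def ownership_pattern(subject: str, owner: str = "you") -> Pattern:
    return re.compile(fr'{subject}[s]?\s+{owner}\s+control[s]?', re.IGNORECASE)


def kindred_grant_pattern(tribe: str, ability: str) -> Pattern:
    return re.compile(fr'{tribe}[s]?\s+you\s+control.*\b{ability}\b', re.IGNORECASE)


def targeting_pattern(target: str, subject: Optional[str] = None) -> Pattern:
    pattern = fr'target\s+{target}\s+{subject}' if subject else fr'target\s+{target}'
    return re.compile(pattern, re.IGNORECASE)


# Illustrative oracle-style text (hypothetical, not taken from card data).
text = "Knights you control gain hexproof. Destroy target artifact an opponent controls."

knights_hexproof = bool(kindred_grant_pattern('knight', 'hexproof').search(text))
your_creatures = bool(ownership_pattern('creature').search(text))
targets_artifact = bool(targeting_pattern('artifact').search(text))

# Arguments are inserted unescaped, so regex metacharacters stay live:
# '+1/+1 counter' would be parsed as a pattern, not as literal text.
literal_subject = re.escape('+1/+1 counter')
escaped_hit = bool(ownership_pattern(literal_subject).search("each +1/+1 counter you control"))
```

Escaping inside the builders themselves would change behavior for callers that deliberately pass regex fragments (for example `'creature|artifact'`), so the sketch leaves that decision to the call site.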


@ -0,0 +1,420 @@
"""
Scope Detection Utilities
Generic utilities for detecting the scope of card abilities (protection, phasing, etc.).
Provides reusable pattern-matching logic to avoid duplication across modules.
Created as part of M2: Create Scope Detection Utilities milestone.
"""
# Standard library imports
import re
from dataclasses import dataclass
from typing import List, Optional, Set
# Local application imports
from . import regex_patterns as rgx
from . import tag_utils
from code.logging_util import get_logger
logger = get_logger(__name__)
@dataclass
class ScopePatterns:
"""
Pattern collections for scope detection.
Attributes:
opponent: Patterns that indicate opponent ownership
self_ref: Patterns that indicate self-reference
your_permanents: Patterns that indicate "you control"
blanket: Patterns that indicate no ownership qualifier
targeted: Patterns that indicate targeting (optional)
"""
opponent: List[re.Pattern]
self_ref: List[re.Pattern]
your_permanents: List[re.Pattern]
blanket: List[re.Pattern]
targeted: Optional[List[re.Pattern]] = None
def detect_scope(
text: str,
card_name: str,
ability_keyword: str,
patterns: ScopePatterns,
allow_multiple: bool = False,
check_grant_verbs: bool = False,
keywords: Optional[str] = None,
) -> Optional[str]:
"""
Generic scope detection with priority ordering.
Detection priority (prevents misclassification):
0. Static keyword (in keywords field or simple list) -> "Self"
1. Opponent ownership -> "Opponent Permanents"
2. Self-reference -> "Self"
3. Your ownership -> "Your Permanents"
4. No ownership qualifier -> "Blanket"
Args:
text: Card text
card_name: Card name (for self-reference detection)
ability_keyword: Ability keyword to look for (e.g., "hexproof", "phasing")
patterns: ScopePatterns object with pattern collections
allow_multiple: If True, returns Set[str] instead of single scope
check_grant_verbs: If True, checks for grant verbs before assuming "Self"
keywords: Optional keywords field from card data (for static keyword detection)
Returns:
Scope string or None: "Self", "Your Permanents", "Blanket", "Opponent Permanents"
If allow_multiple=True, returns Set[str] with all matching scopes
"""
if not text or not ability_keyword:
return set() if allow_multiple else None
text_lower = text.lower()
ability_lower = ability_keyword.lower()
card_name_lower = card_name.lower() if card_name else ''
# Check if ability is mentioned in text
if ability_lower not in text_lower:
return set() if allow_multiple else None
# Priority 0: Check if this is a static keyword ability
# Static keywords appear in the keywords field or as simple comma-separated lists
# without grant verbs (e.g., "Flying, first strike, protection from black")
if check_static_keyword(ability_keyword, keywords, text):
if allow_multiple:
return {"Self"}
else:
return "Self"
if allow_multiple:
scopes = set()
else:
scopes = None
# Priority 1: Opponent ownership
for pattern in patterns.opponent:
if pattern.search(text_lower):
if allow_multiple:
scopes.add("Opponent Permanents")
break
else:
return "Opponent Permanents"
# Priority 2: Self-reference
is_self = _check_self_reference(text_lower, card_name_lower, ability_lower, patterns.self_ref)
# If check_grant_verbs is True, verify we don't have grant patterns before assuming Self
if is_self and check_grant_verbs:
has_grant_pattern = _has_grant_verbs(text_lower)
if not has_grant_pattern:
if allow_multiple:
scopes.add("Self")
else:
return "Self"
elif is_self:
if allow_multiple:
scopes.add("Self")
else:
return "Self"
# Priority 3: Your ownership
for pattern in patterns.your_permanents:
if pattern.search(text_lower):
if allow_multiple:
scopes.add("Your Permanents")
break
else:
return "Your Permanents"
# Priority 4: Blanket (no ownership qualifier)
for pattern in patterns.blanket:
if pattern.search(text_lower):
# Double-check no ownership was missed
if not rgx.YOU_CONTROL.search(text_lower) and 'opponent' not in text_lower:
if allow_multiple:
scopes.add("Blanket")
break
else:
return "Blanket"
return scopes if allow_multiple else None
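
The priority ordering in `detect_scope` can be illustrated with a reduced, standalone sketch. It is not the module's implementation: `first_scope`, `PRIORITY`, and the sample patterns are hypothetical, and the static-keyword, self-reference, and grant-verb steps are left out. It only shows why testing opponent patterns before "you control" patterns keeps a mixed text from being misclassified.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical pattern table, ordered from highest to lowest priority.
PRIORITY: List[Tuple[str, List[re.Pattern]]] = [
    ("Opponent Permanents", [re.compile(r"opponents? controls?")]),
    ("Your Permanents", [re.compile(r"you control")]),
    ("Blanket", [re.compile(r"\ball creatures\b")]),
]


def first_scope(text: str) -> Optional[str]:
    """Return the first scope whose patterns match, in priority order."""
    text_lower = text.lower()
    for scope, patterns in PRIORITY:
        if any(p.search(text_lower) for p in patterns):
            return scope
    return None


mixed = first_scope("Creatures your opponents control lose hexproof; creatures you control gain it.")
yours = first_scope("Creatures you control gain hexproof until end of turn.")
blanket = first_scope("All creatures gain hexproof until end of turn.")
unmatched = first_scope("Draw a card.")
```

In this sketch the mixed text resolves to the opponent scope because that entry is checked first; collecting every match instead of returning early corresponds to the `allow_multiple=True` path.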
def detect_multi_scope(
text: str,
card_name: str,
ability_keyword: str,
patterns: ScopePatterns,
check_grant_verbs: bool = False,
keywords: Optional[str] = None,
) -> Set[str]:
"""
Detect multiple scopes for cards with multiple effects.
Some cards grant abilities to multiple scopes:
- Self-hexproof + grants ward to others
- Target phasing + your permanents phasing
Args:
text: Card text
card_name: Card name
ability_keyword: Ability keyword to look for
patterns: ScopePatterns object
check_grant_verbs: If True, checks for grant verbs before assuming "Self"
keywords: Optional keywords field for static keyword detection
Returns:
Set of scope strings
"""
scopes = set()
if not text or not ability_keyword:
return scopes
text_lower = text.lower()
ability_lower = ability_keyword.lower()
card_name_lower = card_name.lower() if card_name else ''
# Check for static keyword first
if check_static_keyword(ability_keyword, keywords, text):
scopes.add("Self")
# For static keywords, we usually don't have multiple scopes
# But continue checking in case there are additional effects
# Check if ability is mentioned
if ability_lower not in text_lower:
return scopes
# Check opponent patterns
if any(pattern.search(text_lower) for pattern in patterns.opponent):
scopes.add("Opponent Permanents")
# Check self-reference
is_self = _check_self_reference(text_lower, card_name_lower, ability_lower, patterns.self_ref)
if is_self:
if check_grant_verbs:
has_grant_pattern = _has_grant_verbs(text_lower)
if not has_grant_pattern:
scopes.add("Self")
else:
scopes.add("Self")
# Check your permanents
if any(pattern.search(text_lower) for pattern in patterns.your_permanents):
scopes.add("Your Permanents")
# Check blanket (no ownership)
has_blanket = any(pattern.search(text_lower) for pattern in patterns.blanket)
no_ownership = not rgx.YOU_CONTROL.search(text_lower) and 'opponent' not in text_lower
if has_blanket and no_ownership:
scopes.add("Blanket")
# Optional: Check for targeting
if patterns.targeted:
if any(pattern.search(text_lower) for pattern in patterns.targeted):
scopes.add("Targeted")
return scopes
def _check_self_reference(
text_lower: str,
card_name_lower: str,
ability_lower: str,
self_patterns: List[re.Pattern]
) -> bool:
"""
Check if text contains self-reference patterns.
Args:
text_lower: Lowercase card text
card_name_lower: Lowercase card name
ability_lower: Lowercase ability keyword
self_patterns: List of self-reference patterns
Returns:
True if self-reference found
"""
# Check provided self patterns
for pattern in self_patterns:
if pattern.search(text_lower):
return True
# Check for card name reference (if provided)
if card_name_lower:
card_name_escaped = re.escape(card_name_lower)
card_name_pattern = re.compile(rf'\b{card_name_escaped}\b', re.IGNORECASE)
if card_name_pattern.search(text_lower):
# Make sure it's in a self-ability context
self_context_patterns = [
re.compile(rf'\b{card_name_escaped}\s+(?:has|gains?)\s+{ability_lower}', re.IGNORECASE),
re.compile(rf'\b{card_name_escaped}\s+is\s+{ability_lower}', re.IGNORECASE),
]
for pattern in self_context_patterns:
if pattern.search(text_lower):
return True
return False
def _has_grant_verbs(text_lower: str) -> bool:
"""
Check if text contains grant verb patterns.
Used to distinguish inherent abilities from granted abilities.
Args:
text_lower: Lowercase card text
Returns:
True if grant verbs found
"""
grant_patterns = [
re.compile(r'(?:have|gain|grant|give|get)[s]?\s+', re.IGNORECASE),
rgx.OTHER_CREATURES,
rgx.CREATURE_YOU_CONTROL,
rgx.PERMANENT_YOU_CONTROL,
rgx.EQUIPPED_CREATURE,
rgx.ENCHANTED_CREATURE,
rgx.TARGET_CREATURE,
]
return any(pattern.search(text_lower) for pattern in grant_patterns)
def format_scope_tag(scope: str, ability: str) -> str:
"""
Format a scope and ability into a metadata tag.
Args:
scope: Scope string (e.g., "Self", "Your Permanents")
ability: Ability name (e.g., "Hexproof", "Phasing")
Returns:
Formatted tag string (e.g., "Self: Hexproof")
"""
return f"{scope}: {ability}"
def has_keyword(text: str, keywords: List[str]) -> bool:
"""
Quick check if card text contains any of the specified keywords.
Args:
text: Card text
keywords: List of keywords to search for
Returns:
True if any keyword found
"""
if not text:
return False
text_lower = text.lower()
return any(keyword.lower() in text_lower for keyword in keywords)
def check_static_keyword(
ability_keyword: str,
keywords: Optional[str] = None,
text: Optional[str] = None
) -> bool:
"""
Check if card has ability as a static keyword (not granted to others).
A static keyword is one that appears:
1. In the keywords field, OR
2. As a simple comma-separated list without grant verbs
(e.g., "Flying, first strike, protection from black")
Args:
ability_keyword: Ability to check (e.g., "Protection", "Hexproof")
keywords: Optional keywords field from card data
text: Optional card text for fallback detection
Returns:
True if ability appears as static keyword
"""
ability_lower = ability_keyword.lower()
# Check keywords field first (most reliable)
if keywords:
keywords_lower = keywords.lower()
if ability_lower in keywords_lower:
return True
# Fallback: Check if ability appears in simple comma-separated keyword list
# Pattern: starts with keywords (Flying, First strike, etc.) without grant verbs
# Example: "Flying, first strike, vigilance, trample, haste, protection from black"
if text:
text_lower = text.lower()
# Check if ability appears in text but WITHOUT grant verbs
if ability_lower in text_lower:
# Look for grant verbs that would indicate this is NOT a static keyword
grant_verbs = ['have', 'has', 'gain', 'gains', 'get', 'gets', 'grant', 'grants', 'give', 'gives']
# Find the position of the ability in text
ability_pos = text_lower.find(ability_lower)
# Check the 50 characters before the ability for grant verbs
# This catches patterns like "creatures gain protection" or "has hexproof"
context_before = text_lower[max(0, ability_pos - 50):ability_pos]
# If no grant verbs found nearby, it's likely a static keyword
if not any(verb in context_before for verb in grant_verbs):
# Additional check: is it part of a comma-separated list?
# This helps with "Flying, first strike, protection from X" patterns
context_before_30 = text_lower[max(0, ability_pos - 30):ability_pos]
if ',' in context_before_30 or ability_pos < 10:
return True
return False
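
One property of the 50-character check in `check_static_keyword` is that `verb in context_before` is a substring test, so a verb can match inside a longer word (for example `get` inside `target`). The sketch below compares that test with a word-boundary regex. `GRANT_VERB_RE` and both helper names are hypothetical and are not part of the module; the sample context string is made up.

```python
import re

GRANT_VERBS = ['have', 'has', 'gain', 'gains', 'get', 'gets', 'grant', 'grants', 'give', 'gives']

# Hypothetical alternative: match grant verbs only as whole words.
GRANT_VERB_RE = re.compile(r'\b(?:' + '|'.join(GRANT_VERBS) + r')\b')


def has_verb_substring(context: str) -> bool:
    """Substring test, as used in the fallback branch above."""
    return any(verb in context for verb in GRANT_VERBS)


def has_verb_whole_word(context: str) -> bool:
    """Word-boundary test: 'target' no longer counts as 'get'."""
    return GRANT_VERB_RE.search(context) is not None


context = "flying, can't be the target of spells, "
substring_hit = has_verb_substring(context)
whole_word_hit = has_verb_whole_word(context)
real_grant = has_verb_whole_word("creatures you control gain ")
```

Whether the stricter test is preferable depends on how the existing tagging results were validated; the substring form may be relied on elsewhere.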
def check_static_keyword_legacy(
keywords: str,
static_keyword: str,
text: str,
grant_patterns: Optional[List[re.Pattern]] = None
) -> bool:
"""
LEGACY: Check if card has static keyword without granting it to others.
Used for abilities like "Phasing" that can be both static and granted.
Args:
keywords: Card keywords field
static_keyword: Keyword to search for (e.g., "phasing")
text: Card text
grant_patterns: Optional patterns to check for granting language
Returns:
True if static keyword found and not granted to others
"""
if not keywords:
return False
keywords_lower = keywords.lower()
if static_keyword.lower() not in keywords_lower:
return False
# If grant patterns provided, check if card grants to others
if grant_patterns:
text_no_reminder = tag_utils.strip_reminder_text(text.lower()) if text else ''
grants_to_others = any(pattern.search(text_no_reminder) for pattern in grant_patterns)
# Only return True if NOT granting to others
return not grants_to_others
return True


@ -1,13 +1,59 @@
"""
Tag Constants Module
Centralized constants for card tagging and theme detection across the MTG deckbuilder.
This module contains all shared constants used by the tagging system including:
- Card types and creature types
- Pattern groups and regex fragments
- Tag groupings and relationships
- Protection and ability keywords
- Magic numbers and thresholds
"""
from typing import Dict, Final, List
# =============================================================================
# TABLE OF CONTENTS
# =============================================================================
# 1. TRIGGERS & BASIC PATTERNS
# 2. TAG GROUPS & RELATIONSHIPS
# 3. PATTERN GROUPS & REGEX FRAGMENTS
# 4. PHRASE GROUPS
# 5. COUNTER TYPES
# 6. CREATURE TYPES
# 7. NON-CREATURE TYPES & SPECIAL TYPES
# 8. PROTECTION & ABILITY KEYWORDS
# 9. TOKEN TYPES
# 10. MAGIC NUMBERS & THRESHOLDS
# 11. DATAFRAME COLUMN REQUIREMENTS
# 12. TYPE-TAG MAPPINGS
# 13. DRAW-RELATED CONSTANTS
# 14. EQUIPMENT-RELATED CONSTANTS
# 15. AURA & VOLTRON CONSTANTS
# 16. LANDS MATTER PATTERNS
# 17. SACRIFICE & GRAVEYARD PATTERNS
# 18. CREATURE-RELATED PATTERNS
# 19. TOKEN-RELATED PATTERNS
# 20. REMOVAL & DESTRUCTION PATTERNS
# 21. SPELL-RELATED PATTERNS
# 22. MISC PATTERNS & EXCLUSIONS
# =============================================================================
# 1. TRIGGERS & BASIC PATTERNS
# =============================================================================
TRIGGERS: List[str] = ['when', 'whenever', 'at']
NUM_TO_SEARCH: List[str] = [
    'a', 'an', 'one', '1', 'two', '2', 'three', '3', 'four', '4', 'five', '5',
    'six', '6', 'seven', '7', 'eight', '8', 'nine', '9', 'ten', '10',
    'x', 'one or more'
]
# =============================================================================
# 2. TAG GROUPS & RELATIONSHIPS
# =============================================================================
# Constants for common tag groupings
TAG_GROUPS: Dict[str, List[str]] = {
    "Cantrips": ["Cantrips", "Card Draw", "Spellslinger", "Spells Matter"],
    "Tokens": ["Token Creation", "Tokens Matter"],
@ -19,8 +65,11 @@ TAG_GROUPS: Dict[str, List[str]] = {
    "Spells": ["Spellslinger", "Spells Matter"]
}
# =============================================================================
# 3. PATTERN GROUPS & REGEX FRAGMENTS
# =============================================================================
PATTERN_GROUPS: Dict[str, str] = {
    "draw": r"draw[s]? a card|draw[s]? one card",
    "combat": r"attack[s]?|block[s]?|combat damage",
    "tokens": r"create[s]? .* token|put[s]? .* token",
@ -30,7 +79,10 @@ PATTERN_GROUPS: Dict[str, str] = {
    "cost_reduction": r"cost[s]? \{[\d\w]\} less|affinity for|cost[s]? less to cast|chosen type cost|copy cost|from exile cost|from exile this turn cost|from your graveyard cost|has undaunted|have affinity for artifacts|other than your hand cost|spells cost|spells you cast cost|that target .* cost|those spells cost|you cast cost|you pay cost"
}
# =============================================================================
# 4. PHRASE GROUPS
# =============================================================================
PHRASE_GROUPS: Dict[str, List[str]] = {
    # Variants for monarch wording
    "monarch": [
@ -52,11 +104,15 @@ PHRASE_GROUPS: Dict[str, List[str]] = {
        r"return .* to the battlefield"
    ]
}
# Common action patterns
CREATE_ACTION_PATTERN: Final[str] = r"create|put"
# =============================================================================
# 5. COUNTER TYPES
# =============================================================================
COUNTER_TYPES: List[str] = [
    r'\+0/\+1', r'\+0/\+2', r'\+1/\+0', r'\+1/\+2', r'\+2/\+0', r'\+2/\+2',
    '-0/-1', '-0/-2', '-1/-0', '-1/-2', '-2/-0', '-2/-2',
    'Acorn', 'Aegis', 'Age', 'Aim', 'Arrow', 'Arrowhead','Awakening',
    'Bait', 'Blaze', 'Blessing', 'Blight',' Blood', 'Bloddline',
@ -90,9 +146,15 @@ COUNTER_TYPES: List[str] = [r'\+0/\+1', r'\+0/\+2', r'\+1/\+0', r'\+1/\+2', r'\+
    'Task', 'Ticket', 'Tide', 'Time', 'Tower', 'Training', 'Trap',
    'Treasure', 'Unity', 'Unlock', 'Valor', 'Velocity', 'Verse',
    'Vitality', 'Void', 'Volatile', 'Vortex', 'Vow', 'Voyage', 'Wage',
    'Winch', 'Wind', 'Wish'
]
# =============================================================================
# 6. CREATURE TYPES
# =============================================================================
CREATURE_TYPES: List[str] = [
    'Advisor', 'Aetherborn', 'Alien', 'Ally', 'Angel', 'Antelope', 'Ape', 'Archer', 'Archon', 'Armadillo',
    'Army', 'Artificer', 'Assassin', 'Assembly-Worker', 'Astartes', 'Atog', 'Aurochs', 'Automaton',
    'Avatar', 'Azra', 'Badger', 'Balloon', 'Barbarian', 'Bard', 'Basilisk', 'Bat', 'Bear', 'Beast', 'Beaver',
    'Beeble', 'Beholder', 'Berserker', 'Bird', 'Blinkmoth', 'Boar', 'Brainiac', 'Bringer', 'Brushwagg',
@ -122,9 +184,15 @@ CREATURE_TYPES: List[str] = ['Advisor', 'Aetherborn', 'Alien', 'Ally', 'Angel',
    'Thopter', 'Thrull', 'Tiefling', 'Time Lord', 'Toy', 'Treefolk', 'Trilobite', 'Triskelavite', 'Troll',
    'Turtle', 'Tyranid', 'Unicorn', 'Urzan', 'Vampire', 'Varmint', 'Vedalken', 'Volver', 'Wall', 'Walrus',
    'Warlock', 'Warrior', 'Wasp', 'Weasel', 'Weird', 'Werewolf', 'Whale', 'Wizard', 'Wolf', 'Wolverine', 'Wombat',
    'Worm', 'Wraith', 'Wurm', 'Yeti', 'Zombie', 'Zubera'
]
# =============================================================================
# 7. NON-CREATURE TYPES & SPECIAL TYPES
# =============================================================================
NON_CREATURE_TYPES: List[str] = [
    'Legendary', 'Creature', 'Enchantment', 'Artifact',
    'Battle', 'Sorcery', 'Instant', 'Land', '-', '',
    'Blood', 'Clue', 'Food', 'Gold', 'Incubator',
    'Junk', 'Map', 'Powerstone', 'Treasure',
@ -136,23 +204,66 @@ NON_CREATURE_TYPES: List[str] = ['Legendary', 'Creature', 'Enchantment', 'Artifa
    'Shrine',
    'Plains', 'Island', 'Swamp', 'Forest', 'Mountain',
    'Cave', 'Desert', 'Gate', 'Lair', 'Locus', 'Mine',
    'Power-Plant', 'Sphere', 'Tower', 'Urza\'s'
]
OUTLAW_TYPES: List[str] = ['Assassin', 'Mercenary', 'Pirate', 'Rogue', 'Warlock']
# =============================================================================
# 8. PROTECTION & ABILITY KEYWORDS
# =============================================================================
PROTECTION_ABILITIES: List[str] = [
'Protection',
'Ward',
'Hexproof',
'Shroud',
'Indestructible'
]
PROTECTION_KEYWORDS: Final[frozenset] = frozenset({
'hexproof',
'shroud',
'indestructible',
'ward',
'protection from',
'protection',
})
# =============================================================================
# 9. TOKEN TYPES
# =============================================================================
ENCHANTMENT_TOKENS: List[str] = [
'Cursed Role', 'Monster Role', 'Royal Role', 'Sorcerer Role',
'Virtuous Role', 'Wicked Role', 'Young Hero Role', 'Shard'
]
ARTIFACT_TOKENS: List[str] = [
'Blood', 'Clue', 'Food', 'Gold', 'Incubator',
'Junk', 'Map', 'Powerstone', 'Treasure'
]
# =============================================================================
# 10. MAGIC NUMBERS & THRESHOLDS
# =============================================================================
CONTEXT_WINDOW_SIZE: Final[int] = 70 # Characters to examine around a regex match
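
`CONTEXT_WINDOW_SIZE` is described only as the number of characters to examine around a regex match. One plausible way such a constant could be applied is sketched below. The helper `context_around` is hypothetical, and the assumption that the window extends on both sides of the match is mine; the taggers may slice the text differently.

```python
import re
from typing import Final

CONTEXT_WINDOW_SIZE: Final[int] = 70  # Same value as the constant above


def context_around(text: str, match: re.Match, size: int = CONTEXT_WINDOW_SIZE) -> str:
    """Return up to `size` characters on each side of a match, clamped to the text."""
    start = max(0, match.start() - size)
    end = min(len(text), match.end() + size)
    return text[start:end]


# Synthetic text: padding on both sides of a single keyword.
text = "x" * 100 + "hexproof" + "y" * 100
m = re.search(r"hexproof", text)
window = context_around(text, m)
```

Clamping with `max` and `min` keeps the slice valid when the match sits near the start or end of the text.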
# =============================================================================
# 11. DATAFRAME COLUMN REQUIREMENTS
# =============================================================================
# Constants for DataFrame validation and processing
REQUIRED_COLUMNS: List[str] = [
    'name', 'faceName', 'edhrecRank', 'colorIdentity', 'colors',
    'manaCost', 'manaValue', 'type', 'creatureTypes', 'text',
    'power', 'toughness', 'keywords', 'themeTags', 'layout', 'side'
]
# =============================================================================
# 12. TYPE-TAG MAPPINGS
# =============================================================================
TYPE_TAG_MAPPING: Dict[str, List[str]] = {
    'Artifact': ['Artifacts Matter'],
    'Battle': ['Battles Matter'],
@ -166,7 +277,10 @@ TYPE_TAG_MAPPING: Dict[str, List[str]] = {
    'Sorcery': ['Spells Matter', 'Spellslinger']
}
# =============================================================================
# 13. DRAW-RELATED CONSTANTS
# =============================================================================
DRAW_RELATED_TAGS: List[str] = [
    'Card Draw',  # General card draw effects
    'Conditional Draw',  # Draw effects with conditions/triggers
@ -175,16 +289,18 @@ DRAW_RELATED_TAGS: List[str] = [
    'Loot',  # Draw + discard effects
    'Replacement Draw',  # Effects that modify or replace draws
    'Sacrifice to Draw',  # Draw effects requiring sacrificing permanents
    'Unconditional Draw'  # Pure card draw without conditions
]
# Text patterns that exclude cards from being tagged as unconditional draw
DRAW_EXCLUSION_PATTERNS: List[str] = [
    'annihilator',  # Eldrazi mechanic that can match 'draw' patterns
    'ravenous',  # Keyword that can match 'draw' patterns
]
# =============================================================================
# 14. EQUIPMENT-RELATED CONSTANTS
# =============================================================================
EQUIPMENT_EXCLUSIONS: List[str] = [
    'Bruenor Battlehammer',  # Equipment cost reduction
    'Nazahn, Revered Bladesmith',  # Equipment tutor
@ -223,7 +339,10 @@ EQUIPMENT_TEXT_PATTERNS: List[str] = [
'unequip', # Equipment removal 'unequip', # Equipment removal
] ]
# Aura-related constants # =============================================================================
# 15. AURA & VOLTRON CONSTANTS
# =============================================================================
AURA_SPECIFIC_CARDS: List[str] = [ AURA_SPECIFIC_CARDS: List[str] = [
'Ardenn, Intrepid Archaeologist', # Aura movement 'Ardenn, Intrepid Archaeologist', # Aura movement
'Calix, Guided By Fate', # Create duplicate Auras 'Calix, Guided By Fate', # Create duplicate Auras
@ -267,7 +386,10 @@ VOLTRON_PATTERNS: List[str] = [
'reconfigure' 'reconfigure'
] ]
# Constants for lands matter functionality # =============================================================================
# 16. LANDS MATTER PATTERNS
# =============================================================================
LANDS_MATTER_PATTERNS: Dict[str, List[str]] = { LANDS_MATTER_PATTERNS: Dict[str, List[str]] = {
'land_play': [ 'land_play': [
'play a land', 'play a land',

@ -13,18 +13,11 @@ The module is designed to work with pandas DataFrames containing card data and p
vectorized operations for efficient processing of large card collections. vectorized operations for efficient processing of large card collections.
""" """
from __future__ import annotations from __future__ import annotations
# Standard library imports
import re import re
from typing import List, Set, Union, Any, Tuple
from functools import lru_cache from functools import lru_cache
from typing import Any, List, Set, Tuple, Union
import numpy as np import numpy as np
# Third-party imports
import pandas as pd import pandas as pd
# Local application imports
from . import tag_constants from . import tag_constants
@ -58,7 +51,6 @@ def _ensure_norm_series(df: pd.DataFrame, source_col: str, norm_col: str) -> pd.
""" """
if norm_col in df.columns: if norm_col in df.columns:
return df[norm_col] return df[norm_col]
# Create normalized string series
series = df[source_col].fillna('') if source_col in df.columns else pd.Series([''] * len(df), index=df.index) series = df[source_col].fillna('') if source_col in df.columns else pd.Series([''] * len(df), index=df.index)
series = series.astype(str) series = series.astype(str)
df[norm_col] = series df[norm_col] = series
@ -120,8 +112,6 @@ def create_type_mask(df: pd.DataFrame, type_text: Union[str, List[str]], regex:
if len(df) == 0: if len(df) == 0:
return pd.Series([], dtype=bool) return pd.Series([], dtype=bool)
# Use normalized cached series
type_series = _ensure_norm_series(df, 'type', '__type_s') type_series = _ensure_norm_series(df, 'type', '__type_s')
if regex: if regex:
@ -160,8 +150,6 @@ def create_text_mask(df: pd.DataFrame, type_text: Union[str, List[str]], regex:
if len(df) == 0: if len(df) == 0:
return pd.Series([], dtype=bool) return pd.Series([], dtype=bool)
# Use normalized cached series
text_series = _ensure_norm_series(df, 'text', '__text_s') text_series = _ensure_norm_series(df, 'text', '__text_s')
if regex: if regex:
@ -192,10 +180,7 @@ def create_keyword_mask(df: pd.DataFrame, type_text: Union[str, List[str]], rege
TypeError: If type_text is not a string or list of strings TypeError: If type_text is not a string or list of strings
ValueError: If required 'keywords' column is missing from DataFrame ValueError: If required 'keywords' column is missing from DataFrame
""" """
# Validate required columns
validate_dataframe_columns(df, {'keywords'}) validate_dataframe_columns(df, {'keywords'})
# Handle empty DataFrame case
if len(df) == 0: if len(df) == 0:
return pd.Series([], dtype=bool) return pd.Series([], dtype=bool)
@ -206,8 +191,6 @@ def create_keyword_mask(df: pd.DataFrame, type_text: Union[str, List[str]], rege
type_text = [type_text] type_text = [type_text]
elif not isinstance(type_text, list): elif not isinstance(type_text, list):
raise TypeError("type_text must be a string or list of strings") raise TypeError("type_text must be a string or list of strings")
# Use normalized cached series for keywords
keywords = _ensure_norm_series(df, 'keywords', '__keywords_s') keywords = _ensure_norm_series(df, 'keywords', '__keywords_s')
if regex: if regex:
@ -245,8 +228,6 @@ def create_name_mask(df: pd.DataFrame, type_text: Union[str, List[str]], regex:
if len(df) == 0: if len(df) == 0:
return pd.Series([], dtype=bool) return pd.Series([], dtype=bool)
# Use normalized cached series
name_series = _ensure_norm_series(df, 'name', '__name_s') name_series = _ensure_norm_series(df, 'name', '__name_s')
if regex: if regex:
@ -324,21 +305,14 @@ def create_tag_mask(df: pd.DataFrame, tag_patterns: Union[str, List[str]], colum
Boolean Series indicating matching rows Boolean Series indicating matching rows
Examples: Examples:
# Match cards with draw-related tags
>>> mask = create_tag_mask(df, ['Card Draw', 'Conditional Draw']) >>> mask = create_tag_mask(df, ['Card Draw', 'Conditional Draw'])
>>> mask = create_tag_mask(df, 'Unconditional Draw') >>> mask = create_tag_mask(df, 'Unconditional Draw')
""" """
if isinstance(tag_patterns, str): if isinstance(tag_patterns, str):
tag_patterns = [tag_patterns] tag_patterns = [tag_patterns]
# Handle empty DataFrame case
if len(df) == 0: if len(df) == 0:
return pd.Series([], dtype=bool) return pd.Series([], dtype=bool)
# Create mask for each pattern
masks = [df[column].apply(lambda x: any(pattern in tag for tag in x)) for pattern in tag_patterns] masks = [df[column].apply(lambda x: any(pattern in tag for tag in x)) for pattern in tag_patterns]
# Combine masks with OR
return pd.concat(masks, axis=1).any(axis=1) return pd.concat(masks, axis=1).any(axis=1)
def validate_dataframe_columns(df: pd.DataFrame, required_columns: Set[str]) -> None: def validate_dataframe_columns(df: pd.DataFrame, required_columns: Set[str]) -> None:
@ -365,11 +339,7 @@ def apply_tag_vectorized(df: pd.DataFrame, mask: pd.Series[bool], tags: Union[st
""" """
if not isinstance(tags, list): if not isinstance(tags, list):
tags = [tags] tags = [tags]
# Get current tags for masked rows
current_tags = df.loc[mask, 'themeTags'] current_tags = df.loc[mask, 'themeTags']
# Add new tags
df.loc[mask, 'themeTags'] = current_tags.apply(lambda x: sorted(list(set(x + tags)))) df.loc[mask, 'themeTags'] = current_tags.apply(lambda x: sorted(list(set(x + tags))))
def apply_rules(df: pd.DataFrame, rules: List[dict]) -> None: def apply_rules(df: pd.DataFrame, rules: List[dict]) -> None:
@ -463,7 +433,6 @@ def create_numbered_phrase_mask(
numbers = tag_constants.NUM_TO_SEARCH numbers = tag_constants.NUM_TO_SEARCH
# Normalize verbs to list # Normalize verbs to list
verbs = [verb] if isinstance(verb, str) else verb verbs = [verb] if isinstance(verb, str) else verb
# Build patterns
if noun: if noun:
patterns = [fr"{v}\s+{num}\s+{noun}" for v in verbs for num in numbers] patterns = [fr"{v}\s+{num}\s+{noun}" for v in verbs for num in numbers]
else: else:
@ -490,13 +459,8 @@ def create_mass_damage_mask(df: pd.DataFrame) -> pd.Series[bool]:
Returns: Returns:
Boolean Series indicating which cards have mass damage effects Boolean Series indicating which cards have mass damage effects
""" """
# Create patterns for numeric damage
number_patterns = [create_damage_pattern(i) for i in range(1, 21)] number_patterns = [create_damage_pattern(i) for i in range(1, 21)]
# Add X damage pattern
number_patterns.append(create_damage_pattern('X')) number_patterns.append(create_damage_pattern('X'))
# Add patterns for damage targets
target_patterns = [ target_patterns = [
'to each creature', 'to each creature',
'to all creatures', 'to all creatures',
@ -504,8 +468,6 @@ def create_mass_damage_mask(df: pd.DataFrame) -> pd.Series[bool]:
'to each opponent', 'to each opponent',
'to everything' 'to everything'
] ]
# Create masks
damage_mask = create_text_mask(df, number_patterns) damage_mask = create_text_mask(df, number_patterns)
target_mask = create_text_mask(df, target_patterns) target_mask = create_text_mask(df, target_patterns)
@ -555,23 +517,14 @@ def normalize_keywords(
normalized_keywords: set[str] = set() normalized_keywords: set[str] = set()
for keyword in raw: for keyword in raw:
# Skip non-string entries
if not isinstance(keyword, str): if not isinstance(keyword, str):
continue continue
# Skip empty strings
keyword = keyword.strip() keyword = keyword.strip()
if not keyword: if not keyword:
continue continue
# Skip excluded keywords
if keyword.lower() in tag_constants.KEYWORD_EXCLUSION_SET: if keyword.lower() in tag_constants.KEYWORD_EXCLUSION_SET:
continue continue
# Apply normalization map
normalized = tag_constants.KEYWORD_NORMALIZATION_MAP.get(keyword, keyword) normalized = tag_constants.KEYWORD_NORMALIZATION_MAP.get(keyword, keyword)
# Check if singleton (unless allowlisted)
frequency = frequency_map.get(keyword, 0) frequency = frequency_map.get(keyword, 0)
is_singleton = frequency == 1 is_singleton = frequency == 1
is_allowlisted = normalized in allowlist or keyword in allowlist is_allowlisted = normalized in allowlist or keyword in allowlist
@ -658,4 +611,242 @@ def classify_tag(tag: str) -> str:
return "metadata" return "metadata"
# Default: treat as theme tag # Default: treat as theme tag
return "theme" return "theme"
# --- Text Processing Helpers (M0.6) ---------------------------------------------------------
def strip_reminder_text(text: str) -> str:
    """Remove reminder text (content in parentheses) from card text.

    Reminder text often contains keywords and patterns that can cause false positives
    in pattern matching. This function strips all parenthetical content to focus on
    the actual game text.

    Args:
        text: Card text possibly containing reminder text in parentheses

    Returns:
        Text with all parenthetical content removed

    Example:
        >>> strip_reminder_text("Hexproof (This creature can't be the target of spells)")
        'Hexproof '
    """
    if not text:
        return text
    return re.sub(r'\([^)]*\)', '', text)
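A minimal sketch of why stripping reminder text matters for pattern matching. The helper is repeated locally so the example runs on its own; `mentions_draw` and both sample card texts are hypothetical illustrations, not part of this commit or the card data.

```python
import re


def strip_reminder_text(text: str) -> str:
    """Local copy of the helper from the diff, repeated so the sketch is self-contained."""
    if not text:
        return text
    return re.sub(r'\([^)]*\)', '', text)


def mentions_draw(text: str) -> bool:
    """Hypothetical check: does the rules text (ignoring reminder text) mention drawing?"""
    return 'draw a card' in strip_reminder_text(text).lower()


# Illustrative card texts (assumptions, not taken from the card data).
keyword_only = "Ravenous (If X is 5 or more, draw a card when this creature enters.)"
real_draw = "When this creature enters, draw a card. (Then keep playing.)"

print(mentions_draw(keyword_only))  # the only 'draw a card' is inside the parentheses
print(mentions_draw(real_draw))     # the phrase appears in the actual rules text
```

Note that the regex removes one level of parentheses at a time, so nested parentheses would leave a stray `)` behind; that seems acceptable if card text does not nest them.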
def extract_context_window(text: str, match_start: int, match_end: int,
                           window_size: Union[int, None] = None,
                           include_before: bool = False) -> str:
    """Extract a context window around a regex match for validation.

    When pattern matching finds a potential match, we often need to examine
    the surrounding text to validate the match or check for additional keywords.
    This function extracts a window of text around the match position.

    Args:
        text: Full text to extract context from
        match_start: Start position of the regex match
        match_end: End position of the regex match
        window_size: Number of characters to include after the match.
            If None, uses CONTEXT_WINDOW_SIZE from tag_constants (default: 70).
            To include context before the match, use include_before=True.
        include_before: If True, includes window_size characters before the match
            in addition to after. If False (default), only includes after.

    Returns:
        Substring of text containing the match plus surrounding context

    Example:
        >>> text = "Creatures you control have hexproof and vigilance"
        >>> match = re.search(r'creatures you control', text, re.IGNORECASE)
        >>> extract_context_window(text, match.start(), match.end(), window_size=19)
        'Creatures you control have hexproof and '
    """
    if not text:
        return text
    if window_size is None:
        window_size = tag_constants.CONTEXT_WINDOW_SIZE

    # Calculate window boundaries
    if include_before:
        context_start = max(0, match_start - window_size)
    else:
        context_start = match_start
    context_end = min(len(text), match_end + window_size)

    return text[context_start:context_end]
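The window logic can be exercised in isolation. The sketch below repeats the function locally, with a stand-in constant for `tag_constants.CONTEXT_WINDOW_SIZE` (70 per the docstring), and adds a hypothetical `grants_keyword` helper that is not part of this commit; it only illustrates how a caller might validate a match by looking at the text that follows it.

```python
import re
from typing import Optional

CONTEXT_WINDOW_SIZE = 70  # stand-in for tag_constants.CONTEXT_WINDOW_SIZE (assumed value)


def extract_context_window(text: str, match_start: int, match_end: int,
                           window_size: Optional[int] = None,
                           include_before: bool = False) -> str:
    """Local copy of the helper from the diff so the sketch runs on its own."""
    if not text:
        return text
    if window_size is None:
        window_size = CONTEXT_WINDOW_SIZE
    context_start = max(0, match_start - window_size) if include_before else match_start
    context_end = min(len(text), match_end + window_size)
    return text[context_start:context_end]


def grants_keyword(text: str, anchor: str, keyword: str, window_size: int = 30) -> bool:
    """Hypothetical validator: is `keyword` found shortly after any `anchor` match?"""
    for m in re.finditer(anchor, text, flags=re.IGNORECASE):
        window = extract_context_window(text, m.start(), m.end(), window_size)
        if keyword in window.lower():
            return True
    return False


text = "Creatures you control have hexproof and vigilance"
print(grants_keyword(text, r'creatures you control', 'hexproof'))                 # wide window
print(grants_keyword(text, r'creatures you control', 'hexproof', window_size=5))  # too narrow
```

A small window keeps unrelated later clauses from validating a match, at the cost of missing keywords that appear further along in a long sentence.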
# --- Enhanced Tagging Utilities (M3.5/M3.6) ----------------------------------------------------
def build_combined_mask(
    df: pd.DataFrame,
    text_patterns: Union[str, List[str], None] = None,
    type_patterns: Union[str, List[str], None] = None,
    keyword_patterns: Union[str, List[str], None] = None,
    name_list: Union[List[str], None] = None,
    exclusion_patterns: Union[str, List[str], None] = None,
    combine_with_or: bool = True
) -> pd.Series[bool]:
    """Build a combined boolean mask from multiple pattern types.

    This utility reduces boilerplate when creating complex masks by combining
    text, type, keyword, and name patterns into a single mask. Patterns are
    combined with OR by default, but can be combined with AND.

    Args:
        df: DataFrame to search
        text_patterns: Patterns to match in 'text' column
        type_patterns: Patterns to match in 'type' column
        keyword_patterns: Patterns to match in 'keywords' column
        name_list: List of exact card names to match
        exclusion_patterns: Patterns matched against the 'text' column; matching
            rows are removed from the final mask
        combine_with_or: If True, combine masks with OR (default).
            If False, combine with AND (requires all conditions)

    Returns:
        Boolean Series combining all specified patterns

    Note:
        If combine_with_or is False and no patterns are given, every row
        matches before exclusions are applied.

    Example:
        >>> # Match cards with flying OR haste, excluding cards whose text mentions defender
        >>> mask = build_combined_mask(
        ...     df,
        ...     keyword_patterns=['Flying', 'Haste'],
        ...     exclusion_patterns='defender'
        ... )
    """
    if combine_with_or:
        result = pd.Series([False] * len(df), index=df.index)
    else:
        result = pd.Series([True] * len(df), index=df.index)

    masks = []
    if text_patterns is not None:
        masks.append(create_text_mask(df, text_patterns))
    if type_patterns is not None:
        masks.append(create_type_mask(df, type_patterns))
    if keyword_patterns is not None:
        masks.append(create_keyword_mask(df, keyword_patterns))
    if name_list is not None:
        masks.append(create_name_mask(df, name_list))

    if masks:
        if combine_with_or:
            for mask in masks:
                result |= mask
        else:
            for mask in masks:
                result &= mask

    if exclusion_patterns is not None:
        exclusion_mask = create_text_mask(df, exclusion_patterns)
        result &= ~exclusion_mask

    return result
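The OR/AND combination and the exclusion step can be sketched without the module's `create_*_mask` helpers. Here `combine_masks` is a hypothetical reduction of `build_combined_mask` that takes ready-made boolean Series, and the masks are built with plain `Series.str.contains`; it is an illustration under those assumptions, not the module's implementation.

```python
import pandas as pd


def combine_masks(masks, index, combine_with_or=True, exclusion=None):
    """Hypothetical reduced form of build_combined_mask working on ready-made masks."""
    # OR starts from all-False, AND starts from all-True (the identity of each operator).
    result = pd.Series(not combine_with_or, index=index, dtype=bool)
    for m in masks:
        result = (result | m) if combine_with_or else (result & m)
    if exclusion is not None:
        result = result & ~exclusion
    return result


df = pd.DataFrame({'text': ['Flying', 'Haste', 'Draw a card', 'Flying, haste']})
flying = df['text'].str.contains('flying', case=False)
haste = df['text'].str.contains('haste', case=False)
draw = df['text'].str.contains('draw', case=False)

either = combine_masks([flying, haste], df.index)                       # OR
both = combine_masks([flying, haste], df.index, combine_with_or=False)  # AND
# Edge case: AND with no masks matches every row, so only the exclusion filters.
everything_but_draw = combine_masks([], df.index, combine_with_or=False, exclusion=draw)

print(either.tolist(), both.tolist(), everything_but_draw.tolist())
```

The empty-AND edge case is worth keeping in mind when a caller passes only `exclusion_patterns`: the result is "all rows except the excluded ones", not an empty mask.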
def tag_with_logging(
    df: pd.DataFrame,
    mask: pd.Series[bool],
    tags: Union[str, List[str]],
    log_message: str,
    color: str = '',
    logger=None
) -> int:
    """Apply tags with standardized logging.

    This utility wraps the common pattern of applying tags and logging the count.
    It provides consistent formatting for log messages across the tagging module.

    Args:
        df: DataFrame to modify
        mask: Boolean mask indicating which rows to tag
        tags: Tag(s) to apply
        log_message: Description of what's being tagged (e.g., "flying creatures")
        color: Color identifier for context (optional)
        logger: Logger instance to use (optional, uses print if None)

    Returns:
        Count of cards tagged

    Example:
        >>> count = tag_with_logging(
        ...     df,
        ...     flying_mask,
        ...     'Flying',
        ...     'creatures with flying ability',
        ...     color='blue',
        ...     logger=logger
        ... )
        # Logs: "Tagged 42 blue creatures with flying ability"
    """
    # Cast so the declared int return type holds (Series.sum() returns a NumPy integer)
    count = int(mask.sum())
    if count > 0:
        apply_tag_vectorized(df, mask, tags)
        color_part = f'{color} ' if color else ''
        full_message = f'Tagged {count} {color_part}{log_message}'
        if logger:
            logger.info(full_message)
        else:
            print(full_message)
    return count
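A self-contained sketch of the tag-and-log pattern, using the standard `logging` module with a small list-backed handler so the emitted message can be inspected. The local copies of `apply_tag_vectorized` and `tag_with_logging` are simplified, and the sketch assumes the message is only logged when at least one row matches; `ListHandler` and the sample data are hypothetical.

```python
import logging

import pandas as pd


def apply_tag_vectorized(df, mask, tags):
    """Simplified local copy: merge new tags into 'themeTags' for the masked rows."""
    if not isinstance(tags, list):
        tags = [tags]
    current = df.loc[mask, 'themeTags']
    df.loc[mask, 'themeTags'] = current.apply(lambda x: sorted(set(x + tags)))


def tag_with_logging(df, mask, tags, log_message, color='', logger=None):
    """Simplified local copy; assumes logging happens only when something was tagged."""
    count = int(mask.sum())
    if count > 0:
        apply_tag_vectorized(df, mask, tags)
        color_part = f'{color} ' if color else ''
        message = f'Tagged {count} {color_part}{log_message}'
        (logger.info if logger else print)(message)
    return count


class ListHandler(logging.Handler):
    """Hypothetical handler that stores formatted messages in a list."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


logger = logging.getLogger('tagging_sketch')
logger.setLevel(logging.INFO)
logger.propagate = False
handler = ListHandler()
logger.addHandler(handler)

df = pd.DataFrame({
    'keywords': ['Flying', 'Haste', 'Flying, Vigilance'],
    'themeTags': [[], ['Aggro'], ['Flying']],
})
flying_mask = df['keywords'].str.contains('Flying')
count = tag_with_logging(df, flying_mask, 'Flying',
                         'creatures with flying ability', color='blue', logger=logger)
print(count, handler.messages)
```

Because tags are merged through a set, re-tagging a row that already carries the tag leaves it unchanged, so the helper is safe to call repeatedly.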
def tag_with_rules_and_logging(
    df: pd.DataFrame,
    rules: List[dict],
    summary_message: str,
    color: str = '',
    logger=None
) -> int:
    """Apply multiple tag rules with summarized logging.

    This utility combines apply_rules with logging, providing a summary of
    all cards affected across multiple rules.

    Args:
        df: DataFrame to modify
        rules: List of rule dicts (each with 'mask' and 'tags'); 'mask' may be
            a boolean Series or a callable that takes the DataFrame and returns one
        summary_message: Overall description (e.g., "card draw effects")
        color: Color identifier for context (optional)
        logger: Logger instance to use (optional, uses print if None)

    Returns:
        Total count of unique cards affected by any rule

    Example:
        >>> rules = [
        ...     {'mask': flying_mask, 'tags': ['Flying']},
        ...     {'mask': haste_mask, 'tags': ['Haste', 'Aggro']}
        ... ]
        >>> count = tag_with_rules_and_logging(
        ...     df, rules, 'evasive creatures', color='red', logger=logger
        ... )
    """
    affected = pd.Series([False] * len(df), index=df.index)
    for rule in rules:
        mask = rule.get('mask')
        if callable(mask):
            mask = mask(df)
        if mask is not None and mask.any():
            tags = rule.get('tags', [])
            apply_tag_vectorized(df, mask, tags)
            affected |= mask

    # Cast so the declared int return type holds (Series.sum() returns a NumPy integer)
    count = int(affected.sum())
    color_part = f'{color} ' if color else ''
    full_message = f'Tagged {count} {color_part}{summary_message}'
    if logger:
        logger.info(full_message)
    else:
        print(full_message)
    return count
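The "unique cards" count is the size of the union of the rule masks, not the sum of per-rule counts. The sketch below shows the difference with two overlapping masks, one of them supplied as a callable as the loop above allows; the data and variable names are hypothetical, and no tags are applied here.

```python
import pandas as pd

df = pd.DataFrame({'name': ['A', 'B', 'C', 'D']})
flying_mask = pd.Series([True, True, False, False], index=df.index)
haste_mask = pd.Series([False, True, True, False], index=df.index)

rules = [
    {'mask': flying_mask, 'tags': ['Flying']},
    # Callable form: evaluated against the DataFrame when the rule is processed.
    {'mask': lambda frame: haste_mask, 'tags': ['Haste', 'Aggro']},
]

affected = pd.Series(False, index=df.index)
per_rule_total = 0
for rule in rules:
    mask = rule.get('mask')
    if callable(mask):
        mask = mask(df)
    if mask is not None and mask.any():
        affected |= mask                   # union of all rows touched so far
        per_rule_total += int(mask.sum())  # naive sum double-counts card 'B'

unique_total = int(affected.sum())
print(unique_total, per_rule_total)
```

Card `B` matches both rules, so the naive per-rule sum overstates how many cards were touched, while the union mask counts it once.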

File diff suppressed because it is too large

@ -900,7 +900,7 @@ def ideal_labels() -> Dict[str, str]:
'removal': 'Spot Removal', 'removal': 'Spot Removal',
'wipes': 'Board Wipes', 'wipes': 'Board Wipes',
'card_advantage': 'Card Advantage', 'card_advantage': 'Card Advantage',
- 'protection': 'Protection',
+ 'protection': 'Protective Effects',
} }
@ -1911,7 +1911,7 @@ def _make_stages(b: DeckBuilder) -> List[Dict[str, Any]]:
("removal", "Confirm Removal", "add_removal"), ("removal", "Confirm Removal", "add_removal"),
("wipes", "Confirm Board Wipes", "add_board_wipes"), ("wipes", "Confirm Board Wipes", "add_board_wipes"),
("card_advantage", "Confirm Card Advantage", "add_card_advantage"), ("card_advantage", "Confirm Card Advantage", "add_card_advantage"),
- ("protection", "Confirm Protection", "add_protection"),
+ ("protection", "Confirm Protective Effects", "add_protection"),
] ]
any_granular = any(callable(getattr(b, rn, None)) for _key, _label, rn in spell_categories) any_granular = any(callable(getattr(b, rn, None)) for _key, _label, rn in spell_categories)
if any_granular: if any_granular:

File diff suppressed because it is too large

@ -98,7 +98,7 @@ services:
WEB_AUTO_SETUP: "1" # 1=auto-run setup/tagging when needed WEB_AUTO_SETUP: "1" # 1=auto-run setup/tagging when needed
WEB_AUTO_REFRESH_DAYS: "7" # Refresh cards.csv if older than N days; 0=never WEB_AUTO_REFRESH_DAYS: "7" # Refresh cards.csv if older than N days; 0=never
WEB_TAG_PARALLEL: "1" # 1=parallelize tagging WEB_TAG_PARALLEL: "1" # 1=parallelize tagging
- WEB_TAG_WORKERS: "8" # Worker count when parallel tagging
+ WEB_TAG_WORKERS: "4" # Worker count when parallel tagging
# Tagging Refinement Feature Flags # Tagging Refinement Feature Flags
TAG_NORMALIZE_KEYWORDS: "1" # 1=normalize keywords & filter specialty mechanics (recommended) TAG_NORMALIZE_KEYWORDS: "1" # 1=normalize keywords & filter specialty mechanics (recommended)