mirror of https://github.com/mwisnowski/mtg_python_deckbuilder.git synced 2025-12-16 07:30:13 +01:00

matt f70ffca23e feat: consolidate card data into optimized format for faster queries and reduced file sizes

2025-10-15 11:04:49 -07:00

7.7 KiB

Raw Blame History

All Cards Consolidation - Migration Guide

Overview

This guide covers the migration from individual card CSV files to the consolidated all_cards.parquet format introduced in v2.8.0. The new format provides:

87% smaller file size (3.74 MB vs ~30 MB for CSVs)
2-5x faster queries (single lookup ~1.3ms, filters <70ms)
Improved caching with automatic reload on file changes
Unified query API via AllCardsLoader and CardQueryBuilder

Migration Timeline

Phase 1: v2.8.0 (Current) - Soft Launch

✅ AllCardsLoader and CardQueryBuilder available
✅ Automatic aggregation after tagging
✅ Legacy adapter functions provided for backward compatibility
✅ Feature flag USE_ALL_CARDS_FILE=1 (enabled by default)
✅ Deprecation warnings logged when using legacy functions
cards.csv still supported (kept for compatibility)
commander_cards.csv replaced by commander_cards.parquet

Phase 2: v2.9.0 - Broader Adoption

Update deck_builder modules to use AllCardsLoader directly
Update web routes to use new query API
Continue supporting legacy adapter for external code
Increase test coverage for real-world usage patterns

Phase 3: v3.0.0 - Primary Method

New code must use AllCardsLoader (no new legacy adapter usage)
Legacy adapter still works but discouraged
Documentation emphasizes new API
cards.csv continues to work (not deprecated yet)

Phase 4: v3.1.0+ - Sunset Legacy (Future)

Remove legacy adapter functions
Remove individual card CSV file support (cards.csv sunset)
commander_cards.parquet permanently replaces CSV version
All code uses AllCardsLoader exclusively

Quick Start

For New Code (Recommended)

from code.services.all_cards_loader import AllCardsLoader
from code.services.card_query_builder import CardQueryBuilder

# Simple loading
loader = AllCardsLoader()
all_cards = loader.load()

# Single card lookup
sol_ring = loader.get_by_name("Sol Ring")

# Batch lookup
cards = loader.get_by_names(["Sol Ring", "Lightning Bolt", "Counterspell"])

# Filtering
red_cards = loader.filter_by_color_identity(["R"])
token_cards = loader.filter_by_themes(["tokens"], mode="any")
creatures = loader.filter_by_type("Creature")

# Text search
results = loader.search("create token", limit=100)

# Complex queries with fluent API
results = (CardQueryBuilder()
    .colors(["G"])
    .themes(["ramp"], mode="any")
    .types("Creature")
    .limit(20)
    .execute())

For Existing Code (Legacy Adapter)

If you have existing code using old file-loading patterns, the legacy adapter provides backward compatibility:

# Old code continues to work (with deprecation warnings)
from code.services.legacy_loader_adapter import (
    load_all_cards,
    load_cards_by_name,
    load_cards_by_type,
    load_cards_with_tag,
)

# These still work but log deprecation warnings
all_cards = load_all_cards()
sol_ring = load_cards_by_name("Sol Ring")
creatures = load_cards_by_type("Creature")
token_cards = load_cards_with_tag("tokens")

Important: Migrate to the new API as soon as possible. Legacy functions will be removed in v3.1+.

Migration Steps

Step 1: Update Imports

Before:

# Old pattern (if you were loading cards directly)
import pandas as pd
df = pd.read_csv("csv_files/some_card.csv")

After:

from code.services.all_cards_loader import AllCardsLoader

loader = AllCardsLoader()
card = loader.get_by_name("Card Name")

Step 2: Update Query Patterns

Before:

# Old: Manual filtering
all_cards = load_all_individual_csvs()  # Slow
creatures = all_cards[all_cards["type"].str.contains("Creature")]
red_creatures = creatures[creatures["colorIdentity"] == "R"]

After:

# New: Efficient queries
loader = AllCardsLoader()
red_creatures = (CardQueryBuilder(loader)
    .colors(["R"])
    .types("Creature")
    .execute())

Step 3: Update Caching

Before:

# Old: Manual caching
_cache = {}
def get_card(name):
    if name not in _cache:
        _cache[name] = load_from_csv(name)
    return _cache[name]

After:

# New: Built-in caching
loader = AllCardsLoader()  # Caches automatically
card = loader.get_by_name(name)  # Fast on repeat calls

Feature Flag

The USE_ALL_CARDS_FILE environment variable controls whether the consolidated Parquet file is used:

# Enable (default)
USE_ALL_CARDS_FILE=1

# Disable (fallback to old method)
USE_ALL_CARDS_FILE=0

When to disable:

Troubleshooting issues with the new loader
Testing backward compatibility
Temporary fallback during migration

Performance Comparison

Operation	Old (CSV)	New (Parquet)	Improvement
Initial load	~2-3s	0.104s	20-30x faster
Single lookup	~50-100ms	1.3ms	40-75x faster
Color filter	~200ms	2.1ms	95x faster
Theme filter	~500ms	67ms	7.5x faster
File size	~30 MB	3.74 MB	87% smaller

Troubleshooting

"all_cards.parquet not found"

Run the aggregation process:

Web UI: Go to Setup page → "Rebuild Card Files" button
CLI: python code/scripts/aggregate_cards.py
Automatic: Run tagging workflow (aggregation happens automatically)

Deprecation Warnings

DEPRECATION: load_cards_by_name() called. Migrate to AllCardsLoader().get_by_name() before v3.1+

Solution: Update your code to use the new API as shown in this guide.

Performance Issues

# Check cache status
loader = AllCardsLoader()
stats = loader.get_stats()
print(stats)  # Shows cache age, file size, etc.

# Force reload if data seems stale
loader.load(force_reload=True)

# Clear cache
loader.clear_cache()

Feature Flag Not Working

Ensure environment variable is set before importing:

import os
os.environ['USE_ALL_CARDS_FILE'] = '1'

# Then import
from code.services.all_cards_loader import AllCardsLoader

Testing Your Migration

# Run migration compatibility tests
pytest code/tests/test_migration_compatibility.py -v

# Run all cards loader tests
pytest code/tests/test_all_cards_loader.py -v

FAQ

Q: Do I need to regenerate all_cards.parquet after tagging? A: No, it's automatic. Aggregation runs after tagging completes. You can manually trigger via "Rebuild Card Files" button if needed.

Q: What happens to cards.csv? A: Still supported through v3.0.x for compatibility. Will be sunset in v3.1+. Start migrating now.

Q: What about commander_cards.csv? A: Already replaced by commander_cards.parquet in v2.8.0. CSV version is no longer used.

Q: Can I use both methods during migration? A: Yes, the legacy adapter allows mixed usage, but aim to fully migrate to the new API.

Q: Will my existing decks break? A: No, existing decks are unaffected. This only changes how cards are loaded internally.

Q: How do I disable the new loader? A: Set USE_ALL_CARDS_FILE=0 environment variable. Not recommended except for troubleshooting.

Q: Are there any breaking changes? A: No breaking changes in v2.8.0. Legacy functions work with deprecation warnings. Breaking changes planned for v3.1+.

Support

If you encounter issues during migration:

Check deprecation warnings in logs
Run migration compatibility tests
Try disabling feature flag temporarily
File an issue on GitHub with details

Summary

✅ Use AllCardsLoader for all new code ✅ Migrate existing code using this guide ✅ Test thoroughly with provided test suites ✅ Monitor deprecation warnings and address them ✅ Plan ahead for v3.1+ sunset of legacy functions

The new consolidated format provides significant performance improvements and a cleaner API. Start migrating today!

7.7 KiB Raw Blame History

All Cards Consolidation - Migration Guide

Overview

Migration Timeline

Phase 1: v2.8.0 (Current) - Soft Launch

Phase 2: v2.9.0 - Broader Adoption

Phase 3: v3.0.0 - Primary Method

Phase 4: v3.1.0+ - Sunset Legacy (Future)

Quick Start

For New Code (Recommended)

For Existing Code (Legacy Adapter)

Migration Steps

Step 1: Update Imports

Step 2: Update Query Patterns

Step 3: Update Caching

Feature Flag

Performance Comparison

Troubleshooting

"all_cards.parquet not found"

Deprecation Warnings

Performance Issues

Feature Flag Not Working

Testing Your Migration

FAQ

Support

Summary

7.7 KiB

Raw Blame History