mtg_python_deckbuilder/docs/migration/all_cards_migration.md

# All Cards Consolidation - Migration Guide

## Overview
This guide covers the migration from individual card CSV files to the consolidated `all_cards.parquet` format introduced in v2.8.0. The new format provides:

- **87% smaller file size** (3.74 MB vs ~30 MB for CSVs)
- **2-5x faster queries** (single lookup ~1.3ms, filters <70ms)
- **Improved caching** with automatic reload on file changes
- **Unified query API** via `AllCardsLoader` and `CardQueryBuilder`

## Migration Timeline

### Phase 1: v2.8.0 (Current) - Soft Launch
- ✅ AllCardsLoader and CardQueryBuilder available
- ✅ Automatic aggregation after tagging
- ✅ Legacy adapter functions provided for backward compatibility
- ✅ Feature flag `USE_ALL_CARDS_FILE=1` (enabled by default)
- ✅ Deprecation warnings logged when using legacy functions
- **cards.csv still supported** (kept for compatibility)
- **commander_cards.csv replaced** by `commander_cards.parquet`

### Phase 2: v2.9.0 - Broader Adoption
- Update deck_builder modules to use AllCardsLoader directly
- Update web routes to use new query API
- Continue supporting legacy adapter for external code
- Increase test coverage for real-world usage patterns

### Phase 3: v3.0.0 - Primary Method
- New code must use AllCardsLoader (no new legacy adapter usage)
- Legacy adapter still works but discouraged
- Documentation emphasizes new API
- cards.csv continues to work (not deprecated yet)

### Phase 4: v3.1.0+ - Sunset Legacy (Future)
- Remove legacy adapter functions
- Remove individual card CSV file support (cards.csv sunset)
- **commander_cards.parquet permanently replaces CSV version**
- All code uses AllCardsLoader exclusively

## Quick Start

### For New Code (Recommended)

```python
from code.services.all_cards_loader import AllCardsLoader
from code.services.card_query_builder import CardQueryBuilder

# Simple loading
loader = AllCardsLoader()
all_cards = loader.load()

# Single card lookup
sol_ring = loader.get_by_name("Sol Ring")

# Batch lookup
cards = loader.get_by_names(["Sol Ring", "Lightning Bolt", "Counterspell"])

# Filtering
red_cards = loader.filter_by_color_identity(["R"])
token_cards = loader.filter_by_themes(["tokens"], mode="any")
creatures = loader.filter_by_type("Creature")

# Text search
results = loader.search("create token", limit=100)

# Complex queries with fluent API
results = (CardQueryBuilder()
    .colors(["G"])
    .themes(["ramp"], mode="any")
    .types("Creature")
    .limit(20)
    .execute())
```

### For Existing Code (Legacy Adapter)

If you have existing code using old file-loading patterns, the legacy adapter provides backward compatibility:

```python
# Old code continues to work (with deprecation warnings)
from code.services.legacy_loader_adapter import (
    load_all_cards,
    load_cards_by_name,
    load_cards_by_type,
    load_cards_with_tag,
)

# These still work but log deprecation warnings
all_cards = load_all_cards()
sol_ring = load_cards_by_name("Sol Ring")
creatures = load_cards_by_type("Creature")
token_cards = load_cards_with_tag("tokens")
```

**Important**: Migrate to the new API as soon as possible. Legacy functions will be removed in v3.1+.

## Migration Steps

### Step 1: Update Imports

**Before:**
```python
# Old pattern (if you were loading cards directly)
import pandas as pd
df = pd.read_csv("csv_files/some_card.csv")
```

**After:**
```python
from code.services.all_cards_loader import AllCardsLoader

loader = AllCardsLoader()
card = loader.get_by_name("Card Name")
```

### Step 2: Update Query Patterns

**Before:**
```python
# Old: Manual filtering
all_cards = load_all_individual_csvs()  # Slow
creatures = all_cards[all_cards["type"].str.contains("Creature")]
red_creatures = creatures[creatures["colorIdentity"] == "R"]
```

**After:**
```python
# New: Efficient queries
loader = AllCardsLoader()
red_creatures = (CardQueryBuilder(loader)
    .colors(["R"])
    .types("Creature")
    .execute())
```

### Step 3: Update Caching

**Before:**
```python
# Old: Manual caching
_cache = {}
def get_card(name):
    if name not in _cache:
        _cache[name] = load_from_csv(name)
    return _cache[name]
```

**After:**
```python
# New: Built-in caching
loader = AllCardsLoader()  # Caches automatically
card = loader.get_by_name(name)  # Fast on repeat calls
```

## Feature Flag

The `USE_ALL_CARDS_FILE` environment variable controls whether the consolidated Parquet file is used:

```bash
# Enable (default)
USE_ALL_CARDS_FILE=1

# Disable (fallback to old method)
USE_ALL_CARDS_FILE=0
```

**When to disable:**
- Troubleshooting issues with the new loader
- Testing backward compatibility
- Temporary fallback during migration

## Performance Comparison

| Operation | Old (CSV) | New (Parquet) | Improvement |
|-----------|-----------|---------------|-------------|
| Initial load | ~2-3s | 0.104s | 20-30x faster |
| Single lookup | ~50-100ms | 1.3ms | 40-75x faster |
| Color filter | ~200ms | 2.1ms | 95x faster |
| Theme filter | ~500ms | 67ms | 7.5x faster |
| File size | ~30 MB | 3.74 MB | 87% smaller |

## Troubleshooting

### "all_cards.parquet not found"

Run the aggregation process:
1. Web UI: Go to Setup page → "Rebuild Card Files" button
2. CLI: `python code/scripts/aggregate_cards.py`
3. Automatic: Run tagging workflow (aggregation happens automatically)

### Deprecation Warnings

```
DEPRECATION: load_cards_by_name() called. Migrate to AllCardsLoader().get_by_name() before v3.1+
```

**Solution**: Update your code to use the new API as shown in this guide.

### Performance Issues

```python
# Check cache status
loader = AllCardsLoader()
stats = loader.get_stats()
print(stats)  # Shows cache age, file size, etc.

# Force reload if data seems stale
loader.load(force_reload=True)

# Clear cache
loader.clear_cache()
```

### Feature Flag Not Working

Ensure environment variable is set before importing:
```python
import os
os.environ['USE_ALL_CARDS_FILE'] = '1'

# Then import
from code.services.all_cards_loader import AllCardsLoader
```

## Testing Your Migration

```python
# Run migration compatibility tests
pytest code/tests/test_migration_compatibility.py -v

# Run all cards loader tests
pytest code/tests/test_all_cards_loader.py -v
```

## FAQ

**Q: Do I need to regenerate all_cards.parquet after tagging?**
A: No, it's automatic. Aggregation runs after tagging completes. You can manually trigger via "Rebuild Card Files" button if needed.

**Q: What happens to cards.csv?**
A: Still supported through v3.0.x for compatibility. Will be sunset in v3.1+. Start migrating now.

**Q: What about commander_cards.csv?**
A: Already replaced by `commander_cards.parquet` in v2.8.0. CSV version is no longer used.

**Q: Can I use both methods during migration?**
A: Yes, the legacy adapter allows mixed usage, but aim to fully migrate to the new API.

**Q: Will my existing decks break?**
A: No, existing decks are unaffected. This only changes how cards are loaded internally.

**Q: How do I disable the new loader?**
A: Set `USE_ALL_CARDS_FILE=0` environment variable. Not recommended except for troubleshooting.

**Q: Are there any breaking changes?**
A: No breaking changes in v2.8.0. Legacy functions work with deprecation warnings. Breaking changes planned for v3.1+.

## Support

If you encounter issues during migration:
1. Check deprecation warnings in logs
2. Run migration compatibility tests
3. Try disabling feature flag temporarily
4. File an issue on GitHub with details

## Summary

✅ **Use AllCardsLoader** for all new code
✅ **Migrate existing code** using this guide
✅ **Test thoroughly** with provided test suites
✅ **Monitor deprecation warnings** and address them
✅ **Plan ahead** for v3.1+ sunset of legacy functions

The new consolidated format provides significant performance improvements and a cleaner API. Start migrating today!
feat: consolidate card data into optimized format for faster queries and reduced file sizes 2025-10-15 11:04:49 -07:00			`# All Cards Consolidation - Migration Guide`

			`## Overview`
			This guide covers the migration from individual card CSV files to the consolidated `all_cards.parquet` format introduced in v2.8.0. The new format provides:

			`- 87% smaller file size (3.74 MB vs ~30 MB for CSVs)`
			`- 2-5x faster queries (single lookup ~1.3ms, filters <70ms)`
			`- Improved caching with automatic reload on file changes`
			- Unified query API via `AllCardsLoader` and `CardQueryBuilder`

			`## Migration Timeline`

			`### Phase 1: v2.8.0 (Current) - Soft Launch`
			`- ✅ AllCardsLoader and CardQueryBuilder available`
			`- ✅ Automatic aggregation after tagging`
			`- ✅ Legacy adapter functions provided for backward compatibility`
			- ✅ Feature flag `USE_ALL_CARDS_FILE=1` (enabled by default)
			`- ✅ Deprecation warnings logged when using legacy functions`
			`- cards.csv still supported (kept for compatibility)`
			- commander_cards.csv replaced by `commander_cards.parquet`

			`### Phase 2: v2.9.0 - Broader Adoption`
			`- Update deck_builder modules to use AllCardsLoader directly`
			`- Update web routes to use new query API`
			`- Continue supporting legacy adapter for external code`
			`- Increase test coverage for real-world usage patterns`

			`### Phase 3: v3.0.0 - Primary Method`
			`- New code must use AllCardsLoader (no new legacy adapter usage)`
			`- Legacy adapter still works but discouraged`
			`- Documentation emphasizes new API`
			`- cards.csv continues to work (not deprecated yet)`

			`### Phase 4: v3.1.0+ - Sunset Legacy (Future)`
			`- Remove legacy adapter functions`
			`- Remove individual card CSV file support (cards.csv sunset)`
			`- commander_cards.parquet permanently replaces CSV version`
			`- All code uses AllCardsLoader exclusively`

			`## Quick Start`

			`### For New Code (Recommended)`

			```python
			`from code.services.all_cards_loader import AllCardsLoader`
			`from code.services.card_query_builder import CardQueryBuilder`

			`# Simple loading`
			`loader = AllCardsLoader()`
			`all_cards = loader.load()`

			`# Single card lookup`
			`sol_ring = loader.get_by_name("Sol Ring")`

			`# Batch lookup`
			`cards = loader.get_by_names(["Sol Ring", "Lightning Bolt", "Counterspell"])`

			`# Filtering`
			`red_cards = loader.filter_by_color_identity(["R"])`
			`token_cards = loader.filter_by_themes(["tokens"], mode="any")`
			`creatures = loader.filter_by_type("Creature")`

			`# Text search`
			`results = loader.search("create token", limit=100)`

			`# Complex queries with fluent API`
			`results = (CardQueryBuilder()`
			`.colors(["G"])`
			`.themes(["ramp"], mode="any")`
			`.types("Creature")`
			`.limit(20)`
			`.execute())`
			```

			`### For Existing Code (Legacy Adapter)`

			`If you have existing code using old file-loading patterns, the legacy adapter provides backward compatibility:`

			```python
			`# Old code continues to work (with deprecation warnings)`
			`from code.services.legacy_loader_adapter import (`
			`load_all_cards,`
			`load_cards_by_name,`
			`load_cards_by_type,`
			`load_cards_with_tag,`
			`)`

			`# These still work but log deprecation warnings`
			`all_cards = load_all_cards()`
			`sol_ring = load_cards_by_name("Sol Ring")`
			`creatures = load_cards_by_type("Creature")`
			`token_cards = load_cards_with_tag("tokens")`
			```

			`Important: Migrate to the new API as soon as possible. Legacy functions will be removed in v3.1+.`

			`## Migration Steps`

			`### Step 1: Update Imports`

			`Before:`
			```python
			`# Old pattern (if you were loading cards directly)`
			`import pandas as pd`
			`df = pd.read_csv("csv_files/some_card.csv")`
			```

			`After:`
			```python
			`from code.services.all_cards_loader import AllCardsLoader`

			`loader = AllCardsLoader()`
			`card = loader.get_by_name("Card Name")`
			```

			`### Step 2: Update Query Patterns`

			`Before:`
			```python
			`# Old: Manual filtering`
			`all_cards = load_all_individual_csvs() # Slow`
			`creatures = all_cards[all_cards["type"].str.contains("Creature")]`
			`red_creatures = creatures[creatures["colorIdentity"] == "R"]`
			```

			`After:`
			```python
			`# New: Efficient queries`
			`loader = AllCardsLoader()`
			`red_creatures = (CardQueryBuilder(loader)`
			`.colors(["R"])`
			`.types("Creature")`
			`.execute())`
			```

			`### Step 3: Update Caching`

			`Before:`
			```python
			`# Old: Manual caching`
			`_cache = {}`
			`def get_card(name):`
			`if name not in _cache:`
			`_cache[name] = load_from_csv(name)`
			`return _cache[name]`
			```

			`After:`
			```python
			`# New: Built-in caching`
			`loader = AllCardsLoader() # Caches automatically`
			`card = loader.get_by_name(name) # Fast on repeat calls`
			```

			`## Feature Flag`

			The `USE_ALL_CARDS_FILE` environment variable controls whether the consolidated Parquet file is used:

			```bash
			`# Enable (default)`
			`USE_ALL_CARDS_FILE=1`

			`# Disable (fallback to old method)`
			`USE_ALL_CARDS_FILE=0`
			```

			`When to disable:`
			`- Troubleshooting issues with the new loader`
			`- Testing backward compatibility`
			`- Temporary fallback during migration`

			`## Performance Comparison`

			`\| Operation \| Old (CSV) \| New (Parquet) \| Improvement \|`
			`\|-----------\|-----------\|---------------\|-------------\|`
			`\| Initial load \| ~2-3s \| 0.104s \| 20-30x faster \|`
			`\| Single lookup \| ~50-100ms \| 1.3ms \| 40-75x faster \|`
			`\| Color filter \| ~200ms \| 2.1ms \| 95x faster \|`
			`\| Theme filter \| ~500ms \| 67ms \| 7.5x faster \|`
			`\| File size \| ~30 MB \| 3.74 MB \| 87% smaller \|`

			`## Troubleshooting`

			`### "all_cards.parquet not found"`

			`Run the aggregation process:`
			`1. Web UI: Go to Setup page → "Rebuild Card Files" button`
			2. CLI: `python code/scripts/aggregate_cards.py`
			`3. Automatic: Run tagging workflow (aggregation happens automatically)`

			`### Deprecation Warnings`

			```
			`DEPRECATION: load_cards_by_name() called. Migrate to AllCardsLoader().get_by_name() before v3.1+`
			```

			`Solution: Update your code to use the new API as shown in this guide.`

			`### Performance Issues`

			```python
			`# Check cache status`
			`loader = AllCardsLoader()`
			`stats = loader.get_stats()`
			`print(stats) # Shows cache age, file size, etc.`

			`# Force reload if data seems stale`
			`loader.load(force_reload=True)`

			`# Clear cache`
			`loader.clear_cache()`
			```

			`### Feature Flag Not Working`

			`Ensure environment variable is set before importing:`
			```python
			`import os`
			`os.environ['USE_ALL_CARDS_FILE'] = '1'`

			`# Then import`
			`from code.services.all_cards_loader import AllCardsLoader`
			```

			`## Testing Your Migration`

			```python
			`# Run migration compatibility tests`
			`pytest code/tests/test_migration_compatibility.py -v`

			`# Run all cards loader tests`
			`pytest code/tests/test_all_cards_loader.py -v`
			```

			`## FAQ`

			`Q: Do I need to regenerate all_cards.parquet after tagging?`
			`A: No, it's automatic. Aggregation runs after tagging completes. You can manually trigger via "Rebuild Card Files" button if needed.`

			`Q: What happens to cards.csv?`
			`A: Still supported through v3.0.x for compatibility. Will be sunset in v3.1+. Start migrating now.`

			`Q: What about commander_cards.csv?`
			A: Already replaced by `commander_cards.parquet` in v2.8.0. CSV version is no longer used.

			`Q: Can I use both methods during migration?`
			`A: Yes, the legacy adapter allows mixed usage, but aim to fully migrate to the new API.`

			`Q: Will my existing decks break?`
			`A: No, existing decks are unaffected. This only changes how cards are loaded internally.`

			`Q: How do I disable the new loader?`
			A: Set `USE_ALL_CARDS_FILE=0` environment variable. Not recommended except for troubleshooting.

			`Q: Are there any breaking changes?`
			`A: No breaking changes in v2.8.0. Legacy functions work with deprecation warnings. Breaking changes planned for v3.1+.`

			`## Support`

			`If you encounter issues during migration:`
			`1. Check deprecation warnings in logs`
			`2. Run migration compatibility tests`
			`3. Try disabling feature flag temporarily`
			`4. File an issue on GitHub with details`

			`## Summary`

			`✅ Use AllCardsLoader for all new code`
			`✅ Migrate existing code using this guide`
			`✅ Test thoroughly with provided test suites`
			`✅ Monitor deprecation warnings and address them`
			`✅ Plan ahead for v3.1+ sunset of legacy functions`

			`The new consolidated format provides significant performance improvements and a cleaner API. Start migrating today!`