fix: handle numpy arrays in card_similarity parse_theme_tags

The similarity cache build was failing because parse_theme_tags() checked isinstance(tags, list), but values read from Parquet files are numpy.ndarray objects. As a result, every card was treated as having no theme tags, and the cache was built empty.

Changed to use a hasattr(tags, '__len__') check instead (excluding str), which works for both lists and numpy arrays.
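
A minimal sketch of the before/after behaviour, assuming numpy is installed. The helper names parse_tags_old and parse_tags_new and the sample tags are hypothetical; each helper is a simplified stand-in for just the list/array branch of parse_theme_tags(), not the full method.

```python
import numpy as np


def parse_tags_old(tags):
    """Simplified pre-fix branch: only a real Python list is recognised."""
    if isinstance(tags, list):
        return set(tags) if tags else set()
    return set()  # stand-in for "no tags found" (the real method has more branches)


def parse_tags_new(tags):
    """Simplified post-fix branch: any sized container that is not a str."""
    if hasattr(tags, "__len__") and not isinstance(tags, str):
        # len() is used instead of truthiness on purpose, see the note below.
        return set(list(tags)) if len(tags) > 0 else set()
    return set()


# Hypothetical sample data: the shape of value the commit says Parquet yields.
tags_from_parquet = np.array(["lifegain", "tokens"], dtype=object)

old_result = parse_tags_old(tags_from_parquet)    # array is not a list, so empty
new_result = parse_tags_new(tags_from_parquet)    # array has __len__, so parsed
list_result = parse_tags_new(["lifegain", "tokens"])       # lists still work
empty_result = parse_tags_new(np.array([], dtype=object))  # empty array, empty set
```

The len(tags) > 0 test matters here: evaluating a numpy array with more than one element in a boolean context (as in "if tags") raises ValueError, so the truthiness check from the old list branch cannot simply be reused.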
matt 2025-10-19 08:26:20 -07:00
parent bff64de370
commit 505bbdf166

@@ -252,9 +252,10 @@ class CardSimilarity:
 if pd.isna(tags) if isinstance(tags, (str, float, int, type(None))) else False:
     return set()
-if isinstance(tags, list):
-    # M4: Parquet format - already a list
-    return set(tags) if tags else set()
+# M4: Handle numpy arrays from Parquet files
+if hasattr(tags, '__len__') and not isinstance(tags, str):
+    # Parquet format - convert array-like to list
+    return set(list(tags)) if len(tags) > 0 else set()
 if isinstance(tags, str):
     # Handle string representation of list: "['tag1', 'tag2']"