fix: handle numpy arrays in card_similarity parse_theme_tags

The similarity cache build was failing because parse_theme_tags() checked isinstance(tags, list), but list columns read from Parquet files arrive as numpy.ndarray objects, not Python lists. The check never matched, so every card was treated as having no theme tags and the resulting cache was empty.
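The failure can be reproduced with a minimal sketch. It assumes, as the commit message states, that a Parquet list column comes back as a numpy array. old_parse_theme_tags is a hypothetical standalone copy of the pre-fix list branch. Its final return set() is an assumed stand-in for what the real method does when no branch matches, which the message describes as "no theme tags".

```python
import numpy as np


def old_parse_theme_tags(tags):
    """Hypothetical standalone copy of the pre-fix list branch."""
    if isinstance(tags, list):
        # Pre-fix check: only plain Python lists pass
        return set(tags) if tags else set()
    # Assumed fall-through: no branch matched, so the card gets no tags
    return set()


# Stand-in for a list column read back from Parquet: a numpy array, not a list
parquet_tags = np.array(["Tokens", "Sacrifice"], dtype=object)

print(isinstance(parquet_tags, list))                 # False
print(old_parse_theme_tags(parquet_tags))             # set()
print(sorted(old_parse_theme_tags(["Tokens", "Sacrifice"])))  # ['Sacrifice', 'Tokens']
```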

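Replaced the check with hasattr(tags, '__len__') and not isinstance(tags, str), which accepts both lists and numpy arrays while still sending strings to the string-parsing branch. The emptiness test also changed from "if tags" to "len(tags) > 0", because truth-testing a numpy array with more than one element raises ValueError.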
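The sketch below isolates the new array-like branch so it can be exercised on its own. parse_array_like_tags is a hypothetical helper, not part of the commit. It returns None for non-array-like input to signal that a caller would fall through to string parsing, which is not reproduced here. It also shows why truthiness cannot be used on a multi-element array.

```python
import numpy as np


def parse_array_like_tags(tags):
    """Hypothetical standalone copy of the post-fix array-like branch.

    Returns None when `tags` is not array-like, so a caller could fall
    through to string parsing (that branch is not reproduced here).
    """
    # Strings also define __len__, so exclude them explicitly
    if hasattr(tags, '__len__') and not isinstance(tags, str):
        # len() instead of truthiness: bool() on a multi-element ndarray raises
        return set(list(tags)) if len(tags) > 0 else set()
    return None


array_tags = np.array(["Tokens", "Sacrifice"], dtype=object)

print(sorted(parse_array_like_tags(["Tokens", "Sacrifice"])))  # ['Sacrifice', 'Tokens']
print(sorted(parse_array_like_tags(array_tags)))               # ['Sacrifice', 'Tokens']
print(parse_array_like_tags(np.array([], dtype=object)))       # set()
print(parse_array_like_tags("['Tokens']"))                     # None

# The old "if tags" style test fails on arrays with more than one element
try:
    bool(array_tags)
except ValueError:
    print("truth-testing a multi-element array raises ValueError")
```

One side effect of duck typing on __len__: any non-string sized object (a tuple, a pandas Series, even a dict) now takes this branch. For a dict that means its keys become the tag set.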
matt 2025-10-19 08:26:20 -07:00
parent bff64de370
commit 505bbdf166


@@ -252,9 +252,10 @@ class CardSimilarity:
         if pd.isna(tags) if isinstance(tags, (str, float, int, type(None))) else False:
             return set()
-        if isinstance(tags, list):
-            # M4: Parquet format - already a list
-            return set(tags) if tags else set()
+        # M4: Handle numpy arrays from Parquet files
+        if hasattr(tags, '__len__') and not isinstance(tags, str):
+            # Parquet format - convert array-like to list
+            return set(list(tags)) if len(tags) > 0 else set()
         if isinstance(tags, str):
             # Handle string representation of list: "['tag1', 'tag2']"