Python: filter vectors from a Pinecone vector store based on a field saved in the vectors' metadata

Posted by Anonymous » 26 Jan 2025, 10:44

I have vectors stored in a Pinecone vector store; each vector represents the content of a PDF file:

metadata:
hash_code: "D53D7EC8B0E66E9A83A97ACDA09EDD3FE9867CADB42833F9BF5525CC3B89FE2D"
ID: "CC54FFBE-9CBA-4DE9-9F30-A114E4C3C3FB"

The metadata contains a field, hash_code, which is a hash of the PDF's content, so that the same file is not added to the vector store again and again. Given the documents I want to add, I want to scan the existing vectors to find out whether any of them already exist, and then filter those out. I have not managed to achieve this yet.

First method:
def filter_existing_docs(index_name, docs):
    # Initialize the Pinecone index
    index = pinecone_client.Index(index_name)

    # Extract hash_codes from the docs list using the appropriate method for your Document objects
    hash_codes = [doc.metadata['hash_code'] for doc in docs]  # Accessing 'metadata' if it's an attribute
    print("Hash Codes:", hash_codes)

    # Fetch by list of hash_codes (ensure hash_codes are valid ids)
    fetch_response = index.fetch(ids=hash_codes)
    print("Fetch Response:", fetch_response)

    # Get the existing hash_codes that are already in the Pinecone index
    existing_hash_codes = set(fetch_response.get('vectors', {}).keys())  # Extract existing IDs from the response
    print("1 -----------> Existing Hash Codes:", len(existing_hash_codes))

    # Filter out the docs that have already been added to Pinecone
    filtered_docs = [doc for doc in docs if doc.metadata['hash_code'] not in existing_hash_codes]
    print("2 -----------> Filtered Docs:", len(filtered_docs))

    return filtered_docs
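
A note on the first method: `index.fetch(ids=...)` looks vectors up by their vector ID, not by metadata values, so it can only report a match if the vectors were upserted with the hash code as their ID. In the metadata shown above the ID is a separate UUID, which would explain an empty result. The sketch below shows the ID-based variant under that assumption (hash_code reused as the vector ID at upsert time). The names `filter_docs_by_id`, `batch_size` and `FakeIndex` are hypothetical; `FakeIndex` is only an in-memory stand-in so the example runs without a Pinecone account, and the response is read through the `.vectors` attribute, which is an assumption about the client version in use.

```python
from types import SimpleNamespace


def filter_docs_by_id(index, docs, batch_size=100):
    """Return the docs whose hash_code is not already used as a vector ID.

    Assumes vectors were upserted with id == hash_code. `index` is any object
    with a Pinecone-style fetch(ids=[...]) method whose response exposes a
    `vectors` mapping keyed by ID.
    """
    hash_codes = [str(doc.metadata["hash_code"]) for doc in docs]
    existing = set()
    # Fetch in batches so a long list of IDs is not sent in a single request.
    for start in range(0, len(hash_codes), batch_size):
        batch = hash_codes[start:start + batch_size]
        response = index.fetch(ids=batch)
        existing.update(response.vectors.keys())
    return [doc for doc in docs if str(doc.metadata["hash_code"]) not in existing]


class FakeIndex:
    """Hypothetical in-memory stand-in for a Pinecone index (illustration only)."""

    def __init__(self, stored_ids):
        self.stored_ids = set(stored_ids)

    def fetch(self, ids):
        # Like the real fetch, only IDs that exist appear in the response.
        found = {i: None for i in ids if i in self.stored_ids}
        return SimpleNamespace(vectors=found)


docs = [
    SimpleNamespace(metadata={"hash_code": "AAA"}),
    SimpleNamespace(metadata={"hash_code": "BBB"}),
]
new_docs = filter_docs_by_id(FakeIndex({"AAA"}), docs)
print([d.metadata["hash_code"] for d in new_docs])  # prints ['BBB']
```

With a real index, `FakeIndex({"AAA"})` would be replaced by `pinecone_client.Index(index_name)`. This only works if the upsert step also uses the hash code as the ID; otherwise a metadata filter is needed instead.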

Then I tried a different approach:
def filter_existing_docs(index_name, docs):
    # Initialize the Pinecone index
    index = pinecone_client.Index(index_name)

    # Extract hash_codes from the docs list using the appropriate method for your Document objects
    hash_codes = [doc.metadata['hash_code'] for doc in docs]  # Accessing 'metadata' if it's an attribute
    print("Hash Codes:", hash_codes)

    # We need to query Pinecone using `top_k` and search through the index
    query_response = index.query(
        top_k=100,  # Set a suitable `top_k` to return a reasonable number of documents
        include_metadata=True,
        #namespace=namespace
    )

    # Debug: Print the query response to see its structure
    print("Query Response:", query_response)

    # Extract the hash_codes of the existing documents in Pinecone
    existing_hash_codes = {item['metadata']['hash_code'] for item in query_response['matches']}
    print("1 -----------> Existing Hash Codes:", len(existing_hash_codes))

    # Filter out the docs that have already been added to Pinecone based on hash_code
    filtered_docs = [doc for doc in docs if str(doc.metadata['hash_code']) not in existing_hash_codes]
    print("2 -----------> Filtered Docs:", len(filtered_docs))

    return filtered_docs
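
A note on the second approach: `index.query` is a similarity search. It needs a query vector (or an ID) and returns at most `top_k` nearest neighbours, so it cannot be used to list everything in the index. It does accept a metadata `filter`, though, which allows one existence check per hash code. The sketch below is one possible way to do this, not a confirmed solution. `filter_docs_by_metadata`, `probe_vector` and `FakeIndex` are hypothetical names; `probe_vector` is assumed to be any non-zero vector of the index's dimension (for example an embedding of one of the documents), and `FakeIndex` is an in-memory stand-in so the example runs without Pinecone.

```python
from types import SimpleNamespace


def filter_docs_by_metadata(index, docs, probe_vector):
    """Return the docs whose hash_code matches no vector's metadata.

    `index` is any object with a Pinecone-style
    query(vector=..., top_k=..., filter=..., include_metadata=...) method
    whose response exposes a `matches` list.
    """
    new_docs = []
    for doc in docs:
        code = str(doc.metadata["hash_code"])
        # The filter restricts the search to vectors with this exact hash_code,
        # so a single match is enough to know the document is already stored.
        response = index.query(
            vector=probe_vector,
            top_k=1,
            filter={"hash_code": {"$eq": code}},
            include_metadata=False,
        )
        if not response.matches:
            new_docs.append(doc)
    return new_docs


class FakeIndex:
    """Hypothetical in-memory stand-in for a Pinecone index (illustration only)."""

    def __init__(self, stored_hash_codes):
        self.stored = set(stored_hash_codes)

    def query(self, vector, top_k, filter, include_metadata=False):
        wanted = filter["hash_code"]["$eq"]
        matches = [SimpleNamespace(id="some-id")] if wanted in self.stored else []
        return SimpleNamespace(matches=matches[:top_k])


docs = [
    SimpleNamespace(metadata={"hash_code": "AAA"}),
    SimpleNamespace(metadata={"hash_code": "BBB"}),
]
new_docs = filter_docs_by_metadata(FakeIndex({"BBB"}), docs, probe_vector=[1.0, 0.0])
print([d.metadata["hash_code"] for d in new_docs])  # prints ['AAA']
```

One query per document with `top_k=1` is used here, rather than a single query with a large `top_k`, because several vectors can share one hash_code and could otherwise fill the result list before every hash code has been seen. The trade-off is one round trip per document.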

More details here: https://stackoverflow.com/questions/79387289/python-filter-vectors-from-pinecone-vector-store-based-on-a-field-saved-in-the
