RAG on Mac (M3) with langchain (RetrievalQA): code runs indefinitely

Post by Anonymous » 15 Jan 2025, 13:12

I am trying to run a RAG system on my Mac M3 Pro (18 GB RAM) using langchain and `Llama-3.2-3B-Instruct` in a Jupyter notebook (the vector store is Milvus).

When I call RetrievalQA.from_chain_type and invoke the resulting chain, the cell runs indefinitely (at least 15 minutes; I did not let it run any longer).

Code: Select all

from langchain.chains import RetrievalQA

qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=retriever,
    return_source_documents=True,  # (optional)
    chain_type_kwargs={"prompt": prompt},
)
response = qa_chain.invoke({"query": question})

Could you please help me solve this problem?
The LLM, the retriever, and the prompt are shown below:

Code: Select all

from langchain.llms.base import LLM
from typing import Any, List, Optional
from pydantic import PrivateAttr


class HuggingFaceLLM(LLM):
    # Define pipeline as a private attribute
    _pipeline: Any = PrivateAttr()

    def __init__(self, pipeline):
        super().__init__()
        self._pipeline = pipeline

    def _call(self, prompt: str, stop: Optional[List[str]] = None) -> str:
        # Generate text using the Hugging Face pipeline
        # response = self._pipeline(prompt, max_length=512, num_return_sequences=1)
        response = self._pipeline(prompt, num_return_sequences=1)
        return response[0]["generated_text"]

    @property
    def _identifying_params(self):
        return {"name": "HuggingFaceLLM"}

    @property
    def _llm_type(self):
        return "custom"


llm = HuggingFaceLLM(pipeline=llm_pipeline)
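
The wrapper above returns `response[0]["generated_text"]` unchanged and never uses `stop`. By default a Hugging Face text-generation pipeline returns the prompt followed by the completion, so the chain would receive the whole prompt back as its answer. This is unlikely to explain the hang, but it may be worth handling once generation works. The sketch below shows one possible post-processing step in plain Python; `extract_completion` is a hypothetical helper and not part of langchain or transformers.

```python
from typing import List, Optional


def extract_completion(prompt: str, generated_text: str,
                       stop: Optional[List[str]] = None) -> str:
    """Return only the newly generated part of a text-generation result."""
    # Drop the echoed prompt if the pipeline returned prompt + completion.
    if generated_text.startswith(prompt):
        completion = generated_text[len(prompt):]
    else:
        completion = generated_text
    # Cut the text at the earliest stop sequence, if any were requested.
    for marker in stop or []:
        index = completion.find(marker)
        if index != -1:
            completion = completion[:index]
    return completion.strip()


example_prompt = "Question: 2+2?\nAnswer:"
raw_output = example_prompt + " 4\nQuestion: next"
print(extract_completion(example_prompt, raw_output, stop=["\nQuestion:"]))
```

Inside `_call`, such a helper could be applied to `response[0]["generated_text"]` before returning it.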

The LLM pipeline:

Code: Select all

from langchain.prompts import PromptTemplate
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_name = "meta-llama/Llama-3.2-3B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name, use_auth_token=hf_token)
model = AutoModelForCausalLM.from_pretrained(model_name, use_auth_token=hf_token)

llm_pipeline = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    device=0,
    max_new_tokens=256,
    temperature=0.7,
    top_p=0.9,
    truncation=True,
)
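
The model above is loaded without an explicit dtype, which in transformers typically means 32-bit floats. The back-of-the-envelope sketch below estimates how much memory the weights alone would need on the 18 GB machine. It assumes a nominal 3 billion parameters taken from the "3B" in the model name (the exact count is not stated in the post) and ignores activations and the KV cache.

```python
def estimate_weight_memory_gib(num_parameters: float, bytes_per_parameter: int) -> float:
    """Memory needed for the model weights alone, in GiB."""
    return num_parameters * bytes_per_parameter / 1024**3


# Assumption: "3B" in the model name taken at face value.
NOMINAL_PARAMETERS = 3e9

for dtype_name, size_in_bytes in (("float32", 4), ("float16", 2)):
    gib = estimate_weight_memory_gib(NOMINAL_PARAMETERS, size_in_bytes)
    print(f"{dtype_name}: about {gib:.1f} GiB for the weights")
```

If the float32 figure is close to the available RAM, the system may start swapping, which can look like a cell that never finishes.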

The prompt:

Code: Select all

prompt_template = """
You are a helpful assistant.  Use the following context to answer the question concisely.
If you do not know the answer from the context, please state so and do not search for an answer elsewhere.

Context:
{context}

Question:
{question}

Answer:
"""

prompt = PromptTemplate(
    input_variables=["context", "question"],
    template=prompt_template,
)
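
One quick way to rule out the template as a source of trouble is to render it outside the chain and look at the result. The sketch below uses plain `str.format` as a stand-in for langchain's default f-string style templates (an assumption about how `PromptTemplate` fills the placeholders); `template_fields` is a hypothetical helper and the sample context and question are invented.

```python
import string

prompt_template = """
You are a helpful assistant.  Use the following context to answer the question concisely.
If you do not know the answer from the context, please state so and do not search for an answer elsewhere.

Context:
{context}

Question:
{question}

Answer:
"""


def template_fields(template: str) -> set:
    """Names of all {placeholders} found in a str.format-style template."""
    return {name for _, name, _, _ in string.Formatter().parse(template) if name}


print(template_fields(prompt_template))

# Render with made-up values to see exactly what the model would receive.
rendered = prompt_template.format(
    context="Milvus is a vector database.",
    question="What is Milvus?",
)
print(rendered)
```

If the rendered text contains both the context and the question, the template itself is fine and any empty fields must come from the inputs.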

The retriever:

Code: Select all

# Imports assumed for this snippet; the original post did not show them.
from typing import Any, Callable, List

import numpy as np
from langchain.schema import BaseRetriever, Document
from pydantic import BaseModel


class MilvusRetriever(BaseRetriever, BaseModel):
    collection: Any
    embedding_function: Callable[[str], np.ndarray]
    text_field: str
    vector_field: str
    top_k: int = 5

    def get_relevant_documents(self, query: str) -> List[Document]:
        # Embed the query, then search the vector field of the collection
        query_embedding = self.embedding_function(query)

        search_params = {"metric_type": "IP", "params": {"nprobe": 10}}
        results = self.collection.search(
            data=[query_embedding],
            anns_field=self.vector_field,
            param=search_params,
            limit=self.top_k,
            output_fields=[self.text_field],
        )

        # Wrap each hit in a Document, keeping the similarity score as metadata
        documents = []
        for hit in results[0]:
            documents.append(
                Document(
                    page_content=hit.entity.get(self.text_field),
                    metadata={"score": hit.distance},
                )
            )
        return documents

    async def aget_relevant_documents(self, query: str) -> List[Document]:
        """Asynchronous version of get_relevant_documents."""
        return self.get_relevant_documents(query)


retriever = MilvusRetriever(
    collection=collection,
    embedding_function=embed_model.embed_query,
    text_field="text",
    vector_field="embedding",
    top_k=5,
)
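
Because the chain combines embedding, vector search, and generation in one call, it is hard to tell which step never returns. The sketch below times each stage separately with a small context manager. `timed`, `fake_embed`, and `fake_search` are hypothetical names; the two fake functions are stand-ins so the example runs without Milvus or a model.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator, List

timings: Dict[str, float] = {}


@contextmanager
def timed(stage: str) -> Iterator[None]:
    """Record and print how long the enclosed block takes."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] = time.perf_counter() - start
        print(f"{stage}: {timings[stage]:.3f} s")


# Stand-ins for the real embedding and search calls (assumptions for this demo).
def fake_embed(query: str) -> List[float]:
    return [0.0, 0.0, 0.0, 0.0]


def fake_search(vector: List[float]) -> List[str]:
    return ["doc 1", "doc 2"]


with timed("embed"):
    vector = fake_embed("What is Milvus?")
with timed("search"):
    hits = fake_search(vector)
```

In the notebook, the stand-ins could be swapped for `embed_model.embed_query(question)`, `retriever.get_relevant_documents(question)`, and a direct call to `llm_pipeline` with a short prompt, each inside its own `timed(...)` block.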

I am also checking whether the Mac GPU (MPS) is available:

Code: Select all

import torch

if torch.backends.mps.is_available():
    print("MPS is available!")
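
The check above only reports that the MPS backend can be used; it does not show where the model actually runs. The pipeline was created with `device=0`, and how an integer index is resolved on a machine without CUDA may depend on the transformers version. A minimal sketch of choosing the device explicitly follows; `pick_device` is a hypothetical helper, and it falls back to "cpu" when torch is missing so that it runs anywhere.

```python
def pick_device() -> str:
    """Return "mps", "cuda", or "cpu" depending on what torch reports."""
    try:
        import torch
    except ImportError:
        # torch is not installed, so only the CPU can be assumed.
        return "cpu"
    # torch.backends.mps is absent in old torch builds, hence the getattr.
    mps_backend = getattr(torch.backends, "mps", None)
    if mps_backend is not None and mps_backend.is_available():
        return "mps"
    if torch.cuda.is_available():
        return "cuda"
    return "cpu"


device = pick_device()
print(f"Selected device: {device}")
```

The returned string could be passed as `device=pick_device()` when building the pipeline, and `model.device` can be printed afterwards to confirm where the weights ended up.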

Edit 1: As recommended here, I tried adding verbose output:

Code: Select all

qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=retriever,
    return_source_documents=True,  # (optional)
    # return_source_documents=False,  # (optional)
    verbose=True,
    chain_type_kwargs={
        "verbose": True,
        "prompt": prompt,
    },
)

Now the output is:

Code: Select all

> Entering new RetrievalQA chain...

> Entering new StuffDocumentsChain chain...

> Entering new LLMChain chain...
Prompt after formatting:


Context:


Question:


Answer:

(and it is still stuck here)

Read more here: https://stackoverflow.com/questions/79351880/rag-on-mac-m3-with-langchain-retrievalqa-code-runs-indefinitely
