Code:
from transformers import AutoModel, AutoTokenizer
import numpy as np
from rank_bm25 import BM25Okapi
from sklearn.neighbors import NearestNeighbors


class EmbeddingModels:
    def bert(self, model_name, text):
        # Load the tokenizer and model for the given checkpoint name.
        tokenizer = AutoTokenizer.from_pretrained(model_name)
        model = AutoModel.from_pretrained(model_name)
        inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
        outputs = model(**inputs)
        # Mean-pool the token embeddings into one vector per input text.
        embeddings = outputs.last_hidden_state.mean(dim=1).detach().numpy()
        return embeddings

    def create_chunks(self, text, chunk_size):
        # Split the text into consecutive chunks of chunk_size characters.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
Code:
A parameter name that contains 'beta' will be renamed internally to 'bias'.
Please use a different name to suppress this warning.
A parameter name that contains 'gamma' will be renamed internally to 'weight'.
Please use a different name to suppress this warning.
What I tried: updating the package and suppressing the warnings with import warnings.
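One possible reason the import warnings approach has no effect is that these messages may be emitted through Python's logging module rather than the warnings module. Below is a minimal sketch under that assumption. The logger name transformers.modeling_utils is a guess and is not confirmed by the post, and DropRenameNotice and silence_rename_notice are hypothetical helper names introduced only for illustration.

```python
import logging


class DropRenameNotice(logging.Filter):
    """Drop log records that contain the parameter-rename notice."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Returning False tells the logging module to discard the record.
        return "will be renamed internally" not in record.getMessage()


def silence_rename_notice(
    logger_name: str = "transformers.modeling_utils",  # assumed logger name
) -> logging.Filter:
    """Attach the filter to the logger assumed to emit the notice."""
    notice_filter = DropRenameNotice()
    logging.getLogger(logger_name).addFilter(notice_filter)
    return notice_filter
```

If the assumption about the logger name holds, calling silence_rename_notice() once before AutoModel.from_pretrained(...) should hide only this notice and leave all other log output intact.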
More details here: https://stackoverflow.com/questions/788 ... g-utils-py