I want to use the BridgeTower model to encode text and images into vectors, store them in the LanceDB vector database, and then retrieve the matching images with text queries. To reproduce the problem, follow these steps:
First, encode the text and the image into a vector:
```python
import torch
from PIL import Image
from transformers import AutoProcessor, BridgeTowerModel

def bt_embedding_from_local_pretrained(prompt, image_path):
    # Without an image there is nothing to encode, so return None early.
    if not image_path:
        return None
    model_name = "D:\\download\\bridgetower-large-itm-mlm-itc"
    # Note: the processor and model are reloaded on every call (slow for many frames).
    processor = AutoProcessor.from_pretrained(model_name)
    model = BridgeTowerModel.from_pretrained(model_name)
    image = Image.open(image_path)
    # The processor takes the image and the text together.
    inputs = processor(images=image, text=prompt, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # pooler_output has shape (1, 2 * hidden_size); drop the batch dimension.
    embeddings = outputs.pooler_output
    embeddings_list = embeddings.squeeze().tolist()
    return embeddings_list

text = "The image features a young boy walking on a playground floor, which is designed to look like a carpet. The boy is wearing a blue shirt and appears to be enjoying his time at the playground. \n\nIn the background, there are two chairs, one located near the left side of the playground and the other closer to the right side. The playground also has a bench situated in the middle of the scene."
image_path = "./frame_0.jpg"
vector = bt_embedding_from_local_pretrained(text, image_path)
print(vector)
print(len(vector))
```
Output:

```
[-0.4280051290988922, 0.8150287866592407, -0.4738779664039612, -0.8128997683525085, 0.0006316843791864812, 0.4806518256664276, 0.22251057624816895, 0.6701756715774536, ....
2048  # length of the vector
```
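The length of 2048 is consistent with the Hugging Face Transformers documentation for `BridgeTowerModel`, which describes `pooler_output` as the concatenation of the processed first-token (classification) representations of the text sequence and of the image sequence, i.e. `hidden_size * 2` values. Assuming `hidden_size` is 1024 for this large checkpoint (worth confirming in `model.config`), the sketch below splits such a vector into its text half and its image half. The helper name `split_pooler_output` is hypothetical. It also makes visible that each stored vector is a joint embedding of one caption plus one frame, not an image-only embedding.

```python
from typing import List, Tuple

def split_pooler_output(vec: List[float], hidden_size: int = 1024) -> Tuple[List[float], List[float]]:
    """Split a BridgeTower pooler_output vector into (text_part, image_part).

    Assumes the layout [text CLS features | image CLS features], each of
    length `hidden_size`; 1024 is an assumed value for the large checkpoint.
    """
    if len(vec) != 2 * hidden_size:
        raise ValueError(f"expected {2 * hidden_size} values, got {len(vec)}")
    return vec[:hidden_size], vec[hidden_size:]

# Stand-in for the 2048-element list printed above.
fake_vector = [0.0] * 1024 + [1.0] * 1024
text_part, image_part = split_pooler_output(fake_vector)
print(len(text_part), len(image_part))  # 1024 1024
```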
```python
inputs = processor(images=image, text=prompt, return_tensors="pt")
```
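The last step described at the top, retrieving images with a text query, comes down to ranking stored vectors by their similarity to a query vector. LanceDB performs this search itself; the pure-Python sketch below only illustrates the ranking, using cosine similarity. The helper names `l2_normalize` and `top_k_cosine` and the toy 3-dimensional vectors are hypothetical stand-ins for the 2048-element lists produced above. The scores are only meaningful if the query vector comes from the same embedding space as the stored vectors.

```python
import math
from typing import Dict, List, Tuple

def l2_normalize(vec: List[float]) -> List[float]:
    """Scale a vector to unit length (returns a copy unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]

def top_k_cosine(query: List[float], store: Dict[str, List[float]], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k stored keys (e.g. image paths) most similar to `query` by cosine similarity."""
    q = l2_normalize(query)
    scored = []
    for key, vec in store.items():
        if len(vec) != len(q):
            raise ValueError(f"dimension mismatch for {key}: {len(vec)} != {len(q)}")
        v = l2_normalize(vec)
        # For unit vectors, the dot product equals the cosine similarity.
        scored.append((key, sum(a * b for a, b in zip(q, v))))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]

# Hypothetical toy "database": image path -> embedding.
store = {
    "./frame_0.jpg": [0.9, 0.1, 0.0],
    "./frame_1.jpg": [0.0, 1.0, 0.0],
    "./frame_2.jpg": [0.7, 0.7, 0.0],
}
print(top_k_cosine([1.0, 0.0, 0.0], store, k=2))
```

Normalizing to unit length also means that ranking by Euclidean distance gives the same order as ranking by cosine similarity, since for unit vectors ‖a − b‖² = 2 − 2·cos(a, b).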
More details here: https://stackoverflow.com/questions/791 ... ridgetower