When testing an LLM on the GSM8K dataset, how do I compare the output with the label (ground truth), as in traditional...


Posted by Anonymous » 03 Dec 2024, 17:09

I trained an LLM on GSM8K. Now I want to test it.
Sample data:
{'question': "Every day, Wendi feeds each of her chickens three cups of mixed chicken feed, containing seeds, mealworms and vegetables to help keep them healthy. She gives the chickens their feed in three separate meals. In the morning, she gives her flock of chickens 15 cups of feed. In the afternoon, she gives her chickens another 25 cups of feed. How many cups of feed does she need to give her chickens in the final meal of the day if the size of Wendi's flock is 20 chickens?"

'answer(label)': 'If each chicken eats 3 cups of feed per day, then for 20 chickens they would need 3*20=60 cups of feed per day.\nIf she feeds the flock 15 cups of feed in the morning, and 25 cups in the afternoon, then the final meal would require 60-15-25=20 cups of chicken feed.\n#### 20'
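As the sample shows, the label holds the worked reasoning followed by a final line of the form '#### number'. A minimal sketch of separating the two parts, assuming every label follows this layout (the helper name split_label is hypothetical, not something from the dataset or a library):

```python
def split_label(label: str):
    """Split a GSM8K-style label into (reasoning, final_answer).

    Assumes the final answer follows the last '####' marker.
    Returns (label, None) when no marker is present.
    """
    reasoning, marker, final = label.rpartition("####")
    if not marker:
        return label.strip(), None
    return reasoning.strip(), final.strip()


# The label from the sample above.
label = (
    "If each chicken eats 3 cups of feed per day, then for 20 chickens "
    "they would need 3*20=60 cups of feed per day.\n"
    "If she feeds the flock 15 cups of feed in the morning, and 25 cups "
    "in the afternoon, then the final meal would require 60-15-25=20 cups "
    "of chicken feed.\n#### 20"
)
reasoning, final = split_label(label)
print(final)
```

Only the final part needs to match for accuracy scoring; the reasoning text can differ freely between label and generation.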

Whereas the generation is:
Every day, Wendi feeds each of her chickens three cups of mixed chicken feed, containing seeds, mealworms and vegetables to help keep them healthy. She gives the chickens their feed in three separate meals. In the morning, she gives her flock of chickens 15 cups of feed. In the afternoon, she gives her chickens another 25 cups of feed. How many cups of feed does she need to give her chickens in the final meal of the day if the size of Wendi's flock is 20 chickens? To determine how many cups of feed Wendi needs for the final meal of the day, we start by calculating the total amount of feed required for all the chickens.

First, we calculate the amount of feed given in a single day:
\[
15 \text{ cups (morning)} + 25 \text{ cups (afternoon)} = 40 \text{ cups}
\]

Next, we need to account for the number of chickens:
\[
20 \text{ chickens}
\]

Although the answer is right, how can I find the label in the output?
Or how can I make the output match the label's format?
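One way to locate the answer in free-form output, sketched here under assumptions rather than taken from the post: prefer the number after a '####' marker when the model emits one, and otherwise fall back to the last number in the text. The helper name extract_pred and the regular expression are illustrative choices.

```python
import re

# A number with optional sign, thousands separators and decimal part.
NUMBER = r"-?\d[\d,]*(?:\.\d+)?"


def extract_pred(text: str):
    """Return the predicted number as a float, or None if no number is found.

    Heuristic (an assumption, not a standard): use the number after '####'
    if present, else the last number that appears anywhere in the text.
    """
    marked = re.search(r"####\s*(" + NUMBER + ")", text)
    if marked:
        raw = marked.group(1)
    else:
        found = re.findall(NUMBER, text)
        if not found:
            return None
        raw = found[-1]
    return float(raw.replace(",", ""))


print(extract_pred("... the final meal would require 60-15-25=20 cups."))
print(extract_pred("Some reasoning.\n#### 1,234"))
```

Note that on the truncated generation above this fallback returns 20 only because the flock size happens to be the last number printed, so the heuristic is only meaningful when the model is allowed to finish its answer.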
Code:
import re

def extract_num(text):
    """Return the integer that follows '####' in text, or None if there is none."""
    # Optional sign and thousands separators are allowed, e.g. '#### 1,234'
    match = re.search(r'####\s*(-?\d[\d,]*)', text)
    if match is None:
        return None
    try:
        return int(match.group(1).replace(",", ""))
    except ValueError:
        print(f"'{match.group(1)}' can't be converted")
        return None

total = 0  # renamed from `all`, which shadows the built-in
correct = 0
samples = test_ds.select(range(10))
for example in samples:
    input_text = (
        f"Question: {example['question']}\n"
        "Please provide the answer in the format '#### + number' shortly without repeating the question."
    )
    # Tokenize the full prompt; passing only example['question'] drops the format instruction
    inputs = tokenizer(input_text, return_tensors="pt", padding=True, truncation=True, max_length=512)
    inputs = {key: value.to(device) for key, value in inputs.items()}
    # max_new_tokens limits only the generated part (256 is an arbitrary choice)
    outputs = model.generate(**inputs, max_new_tokens=256)
    # Decode only the new tokens; assumes a decoder-only model that echoes the prompt
    prompt_len = inputs["input_ids"].shape[1]
    pred_text = tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True)

    gt = extract_num(example["answer"])
    pred = extract_num(pred_text)
    # A failed extraction counts as wrong instead of being mapped to 0
    correct += int(pred is not None and gt == pred)
    total += 1
    print(f"{total} Acc: {correct/total:.2f}")

print("Acc:", correct/total)
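To nudge the output toward the label's format, one option is a few-shot prompt that shows the model one or more solved examples ending in '#### number' before the new question. The sketch below is only an illustration: build_prompt and its shots argument are hypothetical names, and drawing the shots from the training split is an assumption.

```python
def build_prompt(question, shots):
    """Build a few-shot prompt from solved examples plus the new question.

    `shots` is a list of dicts with 'question' and 'answer' keys, where each
    answer ends with '#### <number>' as in the GSM8K labels.
    """
    parts = [
        f"Question: {shot['question']}\nAnswer: {shot['answer']}"
        for shot in shots
    ]
    # Leave the last answer open so the model continues in the same format.
    parts.append(f"Question: {question}\nAnswer:")
    return "\n\n".join(parts)


shots = [
    {
        "question": "Tom has 3 boxes with 4 pens each. How many pens does he have?",
        "answer": "Tom has 3*4=12 pens.\n#### 12",
    }
]
print(build_prompt("A farmer has 20 chickens and sells 5. How many are left?", shots))
```

If the model was fine-tuned on GSM8K with a particular prompt layout, reusing that same layout at test time is likely to matter more than the exact wording of the instruction.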

More details here: https://stackoverflow.com/questions/79247924/when-testing-a-llm-in-gsm8k-dataset-how-to-compare-the-out-and-labelground-tru
