Code:
from datasets import load_dataset
ds = load_dataset("UCLNLP/adversarial_qa", "adversarialQA")
Code:
d0 = ds['train'][0]
d0
{'id': '7ba1e8f4261d3170fcf42e84a81dd749116fae95',
'title': 'Brain',
'context': 'Another approach to brain function is to examine the consequences of damage to specific brain areas. Even though it is protected by the skull and meninges, surrounded by cerebrospinal fluid, and isolated from the bloodstream by the blood–brain barrier, the delicate nature of the brain makes it vulnerable to numerous diseases and several types of damage. In humans, the effects of strokes and other types of brain damage have been a key source of information about brain function. Because there is no ability to experimentally control the nature of the damage, however, this information is often difficult to interpret. In animal studies, most commonly involving rats, it is possible to use electrodes or locally injected chemicals to produce precise patterns of damage and then examine the consequences for behavior.',
'question': 'What sare the benifts of the blood brain barrir?',
'answers': {'text': ['isolated from the bloodstream'], 'answer_start': [195]},
'metadata': {'split': 'train', 'model_in_the_loop': 'Combined'}}
Code:
from transformers import BertTokenizerFast
bert_tokenizer = BertTokenizerFast.from_pretrained('bert-large-uncased', return_token_type_ids=True)
bert_tokenizer.decode(bert_tokenizer.encode(d0['question'], d0['context'])[56:61])
'isolated from the bloodstream'
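The slice 56:61 above was picked by hand. One possible way to derive such token positions from the character-level `answer_start` is to compare the answer's character span against per-token character offsets. The sketch below is only an illustration under assumptions: `char_span_to_token_span` and `toy_offsets` are hypothetical helpers, and a whitespace splitter stands in for the BERT tokenizer so the example runs without downloading a model. With a fast tokenizer, the offsets and sequence ids would presumably come from `return_offsets_mapping=True` and `sequence_ids()`, as noted in the comments.

```python
import re
from typing import List, Optional, Tuple


def char_span_to_token_span(
    offsets: List[Tuple[int, int]],
    sequence_ids: List[Optional[int]],
    start_char: int,
    end_char: int,
    context_id: int = 1,
) -> Optional[Tuple[int, int]]:
    """Return (first_token, last_token + 1) for the context tokens that
    overlap the character range [start_char, end_char), or None."""
    covered = [
        i
        for i, ((tok_start, tok_end), seq) in enumerate(zip(offsets, sequence_ids))
        # Keep only context tokens; special tokens have sequence id None.
        if seq == context_id and tok_start < end_char and tok_end > start_char
    ]
    if not covered:
        return None
    return covered[0], covered[-1] + 1


def toy_offsets(text: str) -> List[Tuple[int, int]]:
    """Hypothetical stand-in tokenizer: one token per whitespace-separated word."""
    return [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]


# Toy data laid out like a BERT pair: [CLS] question [SEP] context [SEP].
# With a fast tokenizer, the real inputs would come from something like:
#   enc = bert_tokenizer(question, context, return_offsets_mapping=True)
#   offsets, sequence_ids = enc["offset_mapping"], enc.sequence_ids()
question = "what isolates it"
context = "the brain is isolated from the bloodstream"
answer = "isolated from the bloodstream"

q_off, c_off = toy_offsets(question), toy_offsets(context)
offsets = [(0, 0)] + q_off + [(0, 0)] + c_off + [(0, 0)]
sequence_ids = [None] + [0] * len(q_off) + [None] + [1] * len(c_off) + [None]

start_char = context.index(answer)      # plays the role of answers['answer_start'][0]
end_char = start_char + len(answer)     # exclusive end of the answer text

span = char_span_to_token_span(offsets, sequence_ids, start_char, end_char)
tokens = ["[CLS]"] + question.split() + ["[SEP]"] + context.split() + ["[SEP]"]
print(span, tokens[span[0]:span[1]])
```

Filtering on the sequence id matters because question tokens and context tokens both have offsets that start at 0, each relative to its own string, so a character range in the context could otherwise match tokens in the question.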
This comes from a LinkedIn Learning course. The instructor converted the character-level answer positions to token indices and saved the result to a CSV file, but shared neither the file nor the code that produced it. This is the expected result:

More details here: https://stackoverflow.com/questions/791 ... en-indices