Issues with NLTK's ne_chunk - Цифровое Кемерово


Post by Anonymous » 10 Jan 2025, 15:19

I have been trying to use NLTK's named-entity chunker and have tried several approaches, but I keep getting this error:

[code]LookupError                               Traceback (most recent call last)
...
8 pos_sentences = [nltk.pos_tag(sent) for sent in token_sentences]
10 # Create the named entity chunks: chunked_sentences
---> 11 chunked_sentences = nltk.ne_chunk(pos_sentences, binary=True)
13 # Test for stems of the tree with 'NE' tags
14 for sent in chunked_sentences:

178 """
179 Use NLTK's currently recommended named entity chunker to
180 chunk the given list of tagged tokens.
(...)
187
188 """
189 if binary:
--> 190     chunker = ne_chunker(fmt="binary")
191 else:
192     chunker = ne_chunker()

170 def ne_chunker(fmt="multiclass"):
171     """
172     Load NLTK's currently recommended named entity chunker.
173     """
--> 174     return Maxent_NE_Chunker(fmt)
...
- 'C:\\nltk_data'
- 'D:\\nltk_data'
- 'E:\\nltk_data'
[/code]

I have tried both my own code and the example below, which I found in a blog post:

[code]import nltk

# Download the tokenizer, POS tagger, NE chunker and word-list data used below
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
nltk.download('maxent_ne_chunker')
nltk.download('words')

article = "The taxi-hailing company Uber brings into very sharp focus the question of whether corporations can be said to have a moral character. If any human being were..."
print(article)

# Tokenize the article into sentences: sentences
sentences = nltk.sent_tokenize(article)

# Tokenize each sentence into words: token_sentences
token_sentences = [nltk.word_tokenize(sent) for sent in sentences]

# Tag each tokenized sentence into parts of speech: pos_sentences
pos_sentences = [nltk.pos_tag(sent) for sent in token_sentences]

# Create the named entity chunks: chunked_sentences
# Note: ne_chunk() expects a single tagged sentence; for a list of tagged
# sentences such as pos_sentences, nltk.ne_chunk_sents() is the matching call.
chunked_sentences = nltk.ne_chunk(pos_sentences, binary=True)

# Test for stems of the tree with 'NE' tags
for sent in chunked_sentences:
    for chunk in sent:
        if hasattr(chunk, "label") and chunk.label() == "NE":
            print(chunk)
[/code]
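The filtering loop in the example relies on duck typing: in NLTK's chunker output, named-entity chunks are Tree objects with a label() method, while ordinary tokens are plain (word, tag) tuples, so hasattr(chunk, "label") separates the two. The sketch below reproduces that check without NLTK. FakeTree is a hypothetical stand-in for nltk.Tree (not part of NLTK), and the sample data is hand-written, not real tagger output; it only mimics the per-sentence shape that ne_chunk_sents(..., binary=True) is expected to yield.

```python
from typing import List, Tuple, Union


class FakeTree(list):
    """Hypothetical stand-in for nltk.Tree: a list of children plus a label."""

    def __init__(self, label: str, children: List[Tuple[str, str]]):
        super().__init__(children)
        self._label = label

    def label(self) -> str:
        return self._label


Token = Tuple[str, str]
Chunk = Union[Token, FakeTree]

# Hand-written sample: one list of chunks per sentence, entities labelled "NE"
chunked_sentences: List[List[Chunk]] = [
    [("The", "DT"), ("taxi-hailing", "JJ"), ("company", "NN"),
     FakeTree("NE", [("Uber", "NNP")]), ("brings", "VBZ")],
]


def named_entities(sentences: List[List[Chunk]]) -> List[str]:
    """Collect the words of every chunk whose label is "NE"."""
    found = []
    for sent in sentences:
        for chunk in sent:
            # Plain (word, tag) tuples have no label(), so they are skipped
            if hasattr(chunk, "label") and chunk.label() == "NE":
                found.append(" ".join(word for word, _tag in chunk))
    return found


print(named_entities(chunked_sentences))  # ['Uber']
```

Because the loop iterates once per sentence and then once per chunk, it assumes a list of chunked sentences; a single Tree from ne_chunk() would not have that nesting.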

I have tried both "from nltk.chunk import ne_chunk" and "from nltk import ne_chunk", and I have also tried ne_chunk_sents() instead of ne_chunk(). I have tried reproducing several other code examples, but whenever I use NLTK's ne_chunk I still get the same error.
My question: what could be causing this?
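A LookupError here means NLTK could not find a data package in any directory on nltk.data.path (the 'C:\\nltk_data'-style list at the end of the traceback). One assumption worth checking: NLTK 3.9 and later load the packages punkt_tab, averaged_perceptron_tagger_eng and maxent_ne_chunker_tab rather than the older punkt, averaged_perceptron_tagger and maxent_ne_chunker that the example downloads, and the ne_chunker()/Maxent_NE_Chunker frames in the traceback appear to come from that newer code. The sketch below lists which packages are missing; the resource paths are assumptions for NLTK 3.9+, and the lookup function is passed in so the logic runs without NLTK installed.

```python
from typing import Callable, Dict, List

# Assumed NLTK 3.9+ resources: download id -> path inside an nltk_data directory.
# Older releases used punkt, averaged_perceptron_tagger and maxent_ne_chunker.
REQUIRED_RESOURCES: Dict[str, str] = {
    "punkt_tab": "tokenizers/punkt_tab",
    "averaged_perceptron_tagger_eng": "taggers/averaged_perceptron_tagger_eng",
    "maxent_ne_chunker_tab": "chunkers/maxent_ne_chunker_tab",
    "words": "corpora/words",
}


def missing_resources(find: Callable[[str], object],
                      required: Dict[str, str] = REQUIRED_RESOURCES) -> List[str]:
    """Return the download ids whose path `find` cannot locate.

    `find` should behave like nltk.data.find: return a value when the
    resource exists and raise LookupError when it does not.
    """
    missing = []
    for package_id, path in required.items():
        try:
            find(path)
        except LookupError:
            missing.append(package_id)
    return missing
```

With NLTK installed, missing_resources(nltk.data.find) would return the ids to pass to nltk.download(); an empty list would suggest the problem lies elsewhere, for example in the installed NLTK version.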

More details here: https://stackoverflow.com/questions/79345691/issues-with-nltks-ne-chunk