I was trying to run some nltk functions on the UCI spam message dataset, but ran into a problem: word_tokenize does not work even after downloading the dependencies.
[code]
import nltk
nltk.download('punkt')
from nltk.tokenize import word_tokenize

df['text'].apply(lambda x: len(nltk.word_tokenize(x)))
[/code]
This raises the following error:
[code]
{ "name": "LookupError", "message": "
**********************************************************************
Resource punkt_tab not found.
Please use the NLTK Downloader to obtain the resource:
>>> import nltk
>>> nltk.download('punkt_tab')
For more information see: https://www.nltk.org/data.html
File ~\\AppData\\Local\\Packages\\PythonSoftwareFoundation.Python.3.12_qbz5n2kfra8p0\\LocalCache\\local-packages\\Python312\\site-packages\\pandas\\core\\apply.py:1427, in SeriesApply.apply(self)
   1424 return self.apply_compat()
   1426 # self.func is Callable
-> 1427 return self.apply_standard()
File ~\\AppData\\Local\\Packages\\PythonSoftwareFoundation.Python.3.12_qbz5n2kfra8p0\\LocalCache\\local-packages\\Python312\\site-packages\\pandas\\core\\apply.py:1507, in SeriesApply.apply_standard(self)
   1501 # row-wise access
   1502 # apply doesn't have a `na_action` keyword and for backward compat reasons
   1503 # we need to give `na_action=\"ignore\"` for categorical data.
   1504 # TODO: remove the `na_action=\"ignore\"` when that default has been changed in
   1505 # Categorical (GH51645).
   1506 action = \"ignore\" if isinstance(obj.dtype, CategoricalDtype) else None
-> 1507 mapped = obj._map_values(
   1508     mapper=curried, na_action=action, convert=self.convert_dtype
   1509 )
   1511 if len(mapped) and isinstance(mapped[0], ABCSeries):
   1512     # GH#43986 Need to do list(mapped) in order to get treated as nested
   1513     # See also GH#25959 regarding EA support
   1514     return obj._constructor_expanddim(list(mapped), index=obj.index)
File lib.pyx:2972, in pandas._libs.lib.map_infer()
Cell In[1024], line 3, in <lambda>(x)
      1 #finding no. of words
----> 3 df['text'].apply(lambda x: len(nltk.word_tokenize(x)))
File ~\\AppData\\Local\\Packages\\PythonSoftwareFoundation.Python.3.12_qbz5n2kfra8p0\\LocalCache\\local-packages\\Python312\\site-packages\\nltk\\tokenize\\__init__.py:129, in word_tokenize(text, language, preserve_line)
    114 def word_tokenize(text, language=\"english\", preserve_line=False):
    115     \"\"\"
    116     Return a tokenized copy of *text*,
    117     using NLTK's recommended word tokenizer
   (...)
    127     :type preserve_line: bool
    128     \"\"\"
--> 129     sentences = [text] if preserve_line else sent_tokenize(text, language)
    130     return [
    131         token for sent in sentences for token in _treebank_word_tokenizer.tokenize(sent)
    132     ]
File ~\\AppData\\Local\\Packages\\PythonSoftwareFoundation.Python.3.12_qbz5n2kfra8p0\\LocalCache\\local-packages\\Python312\\site-packages\\nltk\\tokenize\\__init__.py:106, in sent_tokenize(text, language)
     96 def sent_tokenize(text, language=\"english\"):
     97     \"\"\"
     98     Return a sentence-tokenized copy of *text*,
     99     using NLTK's recommended sentence tokenizer
   (...)
    104     :param language: the model name in the Punkt corpus
    105     \"\"\"
--> 106     tokenizer = PunktTokenizer(language)
    107     return tokenizer.tokenize(text)
LookupError:
**********************************************************************
Resource punkt_tab not found.
Please use the NLTK Downloader to obtain the resource:
>>> import nltk
>>> nltk.download('punkt_tab')
For more information see: https://www.nltk.org/data.html
Attempted to load tokenizers/punkt_tab/english/
Searched in:
- 'C:\\\\Users\\\\user/nltk_data'
- 'C:\\\\Program Files\\\\WindowsApps\\\\PythonSoftwareFoundation.Python.3.12_3.12.1520.0_x64__qbz5n2kfra8p0\\\\nltk_data'
- 'C:\\\\Program Files\\\\WindowsApps\\\\PythonSoftwareFoundation.Python.3.12_3.12.1520.0_x64__qbz5n2kfra8p0\\\\share\\\\nltk_data'
- 'C:\\\\Program Files\\\\WindowsApps\\\\PythonSoftwareFoundation.Python.3.12_3.12.1520.0_x64__qbz5n2kfra8p0\\\\lib\\\\nltk_data'
- 'C:\\\\Users\\\\user\\\\AppData\\\\Roaming\\\\nltk_data'
- 'C:\\\\nltk_data'
- 'D:\\\\nltk_data'
- 'E:\\\\nltk_data'
**********************************************************************
" }
[/code]
I tried reinstalling nltk and downloading several other dependency files, but nothing helped. What am I doing wrong?
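Going by the traceback itself, the tokenizer is looking for `tokenizers/punkt_tab/english/`, not the `punkt` package that the snippet downloads, so the download call and the lookup do not match. Below is a minimal sketch of one thing to try, assuming the installed NLTK release is one that loads `punkt_tab` (as the error message indicates): check for the resource and download it only if it is missing. The helper name `ensure_nltk_resource` and its `find`/`download` parameters are hypothetical and exist only for illustration; `nltk.data.find` and `nltk.download` are the NLTK calls it wraps by default.

```python
def ensure_nltk_resource(resource_path, package, find=None, download=None):
    """Download `package` only if `resource_path` cannot be located.

    Returns True if a download was attempted, False if the resource
    was already present. `find` and `download` are hypothetical hooks
    that default to nltk.data.find and nltk.download.
    """
    if find is None or download is None:
        # Imported lazily so the helper can be exercised without NLTK installed.
        import nltk
        find = find or nltk.data.find
        download = download or nltk.download
    try:
        # Raises LookupError when the resource is in none of the search paths.
        find(resource_path)
    except LookupError:
        download(package)
        return True
    return False
```

Called before the `apply` line as `ensure_nltk_resource("tokenizers/punkt_tab/english/", "punkt_tab")`, this amounts to the `nltk.download('punkt_tab')` call that the error message suggests, guarded so it does not download again on every run. Whether it resolves the problem on this particular machine is untested.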