Code:
import spacy
from spacy.attrs import ORTH
nlp = spacy.load("en_core_web_sm")
text = "according to reports the washing machine industry is declining"
special_case_1 = [{ORTH: 'according to'}]
nlp.tokenizer.add_special_case('according to', special_case_1)
special_case_2 = [{ORTH: 'washing machine'}]
nlp.tokenizer.add_special_case('washing machine', special_case_2)
doc = nlp(text)
for token in doc:
    print(token, token.pos_)
Output:
according to PROPN
reports VERB
the DET
washing machine PROPN
industry NOUN
is AUX
declining VERB
Solution
The solution was to use spaCy's AttributeRuler to override the tags of the merged tokens. https://spacy.io/usage/linguistic-featu ... Exceptions
Code:
import spacy
from spacy.attrs import ORTH
nlp = spacy.load("en_core_web_sm")
text = ("according to reports the washing machine is more "
        "popular than air conditioning because of an unknown reason")
compounds = ['according to', 'because of', 'washing machine',
             'air conditioning']
# Register each multi-word expression as a single token
for comp in compounds:
    special_case = [{ORTH: comp}]
    nlp.tokenizer.add_special_case(comp, special_case)
Output for the merged tokens (token, coarse POS, fine-grained tag):
according to PROPN NNP
washing machine PROPN NNP
air conditioning VERB VBG
because of VERB VBZ
Code:
ruler = nlp.get_pipe("attribute_ruler")
prepositions = ['according to', 'because of']
nouns = ['washing machine', 'air conditioning']
# Override tag and POS for the merged prepositions
for prep in prepositions:
    preposition_patterns = [[{"LOWER": prep}]]
    preposition_attrs = {"TAG": "IN", "POS": "ADP"}
    ruler.add(patterns=preposition_patterns,
              attrs=preposition_attrs)
# Override tag and POS for the merged nouns
for noun in nouns:
    noun_patterns = [[{"LOWER": noun}]]
    noun_attrs = {"TAG": "NN", "POS": "NOUN"}
    ruler.add(patterns=noun_patterns, attrs=noun_attrs)
Code:
doc = nlp(text)
for token in doc:
    print(token, token.tag_, token.pos_)
Output:
according to IN ADP
reports VBZ VERB
the DT DET
washing machine NN NOUN
is VBZ AUX
more RBR ADV
popular JJ ADJ
than IN ADP
air conditioning NN NOUN
because of IN ADP
an DT DET
unknown JJ ADJ
reason NN NOUN
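The two steps above (tokenizer special cases, then AttributeRuler overrides) can also be driven from one lookup table. This is only a minimal sketch under stated assumptions: it uses a blank English pipeline via spacy.blank("en") instead of en_core_web_sm so that no trained model is needed, and the `overrides` dictionary is a hypothetical name introduced here for illustration. With a blank pipeline only the overridden tokens get a tag and POS; with a trained pipeline the remaining tokens would be tagged by the model as in the output above.

```python
import spacy
from spacy.attrs import ORTH

# Assumption: a blank English pipeline, so no trained model is required.
nlp = spacy.blank("en")
ruler = nlp.add_pipe("attribute_ruler")

# Hypothetical lookup table: multi-word expression -> attributes to assign.
overrides = {
    "according to": {"TAG": "IN", "POS": "ADP"},
    "because of": {"TAG": "IN", "POS": "ADP"},
    "washing machine": {"TAG": "NN", "POS": "NOUN"},
    "air conditioning": {"TAG": "NN", "POS": "NOUN"},
}

for phrase, attrs in overrides.items():
    # Step 1: keep the phrase together as a single token.
    nlp.tokenizer.add_special_case(phrase, [{ORTH: phrase}])
    # Step 2: override the tag and POS of that single token.
    ruler.add(patterns=[[{"LOWER": phrase}]], attrs=attrs)

doc = nlp("according to reports the washing machine is more popular "
          "than air conditioning because of an unknown reason")

# Collect (tag, POS) for the merged tokens only.
merged = {t.text: (t.tag_, t.pos_) for t in doc if t.text in overrides}
print(merged)
```

Keeping the phrase and its attributes in one place means a new multi-word expression needs only one new dictionary entry, rather than an edit to both the tokenizer list and the ruler list.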
More details here: https://stackoverflow.com/questions/714 ... ecial-case