Here is an example of my setup:
from presidio_analyzer import AnalyzerEngineProvider
from presidio_anonymizer import AnonymizerEngine
FR_TEXT = """Nom complet : Jean Dupont
Préférence sexuelle : Jean s'identifie comme hétérosexuel"""
analyzer_conf_file = "path/to/all-config.yml"
provider = AnalyzerEngineProvider(analyzer_engine_conf_file=analyzer_conf_file)
analyzer = provider.create_engine()
analyzer_results = analyzer.analyze(text=FR_TEXT, language="fr")
anonymizer = AnonymizerEngine()
result = anonymizer.anonymize(text=FR_TEXT, analyzer_results=analyzer_results)
print(result.text)
Configuration file (path/to/all-config.yml):
supported_languages:
  - en
  - fr
  - nl
default_score_threshold: 0
nlp_configuration:
  nlp_engine_name: spacy
  models:
    -
      lang_code: en
      model_name: en_core_web_lg
    -
      lang_code: fr
      model_name: fr_core_news_lg
    -
      lang_code: nl
      model_name: nl_core_news_lg
recognizer_registry:
  global_regex_flags: 26
  recognizers:
    - name: "SexualityFr"
      supported_language: "fr"
      supported_entity: "SEXUALITY"
      deny_list: [hétérosexuel]
      deny_list_score: 1
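For reference, the value 26 in global_regex_flags is a bitmask of flags from Python's re module. The sketch below decodes it using only the standard library and then runs a rough stand-in for deny-list matching on the sample text. The helper find_deny_list_terms is hypothetical and is not Presidio's implementation; it only illustrates the general idea of turning a deny list into one regex, assuming terms should be delimited by non-word characters.

```python
import re

# 26 is the value of global_regex_flags in the config.
flags = re.RegexFlag(26)

# 2 (IGNORECASE) + 8 (MULTILINE) + 16 (DOTALL) = 26
assert flags == re.IGNORECASE | re.MULTILINE | re.DOTALL


def find_deny_list_terms(text, deny_list, regex_flags):
    """Hypothetical helper: match any deny-list term as a whole word.

    This is an approximation for illustration, not Presidio's own code.
    """
    alternation = "|".join(re.escape(term) for term in deny_list)
    pattern = r"(?<!\w)(?:" + alternation + r")(?!\w)"
    return [
        (m.start(), m.end(), m.group())
        for m in re.finditer(pattern, text, regex_flags)
    ]


FR_TEXT = """Nom complet : Jean Dupont
Préférence sexuelle : Jean s'identifie comme hétérosexuel"""

# Each match is (start offset, end offset, matched text).
matches = find_deny_list_terms(FR_TEXT, ["hétérosexuel"], flags)
print(matches)
```

Because IGNORECASE is part of the mask, this sketch would also match "Hétérosexuel" with a capital letter.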
More details here: https://stackoverflow.com/questions/790 ... -correct-y