Anonymous
Azure RAG app: inconsistent query results when using JSON array parsing with Azure Cognitive Search
Anonymous » 03 Jan 2025, 14:23
I am building an Azure Retrieval-Augmented Generation (RAG) application that uses Azure Cognitive Search to process trainee-related data stored in JSON files. Here is the overall workflow:
Data structure: each JSON file represents a plan (for example, free-trainees-project-data.json, premium-trainees-project-data.json). A sample of the structure is shown below:
[
  {
    "overview": "The user John Doe (UUID: 00000000-0000-0000-0000-000000000001) has subscribed to the plan 'Free' starting from 2024-07-08 00:00:00.0. The user isn't associated with any tracks or projects.",
    "user_name": "John Doe",
    "user_uuid": "00000000-0000-0000-0000-000000000001",
    "plan_start_date": "2024-07-08 00:00:00.0",
    "plan_name": "Free",
    "tracks": []
  },
  {
    "overview": "The user Jane Smith (UUID: 00000000-0000-0000-0000-000000000002) has subscribed to the plan 'Free' starting from 2024-02-21 00:00:00.0. The user is associated with the following skill tracks: Track 'Quality Assurance' includes the following projects: [Project 'Sample Project A' (UUID: 00000000-0000-0000-0000-000000000003) has tags [[\"TESTNG\",\"Postman\"]], difficulty level 'intermediate', and is currently 'In Progress'. It is led by Team Lead ABC, started on 2024-06-06 08:14:06.758, and ended on Ongoing. ]",
    "user_name": "Jane Smith",
    "user_uuid": "00000000-0000-0000-0000-000000000002",
    "plan_start_date": "2024-02-21 00:00:00.0",
    "plan_name": "Free",
    "tracks": [
      {
        "track_name": "Quality Assurance",
        "projects": [
          {
            "project_name": "Sample Project A",
            "project_uuid": "00000000-0000-0000-0000-000000000003",
            "project_tags": "[\"TESTNG\",\"Postman\"]",
            "team_lead": "Team Lead ABC",
            "joining_date": "2024-06-06 08:14:06.758",
            "project_difficulty": "intermediate",
            "project_status": "In Progress",
            "updated_by": "Team Lead ABC",
            "exit_date": "Ongoing"
          }
        ]
      }
    ]
  },
  ...
]
Azure workflow:
Upload the JSON files to an Azure Storage container.
Configure Azure Cognitive Search to parse them using the JSON array parsing mode.
Vectorize the overview field using the text-embedding-ada-003 model.
Problem: Queries return unreliable results. For example:
Query: "List the users on the 'Starter' plan with no tracks or projects"
Expected: a list of users matching this condition.
Actual: wrong users or an incomplete list.
Query: "Count the number of users subscribed to the Free plan."
Expected: an accurate count.
Actual: an incorrect count is returned.
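To make the two example queries concrete, here is a minimal Java sketch of the answers they are expected to produce, next to what a top-N retrieval step would see. It is an illustration under assumptions, not a diagnosis: the `Trainee` class, the `trackNames` field, and the third record are hypothetical stand-ins for the JSON above, and `countByPlanInTopN` only imitates a retriever that passes the first N records to the model.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical in-memory view of the trainee records; names mirror the JSON fields. */
final class TraineeQueries {

    /** One user record; trackNames stands in for the nested "tracks" array. */
    static final class Trainee {
        final String userName;
        final String planName;
        final List<String> trackNames;

        Trainee(String userName, String planName, List<String> trackNames) {
            this.userName = userName;
            this.planName = planName;
            this.trackNames = trackNames;
        }
    }

    /** Exact count of users on a plan: scans every record, not a sample. */
    static long countByPlan(List<Trainee> trainees, String planName) {
        return trainees.stream()
                .filter(t -> t.planName.equals(planName))
                .count();
    }

    /** Names of users on a plan that have no tracks (and therefore no projects). */
    static List<String> usersWithoutTracks(List<Trainee> trainees, String planName) {
        return trainees.stream()
                .filter(t -> t.planName.equals(planName))
                .filter(t -> t.trackNames.isEmpty())
                .map(t -> t.userName)
                .collect(Collectors.toList());
    }

    /** Imitates retrieval that hands only the first topN records to the model. */
    static long countByPlanInTopN(List<Trainee> trainees, String planName, int topN) {
        int limit = Math.min(topN, trainees.size());
        return countByPlan(trainees.subList(0, limit), planName);
    }

    public static void main(String[] args) {
        List<Trainee> trainees = List.of(
                new Trainee("John Doe", "Free", List.of()),
                new Trainee("Jane Smith", "Free", List.of("Quality Assurance")),
                new Trainee("Sample User C", "Free", List.of())); // hypothetical record

        System.out.println(countByPlan(trainees, "Free"));          // prints 3
        System.out.println(usersWithoutTracks(trainees, "Free"));   // prints [John Doe, Sample User C]
        System.out.println(countByPlanInTopN(trainees, "Free", 2)); // prints 2
    }
}
```

The last call undercounts because only two of the three records are visible to it. If the retriever behaves similarly, a count computed from the retrieved documents alone could be off in the same way.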
Steps taken:
Verified the JSON structure and the parsing mode.
Used overview as the vectorization target.
Verified the connection between the Azure Search and OpenAI services.
Questions:
Am I structuring the JSON data or the vectorization process incorrectly?
How can I improve query accuracy for use cases like these?
Environment details:
Azure Cognitive Search with OpenAI integration.
Model: text-embedding-ada-003.
My code:
// Streams chat completions, optionally grounded on the Azure Search index.
@Override
public Flux<ChatCompletions> getOpenAIAsyncClientChatStream(
    ModelConfiguration modelConfiguration, List<ChatRequestMessage> messages) {
  AIChatModel aiChatModel = modelConfiguration.getModel();
  OpenAIAsyncClient openAIAsyncClient =
      new OpenAIClientBuilder()
          .credential(new AzureKeyCredential(aiChatModel.getApiKey()))
          .endpoint(aiChatModel.getEndpoint())
          .buildAsyncClient();

  ChatCompletionsOptions chatCompletionsOptions = new ChatCompletionsOptions(messages);
  chatCompletionsOptions.setMaxTokens(aiChatModel.getMaxTokens());
  chatCompletionsOptions.setTemperature(Double.valueOf(aiChatModel.getTemperature()));

  // Attach the search index as a data source only when data ingestion is enabled.
  if (modelConfiguration.getDataIngestion()) {
    AzureSearchChatExtensionParameters searchParameters =
        getAzureSearchChatExtensionParameters(modelConfiguration);
    AzureSearchChatExtensionConfiguration searchChatExtension =
        new AzureSearchChatExtensionConfiguration(searchParameters);
    chatCompletionsOptions.setDataSources(List.of(searchChatExtension));
  }
  return openAIAsyncClient.getChatCompletionsStream(
      aiChatModel.getDeploymentName(), chatCompletionsOptions);
}

// Builds the search settings: API-key auth, top-N documents, hybrid vector + semantic query.
private AzureSearchChatExtensionParameters getAzureSearchChatExtensionParameters(
    ModelConfiguration modelConfiguration) {
  AzureIndex index = modelConfiguration.getIndex();
  OnYourDataApiKeyAuthenticationOptions authenticationOptions =
      new OnYourDataApiKeyAuthenticationOptions(searchApiKey);
  AzureSearchChatExtensionParameters searchParameters =
      new AzureSearchChatExtensionParameters(searchEndpoint, index.getIndexName());
  searchParameters.setAuthentication(authenticationOptions);
  searchParameters.setTopNDocuments(index.getTopNDocuments());
  searchParameters.setSemanticConfiguration(index.getSemanticConfiguration());
  searchParameters.setQueryType(AzureSearchQueryType.VECTOR_SEMANTIC_HYBRID);
  searchParameters.setInScope(index.getInScope());

  // The embedding deployment used to vectorize the query text.
  OnYourDataDeploymentNameVectorizationSource embeddingSource =
      new OnYourDataDeploymentNameVectorizationSource(index.getTextEmbeddingModel());
  searchParameters.setEmbeddingDependency(embeddingSource);
  return searchParameters;
}
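For reference, the two example queries could in principle be expressed as exact OData filters instead of similarity searches. The sketch below only builds the filter strings; the `TraineeFilters` class and its method names are hypothetical. It assumes that `plan_name` and `tracks` are made filterable in the index, and that the service accepts a parameterless `any()` on the `tracks` collection.

```java
/** Sketch: builds OData filter strings for the two example queries. */
final class TraineeFilters {

    /** Wraps a value as an OData string literal, doubling embedded single quotes. */
    static String quote(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    /** Filter matching every user on the given plan. */
    static String onPlan(String planName) {
        return "plan_name eq " + quote(planName);
    }

    /** Filter matching users on the given plan whose tracks collection is empty. */
    static String onPlanWithoutTracks(String planName) {
        return onPlan(planName) + " and not tracks/any()";
    }

    public static void main(String[] args) {
        System.out.println(onPlan("Free"));              // prints plan_name eq 'Free'
        System.out.println(onPlanWithoutTracks("Free")); // prints plan_name eq 'Free' and not tracks/any()
    }
}
```

A filter like this narrows the candidate set deterministically before any ranking happens, which is the property the counting and listing queries depend on.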
My Azure index in JSON format:
{
"name": "all-trainees-json-index-production",
"fields": [
{
"name": "chunk_id",
"type": "Edm.String",
"key": true,
"retrievable": true,
"stored": true,
"searchable": true,
"filterable": false,
"sortable": true,
"facetable": false,
"analyzer": "keyword",
"synonymMaps": []
},
{
"name": "parent_id",
"type": "Edm.String",
"key": false,
"retrievable": true,
"stored": true,
"searchable": false,
"filterable": true,
"sortable": false,
"facetable": false,
"synonymMaps": []
},
{
"name": "chunk",
"type": "Edm.String",
"key": false,
"retrievable": true,
"stored": true,
"searchable": true,
"filterable": false,
"sortable": false,
"facetable": false,
"synonymMaps": []
},
{
"name": "title",
"type": "Edm.String",
"key": false,
"retrievable": true,
"stored": true,
"searchable": true,
"filterable": false,
"sortable": false,
"facetable": false,
"synonymMaps": []
},
{
"name": "text_vector",
"type": "Collection(Edm.Single)",
"key": false,
"retrievable": true,
"stored": true,
"searchable": true,
"filterable": false,
"sortable": false,
"facetable": false,
"synonymMaps": [],
"dimensions": 3072,
"vectorSearchProfile": "all-trainees-json-index-production-azureOpenAi-text-profile"
},
{
"name": "user_name",
"type": "Edm.String",
"key": false,
"retrievable": true,
"stored": true,
"searchable": true,
"filterable": false,
"sortable": false,
"facetable": false,
"synonymMaps": []
},
{
"name": "user_uuid",
"type": "Edm.String",
"key": false,
"retrievable": true,
"stored": true,
"searchable": true,
"filterable": false,
"sortable": false,
"facetable": false,
"synonymMaps": []
},
{
"name": "plan_start_date",
"type": "Edm.DateTimeOffset",
"key": false,
"retrievable": true,
"stored": true,
"searchable": false,
"filterable": false,
"sortable": false,
"facetable": false,
"synonymMaps": []
},
{
"name": "plan_name",
"type": "Edm.String",
"key": false,
"retrievable": true,
"stored": true,
"searchable": true,
"filterable": false,
"sortable": false,
"facetable": false,
"synonymMaps": []
},
{
"name": "tracks",
"type": "Collection(Edm.ComplexType)",
"fields": [
{
"name": "track_name",
"type": "Edm.String",
"key": false,
"retrievable": true,
"stored": true,
"searchable": true,
"filterable": false,
"sortable": false,
"facetable": false,
"synonymMaps": []
},
{
"name": "projects",
"type": "Collection(Edm.ComplexType)",
"fields": [
{
"name": "project_name",
"type": "Edm.String",
"key": false,
"retrievable": true,
"stored": true,
"searchable": true,
"filterable": false,
"sortable": false,
"facetable": false,
"synonymMaps": []
},
{
"name": "project_uuid",
"type": "Edm.String",
"key": false,
"retrievable": true,
"stored": true,
"searchable": true,
"filterable": false,
"sortable": false,
"facetable": false,
"synonymMaps": []
},
{
"name": "project_tags",
"type": "Edm.String",
"key": false,
"retrievable": true,
"stored": true,
"searchable": true,
"filterable": false,
"sortable": false,
"facetable": false,
"synonymMaps": []
},
{
"name": "team_lead",
"type": "Edm.String",
"key": false,
"retrievable": true,
"stored": true,
"searchable": true,
"filterable": false,
"sortable": false,
"facetable": false,
"synonymMaps": []
},
{
"name": "joining_date",
"type": "Edm.String",
"key": false,
"retrievable": true,
"stored": true,
"searchable": true,
"filterable": false,
"sortable": false,
"facetable": false,
"synonymMaps": []
},
{
"name": "project_difficulty",
"type": "Edm.String",
"key": false,
"retrievable": true,
"stored": true,
"searchable": true,
"filterable": false,
"sortable": false,
"facetable": false,
"synonymMaps": []
},
{
"name": "project_status",
"type": "Edm.String",
"key": false,
"retrievable": true,
"stored": true,
"searchable": true,
"filterable": false,
"sortable": false,
"facetable": false,
"synonymMaps": []
},
{
"name": "updated_by",
"type": "Edm.String",
"key": false,
"retrievable": true,
"stored": true,
"searchable": true,
"filterable": false,
"sortable": false,
"facetable": false,
"synonymMaps": []
},
{
"name": "exit_date",
"type": "Edm.String",
"key": false,
"retrievable": true,
"stored": true,
"searchable": true,
"filterable": false,
"sortable": false,
"facetable": false,
"synonymMaps": []
}
]
}
]
},
{
"name": "AzureSearch_DocumentKey",
"type": "Edm.String",
"key": false,
"retrievable": false,
"stored": true,
"searchable": true,
"filterable": false,
"sortable": false,
"facetable": false,
"synonymMaps": []
},
{
"name": "metadata_storage_content_type",
"type": "Edm.String",
"key": false,
"retrievable": false,
"stored": true,
"searchable": true,
"filterable": false,
"sortable": false,
"facetable": false,
"synonymMaps": []
},
{
"name": "metadata_storage_size",
"type": "Edm.Int64",
"key": false,
"retrievable": false,
"stored": true,
"searchable": false,
"filterable": false,
"sortable": false,
"facetable": false,
"synonymMaps": []
},
{
"name": "metadata_storage_last_modified",
"type": "Edm.DateTimeOffset",
"key": false,
"retrievable": false,
"stored": true,
"searchable": false,
"filterable": false,
"sortable": false,
"facetable": false,
"synonymMaps": []
},
{
"name": "metadata_storage_content_md5",
"type": "Edm.String",
"key": false,
"retrievable": false,
"stored": true,
"searchable": true,
"filterable": false,
"sortable": false,
"facetable": false,
"synonymMaps": []
},
{
"name": "metadata_storage_name",
"type": "Edm.String",
"key": false,
"retrievable": false,
"stored": true,
"searchable": true,
"filterable": false,
"sortable": false,
"facetable": false,
"synonymMaps": []
},
{
"name": "metadata_storage_path",
"type": "Edm.String",
"key": false,
"retrievable": false,
"stored": true,
"searchable": true,
"filterable": false,
"sortable": false,
"facetable": false,
"synonymMaps": []
},
{
"name": "metadata_storage_file_extension",
"type": "Edm.String",
"key": false,
"retrievable": false,
"stored": true,
"searchable": true,
"filterable": false,
"sortable": false,
"facetable": false,
"synonymMaps": []
}
],
"scoringProfiles": [],
"suggesters": [],
"analyzers": [],
"tokenizers": [],
"tokenFilters": [],
"charFilters": [],
"normalizers": [],
"similarity": {
"@odata.type": "#Microsoft.Azure.Search.BM25Similarity"
},
"semantic": {
"defaultConfiguration": "all-trainees-json-index-production-semantic-configuration",
"configurations": [
{
"name": "all-trainees-json-index-production-semantic-configuration",
"prioritizedFields": {
"titleField": {
"fieldName": "title"
},
"prioritizedContentFields": [
{
"fieldName": "chunk"
}
],
"prioritizedKeywordsFields": []
}
}
]
},
"vectorSearch": {
"algorithms": [
{
"name": "all-trainees-json-index-production-algorithm",
"kind": "hnsw",
"hnswParameters": {
"m": 4,
"efConstruction": 400,
"efSearch": 500,
"metric": "cosine"
}
}
],
"profiles": [
{
"name": "all-trainees-json-index-production-azureOpenAi-text-profile",
"algorithm": "all-trainees-json-index-production-algorithm",
"vectorizer": "all-trainees-json-index-production-azureOpenAi-text-vectorizer"
}
],
"vectorizers": [
{
"name": "all-trainees-json-index-production-azureOpenAi-text-vectorizer",
"kind": "azureOpenAI",
"azureOpenAIParameters": {
"resourceUri": "https://custom-test-sample-openai.openai.azure.com",
"deploymentId": "text-embedding-3-large",
"apiKey": "",
"modelName": "text-embedding-3-large"
}
}
],
"compressions": []
},
"@odata.etag": "\"0x8DD28A373EA1D3C\""
}
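The post names text-embedding-ada-003 as the model, while the index above declares a text-embedding-3-large vectorizer with 3072 dimensions. A small sanity check like the following sketch can confirm that the embedding model used at query time produces vectors of the size the index expects. The `EmbeddingDimensions` class is hypothetical, and the table lists only the default output sizes of three known OpenAI embedding models.

```java
import java.util.Map;

/** Sketch: checks an embedding model's default vector size against the index. */
final class EmbeddingDimensions {

    /** Default output dimensions of some OpenAI embedding models. */
    private static final Map<String, Integer> DEFAULT_DIMENSIONS = Map.of(
            "text-embedding-ada-002", 1536,
            "text-embedding-3-small", 1536,
            "text-embedding-3-large", 3072);

    /** Returns true only when the model is known and its size equals the index's. */
    static boolean matchesIndex(String modelName, int indexDimensions) {
        Integer modelDimensions = DEFAULT_DIMENSIONS.get(modelName);
        return modelDimensions != null && modelDimensions == indexDimensions;
    }

    public static void main(String[] args) {
        System.out.println(matchesIndex("text-embedding-3-large", 3072)); // prints true
        System.out.println(matchesIndex("text-embedding-ada-002", 3072)); // prints false
        System.out.println(matchesIndex("text-embedding-ada-003", 3072)); // prints false (not in the table)
    }
}
```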
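Importing and vectorizing the data
I am using JSON array as the parsing mode (screenshot: https://i.sstatic.net/QUhyA8nZ.png)
I am using the overview column for vectorization (screenshot: https://i.sstatic.net/pzXthIfg.png)
What my index table looks like (screenshot: https://i.sstatic.net/MBPBDAZp.png)
Any ideas and suggestions are appreciated!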
More details here:
https://stackoverflow.com/questions/79323974/azure-rag-app-inconsistent-query-results-using-json-array-parsing-with-azure-co