Handling audio blobs sent through MediaRecorder in JS with a Python server script for an Azure service


Posted by Anonymous » 25 Sep 2024, 14:54

I have two client implementations that capture audio and stream it to a server over a WebSocket. One uses ScriptProcessor and the other uses the MediaRecorder API in JavaScript. The server's job is to take these audio chunks and forward them in real time to the Azure Speech to Text API for transcription and diarization.
The problem I am facing is that the ScriptProcessor client works fine and we get a perfect transcription, but ScriptProcessor seems to take a heavy toll on the machine's CPU. So we decided to move on and tried MediaRecorder, but with it the transcription is always wrong or does not happen at all.
I have provided both client snippets and minimal server code to reproduce the issue. The only difference I noticed between the two clients is that ScriptProcessor chunks the audio by size in bytes, whereas MediaRecorder chunks it by time in milliseconds.
Any help would be appreciated.
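To put the two chunking schemes on the same footing, here is a small sketch of the arithmetic, assuming raw, uncompressed PCM. The function names and the example rates are illustrative assumptions and do not appear in the original code. The conversion only holds for raw PCM: MediaRecorder output is encoded, so its byte count does not map linearly to duration.

```python
def pcm_chunk_duration_ms(num_bytes: int, sample_rate: int,
                          bytes_per_sample: int = 2, channels: int = 1) -> float:
    """Duration in milliseconds of a raw PCM chunk of num_bytes bytes."""
    frame_size = bytes_per_sample * channels      # bytes per sample frame
    frames = num_bytes / frame_size
    return 1000.0 * frames / sample_rate


def pcm_bytes_for_duration(duration_ms: float, sample_rate: int,
                           bytes_per_sample: int = 2, channels: int = 1) -> int:
    """Number of raw PCM bytes needed to hold duration_ms of audio."""
    frames = int(sample_rate * duration_ms / 1000.0)
    return frames * bytes_per_sample * channels


# The server slices its buffer into 4096-byte pieces; for 16-bit mono at 16 kHz:
print(pcm_chunk_duration_ms(4096, 16000))   # 128.0
# A 100 ms slice of the same format would be:
print(pcm_bytes_for_duration(100, 16000))   # 3200
```

So for raw PCM a fixed byte size and a fixed time slice are interchangeable once the sample format is known; that equivalence is what breaks down for encoded audio.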
Working client code with ScriptProcessor

Code:

<!DOCTYPE html>
<html>
<head>
    <title>Audio Streaming Client</title>
</head>
<body>
    <h1>Audio Streaming Client</h1>
    <button id="startButton">Start Streaming</button>
    <button id="stopButton">Stop Streaming</button>

    <script>
        let audioContext;
        let mediaStream;
        let source;
        let processor;
        let socket;

        const startButton = document.getElementById('startButton');
        const stopButton = document.getElementById('stopButton');

        startButton.addEventListener('click', async () => {
            startButton.disabled = true;
            stopButton.disabled = false;

            // Initialize WebSocket
            socket = new WebSocket('ws://localhost:8000');

            socket.onopen = async () => {
                // Create an AudioContext (no sample rate is given, so the browser default is used)
                audioContext = new (window.AudioContext || window.webkitAudioContext)();

                // Get access to the microphone
                mediaStream = await navigator.mediaDevices.getUserMedia({ audio: true });

                // Create a MediaStreamSource from the microphone input
                source = audioContext.createMediaStreamSource(mediaStream);

                // Create a ScriptProcessorNode with a buffer size of 1024, one input and one output channel
                processor = audioContext.createScriptProcessor(1024, 1, 1);

                // Connect the microphone source to the processor node
                source.connect(processor);

                // Handle audio processing and send the data through WebSocket
                processor.onaudioprocess = function (e) {
                    // const inputData = e.inputBuffer.getChannelData(0);
                    // const outputData = new Int16Array(inputData.length);

                    // // Convert Float32Array to Int16Array
                    // for (let i = 0; i < inputData.length; i++) {
                    //     outputData[i] = Math.min(1, Math.max(-1, inputData[i])) * 0x7FFF;
                    // }

                    // The Float32 samples are sent as-is; the Int16 conversion above is disabled
                    if (socket.readyState === WebSocket.OPEN) {
                        socket.send(e.inputBuffer.getChannelData(0));
                    }
                };

                // Connect the processor node to the destination (optional, for monitoring)
                processor.connect(audioContext.destination);
            };

            socket.onerror = function (error) {
                console.error('WebSocket Error: ', error);
            };
        });

        stopButton.addEventListener('click', () => {
            stopButton.disabled = true;
            startButton.disabled = false;

            if (processor) {
                processor.disconnect();
            }
            if (source) {
                source.disconnect();
            }
            if (audioContext) {
                audioContext.close();
            }
            if (socket) {
                socket.close();
            }

            if (mediaStream) {
                mediaStream.getTracks().forEach(track => track.stop());
            }
        });
    </script>
</body>
</html>
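
The commented-out conversion in `onaudioprocess` points at the sample format: `getChannelData(0)` yields 32-bit float samples in the range [-1, 1]. A minimal server-side sketch of the same Float32-to-Int16 conversion, using only the standard library, might look like the following. `float32_to_int16_pcm` is a hypothetical helper name, and the code assumes the client sends little-endian floats, which is an assumption about the client platform rather than something the original code guarantees.

```python
import struct


def float32_to_int16_pcm(payload: bytes) -> bytes:
    """Convert little-endian float32 samples in [-1.0, 1.0] to int16 PCM bytes."""
    usable = len(payload) - (len(payload) % 4)   # ignore a trailing partial sample
    count = usable // 4
    samples = struct.unpack(f"<{count}f", payload[:usable])
    # Clamp to [-1, 1] and scale to the int16 range, mirroring the commented-out JS.
    scaled = [int(max(-1.0, min(1.0, s)) * 0x7FFF) for s in samples]
    return struct.pack(f"<{count}h", *scaled)


demo = struct.pack("<4f", 0.0, 0.5, -1.0, 2.0)   # the last value is out of range
print(struct.unpack("<4h", float32_to_int16_pcm(demo)))   # (0, 16383, -32767, 32767)
```

Resampling from the AudioContext rate down to 16 kHz is a separate step and is not shown here.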

Non-working client code with MediaRecorder

Code:

const connectButton = document.getElementById("connectButton");
const startButton = document.getElementById("startButton");
const stopButton = document.getElementById("stopButton");
let mediaRecorder;
let socket;

connectButton.addEventListener("click", () => {
    socket = new WebSocket("ws://localhost:8000");

    socket.addEventListener("open", () => {
        console.log("Connected to server");
        connectButton.disabled = true;
        startButton.disabled = false;
    });

    socket.addEventListener("close", () => {
        console.log("Disconnected from server");
        connectButton.disabled = false;
        startButton.disabled = true;
        stopButton.disabled = true;
    });
});

startButton.addEventListener("click", async () => {
    const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
    // No mimeType is given, so the browser picks its default encoded format
    mediaRecorder = new MediaRecorder(stream);

    mediaRecorder.ondataavailable = (event) => {
        if (event.data.size > 0 && socket && socket.readyState === WebSocket.OPEN) {
            socket.send(event.data);
            console.log("audio sent");
        }
    };

    mediaRecorder.start(100); // Collect audio in chunks of 100ms

    startButton.disabled = true;
    stopButton.disabled = false;
});

stopButton.addEventListener("click", () => {
    if (mediaRecorder) {
        mediaRecorder.stop();
    }
    if (socket) {
        socket.close();
    }
    startButton.disabled = false;
    stopButton.disabled = true;
});
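
One way to check what the MediaRecorder client actually sends is to look at the first bytes of the first WebSocket message on the server. MediaRecorder emits encoded audio inside a container rather than raw PCM; a WebM stream begins with the EBML magic bytes `1A 45 DF A3` and an Ogg stream begins with `OggS`. The sketch below is only an illustration: `sniff_container` is a hypothetical helper, and typically only the first chunk of a recording starts with the container header.

```python
def sniff_container(first_chunk: bytes) -> str:
    """Guess the container format from the leading bytes of the first chunk."""
    if first_chunk.startswith(b"\x1a\x45\xdf\xa3"):
        return "webm/matroska (EBML)"
    if first_chunk.startswith(b"OggS"):
        return "ogg"
    if first_chunk[:4] == b"RIFF" and first_chunk[8:12] == b"WAVE":
        return "wav"
    # No known signature: the data may be headerless, for example raw PCM.
    return "unknown (possibly raw PCM)"


print(sniff_container(b"\x1a\x45\xdf\xa3" + b"\x00" * 16))
print(sniff_container(b"\x00\x01" * 8))
```

If the first message is reported as WebM or Ogg, pushing its bytes straight into a stream declared as 16 kHz PCM would feed the recognizer compressed data instead of samples.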

Simple server code with all the downsampling and preprocessing functions

Code:

import asyncio
import websockets
import os
import datetime
import soxr
import numpy as np
from pydub import AudioSegment
from io import BytesIO
from scipy.io.wavfile import write
from scipy.signal import resample
import azure.cognitiveservices.speech as speechsdk
from dotenv import load_dotenv

load_dotenv()
speech_key = os.getenv("SPEECH_KEY")
speech_region = os.getenv("SPEECH_REGION")

write_stream = None
buffer = None
write_stream_sampled = None


def downsample_audio(byte_chunk, original_rate, target_rate, num_channels=1):
    """
    Downsample an audio byte chunk.

    Args:
        byte_chunk (bytes): Audio data in bytes format.
        original_rate (int): Original sample rate of the audio.
        target_rate (int): Target sample rate after downsampling.
        num_channels (int): Number of audio channels (1 for mono, 2 for stereo).

    Returns:
        bytes: Downsampled audio data in bytes.
    """
    audio_data = np.frombuffer(byte_chunk, dtype=np.int16)

    if num_channels == 2:
        # Reshape for stereo
        audio_data = audio_data.reshape(-1, 2)

    # Calculate the number of samples in the downsampled audio
    num_samples = int(len(audio_data) * target_rate / original_rate)

    # Downsample the audio
    downsampled_audio = resample(audio_data, num_samples)

    # Ensure the data is in int16 format
    downsampled_audio = np.round(downsampled_audio).astype(np.int16)

    # Convert back to bytes
    downsampled_bytes = downsampled_audio.tobytes()

    return downsampled_bytes


def setup_azure_service():

    speech_config = speechsdk.SpeechConfig(
        subscription=speech_key,
        region=speech_region,
    )

    # azure service logging to find cancellation issues
    speech_config.set_property(
        speechsdk.PropertyId.Speech_LogFilename, "azure_speech_sdk.log"
    )

    speech_config.enable_audio_logging()

    speech_config.set_property(
        property_id=speechsdk.PropertyId.SpeechServiceConnection_LanguageIdMode,
        value="Continuous",
    )
    speech_config.set_property_by_name("maxSpeakerCount", str(8))

    speech_config.request_word_level_timestamps()

    auto_detect_lang_config = speechsdk.AutoDetectSourceLanguageConfig(
        languages=["en-US", "es-ES"]
    )

    # The push stream is declared as 16 kHz PCM
    audio_stream_format = speechsdk.audio.AudioStreamFormat(
        samples_per_second=16000
    )
    push_stream = speechsdk.audio.PushAudioInputStream(
        stream_format=audio_stream_format
    )

    audio_config = speechsdk.audio.AudioConfig(stream=push_stream)

    transcriber = speechsdk.transcription.ConversationTranscriber(
        speech_config=speech_config,
        audio_config=audio_config,
        auto_detect_source_language_config=auto_detect_lang_config
    )

    def start_callback(evt):
        print("Session started")

    def transcribed(evt):
        if evt.result.reason == speechsdk.ResultReason.RecognizedSpeech:
            det_lang = evt.result.properties[
                speechsdk.PropertyId.SpeechServiceConnection_AutoDetectSourceLanguageResult
            ]
            transcribed_text = evt.result.text
            speaker_id = evt.result.speaker_id

            print(f"Language:  {det_lang}")
            print("\tText={}".format(transcribed_text))
            print("\tSpeaker ID={}".format(speaker_id))

    transcriber.session_started.connect(start_callback)
    transcriber.transcribed.connect(transcribed)

    return transcriber, push_stream


async def handle_client_connection(websocket, path):
    global write_stream
    global buffer
    global write_stream_sampled

    print("Client connected")

    transcriber, push_stream = setup_azure_service()

    transcriber.start_transcribing_async().get()

    try:

        async for message in websocket:

            if buffer is None:
                buffer = b""

            if write_stream is None:
                write_stream = open("output.webm", "ab")

            if write_stream_sampled is None:
                write_stream_sampled = open("output_sampled.webm", "ab")

            if isinstance(message, bytes):
                buffer += message

                print(type(buffer))
                # Forward the buffered bytes to Azure in fixed 4096-byte pieces
                while len(buffer) >= 4096:
                    audio_chunk = buffer[:4096]
                    buffer = buffer[4096:]

                    print(f"Audio chunk of size: {len(audio_chunk)} received")
                    push_stream.write(audio_chunk)

            # print("audio received")
            # if write_stream is None:
            #     write_stream = open(
            #         "output.webm", "ab"
            #     )  # 'ab' mode to append in binary
            # if isinstance(message, bytes):
            #     write_stream.write(message)
            # else:
            #     print("Received non-binary message")
    except websockets.ConnectionClosed:
        print("Client disconnected")
    finally:
        # Close both debug files so they are not leaked between connections
        if write_stream:
            write_stream.close()
            write_stream = None
        if write_stream_sampled:
            write_stream_sampled.close()
            write_stream_sampled = None

        transcriber.stop_transcribing_async().get()


async def start_server():
    server = await websockets.serve(handle_client_connection, "127.0.0.1", 8000)
    print("Server is running on port 8000")
    await server.wait_closed()


if __name__ == "__main__":
    print(datetime.datetime.utcnow().strftime("%Y-%m-%d %H:%M:%S"))
    # start_server() only returns once the server is closed
    asyncio.run(start_server())
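
If the MediaRecorder chunks turn out to be WebM/Opus, one possible approach (not something the original code does) is to decode them to 16 kHz, 16-bit mono PCM before writing to the push stream. The sketch below pipes the bytes through an `ffmpeg` subprocess. It assumes `ffmpeg` is installed and on `PATH`; `build_ffmpeg_cmd` and `decode_to_pcm` are hypothetical helper names.

```python
import asyncio


def build_ffmpeg_cmd(sample_rate: int = 16000, channels: int = 1) -> list[str]:
    """Command that reads encoded audio on stdin and writes raw s16le PCM to stdout."""
    return [
        "ffmpeg", "-loglevel", "error",
        "-i", "pipe:0",             # encoded input (e.g. WebM/Opus) from stdin
        "-f", "s16le",              # headerless signed 16-bit little-endian output
        "-acodec", "pcm_s16le",
        "-ar", str(sample_rate),    # resample to the target rate
        "-ac", str(channels),       # downmix to the target channel count
        "pipe:1",
    ]


async def decode_to_pcm(encoded: bytes) -> bytes:
    """Decode a complete recording (container header included) in one shot."""
    proc = await asyncio.create_subprocess_exec(
        *build_ffmpeg_cmd(),
        stdin=asyncio.subprocess.PIPE,
        stdout=asyncio.subprocess.PIPE,
    )
    pcm, _ = await proc.communicate(encoded)
    return pcm
```

For live streaming the process would have to stay open and be fed chunk by chunk while its `stdout` is read concurrently; the one-shot version is shown only to keep the sketch short.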

Expected result
(image) https://i.sstatic.net/JvYNS2C9.png

Read more here: https://stackoverflow.com/questions/79019225/handling-audio-blobs-sent-through-mediarecorder-in-js-through-python-server-scri

