I am using Google Cloud Speech-to-Text to transcribe audio coming in from a microphone. I created a recognizer with the parameters location=global, model=long, language=en-IN. During testing it does not return a single transcript or even a single word! But another recognizer with location=global, model=short, language=en-IN works fine. Can anyone help me figure this out?
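For context, a recognizer with those parameters would be created roughly like this with the Speech-to-Text V2 client (a minimal sketch assuming the current `google-cloud-speech` V2 library; the project ID and recognizer ID below are placeholders):
[code]
from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech as cloud_speech_types

def create_long_recognizer(project_id: str) -> cloud_speech_types.Recognizer:
    """Create an en-IN recognizer that defaults to the 'long' model (sketch)."""
    client = SpeechClient()
    request = cloud_speech_types.CreateRecognizerRequest(
        parent=f"projects/{project_id}/locations/global",
        recognizer_id="abc-en-long",  # placeholder, mirrors the name used in the code below
        recognizer=cloud_speech_types.Recognizer(
            default_recognition_config=cloud_speech_types.RecognitionConfig(
                language_codes=["en-IN"],
                model="long",
            ),
        ),
    )
    # create_recognizer returns a long-running operation; block until it finishes
    operation = client.create_recognizer(request=request)
    return operation.result()
[/code]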
[code]
import os
import queue
import threading
import time
import audioop

import pyaudio
from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech as cloud_speech_types

os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = "xxxxxxx"
# Audio recording parameters
RATE = 8000
CHUNK = int(RATE / 25)  # 320 frames = 40 ms at 8 kHz
class MicrophoneStream:
"""Opens a recording stream as a generator yielding the audio chunks."""
def __init__(self: object, rate: int = RATE, chunk: int = CHUNK) -> None:
"""The audio -- and generator -- is guaranteed to be on the main thread."""
self._rate = rate
self._chunk = chunk
# Create a thread-safe buffer of audio data
self._buff = queue.Queue()
self.closed = True
def __enter__(self: object) -> object:
self._audio_interface = pyaudio.PyAudio()
self._audio_stream = self._audio_interface.open(
format=pyaudio.paInt16,
# The API currently only supports 1-channel (mono) audio
channels=1,
rate=self._rate,
input=True,
frames_per_buffer=self._chunk,
# Run the audio stream asynchronously to fill the buffer object.
# This is necessary so that the input device's buffer doesn't
# overflow while the calling thread makes network requests, etc.
stream_callback=self._fill_buffer,
)
self.closed = False
return self
def __exit__(
self: object,
type: object,
value: object,
traceback: object,
) -> None:
"""Closes the stream, regardless of whether the connection was lost or not."""
self._audio_stream.stop_stream()
self._audio_stream.close()
self.closed = True
# Signal the generator to terminate so that the client's
# streaming_recognize method will not block the process termination.
self._buff.put(None)
self._audio_interface.terminate()
def _fill_buffer(
self: object,
in_data: object,
frame_count: int,
time_info: object,
status_flags: object,
) -> object:
"""Continuously collect data from the audio stream, into the buffer.
Args:
in_data: The audio data as a bytes object
frame_count: The number of frames captured
time_info: The time information
status_flags: The status flags
Returns:
The audio data as a bytes object
"""
        # Convert the 16-bit linear PCM input to 8-bit mu-law before buffering
        data = audioop.lin2ulaw(in_data, 2)
self._buff.put(data)
return None, pyaudio.paContinue
def generator(self: object) -> object:
"""Generates audio chunks from the stream of audio data in chunks.
Args:
self: The MicrophoneStream object
Returns:
A generator that outputs audio chunks.
"""
count = 1
while not self.closed:
count += 1
# Use a blocking get() to ensure there's at least one chunk of
# data, and stop iteration if the chunk is None, indicating the
# end of the audio stream.
chunk = self._buff.get()
if chunk is None:
return
data = [chunk]
# if count >= 100:
# break
# Now consume whatever other data's still buffered.
while True:
try:
chunk = self._buff.get(block=False)
if chunk is None:
return
data.append(chunk)
except queue.Empty:
break
yield b"".join(data)
def main_eng():
client1 = SpeechClient()
print("Speak Eng...")
while True:
with MicrophoneStream(RATE, CHUNK) as stream:
audio_generator = stream.generator()
audio_requests = (
cloud_speech_types.StreamingRecognizeRequest(audio=audio) for audio in audio_generator
)
adaptation = cloud_speech_types.SpeechAdaptation(
phrase_sets=[
cloud_speech_types.SpeechAdaptation.AdaptationPhraseSet(
phrase_set='projects/xxxxxxxx/locations/global/phraseSets/adaptations-en'
)
]
)
recognition_config = cloud_speech_types.RecognitionConfig(
explicit_decoding_config=cloud_speech_types.ExplicitDecodingConfig(encoding='LINEAR16',
sample_rate_hertz=8000,
audio_channel_count=1),
language_codes=["en-IN"],
model="long",
adaptation=adaptation
)
# print(recognition_config)
streaming_config = cloud_speech_types.StreamingRecognitionConfig(
config=recognition_config,
streaming_features=cloud_speech_types.StreamingRecognitionFeatures(interim_results=True)
)
config_request = cloud_speech_types.StreamingRecognizeRequest(
recognizer="projects/xxxxxxx/locations/global/recognizers/abc-en-long",
streaming_config=streaming_config,
)
            def requests(config: cloud_speech_types.StreamingRecognizeRequest, audio) -> object:
                # Yield the config request first, then stream the audio requests
                yield config
                yield from audio
# Transcribes the audio into text
responses_iterator = client1.streaming_recognize(
requests=requests(config_request, audio_requests)
)
responses = []
for response in responses_iterator:
responses.append(response)
for result in response.results:
# print(result)
if result.alternatives and result.is_final:
transcriptB = result.alternatives[0].transcript
print(f"Transcript: from Eng: {transcriptB}")
[/code]
I tried the code above with an en-IN recognizer using the short model and it worked. It also worked for other languages.
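One detail worth double-checking when comparing the two recognizers: `_fill_buffer` converts the captured audio to 8-bit mu-law with `audioop.lin2ulaw`, while the streaming config declares `LINEAR16`. This may or may not be the cause, but a decoding config that matches what the callback actually emits would look roughly like this (a sketch assuming the V2 `ExplicitDecodingConfig` MULAW encoding; untested against this setup):
[code]
# Hypothetical alternative: declare the encoding that _fill_buffer actually
# produces (8-bit mu-law at 8 kHz, mono) instead of LINEAR16.
explicit_decoding_config = cloud_speech_types.ExplicitDecodingConfig(
    encoding=cloud_speech_types.ExplicitDecodingConfig.AudioEncoding.MULAW,
    sample_rate_hertz=8000,
    audio_channel_count=1,
)

recognition_config = cloud_speech_types.RecognitionConfig(
    explicit_decoding_config=explicit_decoding_config,
    language_codes=["en-IN"],
    model="long",
)
[/code]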