Can anyone help?
I found several Urdu TTS models on Hugging Face, such as:
TheUpperCaseGuy/Guy-Urdu-TTS
pocketmonkey/speecht5_tts_urdu
Talha185/speecht5_finetuned_urdu_TTS
but I cannot generate good-quality speech from text with any of them.
Can anyone help?
import torch
from transformers import SpeechT5ForTextToSpeech, SpeechT5HifiGan, AutoTokenizer
import soundfile as sf
from datasets import load_dataset
# Load the model and tokenizer
model_name = "pocketmonkey/speecht5_tts_urdu"
model = SpeechT5ForTextToSpeech.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")
# Load speaker embeddings
embeddings_dataset = load_dataset("Matthijs/cmu-arctic-xvectors", split="validation")
speaker_embeddings = torch.tensor(embeddings_dataset[7306]["xvector"]).unsqueeze(0)
# Prepare the text input
urdu_text = ",HELLO HOW ARE YOU,AUR BATAO KESE HO AAJ KAL?آپ کیسے ہیں؟" # mixed English, Roman Urdu, and Urdu script; the Urdu part means "How are you?"
inputs = tokenizer(text=urdu_text, return_tensors="pt")
# Generate speech
speech = model.generate_speech(inputs["input_ids"], vocoder=vocoder, speaker_embeddings=speaker_embeddings)
# Save the audio file
sf.write("output.wav", speech.numpy(), samplerate=16000)
print("Audio saved as 'output.wav'")
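One thing that may be worth checking, given that the test string above mixes uppercase English, Roman Urdu, and Urdu script, is whether every character in the input is covered by the tokenizer's vocabulary. The sketch below assumes the fine-tuned model uses a character-level vocabulary, as the base SpeechT5 tokenizer does; that is an assumption about these particular checkpoints, not something confirmed here. `find_unsupported_chars` and `toy_vocab` are hypothetical names used only for illustration.

```python
def find_unsupported_chars(text, vocab):
    """Return characters of `text` missing from `vocab`, in first-seen order.

    Whitespace is skipped because SentencePiece-style tokenizers usually
    represent spaces with a special marker rather than a literal space.
    """
    missing = []
    for ch in text:
        if ch.isspace():
            continue
        if ch not in vocab and ch not in missing:
            missing.append(ch)
    return missing


# Hypothetical toy vocabulary, for illustration only.
# With the real model, one could build it from the loaded tokenizer:
#     vocab = set(tokenizer.get_vocab().keys())
toy_vocab = set("abcdefghijklmnopqrstuvwxyz?,")

print(find_unsupported_chars("HELLO how are you?", toy_vocab))
# prints ['H', 'E', 'L', 'O']
```

If the real vocabulary turns out to lack uppercase Latin letters or some Urdu characters, those characters would likely be mapped to an unknown token, which could explain part of the poor audio quality. Testing with Urdu-script text only would help isolate that.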
More details here: https://stackoverflow.com/questions/788 ... ith-python