Speech Recognition Module in Python
Converting text to speech, known as speech synthesis, is the computer-generated recreation of human speech. A text-to-speech module converts written human-language text into human-like speech audio.
In this article, we will discuss how to convert text to speech in Python. We will not be developing any neural networks or training a model to achieve results. Instead, we will use APIs and engines that offer text-to-speech conversion in Python. Many APIs provide this, and one of the most widely used is Google Text-to-Speech, which is accessed through an online library. The other library we will discuss is pyttsx3, which works offline.
To get started with the online library, gTTS (Google Text-to-Speech), install it with pip, along with the playsound module for playback:
pip3 install gTTS  # Google text to speech library
pip3 install playsound
Online Text to Speech Module
gTTS is the Python library used for interfacing with Google Translate's text-to-speech API. It only works with an internet connection, and it is very simple to use.
Open a new Python file and import the libraries:
import gtts  # Google text to speech library
from playsound import playsound
To use this library, we pass the text to the gTTS (Google Text-to-Speech) module, which is the interface to Google Translate's text-to-speech API:
For Example:
# first, make a request to Google to synthesize the speech
textts = gtts.gTTS("Hello Programmers")
So far, we have sent the text and received the synthesized speech from the API (Application Programming Interface). Now we will save this audio to a file:
# save the audio into a file
textts.save("JTP.mp3")
Now a new file has appeared in the current directory. We can play it with the playsound module, which we installed earlier.
# play the audio file
playsound("JTP.mp3")
And now we can hear a robot speaking what we just asked it to say. We can also use it for other languages by passing the lang parameter:
# for example, in Spanish
textts = gtts.gTTS("Hola Española", lang="es")
textts.save("spanish.mp3")
playsound("spanish.mp3")
If the user does not want to save the audio to a file and wants to play it directly, they can use textts.write_to_fp(), which accepts a file-like object such as io.BytesIO() and writes the audio into it.
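As a rough sketch of the in-memory approach just described, the helper below collects the synthesized audio in an io.BytesIO buffer instead of a file. The names speech_to_bytes and synthesize are illustrative and not part of gTTS; the first function assumes only that the object passed in provides write_to_fp(), as a gtts.gTTS instance does.

```python
import io


def speech_to_bytes(tts):
    """Return the audio produced by a gTTS-style object as raw bytes.

    `tts` is assumed to expose write_to_fp(fp), as gtts.gTTS does.
    """
    buffer = io.BytesIO()
    tts.write_to_fp(buffer)  # write the MP3 data into memory, not to disk
    return buffer.getvalue()


def synthesize(text, lang="en"):
    """Hypothetical convenience wrapper; needs gTTS and an internet connection."""
    import gtts  # imported lazily so speech_to_bytes has no hard dependency
    return speech_to_bytes(gtts.gTTS(text, lang=lang))
```

Calling synthesize("Hello Programmers") should return the MP3 data as bytes, which can then be passed to an audio library that plays from memory. The playsound function used earlier takes a file name, so it is not suited to this step.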
The user can list the available languages like this:
# to see all available languages along with their IETF tags
print(gtts.lang.tts_langs())
Here are the supported languages:
Output:
{'af': 'Afrikaans',
 'sq': 'Albanian',
 'ar': 'Arabic',
 'hy': 'Armenian',
 'bn': 'Bengali',
 'bs': 'Bosnian',
 'ca': 'Catalan',
 'hr': 'Croatian',
 'cs': 'Czech',
 'da': 'Danish',
 'nl': 'Dutch',
 'en': 'English',
 'eo': 'Esperanto',
 'et': 'Estonian',
 'tl': 'Filipino',
 'fi': 'Finnish',
 'fr': 'French',
 'de': 'German',
 'el': 'Greek',
 'gu': 'Gujarati',
 'hi': 'Hindi',
 'hu': 'Hungarian',
 'is': 'Icelandic',
 'id': 'Indonesian',
 'it': 'Italian',
 'ja': 'Japanese',
 'jw': 'Javanese',
 'kn': 'Kannada',
 'km': 'Khmer',
 'ko': 'Korean',
 'la': 'Latin',
 'lv': 'Latvian',
 'mk': 'Macedonian',
 'ml': 'Malayalam',
 'mr': 'Marathi',
 'my': 'Myanmar (Burmese)',
 'ne': 'Nepali',
 'no': 'Norwegian',
 'pl': 'Polish',
 'pt': 'Portuguese',
 'ro': 'Romanian',
 'ru': 'Russian',
 'sr': 'Serbian',
 'si': 'Sinhala',
 'sk': 'Slovak',
 'es': 'Spanish',
 'su': 'Sundanese',
 'sw': 'Swahili',
 'sv': 'Swedish',
 'ta': 'Tamil',
 'te': 'Telugu',
 'th': 'Thai',
 'tr': 'Turkish',
 'uk': 'Ukrainian',
 'ur': 'Urdu',
 'vi': 'Vietnamese',
 'cy': 'Welsh',
 'zh-cn': 'Chinese (Mandarin/China)',
 'zh-tw': 'Chinese (Mandarin/Taiwan)',
 'en-us': 'English (US)',
 'en-ca': 'English (Canada)',
 'en-uk': 'English (UK)',
 'en-gb': 'English (UK)',
 'en-au': 'English (Australia)',
 'en-gh': 'English (Ghana)',
 'en-in': 'English (India)',
 'en-ie': 'English (Ireland)',
 'en-nz': 'English (New Zealand)',
 'en-ng': 'English (Nigeria)',
 'en-ph': 'English (Philippines)',
 'en-za': 'English (South Africa)',
 'en-tz': 'English (Tanzania)',
 'fr-ca': 'French (Canada)',
 'fr-fr': 'French (France)',
 'pt-br': 'Portuguese (Brazil)',
 'pt-pt': 'Portuguese (Portugal)',
 'es-es': 'Spanish (Spain)',
 'es-us': 'Spanish (United States)'}
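Because the mapping of language tags is an ordinary dictionary, it can be used to validate a tag before requesting speech. The following is a minimal sketch of that idea; resolve_language and sample_langs are hypothetical names introduced for illustration, and the small dictionary stands in for the full result of gtts.lang.tts_langs().

```python
def resolve_language(code, supported, default="en"):
    """Return `code` if it is a key of `supported`, otherwise `default`."""
    normalized = code.strip().lower()
    return normalized if normalized in supported else default


# A small excerpt of the mapping shown above; with gTTS installed,
# the full mapping would come from gtts.lang.tts_langs().
sample_langs = {"en": "English", "es": "Spanish", "fr": "French"}

print(resolve_language("ES", sample_langs))  # es
print(resolve_language("xx", sample_langs))  # en
```

Falling back to a default tag is only one possible design; raising an error for an unknown tag may be preferable when a silent change of language would be confusing.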
Offline Text to Speech
We know how to use the Google Text-to-Speech API, but what if we want to convert text to speech without an internet connection? The pyttsx3 library serves that purpose. It is a Python library for converting text to speech that looks for the TTS engines pre-installed on your platform and uses them for the conversion.
The following are the text-to-speech synthesizers that the pyttsx3 library uses:
- SAPI5 on Windows XP, Windows Vista, 8, 8.1 and 10
- espeak on Ubuntu Desktop Edition 8.10, 9.04 and 9.10
- NSSpeechSynthesizer on Mac OS X 10.5 and 10.6
The main features of the pyttsx3 library are:
- This library works totally offline.
- Users can choose between the various voices installed on their system.
- It can control the speed of speech.
- It can tweak the volume.
- It can save the speech audio to a file.
If the user is using this library on a Linux operating system and their voice output is not working with the pyttsx3 library, then they have to install espeak, ffmpeg, and libespeak1.
$ sudo apt update && sudo apt install espeak ffmpeg libespeak1
To start using this library, we open a new Python file and import the library:
# import the text to speech library
import pyttsx3
Now, we have to initialize the text-to-speech engine of the system:
# initialize the text-to-speech engine of the system
engine_system = pyttsx3.init()
To convert the text, we use the say() and runAndWait() methods:
# the text to convert to speech
text_speech = "Python is a simple and most popular programming language"
engine_system.say(text_speech)
# play the speech
engine_system.runAndWait()
The say() function adds an utterance to the event queue, whereas runAndWait() runs the event loop, processes all the queued commands, and returns once the queue is empty.
So we can call say() several times and then call runAndWait() once at the end to hear everything that was queued.
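The queueing behaviour described above can be sketched as follows. The helper name speak_all is hypothetical; it assumes only that the engine object provides say() and runAndWait(), as the object returned by pyttsx3.init() does.

```python
def speak_all(engine, sentences):
    """Queue every sentence, then process the whole queue in one go."""
    for sentence in sentences:
        engine.say(sentence)  # only queues the utterance, nothing is spoken yet
    engine.runAndWait()       # blocks until every queued utterance has been spoken


def demo():
    """Needs pyttsx3 and a working speech engine on the system."""
    import pyttsx3
    speak_all(pyttsx3.init(), ["Hello Programmers", "Python is simple"])
```

Calling demo() should speak both sentences one after the other, with a single call to runAndWait() at the end.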
This library has some properties that the user can tweak depending on their requirements.
For example:
Let’s see the details of the speaking rate:
# let's see the current speaking rate
rate = engine_system.getProperty("rate")
print(rate)
Output:
200
Now, let’s change the speaking rate to 300, which will make the rate much faster.
# set a new speaking rate to make the speech faster
engine_system.setProperty("rate", 300)
engine_system.say(text_speech)
engine_system.runAndWait()
We can also set it to 100, which will make it slower:
# set the speaking rate to make the speech slower
engine_system.setProperty("rate", 100)
engine_system.say(text_speech)
engine_system.runAndWait()
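Rather than hard-coding a rate, it may be convenient to scale whatever rate the engine currently reports. The sketch below also covers the volume property mentioned in the feature list, which pyttsx3 documents as a float between 0.0 and 1.0. The helper names scale_rate and set_volume are illustrative, not part of pyttsx3.

```python
def scale_rate(engine, factor):
    """Multiply the current speaking rate by `factor` and return the new rate."""
    new_rate = int(engine.getProperty("rate") * factor)
    engine.setProperty("rate", new_rate)
    return new_rate


def set_volume(engine, level):
    """Set the volume, clamped to the 0.0 to 1.0 range, and return the value used."""
    clamped = max(0.0, min(1.0, float(level)))
    engine.setProperty("volume", clamped)
    return clamped
```

With the engine created earlier, scale_rate(engine_system, 1.5) would turn a rate of 200 into 300, and set_volume(engine_system, 0.5) would set the volume to half.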
Another useful functionality of this library is voices, by which the user can see the details about all the voices available on their system.
# to see the details of all voices available on the system
voices = engine_system.getProperty("voices")
print(voices)
Output:
[<pyttsx3.voice.Voice object at 0x000001994D817A20>, <pyttsx3.voice.Voice object at 0x000001994D817F898>, <pyttsx3.voice.Voice object at 0x000001994D6182D30>, <pyttsx3.voice.Voice object at 0x000001994E799C10>, <pyttsx3.voice.Voice object at 0x000001994D48CD90>]
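Printing the list shows only object addresses, so a small helper can make it easier to read. This sketch relies on the id and name attributes of the pyttsx3 Voice objects; describe_voices and demo are hypothetical names.

```python
def describe_voices(voices):
    """Return one 'index: name (id)' line per voice object."""
    return [f"{i}: {v.name} ({v.id})" for i, v in enumerate(voices)]


def demo():
    """Needs pyttsx3 and at least one installed voice."""
    import pyttsx3
    engine = pyttsx3.init()
    for line in describe_voices(engine.getProperty("voices")):
        print(line)
```

The index printed at the start of each line is the position to use when selecting a voice from the list, counting from 0.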
As we can see, my system has five voices. Let's use the fifth one, which is at index 4 because list indices start at 0.
For example:
# switch to another voice
engine_system.setProperty("voice", voices[4].id)
engine_system.say(text_speech)
engine_system.runAndWait()
If we want to save the audio to a file instead of playing it with say(), we can use the save_to_file() function.
For example:
# save the speech audio into a file
engine_system.save_to_file(text_speech, "text_to_Speech.mp3")
engine_system.runAndWait()
Example of listening for events:
import pyttsx3

# pyttsx3 passes the utterance name to this callback
def onStart(name):
    print('starting', name)

def onWord(name, location, length):
    print('word', name, location, length)

def onEnd(name, completed):
    print('finishing', name, completed)

engine = pyttsx3.init()
engine.connect('started-utterance', onStart)
engine.connect('started-word', onWord)
engine.connect('finished-utterance', onEnd)
sen = 'One day the people that don’t even believe in you will tell everyone how they met you'
engine.say(sen)
engine.runAndWait()
Output (a sample run; the events reported vary with the platform and speech driver):
word None 1 559936
word None 1 559936
word None 1 559936
finishing None True
finishing None True
finishing None True
Conclusion:
In this article, we have discussed two Python libraries for converting text into speech: gTTS for online conversion and pyttsx3 for offline conversion. We have also seen how to change the speaking rate and how to switch between the different voices available on the system.