What are speech synthesizers? The best speech synthesizers. Speech synthesizers with Russian voices. How to use a speech synthesizer? Speech synthesizer captain

eSpeak

Type	speech synthesizer
Author	Jonathan Duddington
Written in	C++
Operating system	Linux and other UNIX-like systems, Windows
First release	2006
Latest version	1.48.04 (April 6)
Status	inactive
License	GNU GPL
Website

OS

Versions of eSpeak exist for operating systems such as Microsoft Windows, Mac OS X, Linux, and RISC OS, and its C++ source code is also available. In addition, the official documentation of the synthesizer provides instructions on how to compile it for Windows Mobile. The program has one significant limitation: it can save generated speech only as a WAV file.

In addition, eSpeak is used in the mobile operating systems Android (starting with version 1.6) and Maemo. These ports are not personally supervised by the developer, and there are no corresponding packages on the official eSpeak website. The Android version has a number of significant errors in some languages, Russian in particular.

Supported languages

eSpeak supports about five dozen different languages. During installation, the user is asked to indicate which languages and dialects should be installed.

Below is a list of languages supported by the eSpeak synthesizer and their symbols that are used in its settings.

Albanian - sq
English (American) - en-us
English (British with a northern accent) - en-n
English (British with West Midlands accent) - en-wm
English (classical British) - en
English (Received Pronunciation) - en-rp
English (Scottish) - en-sc
Armenian (Western) - hy-west
Armenian (classical) - hy
Afrikaans - af
Bosnian - bs
Welsh - cy
Hungarian - hu
Vietnamese - vi
Dutch - nl
Voices of MBROLA (voice xxx) - mb-xxx
Greek - el
Ancient Greek - grc
Indonesian - id
Icelandic - is
Spanish (classical) - es
Spanish (Latin American) - es-la
Italian - it
Catalan - ca
Chinese (Cantonese) - zh-yue
Chinese (Mandarin) - zh
Kurdish - ku
Latin - la
Latvian - lv
Lojban - jbo
Macedonian - mk
German - de
Norwegian - no
Polish - pl
Portuguese (Brazilian) - pt
Portuguese (European) - pt-pt
Romanian - ro
Russian - ru
Serbian - sr
Slovak - sk
Swahili - sw
Tamil - ta
Turkish - tr
Finnish - fi
French - fr
Hindi - hi
Croatian - hr
Czech (spoken) - cs
Swedish - sv
Esperanto - eo

The list of supported languages can also be expanded using MBROLA voice libraries, which can be connected to eSpeak.

eSpeak and MBROLA

MBROLA is a diphone-based speech synthesis algorithm on which many different text-to-speech (TTS) software products have been built. This project holds the record among speech synthesis technologies for the number of different languages for which it has been used. However, MBROLA voices have not yet been created for some common languages, including Russian.

eSpeak can work in conjunction with MBROLA, which makes it possible to use the voice libraries of this project as an integral part of eSpeak itself. This allows you to further expand the list of supported languages for synthesizing speech from text.

You can use the eSpeak and MBROLA combination on operating systems such as Windows, Linux and Mac OS X.

However, not all MBROLA voice libraries support integration with eSpeak.

Implementation principles

The words of the input text for synthesis undergo two stages of processing:

a word in letter representation is converted into a sequence of phonemes;
a sound signal is generated based on the received sequence.

The rules for obtaining a sequence of phonemes are stored in the form "A, B, C = D", where B is the letter in question, A and C are the surrounding context of that letter in the word, and D is the phoneme into which that letter is converted. The surrounding context can be specified either by specific letters or by special characters denoting groups of letters. The rules may overlap, so more than one rule can match the same letter. To resolve this ambiguity, the synthesizer assigns a priority to each rule, calculated from the number of letters involved in the rule and how specifically the surrounding context is defined. The rules can also specify differences in conversion depending on stress.
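
The rule selection described above can be sketched in a few lines of Python. This is only an illustration: the `Rule` class, the scoring formula, and the toy rules and phoneme names are assumptions made for the example and are not eSpeak's actual rule syntax or priority calculation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    pre: str      # A: letters required right before the target ("" = anything)
    letters: str  # B: the letter(s) being converted
    post: str     # C: letters required right after the target ("" = anything)
    phoneme: str  # D: the resulting phoneme

    def score(self) -> int:
        # Hypothetical priority: more letters and more context = more specific.
        return 2 * len(self.letters) + len(self.pre) + len(self.post)


def matches(rule: Rule, word: str, i: int) -> bool:
    """True if the rule applies to the word at position i."""
    end = i + len(rule.letters)
    return (word.startswith(rule.letters, i)
            and word[:i].endswith(rule.pre)
            and word.startswith(rule.post, end))


def to_phonemes(word: str, rules: list) -> list:
    """Convert a word to phonemes, picking the highest-scoring rule each time."""
    out = []
    i = 0
    while i < len(word):
        candidates = [r for r in rules if matches(r, word, i)]
        if not candidates:
            i += 1  # no rule covers this letter: skip it
            continue
        best = max(candidates, key=Rule.score)
        out.append(best.phoneme)
        i += len(best.letters)
    return out


# Toy rules, invented for the example.
TOY_RULES = [
    Rule("", "c", "", "k"),     # default for "c"
    Rule("", "c", "e", "s"),    # "c" before "e"
    Rule("", "ch", "", "tS"),   # two-letter group
    Rule("", "a", "", "a"),
    Rule("", "e", "", "e"),
    Rule("", "t", "", "t"),
]

print(to_phonemes("cat", TOY_RULES))
print(to_phonemes("cec", TOY_RULES))
print(to_phonemes("chat", TOY_RULES))
```

In this sketch the rule for "c" before "e" beats the default "c" rule because it names more context, and the two-letter rule "ch" beats both because it involves more letters.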

In eSpeak, vowel sounds are always synthesized, voiced consonants are created by mixing synthesized sounds with pre-recorded voice noises, and all other sounds are simply recorded, for example, [sh].

Each sound, except voiceless consonants, is represented by a sequence of formants. In addition to information about formants, each phoneme has information about its amplitude, sound duration and delay before the next phoneme. Based on these parameters, the sound of a vowel is synthesized using algorithms implemented in the synthesizer. Information about phonemes and formants is stored in separate files, which are also subsequently compiled into a binary format.
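
To give a feel for what "a sequence of formants plus amplitude and duration" means, here is a deliberately simplified Python sketch that builds a vowel-like tone by summing sine waves at a few formant frequencies. It does not reproduce eSpeak's synthesis algorithm; the function name, the sample rate, and the formant values are assumptions chosen only for illustration.

```python
import math
import struct

SAMPLE_RATE = 22050  # assumed sample rate for the example


def synth_vowel(formants, duration_s, sample_rate=SAMPLE_RATE):
    """Return 16-bit little-endian mono PCM for a crude vowel-like tone.

    formants: list of (frequency_hz, amplitude) pairs.
    """
    n = int(duration_s * sample_rate)
    total_amp = sum(a for _, a in formants) or 1.0
    samples = []
    for k in range(n):
        t = k / sample_rate
        # Sum one sine wave per formant, normalized to the range [-1, 1].
        v = sum(a * math.sin(2 * math.pi * f * t) for f, a in formants)
        v /= total_amp
        samples.append(int(v * 0.8 * 32767))
    return struct.pack("<%dh" % n, *samples)


# Illustrative formant values for an "a"-like sound (not taken from eSpeak data).
pcm = synth_vowel([(700, 1.0), (1200, 0.5), (2600, 0.25)], duration_s=0.1)
print(len(pcm), "bytes of PCM audio")
```

The resulting bytes can be written to a WAV file with the standard `wave` module if you want to listen to the result.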

The eSpeak Edit utility is supplied with the synthesizer. This is a GUI application written using the wxWidgets library. It allows you to visually edit ready-made phonemes. A phoneme is represented as a curve graph in which formants can be selected one by one and their values, such as frequency, height and width, can be changed. Thanks to these capabilities, you can derive new, more accurate sounds for a particular language from ready-made phonemes. At the same time, some phonemes cannot be obtained by modifying existing ones. For example, when the Russian-language part of eSpeak was developed, the sound [р] was specially recorded, since there was no suitable analogue for it in other languages.

Projects using eSpeak

Third Party Addons

Some languages do not have simple and universal rules for producing correct speech, and eSpeak requires additional components to produce high-quality synthesis in these languages. To avoid increasing the size of the main eSpeak package, these components are distributed separately. In particular, Russian has no general rules that determine the stressed syllable of a word. In such cases, eSpeak tries to guess the stress, but the resulting pronunciation is often incorrect. To solve this problem, there is a special extended pronunciation dictionary, which must be installed separately from the main eSpeak package.

In addition to Russian, third-party eSpeak speech correction components are also available for Chinese (Putonghua and Cantonese).

You can download these dictionaries from the official website of the project.

VoiceFabric is an Internet service that allows you to voice any text information in a synthesized voice.
A demo synthesis with various voices is available on the website voicefabric.ru; you can use it to evaluate the quality of the synthesis. Today there are 8 voices available (female and male), which can speak 3 languages (Russian, English, Kazakh).

Speech synthesis from MDG is useful when outgoing voice messages need to be personalized. The solution lets you do without a voice actor and without pre-recorded audio clips in the IVR menu: the written text is instantly converted into audio and played to the client over the telephone line.

Also, using synthesized voices, you can voice books and videos and record voice greeting cards without losing “natural” intonation. VoiceFabric guarantees correct stress placement and correct reading of abbreviations, numbers, and acronyms.

You can register on the website voicefabric.ru and get free synthesis seconds, as well as access to the service’s software interface.


Speech synthesizer Captain

"Speech synthesizer Captain" is a MS SAPI4-compatible synthesizer. It supports speech synthesis in Russian and Ukrainian. Support for the Ukrainian language is based on the replacement of Ukrainian phonemes with similar-sounding phonemes of the Russian language.

The synthesizer was created on the allophone basis of one of the ten voices of the TTS software complex "Speaking Mouse Home", which was developed by the Voice Technologies Club at the Moscow State University Science Park in 1995. By inheritance, this synthesizer received the name of its prototype from “Speaking Mouse Home” - Captain.

In addition to the Russian and Ukrainian languages, the Captain is able to work with English, German and French, as well as separately read multilingual texts: each language with a corresponding synthesizer voice. Separate reading of multilingual documents and support for English, German and French languages is carried out through the use of an additional module based on the eSpeak synthesizer, however, the use of this module is not necessary for the operation of the Captain. Support for Russian and Ukrainian languages can be provided without installing it. This module for expanding the capabilities of the Captain is already included in the archive, which you can download from our website, so its separate download, as described in the accompanying documentation of the synthesizer itself, is not required.
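
The idea of reading a multilingual text "each language with a corresponding voice" can be sketched as splitting the text into runs by script and handing each run to a different voice. The Python below is a hypothetical illustration, not Captain's actual logic: it only distinguishes Cyrillic from basic Latin letters and cannot tell, for example, English from German or Russian from Ukrainian.

```python
def script_of(ch):
    """Classify a character as 'cyrillic', 'latin', or None (neutral)."""
    if "\u0400" <= ch <= "\u04ff":
        return "cyrillic"
    if ch.isascii() and ch.isalpha():
        return "latin"
    return None  # digits, spaces, punctuation


def split_by_script(text):
    """Split text into (script, fragment) runs.

    Neutral characters stay attached to the run they follow.
    """
    runs = []
    current = None
    buf = ""
    for ch in text:
        s = script_of(ch)
        if s is not None and current is not None and s != current:
            runs.append((current, buf))  # script changed: close the run
            buf = ""
        if s is not None:
            current = s
        buf += ch
    if buf:
        runs.append((current, buf))
    return runs


for script, fragment in split_by_script("Привет, world! Как дела?"):
    print(script, repr(fragment))
```

Each run could then be sent to the synthesizer voice configured for that script.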

The synthesizer has a male voice with good speech intelligibility but low naturalness. Its advantages include high speed, small size, and low use of system resources. However, "Captain" also has a number of disadvantages:

The synthesizer practically does not support the function of adjusting the pitch of speech.
During operation, sometimes sound “swallowing” phenomena may occur.
From time to time, the synthesizer may fall silent during operation or begin to read only the first part of each line of text. In this case, it is recommended to restart the synthesizer to return to normal operation.

This synthesizer may require a system component to operate.

We have learned to recognize speech, although so far only in English. We will not stand still but go further, or rather in the opposite direction: we will convert text into speech.

The most popular and free speech synthesizers that I know of and with which I have had practical experience: Festival, eSpeak, OpenMary.
Let's look at all 3 in order.

Festival

OS: Linux
Russian language: yes (male voice)
Website: http://www.cstr.ed.ac.uk/projects/festival/

A fairly advanced synthesizer. It comes with the following language packs: English (British and American pronunciation), Welsh, and Spanish. There are also other packages, such as Castilian Spanish, Czech, Finnish, Hindi, Italian, Marathi, Polish, Russian, and Telugu. Festival is included with several Linux distributions. The synthesizer handles Russian quite well; if you play around with the spelling of words and punctuation marks, you can achieve quite intelligible speech.

Installation

There are already a lot of materials online about installing Festival and adding the Russian language, so I won’t go into too much detail. I’ll just say that you need to install Festival itself (from the repository, package festival) and the Russian language pack, which is also installed from the repository (festvox-ru), and then tweak the config a little. There is a good article on this matter.

Usage

In the console:

echo "Hello" | festival --tts --language russian

eSpeak

OS: Linux, Windows, Mac OS X, RISC OS
Russian language: yes (male voice)
Website:

A simple, compact software speech synthesizer. Reportedly, eSpeak is used in the mobile operating systems Android (starting with version 1.6) and Maemo. Versions for Windows and Linux are regularly updated along with the source code; other platforms fare worse. Many languages are supported, including Russian, English, French, Spanish, etc. For Russian speech, eSpeak is inferior to Festival, while its English, French, and German are on par.

Installation

In some Linux distributions, eSpeak is already installed. If not, it can be installed simply from the console (package espeak), or you can download the desired version, for example in the case of Windows.

Usage

In the console:

espeak "Hello world!" -vru -s 100

Where:
-v language (ru, en, de)
-s speed (80-450)
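
If you want to call eSpeak from a script rather than from the console, the same command can be assembled in Python. This is a minimal sketch: the helper names `espeak_command` and `speak` are invented for the example, only the `-v` and `-s` options shown above are used, and the 80-450 range check simply mirrors the limits quoted in this article.

```python
import shutil
import subprocess


def espeak_command(text, voice="ru", speed=100):
    """Build the argument list for an espeak call (does not run anything)."""
    if not 80 <= speed <= 450:
        raise ValueError("speed must be between 80 and 450")
    return ["espeak", "-v", voice, "-s", str(speed), text]


def speak(text, voice="ru", speed=100):
    """Speak the text if espeak is installed; return False otherwise."""
    if shutil.which("espeak") is None:
        return False
    subprocess.run(espeak_command(text, voice, speed), check=True)
    return True


print(espeak_command("Hello world!"))
```

Passing the arguments as a list avoids shell quoting problems when the text contains quotes or other special characters.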

OpenMary

OS: Linux, Windows, Mac OS X
Russian language: yes (male voice)
Website: http://mary.opendfki.de

A young and as yet not very popular synthesizer, but at the same time very functional and advanced. It is written in Java, which makes it platform independent, and it works on the client-server principle. It has advanced speech add-ons with which you can specify intonation, timbre, and speed for each word. It comes with a graphical client, also written in Java.

Installation

Installation is quite easy: download the Java installer (do not forget that Java is required), launch it, and follow the instructions. At a certain stage of installation, you will be asked to select the language packs to use. For myself, I chose Russian and English.

Usage

Go to the installation directory, go to the folder bin and start the server - maryserver. As soon as the server is started, launch the client - maryclient.

If everything was done correctly, after pressing the “Play” button you should hear the written text spoken aloud. In the example folder, one directory up, there are examples of client implementations for different languages.

Conclusion

Festival: for average home and corporate projects (informing about the weather, reading letters, “language” for bots, etc.).
eSpeak: for implementing small tasks (notifying the current time, informing about traffic jams, the number of letters in the mailbox, free space on the hard drive, etc.).
OpenMary: for large projects (smart home systems, voice-over of programs, reading large texts).

Speech synthesizers installed on computers or mobile devices no longer seem to be such unusual programs as before. Thanks to modern technology, a regular desktop PC can reproduce the human voice.

How do speech synthesizers work? Where are they used? What is the best speech synthesizer? The answers to these and other questions are presented in this article.

General concept

Speech synthesizers are special programs consisting of a number of modules that provide the ability to translate typed texts into sentences spoken by a human voice. You should not think that the entire database of words and phrases was recorded by real people in professional studios. It is physically impossible to complete such a task. A library with such a large number of phrases cannot be installed on any modern computer, let alone mobile phones. For this purpose, the developers created Text-to-Speech technology.

Scope of application

Speech synthesizers are used in learning foreign languages, listening to texts on the pages of books, creating vocal parts, issuing search queries in the form of spoken phrases, etc.

What types of programs are there? Depending on the scope of application, utilities can be divided into 2 types: regular ones that convert typed text into speech, and special vocal modules used in music applications.

Advantages and disadvantages

At the moment, the computer synthesizes human speech only approximately. In the simplest programs, you can observe problems with sound and the correct placement of stress in various words. Speech synthesizers installed on mobile devices consume a lot of energy. It is often possible to note unauthorized downloading of additional modules.

The advantages include ease of perception. Many users find it much easier to assimilate audio information than any other kind.

The best speech synthesizers with Russian voices

The RHVoice program was created by Olga Yakovleva. The standard version of the application includes 3 voices. The settings are very simple. The program can be used both as a stand-alone SAPI5-compatible application and as an add-on module for a screen reader.

The Acapela speech synthesizer differs from its analogues in its ideal text pronunciation. The application supports more than 30 languages of the world. In the free version, only 1 female voice is available.

Vocalizer is often used in call centers. The user can adjust the emphasis, volume and reading speed. Additional dictionaries are loaded if necessary. There is 1 female voice in the application. The speech engine is automatically integrated into programs for reading books in electronic format.

The eSpeak utility supports over 50 languages. Its disadvantage is that it saves sound files only in WAV format, which takes up a lot of space on the hard drive.

The Festival application is a powerful speech synthesis utility that even supports Finnish and Hindi.

Program installation

How to use this type of application? First you need to install the program. Computer operating systems use a standard installer, in which the user only has to select the language module supported by the utility. The installer for mobile devices can be downloaded from the official website, Google Play, and the App Store. Installation of the application occurs automatically.

First launch of the program

At this stage, the user just needs to set the default language. Sometimes the sound quality must also be selected. The standard version implies a sampling frequency of 44100 Hz, a depth of 16 bits and a bit rate of 128 kbps. On mobile operating systems, the figures may be lower. A specific voice is used as a basis.
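
These quality figures translate directly into file sizes. The short Python calculation below is a sketch under two assumptions: the audio is mono, and the sampling frequency meant here is the common 44,100 Hz. It compares uncompressed 16-bit audio (as stored in a WAV file) with a 128 kbps compressed stream.

```python
def pcm_bitrate_bps(sample_rate_hz, bit_depth, channels=1):
    """Bit rate of uncompressed PCM audio in bits per second."""
    return sample_rate_hz * bit_depth * channels


def size_bytes(bitrate_bps, seconds):
    """Size in bytes of a stream with the given bit rate and duration."""
    return bitrate_bps * seconds // 8


wav_rate = pcm_bitrate_bps(44100, 16)        # 705600 bit/s
wav_minute = size_bytes(wav_rate, 60)        # 5292000 bytes
compressed_minute = size_bytes(128000, 60)   # 960000 bytes

print(wav_rate, wav_minute, compressed_minute)
```

Under these assumptions, one minute of uncompressed speech takes about 5.3 MB, roughly five and a half times more than the same minute at 128 kbps, which is why output limited to WAV is described as taking up a lot of disk space.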

Filters and equalizers help you achieve the desired sound. The user has three options for converting text. He can type sentences on the keyboard, have an existing file read aloud, or install a browser extension that converts content on web pages into speech. It is enough to select the required option, the timbre of the voice, and the language in which the text will be spoken. To start playback, click the “Start” button.

Working with complex programs

In music applications, settings are much more complex. In the speech module of the FL Studio program, the user can select several types of voices, as well as specify the tone and playback speed. Stresses are placed before syllables using the “_” symbol. With the help of such a speech synthesizer, you can only create a robotic voice.

Vocaloid is a professional type application. In addition to the usual parameters, the user can select articulation and glissando. The utility has a database with professional vocals. If desired, you can adjust entire sentences to fit the notes. The library with vocals alone takes up more than 4 GB in compressed form.

"Google Speech Synthesizer": what is this program?

In May 2014, Google gave users the opportunity to try out a new free product. What is Google Speech Synthesizer on Android? It is a program that reads aloud the text on the screen of a mobile phone or tablet. Now there is no need to install third-party utilities that require a license. "Google Speech Synthesizer" is used for reading e-books, listening to the correct pronunciation of words, and running the TalkBack application.

The new version of the Google Speech Synthesizer 3.1 program now supports English, Italian, Spanish, Korean, German, Dutch, Polish, Portuguese, Russian and French. Where can I find voice packs? They are downloaded from the application itself.

Advantages and disadvantages of the product from Google

The Russian-speaking female voice is notable for its clear, loud sound and smooth intonation. Playback speed can be adjusted in the program settings. Users of TalkBack with the Russian localization of Android should exercise caution when switching to this speech synthesizer if a different voice was previously set as the default: you may lose audible control of your mobile device. Almost all voices, except Russian, are unable to process sentences in Cyrillic.

Among the disadvantages, one can note a delayed reaction to reading texts consisting of phrases in different languages. The Russian voice is distinguished by metallic notes of timbre. You may hear a rattling sound at low frequencies. The advantages include the stability of the application and acceptable quality of reading English words.

"Google Speech Synthesizer": how to use the program

For the utility to work properly, you need to update it to the latest version. To enable text reading, open the settings. In the “language and input” section, select the “speech synthesis” item. The “default system” line should also be selected. Don't forget that the voice packages in the program itself also need to be updated.

Problems when working with the utility

If necessary, the user can disable the application. In the simplest utilities, the stop button is located in the program itself. Deactivating an extension installed in the browser is done by disabling the add-on or completely removing the plugin. Problems may also arise when using the program on a mobile phone. The fact is that the speech synthesizer automatically starts loading language modules that the user does not need.

This process takes a lot of time and significantly consumes traffic. How can I disable Google Speech Synthesizer on my mobile device and get rid of this problem? First you need to open the application settings. Then you need to select the “language and voice input” section. Next you need to mark the last line.

Having selected voice search, you should click on the cross next to the “offline speech recognition” item. Then it is recommended to delete the application cache. Next you need to restart your mobile phone. To completely disable the utility, you need to open the “applications” section in the settings, select a speech synthesizer from the list and click on the “stop” button.

Uninstalling a program

It happens that the user does not use Google Speech Synthesizer at all. Is it possible to remove the utility from a mobile device? To do this you need to open Google Play. Then you should select the speech synthesizer from the list of installed programs and click on the “delete” button.

Results

Applications with a simple interface are suitable for ordinary users and people with disabilities. This can be either RHVoice or Google Speech Synthesizer. A Russian voice will read the text displayed on the screen. The average user does not need more.

Musicians are recommended to give preference to the professional Vocaloid program. The application has additional voice libraries and many different options. The program will allow you to get a natural sounding voice. After all, it is so important for musicians that computer synthesis is not perceptible to the ear.