Hosting a text-to-speech service on Heroku
Using marytts-http, you can easily host a multilingual, open-source text-to-speech service on Heroku and restrict requests with HMAC-SHA256.
In college, I made a vocab memorizer and used MaryTTS for speech synthesis to learn the correct German pronunciations.
MaryTTS is an open-source multilingual text-to-speech library in Java, but it doesn't expose a convenient, easy-to-deploy web service. So I recently built marytts-http, which wraps it as a Java servlet using Jetty.
You can deploy it to Heroku like this (assuming you have their toolbelt installed and are logged in):
git clone https://github.com/draffensperger/marytts-http
cd marytts-http
heroku create
git push heroku master
Heroku will perform a Maven build for MaryTTS with English and German voices. You can then try the service by specifying text and locale and visiting e.g. [your-marytts-app].herokuapp.com/?text=Hallo&locale=de
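Text with umlauts or spaces has to be percent-encoded before it goes into the query string. Here is a minimal Ruby sketch of building such a request URL. The helper name marytts_request_url and the base URL are placeholders I made up for illustration, not part of marytts-http:

```ruby
require 'uri'

# Hypothetical helper: builds an unsigned request URL for a marytts-http app.
# `base_url` is a placeholder for the address of your own Heroku app.
def marytts_request_url(base_url, text, locale)
  # encode_www_form percent-encodes non-ASCII characters and turns spaces into "+"
  query = URI.encode_www_form(text: text, locale: locale)
  "#{base_url}/?#{query}"
end

puts marytts_request_url('https://your-marytts-app.herokuapp.com', 'Grüße aus Berlin', 'de')
```

This only works as-is while HMAC_SECRET is not set; once requests are secured, the URL also needs a signature.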
Securing requests
You can secure your service by setting the HMAC_SECRET (Base64 encoded) environment variable. To generate a random 32-byte key and set it on Heroku run:
heroku config:set HMAC_SECRET=`cat /dev/urandom | head -c 32 | base64`
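If /dev/urandom isn't available on your machine (e.g. on Windows), an equivalent key could be generated from Ruby instead. This is just a sketch; the method name generate_hmac_secret is my own:

```ruby
require 'securerandom'
require 'base64'

# Hypothetical helper: returns 32 random bytes from the operating system's
# secure random source, Base64-encoded like the shell command above.
def generate_hmac_secret
  SecureRandom.base64(32)
end

secret = generate_hmac_secret
puts secret
```

You could then paste the printed value into heroku config:set HMAC_SECRET=...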
You will then need to sign requests using HMAC-SHA256. You can also specify the expires parameter so that a signed request is only valid for a limited period of time. The marytts-http readme has the details, and there is a Rails code example below.
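To make the idea concrete, here is a rough sketch of how a verifier for such requests could work in general. It is not the marytts-http implementation (the readme is the authority on that); the parameter order and the "concatenate the values, then HMAC" rule are borrowed from the Rails helper further down, and all names are hypothetical:

```ruby
require 'openssl'
require 'base64'

# Assumed signing order, borrowed from the Rails helper in this post
SIGNED_PARAMS = %w[text locale gender voice style effects expires].freeze

# Recomputes the signature a correctly signed request should carry.
# Missing parameters contribute an empty string.
def expected_signature(params, key)
  data = SIGNED_PARAMS.map { |name| params[name].to_s }.join
  digest = OpenSSL::HMAC.digest(OpenSSL::Digest.new('sha256'), key, data)
  Base64.strict_encode64(digest)
end

# Hypothetical check: reject expired requests, then compare signatures
# in constant time (OpenSSL.secure_compare needs Ruby 2.7 or newer).
def valid_request?(params, key, now: Time.now)
  return false if params['expires'] && params['expires'].to_i < now.to_i

  OpenSSL.secure_compare(expected_signature(params, key), params['signature'].to_s)
end
```

Because expires is part of the signed data, a client can't extend the lifetime of a URL without invalidating its signature.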
Using it in a Rails app
One way to use this in a Rails app would be via the <audio> tag. Assuming you have a view helper, marytts_url, this snippet would cause your page to say "Hallo" in German using your text-to-speech service:
<%= audio_tag(marytts_url('Hallo', locale: :de), autoplay: true) %>
The view helper to construct and sign the URL for the service could look like this:
require 'base64'
require 'openssl'

module MaryttsHelper
  # The key is stored Base64-encoded in secrets.yml; decode it once at load time
  @@marytts_key = Base64.decode64(Rails.application.secrets.marytts_key)
  @@marytts_host = Rails.application.secrets.marytts_host

  def marytts_url(text, opts = {})
    # Expiry time is represented as a unix timestamp
    opts[:expires] = opts[:expires].to_i if opts[:expires].present?
    # The values are signed in exactly this order
    params_to_sign_in_order = [:text, :locale, :gender, :voice, :style,
                               :effects, :expires]
    params = opts.merge(text: text).slice(*params_to_sign_in_order)
    sign_url(@@marytts_key, @@marytts_host, params)
  end

  private

  def sign_url(key, base_url, params)
    # Concatenate the values (not the keys) and sign the resulting string
    param_values = params.values.map(&:to_s).join
    signature = hmac_sha256(param_values, key)
    URI.join(base_url, '/?' + params.merge(signature: signature).to_query).to_s
  end

  def hmac_sha256(data, key)
    digest = OpenSSL::HMAC.digest(OpenSSL::Digest.new('sha256'), key, data)
    # strict_encode64 never inserts line breaks, so no strip is needed
    Base64.strict_encode64(digest)
  end
end
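You would set marytts_host and marytts_key in secrets.yml to the URL of your app, e.g. https://[your-marytts-app].herokuapp.com (include the scheme, since URI.join needs an absolute base URL), and to the Base64-encoded HMAC_SECRET value from above, respectively.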
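If you want to sign URLs from a plain Ruby script rather than a Rails view, the same idea works without ActiveSupport. The following is a sketch under the same assumptions as the helper above; signed_marytts_url, the base URL and the dummy key are placeholders of mine:

```ruby
require 'openssl'
require 'base64'
require 'uri'

# Order in which values are concatenated before signing (same as the helper above)
SIGNED_PARAM_ORDER = %i[text locale gender voice style effects expires].freeze

# Hypothetical stand-alone counterpart of marytts_url for use outside Rails.
# `key` is the raw (already Base64-decoded) secret.
def signed_marytts_url(base_url, key, params)
  # Keep only the known parameters, in signing order
  ordered = SIGNED_PARAM_ORDER.each_with_object({}) do |name, acc|
    acc[name] = params[name].to_s if params.key?(name)
  end
  digest = OpenSSL::HMAC.digest(OpenSSL::Digest.new('sha256'), key, ordered.values.join)
  signature = Base64.strict_encode64(digest)
  "#{base_url}/?#{URI.encode_www_form(ordered.merge(signature: signature))}"
end

# Dummy key for illustration only; use Base64.decode64 on your real HMAC_SECRET
key = '0' * 32
puts signed_marytts_url('https://your-marytts-app.herokuapp.com', key,
                        text: 'Hallo', locale: 'de', expires: Time.now.to_i + 300)
```

Unlike to_query in Rails, URI.encode_www_form keeps the parameters in insertion order instead of sorting them. That shouldn't matter here, since only the concatenated values are signed, not the query string itself.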
Here's a PHP code example of how to embed a marytts-http link as well.
What about French, Italian, Swedish, Russian, Turkish, and Telugu?
MaryTTS supports those languages too! By modifying the marytts-http Maven build script, we could add language and voice packages for them. A list of the voices is in the (non-user-friendly) MaryTTS components.xml.
How does it actually sound?
The German "bits3-hsmm" and English "cmu-slt-hsmm" voices included in marytts-http are space efficient (under 3MB total) but sound a bit tinny:
"Welcome to the world of speech synthesis!"
"Willkommen in der Welt der Sprachsynthese!"
If we used more space for the German "dfki-pavoque-neutral" (425MB) and the British English "dfki-spike" (129MB) voices, the quality would be better:
"This voice is higher quality."
"Diese Stimme ist qualitätvoller."
But given the Heroku slug size limit of 300MB, and the increased RAM needed for those voices, deploying them may take more work and larger dyno sizes.
To try out the various voices and languages, you can download MaryTTS, run its component installer to get the voices, and start a local MaryTTS server, which provides a web interface for interacting with the system.
Other text-to-speech options
There are also several hosted text-to-speech web services, such as those from iSpeech, AT&T (Speech API), Ivona (Amazon), and IBM. Depending on your app's needs, a managed API service may be the better choice for you.