Today’s blog is primarily about the latest addition to book readings generated using Amazon’s Polly text-to-speech software, but before getting to that it’s worth saying goodbye to the Cassini space probe. This was launched nearly twenty years ago, has been orbiting Saturn and its moons since 2004, and is now almost out of fuel. By the end of the week, following a deliberate course change to avoid polluting any of the moons, Cassini will impact Saturn and break up in the atmosphere there.
So, Half Sick of Shadows and Polly. Readers of this blog, or the Before the Second Sleep blog (first post and second post) will know that I have been using Amazon’s Polly technology to generate book readings. The previous set were for the science fiction book Timing, Far from the Spaceports 2. Today it is the turn of Half Sick of Shadows.
Without further ado, and before getting to some technical stuff, here is the result. It’s a short extract from late on in the book, and I selected it specifically because there are several speakers.
OK. Polly is a variation of the text-to-speech capability seen in Amazon Alexa, but with a couple of differences. First, it is geared purely to voice output, rather than the mix of input and output needed for Alexa to work.
Secondly, Polly allows a range of gender, voice and language, not just the fixed voice of Alexa. The original intention was to provide multi-language support in various computer or mobile apps, but it suits me very well for representing narrative and dialogue. For this particular reading I have used four different voices.
If you want to set up your own experiment, you can go to this link and start to play. You’ll need to set up some login credentials to get there, but you can extend your regular Amazon ones to do this. This demo page allows you to select which voice you want and enter any desired text. You can even download the result if you want.
But the real magic starts when you select the SSML tab, and enter more complex examples. SSML is an industry standard way of describing speech, and covers a whole wealth of variations. You can add what are effectively stage directions with it – pauses of different lengths, directions about parts of speech, emphasis, and (if necessary) a phonetic letter by letter description. You can speed up or slow down the reading, and raise or lower the pitch. Finally, and even more usefully for my purposes, you can select the spoken language as well as the language of the speaker. So you can have an Italian speaker pronouncing an English sentence, or vice versa. Since all my books are written in English, that means I can considerably extend the range of speakers. Some combinations don’t work very well, so you have to test what you have specified, but that’s fair enough.
If you’re comfortable with the coding effort required, you can call the Polly libraries with all the necessary settings and generate a whole lot of text all at once, rather than piecemeal. Back when I put together the Timing extracts, I wrote a program which was configurable enough that now I just have to specify the text concerned, plus the selection of voices and other sundry details. It still takes a little while to select the right passage and get everything organised, but it’s a lot easier than starting from scratch every time. Before too much longer, there’ll be dialogue extracts from Far from the Spaceports as well!