{"id":1555,"date":"2017-09-12T18:50:52","date_gmt":"2017-09-12T18:50:52","guid":{"rendered":"http:\/\/richardabbott.datascenesdev.com\/blog\/?p=1555"},"modified":"2017-09-12T18:50:52","modified_gmt":"2017-09-12T18:50:52","slug":"polly-and-half-sick-of-shadows","status":"publish","type":"post","link":"http:\/\/richardabbott.datascenesdev.com\/blog\/index.php\/2017\/09\/12\/polly-and-half-sick-of-shadows\/","title":{"rendered":"Polly and Half Sick of Shadows"},"content":{"rendered":"<figure style=\"width: 300px\" class=\"wp-caption alignright\"><img loading=\"lazy\" decoding=\"async\" class=\"size-medium\" src=\"https:\/\/www.nasa.gov\/sites\/default\/files\/styles\/image_card_4x3_ratio\/public\/thumbnails\/image\/pia21345-full.jpg\" alt=\"Saturn, from Cassini (NASA)\" width=\"300\" height=\"225\" \/><figcaption class=\"wp-caption-text\">Saturn, from Cassini (NASA)<\/figcaption><\/figure>\n<p>Today&#8217;s blog is primarily about the latest addition to book readings generated using Amazon&#8217;s Polly text-to-speech software, but before getting to that it&#8217;s worth saying goodbye to the Cassini space probe. This was launched nearly twenty years ago, has been orbiting Saturn and its moons since 2004, and is now almost out of fuel. By the end of the week, following a deliberate course change to avoid polluting any of the moons, Cassini will impact Saturn and break up in the atmosphere there.<\/p>\n<p>So, <em>Half Sick of Shadows<\/em> and Polly. Readers of this blog, or the Before the Second Sleep blog (<a href=\"https:\/\/beforethesecondsleep.wordpress.com\/2017\/07\/29\/audio-book-excerpt-timing-extract-a-richard-abbott\/\" target=\"_blank\" rel=\"noopener\">first post<\/a> and <a href=\"https:\/\/beforethesecondsleep.wordpress.com\/2017\/09\/07\/audio-book-excerpt-timing-extracts-b-c-richard-abbott\/\" target=\"_blank\" rel=\"noopener\">second post<\/a>) will know that I have been using Amazon&#8217;s Polly technology to generate book readings. 
The previous set were for the science fiction book <em>Timing, Far from the Spaceports 2<\/em>. Today it is the turn of <em>Half Sick of Shadows<\/em>.<\/p>\n<p>Without further ado, and before getting to some technical stuff, here is the result. It&#8217;s a short extract from late on in the book, and I selected it specifically because there are several speakers.<\/p>\n<!--[if lt IE 9]><script>document.createElement('audio');<\/script><![endif]-->\n<audio class=\"wp-audio-shortcode\" id=\"audio-1555-1\" preload=\"none\" style=\"width: 100%;\" controls=\"controls\"><source type=\"audio\/mpeg\" src=\"http:\/\/datascenesdev.com\/Alexa\/voicefiles\/Shadows_Extract_1.mp3?_=1\" \/><a href=\"http:\/\/datascenesdev.com\/Alexa\/voicefiles\/Shadows_Extract_1.mp3\">http:\/\/datascenesdev.com\/Alexa\/voicefiles\/Shadows_Extract_1.mp3<\/a><\/audio>\n<p>OK. Polly is a variation of the text-to-speech capability seen in Amazon Alexa, but with a couple of differences. First, it is geared purely to voice output, rather than the mix of input and output needed for Alexa to work.<\/p>\n<figure id=\"attachment_1365\" aria-describedby=\"caption-attachment-1365\" style=\"width: 225px\" class=\"wp-caption alignright\"><img loading=\"lazy\" decoding=\"async\" class=\"size-medium wp-image-1365\" src=\"http:\/\/richardabbott.datascenesdev.com\/blog\/wp-content\/uploads\/2017\/03\/Kindle-Cover-225x300.jpg\" alt=\"Kindle Cover - Half Sick of Shadows\" width=\"225\" height=\"300\" srcset=\"http:\/\/richardabbott.datascenesdev.com\/blog\/wp-content\/uploads\/2017\/03\/Kindle-Cover-225x300.jpg 225w, http:\/\/richardabbott.datascenesdev.com\/blog\/wp-content\/uploads\/2017\/03\/Kindle-Cover-768x1024.jpg 768w, http:\/\/richardabbott.datascenesdev.com\/blog\/wp-content\/uploads\/2017\/03\/Kindle-Cover.jpg 1800w\" sizes=\"auto, (max-width: 225px) 100vw, 225px\" \/><figcaption id=\"caption-attachment-1365\" class=\"wp-caption-text\">Kindle Cover &#8211; Half Sick of 
Shadows<\/figcaption><\/figure>\n<p>Secondly, Polly allows a range of gender, voice and language, not just the fixed voice of Alexa. The original intention was to provide multi-language support in various computer or mobile apps, but it suits me very well for representing narrative and dialogue. For this particular reading I have used four different voices.<\/p>\n<p>If you want to set up your own experiment, you can go to <a href=\"https:\/\/eu-west-1.console.aws.amazon.com\/polly\/home\/SynthesizeSpeech\" target=\"_blank\" rel=\"noopener\">this link<\/a> and start to play. You&#8217;ll need to set up some login credentials to get there, but you can extend your regular Amazon ones to do this. This demo page allows you to select which voice you want and enter any desired text. You can even download the result if you want.<\/p>\n<figure id=\"attachment_1558\" aria-describedby=\"caption-attachment-1558\" style=\"width: 300px\" class=\"wp-caption alignright\"><a href=\"http:\/\/richardabbott.datascenesdev.com\/blog\/wp-content\/uploads\/2017\/09\/2017-09-12.png\" target=\"_blank\" rel=\"noopener\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-1558 size-medium\" src=\"http:\/\/richardabbott.datascenesdev.com\/blog\/wp-content\/uploads\/2017\/09\/2017-09-12-300x132.png\" alt=\"Amazon Polly test console\" width=\"300\" height=\"132\" srcset=\"http:\/\/richardabbott.datascenesdev.com\/blog\/wp-content\/uploads\/2017\/09\/2017-09-12-300x132.png 300w, http:\/\/richardabbott.datascenesdev.com\/blog\/wp-content\/uploads\/2017\/09\/2017-09-12-768x338.png 768w, http:\/\/richardabbott.datascenesdev.com\/blog\/wp-content\/uploads\/2017\/09\/2017-09-12-1024x451.png 1024w, http:\/\/richardabbott.datascenesdev.com\/blog\/wp-content\/uploads\/2017\/09\/2017-09-12.png 1975w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><figcaption id=\"caption-attachment-1558\" class=\"wp-caption-text\">Amazon Polly test console<\/figcaption><\/figure>\n<p>But the real magic 
starts when you select the SSML tab and enter more complex examples. SSML (Speech Synthesis Markup Language) is an industry-standard way of describing speech, and covers a wealth of variations. You can add what are effectively stage directions with it &#8211; pauses of different lengths, directions about parts of speech, emphasis, and (if necessary) a phonetic, letter-by-letter description. You can speed up or slow down the reading, and raise or lower the pitch. Finally, and even more usefully for my purposes, you can select the spoken language as well as the language of the speaker. So you can have an Italian speaker pronouncing an English sentence, or vice versa. Since all my books are written in English, that means I can considerably extend the range of speakers. Some combinations don&#8217;t work very well, so you have to test what you have specified, but that&#8217;s fair enough.<\/p>\n<p>If you&#8217;re comfortable with the coding effort required, you can call the Polly libraries with all the necessary settings and convert a whole lot of text all at once, rather than piecemeal. Back when I put together the <em>Timing<\/em> extracts, I wrote a program which was configurable enough that now I just have to specify the text concerned, plus the selection of voices and other sundry details. It still takes a little while to select the right passage and get everything organised, but it&#8217;s a lot easier than starting from scratch every time. 
Before too much longer, there&#8217;ll be dialogue extracts from <em>Far from the Spaceports<\/em> as well!<\/p>\n<figure id=\"attachment_694\" aria-describedby=\"caption-attachment-694\" style=\"width: 225px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"size-medium wp-image-694\" src=\"http:\/\/richardabbott.datascenesdev.com\/blog\/wp-content\/uploads\/2015\/11\/Kindle-Cover-450x600-225x300.png\" alt=\"Far from the Spaceports cover\" width=\"225\" height=\"300\" srcset=\"http:\/\/richardabbott.datascenesdev.com\/blog\/wp-content\/uploads\/2015\/11\/Kindle-Cover-450x600-225x300.png 225w, http:\/\/richardabbott.datascenesdev.com\/blog\/wp-content\/uploads\/2015\/11\/Kindle-Cover-450x600.png 450w\" sizes=\"auto, (max-width: 225px) 100vw, 225px\" \/><figcaption id=\"caption-attachment-694\" class=\"wp-caption-text\">Far from the Spaceports cover<\/figcaption><\/figure>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Today&#8217;s blog is primarily about the latest addition to book readings generated using Amazon&#8217;s Polly text-to-speech software, but before getting to that it&#8217;s worth saying goodbye to the Cassini space probe. This was launched nearly twenty years ago, has been orbiting Saturn and its moons since 2004, and is now almost out of fuel. 
By &hellip; <a href=\"http:\/\/richardabbott.datascenesdev.com\/blog\/index.php\/2017\/09\/12\/polly-and-half-sick-of-shadows\/\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">Polly and Half Sick of Shadows<\/span> <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[49,7,47,41],"tags":[],"class_list":["post-1555","post","type-post","status-publish","format-standard","hentry","category-alexa","category-extract","category-half-sick-of-shadows","category-science"],"_links":{"self":[{"href":"http:\/\/richardabbott.datascenesdev.com\/blog\/index.php\/wp-json\/wp\/v2\/posts\/1555","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/richardabbott.datascenesdev.com\/blog\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/richardabbott.datascenesdev.com\/blog\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/richardabbott.datascenesdev.com\/blog\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/richardabbott.datascenesdev.com\/blog\/index.php\/wp-json\/wp\/v2\/comments?post=1555"}],"version-history":[{"count":5,"href":"http:\/\/richardabbott.datascenesdev.com\/blog\/index.php\/wp-json\/wp\/v2\/posts\/1555\/revisions"}],"predecessor-version":[{"id":1561,"href":"http:\/\/richardabbott.datascenesdev.com\/blog\/index.php\/wp-json\/wp\/v2\/posts\/1555\/revisions\/1561"}],"wp:attachment":[{"href":"http:\/\/richardabbott.datascenesdev.com\/blog\/index.php\/wp-json\/wp\/v2\/media?parent=1555"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/richardabbott.datascenesdev.com\/blog\/index.php\/wp-json\/wp\/v2\/categories?post=1555"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/richardabbott.datascenesdev.com\/blog\/index.php\/wp-json\/wp\/v2\/t
ags?post=1555"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}