Category Archives: Alexa

Left behind by events, part 3

This is the third and final part of Left Behind by Events, in which I take a look at my own futuristic writing and try to guess which bits I will have got utterly wrong when somebody looks back at it from a future perspective! But it’s also the first of a few blogs in which I will talk a bit about some of the impressions I got of the technical near-future, as seen at the annual Microsoft Future Decoded conference that I went to the other day.

Amazon Dot – Active

So I am tolerably confident about the development of AI. We don’t yet have what I call “personas” with autonomy, emotion, and gender. I’m not counting the pseudo-gender produced by selecting a male or female voice, though actually even that simple choice persuades many people – how many people are pedantic enough to call Alexa “it” rather than “she”? But at the rate of advance of the relevant technologies, I’m confident that we will get there.

I’m equally confident, being an optimistic guy, that we’ll develop better, faster space travel, and have settlements of various sizes on asteroids and moons. The ion drive I posit is one definite possibility: the Dawn asteroid probe already uses this system, though at a hugely smaller rate of acceleration than what I’m looking for. The Hermes, which features in both the book and film The Martian, also employs this drive type. If some other technology becomes available, the stories would be unchanged – the crucial point is that intra-solar-system travel takes weeks rather than months.

The Sting (Pinterest)

I am totally convinced that financial crime will take place! One of the ways we try to tackle it on Earth is to share information faster, so that criminals cannot take advantage of lags in the system to insert falsehoods. But out in the solar system, time lags are unavoidable. Mars is roughly 3 to 22 minutes from Earth in terms of a radio or light signal, depending on where the two planets are in their orbits, and that will not change unless somebody invents a faster-than-light signal. And that’s not in range of my future vision. So the possibility of “information friction” will increase as we spread our occupancy wider. Anywhere that there are delays in the system, there is the possibility of fraud… as used to great effect in The Sting.
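As a back-of-envelope check on that signal lag, here is a small Python sketch. The two distances are round numbers I have assumed for a close approach and for Mars on the far side of the Sun; they are not precise orbital figures, and the results are only meant to show the scale of the delay.

```python
# Rough one-way signal delay between Earth and Mars.
SPEED_OF_LIGHT_KM_S = 299_792.458  # exact by definition

# Assumed round-number distances (illustrative, not precise ephemeris values).
MARS_NEAREST_KM = 55e6    # a good close approach
MARS_FARTHEST_KM = 400e6  # Mars on the far side of the Sun


def one_way_delay_minutes(distance_km: float) -> float:
    """Time for a radio or light signal to cover distance_km, in minutes."""
    return distance_km / SPEED_OF_LIGHT_KM_S / 60.0


for label, km in [("nearest", MARS_NEAREST_KM), ("farthest", MARS_FARTHEST_KM)]:
    one_way = one_way_delay_minutes(km)
    print(f"{label}: {one_way:.1f} min one way, {2 * one_way:.1f} min round trip")
```

With these assumptions the one-way lag comes out at a little over 3 minutes at best and a little over 22 at worst, and any question-and-answer exchange takes twice that.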

Something I have not factored in at all is biological advance. I don’t have cyborgs, or genetically enhanced people, or such things. But I suspect that the likelihood is that such developments will occur well within the time horizon of Far from the Spaceports. Biology isn’t my strong suit, so I haven’t written about this. There’s a background assumption that illness isn’t a serious problem in this future world, but I haven’t explored how that might happen, or what other kinds of medical change might go hand-in-hand with it. So this is almost certainly going to be a miss on my part.

Moving on to points of contact with the conference, there is the question of my personas’ autonomy. Right now, all of our current generation of intelligent assistants – Alexa, Siri, Cortana, Google Home and so on – depend utterly on a reliable internet connection and a whole raft of cloud-based software to function. No internet or no cloud connection = no Alexa.

This is clearly inadequate for a persona like Slate heading out to the asteroid belt! Mitnash is obviously not going to wait patiently for half an hour or so between utterances in a conversation. For this to work, the software infrastructure that imparts intelligence to a persona has to travel along with it. Now this need is already emerging – and being addressed – right now. I guess most of us are familiar with the idea of the Cloud. Your Gmail account, your Dropbox files, your iCloud pictures all exist somewhere out there… but you neither know nor care where exactly they live. All you care is that you can get to them when you want.

A male snow leopard (Wikipedia)

But with the emerging “internet of things” that is having to change. Let’s say that a wildlife programme puts a trail camera up in the mountains somewhere in order to get pictures of a snow leopard. They want to leave it there for maybe four months and then collect it again. It’s well out of wifi range. In those four months it will capture say 10,000 short videos, almost all of which will not be of snow leopards. There will be mountain goats, foxes, mice, leaves, moving splashes of sunshine, flurries of rain or snow… maybe the odd yeti. But the memory stick will only hold say 500 video clips. So what do you do? Throw away everything that arrives after it gets full? Overwrite the oldest clips when you need to make space? Arrange for a dangerous and disruptive resupply trip by your mountaineer crew?

Or… and this is the choice being pursued at the moment… put some intelligence in your camera to try to weed out non-snow-leopard pictures. Your camera is no longer a dumb picture-taking device, but has some intelligence. It also makes your life easier when you have recovered the camera and are trying to scan through the contents. Even going through my Grasmere badger-cam vids every couple of weeks involves a lot of deleting scenes of waving leaves!
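To make the idea concrete, here is a minimal Python sketch of one way a camera with limited storage might decide what to keep. It assumes some on-board classifier has already given each clip a "how likely is this a snow leopard" score between 0 and 1; that scorer, the class name `ClipStore`, and the clip names are all hypothetical. The camera then keeps only the best-scoring clips up to its capacity.

```python
import heapq
from typing import List, Tuple


class ClipStore:
    """Keep only the `capacity` highest-scoring clips seen so far."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        # Min-heap of (score, arrival order, clip name): the weakest
        # clip currently held is always at index 0.
        self._heap: List[Tuple[float, int, str]] = []
        self._seen = 0

    def offer(self, clip_name: str, score: float) -> bool:
        """Offer a new clip; return True if it was kept."""
        self._seen += 1
        entry = (score, self._seen, clip_name)
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, entry)
            return True
        if score > self._heap[0][0]:
            # Better than the weakest stored clip, so replace that one.
            heapq.heapreplace(self._heap, entry)
            return True
        return False

    def kept(self) -> List[str]:
        """Clip names, best score first."""
        return [name for _, _, name in sorted(self._heap, reverse=True)]


store = ClipStore(capacity=2)
for name, score in [("leaves", 0.1), ("leopard", 0.9), ("fox", 0.5), ("rain", 0.05)]:
    store.offer(name, score)
print(store.kept())
```

The design choice here is to throw away the least promising clip instead of the oldest or the newest, which is one answer to the "what do you do when the stick is full?" question above.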

So this idea is now being called the Cloud Edge. You put some processing power and cleverness out in your peripheral devices, and only move what you really need into the Cloud itself. Some of the time, your little remote widgets can make up their own minds what to do. You can, so I am told, buy a USB stick with a trainable neural network on it for sifting images (or other similar tasks) for well under £100. Now, this is a far cry from an independently autonomous persona able to zip off to the asteroid belt, but it shows that the necessary technologies are already being tackled.

Artist’s Impression of Dawn in orbit (NASA/JPL)

I’ve been deliberately vague about how far into the future Far from the Spaceports, Timing, and the sequels in preparation are set. If I had to pick a time I’d say somewhere around the one or two century mark. Although science fact notoriously catches up with science fiction faster than authors imagine, I don’t expect to see much of this happening in my lifetime (which is a pity, really, as I’d love to converse with a real Slate). I’d like to think that humanity from one part of the globe or another would have settled bases on other planets, moons, or asteroids while I’m still here to see them, and as regular readers will know, I am very excited about where AI is going. But a century to reach the level of maturity of off-Earth habitats that I propose seems, if anything, over-optimistic.

That’s it for today – over the next few weeks I’ll be talking about other fun things I learned…

Polly and Half Sick of Shadows

Saturn, from Cassini (NASA)

Today’s blog is primarily about the latest addition to book readings generated using Amazon’s Polly text-to-speech software, but before getting to that it’s worth saying goodbye to the Cassini space probe. This was launched nearly twenty years ago, has been orbiting Saturn and its moons since 2004, and is now almost out of fuel. By the end of the week, following a deliberate course change to avoid polluting any of the moons, Cassini will impact Saturn and break up in the atmosphere there.

So, Half Sick of Shadows and Polly. Readers of this blog, or the Before the Second Sleep blog (first post and second post) will know that I have been using Amazon’s Polly technology to generate book readings. The previous set were for the science fiction book Timing, Far from the Spaceports 2. Today it is the turn of Half Sick of Shadows.

Without further ado, and before getting to some technical stuff, here is the result. It’s a short extract from late on in the book, and I selected it specifically because there are several speakers.

OK. Polly is a variation of the text-to-speech capability seen in Amazon Alexa, but with a couple of differences. First, it is geared purely to voice output, rather than the mix of input and output needed for Alexa to work.

Kindle Cover – Half Sick of Shadows

Secondly, Polly allows a range of gender, voice and language, not just the fixed voice of Alexa. The original intention was to provide multi-language support in various computer or mobile apps, but it suits me very well for representing narrative and dialogue. For this particular reading I have used four different voices.

If you want to set up your own experiment, you can go to this link and start to play. You’ll need to set up some login credentials to get there, but you can extend your regular Amazon ones to do this. This demo page allows you to select which voice you want and enter any desired text. You can even download the result if you want.

Amazon Polly test console

But the real magic starts when you select the SSML tab, and enter more complex examples. SSML is an industry standard way of describing speech, and covers a whole wealth of variations. You can add what are effectively stage directions with it – pauses of different lengths, directions about parts of speech, emphasis, and (if necessary) a phonetic letter by letter description. You can speed up or slow down the reading, and raise or lower the pitch. Finally, and even more usefully for my purposes, you can select the spoken language as well as the language of the speaker. So you can have an Italian speaker pronouncing an English sentence, or vice versa. Since all my books are written in English, that means I can considerably extend the range of speakers. Some combinations don’t work very well, so you have to test what you have specified, but that’s fair enough.
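For anyone who wants to see what such "stage directions" look like, here is a small illustrative sketch in Python. The SSML tags (`break`, `emphasis`, `prosody`, `lang`, `phoneme`) are standard ones that Polly documents, but the sentences are invented for the example and are not lines from any of the books. The phonetic spelling of Phobos is my own guess, and some tags (emphasis and pitch, for instance) may not be honoured by every voice. The snippet only builds the text and checks that it is well-formed XML; it does not call Polly.

```python
import xml.etree.ElementTree as ET

# An SSML fragment of the kind you could paste into the console's SSML tab.
ssml = (
    "<speak>"
    # Slower and lower for narration.
    '<prosody rate="slow" pitch="-10%">The mirror darkened.</prosody>'
    # An explicit pause of a chosen length.
    '<break time="700ms"/>'
    'She had <emphasis level="strong">never</emphasis> seen that before.'
    # A pause described by strength instead of duration.
    '<break strength="strong"/>'
    # Mark the language of the text, whatever the speaker's own language is.
    '<lang xml:lang="en-GB">Good evening, and welcome.</lang>'
    " The ship was bound for "
    # Spell out a tricky word phonetically (IPA guess, adjust to taste).
    '<phoneme alphabet="ipa" ph="ˈfoʊbɒs">Phobos</phoneme>.'
    "</speak>"
)

# Cheap sanity check: a mistyped tag raises a ParseError here,
# before any text is sent off for synthesis.
root = ET.fromstring(ssml)
print(root.tag, [child.tag for child in root])
```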

If you’re comfortable with the coding effort required, you can call the Polly libraries with all the necessary settings and generate a whole lot of text all at once, rather than piecemeal. Back when I put together the Timing extracts, I wrote a program which was configurable enough that now I just have to specify the text concerned, plus the selection of voices and other sundry details. It still takes a little while to select the right passage and get everything organised, but it’s a lot easier than starting from scratch every time. Before too much longer, there’ll be dialogue extracts from Far from the Spaceports as well!
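This is not my actual program, but a minimal sketch of the general shape such a thing might take. It assumes Python with the `boto3` library, whose Polly client has a `synthesize_speech` call returning an `AudioStream`. The voice names are real Polly voices picked purely for illustration, the lines of dialogue are placeholders, and simply joining the MP3 chunks end to end is a shortcut that may or may not suit your audio player.

```python
from typing import Any, Dict, List, Tuple

Segment = Tuple[str, str]  # (Polly voice name, SSML body without <speak>)


def build_requests(segments: List[Segment]) -> List[Dict[str, str]]:
    """Turn (voice, text) pairs into keyword arguments for synthesize_speech."""
    return [
        {
            "Text": f"<speak>{body}</speak>",
            "TextType": "ssml",
            "VoiceId": voice,
            "OutputFormat": "mp3",
        }
        for voice, body in segments
    ]


def synthesize_all(polly_client: Any, requests: List[Dict[str, str]]) -> bytes:
    """Call Polly once per segment and join the audio in script order."""
    audio = bytearray()
    for kwargs in requests:
        response = polly_client.synthesize_speech(**kwargs)
        audio.extend(response["AudioStream"].read())
    return bytes(audio)


# Placeholder script: narrator plus two speakers (illustrative voices and text).
script: List[Segment] = [
    ("Brian", "She turned to the mirror."),
    ("Amy", 'What do you see?<break time="400ms"/>'),
    ("Emma", "Only shadows."),
]
requests = build_requests(script)

# With AWS credentials configured, something like this would produce the file:
#   import boto3
#   audio = synthesize_all(boto3.client("polly"), requests)
#   open("reading.mp3", "wb").write(audio)
```

Keeping the script as plain data is what makes a new reading mostly a matter of choosing the passage and assigning voices.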

Far from the Spaceports cover

Mostly about YouTube

Just a short post today to highlight a YouTube video based around one of the Polly conversations from Timing that I have been talking about recently. This one is of Mitnash, Slate, Parvati and Chandrika talking on board Parvati’s spaceship, The Parakeet, en route to Phobos. The subject of conversation is the recent wreck of Selif’s ship on Tean, one of the smaller asteroids in the Scilly Isles group…

The link is: https://youtu.be/Uv5L0yMKaT0

While we’re in YouTube, here is the link to the conversation with Alexa about Timing… https://youtu.be/zLHZSOF_9xo

It’s slow work, but gradually all these various conversations and readings will get added to YouTube and other video sharing sites.

More about AI and voice technology

A couple of weeks ago I went to a day event put on by Amazon showcasing their web technologies. My own main interests were – naturally – in the areas of AI and voice, but there was plenty there if instead you were into security, or databases, or the so-called “internet of things”.

Amazon Dot – Active

Readers of this blog will know of my enthusiasm for Alexa, and perhaps will also know about the range of Alexa skills I have been developing (if you’re interested, go to the UK or the US sites). So I thought I’d go a little bit more into both Alexa and the two building blocks which support Alexa – Lex for language comprehension, and Polly for text-to-speech generation.

Alexa does not in any substantial sense live inside your Amazon Echo or Dot – that simply provides the equivalent of your ears and mouth. Insofar as the phrase is appropriate, Alexa lives in the cloud, interacting with you by means of specific convenient devices. Indeed, Amazon are already moving the focus away from particular pieces of hardware, towards being able to access the technology from a very wide range of devices including web pages, phones, cars, your Kindle, and so on. When you interact with Alexa, the flow of information looks a bit like this (ignoring extra bits and pieces to do with security and such like).

Alexa information flows (simplified)

And if you tease that apart a little bit then this is roughly how Lex and Polly fit in.

Lex and Polly information flows (simplified)

So for today I want to look a bit more at the two “gateway” parts of the jigsaw – Lex and Polly. Lex is there to sort out what it is you want to happen – your intent – given what it is you said. Of course, given the newness of the system, every so often Lex gets it wrong. What entertains me is not so much those occasions when you get misunderstood, but the extremity of some people’s reaction to this. Human listeners make mistakes just like software ones do, but in some circles each and every failure case of Lex is paraded as showing that the technology is inherently flawed. In reality, it is simply under development. It will improve, but I don’t expect that it will ever get to 100% perfection, any more than people will.

Anyway, let’s suppose that Lex has correctly interpreted your intent. Then all kinds of things may happen behind the scenes, from simple list lookups through to complex analysis and decision-making. The details of that are up to the particular skill, and I’m not going to talk about that.

Instead, let’s see what happens on the way back to the user. The skill as a whole has decided on some spoken response. At the current state of the art, that response is almost certainly defined by the coder as a block of text, though one can imagine that in the future, a more intelligent and autonomous Alexa might decide for herself how to frame a reply. But however generated, that body of text has to be transformed into a stream of spoken words – and that is Polly’s job.
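As a rough illustration of "the coder defines a block of text", here is a minimal Python sketch of a skill handler. The overall request and response shapes (an `IntentRequest` carrying an intent name, and a reply containing `outputSpeech`) follow the Alexa custom skill JSON format, but the intent names and the reply texts are hypothetical and are not taken from my own skills.

```python
from typing import Any, Dict

# Hypothetical intent names mapped to canned replies written by the coder.
RESPONSES = {
    "AboutBookIntent": "Timing is the second book in the Far from the Spaceports series.",
    "AboutAuthorIntent": "The author writes both historical and science fiction.",
}
WELCOME = "Welcome. You can ask me about the book or the author."
FALLBACK = "Sorry, I did not understand that."


def handle_request(event: Dict[str, Any]) -> Dict[str, Any]:
    """Pick a block of text for the recognised intent and wrap it for speech."""
    request = event.get("request", {})
    if request.get("type") == "IntentRequest":
        intent_name = request.get("intent", {}).get("name", "")
        text = RESPONSES.get(intent_name, FALLBACK)
    else:
        # For example a LaunchRequest, when the skill is opened with no question.
        text = WELCOME
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": True,
        },
    }


example = {"request": {"type": "IntentRequest", "intent": {"name": "AboutBookIntent"}}}
print(handle_request(example)["response"]["outputSpeech"]["text"])
```

Everything before this function is the comprehension side, working out which intent was meant; everything after it is the text-to-speech side, turning the chosen text into sound.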

A standard Echo or Dot is set up to produce just one voice. There is a certain amount of configurability – pitch can be raised or lowered, the speed of speech altered, or the pronunciation of unusual words defined – but essentially Alexa has a single voice when you use one of the dedicated gadgets to access her. Polly has a lot more: currently 48 voices (18 male and 30 female), in 23 languages. Moreover, you can require that the speaker language and the written language differ, and so mimic a French person speaking English. That is great if what you want to do is read out a section of a book, using different voices for the dialogue.

Timing Kindle cover

That’s just what I have been doing over the last couple of days, using Timing (Far from the Spaceports Book 2) as a test-bed. The results aren’t quite ready for this week, but hopefully by next week you can enjoy some snippets. Of course, I rapidly found that even 48 voices are not enough to do what you want. There is a shortage of some languages – in particular Middle Eastern and Asian voices are largely absent – but more will be added in time. One of the great things about Polly (speaking as a coder) is that switching between different voices is very easy, and adding in customised pronunciation is a breeze using a phonetic alphabet. Which is just as well. Polly does pretty well on “normal” words, but celestial bodies such as Phobos and Ceres are not, it seems, considered part of a normal vocabulary! Even the name Mitnash needed some coaxing to get it sounding how I wanted.

The world of Far from the Spaceports and Timing (and The Authentication Key, currently in preparation) is one where the production of high quality and emotionally sensitive speech by artificial intelligences (personas in the books) is taken for granted. At present we are a very long way from that – Alexa is a very remote ancestor of Slate, if you like – but it’s nice to see the start of something emerging around us.

Friday June 30th was International Asteroid Day!

Artist’s impression of asteroid (NASA/JPL)

And no, I hadn’t realised this myself until a couple of days before… but NASA and others around the world had a day’s focus on asteroids. Now, to be sure most of that focus was looking at the thorny question of Near Earth Objects, both asteroids and comets, and what we might be able to do if one was on a collision course.

Far from the Spaceports cover

But it seemed to me that this was as good a time as any to celebrate my fictional Scilly Isle asteroids, as described in Far from the Spaceports and Timing (and the work in progress provisionally called The Authentication Key). In those stories, human colonies have been established on some of the asteroids, and indeed on sundry planets and moons. These settlements have gone a little beyond mining stations and are now places that people call home. A scenario well worth remembering on International Asteroid Day!

Kindle Cover – Half Sick of Shadows

While on the subject of books, some lovely reviews for Half Sick of Shadows have been coming in.

Hoover Reviews said:
“The inner turmoil of The Lady, as she struggles with the Mirror to gain access to the people she comes in contact with, drives the tale as the Mirror cautions her time and again about the dangers involved.  The conclusion of the tale, though a heart rending scene, is also one of hope as The Lady finally finds out who she is.”

The Review said:
“Half Sick of Shadows is in a genre all its own, a historical fantasy with some science fiction elements and healthy dose of mystery, it is absolutely unique and a literary sensation. Beautifully written, with an interesting storyline and wonderful imagery, it is in a realm of its own – just like the Lady of Shalott… It truly is mesmerising.”

Find out for yourself at Amazon.co.uk or Amazon.com.

Half Sick of Shadows Alexa skill icon

Or chat about the book with Alexa by enabling the skill at the UK or US stores.

Language and pronunciation

Half Sick of Shadows Alexa skill icon

I’ve been thinking these last few days, once again, about language and pronunciation. This was triggered by working on some more Alexa skills to do with my books. For those who don’t know, I have such things already in place for Half Sick of Shadows, Far from the Spaceports, and Timing. That leaves the Bronze Age series set in Kephrath, in the hill country of Canaan. And here I ran into a problem. Alexa does pretty well with contemporary names – I did have a bit of difficulty with getting her to pronounce “Mitnash” correctly, but solved that simply by changing the spelling of the text I supplied. If instead of Mitnash I wrote Mitt-nash, the text-to-speech engine had enough clues to work out what I meant.

So far so good, but you can only go part of the way down that road. You can’t keep fiddling around with weird spellings just to trick the code into doing what you want. Equally, it’s hardly reasonable to suppose that the Alexa coding team would have considered how to pronounce ancient Canaanite or Egyptian names. Sure enough the difficulties multiplied with the older books. Even “Kephrath” came out rather mangled, and things went downhill from there.
Amazon Dot – Inactive

So I took a step back, did some investigation, and found that you can define the pronunciation of unusual words by using symbols from the phonetic alphabet. Instead of trying to guess how Alexa might pronounce Giybon, or Makty-Rasut, or Ikaret, I can simply work out what symbols I need for the consonants and vowels, and provide these details in a specific format. Instead of Mitnash, I write mɪt.næʃ. Ikaret becomes ˈIk.æ.ˌɹɛt.
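The "specific format" is the SSML `phoneme` tag. Here is a minimal Python sketch of wrapping one name in it. The helper name `phoneme` is my own for this example, and the surrounding sentence is invented; the phonetic spelling is the one for Mitnash given above.

```python
from xml.sax.saxutils import escape, quoteattr


def phoneme(word: str, ipa: str) -> str:
    """Wrap a word in an SSML phoneme tag carrying its IPA pronunciation."""
    # quoteattr adds the surrounding quotes and escapes anything awkward.
    return f'<phoneme alphabet="ipa" ph={quoteattr(ipa)}>{escape(word)}</phoneme>'


reply = f"<speak>{phoneme('Mitnash', 'mɪt.næʃ')} looked up from the screen.</speak>"
print(reply)
```

The written word stays in the text, so anything that displays the reply still shows "Mitnash"; only the spoken form changes.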

So that solved the immediate problem, and over the next few days my Alexa skills for In a Milk and Honeyed Land, Scenes from a Life, and The Flame Before Us will be going live. Being slightly greedy about such things, of course I now want more! Ideally I want the ability to set up a pronunciation dictionary, so that I can just set up a list of standard pronunciations that Alexa can tap into at need – rather like having a custom list of words for a spelling checker. Basically, I want to be able to teach Alexa how to pronounce new words that aren’t in the out-of-the-box setup. I suspect that such a thing is not too far away, since I can hardly be the only person to come across this. In just about every specialised area of interest there are words which aren’t part of everyday speech.
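Until something like that exists on the Alexa side, a do-it-yourself version is not hard to imagine. The following Python sketch is purely an assumption about how one might do it, not a feature of Alexa: it keeps a small dictionary of names and phonetic spellings (the two entries are copied from the examples above) and wraps every occurrence in a reply with a `phoneme` tag before the text is handed over for speech.

```python
import re
from typing import Dict
from xml.sax.saxutils import escape, quoteattr

# Home-made pronunciation dictionary: written form -> phonetic spelling.
PRONUNCIATIONS: Dict[str, str] = {
    "Mitnash": "mɪt.næʃ",
    "Ikaret": "ˈIk.æ.ˌɹɛt",
}


def apply_pronunciations(text: str, lexicon: Dict[str, str]) -> str:
    """Return SSML in which every word found in the lexicon is phoneme-tagged."""
    if not lexicon:
        return f"<speak>{escape(text)}</speak>"
    # Longest names first, so a short name never pre-empts a longer one.
    words = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(w) for w in words) + r")\b")
    parts = []
    last = 0
    for match in pattern.finditer(text):
        parts.append(escape(text[last:match.start()]))
        word = match.group(1)
        parts.append(
            f'<phoneme alphabet="ipa" ph={quoteattr(lexicon[word])}>'
            f"{escape(word)}</phoneme>"
        )
        last = match.end()
    parts.append(escape(text[last:]))
    return "<speak>" + "".join(parts) + "</speak>"


print(apply_pronunciations("Mitnash met Ikaret at the gate.", PRONUNCIATIONS))
```

The list then lives in one place, much like a custom word list for a spelling checker, and each new skill can share it.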

Amazon Dot – Active

But also, this brought me into contact with the perennial issue of UK and US pronunciation. Sure, a particular phonetic symbol means whatever it means, but the examples of typical words vary considerably. As a Brit, I just don’t pronounce some words the same as my American friends, so there has to be a bit of educated guesswork going into deciding what sound I’m hoping for. Of course it’s considerably more complicated than just two nations – within those two there are also large numbers of regional and cultural shifts. And of course there are plenty of countries which use English but sound quite different to either “standard British” or “standard American”.

That’s for some future, yet to be invented, dialect-aware Alexa! Right now it’s enough to code for two variations, and rely on the fact that the standard forms are recognisable enough to get by. But wouldn’t it be cool to be able to insert some extra tags into dialogue in order to get one character’s speech as – say – Cumbrian, and another as from Somerset?
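Coding for two variations can be as simple as a lookup keyed on the locale that comes with each request. In the Python sketch below, the `locale` field (with values such as `en-GB` and `en-US`) is part of the standard Alexa request. The example word and its two phonetic spellings are my own illustration of a familiar UK/US difference, so treat them as approximate.

```python
from typing import Any, Dict

DEFAULT_LOCALE = "en-GB"

# Illustrative entry only: one word, two standard pronunciations.
VARIANTS: Dict[str, Dict[str, str]] = {
    "schedule": {"en-GB": "ˈʃɛdjuːl", "en-US": "ˈskɛdʒuːl"},
}


def locale_of(event: Dict[str, Any]) -> str:
    """Read the locale from an Alexa request, falling back to the default."""
    return event.get("request", {}).get("locale", DEFAULT_LOCALE)


def ipa_for(word: str, locale: str) -> str:
    """Pick the phonetic spelling for this locale, or the default one."""
    by_locale = VARIANTS[word]
    return by_locale.get(locale, by_locale[DEFAULT_LOCALE])


event = {"request": {"type": "IntentRequest", "locale": "en-US"}}
print(ipa_for("schedule", locale_of(event)))
```

A dialect-aware version would need far more than two keys, but the lookup itself would not have to change.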