I thought that this week I would have a quick break from the Inklings, King Arthur, and such like, and report some space news which I came across a few days ago.
But first, an update on my latest Alexa skill – Polly Reads. This showcases the ability of Alexa’s “big sister”, Polly, to read text in multiple voices and accents. So this skill is a bit like a podcast, letting you step through a series of readings from my novels. Half Sick of Shadows is there, of course, plus some readings from Far from the Spaceports and Timing. So far the skill is available only on the UK Alexa Skills site, but it’s currently going through the approval process for other sites world-wide. Update on Wednesday morning: I just heard that it has gone live world-wide! Here is the Amazon US link.
Now the space news, and specifically about the asteroid Ceres (or dwarf planet if you prefer). Quite apart from its general interest, this news affects how we write about the outer solar system, so it is particularly relevant to my near-future series.
Many readers will know that the NASA Dawn spacecraft has been orbiting Ceres for some time now – nearly three years. This has provided us with some fascinating insights into the asteroid, especially the mountains on its surface, and the bright salt deposits found here and there. But the sheer length of time accumulated to date – something like 1500 orbits, at different elevations – means that we can now follow changes as they happen on the surface.
Now the very fact of change is something of a surprise. Not all that long ago, it was assumed that such small objects, made of rock and ice, had long since ceased to evolve. Any internal energy would have leaked away millennia ago, and the only reason for anything to happen would be if there was a collision with some other external object like a meteorite. We knew that the gas giant planets were active, with turbulent storms and hugely powerful prevailing winds, but the swarms of small rocky moons, asteroids, and dwarf planets were considered static.
But what Dawn has shown us is that this is wrong. Repeated views of the same parts of the surface show how areas of exposed ice are constantly growing and shrinking, even over just a few months. This could be because new water vapour is oozing out of surface cracks and then freezing, or alternatively because some layer of dust is slowly settling, and so exposing ice which was previously hidden. At this stage, we can’t tell for sure which of those (or some third explanation) is true.
The evidence now suggests that Ceres once had a liquid water ocean – most of this has frozen into a thick crust of ice, with visible mineral deposits scattered here and there.
Certainly Ceres – and presumably many other asteroids – is more active than we had presumed. Such members of our solar system remain chemically and geologically active, rather than being just inert lumps drifting passively around our sun. As and when we get out there to take a look, we’re going to find a great many more surprises. Meanwhile, we can always read about them…
A couple of days ago, a friend sent me an article talking about the present state of the art of chatbots – artificially intelligent assistants, if you like. The article focused on those few bots which are particularly convincing in terms of relationship.
Now, as regular readers will know, I quite often talk about the Alexa skills I develop. In fact I have also experimented with chatbots, using both Microsoft’s and Amazon’s frameworks. Both the coding style, and the flow of information and logic, are very similar between these two types of coding, so there’s a natural crossover. Alexa, of course, is predominantly a voice platform, whereas chatbots are more diverse. You can speak to, and listen to, bots, but they are more often encountered as part of a web page or mobile app.
Now, beyond the day job and my coding hobby, I also write fiction about artificially intelligent entities – the personas of Far from the Spaceports and related stories (Timing and the in-progress The Liminal Zone). Although I present these as occurring in the “near-future”, by which I mean vaguely some time in the next century or two, they are substantially more capable than what we have now. There’s a lot of marketing hype about AI, but also a lot of genuine excitement and undoubted advancement.
So, what are the main areas where tomorrow’s personas vastly exceed today’s chatbots?
First and foremost, a wide-ranging awareness of the context of a conversation and a relationship. Alexa skills and chatbots retain a modest amount of information during use, called session attributes, or context, depending on the platform you are using. So if the skill or bot doesn’t track through a series of questions, and remember your previous answers, that’s disappointing. The developer’s decision is not whether it is possible to remember, but rather how much to remember, and how to make appropriate use of it later on.
Equally, some things can be remembered from one session to the next. Previous interactions and choices can be carried over into the next time. Again, the questions are not how, but what should be preserved like this.
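The two kinds of memory described above – within a session, and across sessions – can be sketched in outline. This is a minimal illustration in plain Python, not a real framework API: the handler names are hypothetical, and a real skill would keep its persistent record in something like DynamoDB rather than an in-memory dict.

```python
# Minimal sketch of the two kinds of memory a skill or chatbot keeps.
# All names here are hypothetical, standing in for a platform's real store.

session_attributes = {}   # lives only for the current conversation
persistent_store = {}     # survives between sessions, keyed by user id

def handle_turn(user_id, answer):
    """Record an answer for this turn, and persist a small curated slice."""
    # Short-term: remember what was said earlier in this conversation.
    session_attributes.setdefault("answers", []).append(answer)
    # Long-term: carry over only what should survive to the next session.
    record = persistent_store.setdefault(user_id, {"visits": 0})
    record["last_answer"] = answer

def start_session(user_id):
    """At launch, pull the long-term record back in and greet accordingly."""
    session_attributes.clear()
    record = persistent_store.setdefault(user_id, {"visits": 0})
    record["visits"] += 1
    if record["visits"] > 1:
        return f"Welcome back! Last time you chose {record.get('last_answer')}."
    return "Hello, nice to meet you."
```

The developer's real decisions, as noted above, are in that "curated slice": how much to keep, and how to use it later.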
But… the volume of data you can carry over is limited – it’s fine for everyday purposes, but not when you get to wanting an intelligent and sympathetic individual to converse with. If this other entity is going to be convincing as a companion, it needs to retain knowledge of a lot more than just some past decisions.
Secondly, a real conversational partner does other things with their time outside of the chat specifically between the two of you. They might tell you about places, people, or things they had seen, or ideas that had occurred to them in the meantime. But currently, almost all skills and chatbots stay entirely dormant until you invoke them. In between times they do essentially nothing. I’m not counting cases where the same skill is activated by different people – “your” instance, meaning the one that holds any record of your personal interactions, simply waits for you to get involved again. The lack of any sense of independent life is a real drawback. Sure, Alexa can give you a “fact of the day” when you say hello, but we all know that this is just fished out of an internet list somewhere, and does not represent actual independent existence and experience.
Finally (for today – there are lots of other things that might be said) today’s skills and bots have a narrow focus. They can typically assist with just one task, or a cluster of closely related tasks. Indeed, at the current state of the art this is almost essential. The algorithms that seek to understand speech can only cope with a limited and quite structured set of options. If you write some code that tries to offer too wide a spectrum of choice, the chances are that the number of misunderstandings gets unacceptably high. To give the impression of talking with a real individual, the success rate needs to be pretty high, and the entity needs to have some way of clarifying and homing in on what it was that you really wanted.
Now, I’m quite optimistic about all this. The capabilities of AI systems have grown dramatically over the last few years, especially in the areas of voice comprehension and production. My own feeling is that some of the above problems are simply software ones, which will get solved with a bit more experience and effort. But others will probably need a creative rethink. I don’t imagine that I will be talking to a persona at Slate’s level in my lifetime, but I do think that I will be having much more interesting conversations with one before too long!
A follow-up to my earlier post this week, catching up on some more news. But first, here are a couple of snaps (one enlarged and annotated) I took in the early morning today as I walked to East Finchley tube station.
The Moon, Jupiter and Mars, annotated
The Moon, Jupiter and Mars
All very evocative, and leads nicely into my next link, which is a guest post I wrote for Lisl’s Before the Second Sleep blog. Naturally enough, the subject is one that really interests me – how will human settlements across the solar system adapt to and reflect the physical nature of the world they are set on?
In particular I look at Mars’ moon Phobos, both in the post and in Timing. So far as we can tell, Phobos is extremely fragile. Several factors cause this, including its original component parts, the closeness of its orbit to Mars, and the impact of whatever piece of space debris caused the giant crater Stickney. But whatever the cause… how might human society adapt to living on a moon where you can’t trust the ground below your feet? For the rest of the post, follow this link.
And also here’s a reminder of the Kindle Countdown offer on most of my books, and the Goodreads giveaway on Half Sick of Shadows. Here are the links…
Half Sick of Shadows is on Goodreads giveaway, with three copies to be won by the end of this coming weekend.
All the other books are on Kindle countdown deal at £0.99 or $0.99 if you are in the UK or US respectively – but once again only until the end of the weekend. Links for these are:
It’s been an exceptionally busy time at work recently, so I haven’t had time to write much. But happily, lots of other things are happening, so here’s a compendium of them.
First, Half Sick of Shadows was reviewed on Sruti’s Bookblog, with a follow-up interview. The links are: the review itself, plus the first and second half of the interview. “She wishes for people to value her but they seem to be changing and missing… She can see the world, but she always seemed curbed and away from everything.”
Secondly, right now there’s a whole lot of deals available on my novels, from oldest to newest. Half Sick of Shadows is on Goodreads giveaway, with three copies to be won by the end of next weekend.
All the other books are on Kindle countdown deal at £0.99 or $0.99 if you are in the UK or US respectively. Links for these are:
The second part of this quick review of the Future Decoded conference looks at things a little further ahead. This was also going to be the final part, but as there’s a lot of cool stuff to chat about, I’ve decided to add part 3…
So here’s a problem that is a minor one at the moment, but with the potential to grow into a major one. In short, the world has a memory shortage! Already we are generating more bits and bytes that we would like to store than we have capacity for. Right now it’s an inconvenience rather than a crisis, but year by year the gap between wish and actuality is growing. If growth in both these areas continues as at present, within a decade we will only be able to store about a third of what we want. A decade or so later that will drop to under one percent.
Think about it on the individual level. You take a short video clip while on holiday. It goes onto your phone. At some stage you back it up in Dropbox, or iCloud, or whatever your favourite provider is. Maybe you keep another copy on your local hard drive. Then you post it to Facebook and Google+. You send it to two different WhatsApp groups and email it to a friend. Maybe you’re really pleased with it and make a YouTube version. You now have ten copies of your 50 MB video… not to mention all the thumbnail images, cached and backup copies saved along the way by these various providers, which you’re almost certainly not aware of and have little control over. Your ten seconds of holiday fun has easily used 1 GB of the world’s supply of memory! For comparison, the entire Bible would fit in about 3 MB as plain uncompressed text, and at a wild guess you would need well under that 1 GB to store every last word of the world’s sacred literature. And a lot of us are generating holiday videos these days! Cyclists wear helmet cameras, cars have dash cams… and so on. We are generating prodigious amounts of imagery.
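The arithmetic behind that holiday-video figure is easy to check. The doubling factor for thumbnails, caches, and provider-side backups is a guess on my part, not a measurement:

```python
# Rough arithmetic for the holiday-video example: ten full copies of one clip,
# plus an assumed 2x overhead for thumbnails, caches, and backups made by the
# various services along the way.

CLIP_MB = 50
FULL_COPIES = 10        # phone, Dropbox, local drive, Facebook, Google+,
                        # two WhatsApp groups, email, YouTube...
OVERHEAD_FACTOR = 2     # assumed multiplier for derived and cached copies

total_mb = CLIP_MB * FULL_COPIES * OVERHEAD_FACTOR
print(f"{total_mb} MB, i.e. about {total_mb / 1000:.0f} GB")
```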
So one solution is that collectively we get more fussy about cleaning things up. You find yourself deleting the phone version when you’ve transferred it to Dropbox. You decide that a lower resolution copy will do for WhatsApp. Your email provider tells you that attachments will be archived or disposed of according to some schedule. Your blog allows you to reference a YouTube video in a link, rather than uploading yet another copy. Some clever people somewhere work out a better compression algorithm. But… even all these workarounds together will still not be enough to make up for the shortfall, if the projections are right.
Holiday snaps aside, a great deal of this vast growth in memory usage is because of emerging trends in computing. Face and voice recognition, image analysis, and other AI techniques which are now becoming mainstream use a great deal of stored information to train the models ready for use. Regular blog readers will know that I am particularly keen on voice assistants like Alexa. My own Alexa programming doesn’t use much memory, as the skills are quite modest and tolerably well written. But each and every time I make an Alexa request, that call goes off somewhere into the cloud, to convert what I said (the “utterance”) into what I meant (the “intent”). Alexa is pretty good at getting it right, which means that there is a huge amount of voice training data sitting out there being used to build the interpretive models. Exactly the same is true for Siri, Cortana, Google Home, and anyone else’s equivalent. Microsoft call this training area a “data lake”. What’s more, there’s not just one of them, but several, at different global locations to reduce signal lag.
Hopefully that’s given some idea of the problem. Before looking at the idea for a solution that was presented the other day, let’s think what that means for fiction writing. My AI persona Slate happily flits off to the asteroid belt with her human investigative partner Mitnash in Far from the Spaceports. In Timing, they drop back to Mars, and in the forthcoming Authentication Key they will get out to Saturn, but for now let’s stick to the asteroids. That means they’re anywhere from 15 to 30 minutes away from Earth by signal. Now, Slate does from time to time request specific information from the main hub Khufu on Earth, but necessarily this can only be for some detail not locally available. Slate can’t send a request down to London every time Mit says something, just so she can understand it. Trying to chat with up to an hour lag between statements would be seriously frustrating. So she has to carry with her all of the necessary data and software models that she needs for voice comprehension, speech, and defence against hacking, not to mention analysis, reasoning, and the capacity to feel emotion. Presupposing she has the equivalent of a data lake, she has to carry it with her. And that is simply not feasible with today’s technology.
So the research described the other day is exploring the idea of using DNA as the storage medium, rather than a piece of specially constructed silicon. DNA is very efficient at encoding data – after all, a sperm and egg together have all the necessary information to build a person. The problems are how to translate your original data source into the various chemical building blocks along a DNA helix, and conversely how to read it out again at some future time. There’s a publicly available technical paper describing all this. We were shown a short video which had been encoded, stored, and decoded using just this method. But it is fearfully expensive right now, so don’t expect to see a DNA external drive on your computer anytime soon!
The benefits purely in terms of physical space are colossal. The largest British data centre covers the equivalent of about eight soccer grounds (or four cricket pitches), using today’s technology. The largest global one is getting on for ten times that size. With DNA encoding, that all shrinks down to about a matchbox. For storytelling purposes that’s fantastic – Slate really is off to the asteroids and beyond, along with her data lake in plenty of local storage, which now takes up less room and weight than a spare set of underwear for Mit. Current data centres also use about the same amount of power as a small town (though because of judicious choice of technology they are much more ecologically efficient), but we’ll cross the power bridge another time.
However, I suspect that many of us might see ethical issues here. The presenter took great care to tell us that the DNA used was not from anything living, but had been manufactured from scratch for the purpose. No creatures had been harmed in the making of this video. But inevitably you wonder if all researchers would take this stance. Might a future scenario play out that some people are forced to sell – or perhaps donate – their bodies for storage? Putting what might seem a more positive spin on things, wouldn’t it seem convenient to have all your personal data stored, quite literally, on your person, and never entrusted to an external device at all? Right now we are a very long way from either of these possibilities, but it might be good to think about the moral dimensions ahead of time.
Either way, the starting problem – shortage of memory – is a real one, and collectively we need to find some kind of solution…
And for the curious, this is the video which was stored on and retrieved from DNA – regardless of storage method, it’s a fun and clever piece of filming (https://youtu.be/qybUFnY7Y8w)…
This is the third and final part of Left Behind by Events, in which I take a look at my own futuristic writing and try to guess which bits I will have got utterly wrong when somebody looks back at it from a future perspective! But it’s also the first of a few blogs in which I will talk a bit about some of the impressions I got of the technical near future as seen at the annual Microsoft Future Decoded conference that I went to the other day.
So I am tolerably confident about the development of AI. We don’t yet have what I call “personas” with autonomy, emotion, and gender. I’m not counting the pseudo-gender produced by selecting a male or female voice, though actually even that simple choice persuades many people – how many people are pedantic enough to call Alexa “it” rather than “she”? But at the rate of advance of the relevant technologies, I’m confident that we will get there.
I’m equally confident, being an optimistic guy, that we’ll develop better, faster space travel, and have settlements of various sizes on asteroids and moons. The ion drive I posit is one definite possibility: the Dawn asteroid probe already uses this system, though at a far smaller acceleration than I’m looking for. The Hermes, which features in both the book and film The Martian, also employs this drive type. If some other technology becomes available, the stories would be unchanged – the crucial point is that intra-solar-system travel takes weeks rather than months.
I am totally convinced that financial crime will take place! One of the ways we try to tackle it on Earth is to share information faster, so that criminals cannot take advantage of lags in the system to insert falsehoods. But out in the solar system, there’s nothing we can do about time lags. Mars is between 4 and 24 minutes from Earth in terms of a radio or light signal, and there’s nothing we can do about that unless somebody invents a faster-than-light signal. And that’s not in range of my future vision. So the possibility of “information friction” will increase as we spread our occupancy wider. Anywhere that there are delays in the system, there is the possibility of fraud… as used to great effect in The Sting.
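The Earth-Mars lag is easy to estimate from first principles. Using rounded circular-orbit figures gives roughly 4 to 21 minutes one way; the real orbits are elliptical, which stretches the far end out towards the larger figures sometimes quoted:

```python
# Back-of-envelope check on the Earth-Mars signal lag, using rounded
# circular-orbit distances. Real orbits are elliptical, so the true range
# is somewhat wider.

AU_KM = 149_597_871        # one astronomical unit in kilometres
LIGHT_KM_S = 299_792       # speed of light in km/s
MARS_AU = 1.52             # mean Mars-Sun distance in AU

closest_au = MARS_AU - 1.0   # Mars and Earth on the same side of the Sun
farthest_au = MARS_AU + 1.0  # on opposite sides of the Sun

def lag_minutes(distance_au: float) -> float:
    return distance_au * AU_KM / LIGHT_KM_S / 60

print(f"one-way lag: {lag_minutes(closest_au):.1f} "
      f"to {lag_minutes(farthest_au):.1f} minutes")
```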
Something I have not factored in at all is biological advance. I don’t have cyborgs, or genetically enhanced people, or such things. But I suspect that the likelihood is that such developments will occur well within the time horizon of Far from the Spaceports. Biology isn’t my strong suit, so I haven’t written about this. There’s a background assumption that illness isn’t a serious problem in this future world, but I haven’t explored how that might happen, or what other kinds of medical change might go hand-in-hand with it. So this is almost certainly going to be a miss on my part.
Moving on to points of contact with the conference, there is the question of my personas’ autonomy. Right now, all of our current generation of intelligent assistants – Alexa, Siri, Cortana, Google Home and so on – rely utterly on a reliable internet connection and a whole raft of cloud-based software to function. No internet or no cloud connection = no Alexa.
This is clearly inadequate for a persona like Slate heading out to the asteroid belt! Mitnash is obviously not going to wait patiently for half an hour or so between utterances in a conversation. For this to work, the software infrastructure that imparts intelligence to a persona has to travel along with it. Now this need is already emerging – and being addressed – right now. I guess most of us are familiar with the idea of the Cloud. Your Gmail account, your Dropbox files, your iCloud pictures all exist somewhere out there… but you neither know nor care where exactly they live. All you care is that you can get to them when you want.
But with the emerging “internet of things” that is having to change. Let’s say that a wildlife programme puts a trail camera up in the mountains somewhere in order to get pictures of a snow leopard. They want to leave it there for maybe four months and then collect it again. It’s well out of wifi range. In those four months it will capture say 10,000 short videos, almost all of which will not be of snow leopards. There will be mountain goats, foxes, mice, leaves, moving splashes of sunshine, flurries of rain or snow… maybe the odd yeti. But the memory stick will only hold say 500 video clips. So what do you do? Throw away everything that arrives after it gets full? Overwrite the oldest clips when you need to make space? Arrange for a dangerous and disruptive resupply trip by your mountaineer crew?
Or… and this is the choice being pursued at the moment… put some intelligence in your camera to try to weed out non-snow-leopard pictures. Your camera is no longer a dumb picture-taking device, but has some intelligence. It also makes your life easier when you have recovered the camera and are trying to scan through the contents. Even going through my Grasmere badger-cam vids every couple of weeks involves a lot of deleting scenes of waving leaves!
So this idea is now being called the Cloud Edge. You put some processing power and cleverness out in your peripheral devices, and only move what you really need into the Cloud itself. Some of the time, your little remote widgets can make up their own minds what to do. You can, so I am told, buy a USB stick with trainable neural network on it for sifting images (or other similar tasks) for well under £100. Now, this is a far cry from an independently autonomous persona able to zip off to the asteroid belt, but it shows that the necessary technologies are already being tackled.
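The trail-camera triage described above can be sketched very simply. The classifier here is a stub keyed off filenames purely for illustration – on a real edge device it would be a small trained neural network scoring each clip:

```python
# Sketch of edge triage for the trail-camera example: score each clip on the
# device and keep only the promising ones, up to the storage capacity.
# The scorer below is a stub; a real camera would run a trained model.

from typing import Callable, List

def triage(clips: List[str],
           score: Callable[[str], float],
           threshold: float = 0.8,
           capacity: int = 500) -> List[str]:
    """Keep clips scoring above the threshold, best first, up to capacity."""
    kept = [c for c in clips if score(c) >= threshold]
    kept.sort(key=score, reverse=True)
    return kept[:capacity]

def fake_score(clip: str) -> float:
    # Stand-in for a neural network: pretend filenames reveal the content.
    return 0.95 if "leopard" in clip else 0.1

clips = ["goat_01.mp4", "leopard_01.mp4", "leaves_02.mp4", "leopard_02.mp4"]
print(triage(clips, fake_score))
```

Only the clips that survive triage need ever be moved into the Cloud proper, which is the whole point of pushing intelligence out to the edge.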
I’ve been deliberately vague about how far into the future Far from the Spaceports, Timing, and the sequels in preparation are set. If I had to pick a time I’d say somewhere around the one or two century mark. Although science fact notoriously catches up with science fiction faster than authors imagine, I don’t expect to see much of this happening in my lifetime (which is a pity, really, as I’d love to converse with a real Slate). I’d like to think that humanity from one part of the globe or another would have settled bases on other planets, moons, or asteroids while I’m still here to see them, and as regular readers will know, I am very excited about where AI is going. But a century to reach the level of maturity of off-Earth habitats that I propose seems, if anything, over-optimistic.
That’s it for today – over the next few weeks I’ll be talking about other fun things I learned…
This is the first part of two, in which I look at the ways in which books show their age.
I read a lot of science fiction, and I watch a fair number of science fiction films and TV series. The newest addition is Star Trek Discovery, the latest offering in that very-long-running universe. For those who don’t know, it’s set in a time frame a few years before the original series (the one with Captain Kirk), and well after the series just called Enterprise.
Inevitably the new series has had a mixed reception, but I have enjoyed the first couple of episodes. But the thing I wanted to write about today was not the storyline, or the characters, but the presentation of technology. The bridge of the starship Shenzhou looked just like you’d imagine – lots of touch screen consoles, big displays showing not just some sensor data but also some interpretive stuff so you could make sense of it. And so on. It looked great – recognisable to us 21st century folk used to our own touch screen phones and the like, but futuristic enough that you knew you couldn’t just buy it all from Maplin.
But herein lies the problem. Look back at an old episode of the original series, and the Enterprise bridge looks really naff! I dare say that back in the 1960s it also gave the impression of “this is cool future stuff”, but it certainly doesn’t look as though it’s another decade or so on from the technological world of Discovery.
Basically, our ability to build cool gadgets has vastly outstripped the imagination of authors and film makers. Just about any old science fiction book suffers from this. You find computers on board spaceships which can think, carry out prodigiously complex calculations, and so on, but output their results on reams of printed paper. Once you start looking, you can find all manner of things like this.
Now, on one level this doesn’t matter at all. The story is the main thing, and most of us can put up with little failures of imagination about just how quickly actual invention and design would displace what seemed to be far-fetched ideas. On the whole we can forgive individual stories for their foibles. If it’s a good story, we don’t mind the punched-card inputs, paper-tape outputs, and so on. We accept that in the spirit that the author intended. Also, many authors are not so very interested in the mechanics of the story, or how feasible the science is, but in different dimensions. How might people react in particular circumstances? What are the moral dimensions involved? What aspects of the story resonate most strongly with present-day issues?
The particular problem that Discovery has is simply that it is part of a wider set of series, and we already thought we knew what the future looked like! A particular peril for any of us writing a series of books.
Now it’s not just science fiction that can be left behind by the march of events. Our view of history can change, and has changed, as new evidence comes to light. Casual assumptions that one generation makes about past societies, interactions, and chronology may be turned over a few years down the line. Sometimes we look at the ways in which older authors presented things and cringe. Historical fiction books might easily be overtaken by research and deeper understanding, just as much as science fiction. It’s a risk we all face.
Next time – some thoughts about my own science fiction series, Far from the Spaceports, and the particular things in that story that might get left behind. And also, the particular problems of writing about the near-future.
Just a short post today to highlight a YouTube video based around one of the Polly conversations from Timing that I have been talking about recently. This one is of Mitnash, Slate, Parvati and Chandrika talking on board Parvati’s spaceship, The Parakeet, en route to Phobos. The subject of conversation is the recent wreck of Selif’s ship on Tean, one of the smaller asteroids in the Scilly isles group…
The link is: https://youtu.be/Uv5L0yMKaT0
While we’re in YouTube, here is the link to the conversation with Alexa about Timing… https://youtu.be/zLHZSOF_9xo
It’s slow work, but gradually all these various conversations and readings will get added to YouTube and other video sharing sites.
Sometime in the next couple of weeks they’ll be uploaded to YouTube, but for now they are just audio links included below and on the appropriate blog page. You’ll find more about this below. In passing, there’s a small prize available for the first person who correctly spots what’s wrong with the voice selection for Chandrika! Also, and unrelated to that, you’ll hear that not all of the voices are equally successful. I shall continue to tweak them, so hopefully the quality will steadily improve.
But before that, NASA just released two YouTube videos to celebrate the two year anniversary of when the New Horizons probe was at nearest approach to Pluto and Charon. They have turned the collection of images and other telemetry into flyby simulations of the dwarf planet and its moon, as though you were manoeuvring over them. Both the colours and the vertical heights of surface features have been exaggerated so you can get a better sense of what you are seeing, but that aside, it’s as close as most of us will get to personally experiencing these places.
OK, back to Polly. As well as specifying which of several different voices you want, you can give Polly some metadata about the sentence to help generate correct pronunciation. Last week I talked about getting proper nouns correct, like Mitnash. But in English you also get lots of words which are spelled the same but pronounced differently – homographs. The one which I ran into was “minute”, which can either be a unit of time (min-nit) or something very small (my-newt). Another problem case I found was “produce” – was I expecting the noun form (prod-yuce) or the verb (pro-deuce)?
In all such cases, Polly tries to guess from context which you mean, but sometimes guesses wrong. Happily you can simply add some metadata to say which you want. Sometimes this is simply a matter of adding in a tag saying “I want the noun”. Other times you can say which of several alternate senses of the word you want, and simply check the underlying list until you find the right one. And if all else fails, there’s always the option of spelling it out phonetically…
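Concretely, the metadata takes the form of SSML’s `<w>` element with Amazon role extensions. The sketch below just builds the SSML string; the role names (amazon:NN, amazon:VB, amazon:SENSE_1) are real Polly extensions, but exactly which pronunciation each role selects for a given word is worth checking against the Polly documentation:

```python
# Build SSML that disambiguates homographs for Polly using the <w> tag.
# The role values are Polly's part-of-speech and alternate-sense extensions.

def tag_word(word: str, role: str) -> str:
    return f'<w role="{role}">{word}</w>'

def ssml(text: str) -> str:
    return f"<speak>{text}</speak>"

line = ssml(
    f"It took a {tag_word('minute', 'amazon:NN')} to "
    f"{tag_word('produce', 'amazon:VB')} a "
    f"{tag_word('minute', 'amazon:SENSE_1')} result."
)
print(line)
```

The resulting string is then submitted with TextType set to "ssml" rather than plain text.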
A couple of weeks ago I went to a day event put on by Amazon showcasing their web technologies. My own main interests were – naturally – in the areas of AI and voice, but there was plenty there if instead you were into security, or databases, or the so-called “internet of things”.
Readers of this blog will know of my enthusiasm for Alexa, and perhaps will also know about the range of Alexa skills I have been developing (if you’re interested, go to the UK or the US sites). So I thought I’d go a little bit more into both Alexa and the two building blocks which support Alexa – Lex for language comprehension, and Polly for text-to-speech generation.
Alexa does not in any substantial sense live inside your Amazon Echo or Dot – that simply provides the equivalent of your ears and mouth. Insofar as the phrase is appropriate, Alexa lives in the cloud, interacting with you by means of specific convenient devices. Indeed, Amazon are already moving the focus away from particular pieces of hardware, towards being able to access the technology from a very wide range of devices including web pages, phones, cars, your Kindle, and so on. When you interact with Alexa, the flow of information looks a bit like this (ignoring extra bits and pieces to do with security and such like).
And if you tease that apart a little bit then this is roughly how Lex and Polly fit in.
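The round trip can also be put in outline code form. Each stage below is a stub standing in for a cloud service – Lex-style comprehension on the way in, the skill’s own logic in the middle, Polly-style speech generation on the way out; the intent names and responses are invented for illustration:

```python
# Outline of an Alexa-style round trip. Each function is a stub standing in
# for a cloud service; the intent names and responses are invented.

def understand(utterance: str) -> str:
    """Lex's job: map what was said to an intent name."""
    if "weather" in utterance:
        return "GetWeatherIntent"
    return "FallbackIntent"

def handle(intent: str) -> str:
    """The skill's job: decide on a text response for the intent."""
    responses = {
        "GetWeatherIntent": "Rain is expected this afternoon.",
        "FallbackIntent": "Sorry, I didn't catch that.",
    }
    return responses[intent]

def speak(text: str) -> bytes:
    """Polly's job: turn response text into audio (stubbed here)."""
    return f"<audio:{text}>".encode()

# Device mic -> comprehension -> skill logic -> speech -> device speaker.
audio_out = speak(handle(understand("what's the weather like?")))
```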
So for today I want to look a bit more at the two “gateway” parts of the jigsaw – Lex and Polly. Lex is there to sort out what it is you want to happen – your intent – given what it is you said. Of course, given the newness of the system, every so often Lex gets it wrong. What entertains me is not so much those occasions when you get misunderstood, but the extremity of some people’s reaction to this. Human listeners make mistakes just like software ones do, but in some circles each and every failure case of Lex is paraded as showing that the technology is inherently flawed. In reality, it is simply under development. It will improve, but I don’t expect that it will ever get to 100% perfection, any more than people will.
Anyway, let’s suppose that Lex has correctly interpreted your intent. Then all kinds of things may happen behind the scenes, from simple list lookups through to complex analysis and decision-making. The details of that are up to the particular skill, and I’m not going to talk about that.
Instead, let’s see what happens on the way back to the user. The skill as a whole has decided on some spoken response. At the current state of the art, that response is almost certainly defined by the coder as a block of text, though one can imagine that in the future, a more intelligent and autonomous Alexa might decide for herself how to frame a reply. But however generated, that body of text has to be transformed into a stream of spoken words – and that is Polly’s job.
A standard Echo or Dot is set up to produce just one voice. There is a certain amount of configurability – pitch can be raised or lowered, the speed of speech altered, or the pronunciation of unusual words defined. But basically Alexa has a single voice when you use one of the dedicated gadgets to access her. But Polly has a lot more – currently 48 voices (18 male and 30 female), in 23 languages. Moreover, you can require that the speaker language and the written language differ, and so mimic a French person speaking English. Which is great if what you want to do is read out a section of a book, using different voices for the dialogue.
That’s just what I have been doing over the last couple of days, using Timing (Far from the Spaceports Book 2) as a test-bed. The results aren’t quite ready for this week, but hopefully by next week you can enjoy some snippets. Of course, I rapidly found that even 48 voices are not enough to do what you want. There is a shortage of some languages – in particular Middle Eastern and Asian voices are largely absent – but more will be added in time. One of the great things about Polly (speaking as a coder) is that switching between different voices is very easy, and adding in customised pronunciation is a breeze using a phonetic alphabet. Which is just as well. Polly does pretty well on “normal” words, but celestial bodies such as Phobos and Ceres are not, it seems, considered part of a normal vocabulary! Even the name Mitnash needed some coaxing to get it sounding how I wanted.
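A multi-voice reading along these lines can be sketched with boto3’s Polly client. Brian and Amy are real Polly voices; the IPA rendering of “Mitnash” is my own guess at the intended sound, and the dialogue lines are invented:

```python
# Sketch of multi-voice narration with Polly: each passage pairs a VoiceId
# with SSML, and the proper noun is pinned down with a <phoneme> tag.
# The IPA string is a guess at the pronunciation, not taken from the books.

MITNASH = '<phoneme alphabet="ipa" ph="ˈmɪtnæʃ">Mitnash</phoneme>'

passages = [
    ("Brian", f"<speak>{MITNASH} frowned at the console.</speak>"),
    ("Amy", "<speak>Something odd is going on with these accounts, Mit.</speak>"),
]

def synthesize(passages, out_prefix="reading"):
    """Render each passage to an mp3 file in its own voice."""
    import boto3  # requires AWS credentials configured locally
    polly = boto3.client("polly")
    for i, (voice, ssml_text) in enumerate(passages):
        response = polly.synthesize_speech(
            Text=ssml_text, TextType="ssml",
            VoiceId=voice, OutputFormat="mp3",
        )
        with open(f"{out_prefix}_{i:02d}.mp3", "wb") as f:
            f.write(response["AudioStream"].read())
```

Switching voices really is just a matter of changing one parameter per passage, which is why stitching a multi-voice reading together is so straightforward.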
The world of Far from the Spaceports and Timing (and the in-preparation Authentication Key) is one where the production of high quality and emotionally sensitive speech by artificial intelligences (personas in the books) is taken for granted. At present we are a very long way from that – Alexa is a very remote ancestor of Slate, if you like – but it’s nice to see the start of something emerging around us.