Sometime in the next couple of weeks they’ll be uploaded to YouTube, but for now they are just audio links included below and on the appropriate blog page. You’ll find more about this below. In passing, there’s a small prize available for the first person who correctly spots what’s wrong with the voice selection for Chandrika! Also, and unrelated to that, you’ll hear that not all of the voices are equally successful. I shall continue to tweak them, so hopefully the quality will steadily improve.
But before that, NASA just released two YouTube videos to celebrate the two-year anniversary of the New Horizons probe's closest approach to Pluto and Charon. They have turned the collection of images and other telemetry into flyby simulations of the dwarf planet and its moon, as though you were manoeuvring over them. Both the colours and the vertical heights of surface features have been exaggerated so you can get a better sense of what you are seeing, but that aside, it’s as close as most of us will get to personally experiencing these places.
OK, back to Polly. As well as specifying which of several different voices you want, you can give Polly some metadata about the sentence to help generate correct pronunciation. Last week I talked about getting proper nouns correct, like Mitnash. But in English you also get lots of words which are spelled the same but pronounced differently – heteronyms. The one which I ran into was “minute”, which can either be a unit of time (min-nit) or something very small (my-newt). Another problem case I found was “produce” – was I expecting the noun form (prod-yuce) or the verb (pro-deuce)?
In all such cases, Polly tries to guess from context which you mean, but sometimes guesses wrong. Happily you can simply add some metadata to say which you want. Sometimes this is simply a matter of adding in a tag saying “I want the noun”. Other times you can say which of several alternate senses of the word you want, and simply check the underlying list until you find the right one. And if all else fails, there’s always the option of spelling it out phonetically…
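To make this concrete, here is a small Python sketch of the kind of SSML markup Polly accepts for this. The `<w role="...">` and `<phoneme>` elements are Polly's documented mechanisms; the example sentences, and the IPA spelling for Mitnash, are my own illustrations rather than anything from the actual books' audio.

```python
def tag_word(word, role):
    """Wrap a word in an SSML <w> tag carrying a part-of-speech or sense hint."""
    return f'<w role="{role}">{word}</w>'

def phoneme(word, ph, alphabet="ipa"):
    """Last resort: spell a word out phonetically when no role hint is enough."""
    return f'<phoneme alphabet="{alphabet}" ph="{ph}">{word}</phoneme>'

# "minute" as a unit of time (noun role) versus "minute" as tiny (alternate sense):
time_sense = f"<speak>Wait one {tag_word('minute', 'amazon:NN')}.</speak>"
tiny_sense = f"<speak>A {tag_word('minute', 'amazon:SENSE_1')} speck.</speak>"

# And a phonetic spelling for a proper noun (the IPA here is illustrative):
name = f"<speak>{phoneme('Mitnash', 'ˈmɪtnæʃ')}</speak>"

print(time_sense)
print(tiny_sense)
print(name)
```

The resulting `<speak>` block is what you would hand to Polly as SSML rather than plain text.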
A couple of weeks ago I went to a day event put on by Amazon showcasing their web technologies. My own main interests were – naturally – in the areas of AI and voice, but there was plenty there if instead you were into security, or databases, or the so-called “internet of things”.
Readers of this blog will know of my enthusiasm for Alexa, and perhaps will also know about the range of Alexa skills I have been developing (if you’re interested, go to the UK or the US sites). So I thought I’d go a little bit more into both Alexa and the two building blocks which support Alexa – Lex for language comprehension, and Polly for text-to-speech generation.
Alexa does not in any substantial sense live inside your Amazon Echo or Dot – that simply provides the equivalent of your ears and mouth. Insofar as the phrase is appropriate, Alexa lives in the cloud, interacting with you by means of specific convenient devices. Indeed, Amazon are already moving the focus away from particular pieces of hardware, towards being able to access the technology from a very wide range of devices including web pages, phones, cars, your Kindle, and so on. When you interact with Alexa, the flow of information looks a bit like this (ignoring extra bits and pieces to do with security and such like).
And if you tease that apart a little bit then this is roughly how Lex and Polly fit in.
So for today I want to look a bit more at the two “gateway” parts of the jigsaw – Lex and Polly. Lex is there to sort out what it is you want to happen – your intent – given what it is you said. Of course, given the newness of the system, every so often Lex gets it wrong. What entertains me is not so much those occasions when you get misunderstood, but the extremity of some people’s reaction to this. Human listeners make mistakes just like software ones do, but in some circles each and every failure case of Lex is paraded as showing that the technology is inherently flawed. In reality, it is simply under development. It will improve, but I don’t expect that it will ever get to 100% perfection, any more than people will.
Anyway, let’s suppose that Lex has correctly interpreted your intent. Then all kinds of things may happen behind the scenes, from simple list lookups through to complex analysis and decision-making. The details of that are up to the particular skill, and I’m not going to talk about that.
Instead, let’s see what happens on the way back to the user. The skill as a whole has decided on some spoken response. At the current state of the art, that response is almost certainly defined by the coder as a block of text, though one can imagine that in the future, a more intelligent and autonomous Alexa might decide for herself how to frame a reply. But however generated, that body of text has to be transformed into a stream of spoken words – and that is Polly’s job.
A standard Echo or Dot is set up to produce just one voice. There is a certain amount of configurability – pitch can be raised or lowered, the speed of speech altered, or the pronunciation of unusual words defined. But basically Alexa has a single voice when you use one of the dedicated gadgets to access her. But Polly has a lot more – currently 48 voices (18 male and 30 female), in 23 languages. Moreover, you can require that the speaker language and the written language differ, and so mimic a French person speaking English. Which is great if what you want to do is read out a section of a book, using different voices for the dialogue.
That’s just what I have been doing over the last couple of days, using Timing (Far from the Spaceports Book 2) as a test-bed. The results aren’t quite ready for this week, but hopefully by next week you can enjoy some snippets. Of course, I rapidly found that even 48 voices are not enough to do what you want. There is a shortage of some languages – in particular Middle Eastern and Asian voices are largely absent – but more will be added in time. One of the great things about Polly (speaking as a coder) is that switching between different voices is very easy, and adding in customised pronunciation is a breeze using a phonetic alphabet. Which is just as well. Polly does pretty well on “normal” words, but celestial bodies such as Phobos and Ceres are not, it seems, considered part of a normal vocabulary! Even the name Mitnash needed some coaxing to get it sounding how I wanted.
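For the curious, switching voices really is as easy as changing one parameter per request. Here is a minimal sketch of how dialogue might be routed to different Polly voices, assuming the text has already been split into narration and quoted speech. The voice names are real Polly VoiceIds, but the character-to-voice mapping is purely my own illustration, not the casting I actually used.

```python
# Map each speaker to a Polly VoiceId; anything unrecognised falls back
# to the narrator's voice.
VOICES = {"narrator": "Brian", "Mitnash": "Joey", "Slate": "Salli"}

def assign_voices(segments):
    """Turn (speaker, text) pairs into (VoiceId, text) pairs.

    Each resulting pair would become a separate Polly synthesis request,
    with the chosen VoiceId passed alongside the text.
    """
    return [(VOICES.get(who, VOICES["narrator"]), text)
            for who, text in segments]

story = [
    ("narrator", "Mitnash looked at the readout."),
    ("Mitnash", "That can't be right."),
    ("Slate", "It is, though."),
]

for voice, line in assign_voices(story):
    print(voice, ":", line)
```

The audio clips then just need concatenating in order, which is where the ease of switching voices pays off.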
The world of Far from the Spaceports and Timing (and the in-preparation Authentication Key) is one where the production of high-quality and emotionally sensitive speech by artificial intelligences (personas in the books) is taken for granted. At present we are a very long way from that – Alexa is a very remote ancestor of Slate, if you like – but it’s nice to see the start of something emerging around us.
And no, I hadn’t realised this myself until a couple of days before… but NASA and others around the world had a day’s focus on asteroids. Now, to be sure, most of that focus was on the thorny question of Near Earth Objects, both asteroids and comets, and what we might be able to do if one was on a collision course.
But it seemed to me that this was as good a time as any to celebrate my fictional Scilly Isle asteroids, as described in Far from the Spaceports and Timing (and the work in progress provisionally called The Authentication Key). In those stories, human colonies have been established on some of the asteroids, and indeed on sundry planets and moons. These settlements have gone a little beyond mining stations and are now places that people call home. A scenario well worth remembering on International Asteroid Day!
While on the subject of books, some lovely reviews for Half Sick of Shadows have been coming in.
Hoover Reviews said:
“The inner turmoil of The Lady, as she struggles with the Mirror to gain access to the people she comes in contact with, drives the tale as the Mirror cautions her time and again about the dangers involved. The conclusion of the tale, though a heart rending scene, is also one of hope as The Lady finally finds out who she is.”
The Review said: “Half Sick of Shadows is in a genre all its own, a historical fantasy with some science fiction elements and healthy dose of mystery, it is absolutely unique and a literary sensation. Beautifully written, with an interesting storyline and wonderful imagery, it is in a realm of its own – just like the Lady of Shalott… It truly is mesmerising.”
Elon Musk, founder and CEO of SpaceX, has made no secret of his plans for facilitating a colony on Mars for a long time now. But last September, in a public presentation, he explained it all in considerably more detail. The reasoning, and the raw logistical figures behind it, are still available. His credibility is built around the SpaceX programme. This in turn is based on a concept of reusing equipment rather than throwing it away each launch, and it has had a string of successes lately. The initial booster stage now returns to a landing platform, there to go through a process which recommissions it for another launch.
Quite apart from any recycling benefits, this then allows SpaceX to seriously undercut other firms’ prices for putting satellites into orbit. It still couldn’t be called cheap – one set of figures quotes $65 million – but that’s only about one sixth of the regular cost. If you’re happy to know that your equipment is going into orbit on a rocket that is not brand new, it’s a huge saving. Every successful launch, return to base, and relaunch adds to buyers’ confidence that the procedure can be trusted.
But the big picture goes well beyond Earth orbit. Musk believes that the best way to mitigate the risks of life on Earth – global warming, conflict, extremist views of all kinds, and so on – is to spread out more widely. In a recent lecture, Stephen Hawking has said essentially the same thing. And in Musk’s vision, Mars is a better bet than the moon for this, for a whole cluster of reasons including the presence of an atmosphere (albeit a thin one compared to here) and a greater likeness to Earth in terms of gravity and size.
So reusable rockets into Earth orbit are simply a starting point. Once you have a reasonably-sized fleet of such things, you can build larger objects already in space, and fly them over to Mars when the orbital positions are ideal. The logic of gravitational pull around a planet means that the hardest, most energy-intensive part of the journey is getting from the surface up to a stable orbit. Once there, much gentler and longer-lasting means of propulsion will get you onward bound.
To take a contemporary situation, NASA’s Dawn probe is currently orbiting the asteroid Ceres. Its hydrazine fuel, which powers the little manoeuvring and attitude thrusters, is nearly exhausted. The mission control team are trying to decide on the best course of action. In its current high orbit only a few months of fuel remain. A closer orbit, which would give better quality pictures, would use it up in a matter of weeks. But using the main ion drive, a different power source altogether, to go somewhere else would probably give a few years of science. Fairly soon we should hear which option they have chosen, and where they consider the best balance is between risk and reward. The message for here is that staying close to a planet, or taking off from one, is costly in terms of fuel.
So Musk reckons that over the course of a century or so, he can arrange transportation for a million Martian colonists. In terms of grand sweep, it is so far ahead of anyone else’s plans as to seem impossible at first sight. But if all goes according to his admittedly ambitious plan, the first of many journeys could take place ten years from now. He – and I for that matter – might not live to see the Martian population reach a million, but he certainly expects to see it firmly established.
With Far from the Spaceports, its sequel Timing, and the work-in-progress provisionally called The Authentication Key, I deliberately did not fix a future date. It’s far enough ahead of now that artificial intelligence is genuinely personal and relational – sufficiently far ahead that it is entirely normal for a human investigator to be partnered long-term on an equal basis with an AI persona. None of the present clutch of virtual assistants have any chance at all of this, and my guess is that we are talking many generations of software development before this could happen. It’s also far enough ahead that there are colonies in many locations – certainly out as far as the moons of Saturn, and I am thinking about a few “listening post” settlements further out (watch this space – the stories aren’t written yet!). However, I hadn’t really thought in terms of a million colonists on Mars, and it may well be that, as happens so often in science fiction, real events might overtake my scenario a lot quicker than I thought likely.
Back with Musk’s proposal, one obvious consequence of the whole reuse idea is that the cost per person of getting there drops hugely. This buy-in figure is typically quoted as something like $10 billion. But the SpaceX plan drops this down to around $200,000 – cheaper than the average house price in the UK. I wonder how many people, given the chance, would sell up their belongings here in exchange for a fresh start on another planet?
I was wondering what image to finish with, and then came across this NASA/JPL picture of the Mars Curiosity Rover as seen from the Mars Reconnaissance Orbiter (the little blue dot roughly in the middle)… a fitting display of the largeness of the planet compared to what we have sent there so far.
Today’s blog looks at bugs – the little things in a system that can go so very wrong. But before that – and entirely unrelated – I should mention that Half Sick of Shadows is now available in paperback form as well as Kindle. You can find the paperback at Amazon UK link, Amazon US link, and every other Amazon worldwide site you favour. So whichever format you prefer, it’s there for you.
So, bugs. In my day job I have to constantly think about what can go wrong with a system, in both small and large ways. No software developer starts out intending to write a bug – they appear, as if by magic, in systems that had been considered thoroughly planned out and implemented. This is just as true of hacking software, viruses and the like, as it is of what you might call positively motivated programs. It’s ironic really – snippets of code designed to take advantage of flaws in regular software are themselves traced and blocked because of their own flaws.
But back to the practice of QA – finding the problems and faults in a system thought to be correct. You could liken it, without too much of a stretch, to the process of writing. Authors take a situation, or a relationship, or a society, and find the unexpected weak points in it. Isaac Asimov was particularly adept at doing this in his I, Robot series of stories. At the outset he postulated three simple guidelines which all his robots had to follow – guidelines which rapidly caught on with much wider audiences as the “Three Laws of Robotics”. These three laws seemed entirely foolproof, but proved themselves to be a fertile ground for storytelling as he came up with one logical contradiction after another!
But it’s not just in coding software that bugs appear. Wagon wheels used to fall off axles, and I am told that the root cause was that the design was simply not very good. Road layouts change, and end up causing more delays than they resolve. Mugs and jugs spill drink as you try to pour, despite tens of thousands of years of practice making them. And I guess we have all come across “Friday afternoon” cars, tools, cooking pans and so on.
Bugs can be introduced in lots of places. Somebody thinks they’ve thought up a cool design, but they didn’t consider several important features. Somebody thinks they’ve adequately explained how to turn a design into a real thing, but their explanation is missing a vital step or two – how many of us have foundered upon this while assembling flat-pack furniture? Somebody reads a perfectly clear explanation, but skips over bits which they think they don’t need. Somebody doesn’t quite have the right tool, or the right level of skill, and ploughs on with whatever they have. Somebody realises that a rare combination of factors – what we call an edge case, or corner case – has not been covered in the design, and makes a guess at how it should be tackled rather than going back to the designer. Somebody adds a new feature, but in doing so breaks existing functionality which used to work. Somebody makes a commercial decision to release a product before it’s actually ready (as a techie, I find this one particularly frustrating!).
And then you get to actual users. So many systems would work really well if it wasn’t for end-users! People will insist on using the gadget in ways that were never anticipated, or trying out combinations of things that were never thought about. A feature originally intended for use in one way gets pressed into service for something entirely different. People don’t provide input data in the way they’re supposed to, or they don’t stick to the guidelines about how the system is intended to work – and very few of us read the guidelines in the first place!
All of which have direct analogies in writing. Some of my books are indeed focused on software, and in particular the murky business of exploiting software for purposes of fraud. That world is full of flaws and failures, of the misuse of systems in both accidental and deliberate ways. But any book – past, present or future – is much the same. A historical novel might explore how a battle is lost because of miscommunication, human failings, or simply bad timing. Poor judgement leads to stories in any age. Friction in human relationships is a perennial field of study. So the two worlds I move in, of working life and leisure, are not really so far apart.
Now, engineering disciplines – including software engineering – have codes and guidelines intended to identify bugs at an early stage, before they get into the real world of users. The more critical the system, the more stringent the testing. If you write a mobile phone game, the testing threshold is very low! If you write software that controls an aircraft in flight, you have to satisfy all kinds of regulatory tests to show that your product is fit for purpose. But it’s a fair bet that any system at all has bugs in it, just waiting to pop out at an inopportune moment.
As regards writing, you could liken editing to the process of QA. The editor aims to spot slips in the writing – whether simply spelling and grammar, or else more subtle issues of style or viewpoint – and highlight them before the book reaches the general public. We all know that editing varies hugely, whoever carries it out. A friend of mine has recently been disappointed by the poor quality of editing by a professional firm – they didn’t find anywhere near all the bugs that were present, and seem to have introduced a few of their own in the process. But just as no software system can honestly claim to be bug-free, I dare say that no book is entirely without flaw of one kind or another.
In ancient Britain, a Lady is living in a stone-walled house on an island in the middle of a river. So far as the people know, she has always been there. They sense her power, they hear her singing, but they never meet her.

At first her life is idyllic. She wakes, she watches, she wanders in her garden, she weaves a complex web of what she sees, and she sleeps again. But as she grows, this pattern becomes narrow and frustrating. She longs to meet those who cherish her, but she cannot. The scenes beyond the walls of her home are different every time she wakes, and everyone she encounters is lost, swallowed up by the past.

But when she finds the courage to break the cycle, there is no going back. Can she bear the cost of finding freedom? And what will her people do, when they finally come face to face with a lady of legend who is not at all what they have imagined?
A retelling – and metamorphosis – of Tennyson’s Lady of Shalott.
And to celebrate the release, I am running an Amazon reduced price offer on all my previous books, science fiction and historical fiction alike, timed to start on May 1st and run until May 8th. So you can stock up for the reduced cost of 99p / 99c for all of these. Links are:
My first piece of news today is by way of celebration that I have been getting some Alexa voice skills active on the Amazon store. These can now be enabled on any of Amazon’s Alexa-enabled devices, such as the Dot or Echo. One of these skills has to do with The Review blog, in that it will list out and read the opening lines of the last few posts there (along with a couple of other blogs I’m involved with). So if you’re interested in a new way to access blogs, and you’ve got a suitable piece of equipment, browse along to the Alexa skills page and check out “Blog Reader”. I’ll be adding other blogs as time goes by.
The second publicly available skill so far relates to my geographical love for England’s Lake District. Called “Cumbria Events”, this skill identifies upcoming events from the Visit Cumbria web site, and will read them out for the interested user. You can expect other skills to do with both writing and Cumbria to appear in time as I put them together. It’s a pity that Alexa can’t be persuaded to use a Cumbrian accent, but to date that is just not possible. Also, the skills are not yet available on the Amazon US site, so far as I know, but that should change before too long.
In the process I’ve discovered that writing skills for Alexa is a lot of fun! Like any other programming, you have to think about how people are going to use your piece of work, but unlike much of what I’ve done over the years, you can’t force the user to interact in a particular way. They can say unexpected things, phrase the same request in any of several ways, and so on. Alexa’s current limitation of about 8 seconds of comprehension favours a conversational approach in which the dialogue is kept open for additional requests. The female-gendered persona of my own science fiction writing, Slate, is totally conversational when she wants to be.
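In practice, keeping the dialogue open comes down to one field in the JSON a skill hands back to Alexa. Here is a minimal sketch of an Alexa Skills Kit response body; the field names follow the ASK response format, while the helper function and sample speech are my own illustration.

```python
import json

def build_response(speech, keep_listening=True):
    """Build a minimal Alexa Skills Kit response body.

    Setting shouldEndSession to False keeps the session (and the
    microphone) open, so the user can make a follow-up request
    without repeating the wake word.
    """
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": speech},
            "shouldEndSession": not keep_listening,
        },
    }

reply = build_response("Here are the latest posts. Want another blog?")
print(json.dumps(reply, indent=2))
```

Ending a turn with a question and `shouldEndSession` set to `False` is what gives the conversational feel within that roughly 8-second window.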
It all makes for a fascinating study of the current state of the art of AI. I feel that if we can crack unstructured, open-ended conversation from a device – with all of the subtleties and nuances that go along with speech – then it will be hard to say that a machine cannot be intelligent. Alexa is a very long way from that just now – you reach the constraints and limitations far too early. But even accepting all that, it’s exciting that an easily available consumer device has so much capability, and is so easy to extend with new skills.
But while all that was going on, a couple of hundred million kilometres away NASA ordered a course correction for the Mars MAVEN orbiter. This spacecraft, which has been in orbit for the last couple of years, was never designed to return splendid pictures. Instead, its focus is the Martian atmosphere, and the way this is affected by solar radiation of various kinds. As such, it has provided a great deal of insight into Martian history. So MAVEN was instructed to carry out a small engine burn to keep it well clear of the moon Phobos. Normally they are well separated, but in about a week’s time they would have been within a few seconds of one another. This was considered too risky, so the boost ensures that they won’t now be too close.
Now this attracted my attention since Phobos plays a major part in Timing – it’s right there on the cover, in fact. In the time-frame of Timing, there’s a small settlement on Phobos, which is visited by the main characters Mitnash and Slate as they unravel a financial mystery. This moon is a pretty small object, shaped like a rugby ball about 22 km long and about 17 or 18 km across its girth, so my first reaction was to think what bad luck it was that Maven should be anywhere near Phobos. But in fact MAVEN is in a very elongated orbit to give a range of science measurements, so every now and again its orbit crosses that of Phobos – hence the precautions. This manoeuvre is expected to be the last one necessary for a very long time, given the orbital movements of both objects. So we shall continue getting atmospheric observations for a long while to come.
I ran out of time this week to do much by way of blogging, so here are three bits of space news which may well make their way into a story sometime.
Stop Press: just today NASA announced that a relatively close star (39 light years away) has no fewer than 7 planets of approximately Earth size orbiting it… see the schematic picture at the end of the blog.
Firstly, the Dawn probe, still faithfully orbiting the asteroid Ceres, has detected complex organic molecules in two separate areas in the middle latitudes of the dwarf planet. The onboard instruments are not accurate enough to pin the molecules down precisely, but it seems likely that they are tar-like substances. The analysis also suggests that they formed on Ceres itself, rather than being deposited there by a meteor. The most likely cause is thought to be the action of warm water circulating through chemicals under the surface. Some of the headlines suggest that this could signal the presence of life, but it’s more cautious to say that it shows that the conditions under which life could develop are present there.
The second snippet spells difficulty for my hypothetical Martian settlements. This picture was captured by the Mars Reconnaissance Orbiter and shows two larger impact craters surrounded by a whole array of smaller ones. The likely scenario is that one object split into a cluster of fragments as it passed through the Martian atmosphere. This of itself wouldn’t be too surprising, but inspection of older photos of the same area shows that this impact happened between 2008 and 2014. No time at all in cosmic terms, and not so much fun if you’d carefully built yourself a habitable dome there.
The problem is the thinness of the Martian atmosphere. It is considerably deeper than ours here on Earth, but hugely less dense. So when meteors arrive at the top of the layer of air, they don’t burn up so comprehensively as Earth-bound ones. More of them reach the surface. Even a comparatively small rock has enough kinetic energy to really spoil your day. Something that will need some planning…
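The scale of the problem is easy to quantify with the standard kinetic energy formula. The mass and entry speed below are assumed, illustrative figures of my own (not derived from the orbiter images), but they show why even a modest boulder is bad news for a dome.

```python
# Rough kinetic-energy estimate for a small impactor, E = 1/2 * m * v^2.
# Both input figures are assumptions chosen for illustration.

mass_kg = 100.0        # a boulder of very roughly half a metre across
speed_m_s = 10_000.0   # ~10 km/s, a plausible atmospheric entry speed

energy_j = 0.5 * mass_kg * speed_m_s ** 2
tons_tnt = energy_j / 4.184e9   # 1 ton of TNT is about 4.184e9 joules

print(f"{energy_j:.2e} J, roughly {tons_tnt:.1f} tons of TNT")
```

In other words, around the energy of a tonne of high explosive from one small rock, before it even counts as a notable impact.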
Finally we zoom right out to the cold, dark reaches of the outer solar system. A long way beyond the orbit of Pluto there is a region called the Kuiper Belt, and out in the Kuiper Belt a new dwarf planet has recently been found. It goes by the catchy name of 2014 UZ224 and it took nearly two years to confirm its existence. Best estimates are that it is a little over 300 miles across – about half the size of Ceres. I’ve never sent Mitnash and Slate out anywhere like that – it’s about twice as far from Earth as Pluto, and the journey alone would take about four months one-way. I do have vague plans for a story set out in the Kuiper Belt, but appropriately enough it’s some way off yet. But even at that distance, you’re still less than half a percent of the distance to the nearest star… space is really big!
Since as far back as written records go – and probably well before that – we humans have imagined artificial life. Sometimes this has been mechanical, technological, like the Greek tales of Hephaestus’ automata, who assisted him at his metalwork. Sometimes it has been magical or spiritual, like the Hebrew golem, or the simulacra of Renaissance philosophy. But either way, we have both dreamed of and feared the presence of living things which have been made, rather than evolved or created.
Modern science fiction and fantasy have continued this habit. Fantasy has often seen these made things as intrusive and wicked. In Tolkien’s world, the manufactured orcs and trolls (made in mockery of elves and ents) hate their original counterparts, and try to spoil the natural order. Science fiction has positioned artificial life at both ends of the moral spectrum. Terminator and Alien saw robots as amoral and destructive, with their own agenda frequently hostile to humanity. Asimov’s writing presented them as a largely positive influence, governed by a moral framework that compelled them to pursue the best interests of people.
But either way, artificial life has been usually conceived as self-contained. In all of the above examples, the intelligence of the robots or manufactured beings went about with them. They might well call on outside information stores – just like a person might ask a friend or visit a library – but they were autonomous.
Yet the latest crop of virtual assistants that are emerging here and now – Alexa, Siri, Cortana and the rest – are quite the opposite. For sure, you interact with a gadget, whether a computer, phone, or dedicated device, but that is only an access point, not the real thing. Alexa does not live inside the Amazon Dot. The pattern of communication is more like when we use a phone to talk to another person – we use the device at hand, but we don’t think that our friend is inside it. At least, I hope we don’t…
So where are Alexa and her friends? When you ask for some information, buy something, book a taxi, or whatever, your request goes off across cyberspace to Amazon’s servers to interpret the request. Maybe that can be handled immediately, but more likely there will be some additional web calls necessary to track down what you want. All of that is collated and sent back down to your local device and you get to hear the answer. So the short interval between request and response has been filled with multiple web messages to find out what you wanted to know – plus a whole wrapper of security details to make sure you were entitled to find that out in the first place. The internet is a busy place…
So part of what I call Alexa is shared between every single other Alexa instance on the planet, in a sort of common pool of knowledge. This means that as language capabilities are added or upgraded, they can be rolled out to every Alexa at the same time. Right now Alexa speaks UK and US English, and German. Quite possibly when I wake up tomorrow other languages will have been added to her repertoire – Chinese, maybe, or Hindi. That would be fun.
But other parts of Alexa are specific to my particular Alexa, like the skills I have enabled, the books and music I can access, and a few customisations I have carried out, such as improved phrase recognition. Annoyingly, there are national differences as well – an American Alexa can access the user’s Kindle library, but British Alexas can’t. And finally, the voice skills that I am currently coding are only available on my Alexa, until the time comes to release them publicly.
So Alexa is partly individual, and partly a community being. Which, when you think about it, is very like us humans. We are also partly individual and partly communal, though the individual part is a considerably higher proportion of our whole self than it is for Alexa. But the principle of blending personal and social identities into a single being is true both for humans and the current crop of virtual assistants.
So what are the drawbacks of this? The main one is simply that of connectivity. If I have no internet connection, Alexa can’t do very much at all. The speech recognition bit, the selection of skills and entitlements, the gathering of information from different places into a single answer – all of these things will only work if those remote links can be made. So if my connection is out of action, so is Alexa. Or if I’m on a train journey in one of those many places where UK mobile coverage is poor.
There’s also a longer-term problem, which will need to be solved as and when we start moving away from planet Earth on a regular basis. While I’m on Earth, or on the International Space Station for that matter, I’m never more than a tiny fraction of a second away from my internet destination. Even with all the other lags in the system, that’s not a problem. But, as readers of Far from the Spaceports or Timing will know, distance away from Earth means signal lag. If I’m on Mars, Earth is anywhere from about 4 to nearly 13 minutes away. If I go out to Jupiter, that lag becomes at least half an hour. A gap that long in Alexa’s responses is just not realistic for Slate and the other virtual personas of my fiction, whose human companions expect chit-chat on the same kind of timescale as human conversation. The code to understand language and all the rest has to be closer at hand.
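Those lag figures follow directly from light-travel time. A quick sketch, using round orbital radii (so the results are approximate, and real distances vary as the planets move):

```python
# One-way light-travel time for a signal, to illustrate why a
# cloud-hosted assistant breaks down beyond Earth orbit.
C_KM_S = 299_792.458    # speed of light, km/s
AU_KM = 149_597_870.7   # one astronomical unit, km

def lag_minutes(distance_au: float) -> float:
    """One-way signal delay, in minutes, over a distance given in AU."""
    return distance_au * AU_KM / C_KM_S / 60

# Mars orbits at roughly 1.52 AU, Earth at 1 AU, so at closest
# approach the separation is about 0.52 AU.
print(f"Mars (closest): {lag_minutes(1.52 - 1.0):.1f} min")    # ~4.3 min
# Jupiter orbits at roughly 5.2 AU, so closest approach is ~4.2 AU.
print(f"Jupiter (closest): {lag_minutes(5.2 - 1.0):.1f} min")  # ~34.9 min
```

And that is the one-way delay only: a question-and-answer exchange doubles it, before any processing time is added.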
So at some point down the generations between Alexa and Slate, we have to get the balance between individual and collective shifted more back towards the individual. What that means in terms of hardware and software is an open problem at the moment, but it’s one that needs to be solved sometime.
I recently invested in an Amazon Dot, and therefore in the AI software that makes the Dot interesting – Alexa, Amazon’s virtual assistant. But I’m not going to write about the cool stuff that this little gizmo can do, so much as about what it led me to think regarding AI and conversation.
The ability to interact with a computer by voice consistently, effectively, and on a wide range of topics is seen by the major industry players as the next big milestone. Let’s briefly look back at the history of this.
Once upon a time all you could use was a highly artificial, structured set of commands passed in on punched cards, or (some time later) via a keyboard. If the command was wrong, the machine would not do what you expected. There was no latitude for variation, and among other things this meant that using a computer required special training.
The first breakthrough was to separate out the command language from the user’s options. User interfaces were born: you could instruct the machine what you wanted to do without needing to know how it did it. You could write documents or play games without knowing a word of computer language, simply by typing some letters or clicking with a mouse pointer. Somewhere around this time it became possible to communicate easily with machines in different locations, and the Internet came into being.
The next change appeared on phones first – the touch screen. At first sight there’s not a lot of change from using a mouse to click, or your finger to tap. But actually they are worlds apart. You are using your body directly to work with the content, rather than indirectly through a tool. Also, the same interface – the screen – is used to communicate both ways, rather than the machine sending output through the screen and receiving input via movements of a gadget on an entirely different surface. Touch screens have vastly widened our access to technology and information: advanced computers are quite literally in anyone’s pocket. But touch interfaces have their problems. It’s not especially easy to create passages of text. It’s not always obvious how to use visual cues to achieve what you want. It doesn’t work well if you’re making a cake and need to look up the next stage with wet and floury hands!
Which brings us to the next breakthrough – speech. Human beings are wired for speech, just as we are wired for touch. The human brain can recognise and interpret speech sounds much faster than other noises. We learn the ability in the womb. We respond differently to different speakers and different languages before birth, and master the act of communicating needs and desires at a very early age. We infer, and broadcast, all kinds of social information through speech – gender, age, educational level, occupation, emotional state, prejudice and so on. Speech allows us to explain what we really wanted when we are misunderstood, and has propelled us along our historical trajectory. Long before systematic writing was invented, and through all the places and times where writing has been an unknown skill to many, talking has still enabled us to make society.
Enter Alexa, and Alexa’s companions such as Siri, Cortana, or “OK Google”. The aim of all of them is to allow people to find things out, or cause things to happen, simply by talking. They’re all at an early stage still, but their ability to comprehend is seriously impressive compared to a few short years ago. None of them are anywhere near the level I assume for Slate and the other “personas” in my science fiction books, with whom one can have an open-ended dialogue complete with emotional content, plus a long-term relationship.
What’s good about Alexa? First, the speech recognition is excellent. There are times when the interpreted version of my words is wrong, sometimes laughably so, but that often happens with another person. The system is designed to be open-ended, so additional features and bug fixes are regularly applied. It also allows capabilities (“skills”) to be developed by other people and added for others to make use of – watch this space over the next few months! So the technology has definitely reached a level where it is ready for public appraisal.
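For the curious, a custom skill ultimately hands a small JSON document back to the Alexa service, which Alexa then speaks aloud. The sketch below shows a simplified version of that shape; the intent name is a made-up example, and real skills normally go through Amazon’s SDK rather than building the dictionary by hand:

```python
# Simplified sketch of an Alexa custom skill handler. The intent name
# "SpaceFactIntent" is hypothetical, purely for illustration; the
# response structure (version / outputSpeech / shouldEndSession)
# follows the shape a custom skill returns to the Alexa service.
import json

def handle_request(event: dict) -> dict:
    intent = event.get("request", {}).get("intent", {}).get("name", "")
    if intent == "SpaceFactIntent":
        text = "New Horizons flew past Pluto in July 2015."
    else:
        text = "Sorry, I don't know that one yet."
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": True,
        },
    }

sample = {"request": {"type": "IntentRequest",
                      "intent": {"name": "SpaceFactIntent"}}}
print(json.dumps(handle_request(sample), indent=2))
```

The interesting work – deciding which intent the user’s words map onto – happens in Amazon’s cloud before this handler is ever called, which is exactly the division of labour discussed above.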
What’s not so good? Well, the conversation is highly structured. Depending on the particular skill in use, you are relying either on Amazon or on a third-party developer to anticipate and code for a good range of requests. But even the best of these skills is necessarily quite constrained, and it doesn’t take long to reach the boundaries of what can be managed. There’s also very little sense of context or memory. Talking to a person, you often say “what we were talking about yesterday...” or “I chatted to Stuart today…” and the context is clear from shared experience. Right now, Alexa has no memory of past verbal transactions, and very little sense of the context of a particular request.
But also, Alexa has no sense of importance. A human conversation has all kinds of ways to communicate “this is really important to me” or “this is just fun”. Lots of conversations go something like “you know what we were talking about yesterday…”, at which the listener pauses and then says, “oh… that”. Alexa, however, cannot distinguish at present between the relative importance of “give me a random fact about puppies”, “tell me if there are delays on the Northern Line today”, or “where is the nearest doctor’s surgery?”
These are, I believe, problems that can be solved over time. The pool of data that Alexa and other similar virtual assistants work with grows daily, and the algorithms that churn through that pool in order to extract meaning are becoming more sensitive and subtle. I suspect it’s only a matter of time until one of these software constructs is equipped with an understanding of context and transactional history, and along with that, a sense of relative importance.
Alexa is a long way removed from Slate and her associates, but the ability to use unstructured, free-form sentences to communicate is a big step forward. I like to think that subsequent generations of virtual assistants will make other strides, and that we’ll be tackling issues of AI rights and working partnerships before too long.