Category Archives: Software

The Liminal Zone

The Liminal Zone cover

Well, it’s almost time for The Liminal Zone to see the light of day. The publication date of the Kindle version is this Sunday, May 17th, and it can already be preordered on the Amazon site at https://www.amazon.co.uk/dp/B087JP2GJP. The paperback version will not be too far behind it, depending on the final stages of proofing and such like. I am, naturally, very pleased and excited about this, as it is quite a while since I first planned out the beginnings of the characters, setting and plot. Since that beginning, some parts of my original ideas have changed, but the core has remained pretty much true to that original conception all the way through.

But I thought for today I’d talk a little bit about my particular spin on the future development of the solar system. My time-horizon at the moment is around 50-100 years ahead, not the larger spans which many authors are happy to explore. So readers can expect to recognise the broad outlines of society and technology – it will not have changed so far from our own as to be incomprehensible. I tend towards the optimistic side of future-looking – I read dystopian novels, but have never yet been tempted to write one myself. I also tend to focus on an individual perspective, rather than dealing with political or large-scale social issues. The future is seen through the lenses of a number of individuals – they usually have interesting or important jobs, but they are never leaders of worlds or armies. They are, typically, experts in their chosen field, and as such run into all kinds of interesting and unusual situations that warlords and archons might never encounter. The main character of Far from the Spaceports and Timing (and a final novel to come in that trilogy) is Mitnash Thakur, who with his AI partner Slate tackles financial crime. In The Liminal Zone, the central character is Nina Buraca, who works for an organisation broadly like present-day SETI, and so investigates possible signs of extrasolar life.

Amazon Dot – Active

Far from the Spaceports, and the subsequent novels in the series, are built around a couple of assumptions. One is that artificial intelligence will have advanced to the point where thinking machines – my name for them is personas – can be credible partners and friends to people. They understand and display meaningful and real emotions as well as being able to solve problems. Now, I have worked with AI as a coder in one capacity or another for the last twenty-five years or so, and am very aware that right now we are nowhere near that position. The present-day household systems – Alexa, Siri, Cortana, Bixby, Google Home and so on – are very powerful in their own way, and great fun to work with as a coder… but by no stretch of the imagination are they anything like friends or coworkers. But in fifty, sixty, seventy years? I reckon that’s where we’ll be.

Xenon ion discharge from the NSTAR ion thruster of Deep Space 1 (NASA)

The second major pillar concerns solar system exploration. Within that same timespan, I suggest that there will be habitable outposts scattered widely throughout the system. I tend to call these domes, or habitats, with a great lack of originality. Some are on planets – in particular Mars – while others are on convenient moons or asteroids. Many started as mining enterprises, but have since diversified into more general places to live. For travel between these places to be feasible, I assume that today’s ion drive, used so far in a handful of spacecraft, will become the standard means of propulsion. As NASA says in a rather dry report, “Ion propulsion is even considered to be mission enabling for some cases where sufficient chemical propellant cannot be carried on the spacecraft to accomplish the desired mission.” Indeed. A fairly readable introduction to ion propulsion can be found at this NASA link.

I am sure that well before that century or so look-ahead time, there will have been all kinds of other advances – in medical or biological sciences, for example – but the above two are the cornerstones of my science fiction books to date.

That’s it for today, so I can get back to sorting out the paperback version of The Liminal Zone. To repeat, publication date is Sunday May 17th for the Kindle version, and preorders can be made at https://www.amazon.co.uk/dp/B087JP2GJP. As a kind of fun bonus, I am putting all my other science fiction and historical fiction books on offer at £0.99 / $0.99 for a week starting on the 17th.

The Liminal Zone cover

Common Sense and AI

Cover – The Liminal Zone

Before starting this blog post properly, I should mention that my latest novel in the Far from the Spaceports series – called The Liminal Zone – is now on pre-order at Amazon in Kindle format. The link is https://www.amazon.co.uk/gp/product/B087JP2GJP. Release date is May 17th. For those who prefer paperback, that version is in the later stages of preparation and will be ready shortly. For those who haven’t been following my occasional posts, it’s set about twenty or so years on from the original book, out on Pluto’s moon Charon, and has a lot more to do with first extraterrestrial contact than financial crime!

Amazon Dot – Active

Back to this week’s post, and as a break from the potential for life on exoplanets, I thought I’d write about AI and its (current) lack of common sense. AI individuals – called personas – play a big role in my science fiction novels, and I have worked on and off with software AI for quite a few years now. So I am well aware that the kind of awareness and sensitivity which my fictional personas display is vastly different from current capabilities. But then, I am writing about events set somewhere in the next 50-100 years, and I am confident that by that time AI will have advanced to the point that personas are credible. I am not nearly so sure that within the next century we’ll have habitable bases in the asteroid belt, let alone on Charon, but that’s another story.

What are some of the limitations we face today? Well, all of the best-known AI devices, for all that they are streets ahead of what we had a decade ago, are extremely limited in their capacity to have a real conversation. Some of this is down to context, and some to common sense (and some to other factors that I’m not going to talk about today).

Context is the ability that a human conversation partner has to fill in gaps in what you are saying. For example, if I say “When did England last win the Ashes?”, you may or may not know the answer, but you’d probably realise that I was talking about a cricket match, and (maybe with some help from a well-known search engine) be able to tell me. If I then say “And where were they playing?”, you have no difficulty in realising that “they” still means England, and the whole question relates to that Ashes match. You are holding that context in your mind, even if we’ve chatted about other stuff in the meantime, like “what sort of tea would you like?” or “will it rain tomorrow?”. I could go on to other things, like “Who scored most runs?” or “Was anybody run out?” and you’d still follow what I was talking about.

I just tried this out with Alexa. “When did England last win the Ashes?” does get an answer, but not to the right question – instead I learned when the next Ashes was to be played. A bit of probing got me the answer to who won the last such match (in fact a draw, which was correctly explained)… but only if I asked the question in fairly quick succession after the first one. If I let some time go by before asking “Where were they playing?”, what I get is “Hmmm, I don’t know that one”. Alexa loses the context very quickly. Now, as an Alexa developer I know exactly why this is – the first question opens a session, during which some context is carefully preserved: the development team decides what information is going to be repeatedly passed to and fro as Alexa and I exchange comments. During that session, further questions within the defined context can be handled. Once the session closes, the contextual information is discarded. (If I were a privacy campaigner, I’d be very pleased that it was discarded, but as a keen AI enthusiast I’m rather disappointed.) With the Alexa skills that I have written (and you can find them on the Alexa store on Amazon by searching for DataScenes Development), I try to keep the fiction of conversation going by retaining a decent amount of context, but it is all very focused on one thing. If you’re using my Martian Weather skill and then assume you can start asking about Cumbrian Weather, on the basis that they are both about weather, then Alexa won’t give you a sensible answer. It doesn’t take long at all to get Alexa in a spin – for some humour about this, check out this YouTube link – https://www.youtube.com/watch?v=JepKVUym9Fg
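For the curious, here is a minimal sketch of how that session context works behind the scenes, assuming a Python backend handling the raw Alexa request and response JSON. The intent names, attribute names, and cricket answers are invented for illustration, not taken from any published skill.

```python
# Minimal sketch of session context in an Alexa custom skill backend
# (raw request/response JSON, no SDK). Intent and attribute names are
# hypothetical - the point is that anything not copied back into
# "sessionAttributes" is forgotten once the session ends.

def handle_request(event):
    session_attrs = event.get("session", {}).get("attributes", {}) or {}
    intent = event["request"]["intent"]["name"]

    if intent == "LastAshesResultIntent":
        # Remember which topic we were discussing, for follow-up questions.
        session_attrs["topic"] = "ashes"
        speech = "The most recent Ashes series was drawn."
    elif intent == "WhereWerePlayingIntent" and session_attrs.get("topic") == "ashes":
        # Context survives only because we keep passing it back each turn.
        speech = "That series was played in England."
    else:
        # Once the session has closed, none of that context is available.
        speech = "Hmm, I don't know that one."

    return {
        "version": "1.0",
        "sessionAttributes": session_attrs,   # echoed back to us next turn
        "response": {
            "outputSpeech": {"type": "PlainText", "text": speech},
            "shouldEndSession": False,
        },
    }
```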

So context is one thing, but common sense is another. Common sense is the ability to tap into a broad understanding of how things work, in order to fill in what would otherwise be gaps. It allows you to make reasonable decisions in the face of uncertainty or ambiguity. For example, if I say “a man went into a bar. He ordered fish and chips. When he left, he gave the staff a large tip”, and then say “what did he eat?”, common sense will tell you that he most likely ate fish and chips. Strictly speaking, you don’t know that – he might have ordered it for someone else. It might have arrived at his table on the outdoor terrace, only to be stolen by a passing jackdaw. In the strictest logical sense, I haven’t given you enough information to say for sure, and you can concoct all kinds of scenarios where weird things happened and he did not, in fact, eat fish and chips… but the simplest and most likely guess is that that is exactly what he did.

In passing, Robert Heinlein, in his very long novel Stranger in a Strange Land, assumed the existence of people whose memory, and whose capacity for not making assumptions, meant that they could serve in courts of law as “fair witnesses”, describing only and exactly what they had seen. So if asked what colour a house was, they would answer something like “the house was white on the side facing me” – with no assumption about the other sides. All very well for legal matters, but I suspect the conversation would get boring quite quickly if they carried that over into personal life. They would run out of friends before long…

Now, what is an AI system to do? How do we code common sense into artificial intelligence, which by definition has not had any kind of birth and maturation process parallel to a human one (though there has probably been a period of training in a specific subject)? By and large, we learn common sense (or in some people’s case, don’t learn it) by watching how those around us do things – family, friends, school, peers, pop stars or sports people. And so on. We pick up, without ever really trying to, what kinds of things are most likely to have happened, and how people are likely to have reacted. But a formalised way of imparting common sense has eluded AI researchers for over fifty years now. There have been attempts to reduce common sense to a long catalogue of “if this then that” statements, but there are so many special cases and contradictions that these attempts have got bogged down. There have been attempts to assign probabilities to particular outcomes, so that a machine system trying to find its way through a complex decision would identify the most likely thing to do in a given combination of circumstances. To date, none have really worked, and encoding common sense into AI remains a challenging problem. We have AI software which can win Go and other games, but which cannot then go on to hold an interesting conversation about other topics.
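To make the probability idea concrete, here is a toy illustration – emphatically not a real technique from the research literature – of ranking candidate explanations for the fish-and-chips story and picking the most likely one. The candidate list and the numbers are entirely invented.

```python
# Toy illustration (not a real research system): rank candidate
# explanations for "he ordered fish and chips ... what did he eat?"
# by a rough prior probability, and pick the most likely one.

candidates = {
    "he ate the fish and chips himself": 0.90,
    "he ordered it for someone else": 0.07,
    "a passing jackdaw stole it": 0.03,
}

best = max(candidates, key=candidates.get)
print(f"Most likely: {best} (p ~ {candidates[best]:.2f})")
# A strict logician would say we don't *know*; common sense simply
# goes with the highest-probability reading.
```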

All of which is of great interest to me as author – if I am going to make AI personas appear capable of operating as working partners and as friends to people, they have to be a lot more convincing than Alexa or any of her present-day cousins. Awareness of context and common sense goes a long way towards achieving this, and hopefully, the personas of Far from the Spaceports, and the following novels through to The Liminal Zone, are convincing in this way.

Software generations and obsolescence

Alexa Far from the Spaceports web icon

This post came about for a number of reasons, arising both from the real and fictional worlds. Fictionally speaking, my current work-in-progress deals with several software generations of personas (the AI equivalent of people). Readers of Far from the Spaceports and Timing will no doubt remember Slate, the main persona who featured there. Slate was – or is, or maybe even will be – a Stele-class persona, which in my future universe is the first software generation of personas. Before the first Stele, there were pre-persona software installations, which were not reckoned to have reached the level of personhood.

The Liminal Zone (temporary cover)

There’s a third book in that series about Mitnash and Slate, tentatively called The Authentication Key, which introduces the second generation of personas – the Sapling class. But that is at a very fragmentary stage just now, so I’ll skip over it. By the time of The Liminal Zone, which is well under way, the third generation – the Scribe class – is just starting to appear. And as you will discover in a few months, there is considerable friction between the three classes – for example, Scribes tend to consider the earlier versions inferior. They also have different characteristics – Saplings are reckoned to be more emotional and flighty, in contrast with serious Scribes and systematic Steles. How much of this is just sibling rivalry, and how much reflects genuine differences between them, is for you to decide.

So what made me decide to write this complicated structure into my novels? Well, in today’s software world, this is a familiar scenario. Whether you’re a person who absolutely loves Windows 10, macOS Catalina, or Android Pie, or on the other hand you long for the good old days of Vista, Snow Leopard or KitKat, there is no doubt that new versions split public opinion. And how many times have you gone through a rather painful upgrade of some software you use every day, only to howl in frustration afterwards, “but why did they get rid of xyz feature? It used to just work…” So I’m quite convinced that software development will keep doing the same thing – a new version will come along, and the community of users will be divided in their response.

Artist’s impression, Europa Clipper at work (from space.com)

But as well as those things, I came across an interesting news article the other day, all about the software being developed to go on the forthcoming space mission to Jupiter’s moon Europa. That promises to be a fascinating mission in all kinds of ways, not least because Europa is considered a very promising location to look for life elsewhere in our solar system. But the section that caught my eye was when one of the JPL computer scientists casually mentioned that the computer system intended to go was roughly equivalent to an early 1990s desktop. By the time the probe sets out, in the mid 2020s, the system will be over 30 years out of date. Of course, it will still do its job extremely well – writing software for those systems is a highly specialised job, in order to make the best use of the hardware attached, and to survive the rigours of the journey to Jupiter and the extended period of research there.

But nevertheless, the system is old and very constrained by modern standards – pretty much all of the AI systems you might want to send on that mission in order to analyse what is being seen simply won’t run in the available memory and processing power. The computing job described in that article considers the challenge of writing some AI image analysis software, intended to help the craft focus in on interesting features – can it be done in such a way as to match the hardware capabilities, and still deliver some useful insights?

As well as scientific research, you could consider banking systems – the traditional banks are built around mainframe computers and associated data stores whose software was first written years ago and which are extremely costly. Whatever new interfaces they offer to customers – like a new mobile app – still have to talk to the legacy systems. Hence a new generation of challenger banks has arisen, leapfrogging all the old bricks-and-mortar and mainframe legacy systems and focusing on a lean experience for mobile and web users. It’s too early to predict the outcome, and the traditional banks are using their huge resources to play catch-up as quickly as they can.

Often, science fiction assumes that future individuals will, naturally, have access to the very latest iteration of software. But there are all kinds of reasons why this might not happen. In my view, legacy and contemporary systems can, and almost certainly will, continue to live side by side for a very long time!

Lego ideas (from ideas.lego.com)

When software goes wrong…

Let’s be clear right at the start – this is not a blame-the-computer post so much as a blame-the-programmer one! It is all too easy, these days, to blame the device for one’s ills, when in actual fact, most of the time, the blame should be directed towards those who coded the system. One day – maybe one day quite soon – it might be reasonable to blame the computer, but we’re not nearly at that stage yet.


So this post began life with frustration caused by one of the several apps we use at work. The organisation in question, which shall remain nameless, recently updated their app, no doubt for reasons which seemed good to them. The net result is that the app is now much slower and more clunky than it was. A simple query, such as you need to do when a guest arrives, is now a ponderous and unreliable operation, often needing to be repeated a couple of times before it works properly.

Now, having not so long ago been professionally involved with software testing, this started me thinking. What had gone wrong? How could a bunch of (most likely) very capable programmers have produced an app which – from a user’s perspective – was so obviously a step backwards?

Of course I don’t know the real answer to that, but my guess is that the guys and girls working on this upgrade never once did what I have to do most days – stand in front of someone who has just arrived, after (possibly) a long and difficult journey, using a mobile network connection which is slow or lacking in strength. In those circumstances, you really want the software to just work, straight away. I suspect the team just ran a bunch of tests inside their superfast corporate network, ticked a bunch of boxes, and shipped the result.

Self-driving car (Roblox)

Now, that’s just one example of this problem. We all rely very heavily on software these days – in computers, phones, cars, or wherever – and we’ve become very sophisticated in what we want and don’t want. Speed is important to us – I read recently that every additional second that a web page takes to load loses a considerable fraction of the potential audience. Allegedly, 40% of people give up on a page if it takes longer than 3 seconds to load, and Amazon reckon that a slowdown in page loading of just one second would cost the sales equivalent of $1.6 billion per year. Sainsbury’s ought to have read that article… their shopping web app is lamentably slow. But as well as speed, we want the functionality to just work. We get frustrated if the app we’re using freezes, crashes, loses changes we’ve made, and so on.

What has this to do with writing? Well, my science fiction is set in the near future, and it’s a fair bet that many of the problems that afflict software today will still afflict it in a few decades. And the situation is blurred by my assumption that AI systems will have advanced to the point where genuinely intelligent individuals (“personas”) exist and interact with humans. In this case, “blame-the-computer” might come back into fashion. Right now, with the imminent advent of self-driving cars on our roads, we have a whole raft of social, ethical, and legal problems emerging about responsibility for problems caused. The software used is intelligent in the limited sense of doing lots of pattern recognition, and combining multiple different sources of data to arrive at a decision, but is not in any sense self-aware. The coding team is responsible, and can in principle unravel any decision taken, and trace it back to triggers based on inputs into their code.

Far from the Spaceports cover

As and when personas come along, things will change. Whoever writes the template code for a persona will provide simply a starting point, and just as humans vary according to both nature and nurture, so will personas. As my various stories unfold, I introduce several “generations” of personas – major upgrades of the platform with distinctive traits and characteristics. But within each generation, individual personas can differ pretty much in the same way that individual people do. What will this mean for our present ability to blame the computer? I suppose it becomes pretty much the same as what happens with other people – when someone does something wrong, we try to disentangle nature from nurture, and decide where responsibility really lies.

Meanwhile, for a bit of fun, here’s a YouTube speculation, “If HAL-9000 was Alexa”…

About a podcast

Absolute Business Mindset podcast logo

A short blog today as I get back into blog writing after a very busy Easter. And it’s something a little bit different for me – a friend and former work colleague interviewed me for his podcast series over the weekend, and it has now gone live.

Now, I’ve never really got into podcasts, and Mark’s normal focus for his series is to do with business (as you can tell from his series title, Absolute Business Mindset), but we both managed to make something of the interaction.

Different people use different podcast software, but this site – https://gopod.me/1340548096 – gives you a list of options through which you can access the interview. Alternatively, search for Mark’s series by its title, Absolute Business Mindset.

In it, you can hear me talking with Mark about all kinds of stuff, largely focused around maths, artificial intelligence, Alexa and so on, ultimately touching on science fiction. The whole thing takes about an hour, and Alexa takes more of a central role in the second half. Enjoy!

Artificial Intelligence – Thoughts and News

My science fiction books – Far from the Spaceports and Timing, plus two more titles in preparation – are heavily built around exploring relationships between people and artificial intelligences, which I call personas. So as well as a bit of news about one of our present-day AIs – Alexa – I thought I’d talk today about how I see the trajectory leading from where we are today, to personas such as Slate.

Martian Weather Alexa skill web icon

Before that, though, some news about a couple of new Alexa skills I have published recently. The first is Martian Weather, providing a summary of recent weather from Elysium Planitia, Mars, courtesy of a public NASA data feed from the Mars InSight lander. So you can listen to roughly a week’s worth of temperature, wind, and air pressure readings. At the moment the temperature varies through a Martian day between about -95 and -15° Celsius, so it’s not very hospitable. Martian Weather is free to enable on your Alexa device from numerous Alexa skills stores, including UK, US, CA, AU, and IN. The second is Peak District Weather, a companion to my earlier Cumbria Weather skill but – rather obviously – focusing on mountain weather conditions in England’s Peak District rather than the Lake District. Find out about weather conditions that matter to walkers, climbers and cyclists. This one is (so far) only available on the UK store, but other international markets will be added in a few days.
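For anyone curious where the numbers come from, here is a rough sketch of pulling the data from NASA’s public InSight weather service. The endpoint and the field names (“AT”, “HWS”, “PRE”) are as I remember them and may well have changed, so treat it as illustrative rather than definitive.

```python
# Rough sketch of fetching recent Mars weather from NASA's public
# InSight feed. Endpoint and field names are from memory and may have
# changed - illustrative only.
import requests

URL = "https://api.nasa.gov/insight_weather/"
params = {"api_key": "DEMO_KEY", "feedtype": "json", "ver": "1.0"}

data = requests.get(URL, params=params, timeout=10).json()
for sol in data.get("sol_keys", []):
    report = data[sol]
    temp = report.get("AT", {})       # air temperature, degrees Celsius
    wind = report.get("HWS", {})      # horizontal wind speed, m/s
    pressure = report.get("PRE", {})  # atmospheric pressure, Pa
    print(f"Sol {sol}: temperature {temp.get('mn')} to {temp.get('mx')} C, "
          f"average wind {wind.get('av')} m/s, "
          f"average pressure {pressure.get('av')} Pa")
```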

Who remembers Clippy?

Current AI research tends to go in one of several directions. We have single-purpose devices which aim to do one thing really well, but have no pretensions outside that. They are basically algorithms rather than intelligences per se – they might be good or bad at their allotted task, but they aren’t going to do well at anything else. We have loads of these around these days – predictive text and autocorrect plugins, autopilots, weather forecasts, and so on. From a coding point of view, it is now comparatively easy to include some intelligence in your application, using modular components, and all you have to do is select some suitable training data to set the system up (actually, that little phrase “suitable training data” conceals a multitude of difficulties, but let’s not go into that today).
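As a tiny illustration of just how modular this has become, a handful of lines of scikit-learn will give you a working text classifier. The training data below is invented toy data – and choosing genuinely suitable training data is, of course, the hard part I am glossing over.

```python
# Illustration of how "modular" present-day machine learning has become:
# a few lines of scikit-learn give a working text classifier. The toy
# training data here is invented purely for demonstration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["lovely weather today", "rain and gales all week",
         "sunny spells this afternoon", "storms expected overnight"]
labels = ["good", "bad", "good", "bad"]

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)
print(model.predict(["a bright and sunny morning"]))  # expected: ['good']
```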

Boston Dynamics ‘Atlas’ (Boston Dynamics web site)

Then you get a whole bunch of robots intended to master particular physical tasks, such as car assembly or investigation of burning buildings. Some of these are pretty cute looking, some are seriously impressive in their capabilities, and some have been fashioned to look reasonably humanoid. These – especially the latter group – probably best fit people’s idea of what advanced AI ought to look like. They are also the ones closest to mankind’s long historical enthusiasm for mechanical assistants, dating back at least to Hephaestus, who had a number of automata helping him in his workshop. A contemporary equivalent is Boston Dynamics (originally a spin-off from MIT, later taken over by Google) which has designed and built a number of very impressive robots in this category, and has attracted interest from the US military, while also pursuing civilian programmes.

Amazon Dot – Active

Then there’s another area entirely, which aims to provide two things: a generalised intelligence rather than one targeted on a specific task, and one which does not come attached to any particular physical trappings. This is the arena of the current crop of digital assistants such as Alexa, Siri, Cortana and so on. It’s also the area that I am both interested in and involved in coding for, and provides a direct ancestry for my fictional personas. Slate and the others are, basically, the offspring – several generations removed – of these digital assistants, but with far more autonomy and general cleverness. Right now, digital assistants are tied to cloud-based sources of information to carry out speech recognition. They give the semblance of being self-contained, but actually are not. So as things stand you couldn’t take an Alexa device out to the asteroid belt and hope to have a decent conversation – there would be a minimum of about half an hour between each line of chat, while communication signals made their way back to Earth, were processed, and then returned to Ceres. So quite apart from things like Alexa needing a much better understanding of human emotions and the subtleties of language, we need a whole lot of technical innovations to do with memory and processing.
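The half-hour figure is just light-travel time. Here is the back-of-envelope arithmetic, using a spread of plausible Earth–Ceres distances (the true distance varies as both bodies move around their orbits):

```python
# Back-of-envelope light delay between Earth and Ceres. The distance
# varies between roughly 1.7 and 3.7 AU depending on orbital positions,
# so these figures are illustrative.
AU_KM = 149_597_871          # one astronomical unit in km
LIGHT_SPEED_KM_S = 299_792   # speed of light in km/s

for distance_au in (1.7, 2.7, 3.7):
    one_way_min = distance_au * AU_KM / LIGHT_SPEED_KM_S / 60
    print(f"{distance_au} AU: one-way {one_way_min:.0f} min, "
          f"round trip {2 * one_way_min:.0f} min")
# Even at closest approach, a question-and-answer exchange with a cloud
# service back on Earth takes the best part of half an hour.
```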

As ever, though, I am optimistic about these things. I’ve assumed that we will have personas or their equivalent within about 70 or 80 years from now – far enough away that I probably won’t get to chat with them, but my children might, and my grandchildren will. I don’t subscribe to the theory that says that advanced AIs will be inimical to humankind (in the way popularised by Skynet in the Terminator films, and picked up much more recently in the current Star Trek Discovery series). But that’s a whole big subject, and one to be tackled another day.

Meanwhile, you can enjoy my latest couple of Alexa skills and find out about the weather on Mars or England’s Peak District, while I finish some more skills that are in progress, and also continue to write about their future.

Mars Insight Lander, Artist’s impression (NASA/JPL)

“The eye prefers repetition, the ear prefers variety”

I was at the annual Amazon technical summit here in London last week, and today’s blog post is based on something I heard one of the presenters say. On the whole it was a day of consolidating things already developed, rather than a day of grand new breakthroughs, and I enjoyed myself hearing about enhancements to voice and natural language services, together with an offbeat session on building virtual 3d worlds.

Grid design based on thirds (Interaction Design Foundation)

But I want to focus on one specific idea, contrasting how we build human-computer interfaces quite differently for the eye and the ear. In short, “the eye prefers repetition, the ear prefers variety”. Look at the appearance of your typical app on computer or phone. We have largely standardised where the key elements go – menu, options, title and so on. They are so standardised that we can tell at a glance if something is “in the wrong place”. The text stays the same every time you open it. The icons stay the same, unless they have a little overlay telling you to do something with them. And so on.

Now in the middle of a technical session I just let that statement drift by, but it stuck with me afterwards, and I kept turning it over. Hence this post. At face value it seemed a bit odd – our eyes are constantly bombarded with hugely diverse information from the world around us. But then I started thinking some more. It’s not just to do with the light falling into our eyes, or the biology of how our visual receptors handle that – our image of the world is the end result of a very complex series of processing steps inside our nervous system.

House By Beach – quick sketch

A child’s picture of a face, or a person, is instantly recognisable as such, even though reduced to a few schematic shapes. A sketch artist will make a few straight lines and a curve, and we know we are looking at a house beside a beach, even though there are no colours or textures to help us. The animal kingdom shows us the same thing. Show a toad a horizontal line moving sideways, and it reacts as though it was a worm. Turn the line vertical and move it in the same way, and the toad ignores it (see this Wikipedia article or this video for details). Arrange a dark circle over a mouse and increase its size, and it reacts with fear and aggression, as though something was looming over it (see this article, in the section headed Visual threat cues).

Toad: Mystery Science Theatre 3000

It’s not difficult to see why – if you think you might be somebody’s prey, you react to the first sign of the predator. If you’re wrong, all you’ve lost is some time and adrenalin. If you ignore the first signs and you’re wrong, it’s game over!

So it makes sense that our visual sense, including nervous system as well as eyes, reduces the world to a few key features. We skim over fine detail at first glance, and only really notice it when we need to – when we deliberately turn our attention to it.

Also, there’s something to be learned from how light and sound work differently for us. At a very fundamental level, light adds up to give a single composite result. We mix red and yellow paint to give orange, or red and green light on a computer screen to give yellow. The colour tints, or the light waves, add up to make a single average colour. Not so with sound. Play the note middle C on a keyboard, then start playing the G above it. You end up with a chord – you don’t end up with a single note which is a blend of the two. So adding visual signals, and adding audible ones, give completely different effects.

Finally, the range of what we can perceive is entirely different. The most extreme violet light that we can see has about twice the frequency of the most extreme red. Doubling frequency gives us an octave change, so that means we can see only about one octave of visible light out of the entire spectrum. But a keen listener under ideal circumstances can hear roughly ten octaves of sound, from about 20 Hz up to nearly 20 kHz. Some creatures do a bit better than us in both light and sound detection, but the basic message is the same – we hear a much more varied spectrum than we see.
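The octave comparison is just a base-2 logarithm – here is a quick check of the figures (the frequency ranges are approximate):

```python
# Quick check of the octave comparison: an octave is a doubling of
# frequency, so the number of octaves in a range is log2(high / low).
from math import log2

visible_light = (400e12, 790e12)   # very roughly, red to violet, in Hz
human_hearing = (20, 20_000)       # typical audible range, in Hz

for name, (low, high) in (("sight", visible_light), ("hearing", human_hearing)):
    print(f"{name}: about {log2(high / low):.1f} octaves")
# sight: about 1.0 octaves
# hearing: about 10.0 octaves
```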

Amazon Dot – Active

Now, the technical message behind that speaker’s statement related to Alexa skills. To retain a user’s interest, the skill has to not sound the same every time. The eye prefers repetition, so our phone apps look the same each time we start them. But the ear prefers variety, so our voice skills have to mirror that, and say something a little bit different each time.
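In practice, that can be as simple as keeping several phrasings of each response and picking one at random, so the skill does not open with exactly the same words every time. A minimal sketch – the greetings here are invented, not taken from my published skills:

```python
# Minimal sketch of "the ear prefers variety": keep several phrasings of
# the same response and pick one at random, so the skill doesn't sound
# identical every time it is opened. The phrasings are invented examples.
import random

GREETINGS = [
    "Welcome back to Martian Weather.",
    "Hello again. Here's the latest from Mars.",
    "Good to hear from you. Ready for today's Mars report?",
]

def greeting() -> str:
    return random.choice(GREETINGS)

print(greeting())
```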

I wonder how that applies to writing?

How close are personable AI assistants?

A couple of days ago, a friend sent me an article talking about the present state of the art of chatbots – artificially intelligent assistants, if you like. The article focused on those few bots which are particularly convincing in terms of relationship.

Amazon Dot – Active

Now, as regular readers will know, I quite often talk about the Alexa skills I develop. In fact I have also experimented with chatbots, using both Microsoft’s and Amazon’s frameworks. Both the coding style, and the flow of information and logic, are very similar between these two types of coding, so there’s a natural crossover. Alexa, of course, is predominantly a voice platform, whereas chatbots are more diverse. You can speak to, and listen to, bots, but they are more often encountered as part of a web page or mobile app.

Now, beyond the day job and my coding hobby, I also write fiction about artificially intelligent entities – the personas of Far from the Spaceports and related stories (Timing and the in-progress The Liminal Zone). Although I present these as occurring in the “near-future”, by which I mean vaguely some time in the next century or two, they are substantially more capable than what we have now. There’s a lot of marketing hype about AI, but also a lot of genuine excitement and undoubted advancement.

Far from the Spaceports cover

So, what are the main areas where tomorrow’s personas vastly exceed today’s chatbots?

First and foremost, a wide-ranging awareness of the context of a conversation and a relationship. Alexa skills and chatbots retain a modest amount of information during use, called session attributes, or context, depending on the platform you are using. So if the skill or bot doesn’t track through a series of questions, and remember your previous answers, that’s disappointing. The developer’s decision is not whether it is possible to remember, but rather how much to remember, and how to make appropriate use of it later on.

Equally, some things can be remembered from one session to the next. Previous interactions and choices can be carried over into the next time. Again, the questions are not how, but what should be preserved like this.

But… the volume of data you can carry over is limited – it’s fine for everyday purposes, but not when you get to wanting an intelligent and sympathetic individual to converse with. If this other entity is going to persuade you that it is a real companion, it needs to retain knowledge of a lot more than just some past decisions.
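For the technically minded, here is a hedged sketch of those two kinds of memory as I understand them in the Python ask-sdk – session attributes for the current conversation, persistent attributes saved to a data store for next time. The handler and attribute names are invented, and a persistence adapter (such as the DynamoDB one) has to be configured on the skill builder for the persistent part to work.

```python
# Hedged sketch of the two kinds of memory in an Alexa skill, using the
# Python ask-sdk as I understand it. Attribute and intent names are
# made up; a persistence adapter must be configured for persistent
# attributes to be stored.
from ask_sdk_core.dispatch_components import AbstractRequestHandler
from ask_sdk_core.handler_input import HandlerInput
from ask_sdk_core.utils import is_intent_name
from ask_sdk_model import Response


class FavouriteAuthorIntentHandler(AbstractRequestHandler):
    def can_handle(self, handler_input: HandlerInput) -> bool:
        return is_intent_name("FavouriteAuthorIntent")(handler_input)

    def handle(self, handler_input: HandlerInput) -> Response:
        attrs_mgr = handler_input.attributes_manager

        # Session attributes: survive only until the conversation ends.
        session = attrs_mgr.session_attributes
        session["last_topic"] = "favourite author"

        # Persistent attributes: written to a data store (e.g. DynamoDB)
        # and available again the next time this user opens the skill.
        persistent = attrs_mgr.persistent_attributes
        persistent["favourite_author"] = "William Wordsworth"
        attrs_mgr.save_persistent_attributes()

        speech = "I'll remember that for next time."
        return handler_input.response_builder.speak(speech).response
```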

A suitable cartoon (from xkcd.com)

Secondly, a real conversational partner does other things with their time outside of the chat specifically between the two of you. They might tell you about places, people, or things they had seen, or ideas that had occurred to them in the meantime. But currently, almost all skills and chatbots stay entirely dormant until you invoke them. In between times they do essentially nothing. I’m not counting cases where the same skill is activated by different people – “your” instance, meaning the one that holds any record of your personal interactions, simply waits for you to get involved again. The lack of any sense of independent life is a real drawback. Sure, Alexa can give you a “fact of the day” when you say hello, but we all know that this is just fished out of an internet list somewhere, and does not represent actual independent existence and experience.

Finally (for today – there are lots of other things that might be said) today’s skills and bots have a narrow focus. They can typically assist with just one task, or a cluster of closely related tasks. Indeed, at the current state of the art this is almost essential. The algorithms that seek to understand speech can only cope with a limited and quite structured set of options. If you write some code that tries to offer too wide a spectrum of choice, the chances are that the number of misunderstandings gets unacceptably high. To give the impression of talking with a real individual, the success rate needs to be pretty high, and the entity needs to have some way of clarifying and homing in on what it was that you really wanted.

Now, I’m quite optimistic about all this. The capabilities of AI systems have grown dramatically over the last few years, especially in the areas of voice comprehension and production. My own feeling is that some of the above problems are simply software ones, which will get solved with a bit more experience and effort. But others will probably need a creative rethink. I don’t imagine that I will be talking to a persona at Slate’s level in my lifetime, but I do think that I will be having much more interesting conversations with one before too long!

Alexa and William Wordsworth

Amazon Dot – Active

Well, a couple of weeks have passed and it’s time to get back to blogging. And for this week, here is the Alexa post that I mentioned a little while ago, back in December last year.

First, to anticipate a later part of this post, here is the extract of Alexa reciting the first few lines of Wordsworth’s Daffodils…

It has been a busy time for Alexa generally – Amazon have extended sales of various hardware gizmos to many other countries. That’s well and good for everyone: the bonus for us developers is that they have also extended the range of countries into which custom skills can be deployed. Sometimes with these expansions Amazon helpfully does a direct port to the new locale, and other times it’s up to the developer to do this by hand. So when skills appeared in India, everything I had done to that date was copied across automatically, without me having to do my own duplication of code. From Monday Jan 8th the process of generating default versions for Australia and New Zealand will begin. And Canada is also now in view. Of course, that still leaves plenty of future catch-up work, firstly making sure that the transfer process worked OK, and secondly filling in the gaps for combinations of locale and skill which didn’t get done. The full list of languages and countries to which skills can be deployed is now:

  • English (UK)
  • English (US)
  • English (Canada)
  • English (Australia / New Zealand)
  • English (India)
  • German
  • Japanese

The world, Robinson projection (Wiki)

Based on progress so far, Amazon will simply continue extending this to other combinations over time. I suspect that French Canadian will be quite high on their list, and probably other European languages – for example Spanish would give a very good international reach into Latin America. Hindi would be a good choice, and Chinese too, presupposing that Amazon start to market Alexa devices there. Currently an existing Echo or Dot will work in China if hooked up to a network, but so far as I know the gadgets are not on sale there – instead several Chinese firms have begun producing their own equivalents. Of course, there’s nothing to stop someone in another country accessing the skill in one or other of the above languages – for example a Dutch person might consider using either the English (UK) or German option.

To date I have not attempted porting any skills to German or Japanese, essentially through lack of the necessary language skills. But all of the various English variants are comparatively easy to adapt to, with an interesting twist that I’ll get to later.

Wordsworth Facts Web Icon

So my latest skill out of the stable, so to speak, is Wordsworth Facts. It has two parts – a small list of facts about the life of William Wordsworth, his family, and some of his colleagues, and also some narrated portions from his poems. Both sections will increase over time as I add to them. It was interesting, and a measure of how text-to-speech technology is improving all the time, to see how few tweaks were necessary to get Alexa to read these extracts tolerably well. Reading poetry is harder than reading prose, and I was expecting difficulties. The choice of Wordsworth helped here, as his poetry is very like prose (indeed, he was criticised for this at the time). As things turned out, some additional punctuation was needed to get the extracts sounding reasonably good, but that was all. Unlike some of the previous reading portions I have done, there was no need to tinker with phonetic alphabets to get words sounding right. It certainly helps not to have ancient Egyptian, Canaanite, or futuristic names in the mix!

And this brings me to one of the twists in the internationalisation of skills. The same letter can sound rather different in different versions of English when used in a word – you say tomehto and I say tomarto, and all that. And I necessarily have to dive into custom pronunciations of proper names of characters and such like – Damariel gets a bit messed up, and even Mitnash, which I had assumed would be easily interpreted, gets mangled. So part of the checking process will be to make sure that where I have used a custom phonetic version of someone’s name, it comes out right.
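The fix for mangled names lives in the speech markup rather than the code logic: the text a skill sends back can be SSML, and a phoneme tag spells out how a made-up name should sound. The IPA string below is only my guess at a pronunciation for Damariel, purely for illustration.

```python
# The speech a skill returns can be SSML rather than plain text, and the
# <phoneme> tag pins down how an invented name should be spoken. The IPA
# string here is only an illustrative guess at "Damariel".
SSML = (
    "<speak>"
    'The voyage of <phoneme alphabet="ipa" ph="dəˈmɑːriɛl">Damariel</phoneme> '
    "continues in the next extract."
    "</speak>"
)

response_fragment = {
    "outputSpeech": {"type": "SSML", "ssml": SSML}
}
print(response_fragment["outputSpeech"]["ssml"])
```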

Wordsworth Facts is live across all of the English variants listed above – just search in your local Amazon store in the Alexa Skills section by name (or, to see all my skills to date, search for “DataScenes Development”, which is the identity I use for coding purposes). If you’re looking at the UK Alexa Skills store, this is the link.

The next skill I am planning to go live with, probably in the next couple of weeks, is Polly Reads. Those who read this blog regularly – or indeed the Before The Second Sleep blog (see this link, or this, or this) – may well think of Polly as Alexa’s big sister. Polly can use multiple different voices and languages rather than a fixed one, though Polly is focused on generating spoken speech rather than interpreting what a user might be saying (the module in Amazon’s suite that does the comprehension bit is called Lex). So Polly Reads is a compendium of all the various book readings I have set up using Polly, onto which I’ll add a few of my own author readings where I haven’t yet set Polly up with the necessary text and voice combinations. The skill is kind of like a playlist, or maybe a podcast, and naturally my plan is to extend the set of readings over time. More news of that will be posted before the end of the month, all being well.
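Behind the scenes, asking Polly for a reading is a single API call, something like the boto3 sketch below. The voice and the text are illustrative; in practice each extract gets its own voice, language and SSML tweaks.

```python
# Rough sketch of asking Polly for a reading via boto3. The voice and
# text are illustrative choices, not the settings used for the skill.
import boto3

polly = boto3.client("polly")

result = polly.synthesize_speech(
    Text="I wandered lonely as a cloud that floats on high o'er vales and hills.",
    OutputFormat="mp3",
    VoiceId="Brian",   # one of Polly's British English voices
)

with open("daffodils_extract.mp3", "wb") as out:
    out.write(result["AudioStream"].read())
```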

Kayak logo (from https://www.kayakonline.info/)

The process exposed a couple of areas where I would really like Amazon to enhance the audio capabilities of Alexa. The first was when using the built-in ability to access music (i.e. not my own custom skill). Compared to a lot of Alexa interaction, this feels very clunky – there is no easy way to narrow in on a particular band, for example – “The band is Dutch and they play prog rock but I can’t remember the name” could credibly come up with Kayak, but doesn’t. There’s no search facility built in to the music service. And you have to get the track name pretty much dead on – “Alexa, play The Last Farewell by Billy Boyd” gets you nowhere except for an “I can’t find that” message, since it is called “The Last Goodbye”. A bit more contextual searching would be good. Basically, this boils down to a shortfall in what technically we call context, and what in a person would be short-term memory – the coder of a skill has to decide exactly what snippets of information to remember from the interaction so far, and anything which is not explicitly remembered will be discarded.

That was a user-moan. The second is more of a developer-moan. Playing audio tracks of more than a few seconds – like a book extract, or a decent length piece of music – involves transferring control from your own skill to Alexa, who then manages the sequencing of tracks and all that. That’s all very well, and I understand the purpose behind it, but it also means that you have lost some control over the presentation of the skill as the various tracks play. For example, on the new Echo Show (the one with the screen) you cannot interleave the tracks with relevant pictures – like a book cover, for example. Basically the two bits of capability don’t work very well together. Of course all these things are very new, but it would be great to see some better integration between the different pieces of the jigsaw. Hopefully this will be improved with time…
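For anyone wondering what that handover looks like, the skill’s response simply includes an AudioPlayer.Play directive and Alexa takes over the stream from there. Here is a sketch of the shape of it, with a placeholder URL and token:

```python
# Sketch of the handover for long audio: the skill's response carries an
# AudioPlayer.Play directive, and Alexa itself then manages playback.
# The URL and token below are placeholders.
play_response = {
    "version": "1.0",
    "response": {
        "outputSpeech": {"type": "PlainText", "text": "Here is the next reading."},
        "directives": [
            {
                "type": "AudioPlayer.Play",
                "playBehavior": "REPLACE_ALL",
                "audioItem": {
                    "stream": {
                        "url": "https://example.com/readings/extract-01.mp3",
                        "token": "extract-01",
                        "offsetInMilliseconds": 0,
                    }
                },
            }
        ],
        "shouldEndSession": True,
    },
}
```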

That’s it for now – back to reading and writing…

Future Possibilities 3

Today is the third and last post based loosely on upcoming techie stuff I learned about at the recent Microsoft Future Decoded conference here in London. It’s another speculative one this time, focusing on quantum computing, which according to estimates by speakers might be about five years away. But a lot has to happen if that five year figure is at all accurate.

Quantum device – schematic (Microsoft.com)

It’s a very technical area, both as regards the underlying maths and the physical implementation, and I don’t intend going far into that. Many groups around the world, both in industry and academia, are actively working on this, hoping to crack both theory and practice. So what’s the deal? Why all the effort?

Conventional computers, of the kind we are familiar with, operate essentially in a linear sequential way. Now, there are ways to fudge this and give a semblance of parallel working. Even on a domestic machine you can run lots of programs at the same time, but at the level of a single computing core you are still performing one thing at a time, and some clever scheduling shares resources between several in-progress tasks. A bigger computer will wire up multiple processors and have vastly more elaborate scheduling, to make the most efficient use of what it’s got. But at the end of the day, present-day logic circuits do one thing at a time.

This puts some tasks out of reach. For example, the security layer that protects your online banking transactions (and such like) relies on a complex mathematical problem, which takes an extremely long time to solve. In theory it could be done, but in practice it is impenetrable. Perhaps more interestingly, there are problems in all the sciences which are intractable not only with present-day systems, but also including any credible speed advances using present-day architecture. It actually doesn’t take much complexity to render the task impossible.
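To get a feel for “possible in theory, impenetrable in practice”, consider simply trying every possible 128-bit key at an optimistic trillion guesses per second. The public-key maths that actually protects banking transactions is different (it rests on factoring very large numbers), but the flavour of the arithmetic is the same:

```python
# Rough feel for "possible in theory, impenetrable in practice":
# brute-forcing a 128-bit key one guess at a time.
SECONDS_PER_YEAR = 60 * 60 * 24 * 365
guesses = 2 ** 128                 # every possible 128-bit key
rate = 10 ** 12                    # a generous trillion guesses per second

years = guesses / rate / SECONDS_PER_YEAR
print(f"about {years:.1e} years")  # roughly 1e19 years - vastly longer
                                   # than the age of the universe
```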

Probability models for a water molecule with different energy levels – the atoms are not at fixed places but smeared out over a wider volume (Stony Brook University)

Quantum computing offers a way to actually achieve parallel processing on a massive scale. It relies not on binary true/false logic, but on the probability models which are the foundation of the quantum world. It is as though many different variations of a problem all run simultaneously, each (as it were) in its own little world. It’s a perfect fit for all kinds of problems where you would like to find an optimal solution to a complex situation. So to break our online security systems, a quantum computer would pursue many different cracking routes simultaneously. Done that way, the task becomes solvable. And yes, that is going to need a rethink of how we do internet security. But for today let’s look at a couple of more interesting problems.

Root nodules on a broad bean (Wikipedia)

First, there’s one from farming, or biochemistry if you prefer. To feed the world, we need lots of nitrogen to make fertiliser. The chemical process to do this commercially is energy-intensive, and nearly 2% of the world’s power goes on this one thing. But… there is a family of plants, the Leguminosae, which fix nitrogen from the air into the soil – with the help of symbiotic bacteria in their root nodules – using nothing more than sunlight and the organic molecules in their roots. They are very varied, from peas and beans down to fodder crops like clover, and up to quite sizeable trees. We don’t yet know exactly how this nitrogen fixing works. We think we know the key biochemical involved, but it’s complicated… too complicated for our best supercomputers to analyse. A quantum computer might solve the problem in short order.

Climate science is another case. There are several computer programs which aim to model what is going on globally. They are fearfully complicated, aiming to include as wide a range as possible of contributing factors, together with their mutual interaction. Once again, the problem is too complicated to solve in a realistic time. So, naturally, each group working on this makes what they regard as appropriate simplifications and approximations. A quantum computer would certainly allow for more factors to be integrated, and would also allow more exploration of the consequences of one action rather than another. We could experiment with what-if models, and find effective ways to deploy limited resources.

Bonding measurement wires to a quantum device (Microsoft.com)

So that’s a little of what might be achieved with a quantum computer. To finish this blog post off, what impact might one have on science fiction, and my own writing in particular? Well, unlike the previous two weeks, my answer here would be “not very much, I think”. Most writers, including myself, simply assume that future computers will be more powerful, more capable, than those of today. The exact technical architecture is of less literary importance! Right now it looks as if a quantum computer will only work at extremely low temperatures, not far above absolute zero. So you are talking about sizeable, static installations. If we manage to find or make the necessary materials so that they could run at room temperature, that could change, but that’s way more than five years away.

Far from the Spaceports cover

So in my stories, Slate would not be a quantum computer, just a regular one running some very sophisticated software. Now, the main information hub down in London, Khufu, could possibly be such a thing – certainly he’s a better candidate, sitting statically in one place, processing and analysing vast quantities of data, making connections between facts that aren’t at all obvious on the surface. But as regards the story, it hardly matters whether he is one or the other.

So, interested as I am in the development of a quantum computer, I don’t think it will feature in an important way in the world of Far from the Spaceports!

That’s it for today, and indeed for this little series… until next year.