IBM Sapphire – Dialog System Combining DL, DRL, NLP, Signal Processing and More

I was the lead developer of IBM Sapphire, a project I was invited to join by IBM’s Speech CTO and Fellow David Nahamoo. Here is a screenshot of the main system: We built a dialog system for advising, along with the underlying data collection pipeline. Since this was done on actual advising sessions, great care had to be taken to encrypt everything and to allow for privacy-preserving data annotation. The screenshot shows an example that used our neural natural language interface to databases (NLIDB) to translate a natural language query into SQL, which in this case answers which courses provide three credits. We render the results table and also show the generated SQL for verification during development. However, there were many more components, as I will show shortly.
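To make the NLIDB idea concrete, here is a minimal sketch of the query-to-SQL round trip. It is not the Sapphire code: the `courses` table, its rows, and the `translate_to_sql` function are all hypothetical, and the function is a hard-coded stand-in for the neural model. It only illustrates what "translate the question, run the SQL, show both the SQL and the result table" looks like.

```python
import sqlite3

# Hypothetical schema and rows for illustration only; the real database is not shown here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE courses (code TEXT PRIMARY KEY, title TEXT, credits INTEGER)")
conn.executemany(
    "INSERT INTO courses VALUES (?, ?, ?)",
    [
        ("CS101", "Intro to Programming", 4),
        ("CS220", "Databases", 3),
        ("MA150", "Linear Algebra", 3),
    ],
)


def translate_to_sql(question: str) -> str:
    """Stand-in for the neural NLIDB: returns a canned query for one known question."""
    canned = {
        "which courses provide three credits?":
            "SELECT code, title FROM courses WHERE credits = 3 ORDER BY code",
    }
    return canned[question.strip().lower()]


def answer(question: str):
    """Translate the question, execute the SQL, and return both for display."""
    sql = translate_to_sql(question)
    rows = conn.execute(sql).fetchall()
    return sql, rows


sql, rows = answer("Which courses provide three credits?")
print(sql)   # shown alongside the results so the translation can be checked
for row in rows:
    print(row)
```

Returning the SQL together with the rows mirrors the development setup described above, where a wrong translation is easier to spot in the query than in the result table.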

Here is a screenshot of the kiosk application we installed in the advising office, because students had to be able to log into the system and give their consent to be recorded. It is a boring application, but it does more than is apparent – for instance, it schedules the recording appointments, sends out reminders and logs exactly what the student consented to in cloud spreadsheets, which it authenticates against via OAuth 2.0.

Here is a screenshot of the recording application, which doesn’t look like much, but it had to check the consent status, perform a quick sound check and record the conversation via omnidirectional microphones (while allowing the advisor to pause recording in case the student wanted to discuss something too sensitive for recording). It then encrypted the recording and uploaded it to our servers, where we postprocessed it (e.g. voice activity detection to verify the recording worked, removal of personally identifiable information (PII) and afterwards annotation with our in-house developed annotation pipelines). There are interesting edge cases involved as well – for instance, conversations are too valuable to be lost, and thus if the upload is not possible, we have to encrypt the conversation in place so it can be picked up later. I forgot the exact implementation, but I think we generated a random symmetric key for that (for speed) and then encrypted that key asymmetrically, so that the key needed to decrypt the recording does not reside on the same machine.
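Since I no longer remember the exact implementation, the following is only a sketch of the general pattern (often called hybrid or envelope encryption), written with the third-party Python `cryptography` package. The function names, the choice of Fernet for the symmetric part and RSA-OAEP for wrapping the key are my assumptions for illustration, not what the recorder necessarily used.

```python
from cryptography.fernet import Fernet
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa


def _oaep():
    # OAEP padding parameters shared by key wrapping and unwrapping.
    return padding.OAEP(
        mgf=padding.MGF1(algorithm=hashes.SHA256()),
        algorithm=hashes.SHA256(),
        label=None,
    )


def encrypt_recording(recording: bytes, server_public_key):
    """Encrypt locally with a fresh symmetric key, then wrap that key with the public key."""
    session_key = Fernet.generate_key()                # random symmetric key (fast on large data)
    ciphertext = Fernet(session_key).encrypt(recording)
    wrapped_key = server_public_key.encrypt(session_key, _oaep())
    return wrapped_key, ciphertext                     # the plain session key is not stored


def decrypt_recording(wrapped_key: bytes, ciphertext: bytes, server_private_key) -> bytes:
    """Runs on the server side only, where the private key lives."""
    session_key = server_private_key.decrypt(wrapped_key, _oaep())
    return Fernet(session_key).decrypt(ciphertext)


# Demo with a throwaway key pair; in practice only the public key would sit on the recorder.
server_private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
server_public_key = server_private_key.public_key()

wrapped, blob = encrypt_recording(b"fake audio bytes", server_public_key)
restored = decrypt_recording(wrapped, blob, server_private_key)
```

The point of the split is that the recording machine only ever holds the public key: it can produce encrypted files while offline, but a stolen machine cannot be used to read them.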

But enough GUIs. Here is a rough sketch of the system architecture. Essentially, we have at least one GUI that talks to the dialog manager, which decides what to say next via POMDPs or end-to-end training. For that, it leverages NLP services such as semantic parsing (CAMR), dialog act classification and sentiment analysis. Another core component is that the dialog manager can forward natural language queries to an ensemble of two natural language interfaces to databases, which turn the natural language query into a database query. In turn, they might use helper services like word embeddings or an HTML generator. We also have an API gateway that connects our newer system to services we had built before, like summarization, course selection and natural language generation.
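For readers unfamiliar with POMDP-based dialog management, the core step is a belief update: the manager keeps a probability distribution over what the user wants and revises it after every noisy observation from the NLP services. The sketch below shows the textbook update rule only. The states, the action, the observation labels and all probabilities are made up for illustration and are not taken from Sapphire.

```python
def belief_update(belief, action, observation, transition, observation_model):
    """Standard POMDP update: b'(s') is proportional to O(o | s') * sum_s T(s' | s, a) * b(s)."""
    unnormalized = {}
    for next_state in belief:
        # Prediction step: where could the user goal be after the system action?
        predicted = sum(
            transition[(state, action)][next_state] * belief[state] for state in belief
        )
        # Correction step: weight by how likely the observation is in that state.
        unnormalized[next_state] = observation_model[next_state][observation] * predicted
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("observation has zero probability under the model")
    return {state: p / total for state, p in unnormalized.items()}


# Hypothetical two-goal example: the user goal does not change when the system asks.
states = ["wants_course_info", "wants_schedule"]
transition = {
    (s, "ask_goal"): {t: 1.0 if s == t else 0.0 for t in states} for s in states
}
# Hypothetical noisy classifier output given the true user goal.
observation_model = {
    "wants_course_info": {"course": 0.8, "schedule": 0.2},
    "wants_schedule": {"course": 0.3, "schedule": 0.7},
}

belief = {"wants_course_info": 0.5, "wants_schedule": 0.5}
new_belief = belief_update(belief, "ask_goal", "course", transition, observation_model)
print(new_belief)
```

A policy then picks the next system action from the updated belief rather than from a single best guess, which is what makes this approach tolerant of recognition errors.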

Besides this dialog core there is this bigger ecosystem that also involves various data annotation web applications, the GUIs, scheduling, cloud services etc. There are more components than depicted – for instance, we also had a named entity recognition ensemble.

I also worked with various researchers to build mini demos like this one which takes a sentence, executes semantic parsing and visualizes the result as a graph. This was useful to debug individual components and demo small things.

Here is a rough entity relationship diagram (ERD) to get an idea of what kind of data we are talking about. When I get around to writing something about IBM Verdi (which I already have gotten permission for), I will reiterate how helpful a proper business object model can be for AI components.

Finally, I conducted many fun side experiments like building a knowledge graph of courses – this particular one was GRAKN-based, but JanusGraph or Blazegraph would have worked just as well. [Or probably ArangoDB, which I urgently want to take a closer look at.]

Ultimately, IBM Sapphire was a fascinating project, since it meant collaboration with six chairs – deep learning, reinforcement learning, natural language processing, systems, data annotation and signal processing. The project yielded over 20 papers at top conferences with over 700 citations, which I am very proud of. It also directly led to me getting hired into IBM Research AI Cambridge.

