At Bouvet we are several hundred technologists who are passionate about programming and designing good digital solutions. In this blog we explore technology and share what we find with you.

Voice Controlling a Robot using Arduino, node.js, MQTT, WebSockets, Johnny-Five and HTML5 Speech Recognition!

Read on to find out how I created an Internet of Things prototype!


I’ve added a Raspberry PI to my Robot!  Read this blog for more information: Unleash the Potential of your Internet of Things Project by Combining Arduino and Raspberry PI!


If you’ve read my recent blogs, you’ll know that I’ve been playing around both with speech recognition and using JavaScript to control Arduino Robots using the Johnny-Five library.  In this blog I will show you how I’ve also added MQTT and WebSockets to the mix in order to voice control an Arduino based Robot.

Before I delve into the technical details, let’s have a look at the Robot in action!

This prototype was hacked together over the course of a few evenings, despite me having no previous experience with any of the technologies involved.  It’s very simple, with fewer than 200 lines of code (excluding comments).  So how does it work?

Solution Overview

This High Level overview shows how the basic components interact.

High Level Component Overview

The code for this solution is available on GitHub, under a Creative Commons Attribution 4.0 International License.

Let’s take a closer look at each of those components.


The arduino_speech.html component is an HTML / JavaScript web page that converts speech to text using HTML5 Speech Recognition, which I’ve previously blogged about.  One can configure the HTML5 Speech Recognition to behave in different ways.  After some experimentation I found that the following combination yielded the quickest results:

A side effect of enabling interim results was that I needed to ensure that duplicate results were ignored.  See the following screen shot of the web page for an example of duplicate results:

Latest result at top, oldest at bottom. Interim results in green text, final results in white.

In the above screen shot you can see that a single spoken command «go» resulted in 3 results being returned from the HTML5 Speech Recognition.  In this case we only needed to send the «go» command to the Robot once, hence the need for ignoring duplicate results.
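One way such duplicate filtering could work is sketched below. This is an illustration rather than the code from the GitHub repository: the `createDuplicateFilter` helper and its rule (forward a phrase once, then ignore repeats until a final result closes the utterance) are assumptions, as is the split of the three results into two interim and one final.

```javascript
// Hypothetical helper: decides whether a recognised phrase should be forwarded.
// Within one utterance the recogniser may report the same text several times
// (interim results followed by a final result), so repeats are dropped.
function createDuplicateFilter() {
  let lastSeen = null; // last phrase seen in the current utterance

  return function shouldSend(transcript, isFinal) {
    const command = transcript.trim().toLowerCase();
    const isDuplicate = command === lastSeen;

    // A final result closes the utterance, so the same word may be sent again later.
    lastSeen = isFinal ? null : command;

    return command !== "" && !isDuplicate;
  };
}

// Replaying three results for a single spoken "go" (assumed: two interim, one final).
const shouldSend = createDuplicateFilter();
const results = [
  { transcript: "go", isFinal: false },
  { transcript: " go", isFinal: false },
  { transcript: "go", isFinal: true },
];
const sent = results.filter((r) => shouldSend(r.transcript, r.isFinal));
console.log(sent.length); // 1
```

In the web page, `shouldSend` would be called from the recogniser's `onresult` handler with each result's `transcript` and `isFinal` values. Because a final result resets the filter, saying «go» again later is treated as a new command.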

When controlling a moving object with voice commands it is also important that commands are processed and passed on to the Robot in real time.  To facilitate this I used WebSockets to return the parsed text commands to my web server.  This can be achieved in only a few lines of code using socket.io.
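A minimal sketch of the sending side follows. It is not the project's actual code: the `command` event name, the payload shape and the `createCommandSender` helper are all hypothetical, and a stand-in object replaces the real socket so the example runs outside a browser.

```javascript
// Hypothetical event name; the original project may use a different one.
const COMMAND_EVENT = "command";

// Wraps any object that has a socket.io-style emit(eventName, payload) method.
function createCommandSender(socket) {
  return function sendCommand(text) {
    socket.emit(COMMAND_EVENT, { text: text, sentAt: Date.now() });
  };
}

// Stand-in for a socket.io client socket, used only to make the sketch runnable.
const fakeSocket = {
  emitted: [],
  emit(eventName, payload) {
    this.emitted.push({ eventName, payload });
  },
};

const sendCommand = createCommandSender(fakeSocket);
sendCommand("go");
console.log(fakeSocket.emitted[0].eventName, fakeSocket.emitted[0].payload.text); // command go
```

In the browser, with the socket.io client script loaded, the socket returned by `io()` would take the place of `fakeSocket`.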


This JavaScript file represents a node.js process that performs three tasks:

1. It runs a simple WebServer (using the node.js express.js web application framework) that serves up the arduino_speech.html webpage over the HTTPS Protocol.  The main reason for using HTTPS is to avoid constant « Wants to use your microphone» messages, which are an annoyance more than anything else.

2. It listens for WebSocket events emitted by the arduino_speech.html webpage, using the node.js socket.io library.  See below for how this is done in the code.

3. It converts the incoming WebSocket events to messages and publishes them to the MQTT Broker using the node.js mqtt.js library.
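Tasks 2 and 3 could be sketched roughly as follows. This is an assumption-laden illustration, not the file from the repository: `bridgeSocketToMqtt`, the event name and the topic are hypothetical, and the function only relies on the `on` method of a socket.io server and socket and on the `publish(topic, message)` method of an mqtt.js client.

```javascript
// Forwards every command event received over a WebSocket to an MQTT topic.
// `io` is expected to behave like a socket.io server and `mqttClient` like an
// mqtt.js client; the event name and topic are placeholders chosen by the caller.
function bridgeSocketToMqtt(io, mqttClient, options) {
  const { eventName, topic } = options;

  io.on("connection", (socket) => {
    socket.on(eventName, (payload) => {
      // Accept either a bare string or an object with a `text` field.
      const text = typeof payload === "string" ? payload : payload && payload.text;
      if (typeof text !== "string" || text.trim() === "") {
        return; // ignore malformed or empty events
      }
      mqttClient.publish(topic, text.trim().toLowerCase());
    });
  });
}

// Assumed wiring with the real libraries (not executed here):
//   const io = require("socket.io")(httpsServer);
//   const mqttClient = require("mqtt").connect("mqtt://broker.example");
//   bridgeSocketToMqtt(io, mqttClient, { eventName: "command", topic: "robot/commands" });
```

Passing the server and client in as arguments keeps the forwarding logic independent of the HTTPS setup, so it can be exercised with simple stand-in objects before a real broker is involved.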

The MQTT Broker

The MQTT Broker is the special sauce that connects the speech recognition part of my solution to the Robot!

MQTT (MQ Telemetry Transport) is an extremely lightweight pub / sub messaging transport which is often used in the «Machine-to-machine» / «Internet of Things» space.  My colleague Simen Sommerfeldt has written a very informative article about MQTT.

For my project I decided to try a public MQTT Broker rather than downloading, installing and configuring one locally.  A quick Google search turned up a publicly available broker that was both free and didn’t require any kind of signup.  A quick test showed that this broker worked well enough to support my prototype.

As mentioned earlier in this blog I used the node.js mqtt.js library, which makes it extremely easy to implement JavaScript based MQTT clients in just a few lines of code.


The zumo_controller.js file represents a node.js process that does the following:

1. It subscribes to the MQTT Broker (via the node.js mqtt.js library) for the text commands originally generated by the arduino_speech.html web page.

2. It forwards these commands on to the Zumo Robot (via the Johnny-Five node.js library).  Johnny-Five controls the Robot via the Firmata Protocol, over a Bluetooth connection.  Read my previous blog about Johnny-Five to find out more about how this works.
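The two steps above could look something like the sketch below. It is a guess at the shape of such a controller, not the contents of zumo_controller.js: only «go» appears in this post, so the rest of the command vocabulary, the speed value, the topic and the `attachRobotController` helper are hypothetical. The motor objects are assumed to expose `forward(speed)`, `reverse(speed)` and `stop()`, as Johnny-Five's `Motor` class does.

```javascript
const SPEED = 150; // assumed speed on Johnny-Five's 0-255 scale

// Hypothetical command vocabulary mapped to motor actions.
const ACTIONS = new Map([
  ["go", (left, right) => { left.forward(SPEED); right.forward(SPEED); }],
  ["back", (left, right) => { left.reverse(SPEED); right.reverse(SPEED); }],
  ["left", (left, right) => { left.reverse(SPEED); right.forward(SPEED); }],
  ["right", (left, right) => { left.forward(SPEED); right.reverse(SPEED); }],
  ["stop", (left, right) => { left.stop(); right.stop(); }],
]);

// Subscribes an mqtt.js-style client to a topic and drives two motors
// according to the text commands that arrive on it.
function attachRobotController(mqttClient, topic, leftMotor, rightMotor) {
  mqttClient.subscribe(topic);
  mqttClient.on("message", (receivedTopic, message) => {
    if (receivedTopic !== topic) {
      return; // not a robot command
    }
    // mqtt.js delivers the payload as a Buffer, so convert it to text first.
    const command = message.toString().trim().toLowerCase();
    const action = ACTIONS.get(command);
    if (action) {
      action(leftMotor, rightMotor); // unknown commands are silently ignored
    }
  });
}
```

With the real libraries, `attachRobotController` would be called inside Johnny-Five's board `ready` handler, once the two motors have been created.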

Pololu Zumo Robot for Arduino 

The Pololu Zumo is a pre-assembled Robot, with an Arduino Uno as its brain.  I chose a prebuilt Robot as I wanted to focus upon the coding side of things, rather than building my own Robot from scratch.  I had a Zumo lying around the office (a leftover from a hackathon that I had previously organised) so the choice was made.

For this project I’ve installed the Firmata Protocol on the Arduino Uno and added a JY-MCU Bluetooth Module to facilitate wireless communication with the zumo_controller.js component.  Note that this project should work with any Arduino based Robot, although you may have to tweak the zumo_controller.js file a little, based on your Robot’s configuration.

Future Plans

The code works, but it could probably be better structured.  I’d love to have a code review with an experienced noder or JavaScript guru.  Contact me if you are interested in helping me out with this!

I’m also going to add some more sensors to the robot, most likely an infrared sensor for obstacle detection in addition to some kind of sensor (e.g. temperature) that sends readings back to the web page.

Other than that, I think it’s time to build my own Robot from the bottom up 🙂

If you have any other ideas for improvements to this project, let me know!


I started this project with no experience of the various technologies and components.

The decision to use Johnny-Five was driven by both a desire to learn JavaScript, and my choice of hardware.  Johnny-Five led me to node.js, which has formed the backbone of my solution.  Node.js is great for prototyping – there are so many libraries available that you can pretty much do anything you want!  In fact using node.js has in turn made it easy for me to get started with both MQTT and WebSockets.  And of course in the process I have also learnt a good deal about both JavaScript and the npm tool!

One potential drawback of node.js is the sheer size of its ecosystem.  I would be careful about using some of the more exotic libraries in a production system as there is no way of guaranteeing that these libraries will continue to be actively maintained.

The Arduino ecosystem provides a cheap and easy way to get started with Robots and Internet of Things solutions.  Not only is there a wide range of microcontrollers, but there are also literally thousands of sensors and actuators to choose from.  There is also a mature and well established community to turn to if you have questions.

The HTML Speech Recognition part of the solution is a lot of fun to play with and easy to set up, but at the time of writing it is only supported by recent releases of Google Chrome.

To summarise, I would argue that this type of hobby project is well worth your time.  Not only is it a lot of fun, but you can pick up tools and knowledge for your next work project!

I hope that this blog has been interesting – feel free to post comments and feedback below!  It’s always nice to know that someone is actually reading this stuff!

Thanks for reading!

10 comments on “Voice Controlling a Robot using Arduino, node.js, MQTT, WebSockets, Johnny-Five and HTML5 Speech Recognition!”

  1. I like your post, reading the todo list on github made me giggle –

    8. «Pass the Turing Test.»
    9. «Open the Pod Bay doors Hall»



  2. I was looking through your GitHub page for this project. It seems to be missing the socket.io code. Do you either have the full code for me or a tutorial or example you can point me to?


      1. How do I include/install websockets for mqtt.js? The above code does not work and I know it has to do with websockets not being used.

        1. I’m not really sure what it is you are trying to do. Do you want to run my code, or are you looking for a solution to another problem?

          Version 1 of the Robot (see the link in my previous comment) used *socket.io* for communication between the Web Browser (arduino_speech.html) and the Web Server (i.e. web_server_HTTP.js). The Web Server then forwarded commands to a MQTT instance for further processing.

          Version 2 of the Robot (https://github.com/markwest1972/voice-controlled-zumo) has dropped the socket.io library. Instead commands to MQTT are sent directly from the browser. This is achieved by using a browserified version of mqtt.js (see https://www.npmjs.com/package/mqtt#browserify for more information about this).

          Note that this specific blog post refers to Version 1, not Version 2.

          1. Thanks Mark. I am trying to get your code running on my pi to understand how it works more, so then I can work on a home automation solution for myself.

            You have made the difference between the two versions of your code clearer. I can now work on it and make it work on my pi.

  3. This is awesome! Voice control for everything 🙂
    If you’re into using voice control with Node you might want to check out my speech recognition library. It’s got offline hotword detection and streaming recognition via Google Cloud Speech (similar to what’s used for webkitSpeechRecognition).

