DIY voice assistant part 4: Internet radio and other (bad) jokes

In the last part of this series, we enabled the voice assistant to read temperature sensors and check the weather over the internet. In this article, we will take care of entertainment by implementing a DIY internet radio and an API for bad jokes.

Radio streams under Linux

A commonly used function of commercial voice assistants is playing music and internet radio - of course this feature is a must for the DIY variant (at least to increase the WAF). Rhasspy does not offer a feature like this, but that's not a problem - as we're using Node-RED, we have plenty of ways to extend it. For example, we can conveniently control additional containers.

For consuming internet radio under Linux, there are plenty of tools - one of these tools is the versatile player mplayer. When passing a URL, the tool usually immediately plays the appropriate station:

$ mplayer http://rbb-fritz-live.cast.addradio.de/rbb/fritz/live/mp3/128/stream.mp3

Some stations hide the actual MP3/AAC URL in a playlist (*.m3u, *.pls) - sometimes mplayer is not able to handle this and aborts. In this case, curl might help to find out the URL:

$ curl http://streams.ffh.de/radioffh/mp3/hqlivestream.m3u
http://mp3.ffh.de/radioffh/hqlivestream.mp3
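
The same lookup can be scripted. The following is a minimal Python sketch, not part of the original setup: it assumes the playlist is a plain-text M3U or PLS file, and the function names extract_stream_urls and fetch_playlist are made up for illustration.

```python
import urllib.request


def extract_stream_urls(playlist_text: str) -> list[str]:
    """Return the stream URLs found in M3U or PLS playlist content."""
    urls = []
    for raw_line in playlist_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines, M3U comments/directives and the PLS section header.
        if not line or line.startswith("#") or line.startswith("["):
            continue
        # PLS entries look like "File1=http://..."; M3U entries are bare URLs.
        if line.lower().startswith("file") and "=" in line:
            line = line.split("=", 1)[1].strip()
        if line.startswith(("http://", "https://")):
            urls.append(line)
    return urls


def fetch_playlist(url: str, timeout: float = 5.0) -> str:
    """Download a playlist file and return its content as text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

Calling extract_stream_urls(fetch_playlist(...)) with the playlist URL of a station should yield the URL that can be passed to mplayer.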

Simple container

Well, okay - now we could create nodes within Node-RED to run commands on the appropriate host. But this would require creating a user and enabling login via SSH (or another protocol). It would also be necessary to create a command per radio station. There must be a nicer way to accomplish this...

I was thinking about an API that can be accessed easily by Node-RED to run simple commands (start radio, switch station, etc.). Because Python is the language I program in most often, I chose Flask - a lightweight but feature-rich framework for web services and APIs. I planned the following features:

  • Storing radio stations in a SQLite database
  • REST API and basic UI for managing and controlling stations
  • Controlling volume (via amixer)
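
To make the first item more concrete, here is a minimal sketch of how stations could be stored in SQLite and how "next" and "previous" could wrap around the list. This is an illustration only and not the schema or code of the actual app; the table layout and all function names are assumptions.

```python
import sqlite3


def open_station_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the station database. The schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS stations ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "name TEXT NOT NULL UNIQUE, "
        "url TEXT NOT NULL)"
    )
    return conn


def add_station(conn: sqlite3.Connection, name: str, url: str) -> int:
    """Store a station and return its numeric ID."""
    cursor = conn.execute(
        "INSERT INTO stations (name, url) VALUES (?, ?)", (name, url)
    )
    conn.commit()
    return cursor.lastrowid


def neighbour_station(conn: sqlite3.Connection, current_id: int, step: int):
    """Return the ID of the next (step=+1) or previous (step=-1) station.

    The list wraps around at both ends; None is returned if no station exists.
    """
    ids = [row[0] for row in conn.execute("SELECT id FROM stations ORDER BY id")]
    if not ids:
        return None
    if current_id not in ids:
        return ids[0]
    return ids[(ids.index(current_id) + step) % len(ids)]
```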

Recently, when I was bored, I started working on it. The result was a very basic app that can be found on GitHub. Pre-built container images based on Ubuntu for x86_64 and ARM are available on Docker Hub.

This application offers several API calls that can be used to control the functionality - e.g.:

Call                            Method  Function
/api/stations                   POST    Save new station
/api/stations/<id>/<name>/play  GET     Play station
/api/next                       GET     Next station
/api/previous                   GET     Previous station
/api/stop                       GET     Stop radio
/api/volume                     POST    Change volume

There is also a very basic web interface in case you don't want to use the API:

Basic web interface for managing and controlling radio stations

In the GitHub repository you will also find a configuration file for docker-compose:

version: "3"

services:
  radio:
    container_name: radio_api
    image: stdevel/radio_api:latest
    ports:
      - "5000:5000"
    devices:
      - "/dev/snd:/dev/snd"
    volumes:
      - data:/opt/radio_api/instance
    restart: unless-stopped

volumes:
  data:

The application listens on TCP port 5000, and a volume is created for the radio station database. The sound devices under /dev/snd are passed through to the container.

The container can be created and started easily:

$ docker-compose up -d

Chicken-and-egg problem

The big problem is that Rhasspy and radio_api are sharing the sound card. To be more precise: voice commands will not work reliably while the radio is active. The hot word is falsely triggered from time to time. Also, commands are often not recognized due to the background music. A workaround would be stopping the radio using the web interface before the next command is spoken.

I have found no elegant solution for this problem so far - so if you have any ideas, I'd love to hear your thoughts. For me, a functional workaround is combining MQTT with a smartphone app such as IoT OnOff (iOS) or MQTT Dash (Android). Pressing a button on the smartphone is easier than accessing a website that is not optimized for smartphones. In Mosquitto, I extended the pre-existing MQTT user operator in the ACL configuration with the following topics:

# operator
user operator
topic read #
topic read $SYS/#
topic readwrite radio/status
topic readwrite radio/station
topic readwrite radio/volume

In a dedicated flow, Node-RED listens on these topics and controls the radio via the API once commands are received:

Payload  Topic         Description
stop     radio/status  Stops radio
prev                   Previous station
next                   Next station
         radio/volume  Change volume (0 - 100%)
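
The translation that the flow performs can be sketched as a small lookup. The real flow lives in Node-RED; the Python function below only illustrates the idea. It assumes that prev and next arrive on radio/station (the table does not say so explicitly) and that the volume is sent as a plain number. Since the article does not specify the request body of /api/volume, the function returns just the validated value and leaves the encoding to the caller. The function name is made up.

```python
def mqtt_to_api_call(topic: str, payload: str):
    """Translate an MQTT message into (method, path, value) for the radio API.

    Returns None for messages that should be ignored.
    """
    payload = payload.strip().lower()
    if topic == "radio/status" and payload == "stop":
        return ("GET", "/api/stop", None)
    # Assumption: station switching is published on radio/station.
    if topic == "radio/station" and payload == "next":
        return ("GET", "/api/next", None)
    if topic == "radio/station" and payload == "prev":
        return ("GET", "/api/previous", None)
    if topic == "radio/volume":
        try:
            volume = int(payload)
        except ValueError:
            return None
        # Clamp to the 0 - 100% range offered by the dashboard.
        return ("POST", "/api/volume", max(0, min(100, volume)))
    return None
```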

An appropriate dashboard on the smartphone controls volume and the radio station:

MQTT dashboard for Radio API

The appropriate flow is available on GitHub.

Bad jokes as a service

I'm a big fan of bad jokes and wordplay - so it seemed reasonable to create another API for this and link it with the voice assistant. At parties this generates either laughter or embarrassed faces. 🙂

For this, I re-used the Radio API - the following requirements were important to me:

  • Serving multiple categories (normal jokes, bad jokes, movie quotes,...)
  • Storing jokes in categories in a SQLite database
  • Random mode
  • REST calls and basic UI for managing categories and jokes

During a long weekend, I created a first app that can be found on GitHub. I also prepared container images for this application on Docker Hub - this time based on Alpine Linux for x86_64 and ARM.

The appropriate API calls can be found in the documentation or a Postman collection - an extract:

Call                                Method  Function
/api/categories                     POST    Create new category
/api/categories                     GET     Get category information
/api/jokes                          POST    Add new joke
/api/jokes/random/                  GET     Random joke
/api/jokes/random/<id,name>         GET     Random joke of a particular category
/api/jokes/random/<id,name>/<rank>  POST    Random joke of a particular category with minimum ranking

For management, there is also a very basic web interface:

Creating a joke

In the GitHub repository, you will also find a configuration file for docker-compose:

version: "3"

services:
  joke_api:
    container_name: joke_api
    image: stdevel/joke-api:latest
    ports:
      - "5001:5000/tcp"
    volumes:
      - data:/opt/joke_api/instance
    restart: unless-stopped

volumes:
  data:

The application can be accessed via TCP port 5001, and a dedicated volume is created for the database. This volume can be backed up and restored during updates (so that nobody has to do without bad jokes).

The container is created and started like this:

$ docker-compose up -d

Sentences

To let Rhasspy recognize the new commands, we need to define sentences for two intents:

[TellJoke]
tell a joke

[PlayRadio]
radio on
play radio
turn on the radio

TellJoke retrieves a joke and reads it aloud, while PlayRadio turns on the radio. Corresponding commands for switching radio stations or stopping the radio did not work reliably for me. As mentioned before, I have a workaround for this.

To store changes, click Train.

Linking with Node-RED

The first step is opening the Rhasspy handler in the Node-RED interface and extending the commands switch node with two cases: TellJoke and PlayRadio.

Afterwards, an http request node is added to the flow and linked with the TellJoke case. Double-click the node and apply the following settings:

  • Method: GET
  • URL: http://localhost:5001/api/jokes/random
  • Return: a parsed JSON object
  • Name: Random joke

Note: The URL retrieves a random joke from a random category. If you want to select a joke from a specific category, simply append the category name - e.g. /generic

For Return, ensure that a parsed JSON object is returned. In the next step this object will be processed. Add a Template node and link its input with the http request node output. Apply the following settings:

  • Name: Intent response
  • Format: Mustache template
  • Output as: Parsed JSON

Keep an eye on the template format and the returned JSON object - the actual template looks like this:

{
  "intent": {
    "name": "TellJoke",
    "confidence": 0
  },
  "speech": {
    "text": "{{ payload.results.0.joke_text }}"
  },
  "wakeId": "",
  "siteId": "default",
  "time_sec": 0.010800838470458984
}

The speech.text value contains the randomly chosen joke's text.
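
To illustrate what the template does, here is the same transformation as a small Python function. It is a sketch for understanding only and not part of the flow; it assumes the joke API answers with an object that has a results list whose entries carry a joke_text field, as the template path payload.results.0.joke_text implies. The function name is made up.

```python
import json


def build_joke_response(api_payload: dict) -> dict:
    """Build the intent response from the joke API's parsed JSON answer."""
    # Mirrors the template path payload.results.0.joke_text.
    results = api_payload.get("results") or []
    text = results[0].get("joke_text", "") if results else ""
    return {
        "intent": {"name": "TellJoke", "confidence": 0},
        "speech": {"text": text},
        "wakeId": "",
        "siteId": "default",
        "time_sec": 0.010800838470458984,
    }


if __name__ == "__main__":
    # Example input with made-up content.
    example = {"results": [{"joke_text": "Example joke"}]}
    print(json.dumps(build_joke_response(example), ensure_ascii=False))
```

If the API returns no results, the sketch falls back to an empty text instead of failing.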

Afterwards, the template node output is connected to the http response node.

For the second case another http request node is created and linked with the remaining case. Apply the following settings:

  • Method: GET
  • URL: http://localhost:5000/api/stations/1/play
  • Return: a parsed JSON object
  • Name: Play radio

The URL points to the radio API and starts the first stored radio station. Stations can be switched with the already mentioned workaround.

The node's output is connected to another template node; set the following values:

  • Name: Intent response
  • Format: Mustache template
  • Output as: Parsed JSON

This time, the template won't contain any text to be spoken:

{
  "intent": {
    "name": "PlayRadio",
    "confidence": 0
  },
  "wakeId": "",
  "siteId": "default",
  "time_sec": 0.010800838470458984
}

Finally, the template output is connected to the http response node; changes are stored by clicking Deploy.

Expanded Rhasspy handler flow

The features are now available - time for a test, e.g. using curl or a voice command:

$ curl -H "Content-Type: application/json" -X POST -d '{"intent": {"name": "TellJoke"}}' http://<node-red-ip-address>:1880/intent
{"intent":{"name":"TellJoke","confidence":0},"speech":{"text":"Wie heißt einen Spanier ohne Auto? Carlos."},"wakeId":"","siteId":"default","time_sec":0.010800838470458984}
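
If you prefer testing from a script, the same request can be sent with Python's standard library. This is a minimal sketch; the function names are made up, and the base URL has to be replaced with the address of your Node-RED instance.

```python
import json
import urllib.request


def build_intent_request(base_url: str, intent_name: str) -> urllib.request.Request:
    """Build the POST request that the curl call above sends."""
    body = json.dumps({"intent": {"name": intent_name}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/intent",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_intent(base_url: str, intent_name: str, timeout: float = 5.0) -> dict:
    """Send the intent to the Node-RED handler and return the parsed answer."""
    request = build_intent_request(base_url, intent_name)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For TellJoke, the joke text should then be found under the speech and text keys of the returned dictionary.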

Conclusion

In this article, we enhanced entertainment by adding an online radio and an API for bad jokes. But of course there is still potential for optimisation.

For some people it might be ugly to have four ports opened for applications. A better solution would be a reverse proxy that forwards requests to the appropriate application based on the URL path - e.g.:

URL        Target
/node-red  localhost:1880
/radio     localhost:5000
/jokes     localhost:5001
/rhasspy   localhost:12101

For this, software such as NGINX or Traefik would be an option.
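
The forwarding rule itself is simple prefix matching on the request path. The following Python sketch only illustrates that rule and is no replacement for a real reverse proxy. It assumes that the prefix is stripped before the request is passed on, which may or may not suit every application; the names ROUTES and resolve_upstream are made up.

```python
# Path prefixes and their targets, taken from the table above.
ROUTES = {
    "/node-red": "http://localhost:1880",
    "/radio": "http://localhost:5000",
    "/jokes": "http://localhost:5001",
    "/rhasspy": "http://localhost:12101",
}


def resolve_upstream(path: str):
    """Return the upstream URL for a request path, or None if nothing matches."""
    # Check longer prefixes first so that more specific routes win.
    for prefix in sorted(ROUTES, key=len, reverse=True):
        if path == prefix or path.startswith(prefix + "/"):
            # Assumption: the prefix is removed before forwarding.
            return ROUTES[prefix] + (path[len(prefix):] or "/")
    return None
```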
