DIY voice assistant part 4: Internet radio and other (bad) jokes
In the last part of this series, we enabled the voice assistant to read temperature sensors and check the weather over the internet. In this article, we will provide some entertainment by implementing a DIY internet radio and an API for bad jokes.
Radio streams under Linux
A commonly used function of commercial voice assistants is playing music and internet radio - of course this feature is a must for the DIY variant (at least to increase the WAF). Rhasspy does not offer a feature like this, but that's not a problem - as we're using Node-RED, we have plenty of ways to extend it. For example, we can conveniently control additional containers.
There are plenty of tools for playing internet radio under Linux - one of them is the versatile player mplayer. When passed a URL, it usually starts playing the station immediately:
$ mplayer http://rbb-fritz-live.cast.addradio.de/rbb/fritz/live/mp3/128/stream.mp3
Some stations hide the actual MP3/AAC URL in a playlist (*.m3u, *.pls) - sometimes mplayer is not able to handle this and aborts. In this case, curl might help to find out the URL:
$ curl http://streams.ffh.de/radioffh/mp3/hqlivestream.m3u
http://mp3.ffh.de/radioffh/hqlivestream.mp3
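If you would rather resolve playlists automatically, the lookup can be sketched in a few lines of Python. This is only a minimal sketch, not part of the radio application: the function name extract_stream_urls is made up for illustration, the playlist text is assumed to have been downloaded already (e.g. with curl), and the example.invalid URL is a placeholder.

```python
def extract_stream_urls(playlist_text):
    """Return the stream URLs found in M3U or PLS playlist text."""
    urls = []
    for raw_line in playlist_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines, M3U comments/directives and the PLS section header.
        if not line or line.startswith("#") or line.startswith("["):
            continue
        # PLS entries look like "File1=http://..."; M3U entries are bare URLs.
        if line.lower().startswith("file") and "=" in line:
            line = line.split("=", 1)[1].strip()
        if line.startswith(("http://", "https://")):
            urls.append(line)
    return urls


# An M3U playlist like the one returned by the curl call above
m3u = "#EXTM3U\nhttp://mp3.ffh.de/radioffh/hqlivestream.mp3\n"
# A made-up PLS playlist (placeholder URL)
pls = "[playlist]\nNumberOfEntries=1\nFile1=http://example.invalid/stream.mp3\nTitle1=Demo\n"

print(extract_stream_urls(m3u))
print(extract_stream_urls(pls))
```

The first URL returned can then be passed to mplayer instead of the playlist URL.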
Simple container
Well, okay - now we could create nodes within Node-RED to run commands on the appropriate host. But this would require creating a user and enabling login via SSH (or another protocol). It would also be necessary to create a command per radio station. There must be a nicer way to accomplish this...
I was thinking about an API that Node-RED can easily access to run simple commands (start radio, switch station, etc.). Because I mostly program in Python, I chose Flask - a lightweight but feature-rich framework for web services and APIs. I planned the following features:
- Storing radio stations in a SQLite database
- REST API and basic UI for managing and controlling stations
- Controlling volume (via amixer)
Recently, when I was bored, I started working on it. The result is a very basic app that can be found on GitHub. Pre-built container images based on Ubuntu for x86_64 and ARM are available on Docker Hub.
This application offers several API calls that can be used to control the functionality - e.g.:
Call | Method | Function |
---|---|---|
/api/stations | POST | Save new station |
/api/stations/<id>/<name>/play | GET | Play station |
/api/next | GET | Next station |
/api/previous | GET | Previous station |
/api/stop | GET | Stop radio |
/api/volume | POST | Change volume |
There is also a very basic web interface in case you don't want to use the API:
In the GitHub repository you will also find a configuration file for docker-compose:
version: "3"

services:
  radio:
    container_name: radio_api
    image: stdevel/radio_api:latest
    ports:
      - "5000:5000"
    devices:
      - "/dev/snd:/dev/snd"
    volumes:
      - data:/opt/radio_api/instance
    restart: unless-stopped

volumes:
  data:
The application listens on TCP port 5000, and a volume is created for the radio station database. The sound device /dev/snd is passed through to the container.
The container can be created and started easily:
$ docker-compose up -d
Chicken-and-egg problem
The big problem is that Rhasspy and radio_api share the sound card. To be more precise: voice commands do not work reliably while the radio is playing. The hot word is falsely triggered from time to time, and commands are often not recognized because of the background music. A workaround is to stop the radio via the web interface before speaking the next command.
I have not found an elegant solution to this problem so far - so if you have any ideas, I'd love to hear them. For me, a functional workaround is combining MQTT with a smartphone app such as IoT OnOff (iOS) or MQTT Dash (Android). Pressing a button on the smartphone is easier than opening a website that is not optimized for smartphones. In Mosquitto, I extended the existing MQTT user operator in the ACL configuration with the following topics:
# operator
user operator
topic read #
topic read $SYS/#
topic readwrite radio/status
topic readwrite radio/station
topic readwrite radio/volume
In a dedicated flow, Node-RED listens on these topics and controls the radio via the API once commands are received:
Payload | Topic | Description |
---|---|---|
stop | radio/status | Stops radio |
prev | | Previous station |
next | | Next station |
| radio/volume | Change volume (0 - 100%) |
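The dispatch logic of such a flow can be sketched in Python. This is not the actual Node-RED flow, only an illustration under assumptions: the function name radio_action is made up, the control payloads are matched regardless of topic because the table does not spell out which topic carries prev and next, and the volume is assumed to arrive as a plain number. How the volume value has to be encoded in the POST body of /api/volume is not covered here.

```python
def radio_action(topic, payload):
    """Map an MQTT message to (HTTP method, API path, volume or None)."""
    command = payload.strip().lower()
    # The three control payloads from the table map onto GET calls of the radio API.
    paths = {"stop": "/api/stop", "prev": "/api/previous", "next": "/api/next"}
    if command in paths:
        return ("GET", paths[command], None)
    if topic == "radio/volume":
        # Clamp to the documented 0 - 100% range before passing the value on.
        volume = max(0, min(100, int(command)))
        return ("POST", "/api/volume", volume)
    raise ValueError(f"unhandled message: {topic!r} {payload!r}")


print(radio_action("radio/status", "stop"))
print(radio_action("radio/volume", "150"))
```

Clamping the volume in the flow keeps a mistyped dashboard value from being forwarded to the API unchecked.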
An appropriate dashboard on the smartphone controls volume and the radio station:
The appropriate flow is available on GitHub.
Bad jokes as a service
I'm a big fan of bad jokes and puns - so it seemed reasonable to create another API for this and link it with the voice assistant. At parties this generates either laughter or embarrassed faces. 🙂
For this, I reused the Radio API as a basis - the following requirements were important to me:
- Serving multiple categories (normal jokes, bad jokes, movie quotes,...)
- Storing jokes in categories in a SQLite database
- Random mode
- REST calls and basic UI for managing categories and jokes
During a long weekend I created a first app that can be found on GitHub. I also prepared container images for this application on Docker Hub - this time based on Alpine Linux for x86_64 and ARM.
The appropriate API calls can be found in the documentation or a Postman collection - an extract:
Call | Method | Function |
---|---|---|
/api/categories | POST | Create new category |
/api/categories | GET | Get category information |
/api/jokes | POST | Add new joke |
/api/jokes/random/ | GET | Random joke |
/api/jokes/random/<id,name> | GET | Random joke of a particular category |
/api/jokes/random/<id,name>/<rank> | POST | Random joke of a particular category with minimum ranking |
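To illustrate how the random-joke routes are composed, here is a minimal Python sketch that only builds the URL. The helper name random_joke_url and its parameters are made up for this example; the paths are taken from the table above, and localhost:5001 is the port used later in this article. Note that the table lists POST for the variant with a minimum ranking, so the caller has to pick the HTTP method accordingly.

```python
from urllib.parse import quote


def random_joke_url(base_url, category=None, min_rank=None):
    """Build the URL for a random joke, optionally narrowed by category and rank."""
    url = base_url.rstrip("/") + "/api/jokes/random"
    if category is not None:
        # The category may be given as numeric ID or as name; names are URL-quoted.
        url += "/" + quote(str(category), safe="")
        if min_rank is not None:
            url += "/" + str(int(min_rank))
    elif min_rank is not None:
        # The table only lists a ranking filter in combination with a category.
        raise ValueError("a minimum rank requires a category")
    return url


print(random_joke_url("http://localhost:5001"))
print(random_joke_url("http://localhost:5001", "generic"))
print(random_joke_url("http://localhost:5001", 2, 3))
```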
For management, there is also a very basic web interface:
In the GitHub repository, you will also find a configuration file for docker-compose:
version: "3"

services:
  joke_api:
    container_name: joke_api
    image: stdevel/joke-api:latest
    ports:
      - "5001:5000/tcp"
    volumes:
      - data:/opt/joke_api/instance
    restart: unless-stopped

volumes:
  data:
The application can be accessed via TCP port 5001, and a dedicated volume is created for the database. This volume can be backed up and restored during updates (so that nobody has to go without bad jokes).
The container is created and started like this:
$ docker-compose up -d
Sentences
To enable Rhasspy to handle the new commands, we need to define two intents with their sentences:
[TellJoke]
tell a joke

[PlayRadio]
radio on
play radio
turn on the radio
TellJoke retrieves a joke and reads it aloud, while PlayRadio turns on the radio. Appropriate commands for switching radio stations or stopping the radio did not work reliably for me. As mentioned before, I have a workaround for this.
To store changes, click Train.
Linking with Node-RED
The first step is opening the Rhasspy handler in the Node-RED interface and extending the commands switch with two cases: TellJoke and PlayRadio.
Afterwards, an http request node is added to the flow and linked with the TellJoke case. Double-click the node and apply the following settings:
- Method: GET
- URL: http://localhost:5001/api/jokes/random
- Return: a parsed JSON object
- Name: Random joke
The URL retrieves a random joke from a random category. If you want a joke from a specific category, simply append the category name - e.g. /generic.
For Return, ensure that a parsed JSON object is selected, as this object is processed in the next step. Add a Template node and link its input with the http request node output. Apply the following settings:
- Name: Intent response
- Format: Mustache template
- Output as: Parsed JSON
Keep an eye on the template format and the returned JSON object - the actual template looks like this:
{
  "intent": {
    "name": "TellJoke",
    "confidence": 0
  },
  "speech": {
    "text": "{{ payload.results.0.joke_text }}"
  },
  "wakeId": "",
  "siteId": "default",
  "time_sec": 0.010800838470458984
}
The speech.text value contains the randomly chosen joke's text.
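For readers who are less familiar with Mustache, the following Python sketch shows roughly what the template does with the API answer. It is only an illustration: the function name intent_response is made up, and the shape of the sample payload is inferred solely from the template path payload.results.0.joke_text (the joke text is the one from the test output later in this article).

```python
import json


def intent_response(api_payload, intent_name="TellJoke"):
    """Build the intent response that the Mustache template produces."""
    # Equivalent of {{ payload.results.0.joke_text }} in the template.
    joke_text = api_payload["results"][0]["joke_text"]
    return {
        "intent": {"name": intent_name, "confidence": 0},
        "speech": {"text": joke_text},
        "wakeId": "",
        "siteId": "default",
        # Fixed value, copied from the template.
        "time_sec": 0.010800838470458984,
    }


# Sample answer of the joke API (shape assumed from the template path)
sample = json.loads(
    '{"results": [{"joke_text": "Wie heißt einen Spanier ohne Auto? Carlos."}]}'
)
print(json.dumps(intent_response(sample), ensure_ascii=False))
```

If the API returns an empty results list, the lookup fails with an IndexError in this sketch; the Mustache template would simply render an empty text instead.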
Afterwards, the template node's output is connected to the http response node.
For the second case another http request node is created and linked with the remaining case. Apply the following settings:
- Method: GET
- URL: http://localhost:5000/api/stations/1/play
- Return: a parsed JSON object
- Name: Play radio
The URL points to the radio API and starts the first stored radio station. Stations can be switched with the already mentioned workaround.
The node's output is connected to another template node; set the following values:
- Name: Intent response
- Format: Mustache template
- Output as: Parsed JSON
This time, the template won't contain any text to be spoken:
{
  "intent": {
    "name": "PlayRadio",
    "confidence": 0
  },
  "wakeId": "",
  "siteId": "default",
  "time_sec": 0.010800838470458984
}
Finally, the template output is connected to the http response node; changes are stored by clicking Deploy.
The features are now available - time for a test, e.g. using curl or the voice command:
$ curl -H "Content-Type: application/json" -X POST -d '{"intent": {"name": "TellJoke"}}' http://<node-red-ip-address>:1880/intent
{"intent":{"name":"TellJoke","confidence":0},"speech":{"text":"Wie heißt einen Spanier ohne Auto? Carlos."},"wakeId":"","siteId":"default","time_sec":0.010800838470458984}
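The same test can be scripted with Python's standard library. The following is a minimal sketch: the helper names build_intent_request and spoken_text are made up, and 127.0.0.1 is only a placeholder for the Node-RED host. The request is prepared but not sent, so the example also runs without a Node-RED instance.

```python
import json
import urllib.request


def build_intent_request(host, intent_name, port=1880):
    """Prepare (but do not send) the POST request that the curl call issues."""
    body = json.dumps({"intent": {"name": intent_name}}).encode("utf-8")
    return urllib.request.Request(
        f"http://{host}:{port}/intent",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def spoken_text(response_body):
    """Pull the text to be spoken out of the JSON answer, if there is any."""
    return json.loads(response_body).get("speech", {}).get("text")


request = build_intent_request("127.0.0.1", "TellJoke")  # placeholder host
print(request.get_method(), request.full_url)
# To actually send it:
# body = urllib.request.urlopen(request, timeout=5).read()
# print(spoken_text(body))
```

For PlayRadio, spoken_text returns None, as that template does not contain a speech section.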
Conclusion
In this article, we enhanced entertainment by adding an online radio and an API for bad jokes. But of course there is still potential for optimisation.
Some people might find it ugly to have four ports opened for applications. A better solution would be a reverse proxy that forwards requests to the appropriate application based on the URL path - e.g.:
URL | Port |
---|---|
/node-red | localhost:1880 |
/radio | localhost:5000 |
/jokes | localhost:5001 |
/rhasspy | localhost:12101 |
For this, software such as NGINX or Traefik would be an option.
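The routing rule such a proxy would apply can be sketched in Python. This is not an NGINX or Traefik configuration, just an illustration of prefix matching using the mapping from the table; the names ROUTES and resolve are made up. Whether each application works correctly when served below a sub-path is a separate question that this sketch does not answer.

```python
# Path prefixes and their targets, taken from the table above
ROUTES = {
    "/node-red": "localhost:1880",
    "/radio": "localhost:5000",
    "/jokes": "localhost:5001",
    "/rhasspy": "localhost:12101",
}


def resolve(path):
    """Return (upstream, remaining path) for a request path, or None if unmatched."""
    for prefix, upstream in ROUTES.items():
        # Match the prefix only on a path-segment boundary ("/radio" but not "/radiox").
        if path == prefix or path.startswith(prefix + "/"):
            return upstream, path[len(prefix):] or "/"
    return None


print(resolve("/radio/api/stop"))
print(resolve("/jokes"))
print(resolve("/unknown"))
```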