
Voice Controlled Swarms

Inspired by Command School from Ender's Game, let's make swarms that we can control with our voice! We will do this by having a voice-to-text program feed our voice commands into an LLM, and that LLM can then run a set of commands to control the swarm.

Below is the final result.

And we will make this in two parts:

  1. First we will create a general voice-controller that can plug into any application that supports the Model Context Protocol (MCP). As an example, we will make a canvas application where the LLM can draw shapes on a canvas and move the shapes around, allowing us to “draw” with our voice.
  2. Then we will make a swarm simulation with a large set of commands which allow the LLM to direct the swarms according to our commands.

Voice-Controller

The overall architecture for how users interact with the simulation will be as follows: the player holds down a button to speak a command, which gets transcribed using a voice-to-text model. The transcription is sent to an LLM agent connected to an MCP server. That server controls the simulation, meaning the agent can directly interact with the simulation using any available tools implemented on the server.

interaction architecture

One neat aspect of this architecture is that the voice-controlled agent is independent of the application running on the MCP server. This means we can reuse the voice-controlled agent for any MCP-compatible application. In fact, we’ll develop the voice-controlled agent only once and then use it to interact with multiple MCP servers.
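
To make this flow concrete, here is a minimal sketch of the push-to-talk loop. The helpers record_while_button_held() and transcribe(), and the agent object, are hypothetical placeholders standing in for a microphone capture routine, a voice-to-text model, and an LLM client with the MCP tools registered; they are not the project's actual API.

def record_while_button_held() -> bytes:
    """Placeholder: capture microphone audio while the push-to-talk key is held."""
    raise NotImplementedError

def transcribe(audio: bytes) -> str:
    """Placeholder: run a voice-to-text model (e.g. Whisper) on the recorded audio."""
    raise NotImplementedError

def voice_control_loop(agent) -> None:
    # agent is assumed to be an LLM client that already has the MCP server's
    # tools registered and decides which of them to call for a given command.
    while True:
        audio = record_while_button_held()   # 1. push-to-talk capture
        command = transcribe(audio)          # 2. speech -> text
        agent.run(command)                   # 3. the agent calls MCP tools to carry it out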

As an example, we create a fairly simple MCP server below.

Canvas MCP Server

We create a canvas that can display squares and circles, along with a few simple functions that the LLM can call on it. The LLM will then be able to turn the voice commands we say into actual function calls on the canvas.

Here are the commands we provide, which in the language of LLM agents are called 'tools':

  • create_circle
  • create_square
  • move_shapes
  • remove_shapes
  • get_canvas

It’s essential to write good documentation for each tool, as this is what the LLM will read when choosing what actions to take. I found that including example inputs, expected ranges, and typical values can help to improve its performance. All the code for this project can be found on GitHub.
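
As an illustration of what a documented tool can look like, here is a sketch of one canvas tool written with the FastMCP helper from the official MCP Python SDK. The parameter ranges, docstring text, and the in-memory shapes store are made up for the example rather than taken from the real server.

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("canvas")
shapes: dict[str, dict] = {}  # simple in-memory stand-in for the real canvas state

@mcp.tool()
def create_circle(x: float, y: float, radius: float, color: str = "blue") -> str:
    """Create a circle on the canvas.

    Args:
        x, y: Center of the circle in canvas coordinates (0-1000 on each axis).
        radius: Circle radius in pixels; typical values are 20-100.
        color: Any CSS color name, e.g. "red" or "blue".

    Returns:
        The ID of the new shape, e.g. "circle_3".

    Example:
        create_circle(x=500, y=300, radius=50, color="red")
    """
    shape_id = f"circle_{len(shapes)}"
    shapes[shape_id] = {"type": "circle", "x": x, "y": y, "radius": radius, "color": color}
    return shape_id

if __name__ == "__main__":
    mcp.run()  # serve the tool over stdio so the voice-controlled agent can call it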

And once we have the server running and a voice-controller connected to it, we get the below.

The agent can make dumb mistakes, but it can also sometimes be surprisingly clever. One funny interaction I had: Everything seemed to be working correctly, but then I noticed I had forgotten to implement the remove_shapes function. This was weird since the agent seemed to be removing shapes when I asked. It turns out that when the agent couldn’t find a tool to remove the shapes, it would decide to move the shapes off the side of the screen so I couldn’t see them anymore, which made it look like the non-existent remove_shapes function was working just fine.

Now that we can control a canvas with our voice, we can do the same with a swarm simulation. But that requires writing a sufficiently capable swarm simulation, as well as the tools that allow the LLM to convert our commands into actual actions.

Swarm Simulation

The individual agents of a swarm (or fleet; we will use the terms interchangeably) will be controlled using a modified version of the boids algorithm, along with a seeking rule that points each boid toward its target. We only use the separation and alignment rules from the boids algorithm, modified as described below (a sketch of both rules follows the list).

  • Separation: The separation strength between boids of the same swarm will be stronger than between boids of different swarms. This lets different swarms easily pass through each other.
  • Alignment: The boids will try to align in the opposite direction of their neighbors, making our swarms more chaotic. This might help boids more easily move through each other, but mostly I just think it looks cooler.
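
Here is a sketch of how the two modified rules can be computed for a single boid. The weights, neighbor radius, and attribute names (position, velocity, swarm_id) are illustrative values, not the simulation's actual parameters, and the real update also adds the seeking force toward the swarm's target.

import numpy as np

NEIGHBOR_RADIUS = 50.0
SEPARATION_SAME_SWARM = 2.0    # stronger push away from same-swarm neighbors
SEPARATION_OTHER_SWARM = 0.5   # weaker push, so different swarms can pass through
ANTI_ALIGNMENT_WEIGHT = 0.3    # align *against* neighbors for a more chaotic look

def steering_force(boid, all_boids):
    # boid.position and boid.velocity are assumed to be 2D numpy arrays
    separation = np.zeros(2)
    anti_alignment = np.zeros(2)
    for other in all_boids:
        if other is boid:
            continue
        offset = boid.position - other.position
        dist = np.linalg.norm(offset)
        if dist == 0.0 or dist > NEIGHBOR_RADIUS:
            continue
        # Separation: push away from neighbors, more strongly within the same swarm.
        weight = SEPARATION_SAME_SWARM if other.swarm_id == boid.swarm_id else SEPARATION_OTHER_SWARM
        separation += weight * offset / dist**2
        # Anti-alignment: steer opposite to the neighbor's heading.
        anti_alignment -= ANTI_ALIGNMENT_WEIGHT * other.velocity
    return separation + anti_alignment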

A single swarm moving to a point

Each boid is drawn as a triangle, the center of mass of the boids as a circle, and the swarm's target position as an X.

One of the first tools we will implement is the assign_swarm_to_position function, which lets the agent set the desired position of a swarm. Some variants we can add are assign_swarm_to_follow and assign_swarm_to_waypoints, where assign_swarm_to_follow attaches the swarm's target position to some other object that may be moving, and assign_swarm_to_waypoints has the swarm cycle (or just iterate once) through a list of positions.
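
One simple way to support all three variants is to have each swarm re-resolve its target every frame. The sketch below assumes hypothetical fields (mode, fixed_target, followed_object, waypoints, waypoint_index, cycle, center_of_mass) rather than the project's real data model.

import math

ARRIVAL_RADIUS = 30.0  # illustrative: how close the swarm center must get to a waypoint

def resolve_target(swarm):
    """Return the point the swarm should seek this frame (field names are illustrative)."""
    if swarm.mode == "position":
        return swarm.fixed_target
    if swarm.mode == "follow":
        return swarm.followed_object.position      # re-read every frame, so it can move
    if swarm.mode == "waypoints":
        target = swarm.waypoints[swarm.waypoint_index]
        if math.dist(swarm.center_of_mass, target) < ARRIVAL_RADIUS:
            # advance to the next waypoint, wrapping around only if cycling
            nxt = swarm.waypoint_index + 1
            if swarm.cycle:
                swarm.waypoint_index = nxt % len(swarm.waypoints)
            else:
                swarm.waypoint_index = min(nxt, len(swarm.waypoints) - 1)
        return swarm.waypoints[swarm.waypoint_index]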

Below we see the left swarm following a car that is traveling in a circle, and on the right we see a swarm cycling through two waypoints.

A swarm following a car and a swarm following waypoints

We also want the swarm to be capable of splitting into multiple swarms that can pursue different tasks. We will have the primitives fork_swarm_to_position, fork_swarm_to_follow, and fork_swarm_to_waypoints, all similar to the assignment primitives, but forking off of a pre-existing swarm by taking some of its boids.

We will also let the agent move drones between swarms with reassign_drones, and merge one swarm into another with merge_swarm (though merge_swarm may not be strictly necessary, since it is just a specific instance of reassign_drones that re-assigns all the drones). Below we see a swarm get forked to the left and then the right, and then some boids re-assigned from the left swarm to the right.
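
Since reassign_drones ends up doing the heavy lifting (its implementation is shown later in the post), fork and merge can be thin wrappers around it. The Swarm constructor and id fields below are hypothetical stand-ins for the simulation's real bookkeeping.

def fork_swarm_to_position(source_swarm, num_drones, x, y):
    # create an empty swarm aimed at (x, y), then move drones into it
    new_swarm = Swarm(target=(x, y), drones=[])
    reassign_drones(source_swarm, new_swarm, num_drones)
    return new_swarm.id

def merge_swarm(source_swarm, target_swarm):
    # merging is just re-assigning every remaining drone to the target swarm
    reassign_drones(source_swarm, target_swarm, len(source_swarm.drones))
    return target_swarm.id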

A swarm being forked and then re-assigned

Let’s take a moment to improve the assignment algorithm, replacing the current random assignment with a position-aware assignment.

The Assignment Problem

I want the drones selected for a re-assignment (or fork) to be chosen based on their positions relative to the current and target locations. For example, if a swarm splits in half to move to the left and right, the drones on the left side of the swarm should be the ones assigned to move left, and the drones on the right should be the ones assigned to move right.

This type of problem is well known: it is called the Assignment Problem, and in the more general case, the Minimum-cost Flow Problem. As an example, let's say we want to assign 2 drones to location A and 1 drone to location B. Framed as the Assignment Problem: we create 3 "tasks", 2 for location A and 1 for location B, and we now have the Assignment Problem of matching drones to tasks 1-to-1 while minimizing the total cost (the distance from each drone to its assigned location). Framed as the Minimum-cost Flow Problem: we treat each drone as a source of 1 unit of flow, location A as a sink of 2 units, and location B as a sink of 1 unit, with edges connecting sources to sinks whose weights equal the cost between each source and sink; we then have the Minimum-cost Flow Problem of routing flow from sources to sinks while minimizing total cost.

Typically you would use the distance between a drone and target as the cost, but I decided to use the square of the distance since I think the assignments are more visually appealing (maybe there’s some rigorous reason out there that justifies my aesthetic intuition, but for now it’s just for visual appeal). Below we see how 1000 points can be assigned equally among 5 targets, comparing the assignments produced from the normal distance cost to the squared distance cost.
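
For the general many-targets case (like the 1000-points, 5-targets figure below), one straightforward way to compute such an assignment is SciPy's Hungarian-style solver: duplicate each target once per drone slot and minimize the summed cost. The random positions and the 200-drones-per-target split here are made-up example values.

import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
drones = rng.uniform(0, 1000, size=(1000, 2))     # 1000 drone positions
targets = rng.uniform(0, 1000, size=(5, 2))       # 5 target locations

# Each target gets 1000 / 5 = 200 "task" slots so the matching is 1-to-1.
slots = np.repeat(targets, 200, axis=0)           # shape (1000, 2)

# Cost matrix: squared distance from every drone to every task slot.
diff = drones[:, None, :] - slots[None, :, :]
cost = (diff ** 2).sum(axis=-1)

drone_idx, slot_idx = linear_sum_assignment(cost)  # minimizes the total cost
assigned_target = slot_idx // 200                  # map slots back to the 5 targets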

Min-cost flow assignment

And since our API only lets a swarm split into two swarms, our assignment problem is much simpler than the general problem and can be implemented easily. We can calculate a drone's "re-assignment cost" of being re-assigned from location A to location B by taking the difference of its costs to each location, then pick the drones with the lowest re-assignment cost as the ones to re-assign.

def reassign_drones(source_swarm, target_swarm, num_to_move):
    # 1. collect all drones currently in the source swarm
    candidates = list(source_swarm.drones)

    # 2. for each candidate, compute its "re-assignment cost":
    #    how much cheaper (or more expensive) it is to go to the target vs. stay
    def delta(drone):
        cost_to_target = squared_distance(drone.position, target_swarm.position)
        cost_to_source = squared_distance(drone.position, source_swarm.position)
        return cost_to_target - cost_to_source

    # 3. sort the candidates by that delta (lowest first)
    candidates.sort(key=delta)

    # 4. pick the first num_to_move drones
    to_reassign = candidates[:num_to_move]

    # 5. re-assign each one
    for drone in to_reassign:
        drone.swarm = target_swarm
And here we see the same forking and re-assigning from before, but now with the position-aware assignment.

A swarm being forked and then re-assigned using a sorted assignment

Additional Features

We’ll add in a few more things to make this prototype more interesting.

  • No-fly zones: Rectangular regions that repel boids when they get too close. Obviously this could be more sophisticated (more general shapes, using path-finding instead of barrier functions), but this is the easiest first version (a sketch of the repulsion force follows the list).
  • Circling: We allow the swarms to encircle a target instead of just hovering on top of it.
  • Landmarks: Adds more objects in the scene that the user can work with.
  • Phonetic IDs: Using the NATO phonetic alphabet to ID objects in the scene will make it easier to verbally issue commands.
  • Coordinate grid: Overlaying a grid with coordinates on top of the screen lets the user more easily reason about coordinates.
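
As an example of the no-fly-zone repulsion mentioned above, here is a sketch of a barrier-style force for an axis-aligned rectangle. The margin and strength constants are arbitrary illustrative values, and the zone is given as (xmin, ymin, xmax, ymax).

import numpy as np

REPULSION_MARGIN = 60.0     # illustrative: distance at which the force kicks in
REPULSION_STRENGTH = 400.0  # illustrative: overall force scale

def no_fly_repulsion(position, zone):
    # position is a 2D numpy array; zone is (xmin, ymin, xmax, ymax)
    xmin, ymin, xmax, ymax = zone
    # closest point on the rectangle to the boid
    closest = np.array([np.clip(position[0], xmin, xmax),
                        np.clip(position[1], ymin, ymax)])
    offset = position - closest
    dist = np.linalg.norm(offset)
    if dist == 0.0:
        # inside the rectangle: apply a fixed push (a fuller version would
        # push toward the nearest edge instead)
        return np.array([REPULSION_STRENGTH, 0.0])
    if dist > REPULSION_MARGIN:
        return np.zeros(2)   # far enough away: no force
    # force grows as the boid gets closer to the zone boundary
    return REPULSION_STRENGTH * (1.0 / dist - 1.0 / REPULSION_MARGIN) * (offset / dist)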

Here we see all the features, with one swarm traveling through waypoints, another encircling a position, and another following a car around a no-fly zone.

All the features together

All together

So the tools that the agent will have access to are:

Function Name             | Arguments                                     | Returns
get_environment           | (none)                                        | All objects in the scene
assign_swarm_to_position  | swarm_id, x, y                                | Whether the swarm was assigned
assign_swarm_to_waypoints | swarm_id, waypoints, cycle                    | Whether the swarm was assigned
assign_swarm_to_follow    | swarm_id, target_id                           | Whether the swarm was assigned
fork_swarm_to_position    | source_swarm_id, num_drones, x, y             | The new swarm's ID
fork_swarm_to_waypoints   | source_swarm_id, num_drones, waypoints, cycle | The new swarm's ID
fork_swarm_to_follow      | source_swarm_id, num_drones, target_id        | The new swarm's ID
reassign_drones           | source_swarm_id, target_swarm_id, num_drones  | The number of drones actually re-assigned
merge_swarm               | source_swarm_id, target_swarm_id              | The target swarm's ID
set_swarm_encircle        | swarm_id, is_encircling, radius               | The swarm's resulting is_encircling state

And now we can try to control the swarms with our voice.

And the code for everything here can be found at my GitHub.

This post is licensed under CC BY 4.0 by the author.