Why I Don’t Teach ROS to Robotics Students
I am often asked how I go about teaching ROS to robotics students, and I simply reply “I don’t”.
Although ROS is an important skill to learn at some point during the career of a robotics student, in the first semester or two of introducing robotics, an instructor must balance the relatively steep learning curve of ROS against theory and algorithms. In the case of mathematically- or mechanically-minded students, software engineering itself can be a foreign concept.
ROS is a strong tool for connecting components together, but the robotics problem is not simply about connecting software packages. In my view, the job of an educator is to achieve the thinnest of possible barriers between the concept and the assignment. Moreover, because software and hardware compatibility changes so rapidly, a novice can be easily trapped in “version hell” where course materials and tutorials become out of date as versions of ROS change (Kinetic, Melodic, ROS2? Python 2 or 3? What about building on my Mac?) and their lives are consumed with compilation issues. But besides this, the bending-over-backwards of distributed systems programming imposes a certain “oddness” that takes engineering students, hobbyists, and even lower-division undergraduate students in computing, too long to grasp for my taste.
Let me start by presenting an example of something very simple that a novice might want to do, which could take 10 seconds to implement in a normal programming language, but might take a student days to implement in ROS.
Example: Simple things can be hard in ROS
Let’s suppose that node A has a feature F, but it shouldn’t be turned on unless another node requests it. This is a very common use case in camera stream configurations, UI options, motion planners with configurable constraints, and configurable logging. Suppose that we’ve built nodes B and C that may want to use F. Let’s suppose B wants F to be on for some amount of time, and C doesn’t care.
The logic should state that F should be on if at least one other node asks it to be on, and off otherwise. In other words,
F_enabled = B_F_enabled or C_F_enabled.
We shall see that this one-liner in any normal programming language leaves an abundance of options (and hence, an abundance of confusion) to the ROS implementer.
ROS Implementation 1
This might be the first thing that a student tries:
- A reads std_msgs/Bool from /F_enabled
- B publishes value True to /F_enabled when it wants F to be on, and False otherwise.
- C publishes value False to /F_enabled
This doesn’t work because if C’s message is sent after B’s, it will erase the setting done by B. Let’s try again:
ROS Implementation 2
- A reads std_msgs/Bool from /F_enabled. If True is read, then F turns on for some amount of time. After a timeout, F turns off.
- B repeatedly publishes value True to /F_enabled when it wants F on, and False otherwise.
- C doesn’t do anything.
This works, as long as A implements the timeout correctly, and B knows the timeout length from A so that it can republish at least once per timeout period. But observe that this makes the behavior of A history- and timing-dependent. Moreover, what if it’s not a good idea to keep F on, and instead I’d like F to turn off promptly at a specified time? We’ve broken the very clear logical definition of F into a bit of a mess.
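To make the timing dependence concrete, here is a minimal sketch of the timeout logic that A would need, written in plain Python with no ROS plumbing. The class name TimeoutLatch and the injectable clock are illustrative assumptions, as is the choice to ignore False messages, which the description above leaves unspecified.

```python
import time


class TimeoutLatch:
    """Keeps a feature on for `timeout` seconds after the last True request."""

    def __init__(self, timeout, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock          # injectable so the logic can be tested without waiting
        self.last_request = None    # time of the most recent True message

    def on_message(self, value):
        # Assumption: only True refreshes the latch; False is ignored,
        # so F cannot be switched off before the timeout expires.
        if value:
            self.last_request = self.clock()

    def enabled(self):
        if self.last_request is None:
            return False
        return self.clock() - self.last_request < self.timeout


# Demo with a fake clock instead of real time.
now = [0.0]
latch = TimeoutLatch(timeout=1.0, clock=lambda: now[0])
latch.on_message(True)
now[0] = 0.5
print(latch.enabled())  # True: still inside the timeout window
now[0] = 1.5
print(latch.enabled())  # False: B did not republish in time
```

Note that whether F is on now depends on when the last message arrived, not just on what B currently wants, which is exactly the history dependence described above.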
ROS Implementation 3
- A reads std_msgs/Bool from /B_F_enabled and /C_F_enabled. It then implements the logic directly every time it receives a message from either channel. This means creating two callbacks like this:
    B_F_enabled = False
    C_F_enabled = False

    def recv_B_F_enabled(msg):
        global B_F_enabled, C_F_enabled
        B_F_enabled = msg.data
        if B_F_enabled or C_F_enabled:
            enable_F()
        else:
            disable_F()

    def recv_C_F_enabled(msg):
        global B_F_enabled, C_F_enabled
        C_F_enabled = msg.data
        if B_F_enabled or C_F_enabled:
            enable_F()
        else:
            disable_F()
- B publishes value True to /B_F_enabled if it wants it on, and False otherwise.
- C can either publish False to /C_F_enabled, or do nothing.
This is the first implementation that actually does what we wanted! The problem is, however, that it doesn’t scale to more nodes. Adding a third node D requires modifying A to subscribe to a new topic with a third callback function, and modifying all of the logic above.
ROS Implementation 4
A more sophisticated implementation that accepts variable nodes could reuse a function that refers to the caller like the following:
    X_F_enabled = dict()

    def recv_X_F_enabled(msg, caller):
        global X_F_enabled
        X_F_enabled[caller] = msg.data
        if any(X_F_enabled.values()):
            enable_F()
        else:
            disable_F()

    rospy.Subscriber('/B_F_enabled', Bool, lambda msg: recv_X_F_enabled(msg, 'B'))
    rospy.Subscriber('/C_F_enabled', Bool, lambda msg: recv_X_F_enabled(msg, 'C'))
    rospy.Subscriber('/D_F_enabled', Bool, lambda msg: recv_X_F_enabled(msg, 'D'))
But this doesn’t scale to variable numbers of writers. To do so, we could set up the list of writers in a rosparam parameter. Let’s say we set the param F_writers = '["B","C","D"]'; then the subscribing call can be replaced as follows:
    possible_F_writers = rospy.get_param("/F_writers")
    for x in possible_F_writers:
        # Bind x as a default argument; a plain closure would see only the last writer.
        rospy.Subscriber('/{}_F_enabled'.format(x), Bool, lambda msg, x=x: recv_X_F_enabled(msg, x))
Now, each of the writers needs to know which topic they should publish to. Or, a topic mapping can be set up to map their output from “F_enabled” to “[X]_F_enabled”. In any case, yikes! This is still a bit of a pain.
ROS Implementation 5
- A implements a ROS Service with a callback with signature enable_F(value,caller). The service can store a dictionary like above, and implement the any(X_F_enabled.values()) logic like above.
- B calls the service with enable_F(True,'B')
- C can either call the service or not.
OK, now we’re getting to something much more elegant. However, we now have to write the .srv IDL for the service, write the build for A so that it gets installed, then build B and C such that they can access the service… Kind of a pain, isn’t it?
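For comparison, the logic that sits behind such a service is tiny. The sketch below shows it in plain Python, deliberately leaving out the .srv definition, the generated message classes, and the service registration; the class name FeatureArbiter is a hypothetical choice for illustration.

```python
class FeatureArbiter:
    """Tracks per-caller requests and reports whether F should be on."""

    def __init__(self):
        self.requests = {}  # caller name -> most recent requested value

    def enable_F(self, value, caller):
        # Record this caller's wish, then apply the "any caller wants it" rule.
        self.requests[caller] = bool(value)
        return any(self.requests.values())


arbiter = FeatureArbiter()
print(arbiter.enable_F(True, "B"))   # True
print(arbiter.enable_F(False, "C"))  # True: B still wants F on
print(arbiter.enable_F(False, "B"))  # False: nobody wants F on
```

Everything else in a ROS version of this is plumbing around these few lines.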
A Simpler Alternative
Let’s see how simple this can be with an alternative distributed systems model. Suppose we have a persistent, centralized JSON store like Redis.JSON. Then, we can implement something like the following:
- A sets an empty dictionary to some path: redis.jsonset("F_enabled",".",{}). It then periodically performs a jsonget on the dictionary, and implements the logic.
- B writes True or False to a subkey of this path: redis.jsonset("F_enabled",".B",True)
- C can write False via redis.jsonset("F_enabled",".C",False) or just do nothing at all.
- Other writers can do similar things, just making sure they write to a unique subkey.
That’s it! No fiddling with multiple callback functions, proxy topics, service IDLs, etc. The implementation is about as close to the logic as it can get. More importantly, this implementation directly follows from elementary programming concepts: F_enabled is viewed as a stateful variable, writers write to it, and A reads from it.
Although the API is a bit clunky, we can write wrappers to improve readability and productivity.
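As a rough idea of what such a wrapper could look like, here is a minimal sketch. It assumes only that the store object offers jsonset(key, path, value) and jsonget(key), mirroring the calls used above; the names SharedFlags and InMemoryStore are hypothetical, and InMemoryStore is a stand-in so the example runs without a Redis server.

```python
class SharedFlags:
    """Thin wrapper giving readers and writers a shared flag dictionary."""

    def __init__(self, store, key):
        self.store = store  # assumed to expose jsonset(key, path, value) and jsonget(key)
        self.key = key

    def reset(self):
        # Called once by the owner (A) to create the empty dictionary.
        self.store.jsonset(self.key, ".", {})

    def request(self, writer, value):
        # Each writer stores its wish under its own subkey.
        self.store.jsonset(self.key, "." + writer, bool(value))

    def enabled(self):
        # The one-line logic: on if at least one writer asked for it.
        return any(self.store.jsonget(self.key).values())


class InMemoryStore:
    """Hypothetical stand-in for the JSON store, for local experiments only."""

    def __init__(self):
        self.data = {}

    def jsonset(self, key, path, value):
        if path == ".":
            self.data[key] = value
        else:
            self.data[key][path.lstrip(".")] = value

    def jsonget(self, key):
        return dict(self.data[key])


flags = SharedFlags(InMemoryStore(), "F_enabled")
flags.reset()              # A, at startup
flags.request("B", True)   # B wants F on
flags.request("C", False)  # C does not care
print(flags.enabled())     # True
flags.request("B", False)
print(flags.enabled())     # False
```

Swapping the stand-in for a real client should only require that the client's calls match the assumed interface.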
Commentary and observations
Although this was a pretty simple example, there are many other cases where accomplishing an obvious task in ROS requires a non-obvious implementation. While publish-subscribe is an excellent paradigm for streaming sensor data, it’s just not the cleanest paradigm for many other tasks, such as launching temporary worker processes for parallel programming, invoking long-running services like training machine learning models, or invoking motion planners. Until a novice is proficient at distributed systems programming, their clumsiness at the “ROS dance” will lead to messy code, brittle hacking, and frustration.
Here are more issues that have influenced my view:
The Learning Curve is Steep
- Complex file structures. In most build systems, the file structure isn’t important and can always be reconfigured, but in ROS, understanding how folder names map to package functionality is crucial to understanding anything at all.
- Paradigms are unusual. Build systems and IDLs aren’t taught in lower level CS classes, and many engineering students want to build robots but have been trained on Matlab.
- Building and installing before testing is just… weird. This flies in the face of most software engineering philosophies. Shouldn’t an installed piece of code be already thoroughly tested?
- Defining new messages requires learning a new language. Knowing the IDL is also critical to understand the function of an existing message. This is easy for seasoned programmers, but not so straightforward for the novice.
- Not curated. There are way too many packages available, of varying quality. Documentation is often spotty, and OS / language / ROS version incompatibility is a major problem. After days of struggle, installing a package successfully seems like a major achievement — but what an intellectual dud! The problem will only get worse with ROS2.
- Here’s a very common example. Do you want to communicate with an Arduino programmatically? Yes! Let’s do it the ROS way with the rosserial_arduino package! Oh wait, this is a serial communication and serialization wrapper, and I need to code my own Arduino script to link the messages with the Arduino's I/O. On the other hand, the Arduino Python Command API lets you read and write to Arduino I/O directly from Python code. That was easy, why didn't I do that first?
Facilitates Discovery… Somewhat
There are a lot of positive aspects of ROS’s discovery tools, such as rostopic, rqt_graph, and rviz. They do a great job telling you what is available to interface with in a system, but I would argue that the question of "what data is out there?" is less important than "how should I interpret it?"
- Can find a system’s topics but not how to interpret them, e.g., conventions, units, program logic.
- Topic encapsulation isn’t enforced and relies on convention and launch file tweaking. Topic naming clashes are frequent, since the community lacks conventions and best practices.
- It’s hard to write good documentation even in the best of times; ROS makes it harder because nodes don’t self-document their major functions (topics, services) in their main source files. Even with a well-commented piece of code, this becomes a game of hunting.
What’s the system state?
In ROS there is no accessible notion of “system state”, i.e., a snapshot of the current workings of the system. Instead, we have logs, which provide a sequence of deltas. Although this is great to review what happened during a run, it is suboptimal for debugging.
- No save / restore / rewind. With ROS, to find out what went wrong in a system, we must replay logs from the system start time, and for rare bugs on long runs this is exceedingly inconvenient.
- Hard / tedious to implement state synchronization. Sometimes, components just need to know the state of other components, especially in UIs.
- Less amenable to machine learning and planning. System ID, reinforcement learning, and motion planning rely fundamentally on being able to capture state.
The Future is Hazy
Willow Garage was a huge player in promoting robotics and ROS in the early 2010’s, but its sudden demise left a void that has proven hard to fill. So while ROS is still the dominant middleware system in robotics, its future as the industry leader is uncertain.
Although ROS was a breath of fresh air in the bad old days of CORBA, over the last 10 years serialization technologies like JSON, Google Protocol Buffers, Apache Thrift (used by Facebook), and MIT’s LCM have become abundant. Other networking technologies like Redis, Websockets, and ZeroMQ are making basic communication relatively straightforward. Furthermore, ROS wasn’t built with the web in mind, with inter-domain messaging and security a secondary concern.
ROS2 is an attempt to modernize ROS for the 2020’s, but convincing the established ROS community to upgrade will prove an uphill climb. It does have the support of Amazon and iRobot on the ROS2 Technical Steering Committee, but heavy hitters like NVidia, Facebook, Intel, and Google are notable omissions; in fact, NVidia and Facebook have released packages that directly compete with ROS/ROS2. We also don’t see successful robotics companies like Intuitive Surgical or Boston Dynamics, or autonomous driving companies, on this list. If ROS1 is still deeply embedded in labs and education, and industry isn’t embracing the upgrade, then why switch?
Recommendations
Overall, I make the following recommendations for instructors:
- Although programming is essential, teaching robotics with a heavy emphasis on software engineering is a mistake; it tends to exclude potentially talented students with mechanical engineering, electrical engineering, and mathematics backgrounds, while also giving the impression that robotics is a chore.
- An instructor of an introductory robotics class should prepare straightforward interfaces to a physical robot’s functionality. The API should be in a single language, well-documented, and spread across as few top-level interface files as possible.
- The physical robot API should also share common data representations with the tools presented for modeling, collision detection, inverse kinematics, motion planning, etc.
- ROS should be taught as a programming technology, like a C++ programming language course or game programming with the Unity engine, rather than a foundational topic. Proficiency in ROS is a skill that enhances a student’s preparation for the workforce, but ROS itself does not deepen understanding of robotics concepts that stand the test of time.
- Migrating from educational code to “real robot coding” takes as much time understanding the quirks of the robot as it does learning its software infrastructure. These problems do not disappear whether the API uses ROS, WebSockets, C, Python, or a CAN protocol, and a student with a broad base of knowledge will be able to adapt quickly to the API. So don’t sweat it!