Why I Don’t Teach ROS to Robotics Students

I am often asked how I go about teaching ROS to robotics students, and I simply reply “I don’t”.

Although ROS is an important skill to learn at some point during the career of a robotics student, in the first semester or two of introducing robotics, an instructor must balance the relatively steep learning curve of ROS against theory and algorithms. In the case of mathematically- or mechanically-minded students, software engineering itself can be a foreign concept.

Students programming a robot during the Amazon Picking Challenge, 2016

ROS is a strong tool for connecting components together, but the robotics problem is not simply about connecting software packages. In my view, the job of an educator is to achieve the thinnest of possible barriers between the concept and the assignment. Moreover, because software and hardware compatibility changes so rapidly, a novice can be easily trapped in “version hell” where course materials and tutorials become out of date as versions of ROS change (Kinetic, Melodic, ROS2? Python 2 or 3? What about building on my Mac?) and their lives are consumed with compilation issues. But besides this, the bending-over-backwards of distributed systems programming imposes a certain “oddness” that takes engineering students, hobbyists, and even lower-division undergraduate students in computing, too long to grasp for my taste.

Let me start by presenting an example of something very simple that a novice might want to do, which could take 10 seconds to implement in a normal programming language, but might take a student days to implement in ROS.

Example: Simple things can be hard in ROS

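Suppose node A controls some functionality F that can be switched on or off, and other nodes B and C may each want F turned on. The logic should state that F should be on if at least one other node asks it to be on, and off otherwise. In other words,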

F_enabled = B_F_enabled or C_F_enabled.

We shall see that this one-liner in any normal programming language leaves an abundance of options (and hence, an abundance of confusion) to the ROS implementer.
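For comparison, here is a minimal sketch of the non-ROS baseline in Python, assuming every requester lives in the same process. The names `requests` and `f_enabled` are hypothetical and chosen only for illustration.

```python
# Hypothetical in-process version: each requester records its wish in a dict.
requests = {"B": False, "C": False}

def f_enabled(requests):
    """F is on if at least one requester asks for it."""
    return any(requests.values())

requests["B"] = True          # B asks for F to be on
print(f_enabled(requests))    # True
```

Everything that follows is an attempt to reproduce this behavior across separate ROS nodes.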

ROS Implementation 1

  • A reads std_msgs/Bool from /F_enabled
  • B publishes value True to /F_enabled when it wants F to be on, and False otherwise.
  • C publishes value False to /F_enabled

This doesn’t work because if C’s message is sent after B’s, it will erase the setting done by B. Let’s try again:
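To see the failure concretely before moving on, here is a small simulation that needs no ROS at all. The class `SharedTopicA` is a hypothetical stand-in for node A listening on the single shared topic; it only illustrates the "last message wins" behavior.

```python
class SharedTopicA:
    """Stand-in for node A subscribed to one shared /F_enabled topic."""

    def __init__(self):
        self.f_on = False

    def on_message(self, value):
        # A cannot tell who sent the message, so the latest value simply wins.
        self.f_on = value


a = SharedTopicA()
a.on_message(True)    # B asks for F to be on
a.on_message(False)   # C publishes False afterwards
print(a.f_on)         # False, even though B still wants F on
```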

ROS Implementation 2

  • B repeatedly publishes value True to /F_enabled when it wants F on, and False otherwise.
  • C doesn’t do anything.

This works, as long as A implements the timeout correctly, and B knows the timeout length from A so that it republishes at least once per timeout period. But observe that this makes the behavior of A history- and timing-dependent. Moreover, what if it’s not a good idea to keep F on, and instead I’d like F to turn off promptly at a specified time? We’ve broken the very clear logical definition of F into a bit of a mess.
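A rough sketch of the timeout logic A would need makes the timing dependence visible. This is plain Python rather than ROS code, and it assumes that A treats F as on only if a True message arrived within the last `timeout` seconds and ignores False messages. The class name `TimeoutA` and the injectable `clock` argument are hypothetical, added so the example can run with a fake clock.

```python
import time


class TimeoutA:
    """Stand-in for node A: F is on only while True messages keep arriving."""

    def __init__(self, timeout, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock
        self.last_true = None   # time of the most recent True message

    def on_message(self, value):
        if value:
            self.last_true = self.clock()

    def f_enabled(self):
        if self.last_true is None:
            return False
        return self.clock() - self.last_true <= self.timeout


now = [0.0]                               # fake clock, advanced by hand
a = TimeoutA(timeout=0.5, clock=lambda: now[0])
a.on_message(True)                        # B publishes True at t = 0
now[0] = 0.4
print(a.f_enabled())                      # True
now[0] = 1.0                              # B was too slow to republish
print(a.f_enabled())                      # False
```

Note that the answer now depends on when you ask, not just on what B wants.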

ROS Implementation 3

# Node A keeps the latest request from each writer
B_F_enabled = False
C_F_enabled = False

def recv_B_F_enabled(msg):
    # Callback for messages on /B_F_enabled
    global B_F_enabled,C_F_enabled
    B_F_enabled = msg.data
    if B_F_enabled or C_F_enabled:
        enable_F()
    else:
        disable_F()

def recv_C_F_enabled(msg):
    # Callback for messages on /C_F_enabled
    global B_F_enabled,C_F_enabled
    C_F_enabled = msg.data
    if B_F_enabled or C_F_enabled:
        enable_F()
    else:
        disable_F()

  • B publishes value True to /B_F_enabled if it wants it on, and False otherwise.
  • C can either publish False to /C_F_enabled, or do nothing.

This is the first implementation that actually does what we wanted! The problem is, however, that it doesn’t scale to more nodes. Adding a third node D requires modifying A to subscribe to a new topic with a third callback function, and modifying all of the logic above.

ROS Implementation 4

import rospy
from std_msgs.msg import Bool

# Latest request from each writer, keyed by writer name
X_F_enabled = dict()

def recv_X_F_enabled(msg,caller):
    global X_F_enabled
    X_F_enabled[caller] = msg.data
    if any(X_F_enabled.values()):
        enable_F()
    else:
        disable_F()

rospy.Subscriber('/B_F_enabled',Bool,lambda msg: recv_X_F_enabled(msg,'B'))
rospy.Subscriber('/C_F_enabled',Bool,lambda msg: recv_X_F_enabled(msg,'C'))
rospy.Subscriber('/D_F_enabled',Bool,lambda msg: recv_X_F_enabled(msg,'D'))

But this doesn’t scale to variable numbers of writers. To do so, we could set up the list of writers in a rosparam parameter. Let’s say we set the param F_writers = '["B","C","D"]', then the subscribing call can be replaced as follows:

possible_F_writers = rospy.get_param("/F_writers")
for x in possible_F_writers:
    # Bind x as a default argument; a plain lambda would see only the last x
    rospy.Subscriber('/{}_F_enabled'.format(x),Bool,lambda msg,x=x: recv_X_F_enabled(msg,x))

Now, each of the writers needs to know which topic they should publish to. Or, a topic mapping can be set up to map their output from “F_enabled” to “[X]_F_enabled”. In any case, yikes! This is still a bit of a pain.

ROS Implementation 5

  • B calls the service with enable_F(True,'B')
  • C can either call the service or not.

OK, now we’re getting to something much more elegant. However, we now have to write the .srv IDL for the service, write the build for A so that it gets installed, then build B and C such that they can access the service… Kind of a pain, isn’t it?
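The logic behind such a service is tiny compared with the plumbing around it. As a rough sketch, assuming A keeps one entry per caller, the handler body could look like the plain-Python function below. The names `handle_enable_F` and `F_requests` are hypothetical, and the .srv definition and the ROS service registration are deliberately left out.

```python
# Hypothetical state kept by A: the latest request from each caller
F_requests = {}

def handle_enable_F(enabled, caller):
    """Record the caller's request and return whether F should now be on."""
    F_requests[caller] = enabled
    return any(F_requests.values())


print(handle_enable_F(True, 'B'))    # True
print(handle_enable_F(False, 'C'))   # True, because B still wants F on
print(handle_enable_F(False, 'B'))   # False
```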

A Simpler Alternative

  • A sets an empty dictionary to some path: redis.jsonset("F_enabled",".",{}). It then periodically performs a jsonget on the dictionary, and implements the logic.
  • B writes True or False to a subkey of this path: redis.jsonset("F_enabled",".B",True)
  • C can write False via redis.jsonset("F_enabled",".C",False) or just do nothing at all.
  • Other writers can do similar things, just making sure they write to a unique subkey.

That’s it! No fiddling with multiple callback functions, proxy topics, service IDLs, etc. The implementation is about as close to the logic as you can get. More importantly, this implementation directly follows from elementary programming concepts: F_enabled is viewed as a stateful variable, writers write to it, and A reads from it.

Although the API is a bit clunky, we can write wrappers to improve readability and productivity.
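Here is one possible shape for such a wrapper, as a minimal sketch. It assumes only that the client object offers `jsonset(name, path, value)` and `jsonget(name)` calls like the ones used above. The `EnableFlag` class is hypothetical, and `FakeJSONStore` is an in-memory stand-in (not a real Redis client) so the example runs without a server.

```python
class FakeJSONStore:
    """In-memory stand-in offering the jsonset/jsonget calls used above."""

    def __init__(self):
        self._docs = {}

    def jsonset(self, name, path, value):
        if path == ".":
            self._docs[name] = value                       # replace whole document
        else:
            self._docs[name][path.lstrip(".")] = value     # set one subkey

    def jsonget(self, name):
        return self._docs[name]


class EnableFlag:
    """Hypothetical wrapper so A and the writers never build paths by hand."""

    def __init__(self, store, name):
        self.store = store
        self.name = name

    def reset(self):
        self.store.jsonset(self.name, ".", {})

    def request(self, writer, enabled):
        self.store.jsonset(self.name, "." + writer, enabled)

    def enabled(self):
        return any(self.store.jsonget(self.name).values())


store = FakeJSONStore()
flag = EnableFlag(store, "F_enabled")
flag.reset()                 # done once by A
flag.request("B", True)      # done by B
flag.request("C", False)     # done by C
print(flag.enabled())        # True
```

With a real client in place of the stand-in, A would call `enabled()` periodically, and each writer would only ever call `request()` with its own name.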

Commentary and observations

Here are more issues that have influenced my view:

The Learning Curve is Steep

  • Paradigms are unusual. Build systems and IDLs aren’t taught in lower-level CS classes, and many engineering students want to build robots but have been trained on Matlab.
  • Building and installing before testing is just… weird. This flies in the face of most software engineering philosophies. Shouldn’t an installed piece of code be already thoroughly tested?
  • Defining new messages requires learning a new language. Knowing the IDL is also critical to understand the function of an existing message. This is easy for seasoned programmers, but not so straightforward for the novice.
  • Not curated. There are way too many packages available, of varying quality. Documentation is often spotty, and OS / language / ROS version incompatibility is a major problem. After days of struggle, installing a package successfully seems like a major achievement — but what an intellectual dud! The problem will only get worse with ROS2.
  • Here’s a very common example. Do you want to communicate with an Arduino programmatically? Yes! Let’s do it the ROS way with the rosserial_arduino package! Oh wait, this is a serial communication and serialization wrapper, and I need to code my own Arduino script to link the messages with the Arduino's I/O. On the other hand, the Arduino Python Command API lets you read and write to Arduino I/O directly from Python code. That was easy, why didn't I do that first?

Facilitates Discovery… Somewhat

  • Can find a system’s topics but not how to interpret them, e.g., conventions, units, program logic.
  • Topic encapsulation isn’t enforced and relies on convention and launch file tweaking. Topic naming clashes are frequent, since the community lacks conventions and best practices.
  • It’s hard to write good documentation even in the best of times; ROS makes it harder because nodes don’t self-document their major functions (topics, services) in their main source files. Even with a well-commented piece of code, this becomes a game of hunting.

What’s the system state?

  • No save / restore / rewind. With ROS, to find out what went wrong in a system, we must replay logs from the system start time, and for rare bugs on long runs this is exceedingly inconvenient.
  • Hard / tedious to implement state synchronization. Sometimes, components just need to know the state of other components, especially in UIs.
  • Less amenable to machine learning and planning. System ID, reinforcement learning, and motion planning rely fundamentally on being able to capture state.

The Future is Hazy

Although ROS was a breath of fresh air in the bad old days of CORBA, over the last 10 years serialization technologies like JSON, Google Protocol Buffers, Apache Thrift (used by Facebook), and MIT’s LCM have become abundant. Other networking technologies like Redis, Websockets, and ZeroMQ are making basic communication relatively straightforward. Furthermore, ROS wasn’t built with the web in mind, with inter-domain messaging and security a secondary concern.

ROS2 is an attempt to modernize ROS for the 2020s, but convincing the established ROS community to upgrade will prove an uphill climb. It does have the support of Amazon and iRobot on the ROS2 Technical Steering Committee, but heavy hitters like NVidia, Facebook, Intel, and Google are notable omissions; in fact, NVidia and Facebook have released packages that directly compete with ROS/ROS2. We also don’t see successful robotics companies like Intuitive Surgical and Boston Dynamics, or autonomous driving companies, on this list. If ROS1 is still deeply embedded in labs and education, and industry isn’t embracing the upgrade, then why switch?

Recommendations

  • Although programming is essential, teaching robotics with a heavy emphasis on software engineering is a mistake; it tends to exclude potentially talented students with mechanical engineering, electrical engineering, and mathematics backgrounds, while also giving the impression that robotics is a chore.
  • An instructor of an introductory robotics class should prepare straightforward interfaces to a physical robot’s functionality. The API should be in a single language, well-documented, and spread across as few top-level interface files as possible.
  • The physical robot API should also share common data representations with the tools presented for modeling, collision detection, inverse kinematics, motion planning, etc.
  • ROS should be taught as a programming technology, like a C++ programming language course or game programming with the Unity engine, rather than a foundational topic. Proficiency in ROS is a skill that enhances a student’s preparation for the workforce, but ROS itself does not deepen understanding of robotics concepts that stand the test of time.
  • Migrating from educational code to “real robot coding” takes as much time understanding the quirks of the robot as it does learning its software infrastructure. These problems do not disappear whether the API uses ROS, WebSockets, C, Python, or a CAN protocol, and a student with a broad base of knowledge will be able to adapt quickly to the API. So don’t sweat it!

Associate Professor of Electrical and Computer Engineering at Duke University
