Putting the Systems Back into RSS: Recommendations for Reviewers and Authors
Since the beginning of the RSS conference, there has been debate about how we can support the important role of systems research in robotics, and I’ve weighed in on this debate in the past. Systems research is important and hard work, but it is widely acknowledged that the review process is unfairly critical of this type of research. We at the RSS Foundation have put together this document to educate our reviewer population about these issues, to give systems papers a fairer shake in the review process, and to give systems researchers guidelines for maximizing the value of their work for the robotics community.
Let’s start with a TL;DR set of guidelines for reviewers.
Recommendations for reviewers of systems papers
Systems research may not invent a novel algorithm or compare against baselines on benchmarks; in most cases, it is not even possible to compare against other systems. These papers are not simple narratives of “Our method A outperforms method B on benchmark C”, so a reviewer cannot fairly assess the value of a systems paper by applying the “lazy” reviewing strategies that have become so common in other fields (e.g., scanning a table of experimental results to see whether the “our method” row has lots of bold numbers).
Systems papers dive into all of the messiness of the real world; accordingly, as a reviewer you should weigh the work’s pros and cons across multiple dimensions. This demands a different critical eye than “science” papers do. To write a good review, you should exercise shrewd judgement, demonstrate awareness of the state of the field, and read the paper closely to estimate how useful the work will be for others in the field.
When you receive a paper, first determine whether the paper is primarily a systems contribution. Is the unit of analysis a single component or the integration of multiple components? Are the authors genuinely concerned with their robot’s performance when deployed in some real-world setting? Systems papers may also contain contributions to methods, and some methods papers may contain systems, so it may be best to consider “systems-ness” as a spectrum rather than a rigid classification. (For more guidelines, read on to the section “What defines systems research?”)
Now that you have a systems paper on your hands, prepare your review according to these key guidelines:
- Orient your review toward the question of “how useful is this work for researchers and/or practitioners?” Think about the work as a whole and exercise good judgement about whether it contributes value to the community, carefully weighing its strengths and weaknesses.
- Resist the temptation of the lazy reviewer’s crutches while forming your opinion of the paper, e.g., “needs more experiments to be convincing”, “authors should compare against the state-of-the-art”, “the authors have put together a nice system but it’s just engineering.”
- Systems research is hard; don’t expect perfection. Systems research is multifaceted and complex, the underlying technology evolves quickly, and systems work takes time. It is not fair to expect the latest technique from an arXiv paper published last month to be integrated and tested!
- Novelty of a technical component is not a prerequisite for publication. “The authors just combined known methods” is not a significant criticism! Were they the first to combine the methods in that way? Did they achieve a novel system capability? Did they learn something intriguing or insightful from the combination? Did they extrapolate from current trends to better understand issues that would arise in the future? Focus on criticizing these aspects of the work if you aren’t convinced.
- Benchmarking or comparing against baselines is not a prerequisite for publication. If a benchmark doesn’t exist and other systems are not available for testing, this should not be asked of authors. Consider whether the work is useful for the community as a feasibility study.
- Experiments are expensive and time-consuming compared to simulation or lab testing, so it is often unfair to ask the authors to do significantly more testing. (In some of my past research, our experiments were limited by the number of organ donors who died with intact corneas that were donated to a local eye bank for medical research, about 2–3 per week!) It is fair to say something like “the potential impact of the work is weakened by the limited number of experiments”, but you should review the paper as-is.
Let’s now dive deeper to define and explain the purpose of systems research, and to explain what makes systems research good or bad. We also provide recommendations for authors to structure their research and papers in a way that better highlights their contributions during the review process.
What defines systems research?
A key factor that distinguishes systems research from “science” research is that it contributes to knowledge about how systems of many components (hardware, software, algorithms, models, developers, and human stakeholders) interact. This understanding can lead to important insights that help our field develop better systems in the future.
Systems research almost certainly involves multiple interacting technological components. It may also include human factors and socioeconomic studies to understand the use and viability of robot systems for certain applications. There are several purposes of systems research in robotics, including but not limited to:
- Understand how robots behave, or will behave, during real-world deployment.
- Explore how factors of a system implementation affect overall system performance.
- Provide tools to the community to facilitate the development of deployed robots.
- Understand and improve the engineering and organizational processes of developing and deploying robots.
- Predict future robot development challenges by examining performance trends in existing systems.
Note that not all research is hypothesis driven; research can be purely descriptive and still quite valuable! Consider a paper that implements a system to achieve a task. If the authors describe how their implementation was able to achieve the task whereas others were not, this could be an example of descriptive research (a feasibility study) that marks a useful data point for the community.
Why do we need systems research? It’s to keep a field honest by testing robots under representative or even adversarial conditions. Within purely “science” research in robotics there are pervasive incentives to drift from reality (e.g., choosing objectives according to how well a method of choice works, or testing under favorable and unrealistic lab conditions), and these incentives are often self-reinforcing and can poison a field for years or decades.
How should we assess systems research?
We note that undue criticism of systems research is not a phenomenon limited to the robotics field, and this document has been inspired by efforts in other fields, such as a post from the program chairs of the UIST 2021 conference. Adapting a framing that the UIST 2021 program chairs presented and integrated into their review process, we can place systems research into four quadrants:
- Novel components and novel integration (rare)
- Novel components and known integration
- Known components and novel integration
- Known components and known integration
In this fast-moving era that emphasizes novelty, novelty, novelty, you might ask “how can the last category actually be good research?” Even if the system is completely known, the authors can analyze it in a novel way, e.g., conducting more extensive experiments, considering different metrics, or providing novel insights into how the system performs. Or, an author may analyze historical systems and predict how the field should evolve to address emerging challenges. By doing so, systems research can draw long-range conclusions that influence the field for years.
Good systems papers will contain some (but probably not all) of the following characteristics:
- Demonstrates a novel functionality through the integration of technological components.
- Provides insights into the limitations of component technologies in deployed systems.
- Provides insights into engineering problems likely to be faced when integrating systems.
- Clearly defines test procedures and metrics and executes the test procedures with scientific rigor.
- Draws didactic results and recommendations; that is, conclusions that educate the reader with lessons extending beyond the authors’ system.
- Addresses applications with real-world relevance and possible socioeconomic impact.
In contrast, bad systems papers may have the following characteristics:
- Describes systems work but does not perform research, i.e., does not attempt to answer a generalizable question. Systems research needs to be structured as actual research.
- Does not seriously consider deployed conditions, e.g., overly simplified lab environments or overly instrumented environments like motion capture. Note: some simplification / instrumentation is usually needed to make progress, so the reviewer needs to be familiar with the state of research to judge whether the authors are taking steps toward reality “in good faith” or “cheating”.
- Exhibits methodological weakness, e.g., modifying the system during testing, repeating tests until positive results are obtained, experimenters helping the system during testing, having the authors stand in for end users, or not accounting for training effects.
- Lessons learned are not generalizable, e.g., a single case study is performed, implementation mistakes were made, or test environments are not representative. Building a system and showing it works once is poor systems research!
- Exhibits low impact on the field, e.g., the task is not relevant to current problems, or performance is significantly surpassed by existing deployed systems. For feasibility studies, it is easy for an author to be the first to achieve a task by defining a new or modified task. Without judicious reviewing we’ll end up publishing papers that “plant a flag” on tasks that drift from reality and ultimately become downright silly.
Note on competition papers: the impact criterion we identify above does not imply that a team that fails to win a competition cannot write a good systems paper! There are often multiple dimensions of performance, such as cost, ease of integration, usability, or robustness, in which a system can perform well or lessons can be learned, and a non-winning team should highlight those dimensions. Negative results are often insightful and can help others avoid the same mistakes.
Recommendations for authors of systems papers
Besides being hard to review, systems research is hard to do right. It’s expensive, frustrating, and time-consuming, and yet we want this work to be rewarded with high-impact papers. If you are a systems researcher, we recommend aligning your engineering process with the publishing pipeline to achieve better outcomes from the review process.
- Systems work alone is not systems research. To count as research, your work must attempt to answer a generalizable question regarding robot systems. It is your job as an author to articulate and motivate the question. Examples of research approaches include comparing system behavior under different implementations or degradations of a component, demonstrating a first achievement of some capability and describing how it was achieved, and analyzing trends when scaling to larger systems, more complex environments, or emerging challenges.
- Incorporate research questions at early stages of your development plan. Review relevant literature and identify open questions. Conduct experiments with scientific rigor, even when the baseline is known to be inferior. Be rigorous about freezing the version of the system you test; don’t redo tests to fix your system on the fly, and don’t “cheat” by changing the evaluation to make your work look better (a sketch of one way to record a frozen evaluation setup appears after this list).
- Justify system design decisions as you go. If you can’t clearly explain, in writing, the choices you made while designing and developing your system, reviewers will criticize it. Note that your system need not be “perfect”; effort, cost, and availability of components can be reasonable justifications.
- Choose applications with real-world relevance. You don’t have to have a product at the ready, but if an application is not yet viable, examine the literature to identify the gaps. Think carefully about the steps that would need to be taken toward viability, target your research toward those steps, and write your papers to justify how your research makes progress.
- Articulate any lessons learned that have general value to the community. Consider writing a whole section in which you enumerate these lessons rather than weaving them into your results or conclusions.
- To extract the most value from your investments, evolve your system gradually and use it as a platform for multiple papers. These papers may answer different research questions or show a sequence of upgrades to system capabilities. However, excessive “salami slicing” will lead to weak papers, so strive for a healthy balance.
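To make the advice about freezing your system concrete, here is a minimal sketch of one way an author might record which system version and evaluation configuration produced each result, so that any mid-study change shows up in the experiment log. It assumes a Git-based codebase and Python tooling; the function names, the results.jsonl log, and the configuration fields are hypothetical illustrations rather than a prescribed tool.

```python
# Minimal sketch: record the frozen system version and evaluation config per trial.
# Assumes a Git-based codebase; all names here are hypothetical illustrations.
import hashlib
import json
import subprocess
import time


def frozen_system_id() -> dict:
    """Capture the exact code revision and flag any uncommitted edits."""
    commit = subprocess.run(
        ["git", "rev-parse", "HEAD"], capture_output=True, text=True, check=True
    ).stdout.strip()
    dirty = subprocess.run(
        ["git", "status", "--porcelain"], capture_output=True, text=True, check=True
    ).stdout.strip() != ""
    return {"commit": commit, "uncommitted_changes": dirty}


def record_trial(config: dict, metrics: dict, log_path: str = "results.jsonl") -> None:
    """Append one trial's configuration, frozen system version, and metrics to a log."""
    entry = {
        "timestamp": time.time(),
        "system": frozen_system_id(),
        # Hashing the full config makes any later change to the evaluation setup visible.
        "config_hash": hashlib.sha256(
            json.dumps(config, sort_keys=True).encode()
        ).hexdigest(),
        "config": config,
        "metrics": metrics,
    }
    with open(log_path, "a") as f:
        f.write(json.dumps(entry) + "\n")


if __name__ == "__main__":
    # Hypothetical usage: one navigation trial with a fixed seed and environment.
    record_trial(
        config={"environment": "warehouse_A", "seed": 7, "max_speed_mps": 1.0},
        metrics={"success": True, "completion_time_s": 142.3},
    )
```

Logging the commit hash (and whether the working tree had uncommitted edits) alongside a hash of the evaluation configuration makes it obvious, to you and to reviewers, if the system or the test setup changed partway through an experimental campaign.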
Conclusion and feedback
We at the RSS Foundation feel it is important to promote and encourage strong systems research to ensure the long-term health of the robotics research field. Doing so is a two-way street that requires better alignment between systems paper reviewers and authors, and we hope that our recommendations can help achieve this goal.
In the interest of openness and transparency, we welcome public commentary on this article or private feedback through email. We look forward to hearing further from the community.