First, we are releasing a testing framework for robots.txt parser developers, created by Andreea. The project provides a testing tool that can validate whether a robots.txt parser follows the Robots Exclusion Protocol, or to what extent. Currently there is no official and thorough way to assess the correctness of a parser, so Andreea built a tool that can be used to create robots.txt parsers that are following the protocol.
Java robots.txt parser and matcher
Second, we are releasing an official Java port of the C++ robots.txt parser, created by Ian. Java is the 3rd most popular programming language on GitHub and it's extensively used at Google as well, so no wonder it's been the most requested language port. The parser is a 1-to-1 translation of the C++ parser in terms of functions and behavior, and it's been thoroughly tested for parity against a large corpora of robots.txt rules. Teams are already planning to use the Java robots.txt parser in Google production systems, and we hope that you'll find it useful, too.