simplejson security audit complete!
I’m pleased to announce that OSTIF recently sponsored a security audit of simplejson, performed by X41 D-Sec. The release of simplejson v3.19.1 includes fixes for the potential security issues discovered in this audit, as well as a number of the security hardening measures it recommended.
Many open source projects such as simplejson are developed by volunteers without funding, sponsorship, or dedicated security resources, yet they have become embedded in infrastructure where security is critical. The OSTIF-funded security audit of simplejson performed by X41 D-Sec was thorough; it identified potential security issues (minor, in this case) and included several specific recommendations for security hardening that will help the project remain safe to use. Even as a solo maintainer, I found it a breeze to work with OSTIF and X41 D-Sec. The entire engagement, from their initial email to the release of simplejson v3.19.1, took less than a month, and it required just a few emails and one meeting on my end before I received the initial report. I’d highly recommend working with OSTIF to other open source maintainers, and if I were still a decision maker at a corporation I’d devote resources to sponsoring this important work!
You can read more about this audit on the X41 D-Sec Blog, OSTIF Blog, and the full audit report.
The PR that resolved the identified issues is here: simplejson#313.
In summary, the source code audit was done on simplejson v3.18.4. The three notable issues identified were as follows, with my commentary on the potential impact:
SJ-PT-23-03 (medium) Backport the integer string length limitation from Python 3.11 to limit quadratic number parsing. (See also CVE-2020-10735)
For users of Python versions older than 3.11, this could have been used for a denial of service attack on services that parse large chunks of JSON.
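To make the idea concrete, here is a rough sketch of a digit limit applied through the parse_int argument that loads already accepts. This is not the code from the PR: the names MAX_INT_DIGITS, limited_parse_int, and loads_with_int_limit are hypothetical, and the value 4300 is simply Python 3.11’s default limit, borrowed for illustration.

```python
try:
    import simplejson as json  # same parse_int argument as the stdlib module
except ImportError:
    import json

# Hypothetical constant: mirrors Python 3.11's default limit of 4300 digits.
MAX_INT_DIGITS = 4300


def limited_parse_int(text):
    """Refuse to convert integer literals with too many digits.

    Converting a decimal string to int takes time quadratic in its length
    on older Pythons, so the length is checked before int() is called.
    """
    digits = len(text.lstrip("-"))
    if digits > MAX_INT_DIGITS:
        raise ValueError(
            "Integer literal has %d digits; limit is %d" % (digits, MAX_INT_DIGITS)
        )
    return int(text)


def loads_with_int_limit(document):
    """Decode JSON, rejecting oversized integer literals."""
    return json.loads(document, parse_int=limited_parse_int)


print(loads_with_int_limit('{"n": 123}'))
```

A service stuck on an older Python and an older simplejson could use a hook like this as a stopgap, at the cost of a Python-level call for every integer in the document.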
SJ-PT-23-02 (low) Fix missing reference count decrease if PyOS_string_to_double raises an exception in Python 2.x.
I believe that this code was unreachable in all versions of Python. The string is checked in advance, and parsing a double shouldn’t need to allocate memory, thus it should never raise an exception.
SJ-PT-23-01 (low) Fix invalid handling of unicode escape sequences in the pure Python implementation of the decoder.
This would impact the presumably rare usage of the pure Python decoder and allow it to parse a subset of invalid unicode escape sequences. I think this would be difficult to exploit maliciously, but if this implementation were used as a validator it would certainly let some incorrect inputs through. Regardless, any undocumented deviation from the JSON spec is considered a bug and will be addressed.
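As an illustration of this class of bug (an assumption on my part about the general mechanism, not a copy of the actual fix), Python’s int(text, 16) accepts more than four plain hex digits: it tolerates surrounding whitespace, a sign, and a 0x prefix. A strict check has to validate each character first. The helper name parse_u_escape_strict below is hypothetical.

```python
import string

HEX_DIGITS = frozenset(string.hexdigits)


def parse_u_escape_strict(esc):
    """Parse the four characters that follow a backslash-u in a JSON string.

    Only exactly four hexadecimal digits are accepted, as the JSON spec
    requires. Returns the code unit as an int.
    """
    if len(esc) != 4 or any(ch not in HEX_DIGITS for ch in esc):
        raise ValueError("Invalid \\uXXXX escape: %r" % (esc,))
    return int(esc, 16)


# int() alone is too lenient for this job: each of these four-character
# strings converts without error even though none is four hex digits.
for lenient in ("0x1f", " 1f ", "+01f"):
    assert int(lenient, 16) == 0x1F

print(parse_u_escape_strict("00e9"))
```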
A handful of other improvements were made as a result of this audit, and several more are being considered for the future.
Several things came up in the audit that I’m strongly considering for the future:
Drop Python 2 support
This would certainly make maintenance easier and improve security. I’m not sure what a good timeline would be for sunsetting this support. Would it make more sense to sunset the library entirely, or to make it a compatibility shim over a more modern one? I’m not very connected to the userbase; libraries this old pretty much just work so long as they don’t change very much.
Implement Type Hints
For as long as Python 2 is supported, this is a bit tricky. However, I think the best path forward would be to adopt the simplejson typeshed stubs and maintain them in-tree. The hardest part would be integrating type checking with the CI. If you’ve seen an example of this for another project that still supports ancient versions of Python, I’d be interested to see how it went!
Raise exception on duplicate keys
For compatibility reasons, the JSON specification says very little about what implementations must do when duplicate keys are encountered. The current implementation is consistent with many other implementations such that the last name/value pair is reported. However, since some implementations differ, it would be useful in a security or validation context to be able to reject duplicates. I consider this very low priority as I don’t think it could be the default behavior, and the additional code to support it efficiently would have a performance and maintenance cost. I think at least documenting how the object_pairs_hook argument could be used to implement such a check would be a win.
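As a starting point for that documentation, a minimal sketch of such a check might look like the following. The function name reject_duplicate_keys is hypothetical; object_pairs_hook itself is the existing loads argument, which receives each decoded object as a list of (key, value) pairs in document order.

```python
try:
    import simplejson as json  # same object_pairs_hook argument as the stdlib
except ImportError:
    import json


def reject_duplicate_keys(pairs):
    """Build a dict from (key, value) pairs, refusing repeated keys."""
    result = {}
    for key, value in pairs:
        if key in result:
            raise ValueError("Duplicate key in JSON object: %r" % (key,))
        result[key] = value
    return result


# A document without duplicates decodes as usual.
print(json.loads('{"a": 1, "b": {"c": 2}}', object_pairs_hook=reject_duplicate_keys))

# A duplicate key is rejected instead of silently keeping the last value.
try:
    json.loads('{"a": 1, "a": 2}', object_pairs_hook=reject_duplicate_keys)
except ValueError as exc:
    print("rejected:", exc)
```

The hook runs for every object in the document, including nested ones, so the check applies at every level. It also means each object is built in Python rather than by the decoder’s fast path, which is part of the performance cost mentioned above.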