March 12, 2008

Format Wars: XML v. JSON

They're called wars because afterwards, everyone agrees that they were senseless wastes of resources, and faithfully promises never to let it happen again. In this case (at Mozo), the absolutely-everything format of XML (that stuff where everything is surrounded by <angle> bracketed words </angle>) is up against a newcomer called JSON.

Because we are software engineers in the financial cryptography world, I prefer the haptic approach to decision making. That is, we have to understand at least enough in order to build it. Touch this:

Here’s an example data structure, of the kind you might want to transmit from one place to another (represented as a Python dictionary; mentally replace with the syntax from your programming language of choice).
person = {
  "name": "Simon Willison",
  "age": 25,
  "height": 1.68,
  "urls": [
    "http://simonwillison.net/",
    "http://www.flickr.com/photos/simon/",
    "http://simon.incutio.com/"
  ]
}
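
As a rough illustration of how little machinery the JSON side needs, the structure above can be round-tripped through JSON text with Python's standard json module. This is only a sketch; the variable names encoded and decoded are ours, not part of the quoted example.

```python
import json

# The example structure from above, as a Python dictionary.
person = {
    "name": "Simon Willison",
    "age": 25,
    "height": 1.68,
    "urls": [
        "http://simonwillison.net/",
        "http://www.flickr.com/photos/simon/",
        "http://simon.incutio.com/",
    ],
}

# Serialize to JSON text, then parse that text back into Python objects.
encoded = json.dumps(person)
decoded = json.loads(encoded)
```

The whole format is strings, numbers, lists and key/value maps, which is why it can be described in a page.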

Speaking strictly from the point of view of security: the goals are to have all your own code, and to be simple. Insecurity lurks in complexity, and other people's code represents uncontrollable complexity. Not because the authors are evil but because their objectives differ from yours in ways that you cannot see and cannot control.

Generally, then, in financial cryptography you should use your own format. Because that ensures that it is your own code doing the reading, and especially, that you have the skills and assets to maintain that code, and fix it.

To get to the point, I think this rules out XML. If one were considering security as the only goal, then it's out: XML is far too complex, it drags in all sorts of unknown stuff which the average developer cannot control, and you are highly dependent on the other people's code sets. I've worked on a few large projects with XML now, and this is the ever-present situation: out of control.

What then about JSON? I'm not familiar with it, but a little googling and I found the page above that describes it ... in a page. From a strictly security pov, that gives it a hands-down win.

I already understand what JSON is about, so I can secure it. I can't and never will be able to say I can secure XML.

Posted by iang at March 12, 2008 08:31 AM
Comments

> Generally, then, in financial cryptography you should use your own
> format. Because that ensures that it is your own code doing the
> reading, and especially, that you have the skills and assets to
> maintain that code, and fix it.

That is an absolutely fascinating point. YOU ROCK !

> To get to the point, I think this rules out XML. If one were
> considering security as the only goal, then it's out: XML is far too
> complex, it drags in all sorts of unknown stuff which the average
> developer cannot control, and you are highly dependent on the other
> people's code sets.

SO TRUE !

> I've worked on a few large projects with XML now,
> and this is the ever-present situation: out of control.
>
> What then about JSON? I'm not familiar with it, but a little googling
> and I found the page above that describes it ... in a page. From a
> strictly security pov, that gives it a hands-down win.
>
> I already understand what JSON is about, so I can secure it. I can't
> and never will be able to say I can secure XML.


I feel even JSON is too complex. I'm sticking with the IGRule.

"Use your OWN format, as it forces you to use your own [hence secure] code."

Posted by: JPM at March 12, 2008 10:21 AM

thats nothing ... should have seen the format wars during the 90s in the financial standards bodies with *ML vis-a-vis ASN.1 .... especially the x.509 contingent ... except *MLs were being treated as the new-comer

disclaimer, *MLs were invented by G, M, & L in 1969 at the science center
http://www.garlic.com/~lynn/subtopic.html#545tech
... a decade later standardized as SGML
http://www.garlic.com/~lynn/subtopic.html#sgml

and then begat HTML, XML and a whole slew of other *MLs.
http://infomesh.net/html/history/early

Posted by: Lynn Wheeler at March 12, 2008 11:17 AM

A slashdot comment is what I remember about this war :

XML is eXtensible Markup Language. But it is still markup. So documents (e.g. HTML, Word documents) are the place you would use XML.

JSON is for object notation.

The right tool for the right job. Think about what HTML-JSON would look like!

Posted by: duryodhan at March 12, 2008 01:57 PM

Hi Ian,

1. The XML standard defines a canonical, deterministic representation, and one can test XML-formatted data for this canonicalness. The existence of such a canonical format implies that you can compare XML data in a meaningful sense.
2. The validity of the format of data represented in XML can be verified unambiguously using schemas. The complexity of the format these schemas induce on the XML representation data space lies in the hands of the prudent and skillful designer (sorry). In dealing with digital signatures this is very important!
3. JSON doesn't have such a canonical deterministic representation, nor does it provide a standard for validation without adding a lot of complexity (I've written some kind of JSON schema validator in JavaScript; it is more code than I'd care for). The code snippet below suggests why having no canonical deterministic representation poses crypto problems:

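// Same data, different key order. SHA1() is an assumed hex-digest helper, not a built-in.
var person1 = { "name": "Alice", "key": "123" },
    person2 = { "key": "123", "name": "Alice" };

// Logically equal objects, yet their serialized text (and so their hashes) differ.
alert( SHA1(JSON.stringify(person1)) != SHA1(JSON.stringify(person2)) )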

4. Which doesn't mean that JSON is not a very suitable tool for JavaScript programmers, who most likely have reinvented JSON-like solutions over and over.
5. Because you can store a lot in JSON it will spiral out of control as well.
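
The hashing problem in point 3 can be sketched in Python as follows. The helper names sha1_hex and canonical are hypothetical, and the "canonical" form used here (sorted keys, no whitespace) is an ad hoc convention assumed for illustration, not something the JSON specification provides.

```python
import hashlib
import json

# Two dictionaries holding the same data, built in a different key order.
person1 = {"name": "Alice", "key": "123"}
person2 = {"key": "123", "name": "Alice"}


def sha1_hex(text: str) -> str:
    """Hex SHA-1 digest of the UTF-8 encoding of text."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def canonical(obj) -> str:
    """Ad hoc canonical form (an assumption): sorted keys, no extra whitespace."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":"))


# Default serialization follows insertion order, so the texts differ.
naive1 = json.dumps(person1)
naive2 = json.dumps(person2)

same_data = person1 == person2
naive_hashes_match = sha1_hex(naive1) == sha1_hex(naive2)
canonical_hashes_match = sha1_hex(canonical(person1)) == sha1_hex(canonical(person2))
```

Both sides of a signature check have to agree on the same serialization convention; nothing in JSON itself enforces one.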

As always, KISS applied to XML helps in adding inertia to prevent it from spinning out of control (well, KISS /is/ a good design constraint).

In short, stick with ASN.1 and DER. ASN.1 is as expressive as XML and already has a good binary representation. I like the TLV structure of ASN.1/DER in particular.
JSON and XML both are TV, and the L is nice extra information for a crypto plumber. Takeaway: a TLV parser is less likely to suffer from security problems caused by buffer overruns ;-)
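
To make the TLV point concrete, here is a minimal sketch of a tag-length-value record in Python. The layout (1-byte tag, 2-byte big-endian length) and the names encode_tlv and decode_tlv are assumptions chosen for brevity; this is not DER, which has its own tag and length rules.

```python
import struct

HEADER = ">BH"  # hypothetical layout: 1-byte tag, 2-byte big-endian length
HEADER_SIZE = struct.calcsize(HEADER)


def encode_tlv(tag: int, value: bytes) -> bytes:
    """Encode one record as tag, length, value."""
    if not 0 <= tag <= 0xFF:
        raise ValueError("tag must fit in one byte")
    if len(value) > 0xFFFF:
        raise ValueError("value too long for a 2-byte length")
    return struct.pack(HEADER, tag, len(value)) + value


def decode_tlv(buf: bytes, offset: int = 0):
    """Return (tag, value, next_offset), rejecting records that overrun buf."""
    if offset < 0 or offset + HEADER_SIZE > len(buf):
        raise ValueError("truncated header")
    tag, length = struct.unpack_from(HEADER, buf, offset)
    start = offset + HEADER_SIZE
    end = start + length
    # The length is known before the value is touched, so the bounds
    # check happens up front instead of while scanning for a delimiter.
    if end > len(buf):
        raise ValueError("declared length exceeds buffer")
    return tag, buf[start:end], end
```

Because the length arrives before the value, the reader can refuse an oversized or truncated record before copying a single byte of it.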

Posted by: kr, Twan at March 13, 2008 08:04 AM

Hey Twan,

I agree that neither XML nor JSON will be useful for hard-core financial cryptographic protocols. I was more thinking that we do need some sort of readable configuration format for simple data storage. And for applications that don't do high-end security implied by FC, it is still important to talk about basic, easy formats.

But, you are right. If security is really needed we need to recall that the application is the final authority on whether the data is good. Then, XML's canonical abilities are so much verbiage to drown in. If we have digitally signed the data, this security result dominates and obviates the need for all of XML's checkability and expansibility and so forth. And JSON fails on other points.

TLV {tag,length,value} in binary: I mostly agree, but with simplification: Drop ASN.1 and do your own format. I have, and it is surprisingly little work; it is only a week or two's worth of coding to come up with the 6 or so things you need in your own binary format.

Again, the trick is to know what to include and what to drop. For example, almost everything can be done with two things: number and array-of-bytes. I have discovered over time (around 3-4 iterations now) that I have preferred to replace all number formats with one single expanding natural number format, rather than have things like 1 byte, 2 byte, 4 byte, and 8 byte numbers, in positive and negative values, and decimals. We don't need negative values in FC, nor fractions nor decimals, so that is a large group of problems dropped (or more properly deferred across to application space). (Note this means also replacing the L in T-L-V, another nice simplification :)
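
As one possible reading of those two primitives, here is a sketch in Python. The encoding shown (little-endian base-128, 7 data bits per byte with a continuation bit) is an assumption picked for illustration, as are all four function names; the format actually used is not specified above.

```python
def encode_natural(n: int) -> bytes:
    """Encode a natural number in 7-bit groups, least significant first."""
    if n < 0:
        raise ValueError("only natural numbers are supported")
    out = bytearray()
    while True:
        low = n & 0x7F
        n >>= 7
        if n:
            out.append(low | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low)
            return bytes(out)


def decode_natural(buf: bytes, offset: int = 0):
    """Return (number, next_offset)."""
    result = 0
    shift = 0
    while True:
        if offset >= len(buf):
            raise ValueError("truncated number")
        byte = buf[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, offset
        shift += 7


def encode_bytes(data: bytes) -> bytes:
    """Array-of-bytes: the length as a natural number, then the raw bytes."""
    return encode_natural(len(data)) + data


def decode_bytes(buf: bytes, offset: int = 0):
    """Return (data, next_offset), rejecting lengths that overrun buf."""
    length, offset = decode_natural(buf, offset)
    end = offset + length
    if end > len(buf):
        raise ValueError("declared length exceeds buffer")
    return buf[offset:end], end
```

Here the length prefix of the byte array is itself just a natural number, so no separate fixed-width length field is needed.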

Posted by: Iang at March 13, 2008 08:27 AM