
right, and that should not be left to the serialization layer.


Security is a concern for every layer. It's not magic pixie dust that can be sprinkled on top of software to render it secure!

A while ago I read a great article about how the Adobe PDF serialization format is nearly impossible to secure because it allows inherently unsafe constructs.

For example, it allows cross-references that are basically just arbitrary unaligned pointers. It uses many different alignment and padding algorithms. It has length-prefixed and non-length-prefixed sections. Etc, etc...

Apparently it was a serious research exercise to make a safe PDF parser, and they only covered a fraction of the full spec!

To put things in perspective: Originally, PDF allowed arbitrary code execution as a core feature, allowing the output of shell commands to be used as document content.

Teams like Chromium's and Firefox's have essentially given up and now parse PDF inside a sandboxed JavaScript VM because it's too hard to do safely in C++. They parse HTML and JavaScript with C++, but not PDF. Think about that.

A similar issue caused the Log4j vulnerability (Log4Shell), where a "format string parser" was so flexible that user-controlled data could trigger network requests.

Even trivial, "surely it must be safe" formats like XML and JSON are riddled with security issues, such as different layers in a microservice architecture having different handling semantics for duplicate keys, null values, etc... This can result in exploits such as authentication and authorization tokens being interpreted by a system one way, but a different way by a different system. For real-world attacks along these lines, search for "request smuggling".
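A sketch of the duplicate-key hazard using Python's standard json module (the payload and key name are invented for illustration). Python happens to keep the *last* occurrence; another stack in the same pipeline might keep the first, and that disagreement is exactly what smuggling attacks exploit. A stricter layer can reject duplicates outright via object_pairs_hook:

```python
import json

# Python's default JSON parser silently keeps the LAST duplicate key:
payload = '{"user": "admin", "user": "guest"}'
print(json.loads(payload))  # {'user': 'guest'}

# A stricter deserializer can see every key before they collapse into a
# dict, and refuse ambiguous input instead of guessing:
def reject_duplicates(pairs):
    keys = [k for k, _ in pairs]
    if len(keys) != len(set(keys)):
        raise ValueError(f"duplicate keys: {keys}")
    return dict(pairs)

try:
    json.loads(payload, object_pairs_hook=reject_duplicates)
except ValueError as e:
    print(e)  # duplicate keys: ['user', 'user']
```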

Serialization and parsing are security minefields and it is dangerously naive to just hand-wave that away.

See: https://seriot.ch/projects/parsing_json.html


> Serialization and parsing are security minefields and it is dangerously naive to just hand-wave that away.

well, i am not hand-waving them away. i am just not sure what the serialization framework can possibly _do_ to make things secure during serialization?

when execution of user-supplied code is allowed (as in the examples you have outlined above), surely the layer _executing_ the code cannot really do anything about it! perhaps you actually did intend to `rm -rf /`?

policy checking, enforcement etc. have to happen at a higher / different layer. i am not sure why mechanism and policy are being conflated here.

in the same way, if you give the serialization layer a 10mb (or whatever sized) input to serialize, sure... you get a valid serialized output. maybe there is a genuine use case for that in some context or another, f.e. when serializing say image files, or something else etc. etc.

[edit] : minor comment.


> I am not sure what can the serialization framework possibly _do_ to make things secure during the serialization

Loads of things!

A strict specification that can only be interpreted one way goes very far. E.g.: a machine-readable BNF grammar file or something similar with no ambiguities.

A conformance test suite covering corner-cases is surprisingly effective, even with a supposedly perfect spec.

"Be strict with what you generate and lax with what you accept" has been demonstrated over and over again to be a disaster over the long-term in an ecosystem of many groups. Be strict always with what is accepted, not just generated!

Speaking of being strict: schema validation is essential. Strong typing for scalars helps a lot.
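A minimal sketch of what strict schema validation with strong scalar typing can look like at the deserialization boundary (the field names and the one-type-per-field schema are invented for illustration):

```python
# Hypothetical schema: every field required, exactly one type per field.
SCHEMA = {"user_id": int, "name": str, "is_admin": bool}

def validate(obj: dict) -> dict:
    if set(obj) != set(SCHEMA):
        raise ValueError(f"unexpected/missing fields: {set(obj) ^ set(SCHEMA)}")
    for field, expected in SCHEMA.items():
        # Exact type check: bool is a subclass of int in Python, so an
        # isinstance() check would quietly accept True as a user_id.
        if type(obj[field]) is not expected:
            raise TypeError(f"{field}: expected {expected.__name__}, "
                            f"got {type(obj[field]).__name__}")
    return obj

validate({"user_id": 7, "name": "alice", "is_admin": False})    # passes
# validate({"user_id": "7", "name": "alice", "is_admin": False})  # TypeError
```

The point is that the accept/reject decision lives next to the schema, not scattered through application code.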

The actual implementations of the spec can obviously have a wide range of security features. Never allowing arbitrary type instantiation is critical, yet is a mistake that keeps reoccurring much like SQL injection.

Etc, etc...
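To make the arbitrary-type-instantiation pitfall concrete, here is a sketch using Python's pickle, whose __reduce__ hook lets serialized data name any callable to invoke at load time (the class name is invented, and the callable here is harmless; an attacker's would not be):

```python
import os
import pickle

class Innocuous:
    def __reduce__(self):
        # pickle will call this callable on load. os.getcwd is harmless;
        # a hostile payload would name os.system or similar instead.
        return (os.getcwd, ())

payload = pickle.dumps(Innocuous())
result = pickle.loads(payload)  # executes os.getcwd() during deserialization
print(type(result))             # <class 'str'> -- not an Innocuous at all
```

This is why "never deserialize untrusted input with a format that can instantiate arbitrary types" keeps appearing in security guidance, and why safer formats restrict decoding to a closed set of schema-declared types.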


> I am not sure what can the serialization framework possibly _do_ to make things secure during the serialization

>> Loads of things!

>> A strict specification that can only be interpreted one way goes very far. E.g.: a machine-readable BNF grammar file or something similar with no ambiguities.

once again, that is not the domain of the serialization framework! it is a policy which needs to be established and enforced at the input / output layer by the entity which implements it.

a serialization framework should just serialize and deserialize objects to / from an i/o 'channel', f.e. a file, network, etc. shackling it with the specification / enforcement of security policies seems to conflate one concern with another.


what's the best modern alternative that is designed in this way?


gRPC ticks most of the checkboxes.

Unfortunately these are lessons that have to be learned over and over. Anything based on JSON is generally suspect. If you see the terms "quick" or "simple" in some marketing splash-page, assume the author has not thought about the hard problems like security and long-term interoperability.

Similarly, if you find yourself hand-rolling RPC client code and calling methods on something like "HttpClient" manually, you've done it wrong. That code should have been spat out by a code-generator from a schema.


> gRPC ticks most of the checkboxes.

huh :)! gRPC is an 'r-p-c' framework, and uses protobuf for serialization. you should be comparing protobuf to cap'nproto.


it depends on what type of safety.

The schema language might for example allow you to specify that an input string/blob should be smaller than 10MB and refuse to deserialize it if it is longer, same for array/list/vector length.
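One way a schema-driven decoder might enforce such a limit before doing any expensive work — a sketch assuming a hypothetical wire format with a 4-byte big-endian length prefix in front of each blob:

```python
import struct

MAX_BLOB = 10 * 1024 * 1024  # the 10MB schema limit from the example above

def read_blob(buf: bytes, offset: int = 0) -> bytes:
    # Read only the 4-byte length prefix first...
    (length,) = struct.unpack_from(">I", buf, offset)
    # ...and enforce the schema limit BEFORE allocating or copying anything.
    if length > MAX_BLOB:
        raise ValueError(f"blob of {length} bytes exceeds {MAX_BLOB} limit")
    start = offset + 4
    if start + length > len(buf):
        raise ValueError("truncated blob")
    return buf[start:start + length]

msg = struct.pack(">I", 5) + b"hello"
assert read_blob(msg) == b"hello"
```

A hostile prefix claiming a multi-gigabyte length is rejected from four bytes of input, which is the "eagerly fail on expensive deserialization" point made downthread.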


It feels like a check against an input size of 10MB is something you would do well before deserialization, no?


The limit might apply to some specific part of the message, rather than the whole. You can't check this without actually deserialising, or at least doing most of the same work.


not if it is a message you receive from a third party.

A concrete example might be a batching third party client: the app sends N messages in a single batch and each message has its own size limit.


You would, but others might not. Defense in depth.


> ... allow you to specify that an input string/blob should be smaller than 10MB and refuse to deserialize it if it is longer ...

why? are there no cases where serializing an even larger file is valid?


sure, a lot of cases, I suspect that S3 upload limits are different from imgur.


and feel free to do that in _your_ application. don’t shackle others with the limitations of your domain.

mechanism vs policy and all that.


I believe I have already justified why it might be useful at the protocol/schema level in ways that cannot be replicated at the application level: to eagerly fail on expensive (eg memory) deserialization.


Disregard for safety and security in serialization is one of the most common causes of security vulnerabilities, if not the most common.



