I get the impression that this fact is fundamentally lost on a lot of the people who want a "compatible" IPv6. Like, their mental model does not distinguish between how we as humans write down an IPv4 address in text and how that address is represented in the packet.
So they think "let's just add a couple more dots and numerals and keep everything else the same."
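To make the text-versus-wire distinction concrete, here is a minimal Python sketch using only the standard library. The address `192.0.2.1` is just a documentation example I picked, not anything from the thread. It shows that the dotted string is a rendering of a fixed 4-byte value, so an extra dotted group has nowhere to go.

```python
import ipaddress
import struct

text = "192.0.2.1"                              # how humans write the address
packed = ipaddress.IPv4Address(text).packed     # how it sits in the packet header
as_int = struct.unpack("!I", packed)[0]         # the same 32 bits as one unsigned integer

print(len(text), "characters of text ->", len(packed), "bytes on the wire")
print(packed.hex(), as_int)

# "Just add another dot and number" fails at parse time: the format has exactly four octets.
try:
    ipaddress.IPv4Address("192.0.2.1.7")
    extra_octet_accepted = True
except ipaddress.AddressValueError:
    extra_octet_accepted = False
```

The dots never appear in the header at all; they exist only in the text form.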
I think you’re right. Honestly, my impression is that a lot of people imagine it like a string field, and others more like a rich text field, analogous to “can’t we just use a smaller font?”
> The first thing in the IP header is the version number.
So you just change the version number… like was done with IPv6?
How would this be any different: all hosts, firewalls, routers, etc, would have to be updated… like with IPv6. So would all application code to handle (e.g.) connection logging… like with IPv6.
I was addressing the narrow claim that you cannot distinguish ASCII from UTF-7. You can distinguish IPv4 from IPv6 by looking at the version field (and I forgot to mention the L2 protocol field is out of band from IP's perspective). Obviously if the receiver doesn't support UTF-7 or IPv6 then it won't be understood. Forward compatibility isn't possible in this case.
Weirdly, the version field is actually irrelevant. You can't determine the type of a packet by looking at its first byte; you must look at the EtherType field in the Ethernet frame, or whatever equivalent your L2 protocol uses. The version field is redundant, possibly even to the point of being a mistake.
I mean, yes, in practice you can peek at the first byte if you know you're looking at an IP packet, but down that route lies expensive datacenter switches that can't switch packets sent to a destination MAC that starts with a 04 or 06 (looking at you, Cisco and Brocade: https://seclists.org/nanog/2016/Dec/29).
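A rough Python sketch of the difference, assuming an untagged Ethernet II frame (no VLAN tag) and using made-up MAC addresses. `classify_by_ethertype` and `guess_by_first_nibble` are hypothetical helper names, and this is not a claim about how any particular switch implements its lookup. It only shows why peeking at the leading nibble is unsafe when you don't already know the bytes are an IP packet.

```python
import struct

ETHERTYPE_IPV4 = 0x0800
ETHERTYPE_IPV6 = 0x86DD

def classify_by_ethertype(frame: bytes) -> str:
    """Read the out-of-band type: 6 bytes dst MAC, 6 bytes src MAC, 2 bytes EtherType."""
    (ethertype,) = struct.unpack("!H", frame[12:14])
    return {ETHERTYPE_IPV4: "ipv4", ETHERTYPE_IPV6: "ipv6"}.get(ethertype, "other")

def guess_by_first_nibble(data: bytes) -> str:
    """Peek at the high nibble of the first byte, where an IP header keeps its version."""
    return {4: "ipv4", 6: "ipv6"}.get(data[0] >> 4, "other")

# A real IPv4 header starts with 0x45 (version 4, header length 5 words): the peek works.
ipv4_header_start = bytes([0x45, 0x00])
peek_on_ip = guess_by_first_nibble(ipv4_header_start)

# An ARP frame (EtherType 0x0806) whose destination MAC happens to begin with nibble 4.
arp_frame = bytes.fromhex("4c0000000001" "020000000002" "0806") + bytes(28)
by_type = classify_by_ethertype(arp_frame)   # correct: not IP at all
by_peek = guess_by_first_nibble(arp_frame)   # wrong: the MAC's first nibble looks like a version
```

The peek is only right when something else has already told you the bytes are IP, which is exactly the job the L2 type field does.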
Right, the variable-length thing was my point. That's fine when you're dealing with byte slices that you scan through incrementally. But it's not fine for packet fields and OS data structures whose length was fixed at 32 bits.
UTF-8 is convenient because ASCII has a spare bit, but UTF-8 is fundamentally possible because ASCII text is variable-length: a string can simply grow by more bytes. An IPv4 address is not variable-length.
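A small Python sketch of both halves of that, standard library only; the sample strings are my own. The spare high bit lets old ASCII bytes pass through unchanged, and the string simply gets longer when a character needs more bytes. A 32-bit address field has no equivalent way to grow.

```python
import struct

# ASCII text is already valid UTF-8, byte for byte.
same_bytes = "ping".encode("utf-8") == "ping".encode("ascii")

# A non-ASCII character takes extra bytes, each with the spare high bit set,
# so it can never be mistaken for an ASCII byte.
e_acute = "\u00e9".encode("utf-8")
high_bits_set = all(b & 0x80 for b in e_acute)

# The string just grows: 4 characters, 5 bytes.
cafe = "caf\u00e9".encode("utf-8")

# A fixed 32-bit field cannot grow: one past the maximum value does not fit.
try:
    struct.pack("!I", 2**32)
    fits_in_32_bits = True
except struct.error:
    fits_in_32_bits = False
```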