> Different human languages don't figure into this at all and are completely irrelevant.
Back to basics: A Turing machine is specified by a set of symbols it can read/write to a tape, and a state machine matching current state and read symbol to next state and actions.
If that set of symbols is just {1,0}, then it absolutely, positively, cannot print out the string "ABC".
> the best compression methods out all have a variable length (measured in bits) alphabet.
This is a category error....if the compression algorithm reads and writes binary data, its alphabet is "zero" and "one." The symbols read and written by a Turing machine are atomic--they are not composed of any simpler parts.
Sure, external to the Turing machine, you can adopt some conventions which map bit patterns to letters in a different alphabet. But a binary Turing machine cannot print out those letters--it can only print out a string of 1's and 0's.
The mapping from strings in the binary language to letters in the target language cannot participate in the calculations for the complexity of a string in another language. Because if it did, again, you could make the Kolmogorov complexity of any arbitrary string S you choose to be 0, because you could just say the null output maps to S.
This is a subtle problem, often glossed over or just missed entirely. We are so used to encoding things in binary that it might not occur to us unless we think about it deeply.
Just because a Turing machine prints out a 0 or a 1 at each step doesn't mean the sequences that factor into the calculation of what to print next can't be longer binary sequences.
Pretty much all the best compression methods are language-agnostic and work on bitwise sequences. They also pretty much all predict the next bit and feed that prediction into an arithmetic encoder.
E.g. look up dynamic Markov coding, which is commonly used by Hutter Prize winners. The paper is short and readable. The method dynamically builds a binary tree as binary sequences are seen, so if the pattern '01101000 01100101' comes in, it walks down the binary tree. It will probably predict the next bit as '0', because '0110100001100101' just so happens to be a common sequence in English whose next bit is likely '0'--but the dynamic Markov coding model has no idea of that. It just has binary sequences of bits and a prediction of the next bit given them.
Likewise, it can continue reading in the bits of the file and walking down its binary tree, where statistics for the next bit are stored in every node, and it sees '111001001011110110100000'. It predicts the next bit is likely '1' and feeds that into an arithmetic coder, which takes predictions and forms an optimally minimal bit sequence from them. That second binary sequence forms part of 你好 ("hello" in Chinese).
In both cases the Turing machine doesn't care about that. It is just writing 1's and 0's, as a Turing machine does. Eventually those 1's and 0's form sequences that happen to map to characters in various languages, but it doesn't care about that.
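To make the "bits only" point concrete, here is a minimal Python sketch of a context model in that spirit. It is much simplified from real dynamic Markov coding (which grows a state machine by cloning states); `predict_next_bit` and the 8-bit context length are illustrative assumptions, not the actual algorithm.

```python
from collections import defaultdict

def predict_next_bit(bits, context_len=8):
    """Count which bit followed each context_len-bit context seen so far,
    then predict the next bit from the final context."""
    counts = defaultdict(lambda: [0, 0])  # context -> [times next was 0, times next was 1]
    for i in range(context_len, len(bits)):
        counts[bits[i - context_len:i]][int(bits[i])] += 1
    zeros, ones = counts[bits[-context_len:]]
    total = zeros + ones
    return ones / total if total else 0.5  # P(next bit = 1)

# The model sees only bits; it has no idea the pattern is the ASCII for "he".
history = "0110100001100101" * 4               # bits of ASCII "he", repeated
p_one = predict_next_bit(history + "01101000")  # current context: the bits of "h"
# Every time "01101000" appeared before, the next bit was 0, so P(1) comes out 0.
```

Nothing in the model knows about ASCII or English; the '0' prediction falls out of the bit statistics alone.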
>The mapping from strings in the binary language to letters in the target language cannot participate in the calculations for the complexity of a string in another language. Because if it did, again, you could make the Kolmogorov complexity of any arbitrary string S you choose to be 0, because you could just say the null output maps to S.
One other thing to address here is that Kolmogorov complexity explicitly includes any dictionary you use in its calculation. A dictionary for the file you wish to compress would just blow out your Kolmogorov complexity to exactly that size. That's why Kolmogorov complexity is an excellent tool: you explicitly cannot cheat in this way.
> the turing machine doesn't care about that. It's also just writing 1's and 0's as per a turing machine.
But a Turing machine does not have to be restricted to just printing out zeros and ones. It can use any finite set of symbols. For example, the Soviets built a computer which used base 3--its symbol set was {-1, 0, 1}. It didn't have bits, which can only store "1" or "0"; it had trits, which can store "-1", "0", or "1".
And why the Soviets built such a computer is germane to Kolmogorov complexity: you can make shorter strings in base 3 than you can in base 2. The choice of symbol set absolutely impacts the length of strings and programs, and therefore impacts the Kolmogorov complexity of strings relative to the computer.
With this in mind, please consider three Turing Machines: A, B, and C
The symbols A prints out on its tape are {"A", "B", "C"}
The symbols B prints out on its tape are {"0", "1"}
The symbols C prints out on its tape are {"0", "1", "A", "B", "C"}
Now consider two strings: "1000001" and "A". Turing machine B could print out "1000001". Turing machine A could print out "A".
You might be tempted to equate "1000001" and "A". But consider the same two strings printed out by machine C:
"1000001" and "A"
Clearly, these are different strings. If C printed out "A", it did not print out "1000001", and vice versa.
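The point about the three machines can be stated as a set-membership check; a small Python sketch (the `can_print` helper is just for illustration):

```python
alphabet_A = {"A", "B", "C"}
alphabet_B = {"0", "1"}
alphabet_C = alphabet_A | alphabet_B

def can_print(alphabet, s):
    """A machine can print a string only if every symbol of the string
    is in the machine's symbol set."""
    return set(s) <= alphabet

assert can_print(alphabet_B, "1000001") and not can_print(alphabet_B, "A")
assert can_print(alphabet_A, "A") and not can_print(alphabet_A, "1000001")
assert can_print(alphabet_C, "A") and can_print(alphabet_C, "1000001")
assert "A" != "1000001"  # over machine C's alphabet, these are different strings
```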
> Kolmogorov complexity explicitly includes any dictionary you use in its calculation.
Sure, but on a binary Turing machine, like Machine B above, that dictionary is not going to be mapping between binary strings and, say, Roman letters. It's going to be mapping from one binary string to another binary string.
Using Machine C, you certainly could write a program which read "1000001" and output "A". But you absolutely, positively, cannot write such a program on either machine A or B. Machine A cannot print the string "1000001", and B cannot print the string "A".
All of those examples are exactly equivalent and convertible, but where you are confusing yourself is that you're thinking of the differing states of a given n-ary system as having explicit symbols. It's best to think of it as simply numeric: binary 010 is the number 2 if I write it in decimal; ternary 020 is the number 6 if I write it in decimal; etc.
Any mapping from the numbers of an n-ary system to actual symbols, like you are showing here, is arbitrary, and it is part of the calculation of space when measuring an algorithm's Kolmogorov complexity. The program size in Kolmogorov complexity includes the dictionary for the numeric-to-symbolic lookup, which is what you are getting at here.
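That numeric reading can be checked with Python's `int`, which parses a digit string in any base:

```python
# Digit strings in different bases are just names for numbers:
assert int("010", 2) == 2        # binary 010 is the number 2
assert int("020", 3) == 6        # ternary 020 is the number 6
assert int("1000001", 2) == 65   # machine B's string, read numerically...
assert ord("A") == 65            # ...is the ASCII code point of "A"
```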
> where you are confusing yourself is that you're thinking of the differing states of a given n-ary system as having explicit symbols. It's best to think of it as simply numeric.
But what is "simply numeric"? How do we explain numeric things, and how to compute with them? Uh, we use Turing machines.
We don't explain Turing machines using numeric concepts, we explain numeric concepts using Turing machines.
We explain what computation is, by describing how a Turing machine can manipulate symbols on its tape. But that's where the analysis stops. The symbols themselves are the simplest entities, there's nothing simpler than they are to explain them with.
Nobody who is objecting to me here has ever noticed that the symbols "0" and "1" are atomic, and not further explained. Because that's where they bottom out--explanations have to stop somewhere.
But you don't have to bottom out at {0,1}. Ternary computers use a different symbol set, {-1, 0, 1}. Saying that "A" can be equated with "1000001" is as silly as saying we could store any of the three values -1, 0, or 1 in a single bit of a binary computer.
There simply is no privileged symbol set for any machine. It's literally an arbitrary mapping. You can call the two states whatever you want. You can call one state the contents of file A and another state the contents of file B if you want. But that mapping has a size, and Kolmogorov complexity very directly says you need to include the size of that mapping. End of story.
The reason to think numerically is that there are n^k possible strings of length k on an n-ary machine. You can count them, and you can map each one of them to any other n-ary system.
Note that Turing in his paper talked about states: he "simply stated that the machine had certain m-configurations (enumerated) and all m-configurations obtainable". There's no binary machine that outputs 0 or 1 directly except by a mapping of its two states to symbols. Kolmogorov knew this and explicitly stated that any dictionary be included as part of the size measurement.
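A sketch of that counting-and-mapping view: digit strings in any base are enumerable, and converting between n-ary systems goes through the underlying number (`to_base` and `convert` are illustrative helpers, not anything from the thread):

```python
def to_base(num, base):
    """Write a nonnegative integer as a list of digits in the given base."""
    if num == 0:
        return [0]
    digits = []
    while num:
        num, d = divmod(num, base)
        digits.append(d)
    return digits[::-1]

def convert(digits, src_base, dst_base):
    """Map a digit string in one n-ary system to another by going
    through the underlying number."""
    num = 0
    for d in digits:
        num = num * src_base + d
    return to_base(num, dst_base)

# An n-ary machine has n**k distinct strings of length k -- you can count them:
assert 2 ** 8 == 256 and 3 ** 8 == 6561

# Binary 110 (the number 6) becomes ternary 20:
assert convert([1, 1, 0], 2, 3) == [2, 0]
```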
Let’s do a concrete example. B is a Turing machine which has 2 symbols, and uses base 2 to encode numbers. N is a Turing machine with n symbols and uses base n to encode numbers.
Suppose that machine N can print out string s with a program k symbols long.
Machine B cannot literally print out s, in general, because s can contain symbols which B can’t print.
If I understand you correctly, you are saying “so what?” We can just encode the n symbols of N in the obvious way, using only the two symbols of B, and then we just program B to emulate N. Right? Do I get your point?
If so, well, how much longer is a program for the emulated N than the program in N's native language? I calculate it would be longer by a factor of O(log(n)/log(2)).
So, as n grows, the length of the translated program also grows. We can make the complexity measured by B to be as much larger than the complexity measured by N as we wish, just by increasing n, the number of symbols used by machine N.
Now, if this were just a constant factor--if the discrepancy were just O(1) more--I would agree with you. Yeah, sure, we need a map, but that's just a constant bit more of information…oh wait, the size of the map is going to grow at O(log(n)) as well…
O(1) we can sweep under the rug. O(log(n)) we can’t, because this means there is a trade-off between the complexity of the Turing machine and the complexity it assigns to strings.
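The O(log n) growth can be sketched numerically, assuming the obvious fixed-width encoding where each of N's n symbols costs ceil(log2(n)) bits (`emulated_length` is a made-up helper for illustration):

```python
import math

def emulated_length(k, n):
    """Bits needed for a k-symbol base-n program under a fixed-width
    encoding: each of the n symbols costs ceil(log2(n)) bits."""
    return k * math.ceil(math.log2(n))

# The same 100-symbol program gets longer and longer when translated
# for a binary machine, as the emulated machine's alphabet grows:
assert emulated_length(100, 2) == 100
assert emulated_length(100, 4) == 200
assert emulated_length(100, 256) == 800
```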
A more complex Turing machine will systematically assign less complexity than a simple one does.
I mean in retrospect, this is obvious: I can reduce the complexity of any number to 1, just by using that number as one of the symbols in my machine.
Pi has infinite digits and can't be printed out? No problem: just include "pi" as one of your symbols and you can print it in one character.
This is easy to miss, because very quickly most presentations of Kolmogorov complexity settle in on a binary language. Then they prove, say, an invariance theorem, saying, e.g., that any two Turing machines will assign the same complexity up to O(1)…
But that is if you hold n constant. If you can vary n, then a tacit presupposition of those theorems is violated. Those theorems no longer hold.
If you have a black and white monitor, you can’t display a green pixel. Sure, you can use dithering—assign different bit patterns to each color. But then you need a bigger monitor, because the dithered pixel patterns are more than 1 bit long.
Similarly, a binary machine can only print two symbols. It can’t print 3 symbols, any more than a black and white pixel can display a green pixel.
And even if we relax the requirement, and say it doesn’t have to be literally a green pixel, or literally a third symbol (“literally” is a most appropriate word here) we find we need more pixels, or longer programs. And the more colors we want to dither, the more pixels we need.
I hope this clarifies; I’d be happy to answer any questions.
You can trivially convert any such Turing machine to another simply by adding n new states to it, which map each letter of the previous alphabet to the new one's.
With all due respect, you are misunderstanding something, or bringing up something with no relevance to complexity.
chuckle Ok, give me the benefit of the doubt for 10 more minutes, and follow my argument like a leopard stalking its prey...
Sure, you could use a binary Turing machine and have it simulate, say, a ternary or a decimal Turing machine. You could do this just by mapping bit patterns to the symbols the other Turing machine has. E.g. "3" could be mapped to "00000011".
But here's one thing nobody who has objected to me so far has noticed: why don't you feel it's necessary to explain how the symbols "0" and "1" on a binary machine work---say, by using another map to another Turing machine? Or by any other means? Why do people just stop there?
Take the simplest possible question we could ask about "0" and "1": what explains why the symbol "0" is different from the symbol "1"?
Cf. the question: why is "3" different from "4"? I mean, we could explain why "3" isn't the same symbol as "4", because a binary Turing machine would map "3" and "4" to the binary strings "011" and "100", and these are different binary strings, because "0" is different from "1".
But what explains why "0" is different from "1"? Absolutely nothing explains it!!
On pain of infinite regress, explanations have to stop somewhere, and for Turing machines, this is where it stops. A set of symbols is specified, but those symbols are atomic, not decomposable into any simpler or more basic symbols, and not further explainable.
Everything else is explained in terms of these symbols and how they are manipulated by the Turing machine. There is nothing more simple, or more basic, than these symbols to explain these symbols.
Try thinking about it going the other way: Suppose you were trying to explain how the symbols "0" and "1" behaved on a binary turing machine---by emulating it with a decimal turing machine!! You certainly could say something like "Well, the binary turing machine knows that "0" is different from "1", because we have this dictionary which maps "0" and "1" on the decimal turing machine to "0" and "1" on the binary turing machine. And since the decimal turing machine knows that "0" is different from "1", well, that's how the binary turing machine knows it....
See what I'm getting at? Sooner or later you get to the axioms. The unexplained explainers. The atoms out of which everything else is constructed. I know you are wishing I'd get to the point, but please stick with me :-)
If a turing machine does not have a symbol in its symbol set, it cannot print that symbol out on the tape, any more than a binary computer can store "-1" in a single bit. Just like bits and trits are the smallest, most elementary storage units for binary and ternary computers, the cell on the turing machine tape is the simplest, most elementary storage unit, and the only things it can store are what are specified in the symbol set.
Now, just as numbers written in decimal are systematically shorter than numbers written in binary, Turing machines which use 10 symbols have systematically shorter programs than Turing machines which use 2 symbols. This has obvious ramifications for Kolmogorov complexity.
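The "systematically shorter" claim is easy to check: count the digits of the same number written in the two bases.

```python
n = 10 ** 20
assert len(str(n)) == 21          # 21 decimal digits...
assert len(format(n, "b")) == 67  # ...but 67 binary digits
# The length ratio tends to log2(10), about 3.32, as numbers grow.
```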
When applying Kolmogorov complexity, we typically do limit ourselves to an alphabet of {0,1}. But it is important to realize that this is a tacit assumption in most of the theorems. For example, you can prove that for any two Turing machines U and V, K_U(s) <= K_V(s) + O(1). But this theorem relies on the fact that if U and V are universal Turing machines, there is an O(1)-length program for V which emulates U, and vice versa--**and this is only true if U and V share the same symbol set**. K_U(s) and K_V(s) can be made arbitrarily different from each other if U and V don't have the same symbol set.
And it's not even just a theoretical, esoteric point: Knuth has argued for ternary computers, just because they can systematically express things using shorter strings of trits than a binary computer can with strings of bits.
There is more than one "knob we can turn" when it comes to Turing machines: one "knob" is which program we feed it, another knob is the number of symbols it uses. By making one bigger, we can make the other one smaller. Program size isn't the only thing that matters (as my wife likes to reassure me...)
When you write a program in any modern programming language to print "ABC", that program merely outputs the 3 bytes 01000001 01000010 01000011: ASCII codes which your console window (not the program you wrote) decides to display graphically as those characters.
So any machine outputting bits can be said to print out "ABC" just as well.
Furthermore, your comment above was transmitted by your browser to Hacker News not as English characters but as those very bits, and everyone reading your comment was receiving those bits. The way that the browser decides to display those bits does not change their character.
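Those byte values can be checked directly; a quick sketch in Python:

```python
data = "ABC".encode("ascii")
assert data == b"\x41\x42\x43"  # the byte values 65, 66, 67
bits = " ".join(format(byte, "08b") for byte in data)
assert bits == "01000001 01000010 01000011"
# What the program emits is bits; the console's rendering does the rest.
```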
> When you write a program in any modern programming language
Think about it this way. Say you have 1 bit of memory. Can you store any of the three numbers -1, 0, 1 with that bit? No. The only things a bit can store are 0 and 1.
There have been ternary computers built. They don't use bits, they use trits. A trit can store -1, 0, or 1.
A bit cannot. Saying that a turing machine, whose symbol set is {0,1} can print "A" on its tape, is like saying you can store "-1" >>>in a single bit<<<< on a binary computer.
Nevertheless, it is a real, genuine problem.