It works the same way:
softmax is essentially just applying the normalization to the vector exp(x).
From an "engineering" POV this effectively ensures that the vector you normalize has strictly positive entries, so the result ends up being a proper distribution.
From a theory POV you get softmax like distributions (Gibbs distributions) by trying to balance following some energy E(x) and the entropy of the distribution.
In essence the softmax is the answer to "I try to follow the maximum of a function E(x) but I need to maintain some level of uncertainy".
The balancing coefficient between entropy and picking the maximum of the function is called "temperature" (following the behavior of particles in a physical system: The colder the system, the lower the chance of having particles randomly walk away from the minimal energy state).
specifically, the temperature is
softmax(x/temp)
if you draw temp->0, your softmax slowly becomes an argmax (with temp=0 being a literal argmax). If you increase the temperature, you are closer to the "random fluctuations" leaving more room for sampling x values that are not the maximum of x. (this is why e.g. LLMs become deterministic as you decrease temp->0)
Using a different base other than e implicitly changes the temperature:
N^x = exp(ln(N) x)
The normalization works the same since you are still dividing a positive value N^x by the sum of all alternatives sum(N^x_i), which is a normalization by design
Of course you can: you just have to define it in your type.
The output set becomes a union type of the normal output and whatever you want as an exception.
If you write this as a monad, your get very similar syntax to procedural code.
An exception is different to an Either result type. Exceptions short circuit execution and walk up the call tree to the nearest handler. They also have very different optimization in practice (eg in C++)
I'm what way is that different?
You return early and the call Cascades up the call chain until you handle it (otherwise it's always an "either" results)
In practice you use something like an exception monad, which makes this a lot more ergonomic since you don't need to carry a case distinction around for every unwrap: an exception monad essentially has an implicit passthrough that says "if it's a value, apply the function, if it's an exception just keep that".
You only need to "catch" the exception if you actually need the value.
I'm this case the exception monad is not that different from annotating a function with "throws": your calling function either needs it's own throws (=error monad wrapper) in which case exceptions just roll through, or you remove the throws, but now need to handle the exception explicitly (=unwrap the monad).
Germany already repatriated about half of its gold reserves between 2013 and 2017 from paris and new york to frankfurt.
There has been a recent (as in "18th of march" recent) petition to the Bundestag to repatriate the gold.
The reason not to repatriate the remaining gold back then is because Germany has substantial trade with the US, which is why Germany held gold in new york to begin with: It's the easiest way to resolve USD-Euro currency exchange at the central bank level (this is also why germany got rid of the paris gold reserves: with the euro you don't need currency exchange anymore).
Also, as you mentioned, the idea of "officially" repatriating gold with the current administration is quite dicey. It is very possible that the correct way of resolving this is to just stop buying gold in new york and let the currency exchange flux deal with the slow unwinding of the reserves without explicit repatriation.
Effectively none. The US has a huge trade deficit with Germany/Europe so there is practically never a case where the US receives gold from Germany: It's always more then offset by the deficit.
The equivalent for the US would be the consumption goods that are already flowing into the US. I.e. US gets goods but doesn't sell enough to Germany, so the difference to maintain the total exchange rate is the Gold.
That's also why it was trivial for france to repatriate its gold compared to germany: Germany holds about 10x the amount of gold in the US compared to France (France was ~120 tons, Germany is roughly 1200 tons: France earned its gold through different trade).
That's also why it is such a complex thing to repatriate German reserves: France took almost 1 year to repatriate its gold. For Germany, the efforts would be decade spanning (though maybe with recent changes there is a little more urgency).
Whether the US is capable of hiding their maleficence or not should not be an indicator of whether it is safe to deal with them. If your indicator for the US being a good partner in _anything_ is that "well we did corrupt things in the past, but people didn't use to care about it", then the US is still not a good partner.
It's not like the US has never e.g. openly threatened NATO allies with war: There is quite literally a standing law that allows the US president to invade the netherlands if any US military personnel is ever detained by the International Criminal Court.
This law has been on the books for over 20 years and has the publically announced intention to prevent the US from being prosecuted for all the other atrocities committed in e.g. Iraq. This bill was supported by both democrats and republicans.
The reality is that the US' stance towards the rest of the world has not changed with the recent administrations (nor would I expect it to: Trump does not happen in a vacuum). What did change was willingness of the rest of the world to act on the US' actions.
There were plenty of models the size of gpt3 in industry.
The core insight necessary for chatgpt was not scaling (that was already widely accepted): the insight was that instead of finetuning for each individual task, you can finetune once for the meta-task of instruction following, which brings a problem specification directly into the data stream.
Assuming this is real: Why do you think anthropic was put on what is essentially an "enemy of the state" list and openai didn't?
The two things anthropic refused to do is mass surveillance and autonomous weapons, so why do _you_ think openai refused and still did not get placed on the exact same list.
It's fine to say "I'm not going to resign. I didn't even sign that letter", but thinking that openai can get away with not developing autonomous weapons or mass surveillance is naive at the very best.
It might be that they pay less for anthropic depending how many tokens are generated by each model: total cost is token cost times number of tokens. I haven't checked gpt5, but it is not impossible that price wise they might be very comparable if you account for reasoning tokens used.
This is essentially the principle behind algebraic effects (which, in practice, do get implemented as delimited continuations):
When you have an impure effect (e.g. check a database, generate a random number, write to a file, nondeterministic choices,...), instead of directly implementing the impure action, you instead have a symbol e.g "read", "generate number", ...
When executing the function, you also provide a context of "interpreters" that map the symbol to whatever action you want.
This is very useful, since the actual business logic can be analyzed in an isolated way.
For instance, if you want to test your application you can use a dummy interpreter for "check database" that returns whatever values you need for testing, but without needing to go to an actual SQL database.
It also allows you to switch backends rather easily: If your database uses the symbols "read", "write", "delete" then you just need to implement those calls in your backend. If you want to formally prove properties of your code, you can also do that by noting the properties of your symbols, e.g. `∀ key. read (delete key) = None`.
Since you always capture the symbol using an interpreter, you can also do fancy things like dynamically overriding the interpreter:
To implement a seeded random number generator, you can have an interpreter that always overrides itself using the new seed. The interpreter would look something like this
rnd, new_seed <- generate_pseudorandom(seed, argument)
with Pseudorandom_interpreter(new_seed):
continuation(rnd)
```
You can clearly see the continuation passing style and the power of self-overriding your own interpreter.
In fact, this is a nice way of handeling state in a pure way: Just put something other than new_seed into the new interpreter.
If you want to debug a state machine, you can use an interpreter like this
with replace_state_interpreter(new_state ++ state):
continuation(head state)
```
To trace the state.
This way the "state" always holds the entire history of state changes, which can be very nice for debugging.
During deployment, you can then replace use a different interpreter
That's really interesting. This seems like it would be a really good approach to combine something like an otherwise pure finite state machine, but with state transitions that rely on communicating with external systems.
Normally I emit tokens to a stack which are consumed by an interpreter but then it's a bit awkward to feed the results back into the FSM, it feels like decoupling just for the sake of decoupling even though the systems need to be maintained in parallel.
Once you have strong normalization you can just check local confluence and use Newman's lemma to get strong confluence. That should be pretty easy: just build all n^2 pairs and run them to termination (which you have proven before). If those pairs are confluent, so is the full rewriting scheme.
You still have no scalping, but you recover the ability to back out due to unforeseen events
reply