Hacker News | rassibassi's comments

CFG (context-free grammar), which is explained in the book, is used here together with LLMs: https://tree-diffusion.github.io


In this context, RAG isn't what's being discussed. Instead, the reference is to a process similar to Monte Carlo tree search, such as the one used in the AlphaGo algorithm.

Presently, a large language model (LLM) uses the same amount of compute for both simple and complex problems, which is seen as a drawback. Imagine if an LLM could adjust its computational effort based on the complexity of the task. During inference, it might then perform a kind of search across the solution space. The "search" mentioned in the article means just that: a method of dynamically allocating compute at test time, allowing exploration of the solution space before beginning to "predict the next token."
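A toy sketch of the idea (function names here are hypothetical stand-ins, not a real LLM API): sample several candidate answers, score each with a verifier/value model, and keep the best. Spending more compute (larger n) on harder prompts is the simplest form of test-time search; MCTS extends this to a tree over partial continuations.

```python
import random

random.seed(0)

# Toy stand-ins: generate() proposes n candidate continuations,
# score() plays the role of a value/verifier model rating each one.
def generate(prompt, n):
    return [f"{prompt} candidate-{i}" for i in range(n)]

def score(candidate):
    return random.random()

def best_of_n(prompt, n=8):
    # Allocate more compute (larger n) before committing to an answer.
    candidates = generate(prompt, n)
    return max(candidates, key=score)

answer = best_of_n("What is 2+2?", n=8)
print(answer)
```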

At OpenAI, Noam Brown is working on this, giving AI the ability to "ponder" (or "search"); see his Twitter post: https://x.com/polynoamial/status/1676971503261454340


What's the difference from using joblib's Memory class, similar to this implementation:

https://github.com/stanfordnlp/dspy/blob/main/dsp/modules/ca...


I was going to mention this as well. It's fairly similar:

  memory = joblib.Memory("cachedir")  # example cache directory
  
  @memory.cache
  def slow_func(x):
      ...


The diskcache docs state:

""" Caching Libraries

    joblib.Memory provides caching functions and works by explicitly saving the inputs and outputs to files. It is designed to work with non-hashable and potentially large input and output data types such as numpy arrays.
""" From https://pypi.org/project/diskcache/


This is great! I see it also supports an 'ignore' parameter.


If I understand you correctly, you basically mean some interpolation of knowledge within Z that was not known to humans before but can be synthesized by recombining information within Z.

Yet that still means knowledge of O cannot be synthesized.


The upper bound is still the Shannon limit. The experiment does a lot of multiplexing: spatial multiplexing over a multi-core fiber, spectral multiplexing across many wavelength channels, and dual polarization.

Each of the multiplexed channels is individually limited by the Shannon limit, and at higher power the fiber's Kerr effect introduces interference, which creates a sweet spot for the optimal optical launch power.

The novelty here is that the spectral channels are all generated from a single laser source rather than one laser per channel.
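As a back-of-the-envelope illustration (all numbers below are assumed, not taken from the paper), the per-channel Shannon limit C = B * log2(1 + SNR) multiplied by the spatial, spectral, and polarization factors gives the aggregate capacity:

```python
import math

# Illustrative numbers only (assumed, not from the paper):
bandwidth_hz = 50e9    # per-channel bandwidth
snr_db = 20.0          # per-channel signal-to-noise ratio
n_cores = 37           # spatial: multi-core fiber
n_channels = 223       # spectral: wavelength channels
n_pol = 2              # dual polarization

snr = 10 ** (snr_db / 10)
# Shannon limit per channel: C = B * log2(1 + SNR)
per_channel_bps = bandwidth_hz * math.log2(1 + snr)
total_bps = per_channel_bps * n_cores * n_channels * n_pol

print(f"per channel: {per_channel_bps / 1e9:.0f} Gbit/s")
print(f"aggregate:   {total_bps / 1e15:.2f} Pbit/s")
```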



Not practical yet; the novelty is the frequency comb, which provides 200+ channels across wavelength with only a single laser, where before one required 200 lasers.

In an experiment like this, only the initial light source is modulated and therefore all channels carry the same data. The equipment for the transmitter and receiver chain is so expensive that university labs can barely afford one of each.


Almost correct. You typically need 2-4 transmitters to emulate the system. So you modulate one or two channels under test, modulate the rest of the band with a single modulator, and use some decorrelation tricks to be realistic. Then you scan your channels under test through the whole band. This typically gives a lower bound on performance, i.e. a real system would likely perform better. As you said, using individual transmitters is economically unfeasible even for the best-equipped industry labs.


Does that mean "We experimentally demonstrate transmission of 1.84 Pbit s–1" in the paper abstract is a lie?


I worked on this project and cycomanic summarizes the practice well. I’ve written more on it here: https://news.ycombinator.com/item?id=33321506


Well, the technology is just as impressive either way, but I think "we experimentally demonstrate transmission of 1.84 Pbit s–1" is misleading. The capacity was demonstrated piecewise but that data rate was not demonstrated.


They also multiplex across 200+ wavelength channels (wavelength division multiplexing).

Not sure what the baud rate of a single channel was in their experiment, but probably between 32 and 80 GBd, which is common for lab equipment at universities. The industry is knocking on 100-400 Gbit/s, where for the actual decoding and signal processing massive parallelism is applied to reduce the rate even more.
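As a rough sanity check (the channel split and modulation format below are assumptions for illustration, not figures from the paper), dividing the headline rate by the multiplexing factors gives a plausible per-channel rate and symbol rate:

```python
total_bps = 1.84e15                       # headline rate from the paper
n_cores, n_channels, n_pol = 37, 223, 2   # assumed multiplexing split

per_channel_bps = total_bps / (n_cores * n_channels * n_pol)
bits_per_symbol = 4                       # e.g. 16-QAM per polarization (assumed)
symbol_rate_bd = per_channel_bps / bits_per_symbol

print(f"{per_channel_bps / 1e9:.1f} Gbit/s per channel")
print(f"~{symbol_rate_bd / 1e9:.1f} GBd symbol rate")
```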


Yes, and another reason for the small model size, and the novelty of the underlying paper [1], is that the diffusion model does not act on pixel space but rather on a latent space. This means that this "latent diffusion model" not only learns the task at hand (image synthesis) but, in parallel, also a powerful lossy compression model via an outer autoencoder structure. The number of weights (model size) can then be reduced drastically because the inner neural network layers act on a low-dimensional latent space rather than the high-dimensional pixel space. It's fascinating because it shows that deep learning at its core comes down to compression/decompression (encoding/decoding), with a close relation to Shannon's information theory (e.g. source coding/channel coding/data processing inequality).

[1] https://arxiv.org/abs/2112.10752
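A minimal numpy sketch of that structure (linear maps stand in for the trained convolutional autoencoder; all sizes are illustrative): the denoising happens in the small latent space, so the diffusion network can be far smaller than one operating on pixels.

```python
import numpy as np

rng = np.random.default_rng(0)

pixel_dim = 64 * 64 * 3     # a 64x64 RGB image flattened: 12288 dims
latent_dim = 4 * 8 * 8      # a 4-channel 8x8 latent: 256 dims

# Stand-in linear encoder/decoder; a real latent diffusion model uses a
# pre-trained convolutional autoencoder here.
W_enc = rng.normal(size=(latent_dim, pixel_dim)) / np.sqrt(pixel_dim)
W_dec = rng.normal(size=(pixel_dim, latent_dim)) / np.sqrt(latent_dim)

x = rng.normal(size=pixel_dim)    # "image" in pixel space
z = W_enc @ x                     # encode: compress into latent space
# ... the diffusion model adds/removes noise here, in latent space ...
z = z + 0.1 * rng.normal(size=latent_dim)
x_hat = W_dec @ z                 # decode: back to pixel space

print(f"compression: {pixel_dim // latent_dim}x fewer dimensions")
```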


Oh, wow. Now that you mention how it's similar to lossy (if not the same as) compression it all makes a LOT of sense. This is great. I teach IT and I already do a bit on how lossy compression works, (e.g. hey, if you see a blue pixel and then another slightly darker one next to it, what's the NEXT likely to be?) and this is something of an extension of that.


Correction: the autoencoder is pre-trained :)


Don't be so hard on yourself, and don't compare yourself to a computer science prodigy. You can teach yourself to become a decent web developer in a year and take it from there; your life is far from wasted in any sense.


Nice, I saw you also have a section on fonts! I think the following could make you smile, too. Check out this font that changes with the user's facial expression (shameless plug):

Description: https://danishdesignaward.com/en/arkiver/nominee/adam-lenzin...

Demo: https://facetype3000.herokuapp.com


Haha, nice demo! It needs a smoothing function though, it wobbles too much. Also, I use an external cam; it would be nice to be able to pick the cam source.

