#neuralnetworks

waynerad@diasp.org

AI can't cross this line on a graph and we don't know why.

The vertical axis shows the "error" the neural net is trying to minimize during training (also called the "loss").

On the horizontal axis, it has the amount of computing power thrown at the training process.

When switched to a log-log graph -- logarithmic on both axes -- a straight line emerges.

This is actually one of three observed neural network scaling laws. The other two relate the loss to model size and dataset size, and show the same straight-line pattern.
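
To make the "straight line on log-log axes" idea concrete, here's a little sketch (my own made-up numbers, not the data from the video): a power law loss = a * compute^(-b) becomes a straight line once both axes are logarithmic, because log(loss) = log(a) - b * log(compute).

    # Sketch: a made-up compute scaling law, loss = a * C^(-b).
    # Constants and compute range are illustrative, not fitted to any real model.
    import numpy as np
    import matplotlib.pyplot as plt

    a, b = 2.5, 0.05                    # hypothetical constants
    compute = np.logspace(15, 24, 50)   # training FLOPs (made-up range)
    loss = a * compute ** (-b)

    plt.loglog(compute, loss)           # log-log axes turn the power law into a line
    plt.xlabel("training compute (FLOPs)")
    plt.ylabel("loss")
    plt.show()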

Have we discovered some fundamental law of nature, like the ideal gas law in chemistry, or is this an artifact of the particular methods we are using now to train neural networks?

You might think someone knows, but no one knows.

That didn't stop this YouTuber from making some good animations of the graphs and various concepts in neural network training, such as cross-entropy. It introduces the interesting concept that language may have a certain inherent entropy.
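
Roughly what cross-entropy means here (my sketch, not from the video): if a model assigns probability p to the token that actually comes next, its loss on that token is -log p, and averaged over lots of text that's the number the scaling-law plots track. The entropy of the true next-token distribution is the floor no model can get below, which is what "language may have a certain inherent entropy" is getting at.

    # Sketch: cross-entropy vs. entropy for a toy next-token distribution.
    import math

    p_true = [0.7, 0.2, 0.05, 0.05]    # pretend "true" distribution over 4 tokens
    p_model = [0.5, 0.3, 0.1, 0.1]     # a model's predicted distribution

    entropy = -sum(p * math.log2(p) for p in p_true)
    cross_entropy = -sum(pt * math.log2(pm) for pt, pm in zip(p_true, p_model))

    print(f"entropy (the floor):  {entropy:.3f} bits")
    print(f"model cross-entropy:  {cross_entropy:.3f} bits")   # always >= entropy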

The best theory as to why the scaling laws hold tries to explain them in terms of neural networks learning high-dimensional data manifolds.

AI can't cross this line and we don't know why. - Welch Labs

#solidstatelife #ai #llms #genai #deeplearning #neuralnetworks #scalinglaws

waynerad@diasp.org

The full text of Simone Scardapane's book Alice's Adventures in a Differentiable Wonderland is available online for free. It's not available in print because it's still being written; what's online is actually a draft. But it looks like Volume 1 is pretty much done. It's about 260 pages. It introduces mathematical fundamentals and then explains automatic differentiation. From there it applies the concept to convolutional layers, graph layers, and transformer models. A Volume 2 is planned covering fine-tuning, density estimation, generative modeling, mixture-of-experts, early exits, self-supervised learning, debugging, and other topics.

"Looking at modern neural networks, their essential characteristic is being composed by differentiable blocks: for this reason, in this book I prefer the term differentiable models when feasible. Viewing neural networks as differentiable models leads directly to the wider topic of differentiable programming, an emerging discipline that blends computer science and optimization to study differentiable computer programs more broadly."

"As we travel through this land of differentiable models, we are also traveling through history: the basic concepts of numerical optimization of linear models by gradient descent (covered in Chapter 4) were known since at least the XIX century; so-called 'fully-connected networks' in the form we use later on can be dated back to the 1980s; convolutional models were known and used already at the end of the 90s. However, it took many decades to have sufficient data and power to realize how well they can perform given enough data and enough parameters."

"Gather round, friends: it's time for our beloved Alice's adventures in a differentiable wonderland!"

Alice's Adventures in a Differentiable Wonderland

#solidstatelife #aieducation #differentiation #neuralnetworks

analysisparalysis@pod.beautifulmathuncensored.de

Open Source LLMs and their Disappointing Production Performance
More or less open source, LLMs disappoint in production. They are huge models; an average one has 13 billion parameters, that's 13,000,000,000!

Yet they try to specialize in everything and end up good at nothing.

On the one hand, there are people on YouTube with fake enthusiasm talking about the new open-source LLM (insert random name), which now has more parameters.

Yet when it comes to putting them into practice -- my benchmark is chatting with a PDF -- they fall flat: they hallucinate, give answers not based on the document, et cetera.
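
For context, "chatting with a PDF" usually means retrieval-augmented generation: extract the document text, pull out the chunks most relevant to the question, and hand them to the LLM as context. Here's a minimal sketch of the retrieval half (my assumptions: pypdf and scikit-learn installed, a hypothetical report.pdf, and the actual LLM call left out, since that's the part being judged):

    # Sketch of the retrieval step in a "chat with a PDF" setup.
    # Assumes pypdf and scikit-learn are installed; the LLM call itself is omitted.
    from pypdf import PdfReader
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    reader = PdfReader("report.pdf")    # hypothetical document
    chunks = [page.extract_text() or "" for page in reader.pages]

    question = "What were the main findings?"
    vec = TfidfVectorizer().fit(chunks + [question])
    scores = cosine_similarity(vec.transform([question]), vec.transform(chunks))[0]
    top = sorted(range(len(chunks)), key=lambda i: scores[i], reverse=True)[:3]

    prompt = ("Answer ONLY from the context below.\n\n"
              + "\n\n".join(chunks[i] for i in top)
              + f"\n\nQuestion: {question}")
    # prompt would now go to the open-source LLM under test; "hallucinating"
    # means its answer isn't grounded in these retrieved chunks.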

They are good at nothing but know a little bit about everything -- just never enough to be useful.

And nobody seems to bat an eye, anywhere.

This is dead wrong.

All approaches to optimizing the results fall flat; instead, there is always a new model that is supposed to cure all ailments and does nothing well.

Is there anything built with Open Source LLMs that is not experimental?

Who in their right mind is excited about something like this?

More or less open source, but those models don't shine next to ChatGPT - at all. #OpenSource #LLMs #ArtificialIntelligence #ProductionPerformance #NeuralNetworks #Optimization #ChatGPT

psych@diasp.org

Speaking of Artificial Intelligence (AI) - and human intelligence too...

10 differences between artificial intelligence and human intelligence

In this video I will explain what the main differences are between the current approaches to artificial intelligence and human intelligence.

For this I first explain how neural networks work and in what sense they mimic the human brain.
I then go through the ten most relevant differences, which are: form and function, size, connectivity, power consumption, architecture, activation potential, speed, learning technique, structure, and precision.
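
For readers who haven't seen the inside of one: "how neural networks work" boils down to layers of weighted sums pushed through a nonlinearity (the "activation" mentioned in the list). A bare-bones forward pass, with toy numbers of my own rather than anything from the video:

    # Sketch: one forward pass through a tiny two-layer network.
    import numpy as np

    rng = np.random.default_rng(1)
    x = np.array([0.2, -0.5, 0.1])      # 3 input features

    W1 = rng.normal(size=(4, 3))        # layer 1: 3 inputs -> 4 hidden units
    b1 = np.zeros(4)
    W2 = rng.normal(size=(1, 4))        # layer 2: 4 hidden units -> 1 output
    b2 = np.zeros(1)

    h = np.maximum(0, W1 @ x + b1)      # weighted sums, then ReLU activation
    y = W2 @ h + b2
    print(y)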

Finally I express my opinion that the benefit of research in artificial intelligence is not the reproduction of human-like intelligence, but instead the production of types of intelligence unlike our own that complement our abilities.

#AI #ArtificialIntelligence #intelligence #Hossenfelder #thinking #neuralnetworks #neuropsychology
#cyberpsychology [http://www.cyberpsychology.com]