#rl

claralistensprechen3rd@friendica.myportal.social

The answer is radio. Radio knows no borders.
#RFE #RL #Shiokaze #RadioFarda #ClandestineRadio #RadioSohl etc


Alexander Karn - 2024-11-26 18:05:32 GMT

“We are living in a world in which there is a war on truth. And what I think is coming is a war on journalists. I want to talk about America. Because what happens in America is going to affect us all.”

And teachers.
Scientists.
Librarians.
Activists.
Lawyers.

There are dark clouds gathering.

As I’ve said before, fascism is incremental. Like filling a vessel, or a rain-swollen river that rises and overspills its banks.

timesofmalta.com/article/there…

waynerad@diasp.org

AlphaProof is a new reinforcement-learning-based system for formal math reasoning from DeepMind. AlphaProof + AlphaGeometry 2, an improved version of DeepMind's geometry system, solved 4 out of 6 problems from this year's International Mathematical Olympiad (IMO), achieving the same level as a silver medalist.

"AlphaProof solved two algebra problems and one number theory problem by determining the answer and proving it was correct. This included the hardest problem in the competition, solved by only five contestants at this year's IMO. AlphaGeometry 2 proved the geometry problem, while the two combinatorics problems remained unsolved."

"AlphaProof is a system that trains itself to prove mathematical statements in the formal language Lean. It couples a pre-trained language model with the AlphaZero reinforcement learning algorithm, which previously taught itself how to master the games of chess, shogi and Go."

"Formal languages offer the critical advantage that proofs involving mathematical reasoning can be formally verified for correctness."

"When presented with a problem, AlphaProof generates solution candidates and then proves or disproves them by searching over possible proof steps in Lean. Each proof that was found and verified is used to reinforce AlphaProof's language model, enhancing its ability to solve subsequent, more challenging problems."

"We trained AlphaProof for the IMO by proving or disproving millions of problems, covering a wide range of difficulties and mathematical topic areas over a period of weeks leading up to the competition. The training loop was also applied during the contest, reinforcing proofs of self-generated variations of the contest problems until a full solution could be found."

The blog post seems to have revealed few details of how AlphaProof works. But it sounds like we're about to enter a new era of math proofs, where all kinds of theorems will be discovered and proved.
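Reading between the lines, the training loop might look something like the sketch below. To be clear, everything in it (ToyProver, lean_verify, the pretend 10% success rate) is my invention to illustrate the generate-verify-reinforce cycle the blog post describes, not DeepMind's actual implementation.

```python
import random

def lean_verify(problem: str, candidate: str) -> bool:
    """Placeholder for running Lean's kernel on a candidate proof.
    In the real system, formal verification is the ground-truth signal."""
    return random.random() < 0.1  # pretend ~10% of candidates check out

class ToyProver:
    """Stand-in for the pretrained language model plus AlphaZero-style search."""

    def __init__(self):
        self.verified_proofs = []  # accumulates (problem, proof) training pairs

    def propose(self, problem: str, n: int = 8) -> list:
        # Real system: search over possible Lean proof steps, guided by the model.
        return [f"candidate {i} for {problem}" for i in range(n)]

    def reinforce(self, problem: str, proof: str) -> None:
        # Real system: a gradient update on the verified proof.
        self.verified_proofs.append((problem, proof))

def training_loop(prover: ToyProver, problems: list, rounds: int) -> None:
    for _ in range(rounds):
        for problem in problems:
            for candidate in prover.propose(problem):
                if lean_verify(problem, candidate):
                    # Each found-and-verified proof reinforces the model,
                    # improving it on subsequent, harder problems.
                    prover.reinforce(problem, candidate)
                    break

training_loop(ToyProver(), ["problem 1", "problem 2"], rounds=3)
```

The key design point is the one the post calls out: Lean verification replaces human labels as the reward signal, which is what lets the system generate its own training data.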

AI achieves silver-medal standard solving International Mathematical Olympiad problems

#solidstatelife #ai #genai #llms #reinforcementlearning #rl #mathematics #proofs

waynerad@diasp.org

Richard Sutton interviewed by Edan Meyer. Rich Sutton literally half-wrote the book on reinforcement learning -- my textbook, Reinforcement Learning: An Introduction, was written by him and Andrew Barto. I've never seen him (or Andrew Barto) on video before, so this was interesting to see. (Full disclosure: I only read about half of the book, and I 'cheated' and didn't do all the exercises.)

The thing that I thought was most interesting was his disagreement with the self-supervised learning approach. For those of you not up on the terminology, "self-supervised" is a term that means you take any data, and you mask out some piece of it, and try to train your neural network to "predict" the part that's masked out from the part that isn't masked. The easiest way to do this is to just unmask all the "past" data and mask all the "future" data and ask the neural network to predict the "next word" or "next video frame" or "next" whatever. It's called "self-supervised" because neural network training started with paired inputs and outputs where the "outputs" that the neural network was to learn were written by humans, and this came to be called "supervised" learning. "Unsupervised" learning came to refer to throwing mountains of data at an algorithm and asking it to find whatever patterns are in there. So to describe this alternate mode where it's like "supervised" learning but the "correct answers" are created just by masking out input data, the term "self-supervised" was coined.
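
To make that concrete, here's a tiny toy version of the "unmask the past, mask the future" recipe: every prefix of the data becomes an input, and the very next token becomes its label, with no human labeling anywhere.

```python
tokens = ["the", "cat", "sat", "on", "the", "mat"]

# Every prefix becomes an input and the very next token becomes its label.
# The "supervision" comes entirely from the data itself -- hence the name.
pairs = [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

for context, target in pairs:
    print(f"input: {context} -> predict: {target!r}")
# input: ['the'] -> predict: 'cat'
# input: ['the', 'cat'] -> predict: 'sat'
# ...
```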

I thought "self-supervised" learning was a very important breakthrough. It's what led directly to ChatGPT and all the other chatbots we know and love (we do love them right?). But Rich Sutton is kind of a downer when it comes to self-suprevised learning.

"Outside of reinforcement learning is lots of guys trying to predict the next observation, or the next video frame. Their fixation on that problem is what I mean by they've done very little, because the thing you want to predict about the world is not the next frame. You want to predict consequential things. Things that matter. Things that you can influence. And things that are happening multiple steps in the future."

"The problem is that you have to interact the world. You have to predict and control it, and you have large sensory sensory motor vectors, then the question is what is my background? Well, if I'm a supervised learning guy, I say, maybe I can apply my supervised learning tools to them. They all want to have labels, and so the labels I have is the very next data point. So I should predict that that next data point. This is is a way of thinking perfectly consistent with their background, but if you're coming from the point of reinforcement learning you think about predicting multiple steps in the future. Just as you predict value functions, predict reward, you should also predict the other events -- these things will be causal. I want to predict, what will happen if I if I drop this? Will it spill? will there be water all over? what might it feel on me? Those are not single step predictions. They involve whole sequences of actions picking things up and then spilling them and then letting them play out. There are consequences, and so to make a model of the world it's not going to be like a video frame. It's not going to be like playing out the video. You model the world at a higher level."

I talked with Rich Sutton - Edan Meyer

#solidstatelife #ai #reinforcementlearning #rl

waynerad@diasp.org

The full text of Dimitri P. Bertsekas's book A Course in Reinforcement Learning is available online for free. It's also available for purchase in print form. About 450 pages. It's the textbook for his Arizona State University course "Reinforcement Learning and Optimal Control".

I've gone through more than half of Richard Sutton and Andrew Barto's book Reinforcement Learning: An Introduction (though I confess to having 'cheated' and not done all the exercises). It might be worth reading this book, too, to see the same material from an alternate point of view.

"Reinforcement learning can be viewed as the art and science of sequential decision making for large and difficult problems, often in the presence of imprecisely known and changing environment conditions. Dynamic programming is a broad and well-established algorithmic methodology for making optimal sequential decisions, and is the theoretical foundation upon which reinforcement learning rests. This is unlikely to change in the future, despite the rapid pace of technological innovation. In fact, there are strong connections between sequential decision making and the new wave of technological change, generative technology, transformers, GPT applications, and natural language processing ideas, as we will aim to show in this book."

"In dynamic programming there are two principal objects to compute: the optimal value function that provides the optimal cost that can be attained starting from any given initial state, and the optimal policy that provides the optimal decision to apply at any given state and time. Unfortunately, the exact application of dynamic programming runs into formidable computational difficulties, commonly referred to as the curse of dimensionality. To address these, reinforcement learning aims to approximate the optimal value function and policy, by using manageable off-line and/or on-line computation, which often involves neural networks (hence the alternative name Neuro-Dynamic Programming)."

"Thus there are two major methodological approaches in reinforcement learning: approximation in value space, where we approximate in some way the optimal value function, and approximation in policy space, whereby we construct a suboptimal policy by using some form of optimization over a suitably restricted class of policies."

"The book focuses primarily on approximation in value space, with limited coverage of approximation in policy space. However, it is structured so that it can be easily supplemented by an instructor who wishes to go into approximation in policy space in greater detail, using any of a number of available sources."

"An important part of our line of development is a new conceptual framework, which aims to bridge the gaps between the artificial intelligence, control theory, and operations research views of our subject. This framework, the focus of the author's recent monograph 'Lessons from AlphaZero ...',, centers on approximate forms of dynamic programming that are inspired by some of the major successes of reinforcement learning involving games. Primary examples are the recent (2017) AlphaZero program (which plays chess), and the similarly structured and earlier (1990s) TD-Gammon program (which plays backgammon)."

A Course in Reinforcement Learning

#solidstatelife #ai #aieducation #reinforcementlearning #rl

waynerad@diasp.org

Agent Hospital is a "simulacrum of hospital with evolvable medical agents", alrighty then. And an excuse to use the word "simulacrum".

"Once arrived the Agent Hospital, the patient's journey begins at the triage station. Patients arrive and describe their symptoms to the nursing agents. The instructions guide the nursing staff in their decision-making, enabling them to direct patients to the appropriate specialist departments where medical professional agents are available to conduct further diagnostics."

"After the initial assessment, patients follow the advice from the triage station and proceed to register at the registration counter. They then wait in the designated waiting area for their consultation turn with the specialists from the respective departments."

"When it is their turn for consultation, patients engage in a preliminary dialogue with the physician agents to describe their symptoms and the duration since onset. The physician then determines which medical examination is needed to investigate the cause and assist with diagnosis and treatment. In the current version, only one type of medical examination will be conducted for each patient based on the decisions made by doctor agents."

"After receiving the prescribed list of medical examinations, patients proceed to the relevant department to undergo the tests. The resulting medical data which are pre-generated by LLM are subsequently presented to the patient and the doctor. This process designed to mimic real-time diagnostic feedback, aligns with the presentation of symptoms."

"Subsequent to the medical examination, patients are guided to the respective department where physician agents undertake the diagnostic process. Patients disclose their symptoms and share the results of the medical examination with the physician agents, who then undergo diagnostic processes based on a predefined disease set. The diagnostic result is promptly communicated back to the patient, showcasing the model's capacity to integrate complex medical data and its advanced diagnostic ability."

"The medical agent is presented with the patient's symptoms, results from medical examinations and the diagnosis of the disease they made. In addition, three distinct treatment plans tailored to mild, moderate, and severe conditions are also provided. The doctor is then tasked with selecting the appropriate plan from the mild, moderate, or severe options, according to the patient's specific needs. If any medicine is prescribed, patients proceed to the dispensary to collect it."

"At the end of the diagnostic and treatment process, the patient provides feedback or updates on their health condition for follow-up actions. To mimic the dynamic progression of diseases accurately, the LLM-enhanced simulation involves a few key steps: doctors devise treatment plans based on the patient's detailed health information and test results, and then these details -- specifically the patient's symptoms, the prescribed treatment plan, and the diagnosed disease are incorporated into a template for simulation."

Ok, as you can see, quite an elaborate simulation. But how do the medical agents actually learn? The whole point of doing all this is to get medical agents that actually learn. Here's what they say (big chunk of quotes to follow):

"Doctor agents continuously learn and accumulate experience during the treatment process in Agent Hospital, thereby enhancing their medical capabilities similar to human doctors. We assume that doctor agents are constantly repeating this process during all working hours."

"Apart from improving their skills through clinical practice, doctor agents also proactively accumulate knowledge by reading medical documents outside of work hours. This process primarily involves strategies to avoid parametric knowledge learning for agents."

"To facilitate the evolution of LLM-powered medical agents, we propose MedAgent-Zero strategy MedAgent-Zero is a parameter-free strategy, and no manually labeled data is applied as AlphaGo-Zero."

"There are two important modules in this strategy, namely the Medical Record Library and the Experience Base. Successful cases, which are to be used as references for future medical interventions, are compiled and stored in the medical record library. For cases where treatment fails, doctors are tasked to reflect and analyze the reasons for diagnostic inaccuracies and distill a guiding principle to be used as a cautionary reminder for subsequent treatment processes."

"In the process of administering treatment, it is highly beneficial for doctors to consult and reference previously validated medical records. These medical records contain abundant knowledge and demonstrate the rationale behind accurate and adequate responses to diverse medical conditions. Therefore, we propose to build a medical record library for doctor agents to sharpen their medical abilities, including historical medical records from hospital practices and exemplar cases from medical documents."

"Learning from diagnostic errors is also crucial for the growth of doctors. We believe that LLM-powered medical professional agents can engage in self-reflection from these errors, distilling relevant principles (experience) to ensure correct diagnoses when encountering similar issues in future cases."

"If the answer is wrong, the agent will reflect the initial problem, generated answer, and golden answer to summarize reusable principles. All principles generated are subject to a validation process. Upon generation, the principle is integrated into the original question which was initially answered incorrectly, allowing medical professional agents to re-diagnose. Only if the diagnosis is correct will the principle be added to the experience base."

"To eliminate the influence of noise and maximize the utilization of the experience base, we incorporate additional judgment when utilizing experience. This judgment involves evaluating whether the top-K experience retrieved based on semantic similarity are helpful for the treating process. Helpful experience will be incorporated into the prompt, while unhelpful experience will be excluded."

Ok, so, kind of analogous to how our chatbots are transformers originally pretrained by self-supervised learning and then further trained by a reinforcement learning system called RLHF (reinforcement learning from human feedback), here we also have an LLM-based system where reinforcement learning is employed (albeit in a different way) to further train the LLMs.

I have mixed feelings about this. There's part of me that says this is a silly exercise, unlikely to produce anything reliable enough to be useful, and another part of me that says, yeah, but this could be the beginning of how all hospitals are run 20 or 30 years in the future.

Agent Hospital: A simulacrum of hospital with evolvable medical agents

#solidstatelife #ai #genai #llms #medicalai #reinforcementlearning #rl