#scientificmethod

waynerad@diasp.org

"Today, we're excited to introduce The AI Scientist, the first comprehensive system for fully automatic scientific discovery, enabling Foundation Models such as Large Language Models (LLMs) to perform research independently."

"We" meaning Sakana AI.

"The AI Scientist automates the entire research lifecycle, from generating novel research ideas, writing any necessary code, and executing experiments, to summarizing experimental results, visualizing them, and presenting its findings in a full scientific manuscript."

"We also introduce an automated peer review process to evaluate generated papers, write feedback, and further improve results. It is capable of evaluating generated papers with near-human accuracy."

"The automated scientific discovery process is repeated to iteratively develop ideas in an open-ended fashion and add them to a growing archive of knowledge, thus imitating the human scientific community."

"In this first demonstration, The AI Scientist conducts research in diverse subfields within machine learning research, discovering novel contributions in popular areas, such as diffusion models, transformers, and grokking."

"The AI Scientist is designed to be compute efficient. Each idea is implemented and developed into a full paper at a cost of approximately $15 per paper. While there are still occasional flaws in the papers produced by this first version (discussed below and in the report), this cost and the promise the system shows so far illustrate the potential of The AI Scientist to democratize research and significantly accelerate scientific progress."

The obvious missing step, to me, is replication. But let's continue.

The 4-step process from idea to finished paper (it looks like I'm quoting a lot, but I tried to chop it down):

"1. Idea Generation: Given a starting template, The AI Scientist first 'brainstorms' a diverse set of novel research directions. We take inspiration from evolutionary computation and open-endedness research and iteratively grow an archive of ideas using LLMs as the mutation operator. Each idea comprises a description, experiment execution plan, and (self-assessed) numerical scores of interestingness, novelty, and feasibility. At each iteration, we prompt the language model to generate an interesting new research direction conditional on the existing archive, which can include the numerical review scores from completed previous ideas. We use multiple rounds of chain-of-thought and self-reflection to refine and develop each idea. After idea generation, we filter ideas by connecting the language model with the Semantic Scholar API and web access as a tool. This allows The AI Scientist to discard any idea that is too similar to existing literature."

"2. Experiment Iteration: Given an idea and a template, the second phase of The AI Scientist first executes the proposed experiments and then visualizes its results for the downstream write-up. The AI Scientist uses Aider to first plan a list of experiments to run and then executes them in order. We make this process more robust by returning any errors upon a failure or time-out (e.g. experiments taking too long to run) to Aider to fix the code and re-attempt up to four times."

"After the completion of each experiment, Aider is then given the results and told to take notes in the style of an experimental journal. Currently, it only conditions on text but in future versions, this could include data visualizations or any modality. Conditional on the results, it then re-plans and implements the next experiment. This process is repeated up to five times. Upon completion of experiments, Aider is prompted to edit a plotting script to create figures for the paper using Python. The AI Scientist makes a note describing what each plot contains, enabling the saved figures and experimental notes to provide all the information required to write up the paper. At all steps, Aider sees its history of execution."

"3. Paper Write-up: The third phase of The AI Scientist produces a concise and informative write-up of its progress in the style of a standard machine learning conference proceeding in LaTeX. We note that writing good LaTeX can even take competent human researchers some time, so we take several steps to robustify the process. This consists of the following:"

"(a) Per-Section Text Generation: The recorded notes and plots are passed to Aider, which is prompted to fill in a blank conference template section by section. This goes in order of introduction, background, methods, experimental setup, results, and then the conclusion. All previous sections of the paper it has already written are in the context of the language model."

"(b) Web Search for References: In a similar vein to idea generation, The AI Scientist is allowed 20 rounds to poll the Semantic Scholar API looking for the most relevant sources to compare and contrast the near-completed paper against for the related work section."

"(c) Refinement: After the previous two stages, The AI Scientist has a completed first draft, but can often be overly verbose and repetitive. To resolve this, we perform one final round of self-reflection section-by-section."

"(d) Compilation: Once the LaTeX template has been filled in with all the appropriate results, this is fed into a LaTeX compiler. We use a LaTeX linter and pipe compilation errors back into Aider so that it can automatically correct any issues."

After the paper is produced, we're not done.

"Automated paper reviewing: A key component of an effective scientific community is its reviewing system, which evaluates and improves the quality of scientific papers. To mimic such a process using large language models, we design a GPT-4o-based agent to conduct paper reviews based on the Neural Information Processing Systems (NeurIPS) conference review guidelines."

"To evaluate the LLM-based reviewer's performance, we compared the artificially generated decisions with ground truth data for 500 ICLR 2022 papers extracted from the publicly available OpenReview dataset."

They provide an example paper, "DualScale Diffusion: Adaptive Feature Balancing for Low-Dimensional Generative Models", so you can judge for yourself how well the system works.

The AI Scientist: Towards fully automated open-ended scientific discovery

#solidstatelife #ai #genai #llms #scientificmethod

wist@diasp.org

A quotation from Russell, Bertrand

If you have a good scientific imagination, you can think of all sorts of things that might be true, and that’s the essence of science. You first think of something that might be true — then you look to see if it is, and generally it isn’t.

Bertrand Russell (1872-1970) English mathematician and philosopher
Interview by Woodrow Wyatt, BBC TV (1959)

#quote #quotes #quotation #imagination #science #scientificmethod #test #truth
Sourcing / notes: https://wist.info/russell-bertrand/23515/

wist@diasp.org

A quotation from Dyson, Freeman

Progress in science is often built on wrong theories that are later corrected. It is better to be wrong than to be vague.

Freeman Dyson (1923-2020) English-American theoretical physicist, mathematician, futurist
The Scientist as Rebel, Part 3, ch. 19 “The World on a String” (2006)

#quote #quotes #quotation #correction #error #science #scientificmethod
Sourcing / notes: https://wist.info/dyson-freeman/60115/