#solidstatelife

waynerad@diasp.org

"Opinion: It's time for the Biden Campaign to embrace AI"

"By Kaivan Shroff, Guest Writer"

"The stakes of the 2024 presidential election cannot be overstated. With Donald Trump promising to act as a dictator 'on day one,' it is not hyperbolic to say the future of American democracy hangs in the balance. Against this backdrop, the Biden campaign faces a critical challenge: conveying a strong and effective image of President Joe Biden to a population and media ecosystem increasingly focused on optics over substance. Given the president's concerning performance last week, it's time for the Biden campaign to consider leveraging artificial intelligence (AI) to effectively reach the voting public."

"Reasonably, some may challenge the use of AI as dishonest and deceptive, but the current information ecosystem is arguably no better." "We must ask the question, are augmented AI videos that present Biden in his best form -- while sharing honest and accurate information -- really more socially damaging than our information ecosystem's current realities?"

"AI-generated content can be tailored to highlight President Biden's accomplishments, clearly articulate his policies, and present a consistent, compelling message. In an era where visual mediums and quick, digestible content dominate public perceptions, AI offers an opportunity for more effective communication. These AI-enhanced videos could ensure that the public does not make decisions about the future of our democracy based on an inconveniently timed cough, stray stutter, or healthy but hobbled walk (Biden suffers from a 'stiff gait')."

"The use of AI renderings in political campaigns is becoming increasingly common, and the Republican Party has already embraced this technology and is using AI in their attack ads against the president. Instead of a race to the bottom, the Biden campaign could consider an ethical way to deploy the same tools."

Opinion: It's time for the Biden Campaign to embrace AI | HuffPost Opinion

#solidstatelife #ai #genai #llms #computervision #deepfakes #domesticpolitics

waynerad@diasp.org

"AI scaling myths."

"Scaling will run out. The question is when."

"So far, bigger and bigger language models have proven more and more capable. But does the past predict the future?"

"One popular view is that we should expect the trends that have held so far to continue for many more orders of magnitude, and that it will potentially get us to artificial general intelligence, or AGI."

"What exactly is a 'better' model? Scaling laws only quantify the decrease in perplexity, that is, improvement in how well models can predict the next word in a sequence. Of course, perplexity is more or less irrelevant to end users -- what matters is 'emergent abilities', that is, models' tendency to acquire new capabilities as size increases."

"Emergence is not governed by any law-like behavior. It is true that so far, increases in scale have brought new capabilities. But there is no empirical regularity that gives us confidence that this will continue indefinitely."

They show graphs of airplane airspeeds and CPU clock speeds, both of which look like trends you could extrapolate out until they suddenly stop.

The authors break the question of whether training data will run out into two sub-questions: Can more training data be acquired? Can synthetic data solve the problem?

For training data, large language models (LLMs) are already being trained on essentially all of the web, and a huge pile of additional copyrighted material that they might have to stop training on "now that copyright holders have wised up and want to be compensated." They don't mention textbooks but it seems to me like LLMs have been trained on a lot of textbooks.

You might think there are vast amounts of additional data that can be used in the form of YouTube transcripts, but the authors argue most YouTube transcripts are of low quality, and that they are mostly useful for teaching LLMs what spoken conversations look like. (Are they not useful for putting the audio and video together and learning the words for things in the videos?)

They argue synthetic data is sometimes useful for fixing specific gaps in training data and making domain-specific improvements in specific domains like math, code, or low-resource languages. "Self-play", like AlphaZero used to learn to play the Chinese game of Go, is analogous to creating synthetic data, and analogous to "distillation", where a slow and expensive model generates training data for a smaller, more cheaply trained model. Regardless, synthetic data is not a panacea that will lead to AGI.

I find these thoughts on scaling interesting because to me it seems like LLMs are still improving, but I wonder if the improvement is as rapid as before? Perhaps rather than exponential improvement, we're now asymptoting, and will continue to do so until the next algorithmic breakthrough, which might not happen for years, or might happen next week.

AI scaling myths by Arvind Narayanan and Sayash Kapoor

#solidstatelife #ai #genai #llms #aiscaling

waynerad@diasp.org

"Modern LLMs are fine-tuned for safety and instruction-following, meaning they are trained to refuse harmful requests. In their blog post, Arditi et al. have shown that this refusal behavior is mediated by a specific direction in the model's residual stream. If we prevent the model from representing this direction, it loses its ability to refuse requests. Conversely, adding this direction artificially can cause the model to refuse even harmless requests."

"To uncensor an LLM, we first need to identify the 'refusal direction' within the model. This process involves a few technical steps:"

"Data Collection: Run the model on a set of harmful instructions and a set of harmless instructions, recording the residual stream activations at the last token position for each."

"Mean difference: Calculate the mean difference between the activations of harmful and harmless instructions. This gives us a vector representing the 'refusal direction' for each layer of the model."

"Selection: Normalize these vectors and evaluate them to select the single best 'refusal direction.'"

"Once we have identified the refusal direction, we can 'ablate' it, effectively removing the model's ability to represent this feature. This can be done through an inference-time intervention or permanently with weight orthogonalization."

"Let's talk about inference-time intervention first..."

Uncensor any LLM with abliteration

#solidstatelife #ai #genai #llms #adversarialexamples

waynerad@diasp.org

"A short-haul aircraft in the United Kingdom recently became the first airborne platform to test delicate quantum technologies that could usher in a post-GPS world--in which satellite-based navigation (be it GPS, BeiDou, Galileo, or others) cedes its singular place as a trusted navigational tool."

"At the core of Infleqtion's technology is a state of matter called a Bose-Einstein condensate (BEC), which can be made to be extremely sensitive to acceleration."

That's... crazy.

"The best inertial systems in the world, based on ring laser gyroscopes, or fiber-optic gyroscopes, can...maintain a nautical mile of precision over about two weeks of mission."

"Max Perez, vice president for strategic initiatives at the Boulder, CO-based company Infleqtion, expects Infleqtion to be able to either maintain the same nautical-mile precision over a month or more mission time -- or, conversely, increase the sensitivity over a week's mission to something like one-tenth of a nautical mile."

That's nothing compared with GPS. And making Bose-Einstein condensate (BEC) is super hard. So, I think this is not the future. But it's an amazing idea. Maybe it can be used in the future for spacecraft? Anything traveling beyond Earth's orbit won't be able to use GPS anyway.

Quantum navigational tech takes flight in new trial

#solidstatelife #quantumphysics #boseeinsteincondensate #bec #navigation #accelerometer

waynerad@diasp.org

"Kosmos-2.5 is a multimodal literate model for machine reading of text-intensive images. Pre-trained on large-scale text-intensive images, Kosmos-2.5 excels in two distinct yet cooperative transcription tasks: (1) generating spatially-aware text blocks, where each block of text is assigned its spatial coordinates within the image, and (2) producing structured text output that captures styles and structures into the markdown format."

"This unified multimodal literate capability is achieved through a shared decoder-only auto-regressive Transformer architecture, task-specific prompts, and flexible text representations. We evaluate Kosmos-2.5 on end-to-end document-level text recognition and image-to-markdown text generation. Furthermore, the model can be readily adapted for any text-intensive image understanding task with different prompts through supervised fine-tuning, making it a general-purpose tool for real-world applications involving text-rich images."

Kosmos-2.5: A multimodal literate model

#solidstatelife #ai #genai #llms #computervision #multimodal

waynerad@diasp.org

At Google, the fraction of code created with AI assistance via code completion, defined as the number of accepted characters from AI-based suggestions divided by the sum of manually typed characters and accepted characters from AI-based suggestions, now exceeds 50%.
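Spelled out as a formula (my reading of the stated definition, not Google's code):

```python
def ai_completion_fraction(accepted_ai_chars: int, manually_typed_chars: int) -> float:
    """Fraction of code 'created with AI assistance' as defined in the post:
    accepted characters from AI suggestions / (manually typed + accepted characters)."""
    return accepted_ai_chars / (manually_typed_chars + accepted_ai_chars)

# e.g. 60,000 accepted characters vs. 50,000 typed by hand -> ~0.55
print(ai_completion_fraction(60_000, 50_000))
```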

"We achieved the highest impact with UX that naturally blends into users' workflows."

"We observe that with AI-based suggestions, the code author increasingly becomes a reviewer, and it is important to find a balance between the cost of review and added value."

"Quick iterations with online A/B experiments are key, as offline metrics are often only rough proxies of user value. By surfacing our AI-based features on internal tooling, we benefit greatly from being able to easily launch and iterate, measure usage data, and ask users directly about their experience through UX research."

"High quality data from activities of Google engineers across software tools, including interactions with our features, is essential for our model quality."

"Human-computer interaction has moved towards natural language as a common modality, and we are seeing a shift towards using language as the interface to software engineering tasks as well as the gateway to informational needs for software developers, all integrated in IDEs."

"ML-based automation of larger-scale tasks -- from diagnosis of an issue to landing a fix -- has begun to show initial evidence of feasibility. These possibilities are driven by innovations in agents and tool use, which permit the building of systems that use one or more LLMs as a component to accomplish a larger task."

50% still seems like a lot. I wonder how much of that 50% has "code churn" -- has to be corrected again, even after being checked in? Maybe a lot of that 50% is actually correction code on previous LLM-generated code, lol.

Also, you would think if Google engineers are now writing code 2x as fast, we ought to be seeing rapid innovation in Google products. I'm not holding my breath. To be fair, Google is trying to innovate with Gemini, "AI summaries", and various other AI products. But Google Search seems like it's been getting slowly worse for a long time (although I still use it and it's ok for most searches), and Google has a history of canceling a lot of products. I feel oddly doubtful this 2x productivity boost will make any visible difference to us users.

AI in software engineering at Google: Progress and the path ahead

#solidstatelife #ai #genai #llms #codingai

waynerad@diasp.org

"There is a common misunderstanding, even among practitioners, that low-latency trading is a waste of human talent and resources that could instead go to advancing physics or curing cancer. It's been attacked by books like Flash Boys, governments trying to pass transaction taxes, and exchanges bending to pressure by implementing speed bumps or periodic batch auctions. This essay argues the positive case for HFT and latency competition based on four main reasons: (1) Low latency trading lowers spreads, (2) Economically significant things do happen on sub-millisecond time scales, (3) HFT is the optimization layer for capitalism, and (4) Markets are not a zero-sum game."

Contrarian thinking for today.

"Latency is equivalent to time to expiry for a market maker. The longer it takes them to hedge an executed quote, the more risk they are exposed to, and the wider the spreads they will need to charge."

"HFT breaks feedback loops in global supply and demand negotiation by propagating information more quickly and using short term predictive signals to untangle causality."

"In the context of capitalism, HFT serves as a low-level optimization layer in the global price discovery process."

"Almost all competition looks like it's winner takes all when you define the space of competition in a narrow enough region, but if you look big picture it's never that simple. HFTs are diverse. A one nanosecond speedup in one component doesn't allow anyone to take the whole market."

Opinion: Rationalizing latency competition in high-frequency trading

#solidstatelife #stockmarket #highfrequencytrading #hft

waynerad@diasp.org

Drone-on-drone dogfights are a thing now in Ukraine. Well, these videos look less like dogfights and more like a small FPV drone sneaking up on a large, sophisticated reconnaissance drone.

"With Russian reconnaissance drones enabling devastating missile strikes deep in Ukrainian territory, Ukraine's military is turning to a novel solution: deploying agile, low-cost 'kamikaze' drones to take out their high-priced Russian counterparts in midair dogfights."

Drone dogfights: Ukraine’s novel strategy to counter Russian reconnaissance UAVs

#solidstatelife #ai #robotics #uavs #ukraineconflict

waynerad@diasp.org

Cirq is a Python software library for making quantum circuits, and then running them either on real quantum computers or on quantum simulators. If you're interested in learning quantum computing, you might give it a shot. The website won't teach you quantum computing, so they recommend a book, "Quantum Computation and Quantum Information" by Michael Nielsen and Isaac Chuang (link below -- I haven't read it -- will add it to my list of books to read "when I have time").

"Cirq provides useful abstractions for dealing with today's noisy intermediate-scale quantum computers, where details of the hardware are vital to achieving state-of-the-art results."

"The first part of creating a quantum circuit is to define a set of qubits (also known as a quantum register) to act on."

"Cirq has three main ways of defining qubits:"

"cirq.NamedQubit: used to label qubits by an abstract name."

"cirq.LineQubit: qubits labelled by number in a linear array."

"cirq.GridQubit: qubits labelled by two numbers in a rectangular lattice."

"There are also pre-packaged sets of qubits called Devices. These are qubits along with a set of rules for how they can be used. A cirq.Device can be used to ensure that two-qubit gates are only applied to qubits that are adjacent in the hardware, and other constraints."

Cirq | Google Quantum AI

#solidstatelife #ai #quantumcomputing

waynerad@diasp.org

Giskard is a tool I just learned exists for "automatic vulnerability detection for LLMs."

"With Giskard, data scientists can scan their model (tabular, NLP and LLMs) to find dozens of hidden vulnerabilities, instantaneously generate domain-specific tests, and leverage the Quality Assurance best practices of the open-source community."

"According to the Open Worldwide Application Security Project, some of the most critical vulnerabilities that affect LLMs are Prompt Injection (when LLMs are manipulated to behave as the attacker wishes), Sensitive Information Disclosure (when LLMs inadvertently leak confidential information), and Hallucination (when LLMs generate inaccurate or inappropriate content)."

"Giskard's scan feature ensures the automatic identification of such vulnerabilities, and many others. The library generates a comprehensive report which quantifies these into interpretable metrics."

"Issues detected include: Hallucinations, harmful content generation, prompt injection, robustness issues, sensitive information disclosure, stereotypes & discrimination, many more..."

I wonder how it does all that? Very intriguing. I had a glance at the source code repository, but it looks like I'd have to really dig in in order to figure out how this system works.

I found out it exists from reading this article about how to use it with MLflow, "an open-source platform for managing end-to-end machine learning (ML) workflows."

Evaluating large language models with Giskard in MLflow

#solidstatelife #ai #llms #cybersecurity

waynerad@diasp.org

Udio and Suno are being sued by the three biggest music labels: Universal, Sony, and Warner.

I learned about it from this video by music industry attorney "Miss Krystle", but for those of you who prefer text, I have a link to the complaint below so you can read that.

I'm really wondering how this is going to turn out. It seems everything depends on how you define "copying". In order to train AI models, copyrighted music is copied into GPU memory. Is this "fair use"? But it is not copied into the models themselves. The models are a vast assortment of neural network parameters. If the models generate output that is extremely similar to the originals, does it count as "copying"?

Does the generated music count as a "derivative work", the way a remix would? Remixes take elements of the original music and recombine them in some new, and hopefully creative, way. But to do that, those "elements of the original" are copied into the remix. The neural network, by contrast, doesn't literally copy anything when generating new music -- the models are a vast assortment of neural network parameters. But when you listen to the generated music, you can definitely hear elements of the original music, especially if you ask for something very specific in the prompt, like the style of specific well-known artists.

I have no idea how the courts are going to rule on this.

Udio exposed: Massive lawsuit could end AI music | court document breakdown - Top Music Attorney

#solidstatelife #ai #audioai #genai #musicai

waynerad@diasp.org

Time complexity of machine learning algorithms. I found this infographic oddly engrossing. Someone actually figured out the time complexity in "Big O" notation of linear regression ("ordinary least squares" (OLS) vs "stochastic gradient descent" (SGD)), logistic regression (binary vs multiclass one-vs-the-rest (OvR)), decision trees, random forests, support vector machines, k-nearest-neighbors (kNN), naïve Bayes, principal component analysis (PCA), t-distributed stochastic neighbor embedding (t-SNE), and k-means clustering.

The Big O formulas are expressed in terms of samples, dimensions, epochs, classes, depth, and some others for particular algorithms. They have separate Big O formulas for training and inference.
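I won't reproduce the infographic here, but the commonly cited figures look roughly like this (my own summary from memory, so it may differ from the infographic in details):

```python
# Commonly cited training / inference time complexities (my summary, not the infographic verbatim).
# n = samples, d = features, c = classes, k = neighbors/clusters, i = iterations,
# t = trees, e = epochs.
complexity = {
    "Linear regression (OLS)": ("O(n*d^2 + d^3)",   "O(d)"),
    "Linear regression (SGD)": ("O(e*n*d)",         "O(d)"),
    "Logistic regression":     ("O(e*n*d)",         "O(d)"),
    "Decision tree":           ("O(n*log(n)*d)",    "O(depth)"),
    "Random forest":           ("O(t*n*log(n)*d)",  "O(t*depth)"),
    "k-nearest neighbors":     ("O(1)",             "O(n*d)"),
    "Naive Bayes":             ("O(n*d)",           "O(c*d)"),
    "PCA":                     ("O(n*d^2 + d^3)",   "O(d^2)"),
    "k-means":                 ("O(i*k*n*d)",       "O(k*d)"),
}
for name, (train, infer) in complexity.items():
    print(f"{name:26s} train {train:18s} inference {infer}")
```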

Not the kind of thing you normally see on an infographic, and it seems like by studying this you could get some intuition as to the efficiency of all these algorithms.

Training and inference time complexity of 10 ML algorithms

#solidstatelife #ai #aieducation

waynerad@diasp.org

Hunyuan-DiT is an image generator that generates art with "Chinese elements" from Chinese prompts. It's an open-source model created by Chinese giant Tencent. It's a diffusion model, and its text conditioning relies in part on text encoders trained with "contrastive" learning. Hunyuan-DiT was started from an English dataset, and then was "fine-tuned" from there with a Chinese image and Chinese text dataset. Because of this, even though it is optimized for generating Chinese images from Chinese text, it is still capable of generating images from English text. It knows Chinese places, Chinese painting styles, Chinese food, Chinese dragons, traditional Chinese attire, and so on. It looks like if you ask it to generate images of people, it will generate images of Chinese people unless you ask otherwise.

Hunyuan-DiT: A powerful multi-resolution diffusion transformer with fine-grained Chinese understanding

#solidstatelife #ai #genai #computervision #diffusionmodels

waynerad@diasp.org

"Predictably Bad Investments: Evidence from venture capitalists" by Diag Davenport. This is from 2 years ago but I just found out about it today. So this guy created a machine learning model on a venture capital database, and found the machine learning model was able to predict the successes. Not perfectly, but probabilistically, and better than the alternatives -- he runs a comparison with the bond market and S&P 500 stocks.

The VC database is called Pitchbook.

"Pitchbook is a subscription data provider widely used by investors for information on deals, companies, and other investors in private capital markets." "There are three categories of information that I synthesize from the Pitchbook data: finances (e.g., Revenue, EBITDA, total capital raised), founder information (e.g., educational background and previous experience), and company/product description."

"I begin by identifying every startup that participated in any of the top 100 accelerator and incubator programs between 2009 and 2016. Many of the companies will be familiar to the reader and include Airbnb, Doordash, Stripe, Dropbox, Coinbase, Instacart, and Uber. For each of those 16,054 firms, I then construct a dataset of all equity deals known to Pitchbook within the first five years of completing their accelerator program. One key motivation for pre-defining a set of firms of interest and then following their valuations over time is to avoid any survivorship bias which would severely limit the interpretation of any results."

"Accelerators are seen as a launch pad for many startups. The key statistic is the sum of the post-money valuations for firms launched from a given accelerator. The top two (Rocketspace and Y Combinator) support several common, though conflicting intuitions about startup investing. First, Rocketspace, while only investing in 3 companies in my data, is ranked at the top. Its sole non-zero investment is Uber. This supports the notion that one right-tail outlier can define the performance of the entire portfolio. The second investor in the table is Y Combinator which has a much broader scope, investing in over 400 companies over the course of my data. These two investors together account for over 70% of the market value of companies in my data."

"The names of the investors in the top firms will be familiar to practitioners: Benchmark, Seqouia, and Accel. These firms again appear to have some ability to discern since they make up relatively small shares of total investment, but comprise outsized shares of market value of portfolio companies. For example, Sequoia Capital only makes up 1% of invested early-stage equity in my data, but its portfolio companies make up 6% of market value in my data."

"Early-stage investment" is defined (more or less) as "any equity deal in the Pitchbook data within two years of incubator completion that is categorized as 'Series A', 'Series B', 'Seed Round', or 'Angel (Individual)' in the Deal Type."

"Late-stage exit" is defined (more or less) as "initial public offering (IPO), merger or acquisition (MA), or any funding round that is categorized by Pitchbook as Series C or later (C+) within five years of accelerator completion."

He doesn't reveal anything about the machine learning model, but claims it worked.

"The algorithm has found signal. That startup success is predictable at all creates scope for savvy investors to have persistent returns which remains an outstanding question in the literature."

However, when he looks at "failure", as defined by "bankruptcy declaration" or "documented shut down", he finds "no clear relationship between the firm's predicted success and its probability of failing." I took this to mean that success and failure are asymmetric: it takes a long string of success events to make a successful firm, but only one failure event to make a failed firm. But no, he attributes this aspect of the data to the idea that the failure of many firms is not observed. He says if there is a high expectation that a firm will be successful, then news of its failure (through either "bankruptcy declaration" or "documented shut down") is more likely to be observed and included in the database.

So what explains VCs' mistakes? He thinks the major factor is having too much confidence in people and too little in product.

"Despite the fact that none of the coefficients carry causal interpretations, there are several suggestive takeaways. First, nearly all points lie directly on the axes, suggesting that firms use totally different criteria when selecting good and bad investments. Second, the word tokens along the vertical axis appear to differ systematically from those along the horizontal axis -- the model trained on the worst firms appears to prioritize founder details, whereas the model trained on the best firms appears to prioritize product details. When making good investments, investors appear to bet on the horse, but when making bad investments they appear to be betting on the jockey."

"Business icons from Ray Kroc to Jeff Bezos have captured our attention and lend credence to the mythology surrounding founders and their startups. Hit TV shows such as 'Shark Tank' and industry norms around 'Pitch Day' events suggest an ethos built around founder charisma and personal persuasion. In his best-selling book 'Good to Great', Collins (2009) asserts 'First Who, Then What' and the idea has been absorbed into the zeitgeist." "Investors seem convinced that the founder-first model of the world is the correct one."

Predictably Bad Investments: Evidence from venture capitalists

#solidstatelife #startups #venturecapital #vc

waynerad@diasp.org

"Avian eye-inspired perovskite artificial vision system for foveated and multispectral imaging".

The paper is paywalled, but from the abstract what I'm able to figure out is that these researchers have developed what is essentially a digital camera, except that unlike a regular digital camera, which produces pictures with a uniform density of pixels everywhere, this camera has a ton of extra pixels in the center of the image, giving it a "fovea" like birds of prey. Also, like bird eyes, this "foveated" digital camera can see ultraviolet light in addition to regular visible light.

If any of you can get around the paywall it would be interesting to find out if the reason it's made with perovskites is to provide this additional ultraviolet detection. They say its construction is a "vertically stacked perovskite photodetector". This may mean it's not a charge-coupled device (CCD) like regular digital cameras.

This is obviously a research prototype so no word yet on commercialization, but it seems obvious to me that this has immediate application in military drones.

Avian eye-inspired perovskite artificial vision system for foveated and multispectral imaging

#solidstatelife #computervision #photodetectors #perovskites

waynerad@diasp.org

Agent Hospital is a simulacrum of a hospital with evolvable medical agents, alrighty then. And an excuse to use the word "simulacrum".

"Once arrived the Agent Hospital, the patient's journey begins at the triage station. Patients arrive and describe their symptoms to the nursing agents. The instructions guide the nursing staff in their decision-making, enabling them to direct patients to the appropriate specialist departments where medical professional agents are available to conduct further diagnostics."

"After the initial assessment, patients follow the advice from the triage station and proceed to register at the registration counter. They then wait in the designated waiting area for their consultation turn with the specialists from the respective departments."

"When it is their turn for consultation, patients engage in a preliminary dialogue with the physician agents to describe their symptoms and the duration since onset. The physician then determines which medical examination is needed to investigate the cause and assist with diagnosis and treatment. In the current version, only one type of medical examination will be conducted for each patient based on the decisions made by doctor agents."

"After receiving the prescribed list of medical examinations, patients proceed to the relevant department to undergo the tests. The resulting medical data which are pre-generated by LLM are subsequently presented to the patient and the doctor. This process designed to mimic real-time diagnostic feedback, aligns with the presentation of symptoms."

"Subsequent to the medical examination, patients are guided to the respective department where physician agents undertake the diagnostic process. Patients disclose their symptoms and share the results of the medical examination with the physician agents, who then undergo diagnostic processes based on a predefined disease set. The diagnostic result is promptly communicated back to the patient, showcasing the model's capacity to integrate complex medical data and its advanced diagnostic ability."

"The medical agent is presented with the patient's symptoms, results from medical examinations and the diagnosis of the disease they made. In addition, three distinct treatment plans tailored to mild, moderate, and severe conditions are also provided. The doctor is then tasked with selecting the appropriate plan from the mild, moderate, or severe options, according to the patient's specific needs. If any medicine is prescribed, patients proceed to the dispensary to collect it."

"At the end of the diagnostic and treatment process, the patient provides feedback or updates on their health condition for follow-up actions. To mimic the dynamic progression of diseases accurately, the LLM-enhanced simulation involves a few key steps: doctors devise treatment plans based on the patient's detailed health information and test results, and then these details -- specifically the patient's symptoms, the prescribed treatment plan, and the diagnosed disease are incorporated into a template for simulation."

Ok, as you can see, quite an elaborate simulation. But how do the medical agents actually learn? The whole point of doing all this is to get medical agents that actually learn. Here's what they say (big chunk of quotes to follow):

"Doctor agents continuously learn and accumulate experience during the treatment process in Agent Hospital, thereby enhancing their medical capabilities similar to human doctors. We assume that doctor agents are constantly repeating this process during all working hours."

"Apart from improving their skills through clinical practice, doctor agents also proactively accumulate knowledge by reading medical documents outside of work hours. This process primarily involves strategies to avoid parametric knowledge learning for agents."

"To facilitate the evolution of LLM-powered medical agents, we propose MedAgent-Zero strategy MedAgent-Zero is a parameter-free strategy, and no manually labeled data is applied as AlphaGo-Zero."

"There are two important modules in this strategy, namely the Medical Record Library and the Experience Base. Successful cases, which are to be used as references for future medical interventions, are compiled and stored in the medical record library. For cases where treatment fails, doctors are tasked to reflect and analyze the reasons for diagnostic inaccuracies and distill a guiding principle to be used as a cautionary reminder for subsequent treatment processes."

"In the process of administering treatment, it is highly beneficial for doctors to consult and reference previously validated medical records. These medical records contain abundant knowledge and demonstrate the rationale behind accurate and adequate responses to diverse medical conditions. Therefore, we propose to build a medical record library for doctor agents to sharpen their medical abilities, including historical medical records from hospital practices and exemplar cases from medical documents."

"Learning from diagnostic errors is also crucial for the growth of doctors. We believe that LLM-powered medical professional agents can engage in self-reflection from these errors, distilling relevant principles (experience) to ensure correct diagnoses when encountering similar issues in future cases."

"If the answer is wrong, the agent will reflect the initial problem, generated answer, and golden answer to summarize reusable principles. All principles generated are subject to a validation process. Upon generation, the principle is integrated into the original question which was initially answered incorrectly, allowing medical professional agents to re-diagnose. Only if the diagnosis is correct will the principle be added to the experience base."

"To eliminate the influence of noise and maximize the utilization of the experience base, we incorporate additional judgment when utilizing experience. This judgment involves evaluating whether the top-K experience retrieved based on semantic similarity are helpful for the treating process. Helpful experience will be incorporated into the prompt, while unhelpful experience will be excluded."

Ok, so, this is kind of analogous to how our chatbots are transformers that are originally pretrained (by self-supervised training) and then get further training from a reinforcement learning system called RLHF (reinforcement learning from human feedback); here we also have an LLM-based system where reinforcement learning is employed (albeit in a different way) to further train the LLMs.

I have mixed feelings about this. There's part of me that says this is a silly exercise, unlikely to produce anything reliable enough to be useful, and another part of me that says, yeah, but this could be the beginning of how all hospitals are run 20 or 30 years in the future.

Agent Hospital: A simulacrum of hospital with evolvable medical agents

#solidstatelife #ai #genai #llms #medicalai #reinforcementlearning #rl

waynerad@diasp.org

ToonCrafter: Generative Cartoon Interpolation.

Check out the numerous examples. This looks like something that could really help human animators make cartoons faster without losing their own hand-drawn animation style.

The way the system works is you input two frames, and ask the system to interpolate all the frames in between. You can optionally further augment the input with a sketch.

Under the hood, the system uses a diffusion model for video generation called DynamiCrafter. DynamiCrafter has an internal "latent representation" that encodes something of the meaning of the frames, and it uses this representation to generate the video frames.

This system, ToonCrafter, uses the first and last frames to work backward to the "latent representations", then interpolates the "latent representations" to get the intermediate frames.

Because DynamiCrafter was trained on live-action video, and there's a huge gap in visual style between live-action and cartoons, such as exaggerated expressions and simplified textures, they had to take pains to "fine tune" the system with a lot of additional training on a high-quality cartoon dataset they constructed themselves.

In addition to the DynamiCrafter video generator, they also added a "detail-injecting" 3D decoder. This is an additional complex part of the system, with multiple 3D residual network layers and upsampling layers.
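Here's a toy illustration of the core idea of interpolating in latent space rather than pixel space (the encoder/decoder here are stand-in linear maps and the interpolation is a straight line; in the real system those roles are played by DynamiCrafter's learned encoder, the video diffusion model, and the detail-injecting 3D decoder):

```python
import numpy as np

# Toy illustration only -- not ToonCrafter's code.
rng = np.random.default_rng(0)
d_pixels, d_latent = 64 * 64, 16
W_enc = rng.normal(size=(d_latent, d_pixels)) / np.sqrt(d_pixels)  # stand-in "encoder"
W_dec = np.linalg.pinv(W_enc)                                      # stand-in "decoder"

def encode(frame):  # frame -> latent
    return W_enc @ frame.ravel()

def decode(z):      # latent -> frame
    return (W_dec @ z).reshape(64, 64)

first_frame = rng.random((64, 64))
last_frame = rng.random((64, 64))
z0, z1 = encode(first_frame), encode(last_frame)

# In the real system a cartoon-fine-tuned video diffusion model generates the latent
# trajectory between the endpoints; a straight line is the crudest possible stand-in.
num_frames = 8
frames = [decode((1 - t) * z0 + t * z1) for t in np.linspace(0, 1, num_frames)]
print(len(frames), frames[0].shape)
```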

ToonCrafter: Generative Cartoon Interpolation

#solidstatelife #ai #genai #computervision #diffusionmodels #animation

waynerad@diasp.org

The future of nuclear reactors: "Micro" reactors?

A company called Radiant is developing a "portable" (but is it really going to get moved around after the initial installation? I think maybe "prefabricated" is a more accurate term) nuclear reactor. Small reactors have already been made for nuclear submarines, so why not civilian use?

Their reactor uses helium instead of water because, they say, helium doesn't become radioactive when exposed to radiation in the same way water/steam does. If there is an accident and all the coolant is released, it won't be radioactive, and won't fill the surroundings with radioactive pollutant.

The reactor is designed so that outside a nearby fence, radiation levels will be low enough that you could put a sidewalk or a McDonald's there.

One unique thing about their design approach is they created a software model of the reactor and its physics, and use it to test the reactor's control system against failures. So for example, you can click a button on a screen, and that will cause a component to fail in the simulation. Then you can see if the control system handles the failure correctly. Using this approach they are working on getting the control system to correctly handle every imaginable failure. They make sure that in shutdown scenarios the reactor will always shut down safely.
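A toy version of that kind of fault-injection test harness, just to illustrate the idea (nothing to do with Radiant's actual software; component names and thresholds are made up):

```python
# Toy illustration of simulation-based fault injection: for every injectable failure,
# run the simulated plant plus control system and assert the scenario ends safely.

COMPONENTS = ["primary_coolant_blower", "control_rod_drive", "sensor_bus", "grid_power"]

def simulate(failed_component: str) -> dict:
    # Stand-in for the physics + control-system simulation. A real model would
    # integrate the plant dynamics; here we just return a canned outcome.
    return {"shutdown": True, "max_temperature_c": 620.0, "failed": failed_component}

def test_all_single_failures(max_safe_temp_c: float = 750.0):
    for component in COMPONENTS:
        outcome = simulate(failed_component=component)
        assert outcome["shutdown"], f"{component}: reactor failed to shut down safely"
        assert outcome["max_temperature_c"] < max_safe_temp_c, f"{component}: over-temperature"
        print(f"{component}: safe shutdown, peak {outcome['max_temperature_c']} C")

test_all_single_failures()
```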

The future of nuclear = small, mobile, microreactors | Radiant - S3

#solidstatelife #energy #nuclear

waynerad@diasp.org

François Chollet claims to have a test for neural networks called ARC, which stands for Abstraction and Reasoning Corpus. The test reminds me of Raven's progressive matrices (the IQ test), but it uses larger grids and up to 10 unique symbols. Grids can go as large as 30x30.
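ARC tasks are distributed as JSON: each task is a handful of input/output grid pairs plus test inputs, and each grid is just a small 2D array of integers 0-9 (rendered as colors). Something like this (a hand-made toy task, not one from the actual corpus):

```python
# A toy ARC-style task in the corpus's JSON-like shape (hand-made example, not from ARC itself).
task = {
    "train": [
        {"input":  [[0, 1], [1, 0]],
         "output": [[1, 0], [0, 1]]},   # rule in this toy task: swap 0s and 1s
        {"input":  [[1, 1], [0, 0]],
         "output": [[0, 0], [1, 1]]},
    ],
    "test": [
        {"input":  [[0, 0], [0, 1]]}    # solver must infer the rule and produce the output
    ],
}
solve = lambda grid: [[1 - cell for cell in row] for row in grid]
print(solve(task["test"][0]["input"]))  # -> [[1, 1], [1, 0]]
```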

The test is specifically designed to be resistant to memorization, and to require the test-taker to try new ideas.

In the discussion here (video between François Chollet and Dwarkesh Patel), they discuss how current large language models (LLMs) are currently doing essentially a lot of memorization.

I found it a fascinating discussion. If you study data science, one of the very first things you learn is the concept of "overfitting", where instead of learning the "general pattern", your model essentially memorizes the input points. Such a model does a bad job on input data points it has not seen before, or that aren't close enough to data points it has seen before.

One of the mysteries of neural networks is how, as you make the models larger and larger, they don't overfit, but continue to "generalize", to learn general patterns.

However, it seems like, even though today's large language models (LLMs) don't overfit in the traditional statistical sense, they nonetheless rely heavily on memorization. You can ask an LLM questions from various tests made for humans, like the US Medical Licensing Exam (USMLE), and it can outperform most humans, but it does so by relying on a vast amount of memorized input patterns.

If you give LLMs problems that are different enough from the input they have memorized, they will be unable to solve them, even if those same problems are easy for humans, even human children, to solve using simple reasoning.

Such is the claim made by François Chollet, and he and his collaborators are willing to put money on the line, offering $1 million in prize money to anyone who can make a neural network model that can beat the test. Apparently the test was originally invented in 2019, and while LLMs have seen dramatically increasing scores on other tests made for humans, there's only been slight improvement on the ARC test.

In the discussion, they talk a lot about "system 1" and "system 2". These terms come from Daniel Kahneman who hypothesized that the brain has a "system 1" that does its thinking in a fast, automatic, intuitive, effortless way, and a "system 2" that is slow, deliberate, conscious, and the opposite of effortless which I guess would be effortful, demanding of effort, and which is required to solve complex problems requiring careful reasoning. François Chollet hypothesizes that humans always use a combination of "system 1" and "system 2" and are not pure "system 1" or "system 2" thinkers. And this simple fact enables humans, even human children with relatively little memorized knowledge, to engage in reasoning beyond what LLMs are capable of.

I find that an intriguing concept because, subjectively, it seems like while LLMs are sometimes astonishingly brilliant, they also sometimes make surprising mistakes, and their knowledge often seems to be shallow, getting the "surface style" exactly right initially but floundering if you try to dig too deep underneath it. So subjectively, it does seem like maybe a phenomenon analogous somehow to "overfitting" is actually taking place, though it's hard to pin down exactly what it is.

It will be interesting to see if anyone steps up to the plate and claims the $1 million prize any time soon.

Francois Chollet - LLMs won’t lead to AGI - $1,000,000 Prize to find true solution -Dwarkesh Patel

#solidstatelife #ai #agi