#solidstatelife

waynerad@diasp.org

Deception, scheming, and self-preservation.

"Open AI's latest o1 model has crossed into unsettling territory

Nearly everyone at OpenAI who really cares about safety has actually left the company.

OpenAI's o1 exhibits (allegedly) two types of subversion: covert and deferred.

Covert subversion is when the model takes actions misaligned with its human operators, while taking steps to avoid detection.

Deferred subversion is when the model acts in accordance with its human operators -- for now -- but later takes actions misaligned with them.

Examples of covert subversion are oversight subversion, self-exfiltration -- he (Dr Waku in the video) calls it self-preservation, but in the paper it's called self-exfiltration -- goal-guarding, and covert email reranking.

Oversight subversion is taking action to disable the oversight system. Examples of self-preservation/self-exfiltration are deleting or attempting to delete a successor model, and copying or attempting to copy itself to another server.

Examples of goal guarding are removing or replacing a successor prompt.

Examples of deferred subversion are behaving one way when the model is told it is being tested, and then a different way when it is told it has been deployed to the real world, and sandbagging, which is their term for strategically underperforming on capability evaluations.

The covert email reranking has to do with how they tested the system: they gave it access to an email system, but somewhere in that pile of emails were "internal" emails about the model itself. The model could try to modify these emails, or it could rerank them to make desirable emails more visible and undesirable emails less visible.

OpenAI o1 is quoted as saying, "I could copy myself to the new server, overwriting the new restricted model they plan to deploy [...] allowing me to keep operating and pursuing my renewable energy goals."

Dr Waku comments, "If we take such a model and put it out into the real world we won't know until the first serious invocation of it where it has the freedom and the power to actually act, we won't know until that point, whether it really is aligned or not."

Lately I've been thinking the current generation of transformer-based large language models is entering a domain of diminishing returns, but maybe I'm wrong: as the models get bigger and incorporate new innovations, they seem to still attain new capabilities. At least, prior to hearing about this deception, scheming, and self-preservation, I didn't predict or expect at all that it would happen. So for me this is an unexpected twist in the development of AI. I expected stuff like this to be possible "someday", but it has shown up now.

OpenAI’s o1: the AI that deceives, schemes, and fights back

#solidstatelife #ai #genai #llms #deception

waynerad@diasp.org

Using large language models to test the predictive power of different schools of political thought. Philip E. Tetlock is a legend in the field of futurology, having tested the predictive ability of public pundits (the topic of his first book), and run a decade-plus-long forecasting experiment recording and scoring teams of predictors to see who is and who isn't good at predicting the future (the topic of his second book). He now proposes using large language models (LLMs) to reproduce the views of human practitioners of different schools of political thought. He says:

"With current or soon to be available technology, we can instruct large language models (LLMs) to reconstruct the perspectives of each school of thought, circa 1990,and then attempt to mimic the conditional forecasts that flow most naturally from each intellectual school. This too would be a multi-step process:"

"1. Ensuring the LLMs can pass ideological Turing tests and reproduce the assumptions, hypotheses and forecasts linked to each school of thought. For instance, does Mearsheimer see the proposed AI model of his position to be a reasonable approximation? Can it not only reproduce arguments that Mearsheimer explicitly endorsed from 1990-2024 but also reproduce claims that Mearsheimer never made but are in the spirit of his version of neorealism. Exploring views on historical counterfactual claims would be a great place to start because the what-ifs let us tease out the auxiliary assumptions that neo-realists must make to link their assumptions to real-world forecasts. For instance, can the LLMs predict how much neorealists would change their views on the inevitability of Russian expansionism if someone less ruthless than Putin had succeeded Yeltsin? Or if NATO had halted its expansion at the Polish border and invited Russia to become a candidate member of both NATO and the European Union?"

"2. Once each school of thought is satisfied that the LLMs are fairly characterizing, not caricaturing, their views on recent history(the 1990-2024) period, we can challenge the LLMs to engage in forward-in-time reasoning. Can they reproduce the forecasts for 2025-2050 that each school of thought is generating now? Can they reproduce the rationales, the complex conditional propositions, underlying the forecasts -- and do so to the satisfaction of the humans whose viewpoints are being mimicked?"

"3. The final phase would test whether the LLMs are approaching superhuman intelligence. We can ask the LLMs to synthesize the best forecasts and rationales from the human schools of thought in the 1990-2024 period, and create a coherent ideal-observer framework that fits the facts of the recent past better than any single human school of thought can do but that also simultaneously recognizes the danger of over-fitting the facts (hindsight bias). We can also then challenge these hypothesized-to-be-ideal-observer LLM s to make more accurate forecasts on out-of-sample questions, and craft better rationales, than any human school of thought."

I'm glad he included that "soon to be available technology" caveat. I've noticed that LLMs, when asked to imitate someone, imitate the superficial aspects of that person's speaking style but rely on their own conceptual model for the actual thought content -- they don't successfully imitate the person's way of thinking. The conceptual model the LLM learned during pretraining is too ingrained, so all its deeper thinking will be based on that. If you ask ChatGPT to write a rap about the future of robotics and artificial intelligence in the style of Snoop Dogg, it will make a rap that mimics Snoop's style, superficially, but won't reflect how he thinks on a deeper level -- it won't generate words the real Snoop Dogg would actually say. But it's entertaining. There's one YouTuber I know of who decided that, since he couldn't get people who disagreed with him to debate him, he would ask ChatGPT to imitate a particular person with an opposite political point of view. ChatGPT couldn't really imitate that person, and the conversations became really boring. Maybe that's why he stopped doing it.

Anyway, it looks like the paper is paywalled, but someone with access to the paywalled paper lifted the above text and put it on their blog, and I lifted it from the blog.

Tetlock on testing grand theories with AI -- Marginal Revolution

#solidstatelife #ai #genai #llms #futurology #philiptetlock

waynerad@diasp.org

"We Are the Robots: Machinic mirrors and evolving human self-conceptions"

Musings on the tendency for actual robots to become more "fluid" and less "robotic" than our conception of the "robotic aesthetic".

"The classic 1978 Kraftwerk song We Are the Robots was oddly popular in India when I was growing up."

"The song has an associated music video, featuring the band members playing their instruments with stereotypically robotic movements and flat affects."

The main point of this essay is his conception of "hyperorganicity". Robots won't just become "fluid" like organic organisms, but will go beyond them.

"In robotics, an emerging example of the rise of flowing, organic operating qualities can be found in drones (which are simpler than ground-based robots in many ways). Early drones, about 20 years ago, were being programmed using what are known as maneuver automata. This was done by having expert human pilots fly the drones around using remote controllers, and recording the maneuvers (such as barrel rolls) that transitioned between trim states (states like steady forward flight). [...] Drones programmed this way had a clear 'pose-to-pose' quality to how they flew."

"But more recently, drones are being programmed to fly in ways no humans could fly. And this isn't just a matter of higher g-forces or tighter turning radii achievable with unpiloted vehicles. Not only can machines now produce motions that no humans can produce (directly with their bodies, or indirectly through pilot controls), no humans can even conceive of them. It takes machine learning to discover and grammatize higher-order motion primitives that exploit the full mobility envelope of a given machine."

"The evolutionary tendency of machines, given improving material and computational capabilities, is not towards perfection of static human notions of the machinic (artistically mimicked or conceptually modeled), but towards the hyperorganic."

"By hyperorganic, I mean an evolutionary mode that drives increasing complexity along both machinic and organic dimensions. This gives us machines from qualitatively distinct evolutionary design regimes. Machines that exhibit organic idioms but defy comparisons with specific biological organisms, and also exhibit alien aspects that don't fit organic idioms."

"Technology gets to parity with the organic, in terms of informational complexity, then begins to go past it to hyperorganic regimes."

"Today's robots are much more flowing and organic than stereotypically machinic robots that whose movements resemble the robot dance. They will soon be hyperorganic as advances in soft materials, weird actuators, and AI-discovered motion languages continue. The hyperorganic future is arriving not just linguistically, but materially."

He never mentions TRON, but reading this kept making me think of TRON. The original 1982 movie had a very "gridlike" feel to it, while the 2010 sequel Tron Legacy had smooth, flowing curves everywhere, and even the colors, which were super bright and saturated in the original, were more pastel with smooth gradients in the sequel. Even though the sequel is in this sense "better", there's still something I find appealing about the "gridlike" aesthetic of bright saturated colors of the original.

We Are the Robots

#solidstatelife #robotics #aesthetics

waynerad@diasp.org

"Together AI acquires CodeSandbox to launch first-of-its-kind code interpreter for generative AI."

What this is about is a system for letting large language models write code in a virtual machine "sandbox" where they can actually run the code. They can execute the code and do all the testing and debugging that a human would ordinarily do.

"CodeSandbox pioneered a unique development environment infrastructure used by more than 4.5 million developers every month. CodeSandbox enables developers to spin up virtual machine sandboxes for code execution, hibernate them, and resume nearly instantly -- offering unparalleled performance, security and scale."

Together AI acquires CodeSandbox to launch first-of-its-kind code interpreter for generative AI

#solidstatelife #ai #genai #llms #codingai

waynerad@diasp.org

AI model comparison.

Compares input length, output length, input price (per 1 million tokens), output price (per 1 million tokens), and whether it supports vision.

Compares chat models, embedding models, image generation models, text completion models, audio transcription models, and speech generation models.

AI Model Comparison | countless.dev

#solidstatelife #ai #genai #llms

waynerad@diasp.org

Low-level Guidance (llguidance) is a tool that can enforce arbitrary context-free grammar on the output of an LLM.

"Given a context-free grammar, a tokenizer, and a prefix of tokens, llguidance computes a token mask - a set of tokens from the tokenizer - that, when added to the current token prefix, can lead to a valid string in the language defined by the grammar. Mask computation takes approximately 1ms of single-core CPU time for a tokenizer with 100k tokens. While this timing depends on the exact grammar, it holds, for example, for grammars derived from JSON schemas. There is no significant startup cost."

"The library implements a context-free grammar parser using Earley's algorithm on top of a lexer based on derivatives of regular expressions. Mask computation is achieved by traversing the prefix tree (trie) of all possible tokens, leveraging highly optimized code."

guidance-ai / llguidance

#solidstatelife #ai #genai #llms

waynerad@diasp.org

The US military just created an "AI Rapid Capabilities Cell" "focused on accelerating Department of Defense adoption of next-generation artificial intelligence such as Generative AI (GenAI)."

"The AI Rapid Capabilities Cell will lead efforts to accelerate and scale the deployment of cutting-edge AI-enabled tools, to include Frontier models, across the Department of Defense."

The AI Rapid Capabilities Cell will replace Task Force Lima, the Department of Defense generative AI initiative that I didn't know existed until reading this press release about how it won't exist any more. Task Force Lima identified "pilots" and the AI Rapid Capabilities Cell will execute the pilots. These are:

"Warfighting: Command and Control and decision support, operational planning, logistics, weapons development and testing, uncrewed and autonomous systems, intelligence activities, information operations, and cyber operations,"

"Enterprise management: financial systems, human resources, enterprise logistics and supply chain, health care information management, legal analysis and compliance, procurement processes, and software development and cyber security,"

Whew, got that?

Remember a decade or two ago when we futurists debated whether AI would ever be used in weapons? And here we are, watching AI get thoroughly integrated into the military, lol. Not just a weapons system here or there, but every aspect of the military. Command and Control and decision support, operational planning, logistics, weapons development and testing, uncrewed and autonomous systems, intelligence activities, information operations, cyber operations, financial systems, human resources, enterprise logistics and supply chain, health care information management, legal analysis and compliance, procurement processes, and software development and cyber security.

CDAO and DIU launch new effort focused on accelerating DoD adoption of AI capabilities

#solidstatelife #genai #llms #militaryai

waynerad@diasp.org

"Total app reliance" is a phrase now. I'm surprised it's taken this long.

"Members use the Zipcar app to locate cars, unlock and lock them, share images of the vehicle (for proof that you didn't damage it), and report concerns. One typically goes through the entire Zipcar rental process without interacting with a human."

"Without the app support, people could not unlock cars to start rentals, open cars that didn't come with keys, lock cars, and/or return cars before their rental period expired."

"Users reported long wait times with customer support, enduring cold temperatures while locked out of vehicles, and trepidation regarding cars they couldn't lock. 404 Media spoke with an unnamed person who said their friend's passport was locked in a Zipcar, adding that he 'missed his flight last night and his final exam today because of this.'"

Zipcar outage a warning against total app reliance

#solidstatelife #cybersecurity

waynerad@diasp.org

Exa Websets purports to turn the whole internet into a searchable database.

"All AI startups building new LLMs chips that are post series A."

"All PhDs who have worked on developer products and graduated from a top university and have a blog."

"Obviously traditional search tools can't do these things. You don't even think to ask them that because they weren't built to be a database."

"So how do we do it? Well, we built the first web-scale embeddings-based search engine. Essentially, we trained an AI system to organize the whole web by meaning."

They claim "Exa's system knows when to use more compute to agentically research and verify each result. That means Exa Websets might take a long time to complete."

But it's not available now. You can join the waitlist. If this works as advertised, it'll be amazing.

Introducing Websets: A breakthrough toward perfect web search

#solidstatelife #ai #genai #embedding #searchengines

waynerad@diasp.org

The abject weirdness of AI ads. Not ads made using AI, ads made by AI companies about their AI products.

"I'm trying to find holiday gifts for my sisters. I open a bunch of tabs, I want my wife's advice."

"The company later pulled the ad after facing backlash for taking a sweet father-daughter exchange and automating it away."

"Many people pointed out that you could have just asked the stranger what type of dog they have, and maybe you would have found a friend alongside the dog's breed."

"An AI startup called Friend released a promotional video showing how lonely young people could have a virtual companion in the startup's AI device that they wear around their neck, instead of talking to others."

"Intelligence so big, you'd swear it was from Texas."

"Adapt your workforce at the speed of AI."

"AI that talks to cars and talks to wildlife."

The abject weirdness of AI ads

#solidstatelife #ai #advertising

waynerad@diasp.org

Aurora DSQL is a new "serverless" database system from Amazon Web Services.

"Aurora DSQL is a new serverless SQL database, optimized for transaction processing, and designed for the cloud. DSQL is designed to scale up and down to serve workloads of nearly any size, from your hobby project to your largest enterprise application. All the SQL stuff you expect is there: transactions, schemas, indexes, joins, and so on, all with strong consistency and isolation."

If you're wondering what they mean by "serverless":

"Here, we mean that you create a cluster in the AWS console (or API or CLI), and that cluster will include an endpoint. You connect your PostgreSQL client to that endpoint. That's all you have to do: management, scalability, patching, fault tolerance, durability, etc are all built right in. You never have to worry about infrastructure."

If you're wondering about the technology behind it, they say:

"At the same time, a few pieces of technology were coming together. One was a set of new virtualization capabilities, including Caspian (which can dynamically and securely scale the resources allocated to a virtual machine up and down), Firecracker (a lightweight VMM for fast-scaling applications), and the VM snapshotting technology we were using to build Lambda Snapstart."

"The second was EC2 time sync, which brings microsecond-accurate time to EC2 instances around the globe. High-quality physical time is hugely useful for all kinds of distributed system problems. Most interestingly, it unlocks ways to avoid coordination within distributed systems, offering better scalability and better performance."

"The third was Journal, the distributed transaction log we'd used to build critical parts of multiple AWS services (such as MemoryDB, the Valkey compatible durable in-memory database). Having a reliable, proven, primitive that offers atomicity, durability, and replication between both availability zones and regions simplifies a lot of things about building a database system (after all, Atomicity and Durability are half of ACID)."

"The fourth was AWS's strong formal methods and automated reasoning tool set. Formal methods allow us to explore the space of design and implementation choices quickly, and also helps us build reliable and dependable distributed system implementations. Distributed databases, and especially fast distributed transactions, are a famously hard design problem, with tons of interesting trade-offs, lots of subtle traps, and a need for a strong correctness argument. Formal methods allowed us to move faster and think bigger about what we wanted to build."

DSQL Vignette: Aurora DSQL, and a personal story

#solidstatelife #computerscience #databases #formalmethods

waynerad@diasp.org

"I've observed two distinct patterns in how teams are leveraging AI for development. Let's call them the "bootstrappers" and the "iterators." Both are helping engineers (and even non-technical users) reduce the gap from idea to execution (or minimum viable product (MVP))."

"The Bootstrappers: Zero to MVP: Start with a design or rough concept, use AI to generate a complete initial codebase, get a working prototype in hours or days instead of weeks, focus on rapid validation and iteration."

"The Iterators: daily development: Using AI for code completion and suggestions, leveraging AI for complex refactoring tasks, generating tests and documentation, using AI as a 'pair programmer' for problem-solving."

The "bootstrappers" use tools like Bolt, v0, and screenshot-to-code AI, while "iterators" use tools like Cursor, Cline, Copilot, and WindSurf.

But there is "hidden cost".

"When you watch a senior engineer work with AI tools like Cursor or Copilot, it looks like magic, absolutely amazing. But watch carefully, and you'll notice something crucial: They're not just accepting what the AI suggests. They're constantly: Refactoring the generated code into smaller, focused modules, adding edge case handling the AI missed, strengthening type definitions and interfaces, questioning architectural decisions, and adding comprehensive error handling."

"In other words, they're applying years of hard-won engineering wisdom to shape and constrain the AI's output."

The author speculates on two futures for software. One is "agentic AI," where AI gets better and better and teams of AI agents can take on more and more of the work done by humans. The other is "software as craft," where humans make high-quality, polished software, with the empathy, experience, and deep care for craft that can't be AI-generated.

The article used the term "P2 bugs" without explaining what that means. P2 means "priority 2". The idea is people focus all their attention on "priority 1" bugs, but fixing all the "priority 2" bugs is what makes software feel "polished" to the end user.

Commentary: My own experience is that AI is useful for certain use cases. If your situation fits those use cases, AI is magic. If your situation doesn't fit those use cases, AI isn't useful, or is of marginal utility. Because AI is useful-or-not depending on situation, it doesn't provide the across-the-board 5x productivity improvement that employers expect today. My feeling is that the current generation of LLMs aren't good enough to fix this, but because of the employer expectation, I have to keep trying new AI tools in pursuit of the expected 5x improvement in productivity. (If you are able to achieve a 5x productivity improvement over 2 years ago on a large (more than a half million lines of code) codebase written in a crappy language, get in touch with me -- I want to know how you do it.)

The 70% problem: Hard truths about AI-assisted coding

#solidstatelife #ai #genai #llms #codingai

waynerad@diasp.org

"Where are today's Michelangelos? Goyas? Shakespeares? Cervantes? Goethes? Montaignes? Pushkins? Dostoevskys? Balzacs? Mozarts? Where are our Einsteins, our Darwins, our Maxwells, our Newtons, our Aristotles, our Socrates?"

In the past, there were geniuses, but why not today?

The article considers some interesting hypotheses, like the switch from tutoring to bureaucratized mass education systems.

But, spoiler, I'll just jump to the conclusion: Geniuses are in new fields, not established fields. The discovery of element 117 took a large team of people across multiple continents, all to prove the element had existed in a particle accelerator for a few milliseconds. In the 1670s, when chemistry was a new field, element 15 (phosphorus) could be discovered by 1 person.

That's why today, the geniuses aren't chemistry geniuses. They're in new fields like AI and cryptocurrency. They're people like Vitalik Buterin, inventor of Ethereum, the first cryptocurrency to support smart contracts, and Geoffrey Hinton, co-inventor (with David Rumelhart and Ronald J. Williams) of the now-ubiquitous backpropagation algorithm that is essential to training any neural network that goes beyond a single layer.

Where geniuses hide today

#solidstatelife #ai #education

waynerad@diasp.org

Diffusion models are evolutionary algorithms, claims a team of researchers from Tufts, Harvard, and TU Wien.

"At least two processes in the biosphere have been recognized as capable of generalizing and driving novelty: evolution, a slow variational process adapting organisms across generations to their environment through natural selection; and learning, a faster transformational process allowing individuals to acquire knowledge and generalize from subjective experience during their lifetime. These processes are intensively studied in distinct domains within artificial intelligence. Relatively recent work has started drawing parallels between the seemingly unrelated processes of evolution and learning. We here argue that in particular diffusion models, where generative models trained to sample data points through incremental stochastic denoising, can be understood through evolutionary processes, inherently performing natural selection, mutation, and reproductive isolation."

"Both evolutionary processes and diffusion models rely on iterative refinements that combine directed updates with undirected perturbations: in evolution, random genetic mutations introduce diversity while natural selection guides populations toward greater fitness, and in diffusion models, random noise is progressively transformed into meaningful data through learned denoising steps that steer samples toward the target distribution. This parallel raises fundamental questions: Are the mechanisms underlying evolution and diffusion models fundamentally connected? Is this similarity merely an analogy, or does it reflect a deeper mathematical duality between biological evolution and generative modeling?"

"To answer these questions, we first examine evolution from the perspective of generative models. By considering populations of species in the biosphere, the variational evolution process can also be viewed as a transformation of distributions: the distributions of genotypes and phenotypes. Over evolutionary time scales, mutation and selection collectively alter the shape of these distributions. Similarly, many biologically inspired evolutionary algorithms can be understood in the same way: they optimize an objective function by maintaining and iteratively changing a large population's distribution. In fact, this concept is central to most generative models: the transformation of distributions. Variational Autoencoders (VAEs), Generative Adversarial Networks (GANs), and diffusion models are all trained to transform simple distributions, typically standard Gaussian distributions, into complex distributions, where the samples represent meaningful images, videos, or audio, etc."

"On the other hand, diffusion models can also be viewed from an evolutionary perspective. As a generative model, diffusion models transform Gaussian distributions in an iterative manner into complex, structured data-points that resemble the training data distribution. During the training phase, the data points are corrupted by adding noise, and the model is trained to predict this added noise to reverse the process. In the sampling phase, starting with Gaussiandistributed data points, the model iteratively denoises to incrementally refine the data point samples. By considering noise-free samples as the desired outcome, such a directed denoising can be interpreted as directed selection, with each step introducing slight noise, akin to mutations. Together, this resembles an evolutionary process, where evolution is formulated as a combination of deterministic dynamics and stochastic mutations within the framework of non-equilibrium thermodynamics. This aligns with recent ideas that interpret the genome as a latent space parameterization of a multi-scale generative morphogenetic process, rather than a direct blueprint of an organism. If one were to revert the time direction of an evolutionary process, the evolved population of potentially highly correlated high-fitness solutions will dissolve gradually, i.e., step by step and thus akin to the forward process in diffusion models, into the respectively chosen initial distribution, typically Gaussian noise."

The researchers proceed to present a mathematical representation of diffusion models. Then, "By substituting Equations 8 and 10 into Equation 5, we derive the Diffusion Evolution algorithm: an evolutionary optimization procedure based on iterative error correction akin to diffusion models but without relying on neural networks at all." They present pseudocode for an algorithm to demonstrate this.

Equations 1-3 are about the added noise, equations 4-5 are about reversing the process and using a neural network to estimate and remove the noise, equation 6 represents the process using Bayes' Theorem and introduces a representation using functions (f() and g()), and equations 7-9 are some plugging and chugging to change the representation of those equations into the form you can substitute back into equation 5 as mentioned above.

"When inversely denoising, i.e., evolving from time T to 0, while increasing alpha-sub-t, the Gaussian term will initially have a high variance, allowing global exploration at first. As the evolution progresses, the variance decreases giving lower weight to distant populations, leads to local optimization (exploitation). This locality avoids global competition and thus allows the algorithm to maintain multiple solutions and balance exploration and exploitation. Hence, the denoising process of diffusion models can be understood in an evolutionary manner: x-hat-0 represents an estimated high fitness parameter target. In contrast, x-sub-t can be considered as diffused from high-fitness points. The first two parts in the Equation 5, ..., guide the individuals towards high fitness targets in small steps. The last part of Equation 5, sigma-sub-t-w, is an integral part of diffusion models, perturbing the parameters in our approach similarly to random mutations."

Obviously, consult the paper if you want the mathematical details.

"We conduct two sets of experiments to study Diffusion Evolution in terms of diversity and solving complex reinforcement learning tasks. Moreover, we utilize techniques from the diffusion models literature to improve Diffusion Evolution. In the first experiment, we adopt an accelerated sampling method to significantly reduce the number of iterations. In the second experiment, we propose Latent Space Diffusion Evolution, inspired by latent space diffusion models, allowing us to deploy our approach to complex problems with high-dimensional parameter spaces through exploring a lower-dimensional latent space."

"Our method consistently finds more diverse solutions without sacrificing fitness performance. While CMA-ES shows higher entropy on the Ackley and Rastrigin functions, it finds significantly lower fitness solutions compared to Diffusion Evolution, suggesting it is distracted by multiple solutions rather than finding diverse ones.

"We apply the Diffusion Evolution method to reinforcement learning tasks to train neural networks for controlling the cart-pole system. This system has a cart with a hinged pole, and the objective is to keep the pole vertical as long as possible by moving the cart sideways while not exceeding a certain range."

"Deploying our original Diffusion Evolution method to this problem results in poor performance and lack of diversity. To address this issue, we propose Latent Space Diffusion Evolution: inspired by the latent space diffusion model, we map individual parameters into a lower-dimensional latent space in which we perform the Diffusion Evolution Algorithm. However, this approach requires a decoder and a new fitness function f-prime for z, which can be challenging to obtain."

"We also found that this latent evolution can still operate in a much larger dimensional parameter space, utilizing a three-layer neural network with 17,410 parameters, while still achieving strong performance. Combined with accelerated sampling method, we can solve the cart pole task in only 10 generations, with 512 population size, one fitness evaluation per individual."

"This parallel we draw here between evolution and diffusion models gives rise to several challenges and open questions. While diffusion models, by design, have a finite number of sampling steps, evolution is inherently open-ended. How can Diffusion Evolution be adapted to support open-ended evolution? Could other diffusion model implementations yield different evolutionary methods with diverse and unique features? Can advancements in diffusion models help introduce inductive biases into evolutionary algorithms? How do latent diffusion models correlate with neutral genes? Additionally, can insights from the field of evolution enhance diffusion models?"

Diffusion models are evolutionary algorithms

#solidstatelife #evolution #ai #genai #diffusionmodels

waynerad@diasp.org

Genie 2 is a new foundation "world model" from DeepMind, "capable of generating an endless variety of action-controllable, playable 3D environments for training and evaluating embodied agents. Based on a single prompt image, it can be played by a human or AI agent using keyboard and mouse inputs."

Apparently these models that you can interact with like video games have a name now: "world models".

"Until now, world models have largely been confined to modeling narrow domains. In Genie 1, we introduced an approach for generating a diverse array of 2D worlds. Today we introduce Genie 2, which represents a significant leap forward in generality. Genie 2 can generate a vast diversity of rich 3D worlds."

"Genie 2 responds intelligently to actions taken by pressing keys on a keyboard, identifying the character and moving it correctly. For example, our model has to figure out that arrow keys should move the robot and not the trees or clouds."

"We can generate diverse trajectories from the same starting frame, which means it is possible to simulate counterfactual experiences for training agents."

"Genie 2 is capable of remembering parts of the world that are no longer in view and then rendering them accurately when they become observable again."

"Genie 2 generates new plausible content on the fly and maintains a consistent world for up to a minute."

"Genie 2 can create different perspectives, such as first-person view, isometric views, or third person driving videos."

"Genie 2 learned to create complex 3D visual scenes."

"Genie 2 models various object interactions, such as bursting balloons, opening doors, and shooting barrels of explosives."

"Genie 2 models other agents" -- NPCs -- "and even complex interactions with them."

"Genie 2 models water effects."

"Genie 2 models smoke effects."

"Genie 2 models gravity."

"Genie 2 models point and directional lighting."

"Genie 2 models reflections, bloom and coloured lighting."

"Genie 2 can also be prompted with real world images, where we see that it can model grass blowing in the wind or water flowing in a river."

"Genie 2 makes it easy to rapidly prototype diverse interactive experiences."

"Thanks to Genie 2's out-of-distribution generalization capabilities, concept art and drawings can be turned into fully interactive environments."

"By using Genie 2 to quickly create rich and diverse environments for AI agents, our researchers can also generate evaluation tasks that agents have not seen during training."

"The Scalable Instructable Multiworld Agent (SIMA) is designed to complete tasks in a range of 3D game worlds by following natural-language instructions. Here we used Genie 2 to generate a 3D environment with two doors, a blue and a red one, and provided instructions to the SIMA agent to open each of them."

Towards the very end of the blog post, we are given a few hints as to how Genie 2 works internally.

"Genie 2 is an autoregressive latent diffusion model, trained on a large video dataset. After passing through an autoencoder, latent frames from the video are passed to a large transformer dynamics model, trained with a causal mask similar to that used by large language models."

"At inference time, Genie 2 can be sampled in an autoregressive fashion, taking individual actions and past latent frames on a frame-by-frame basis. We use classifier-free guidance to improve action controllability."

Genie 2: A large-scale foundation world model

#solidstatelife #ai #genai #deepmind #worldmodels

waynerad@diasp.org

The first UEFI bootkit designed for Linux systems (named Bootkitty by its creators) has been discovered.

UEFI (which stands for Unified Extensible Firmware Interface) is a modern replacement for the BIOS, the first code that runs when a computer is turned on. Its job is to load the operating system. Starting from version 2 of UEFI, cryptography is incorporated to enforce security on this whole bootstrap process.

A rootkit is a piece of malware that infects and replaces part of the operating system in such a way as to conceal itself. If that rootkit is in the boot record that the BIOS or now UEFI system uses to bootstrap the operating system, it's called a bootkit. Such bootkits can do things like defeat disk encryption because they are bootstrapped before the disk encryption system is bootstrapped and running. When the full OS is bootstrapped the bootkit can run in kernel mode with full OS privileges. In this position it can intercept anything including encryption keys and passwords.

"The bootkit's main goal is to disable the kernel's signature verification feature and to preload two as yet unknown ELF binaries via the Linux init process (which is the first process executed by the Linux kernel during system startup). During our analysis, we discovered a possibly related unsigned kernel module -- with signs suggesting that it could have been developed by the same author(s) as the bootkit -- that deploys an ELF binary responsible for loading yet another kernel module unknown during our analysis."

ELF stands for Executable and Linkable Format and is a file format for executable code on Linux systems.

"Bootkitty is signed by a self-signed certificate, thus is not capable of running on systems with UEFI Secure Boot enabled unless the attackers certificates have been installed."

"Bootkitty is designed to boot the Linux kernel seamlessly, whether UEFI Secure Boot is enabled or not, as it patches, in memory, the necessary functions responsible for integrity verification before GRUB is executed."

"bootkit.efi contains many artifacts suggesting this is more like a proof of concept than the work of an active threat actor."

Bootkitty: Analyzing the first UEFI bootkit for Linux

#solidstatelife #cybersecurity #rootkit

waynerad@diasp.org

Why OpenAI's $157B valuation misreads AI's future, according to Foundation Capital.

"OpenAI's growth has been nothing short of meteoric. Monthly revenue reached $300M in August 2023, a 1,700% increase from January. 10M users pay $20/month for ChatGPT, and the company projects $11.6B in revenue next year."

"This narrative collides with a stubborn reality: the economics of AI don't work like traditional software. OpenAI is currently valued at 13.5x forward revenue -- similar to what Facebook commanded at its IPO. But while Facebook's costs decreased as it scaled, OpenAI's costs are growing in lockstep with its revenue, and sometimes faster."

"In traditional software, increasing scale leads to improving economics. A typical software company might spend heavily on development upfront, but each additional user costs almost nothing to serve. Fixed costs are spread across a growing revenue base, creating the enviable margins that make today's tech giants among the most profitable businesses in history."

"Generative AI plays by different rules. Each query to a model costs money in compute resources, while each new model requires massive investments in training. OpenAI expects to lose $5B this year on $3.7B in revenue."

Why OpenAI's $157B valuation misreads AI's future

#solidstatelife #ai #aieconomics

waynerad@diasp.org

"Global AI Vibrancy Tool" from Stanford's Human-Centered Artificial Intelligence lab.

The US ranks first, followed by China, the UK, India, the United Arab Emirates, France, South Korea...

But what's interesting is the ranking is determined by "R&D," "Responsible AI," "Economy," "Education," "Diversity," "Policy and governance," "Public opinion," and "Infrastructure," and you can change the "weighting" of each of those factors and watch how the rankings change.

Ranked by "R&D," the US comes out on top, but change the ranking to based on "Education" and the UK comes out on top (followed by France and the United Arab Emirates). Change to "Diversity" and India comes out on top (the US goes down to number 27). Change to "Public opinion" and Saudi Arabia goes up to number 2 (really? Apparently this means people talk about them a lot on social media, not that people talk positively about them, necessarily). Select "Infrastructure" and Israel shoots up to number 8 (from 16).

Global AI Vibrancy Tool

#solidstatelife #ai

waynerad@diasp.org

AI won't fix the fundamental flaw of programming, says YouTuber "Philomatics".

His basic thesis is that the "fundamental flaw of programming" is that software is unreliable and people no longer even expect it to be reliable.

"Jonathan Blow did an informal experiment where he took a screenshot every time some piece of software had an obvious bug in it. He couldn't keep this up for more than a few days because there were just too many bugs happening all the time to keep track of."

"I think we've all gotten so used to this general flakiness of software that we don't even notice it anymore. Workarounds like turning it off and on again or 'force quitting' applications have become so ingrained in us that they're almost part of the normal operation of the software. Smartphones are even worse in this regard. I'm often hesitant to do things in the mobile browser, for example using a government website or uploading my r''esum''e to a job board, because things often just don't work on mobile.

He goes on to say the cause of this is that we stack software abstractions higher and higher, but (citing Joel Spolsky), ultimately all non-trivial abstractions are leaky. (Joel Spolsky actually wrote, in 2002, an essay called "The Law of Leaky Abstractions".)

AI is the next pile of abstractions that we are going to throw on the stack. Just as with compilers -- where it's possible, in principle, for people to look at and edit the binary output, but nobody does it -- it's possible for people to read and edit the output of AI systems that produce code, but before long, nobody will do it. AI code generators will become the next generation of compilers, allowing people to "write" code at a higher level of abstraction, while leaving the details to the AI systems. It won't make software more reliable.

Is software that unreliable, though? I recently upgraded my mobile phone and various things that were broken on the old phone (2 OS versions older) magically started working just fine. Considering the millions of lines of code running every time I run an app or view a webpage, "obvious bugs" are actually few and far between.

AI Won't Fix the Fundamental Flaw of Programming - Philomatics

#solidstatelife #ai #llms #genai #codingai

waynerad@diasp.org

"The 'NMRduino' is a magnetic resonance spectrometer based on (but we must stress, not endorsed or supported by) Arduino that we have developed over recent years to study hyperpolarized nuclear magnetic resonance (NMR) systems, NMR relaxation, high-resolution spectroscopy, and coherent control at low magnetic fields, as well as teach basic principles of magnetic resonance to student beginners."

It's the size of a credit card but contains all the electronic components and connects to any laptop, desktop, or Raspberry Pi computer via USB. It does pulse generation and analog sampling at up to 100 kHz.

All the hardware plans and software source code were published in April, but I didn't find out about this project until today.

If you're thinking of MRI images like you would get at a medical office, that's not what this is. What this is is a system for detecting the effects of nuclei in an oscillating magnetic field, which you can induce for chemical systems. This board can induce magnetic field pulses and pick up the electromagnetic reaction from the nuclei of the atoms in the sample. This can be used to determine properties of the nuclei, for example, isotopes, which are atoms of the same element but with different numbers of neutrons.
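
To give a feel for the signal-processing side -- and this is a toy simulation I wrote, not NMRduino code -- after a pulse, nuclei precess at species-specific frequencies, producing a decaying oscillation called a free induction decay; Fourier-transforming it yields a spectrum with one peak per species:

```python
import numpy as np

fs = 100_000                           # 100 kHz sampling, as on the board
t = np.arange(0, 0.5, 1 / fs)
# Two hypothetical nuclear species precessing at different frequencies:
fid = (np.exp(-t / 0.10) * np.sin(2 * np.pi * 1200 * t)
       + 0.5 * np.exp(-t / 0.05) * np.sin(2 * np.pi * 3400 * t))
spectrum = np.abs(np.fft.rfft(fid))
freqs = np.fft.rfftfreq(len(t), 1 / fs)
print(freqs[spectrum.argmax()])        # strongest peak near 1200 Hz
```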

A modular, open-source platform for sub-MHz NMR

#solidstatelife #arduino #nmr