#futurology

waynerad@diasp.org

Using large language models to test the predictive power of different schools of political thought. Philip E. Tetlock is a legend in the field of futurology, having tested the predictive ability of public pundits (the topic of his first book), and having run a decade-plus-long forecasting experiment recording and scoring teams of predictors to see who is and who isn't good at predicting the future (the topic of his second book). He now proposes using large language models (LLMs) to reproduce the perspectives of human practitioners of different schools of political thought. He says:

"With current or soon to be available technology, we can instruct large language models (LLMs) to reconstruct the perspectives of each school of thought, circa 1990,and then attempt to mimic the conditional forecasts that flow most naturally from each intellectual school. This too would be a multi-step process:"

"1. Ensuring the LLMs can pass ideological Turing tests and reproduce the assumptions, hypotheses and forecasts linked to each school of thought. For instance, does Mearsheimer see the proposed AI model of his position to be a reasonable approximation? Can it not only reproduce arguments that Mearsheimer explicitly endorsed from 1990-2024 but also reproduce claims that Mearsheimer never made but are in the spirit of his version of neorealism. Exploring views on historical counterfactual claims would be a great place to start because the what-ifs let us tease out the auxiliary assumptions that neo-realists must make to link their assumptions to real-world forecasts. For instance, can the LLMs predict how much neorealists would change their views on the inevitability of Russian expansionism if someone less ruthless than Putin had succeeded Yeltsin? Or if NATO had halted its expansion at the Polish border and invited Russia to become a candidate member of both NATO and the European Union?"

"2. Once each school of thought is satisfied that the LLMs are fairly characterizing, not caricaturing, their views on recent history(the 1990-2024) period, we can challenge the LLMs to engage in forward-in-time reasoning. Can they reproduce the forecasts for 2025-2050 that each school of thought is generating now? Can they reproduce the rationales, the complex conditional propositions, underlying the forecasts -- and do so to the satisfaction of the humans whose viewpoints are being mimicked?"

"3. The final phase would test whether the LLMs are approaching superhuman intelligence. We can ask the LLMs to synthesize the best forecasts and rationales from the human schools of thought in the 1990-2024 period, and create a coherent ideal-observer framework that fits the facts of the recent past better than any single human school of thought can do but that also simultaneously recognizes the danger of over-fitting the facts (hindsight bias). We can also then challenge these hypothesized-to-be-ideal-observer LLM s to make more accurate forecasts on out-of-sample questions, and craft better rationales, than any human school of thought."

I'm glad he included that "soon to be available technology" caveat. I've noticed that LLMs, when asked to imitate someone, imitate the superficial aspects of their speaking style, but rely on the language model's own conceptual model for the actual thought content -- they don't successfully imitate that person's way of thinking. The conceptual model the LLM learned during its pretraining is too ingrained, so all its deeper thinking will be based on it. If you ask ChatGPT to write a rap about the future of robotics and artificial intelligence in the style of Snoop Dogg, it will make a rap that mimics Snoop's style, superficially, but won't reflect how he thinks on a deeper level -- it won't generate words the real Snoop Dogg would actually say. But it's entertaining. There's one YouTuber I know of who decided that, since he couldn't get people who disagreed with him to debate him, he would ask ChatGPT to imitate a particular person with an opposing political point of view. ChatGPT couldn't really imitate that person, and the conversations became really boring. Maybe that's why he stopped doing that.

Anyway, it looks like the paper is paywalled, but someone with access to the paywalled paper lifted the above text and put it on their blog, and I lifted it from the blog.

Tetlock on testing grand theories with AI -- Marginal Revolution

#solidstatelife #ai #genai #llms #futurology #philiptetlock

waynerad@diasp.org

"Some experts give the Voyagers only about five years before we lose contact."

"The probes are running critically short of electricity from what are called their 'nuclear batteries' -- actually radioisotope thermoelectric generators that make electricity from the radioactive decay of plutonium. The fading power of the probes and the difficulties of making contact over more than 10 billion miles means that, one day soon, one or other of the Voyagers won't answer NASA's daily attempts to communicate via the Deep Space Network of radio dishes. Both probes use heaters to keep key instruments warm and keep the hydrazine in the fuel lines liquid: When the fuel freezes up, the probes won't be able to use their thrusters to keep their main radio antennae pointed at the Earth, and their communications will come to an end."

Voyagers ready to go dark

#futurology #astronomy #nasa #voyager

waynerad@diasp.org

"What is the Earth's carrying capacity?" This from the same guy who says desalination is more feasible than people think.

"Weirdly, most analysts believe the Earth is already at its carrying capacity! Look at the graph above: the mode1 is at 8B people!"

"What is the likelihood that we're just at the limit of the Earth? Very low. My immediate reaction to this is: Most analysts simply lack imagination. They see the current world, notice that there are some problems, and conclude that we're hitting our limits."

He goes on to cite The Limits to Growth, the 1972 report that raised the alarm about the Earth's population growth rate and predicted humanity would run out of many resources as it grew.

"So far, humans have not run out of a single resource."

He then goes on to look at the Planetary Boundaries report, which I myself posted here last September.

"A team of 28 academics have come up with the Planetary Boundaries, a series of nine processes that threaten to collapse under the weight of humanity's impact. From what I've seen, it's the most serious attempt to quantify what can go wrong. According to the team's latest report in 2023, six of the boundaries have already been transgressed."

He goes through these one by one -- well, presumably; only the first 2 are outside the paywall. Those are CO2 and deforestation. On both he says the situation is less dire than the alarmists would have you believe, and that the Earth's carrying capacity is higher than people think.

What is the Earth's carrying capacity?

#futurology #environment #carryingcapacity

waynerad@diasp.org

The future of polling: AI chatbots predict elections better than humans.

"In a closely watched New York Democratic primary in June, centrist George Latimer ousted incumbent Jamaal Bowman by a wide margin of 58.7% to 41.3%."

"Ahead of the vote, two 19-year-old college dropouts in Manhattan conducted a poll that accurately predicted the results within 371 votes. Their secret? They didn't survey a single person. Instead, they asked thousands of AI chatbots which candidate they preferred."

"For election results, a seven-person company called Aaru uses census data to replicate voter districts, creating AI agents essentially programmed to think like the voters they are copying. Each agent is given hundreds of personality traits, from their aspirations to their family relationships. The agents are constantly surfing the internet and gathering information meant to mimic the media diets of the humans they're replicating, which sometimes causes them to change their voting preferences."

A disturbing implication of this is that all of our voting choices are a more-or-less straightforward consequence of our media diets. So the real choice isn't who to vote for, it's what media to pay attention to.
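As described, this sounds like persona-based agent simulation. Here's a toy sketch of the idea (entirely my own illustration; Aaru's actual system is proprietary, and every name and prompt here is hypothetical):

```python
# Toy sketch of LLM-agent polling: sample synthetic voter personas in
# proportion to census demographics, ask each persona who it prefers,
# and tally the answers.

import random
from collections import Counter

def ask_llm(prompt: str) -> str:
    raise NotImplementedError("stand-in for a real LLM API call")

def make_persona(demographics):
    # demographics maps a trait name to {value: population share},
    # e.g. {"age bracket": {"18-29": 0.2, "30-44": 0.25, ...}, ...}
    return {trait: random.choices(list(dist), weights=list(dist.values()))[0]
            for trait, dist in demographics.items()}

def poll(demographics, candidates, n_agents=1000):
    votes = Counter()
    for _ in range(n_agents):
        persona = make_persona(demographics)
        prompt = (f"You are a voter with these traits: {persona}. "
                  f"Which of these candidates do you prefer: "
                  f"{', '.join(candidates)}? Answer with just the name.")
        votes[ask_llm(prompt)] += 1
    return votes
```

The article's version adds a step this sketch omits: the agents continuously browse the web to mimic their humans' media diets before answering.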

No people, no problem: AI chatbots predict elections better than humans

#solidstatelife #ai #genai #llms #futurology

waynerad@diasp.org

"The Techno-Humanist Manifesto: A new philosophy of progress for the 21st century" by Jason Crawford.

"We live in an age of wonders. To our ancient ancestors, our mundane routines would seem like wizardry: soaring through the air at hundreds of miles an hour; making night bright as day with the flick of a finger; commanding giant metal servants to weave our clothes or forge our tools; mixing chemicals in vast cauldrons to make a fertilizing elixir that grants vigor to crops; viewing events or even holding conversations from thousands of miles away; warding off the diseases that once sent half of children to an early grave. We build our homes in towers that rise above the hills; we build our ships larger and stronger than the ocean waves; we build our bridges with skeletons of steel, to withstand wind and storm. Our sages gaze deep into the universe, viewing colors the eye cannot see, and they have discovered other worlds circling other Suns."

And yet we live in a time of more depression and anxiety disorders than ever before in human history, which he doesn't mention. But he does say:

"But not everyone agrees that the advancement of science, technology, and industry has been such a good thing. 'Is 'Progress' Good for Humanity?' asks a 2014 Atlantic article, saying that 'the Industrial Revolution has jeopardized humankind's ability to live happily and sustainably upon the Earth.' In Guns, Germs, and Steel, a grand narrative of civilizational advancement, author Jared Diamond disclaims the assumption 'that the abandonment of the hunter-gatherer lifestyle for iron-based statehood represents 'progress,' or that it has led to an increase in human happiness.' Diamond also called agriculture 'the worst mistake in the history of the human race' and 'a catastrophe from which we have never recovered,' adding that this perspective demolishes a 'sacred belief: that human history over the past million years has been a long tale of progress.' Historian Christopher Lasch is even less charitable, asking: 'How does it happen that serious people continue to believe in progress, in the face of massive evidence that might have been expected to refute the idea of progress once and for all?' Economic growth is called an 'addiction,' a 'fetish,' a 'Ponzi scheme,' a 'fairy tale.' There is even a 'degrowth' movement advocating economic regress as an ideal."

"With so little awareness of progress, and so much despair for the future, our society is unable to imagine what to build or to dream of where to go. As late as the 1960s, Americans envisioned flying cars, Moon bases, and making the desert bloom using cheap, abundant energy from nuclear power." "Today we hope, at best, to avoid disaster: to stop climate change, to prevent pandemics, to stave off the collapse of democracy."

"This is not merely academic. If society believes that scientific, technological and industrial progress is harmful or dangerous, people will work to slow it down or stop it."

"Even where the technical challenges have long been solved, we seem unable to build or to operate. The costs of healthcare, education, and housing continue to rise.30 Energy projects, even 'clean' ones, are held up for years by permitting delays and lack of grid connections.31 California's high-speed rail, now decades in the making, has already cost billions of dollars and is still years away from completing even an initial operating segment, which will not provide service to either LA or San Francisco."

This is an interesting point. Technological advancement should make everything cheaper and faster while still being just as good or better in terms of quality. But since that's not happening, at least in certain sectors, it would appear the weight of human bureaucracy can slow or prevent technological progress.

"On the horizon, powerful new technologies are emerging, intensifying the debate over technology and progress. Robotaxis are doing business on city streets; mRNA can create vaccines and maybe soon cure cancers; there's a renaissance in both supersonic flight and nuclear energy.34 SpaceX is landing reusable rockets, promising to enable the space economy, and testing an enormous Starship, promising to colonize Mars. A new generation of founders have ambitions in atoms, not just bits: manufacturing facilities in space, net-zero hydrocarbons synthesized with solar or nuclear power, robots that carve sculptures in marble.35 Most significantly, LLMs have created a general kind of artificial intelligence -- which, depending on who you ask, is either the next big thing in the software industry, the next general-purpose technology to rival the steam engine or the electric generator, the next age of humanity after agriculture and industrialization, or the next dominant species that will replace humanity altogether."

"The world needs a moral defense of progress based in humanism and agency -- that is, one that holds human life as its standard of value, and emphasizes our ability to shape the future. This is what I am calling 'techno-humanism': the idea that science, technology and industry are good -- because they promote human life, well-being, and agency."

OK, so, if I understand this guy's premise correctly: the fact that depression and anxiety are at an all-time high, apparently as a reaction to previous generations of technology, is not something we should worry about, because while technology always creates new problems, yet more technology always solves them. So it is just a matter of time before solutions to the current depression and anxiety problems are found, and maybe they will involve new technologies like AI.

Those of you who have been following me for a while know a lot of what I predict is based on my experience of disillusionment brought about by the internet. In the mid-to-late 90s, I was one of those people who thought the internet would be a "democratizing" force, empowering the little people, and bringing mutual understanding between people from different walks of life. Instead, it has proven to be a "centralizing" force, with a small handful of giant tech companies dominating the landscape, with economic power concentrated in those same tech companies, and the "little people" being worse off as inequality becomes vaster and vaster. And the vast increase in communications bandwidth hasn't brought people from different walks of life to any mutual understanding -- people are getting along worse, not better, and our society is more polarized than it ever was.

As the old saying goes: fool me once, shame on you; fool me twice, shame on me. So I always feel distrustful of any utopian claims for future technology. The rule I tend to follow is: if we're talking about technological capabilities, I'm an extreme "optimist" -- I think technological capabilities will continue to advance, even past the point where technology is capable of everything humans are capable of -- but if we're talking about social outcomes, I'm an extreme "pessimist" -- I think technology never solves problems rooted in human nature. Give humans infinite communication bandwidth, and you don't get mutual understanding and harmony. If people don't get along, people don't get along, and that's all there is to it. People have to solve "people" problems. Technology doesn't solve "people" problems.

The first 5 installments have been written and they're all pretty interesting. I'm just responding here to "The Present Crisis" introduction. I may or may not comment on later installments (not promising anything). I encourage you all to read it for yourself.

Announcing The Techno-Humanist Manifesto | The Roots of Progress

#solidstatelife #ai #environment #sociology #philosophy #futurology

waynerad@diasp.org

The hexagon of ideas: After X, what's next?

After X is invented, how can you use it to invent new stuff? And remember, you don't have to have invented X to use it.

The techniques suggested here are:

Generalizing to another dimension (X to Xd).

Fusion of the dissimilar (X + Y).

Finding all nails for a hammer (X).

Finding all hammers for a nail (X).

Adding an adjective (X++).

Doing exactly the opposite (X).

The Hexagon of Ideas

#futurology #invention #innovation

waynerad@diasp.org

Online poll of 18-30-year-old registered voters: "64% backed the statement that 'America is in decline.' A whopping 65% agreed either strongly or somewhat that 'nearly all politicians are corrupt, and make money from their political power' -- only 7% disagreed."

The lead pollster said, "Young voters do not look at our politics and see any good guys. They see a dying empire led by bad people."

Yeah, I know, online polls are not so accurate. Still, not a vote of confidence from the upcoming generation.

'A dying empire led by bad people': Poll finds young voters despairing over US politics

#futurology #domesticpolitics #generations

waynerad@diasp.org

"The 50-year-old petrodollar agreement between the US and Saudi Arabia expired on June 9, 2024."

"This expiration has far-reaching implications, as it has the potential to disrupt the global financial order."

"The petrodollar agreement, formalized after the 1973 oil crisis, stipulated that Saudi Arabia would price its oil exports exclusively in US dollars and invest its surplus oil revenues in US Treasury bonds. In return, the US provided military support and protection to the kingdom."

"By mandating that oil be sold in US dollars, the agreement elevated the dollar's status as the world's reserve currency. This, in turn, has profoundly impacted the US economy."

#futurology #economics #dollar #reservecurrency

https://www.nasdaq.com/articles/us-saudi-petrodollar-pact-ends-after-50-years

waynerad@diasp.org

"Is the AI Revolution losing steam or is this just a media narrative?"

You know, subjectively, I find myself alternating between feeling excited and feeling bored with AI. New announcements come out, you see a demo like Sora, and it's mindblowing. And then 6 months later, announcements come out of similar technology or improvements, and I feel like, yawn. I'm not sure why this is. Maybe it's because of my experience using coding AI. I know programmers who say they're twice as productive. I've found AI tools to be mostly, but not entirely, useless for me. I'm willing to try anything, of course, because I need anything I can get to crank up my development speed.

It reminds me of when, in 2016, Elon Musk, looking at the rapid rate of AI progress, predicted Full Self Driving by 2017. Then in 2017, he predicted it for 2018. Then in 2018, he predicted it for 2019. And so on. Today, Tesla Full Self Driving is rumored to be just reliable enough to lull people into complacency, which is actually kind of dangerous. Nobody is ready to rip the steering wheel out of their car and declare they'd rather have the AI do all the driving because it's better than human drivers. But that's what "Full Self Driving" is supposed to mean.

There's an old saying in software development, "The first 90% of the project takes the first 90% of the time, and the last 10% of the project takes the other 90% of the time."

People look at AI's current ability to write code and predict real soon now it's going to be taking over all coding tasks for whole codebases and racing software development forward. But it seems like, just like Tesla Full Self Driving, we reached that 90% threshold really fast, and the remaining 10% is taking a long time. Today's generative AI models feel like that to me.

Yes, I know, OpenAI says they are seeing no sign they have reached the end of getting better models as they scale them up. Yes, AI image generators had problems a year ago, like not being able to draw fingers, that are pretty much completely solved now, so it's reasonable to expect today's video generators, music generators, code generators, and "AI agents" to get progressively better with time.

Anyway, this YouTube video explores a number of media narratives that the AI revolution is losing steam, and concludes (spoiler!) actually, no, the AI revolution is not losing steam. AI companies are gobbling up chips as fast as they can (not caring at all about the exorbitant cost), businesses everywhere are getting familiar with the technology and learning where it makes sense to use it to add business value, adoption is actually faster than the internet or any previous technology, and innovation in the field is continuing, so, no, the AI revolution is not losing steam.

Is the AI Revolution Losing Steam? - The AI Daily Brief: Artificial Intelligence News

#solidstatelife #ai #futurology

waynerad@diasp.org

People wanting only daughters are using in-vitro fertilization (IVF) for sex selection. Allegedly this is popular with software engineers in Silicon Valley.

"Old debates around sex selection focused on the wish for sons. Today in America, that preference is often reversed. One study found that white parents having a first child picked female embryos 70 percent of the time. (Parents of Indian and Chinese descent were more likely to pick boys.) Anecdotes back this up, with message boards filled with moms dreaming of a 'mini me.' A 2010 study showed that American adoptive parents were 30 percent more likely to prefer girls than boys and were willing to pay $16,000 more in finalization costs to ensure a daughter. Close looks at demographic data suggest that families with daughters tend to have fewer subsequent children than do families with sons, indicating a sense that a daughter is what makes a family complete."

It's illegal in most of the world. In America, new parents are embracing it -- for better or worse.

#futurology #demographics #ivf

waynerad@diasp.org

"Robert Dennard, father of DRAM, is deceased -- also known for his foundational Dennard scaling theory."

This obituary is worth noting for futurists because Dennard scaling is indeed a foundational theory, closely related to Moore's Law.

Dennard scaling, in short, is:

When you cut the linear dimensions of a digital circuit in half, you reduce the area to 1/4th of its original size (the area of a square is the side squared), which enables you to pack 4x as many transistors into that area; you reduce the capacitance of each transistor by half (capacitance scales with linear dimension); you cut the voltage in half; you reduce the current by half; you reduce the power consumption of each circuit to 1/4th (power is voltage times current), so power density stays constant; and you reduce the transition time by half, which enables you to double your "clock speed".
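Those relations are easy to sanity-check with a few lines of arithmetic. Here's a minimal sketch (my own illustration of the classic constant-field scaling rules, not anything from the obituary):

```python
# Classic Dennard scaling: shrink linear dimensions by a factor k
# (k = 2 means "cut in half") and see how the other circuit quantities
# scale. All values are relative to the original circuit.

def dennard_scaling(k):
    area = 1 / k**2                    # area goes as the square of linear size
    transistor_density = k**2          # k^2 times as many transistors per unit area
    capacitance = 1 / k                # capacitance scales with linear dimension
    voltage = 1 / k                    # constant-field scaling: V shrinks with size
    current = 1 / k                    # current shrinks with size too
    power_per_circuit = voltage * current        # P = V * I -> 1/k^2
    delay = capacitance * voltage / current      # t = C * V / I -> 1/k
    clock_speed = 1 / delay                      # -> k
    power_density = power_per_circuit * transistor_density  # -> constant
    return {"area": area, "transistor density": transistor_density,
            "capacitance": capacitance, "voltage": voltage,
            "current": current, "power per circuit": power_per_circuit,
            "delay": delay, "clock speed": clock_speed,
            "power density": power_density}

for name, value in dennard_scaling(2).items():
    print(f"{name}: {value:g}x")
```

Run it with k = 2 and you get exactly the numbers above: 1/4th the area, 4x the transistors, half the delay, double the clock speed, and constant power density.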

This might make you wonder, why in the mid-2000s did clock speed stop increasing and power consumption stop going down, even though transistors continued to get smaller? Well, I researched this question a few years ago, and the surprising answer is: they would have if we had been willing to make our chips colder and colder. To have continued Dennard scaling to the present day, we'd need, like, cryogenically frozen data centers. The relationship to temperature is that, if you don't drop the temperature, then your electrical signals have to overcome the random jiggling of the atoms in the circuit -- which is what temperature is, the average kinetic energy of the molecules in your material. The way you overcome the "thermal noise" this introduces into your electric circuit is with voltage. So, you can't drop your voltage, and you can't drop your power, and, as it turns out, if you can't drop your voltage and power you can't drop your transition time, so you can't double your clock speed.
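One quantitative footnote (my addition, not something from the obituary): the thermal noise in question is Johnson-Nyquist noise. The RMS noise voltage across a resistance $R$ at temperature $T$ over a bandwidth $\Delta f$ is

$$V_{\text{noise}} = \sqrt{4 \, k_B \, T \, R \, \Delta f}$$

Nothing in that formula shrinks when the transistor shrinks, so at a fixed temperature the noise floor stays put, and the signal voltage has to stay above it.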

Robert Dennard, father of DRAM, is deceased -- also known for his foundational Dennard scaling theory.

#solidstatelife #futurology #mooreslaw #dennardscaling

waynerad@diasp.org

FutureSearch.AI lets you ask a language model questions about the future.

"What will happen to TikTok after Congress passed a bill on April 24, 2024 requiring it to delist or divest its US operations?"

"Will the US Department of Justice impose behavioral remedies on Apple for violation of antitrust law?"

"Will the US Supreme Court grant Trump immunity from prosecution in the 2024 Supreme Court Case: Trump v. United States?"

"Will the lawsuit brought against OpenAI by the New York Times result in OpenAI being allowed to continue using NYT data?"

"Will the US Supreme Court uphold emergency abortion care protections in the 2024 Supreme Court Case: Moyle v. United States?"

How does it work?

They say rather than asking a large language model a question in a 1-shot manner, they guide it through 6 steps for reasoning through hard questions (a rough sketch of what that chaining might look like follows the list). The 6 steps are:

  1. "What is a basic summary of this situation?"

  2. "Who are the important people involved, and what are their dispositions?"

  3. "What are the key facets of the situation that will influence the outcome?"

  4. "For each key facet, what's a simple model of the distribution of outcomes from past instances that share that facet?"

  5. "How do I weigh the conflicting results of the models?"

  6. "What's unique about this situation to adjust for in my final answer?"

See below for a discussion of two other approaches that claim similar prediction quality.

FutureSearch: unbiased, in-depth answers to hard questions

#solidstatelife #ai #genai #llms #futurology

waynerad@diasp.org

Survey of 2,700 AI researchers.

The average response placed each of the following within the next 10 years:

Simple Python code given spec and examples
Good high school history essay
Angry Birds (superhuman)
Answer factoid questions with web
World Series of Poker
Read text aloud
Transcribe speech
Answer open-ended fact questions with web
Translate text (vs. fluent amateur)
Group new objects into classes
Fake new song by specific artist
Answers undecided questions well
Top Starcraft play via video of screen
Build payment processing website
Telephone banking services
Translate speech using subtitles
Atari games after 20m play (50% vs. novice)
Finetune LLM
Construct video from new angle
Top 40 Pop Song
Recognize object seen once
All Atari games (vs. pro game tester)
Learn to sort long lists
Fold laundry
Random new computer game (novice level)
NYT best-selling fiction
Translate text in newfound language
Explain AI actions in games
Assemble LEGO given instructions
Win Putnam Math Competition
5km city race as bipedal robot (superhuman)
Beat humans at Go (after same # games)
Find and patch security flaw
Retail Salesperson

...and the following within the next 20 years:

Equations governing virtual worlds
Truck Driver
Replicate ML paper
Install wiring in a house
ML paper

... and the following within the next 40 years:

Publishable math theorems
High Level Machine Intelligence (all human tasks)
Millennium Prize
Surgeon
AI Researcher
Full Automation of Labor (all human jobs)

It should be noted that while these were the averages, there was a very wide variance -- so a wide range of plausible dates.

"Expected feasibility of many AI milestones moved substantially earlier in the course of one year (between 2022 and 2023)."

If you're wondering what the difference between "High-Level Machine Intelligence" and "Full Automation of Labor" is, they said:

"We defined High-Level Machine Intelligence thus: High-level machine intelligence is achieved when unaided machines can accomplish every task better and more cheaply than human workers. Ignore aspects of tasks for which being a human is intrinsically advantageous, e.g. being accepted as a jury member. Think feasibility, not adoption."

"We defined Full Automation of Labor thus:"

"Say an occupation becomes fully automatable when unaided machines can accomplish it better and more cheaply than human workers. Ignore aspects of occupations for which being a human is intrinsically advantageous, e.g. being accepted as a jury member. Think feasibility, not adoption. [...] Say we have reached 'full automation of labor' when all occupations are fully automatable. That is, when for any occupation, machines could be built to carry out the task better and more cheaply than human workers."

They go on to say,

"Predictions for a 50% chance of the arrival of Full Automation of Labor are consistently more than sixty years later than those for a 50% chance of the arrival of High Level Machine Intelligence."

That seems crazy to me. In my mind, as soon as feasibility is reached, cost will go below human labor very quickly, and the technology will be adopted everywhere. That is what has happened with everything computers have automated so far.

"We do not know what accounts for this gap in forecasts. Insofar as High Level Machine Intelligence and Full Automation of Labor refer to the same event, the difference in predictions about the time of their arrival would seem to be a framing effect."

A framing effect that large?

"Since 2016 a majority of respondents have thought that it's either 'quite likely,' 'likely,' or an 'about even chance' that technological progress becomes more than an order of magnitude faster within 5 years of High Level Machine Intelligence being achieved."

"A large majority of participants thought state-of-the-art AI systems in twenty years would be likely or very likely to:"

  1. Find unexpected ways to achieve goals (82.3% of respondents),
  2. Be able to talk like a human expert on most topics (81.4% of respondents), and
  3. Frequently behave in ways that are surprising to humans (69.1% of respondents)

"Most respondents considered it unlikely that users of AI systems in 2028 will be able to know the true reasons for the AI systems' choices, with only 20% giving it better than even odds."

"Scenarios worthy of most concern were: spread of false information e.g. deepfakes (86%), manipulation of large-scale public opinion trends (79%), AI letting dangerous groups make powerful tools (e.g. engineered viruses) (73%), authoritarian rulers using AI to control their populations (73%), and AI systems worsening economic inequality by disproportionately benefiting certain individuals (71%)."

"Respondents exhibited diverse views on the expected goodness/badness of High Level Machine Intelligence. Responses range from extremely optimistic to extremely pessimistic. Over a third of participants (38%) put at least a 10% chance on extremely bad outcomes (e.g. human extinction)."

Thousands of AI authors on the future of AI

#solidstatelife #ai #technologicalunemployment #futurology

waynerad@diasp.org

Approaching human-level forecasting with language models.

The idea here is to pit AI head-to-head against humans in forecasting competitions. They mention 5 of these: Metaculus, GJOpen, INFER, Polymarket, and Manifold. The way they are scored is with something called a "Brier score". To keep things simple, they limited their system to only yes/no "binary" questions.

For "binary" questions, the Brier score is computed like this: the outcome is assigned the value 0 (say, some event not happening by a certain date) or 1 (the event happening). The person -- or now, language model -- making the prediction actually predicts a probability -- a number between 0 and 1. Once the outcome is known, the difference between the predicted probability and the actual outcome is computed and then squared. For multiple predictions, these numbers are all averaged. In this way, the Brier score represents the "error" in the predictions. A perfect predictor will predict "1" for every event that actually happens, and "0" for every event that does not happen, leading to a Brier score of 0. A predictor who has no idea whether something will happen can say 0.5, which leads to a Brier score of 0.25 no matter which outcome occurs. It's better to do that than to predict 0 or 1 and be wrong.

This glosses over various details like how to handle when people change their predictions, how to handle multiple choice outcomes or numerical outcomes, but you get the idea. The Brier score represents your prediction error.
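To make the arithmetic concrete, here's a tiny worked example of the binary Brier score (my own illustration, not code from the paper):

```python
# Brier score for binary questions: mean squared difference between
# predicted probabilities and actual outcomes (1 = happened, 0 = didn't).

def brier_score(predictions, outcomes):
    return sum((p - o) ** 2 for p, o in zip(predictions, outcomes)) / len(predictions)

# A confident, mostly-correct forecaster scores close to 0:
print(brier_score([0.9, 0.1, 0.95], [1, 0, 1]))  # 0.0075

# Always answering "don't know" (0.5) scores 0.25 regardless of outcomes:
print(brier_score([0.5, 0.5], [1, 0]))           # 0.25

# Confidently wrong is the worst case:
print(brier_score([0.0], [1]))                   # 1.0
```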

The researchers found language models are bad at predicting. With no additional information retrieval or fine-tuning, most language models do only a little better than picking at random, and the biggest and best models like GPT-4 and Claude-2 do better than chance but still much worse than humans.

For the dataset that they trained the model on, they used the above-mentioned 5 forecasting competitions and combined data from all of them to get a dataset of 33,664 binary questions. Here's an example showing what these binary questions look like:

"Question: Will Starship achieve liftoff before Monday, May 1st, 2023?"

"Background: On April 14th, SpaceX received a launch license for its Starship spacecraft. A launch scheduled for April 17th was scrubbed due to a frozen valve. SpaceX CEO Elon Musk tweeted: 'Learned a lot today, now offloading propellant, retrying in a few days . . . '"

"Resolution: Criteria This question resolves Yes if Starship leaves the launchpad intact and under its own power before 11:59pm ET on Sunday, April 30th."

"Key Dates: Begin Date: 2023-04-17, Close Date: 2023-04-30, Resolve Date: 2023-04-20."

The "begin date" is the date people can start making predictions. The "close date" is the last date people can make predictions. The "resolve date" is the date reality is checked to see if the prediction happened or not. But, for this example, the reason why the "resolve date" is before the "close date" is because the event occurred.

Their system consists of a retrieval system, a reasoning system, and a candidate selection system.

The retrieval system enables the system to do search engine searches. It consists of 4 steps: search query generation, news retrieval, relevance filtering and ranking, and text summarization. The summarization step is because large language models are limited by their context window, and that may be less of a limitation in the future.

The reasoning system works by first prompting the large language model to rephrase the question. The model is next asked to leverage the retrieved information and its pre-training knowledge to produce arguments for why the outcome may or may not occur. Since the model can generate weak arguments, to avoid treating them all as equal, it is instructed to weigh them by importance and aggregate them accordingly. Finally, "to prevent potential bias and miscalibration, the model is asked to check if it is over- or underconfident and consider historical base rates, prompting it to calibrate and amend the prediction accordingly."

This is called reasoning by "scratchpad prompting". Since the aggregate of predictions is usually superior to individual forecasts, this is repeated multiple times and the average is used.
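A minimal sketch of that repeat-and-aggregate step (my own illustration; sample_forecast() is a hypothetical stand-in for one full retrieval-plus-scratchpad run, and the paper tested several aggregation methods, of which a plain mean is just one):

```python
import statistics

def sample_forecast(question: str) -> float:
    # Stand-in for one full run of the retrieval + scratchpad pipeline,
    # returning a probability between 0 and 1.
    raise NotImplementedError("hypothetical stand-in for a real pipeline run")

def ensemble_forecast(question: str, n: int = 10) -> float:
    # Run the whole pipeline n times and aggregate the sampled
    # probabilities; the aggregate is usually superior to any one run.
    samples = [sample_forecast(question) for _ in range(n)]
    return statistics.mean(samples)
```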

All of this needs to be in place before fine-tuning because it's used to generate the fine-tuning data. The fine-tuning was done on a subset of the data where the model outperformed the human crowd -- but they discard examples where the model beat the crowd by too much. They say this is because "We seek to fine-tune our model on strong forecasts", hence using only the subset where the model outperformed the human crowd, but "this can inadvertently cause overconfidence in our fine-tuned model" -- unless they discard the examples where the model exceeds the crowd prediction by too much.

"The input to the model consists of the question, description, and resolution criteria, followed by summarized articles. The target output consists of a reasoning and a prediction. Importantly, the fine-tuning input excludes the scratchpad instructions. By doing so, we directly teach the model which reasoning to apply in a given context."

In addition, they did a "hyperparameter sweep" to optimize the pipeline's tunable pieces. The "hyperparameters" were the search query prompt, the summarization prompt, the number of articles to keep and rank, the reasoning prompt, and the ensembling method for combining multiple answers (they tested 5 different algorithms).

Anyway, the end result of all this is that the large language model had a Brier score of .179, while the crowd had .149, a difference of only .03. So the system is very close to human accuracy. If traditional "accuracy" numbers are more intuitive to you, they gave 71.5% as the model's accuracy, and 77.0% for the human crowd.

Approaching human-level forecasting with language models

#solidstatelife #ai #genai #llms #futurology #predictionmarkets #brierscore

waynerad@diasp.org

Eurasia Group's top risks for 2024. "Ungoverned AI" is #4. #1 is "The United States vs itself". So we, and our upcoming election -- expected to continue the trend of every election being crazier than the previous one -- are the planet's greatest risk.

On the flip side, they dismiss "US-China crisis" as a "red herring". Whew, I guess we can relax and not worry about that. Also "Populist takeover of European politics" and "BRICS vs G7".

"Risk 1: The United States vs itself: The 2024 election will test American democracy to a degree the nation hasn't experienced in 150 years."

"Risk 2: Middle East on the brink: The region is a tinderbox, and the number of players carrying matches makes the risk of escalation exceptionally high."

"Risk 3: Partitioned Ukraine: Ukraine will be de facto partitioned this year, an unacceptable outcome for Ukraine and the West that will nevertheless become reality."

"Risk 4: Ungoverned AI: Breakthroughs in artificial intelligence will move much faster than governance efforts."

"Risk 5: Axis of rogues: Deeper alignment and mutual support between Russia, Iran, and North Korea will pose a growing threat to global stability."

"Risk 6: No China recovery: Any green shoots in the Chinese economy will only raise false hopes of a recovery as economic constraints and political dynamics prevent a durable growth rebound."

"Risk 7: The fight for critical minerals: The scramble for critical minerals will heat up as importers and exporters intensify their use of industrial policies and trade restrictions."

"Risk 8: No room for error: The global inflation shock that began in 2021 will continue to exert a powerful economic and political drag in 2024."

"Risk 9: El Nino is back: A powerful El Nino climate pattern will bring extreme weather events that cause food insecurity, increase water stress, disrupt logistics, spread disease, and fuel migration and political instability."

"Risk 10: Risky business: Companies caught in the crossfire of US culture wars will see their decision-making autonomy limited and their cost of doing business rise."

"Red herrings: US-China crisis. Populist takeover of European politics. BRICS vs G7."

"Addendums: These addendums for Brazil, Canada, Europe, and Japan further illustrate how global risks play out in different parts of the world, with specific implications for governments and businesses."

Eurasia Group | The Top Risks of 2024

#futurology #risk #geopolitics

waynerad@diasp.org

Companies that provide survivalist bunkers for billionaires include: Atlas Survival Shelters (Texas), Vivos (California), SAFE (Strategically Armored & Fortified Environments) + Vital RN (Virginia), Creative Home Engineering (Arizona), and Ultimate Bunker (Utah).

Oh, and because this is The Hollywood Reporter, there's a whole bunch of stuff about fiery moats! water cannons! and rotating fireplaces right out of 'Indiana Jones'!

Billionaires' survivalist bunkers go absolutely bonkers with fiery moats and water cannons

#futurology #dystopia

waynerad@diasp.org

"Rapid AI progress surprises even experts: Survey just out"

Sabine Hossenfelder reviews a survey of 2,778 AI researchers. They say 50% chance of "high level machine intelligence" ("comparable to human intelligence") by 2047, but that's down 13 years from the same survey a year ago, which said 2060.

For "full automation of labor", 50% probability by 2120 or so, but that's down almost 50 years from last years' prediction. (So last years' prediction must've been 2170 or so).

I can't help but think: does anybody seriously think it will take that long? I get that the "AGI in 7 months" predictions are a bit hard to take seriously, but still. Do these people not understand exponential curves?

Ray Kurzweil, and before him Al Bartlett, are famous for saying that people extrapolate the current rate of change linearly out into the future, and so always underestimate exponential curves. I'm not implying Kurzweil or Bartlett are right about everything, but this does look to me like what is happening, and you would think professional AI researchers, of all people, would know better.

Rapid AI progress surprises even experts: Survey just out - Sabine Hossenfelder

#solidstatelife #futurology #ai #exponentialgrowth #technologicalunemployment

waynerad@diasp.org

"Charted: The rapid decline of global birth rates."

1950-2021 for the world's 50 most populous countries. Eh, 49 most populous. It's a 7x7 grid.

There's an interactive table further down the page where you can sort by birth rate for 1950, 1990, or 2021, or the change between 1950 and 2021.

Charted: The rapid decline of global birth rates

#futurology #demographics #fertility