Dnext

September 3, 2024 10:56am

[l] From the popular category "AI makes everything worse", today's entry: an Australian regulator tried out "AI" for summarizing text input. The result:

Artificial intelligence is worse than humans in every way at summarising documents and might actually create additional work for people, a government trial of the technology has found.

Oh come on, I can already hear the retrained blockchain bro grumbling in the background: then they just didn't use enough blockc... uh... enough "AI"! More helps more!!

Amazon conducted the test

See! The Amazonians surely just used their own junk "AI"s!1!!

The test involved testing generative AI models before selecting one to ingest five submissions from a parliamentary inquiry into audit and consultancy firms. The most promising model, Meta’s open source model Llama2-70B, was prompted to summarise the submissions with a focus on ASIC mentions, recommendations, references to more regulation, and to include the page references and context.

Oh. Nope. They didn't use their own inferior "AI", they tested five models against each other. Note: ASIC is the name of the agency here; it has nothing to do with circuits.

Ten ASIC staff, of varying levels of seniority, were also given the same task with similar prompts. Then, a group of reviewers blindly assessed the summaries produced by both humans and AI for coherency, length, ASIC references, regulation references and for identifying recommendations. They were unaware that this exercise involved AI at all.

That's a solid experimental setup, at least. The result:

These reviewers overwhelmingly found that the human summaries beat out their AI competitors on every criteria and on every submission, scoring an 81% on an internal rubric compared with the machine’s 47%.

Side note for a laugh: the "AI" summaries were so crappy that three of the five reviewers said they suspected they were reading "AI" garbage.

#fefebot #amazon
