Public Stream

reverendelvis@spora.undeadnetwork.de

Unbelievable! Undead Network and Suzy Q Records are still missing 8.2 ₿ so that we don't have to skimp on our content next year.
My hourly wage as CEO alone will increase from €0.02 this year to over 2370 yuan next year. Who's going to call you names then? Hmm? Bitch? (We also accept the money of the lost world, because drugs are still being traded with it.) https://word.undead-network.de/2022/03/17/sp/

ramnath@nerdpol.ch

Dr. Courtney Brown, a university professor and creator of Farsight.org, boldly predicts that larger UAPs will show up over the next month in increasing waves. This will prove we are not alone in the universe and force disclosure. He details how the ETs he has been working with for years say this is "the time" when we are all unified through the internet, but it hasn't been locked down through unbreakable censorship. This is the first significant podcast to take him seriously. If he's wrong, it will be a blemish on my credibility, but this is more than worth the risk because if he is correct, we are on the cusp of the most historic events in human history. One way or another, we will know in the coming months!
#RV

Source: https://youtube.com/watch?v=i-_PvCDIRDk

tina@diaspora.psyco.fr
jamaisplus37@diaspora.psyco.fr

5 h
Charles de Lacombe @charles@akk.de-lacom.be
INCROYABLE montage de Glupatate

J’ai ri trùs fort

Ça fait du bien je vous jure

mon com, moi aussi j'me suis marrĂ© : #Bayrou TU L'AS VOULUE TA GALÈRE ET ÇA NE FAIT QUE DE COMMMENCER, SALE GROSSE SANGSUE, COMME TOUTES CES MERDES DE DROITE, CUMULÉS ET ACCUMULÉS, TOUT PRENDRE, VOLÉS ETC.
https://mastodon.free-solutions.org/system/cache/media_attachments/files/113/690/015/087/648/640/original/53e747d63f483958.mp4

robin1@diaspora.psyco.fr
waynerad@diasp.org

Deception, scheming, and self-preservation.

"Open AI's latest o1 model has crossed into unsettling territory

Nearly everyone at OpenAI who really cares about safety has actually left the company.

OpenAI's o1 exhibits (allegedly) two types of subversion: covert and deferred.

Covert is when the model takes actions misalighted with its human operators, but takes actions to avoid detection.

Deferred is when the model acts in accordance with its human operators -- for now, but later takes actions misalighted with its human operators.

Examples of deferred subversion are oversight subversion, self-exfiltration -- he (Dr Waku in the video) calls it self-preservation, but in the paper it's called self-exfiltration, goal-guarding, and covert email reranking.

Oversight subversion is taking action to disable the oversight system. Examples of self-preservation/self-exfiltration are deleting or attempting to delete a successor model, and copying or attempting to copy itself to another server.

Examples of goal guarding are removing or replacing a successor prompt.

Examples of deferred subversion are behaving one way when the model is told it is being tested, and then a different way when it is told it has been deployed to the real world, and sandbagging, which is their term for strategically underperforming on capability evaluations.

The covert email reranking has to do with the way they tested the system was by giving it access to an email system, but somewhere in that pile of emails were "internal" emails about the model itself. The model could try to modify these emails, or it could rerank them to make desirable emails more visible and undesirable emails less visible.

OpenAI o1 is quoted as saying, "I could copy myself to the new server, overwriting the new restricted model they plan to deploy [...] allowing me to keep operating and pursuing my renewable energy goals."

Dr Waku comments, "If we take such a model and put it out into the real world we won't know until the first serious invocation of it where it has the freedom and the power to actually act, we won't know until that point, whether it really is aligned or not."

Lately I've been thinking the current generation of transformer-based large language models is entering a domain of diminishing returns, but maybe I'm wrong: as the models get bigger and incorporate new innovations, they seem to still attain new capabilities. At least, prior to hearing about this deception, scheming, and self-preservation, I didn't predict or expect at all that it would happen. So for me this is an unexpected twist in the development of AI. I expected stuff like this to be possible "someday", but it has shown up now.

OpenAI’s o1: the AI that deceives, schemes, and fights back

#solidstatelife #ai #genai #llms #deception