#openai

waynerad@diasp.org

Why this developer is no longer using Copilot: He feels his programming skills atrophy. He finds himself pausing to wait for Copilot to write code for him, and doesn't enjoy programming that way. The AI-generated code is often wrong or out-of-date and has to be fixed. And using Copilot is a privacy issue, because your code is sent to Copilot's servers.

I thought this was quite interesting. I tried Copilot in VSCode and figured I wasn't using it much because I'm a vim user. So I tracked down the Neovim plug-in & got it working in vim, but still found I didn't use it. Now I've come to feel it's great for certain use cases and bad for others. Where it's great is writing "boilerplate" code for using a public API. You just write a comment describing what you want to do and the beginning of the function, and Copilot spits out practically all the rest of the function for you -- no tedious hours studying the documentation from the API provider.
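To give a concrete picture of that pattern (my own hypothetical sketch, not actual Copilot output): the comment and the function signature are what you type, and the body is the kind of thing Copilot will fill in. Here the call is against the public GitHub REST API:

```python
import requests

# Fetch the number of stars for a GitHub repository using the public
# GitHub REST API and return it as an int.
def get_repo_stars(owner: str, repo: str) -> int:
    url = f"https://api.github.com/repos/{owner}/{repo}"
    response = requests.get(url, headers={"Accept": "application/vnd.github+json"})
    response.raise_for_status()
    return response.json()["stargazers_count"]
```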

But that's not the use case I actually have in real life. Most of what I do is either making a new UI or porting code from PHP to Go. For the new UI, AI has been helpful -- I can take a screenshot, input it to ChatGPT, and ask it how to improve the UI. (I'm going to try this with Google's Gemini soon, but I haven't yet.) When it makes suggestions, I can ask it what HTML+CSS is needed to implement those suggestions. I've found the results keep improving for about 6 iterations. But notice, Copilot isn't part of the loop. I'm jumping into dozens of files and making small changes, and that's a use case where Copilot just isn't helpful.

For porting code from PHP to Go, I modified a full-fledged PHP parser to transpile code to Go, and this has been critical because it's important that certain things, especially strings, get ported over exactly -- no room for errors. So this system parses PHP strings using PHP's parsing rules, and outputs Go strings using Go's parsing rules, and is always 100% right. Copilot isn't part of the loop and doesn't help.
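To give a flavor of why the parsing rules matter (a much-simplified sketch of one case, not my actual transpiler): PHP single-quoted strings treat only \\ and \' as escapes, while Go double-quoted strings need backslashes and quotes escaped, so the conversion has to decode by one language's rules and re-encode by the other's:

```python
# Simplified sketch: convert the source text of a PHP single-quoted string
# literal into an equivalent Go double-quoted string literal.
# (A real transpiler also handles double-quoted strings, heredocs, etc.)

def php_single_quoted_to_go(php_literal: str) -> str:
    # Strip the surrounding single quotes from the PHP literal.
    body = php_literal[1:-1]

    # Apply PHP's single-quote parsing rules: only \\ and \' are escapes;
    # any other backslash is a literal backslash.
    chars = []
    i = 0
    while i < len(body):
        if body[i] == "\\" and i + 1 < len(body) and body[i + 1] in ("\\", "'"):
            chars.append(body[i + 1])
            i += 2
        else:
            chars.append(body[i])
            i += 1
    value = "".join(chars)

    # Re-emit the value using Go's double-quote escaping rules.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'

print(php_single_quoted_to_go(r"'He said \'hi\' and C:\path'"))
# -> "He said 'hi' and C:\\path"
```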

Another place I've found AI incredibly useful is debugging problems where I have no clue what the problem might be. This goes back to using other people's large systems, such as the public APIs mentioned earlier. Every now and then you get cryptic error messages or some other bizarre malfunction, and endless Google searching doesn't help. I can go to ChatGPT, Gemini, Claude, Perplexity, DeepSeek (and others, but those are the main ones I've been using) and say, hey, I'm getting this cryptic error message or this weird behavior, and they'll give you a nice list of things you might try. That can get you unstuck when you'd otherwise be very stuck.

It's kinda funny because, obviously, I'm an avid follower of what's going on in AI and happy to try AI tools, and I constantly run across other developers who say "Copilot has made me twice as productive!" or "Copilot has made me five times as productive!" or some such. I've wondered if there's something wrong with me, because I haven't experienced those results at all. But AI has been helpful in other areas nobody ever seems to talk about.

Why I'm no longer using Copilot - Dreams of Code

#solidstatelife #ai #genai #llms #codingllms #openai #copilot

waynerad@diasp.org

OpenAI is apparently partnering with the robotics company Figure on its humanoid robot, Figure 01. The robot seems to do a remarkable job of integrating language with vision and body movements. They say it works entirely with neural networks: no teleoperation, no hand-written code, nothing scripted ahead of time.

"Can I have something to eat?"

"Sure thing."

"Great. Can you explain why you did what you just did while you pick up this trash?"

"On it! So I gave you the apple because it's the only edible item I could provide you with from the table."

"Great. So, based on the scene right now, where do you think the dishes in front of you go next?"

"The dishes on the table, like that plate and cup, are likely to go into the drying rack next."

"Great. Can you put them there?"

"Of course."

OpenAI's 'AGI Robot' Develops SHOCKING NEW ABILITIES | Sam Altman Gives Figure 01 a Brain - Wes Roth

#solidstatelife #ai #openai #figure01 #genai #llms #robotics

waynerad@diasp.org

"Introducing Rubra: Develop your AI assistants locally."

"Rubra is a full-stack OpenAI implementation, designed for local development. Built for developers, Rubra is an open-source project that delivers the same simplicity and intelligence as working with ChatGPT, but with a unique focus on building AI assistants powered by a locally running open-source LLM."

"Rubra serves as a drop-in replacement for OpenAI that can be deployed locally, for private, hassle-free AI development and testing."

"Rubra includes a highly tuned local model based on Mistral, that is ideal for local development. You can also add API keys for Open AI and Anthropic to compare how your agents perform. As models improve, we'll continue to upgrade the local model to provide the best experience."

If my attempt to build a PC with a powerful GPU capable of running these models hadn't failed, I'd be trying this. If you have a chance to try it, let us know how it goes.

Introducing Rubra: Develop your AI assistants locally

#solidstatelife #ai #genai #llms #openai #opensourcemodels

waynerad@diasp.org

Reaction video to OpenAI Sora, OpenAI's system for generating video from text.

I encountered the reaction video first; in fact, I discovered Sora exists from seeing the reaction video. But see below for the official announcement from OpenAI.

It's actually kind of interesting and amusing to compare the guesses in the reaction videos about how the system works with the way it actually works. People are guessing based on their knowledge of traditional computer graphics and 3D modeling. However...

The way Sora works is quite fascinating. We don't know the nitty-gritty details, but OpenAI has described the system at a high level.

Basically it combines ideas from their image generation and large language model systems.

Their image generation systems, DALL-E 2 and DALL-E 3, are diffusion models. Their large language models, GPT-2, GPT-3, GPT-4, GPT-4-Vision, etc, are transformer models. (In fact "GPT" stands for "generative pretrained transformer").

I haven't seen diffusion and transformer models combined before.

Diffusion models work by having a set of parameters in what they call "latent space" that describe the "meaning" of the image. The word "latent" is another way of saying "hidden". The "latent space" parameters are "hidden" inside the model but they are created in such a way that the images and text descriptions are correlated, which is what makes it possible to type in a text prompt and get an image out. I've elsewhere given high-level hand-wavey descriptions of how the latent space parameters are turned into images through the diffusion process, and how the text and images are correlated (a training method called CLIP), so I won't repeat that here.

Large language models, on the other hand, work by turning words and word pieces into "tokens". The "tokens" are mapped to vectors (embeddings) constructed in such a way that the numerical values in the vectors are related to the underlying meaning of the words.
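For concreteness, here's what tokenization looks like in practice, using OpenAI's open-source tiktoken library with the vocabulary GPT-3.5/GPT-4 use (the model then maps each integer ID to a learned embedding vector):

```python
import tiktoken

# Tokenize a sentence the way GPT-3.5/GPT-4 do, using the cl100k_base vocabulary.
enc = tiktoken.get_encoding("cl100k_base")
tokens = enc.encode("Sora turns videos into patches.")
print(tokens)              # a list of integer token IDs
print(enc.decode(tokens))  # round-trips back to the original text
```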

To make a model that combines both of these ideas, they figured out a way of doing something analogous to "tokens" but for video. They call their video "tokens" "patches". So Sora works with visual "patches".

One way to think of "patches" is as video compression, both spatially and temporally. Unlike a video compression algorithm such as MPEG, which does this using pre-determined mathematical formulas (discrete cosine transforms and such), in this system the "compression" process is learned and is made entirely of neural networks.
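OpenAI hasn't published the implementation details, but the general idea of spacetime patches can be illustrated with a toy example: cut a video tensor into small blocks that span a few pixels in space and a few frames in time, and flatten each block into a vector the transformer can treat like a token. The dimensions below are made up for illustration, and the real system patchifies a learned latent representation rather than raw pixels:

```python
import numpy as np

# Toy video: 16 frames of 64x64 RGB, shape (time, height, width, channels).
video = np.random.rand(16, 64, 64, 3)

# Patch size: 4 frames x 8x8 pixels. These numbers are made up for
# illustration; Sora patchifies a learned latent video, not raw pixels.
pt, ph, pw = 4, 8, 8
T, H, W, C = video.shape

# Cut the video into non-overlapping spacetime blocks...
blocks = video.reshape(T // pt, pt, H // ph, ph, W // pw, pw, C)
blocks = blocks.transpose(0, 2, 4, 1, 3, 5, 6)

# ...and flatten each block into one "patch" vector, giving a sequence of
# patch tokens the transformer can attend over, just like text tokens.
patches = blocks.reshape(-1, pt * ph * pw * C)
print(patches.shape)  # (256, 768): 256 spacetime patches, each a 768-dim vector
```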

So with a large language model, you type in text and it outputs tokens which represent text, which are decoded to text for you. With Sora, you type in text and it outputs tokens, except here the tokens represent visual "patches", and the decoder turns the visual "patches" into pixels for you to view.

Because the "compression" works both ways, in addition to "decoding" patches to get pixels, you can also input pixels and "encode" them into patches. This enables Sora to input video and perform a wide range of video editing tasks. It can create perfectly looping video, animate static images (why no Mona Lisa examples, though?), and extend videos either forward or backward in time. Sora can also gradually interpolate between two input videos, creating seamless transitions between videos with entirely different subjects and scene compositions. I found these to be the most freakishly fascinating examples on their page of sample videos.

They list the following "emerging simulation capabilities":

"3D consistency." "Sora can generate videos with dynamic camera motion. As the camera shifts and rotates, people and scene elements move consistently through three-dimensional space."

This is where they have the scene everyone is reacting to in the reaction videos, where the couple is walking down the street in Japan with the cherry blossoms.

By the way, I was wondering what kind of name "Sora" is, so I looked it up on behindthename.com. It says there are two Japanese kanji characters, both pronounced "sora" and both meaning "sky".

"Long-range coherence and object permanence." "For example, our model can persist people, animals and objects even when they are occluded or leave the frame. Likewise, it can generate multiple shots of the same character in a single sample, maintaining their appearance throughout the video."

"Interacting with the world." "Sora can sometimes simulate actions that affect the state of the world in simple ways. For example, a painter can leave new strokes along a canvas that persist over time, or a man can eat a burger and leave bite marks."

"Simulating digital worlds." "Sora can simultaneously control the player in Minecraft with a basic policy while also rendering the world and its dynamics in high fidelity."

However, they say: "Sora currently exhibits numerous limitations as a simulator." "For example, it does not accurately model the physics of many basic interactions, like glass shattering."

This is incredible - ThePrimeTime

#solidstatelife #ai #genai #diffusionmodels #gpt #llms #computervision #videogeneration #openai

waynerad@diasp.org

GitHub Copilot causes code churn? The term "code churn" is a fancy way of saying Copilot writes crappy code: developers fail to notice it at first, check it in, then discover it's crappy (within 2 weeks -- that's the arbitrary time window chosen for the study), which forces them to go back in and fix it, thus causing the code to "churn". Get it?

Copilot Causes Code Churn? This Study Is Concerning... Theo - t3.gg

#solidstatelife #ai #genai #llms #openai #copilot #developers