#computerscience

waynerad@diasp.org

Aurora DSQL is a new "serverless" database system from Amazon Web Services.

"Aurora DSQL is a new serverless SQL database, optimized for transaction processing, and designed for the cloud. DSQL is designed to scale up and down to serve workloads of nearly any size, from your hobby project to your largest enterprise application. All the SQL stuff you expect is there: transactions, schemas, indexes, joins, and so on, all with strong consistency and isolation."

If you're wondering what they mean by "serverless":

"Here, we mean that you create a cluster in the AWS console (or API or CLI), and that cluster will include an endpoint. You connect your PostgreSQL client to that endpoint. That's all you have to do: management, scalability, patching, fault tolerance, durability, etc are all built right in. You never have to worry about infrastructure."

If you're wondering about the technology behind it, they say:

"At the same time, a few pieces of technology were coming together. One was a set of new virtualization capabilities, including Caspian (which can dynamically and securely scale the resources allocated to a virtual machine up and down), Firecracker (a lightweight VMM for fast-scaling applications), and the VM snapshotting technology we were using to build Lambda Snapstart."

"The second was EC2 time sync, which brings microsecond-accurate time to EC2 instances around the globe. High-quality physical time is hugely useful for all kinds of distributed system problems. Most interestingly, it unlocks ways to avoid coordination within distributed systems, offering better scalability and better performance."

"The third was Journal, the distributed transaction log we'd used to build critical parts of multiple AWS services (such as MemoryDB, the Valkey compatible durable in-memory database). Having a reliable, proven, primitive that offers atomicity, durability, and replication between both availability zones and regions simplifies a lot of things about building a database system (after all, Atomicity and Durability are half of ACID)."

"The fourth was AWS's strong formal methods and automated reasoning tool set. Formal methods allow us to explore the space of design and implementation choices quickly, and also helps us build reliable and dependable distributed system implementations. Distributed databases, and especially fast distributed transactions, are a famously hard design problem, with tons of interesting trade-offs, lots of subtle traps, and a need for a strong correctness argument. Formal methods allowed us to move faster and think bigger about what we wanted to build."

DSQL Vignette: Aurora DSQL, and a personal story

#solidstatelife #computerscience #databases #formalmethods

waynerad@diasp.org

"Incremental computation represents a transformative (!) approach to data processing. Instead of recomputing everything when your input changes slightly, incremental computation aims to reuse the original output and efficiently update the results. Efficiently means performing work proportional only to input and output changes."

"This paper introduces DBSP, a programming language inspired by signal processing (hence the name DB-SP). DBSP is simple, yet it offers extensive computational capabilities. With just four operators, it covers complex database queries, including entire relational algebra, set and multiset computations, nested relations, aggregations, recursive queries, and streaming computations."

The four operators are "lift", "delay", "differentiation", and "integration". "Lift" converts scalar functions to stream functions, "delay" shifts a stream by one time step, "differentiation" computes a stream's changes, and "integration" reconstructs the original stream from its change stream. Integration and differentiation are inverses of each other.
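To make the four operators concrete, here's a toy Rust sketch over finite stream prefixes (my own illustration with invented names; the paper defines the operators over arbitrary abelian groups, not just integers):

```rust
// Toy sketch of DBSP's four operators over finite stream prefixes,
// modeled here as Vec<i64>.

// "Lift": apply a scalar function pointwise, turning it into a stream function.
fn lift(f: impl Fn(i64) -> i64, s: &[i64]) -> Vec<i64> {
    s.iter().map(|&x| f(x)).collect()
}

// "Delay" (z^-1): shift the stream one step, prepending a zero.
fn delay(s: &[i64]) -> Vec<i64> {
    std::iter::once(0).chain(s.iter().copied()).take(s.len()).collect()
}

// "Differentiation": the stream of changes, s[t] - s[t-1].
fn differentiate(s: &[i64]) -> Vec<i64> {
    s.iter().zip(delay(s)).map(|(&x, d)| x - d).collect()
}

// "Integration": the running sum, rebuilding a stream from its changes.
fn integrate(s: &[i64]) -> Vec<i64> {
    s.iter().scan(0, |acc, &x| { *acc += x; Some(*acc) }).collect()
}

fn main() {
    let s = vec![1, 3, 6, 10];
    // Integration and differentiation invert each other.
    assert_eq!(integrate(&differentiate(&s)), s);
    assert_eq!(differentiate(&integrate(&s)), s);
    println!("{:?}", lift(|x| 2 * x, &s)); // [2, 6, 12, 20]
}
```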

DBSP: Automatic incremental view maintenance for rich query languages

#solidstatelife #computerscience #informationtheory #databases

waynerad@diasp.org

Technique for adding compile-time checks to anything you can define as an invariant.

Many people have tried to make it so that buggy programs simply don't compile. But the netstack3 team has a concrete, general framework for approaching this kind of design, breaking the process into three steps: definition, enforcement, and consumption. For definition, the programmer must take something that Rust can reason about (usually types) and attach the desired property to it. This is usually done via documentation -- describing that a particular trait represents a particular property, for example. Then the programmer enforces the property by making sure that all of the code that directly deals with the type upholds the relevant invariant.

The article goes on to describe some specific techniques for doing this: adding a hidden field to a structure so that it can only be constructed by code that verifies the invariant, and using zero-sized types, which don't exist at run time and carry no run-time overhead but let the compiler check things; a sketch of both follows below. The example language is Rust, but these techniques may generalize to other languages and type systems.
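Here's a minimal Rust sketch of both techniques, with invented names (the actual netstack3 code the article describes is more involved):

```rust
// Technique 1: a hidden (private) field, so the only way to construct
// the type is through code that checks the invariant.
mod non_empty {
    pub struct NonEmpty {
        items: Vec<u32>, // private: callers can't build an empty one by hand
    }
    impl NonEmpty {
        pub fn new(items: Vec<u32>) -> Option<NonEmpty> {
            if items.is_empty() { None } else { Some(NonEmpty { items }) }
        }
        pub fn first(&self) -> u32 {
            self.items[0] // can't panic: the constructor enforced non-emptiness
        }
    }
}

// Technique 2: a zero-sized "proof" token that occupies no memory at run
// time; holding one is compile-time evidence that setup already happened.
mod driver {
    pub struct Initialized(()); // private unit field: unforgeable outside this module

    pub fn init_hardware() -> Initialized {
        // ... the real setup would go here ...
        Initialized(())
    }

    // Demanding the token makes "send before init" a compile error,
    // at zero run-time cost.
    pub fn send_packet(_proof: &Initialized, _bytes: &[u8]) { /* ... */ }
}

fn main() {
    let nums = non_empty::NonEmpty::new(vec![1, 2, 3]).unwrap();
    println!("{}", nums.first());

    let token = driver::init_hardware();
    driver::send_packet(&token, b"hello"); // without a token, this can't compile
}
```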

Safety in an unsafe world [LWN.net]

#solidstatelife #computerscience #programminglanguages

harryhaller@diasp.eu

Putting the “You” in CPU

Curious exactly what happens when you run a program on your computer? Read this article to learn how multiprocessing works, what system calls really are, how computers manage memory with hardware interrupts, and how Linux loads executables. — https://cpu.land/

@🅴🆁🆄🅰 🇷🇺 — original post
A thoroughly decent article on how the OS and computers work, using #linux as the example, with caveats about other OSes.
What makes it good is that it covers several subsystems at once and in real detail, rather than superficially with simplifications. There's no point reading it front to back; instead, work from the table of contents from the start, getting a sense of where each topic is covered. Otherwise, if you're new to this, you can drown in it.
The link is to the original, not a translation (a translation exists).
It's that rare case where the author has pulled together a lot of heterogeneous material from different sources into one place, and also works from the kernel sources to explain particular points, unfolding them step by step.
For example:
--- spoiler ---
What about the kernel's memory? The kernel actually needs to store a lot of data of its own to keep track of all running processes, plus the page table itself. On every interrupt or system call, when the CPU enters kernel mode, the kernel has to get at that memory somehow.

Linux's solution is to always allocate the upper half of virtual memory to the kernel, which is why Linux is called a higher half kernel. Windows uses a similar technique, and macOS is... a bit more complicated (translator's note: these are three different links). [diagram: the kernel occupying the higher half of the address space]
...
The page table itself actually lives in kernel memory space! When the timer chip fires a hardware interrupt to switch processes, the CPU switches privilege level and jumps to Linux kernel code. Being in kernel mode (Intel's ring 0) lets the CPU access the protected kernel memory region. The kernel can then write to the page table (which sits somewhere in that upper half of memory) to remap the lower half of virtual memory for the new process. When the kernel switches to the new process and the CPU enters user mode, its access to kernel memory is cut off.
...
x86-64 has historically used four-level hierarchical paging. In this scheme, each page table entry is located by offsetting the start of its containing table by a portion of the address. That portion starts with the most significant bits, which act as a prefix, so the entry covers all addresses beginning with those bits. The entry points to the start of the next-level table holding the subtrees for that block of memory, which are in turn indexed by the next set of bits.

The designers of four-level paging also decided to ignore the top 16 bits of every virtual pointer to save space in the page table. 48 bits give a 128 TB virtual address space, which is considered big enough.

Since the first 16 bits are skipped, the "most significant bits" used to index the first level of the page table actually start at bit 47, not bit 63. This also means the higher half kernel diagram above was technically inaccurate: the start of kernel space should have been drawn at the midpoint of an address space smaller than 64 bits.
...
I said that x86-64 "historically" uses four-level paging because recent processors implement five-level paging. Five-level paging adds another level of indirection, plus 9 more bits of addressing, expanding the address space to 128 PB with 57-bit addresses. Linux has supported it since 2017, as have recent server versions of Windows 10 and 11.
...
--- spoiler ---

In other words, descriptions of "low-level magic" are certainly present and occasionally grating, but they come with pictures and are woven quite organically into the big picture.
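For a concrete feel of the four-level index arithmetic described in the spoiler, here's a small Rust sketch of my own (not from the article) that splits a 48-bit virtual address into its four 9-bit page-table indices plus the 12-bit page offset:

```rust
// Split a 48-bit x86-64 virtual address into four 9-bit page-table
// indices (PML4, PDPT, PD, PT) and a 12-bit offset within the 4 KiB page.
fn walk_indices(vaddr: u64) -> ([u64; 4], u64) {
    let offset = vaddr & 0xFFF; // low 12 bits: offset inside the page
    let mut indices = [0u64; 4];
    for level in 0..4 {
        // Level 0 is the top table, indexed by bits 47..39, then 38..30, etc.
        let shift = 39 - 9 * level;
        indices[level] = (vaddr >> shift) & 0x1FF; // each index is 9 bits
    }
    (indices, offset)
}

fn main() {
    let ([pml4, pdpt, pd, pt], offset) = walk_indices(0x0000_7FFF_DEAD_BEEF);
    println!("PML4={pml4} PDPT={pdpt} PD={pd} PT={pt} offset={offset:#x}");
}
```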

#computers #computerscience #lang_ru @Russia@3zi.ru

waynerad@diasp.org

"The Safe C++ project adds new technology for ensuring memory safety, and isn't just a reiteration of best practices."

"Safe C++ prevents users from writing unsound code. This includes compile-time intelligence like borrow checking to prevent use-after-free bugs and initialization analysis for type safety.'"

"Sean Baxter, creator of the Circle compiler, said that rewriting a project in a different programming language is costly, so the aim here is to make memory safety more accessible by providing the same soundness guarantees as Rust at a lower cost. 'With Safe C++, existing code continues to work as always,' he explained. 'Stakeholders have more control for incrementally opting in to safety.'"

The empire of C++ strikes back with Safe C++ blueprint - The Register

#solidstatelife #computerscience #programminglanguages

waynerad@diasp.org

You may have heard of Grace Hopper, the creator of the COBOL programming language in the early years of the computer revolution -- more precisely, she invented a programming language called FLOW-MATIC in 1955 (while in the Navy) that was used by the team that created COBOL in 1959 (so she didn't single-handedly create COBOL -- and she later worked on standardization of FORTRAN as well as COBOL). But you've probably never heard what she sounds like. Well, this video, evidently a lecture given to the NSA in 1982, has mysteriously just surfaced online. What I never realized is what a sense of humor she has! She's a stand-up comic and computer scientist all in one.

In her talk, she emphasizes the importance of correct information in information systems. It may seem like a truism today, but in her day, all the attention went to hardware and software, and people didn't realize it's actually data that's the most valuable part of the system.

In her office, she banned the phrase "but we've always done it that way." You should always plan for the computers you're going to have, not the computers you have right now, or the computers you used to have. (On the flip side, later in the talk, she talks about the importance of calculating the cost of not doing something, and the benefit of sticking with standard languages and portable code -- but this too is anticipating the computers you're going to have which will support the standard languages but not the bells and whistles you're using right now.)

She explores what further exponential growth in computer power could do: weather forecasting, satellite imagery, oceanography, water management.

She shows the audience nanoseconds and microseconds -- lengths of wire representing how far electricity travels in each of those intervals. Programmers should be mindful of how many microseconds they are throwing away.

She foresees "systems of computers", the parallel processing of we have today in the form of multicore CPUs and GPUs and data centers with separation of concerns. Maybe too much of a separation of concerns, as she envisions specialized machines for databases instead of general-purpose computers, but today we use general-purposes computers for those things. We might beef them up with extra memory and network bandwidth, etc. We have truly specialized computers (ASICS) for other things (mining Bitcoin, lol), so she wasn't exactly right but she kinda still had the right idea.

She tells some stories of early computer industry security breaches.

Software costs too much to create and is too hard to maintain -- in 1982, lol. She frames the ripple effect of changes as an expected value problem. The solution is modular software with defined interfaces and named owners. Today that may seem like an intuitive solution, but I've never seen it explained with probability math.
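To make that concrete, here's the kind of expected-value argument I take her to be making (my reconstruction, not from the talk): if a change to one module forces a change in each of its $b$ dependent modules independently with probability $p$, the expected number of modules touched $k$ steps down the dependency chain is $(bp)^k$, so the expected total ripple is

$$\sum_{k=0}^{\infty} (bp)^k = \frac{1}{1 - bp}, \qquad bp < 1,$$

which blows up as $bp$ approaches 1. Defined interfaces with named owners push $p$ down, keeping the expected ripple small.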

I wish I had seen this talk in 1982. My 1982 self would have found it inspirational. (I would've been 11 -- back then the NSA was known as No Such Agency and never would've let me attend a talk). Even watching it now, I found it surprisingly riveting. Grace Hopper deserves her reputation as a computer pioneer. For 1982, she was surprisingly prescient.

NSA releases internal 1982 lecture by computing pioneer Rear Admiral Grace Hopper - The Black Vault Originals

#solidstatelife #computerscience

waynerad@diasp.org

Douglas Crockford says we should quit using JavaScript.

Douglas Crockford is the author of JavaScript: The Good Parts, creator of JSLint, the linter that has become the basis for JSHint, ESLint, etc -- all the JavaScript linters -- and creator of the JSON data interchange format.

"There are lots of terrible mistakes in the way that the web works, in the way our operating systems work, and we can't get new ones. We're just stuck with this crap and they keep piling new features on everything and the new features always create new problems and it doesn't have to be like that. We could be using a really clean operating systems with really clean languages and really clean runtimes and doing all this stuff in a much more reliable way. But we don't seem to want to do that."

I have a theory as to why we never seem to want to do that. But first, a quick explanation of how I came to have some appreciation for what I've come to call "minimalism" -- the idea that in programming languages, more is not better. I've told the story before, so you can skip it if you've already heard it.

Years ago I was working on a C++ project and my boss said, "Hey Wayne, so-and-so has left the company. You're taking over his code and we need you to fix some bugs and add some new features immediately." I went into his code... and couldn't even read it. Like, at all. It was like it was written in C++. I thought I knew C++, but no -- I realized that C++ was such a huge language (the official spec was something like 1,100 pages if memory serves -- it's probably even bigger now) that every developer uses a subset of C++. My subset and his subset had little overlap. This is not a recipe for long-term maintainability.

For long-term maintainability, one of the things you want is what I've come to call "ambiguity reduction". It's important to realize this is about the human reading the code, not the machine. To the machine, everything is always deterministic, no matter how complex. For the human, you want to be able to read a line of code and know exactly what it does. This makes it easy to reason about the program's behavior. That means no hidden "magic" in programming language features. The more of these "magic" features you have to memorize, and the more complex the rules you have to memorize, the more likely you are to face ambiguity when you read lines of code.

Ok, now to my theory as to why "we don't seem to want to do that" -- make really 'clean' languages. New languages are often created by new people entering the industry, who have great enthusiasm for one or another set of ideas. But what they don't know is how a design decision you make today will affect your software 10 or 20 years down the line. For that, you need to have been around for 10 or 20 years, and had that experience. In other words, it's the old people that have that knowledge. The "greybeards". And whatever the equivalent term is for the gals. But because the industry has been expanding exponentially, more or less since its inception, the old people are always outnumbered by the young people. And the young people think "more is better", so they always pile on the features. (Now we have young people with LLMs.)

What do y'all think?

Why we should stop using JavaScript according to Douglas Crockford (inventor of JSON) - Honeypot

#solidstatelife #computerscience #programminglanguages

waynerad@diasp.org

DARPA wants to automate translating all C code to Rust.

DARPA is going to have a "Hybrid Proposers Day" August 26th, 2024, 10am to 2pm, in Arlington, Virginia, for potential contractors to propose solutions to "Translating All C to Rust (TRACTOR)".

"Buffer overflow vulnerabilities and other related 'memory safety' software flaws allow an attacker to inject messages that hijack control of a computer. These vulnerabilities are only possible because programs written in C and C++ don't force their developers to check conditions, such as array bounds or pointer arithmetic, for correctness. Google and Microsoft have estimated that 70% of their security vulnerabilities stem from these and other related memory safety issues. While there are a variety of approaches to mitigate these risks, newer languages, like Rust, can completely eliminate them while preserving efficiency. Unfortunately, significant and expensive manual effort is required to rewrite legacy code into idiomatic Rust."

"After at least two decades of experience applying sophisticated tools towards mitigating memory safety issues in C and C++, the software engineering community has largely concluded that bug finding tools are not sufficient. Rather, the consensus is that it is preferable to use 'safe' programming languages that reject unsafe programs at compile time."

"The TRACTOR program aims to achieve a high degree of automation towards translating legacy C to Rust, with the same quality and style that a skilled Rust developer would employ, thereby permanently eliminating the entire class of memory safety security vulnerabilities present in C programs. Performers might employ novel combinations of software analysis (e.g., static analysis and dynamic analysis), and machine learning techniques (e.g., large language models)."

Translating All C to Rust (TRACTOR)

#solidstatelife #ai #genai #llms #programminglanguages #computerscience

waynerad@diasp.org

"Rust is the fastest-growing programming language, with its developer community doubling in size over the past two years, yet JavaScript remains the most popular language with 25.2 million active developers, according to the results of a recent survey."

"Python has overtaken Java as the second most popular language, driven by the interest in machine learning and AI."

"Meanwhile, the Go language saw its developer population grow by 10% over the last year."

"Objective-C has stagnated for the last two years."

"Swift has seen a small growth over the past 12 months (5%) to 4.6 million developers, which led to it being overtaken by Go."

Rust growing fastest, but JavaScript reigns supreme

#solidstatelife #computerscience #programminglanguages

waynerad@diasp.org

"Neurallambda".

"The Problem: My premise all comes down to 'reasoning', and the lack thereof, in current AI models. I'll provide my working definition of 'reasoning', but for a moment, please bear with a couple examples of reasoning failures."

"Transformer models cannot reason."

"Diffusion models cannot reason."

"AI is currently like a living textbook."

"What is Reasoning? Reasoning is the ability to know true things without having learned them. It is building knowledge/predictions/retrodictions/actions atop principles, instead of evidence."

"What are programs?" "Turing Machines are machines capable of executing programs which can calculate anything calculatable. Your computer is a Turing machine (in the limit of infinite memory and time). I'd also suggest that your conscious mind is Turing Complete."

"Tiers of 'Programming Ability' / 'Reasoning Ability'":

"1. An AI can execute programs"

"2. An AI can verify traits of a program"

"3. An AI can generate novel programs during training"

"4. An AI can generate novel programs post training"

"So far, this library provides an existence proof up to Level 1. It contains code which can execute arbitrary programs, written in a custom lisp dialect, in a fully differentiable setting. (Some tantalizing tests have proven up to Level 3, that AI can learn novel programs to solve toy problems via SGD, but, there are still frontiers of research here)."

By "this library", he is talking about his creation "Neurallambda". What is Neurallambda? It's a dialect of lisp designed to be generated by AI systems but at the same time be human-readable. It also has the important attribute that all the code generated using it is "differentiable". That means the code itself can be incorporated into a stochastic gradient descent model. That's what the "SGD" stands for above. In this form, Neurallambda code can be deterministically translated into tensors, compiled or interpreted, and then the resulting tensors be read back out, and presented in human-readable form.

What do y'all think? Is this a path to computer reasoning?

Neurallambda

#solidstatelife #ai #codellms #computerscience

waynerad@diasp.org

"PostgreSQL and Databricks founders join forces for DBOS to create a new type of operating system"

Funny, I was just watching a video earlier today about how Postgres has expanded from a database system to a complete backend stack.

But here, they're talking about something different.

"Today DBOS announced that it has raised $8.5 million in seed funding as well as the launch of its first product, DBOS Cloud, which provides a new type of cloud-native operating system for cloud application deployment."

"Today, a database is a type of application that runs on top of an operating system, which in the cloud is often Linux. DBOS takes a radically different approach to operating systems by running the operating system on top of a high-performance database."

"Operating system services, such as messages, scheduling and file operations, those are all written in SQL on top of a very high-performance OLTP DBMS [Online Transaction Processing Database Management System]."

"Taking aim at Linux and the Kubernetes container orchestration system and the etcd key value store, they say."

This isn't the first time I've heard of someone saying a database should be at the heart of an operating system. But it's the first time I've heard of anyone making a serious attempt to do it.

PostgreSQL and Databricks founders join forces for DBOS to create a new type of operating system - VentureBeat

#solidstatelife #computerscience #operatingsystems #databases

waynerad@diasp.org

"Mojo vs Rust: is Mojo faster than Rust?"

"Rust was started in 2006 and Swift was started in 2010, and both are primarily built on top of LLVM IR. Mojo started in 2022 and builds on MLIR (Multi-Level Intermediate Representation), which is a more modern 'next generation' compiler stack than the LLVM IR approach that Rust uses. There is a history here: our CEO Chris Lattner started LLVM in college in Dec 2000 and learned a lot from its evolution and development over the years. He then led the development of MLIR at Google to support their TPU and other AI accelerator projects, taking that learning from LLVM IR to build the next step forward: described in this talk from 2019."

"Mojo is the first programming language to take advantage of all the advances in MLIR, both to produce more optimized CPU code generation, but also to support GPUs and other accelerators, and to also have much faster compile times than Rust. This is an advantage that no other language currently provides, and it's why a lot of AI and compiler nerds are excited about Mojo. They can build their fancy abstractions for exotic hardware, while us mere mortals can take advantage of them with Pythonic syntax."

The article goes on to describe Mojo's native support for SIMD, which stands for "Single Instruction, Multiple Data" and refers to special instructions that have been part of CPUs for a long time but are hard to use.
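To see why SIMD has a reputation for being hard to use, here's what a four-lane float add looks like with raw x86 SSE intrinsics in Rust (my illustration; Mojo's pitch is that its first-class SIMD type hides this kind of thing):

```rust
// One SSE instruction adds all four lanes at once, but the raw
// intrinsics are unsafe and architecture-specific.
#[cfg(target_arch = "x86_64")]
fn add_four(a: &[f32; 4], b: &[f32; 4]) -> [f32; 4] {
    use std::arch::x86_64::{_mm_add_ps, _mm_loadu_ps, _mm_storeu_ps};
    let mut out = [0.0f32; 4];
    unsafe {
        let va = _mm_loadu_ps(a.as_ptr());
        let vb = _mm_loadu_ps(b.as_ptr());
        _mm_storeu_ps(out.as_mut_ptr(), _mm_add_ps(va, vb));
    }
    out
}

#[cfg(target_arch = "x86_64")]
fn main() {
    println!("{:?}", add_four(&[1.0, 2.0, 3.0, 4.0], &[10.0, 20.0, 30.0, 40.0]));
}

#[cfg(not(target_arch = "x86_64"))]
fn main() {}
```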

Mojo frees memory at the last use of an object, instead of waiting until the object goes out of scope -- a subtle difference that matters a lot in AI, "where freeing an object early can mean deallocating a GPU tensor earlier, therefore fitting a larger model in GPU RAM." It's also advantageous in a type of optimization called tail call optimization, which applies to recursive functions.
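In Rust terms (a sketch of my own, with invented names), the difference looks like this: Rust's borrows can end at last use, but memory is only reclaimed at end of scope unless you drop it explicitly, which is what Mojo-style last-use destruction would do for you automatically:

```rust
// Stand-in for a large device allocation.
struct GpuTensor {
    bytes: Vec<u8>,
}

fn main() {
    let activations = GpuTensor { bytes: vec![0; 1 << 20] };
    let checksum: usize = activations.bytes.iter().map(|&b| b as usize).sum();
    // ^ last use of `activations`.

    // In Rust the buffer stays allocated until end of scope; dropping by
    // hand gets the "free at last use" behavior Mojo provides automatically.
    drop(activations);

    // The reclaimed memory is available while later work runs.
    let workspace = GpuTensor { bytes: vec![1; 1 << 20] };
    println!("checksum={checksum}, workspace={}", workspace.bytes.len());
}
```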

Mojo vs Rust: is Mojo faster than Rust?

#solidstatelife #ai #computerscience #programminglanguages #python #mojo #rust

waynerad@diasp.org

"Flying Carpet: Send and receive files between Android, iOS, Linux, macOS, and Windows over ad hoc WiFi. No shared network or cell connection required, just two devices with WiFi chips in close range."

"Don't have a flash drive? Don't have access to a wireless network? Need to move a file larger than 2GB between different filesystems but don't want to set up a network share? Try it out!"

Interestingly, if you scroll down, you'll find this app was ported from Go to Rust, because of problems with the Go version. Turns out this wasn't a good use case for Go.

"There were several issues I didn't know how to solve in the Go/Qt paradigm, especially with Windows: not being able to make a single-file executable, needing to Run as Administrator, and having to write the WiFi Direct DLL to a temp folder and link to it at runtime because Go doesn't work with MSVC. Plus it was fun to use tokio/async and windows-rs, with which the Windows networking portions are written. The GUI framework is now Tauri which gives a native experience on all platforms with a very small footprint. The Android version is written in Kotlin and the iOS version in Swift."

spieglt / FlyingCarpet

#solidstatelife #computerscience