#postgres
Creative ways to optimize for cloud: offsite backups edition
Backblaze, like many providers, charges not only for storage but also for API usage. In my case that meant I was spending 3-4x more on API usage than on storage.
I spent some time debugging and determined that the culprit was #wal-g, the software I use for streaming #postgres backups: it was making 20K+ ListObjects calls per day.
So I configured Postgres to back up to a bucket on my local #S3 server (#Minio), and then configured Minio to sync that bucket to #backblaze. That incurs exactly zero charged API calls, and the problem is solved.
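For reference, the Postgres side of that setup is just WAL archiving; a minimal sketch, assuming wal-g is installed on the database host and its S3 settings (WALG_S3_PREFIX and AWS_ENDPOINT pointing at the local Minio bucket) are set in the server's environment:

alter system set archive_mode = 'on';                    -- needs a server restart to take effect
alter system set archive_command = 'wal-g wal-push %p';  -- ships each WAL segment to the bucket
select pg_reload_conf();

Base backups are pushed separately on a schedule with wal-g backup-push; only the WAL shipping is wired into Postgres itself.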
What I thought was the problem (#kopia, which I use for filesystem backups) is actually quite efficient with respect to charged API calls.
New users to Postgres think vacuuming the database is weird. It's not some vestigial organ that stuck around for no reason; it's intrinsic to how Postgres stores data. This is a great presentation from SCALE on how Postgres works with data. #postgres #scale21x #programming https://youtu.be/6P8JmdRJI8s?si=Ug9TaY-hNcL4JO6a
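To see what vacuum is actually cleaning up, a quick query against the statistics views lists the tables with the most dead row versions, i.e. the old copies of updated and deleted rows that MVCC keeps around until vacuum reclaims them:

-- tables with the most dead tuples, plus when autovacuum last visited them
select relname, n_live_tup, n_dead_tup, last_autovacuum
from pg_stat_user_tables
order by n_dead_tup desc
limit 10;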
Catching up on what's new in my favorite RDBMS, Postgres, from this year's SCALE conference. #postgres #scale21x
A look at the Elephant's trunk - PostgreSQL 17
I created an iSCSI volume on my #MyCloud #NAS to hold the #postgres data that the pod uses. iSCSI is fairly fast (85 MB/s writes), and more importantly the data isn't stored locally on any of the #k3s nodes, so the server can move to any node and still access the volume. That means I can drain a node, upgrade it, and the downtime is limited to a few seconds.
Basically I’m creating a redundant array of inexpensive compute (I’m running on three $150 mini PCs) to go with my redundant array of inexpensive disks. If a single node or disk fails, the other nodes/disks will pick up the load until the bad unit is replaced.
Now running #postgres 16.3 on the pod. FYI, Postgres 12 is EOL as of November 2024.
https://www.postgresql.org/about/news/postgresql-163-157-1412-1315-and-1219-released-2858/
I use a single #Postgres instance in my #k8s clusters, with separate users, passwords, and databases in that instance for each application. It's simpler to manage, and my databases don't benefit from being spread across nodes (they're small and place few demands on the system). So the same instance hosts the databases for synapse and diaspora, for example.
Today I discovered that objects in the public schema of each database (which is where they're created by default) are accessible to all users of that database unless access is specifically revoked. So the system wasn't as secure as I thought it was.
You can change this behavior with REVOKE ALL ON DATABASE <name> FROM PUBLIC. Then only users that have been granted access (or the owner) will have access. You can also create objects in non-public schemas, but that can be challenging.
One more thing to automate.
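A minimal sketch of what that automation could run per application, with a hypothetical app name (myapp) and a placeholder password:

create role myapp login password 'changeme';  -- placeholder name and password
create database myapp owner myapp;
revoke all on database myapp from public;     -- drops the default CONNECT/TEMP grants, so other apps' roles can't connect
-- then, connected to the new database, optionally lock down the public schema too
-- (servers before Postgres 15 grant CREATE on it to every role):
revoke create on schema public from public;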
Upgraded the pod to #postgres 16.2 and it worked perfectly, because #Postgresql always works perfectly.
I started messing about with #postgres FTS in #Diaspora.
First, create a search column:
alter table posts add column fts_text_en tsvector generated always as (to_tsvector('english', text)) stored;
Then, create a GIN index on the new column:
create index posts_fts_text_en_idx on posts using gin(fts_text_en);
Next, query the column:
select id from posts where fts_text_en @@ websearch_to_tsquery('english', 'Linux');
Fast, simple. Prolly should integrate it into the search bar.
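A rough sketch of what the search-bar query could look like with relevance ranking added (the search term and limit are placeholders):

select id, ts_rank(fts_text_en, q) as rank
from posts, websearch_to_tsquery('english', 'Linux') as q
where fts_text_en @@ q
order by rank desc
limit 25;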
I guess it’s just #postgres then.
https://medium.com/@imashadowphantom/mariadb-com-is-dead-long-live-mariadb-org-b8a0ca50a637