#discovfefe

dredmorbius@joindiaspora.com

So, how big is all of Google+ Communities anyway?

Not how many communities (8 million and a skosh). Not how many actual users (a few tens of millions). But posts and text.

(I'm skipping the boring stuff like images, which blow things up a lot, but we can get to that later if you'd like.)

Suppose someone walked up to you and asked if you'd like a full copy of the Google+ Communities post archive. A few thoughts might occur to you, one being "where would I put that?"

I mean, Google, GOOGLE SCALE. Hyuuge!! Right?

Maybe ... not. At least by modern hardware standards.

So, back in mid-December I sampled 36,000 randomly-selected G+ communities and got some information on them -- name, description, member count. And the ten most recent posts, along with the elapsed date range between the newest and oldest posts. So what does that give us?

First off, only 4,465 of the 36,000 communities had a full ten posts in their history. The others either never had more than 9 posts submitted, or had them purged (user deletion, spam, other actions, stuff). I'll treat any community with fewer than 10 posts as effectively zero, to simplify things.

The elapsed time gives me a post rate, which is how many posts are submitted over any given time interval. Weeks become handy, and the rate is about 1 post/wk, on average (1.0046/wk, if you want to be precise).
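For the curious, here's a minimal sketch of how a per-community rate could fall out of that data. It isn't necessarily the exact calculation behind the 1.0046 figure: the function name is made up for illustration, the example inputs are hypothetical rather than taken from the sample, and I'm assuming the rate is simply post count divided by elapsed weeks (counting the nine gaps between ten posts instead would give a slightly lower number).

```python
def posts_per_week(post_count: int, elapsed_days: float) -> float:
    """Posts per week, given a post count and the days spanned by those posts.

    Assumption: rate = posts / elapsed weeks. The original calculation
    may have been done differently.
    """
    if elapsed_days <= 0:
        raise ValueError("elapsed_days must be positive")
    return post_count / (elapsed_days / 7)


# Hypothetical community: its ten most recent posts span 70 days.
example_rate = posts_per_week(10, 70)
print(f"{example_rate:.2f} posts/wk")
```

A community whose last ten posts all landed within a single day would blow the rate sky-high, so any real version of this probably wants a floor on the elapsed time.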

My 36k sample represents about 1 of every 223 actual communities, so we can multiply 4,465 by 223 and get ... about a million (996,000).

Gee, about a million, and about a post a week, so a million posts a week? Yeah, but let's be precise:

You have: 996000 * 1.0046
You want:
        Definition: 1000581.6

Yeah, that's pretty much a million posts/wk.
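If you'd rather check that in Python than in units, the same scaling looks roughly like this. The numbers are the ones given above; the variable names are just mine.

```python
SAMPLED_COMMUNITIES = 36_000   # communities in the random sample
ACTIVE_IN_SAMPLE = 4_465       # sampled communities with a full ten posts
SCALE_FACTOR = 223             # sample is about 1 of every 223 communities
POSTS_PER_WEEK_EACH = 1.0046   # average post rate per active community

# Sanity check: the sample should scale back up to "8 million and a skosh".
total_communities = SAMPLED_COMMUNITIES * SCALE_FACTOR   # 8,028,000

active_exact = ACTIVE_IN_SAMPLE * SCALE_FACTOR           # 995,695
active_rounded = round(active_exact, -3)                 # 996,000, as used above

weekly_posts = active_rounded * POSTS_PER_WEEK_EACH      # about 1,000,581.6

print(f"{total_communities:,} communities, {active_rounded:,} active")
print(f"{weekly_posts:,.1f} posts/wk")
```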

G+ Communities launched in December, 2012, and will be shut down in April, 2019. I'll simplify again and take Jan 1 2013 - December 31 2018, or six years of 52 weeks. That's 312 weeks, which at a million posts a week comes to 312 million posts.

But how long is a post?

I'm punting here (though ... actually I do have some data, come to think): I took a quick look at #Discovfefe, I mean, Google+ Discover. The random few posts I grabbed tend to run about 20-40 words, with some of the longer ones weighing in at 100-450 words. Mostly, though, they're about Tweet-sized, which probably reflects a number of factors. Call it 250 bytes.

312 million posts * 250 bytes: 78 GB.

Yes: "Google-scale" source text for Google+ Communities is probably under 100 GB total storage.
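The whole chain from weeks to gigabytes fits in a few lines of Python. This is just the arithmetic above restated, with the same simplifications (a flat million posts a week, 250 bytes a post, decimal gigabytes); the names are mine.

```python
WEEKS = 6 * 52                 # Jan 1 2013 - Dec 31 2018, simplified
POSTS_PER_WEEK = 1_000_000     # rounded from the estimate above
BYTES_PER_POST = 250           # "about Tweet sized"

total_posts = WEEKS * POSTS_PER_WEEK        # 312,000,000
text_bytes = total_posts * BYTES_PER_POST   # 78,000,000,000
text_gb = text_bytes / 1e9                  # 78.0 (decimal GB)

print(f"{total_posts:,} posts, {text_gb:.0f} GB of text")
```

Double the bytes-per-post guess and you're still only at 156 GB, which is the point: the text fits on a laptop drive either way.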

Caveats, civet cats, and all that jazz

The short-cuts I took above probably overstate the storage estimate -- Communities are created over time, they did not all exist for the entire life of G+ communities, and posting rates may have changed. The "elapsed" interval also doesn't count communities which have stopped generating new posts.

The page weight is considerably higher. After you add in all the HTML, CSS, and JavaScript, a G+ post is about 800 kB of data, and additional image assets are more. Across 312 million posts, that balloons the archive size out to about 250 TB or so.
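A quick sketch of where the 250 TB comes from, assuming every one of the 312 million posts is saved as its own 800 kB page (decimal units again, and my own variable names):

```python
TOTAL_POSTS = 312_000_000   # from the six-year estimate
PAGE_BYTES = 800_000        # about 800 kB of HTML, CSS, and JavaScript per post
TEXT_BYTES = 250            # the bare text estimate, for comparison

archive_tb = TOTAL_POSTS * PAGE_BYTES / 1e12   # 249.6 TB
overhead = PAGE_BYTES // TEXT_BYTES            # page is 3,200x the text

print(f"{archive_tb:.1f} TB as full pages, {overhead:,}x the raw text")
```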

And it's likely that a fair amount of data was submitted but has been removed -- spam, deleted accounts, and the like.

And even if you're only counting text, there's some post-level metadata: the author, date, communityID, and related bits, which pad out the data requirements slightly. These numbers are rough. But the purpose is to give an approximate sense of the scale.

And photos. About 30% of posts seem to have an image attached. G+ has a max size I'd need to look up -- 2120 x 1192, apparently. But at 312 million posts you're looking at about 93 million images or so, roughly. Some can be quite large. This beauty (and it is pretty) is 7.4 MB, at about 4k x 6k pixels, 24 MP raw. And this shot of Tulsi Gabbard as a link hero is about 1060x600 pixels.

SanDisk have one of the better image size / storage capacity references I've seen. In 12MP format (3k x 4k pixels), 1 GB can hold about 238 images, and 128 GB over 30,000.
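To put a very rough ceiling on the photos, here's a sketch that pretends every attached image is a 12MP camera file at the SanDisk rate. That's my assumption, not a measurement, and it almost certainly overshoots, since most G+ images are far smaller than 12MP.

```python
TOTAL_POSTS = 312_000_000
IMAGE_SHARE_PERCENT = 30     # about 30% of posts seem to carry an image
IMAGES_PER_GB_12MP = 238     # SanDisk's figure for 12MP images

# Integer arithmetic keeps the count exact.
image_count = TOTAL_POSTS * IMAGE_SHARE_PERCENT // 100   # 93,600,000

# Assumption: every image is a full 12MP file. Treat this as a loose upper bound.
image_tb_at_12mp = image_count / IMAGES_PER_GB_12MP / 1000

# Cross-check of the "128 GB, over 30,000" claim.
images_on_128gb = IMAGES_PER_GB_12MP * 128               # 30,464

print(f"{image_count:,} images, under {image_tb_at_12mp:.0f} TB at 12MP each")
```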

Put it on my bill, said the duck

How much did Google pay for all of this?

No, really: How much did Google pay for all of this?

Because one aspect of this whole fiasco I kind of don't begrudge is that a handful of us got a fairly nice playground for a while. That it's getting shuttered wasn't a huge surprise. How it's being shuttered .... Well, I've written about that elsewhere.

Now, if only I knew someone who might be able to tell me the answer....

#google #socialMedia #capacityPlanning #gPlusRefugees