My current jq project: create a Diaspora post-abstracter
Given the lack of a search utility on Diaspora*, my evolved strategy has been to create an index or curation of posts, generally with a short entry for each consisting of the title, a brief summary (usually the first paragraph), the date, and the URL.
I'd like to group these by time segment, say, by month, quarter, or year (probably quarter/year).
And as I'm writing this, I'm thinking that it might be handy to indicate some measure of interactions --- comments, reshares, likes, etc.
My tools for developing this would be my Diaspora* profile data extract and `jq`, the JSON query tool.
It's possible to do some basic extraction and conversion pretty easily. Going from there to a more polished output is ... more complicated.
A typical original post might look like this (excluding the `subscribed_pods_uris` array):
{
  "entity_type": "status_message",
  "entity_data": {
    "author": "dredmorbius@joindiaspora.com",
    "guid": "cc046b1e71fb043d",
    "created_at": "2012-05-17T19:33:50Z",
    "public": true,
    "text": "Hey everyone, I'm #NewHere. I'm interested in #debian and #linux, among other things. Thanks for the invite, Atanas Entchev!\r\n\r\nYet another G+ refuge.",
    "photos": []
  }
}
Key points here are:
- `entity_type`: Values "status_message" or "reshare".
- `author`: This is the user_id of the author, yours truly (in this case in my DiasporaCom incarnation).
- `guid`: Can be used to construct a URL in the form of `https://<hostname>/posts/<guid>`.
- `created_at`: The original posting date, in UTC ("Zulu" time).
- `public`: Status, values `true`, `false`. Also apparently missing in a significant number of posts.
- `text`: The post text itself.
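As a minimal sketch of the `guid` and `created_at` points above, assuming jq 1.5 or later: the function below takes one post object of the shape shown and prints its URL plus a trimmed timestamp. The name `post_link`, the `POD_HOST` variable, and the hostname `diaspora.example` are placeholders of mine, not anything from the export.

```shell
# Sketch: derive a post URL and a short UTC timestamp from one post object.
# Assumptions: POD_HOST and "diaspora.example" are hypothetical placeholders.
post_link() {
  jq -r --arg host "${POD_HOST:-diaspora.example}" '
    .entity_data
    # Turn "2012-05-17T19:33:50Z" into "2012-05-17 19:33" by plain string slicing.
    | "https://\($host)/posts/\(.guid) (\(.created_at | sub("T"; " ") | .[0:16]))"
  ' "$@"
}
```

Slicing the string keeps the time in UTC; converting to local time would need jq's date functions instead.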
A reshare looks like:
{
  "entity_type": "reshare",
  "entity_data": {
    "author": "dredmorbius@joindiaspora.com",
    "guid": "5bfac2041ff20567",
    "created_at": "2013-12-15T12:45:08Z",
    "root_author": "willhill@joindiaspora.com",
    "root_guid": "53e457fd80e73bca"
  }
}
Again, excluding `.subscribed_pods_uris`. In most cases, reshares are of less interest than direct posts.
Interestingly, I've a pretty even split between posts and reshares (52% `status_message`, that is, post).
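A split like that can be tallied with something along these lines. This is only a sketch, assuming the posts live under `.user.posts` as in the extract command further down; `type_counts` is a name I made up.

```shell
# Sketch: count posts per entity_type, with a rounded-down percentage.
# Assumption: the export keeps its posts in the .user.posts array.
type_counts() {
  jq -r '
    [ .user.posts[].entity_type ]
    | length as $total
    | group_by(.)
    | .[]
    | "\(.[0]): \(length) (\(length * 100 / $total | floor)%)"
  ' "$@"
}
```

It reads the export from a file argument or from standard input, and prints one line per entity type.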
My theory in creating an abstract is:
- Automation is good.
- It's easier to peel stuff off an automatically-created abstract than to add bits back in manually.
- The compilation should contain only public posts and exclude reshares.
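The last point might be sketched with `select`, as below. One assumption to flag loudly: posts where `public` is missing are treated as not public here, which may or may not be the right call given how many posts lack the field. The function name `public_posts` is hypothetical.

```shell
# Sketch: keep only original (status_message) posts that are explicitly public.
# Assumption: a missing "public" field is treated the same as false.
public_posts() {
  jq -c '
    .user.posts[]
    | select(.entity_type == "status_message")
    | .entity_data
    | select(.public == true)
  ' "$@"
}
```

Each surviving `entity_data` object comes out as one compact JSON line, which makes it easy to eyeball or to pipe into a further `jq` call.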
Issues:
- It's relatively easy to create a basic extract:
jq '.user.posts[].entity_data | .author, .guid, .created_at, .text'
Adding in selection and formatting logic gets ... more complicated.
Among other factors, `jq` is a very quirky language.
Desired Output Format
I would like to produce output which renders something like this for any given post:
Diaspora Tips: Pods, Hashtags & Following
For the many Google Plus refugees showing up on Diaspora and Pluspora, some pointers: ...
https://diaspora.glasswings.com/posts/a53ac360ae53013611b60218b786018b (2018-10-10 00:45)
What if any options are there for running Federated social networking tools on or through #OpenWRT or related router systems on a single-user or household basis?
I'm trying to coordinate and gather information for #googleplus (and other) users looking to migrate to Fediverse platforms, and I'm aware that OpenWRT, #Turris (I have a #TurrisOmnia), and several other router platforms can run services, mostly #NextCloud that I'm aware. ...
https://diaspora.glasswings.com/posts/91f54380af58013612800218b786018b (2018-10-11 07:52)
The original posts can of course be viewed at the URLs shown.
What this is doing is:
- Extracting the first line of the post text itself.
- Stripping all formatting from it.
- Bolding the result by surrounding it in `**` Markdown.
- Including the second paragraph, terminating it in an ellipsis `...`.
- Including a generated URL, based on the GUID, and here parked on Glasswings. (I might also create links to Archive.Today and Archive.Org of the original content.)
- Including the post date, with time in YYYY-MM-DD hh:mm resolution.
Including the month and year where those change might also be useful for creating archives.
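The steps listed above could be roughed out as follows. This is a sketch under several assumptions, not a finished formatter: it needs jq 1.5 or later for the regex functions, it treats every non-blank line as a paragraph, its idea of "stripping formatting" is limited to leading heading marks, asterisks, and backticks, and `abstract_posts`, `POD_HOST`, and `diaspora.example` are placeholder names.

```shell
# Sketch: render public original posts as "bold title / summary / URL (date)".
# Assumptions: jq 1.5+; POD_HOST and "diaspora.example" are hypothetical.
abstract_posts() {
  jq -r --arg host "${POD_HOST:-diaspora.example}" '
    # Split text into non-blank lines, tolerating CRLF and repeated breaks.
    def paragraphs: (. // "") | gsub("\r"; "") | split("\n") | map(select(test("\\S")));
    # Crude formatting strip: leading heading marks, asterisks, backticks.
    def plain: sub("^#+\\s+"; "") | gsub("[*`]"; "");

    .user.posts[]
    | select(.entity_type == "status_message")
    | .entity_data
    | select(.public == true)
    | (.text | paragraphs) as $p
    | [ "**\($p[0] // "" | plain)**",
        # Only add a summary line when a second paragraph exists.
        (if ($p | length) > 1 then $p[1] + " ..." else empty end),
        "https://\($host)/posts/\(.guid) (\(.created_at | sub("T"; " ") | .[0:16]))"
      ]
    | join("\n") + "\n"
  ' "$@"
}
```

Posts with a single line of text simply come out without a summary line, and hashtags in the title are left alone because only a `#` run followed by whitespace is treated as a heading mark.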
Specific questions / challenges:
- How to conditionally export only public posts.
- How to conditionally export only `status_message` (that is, original) posts, rather than reshares.
- How to create lagged "oldYear" and "oldMonth" variables.
- How to conditionally output content when computed Month and Year values > oldMonth and oldYear respectively. Goal is to create `## .year` and `### .month` segments in output.
- How to output up to two paragraphs, where posts may consist of fewer than two separate text lines, and lines may be separated by multiple or only single linefeeds `\r\n`.
- Collect and output hashtags used in the post.
- Include counts of comments, reshares, likes, etc. I'm not even sure this is included in the JSON output.
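For the lagged-variable questions, one possible approach is jq's `foreach`, which carries state from one item to the next. The sketch below assumes jq 1.5 or later and that `created_at` is always an ISO 8601 UTC string, so that sorting the strings sorts the posts by date. `dated_index` is a made-up name, and the month heading is printed as `YYYY-MM` rather than a month name.

```shell
# Sketch: emit "## year" and "### year-month" headings whenever they change.
# Assumptions: jq 1.5+ (for foreach); created_at is always ISO 8601 UTC.
dated_index() {
  jq -r '
    [ .user.posts[] | select(.entity_type == "status_message") | .entity_data ]
    | sort_by(.created_at)
    | foreach .[] as $post (
        {year: null, month: null};
        # Carry the previous values forward as the lagged oldYear and oldMonth.
        { oldYear: .year, oldMonth: .month,
          year: $post.created_at[0:4], month: $post.created_at[0:7],
          post: $post };
        (if .year  != .oldYear  then "## \(.year)"   else empty end),
        (if .month != .oldMonth then "### \(.month)" else empty end),
        "\(.post.created_at[0:10]) \(.post.guid)"
      )
  ' "$@"
}
```

Comparing for inequality rather than "greater than" is a deliberate simplification: once the posts are sorted, any change in year or month is a step forward.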
There might be more, but that's a good start.
And of course, if I have to invoke other tools for part of the formatting, that's an option, though an all-in-jq solution would be handy.
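On the hashtag question, jq alone seems to be enough. A rough sketch, assuming jq 1.5 or later for `scan`, and assuming a hashtag is simply `#` followed by word characters (so a URL fragment such as `page#section` would be picked up as well); `post_hashtags` is a hypothetical name.

```shell
# Sketch: list the distinct hashtags found in each post that has text.
# Assumption: a hashtag is "#" followed by one or more word characters.
post_hashtags() {
  jq -r '
    .user.posts[].entity_data
    | select(.text != null)
    | "\(.guid): \([.text | scan("#\\w+")] | unique | join(" "))"
  ' "$@"
}
```

Reshares drop out on their own here, since they carry no `text` field.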