#awk

jojan@wk3.org

I am going to teach some colleagues to use the shell. Starting with ls, cd, mkdir et al to navigate, and creating files and directories. Then some more advanced tools like grep and find, as well as a bit of sed and awk. Ending with more advanced shell scripting. I will teach using bash, as it is the default on many systems today, but thought I would mention some others and how they differ.

The idea now is to have five sessions of two hours each. I will try to make a proper outline this weekend. This is roughly what I have today:

  • Session 1: Navigating in Shell, Unix file system, creating files, pipes (|, < and > and perhaps 2>&1 et al)
  • Session 2: grep, find, sed, tr
  • Session 3: aliases and .bashrc
  • Session 4: scripts
  • Session 5: more scripts?

Do you have any suggestion of what I should mention? Do you think this outline looks all right?

#bash #sed #awk #gnu #linux #unix #terminal

dredmorbius@joindiaspora.com

COVID-19: A Laycat's US Outbreak Model

This is a non-expert's simple extrapolation of the past 11 days' COVID-19 experience within the US, projecting both further likely spread of the COVID-19 outbreak and the possible actual extent of infected individuals based on a presumed testing lag.

As with my earlier China extrapolation: The real message here is how quickly experience deviates below the projection here, suggesting containment efforts are effective. In the case of China, that began about two weeks after my initial post. I am a space alien cat on the Internet, not an expert.

I've probably fucked up all kinds of things. Cluebats welcomed.

How this model works

I'm using a simple exponential growth formula, projecting the expected number of cases (and deaths) forward from the 5 March 2020 case and death counts, with a growth rate based on what appears to be native community spread within the US from 20 February 2020 through 5 March (the period of visible community spread). This is a short window, though one showing rapid growth.

It is overwhelmingly evident that the US does NOT have a solid handle on monitoring, and likely won't for at least another week, possibly several. This makes both the data presented and the model based on them more uncertain, and means that as monitoring improves, apparent case counts will likely increase rapidly. Again, this reflects experience in China.

Virus behaviour, population behaviour, public health measures, weather changes, sunspots, and timelords could all change things markedly.

Exponential growth function

The formula for exponential growth is:

y(t) = a * e^(k * t)

See: https://www.mathsisfun.com/algebra/exponential-growth.html

Where:

  • y(t): quantity at time t
  • a: initial quantity
  • e: the base of the natural logarithm, about 2.7183
  • k: the growth rate per period
  • t: the number of periods

"Period" here is "days".

We can solve for k:

k = ln(y(t)/a)/t

This gives us the growth rate given two measurements t periods apart.

We can solve for t:

t = ln(y(t)/a)/k

In particular, setting y(t) = 2 and a = 1 and solving for t gives the doubling time, t = ln(2)/k.

I've written a simple gawk script which computes k and the doubling time, and also projects the weekly (7 day) and fortnightly (14 day) growth rates.
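For illustration, here's a minimal gawk sketch of just that calculation. It is not the script linked under "Source Code" below, and the counts and day span in it are placeholders to edit:

```awk
#!/usr/bin/gawk -f
# Sketch only: compute growth rate, doubling time, and 7-/14-day factors
# from two case counts. Values below are placeholders -- edit to taste.
BEGIN {
    a    = 14      # initial case count (placeholder)
    yt   = 175     # later case count (placeholder)
    days = 14      # days between the two counts (placeholder)

    k         = log(yt / a) / days   # growth rate per day
    daily     = exp(k)               # daily multiplication factor
    doubling  = log(2) / k           # doubling time, in days
    weekly    = exp(7 * k)           # 7-day growth factor
    fortnight = exp(14 * k)          # 14-day growth factor

    printf("k %.4f, daily %.3fx, doubling %.2f days, 7-day %.2fx, 14-day %.2fx\n",
           k, daily, doubling, weekly, fortnight)
}
```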

Detection lag

A huge problem within the US is that confirmed cases are lagging actual infection dates by a substantial amount. How long that is is ... not entirely clear, though I'm going to assume a 14 day (two week) lag based on:

  • Initial infection is followed by a non-symptomatic period of about a week on average.
  • Seeking medical assistance has seen a further lag of several days in getting an appointment / performing a test.
  • Test results themselves take 4 days based on information I've seen.

The total lag is about 2 weeks.

I'd suggested that this could lead to as much as a 100-fold understatement of actual cases. Based on current data, that seems pessimistic: it's "only" about 47x greater than the published confirmed cases count -- a number that's moved around considerably, by the way, so don't put too much faith in that either. But it gives an indication.

We also get a doubling time of about 2.2 days, which means that however bad the situation is now, it's going to be twice as bad in a little over 48 hours. When you hear statements that the situation is "rapidly evolving" this is what is being referenced. Things are changing very quickly. Locations which may have low risk today may have a high risk in a day or two.

You should be finalising preparations and supplies runs about now, if not already.

Again: non-expert extrapolation based on early data, a simple model, and many uncertainties. I expect we'll likely see numbers following this trend, if not overshooting it, for a week or two, mostly as monitoring catches up to reality. I'm very much hoping we'll start to see low-side numbers about two weeks out (18-22 March), as containment efforts begin to be effective. The caveat is that I don't see effective containment measures being enacted, certainly not on the scale that China performed starting ~22 January. In which case the projection here could well fit actual experience for longer.

As before, I'm posting this as a line in the sand of what my projection was. I hope and expect to be proved wrong on this within a couple of weeks. I'm dying to see how well this matches reality.

The professionals are apparently doing this as well

Dr. Messonnier of the CDC mentioned on 5 March in an NPR interview that there were numerous groups doing epidemic modelling to try to estimate the actual spread of SARS-CoV-2 within the US, though she pointedly refused to give any numbers herself. I have yet to find any published projections, but would be interested in seeing any.

The script

Hardcoded in (edit to modify) are the initial and current case counts. You'll need to supply the number of days between these measures as well. Data are taken from Wikipedia's 2020 Coronavirus Outbreak in the United States article.

The script calculates the growth rate, with an arbitrary high and low bound (basically assuming one day more or less error in the reported range -- it's kind of weak sauce but gives some idea of sensitivity), the doubling time, the weekly growth rate, and the 14-day growth rate.

It then produces two reports, one every day for 29 days, the other every seven days for 200 days. Both cut off if the infected population exceeds the total US population, given as 330.4 million. Shown are projected deaths, cases, cases at a low or high growth rate, and, as "w/ 14 day lag", the possible ground truth of total cases from which confirmed cases are drawn. I'll note that this presently exceeds 10,000 cases, and ... doubles every 2.2 days or so. A rate which will hit 1,000,000 by 18 March.
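A sketch of how such a projection loop might look (again, this is not the linked script; the starting counts and growth factor below are simply the assumption values shown in the output further down):

```awk
#!/usr/bin/gawk -f
# Sketch of a daily projection loop using the assumption values shown
# in the output below; the real script is linked under "Source Code".
BEGIN {
    cases  = 175          # confirmed cases on day 0
    deaths = 11           # deaths on day 0
    daily  = 1.316        # daily growth factor
    lag14  = daily ^ 14   # 14-day factor, for the "w/ 14d lag" column
    uspop  = 330.4e6      # cutoff: total US population

    for (day = 1; day <= 29 && cases * daily <= uspop; day++) {
        cases  *= daily
        deaths *= daily
        printf("%2d  deaths %8d  cases %10d  w/ 14d lag %12d\n",
               day, deaths, cases, cases * lag14)
    }
}
```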

By April 25, if present rates continue, the entire US is infected. At the WHO's 3.4% fatality rate, 11.2 million die, and given economic modelling, your retirement fund is trash.

(And then the disease may return in the fall....)

For the rest of the world, you can substitute in values for that outbreak for a similar model. (I've got a separate script for this.) As values are hardcoded, it's a tad inflexible.

Program Output

Minor reformatting aside, this is the output as it currently stands.

COVID-19 US Outbreak Model

Assumptions:
- init cases (2020-4-26): 14
- cases (2020-3-5): 175
- deaths (2020-3-5): 11
- daily growth rate: 1.316
- doubling time (days): 2.195
- 7 day growth: 6.83x
- 14 day growth/mon. lag: 46.59x

Daily report:

| day | date | deaths | cases | @ lo dbl | @ hi dbl | w/ 14d lag |
|----:|------|-------:|------:|---------:|---------:|-----------:|
| 1 | Mar 06, 2020 | 14 | 230 | 224 | 238 | 10,726 |
| 2 | Mar 07, 2020 | 19 | 302 | 287 | 324 | 14,113 |
| 3 | Mar 08, 2020 | 25 | 398 | 367 | 440 | 18,569 |
| 4 | Mar 09, 2020 | 32 | 524 | 470 | 600 | 24,431 |
| 5 | Mar 10, 2020 | 43 | 689 | 602 | 816 | 32,145 |
| 6 | Mar 11, 2020 | 57 | 907 | 771 | 1,111 | 42,294 |
| 7 | Mar 12, 2020 | 75 | 1,194 | 988 | 1,512 | 55,647 |
| 8 | Mar 13, 2020 | 98 | 1,571 | 1,266 | 2,057 | 73,216 |
| 9 | Mar 14, 2020 | 129 | 2,067 | 1,621 | 2,800 | 96,331 |
| 10 | Mar 15, 2020 | 171 | 2,720 | 2,076 | 3,811 | 126,744 |
| 11 | Mar 16, 2020 | 224 | 3,579 | 2,659 | 5,186 | 166,760 |
| 12 | Mar 17, 2020 | 296 | 4,709 | 3,405 | 7,057 | 219,409 |
| 13 | Mar 18, 2020 | 389 | 6,196 | 4,360 | 9,603 | 288,680 |
| 14 | Mar 19, 2020 | 512 | 8,152 | 5,584 | 13,068 | 379,821 |
| 15 | Mar 20, 2020 | 674 | 10,726 | 7,151 | 17,784 | 499,736 |
| 16 | Mar 21, 2020 | 887 | 14,113 | 9,159 | 24,201 | 657,511 |
| 17 | Mar 22, 2020 | 1,167 | 18,569 | 11,729 | 32,933 | 865,098 |
| 18 | Mar 23, 2020 | 1,535 | 24,431 | 15,021 | 44,816 | 1,138,224 |
| 19 | Mar 24, 2020 | 2,020 | 32,145 | 19,236 | 60,987 | 1,497,580 |
| 20 | Mar 25, 2020 | 2,658 | 42,294 | 24,635 | 82,992 | 1,970,390 |
| 21 | Mar 26, 2020 | 3,497 | 55,647 | 31,548 | 112,938 | 2,592,474 |
| 22 | Mar 27, 2020 | 4,602 | 73,216 | 40,402 | 153,688 | 3,410,959 |
| 23 | Mar 28, 2020 | 6,055 | 96,331 | 51,740 | 209,142 | 4,487,854 |
| 24 | Mar 29, 2020 | 7,966 | 126,744 | 66,261 | 284,604 | 5,904,742 |
| 25 | Mar 30, 2020 | 10,482 | 166,760 | 84,856 | 387,295 | 7,768,965 |
| 26 | Mar 31, 2020 | 13,791 | 219,409 | 108,670 | 527,038 | 10,221,752 |
| 27 | Apr 01, 2020 | 18,145 | 288,680 | 139,167 | 717,203 | 13,448,923 |
| 28 | Apr 02, 2020 | 23,874 | 379,821 | 178,222 | 975,983 | 17,694,965 |
| 29 | Apr 03, 2020 | 31,412 | 499,736 | 228,238 | 1,328,136 | 23,281,550 |

Weekly report:

| day | date | deaths | cases | @ lo dbl | @ hi dbl | w/ 14d lag |
|----:|------|-------:|------:|---------:|---------:|-----------:|
| 1 | Mar 06, 2020 | 14 | 230 | 224 | 238 | 10,726 |
| 8 | Mar 13, 2020 | 98 | 1,571 | 1,266 | 2,057 | 73,216 |
| 15 | Mar 20, 2020 | 674 | 10,726 | 7,151 | 17,784 | 499,736 |
| 22 | Mar 27, 2020 | 4,602 | 73,216 | 40,402 | 153,688 | 3,410,959 |
| 29 | Apr 03, 2020 | 31,412 | 499,736 | 228,238 | 1,328,136 | 23,281,550 |
| 36 | Apr 10, 2020 | 214,403 | 3,410,959 | 1,289,346 | 11,477,413 | 158,908,518 |
| 43 | Apr 17, 2020 | 1,463,411 | 23,281,550 | 7,283,681 | 99,184,812 | 1,084,632,112 |
| 50 | Apr 24, 2020 | 9,988,535 | 158,908,518 | 41,146,424 | 857,129,291 | 7,403,170,243 |

Source Code

https://pastebin.com/raw/Sn2jrG5f

Please note any observed errors / corrections.

#coronavirus #covid-19 #covid19 #ncov2019 #epidemiology #epidemics #exponentialGrowth #IHopeIAmWrong #awk

dredmorbius@joindiaspora.com

Stupid Awk text-processing tricks: Reframe your record and field delimiters

TL;DR: sometimes changing record / field separators can be exceptionally useful.

I've been wrestling with document conversions, from PDF, of what's really a set of structured data.[1] The tools for actually getting text out of PDFs have ... improved markedly over the years. The Poppler library's tools in particular.

But you've still got to manage the output. And what I'm getting has semantic columns, spaces, indents, text, unicode, lions, tigers, bears... All structured within multi-paged documents.

Awk's default processing model is to read a line of input at a time, and break that into fields based on whitespace.
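For instance, with the defaults untouched, a one-liner like this prints each input line's field count and first whitespace-separated field (the filename is just a placeholder):

```awk
# Default behaviour: one record per line, fields split on whitespace.
gawk '{ print NF, $1 }' extracted.txt
```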

But ... you're not limited to this.

There are a set of arguments and internal variables which can change all of this, as well as some ... surprisingly useful functions. The gawk(1) manpage and the GNU Awk User's Guide are especially helpful.

Most useful to me are the RS and FS variables, and the split(s, a [, r [, seps]]) function.

RS defines the record separator. By default, that's a newline, but if what you're working with is more sensibly thought of as a page of data, well, you can set it to "\f", that is, the form-feed character (hex 0x0C, octal 014).

FS defines the field separator, a space (" ") by default (hex 0x20, octal 040). Here, it's more sensible to set it to "\n", so that each line of the page becomes an individual field.

Simply by setting these two values, suddenly I'm reading a full page of text at a time, automatically splitting that into fields consisting of a single complete line each, and setting useful values such as NF ("number of fields"), now "number of lines on the page".

If you've ever found yourself wanting to scroll backwards and forwards through a record ... well, now you can.
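Here's a minimal sketch of that setup, assuming form-feed-separated pages as described above:

```awk
#!/usr/bin/gawk -f
# Sketch: treat each form-feed-delimited page as one record,
# and each line on the page as one field.
BEGIN {
    RS = "\f"    # records are now pages
    FS = "\n"    # fields are now lines
}
{
    # NR is now the page number; NF is the number of lines on this page.
    printf("page %d: %d lines, first line: %s\n", NR, NF, $1)
}
```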

The split() function was the next realisation I had. In the LCSH file, "columns" are separated by, some testing confirms, two or more space characters. More or less.

(There are all manner of special cases in the data, but getting the basic structure set ... helps a lot.)

The arguments to split(s, a [, r [, seps]]) are:

  • s: the source string. Here, an input line from my raw text file.
  • a: the results array. This is conveniently cleared (as is seps, hold tight) when invoked.
  • r: the field separator regular expression. Since I'm looking for two or more spaces, " {2}" works for me here.
  • seps: another array, this time consisting of the separators between the fields. Also cleared as is a.
  • return value: the number of fields extracted.

The square brackets mean that some of those arguments are optional -- if r isn't supplied, FS is used as the separator, and if seps is omitted, the separators themselves are discarded.
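Put together, a sketch of what that looks like in practice (the " {2}" separator is the one described above; everything else is illustrative):

```awk
#!/usr/bin/gawk -f
# Sketch: page per record, line per field, then split each line into
# columns on the two-space separator, keeping the gaps in seps[].
BEGIN { RS = "\f"; FS = "\n" }
{
    for (i = 1; i <= NF; i++) {
        ncols = split($i, cols, " {2}", seps)
        printf("page %d, line %d: %d columns\n", NR, i, ncols)
        for (c = 1; c <= ncols; c++)
            printf("  col %d: [%s] (gap after: %d chars)\n",
                   c, cols[c], length(seps[c]))
    }
}
```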

So suddenly I've got the means to access a page of lines which I can split into columns and keep track of the gaps between them, as well as counts of pages, lines, columns, gaps, lengths of columns and gaps, and All That Other Jazz which makes figuring out the Pieces to the Puzzle possible.

Since the entire LCSH collection is about 760,000 entries, getting a script to do this is Much Easier (and faster, and more replicable) than Trying to Do This by Hand.

I suspect this isn't an especially well-hidden secret, but I'd been finding lots of nothing looking for ways of rescoping the text-extraction problem. Reframing the data as "lines in page" rather than "text on lines" makes the concepts, and opportunities, of working with the data vastly more tractable. Often the trick to solving a problem is one of framing it the right way, and that's exactly what I'm able to do here.

I suspect the notion could be expanded to the point of inhaling complete files in a single fell swoop, which is a frequently-applied method in Perl. I don't need to do that yet, but should I need to ... the option seems to exist: set RS to a character that never occurs in the input, such as the EOT character (decimal 4, hex 0x04, octal 004). Which I may yet play with.
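A sketch of that whole-file variant, assuming the input never actually contains that byte:

```awk
#!/usr/bin/gawk -f
# Sketch: slurp the whole input as a single record by picking a record
# separator assumed never to occur in it (EOT, octal 004, here).
BEGIN { RS = "\004" }
{
    printf("read %d characters in one gulp\n", length($0))
}
```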


Notes:

  1. Library of Congress Classification and Subject Headings definitions. Freely available ... as PDFs. See <https://www.loc.gov/aba/publications/FreeLCC/freelcc.html> and <https://www.loc.gov/aba/publications/FreeLCSH/freelcsh.html>.

#LCC #LoCCS #LCSH #LoC #Libraries #Classifications #Ontologies #awk #gawk #TextExtraction