Say you have a list of increasing numbers and a threshold. How do you get the highest number smaller-or-equal to the threshold and all numbers that are larger?

@loveknight@programming.dev · edit-2 9 months ago

Ephera · 9 months ago

Still, I’m also wondering: Can awk or any other standard tool do something like “for deciding whether to print this line, look at the next line”?

I don’t know enough about awk to think this through to the end, but a trick from functional programming, which you might be able to apply here, is to ‘zip’ the list with itself, offset by one.

So, it might then look like this, for example:

Of course, in Bash you would probably need to implement the zipping yourself, so it might not save you much effort here…
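
For what it's worth, here is a rough sketch of hand-rolling that zip with awk in a plain shell. The sample numbers, the function name `zip_with_next`, and the `-` placeholder for "no next line" are all made up for illustration. Each line gets paired with its successor, and a second awk call then decides per pair whether to keep the first element.

```shell
# Hypothetical sample input: an increasing list, one number per line.
numbers='100
120
150
200
250'

# "Zip" the list with itself, offset by one: each output line pairs an
# entry with its successor. The last entry has no successor, so it is
# paired with the placeholder "-".
zip_with_next() {
  awk 'NR > 1 { print prev, $0 }
       { prev = $0 }
       END { if (NR > 0) print prev, "-" }'
}

# Keep the first element of a pair when its successor exceeds the
# threshold, or when there is no successor at all.
pairs=$(printf '%s\n' "$numbers" | zip_with_next)
result=$(printf '%s\n' "$pairs" |
  awk -v threshold=150 '$2 == "-" || $2 + 0 > threshold + 0 { print $1 }')
printf '%s\n' "$result"
```

With this input the pairs are `100 120`, `120 150`, `150 200`, `200 250` and `250 -`, so the filter keeps 150, 200 and 250.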

@loveknight@programming.dev · edit-2 9 months ago

Thank you, in fact I ended up doing something that’s mathematically pretty much just that: I have the previous line stored in an auxiliary variable lastline, and it is the evaluation of the current line $0 that determines whether the previous line gets printed.

awk -v threshold=150 'BEGIN {lastline=""}
  (lastline!="" && threshold<$0){print lastline} #the additional check lastline!="" prevents an empty line at the very beginning
  {lastline=$0}
  END{print} #hardcode printing of the very last line, because otherwise it would never be printed
'
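
A slightly more defensive variant, as a sketch only: `END{print}` relies on awk still holding the last record in `$0` inside the END block, which POSIX specifies but some older awk implementations reportedly do not honour, and it prints a blank line when the input is empty. Printing the stored `lastline` and checking `NR` avoids both. The wrapper name `from_threshold` and the sample numbers are hypothetical.

```shell
# Hypothetical wrapper; reads one number per line on stdin,
# takes the threshold as its first argument.
from_threshold() {
  awk -v threshold="$1" '
    # Print the previous line once the current line exceeds the threshold.
    NR > 1 && $0 + 0 > threshold + 0 { print lastline }
    { lastline = $0 }
    # Print the stored last line, but only if there was any input.
    END { if (NR > 0) print lastline }
  '
}

result=$(printf '%s\n' 100 120 150 200 250 | from_threshold 150)
printf '%s\n' "$result"

# Empty input produces no output at all.
empty=$(printf '' | from_threshold 150)
```

Testing `NR > 1` instead of `lastline!=""` also means an empty first line in the input would not be mistaken for "no previous line yet".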

Of note, in the case where some list entries are repeated, the behavior of this script will be:

The highest value that does not exceed the threshold (the threshold value itself, if it’s in the list) will always be printed just once, even if it occurs multiple times in the list, and also if it happens to be the first, last, or only entry in the list.
All larger entries will be printed exactly as often as they occur in the list. This even holds for the largest value: its last repetition will be printed via the final END{print} statement, whereas all preceding instances get printed through the statement that depends on threshold<$0.
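
The behavior with repeated entries can be checked with a quick run. This is just a sketch: the script is the one from above, wrapped in a hypothetical helper `print_from_threshold`, and the input list is made up.

```shell
# The script from the post, wrapped in a hypothetical helper function.
print_from_threshold() {
  awk -v threshold=150 'BEGIN {lastline=""}
    (lastline!="" && threshold<$0){print lastline}
    {lastline=$0}
    END{print}
  '
}

# Made-up input: the threshold value 150 occurs three times,
# the largest value 200 occurs twice.
out=$(printf '%s\n' 100 150 150 150 200 200 | print_from_threshold)
printf '%s\n' "$out"
```

Here 150 comes out once and 200 twice: the first 200 is printed when the second 200 is read, and the second one by the END block.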

(IIRC, it was a StackOverflow post that led me to this.)