Let me show you what I mean by giving an example:
# Assume we have this list of increasing numbers
140
141
145
180
190
...
# If we pick 150 as threshold, the output should consist of 145 and all larger values:
145
180
190
...
# In the edge case where 150 itself is in the list, the output should start with 150:
150
180
190
...
I guess one can always hack something together in awk with one or two track-keeper variables and a bit of control logic. However, is there a nicer way to do this, using some nifty combination of simpler filters or awk features?
One way would be to search for the line number n of the first entry larger than the threshold, and then print all lines starting with n-1. What command would be best suited for that?
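For illustration, the "find n, then print from n-1" idea could be sketched roughly like this. It assumes the list is in a regular file (it is read twice) with one number per line; the function name from_floor_file is made up for this example.

```shell
# Hypothetical helper: from_floor_file THRESHOLD FILE
from_floor_file() {
  threshold=$1
  file=$2
  # n = line number of the first entry strictly greater than the threshold
  # (stays empty when no entry exceeds it).
  n=$(awk -v t="$threshold" '$1 + 0 > t + 0 { print NR; exit }' "$file")
  if [ -z "$n" ]; then
    tail -n 1 "$file"               # nothing is larger: keep only the last line
  elif [ "$n" -le 1 ]; then
    cat "$file"                     # everything is larger: keep all lines
  else
    tail -n +"$((n - 1))" "$file"   # print from line n-1 to the end
  fi
}
```

Called as `from_floor_file 150 numbers.txt` (the file name is a placeholder), this should print the last entry not exceeding 150 followed by everything after it. The two special cases are a guess at what is wanted when no line, or every line, exceeds the threshold.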
Still, I’m also wondering: Can awk or any other standard tool do something like “for deciding whether to print this line, look at the next line”?
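One common way to get the effect of "looking at the next line" is to turn it around: hold the current line back and decide about it once the next one has been read. Here is a minimal sketch of that idea in awk, assuming one number per line on standard input; the wrapper name from_floor and the variable names are only for illustration.

```shell
# Hypothetical helper: from_floor THRESHOLD   (reads the sorted list from stdin)
from_floor() {
  awk -v t="$1" '
    found           { print; next }   # already past the threshold: pass through
    $1 + 0 > t + 0  {                 # first entry strictly greater than t
                      if (NR > 1) print prev
                      print
                      found = 1
                      next
                    }
                    { prev = $0 }     # still at or below t: remember this line
    END             { if (!found && NR > 0) print prev }
  '
}
```

For example, `printf '140\n141\n145\n180\n190\n' | from_floor 150` should print 145, 180 and 190. This is admittedly still a "track-keeper variable" solution, but it works on a stream in a single pass and needs only one line of buffering.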
(In my use case, the list is short, so performance is not an issue, but maybe let’s pretend that it were. Also, in my use case, all entries are unique, but feel free to work without this assumption.)
Algorithmically, I think you’d want to use a binary search algorithm to find the index of the threshold value, or the index of where it should be. Since the value at that index is either the threshold or the first value greater than the threshold, you can check the value at that index and, if not equal to the threshold, subtract one from the index. Then it’s a matter of making a subslice of the list of values starting at that index until the end of the list.
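As a rough sketch of those steps, here is one way it might look in awk, since that is the tool mentioned in the question. It assumes the sorted list arrives on standard input with one number per line and buffers all of it in an array before searching, so reading the input is still linear; the name from_floor_bsearch is invented for this example.

```shell
# Hypothetical helper: from_floor_bsearch THRESHOLD   (reads the sorted list from stdin)
from_floor_bsearch() {
  awk -v t="$1" '
    { line[NR] = $0; val[NR] = $1 + 0 }   # buffer the whole (sorted) input
    END {
      t += 0
      lo = 1; hi = NR + 1                 # search range for the insertion point
      while (lo < hi) {
        mid = int((lo + hi) / 2)
        if (val[mid] < t) lo = mid + 1; else hi = mid
      }
      # lo is now the index of t, or of where t would be inserted.
      start = (lo <= NR && val[lo] == t) ? lo : lo - 1
      if (start < 1) start = 1            # every value is above the threshold
      for (i = start; i <= NR; i++) print line[i]
    }
  '
}
```

With the list from the question, `from_floor_bsearch 150` should start the output at 145, or at 150 if that value is present. If the threshold occurs more than once, this variant starts at its first occurrence.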
The binary search doesn’t really pay off here, because printing the values above the threshold takes O(n) time anyway.