Let me show you what I mean by giving an example:

# Assume we have this list of increasing numbers
140
141
145
180
190
...

# If we pick 150 as the threshold, the output should consist of 145 and all larger values:
145
180
190
...

# In the edge case where 150 itself is in the list, the output should start with 150:
150
180
190
...

I guess one can always hack something together in awk with one or two bookkeeping variables and a bit of control logic. However, is there a nicer way to do this, using some nifty combination of simpler filters or awk features?

One way would be to find the line number n of the first entry larger than the threshold, and then print all lines starting with line n-1. What command would be best suited for that?
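One possible combination for the line-number idea, sketched here under assumptions rather than offered as the best command: awk finds n and exits early, and tail prints from line n-1. It assumes a sorted list with one number per line in a regular file, because the file is read twice. The function name from_line_before_first_larger is made up for illustration.

```shell
# Sketch: print the last entry <= threshold and everything after it.
# Assumes sorted numbers, one per line, in a regular file (read twice).
# The function name is hypothetical.
from_line_before_first_larger() {
    threshold=$1
    file=$2
    # Line number of the first entry strictly larger than the threshold,
    # or empty if there is none; "+ 0" forces a numeric comparison.
    n=$(awk -v t="$threshold" '$1 + 0 > t + 0 { print NR; exit }' "$file")
    if [ -z "$n" ]; then
        # Nothing is larger: only the last (largest) entry qualifies.
        tail -n 1 "$file"
    elif [ "$n" -le 1 ]; then
        # Even the first entry is larger: print everything.
        cat "$file"
    else
        # Start one line before the first larger entry.
        tail -n "+$((n - 1))" "$file"
    fi
}
```

Since the list is sorted, line n-1 is always the last entry less than or equal to the threshold, so the same call also covers the edge case where 150 itself is in the list.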

Still, I’m also wondering: Can awk or any other standard tool do something like “for deciding whether to print this line, look at the next line”?
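For what it's worth, awk can peek at the next line explicitly: getline var reads the next input line into var and returns 1, or 0 at end of input. A minimal sketch, assuming sorted numbers, one per line, on stdin; the function name lookahead_from_threshold is made up for illustration.

```shell
# Sketch: explicit look-ahead in awk. Assumes sorted numbers, one per line,
# on stdin; the function name lookahead_from_threshold is hypothetical.
lookahead_from_threshold() {
    awk -v t="$1" '
        {
            # This action runs for the first line only, because the loop
            # below consumes the rest of the input itself.
            cur = $0
            # "getline nxt" stores the next line in nxt and returns 1,
            # or 0 at end of input.
            while ((getline nxt) > 0) {
                # Decide about the current line by looking at the next one.
                if (nxt + 0 > t + 0) print cur
                cur = nxt
            }
            # The last line has no successor, so it is always printed.
            print cur
        }'
}
```

For example, printf '%s\n' 140 141 145 180 190 | lookahead_from_threshold 150 prints 145, 180 and 190.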

(In my use case, the list is short, so performance is not an issue, but maybe let’s pretend that it were. Also, in my use case, all entries are unique, but feel free to work without this assumption.)

  • jutty

    For the simpler case where the threshold itself is in the list (print it and everything larger), with the list already sorted, you could use sed:

    echo $list | sed -n '/^150$/,$p'
    

    The edge case is tricky because “equal or lower” can’t be expressed cleanly with a regex, and even an awk solution would look kinda convoluted, so I personally prefer a for loop for readability’s sake:

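    head='' tail=''                   # start empty even if set in the environment
    for i in $list; do                # unquoted on purpose: relies on word splitting
        if [ "$i" -le "$threshold" ]; then
            head="$i"                 # overwritten until the last value <= threshold
        else
            tail="$tail\n$i"          # every value > threshold, as \n-separated text
        fi
    done
    
    printf '%b\n' "$head$tail"        # %b turns the literal \n sequences into newlines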
    
      • @loveknight@programming.dev (OP)

      For the first code snippet to run correctly (at least in sh or bash), $list would need to be put in double quotes: echo "$list" | ... , because otherwise the shell splits the unquoted $list into words and echo prints them all on a single line.
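A small demonstration of the difference, assuming a POSIX-style shell such as sh, bash or dash with the default IFS (zsh behaves differently by default); the sample list is made up for illustration.

```shell
# Hypothetical sample list with one number per line.
list=$(printf '%s\n' 140 150 180 190)

# Unquoted: the shell splits $list into four words and echo prints them
# on one line ("140 150 180 190"), so /^150$/ never matches and sed
# prints nothing.
echo $list | sed -n '/^150$/,$p'

# Quoted: the newlines are preserved, so sed prints 150, 180 and 190.
echo "$list" | sed -n '/^150$/,$p'
```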

      The for loop approach is indeed quite readable. To make it solve the original task (which here means that it should also assign a number just smaller than $threshold to $tail, if $threshold is not itself contained in $list), one will have to do something in the spirit of what @Ephera@lemmy.ml and I describe in these comments.

      • jutty

        The quoting oversight was because I tested the first one only in zsh, which, unlike sh and bash, does not split unquoted parameter expansions into words by default.

        The second was tested on Busybox ash and dash against the input in the example. It does assign the number just smaller than or equal to the threshold, because head is overwritten on each iteration until it lands on the last value that is less than or equal to the threshold.

        • @loveknight@programming.dev (OP)

          Ah, that’s good to know about zsh.

          Sorry about the second code block; it does indeed work as intended, and quite elegantly.

  • Ephera

    Still, I’m also wondering: Can awk or any other standard tool do something like “for deciding whether to print this line, look at the next line”?

    I don’t know enough about awk to think this through to the end, but a trick from functional programming, which you might be able to apply here, is to ‘zip’ the list with itself, offset by one.

    So, it might then look like this, for example:

    140,141
    141,145
    145,180
    180,190
    190,
    

    Of course, in Bash you would probably need to implement the zipping yourself, so it might not save you much effort here…
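That said, the standard tail and paste utilities can do the zipping: tail -n +2 drops the first line, and paste -d, joins each line of the file with the corresponding line of that shifted copy, leaving the second field empty on the last line. A sketch, assuming the list is in a regular file (it is read twice); both function names are made up for illustration.

```shell
# Sketch: zip a sorted list with itself, offset by one line, then decide
# about each line by looking at its successor. Assumes one number per line
# in a regular file, since the file is read twice. Both function names are
# hypothetical.

# Print "current,next" for every line; the last line gets an empty "next"
# field because paste treats the shorter (shifted) input as if it were
# padded with empty lines.
zip_with_next() {
    tail -n +2 "$1" | paste -d, "$1" -
}

# Keep a line if it is the last one or its successor exceeds the threshold.
from_threshold_zipped() {
    zip_with_next "$2" |
        awk -F, -v t="$1" '$2 == "" || $2 + 0 > t + 0 { print $1 }'
}
```

On the example list, zip_with_next produces exactly the pairs shown above, ending in "190,", and from_threshold_zipped 150 on that file prints 145, 180 and 190.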

    • @loveknight@programming.dev (OP)

      Thank you! In fact, I ended up doing something that’s mathematically pretty much just that: I have the previous line stored in an auxiliary variable lastline, and it is the evaluation of the current line $0 that determines whether the previous line gets printed.

      awk -v threshold=150 'BEGIN {lastline=""}
        (lastline!="" && threshold<$0){print lastline} #the additional check lastline!="" prevents an empty line at the very beginning
        {lastline=$0}
        END{print} #hardcode printing of the very last line, because otherwise it would never be printed
      ' 
      

      Of note, in the case where some list entries are repeated, the behavior of this script will be:

      • The threshold value, if it’s in the list, will always be printed just once, even if it occurs multiple times in the list, and also if it happens to be the first, last, or only entry in the list.
      • All larger entries will be printed exactly as often as they occur in the list. This even holds for the largest value: its last repetition will be printed via the final END{print} statement, whereas all preceding instances get printed through the statement that depends on threshold<$0.

      (IIRC, it was a StackOverflow post that led me to this.)

  • @shadow53@programming.dev

    Algorithmically, I think you’d want to use a binary search to find the index of the threshold value, or the index where it would be inserted. Since the value at that index is either the threshold or the first value greater than the threshold, you can check the value at that index and, if it is not equal to the threshold (or the index is past the end of the list), subtract one from the index, unless it is already the first index. Then it’s a matter of taking a subslice of the list from that index to the end.
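A sketch of that index logic, written in awk so it stays within standard tools. It assumes sorted numbers, one per line, on stdin, and the function name from_threshold_bsearch is made up for illustration. Note that awk still has to read every line into an array first, so this only illustrates the lower-bound search and the step back by one; it does not make the whole pipeline faster than a linear scan.

```shell
# Sketch: lower-bound binary search, then step back one position unless
# the threshold itself was found. Hypothetical helper name; assumes
# sorted numeric input, one value per line, on stdin.
from_threshold_bsearch() {
    awk -v t="$1" '
        { line[NR] = $0; val[NR] = $1 + 0 }
        END {
            t += 0
            # Find the first index lo with val[lo] >= t (NR + 1 if none).
            lo = 1; hi = NR + 1
            while (lo < hi) {
                mid = int((lo + hi) / 2)
                if (val[mid] < t) lo = mid + 1
                else hi = mid
            }
            # Not an exact match (or past the end): step back one,
            # but never before the first line.
            if (lo > NR || val[lo] != t) lo--
            if (lo < 1) lo = 1
            for (i = lo; i <= NR; i++) print line[i]
        }'
}
```

Unlike the lastline script above, this version prints a repeated threshold value as often as it occurs, because the lower bound lands on its first occurrence.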

    • @Gurfaild@feddit.org

      The binary search doesn’t make sense, because printing the values above the threshold takes O(n) time anyway.