Viewing 30 posts - 1 through 30 (of 30 total)
  • Computer Question – awk, sed, grep or anything.
  • jcromton
    Free Member

    Hi, it’s a very simple problem I’d like to solve. I know this is probably not the best place to ask, but I’m not ready to sign up to a new forum yet.

    I have a list of numbers:
    1
    1
    …
    1
    2
    2
    …
    2
    …
    n
    n
    …
    n

    and for each number n I would like to replace its first occurrence with 1, its last occurrence with 3 and all occurrences in between with 2, like this:

    1
    2
    2
    …
    2
    3
    1
    2
    2
    …
    2
    3
    …
    1
    2
    2
    …
    3

    n is around 300 and the list is 16,000 lines long, so it’s not massive, but it may well get bigger in the future.

    I suspect ‘awk’ would be able to perform this task? This would be my preferred method before I try to use MATLAB.

    Thanks for reading, and thanks in advance of any help or suggestions.

    Chris
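For illustration, a minimal awk sketch of the kind of thing being asked for, wrapped in a shell function. It assumes one number per line, that all occurrences of a number are contiguous (the OP confirms this further down) and that blank lines should be ignored; the function name label_runs is made up. Output lags input by one line, because a line is only known to be the last of its run once the next value turns out to be different.

```shell
#!/bin/sh
# label_runs (hypothetical name): for each run of identical consecutive
# numbers, print 1 for the first line, 3 for the last line and 2 for every
# line in between. Assumes runs never overlap; blank lines are skipped.
label_runs() {
    awk '
        NF == 0 { next }                 # skip blank lines
        {
            if (started) {
                if ($1 != prev) {
                    print 3              # previous line ended its run
                    label = 1
                } else {
                    print label          # previous line started or continued its run
                    label = 2
                }
            } else {
                started = 1
                label = 1                # very first line starts a run
            }
            prev = $1
        }
        END { if (started) print 3 }     # the final line always ends a run
    ' "$@"
}
```

Used as label_runs numbers.txt > labels.txt it writes the labels to a new file. It only keeps the previous value in memory, so a much longer list should not be a problem.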

    allthepies
    Free Member

    perl FTW

    jimmyjames
    Free Member

    This would be trivial in Excel. Can advise if you have that.

    geoffj
    Full Member

    Choose your text-wrangling tool of choice, or use Excel.
    In Excel, load the numbers into column A with an index/key in column B. A couple of simple formulae will have you on your way.

    brassneck
    Full Member

    awk is great for search and replace but you’re going to need a tasty regex to sort that out, well beyond me I’m afraid.

    Does it have to be done in batch? Wondering if dumping it to Excel might be easier.

    TheBrick
    Free Member

    My first attempt. I can’t remember how to edit a specified line in sed, but you can see what I’m getting at. NOT TESTED!!!

    #!/bin/sh
    file=infile.txt

    # initialise to the largest possible line number (wc -l < file prints just the count)
    firstoccurence=`wc -l < "$file"`
    lastoccurence=0

    # grep -n prints each match as x:line, where x is the line number (prefixed
    # with $file: when several files are searched), so we strip out the line
    # number and find the first and last occurrence.

    for i in `grep -in 'n' "$file"`
    do
        # remove front bit (only there if grep printed the file name)
        tmp=${i#$file:}
        # remove back bit
        tmp=${tmp%%:*}

        if [ $tmp -lt $firstoccurence ]
        then
            firstoccurence=$tmp
        fi

        if [ $tmp -gt $lastoccurence ]
        then
            lastoccurence=$tmp
        fi
    done

    # replace first occurrence with 1
    # (some sed or awk command for line $firstoccurence)

    # replace last occurrence with 3
    # (some sed or awk command for line $lastoccurence)

    # replace all middle occurrences with 2
    sed -i 's/n/2/g' "$file"

    exit 0

    TheBrick
    Free Member

    Worked out the missing sed commands from the post above.

    #replacing first occurrence

    sed "${firstoccurence}s/.*/1/" "$file" > tmp.txt && mv tmp.txt "$file"

    and

    #replacing last occurrence

    sed "${lastoccurence}s/.*/3/" "$file" > tmp.txt && mv tmp.txt "$file"

    so stitch it together and check.

    p.s.

    There should be back quotes around the wc -l $file command above and around the grep command in the for loop, but they are not showing for some reason.
    e.g.
    firstoccurence=[backquote]wc -l $file[backquote]

    TheBrick
    Free Member

    Just remembered: you will have to add a course of action for when there is only one occurrence of n in the file, and hence

    [ $firstoccurence -eq $lastoccurence ]

    That case can be a simple find-and-replace with sed, to whatever value you want n to become.
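A minimal sketch of this literal-n approach with the single-occurrence case handled. It uses $(...), the POSIX form of command substitution, which does the same job as back quotes and can be nested. Assumptions: at most one n per line, and a lone n is replaced with a value passed as the second argument; the function name replace_n and that argument are hypothetical, since the post leaves the value open. Output goes to stdout rather than editing the file in place.

```shell
#!/bin/sh
# replace_n FILE [LONE] (hypothetical name and parameter): on the first line
# containing n, replace it with 1; on the last, with 3; every other n becomes 2.
# If n occurs only once it becomes LONE instead (defaults to 1).
replace_n() {
    file=$1
    lone=${2:-1}
    # $(...) is command substitution, equivalent to back quotes
    first=$(grep -n 'n' "$file" | head -n 1 | cut -d: -f1)
    last=$(grep -n 'n' "$file" | tail -n 1 | cut -d: -f1)

    if [ -z "$first" ]; then
        cat "$file"                       # no n at all: nothing to replace
    elif [ "$first" -eq "$last" ]; then
        sed "s/n/$lone/g" "$file"         # single occurrence: plain find-and-replace
    else
        sed -e "${first}s/n/1/" -e "${last}s/n/3/" -e 's/n/2/g' "$file"
    fi
}
```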

    jcromton
    Free Member

    Thanks for the replies. I’ve never really used Excel, and I feel learning this way of doing it would be more beneficial in the long run.

    Wow, thanks TheBrick, that’s an impressive script. Can you confirm the first for loop as being:

    for i in grep '-in 'n' $file'

    I’m not really following how we’re defining n.

    Thanks again for taking the time.

    EDIT: p.s. it’s a lot more complicated doing it this way than I thought!

    Aidy
    Free Member

    So if you’ve got 1 1 1 2 2 2 2 3 3 3 3 3, you should get 1 2 3 1 2 2 3 1 2 2 2 3?

    What happens if there’s only 1 or 2 occurrences of the number?

    tonyd
    Full Member

    I have to run but gave it a quick try, this should work:

    #!/bin/sh
    infile=/tmp/numbers.in
    outfile=/tmp/numbers.out
    # line numbers of the first and last lines containing n
    first=$(grep -n n "$infile" | head -n 1 | cut -d: -f1)
    last=$(grep -n n "$infile" | tail -n 1 | cut -d: -f1)
    # replace first occurrence, then last, then the in-betweeners
    sed -e "${first}s/n/1/" -e "${last}s/n/3/" -e 's/n/2/g' "$infile" > "$outfile"

    Give it a go and let me know?!

    allthepies
    Free Member

    scary shell skilz here 🙂

    TheBrick
    Free Member

    Sorry, I’ve misread your first post slightly, but we can fix that.

    can you confirm the first for loop as being:

    for i in grep '-in 'n' $file'

    Nearly, it’s:

    for i in [backquote]grep -in 'n' $file[backquote]

    where [backquote] is the character in the table on this page under “Command substitution”: http://www.grymoire.com/Unix/Quote.html, also known as the grave accent: http://en.wikipedia.org/wiki/Grave_accent.

    I originally thought you had a file with numbers and the letter “n”, which required the first instance of n to be replaced by 1, the last by 3 and all others by 2. For some reason I was thinking your n was some version of NaN. So my script is useless.

    Let me think about it and I’ll get back to you.

    TheBrick
    Free Member

    Re-reading your first post, I’m unsure what exactly you are trying to do. Your example is not clear to me.

    1
    1

    goes to

    1
    2
    2

    ?

    jcromton
    Free Member

    Aidy hit the nail on the head in his post:

    1 1 1 2 2 2 2 3 3 3 3 3, you should get 1 2 3 1 2 2 3 1 2 2 2 3

    There will never be fewer than 3 occurrences of the same number.

    tonyd, thanks for that code too. Unfortunately it just seems to copy the infile to the outfile without changing anything. I’ll have a good look.

    Thanks again guys

    Aidy
    Free Member

    #!perl

    open FID, "$ARGV[0]" or die "Can't open file";

    my @numbers;
    my %uniq;
    while (<FID>) {
        chomp;
        push @numbers, "2[$_]";
        $uniq{$_}++;
    }

    close FID;

    my $n = "@numbers";

    foreach my $i (keys %uniq) {
        $n =~ s/^(.*?)2\[$i\]/${1}1/;
        $n =~ s/(.*)2\[$i\](.*?)\z/${1}3${2}/;
        $n =~ s/\[$i\]//g;
    }

    print "$_\n" foreach split / /,$n;

    There’s probably a nicer way of doing that.
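A two-pass awk sketch of the same idea, for comparison: the first pass records the line number of each value's first and last occurrence, the second pass prints the labels. Like Aidy's script it does not require a value's occurrences to be adjacent, and a value that occurs only once gets 1. Because it reads the file twice it takes a file name rather than standard input; the function name label_first_last is hypothetical.

```shell
#!/bin/sh
# label_first_last FILE (hypothetical name): print 1 where a value occurs for
# the first time in FILE, 3 where it occurs for the last time and 2 otherwise.
# Occurrences need not be adjacent. Blank lines are skipped.
label_first_last() {
    awk '
        NF == 0 { next }                       # skip blank lines in both passes
        NR == FNR {                            # first pass over the file
            if (!($1 in first)) first[$1] = FNR
            last[$1] = FNR
            next
        }
        {                                      # second pass: label each line
            if (FNR == first[$1])     print 1
            else if (FNR == last[$1]) print 3
            else                      print 2
        }
    ' "$1" "$1"
}
```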

    jcromton
    Free Member

    Hi Aidy, I’ve never used perl before. My open line is:

    open FID, "$numbers.txt" or die "Can't open file";

    where numbers.txt is the file containing the relevant numbers. Is that right?

    EDIT: Congratulations Aidy, you win! (removed $)

    oldnpastit
    Full Member

    Do the groups of numbers overlap?

    i.e. do you ever expect to get input that looks like:

    1 1 2 3 2 1

    If it’s always just “1 1 1 2 2 2 2 3 3 3 4 4 4” then this might work:
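    #!/usr/bin/perl -w

    use strict;

    # Output lags input by one line: we only know a line ended its run
    # once the next line has a different value.
    my ($last, $label);
    while (<>) {
        my $n = $_;
        chomp $n;
        next if $n eq '';            # skip blank lines
        if (!defined $last) {
            $label = 1;              # very first line starts a run
        } elsif ($n != $last) {
            print "3\n";             # previous line was the end of its run
            $label = 1;
        } else {
            print "$label\n";        # previous line started or continued its run
            $label = 2;
        }
        $last = $n;
    }
    print "3\n" if defined $last;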

    EDIT: good interview question 🙂
    EDIT: how do you put hard spaces or tabs into this forum?

    jcromton
    Free Member

    oldnpastit: the groups of numbers will never overlap.

    I have literally not the faintest idea about perl, so any sed/awk scripts would be very useful, but I feel I’ve pushed it far enough already! Also, is there a way of outputting to a new file?

    You’ve been great thank you so much.
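Since the one-pass scripts depend on two properties of the data (groups never overlap, and each number occurs at least 3 times), it may be worth checking them on each new file before labelling. A sketch of such a check in awk; the function name check_runs and the messages are made up for illustration.

```shell
#!/bin/sh
# check_runs (hypothetical name): exit 0 if every value forms a single
# contiguous run of at least 3 lines, otherwise report the first problem on
# stderr and exit 1. Blank lines are skipped.
check_runs() {
    awk '
        function fail(msg) { print msg | "cat 1>&2"; bad = 1 }
        NF == 0 { next }                                  # skip blank lines
        count > 0 && $1 == prev { count++; next }         # current run continues
        {
            if (count > 0 && count < 3) {
                fail("run of " prev " starting at line " start " has only " count " lines"); exit
            }
            if ($1 in done) {
                fail("value " $1 " reappears at line " NR); exit
            }
            if (count > 0) done[prev] = 1                 # previous run is finished
            prev = $1; start = NR; count = 1
        }
        END {
            if (!bad && count > 0 && count < 3)
                fail("run of " prev " starting at line " start " has only " count " lines")
            exit bad
        }
    ' "$@"
}
```

Because it exits with status 0 only when both properties hold, it can gate the labelling step, e.g. check_runs numbers.txt && perl perlscript.pl numbers.txt > output.txt.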

    Aidy
    Free Member

    You run it as “perl perlscript.pl filename.txt”

    And it prints output to stdout, so to write a new file use “perl perlscript.pl filename.txt > output.txt”

    Aidy
    Free Member

    And I didn’t realise the groups of numbers don’t overlap. That’s an easier problem.

    tonyd
    Full Member

    Oh sorry – I read the OP to mean that you actually had n’s in the file!

    Aidy
    Free Member

    #!perl

    open FID, $ARGV[0] or die "Can't open file";

    my @l = map {my $i = <FID>} 0..1;
    chomp(@l);

    print "1\n";

    while (<FID>) {
        chomp;
        push @l, $_;
        print $l[1] != $l[2] ? 3 : $l[1] != $l[0] ? 1 : 2, "\n";
        shift @l;
    }
    close FID;

    print "3\n";

    Although the post above is probably neater.

    richP
    Full Member

    if you want to use awk then I think that the following will work:
    awk “BEGIN{old=-1000};{nxt=$1};{ if (NR>1) {if (cur!=old) {print 1} else {if (nxt!=cur) {print 3} else {print 2}}};old=cur;cur=nxt};END{print 3}” input.txt > output.txt

    On Linux you should probably change the double quotes to single quotes; otherwise the shell expands $1 itself before awk sees it.

    The above relies on there being no overlaps in the groups of numbers. Blank lines will probably also need to be stripped out beforehand.

    buzz-lightyear
    Free Member

    In Excel…

    Put 9999999 in Column A, Row 1
    Paste your number series in Column A, starting with Row 2.

    In Column B, row 2 enter the formula: =IF(A2=A1, IF(A2=A3,2, IF(A2<>A3,3)),1)

    Drag this all the way down Column B, which copies the formula into each cell and adjusts the row references automatically.

    Column B should now contain the results. Note that Column B row 1 is left empty.

    Did you say 16,000 numbers? Oops, Excel won’t work.
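For comparison, the same neighbour comparison as the formula above can be sketched in awk by reading the whole column into an array first; positions before the first value and after the last play the role of the 9999999 sentinel row and the empty cell below the data. Holding 16,000 values in memory is no problem for awk. The function name label_by_neighbours is hypothetical.

```shell
#!/bin/sh
# label_by_neighbours (hypothetical name): mirror the spreadsheet formula by
# comparing each value with the one before and the one after it.
label_by_neighbours() {
    awk '
        NF > 0 { v[++n] = $1 }                  # load non-blank lines into v[1..n]
        END {
            for (i = 1; i <= n; i++) {
                same_prev = (i > 1 && v[i] == v[i-1])
                same_next = (i < n && v[i] == v[i+1])
                if (!same_prev)     print 1     # like IF(A2=A1, ..., 1)
                else if (same_next) print 2     # like IF(A2=A3, 2, ...)
                else                print 3
            }
        }
    ' "$@"
}
```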

    oldnpastit
    Full Member


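    #include <stdio.h>
    #include <string.h>
    int main(void) {
        char buf[16];
        static char lastbuf[16];
        int label = 0;                          /* label owed for the previous line; 0 = no line yet */
        while (fgets(buf, sizeof(buf), stdin)) {
            buf[strcspn(buf, "\r\n")] = '\0';   /* strip the line ending */
            if (!buf[0])                        /* skip blank lines */
                continue;
            if (label == 0) {
                label = 1;                      /* first line starts a run */
            } else if (strcmp(buf, lastbuf) != 0) {
                printf("3\n");                  /* previous line ended its run */
                label = 1;
            } else {
                printf("%d\n", label);          /* previous line started or continued its run */
                label = 2;
            }
            strcpy(lastbuf, buf);
        }
        if (label != 0)
            printf("3\n");
        return 0;
    }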

    DaveyBoyWonder
    Free Member

    shell = awesome
    perl = awesomer

    jcromton
    Free Member

    Guys, thank you so much for your help on this. I’ll apply these today and hopefully make some worthwhile contributions to Marine Science.

    Chris

    geoffj
    Full Member

    Did you say 16,000 numbers? Oops, Excel won’t work.

    Why not?

    jcromton
    Free Member

    RichP – that works great, apart from the fact that it starts with a 2. Everything else is perfect. See my other thread if you fancy more awk-related banter.
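One possible cause, assuming the list starts with 0 (the thread doesn't say): the BEGIN block sets old=-1000, but on the first line old=cur overwrites that sentinel with cur, which is still uninitialised, and an uninitialised awk variable compares equal to 0. Seeding cur instead lets the sentinel reach old. A sketch of the one-liner with that change, written for a POSIX shell (single quotes) and with blank lines skipped; the function name label_lookahead is hypothetical.

```shell
#!/bin/sh
# label_lookahead (hypothetical name): richP's one-pass idea with the sentinel
# moved to cur. Assumes non-overlapping groups; blank lines are skipped.
label_lookahead() {
    awk 'BEGIN { cur = -1000 }              # sentinel: assumed never to occur in the data
         NF == 0 { next }                   # skip blank lines
         {
             nxt = $1
             if (seen) {                    # cur is a real value: label it
                 if (cur != old)       print 1
                 else if (nxt != cur)  print 3
                 else                  print 2
             }
             old = cur; cur = nxt; seen = 1
         }
         END { if (seen) print 3 }' "$@"
}
```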


The topic ‘Computer Question – awk, sed, grep or anything.’ is closed to new replies.