Computer Question - awk, sed, grep or anything. - Singletrack World Magazine April 7, 2011

This topic has 29 replies, 12 voices, and was last updated 13 years ago by jcromton.

Computer Question – awk, sed, grep or anything.
jcromton
Free Member

Hi, I have a very simple problem I’d like to solve. I know this is probably not the best place to ask, but I’m not ready yet to sign up to a new forum.

I have a list of numbers:
1
1
…
1
2
2
…
2
…
n
n
…
n

and I would like to replace the first occurrence of each number n with 1, the last occurrence of n with 3 and all occurrences of n in between with 2, like this:

1
2
2
…
2
3
1
2
2
…
2
3
…
1
2
2
…
3

n is around 300 and the list is 16,000 lines long, so it’s not massive, but it may well become more massive in the future.

I suspect ‘awk’ would be able to perform this task. It would be my preferred method before I try to use MATLAB.

Thanks for reading, and thanks in advance for any help or suggestions.

Chris

Posted 13 years ago

allthepies
Free Member

perl FTW

Posted 13 years ago

jimmyjames
Free Member

This would be trivial in Excel. Can advise if you have that.

Posted 13 years ago

geoffj
Full Member

Pick your text-wrangling tool of choice, or use Excel.
In Excel, load the numbers into column A with an index/key in column B. A couple of simple formulae will have you on your way.

Posted 13 years ago

brassneck
Full Member

awk is great for search and replace but you’re going to need a tasty regex to sort that out, well beyond me I’m afraid.

Does it have to be done in batch? Wondering if dumping it to Excel might be easier.

Posted 13 years ago

TheBrick
Free Member

my first attempt, can’t remember how to edit a specified line in sed but you can see what I’m getting at. NOT TESTED!!!

file=infile.txt

firstoccurence=wc -l $file
lastoccurence=0

#grep -n prints each match as x:how-n-occurs, where x is the line number (the $file: prefix only appears when several files are searched), so we strip out the line number and find the first and last occurrence.

for i in grep -in 'n' $file
do
#remove front bit
tmp=${i#$file:}
#remove back bit
tmp=${tmp%:*}

if [ $tmp -lt $firstoccurence ]
then
firstoccurence=${tmp%:}
fi

if [ $tmp -gt $lastoccurence ]
then
lastoccurence=${tmp%:}
fi

done

#replace first occurrence with 1
some sed or awk command for line $firstoccurence

#replace last occurrence with 3
some sed or awk command for line $lastoccurence

#replace all middle occurrence with 2
sed -i 's/n/2/g' $file

exit 0

Posted 13 years ago

TheBrick
Free Member

worked out the missing sed commands in above post.

#replacing first occurrence

sed $firstoccurence's/.*/1/' $file>tmp.txt;mv tmp.txt $file

and

#replacing last occurrence

sed $lastoccurence's/.*/3/' $file>tmp.txt;mv tmp.txt $file

so stitch it together and check.

p.s.

there should be back quotes around the wc -l $file command above and the grep command in the for loop but they are not showing for some reason.
e.g
firstoccurence=[backquote]wc -l $file[backquote]
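
For reference, the same command substitution can also be written as $(...) instead of backquotes, which sidesteps the display problem mentioned in the p.s. A minimal sketch, assuming a POSIX shell; the sample file and the variable names total, first and last are made up for this demo and are not part of the script above:

```shell
# Hypothetical sample file, created only for this demonstration
file=$(mktemp)
printf 'a\nn\nb\nn\nc\n' > "$file"

# $(...) runs the command and substitutes its output, just as backquotes do
total=$(wc -l < "$file")    # line count; reading via < keeps the file name out of the output
first=$(grep -n 'n' "$file" | head -n 1 | cut -d: -f1)    # line number of the first match
last=$(grep -n 'n' "$file" | tail -n 1 | cut -d: -f1)     # line number of the last match

echo "lines=$total first=$first last=$last"
rm -f "$file"
```

Unlike backquotes, $(...) nests without extra escaping, which helps once pipelines get longer.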

Posted 13 years ago

TheBrick
Free Member

just remembered you will have to add in a course of action for when there is only one occurrence of n in the file and hence.

$firstoccurence -eq $lastoccurence

which can be a simple find replace with sed of the value you wish n to be.

Posted 13 years ago

jcromton
Free Member

Thanks for the replies, I’ve never really used Excel and learning this way of doing it would be more beneficial in the long run I feel.

Wow, thanks TheBrick, that’s some impressive script, can you confirm the first for loop as being:

for i in grep '-in 'n' $file'

I’m not really following how we’re defining n.

Thanks again for taking the time.

EDIT: p.s. it’s a lot more complicated than I thought doing it this way!

Posted 13 years ago

Aidy
Free Member

So if you’ve got 1 1 1 2 2 2 2 3 3 3 3 3, you should get 1 2 3 1 2 2 3 1 2 2 2 3?

What happens if there’s only 1 or 2 occurrences of the number?

Posted 13 years ago

tonyd
Full Member

I have to run but gave it a quick try, this should work:

#!/bin/sh
infile=/tmp/numbers.in
outfile=/tmp/numbers.out
first=$(grep -in n "$infile" | head -n 1 | cut -d: -f1)
last=$(grep -in n "$infile" | tail -n 1 | cut -d: -f1)
# replace first occurrence, then last, then the in-betweeners
sed -e "${first}s/n/1/" -e "${last}s/n/3/" -e 's/n/2/g' "$infile" > "$outfile"

Give it a go and let me know?!

Posted 13 years ago

allthepies
Free Member

scary shell skilz here 🙂

Posted 13 years ago

TheBrick
Free Member

sorry I’ve misread your 1st post slightly but we can fix that.

can you confirm the first for loop as being:

for i in grep ‘-in ‘n’ $file’

nearly it’s

for i in [backquote]grep -in ‘n’ $file[backquote]

where [backquote] is in the table on this page called “Command substitution” http://www.grymoire.com/Unix/Quote.html or here as grave accent http://en.wikipedia.org/wiki/Grave_accent.

I originally thought you had a file with numbers and the letter “n” which required the 1st instance of n to be replaced by 1 and the last by 3 and all others replaced by 2. I was thinking your n was some version of NaN for some reason. So my script is useless.

Let me think about it and I’ll get back to you.

Posted 13 years ago

TheBrick
Free Member

Re-reading your first post, I’m unsure of what you are trying to do exactly. Your example is not clear to me.

1
1

goes to

1
2
2

?

Posted 13 years ago

jcromton
Free Member

Aidy hit the nail on the head in his post:

1 1 1 2 2 2 2 3 3 3 3 3, you should get 1 2 3 1 2 2 3 1 2 2 2 3

There will never be less than 3 occurrences of the same number.

tonyd, thanks for that code too. Unfortunately it just seems to copy the infile to the outfile and change nothing. I’ll have a good look.

Thanks again guys

Posted 13 years ago

Aidy
Free Member

#!perl

open FID, "$ARGV[0]" or die "Can't open file";

my @numbers;
my %uniq;
while (<FID>) {
    chomp;
    push @numbers, "2[$_]";
    $uniq{$_}++;
}

close FID;

my $n = "@numbers";

foreach my $i (keys %uniq) {
    $n =~ s/^(.*?)2\[$i\]/${1}1/;
    $n =~ s/(.*)2\[$i\](.*?)\z/${1}3${2}/;
    $n =~ s/\[$i\]//g;
}

print "$_\n" foreach split / /,$n;

There’s probably a nicer way of doing that.

Posted 13 years ago

jcromton
Free Member

Hi Aidy, I’ve never used perl before. My open line is:

open FID, "$numbers.txt" or die "Can't open file";

where numbers.txt is the file with the relevant numbers in. Is that right?

EDIT: Congratulations Aidy, you win! (removed $)

Posted 13 years ago

oldnpastit
Full Member

Do the groups of numbers overlap?

i.e. do you ever expect to get input that looks like:

1 1 2 3 2 1

If it’s always just “1 1 1 2 2 2 2 3 3 3 4 4 4” then this might work:
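#!/usr/bin/perl -w

use strict;

my $last;       # value on the previous line
my $pending;    # label owed to the previous line, printed once we know whether its group has ended
while (<>) {
    my $n = $_;
    chomp $n;
    if (!defined $last || $n != $last) {
        # new group: the previous line (if any) was the last of its group
        print "3\n" if defined $last;
        $pending = 1;
        $last = $n;
    } else {
        # same group: print the label owed to the previous line
        print "$pending\n";
        $pending = 2;
    }
}
print "3\n" if defined $last;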

EDIT: good interview question 🙂
EDIT: how do you put hard spaces or tabs into this forum?

Posted 13 years ago

jcromton
Free Member

oldnpastit: the groups of numbers will never overlap.

I have literally not even the faintest idea of Perl so any sed/awk scripts would be very useful but I feel I’ve pushed it far enough already! Also, is there a way of outputting to a new file?

You’ve been great thank you so much.

Posted 13 years ago

Aidy
Free Member

You run it as “perl perlscript.pl filename.txt”

And it prints output to stdout, i.e. “perl perlscript.pl filename.txt > output.txt”

Posted 13 years ago

Aidy
Free Member

And I didn’t realise there were non-overlapping numbers. That’s an easier problem.

Posted 13 years ago

tonyd
Full Member

Oh sorry – I read the OP to mean that you actually had n’s in the file!

Posted 13 years ago

Aidy
Free Member

#!perl

open FID, $ARGV[0] or die "Can't open file";

my @l = map {my $i = <FID>} 0..1;
chomp(@l);

print "1\n";

while (<FID>) {
    chomp;
    push @l, $_;
    print $l[1] != $l[2] ? 3 : $l[1] != $l[0] ? 1 : 2, "\n";
    shift @l;
}
close FID;

print "3\n";

Although, the post above is probably neater.

Posted 13 years ago

richP
Full Member

If you want to use awk then I think that the following will work:
awk "BEGIN{old=-1000};{nxt=$1};{ if (NR>1) {if (cur!=old) {print 1} else {if (nxt!=cur) {print 3} else {print 2}}};old=cur;cur=nxt};END{print 3}" input.txt > output.txt

On Linux you should probably change the double quotes to single, otherwise the shell expands $1 before awk sees it.

The above relies on there being no overlaps in the groups of numbers, and blank lines will probably need to be stripped out beforehand.
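
One way the blank-line stripping could be done, as a minimal sketch: awk sets NF to the number of fields on a line, so a bare NF pattern prints only lines that contain something other than whitespace. The function name strip_blank is made up for this example.

```shell
# strip_blank is a hypothetical helper name, not something from the thread.
strip_blank() {
    # NF is 0 on empty or whitespace-only lines, so only non-blank lines are printed
    awk 'NF' "$@"
}

# Small demonstration on inline data
printf '1\n\n1\n   \n2\n' | strip_blank
```

With the file names from the command above, something like strip_blank input.txt > clean.txt (clean.txt being an arbitrary name) would give a cleaned copy to feed to the one-liner.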

Posted 13 years ago

buzz-lightyear
Free Member

In Excel…

Put 9999999 in Column A, Row 1
Paste your number series in Column A, starting with Row 2.

In Column B, row 2 enter the formula: =IF(A2=A1, IF(A2=A3,2, IF(A2<>A3,3)),1)

Drag this all the way down Column B, which copies the formula into each cell, adjusting the row numbers automatically.

Column B should now contain the results. Note that Column B row 1 is left empty.

Did you say 16,000 numbers? Oops, Excel won’t work.
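
The neighbour test in that formula (compare each value with the one above and the one below) is not tied to Excel. A minimal awk sketch of the same idea, assuming the whole list fits in memory and has no blank lines; the function name label_by_neighbours is made up here, and no sentinel row is needed because the first and last rows are checked explicitly:

```shell
# label_by_neighbours is a hypothetical helper name, not something from the thread.
label_by_neighbours() {
    awk '
        { a[NR] = $1 }    # load every value, much like pasting the series into Column A
        END {
            for (i = 1; i <= NR; i++) {
                if (i > 1 && a[i] == a[i-1]) {
                    # same as the value above: middle (2) or last (3) of its group
                    print ((i < NR && a[i] == a[i+1]) ? 2 : 3)
                } else {
                    # differs from the value above: first of its group
                    print 1
                }
            }
        }
    ' "$@"
}

# Small demonstration on inline data
printf '%s\n' 1 1 1 2 2 2 2 | label_by_neighbours
```

Holding 16,000 values in an awk array should be no trouble on any current machine, though a streaming approach avoids the question altogether for much larger files.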

Posted 13 years ago

oldnpastit
Full Member

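#include <stdio.h>
#include <string.h>

int main(void)
{
    char buf[16];
    static char lastbuf[16];    /* previous line; empty until the first line is read */
    int pending = 0;            /* label owed to the previous line */

    while (fgets(buf, sizeof(buf), stdin)) {
        buf[strcspn(buf, "\n")] = '\0';    /* drop the trailing newline */
        if (!buf[0])
            continue;                      /* skip blank lines */
        if (strcmp(buf, lastbuf) != 0) {
            if (lastbuf[0])
                printf("3\n");             /* previous line closed its group */
            pending = 1;
            strcpy(lastbuf, buf);
        } else {
            printf("%d\n", pending);       /* label for the previous line */
            pending = 2;
        }
    }
    if (lastbuf[0])
        printf("3\n");
    return 0;
}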

Posted 13 years ago

DaveyBoyWonder
Free Member

shell = awesome
perl = awesomer

Posted 13 years ago

jcromton
Free Member

Guys, thank you so much for your help on this. I’ll apply these today and hopefully make some worthwhile contributions to Marine Science.

Chris

Posted 13 years ago

geoffj
Full Member

Did you say 16,000 numbers? Oops, Excel won’t work.

Why not?

Posted 13 years ago

jcromton
Free Member

RichP – that works great, apart from it starts with a 2. Everything else is perfect. See my other thread if you fancy more awk related banter.
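
A possible explanation for the leading 2, offered as a guess since the input file isn't shown: in the one-liner the BEGIN value of old is overwritten on the first record by old=cur while cur is still unset, and an unset awk variable compares equal to 0, so a file whose first value is 0 would not be seen as starting a new group. A minimal sketch that tracks "is this line the first of its group" in a flag instead of relying on a starting value, assuming no overlapping groups and no blank lines; the names label_runs, first and cur are made up for this example:

```shell
# label_runs is a hypothetical helper name, not something from the thread.
label_runs() {
    awk '
        NR > 1 {
            # cur holds the previous line; $1 is the line that follows it
            if (first)            { print 1 }    # previous line opened its group
            else if ($1 != cur)   { print 3 }    # previous line closed its group
            else                  { print 2 }    # previous line sat in the middle
            first = ($1 != cur)                  # does the current line open a new group?
        }
        NR == 1 { first = 1 }                    # the very first line always opens a group
        { cur = $1 }
        END { if (NR > 0) print (first ? 1 : 3) }    # the final line closes its group
    ' "$@"
}

# Small demonstration on inline data that starts with 0
printf '%s\n' 0 0 0 5 5 5 5 | label_runs
```

Saved in a script, it could be run as label_runs input.txt > output.txt. A group of only one line would be labelled 1 (or 3 if it is the very last line), which should not matter here given that every number occurs at least three times.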

Posted 13 years ago

The topic ‘Computer Question – awk, sed, grep or anything.’ is closed to new replies.
