Chan Chen Coding...

Perl Print Duplicate Line

# Find duplicate lines in a file and print them out.
AcceptEnv LANG LC_CTYPE LC_NUMERIC LC_TIME LC_COLLATE LC_MONETARY LC_MESSAGES
AcceptEnv LC_PAPER LC_NAME LC_ADDRESS LC_TELEPHONE LC_MEASUREMENT
AcceptEnv LC_IDENTIFICATION LC_ALL
 
# Example of overriding settings on a per-user basis
#Match User anoncvs
# X11Forwarding no
# AllowTcpForwarding no
# ForceCommand cvs server

AcceptEnv LC_IDENTIFICATION LC_ALL

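#!/usr/bin/perl
use strict;
use warnings;

# Count how many times each line occurs in the sample file.
open(my $fh, '<', 'dupLine.sample') or die "Cannot open dupLine.sample: $!";
my %seen;
while (<$fh>) {
  $seen{$_}++;
}
close($fh);

# Print every line that occurs more than once, prefixed by its count.
while (my ($line, $count) = each %seen) {
  print "$count: $line" if $count > 1;
}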


Using the standard Perl shorthands:

my %seen;
while ( <> ) {
  print if $seen{$_}++;
}

As a "one-liner":

perl -ne 'print if $seen{$_}++'
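
Note that the one-liner prints a line every time it repeats, so a line that occurs three times is printed twice. A minimal sketch of a variant that reports each duplicate only once follows. It assumes a hypothetical helper, first_repeats, which is not part of the original post and takes the lines as a list instead of reading a file:

```perl
use strict;
use warnings;

# Hypothetical helper: return each duplicated line once,
# at the point of its second occurrence.
sub first_repeats {
    my @lines = @_;
    my %seen;
    # $seen{$_}++ yields the count *before* this occurrence,
    # so it equals 1 exactly when the line is seen for the second time.
    return grep { $seen{$_}++ == 1 } @lines;
}

# "a" occurs three times and "b" twice; each is reported once.
print first_repeats("a\n", "b\n", "a\n", "a\n", "b\n");
```

The same test should also work on the command line as perl -ne 'print if $seen{$_}++ == 1'.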

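Want more detail? This prints <file name>:<line number>:<line> for each repeated line. The argument list is wrapped in its own parentheses because perl would otherwise parse print ( ... ) as the complete function call and silently discard the "$.:$_" part:

perl -ne 'print( ($ARGV eq "-" ? "" : "$ARGV:"), "$.:$_" ) if $seen{$_}++'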
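
One detail worth knowing when several files are given: $. keeps counting across all files read through <> unless ARGV is closed at the end of each file. The sketch below shows one way to get per-file line numbers. The helper name report_duplicates is an assumption made for illustration, not something from the original post:

```perl
use strict;
use warnings;

# Hypothetical helper: list repeated lines as "file:line_number:line".
sub report_duplicates {
    my @files = @_;
    return () unless @files;   # avoid falling back to STDIN
    local @ARGV = @files;      # <> reads the files named in @ARGV
    local $_;                  # do not clobber the caller's $_
    my (%seen, @report);
    while (<>) {
        push @report, sprintf("%s:%d:%s", $ARGV, $., $_) if $seen{$_}++;
        close ARGV if eof;     # reset $. at the end of each file
    }
    return @report;
}

# Usage: perl script.pl file1 file2 ...
print report_duplicates(@ARGV);
```

Duplicates are still tracked across all files in one %seen hash, so a line that first appears in one file and again in another is reported at its position in the second file.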

Explanation on %seen:

  • %seen is a hash. For each unique line in the input, $seen{$_} is a scalar slot in the hash, keyed by the text of the line.
  • The postfix increment operator (x++) yields the current value for our expression and increments the variable afterwards. So, if we haven't "seen" the line yet, $seen{$_} is undefined, but in a numeric context like this it is taken as 0, which is false.
  • Then it's incremented to 1.

So the first time we see a line, the expression yields a false value, which fails the if, and the count in that slot is incremented to 1. For any later occurrence the value is 1 or more, so the if condition passes.
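
The behaviour of the postfix increment can be checked in isolation. This is a small sketch using one of the sample lines as the hash key; the variable names are chosen here purely for illustration:

```perl
use strict;
use warnings;

my %seen;
my $line = "AcceptEnv LC_IDENTIFICATION LC_ALL\n";

my $first  = $seen{$line}++;   # value before the increment: 0 (false)
my $second = $seen{$line}++;   # value before the increment: 1 (true)

printf "first=%d second=%d stored=%d\n", $first, $second, $seen{$line};
# prints: first=0 second=1 stored=2
```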

As noted above, %seen is a hash, but in the one-liner it is never declared: with strict turned off, a variable can be created on the spot the first time it is used. So the first time perl sees $seen{$_}, it knows that I'm referring to %seen, finds that it doesn't exist yet, and creates it.
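
To see the difference strict makes, the two cases can be compiled side by side with a string eval. This is only a sketch, and the hash names %demo_loose and %demo_strict are made up for the demonstration:

```perl
use strict;
use warnings;

# Without strict 'vars', the undeclared package hash %demo_loose
# is created on first use, just like %seen in the one-liner.
my $loose_ok = eval q{ no strict 'vars'; $demo_loose{line}++; 1 };

# With strict 'vars', the same kind of statement is rejected at compile time.
my $strict_ok  = eval q{ use strict; $demo_strict{line}++; 1 };
my $strict_err = $@;

print "without strict: ", ($loose_ok  ? "ok" : "failed"), "\n";
print "with strict: ",    ($strict_ok ? "ok" : "failed"), "\n";
```

The second eval fails with a 'Global symbol "%demo_strict" requires explicit package name' error, which is why the full script declares my %seen before using it.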

An added neat thing about this is that at the end, if you care to use it, you have a count of how many times each line was repeated.
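
As a rough illustration of using those counts, the following sketch lists the repeated lines with the most frequent first. The helper duplicate_counts and its return format are assumptions made for this example:

```perl
use strict;
use warnings;

# Hypothetical helper: count occurrences and return the repeated lines,
# most frequent first, as [line, count] pairs.
sub duplicate_counts {
    my @lines = @_;
    my %seen;
    $seen{$_}++ for @lines;
    return map  { [ $_, $seen{$_} ] }
           sort { $seen{$b} <=> $seen{$a} || $a cmp $b }
           grep { $seen{$_} > 1 } keys %seen;
}

for my $pair (duplicate_counts("a\n", "b\n", "a\n", "c\n", "b\n", "a\n")) {
    my ($line, $count) = @$pair;
    print "$count: $line";
}
# prints "3: a" then "2: b"
```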



-----------------------------------------------------
Silence, the way to avoid many problems;
Smile, the way to solve many problems;

posted on 2012-04-26 11:01 Chan Chen  Views (291)  Comments (0)  Category: Linux

