Find duplicate records in text file

zJun's Tech Weblog

Example (contents of f1.txt):
abc 1000 3452 2463 2343 2176 7654 3452 8765 5643 3452
abc 1000 3452 2463 2343 2176 7654 3452 8765 5643 3452
tas 3420 3562 2123 1343 2176 7654 3252 8765 5643 3452
aer 1000 3452 2463 2343 2176 7654 3452 8765 5643 3452
tas 3420 3562 2123 1343 2176 7654 3252 8765 5643 3452

UNIX:

Display the number of occurrences of each record:
> sort f1.txt|uniq -c
   2 abc 1000 3452 2463 2343 2176 7654 3452 8765 5643 3452
   1 aer 1000 3452 2463 2343 2176 7654 3452 8765 5643 3452
   2 tas 3420 3562 2123 1343 2176 7654 3252 8765 5643 3452
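When a file has many records, it can help to put the most frequent ones at the top. The following is a minimal sketch that adds a second, numeric sort on the count column; it recreates the sample f1.txt so it can be run on its own, and the variable name `ranked` is only illustrative.

```shell
# Recreate the sample file from the example above.
cat > f1.txt <<'EOF'
abc 1000 3452 2463 2343 2176 7654 3452 8765 5643 3452
abc 1000 3452 2463 2343 2176 7654 3452 8765 5643 3452
tas 3420 3562 2123 1343 2176 7654 3252 8765 5643 3452
aer 1000 3452 2463 2343 2176 7654 3452 8765 5643 3452
tas 3420 3562 2123 1343 2176 7654 3252 8765 5643 3452
EOF

# Count each distinct record, then sort by the count (numeric, descending)
# so the most frequently repeated records come first.
ranked=$(sort f1.txt | uniq -c | sort -rn)
printf '%s\n' "$ranked"
```

With the sample data, the two records that occur twice are listed before the single "aer" record. The relative order of records with the same count may vary between sort implementations.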

Display only the duplicate records (each one printed once):
> sort f1.txt|uniq -d
abc 1000 3452 2463 2343 2176 7654 3452 8765 5643 3452
tas 3420 3562 2123 1343 2176 7654 3252 8765 5643 3452
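The complementary question, which records are not duplicated at all, can be answered with the `-u` option of uniq. A small sketch, again recreating the sample f1.txt so it runs standalone (the variable name `singles` is only illustrative):

```shell
# Recreate the sample file from the example above.
cat > f1.txt <<'EOF'
abc 1000 3452 2463 2343 2176 7654 3452 8765 5643 3452
abc 1000 3452 2463 2343 2176 7654 3452 8765 5643 3452
tas 3420 3562 2123 1343 2176 7654 3252 8765 5643 3452
aer 1000 3452 2463 2343 2176 7654 3452 8765 5643 3452
tas 3420 3562 2123 1343 2176 7654 3252 8765 5643 3452
EOF

# uniq -u prints only the lines that are NOT repeated in the sorted input.
singles=$(sort f1.txt | uniq -u)
printf '%s\n' "$singles"
```

For the sample file this prints only the "aer" record, the one line that `uniq -d` leaves out.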

Display the distinct records:
> sort f1.txt|uniq
abc 1000 3452 2463 2343 2176 7654 3452 8765 5643 3452
aer 1000 3452 2463 2343 2176 7654 3452 8765 5643 3452
tas 3420 3562 2123 1343 2176 7654 3252 8765 5643 3452
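Note that `sort | uniq` returns the records in sorted order, not in their original order. If the original order matters, one common alternative (not part of the commands above) is a one-line awk filter that remembers which lines it has already seen. A minimal sketch, with `seen` and `deduped` as arbitrary names:

```shell
# Recreate the sample file from the example above.
cat > f1.txt <<'EOF'
abc 1000 3452 2463 2343 2176 7654 3452 8765 5643 3452
abc 1000 3452 2463 2343 2176 7654 3452 8765 5643 3452
tas 3420 3562 2123 1343 2176 7654 3252 8765 5643 3452
aer 1000 3452 2463 2343 2176 7654 3452 8765 5643 3452
tas 3420 3562 2123 1343 2176 7654 3252 8765 5643 3452
EOF

# seen[$0]++ is 0 (false) the first time a line appears, so !seen[$0]++
# is true only for first occurrences; awk prints exactly those lines.
deduped=$(awk '!seen[$0]++' f1.txt)
printf '%s\n' "$deduped"
```

Here the records come out as abc, tas, aer, which is the order of their first appearance in f1.txt. The trade-off is that awk keeps every distinct line in memory, whereas sort can spill to temporary files on very large inputs.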


Windows:

Notepad++ can sort by line, and remove the duplicate lines at the same time.

Open the menu TextFX --> TextFX Tools.
Make sure "sort outputs only unique..." is checked.
Select a block of text (Ctrl-A selects the entire document).
Click "sort lines case sensitive" or "sort lines case insensitive".
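For readers who have a Unix-like shell available on Windows, the two Notepad++ choices (case sensitive or case insensitive, with unique output) roughly correspond to `sort -u` and `sort -f -u`. The sketch below uses a hypothetical file f2.txt whose first two lines differ only in case; the file and variable names are invented for illustration.

```shell
# Hypothetical sample: the first two lines differ only in letter case.
cat > f2.txt <<'EOF'
abc 1000
ABC 1000
tas 3420
EOF

# Case-sensitive: "abc 1000" and "ABC 1000" count as different records.
sensitive_count=$(sort -u f2.txt | awk 'END{print NR}')

# Case-insensitive: -f folds case while comparing, so with -u the two
# spellings collapse into a single record.
insensitive_count=$(sort -f -u f2.txt | awk 'END{print NR}')

echo "case sensitive:   $sensitive_count distinct lines"
echo "case insensitive: $insensitive_count distinct lines"
```

Which of the two spellings survives in the case-insensitive run is up to the sort implementation, so check the output if that matters.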


Posted on 2012-04-11 12:10, zJun's帛罗阁. Category: Development Environment
