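Extract the lines that start with an ID (4 characters in total: the first is a digit, the other three are digits or letters), with chain, or with region: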
1i95
complexed with ede, mg, wo2, zn
chain q [62030]
1e7z
contains C-terminal His tag
complexed with hg
chain a [25318]
1khi
region a:103-173 [77409]
1fgu
the N-terminal two domains free
region a:181-298 [25296]
region a:299-426 [25297]
region b:181-289 [25298]
region b:298-426 [25299]
1hnz
complexed with hyg, mg, zn
chain q [25354]
1gd7
chain a [60441]
chain b [60442]
chain c [60443]
chain d [60444]
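use strict;
use warnings;

# The input file is given as the first command-line argument.
my $file = shift @ARGV or die "Usage: $0 <file>\n";
open(my $fh, '<', $file) or die "Cannot open $file: $!\n";
# Keep 4-character IDs (a digit plus three letters/digits) and lines starting with chain or region.
my @extract = grep { /^(?:\d[A-Za-z0-9]{3}$|chain|region)/ } <$fh>;
close($fh);
print @extract;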
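
As a cross-check, the filter can be run on a few lines copied from the sample data without reading a file. This is only a sketch: `is_wanted` and `@sample` are names made up for the illustration, and stripping a trailing carriage return is an assumption about possible Windows line endings, not something the script above does. The character class `[A-Za-z0-9]` is used instead of `\w` because `\w` also matches an underscore, which the task statement does not allow in an ID.

```perl
use strict;
use warnings;

# Hypothetical helper: true for an ID line or a line starting with chain/region.
sub is_wanted {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;    # assumption: tolerate both LF and CRLF line endings
    return $line =~ /^(?:\d[A-Za-z0-9]{3}\z|chain|region)/ ? 1 : 0;
}

# A few lines copied from the sample data.
my @sample = (
    "1i95\n",
    "complexed with ede, mg, wo2, zn\n",
    "chain q [62030]\n",
    "1khi\n",
    "region a:103-173 [77409]\n",
);

# is_wanted works on a copy, so the kept lines still carry their newlines.
my @kept = grep { is_wanted($_) } @sample;
print @kept;
```

For these five lines the sketch prints 1i95, chain q [62030], 1khi and region a:103-173 [77409], each on its own line; the "complexed with" annotation is dropped.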