标签:
1.统计fasta、fa和fastq文件的长度,统计fastq的reads个数,单个reads长度,reads总长度;统计fasta文件中contig的个数,列出名称,单条的长度,以及总长度。
1.统计fasta、fa和fastq文件的长度,统计fastq的reads个数,单个reads长度,reads总长度(主要是统计总长度,其他在Linux下很简单就实现了);统计fasta文件中contig的个数,列出名称,单条的长度,以及总长度。
思路整理:这是个典型的逐行读文件,取字段,计算长度问题。
fastq简单:四行一轮,解决办法很多,可以逐行读取,逐行匹配;也可以一次读两行;输出也少,单reads长度,reads数量,reads长度总和,也没有其他有价值的信息。
fasta略微复杂:没有规律,因为序列被切成短的,只能逐行读,逐行匹配;逐行读有个问题,怎么检测结束?(这是逐行读的一个最大缺陷,你无法操纵最后一次!!!)因此,只能把最后一次放在读循环的外面,每次输出的点就在匹配title那个地方。
代码如下:
#!/usr/bin/perl #Author:zxlee #Function: compute the length of fastq or fasta, fa. #usage: `perl script_name fastq/fasta/fa_file_name`, it will show the total length, also a detail file. use strict; use warnings; my $infile = shift; #give 1st default para to it, you can go on shift to get the 2st para open IN, "<$infile" or die $!; open OUT, ">./result_len.txt" or die $!; our $total_len = 0; our $seq_num = 0; our $len; if($infile =~ /fastq$/){ while(<IN>){ next if not /^@\S+/; my $seq = <IN>; #your cannot use $_ here!!! chomp($seq); $seq_num += 1; $total_len += length($seq); print OUT "\nreads_len = $total_len\n" if $seq_num == 1; } print OUT "Total num of reads is $seq_num\n"; } elsif($infile =~ /(fasta|fa)$/){ # easy way, not use "OR" my $chr_len = 0; while(<IN>){ chomp; my $line = $_; if ($line =~ /^>(\S+)/){ print OUT "$chr_len\n" if $chr_len != 0; print OUT "$1\t"; $chr_len = 0; }else{ $len = length($line) if $total_len == 0; $chr_len += length($line); $total_len += length($line); } } print OUT "$chr_len\n"; print OUT "one line has $len\n"; } print "The total length is $total_len\n"; close(IN); close(OUT);
标签:
原文地址:http://www.cnblogs.com/leezx/p/5761424.html