Computing FASTA sequence lengths with awk on Linux
The FASTA sequence file data.fa:
>Gorai.004G111100.1
ATGGGTACTGCTCCAACCCAGTGCCCTTCTGGAATCACTGCAAATTTCCACGCCAAATTTGATAACAGAACTGAGTTTTC
>Gorai.004G111100.2
ATGTTTTTCATGCTCCGGTGGACAAGATACTCTGGGATGCCGGGGAACAGTTTTTCCTTTTCTTGGCAGACATATGCACATAAAATTCTT
>Gorai.004G111100.3
ATGGGTACTGCTCCAACCCAGTGCCCTTCTGGAATCACTGCAAATTTCCAC
>Gorai.004G111100.4
ATGGGAATGCATGAACTAGCAGCCAAAGTTGATGAGT
First, join each FASTA record onto a single line, using % to separate the header from the sequence. The command is:
awk '/^>/&&NR>1{print "";}{ printf "%s",/^>/ ? $0"%":$0 }' data.fa >data2.fa
Result:
>Gorai.004G111100.1%ATGGGTACTGCTCCAACCCAGTGCCCTTCTGGAATCACTGCAAATTTCCACGCCAAATTTGATAACAGAACTGAGTTTTC
>Gorai.004G111100.2%ATGTTTTTCATGCTCCGGTGGACAAGATACTCTGGGATGCCGGGGAACAGTTTTTCCTTTTCTTGGCAGACATATGCACATAAAATTCTT
>Gorai.004G111100.3%ATGGGTACTGCTCCAACCCAGTGCCCTTCTGGAATCACTGCAAATTTCCAC
>Gorai.004G111100.4%ATGGGAATGCATGAACTAGCAGCCAAAGTTGATGAGT
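The one-liner above packs two rules into a single line, and its output does not end with a newline after the last record. The sketch below spells the rules out with comments and adds an END rule to terminate the final record. The file names wrapped.fa and wrapped_one_line.fa and the two short records are made up for illustration; they are not part of the original data.

```shell
# Hypothetical wrapped FASTA input (illustrative names and sequences).
cat > wrapped.fa <<'EOF'
>seq1
ATGG
GTAC
>seq2
TTTT
EOF

awk '
    /^>/ && NR > 1 { print "" }             # close the previous record before each new header
    { printf "%s", (/^>/ ? $0 "%" : $0) }   # header gets a trailing "%"; sequence lines are appended as-is
    END { if (NR > 0) print "" }            # terminate the final record with a newline
' wrapped.fa > wrapped_one_line.fa

cat wrapped_one_line.fa
```

Each record now sits on one line as header%sequence, so the length step can split on % exactly as before.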
Then compute the lengths by splitting each line on % and measuring the second field:
awk -F"%" '{print $1"\t"length($2)}' data2.fa >data3.fa
Result:
>Gorai.004G111100.1 80
>Gorai.004G111100.2 90
>Gorai.004G111100.3 51
>Gorai.004G111100.4 37
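The two steps can also be folded into a single awk pass that needs no intermediate file. This is a minimal sketch rather than a replacement for the method above: it assumes every record starts with a > header line and that the file uses Unix line endings (a trailing carriage return would be counted as a base). The file name sample.fa and its contents are hypothetical.

```shell
# Hypothetical sample input with a wrapped sequence (illustrative only).
cat > sample.fa <<'EOF'
>seq1
ATGGGTACTG
CTCCAACC
>seq2
ATGTTT
EOF

# On a header line: print the previous record (if any), then start a new one.
# On any other line: add its length to the running total for the current record.
lengths=$(awk '
    /^>/ { if (name != "") print name "\t" len; name = $0; len = 0; next }
         { len += length($0) }
    END  { if (name != "") print name "\t" len }
' sample.fa)

printf '%s\n' "$lengths"
```

Because the length is accumulated line by line, this works the same for wrapped and single-line FASTA files.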
More: Question: Multiline Fasta To Single Line Fasta