4.1 years ago
amitpande74
Hi, I have a FASTA file formatted like this, produced from paired-end reads:
test_MAPQ.fasta->chr10:141146-141296
test_MAPQ.fasta:atgctcccattccaaatgagagtaattggctaaaacaaaggggctacaggtcccatacaagtccaaaacccaacagggcagtcattaaatcttTTCTAAttttaatttttattttatttgaagttctggggtacatgttcaggatgtata
test_MAPQ.fasta->chr10:142926-143076
--
test_MAPQ.fasta->chr10:146793-146943
test_MAPQ.fasta:gccAAATCATTACTTTTGAAGAAATAGTTAACAATGATTATTTCTTTTTGAATGACaataaattttattaataagttaaacatatttatatgtaatgtaaattttttGTATcgggtgcagtggttcatgcccgttatcctagcactttgg
test_MAPQ.fasta->chr10:146870-147020
test_MAPQ.fasta:aaacatatttatatgtaatgtaaattttttGTATcgggtgcagtggttcatgcccgttatcctagcactttgggaggccaaggtgttaatattgcttgagcaggggagtttgagaccagcctgggaaacatggtgaaacctcatatctac
I want an output that includes only the entries where both members of the pair are present:
test_MAPQ.fasta->chr10:146793-146943
test_MAPQ.fasta:gccAAATCATTACTTTTGAAGAAATAGTTAACAATGATTATTTCTTTTTGAATGACaataaattttattaataagttaaacatatttatatgtaatgtaaattttttGTATcgggtgcagtggttcatgcccgttatcctagcactttgg
test_MAPQ.fasta->chr10:146870-147020
test_MAPQ.fasta:aaacatatttatatgtaatgtaaattttttGTATcgggtgcagtggttcatgcccgttatcctagcactttgggaggccaaggtgttaatattgcttgagcaggggagtttgagaccagcctgggaaacatggtgaaacctcatatctac
and the rest are ignored (like the incomplete group at the top that ends in dashes). Is there a tool that can filter the results this way? Kindly help.
What is the basis of pairing in the OP's example? The number of lines before -- ? amitpande74.
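You can also do
awk 'BEGIN{RS="--"} NF>3 {print}' test.fa
but it would leave empty lines before and after the sequences. This relies on a multi-character RS, which gawk supports; in strict POSIX awk the behavior is unspecified.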
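If the empty lines are a problem, the same idea can be sketched in Python. This is only a minimal sketch under the same assumption as the awk one-liner: groups are separated by lines consisting of --, and a group counts as a complete pair when it has at least four non-empty lines (header, sequence, header, sequence). The function name keep_complete_pairs and the sample records are made up for illustration and are not from the original post.

```python
from itertools import chain


def keep_complete_pairs(lines, separator="--", min_lines=4):
    """Yield the lines of every group that has at least `min_lines` lines.

    Groups are runs of non-empty lines delimited by `separator` lines.
    Assumption: a complete pair is header, sequence, header, sequence.
    """
    group = []
    # A trailing separator acts as a sentinel so the last group is flushed.
    for raw in chain(lines, [separator]):
        line = raw.strip()
        if line == separator:
            if len(group) >= min_lines:
                yield from group
            group = []
        elif line:  # skip blank lines so the output has no empty lines
            group.append(line)


# Hypothetical sample: the first group lacks its second sequence.
sample = """\
read->chr10:1-5
acgta
read->chr10:9-14
--
read->chr10:20-25
ggcat
read->chr10:22-27
catgg
""".splitlines()

filtered = list(keep_complete_pairs(sample))
print("\n".join(filtered))
```

To run it on a real file, pass an open file handle instead of the sample list, since the function accepts any iterable of lines. It only counts lines per group and does not check that the coordinates of the two mates actually belong together.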