Dear all,
i have a bam file created by mapping(paired ends) a sample file with the referece file using stampy. now i want to retrieve those reads that have one end mapped anywhere in the chromosomes and the other end is mapped to mitochondria. i think i can use samtools for that but not quite sure how.
if anyone has some idea about that plz suggest.
faster: add the flag "-F 2 " for samtools and use awk '($7!="=")' instead of grep
i used this command samtools view -F 2 ABGSA0012MS20U10.sorted.bam Ssc9MT | awk '($7!="=")' yes it is faster but the confusion is that above command given by " Ketil " is slow but the resulting file has more lines. i used following command to count lines of both files wc -l < file.txt
and the result is 452 and 76778. the difference is alot.what should i trust.
not sure if using 'Ssc9MT' will select the reads if any of the forward/reverse is mapped to this segment.
Searching for the specific sequence as part of the samtools command will give you one line per read mapped to that sequence. Using grep will give you one line for each read where the read or its mate mapps to the sequence. The former is probably okay for your purposes. And a lot faster.
If stampy writes out XA tags, you might have the string "Ssc9MT" in the line somewhere other than as the read chromosome name or the pair chromosome name. The -F 2 method requires it to be exactly the read chromosome name, so you'd expect it to give approximately half of the results of the grep method.
yeah matted your logic is right except that grep command also shows the lines when MRNM($7) is equal to "=" that i do not need. which is obtained by awk '($7!="=") in the Pierre Lindenbaum's command.
thank you all for your time and support.