I am realigning my FASTQ reads against contigs generated by metaSPAdes to see which reads mapped to which contig.
After running the following code to get mapped reads:
samtools view -h -F 4 in.bam > mapped.bam
I am confused by the file output and my next step. Here is a shortened example of the mapped.bam file (please note I cut off the end of the file just to keep things neat, as I am only interested in understanding the first few columns):
A00977:183:HLLKYDSXY:3:1503:16658:5071 73 NODE_5156_length_78_cov_28 1 60 78M27S = 1 0
A00977:183:HLLKYDSXY:3:2178:28248:9142 369 NODE_5159_length_78_cov_5 31 0 48M98H NODE_691_length_1085_cov_115.969 17 0
So the first column is clearly the read name, followed by read length (?), then the reference sequence name. I am not sure what columns 4, 5, and 6 represent. I also don't understand why, in the 7th column, some lines show a NODE contig name but others do not. What exactly does that mean?
If my end goal is to determine which contig a read mapped to, should I be looking at the 3rd column or the 7th column? I notice that when I grep a read name, the read sometimes appears multiple times, which leads me to believe a read can map to multiple contigs at once (fine, okay), but which column gives me the actual contig name that it mapped to?
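For reference while reading the two records above: per the SAM format specification, the first nine columns are QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, RNEXT, PNEXT and TLEN. Column 2 is therefore a bitwise flag rather than a read length, column 3 is the contig that this alignment is on, and column 7 is the contig of the mate ("=" meaning the same contig as column 3). The sketch below decodes the flag value. The function name decode_flag and the bit labels are illustrative names chosen here, not samtools output; samtools also ships a "flags" subcommand for the same purpose.

```shell
# Decode a SAM FLAG value (column 2) into the names of the bits that are set.
# decode_flag and the labels are hypothetical helper names for this sketch.
# Bit order follows the SAM spec: 0x1, 0x2, 0x4, ... up to 0x800.
decode_flag() {
    flag=$1
    out=""
    bit=1
    for name in paired proper_pair unmapped mate_unmapped \
                reverse mate_reverse read1 read2 \
                secondary qc_fail duplicate supplementary; do
        # Test whether this bit is set in the flag.
        if [ $(( flag & bit )) -ne 0 ]; then
            out="${out:+$out,}$name"
        fi
        bit=$(( bit * 2 ))
    done
    printf '%s\n' "$out"
}

decode_flag 73    # prints: paired,mate_unmapped,read1
decode_flag 369   # prints: paired,reverse,mate_reverse,read1,secondary
```

Read this way, flag 369 marks the second record as a secondary alignment, which is one reason the same read name can show up on more than one line.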
I just want to get this type of info to determine read depth for the contigs.
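A minimal sketch of counting reads per contig from column 3, assuming the input is SAM text. That holds for the mapped.bam produced above, because samtools view without -b writes SAM rather than binary BAM. The function name count_reads_per_contig is a hypothetical helper, and skipping secondary and supplementary records is a choice made here so that each read is counted once; drop those two rules if every alignment should count.

```shell
# Count primary mapped alignments per contig (column 3) from SAM text.
# count_reads_per_contig is a hypothetical helper name for this sketch.
# Example: count_reads_per_contig mapped.bam
count_reads_per_contig() {
    awk -F'\t' '
        /^@/ { next }                      # skip header lines
        int($2 / 4) % 2 == 1 { next }      # skip unmapped reads (0x4)
        int($2 / 256) % 2 == 1 { next }    # skip secondary alignments (0x100)
        int($2 / 2048) % 2 == 1 { next }   # skip supplementary alignments (0x800)
        { n[$3]++ }                        # tally by reference name
        END { for (c in n) print c "\t" n[c] }
    ' "$@" | sort
}
```

This gives read counts, not per-base depth. On a coordinate-sorted and indexed BAM, samtools idxstats reports mapped read counts per reference and samtools depth reports per-position depth, which may be closer to what is wanted here.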
Hi @DNAngel. I see below that you stated that you have tried to read the SAM format specification. You'll probably find in future that you get more positive responses if you make reference to this in your post, preferably stating which bits of the document you had trouble following and where precisely you need clarification.
Yes, learned my lesson. I'll avoid that in the future.