I haven't used samtools for a while, and now I have some files from TCGA whose reads I'm checking.
I'm trying to see if there are any unmapped reads in the TCGA BAM files: samtools view -f 4 file_gdc_realn_rehead.bam
and I get this UNC-SN:197:C0XX:4:204:155:805 69 chr1 14496 0 * = 14496 0 ACCTGCTTCCCTGGGTGGGGGTGATGGAACCAGCACTGTGCGGAGACC CCCFFFFFHHHHHJJAEHIII6@FHIJJJJJIGJJJJJIIJIHHFFDD MC:Z:46M2S RG:Z::120425_UNC15-SN850_0197_AC0TK3ACXX_ACAGTG_L004 NH:i:0 HI:i:0 nM:i:1 AS:i:43 uT:A:4 69 chr1 14496 0 * = 14496 0 ACCTGCCGTGGGGTGATGAACCAGCACTGTAGACC CCCFFFFFHHHHHJJAEHIII6@FHIJJJJJIGJJJJJIIJIHHFFDD MC:Z:46M2S RG:Z::120425_UNC-SN:197:C0XX:4:204:155:8057_AC0TK3ACXX_ACAGTG_L004 NH:i:0 HI:i:0 nM:i:1 AS:i:43 uT:A:4
If this is an unmapped read, why is there information about the chromosome (chr1)? What am I missing?
Thanks
Yes, it seems that's what is happening. A "bug" in the STAR aligner.
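Other aligners do this too. I don't think it is a bug. If you select reads with flag 12 (samtools view -f 12, i.e. both the read and its mate are unmapped), there is no chromosome information. But if one read of the pair is mapped, the unmapped read is given the chromosome and position of its mapped mate, which keeps the pair together when the file is sorted by coordinate. Your read has flag 69 (paired, read unmapped, first in pair) and the mate-unmapped bit (8) is not set, so chr1 14496 is the mate's position, not an alignment of this read.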
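
To check this for any flag value, here is a minimal Python sketch that decodes a SAM flag into its bits. The bit values come from the SAM specification; the function names decode_flag and borrows_mate_position are just illustrative helpers I made up, not part of samtools or any library.

```python
# SAM FLAG bits as defined in the SAM specification.
FLAG_BITS = {
    0x1: "PAIRED",
    0x2: "PROPER_PAIR",
    0x4: "UNMAP",
    0x8: "MUNMAP",
    0x10: "REVERSE",
    0x20: "MREVERSE",
    0x40: "READ1",
    0x80: "READ2",
    0x100: "SECONDARY",
    0x200: "QCFAIL",
    0x400: "DUP",
    0x800: "SUPPLEMENTARY",
}


def decode_flag(flag):
    """Return the names of all bits set in a SAM flag (hypothetical helper)."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


def borrows_mate_position(flag):
    """True for a paired read that is unmapped while its mate is mapped.

    Such a read is expected to carry the mate's chromosome and position
    (hypothetical helper, named for this example only).
    """
    paired = bool(flag & 0x1)
    read_unmapped = bool(flag & 0x4)
    mate_unmapped = bool(flag & 0x8)
    return paired and read_unmapped and not mate_unmapped


print(decode_flag(69))            # the flag from the question
print(borrows_mate_position(69))  # unmapped read, mapped mate
print(borrows_mate_position(77))  # 77 = 69 + 8: both reads unmapped
```

Flag 69 decodes to PAIRED, UNMAP and READ1, so the read in the question is unmapped but its mate is not, and that is why it shows chr1. With flag 77 both reads of the pair are unmapped and no chromosome would be expected.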