Sequences information from BAM file
4
0
Entering edit mode
6.8 years ago
Lolla ▴ 10

Hello, I'm working with paired end rna-seq data so two fastq files produced 1 BAM file. I know col 10 represent the seq and col 11 have its ascii code for quality. If the original seq length 100bp when I extracted some seq from BAM file I noticed some reads of length LESS than the original length. For example in one location one seq has cigar (40M, which means 40 bps matched) my question is: what happened to the rest of the seq (60bps)?? Another point, in this case we have paired end data which means two seq one from each end but BAM file has only one. This one included in BAM file to which end belongs??

Many thanks..

RNA-Seq next-gen alignment • 2.5k views
ADD COMMENT
0
Entering edit mode

Did you map locally or end-to-end? Which aligner did you use?

ADD REPLY
0
Entering edit mode

end-to-end using STAR

ADD REPLY
0
Entering edit mode

Are you 100% sure? The CIGAR you reported is compatible with a local mapping. What is your command, can you post it here?

For what concerns the absence of the paired mate from the BAM: if you used an option to leave out unmapped scores, perhaps that's the reason.

To know which end it does correspond to, you have to see the bitwise flag. Reads on the reverse strand will have the 16 bit.

What is the bitwise flag for the paired read that doesn't have the mate in the file? Did you check it on https://broadinstitute.github.io/picard/explain-flags.html ?

ADD REPLY
1
Entering edit mode

Thanks Macspider; I confirmed trimmed happened prior the alignment, so this is solved now..

ADD REPLY
0
Entering edit mode

Then you can mark the thread as closed, so it doesn't show up in the "open threads" section. ;) Good luck!

ADD REPLY
0
Entering edit mode

Thank you, I'm still confused about some fields in BAM file..

ADD REPLY
0
Entering edit mode

Which ones? We can maybe help.

ADD REPLY
0
Entering edit mode

Have you checked page 5 of SAM Spec document?

ADD REPLY
0
Entering edit mode

I have posted my question here Count the fragments from BAM file Thanks in advance..

ADD REPLY
0
Entering edit mode
6.8 years ago

it could be some HARD clipping in your bam ... Difference between Hard Clip and Soft Clip in Samtools CIGAR string or you reads could have been trimmed by a tool like cutadapt

ADD COMMENT
0
Entering edit mode
6.8 years ago

There are basically two reasons that your reads would be trimmed like that...for quality reasons, or because the read read all the way through to adapter. If it's adapter, you definitely want that trimmed away. If it's for quality reasons..you probably want it trimmed away. You'll have to ask whoever did the alignment if they did a quality trim before aligning.

ADD COMMENT
0
Entering edit mode
6.8 years ago
GenoMax 147k

Another point, in this case we have paired end data which means two seq one from each end but BAM file has only one. This one included in BAM file to which end belongs??

Some aligners trim read headers after first whitespace when writing them to alignments, which removes Illumina identifiers signifying read number. Sounds like that applies in your case. So there is no easy way to figure out what end of the fragment the read came from.

ADD COMMENT
0
Entering edit mode
6.5 years ago

hi,I want to split paired end read as length distribution like (0-50,50-150, >150),How can I do it? when I using samtools extract reads(samtools view -h Epi2_input_rmdup.bam | awk 'length($10) > 100 && length($10) < 200|| $1 ~ /^@/' | samtools view -bS - > 100_200qc_rm/50_Epi2_input_rmdup.bam),I found all reads length no more than 100bp, but when I draw paired reads fragment size stastics using deeptools bamPEFragmentSize,it show length 0-500bp.

ADD COMMENT
0
Entering edit mode
ADD REPLY

Login before adding your answer.

Traffic: 1526 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6