BAM Files Obtained From the Same Samples Have Different Sizes
11.9 years ago

I'm trying to align three FASTQ files with TopHat, and it does align them. The problem is that I know the size of the BAM file I'm supposed to obtain (my supervisor gave it to me), and I can't seem to reproduce it. Does anyone have any suggestions?

bam • 2.4k views

You should give some more information. But if you use the same TopHat version, call it with the same parameters, and use the same reference genome and the same input files, you should end up with the same output file as your supervisor.
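One way to confirm that the inputs really are identical is to compare checksums of the FASTQ files and index files on both sides. The following is only a minimal sketch in Python; the helper name `sha256_of_file` is made up for illustration and is not part of TopHat or any other tool mentioned here.

```python
import hashlib

def sha256_of_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks.

    Hypothetical helper: two files with the same digest can be treated
    as byte-for-byte identical for practical purposes.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large FASTQ files do not fill memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

If you and your supervisor each run this on your own copy of an input file and the digests match, that file can be ruled out as the source of the difference.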

11.9 years ago
matted 7.8k

Comparing file sizes of BAMs is a very brittle way to check results.

One problem that will occur even if everything else is perfectly matched is that TopHat is non-deterministic. With -g 1, a multi-mapping read that has several equally good alignments gets only one of them reported, chosen at random. From the documentation for -g:

If there are more alignments with the same score than this number, TopHat will randomly report only this many alignments.

And since BAM is a compressed format, these minor differences will affect how well the file compresses, and therefore the final file size.
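The effect can be illustrated with a toy example. The sketch below is not real BAM: it builds two made-up, fixed-width, SAM-like text blobs that differ only in randomly chosen positions, then compresses them with zlib, used here as a stand-in for the DEFLATE-based BGZF compression that BAM uses. The record layout, the seeds, and the function name `simulated_alignments` are all assumptions for illustration.

```python
import random
import zlib

def simulated_alignments(seed, n_reads=2000):
    """Build fixed-width, SAM-like text where each read gets a random position.

    This mimics a run in which the reported alignment of every
    multi-mapping read is picked at random (toy data, not real BAM).
    """
    rng = random.Random(seed)
    lines = []
    for i in range(n_reads):
        pos = rng.randrange(1, 100_000_000)
        # Zero-padding keeps every record the same length in both runs.
        lines.append(f"read{i:06d}\tchr1\t{pos:09d}\t50M\n")
    return "".join(lines).encode("ascii")

# Two "runs" over the same reads, differing only in the random choices.
run_a = simulated_alignments(seed=1)
run_b = simulated_alignments(seed=2)

size_a = len(zlib.compress(run_a))
size_b = len(zlib.compress(run_b))

print("uncompressed bytes:", len(run_a), len(run_b))
print("compressed bytes:  ", size_a, size_b)
```

The uncompressed sizes are identical by construction, while the compressed sizes will typically differ by some bytes, which is why matching file sizes is not a reliable test of matching results.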

11.9 years ago

Thanks for answering! You're right, I gave very little information. I'm trying to align three control samples: SRR035585.fastq, SRR035586.fastq, and SRR035587.fastq. (I work on the command line and have very little experience; the project has just started.) In my latest attempt to align them properly (meaning getting the same BAM files as my supervisor), I moved the samples to a folder called "Ctrl" in my user area and ran this command from that folder:

tophat2 -g 1 /GenoStorage/Genomas/hg19/Genome_indexFiles/Bowtie2/hg19 SRR035585.fastq,SRR035586.fastq,SRR035587.fastq

(where /GenoStorage/Genomas/hg19/Genome_indexFiles/Bowtie2/hg19 is the path to the Bowtie2 index of the hg19 genome that TopHat uses). I know the path is correct, and we're using the same TopHat version and the same files...


What is the difference you get? Just a different file size? How does the number of mapped loci differ? Do you have a different number of mapped reads? If there are reads your supervisor was able to map but you missed, take one such read, write it to a separate FASTQ file, give it to your supervisor, and ask them to run TopHat on their machine with exactly the same parameters and indexes you used to see whether it maps. If your supervisor calls TopHat with different parameters, use that set of parameters.
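A comparison like this can be done on the alignment records themselves rather than on file sizes. Here is a minimal sketch in Python, assuming both BAM files have already been converted to SAM text (for example with `samtools view -h`); the function names `mapped_read_names` and `compare_runs` and the example records are hypothetical.

```python
def mapped_read_names(sam_lines):
    """Collect the names of reads with at least one mapped alignment.

    Expects SAM text lines: header lines start with '@', and the FLAG
    field (column 2) has bit 0x4 set when the read is unmapped.
    """
    names = set()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        qname, flag = fields[0], int(fields[1])
        if not flag & 0x4:
            names.add(qname)
    return names

def compare_runs(mine, theirs):
    """Summarize which reads are mapped in one run but not the other."""
    a = mapped_read_names(mine)
    b = mapped_read_names(theirs)
    return {
        "mapped_in_both": len(a & b),
        "only_mine": sorted(a - b),
        "only_theirs": sorted(b - a),
    }

# Made-up example records, not taken from the SRR0355xx samples.
mine = [
    "@HD\tVN:1.0\n",
    "readA\t0\tchr1\t100\t50\t4M\t*\t0\t0\tACGT\tIIII\n",
    "readB\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n",
]
theirs = [
    "@HD\tVN:1.0\n",
    "readA\t16\tchr2\t500\t50\t4M\t*\t0\t0\tACGT\tIIII\n",
    "readB\t0\tchr1\t900\t50\t4M\t*\t0\t0\tACGT\tIIII\n",
]
result = compare_runs(mine, theirs)
print(result)
```

The names listed under `only_theirs` are the candidates to extract into a separate FASTQ file and rerun. Note that a read counted as mapped in both runs may still sit at a different locus in each, which is expected for multi-mapping reads under -g 1.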


Please use "comment" to add supplementary information. Don't post it as an "answer" to your own question.
