Hello here,
I have 10 samples :
file1_R1.fq
file1_R2.fq
...
file10_R1.fq
file10_R2.fq
And also a Trinity.fasta file. I would like to extract from my reads file, the paired reads that are mapped to Trinity.fasta.
I first ran bowtie (that shows the right amount of reads 33 983 651 *2 for file1)
Then I have sorted my 10 bam files using samtools, getting 10 sorted.bam files (like file1.sorted.bam ...)
Eventually I have run bam2fastq :
for i in $(ls *.sorted.bam|sed 's/.sorted.bam//g'|sort -u)
do
bam2fastq --aligned --force --strict -o "$i"_mapped'#.fq' "$i".sorted.bam
done
However when I look to the output file1_mapped.fq I have about 100 000 000 reads ! that is to say 4 times more than originally !! Does anyone can help me to solve this sorcery?
Ok thank you so I m doing : samtools view -F 256 input.bam > output.bam then I sort the bam and run bam2fastq.
Do you know if can use this command directly on my sorted.bam files ? That would make be gain a lot of time !
Mmmh I don't understand my output bam files are heavier than the input file ... And when I try to sort it I get error: "samtools sort: truncated file. Aborting" (edit it 's because the output is a sam fil i fixed it with -b option)
Most conversion programs require the reads to be name sorted when you do the BAM -> fastq conversion. You should be able to do all of this via samtools
(check other options you need for fastq conversion)
Thanks I have tried this exact command.
When i do count the number of reads for file3, I get 35354937 reads for R1 and 35328712 for R2 (shouldn't it be the same number ?).
Do you know to which reads it corresponds to ?( cf my bowtie stats bowtie.stats_file3 ).
Ideally I would like to only get the pairs that aligned exactly one time or more ! But I don't know how to modify my code for that.
Looks like you have some R1 reads in your file where the mate had not mapped. If you only want reads that are a pair then you can use
repair.sh
to remove singletons fromR1
file. Here is a guide to userepair.sh
.Honestly I am a bit lost. For me there is a problem in both R1 and R2 since I expect 62,31% +22,41 % of the total that is to say 22510215+8106913 = 30617128 reads in each R1 and R2 fq files.
I ran also a flagtest on the sorted bam (previous running samtools fastq) :
That is to say I feel like the last command extracts all reads and not only the paired. Besides in the properly paired line there is exactly the number of reads i expect (61234256 /2 = 30617128 ) !!