Creating FASTQ files without duplicates from a .bam file
8.1 years ago
Picasa ▴ 650

Hi everybody,

Starting from a mapping (BAM) file, I used the following command to assess my duplicate rate:

picard-tools MarkDuplicates I=input.bam O=output.bam M=marked_dup_metrics.txt
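The rate itself is written to the metrics file passed as M=. Here is a minimal sketch that prints it per library. It assumes the usual Picard metrics layout: a tab-separated table whose header row starts with LIBRARY and includes a PERCENT_DUPLICATION column. The helper name `dup_rate` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print LIBRARY and PERCENT_DUPLICATION for each library
# row of a Picard MarkDuplicates metrics file (assumes the standard
# tab-separated metrics table).
dup_rate() {
  awk -F'\t' '
    # Header row of the metrics table: remember the PERCENT_DUPLICATION column.
    $1 == "LIBRARY" {
      for (i = 1; i <= NF; i++) if ($i == "PERCENT_DUPLICATION") pct = i
      in_table = 1
      next
    }
    # A blank line or a new "#" section ends the metrics table.
    in_table && (NF == 0 || $1 ~ /^#/) { in_table = 0 }
    # Data rows: library name and its duplication fraction.
    in_table && pct { print $1 "\t" $pct }
  ' "$1"
}
```

Usage: `dup_rate marked_dup_metrics.txt`. Despite its name, PERCENT_DUPLICATION is a fraction, so 0.25 means 25%.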

I would like to create two FASTQ files (paired-end reads) without those duplicates, keeping mates together where possible.

Should I use samtools to keep only non-duplicate paired reads? What would be the flag/command?

Thanks for your help.

picard duplicates fastq bam • 2.7k views

If you have not done so already, run MarkDuplicates with the REMOVE_DUPLICATES=TRUE option. Then @Medhat's methods below will work.
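If MarkDuplicates was run without that option, the duplicates are still in the BAM but are only flagged. MarkDuplicates sets bit 0x400 (1024, "PCR or optical duplicate") in the FLAG field, so `samtools view -F 0x400` excludes them, which also covers the samtools question in the original post. Below is a minimal sketch. `is_duplicate` and `drop_duplicates` are hypothetical helper names, and the second one needs samtools on the PATH.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds (exit status 0) if a SAM FLAG value has the
# duplicate bit 0x400 (1024) set.
is_duplicate() {
  local flag=$1
  (( (flag & 0x400) != 0 ))
}

# Hypothetical helper: write a BAM without reads flagged as duplicates.
# Requires samtools. Usage: drop_duplicates marked.bam dedup.bam
drop_duplicates() {
  samtools view -b -F 0x400 -o "$2" "$1"
}
```

For paired data, Picard normally flags both mates of a duplicate pair, so filtering on this bit usually removes whole pairs.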

8.1 years ago
Medhat 9.8k

To remove duplicates with Picard, set REMOVE_DUPLICATES=true, for example:

java -jar picard.jar MarkDuplicates \
      I=input.bam \
      O=marked_duplicates.bam \
      REMOVE_DUPLICATES=true \
      M=marked_dup_metrics.txt
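With REMOVE_DUPLICATES=true, duplicate reads are dropped from the output rather than just flagged. A quick sanity check is to count the records that still carry the duplicate bit (0x400). After removal, the count should be 0. The sketch below reads SAM text on stdin. The name `count_dup_flags` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count SAM records whose FLAG (column 2) has the
# duplicate bit 0x400 set. Reads SAM text on stdin, e.g.
#   samtools view marked_duplicates.bam | count_dup_flags
# (samtools view -c -f 0x400 marked_duplicates.bam gives the same count directly.)
count_dup_flags() {
  awk -F'\t' '
    /^@/ { next }                      # skip header lines
    int($2 / 1024) % 2 == 1 { n++ }    # bit 0x400 set in FLAG
    END { print n + 0 }
  '
}
```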

After that, sort by query name. Then you can extract the two ends as follows:

samtools sort -n -o aln.qsort.bam aln.bam
bedtools bamtofastq -i aln.qsort.bam -fq aln.end1.fq -fq2 aln.end2.fq
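If bedtools is not available, samtools 1.x can do the same in one pipeline: a name sort piped into `samtools fastq` with `-1`/`-2`. Adding 0x400 to samtools fastq's default exclusion mask (0x900, secondary and supplementary) also drops reads that were only flagged as duplicates. This is a sketch, not the answer's exact method. The function name `bam_to_paired_fastq` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: name-sort a BAM and write paired FASTQ with samtools only.
#   -F 0xD00 : exclude secondary (0x100), supplementary (0x800) and duplicate (0x400)
#   -n       : keep read names as they are (no /1, /2 suffix)
#   -s, -0   : send singletons and reads flagged neither/both READ1/READ2 to /dev/null
# Usage: bam_to_paired_fastq in.bam out_1.fq out_2.fq
bam_to_paired_fastq() {
  samtools sort -n "$1" \
    | samtools fastq -F 0xD00 -n -1 "$2" -2 "$3" -s /dev/null -0 /dev/null -
}
```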

Or, using Picard:

java -jar picard.jar SamToFastq INPUT=<bamfile> FASTQ=outfile_1.fastq SECOND_END_FASTQ=outfile_2.fastq
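Whichever tool writes the FASTQ files, mates must appear in the same order in both files. Here is a minimal sketch of that check. It assumes plain 4-line FASTQ records and compares read names after dropping the leading "@", any comment after whitespace, and a trailing /1 or /2. Both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the read name of each 4-line FASTQ record,
# without "@", without any comment after whitespace, and without a /1 or /2 suffix.
fastq_names() {
  awk 'NR % 4 == 1 { name = substr($1, 2); sub(/\/[12]$/, "", name); print name }' "$1"
}

# Hypothetical helper: succeeds if both files list the same read names
# in the same order (requires bash for process substitution).
check_pairs_in_sync() {
  cmp -s <(fastq_names "$1") <(fastq_names "$2")
}
```

Usage: `check_pairs_in_sync outfile_1.fastq outfile_2.fastq && echo "pairs in sync"`.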

Thanks for your help.

1) Does it keep the pairs intact? I mean, if read 1 is a duplicate and read 2 is not, does it discard both? (This is what I want.)

2) Is it mandatory to sort by query name?


To answer the first question, I will quote from a previous answer:

If you have a paired data, then both reads for a pair will be used to select duplicates. In this case, if there is another pair that has both of its reads aligning at the same exact location as this pair, then one of these would be marked as duplicates
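To make the pair-level idea concrete, here is a toy illustration. It is not Picard's algorithm: Picard also considers strand, clipping-adjusted 5' positions, and which pair to keep (by default the one with the highest sum of base qualities). The sketch keys each pair on the alignment start of read 1 and of its mate. A pair only counts as a duplicate when both positions match. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Toy illustration only (NOT Picard's algorithm): key each pair on
# RNAME:POS of read 1 plus RNEXT:PNEXT of its mate, and report every pair
# after the first one with the same key. Reads SAM text on stdin;
# ignores strand, soft clipping and base qualities.
toy_pair_duplicates() {
  awk -F'\t' '
    /^@/ { next }                           # skip header lines
    int($2 / 64) % 2 == 1 {                 # FLAG bit 0x40: first read of the pair
      key = $3 ":" $4 "-" $7 ":" $8         # both ends contribute to the key
      if (key in seen) print $1 "\tduplicate of\t" seen[key]
      else seen[key] = $1
    }
  '
}
```

Here pair p3 shares read 1's position with p1 but not the mate's position, so it is not reported: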

More info can be found here: A: Samtools Remove Duplicates Question

2) Is it mandatory to sort by query name?

For bedtools bamtofastq it is, according to the documentation:

BAM should be sorted by query name (samtools sort -n aln.bam aln.qsort) if creating paired FASTQ with this option.
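The sort order a BAM declares is stored in the SO tag of its @HD header line, and `samtools sort -n` sets it to queryname. Below is a minimal check that reads SAM header text on stdin. `header_says_queryname` is a hypothetical name. It only checks the declared order, not the records themselves.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if the SAM header on stdin declares
# SO:queryname on its @HD line, e.g.
#   samtools view -H aln.qsort.bam | header_says_queryname
# Only the declared sort order is checked, not the actual record order.
header_says_queryname() {
  awk -F'\t' '
    $1 == "@HD" { for (i = 2; i <= NF; i++) if ($i == "SO:queryname") found = 1 }
    END { exit !found }
  '
}
```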
