To extract unmapped pair end reads
3
0
Entering edit mode
7.9 years ago
Bioinfonext ▴ 470

Actually I map raw reads to CDS of Draft genome where all genes CDS is not present.

I used below cammand lines for pair end read mapping to CDS and want to extract unmapped pair end reads in which BOTH READS of Pair not able to map to CDS.

But when I am running last cammnd it show some warning on screen: but it write R1 and R2 READS in two separete files. is it ok?

WARNING: Query PC168224:127:C98A1ANXX:1:2316:21235:23135 is marked as paired, but its mate does not occur next to it in your BAM file. Skipping. WARNING: Query PC168224:127:C98A1ANXX:1:2316:21236:13229 is marked as paired, but its mate does not occur next to it in your BAM file. Skipping. *WARNING: Query PC168224:127:C98A1ANXX:1:2316:21237:66490 is marked as paired, but its mate does not occur next to it in your BAM file. Skipping.

Cammand Lines:

For BWA indexing

/home/yog/software/bwa-0.7.15/bwa index -p Radish_index Radish_cds.fasta

BWA Cammand

/home/yog/software/bwa-0.7.15/bwa mem Radish_index forward_216_5W_Ca1.fastq reverse_216_5W_Ca1.fastq >216_5W_Ca1.sam

Convert Sam to BAM

/home/yog/software/samtools-1.3.1/samtools view -bS 216_5W_Ca1.sam > 216_5W_Ca1.bam

Extract Unmapped Reads:

/home/yog/software/samtools-1.3.1/samtools view -b -f 4 216_5W_Ca1.bam > unmap_216_5W_Ca1.bam

To convert bam to Fastq File:

/home/yog/software/samtools-1.3.1/samtools sort  -n unmap_Ca1.bam -o unmap_216_5W_Ca1._resorted

/home/yog/software/bedtools2/bin/bamToFastq -i unmap_Ca1._resorted -fq Unmap_Ca1_read1.fastq -fq2 Unmap_Ca1_read2.fastq
RNA-Seq • 5.4k views
ADD COMMENT
3
Entering edit mode
7.9 years ago
Vitis ★ 2.6k

I think the use of flag 4 is not correct. Flag 4 indicates that the read is not mapped, but the mate can be properly mapped. So if you're looking for pairs with neither mate is mapped, you need to use flag 12, indicating the read itself is not mapped and the mate is also not mapped.

ADD COMMENT
2
Entering edit mode
7.9 years ago

This is a copy of this post, which has already been answered.

ADD COMMENT
0
Entering edit mode
7.9 years ago
Charles Plessy ★ 2.9k

In the BWA index, each reference sequence (chromosomes, ...) follow each other directly. Not sure for bwa mem, but with bwa align it sometimes happened to me that one mate aligned at the end of a chromosome, the other mate at the beginning of the next chromosome, and BWA would happily mark them as properly paired because their distance in the index, ignoring the transition between the two chromosomes, was small enough. Perhaps this happened to you as well ?

ADD COMMENT
0
Entering edit mode

you did not understand what I wanted to say..........

I map raw reads to CDS transcripts not on genome sequences.

But some of the gene sequences is missing in these CDS transcripts.

So when I map raw reads to these CDS, some pair end reads do not able to map.

I want to extract to those pair end reads in which BOTH READS of Pair not able to map to CDS.

For that I used below cammand line:

Cammand Lines:

For BWA indexing /home/yog/software/bwa-0.7.15/bwa index -p Radish_index Radish_cds.fasta

BWA Cammand /home/yog/software/bwa-0.7.15/bwa mem Radish_index forward_216_5W_Ca1.fastq reverse_216_5W_Ca1.fastq >216_5W_Ca1.sam

Convert Sam to BAM /home/yog/software/samtools-1.3.1/samtools view -bS 216_5W_Ca1.sam > 216_5W_Ca1.bam

Extract Unmapped Reads: /home/yog/software/samtools-1.3.1/samtools view -b -f 4 216_5W_Ca1.bam > unmap_216_5W_Ca1.bam

To convert bam to Fastq File: /home/yog/software/samtools-1.3.1/samtools sort -n unmap_Ca1.bam -o unmap_216_5W_Ca1._resorted

/home/yog/software/bedtools2/bin/bamToFastq -i unmap_Ca1._resorted -fq Unmap_Ca1_read1.fastq -fq2 Unmap_Ca1_read2.fastq

But when last cammnd line write R1 and R2 reads it shows some warning on screen:

WARNING: Query PC168224:127:C98A1ANXX:1:2316:21235:23135 is marked as paired, but its mate does not occur next to it in your BAM file. Skipping. WARNING: Query PC168224:127:C98A1ANXX:1:2316:21236:13229 is marked as paired, but its mate does not occur next to it in your BAM file. Skipping. *WARNING: Query PC168224:127:C98A1ANXX:1:2316:21237:66490 is marked as paired, but its mate does not occur next to it in your BAM file. Skipping.

So I am asking is there I am making any mistake?

ADD REPLY

Login before adding your answer.

Traffic: 3025 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6