Question

filtering out reads from bam file

1

7.7 years ago

sgmiriuka ▴ 10

Dear all,

this may be simple for others, not for me... Is there any way to filter out reads from a bam file based on a gff3 file information? My purpose is to get out the microRNA info from bams to run analysis without this 'contaminating' data.

thanks!!!

sgm

bam gff3 filter RNA-seq • 3.9k views

ADD COMMENT • link updated 7.6 years ago by Alex Reynolds 36k • written 7.7 years ago by sgmiriuka ▴ 10

1

You could convert your gff3 file to BED and then use samtools view with regions; see the thread "Extracting reads from multiple regions".

ADD REPLY • link 7.7 years ago by GenoMax 147k

1

Doesn't bedtools intersect -v -abam bam1 -b gff3 work?

ADD REPLY • link 7.7 years ago by Rohit ★ 1.5k

1

As genomax2 said, you can use samtools view with the -L option. Convert your gff3 to BED, then make the complement of that file and pass the complement to -L. Add -b so that the output is written as BAM rather than SAM:

samtools view -b -L complement.bed -o output_filtered.bam input.bam
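
The gff3-to-BED step can be sketched with plain awk. This is a minimal sketch, not a replacement for a dedicated converter: the function name gff3_to_bed is hypothetical, and it assumes a tab-separated GFF3 file whose third column holds the feature type.

```shell
# Convert GFF3 features of one type to 0-based, half-open BED6.
# Hypothetical helper; usage: gff3_to_bed TYPE < annotations.gff3 > features.bed
gff3_to_bed() {
  awk -v type="$1" 'BEGIN { FS = OFS = "\t" }
    /^#/ { next }   # skip header and directive lines
    NF >= 9 && $3 == type {
      # GFF3 coordinates are 1-based inclusive, BED starts are 0-based
      print $1, $4 - 1, $5, $9, ".", $7
    }'
}
```

The resulting BED can then be sorted and given to bedtools complement -i together with a genome file passed to -g (chromosome names and lengths, in the same order as the sorted BED). Note that samtools view -L keeps any read that overlaps a listed region, so a read straddling a microRNA boundary would still pass; bedtools intersect -v is stricter, since it drops every read with any overlap.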

ADD REPLY • link 7.7 years ago by ATpoint 85k

0

thanks all of you! great help

ADD REPLY • link 7.7 years ago by sgmiriuka ▴ 10

score 0 · Answer 1 · 2017-04-01

If you have only a bam and a gff3 file, you can use GenomicAlignments (with rtracklayer for import and export):

library(rtracklayer)        # import.gff3(), export()
library(GenomicAlignments)  # readGAlignments()

# Import gff as GRanges
gff <- import.gff3(pathgff)
# Import your alignments
aln <- readGAlignments("myAlns.bam")
# Select the miRNA annotated ranges
miRNA <- gff[mcols(gff)$type == "miRNA_gene"] # Example with an SGD file
# Keep only the alignments that do not overlap any miRNA range
filtered.bam <- subsetByOverlaps(aln, miRNA, invert = TRUE)
# Then eventually export the bam
export(filtered.bam, "filtered.bam", format = "BAM")

Use with care: you will lose the headers.

If you have a FASTA file of the microRNAs, a better method would be to map the reads against an index built from it with Bowtie2 and use the --un-gz option to keep the reads that do not align.
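
The Bowtie2 route could look roughly like the sketch below. It assumes single-end reads in a gzipped FASTQ file; the function name drop_mirna_reads, the file names and the index prefix are all hypothetical.

```shell
# Map reads against a microRNA index and keep only the reads that fail to align.
# Hypothetical helper; usage: drop_mirna_reads mirna.fa reads.fastq.gz non_mirna.fastq.gz
drop_mirna_reads() {
  local mirna_fasta="$1" reads="$2" unaligned="$3"
  local index="${mirna_fasta%.*}_idx"   # hypothetical index prefix derived from the FASTA name

  # Build the index, then align; the SAM output is discarded because only
  # the unaligned reads written by --un-gz are of interest here.
  bowtie2-build "$mirna_fasta" "$index" > /dev/null &&
    bowtie2 -x "$index" -U "$reads" --un-gz "$unaligned" -S /dev/null
}
```

The reads written by --un-gz would then have to be mapped to the genome again, so this only makes sense when you still have the original FASTQ files.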

score 0 · Answer 2 · 2017-04-01

0

7.6 years ago

Alex Reynolds 36k

Here are some commands to run this quickly:

$ gff2bed < annotations.gff > annotations.bed
$ bam2bed < reads.bam > reads.bed
$ bedops --element-of 1 reads.bed annotations.bed > answer.bed

Or if you are using bash, this will avoid making intermediate files:

$ bedops --element-of 1 <(bam2bed < reads.bam) <(gff2bed < annotations.gff) > answer.bed

If you know the specific feature type of GFF elements you want, you could filter reads even more specifically:

$ bedops --element-of 1 <(bam2bed < reads.bam) <(gff2bed < annotations.gff | grep miRNA -) > answer.bed
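
One caveat with grep miRNA is that it matches the string anywhere on the line, so it would also keep, for example, miRNA_primary_transcript records or attributes that happen to mention miRNA. A stricter filter can be sketched with awk. The function name filter_bed_by_type is hypothetical, and the sketch assumes that the feature type sits in column 8 of the gff2bed output; check your own file first.

```shell
# Keep only BED rows whose feature-type column exactly equals TYPE.
# Assumption: the type is in column 8 (pass another column number as the
# second argument if your converter puts it elsewhere).
# Hypothetical usage inside the pipeline:
#   bedops --element-of 1 <(bam2bed < reads.bam) \
#     <(gff2bed < annotations.gff | filter_bed_by_type miRNA) > answer.bed
filter_bed_by_type() {
  awk -v type="$1" -v col="${2:-8}" 'BEGIN { FS = "\t" } $col == type'
}
```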

Use Unix pipes and streams where you can; they will save you a lot of time over the alternatives.

ADD COMMENT • link 7.6 years ago by Alex Reynolds 36k

0

OP either wants to extract reads out in a new file or filter bam files to remove the reads (it is not completely clear). This answer in the present form does not seem to be doing either. Unless I am misinterpreting.

ADD REPLY • link 7.6 years ago by GenoMax 147k

0

Maybe you are? Not sure.

ADD REPLY • link 7.6 years ago by Alex Reynolds 36k

0

Well, this is the case. I have a bunch of bam files from small RNA-seq analyses, and I want to take out the info on microRNAs (coming from the gff3 file from miRBase). I want to be left with a bam file without the microRNA information and work with that. This file will allow me to align moRNA reads; doing so without taking out microRNA-related reads is a problem for microRNAs closely packed in clusters. Hope it's clear. Thanks for your help! sgm

ADD REPLY • link 7.6 years ago by sgmiriuka ▴ 10

0

If you want reads without microRNAs, use --not-element-of instead of --element-of, with the grep miRNA statement. This keeps only the reads in reads.bam that do not overlap miRNA features from annotations.gff. In other words:

$ bedops --not-element-of 1 <(bam2bed < reads.bam) <(gff2bed < annotations.gff | grep miRNA -) > answer.bed

There are tools available for converting BED to BAM, in case you need that format instead.

Hope this helps!

ADD REPLY • link 7.6 years ago by Alex Reynolds 36k