this may be simple for others, not for me... Is there any way to filter out reads from a bam file based on a gff3 file information? My purpose is to get out the microRNA info from bams to run analysis without this 'contaminating' data.
As genomax2 said, you can use samtools view with the -L option. Transform your gff3 to BED and then make the complement of that file. Then use this file for -L
If you have only a bam and a gff3 file, you can use GenomicAlignments:
# Import gff as GRanges
gff <- import.gff3(pathgff)
# Import your alignments
aln <- readGAlignments("myAlns.bam")
# Filter out the rRNA annotated ranges
no.miRNA <- gff[mcols(gff)$type!="miRNA_gene",] # Example with an SGD file
# Then get GenomicAlignments on filtered GRange object
filtered.bam <- subsetByOverlaps(aln,no.miRNA)
#Then eventually export the bam
export(filtered.bam,"filtered.bam",format = "BAM")
Use carefully, you will loose the headers.
If you have a fasta file of the microRNA, a better method would be to use the --un-gz option on Bowtie2, keeping unaligned reads while mapping on generated microRNA index.
OP either wants to extract reads out in a new file or filter bam files to remove the reads (it is not completely clear). This answer in the present form does not seem to be doing either. Unless I am misinterpreting.
well, this is the case. I have a bunch of bam files from small-rnaseq analyses, and I want to take out the info on microRNAs (coming from the gff3 file from mirbase). I do want the bam file left without the microRNA information and work with it. This file will allow me to align moRNA reads; doing so without taking out microRNA-related reads is a problem for microRNAs closely packed in clusters. hope its clear.
thanks for your help!
sgm
If you want reads without microRNAs, use --not-element-of instead of --element-of, with the grep miRNA statement. This would filter reads in reads.bam for those reads which do not overlap miRNA features from annotations.gff. In other words:
You could convert your gff3 file to BED and then use
samtools view
region: Extracting reads from multiple regionsdoesn't
bedtools intersect -v -abam bam1 -b gff3
work???As genomax2 said, you can use samtools view with the -L option. Transform your gff3 to BED and then make the complement of that file. Then use this file for -L
thanks all of you! great help