Entering edit mode
8.6 years ago
EVR
▴
610
Hi,
I have a bam file aligned to a genome and a gff file generated from PASA which tell about the location of transcripts in the scaffold.
Now I would like to make use of these two files and extract only the regions outside of transript regions(off-target) where the reads from bam file aligned.
I created a genome file using samtools faidx to use it in bedtools complement but when I use it like following
bedtools -i bed_file -g genome_file
it gave me an error "Exiting...teger conversion of string"
Any suggestions. Thanks in advance
Maybe this post helps.
Thanks for the suggestion, but in that post they described a method to find the reads outside the target region. In my case I have only the reads which aligned outside the transcript regions. I would like to find regions and the counts outside the target regions based on the bam file and GFF/bed files. Any guidance will be really useful
Maybe I'm missunderstanding something... but it should be the same procedure regardless of the region you are interested in (target/transcript). I mean, for one hand, you have a
bam file
with reads mapped against the genome, and for other hand you have agff3/bed/gtf
(I think that PASA outputs the three types) with transcript regions. If you want to extract the number of reads falling outside the transcripts, first you could get to get the complement region of your transcripts, using "complementBed
" (bedtools
utility). Then, usingsamtools view -L complement.bed file.bam
, you should be able to extract the reads mapping outside transcripts.Or even simpler, using the following command
bedtools intersect -v -abam file.bam -b transcripts.bed
you should get your desired output.@iraun Thanks for pointing out the bedtools intersect with -v option. When I tried complementbed, it throwed me error like I mentioned above. I ran the bedtools like following
Now how to count the reads for the for the regions? Can I make use of samtools flagstat ot HTSeq counts? Kindly guide me.