Entering edit mode
8.3 years ago
erincyurtman
•
0
Hi everyone, I am just trying to create a bam file that contains only intergenic regions of genome. I have complete genome bam file but i couldn't find out any filtering way for this purpose. Do you have any suggestions about it ?
If you can put your intergenic regions into a bed file then
samtools view -L your_bed your.bam > intergenic.bam
will subset the file.And you can generate the bedfile of intergenic regions using bedtools complement.
Thank you very much for your answer but I am very new for bedtools so i would like to ask about genome file. How can I get this genome file ?
What genome are you working with?
It is zebrafish genome.
Bedtools will accept a GFF file so if you have one then you could try it with complement.
Edit: I should clarify that just using the GFF file as is may not be enough since it may have different kinds of annotations. If used as is then the resulting complemented version may not strictly have just intergenic regions. Some additional work will needed before running the
bedtools complement
command.I think the "genome" file in question is the one supplied to -g in bedtools complement
Its a tab-delimited file, which contig names in the first column, and their lengths, in base-pairs in the second. One way to find this info is from the header of your bam file or the BAM index using samtools idxstats. e.g.
I don't think this command is appropriate (unless I am missing something). It is not going to generate a file with the genic intervals which would be the input needed for bedtools complement.bedtools complement needs two inputs
1) An interval file of some sort (which in this case should contain the genic regoins - a GTF or GFF will be fine for this). I was assuming the OP would have this, but if not it can be downloaded from here)
2) A "genome" file (the OP asked about the "genome file"). As describe above, it tells bedtools how long the contigs are. That was the file I was creating in my above comment.
Ah yes. Thanks for the clarification.
[bam_index_load] fail to load BAM index. [bam_idxstats] fail to load the index. that is error that i received everytime i try your code with my bam file :(
I also tried to create my own genome file from chrom.size link in UCSC however it gives every time same error with different chromosome number: chromosome ... does not exist in the genome file. I still cannot get result that will come from complement function of bedtools.
Is your bam file (sorted) indexed? If not you can index it by doing
samtools index your_file.bam
. Then the command will work.Yes i already indexed my bam file.
Is the
.bai
file in the same folder as the.bam
file?Sorry for late answer. Should I remove .bam file from folder?
No. Otherwise which file will you run the
idxstats
command on that @i.sudbery provided?If you create a bed file with all intergenic regions of your genome of interest then it is just a simple bedtools intersect command