Creating intergenic bam file
0
0
Entering edit mode
8.3 years ago

Hi everyone, I am just trying to create a bam file that contains only intergenic regions of genome. I have complete genome bam file but i couldn't find out any filtering way for this purpose. Do you have any suggestions about it ?

R genome intergenic alignment • 2.4k views
ADD COMMENT
1
Entering edit mode

If you can put your intergenic regions into a bed file then samtools view -L your_bed your.bam > intergenic.bam will subset the file.

ADD REPLY
2
Entering edit mode

And you can generate the bedfile of intergenic regions using bedtools complement.

ADD REPLY
0
Entering edit mode

Thank you very much for your answer but I am very new for bedtools so i would like to ask about genome file. How can I get this genome file ?

ADD REPLY
0
Entering edit mode

What genome are you working with?

ADD REPLY
0
Entering edit mode

It is zebrafish genome.

ADD REPLY
1
Entering edit mode

Bedtools will accept a GFF file so if you have one then you could try it with complement.

Edit: I should clarify that just using the GFF file as is may not be enough since it may have different kinds of annotations. If used as is then the resulting complemented version may not strictly have just intergenic regions. Some additional work will needed before running the bedtools complement command.

ADD REPLY
1
Entering edit mode

I think the "genome" file in question is the one supplied to -g in bedtools complement

Its a tab-delimited file, which contig names in the first column, and their lengths, in base-pairs in the second. One way to find this info is from the header of your bam file or the BAM index using samtools idxstats. e.g.

samtools idxstats my_bam.bam | cut -f1,2 | grep -v '*' > genome.tsv
ADD REPLY
0
Entering edit mode

I don't think this command is appropriate (unless I am missing something). It is not going to generate a file with the genic intervals which would be the input needed for bedtools complement.

ADD REPLY
2
Entering edit mode

bedtools complement needs two inputs

1) An interval file of some sort (which in this case should contain the genic regoins - a GTF or GFF will be fine for this). I was assuming the OP would have this, but if not it can be downloaded from here)

2) A "genome" file (the OP asked about the "genome file"). As describe above, it tells bedtools how long the contigs are. That was the file I was creating in my above comment.

ADD REPLY
0
Entering edit mode

Ah yes. Thanks for the clarification.

ADD REPLY
0
Entering edit mode

[bam_index_load] fail to load BAM index. [bam_idxstats] fail to load the index. that is error that i received everytime i try your code with my bam file :(

I also tried to create my own genome file from chrom.size link in UCSC however it gives every time same error with different chromosome number: chromosome ... does not exist in the genome file. I still cannot get result that will come from complement function of bedtools.

ADD REPLY
0
Entering edit mode

Is your bam file (sorted) indexed? If not you can index it by doing samtools index your_file.bam. Then the command will work.

ADD REPLY
0
Entering edit mode

Yes i already indexed my bam file.

ADD REPLY
0
Entering edit mode

Is the .bai file in the same folder as the .bam file?

ADD REPLY
0
Entering edit mode

Sorry for late answer. Should I remove .bam file from folder?

ADD REPLY
1
Entering edit mode

No. Otherwise which file will you run the idxstats command on that @i.sudbery provided?

ADD REPLY
1
Entering edit mode

If you create a bed file with all intergenic regions of your genome of interest then it is just a simple bedtools intersect command

ADD REPLY

Login before adding your answer.

Traffic: 1802 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6