bedtools error: could not find indexes
2
0
Entering edit mode
8.3 years ago
Katie D • 0

I want to use bedtools to count reads mapping back to genes, ultimately to use for differential expression analysis in edgeR. I used trinity to assemble my metatranscriptomes de novo, and bowtie2 to map raw reads back to the assembly. I seemingly sorted and indexed the file without a problem (returned no errors at least). When I go to run the bedtools multicov command, I get the error: could not find indexes. The indexes are in the same file as the bam file. Used prodigal for ORF calling to create .gff file.

Commands used:

First I sorted

samtools sort bowtie_output.bam -o sorted_output.bam -T temp_prefix

Then I indexed

samtools index -b sorted_output.bam indexed_sample.bam.bai

Then I try to get read counts

bedtools multicov -bams bowtie_output.bam -bed prodigal_output.gff

I thought maybe I was supposed to use the bam.bai file instead of bowtie output bam file, so I tried

bedtools multicov -bams indexed_file.bam.bai -bed prodigal_output.gff

But still get 'could not find indexes'

Any idea what I am doing wrong? Disclaimer: very much a beginner!

bedtools RNA-Seq • 6.2k views
ADD COMMENT
0
Entering edit mode

Don't supply output names to samtools index command, only input name.

ADD REPLY
0
Entering edit mode
8.3 years ago

1) The samtools command for indexing is (outputs .bai by default):

samtools index sorted_output.bam

2) Since you used Trinity to assemble your transcriptome, why did you use Prodigal to define ORFs? Do you need to distinguish non-coding RNAs (or 5' and 3' UTRs) for some reason?

3) Since you're mapping to the assembled transcriptome, wouldn't #_aligned_reads/total_reads provide the statistic you want?

4) If you want to use Bedtools for this purpose, I believe the correct command is 'coverage' with the '-counts' flag.

ADD COMMENT
0
Entering edit mode

2) I used Prodigal in the meta mode, because I am dealing with metatranscriptomes (bacterial community). I don't need to distinguish ncRNAs or 5' and 3' UTRs, I don't think. But, I thought I should define ORFs if I am interested in particular transcripts. Is this not so? Are you asking because there is a better ORF caller for my purposes, or because it is unecessary?

I based the decision to use Prodigal off of this Davids et al 2016 paper: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0146423

3) I should be more specific. I am ultimately interested in transcript counts of particular genes, because I want to see if there are genes or types of genes being expressed more in particular treatments compared to control treatments. So, I don't think what you described is the statistic I want.

ADD REPLY
0
Entering edit mode
7.0 years ago
alouyakis • 0

I'm late to the game, but it looks like you indexed "sorted_output.bam", but then used your original file, "bowtie_output.bam", instead of the indexed bam, "sorted_output.bam", in your multicov.

ADD COMMENT

Login before adding your answer.

Traffic: 1958 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6