gene_bodyCoverage.py No usable output...why?
1
1
Entering edit mode
5.2 years ago
crcarroll ▴ 90

I am trying to use geneBody_coverage.py -r Homo_sapiens.GRCh38.97.bed -i my_file.sorted.bam -o testOutput but it generates no results in the output. As soon as I run it, I get a wall of text that looks like this:

[NOTE:input bed must be 12-column] skipped this line: Y        26600890        26601022        ENSG00000237917 .       -       havana  exon    .       gene_id "ENSG00000237917"; gene_version "1"; transcript_id "ENST00000435945"; transcript_version "1"; exon_number "12"; gene_name "PARP4P1"; gene_source "havana"; gene_biotype "unprocessed_pseudogene"; transcript_name "PARP4P1-201"; transcript_source "havana"; transcript_biotype "unprocessed_pseudogene"; exon_id "ENSE00001744948"; exon_version "1"; tag "basic"; transcript_support_level "NA";

And then at the end I get this:

Total 0 transcripts loaded
Cannot get coverage signal from S10nM_r1.sorted.bam ! Skip

To start with, I downloaded Homo_sapiens.GRCh38.97.gff3 from Ensembl and used gff2bed (2.4.36) to convert to a bed file which I used for reference. The first thing to do is get a 12-column bed file but I can't seem to find how to do that. But I also think I have multiple issues here and I just can't diagnose it. Any solutions?

RNA-Seq gff2bed geneBody_coverage.py bed • 3.7k views
ADD COMMENT
0
Entering edit mode

well your input file doesn't fit the required BED format, check it out here. Also, if I remember correctly, the program requires tab-separated input.

ADD REPLY
1
Entering edit mode

Why should the bed file not be formatted correctly? I simply downloaded the gfff from Ensemble and used BEDOPS gff2bed < my_file.gff > my_file.bed.

ADD REPLY
1
Entering edit mode

And yet, here we are ;)

Just a general tip for bioinformatics: always check the results you get, you cannot simply trust any program blindly. Even popular formats like BED, GTF can have slightly different properties / requirements depending on the tool you are using. The error is telling you that the input should have 12 columns. Does your file look like it has 12 columns?

ADD REPLY
1
Entering edit mode

Right, I knew that. Part of my question included how to convert my gff file from Ensembl to a 12-column bed file. I thought you were then referring to some other format issue.

ADD REPLY
0
Entering edit mode

Alright, are you using bash? If so you can run an awk one liner to get the 12 column you need. Something along these lines should work:

awk 'BEGIN{OFS="\t"}{$7=$2; $8=$3; $9="0,0,0"; $10=1; $11=$3-$2","; $12="0,"; print}'
ADD REPLY
1
Entering edit mode

I figured something out. I used gff3ToGenePred followed by genePredToBed tools from UCSC utilities . This outputs a 12-column .bed.

ADD REPLY
1
Entering edit mode

I useed read_distribution.py implemented in RSeqQC with ensemble GTF file, similar question occured, the gtf2bed problems. Using differnt methods, I failed to transfer the GTF to BED formats the scripts can. A error occured every time. see below. Followed your instruction, I fixed the format problem, hope it will help other people who has similar problem.

File "lib/bx/bitset.pyx", line 216, in bx.bitset.BinnedBitSet.set_range File "lib/bx/bitset.pyx", line 186, in bx.bitset.bb_check_range_count IndexError: Count (-40608) must be non-negative.

ADD REPLY
2
Entering edit mode
5.2 years ago
crcarroll ▴ 90

I used gff3ToGenePred followed by genePredToBed tools from UCSC utilities. This outputs a 12-column .bed. geneBody_coverage.py is running so far without error.

ADD COMMENT

Login before adding your answer.

Traffic: 1655 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6