I am trying to use geneBody_coverage.py -r Homo_sapiens.GRCh38.97.bed -i my_file.sorted.bam -o testOutput
but it generates no results in the output. As soon as I run it, I get a wall of text that looks like this:
[NOTE:input bed must be 12-column] skipped this line: Y 26600890 26601022 ENSG00000237917 . - havana exon . gene_id "ENSG00000237917"; gene_version "1"; transcript_id "ENST00000435945"; transcript_version "1"; exon_number "12"; gene_name "PARP4P1"; gene_source "havana"; gene_biotype "unprocessed_pseudogene"; transcript_name "PARP4P1-201"; transcript_source "havana"; transcript_biotype "unprocessed_pseudogene"; exon_id "ENSE00001744948"; exon_version "1"; tag "basic"; transcript_support_level "NA";
And then at the end I get this:
Total 0 transcripts loaded
Cannot get coverage signal from S10nM_r1.sorted.bam ! Skip
To start with, I downloaded Homo_sapiens.GRCh38.97.gff3 from Ensembl and used gff2bed (2.4.36) to convert to a bed file which I used for reference. The first thing to do is get a 12-column bed file but I can't seem to find how to do that. But I also think I have multiple issues here and I just can't diagnose it. Any solutions?
well your input file doesn't fit the required BED format, check it out here. Also, if I remember correctly, the program requires tab-separated input.
Why should the bed file not be formatted correctly? I simply downloaded the gfff from Ensemble and used BEDOPS gff2bed < my_file.gff > my_file.bed.
And yet, here we are ;)
Just a general tip for bioinformatics: always check the results you get, you cannot simply trust any program blindly. Even popular formats like BED, GTF can have slightly different properties / requirements depending on the tool you are using. The error is telling you that the input should have 12 columns. Does your file look like it has 12 columns?
Right, I knew that. Part of my question included how to convert my gff file from Ensembl to a 12-column bed file. I thought you were then referring to some other format issue.
Alright, are you using bash? If so you can run an awk one liner to get the 12 column you need. Something along these lines should work:
I figured something out. I used gff3ToGenePred followed by genePredToBed tools from UCSC utilities . This outputs a 12-column .bed.
I useed read_distribution.py implemented in RSeqQC with ensemble GTF file, similar question occured, the gtf2bed problems. Using differnt methods, I failed to transfer the GTF to BED formats the scripts can. A error occured every time. see below. Followed your instruction, I fixed the format problem, hope it will help other people who has similar problem.
File "lib/bx/bitset.pyx", line 216, in bx.bitset.BinnedBitSet.set_range File "lib/bx/bitset.pyx", line 186, in bx.bitset.bb_check_range_count IndexError: Count (-40608) must be non-negative.