Hi, At the end I ended up doing this:
bedtools coverage -a target.sorted_no_chr.bed -b sdx_1.bed -d > coverage.bed
awk '$8>-1 {print}' coverage.bed | wc -l
23715
we can do the same by counting the number of lines in coverage.bed
wc -l < coverage.bed
23715
how many base pairs has more than 20x coverage
cat coverage.bed | awk '$8>20 {print}' | wc -l
19848
What is the percentage of bases pairs that have more than 20x
bc -l<<<19848*100/23715
83.69386464263124604680
total number of reads of the target region
awk 'OFS="\t" {SUM += $8} END {print SUM}' coverage.bed
409763804
average coverage of the target region
awk 'OFS="\t" {SUM += $8} END {print SUM/NR}' coverage.bed
17278.7
target.sorted_no_chr.bed example content:
13 32906398 32907534 NM_000059_exon_9_10_chr13_32906409_f 0 +
13 32910391 32915343 NM_000059_exon_10_10_chr13_32910402_f 0 +
13 32918684 32918800 NM_000059_exon_11_10_chr13_32918695_f 0 +
13 32920953 32921043 NM_000059_exon_12_10_chr13_32920964_f 0 +
13 32928987 32929435 NM_000059_exon_13_10_chr13_32928998_f 0 +
sdx_1.bed example content:
13 32890537 32890720 M01363:123:000000000-ANHGM:1:2112:20339:9512 44 +
13 32890537 32890720 M01363:123:000000000-ANHGM:1:2107:18752:22562 44 +
13 32890537 32890720 M01363:123:000000000-ANHGM:1:1108:9974:20382 44 +
13 32890537 32890720 M01363:123:000000000-ANHGM:1:2112:3611:15774 44 +
13 32890537 32890720 M01363:123:000000000-ANHGM:1:2109:22374:6929 44 +
coverage.bed:
13 32889606 32889814 NM_000059_exon_0_10_chr13_32889617_f 0 + 1 0
13 32889606 32889814 NM_000059_exon_0_10_chr13_32889617_f 0 + 2 0
13 32889606 32889814 NM_000059_exon_0_10_chr13_32889617_f 0 + 3 0
13 32889606 32889814 NM_000059_exon_0_10_chr13_32889617_f 0 + 4 0
13 32889606 32889814 NM_000059_exon_0_10_chr13_32889617_f 0 + 5 0
13 32889606 32889814 NM_000059_exon_0_10_chr13_32889617_f 0 + 6 0
13 32889606 32889814 NM_000059_exon_0_10_chr13_32889617_f 0 + 7 0
However, the results was different from the one I got from Qualimap tools, is there something wrong with the commands above?
Hi,
May be you can have a look on bedtools : http://bedtools.readthedocs.io/en/latest/content/tools/coverage.html
You will need the position of your genes in a bed format.
Best
Thank you very much @Titus. Is that mean if I want a pipline to do that I need first to use i.e Awk to convert the genes_names.bed to genes_location.bed and then feed it to a bedtool command. right?
Hi,
Yes you will need to convert you gene list with genomic position. You can use AWK or have looks here : https://genome.ucsc.edu/cgi-bin/hgTables .
@Titus I tried to follow your instruction by creating a bed file from a list of gene names from your link and then used the following command with samtools, samtools bedcov /path_to/mygenes1.bed /path_to/mybamfile.bam
But I got the following error: Errors in BED line 'chr1 2524251 2524425 ENST00000469962_exon_1_20_chr1_2524272_r 0 -' Errors in BED line 'chr1 2525232 2525532 ENST00000469962_exon_2_20_chr1_2525253_r 0 -' Errors in BED line 'chr1 2537699 2537825 ENST00000509374_exon_0_20_chr1_2537720_r 0 -' Errors in BED line 'chr1 2538392 2538528 ENST00000509374_exon_1_20_chr1_2538413_r 0 -' Errors in BED line 'chr1 2540757 2540878 ENST00000509374_exon_2_20_chr1_2540778_r 0 -' Errors in BED line 'chr1 2541088 2541290 ENST00000509374_exon_3_20_chr1_2541109_r 0 -' Errors in BED line 'chr1 2542699 2542860 ENST00000509374_exon_4_20_chr1_2542720_r 0 -' Errors in BED line 'chr1 2561176 2561656 ENST00000511099_exon_0_20_chr1_2561197_r 0 -' Errors in BED line 'chr1 2564284 2564438 ENST00000511099_exon_1_20_chr1_2564305_r 0 -' Errors in BED line 'chr1 2563986 2564134 ENST00000427302_exon_0_20_chr1_2564007_f 0 +' Errors in BED line 'chr1 2567774 2568079 ENST00000427302_exon_1_20_chr1_2567795_f 0 +'
Mygenes1.bed looks like that: track name="tb_ensGene" description="table browser query on ensGene" visibility=3 url= chr1 66999045 66999110 ENST00000237247_exon_0_20_chr1_66999066_f 0 + chr1 66999908 67000071 ENST00000237247_exon_1_20_chr1_66999929_f 0 + chr1 67091509 67091613 ENST00000237247_exon_2_20_chr1_67091530_f 0 + chr1 67098732 67098797 ENST00000237247_exon_3_20_chr1_67098753_f 0 + chr1 67099742 67099866 ENST00000237247_exon_4_20_chr1_67099763_f 0 + chr1 67105439 67105536 ENST00000237247_exon_5_20_chr1_67105460_f 0 + chr1 67108472 67108567 ENST00000237247_exon_6_20_chr1_67108493_f 0 + chr1 67109206 67109422 ENST00000237247_exon_7_20_chr1_67109227_f 0 + chr1 67126175 67126227 ENST00000237247_exon_8_20_chr1_67126196_f 0 + chr1 67133192 67133244 ENST00000237247_exon_9_20_chr1_67133213_f 0 + chr1 67136657 67136722 ENST00000237247_exon_10_20_chr1_67136678_f 0 + chr1 67137606 67137698 ENST00000237247_exon_11_20_chr1_67137627_f 0 + chr1 67138943 67139069 ENST00000237247_exon_12_20_chr1_67138964_f 0 + chr1 67142666 67142799 ENST00000237247_exon_13_20_chr1_67142687_f 0 + ..... Any idea what went wrong?
Hi, Your bed looks good , what do you have per line ? something like that :
chr1 66999045 66999110 ENST00000237247_exon_0_20_chr1_66999066_f 0 +
Yes, exactly multi-lines in a format like that
Is it tabulation or simple space between columns ?
I think it is a tabulation.
I also tried the bedtools command: bedtools coverage -abam /path_to_bam/file.bam -b /path_to_bed/mygenes1.bed -d > /path_to/result.txt
but I got as a result: * WARNING: File - has inconsistent naming convention for record: 13 32890537 32890720 M01363:123:000000000-ANHGM:1:2117:25174:10912 44 +
Any idea what I am doing wrong?
check your chromosome names: chr13 vs 13. Check your reference fasta for chromosome names (>) used in alignment. Your bed file has chr in chromosome names
example of the fasta file: @NS500441:186:HLM5FAFXX:4:11401:23513:1026 2:N:0:CAGAGAGG+CTCCTTAC CAATAAACCATGAGAAACCTACCATTGTATTGTGCTGAGAAGTGGGAGTCCTATGGCCACAAGGCTCTTGCGTCAGCAGAGAAACTGGCTGCAATTCCTCAGAGTGAGGCTTGTTNGGANAANN + AAAAA66EAEEEEE/E/EE/EAEEEEEEEEEEAEEEEEEEEEEEEEEEAEEE/EEEEEEE/EE6EEE/<aeaeeeee eaaa="" aeeee="" <="" e="" e<6eeaee="" eaaeee6="" <aa#eea#ea##<="" p="">
and here is an example of my bam file when converted to bed with bamTobed command: 3523 34 + 41276032 41276199 255,0,0 1 167 0 17 41276032 41276199 M01363:123:000000000-ANHGM:1:2108:27336:5913 21 + 41276032 41276199 255,0,0 1 167 0 17 41276032 41276199 M01363:123:000000000-ANHGM:1:1104:25649:11829 34 + 41276032 41276199 255,0,0 1 167 0 17 41276032 41276199 M01363:123:000000000-ANHGM:1:2102:24036:2255 34 + 41276032 41276199 255,0,0 1 167 0 17 41276032 41276199 M01363:123:000000000-ANHGM:1:1115:7116:17263 34 + 41276032 41276199 255,0,0 1 167 0 17 41276032 41276199 M01363:123:000000000-ANHGM:1:2114:21848:11026 21 + 41276032 41276199 255,0,0 1 167 0 17 41276032 41276199 M01363:123:000000000-ANHGM:1:1110:16865:16320 21 + 41276032 41276199 255,0,0 1 167 0
I see in the bam file that the chromosome names are only numbers without "chr" unlike the Mygenes.bed which include the "chr". Hence, since Mygenes.bed file is made by genome.ucsc web browser. is there a way to change the output in that browser to match the format I have in the bam file or the only way I can go is to change the pipline of the fasta alignement in order to get a bam/sam file without the "chr" on it?
Hi, I think the easier is to had chr before number in you chromosome you can do :
cat your_ref.fa | perl -pe 's/^>/>chr/g' > your_ref_modified.fa
After you have to realign to have the good ref, and you should use chr against a number it's easier if you need to read your bam/sam file to check something :)
Best