Circos histogram from SNP.VCF file
7.5 years ago
Bioinfonext ▴ 470

I have called SNPs in transcriptome data using the GATK pipeline, and I installed Circos on Linux. I understand from the Circos tutorial that it needs a .conf file with the plot settings.

In that file I need to give the path to a karyotype file, which contains information about the chromosomes. But I still don't understand how to give SNP.VCF as an input file.

Please suggest how to provide the input files to draw a histogram from the SNP.VCF file.
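For a non-human reference, one way to get the karyotype file is to build it from the samtools `.fai` index of the reference FASTA (GATK requires this index to exist next to the reference). The sketch below assumes that index; the function name, the `hs` prefix, the `grey` color and the optional minimum-length filter are illustrative choices, not Circos requirements. Each output line follows the Circos karyotype format `chr - ID LABEL START END COLOR`.

```shell
#!/bin/sh
# make_karyotype FAI [PREFIX] [MINLEN]
# Hypothetical helper: print one Circos karyotype line per sequence listed in a
# samtools faidx index (column 1 = sequence name, column 2 = length).
# PREFIX (default "hs") must match the IDs used in the data files; sequences
# shorter than MINLEN (default 0) are skipped, e.g. to hide tiny scaffolds.
# "grey" is an arbitrary color choice.
make_karyotype() {
  awk -F'\t' -v p="${2:-hs}" -v min="${3:-0}" '
    ($2 + 0 >= min + 0) { printf("chr - %s%s %s 0 %s grey\n", p, $1, $1, $2) }
  ' "$1"
}

# Usage (file names are placeholders):
#   make_karyotype reference.fa.fai hs 1000000 > karyotype.txt
```

The prefix is kept as a parameter because the chromosome IDs in the karyotype and in every data file must be identical.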


Histogram of what? Variants per Mb? Per sample/genotype per Mb?


I have transcriptomes from two contrasting genotypes. I mapped the paired-end reads of both to the same reference genome. I want to see a SNP histogram along all chromosomes for both genotypes. I think it should be variants per Mb.


Circos only accepts data in its simple but strict tabular format and does not accept VCF as input. You will need to either use another tool or preprocess the data until you have exactly the data you want to plot, in the format Circos needs.
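Because Circos is strict about its input, it can save time to check a data file against the karyotype before plotting. A minimal sketch (the function name is hypothetical), assuming a whitespace-separated `chrom start end value` data file and a karyotype in the `chr - ID LABEL START END COLOR` format. It flags malformed lines and chromosome IDs that the karyotype does not define, since Circos has no ideogram to draw such data on.

```shell
#!/bin/sh
# check_circos_data KARYOTYPE DATA
# Hypothetical helper: print every DATA line that is malformed (fewer than 4
# columns, non-integer or reversed coordinates) or whose chromosome ID is not
# defined in KARYOTYPE. Returns 1 if any problem was found, 0 otherwise.
check_circos_data() {
  awk '
    # First file: remember the IDs of "chr - ID LABEL START END COLOR" lines.
    NR == FNR { if ($1 == "chr") ids[$3] = 1; next }
    # Second file: skip comments and blank lines.
    /^#/ || NF == 0 { next }
    NF < 4 || $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ || ($2 + 0 > $3 + 0) {
      print "malformed, line " FNR ": " $0; bad++; next
    }
    !($1 in ids) { print "unknown chromosome, line " FNR ": " $1; bad++ }
    END { exit (bad > 0) }
  ' "$1" "$2"
}

# Usage (file names are placeholders):
#   check_circos_data karyotype.txt vcf.dat && echo "vcf.dat looks OK"
```

The most common mismatch is a prefix difference, e.g. `10` in the data versus `hs10` in the karyotype.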


Thanks a lot for this valuable help. I will try it and let you know if it works for me.


Please use ADD COMMENT or ADD REPLY to respond to earlier posts, so that this thread remains logically structured and easy to follow. I have now moved your post, but as you can see it's not optimal. Adding an answer should only be used to provide a solution to the question asked.

7.5 years ago
  • using awk, get the chrom/start of each VCF record and group the positions into windows of 10000 bp
  • sort / uniq to get the count of variants in each 10000 bp window
  • use awk to create chrom/start/end/count lines. The chrom must be the same as in the circos karyotype/config; that's why I've added an 'hs' prefix.

    $ awk '/^#/ {next} {printf("%s\t%d\n",$1,$2-$2%10000);}' input.vcf | sort | uniq -c | awk '{printf("hs%s\t%s\t%d\t%s\n",$2,$3,$3+10000,$1);}' > vcf.dat

    $ head vcf.dat
    hs10    103580000   103590000   4
    hs10    103600000   103610000   1
    hs10    112540000   112550000   4
    hs10    112550000   112560000   2
    hs10    112570000   112580000   3
    hs10    112580000   112590000   1
    hs10    112590000   112600000   2
    hs10    115800000   115810000   4
    hs10    121420000   121430000   1
    hs10    121430000   121440000   2
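The same one-liner can be wrapped so that the window size is a parameter; with a 1000000 bp window the value column becomes variants per Mb, which is what the asker described. A sketch under that assumption (the function name is hypothetical); `gzip -cdf` lets it read either a plain or a bgzipped VCF.

```shell
#!/bin/sh
# vcf_density VCF [WIN] [PREFIX]
# Hypothetical wrapper around the one-liner above: count VCF records per
# WIN-bp window (default 1000000, i.e. variants per Mb) and print
# "PREFIX+chrom  start  start+WIN  count" lines for a Circos histogram.
vcf_density() {
  gzip -cdf "$1" |                                  # plain or gzipped input
    awk -F'\t' -v win="${2:-1000000}" '
      !/^#/ { print $1 "\t" ($2 - $2 % win) }
    ' |                                             # chrom, window start
    sort | uniq -c |                                # count per window
    awk -v win="${2:-1000000}" -v p="${3:-hs}" '
      { printf("%s%s\t%d\t%d\t%d\n", p, $2, $3, $3 + win, $1) }
    '
}

# Usage (file names are placeholders):
#   vcf_density SNP.vcf 1000000 hs > vcf.dat
```

Like the original, this counts every record in the VCF, so filter it first (e.g. to PASS SNPs only) if that is what should be plotted.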

create the config file:

    (....)

    <plots>
    <plot>
    type       = histogram
    min        = 0
    max        = 70
    file       = vcf.dat
    r0         = 0.3r
    r1         = 0.9r
    color      = black_a4
    fill_color = lgreen
    thickness  = 5
    </plot>
    </plots>
    (...)
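The config above shows only the `<plots>` block. One possible way to write a complete minimal config is sketched below; the ideogram block and the three `<<include ...>>` lines follow the layout used in the Circos tutorials and assume Circos finds its bundled `etc/` files, which may differ between installations. Instead of the fixed `max=70` (which suits 10 kb windows), it takes `max` from the largest count in the data file, since per-Mb counts can be much higher. The function name is hypothetical.

```shell
#!/bin/sh
# write_conf KARYOTYPE DATA OUT
# Hypothetical helper: write a minimal Circos config with one histogram track.
# The ideogram/image/include sections mirror the Circos tutorial layout
# (an assumption; adjust for your installation). "max" is set to the largest
# value in column 4 of DATA so that no bar is clipped.
write_conf() {
  max=$(awk 'BEGIN { m = 0 } ($4 + 0 > m) { m = $4 + 0 } END { print m }' "$2")
  cat > "$3" <<EOF
karyotype = $1

<ideogram>
<spacing>
default = 0.005r
</spacing>
radius    = 0.90r
thickness = 20p
fill      = yes
</ideogram>

<plots>
<plot>
type       = histogram
file       = $2
min        = 0
max        = $max
r0         = 0.3r
r1         = 0.9r
color      = black_a4
fill_color = lgreen
thickness  = 5
</plot>
</plots>

<image>
<<include etc/image.conf>>
</image>

<<include etc/colors_fonts_patterns.conf>>
<<include etc/housekeeping.conf>>
EOF
}

# Usage (file names are placeholders):
#   write_conf karyotype.txt vcf.dat vcf.conf
#   circos -outputdir ./ -outputfile vcf -conf vcf.conf
```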

and call circos:

    circos  -outputdir ./   -outputfile  vcf -conf vcf.conf
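For the asker's case (two genotypes, variants per Mb), one histogram per genotype needs per-sample counts. A sketch, assuming both genotypes are samples in one multi-sample VCF (for example from GATK joint genotyping) and that a site counts for a sample when its `GT` contains at least one non-reference allele; the function name and sample names are hypothetical.

```shell
#!/bin/sh
# count_sample VCF SAMPLE [WIN] [PREFIX]
# Hypothetical helper: for one sample of a multi-sample VCF, count the sites
# whose GT contains a non-reference allele, per WIN-bp window (default 1 Mb).
# Output: "PREFIX+chrom  start  start+WIN  count", sorted by chrom and start.
count_sample() {
  awk -F'\t' -v s="$2" -v win="${3:-1000000}" -v p="${4:-hs}" '
    /^##/ { next }                                  # meta-information lines
    /^#CHROM/ {                                     # find the sample column
      for (i = 10; i <= NF; i++) if ($i == s) col = i
      next
    }
    col {
      n = split($9, fmt, ":")                       # FORMAT keys, e.g. GT:AD:DP
      gi = 0
      for (j = 1; j <= n; j++) if (fmt[j] == "GT") gi = j
      if (!gi) next
      split($col, val, ":")
      if (val[gi] ~ /[1-9]/)                        # 0/1, 1/1, 1|2; not 0/0 or ./.
        cnt[$1 "\t" ($2 - $2 % win)]++
    }
    END {
      if (!col) print "sample " s " not found in #CHROM line" > "/dev/stderr"
      for (k in cnt) {
        split(k, a, "\t")
        printf("%s%s\t%d\t%d\t%d\n", p, a[1], a[2], a[2] + win, cnt[k])
      }
    }
  ' "$1" | sort -k1,1 -k2,2n
}

# Usage (sample names are placeholders):
#   count_sample SNP.vcf genotypeA > genoA.dat
#   count_sample SNP.vcf genotypeB > genoB.dat
```

Each output file can then go into its own `<plot>` block with non-overlapping `r0`/`r1` values (for example 0.6r to 0.9r and 0.3r to 0.6r), so the two genotypes appear as two concentric histogram tracks.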

7.5 years ago
Sej Modha 5.3k

Maybe you can try a custom version of Circos called CircosVCF for this specific case. http://212.150.245.226/~tools/CircosVCF/
