Extract nucleotide read quartet from vcf file
1
0
Entering edit mode
8.2 years ago
berge2015 ▴ 110

Hello,

Is there a way to get read depth for each nucleotide in the format A,C,G,T (any delimiter is okay) in every individual from the vcf file? Vcftools has utilities to get read depth for alleles at SNP sites summed for all individuals but I want it reported for each individual.

Example:

Chrom   Pos   Ref_allele   Sample_1   Sample_2  ..  Sample_n
Chr1    20      A          3,0,9,0    1,0,17,0  ..  0,0,7,0

I need the data to calculate LD. I tried using --hap-r2 but because the chromosomes in my reference genome are organized very messily (each chrom is split into thousands of small scaffolds), this isn't generating anything. I could lower the window size but then scanning for actual LD across the whole chromosome would be impacted.

I'd appreciate any help. Thanks.

vcf SNP variant • 2.2k views
ADD COMMENT
2
Entering edit mode
8.2 years ago

use https://github.com/lindenb/jvarkit/wiki/SAM2Tsv 'cut+sort+uniq -c ' to get the number of bases for each positions, use R or wathever to compile this information into the desired output.

ADD COMMENT
0
Entering edit mode

Thanks Pierre. Looks like a great tool but I am having too many problems compiling it (issue with javac version). Setting both JAVA_HOME & PATH to 1.8.0 did not do it for me (echo shows correct version for both paths, i.e. 1.8.0). I'll give it another go later today/tomorrow and open an issue on github if I can't get it to work. Thanks again!

ADD REPLY
0
Entering edit mode

if you find something wrong, feel free to submit an issue. But for now, the travis-ci is working fine.

ADD REPLY
0
Entering edit mode

Pierre,

The program runs fine but I am not getting the expected output as shown in your Sam2Tsv page under 'Example'. Here's my command:

java -jar dist/sam2tsv.jar sample.sam -r ref.dna.fa -o sample_nuc_count.txt

And here's the first 10 lines of the output:

#READ_NAME      FLAG    CHROM   READ_POS        BASE    QUAL    REF_POS REF     OP
D00555:140:c8gy1anxx:7:1114:16140:45498 16      TGACv1_scaffold_576745_7BL      0       A       D       59371   A       M
D00555:140:c8gy1anxx:7:1114:16140:45498 16      TGACv1_scaffold_576745_7BL      1       G       G       59372   G       M
D00555:140:c8gy1anxx:7:1114:16140:45498 16      TGACv1_scaffold_576745_7BL      2       T       G       59373   T       M
D00555:140:c8gy1anxx:7:1114:16140:45498 16      TGACv1_scaffold_576745_7BL      3       G       G       59374   G       M
D00555:140:c8gy1anxx:7:1114:16140:45498 16      TGACv1_scaffold_576745_7BL      4       C       G       59375   C       M
D00555:140:c8gy1anxx:7:1114:16140:45498 16      TGACv1_scaffold_576745_7BL      5       G       G       59376   G       M
D00555:140:c8gy1anxx:7:1114:16140:45498 16      TGACv1_scaffold_576745_7BL      6       G       G       59377   G       M
D00555:140:c8gy1anxx:7:1114:16140:45498 16      TGACv1_scaffold_576745_7BL      7       T       G       59378   T       M
D00555:140:c8gy1anxx:7:1114:16140:45498 16      TGACv1_scaffold_576745_7BL      8       T       G       59379   T       M

I also got this on the terminal, but not sure if this is causing it: Ignoring SAM validation error due to lenient parsing: Error parsing text SAM file. MAPQ should be 0 for unmapped read.; File sample.sam; Line 1260558

Thanks for your help.

ADD REPLY
0
Entering edit mode

I don't see anything wrong in the outpit

the message "Ignoring SAM validation" is just a warning. There is an error in your SAM file because there is an unmapped read having a mapping quality >0 (which is wrong).

ADD REPLY
0
Entering edit mode

Ah, my bad. The 'Qual' column threw me off here. Sorry. Thanks for the great tool!

ADD REPLY
0
Entering edit mode

cool. BTW, you can use a bam as well, or stdin. something like;

samtools view -bu your.bam TGACv1_scaffold_576745_7BL | java -jar sam2tsv.jar -r ref.dna.fa

ADD REPLY

Login before adding your answer.

Traffic: 2747 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6