Standard Genome Plus Vcf To Variant Genome
1
2
Entering edit mode
13.3 years ago
Darked89 4.7k

I have to do some work with an organism doing transplicing and RNA-Seq (50bp, stranded but unpaired). It is fairly easy to map most of the RNA-Seq reads (no introns), but I want to get more out of this data, in this case transcription starts.

I managed to extract reads containing splice leader (SL) or its part, and mapped these to standard genome for this organism. Problem is that for some genes I can not find even a semi-plausible mappings upstream of their ORFs. I have tried 5 different mappers for that purpose (last, bowtie, GEM, bfast and bwa => results differ) but lack of some mappings is not due to algorithm used.

Since there are some differences between strains, I was thinking about constructing genome sequence closer to my strain, by using mappings from the whole RNA-Seq, looking for SNPs/indels, then picking the most frequent variant in my data set for constructing new genome sequence. That way I hope to cope with genome differences which may affect mappings of SL-containing reds. BTW, I remove the SL sequence, so the reads I am trying to map are 50bp minus 8 up to 39bp shorter.

Question: are there any tools for: reference.genome.fas + VCF => my_strain.genome.fas conversion?

EDIT GATK has a tool (FastaAlternateReferenceMaker) described here but it required input is in ".rod" format. Thanks to swbarnes2 for pointing me to GATK.

EDIT 2 Just to clarify how I got VCF file:

samtools mpileup -ugf my_reference_genome.fa s_1.bam s_2.bam s_3.bam s_4.bam | bcftools view -bvcg - > s1-4.var.raw.bcf 
bcftools view s1-4.var.raw.bcf   | vcfutils.pl varFilter -D 100 > s1-4.var.flt.vcf

EDIT 3

The answer is:

java -jar ~/soft/GenomeAnalysisTK-1.0-6121-g8a78414/GenomeAnalysisTK.jar -T FastaAlternateReferenceMaker -R my_reference_genome.fa
-o my_NEW_genome.fa -B:testsnp,VCF s1-4.var.flt.vcf

EDIT 4 GATK changed syntax:

java -jar ~/soft/GenomeAnalysisTK_1.4/GenomeAnalysisTK.jar -T FastaAlternateReferenceMaker -R  my_reference_genome.fa -o  my_NEW_genome.fa --variant  s1-4.var.flt.vcf
genome vcf • 6.1k views
ADD COMMENT
2
Entering edit mode

Does it handle indels? I guess not?

ADD REPLY
1
Entering edit mode

See also: New Fasta Sequence From Reference Fasta And Variant Calls File? (GATK's FastaAlternateReferenceMaker as accepted answer)

ADD REPLY
0
Entering edit mode

@gaffa: many thanks. I fixed the link in the answer to your question.

ADD REPLY
0
Entering edit mode

@lh3: I just did the test selecting from s1-4.var.flt.vcf only entries with quality 999 and "INDEL". The resulting fasta file shows differences with diff. So I am quite positive it must incorporate indels for creating alternative reference.

ADD REPLY
2
Entering edit mode
13.3 years ago
Swbarnes2 ★ 1.6k

Yes, but they aren't considered great. samtools/bcftools vcf2fq will do that, but it will simply ignore indels. GATK might also have one, but I'm not as familiar with what software, so I'm not sure how well it works.

ADD COMMENT

Login before adding your answer.

Traffic: 2312 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6