Hi,
I have some expression data acquired with old microarrays. The microarrays were designed using ESTs from several maize lines, which cannot be traced.
Now I have transcriptome profiling data from these microarrays, but for another maize line. The reference genome is B73, and I think most of the probes were designed for it, though I can't prove it.
The natural thing would be to map the oligo probes to the B73 genome to see how they relate to the newer data on its sequence.
The problem is that my experiments used material from another line, so it might be better to map the oligos to the transcripts of "my" line.
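Mapping short oligos to transcripts essentially asks, for each transcript, how few mismatches the probe can have over an ungapped window. Below is a minimal Python sketch of that idea, assuming plain DNA strings, no indels, and that both strands should be checked because the probe orientation on the array is not known here. `min_mismatches` and `rank_hits` are hypothetical helper names; at transcriptome scale a proper short-read aligner would be the practical choice.

```python
"""Minimal sketch: best ungapped mismatch count of a probe against transcripts.

Assumptions: plain DNA strings, no indels (ungapped only), and both strands
are checked because the probe orientation is not known.
"""
from typing import Dict, List, Tuple

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def min_mismatches(probe: str, target: str) -> int:
    """Smallest Hamming distance of the probe over all windows of target (both strands)."""
    best = len(probe)  # worst case: every base differs (or target too short)
    for oriented in (probe.upper(), revcomp(probe.upper())):
        for start in range(len(target) - len(probe) + 1):
            window = target[start:start + len(probe)].upper()
            mismatches = sum(a != b for a, b in zip(oriented, window))
            best = min(best, mismatches)
    return best


def rank_hits(probe: str, transcripts: Dict[str, str],
              max_mm: int = 2) -> List[Tuple[str, int]]:
    """Transcripts hit with at most max_mm mismatches, best hits first."""
    hits = [(tx_id, min_mismatches(probe, seq)) for tx_id, seq in transcripts.items()]
    return sorted((h for h in hits if h[1] <= max_mm), key=lambda h: (h[1], h[0]))
```

With a threshold such as `max_mm=1`, a probe that matches one gene perfectly and another gene with one mismatch shows up as two ranked hits, which makes the "how many mismatches are acceptable" decision an explicit parameter.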
I have the genome of "my" line resequenced and the corresponding VCF file.
So I wonder: can I somehow combine 1. the reference sequence (B73), 2. the GFF or GTF file for it, and 3. the VCF with data for "my" line to get cDNA sequences for "my" maize line?
I know I can apply the SNPs to the reference to get the genome of "my" line, either with bcftools:

    bcftools consensus -i -s 140903_SND104_A_L007_HDS-1 -H 1 -f ../../agpv3/genome.fa AGPv3.fasta.mp.bcf.MUT.filt.vcf.gz -o s68_ref.fa

or with vcf-consensus:

    cat ../../agpv3/genome.fa | vcf-consensus AGPv3.fasta.mp.bcf.MUT.filt_S68.vcf.gz > s68_ref.fa
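For SNPs alone, what these commands do boils down to replacing single bases at the VCF positions. A minimal Python sketch, assuming 1-based VCF positions, single-base REF/ALT alleles and a genome already loaded into a dict (`apply_snps` is a hypothetical helper, not part of bcftools). Indels are deliberately skipped here: they shift all downstream coordinates, so B73-based GFF/GTF coordinates would no longer line up with the pseudo-genome.

```python
"""Minimal sketch: apply SNPs (not indels) from VCF-like records to a reference.

Assumptions: 1-based positions as in the VCF POS column, single-base REF/ALT;
indels and MNPs are skipped so that B73 coordinates stay valid.
"""
from typing import Dict, List, Tuple

# (chrom, pos, ref, alt) with a 1-based pos
Snp = Tuple[str, int, str, str]


def apply_snps(genome: Dict[str, str], snps: List[Snp]) -> Dict[str, str]:
    """Return a new genome dict with single-base substitutions applied."""
    mutable = {chrom: list(seq) for chrom, seq in genome.items()}
    for chrom, pos, ref, alt in snps:
        if len(ref) != 1 or len(alt) != 1:
            continue  # skip indels/MNPs: they would break B73 coordinates
        idx = pos - 1  # 1-based VCF position -> 0-based index
        if mutable[chrom][idx].upper() != ref.upper():
            raise ValueError(f"REF allele does not match reference at {chrom}:{pos}")
        mutable[chrom][idx] = alt
    return {chrom: "".join(bases) for chrom, bases in mutable.items()}
```

The REF check is a cheap sanity test that the VCF was called against the same assembly (here AGPv3) as the FASTA being patched.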
So bcftools gets me a pseudo-genome of my line from the VCF and the reference, but I don't know how to do something similar with transcripts, i.e. get a new version of the spliced cDNA. The problem is that some probes match the transcript of one gene perfectly, but also match a transcript of another gene almost perfectly (one mismatch). So I'm trying to decide how many mismatches are acceptable, and the SNPs complicate this.
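If the pseudo-genome keeps B73 coordinates (SNPs only, no indels), the B73 annotation can be used on it directly: for each transcript, cut out its exon records, join them in genomic order, and reverse-complement minus-strand transcripts. Existing tools are worth checking for this, e.g. `gffread -w` writes spliced exon sequences per transcript, and `bedtools getfasta -split` concatenates the blocks of BED12 records. The Python sketch below only illustrates the logic. It assumes GTF input (GFF3 links exons to transcripts through `Parent=` instead, which this parser does not handle), and `parse_gtf_exons` / `spliced_transcripts` are hypothetical names.

```python
"""Minimal sketch: spliced cDNA from GTF exon records on a (SNP-patched) genome.

Assumptions: GTF (not GFF3) input, exons grouped by the transcript_id
attribute, 1-based inclusive coordinates, genome already loaded into a dict.
"""
import re
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


class Exon(NamedTuple):
    chrom: str
    start: int   # 1-based, inclusive (GTF convention)
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"
    transcript_id: str


def parse_gtf_exons(lines: Iterable[str]) -> List[Exon]:
    """Keep only 'exon' features that carry a transcript_id attribute."""
    exons = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = re.search(r'transcript_id "([^"]+)"', fields[8])
        if match:
            exons.append(Exon(fields[0], int(fields[3]), int(fields[4]),
                              fields[6], match.group(1)))
    return exons


def spliced_transcripts(genome: Dict[str, str], exons: Iterable[Exon]) -> Dict[str, str]:
    """Concatenate exons per transcript (introns dropped), in 5'->3' orientation."""
    by_tx: Dict[str, List[Exon]] = defaultdict(list)
    for exon in exons:
        by_tx[exon.transcript_id].append(exon)
    result = {}
    for tx_id, tx_exons in by_tx.items():
        tx_exons.sort(key=lambda e: e.start)  # genomic order
        # 1-based inclusive [start, end] -> Python slice [start - 1:end]
        seq = "".join(genome[e.chrom][e.start - 1:e.end] for e in tx_exons)
        if tx_exons[0].strand == "-":
            seq = seq.translate(_COMPLEMENT)[::-1]  # reverse complement
        result[tx_id] = seq
    return result
```

Applied to s68_ref.fa (FASTA loading left out here) together with the B73 GTF, this would give spliced transcripts of "my" line to map the probes against, as long as no indels were applied to the pseudo-genome.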
@JC For a while I thought I understood your answer, i.e. that I can apply the SNPs to the B73 genome and use the GTF/GFF to extract sequences with bedtools getfasta. But it is not that simple: the GTF contains the coordinates of whole "transcripts", and to get cDNA I need to join the proper exons rather than include the introns. I'm continuing my search for an appropriate tool, so dear all, feel free to comment/answer :-)

Are the GTF/GFF coordinates for your line or for B73?
My GFF is for B73.

@JC I came across the HGVS format and the VEP tool, which can write HGVS notation from a VCF. It looks promising, but before I get far into the manual, could anybody tell me whether these tools allow retrieval of transcripts with the variants applied? And does HGVS support custom IDs, i.e. not Ensembl or RefSeq ones?
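To see what such a transcript-level description encodes, here is a minimal Python sketch that turns a genomic SNP into an HGVS-style string. Assumptions: exons are 1-based inclusive (start, end) pairs from the B73 annotation; numbering uses the transcript-relative "n." form counted from the first transcribed base (proper "c." numbering would also need the CDS start); and the transcript ID is just a label passed in, e.g. the hypothetical custom ID `MY_TX1`. This is not VEP's implementation, and whether the HGVS recommendations accept non-Ensembl/RefSeq IDs is a separate question.

```python
"""Minimal sketch: express a genomic SNP as an HGVS-style transcript change.

Assumptions: "n." numbering from the first transcribed base, 1-based inclusive
exon coordinates, single-base alleles given on the genomic plus strand.
"""
from typing import List, Optional, Tuple

_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def snp_to_hgvs_n(tx_id: str, exons: List[Tuple[int, int]], strand: str,
                  pos: int, ref: str, alt: str) -> Optional[str]:
    """Return e.g. 'MY_TX1:n.5G>T', or None if pos is not in an exon."""
    # Walk exons in transcript (5'->3') order: descending coordinates on "-".
    ordered = sorted(exons, reverse=(strand == "-"))
    offset = 0  # transcript bases contributed by earlier exons
    for start, end in ordered:
        if start <= pos <= end:
            within = pos - start if strand == "+" else end - pos
            if strand == "-":
                # alleles are reported on the transcript strand
                ref, alt = _COMPLEMENT[ref], _COMPLEMENT[alt]
            return f"{tx_id}:n.{offset + within + 1}{ref}>{alt}"
        offset += end - start + 1
    return None  # intronic or outside the transcript
```

For a plus-strand transcript with exons (1, 3) and (7, 9), a G>T at genomic position 8 lies on the fifth transcript base, so the sketch reports it as `MY_TX1:n.5G>T`.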