cDNA database based on genome resequencing
3.6 years ago
boczniak767 ▴ 870

Hi,

I have some expression data acquired with old microarrays. The microarrays were designed using ESTs from several maize lines, which are not traceable.

Now I have transcriptome profiling data from these microarrays, but for a different maize line. The reference genome is B73, and I think most of the probes were designed for it, though I can't prove it.

The natural thing would be to map the oligo probes to the B73 genome to see how the probes relate to the newer sequence data.

The problem is that I did the experiments with material from another line. So maybe it would be better to map the oligos to transcripts of "my" line.

I have the resequenced genome of "my" line and the corresponding VCF file. So I wonder whether I can somehow use:

1. the reference sequence (B73),
2. the GFF or GTF file for it,
3. the VCF with data for "my" line

to get cDNA sequences for "my" maize line?

I know I can apply the SNPs to the reference genome to get a genome for "my" line, as follows:

    bcftools consensus -i -s 140903_SND104_A_L007_HDS-1 -H 1 -f ../../agpv3/genome.fa AGPv3.fasta.mp.bcf.MUT.filt.vcf.gz -o s68_ref.fa

    cat ../../agpv3/genome.fa | vcf-consensus AGPv3.fasta.mp.bcf.MUT.filt_S68.vcf.gz > s68_ref.fa
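To make clear what this substitution step amounts to for SNPs, here is a minimal Python sketch. It is not how bcftools or vcf-consensus are implemented; the function name `apply_snvs` and the toy sequence are assumptions for illustration only. It replaces single bases at 1-based VCF positions and refuses anything that would change the sequence length.

```python
def apply_snvs(seq, snvs):
    """Return a copy of seq with single-nucleotide variants applied.

    seq  : reference sequence of one chromosome (str)
    snvs : iterable of (pos, ref, alt); pos is 1-based, as in VCF
    """
    bases = list(seq)
    for pos, ref, alt in snvs:
        # Anything longer than one base would shift downstream coordinates.
        if len(ref) != 1 or len(alt) != 1:
            raise ValueError(f"not an SNV at position {pos}: {ref}>{alt}")
        # Sanity check: the VCF REF allele must match the reference base.
        if bases[pos - 1].upper() != ref.upper():
            raise ValueError(f"REF mismatch at position {pos}")
        bases[pos - 1] = alt
    return "".join(bases)


# Toy data (hypothetical, not maize sequence).
reference = "ACGTACGTAC"
variants = [(2, "C", "T"), (9, "A", "G")]
pseudo_genome = apply_snvs(reference, variants)
print(pseudo_genome)  # ATGTACGTGC
```

Because the output has the same length as the input, any annotation written for the reference still points at the same positions in the modified sequence.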

cDNA resequencing microarray • 1.6k views
3.6 years ago
JC 13k

I think you already have your answer. My only comment is to make sure your VCF contains only SNVs; otherwise indels will shift the coordinates, and the B73 annotation will no longer line up with the new sequence.
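As a rough illustration of the SNV-only criterion, the sketch below keeps header lines and only those records where REF and every ALT allele are a single base. The function names and the toy records are assumptions; in practice `bcftools view` has a `--types snps` option that is probably the more convenient route, and this code only shows what the criterion means.

```python
BASES = {"A", "C", "G", "T"}


def is_snv(ref, alt_field):
    """True if REF and every comma-separated ALT allele is a single base."""
    alts = alt_field.split(",")
    return ref.upper() in BASES and all(a.upper() in BASES for a in alts)


def keep_snvs(vcf_lines):
    """Yield header lines and SNV-only records from uncompressed VCF lines."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        # VCF columns: CHROM, POS, ID, REF, ALT, ...
        if is_snv(fields[3], fields[4]):
            yield line


# Toy records (hypothetical): one SNP, one deletion, one multi-allelic SNP,
# one insertion.
vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t100\t.\tA\tG\t50\tPASS\t.",
    "1\t200\t.\tAT\tA\t50\tPASS\t.",
    "1\t300\t.\tC\tT,G\t50\tPASS\t.",
    "1\t400\t.\tG\tGA\t50\tPASS\t.",
]
kept = list(keep_snvs(vcf))
print(len(kept))  # 4: two header lines plus the records at 100 and 300
```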

Also, I believe you can map your reads directly to the B73 transcripts and check the coverage. If the variation is a single SNP per transcript, it should not affect the mapping too much.


I only know that with bcftools I can make a pseudo-genome of my line from the VCF and the reference. I still don't know how to do something similar for transcripts, i.e. to get a new version of the spliced cDNA.


"If the variation is a single SNP per transcript, it should not affect the mapping too much."

The problem is that some probes have a perfect match to a transcript of one gene but an almost perfect match (one mismatch) to a transcript of another gene. So I am trying to decide how many mismatches are acceptable, and SNPs complicate that.
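To make the perfect-versus-one-mismatch situation concrete, here is a small sketch that slides a probe along each transcript without gaps and reports the lowest mismatch count. The names `hamming` and `best_hit` and the toy sequences are assumptions, it checks the sense strand only, and a real aligner would be needed for genome-scale data.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a.upper(), b.upper()))


def best_hit(probe, transcript):
    """Return (mismatches, offset) of the best ungapped placement.

    Returns None if the probe is longer than the transcript.
    """
    n = len(probe)
    if n > len(transcript):
        return None
    return min(
        (hamming(probe, transcript[i:i + n]), i)
        for i in range(len(transcript) - n + 1)
    )


# Toy sequences (hypothetical): the probe matches gene A perfectly and
# gene B with a single mismatch.
probe = "ACGTAC"
transcripts = {"geneA": "TTACGTACGG", "geneB": "GGACGAACTT"}
for name, seq in transcripts.items():
    print(name, best_hit(probe, seq))
# geneA (0, 2)
# geneB (1, 2)
```

Running the same comparison against transcripts of both B73 and the resequenced line would show whether a mismatch comes from a SNP in "my" line or from a genuinely different gene.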


@JC For a while I thought I understood your answer, i.e. that I can apply the SNPs to the B73 genome and use the GTF/GFF to extract sequences with bedtools getfasta. But that does not work: the GTF contains coordinates of "transcripts", and to get cDNA I would need to join the proper exons so that introns are excluded. I'm continuing my search for an appropriate tool, so, dear all, feel free to comment/answer :-)
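The exon-joining step can be sketched in a few lines of Python, assuming a GFF3 file whose exon features carry a `Parent=` attribute and a genome already loaded into a dict of chromosome name to sequence. The function names and the toy data are assumptions for illustration. gffread, with its `-w` option together with `-g`, is a tool designed for this job and is likely the better choice on a full maize genome.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def parse_exons(gff_lines):
    """Collect (chrom, start, end, strand) exon tuples per transcript ID."""
    exons = defaultdict(list)
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        f = line.rstrip("\n").split("\t")
        if len(f) < 9 or f[2] != "exon":
            continue
        attrs = dict(kv.split("=", 1) for kv in f[8].split(";") if "=" in kv)
        # An exon may be shared by several transcripts (comma-separated).
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                exons[parent].append((f[0], int(f[3]), int(f[4]), f[6]))
    return exons


def spliced_cdna(genome, exon_list):
    """Join exons in genomic order; reverse-complement minus-strand genes."""
    exon_list = sorted(exon_list, key=lambda e: e[1])
    # GFF coordinates are 1-based and inclusive.
    seq = "".join(genome[c][start - 1:end] for c, start, end, _ in exon_list)
    return revcomp(seq) if exon_list[0][3] == "-" else seq


# Toy genome and annotation (hypothetical).
genome = {"chr1": "AAACCCGGGTTTACGT"}
gff = [
    "\t".join(["chr1", ".", "exon", "1", "3", ".", "+", ".", "Parent=tx1"]),
    "\t".join(["chr1", ".", "exon", "7", "9", ".", "+", ".", "Parent=tx1"]),
    "\t".join(["chr1", ".", "exon", "4", "6", ".", "-", ".", "Parent=tx2"]),
    "\t".join(["chr1", ".", "exon", "13", "16", ".", "-", ".", "Parent=tx2"]),
]
for tx, ex in parse_exons(gff).items():
    print(tx, spliced_cdna(genome, ex))
# tx1 AAAGGG
# tx2 ACGTGGG
```

If the genome passed in is the SNV-only pseudo-genome of "my" line, the B73 coordinates remain valid and the output is the spliced cDNA with the variants applied.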


Are the GTF/GFF coordinates for your line or for B73?


My gff is for B73.


@JC I came across the HGVS format and the VEP tool, which can write that format from a VCF. It looks promising to me, but before I get deep into the manual, could anybody tell me whether these tools allow retrieval of transcripts with the variants applied? And does HGVS support custom IDs, i.e. not Ensembl or RefSeq?
