Hi All,
I downloaded mRNA data for human using the following query:
<Query virtualSchemaName="default" formatter="FASTA" header="0" uniqueRows="1" count="" datasetConfigVersion="0.8">
<Dataset name = "hsapiens_gene_ensembl" interface = "default">
<Filter name = "status" value = "KNOWN"/>
<Filter name = "transcript_status" value = "KNOWN"/>
<Filter name = "biotype" value = "protein_coding"/>
<Attribute name = "ensembl_transcript_id"/>
<Attribute name = "cdna"/>
</Dataset>
</Query>
When I check it on biomart.org the count shows 20467, but the file I am getting is huge and the count goes over 100k. I have tried changing datasetConfigVersion, setting it to 0.6, 0.7 and 0.8, and the result is always the same. Why am I getting so many sequences with this XML query? Even when I do not use the status, transcript status and biotype filters, the total number of genes with a cDNA sequence is only about 50k. Also, I keep getting MySQL server errors with a lost connection error. Is the server busy? Thanks.
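For anyone who wants to reproduce the comparison, here is a minimal sketch for counting the records in the downloaded FASTA file. The query asks for one cDNA per ensembl_transcript_id, so counting header lines and distinct IDs shows whether the extra records are duplicates or distinct transcripts. The function name is hypothetical, and it assumes that the first "|"-separated field of each header line is the transcript ID.

```python
from collections import Counter


def count_fasta_records(lines):
    """Return (total_records, unique_ids) for an iterable of FASTA lines.

    Assumption: each record starts with a '>' header line whose first
    '|'-separated field is the record ID (here, the transcript ID).
    """
    ids = Counter()
    for line in lines:
        if line.startswith(">"):
            # Keep only the first header field as the record ID.
            record_id = line[1:].split("|")[0].strip()
            ids[record_id] += 1
    return sum(ids.values()), len(ids)


# Example usage with a placeholder file name:
# with open("mart_export.fasta") as handle:
#     total, unique = count_fasta_records(handle)
#     print(total, unique)
```

If the two numbers match, every record is a distinct transcript; if the total is larger, some IDs appear more than once in the file.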
Tom