Hello
I am working on equine embryos and we did unstranded paired-end RNA-seq (Illumina, NextSeq High) on individual (non-pooled) embryos.
I first mapped the data with STAR using EquCab3.0.99 from Ensembl. It worked well (around 60% of reads mapped for each embryo), but I am interested in two genes, XIST and SRY, which would allow me to determine the sex of each embryo. These two genes are not in the Ensembl annotation, but they seem to be in NCBI GenBank (although I am not sure):
https://www.ncbi.nlm.nih.gov/nuccore/U50911.1 https://www.ncbi.nlm.nih.gov/gene/100033824
So I would like to map the reads that did not map to the Ensembl reference against the NCBI GenBank sequences to find XIST and SRY.
In STAR, I saw that I have to use the argument:
--outReadsUnmapped Fastx
but I am wondering about reads 1 and 2: are they written to the same FASTQ file? Can I use these FASTQ files directly for mapping against the NCBI GenBank sequences?
Moreover, I do not know how to find the right FASTA and GTF/GFF files at NCBI. Could someone point me to a tutorial for downloading the correct ones?
Finally, in the Ensembl annotation I have a lot of lncRNAs annotated as "novel gene". Is it possible that one of the genes annotated as "novel gene" is actually XIST, for example? Can I check that?
Thank you
Emilie
There are four assemblies at NCBI:
https://www.ncbi.nlm.nih.gov/genome/browse/#!/eukaryotes/145/
In case your genes of interest are not annotated, but you know their sequence, why don't you blast them against your genome?
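A rough sketch of how the results of such a BLAST could be screened, assuming BLAST+ is installed, the genome was turned into a nucleotide database with makeblastdb (-dbtype nucl), and blastn was run with -outfmt 6 (the default 12-column tabular output). The filter_hits helper, the cutoffs, and the example lines are made up for illustration:

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def filter_hits(handle, max_evalue=1e-10, min_length=100):
    """Return hits (as dicts) that pass e-value and alignment-length cutoffs.

    The default cutoffs are arbitrary illustration values, not recommendations.
    """
    hits = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(COLUMNS, row))
        hit["pident"] = float(hit["pident"])
        hit["length"] = int(hit["length"])
        hit["evalue"] = float(hit["evalue"])
        hit["bitscore"] = float(hit["bitscore"])
        if hit["evalue"] <= max_evalue and hit["length"] >= min_length:
            hits.append(hit)
    # Strongest hits first.
    hits.sort(key=lambda h: h["bitscore"], reverse=True)
    return hits


# Made-up lines standing in for a real blastn result file.
example = io.StringIO(
    "SRY_query\tchrY\t99.5\t1400\t7\t0\t1\t1400\t5000\t6399\t0.0\t2500\n"
    "SRY_query\tchr3\t85.0\t60\t9\t0\t10\t69\t100\t159\t1e-05\t80.5\n"
)
for hit in filter_hits(example):
    print(hit["qseqid"], hit["sseqid"], hit["sstart"], hit["send"], hit["evalue"])
```

With real data you would pass an open file handle for your own blastn output instead of the example text. A long, high-identity hit gives you coordinates that you can compare with the "novel gene" entries in the Ensembl GTF.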
Thank you both for your answers. Are the SRY and XIST sequences in the links I posted complete? Actually, the real question is: are they sufficient to run a BLAST against our genome?
I am new to this kind of analysis and did not realize this was possible.
Thank you again
Sincerely yours
Emilie
I tried blasting XIST against the horse taxID. Even with regular blastn (dissimilar sequences), there are no significant hits other than to the accession itself. Take a look for yourself here (the NCBI BLAST link will expire in 2 days). While we don't recommend using a reduced representation of the genome when searching with NGS data, in this case you could simply make a database with the two genes you have, or use the suggestion @h.mon made below to look at k-mer signatures.
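As an illustration of the k-mer idea, here is a minimal sketch that counts how many reads share k-mers with a gene sequence on either strand. It is one simple way to do this and not necessarily what @h.mon had in mind; the function names, the value of k, the min_shared threshold, and the toy sequences are all assumptions made for the example:

```python
def revcomp(seq):
    """Reverse complement of a DNA sequence; unknown bases become N."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp.get(base, "N") for base in reversed(seq.upper()))


def kmers(seq, k):
    """Set of all k-mers in seq (empty if seq is shorter than k)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(gene_seq, k=25):
    """k-mers of the gene on both strands, since the library is unstranded."""
    return kmers(gene_seq, k) | kmers(revcomp(gene_seq), k)


def count_matching_reads(reads, index, k=25, min_shared=2):
    """Count reads sharing at least min_shared k-mers with the index."""
    matching = 0
    for read in reads:
        if len(kmers(read, k) & index) >= min_shared:
            matching += 1
    return matching


# Toy data only: a short fake "gene" and three fake reads, with a small k.
gene = "ACGTTGCAAGGCTTAACCGGTT"
reads = [gene[3:15], revcomp(gene[5:18]), "TTTTTTTTTTTT"]
index = build_index(gene, k=5)
print(count_matching_reads(reads, index, k=5))
```

For real data you would load the SRY or XIST sequence from its FASTA record and stream the read sequences from the FASTQ files, using a larger k so that chance matches are unlikely. Reads matching SRY would point to a male embryo, but with low read counts the call should be treated with caution.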