8.3 years ago
viscardi.neander
Hi!
I'm working with the wolf genome and want to retrieve a specific region. The ENA only provides the FASTQ file: http://www.ebi.ac.uk/ena/data/view/ERS715549. If I had the SAM file, I could simply use samtools to retrieve the specific region given in BED format. Starting from the FASTQ, however, I'm not sure what to do. I believe I first need to index the canFam3 reference genome with BWA or Bowtie2, align the whole wolf read set to canFam3, convert to SAM, and then retrieve the sequence of interest. Is that right? Or is there a shorter path?
If there's no wolf reference genome then your only options are (1) align to the most similar available genome, call variants, make a reference out of the results and extract from that or (2) assemble into contigs, blast the gene of interest against that and extract the resulting region.
Can you show me the steps, or the software to use at each step?
After alignment I converted the .sai file to .sam with the command:
Then I'm thinking of converting it to .bam to work faster,
and then just working normally with samtools?
Thanks a lot :)
You'll be able to find plenty of tools by searching this site.
I still cannot find the solution...
My steps:
1) index
2) align
3) convert to SAM
4) convert to BAM
5) index wolf
6) I'm trying to assemble the paired ends of this gene.fastq, because I ended up with the short reads from this FASTQ file:
but what I need is one FASTA file to compare with other dog genomes.
Thanks again, and sorry for bothering you
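For reference, steps 1 to 5 above could be written out roughly as follows. This is only a sketch: it assumes BWA's aln/sampe route (suggested by the .sai files mentioned earlier), paired-end reads, and hypothetical file names (canFam3.fa, wolf_1.fastq, wolf_2.fastq, and the "wolf" output prefix). The Python helper only builds and prints the command lines; it does not run anything. Note that the BAM has to be coordinate-sorted before it can be indexed.

```python
import shlex


def build_commands(ref, fq1, fq2, prefix):
    """Return (argv, stdout_path) pairs for steps 1-5.

    stdout_path is None when the tool writes its own output file.
    All file names are hypothetical placeholders chosen by the caller.
    """
    sai1, sai2 = f"{prefix}_1.sai", f"{prefix}_2.sai"
    sam = f"{prefix}.sam"
    bam = f"{prefix}.bam"
    sorted_bam = f"{prefix}.sorted.bam"
    return [
        (["bwa", "index", ref], None),                               # 1) index the reference
        (["bwa", "aln", ref, fq1], sai1),                            # 2) align mate 1
        (["bwa", "aln", ref, fq2], sai2),                            # 2) align mate 2
        (["bwa", "sampe", ref, sai1, sai2, fq1, fq2], sam),          # 3) .sai -> SAM
        (["samtools", "view", "-b", "-o", bam, sam], None),          # 4) SAM -> BAM
        (["samtools", "sort", "-o", sorted_bam, bam], None),         # sort by coordinate
        (["samtools", "index", sorted_bam], None),                   # 5) index the BAM
    ]


def as_shell(commands):
    """Render the (argv, stdout_path) pairs as shell command lines."""
    lines = []
    for argv, stdout_path in commands:
        line = " ".join(shlex.quote(arg) for arg in argv)
        if stdout_path is not None:
            line += " > " + shlex.quote(stdout_path)
        lines.append(line)
    return lines


if __name__ == "__main__":
    for line in as_shell(build_commands("canFam3.fa", "wolf_1.fastq", "wolf_2.fastq", "wolf")):
        print(line)
```

Once the sorted BAM is indexed, the reads overlapping a region can be pulled out with samtools view and a region string such as chr:start-end.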
Step 6 will be "call variants" (e.g., with GATK, freeBayes, or samtools mpileup). Step 7 will be to take the VCF file produced in step 6 and produce a new fasta file (e.g., with GATK's FastaAlternateReferenceMaker). You can then samtools faidx that to get the sequence.
Hummm, I'm running it and then will follow your steps! Thanks a lot! =D
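To make the idea behind step 7 and the final extraction concrete, here is a toy pure-Python sketch. It is not how GATK or samtools are implemented: it only substitutes single-base ALT alleles from a VCF into the reference, skips indels, and ignores genotypes and filters. All function names (parse_fasta, apply_snps, extract_bed, bed_to_region) are made up for illustration. It also shows the coordinate convention that tends to cause off-by-one errors: BED intervals are 0-based and half-open, while VCF positions and samtools region strings are 1-based and inclusive.

```python
def parse_fasta(text):
    """Parse FASTA text into {name: sequence}; name is the first word of the header."""
    parts, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            parts[name] = []
        elif name is not None:
            parts[name].append(line)
    return {n: "".join(chunks) for n, chunks in parts.items()}


def apply_snps(seqs, vcf_text):
    """Return copies of the sequences with single-base ALT alleles substituted.

    Simplification: indels, symbolic alleles and records whose REF base does
    not match the reference are skipped; only the first ALT allele is used.
    """
    out = {n: list(s) for n, s in seqs.items()}
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):
            continue
        chrom, pos, _id, ref, alt = line.split("\t")[:5]
        alt = alt.split(",")[0]
        if len(ref) != 1 or alt.upper() not in ("A", "C", "G", "T"):
            continue
        i = int(pos) - 1  # VCF POS is 1-based
        if chrom in out and out[chrom][i].upper() == ref.upper():
            out[chrom][i] = alt
    return {n: "".join(s) for n, s in out.items()}


def extract_bed(seqs, chrom, start, end):
    """Extract a BED interval (0-based, half-open), which maps directly to a slice."""
    return seqs[chrom][start:end]


def bed_to_region(chrom, start, end):
    """Convert a BED interval to a 1-based inclusive region string (chr:start-end)."""
    return f"{chrom}:{start + 1}-{end}"


if __name__ == "__main__":
    reference = parse_fasta(">chr1 toy reference\nACGTA\nCGTAC\n")
    vcf = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\nchr1\t3\t.\tG\tT\t50\tPASS\t.\n"
    consensus = apply_snps(reference, vcf)
    print(extract_bed(consensus, "chr1", 1, 5))
    print(bed_to_region("chr1", 1, 5))
```

On a real genome the same interval would be passed to samtools faidx as the region string produced by bed_to_region, run against the new FASTA from step 7.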