Question

Extract sequences from the raw reads

0

6.1 years ago

Jusnib • 0

I need to extract the promoter and gene sequences of a few (5) genes from more than 100 soybean lines. However, I have only the raw reads of the genomes. Mapping the reads of all the lines to the soybean genome will take a very long time. Is there any other quick way to extract those sequences?

WGS next-gen • 2.2k views

updated 6.0 years ago by Biostar 20 • written 6.1 years ago by Jusnib • 0

2

To extract the gene/promoter sequences from your raw reads, you have to map them to some reference. The reference does not always have to be the whole genome.

You can build your own customized reference database from the gene/promoter sequences of interest and then map your raw reads to it with any sequence alignment tool (since you have raw sequencing reads, I would suggest a short-read aligner such as BWA, bowtie, or bowtie2). Mapping to such a small customized database will save you time.

At the end of the alignment, you will get the gene/promoter sequences from your raw reads that are similar to the customized database (gene/promoter database).
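
A minimal sketch of this approach in Python, assuming paired-end FASTQ files and that `bwa` and `samtools` are installed. The file names, sample name, and thread count are hypothetical placeholders, and the helper only builds the shell commands rather than running them.

```python
import shlex


def format_fasta(records, width=60):
    """Return FASTA text for (name, sequence) pairs, wrapped at `width` bases."""
    lines = []
    for name, seq in records:
        lines.append(f">{name}")
        for start in range(0, len(seq), width):
            lines.append(seq[start:start + width])
    return "\n".join(lines) + "\n"


def mapping_commands(ref, reads_1, reads_2, sample, threads=4):
    """Build shell commands that map one sample to the small custom reference.

    -F 4 drops unmapped reads, so only reads hitting the target
    gene/promoter sequences are kept in the sorted BAM.
    """
    ref, reads_1, reads_2 = (shlex.quote(p) for p in (ref, reads_1, reads_2))
    bam = shlex.quote(f"{sample}.targets.bam")
    return [
        f"bwa index {ref}",
        f"bwa mem -t {int(threads)} {ref} {reads_1} {reads_2}"
        f" | samtools view -b -F 4 -"
        f" | samtools sort -o {bam} -",
        f"samtools index {bam}",
    ]


# Hypothetical file names, for illustration only.
for command in mapping_commands("targets.fa", "line001_R1.fastq.gz",
                                "line001_R2.fastq.gz", "line001"):
    print(command)
```

One thing to watch with a gene-only reference: reads from similar sequences elsewhere in the genome have nowhere else to map and may land on your targets, so it is worth checking mapping quality and coverage before trusting the result.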

6.1 years ago by Nitin Narwade ★ 1.6k

0

Perhaps BLAST?

6.1 years ago by goodez ▴ 640

0

If I understand correctly, you have sequences for 5 genes and you want to extract all of the WGS reads that map to these genes. Am I correct? If so, BLAST is probably your best option. If the data are already in SRA then it would be even easier as you can use the web BLAST and use your gene sequence as query against the WGS SRA project as the subject database. If the data are not in SRA then you can run BLAST locally.
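
For the local BLAST route, a rough sketch in Python, assuming the reads have already been converted to FASTA (`makeblastdb` does not read FASTQ) and that BLAST+ is installed. The file names and the e-value, identity, and length cutoffs are illustrative assumptions, not recommended values. The parser relies on the default column order of `-outfmt 6` (qseqid, sseqid, pident, length, ...).

```python
import shlex


def blast_commands(reads_fasta, genes_fasta, db_prefix, hits_tsv):
    """Build commands that use the reads as the database and the genes as queries."""
    reads_fasta, genes_fasta, db_prefix, hits_tsv = (
        shlex.quote(p) for p in (reads_fasta, genes_fasta, db_prefix, hits_tsv)
    )
    return [
        f"makeblastdb -in {reads_fasta} -dbtype nucl -out {db_prefix}",
        # Raise -max_target_seqs so the default cap does not truncate
        # the list of reads reported for each gene.
        f"blastn -query {genes_fasta} -db {db_prefix} -outfmt 6"
        f" -evalue 1e-10 -max_target_seqs 1000000 -out {hits_tsv}",
    ]


def reads_per_gene(tabular_text, min_identity=95.0, min_length=50):
    """Collect read IDs hitting each gene from BLAST tabular (-outfmt 6) text."""
    hits = {}
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        gene, read_id = fields[0], fields[1]
        identity, length = float(fields[2]), int(fields[3])
        if identity >= min_identity and length >= min_length:
            hits.setdefault(gene, set()).add(read_id)
    return hits


# Hypothetical file names, for illustration only.
for command in blast_commands("line001_reads.fa", "genes.fa",
                              "line001_db", "line001_hits.tsv"):
    print(command)
```

The read IDs returned by `reads_per_gene` can then be used to pull the matching reads out of the original FASTQ files for a small local assembly or alignment.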

6.1 years ago by vkkodali_ncbi ★ 3.8k

0

Is there any particular reason why you don't want to assemble the reads first?

6.1 years ago by n,n ▴ 370

0

These are not my data; I got these raw reads from our collaborator. I need the sequences of a few genes and, if possible, I would like to avoid spending time assembling the reads. If there is no other way, I will assemble the reads.

6.1 years ago by Jusnib • 0

0

You could also pseudo-align to the FASTA mRNA sequences for the genes of interest using Kallisto or Salmon, produce a pseudobam from this pseudo-alignment, and then extract the reads that have aligned from the BAM. Be aware of the biases in these steps, though.

Otherwise, assemble the genome and generally follow steps by Nitin Narwade.
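
For the pseudo-alignment route, the "extract the reads that have aligned" step can be sketched in Python as below, assuming the pseudobam has been converted to SAM text. It keeps header lines and records whose FLAG does not have the unmapped bit (0x4) set, which is the same filter as `samtools view -F 4`. The function names and the two example records are made up for illustration.

```python
def mapped_records(sam_lines):
    """Yield SAM header lines and alignment records that are not flagged unmapped."""
    for line in sam_lines:
        if line.startswith("@"):
            yield line  # header line, passed through unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete SAM record (11 mandatory fields)
        flag = int(fields[1])
        if not flag & 0x4:  # 0x4 is the "segment unmapped" bit
            yield line


def mapped_read_names(sam_lines):
    """Return the set of read names (QNAME) with at least one mapped record."""
    return {
        line.split("\t")[0]
        for line in mapped_records(sam_lines)
        if not line.startswith("@")
    }


# Made-up records: read_a is mapped (flag 99), read_b is unmapped (flag 77).
example = [
    "@SQ\tSN:geneA\tLN:1500",
    "read_a\t99\tgeneA\t10\t60\t4M\t=\t200\t194\tACGT\tIIII",
    "read_b\t77\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]
print(sorted(mapped_read_names(example)))
```

Because these target sequences are mRNAs, reads that fall in promoters or introns will not be captured this way, which is one of the biases to keep in mind.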

6.0 years ago by Kevin Blighe 88k