Extracting reads from a FASTA/FASTQ file
6 weeks ago
bleven • 0

Hello, I have specific genes for which I want to pull reads from a FASTA or FASTQ file (I am not sure which would be better). How would I do that? Thank you!

RNA-seq RNA • 374 views

Is there a reason you want to do this? There are a couple of ways to proceed. One is to build an alignment index from your gene sequence, align your FASTQ reads to it, and keep all the reads that match. Another is to map all the reads to an index of the genome from which the gene was derived, and then keep the reads that overlap the gene coordinates (easy with samtools). Other than that, the only solution consistent with how you worded your question would be a direct pattern match between your FASTQ reads and the gene sequence, which seems exceedingly clunky and is probably not what you're really after. What are you trying to achieve?
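
To make the second approach concrete, here is a minimal Python sketch of the "keep reads that overlap the gene coordinates" step, working on plain SAM text. It is only an illustration of the logic, not a replacement for samtools. The function names, the example reads, and the gene coordinates (chr1:100-200) are all made up for this example.

```python
import re

# CIGAR operations that consume reference bases (per the SAM specification).
REF_CONSUMING = set("MDN=X")
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def reference_span(pos, cigar):
    """Return the 1-based inclusive (start, end) that an alignment covers."""
    length = sum(int(n) for n, op in CIGAR_RE.findall(cigar) if op in REF_CONSUMING)
    return pos, pos + length - 1


def reads_overlapping_gene(sam_lines, chrom, gene_start, gene_end):
    """Yield names of mapped reads overlapping chrom:gene_start-gene_end.

    Coordinates are 1-based and inclusive, as in SAM.
    """
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        name, flag, rname = fields[0], int(fields[1]), fields[2]
        pos, cigar = int(fields[3]), fields[5]
        # Skip unmapped reads (flag bit 0x4) and reads on other sequences.
        if flag & 4 or rname != chrom or cigar == "*":
            continue
        start, end = reference_span(pos, cigar)
        if start <= gene_end and end >= gene_start:
            yield name


# Hypothetical toy data: three reads, gene assumed to sit at chr1:100-200.
example_sam = [
    "@SQ\tSN:chr1\tLN:1000",
    "read1\t0\tchr1\t95\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "read2\t0\tchr1\t300\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "read3\t4\t*\t0\t0\t*\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
]
hits = list(reads_overlapping_gene(example_sam, "chr1", 100, 200))
print(hits)
```

In practice you would let samtools do this on a sorted, indexed BAM file; the sketch just shows what "overlaps the gene coordinates" means for a single alignment.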

6 weeks ago
GenoMax 147k

Use bbduk.sh from the BBMap suite in filter mode. A guide is available here: https://jgi.doe.gov/data-and-tools/software-tools/bbtools/bb-tools-user-guide/bbduk-guide/ It works with any kind of reads, and you can output the data as FASTA if you wish.
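
For intuition about what k-mer based filtering does, here is a toy Python sketch: it indexes every k-mer of the gene (both strands) and keeps any read that shares at least one exact k-mer with it. This is not BBDuk's implementation, and it allows no mismatches. The gene sequence, the reads, the helper names, and k=8 are all assumptions chosen to keep the example small.

```python
import io

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq, k):
    """Return the set of all k-mers in seq (empty if seq is shorter than k)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(gene_seqs, k):
    """Collect k-mers from both strands of every gene sequence."""
    index = set()
    for seq in gene_seqs:
        index |= kmers(seq, k)
        index |= kmers(reverse_complement(seq.upper()), k)
    return index


def read_fastq(handle):
    """Yield (name, sequence, quality) from simple 4-line FASTQ records."""
    while True:
        header = handle.readline().rstrip()
        if not header:
            return
        seq = handle.readline().rstrip()
        handle.readline()  # '+' separator line
        qual = handle.readline().rstrip()
        yield header[1:], seq, qual


def matching_reads(records, index, k):
    """Yield records that share at least one exact k-mer with the index."""
    for name, seq, qual in records:
        if kmers(seq, k) & index:
            yield name, seq, qual


# Hypothetical toy data: r1 is a substring of the gene, r2 is unrelated.
gene = "ATGGCGTACGTTAGCCTAGGCTAAGCTT"
fastq_text = (
    "@r1\nGCGTACGTTAGC\n+\nIIIIIIIIIIII\n"
    "@r2\nTTTTTTTTTTTT\n+\nIIIIIIIIIIII\n"
)
index = build_index([gene], k=8)
kept_names = [name for name, _, _ in matching_reads(read_fastq(io.StringIO(fastq_text)), index, 8)]
print(kept_names)
```

The shorter k is, the more likely an unrelated read shares a k-mer with the gene purely by chance, which is why a real tool with a sensible k is preferable to a hand-rolled filter like this.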

That said, if the original data came from a whole genome, filtering against a reduced representation (e.g. your genes of interest) always carries the possibility that some reads get pulled in by chance. If you have a reference available, then aligning to the complete genome and extracting the reads afterwards (as already suggested) would be the cleanest way to do this. bbmap.sh, the aligner from the same suite, can help with that.
