3.3 years ago
vaishnavi
Hi everyone,
I want to extract the last 400 sequence from a FASTQ file using Python. I am a beginner in bioinformatics; can anyone please guide me on how to do it?
You can probably adapt the answer from "How to extract the last 1000 nt from a group of sequences in a FASTA file?"
I appreciate your suggestion, but I need to extract sequence from an Illumina-based file, which is in FASTQ format.
If you have raw Illumina data, then you don't have any reads longer than 150/300 bp (disclaimer: it is possible to run a paired-end 300 bp kit as single end to get 600 bp reads, but that would be highly unusual/non-standard).
If you don't have raw data then your reads are no longer in fastq format. So please clarify exactly what you have and what you want to get out of that data.
I want to extract the last 400 reads from a FASTQ file containing 72 bp per read, using a Linux command.
This is a completely different requirement than what the title of the post says.
You can do this with:
reformat.sh in=your.fq out=sampled.fq skipreads=N
(from the BBMap suite). Set N = (total reads - 400).
Note: If this is an assignment and requires you to specifically use Python, then please make a note of that. You will also need to show some code you wrote for people to provide suggestions.
Thank you for the help. This was an assignment that was supposed to be done in Python, but my professor has now changed it to Linux.
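For anyone who still needs the Python route from the original question, here is a minimal sketch of the same idea (skip everything except the last 400 reads). It assumes an uncompressed FASTQ file where every record is exactly four lines (header, sequence, `+`, quality), which is typical of raw Illumina output but is not guaranteed for every file. The function names `last_reads` and `write_reads` are made up for illustration and do not come from any library.

```python
from collections import deque
from itertools import islice


def last_reads(fastq_path, n=400):
    """Return the last n records of a FASTQ file as 4-line tuples.

    Assumption: each record spans exactly four lines (no wrapped sequences).
    """
    tail = deque(maxlen=n)  # keeps only the n most recent records in memory
    with open(fastq_path) as handle:
        while True:
            record = tuple(islice(handle, 4))  # read the next four lines
            if not record:
                break  # end of file
            if len(record) < 4:
                raise ValueError("truncated FASTQ record at end of file")
            tail.append(record)
    return list(tail)


def write_reads(records, out_path):
    """Write 4-line FASTQ records back out unchanged."""
    with open(out_path, "w") as out:
        for record in records:
            out.writelines(record)
```

Usage would look like `write_reads(last_reads("your.fq"), "last400.fq")`, where `last400.fq` is just a placeholder output name. Under the same four-line assumption, the plain Linux equivalent is `tail -n 1600 your.fq > last400.fq`, since 400 reads × 4 lines = 1600 lines.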