2.8 years ago
Apex92
I am working with a FASTQ file from which I extract specific parts of each sequence (I remove adapters). I read the file with Biopython's SeqIO, but I do not know how to print the third and fourth lines of each record (the "+" separator and the Phred quality string) as they appear in the original file. Any suggestions?
Here is my code:
from Bio import SeqIO

# lst holds the IDs of the reads of interest (defined elsewhere)
with open("test.fastq", "r") as Fastq:
    for record in SeqIO.parse(Fastq, 'fastq'):
        if record.id in lst:
            # find() returns -1 when the adapter is absent
            adapter_pos = record.seq.find('AACTGTAGGCACCATCAAT')
            RNAseq = record.seq[:adapter_pos]
            adapter_seq = record.seq[adapter_pos:adapter_pos+19]
            umi_seq = record.seq[adapter_pos+19:adapter_pos+19+12]
            print(record.id)
            print(RNAseq + adapter_seq)
You also need to trim or extract the qualities in step with the sequence. Look up the properties of the records that the iterator yields to find how to get the quality and the third line of a FASTQ record.
Thank you for your response. That is exactly my problem: I cannot work out which properties of the record give the quality and the third line of a FASTQ record.
Do you have any suggestions?
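With SeqIO.parse, the fourth line is available through letter_annotations (record.letter_annotations["phred_quality"]), but these are the decoded scores (a list of integers), not the original characters. See the "property letter_annotations" section of https://biopython.org/docs/1.75/api/Bio.SeqRecord.html.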
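To illustrate the point about letter_annotations, here is a minimal sketch. It assumes the goal is to keep only the part of the read before the adapter; the helper name trim_record and the in-memory example read are made up for the demonstration. Slicing a SeqRecord slices its per-letter annotations too, so the decoded scores stay aligned with the sequence, and record.format("fastq") re-encodes them when writing.

```python
from io import StringIO

from Bio import SeqIO

ADAPTER = "AACTGTAGGCACCATCAAT"


def trim_record(record, adapter=ADAPTER):
    """Return the record cut just before the adapter (unchanged if absent).

    trim_record is a hypothetical helper, not part of Biopython.
    """
    pos = record.seq.find(adapter)
    if pos == -1:
        return record
    # Slicing a SeqRecord also slices letter_annotations["phred_quality"].
    return record[:pos]


# Made-up read: 8 bases of insert, the 19-base adapter, 4 trailing bases.
fastq_text = (
    "@read1\n"
    "ACGTACGT" + ADAPTER + "GGGG\n"
    "+\n" + "I" * 8 + "5" * 19 + "#" * 4 + "\n"
)

record = next(SeqIO.parse(StringIO(fastq_text), "fastq"))
trimmed = trim_record(record)

# Decoded integer scores, one per remaining base.
print(trimmed.letter_annotations["phred_quality"])
# Full four-line FASTQ record with the quality string re-encoded.
print(trimmed.format("fastq"), end="")
```

To keep the adapter in the output, as the code in the question does, slice up to pos + len(adapter) instead.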
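If you want the scores exactly as they appear in the FASTQ records, use SeqIO.QualityIO.FastqGeneralIterator instead. It yields plain-string tuples of (title, sequence, quality), where title is the header line without the leading "@" and quality is the raw fourth line. You can print "+" yourself for the third line.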
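A minimal sketch of that approach follows. It assumes that everything from the adapter onward should be dropped; the function name trim_fastq, the wanted_ids parameter and the example reads are invented for illustration. Because sequence and quality are plain strings of equal length, the same slice applies to both.

```python
from io import StringIO

from Bio.SeqIO.QualityIO import FastqGeneralIterator

ADAPTER = "AACTGTAGGCACCATCAAT"


def trim_fastq(handle, adapter=ADAPTER, wanted_ids=None):
    """Yield four-line FASTQ strings cut just before the adapter.

    trim_fastq and wanted_ids are hypothetical names for this sketch.
    Reads without the adapter are passed through untouched.
    """
    for title, seq, qual in FastqGeneralIterator(handle):
        read_id = title.split(None, 1)[0]  # ID is the first word of the title
        if wanted_ids is not None and read_id not in wanted_ids:
            continue
        pos = seq.find(adapter)
        if pos != -1:
            # Same slice on both strings keeps bases and scores aligned.
            seq, qual = seq[:pos], qual[:pos]
        yield f"@{title}\n{seq}\n+\n{qual}\n"


# Two made-up reads; only the first contains the adapter.
demo = (
    "@read1 extra\n"
    "ACGTACGT" + ADAPTER + "GGGG\n"
    "+\n" + "I" * 8 + "5" * 19 + "#" * 4 + "\n"
    "@read2\nTTTT\n+\n!!!!\n"
)

for entry in trim_fastq(StringIO(demo)):
    print(entry, end="")
```

With a real file, pass an open handle, for example open("test.fastq"), and the list of IDs from the question as wanted_ids.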