8.5 years ago
adityabandla
Is it critical to trim the barcodes (or index reads) present in the read headers before aligning sequences to the NR database with a tool such as DIAMOND?
The dataset consists of dual-indexed paired-end reads generated on the HiSeq.
No. You would need to convert the FASTQ sequences to FASTA format (in strict FASTA format, anything after the first space in the header is ignored).
Paired-end (PE) reads should basically give you the same result (unless you have a fusion or something unusual), so searching with only one of the reads should be adequate.
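The conversion described above can be sketched in a few lines of Python. This is only a minimal illustration, assuming plain four-line FASTQ records (no wrapped sequences); the function name `fastq_to_fasta` and the example read are made up for this post and are not part of DIAMOND or any other tool.

```python
import io


def fastq_to_fasta(fastq_handle, fasta_handle):
    """Write FASTA records for each 4-line FASTQ record; return the record count.

    Only the text before the first whitespace in the FASTQ header is kept,
    so an index/barcode stored in the header comment is dropped.
    """
    count = 0
    while True:
        header = fastq_handle.readline()
        if not header:
            break  # end of file
        if not header.strip():
            continue  # tolerate blank lines between or after records
        if not header.startswith("@"):
            raise ValueError(f"unexpected FASTQ header: {header!r}")
        seq = fastq_handle.readline().strip()
        fastq_handle.readline()  # '+' separator line, ignored
        fastq_handle.readline()  # quality line, not needed for FASTA
        read_id = header[1:].split(None, 1)[0]
        fasta_handle.write(f">{read_id}\n{seq}\n")
        count += 1
    return count


# Made-up record in the usual Illumina layout (index after the space).
example = (
    "@M00001:1:000000000-A1B2C:1:1101:15589:1332 1:N:0:ATCACG+TTAGGC\n"
    "ACGTACGTAC\n"
    "+\n"
    "IIIIIIIIII\n"
)
out = io.StringIO()
fastq_to_fasta(io.StringIO(example), out)
print(out.getvalue(), end="")
```

For real files, the same function can be called with handles from `open("reads_R1.fastq")` and `open("reads_R1.fasta", "w")` (hypothetical file names); established tools do the same job and are likely faster on HiSeq-sized data.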
Hi Genomax
Thank you for the reply! Much appreciated. DIAMOND seems to accept FASTQ files as input as well (https://github.com/bbuchfink/diamond).
So I am just trimming adapters and adjusting the read headers before running them through DIAMOND.
Ah well, in that case you probably don't need to worry about the tag sequence in the FASTQ header (I assume your data is already demultiplexed).
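To see why the tag in the header is separate from the read itself, here is a small sketch that splits a header into the read name and the index. It assumes the common Illumina layout where the header comment looks like `read:filtered:control:index`; the thread does not show the actual headers, so the example header and the helper name `split_illumina_header` are hypothetical.

```python
def split_illumina_header(header):
    """Return (read_name, index) from a FASTQ header line.

    Assumes the comment after the first space has the form
    'read:filtered:control:index'; index is None if that field is absent.
    """
    name, _, comment = header.strip().lstrip("@").partition(" ")
    fields = comment.split(":")
    index = fields[3] if len(fields) >= 4 else None
    return name, index


# Made-up dual-indexed header for illustration only.
name, index = split_illumina_header(
    "@M00001:1:000000000-A1B2C:1:1101:15589:1332 1:N:0:ATCACG+TTAGGC"
)
print(name)
print(index)
```

On demultiplexed data the index is only recorded in this comment field, not in the sequence line, so there is nothing to trim from the bases that get aligned.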