6.7 years ago
avneeshbt
Hello everyone, could someone please explain, or point me to some documentation on, how exactly the second column of an NGS read name line (i.e. the line starting with "@") is used by downstream software, be it assemblers or mapping tools? Thanks
What exactly do you mean with second column?
Given a read identifier like this
Are you referring to the 1:N:18:1 part or to the 136 (assuming ':' as column identifier)?
In the former case, this field holds the read pair information (1:N:18:1 = read 1, 2:N:18:1 = read 2; see lieven.sterck's answer). In the latter case, it is the run ID and does not matter for downstream analysis (also keep in mind that when you upload your reads, e.g. to SRA, the read names will be replaced by a header much shorter than the original one).
I am referring to 1:N:18:1. For paired-end data we already tell the program which file is P1 and which is P2, so why is this field still required? For example, with SPAdes, if I use read files without the 1:N:18:2 column, it gives an error, even if I specify P1 and P2 as separate files.
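To make the fields concrete: in Illumina (CASAVA 1.8 and later) headers, the part after the space is laid out as read:is_filtered:control_number:index, so 1:N:18:1 means read 1 of the pair, not filtered out, control number 18, index/sample 1. Below is a minimal Python sketch of splitting such a header. The function name `parse_read_header` and the example identifier are made up for illustration (only the 136 run ID and the 1:N:18:1 field come from this thread), and this is not how SPAdes or any particular tool parses headers internally.

```python
def parse_read_header(header_line):
    """Split a FASTQ header into its read name and the Illumina comment fields.

    Assumes the CASAVA 1.8+ layout '<name> <read>:<is_filtered>:<control>:<index>'.
    Headers without a second column (e.g. renamed SRA reads) get None values.
    """
    line = header_line.strip()
    if line.startswith("@"):
        line = line[1:]
    # The "second column" is everything after the first space.
    name, _, comment = line.partition(" ")
    result = {"name": name, "read": None, "filtered": None,
              "control": None, "index": None}
    if not comment:
        return result
    fields = comment.split(":")
    if len(fields) != 4:
        raise ValueError(f"unexpected second column: {comment!r}")
    read, is_filtered, control, index = fields
    result["read"] = int(read)               # 1 or 2: which mate of the pair
    result["filtered"] = (is_filtered == "Y")  # Y = read failed the filter
    result["control"] = int(control)         # 0 when no control bits are set
    result["index"] = index                  # index sequence or sample number
    return result


# Hypothetical header; only '136' and '1:N:18:1' are taken from the thread.
example = "@INSTR:136:FLOWCELL:1:1101:1000:2000 1:N:18:1"
info = parse_read_header(example)
print(info["name"], info["read"], info["filtered"])
```

A tool that double-checks its input could compare `info["read"]` against the file it was told is P1 or P2, which is the kind of sanity check mentioned in the replies.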
Mainly because any cleverly implemented software will do some double checking to verify that what you enter on the command line is actually true (data-wise). Also, how else would it know which read is left and which one is right?
From the order of files specified, but you're right.
That is indeed strange, especially when you consider that some public data, e.g. from SRA, will not contain the second column anymore. What kind of error are you getting? It might also be possible to add /1 and /2 to the read names in order to reconstruct the pair information.
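The /1 and /2 suggestion could be tried with something like the following Python sketch. It assumes plain 4-line FASTQ records (no wrapped sequence lines); the function name `add_pair_suffix` is hypothetical, and whether this resolves the SPAdes error has not been confirmed in this thread.

```python
def add_pair_suffix(lines, mate):
    """Yield FASTQ lines with '/<mate>' appended to every read name.

    Assumes strict 4-line records, so every 4th line (starting at 0)
    is a header. Any text after the first space is kept unchanged.
    """
    for i, line in enumerate(lines):
        line = line.rstrip("\n")
        if i % 4 == 0:
            name, sep, comment = line.partition(" ")
            # Do not add a second suffix if one is already present.
            if not name.endswith(("/1", "/2")):
                name = f"{name}/{mate}"
            line = name + sep + comment
        yield line


# Tiny in-memory example with a made-up, SRA-style read name.
record = ["@SRR000000.1\n", "ACGT\n", "+\n", "IIII\n"]
for out_line in add_pair_suffix(record, 1):
    print(out_line)
```

For real data you would pass an open file handle for each mate file (`mate=1` for P1, `mate=2` for P2) and write the yielded lines to a new file.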