7.5 years ago
Jeffin Rockey
Hi,
The raw data is a paired-end fastq file.
I aligned it to Genome-1 (using STAR) and got unaligned R1 and unaligned R2 files.
I also aligned it to Genome-2 (using STAR) and again got unaligned R1 and unaligned R2 files.
Please advise on the best method/tool to obtain the 'union' fastq of the unaligned reads from the two genome alignments.
Jeffin
Do you really want the union or rather the intersection?
The union is what I need.
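To make the distinction concrete, here is a minimal sketch using Python sets. The read IDs are made up for illustration and do not come from the actual data.

```python
# Hypothetical read IDs left unaligned by each STAR run (illustration only).
unaligned_genome1 = {"read1", "read2", "read3"}
unaligned_genome2 = {"read2", "read3", "read4"}

# Union: reads that failed to align to at least one of the two genomes.
union_ids = unaligned_genome1 | unaligned_genome2

# Intersection: reads that failed to align to both genomes.
intersection_ids = unaligned_genome1 & unaligned_genome2

print(sorted(union_ids))
print(sorted(intersection_ids))
```

A read that aligned to Genome-2 but not to Genome-1 ends up in the union, but not in the intersection.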
Wouldn't creating the union simply mean combining the unaligned files? You would just need to avoid duplicates.
Assuming unaligned R1/R2 are .bam files:
Analogously for .R2
Thanks.
Could you please provide some detail on what's happening in the grep line?
Thanks a lot @cschu181 . Good to learn about the -A functionality in grep :)
Hi, the part I am unsure about is the best way to avoid duplicate fastq entries.
It would be easy to do with a small script, but I wanted to know whether there is a better method to combine the files while keeping duplicates out.
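For reference, a small script along these lines could look like the sketch below. It is not an existing tool, just one possible approach under some assumptions: the unaligned reads are plain (uncompressed) 4-line FASTQ files, the read ID is the first whitespace-delimited token of the header line, and all IDs fit in memory. The function names `read_fastq` and `union_fastq` and the file names in the usage comment are hypothetical.

```python
import itertools


def read_fastq(handle):
    """Yield (read_id, record_lines) for each 4-line FASTQ record."""
    while True:
        record = list(itertools.islice(handle, 4))
        if len(record) < 4:
            break  # end of file (or a truncated trailing record)
        # Make sure every line ends with exactly one newline.
        record = [line.rstrip("\n") + "\n" for line in record]
        # Assumption: the ID is the first token after the leading '@'.
        read_id = record[0][1:].split()[0]
        yield read_id, record


def union_fastq(in_paths, out_path):
    """Write each read ID once, in order of first appearance; return the count."""
    seen = set()
    with open(out_path, "w") as out:
        for path in in_paths:
            with open(path) as handle:
                for read_id, record in read_fastq(handle):
                    if read_id in seen:
                        continue  # already written from an earlier file
                    seen.add(read_id)
                    out.writelines(record)
    return len(seen)


# Usage (hypothetical file names), run once per mate:
# union_fastq(["genome1_unaligned_R1.fastq", "genome2_unaligned_R1.fastq"], "union_R1.fastq")
# union_fastq(["genome1_unaligned_R2.fastq", "genome2_unaligned_R2.fastq"], "union_R2.fastq")
```

Running it separately for R1 and R2 with the input files given in the same order should keep the mates in sync, provided each R1/R2 input pair lists its reads in the same order.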