I ran the following command:
path/to/hisat2 -f -x /path/to/grch38/genome -1 path/to/SRR925687_1.fa -2 path/to/SRR925687_2.fa -S path/to/SRR925687.sam
I got the following HISAT2 alignment summary:
Warning: Same mate file "//path to/SRR925687_1.fa" appears as argument to both -1 and -2
31525247 reads; of these:
  31525247 (100.00%) were paired; of these:
    31445543 (99.75%) aligned concordantly 0 times
    2010 (0.01%) aligned concordantly exactly 1 time
    77694 (0.25%) aligned concordantly >1 times
    ----
    31445543 pairs aligned concordantly 0 times; of these:
      20260019 (64.43%) aligned discordantly 1 time
    ----
    11185524 pairs aligned 0 times concordantly or discordantly; of these:
      22371048 mates make up the pairs; of these:
        7707288 (34.45%) aligned 0 times
        5218508 (23.33%) aligned exactly 1 time
        9445252 (42.22%) aligned >1 times
The output is a 25 GB SRR925687.sam file. Everything seems to be fine, except for the warning: Warning: Same mate file "//path to/SRR925687_1.fa" appears as argument to both -1 and -2
Could someone please explain why HISAT2 is emitting this warning?
Are you sure your command wasn't the following one?
That would make more sense given the warning message.
Yes, I am sure. My command is:
I passed two separate FASTA files to -1 and -2.
That's weird... it might be a bug then, although I don't know for sure. The code responsible for that warning in hisat2:
So it should only trigger when (mates1[i] == mates2[j]), i.e., when a -1 file and a -2 file have exactly the same name.
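To make the condition concrete, here is a minimal shell sketch of that kind of check. It is not HISAT2's actual code; the function name `warn_if_same_mate` is hypothetical, and the sketch only assumes, as described above, that the two arguments are compared as strings.

```shell
# Hypothetical re-creation of the described check, NOT HISAT2's own code.
# Prints the warning only when the -1 and -2 arguments are the same string.
warn_if_same_mate() {
    if [ "$1" = "$2" ]; then
        printf 'Warning: Same mate file "%s" appears as argument to both -1 and -2\n' "$1"
    fi
}
```

Under this assumption, `warn_if_same_mate SRR925687_1.fa SRR925687_2.fa` prints nothing, because the names differ by one character. That is why the warning is surprising for the command as posted.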
Thank you for your response. I will check.
Is there a specific reason why you're using fasta files instead of fastq files?
Can you post the exact command you used? (Do not doctor it with /path/to/... etc.) Also show us the first few lines of each of your FASTA files.
The command is:
First few lines of SRR925687_1.fa:
First few lines of SRR925687_2.fa:
I've edited your markup to remove the unnecessary quotes, but the way you've copied them appears to leave each header and its sequence on a single line. Is this correct, or did you make an error when copying the data here?
The header and sequence are on two separate lines.
Perhaps the files are named differently but they contain the same reads inside and there was a mixup before alignment?
That's my thinking. If hisat inspects the files at all, it could be that R1 got duplicated and renamed to R2, so the R1 file and R2 file are the same but named differently.
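One way to test the mix-up hypothesis is to compare the two mate files directly. The sketch below assumes a POSIX shell with `cmp`, `grep`, and `head` available; the helper names `same_content` and `first_headers` are made up for illustration.

```shell
# Hypothetical helper: exit status 0 when the two files are byte-for-byte identical.
same_content() {
    cmp -s "$1" "$2"
}

# Hypothetical helper: print the first N FASTA header lines (default 3) of a file.
first_headers() {
    grep '^>' "$1" | head -n "${2:-3}"
}
```

For example, `same_content SRR925687_1.fa SRR925687_2.fa && echo identical` would print `identical` only if R1 had been copied over R2. Running `first_headers` on each file gives a quick look at whether the read names in the two files pair up as expected.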