hisat2 warning message during alignment of FASTA files against a reference genome
6.2 years ago
shuksi1984 ▴ 60

I ran the following command:

path/to/hisat2 -f -x /path/to/grch38/genome -1 path/to/SRR925687_1.fa -2 path/to/SRR925687_2.fa -S path/to/SRR925687.sam

I got the following HISAT2 alignment statistics:

Warning: Same mate file "//path to/SRR925687_1.fa" appears as argument to both -1 and -2
31525247 reads; of these:
  31525247 (100.00%) were paired; of these:
    31445543 (99.75%) aligned concordantly 0 times
    2010 (0.01%) aligned concordantly exactly 1 time
    77694 (0.25%) aligned concordantly >1 times
    ----
    31445543 pairs aligned concordantly 0 times; of these:
      20260019 (64.43%) aligned discordantly 1 time
    ----
    11185524 pairs aligned 0 times concordantly or discordantly; of these:
      22371048 mates make up the pairs; of these:
        7707288 (34.45%) aligned 0 times
        5218508 (23.33%) aligned exactly 1 time
        9445252 (42.22%) aligned >1 times

The output is a 25 GB SRR925687.sam file. Everything seems to be fine except for the warning: Same mate file "//path to/SRR925687_1.fa" appears as argument to both -1 and -2

Could someone explain why hisat2 is throwing this warning message?

RNA-Seq software error next-gen alignment hisat2 • 4.4k views

Are you sure your command wasn't the following one?

path/to/hisat2 -f -x /path/to/grch38/genome -1 path/to/SRR925687_1.fa -2 path/to/SRR925687_1.fa -S path/to/SRR925687.sam

That would make more sense given the warning message.

Yes, I am sure. My command is:

path/to/hisat2 -f -x /path/to/grch38/genome -1 path/to/SRR925687_1.fa -2 path/to/SRR925687_2.fa -S path/to/SRR925687.sam

I passed two separate FASTA files to -1 and -2.

That's weird... it might be a bug then, although I don't know for sure. This is the code responsible for that warning in hisat2:

// Check for duplicate mate input files
if(format != CMDLINE) {
    for(size_t i = 0; i < mates1.size(); i++) {
        for(size_t j = 0; j < mates2.size(); j++) {
            if(mates1[i] == mates2[j] && !gQuiet) {
                cerr << "Warning: Same mate file \"" << mates1[i].c_str()
                     << "\" appears as argument to both -1 and -2" << endl;
            }
        }
    }
}

So it should only trigger when mates1[i] == mates2[j], i.e., when the files given to -1 and -2 have exactly the same name. The check compares the argument strings; it does not read the file contents.
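
To make the behaviour of that loop concrete, here is a minimal Python restatement of it. This is only an illustration of the quoted C++ logic, not hisat2's actual code: the function name `duplicate_mate_warnings` and the `quiet` parameter (standing in for `gQuiet`) are made up for the example, and the `format != CMDLINE` guard is left out.

```python
def duplicate_mate_warnings(mates1, mates2, quiet=False):
    """Return one warning per (-1, -2) pair whose path strings are identical."""
    warnings = []
    if quiet:
        return warnings
    for m1 in mates1:
        for m2 in mates2:
            # Plain string equality, as in the quoted C++ loop:
            # the files themselves are never opened or compared.
            if m1 == m2:
                warnings.append(
                    f'Warning: Same mate file "{m1}" appears as argument to both -1 and -2'
                )
    return warnings


if __name__ == "__main__":
    # Different names on -1 and -2: no warning is produced.
    print(duplicate_mate_warnings(["SRR925687_1.fa"], ["SRR925687_2.fa"]))
    # The same name on both sides: one warning is produced.
    print(duplicate_mate_warnings(["SRR925687_1.fa"], ["SRR925687_1.fa"]))
```

Under this reading, two differently named files with identical contents would not trigger the warning, while the same path passed twice would.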

Thank you for your response. I will check.

Is there a specific reason why you're using fasta files instead of fastq files?

Can you post the exact command you used? (Do not doctor it with /path/to/... etc.)

Also show us the first few lines of each of your fastas.

The command is:

/tools/hisat/hisat2 -f -x /references/grch38/genome -1 /inputfile/SRR925687_1.fa -2 /inputfile/SRR925687_2.fa -S /rnaseq/SRR925687.sam

First few lines of SRR925687_1.fa:

>SRR925687.1 HWUSI-EAS053R_0010:2:1:1100:13816/1
GTGAGATCTTGTCTTAGNAACAAACAAANNACGANTAAAAAAAAAANANNNAAGGCCGGGCCTGGNNNNNNNNNNN
>SRR925687.2 HWUSI-EAS053R_0010:2:1:1101:5022/1
GCAGAAGTGACACAGCCATCCTTGGGTGTAGGCTNTGAGCTGGGCCNGNNNGTGGCCTTTAACAANNNNNNNNNNN
>SRR925687.3 HWUSI-EAS053R_0010:2:1:1101:9481/1
GATCGGAAGAGCGGTTCNGCAGGAATGCCGCGACNGACCTCGTCTCNGNNNTTCTGCTTGAACAANNNNNNNNNNN
>SRR925687.4 HWUSI-EAS053R_0010:2:1:1101:11425/1
GGCCAACAGCTCACCTCNAAAACTTCCCCACTGANAATAATGGCATNGNNNGGAAACTCGGGTCCNNNNNNNNNNN
>SRR925687.5 HWUSI-EAS053R_0010:2:1:1102:6924/1
CTCATCATCTTCAGCTGCCCGCTTGCCCGTAGCTNACTCAGCTTCCNCNNNTTCATCTCCATCCCNNNNNNNNNNN

First few lines of SRR925687_2.fa:

>SRR925687.1 HWUSI-EAS053R_0010:2:1:1100:13816/2
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
>SRR925687.2 HWUSI-EAS053R_0010:2:1:1101:5022/2
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
>SRR925687.3 HWUSI-EAS053R_0010:2:1:1101:9481/2
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
>SRR925687.4 HWUSI-EAS053R_0010:2:1:1101:11425/2
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
>SRR925687.5 HWUSI-EAS053R_0010:2:1:1102:6924/2
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN

I've edited your markup to remove the unnecessary quotes, but the way you've copied the data appears to leave each header and its sequence on a single line. Is this correct, or did you make an error when copying the data here?

The header and the sequence are on two separate lines.

Perhaps the files are named differently but they contain the same reads inside and there was a mixup before alignment?

That's my thinking. If hisat inspects the files at all, it could be that R1 got duplicated and renamed to R2, so the R1 file and R2 file are the same but named differently.
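
One way to test the "same reads, different file names" idea directly is to compare the sequences in the two files while ignoring the headers (which may differ only in the /1 and /2 suffix). The following is a rough sketch, not a polished tool: the function names `fasta_sequences` and `fraction_identical` and the default `limit` are assumptions chosen for the example, and the files are assumed to be uncompressed FASTA.

```python
from itertools import islice


def fasta_sequences(path):
    """Yield the sequence of each FASTA record in order, ignoring header text."""
    chunks = []
    seen_header = False
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if seen_header:
                    yield "".join(chunks)
                chunks = []
                seen_header = True
            elif line and seen_header:
                chunks.append(line)
    if seen_header:
        yield "".join(chunks)


def fraction_identical(path_r1, path_r2, limit=100_000):
    """Fraction of the first `limit` record pairs whose sequences are identical."""
    same = 0
    total = 0
    pairs = zip(fasta_sequences(path_r1), fasta_sequences(path_r2))
    for seq1, seq2 in islice(pairs, limit):
        total += 1
        if seq1 == seq2:
            same += 1
    return same / total if total else 0.0
```

Called as `fraction_identical("/inputfile/SRR925687_1.fa", "/inputfile/SRR925687_2.fa")`, a value close to 1.0 would support the duplicated-file explanation, while a value close to 0.0 would argue against it.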

6.2 years ago
sure ▴ 110

The quickest test would be to check the insert size distribution of the first million reads. If the insert size is zero, then the R1 and R2 files contain the same data. If instead the distribution matches what is expected from the library prep, I would consider the warning a bug in hisat2 and proceed with the analysis.
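
A minimal sketch of that check, reading the SAM output directly: it tallies the absolute value of the TLEN field (column 9 in the SAM format) for the first million first-in-pair records. The function name `insert_size_counts` and the `max_pairs` parameter are made up for this example. Note that TLEN is also 0 for pairs that did not align, so a large zero bin on its own does not prove that the two files are identical.

```python
from collections import Counter


def insert_size_counts(sam_path, max_pairs=1_000_000):
    """Count |TLEN| over the first `max_pairs` first-in-pair records of a SAM file."""
    counts = Counter()
    seen = 0
    with open(sam_path) as handle:
        for line in handle:
            if line.startswith("@"):  # skip SAM header lines
                continue
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 11:  # not a complete alignment record
                continue
            flag = int(fields[1])
            # Count each pair once through its first mate (0x40) and
            # skip secondary (0x100) and supplementary (0x800) records.
            if not flag & 0x40 or flag & 0x900:
                continue
            counts[abs(int(fields[8]))] += 1
            seen += 1
            if seen >= max_pairs:
                break
    return counts
```

For example, `insert_size_counts("/rnaseq/SRR925687.sam").most_common(10)` lists the ten most frequent values, which can then be compared with the fragment size expected from the library prep.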
