Question

hisat2 warning message during alignment of FASTA files to a reference genome

0

6.2 years ago

shuksi1984 ▴ 60

I ran the following command:

path/to/hisat2 -f -x /path/to/grch38/genome -1 path/to/SRR925687_1.fa -2 path/to/SRR925687_2.fa -S path/to/SRR925687.sam

I got the following HISAT2 process statistics:

Warning: Same mate file "//path to/SRR925687_1.fa" appears as argument to both -1 and -2
31525247 reads; of these:
31525247 (100.00%) were paired; of these:
31445543 (99.75%) aligned concordantly 0 times
2010 (0.01%) aligned concordantly exactly 1 time
77694 (0.25%) aligned concordantly >1 times
----
31445543 pairs aligned concordantly 0 times; of these:
  20260019 (64.43%) aligned discordantly 1 time
----
11185524 pairs aligned 0 times concordantly or discordantly; of these:
  22371048 mates make up the pairs; of these:
    7707288 (34.45%) aligned 0 times
    5218508 (23.33%) aligned exactly 1 time
    9445252 (42.22%) aligned >1 times

The output is a 25G SRR925687.sam file. Everything seems to be fine, except for the warning: Warning: Same mate file "//path to/SRR925687_1.fa" appears as argument to both -1 and -2

Kindly explain why hisat2 is throwing this warning message.

RNA-Seq software error next-gen alignment hisat2 • 4.4k views

ADD COMMENT • link updated 6.2 years ago by sure ▴ 100 • written 6.2 years ago by shuksi1984 ▴ 60

4

Are you sure your command wasn't the following one?

path/to/hisat2 -f -x /path/to/grch38/genome -1 path/to/SRR925687_1.fa -2 path/to/SRR925687_1.fa -S path/to/SRR925687.sam

That would make more sense given the warning message.

ADD REPLY • link 6.2 years ago by Carlo Yague 8.9k

0

Yes, I am sure. My command is:

path/to/hisat2 -f -x /path/to/grch38/genome -1 path/to/SRR925687_1.fa -2 path/to/SRR925687_2.fa -S path/to/SRR925687.sam

There are two separate FASTA files for -1 and -2.

ADD REPLY • link 6.2 years ago by shuksi1984 ▴ 60

1

That's weird... it might be a bug then, although I don't know for sure. This is the code responsible for that warning in hisat2:

// Check for duplicate mate input files
if(format != CMDLINE) {
    for(size_t i = 0; i < mates1.size(); i++) {
        for(size_t j = 0; j < mates2.size(); j++) {
            if(mates1[i] == mates2[j] && !gQuiet) {
                cerr << "Warning: Same mate file \"" << mates1[i].c_str() << "\" appears as argument to both -1 and -2" << endl;
            }
        }
    }
}

So it should only trigger when mates1[i] == mates2[j], i.e., when the files given to -1 and -2 have exactly the same name.
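
To illustrate, here is a minimal shell sketch of the same comparison. It assumes that mates1 and mates2 simply hold the names given to -1 and -2 as typed, with comma-separated lists split first. The function name check_duplicate_mates is made up for this example and is not part of hisat2. The names are compared as text only; the files are never opened.

```shell
# Hypothetical re-creation of the duplicate-mate check quoted above.
# $1 = value passed to -1, $2 = value passed to -2 (comma-separated lists).
# The body runs in a subshell so the IFS change does not leak to the caller.
check_duplicate_mates() (
    IFS=','
    for m1 in $1; do
        for m2 in $2; do
            # Plain string comparison, like mates1[i] == mates2[j]
            if [ "$m1" = "$m2" ]; then
                printf 'Warning: Same mate file "%s" appears as argument to both -1 and -2\n' "$m1"
            fi
        done
    done
)
```

Called with /inputfile/SRR925687_1.fa and /inputfile/SRR925687_2.fa it prints nothing, so by this logic two different names should not produce the warning.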

ADD REPLY • link 6.2 years ago by Carlo Yague 8.9k

0

Thank you for your response. I will check.

ADD REPLY • link 6.2 years ago by shuksi1984 ▴ 60

1

Is there a specific reason why you're using fasta files instead of fastq files?

ADD REPLY • link 6.2 years ago by Sej Modha 5.3k

2

Can you post the exact command you used? (Do not doctor it with /path/to/... etc.)

Also show us the first few lines of each of your fastas.

ADD REPLY • link 6.2 years ago by Joe 21k

0

The command is:

/tools/hisat/hisat2 -f -x /references/grch38/genome -1 /inputfile/SRR925687_1.fa -2 /inputfile/SRR925687_2.fa -S /rnaseq/SRR925687.sam

First few lines of SRR925687_1.fa:

>SRR925687.1 HWUSI-EAS053R_0010:2:1:1100:13816/1
GTGAGATCTTGTCTTAGNAACAAACAAANNACGANTAAAAAAAAAANANNNAAGGCCGGGCCTGGNNNNNNNNNNN
>SRR925687.2 HWUSI-EAS053R_0010:2:1:1101:5022/1
GCAGAAGTGACACAGCCATCCTTGGGTGTAGGCTNTGAGCTGGGCCNGNNNGTGGCCTTTAACAANNNNNNNNNNN
>SRR925687.3 HWUSI-EAS053R_0010:2:1:1101:9481/1
GATCGGAAGAGCGGTTCNGCAGGAATGCCGCGACNGACCTCGTCTCNGNNNTTCTGCTTGAACAANNNNNNNNNNN
>SRR925687.4 HWUSI-EAS053R_0010:2:1:1101:11425/1
GGCCAACAGCTCACCTCNAAAACTTCCCCACTGANAATAATGGCATNGNNNGGAAACTCGGGTCCNNNNNNNNNNN
>SRR925687.5 HWUSI-EAS053R_0010:2:1:1102:6924/1
CTCATCATCTTCAGCTGCCCGCTTGCCCGTAGCTNACTCAGCTTCCNCNNNTTCATCTCCATCCCNNNNNNNNNNN

First few lines of SRR925687_2.fa:

>SRR925687.1 HWUSI-EAS053R_0010:2:1:1100:13816/2
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
>SRR925687.2 HWUSI-EAS053R_0010:2:1:1101:5022/2
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
>SRR925687.3 HWUSI-EAS053R_0010:2:1:1101:9481/2
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
>SRR925687.4 HWUSI-EAS053R_0010:2:1:1101:11425/2
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
>SRR925687.5 HWUSI-EAS053R_0010:2:1:1102:6924/2
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN

ADD REPLY • link updated 6.2 years ago by Joe 21k • written 6.2 years ago by shuksi1984 ▴ 60

0

I've edited your markup to remove the unnecessary quotes, but the way you copied the records appeared to leave each header and its sequence on a single line. Is that how the files really look, or did something go wrong when you copied the data here?

ADD REPLY • link 6.2 years ago by Joe 21k

0

The header and the sequence are on two separate lines.

ADD REPLY • link 6.2 years ago by shuksi1984 ▴ 60

0

Perhaps the files are named differently but they contain the same reads inside and there was a mixup before alignment?

ADD REPLY • link 6.2 years ago by Matteo Schiavinato ★ 3.6k

0

That's my thinking. If hisat inspects the files at all, it could be that R1 got duplicated and renamed to R2, so the R1 file and R2 file are the same but named differently.
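
One way to test this without relying on hisat2 is to compare the two files directly. Here is a rough sketch using only cmp, grep and cksum; the function names same_bytes and same_sequences are just for illustration. The second function ignores the header lines, since a copied file could have had /1 rewritten to /2 in its headers. cksum is a CRC, so a match is strong evidence of identical sequences but not proof.

```shell
# Succeeds (exit status 0) if the two files are byte-for-byte identical.
same_bytes() {
    cmp -s "$1" "$2"
}

# Succeeds if the sequence lines (everything except '>' headers) have the
# same checksum in both FASTA files. Runs in a subshell to keep a and b local.
same_sequences() (
    a=$(grep -v '^>' "$1" | cksum)
    b=$(grep -v '^>' "$2" | cksum)
    [ "$a" = "$b" ]
)
```

For example: `if same_sequences /inputfile/SRR925687_1.fa /inputfile/SRR925687_2.fa; then echo "same reads"; else echo "different reads"; fi`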

ADD REPLY • link 6.2 years ago by Joe 21k

score 0 · Answer 1 · 2018-09-24

The quickest test would be to check the insert size distribution of the first million reads. If the insert size is zero, the R1 and R2 files contain the same data. If the distribution matches what is expected from the library prep, I would consider it a bug in hisat2 and proceed with the analysis.
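
A possible way to run that check on the SAM output, sketched with plain awk. The function name tlen_summary is made up for this example. It assumes the standard SAM columns (FLAG in column 2, POS in 4, RNEXT in 7, PNEXT in 8, TLEN in 9) and only looks at records where both mates are mapped, because TLEN is also 0 when a mate is unaligned. As an extra hint it counts pairs whose mates start at the same position, which is what identical R1 and R2 reads would tend to produce.

```shell
# $1 = SAM file, $2 = number of alignment records to inspect.
tlen_summary() {
    awk -F '\t' -v n="$2" '
        # Test one FLAG bit without relying on gawk-only bitwise functions
        function bit(flag, mask) { return int(flag / mask) % 2 }
        /^@/ { next }                              # skip SAM header lines
        seen >= n { exit }                         # stop after n records
        {
            seen++
            if (bit($2, 4) || bit($2, 8)) next     # read or mate unmapped
            both++
            if ($9 == 0) zero++                    # template length of zero
            if ($7 == "=" && $4 == $8) same_start++
        }
        END {
            printf "records=%d both_mapped=%d tlen_zero=%d same_start=%d\n", seen, both, zero, same_start
        }
    ' "$1"
}
```

For the file from the question this would be called as `tlen_summary /rnaseq/SRR925687.sam 1000000`. If nearly all pairs with both mates mapped show up under tlen_zero or same_start, the two input files most likely hold the same reads.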