RSeQC (infer_experiment) problem
1
0
Entering edit mode
9.9 years ago
biolab ★ 1.4k

HI everyone

I am using RSeQC to determine strandness. When running the command infer_experiment.py -r TAIR10.bed -i WT_accepted_hits.bam.sort.bam, I faced the following failure. Could anyone help? THANKS a lot.

Failure message:

This is PairEnd Data
Fraction of reads failed to determine: 1.0000
Fraction of reads explained by "1++,1--,2+-,2-+": 0.0000
Fraction of reads explained by "1+-,1-+,2++,2--": 0.0000

head of TAIR10.bed

chr01    0    30427671    Chr1    .    .    TAIR10    chromosome    .ID=Chr1;Name=Chr1
chr01    3630    3759    AT1G01010.1-Protein    .    +    TAIR10    five_prime_UTR    .    Parent=AT1G01010.1
chr01    3630    3913    AT1G01010.1-Protein    .    +    TAIR10    exon    .Parent=AT1G01010.1

head of bam file

FC42T26AAXX:7:104:1142:897#0    73    chr01    3636    50    75M    *    00    ATTAGATATACCAAACCAGAGAAAACAAATACATAATCGGAGAAATACAGATTACAGAGAGCGAGAGAGATCGAC    abbbabbbbb`^abaaaababaaaaabababaabbaaabaaaaaaaaaaaaaaaaa`aaaaaaaaaaaaaa^a__    AS:i:0    XN:i:0    XM:i:0    XO:i:0    XG:i:0    NM:i:0    MD:Z:75    YT:Z:UU    NH:i:1
RSeQC • 3.9k views
ADD COMMENT
0
Entering edit mode
9.9 years ago

Note the chromosome names, they don't match.

ADD COMMENT
0
Entering edit mode

Hi Devon Thanks for your reply. I have changed the bed file (all "Chr"s are converted to "chr0"s, please see below). Unfortunately, the problem still exists.

chr01    0    30427671    chr01    .    .    TAIR10    chromosome    .    ID=chr01;Name=chr01
chr01    3630    3759    AT1G01010.1-Protein    .    +    TAIR10    five_prime_UTR    .    Parent=AT1G01010.1
chr01    3630    3913    AT1G01010.1-Protein    .    +    TAIR10    exon    .    Parent=AT1G01010.1
chr01    3630    5899    AT1G01010    .    +    TAIR10    gene    .    ID=AT1G01010;Note=protein_coding_gene;Name=AT1G01010
chr01    3630    5899    AT1G01010.1    .    +    TAIR10    mRNA    .    ID=AT1G01010.1;Parent=AT1G01010;Name=AT1G01010.1;Index=1
ADD REPLY
0
Entering edit mode

The alignment you posted is of a singleton (i.e., one read from a pair aligned alone). Are they all like that? If so, then that's your problem. BTW, a simple say to check this is to just run samtools view foo.bam | cut -f 2 | sort | uniq -c.

ADD REPLY
0
Entering edit mode

Hi Devon, thanks a lot! However, I still have the problem. samtools view bam-file, then sort and uniq produce following output. I think I probably need to check other sequencing datasets.

8801117 chr01
4474771 chr02
5994228 chr03
4496609 chr04
6503270 chr05
ADD REPLY
0
Entering edit mode

You must have used cut -f 3 instead of cut -f 2. We need to know the count of alignments for each flag value.

ADD REPLY
0
Entering edit mode

The FLAGs are as follows. Anything strange? THANKS.

5993 113
   6962 129
 133244 137
 141985 145
6450498 147
 139543 153
 142099 161
6434219 163
   5993 177
   6962 65
1789282 73
 142099 81
6434219 83
1844414 89
 141985 97
6450498 99
ADD REPLY
1
Entering edit mode

So most of the alignments are informative. I haven't a clue then why the script is having problems. Just look at a few genes in IGV and you should be able to discern the strandedness.

ADD REPLY
0
Entering edit mode

Hi Devon, Thanks a lot anyway! I will manually check some genes after separating XS strands.

ADD REPLY

Login before adding your answer.

Traffic: 2469 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6