DEXSeq missing mate encountered
1
0
Entering edit mode
9.5 years ago

Hi,

I have BAM files for pairs-end reads aligned using TopHat2. The BAM files were sorted based on read name using samtools -n. But when I performed read count analysis, I found almost all reads to have missing mates. Below is the error message:

/.local/lib/python2.7/site-packages/HTSeq-0.6.1-py2.7-linux-x86_64.egg/HTSeq/__init__.py:622: UserWarning: 197873794 reads with missing mate encountered.

Here are the top few lines from BAM file. The read names from mates are differing by .1 and .2 extension. Is this an issue? I will highly appreciate if you could help me out in this regard.

@HD    VN:1.0    SO:queryname
@SQ    SN:chr1    LN:249250621
@SQ    SN:chr10    LN:135534747
@SQ    SN:chr11    LN:135006516
@SQ    SN:chr12    LN:133851895
@SQ    SN:chr13    LN:115169878
@SQ    SN:chr14    LN:107349540
@SQ    SN:chr15    LN:102531392
@SQ    SN:chr16    LN:90354753
@SQ    SN:chr17    LN:81195210
@SQ    SN:chr18    LN:78077248
@SQ    SN:chr19    LN:59128983
@SQ    SN:chr2    LN:243199373
@SQ    SN:chr20    LN:63025520
@SQ    SN:chr21    LN:48129895
@SQ    SN:chr22    LN:51304566
@SQ    SN:chr3    LN:198022430
@SQ    SN:chr4    LN:191154276
@SQ    SN:chr5    LN:180915260
@SQ    SN:chr6    LN:171115067
@SQ    SN:chr7    LN:159138663
@SQ    SN:chr8    LN:146364022
@SQ    SN:chr9    LN:141213431
@SQ    SN:chrM    LN:16571
@SQ    SN:chrX    LN:155270560
@SQ    SN:chrY    LN:59373566
@PG    ID:TopHat    VN:2.0.10
.
.
.    
SRR1660308.1.1    81    chr8    145140609    50    38M138N2M    =    145140331    -456    GTGCCCACTNCTGCGCTTGCCAAGCCTCATGGGCCCCGGA    I
IGGHB;<0#FJIIGJJJIJIJJJJJJHHHHHFFFFFCCC    AS:i:0    XN:i:0    XM:i:1    XO:i:0    XG:i:0    NM:i:1    MD:Z:9G30    YT:Z:UU    XS:A:+    NH:i:1
SRR1660308.1.2    161    chr8    145140331    50    39M    =    145140609    456    CACTGCTGGCGATTTATGCAGCTGGCCTGGCCCTGCCCC    22AFHFFHJIJGIIBEF
BGIJJIEGIJ@HIHIJJG?FIE    AS:i:0    XN:i:0    XM:i:0    XO:i:0    XG:i:0    NM:i:0    MD:Z:39    YT:Z:UU    XS:A:+    NH:i:1
SRR1660308.3.1    97    chr17    40925519    50    28M204N12M    =    40926678    1198    GGCAGTATCGCTTCCCACCCTTCTTTACGTNACAACCGAA    @
CCFFFBDDFHHHIIIIIIIIIIIIIIIII#1:CFHGFIG    AS:i:0    XN:i:0    XM:i:1    XO:i:0    XG:i:0    NM:i:1    MD:Z:30T9    YT:Z:UU    XS:A:+    NH:i:1
SRR1660308.3.2    145    chr17    40926678    50    39M    =    40925519    -1198    GAGTCGATCCAGATTGTATTAGAGGAACTGAGGAAGAAA    GGFAC94IGCJIGIGII
IJHEGGHGE<C4IGFDHDDA22    AS:i:0    XN:i:0    XM:i:0    XO:i:0    XG:i:0    NM:i:0    MD:Z:39    YT:Z:UU    XS:A:+    NH:i:1
SRR1660308.4.1    83    chr5    154194381    50    40M    =    154194287    -134    TTTATTTTTNTCTGACCTTTCCATCCTTGAAAAAACGGGG    JJJJJHG?1
#GJIGGJJIJIIJJJIIJHHHHHFFDFFCCC    AS:i:-6    XN:i:0    XM:i:2    XO:i:0    XG:i:0    NM:i:2    MD:Z:9C25T4    YT:Z:UU    XS:A:+    NH:i:1
SRR1660308.4.2    163    chr5    154194287    50    39M    =    154194381    134    CAGCCTCTCCCCACTCCCCTTGCCCCACACCCTTTACTC    22AFHHHHJJJJJJJ3C
GHJIIHIJIIJJJGHHIGGJII    AS:i:0    XN:i:0    XM:i:0    XO:i:0    XG:i:0    NM:i:0    MD:Z:39    YT:Z:UU    XS:A:+    NH:i:1
SRR1660308.5.1    99    chrM    8899    3    30M    =    9023    154    CCTAGCCCACTTCTTACCACAAGGCACAAC    =;:B7A++::;4D4+<CF@?D3AF=+<B)?    AS:i:-2    X
N:i:0    XM:i:1    XO:i:0    XG:i:0    NM:i:1    MD:Z:28C1    YT:Z:UU    NH:i:2    XS:A:-    HI:i:1

Best regards,
Anil

RNA-Seq • 3.1k views
ADD COMMENT
0
Entering edit mode
9.5 years ago

The difference in read names are almost certainly breaking things. You could likely fix this with awk, but next time use the -F option with fastq-dump. That'll prevent this sort of thing from happening any give you more meaningful read IDs in addition.

ADD COMMENT
0
Entering edit mode

Thanks a lot for you suggestion. Can awk function work with bam (binary)

ADD REPLY
0
Entering edit mode

No, but I suspect you can pipe to the dexseq counting script.

ADD REPLY
0
Entering edit mode

The easy way could be to convert BAM to SAM and then perform awk function

ADD REPLY
0
Entering edit mode

Nice catch, what should the header look like? Guess I ran in to the same problem.

ADD REPLY

Login before adding your answer.

Traffic: 2842 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6