Question

How can I compare my RNA-seq files to find overlapping reads?

7.4 years ago

star ▴ 350

I have 3 RNA-seq files (File1, File2 and File3) and I would like to compare them to find overlapping and unique reads. I suspect that File1 is the merge of File2 and File3, but I am not sure, so I decided to compare the three files to see whether any reads are unique to one of them.

I have the .fastq files (raw data), the .bam files (after alignment) and the count tables generated from them. At which step is it best to do the comparison, and how should I compare them?

I have also checked the number of reads before and after alignment, as well as the number of mapped reads, and found that File2 and File3 combined are a bit bigger than File1:

         Number of reads   Number of mapped reads   File size
File1           10403419                 10294966      1.8 GB
File2            5539406                  5487472    944.4 MB
File3            5517327                  5466102    940.7 MB
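A quick way to check the arithmetic, as a minimal Python sketch: the numbers are copied from the table above, and merge_excess is a hypothetical helper name. If File1 were exactly File2 plus File3, both differences would be zero.

```python
# Read counts copied from the table in the question.
counts = {
    "File1": {"reads": 10403419, "mapped": 10294966},
    "File2": {"reads": 5539406, "mapped": 5487472},
    "File3": {"reads": 5517327, "mapped": 5466102},
}


def merge_excess(metric: str) -> int:
    """Return (File2 + File3) - File1 for the given metric."""
    combined = counts["File2"][metric] + counts["File3"][metric]
    return combined - counts["File1"][metric]


for metric in ("reads", "mapped"):
    print(metric, merge_excess(metric))
```

Both differences come out positive (653314 reads and 658608 mapped reads), which is the discrepancy described above.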

Tags: RNA-Seq, genome sequencing, bam, fastq

You should compare them after alignment. Have a look at bedtools intersect and bedtools subtract.
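To illustrate what these tools report, here is a minimal pure-Python sketch of the overlap test. It assumes the alignments have already been converted to BED-style intervals (0-based, half-open), and split_by_overlap is a hypothetical helper. It mirrors roughly what bedtools intersect -u (A entries with any overlap in B) and bedtools intersect -v (A entries with no overlap in B) output, not bedtools' actual implementation; bedtools subtract differs in that it trims the overlapping portion instead of dropping whole entries. Keep in mind that positional overlap shows shared loci, not that the same read is present in both files.

```python
from typing import List, Tuple

# (chrom, start, end), 0-based half-open coordinates as in BED
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two intervals share at least one base on the same chromosome."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def split_by_overlap(
    a_list: List[Interval], b_list: List[Interval]
) -> Tuple[List[Interval], List[Interval]]:
    """Split A into entries overlapping any B entry (like -u) and entries overlapping none (like -v)."""
    hits: List[Interval] = []
    unique: List[Interval] = []
    for a in a_list:
        if any(overlaps(a, b) for b in b_list):
            hits.append(a)
        else:
            unique.append(a)
    return hits, unique


# Hypothetical toy intervals, not data from the question.
file2 = [("chr1", 100, 200), ("chr1", 300, 400), ("chr2", 0, 50)]
file1 = [("chr1", 150, 160), ("chr2", 50, 60)]
shared, only_in_file2 = split_by_overlap(file2, file1)
print(shared)         # [('chr1', 100, 200)]
print(only_in_file2)  # [('chr1', 300, 400), ('chr2', 0, 50)]
```

The nested loop is O(n*m) and only meant for toy inputs; bedtools works on sorted or indexed files, which is what you would use on real BAM/BED data.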

Reply by cschu181 ★ 2.8k, 7.4 years ago

If the numbers above are correct, the reads in File 1 do not equal the reads in File 2 plus the reads in File 3, so at a minimum File 1 is not a simple combination of the other two.

If you suspect that the reads in File 2 and File 3 were combined into File 1, you can extract a subset of read headers from File 2 and File 3 and check whether they are present in File 1 (raw data). Comparing sequence or count data does not make much sense, because at that level the reads can no longer be assigned to a particular file.
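A minimal Python sketch of that header check, assuming standard 4-line FASTQ records and that the read ID is the first whitespace-separated token of the header line. All function names and file names are hypothetical. It takes the first N IDs from File2 (a simple subset, not a random sample) into a set, then streams File1 once, so memory grows with the sample size rather than with the roughly 10 million reads in File1.

```python
import gzip
import itertools
from typing import Iterable, Iterator, Set, TextIO


def open_fastq(path: str) -> TextIO:
    """Open a plain or gzip-compressed FASTQ file as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def fastq_ids(lines: Iterable[str]) -> Iterator[str]:
    """Yield the read ID of each 4-line FASTQ record (header without '@' or description)."""
    for i, line in enumerate(lines):
        if i % 4 == 0:
            header = line.rstrip("\n")
            if not header.startswith("@"):
                raise ValueError(f"unexpected FASTQ header at line {i + 1}: {header!r}")
            yield header[1:].split()[0]


def first_ids(lines: Iterable[str], n: int) -> Set[str]:
    """Collect the first n read IDs (a simple subset, not a random sample)."""
    return set(itertools.islice(fastq_ids(lines), n))


def ids_found(sample: Set[str], lines: Iterable[str]) -> Set[str]:
    """Stream a FASTQ file once and return the sample IDs that occur in it."""
    return {rid for rid in fastq_ids(lines) if rid in sample}


# Example usage (file names are placeholders):
# with open_fastq("File2.fastq.gz") as fh:
#     sample = first_ids(fh, 10000)
# with open_fastq("File1.fastq.gz") as fh:
#     found = ids_found(sample, fh)
# print(f"{len(found)} of {len(sample)} File2 IDs found in File1")
```

Running the same check with File3, and in the reverse direction (File1 IDs looked up in File2 and File3), gives a rough picture of which reads are shared and which are unique to one file.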

Reply by GenoMax 152k, 7.4 years ago