Question

Characteristic features of ChIP-seq and RNA-seq data

1

7.9 years ago

cl10101 ▴ 80

What are the characteristic features of ChIP-seq and RNA-seq data? If I have FASTQ files that come from ChIP-seq and RNA-seq experiments, is it possible to tell them apart, for example by comparing them to a ChIP-seq input sample, which is explicitly marked?

RNA-Seq ChIP-Seq sequencing • 2.8k views

ADD COMMENT • link updated 7.9 years ago by Carlo Yague 8.9k • written 7.9 years ago by cl10101 ▴ 80

2

While it may be possible to differentiate the data, not having clear information about what is what just seems like bad experimental practice. If someone gave you this data, you should go back and get additional information from them. If you analyze the data as is and some of your assumptions turn out to be wrong, you will get blamed for the fallout.

ADD REPLY • link 7.9 years ago by GenoMax 147k

1

You are totally right. I interpreted the question as a theoretical one, but if you really end up in a situation where you don't know what your data is, then guessing the data type just by looking at it is rather desperate.

Moreover, there is more to know than just the distinction between ChIP-seq and RNA-seq, such as the library preparation used, the origin of the samples, whether there was some kind of selection (e.g. ribodepletion), and so on. All of this is important for interpreting the data.

ADD REPLY • link 7.9 years ago by Carlo Yague 8.9k

0

Let us hope the question was indeed theoretical :)

Your answer below gives good hints on how to distinguish the samples (in theory) if cl10101 has no other option but to press on.

ADD REPLY • link 7.9 years ago by GenoMax 147k

score 1 · Accepted Answer · 2016-12-19

1

7.9 years ago

Carlo Yague 8.9k

RNA-seq: (in eukaryotes) splicing (some reads must be split to map) and very uneven read coverage, especially in total RNA-seq, where rRNA reads dominate.

ChIP-seq: relatively even coverage in the input fraction.
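The splicing signature can be checked once the reads are mapped. The following is a minimal sketch, not a definitive tool: it assumes the alignments are available as SAM text lines and that a splice-aware aligner was used, so that reads spanning an intron carry an `N` (skipped region) operation in their CIGAR string. The function name `spliced_fraction` is hypothetical.

```python
import re

# Matches one CIGAR operation, e.g. "76M" or "1200N".
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def spliced_fraction(sam_lines):
    """Fraction of mapped primary alignments whose CIGAR contains an N operation."""
    mapped = 0
    spliced = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        cigar = fields[5]
        # Skip unmapped (0x4), secondary (0x100) and supplementary (0x800) records.
        if flag & 0x4 or flag & 0x100 or flag & 0x800 or cigar == "*":
            continue
        mapped += 1
        if any(op == "N" for _, op in CIGAR_OP.findall(cigar)):
            spliced += 1
    return spliced / mapped if mapped else 0.0
```

A sample with a clearly non-zero fraction of spliced reads is more likely RNA-seq; a fraction near zero is what one would expect from ChIP-seq or input DNA. If the reads were mapped with an aligner that is not splice-aware, this check is not informative.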

Checking genome coverage requires mapping, but over-represented sequences can be analysed with FastQC and BLAST to get a quick indication directly from the FASTQ files: if there are over-represented sequences that correspond to highly expressed genes, then you are dealing with RNA-seq data.
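For a rough look at over-represented sequences without running FastQC, the idea can be sketched in a few lines. This is only an illustration under stated assumptions: the input is a plain four-line-per-record FASTQ stream, and the function name `overrepresented` and its parameters (`max_reads`, `min_fraction`, `prefix_len`) are hypothetical choices, not FastQC's actual implementation.

```python
from collections import Counter
from itertools import islice

def overrepresented(fastq_lines, max_reads=100_000, min_fraction=0.001, prefix_len=50):
    """Return (sequence, count, fraction) for sequences at or above min_fraction."""
    counts = Counter()
    total = 0
    # In a four-line FASTQ record the sequence is the second line,
    # so take lines 1, 5, 9, ... (0-based) up to max_reads records.
    for seq in islice(fastq_lines, 1, max_reads * 4, 4):
        counts[seq.strip()[:prefix_len]] += 1
        total += 1
    return [
        (seq, n, n / total)
        for seq, n in counts.most_common()
        if n / total >= min_fraction
    ]
```

The function accepts any iterable of lines, for example a file handle opened with `gzip.open(path, "rt")`. The sequences it returns can then be pasted into BLAST: hits to rRNA or to highly expressed genes point towards RNA-seq.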

ADD COMMENT • link 7.9 years ago by Carlo Yague 8.9k

0

Thank you for your response. I mapped my FASTQ files to the reference genome and now I am trying to differentiate them visually using IGV. [Image: samples mapped to genome] It seems to me that sample A (the upper track) has the most uneven coverage, but the peak locations do not correspond to gene locations (it is mRNA-seq data). What is the best way to differentiate them?

ADD REPLY • link 7.9 years ago by cl10101 ▴ 80

1

but the peak locations do not correspond to gene locations (it is mRNA-seq data)

Well, a characteristic feature of mRNA-seq data is that "peaks" correspond to genes. You say that it is mRNA-seq data, but are you sure about this? What do you really want to achieve here? It sounds like an XY problem...
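Rather than judging by eye in IGV, the correspondence between reads and genes can be quantified. The sketch below is illustrative only and assumes the read and gene coordinates have already been extracted (for example from a BAM file and a gene annotation) as half-open `(start, end)` intervals; the function names `merge_intervals` and `fraction_in_genes` are hypothetical.

```python
from bisect import bisect_right

def merge_intervals(intervals):
    """Merge overlapping or touching (start, end) intervals into a sorted list."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged

def fraction_in_genes(reads, genes):
    """reads: iterable of (chrom, start, end); genes: dict chrom -> list of (start, end)."""
    merged = {chrom: merge_intervals(iv) for chrom, iv in genes.items()}
    starts = {chrom: [s for s, _ in iv] for chrom, iv in merged.items()}
    total = 0
    hits = 0
    for chrom, start, end in reads:
        total += 1
        if chrom not in merged:
            continue
        # Last merged gene interval that begins before the read ends.
        i = bisect_right(starts[chrom], end - 1) - 1
        if i >= 0 and merged[chrom][i][1] > start:
            hits += 1
    return hits / total if total else 0.0
```

For mRNA-seq one would expect the large majority of reads to overlap annotated genes, whereas a ChIP-seq input sample should spread over genic and intergenic regions alike. Comparing this fraction across the samples gives a more objective criterion than the appearance of the coverage tracks.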

ADD REPLY • link 7.9 years ago by Carlo Yague 8.9k