Question

Can I determine the machine used for RNA-seq from the .count file extension?

0

6.9 years ago

MarjoryMollusc ▴ 50

Hi guys,

I feel like this should be super obvious, but I am struggling to find steps anywhere online for figuring out what machine a data set came from. The files have a .count file extension and are filled with Ensembl gene IDs, each followed by a count. An example:

ENSMUSG00000000001.4    3906
ENSMUSG00000000003.11   0
ENSMUSG00000000028.10   1005
ENSMUSG00000000031.11   4766
ENSMUSG00000000037.12   20
ENSMUSG00000000049.7    0
ENSMUSG00000000056.7    775
ENSMUSG00000000058.6    546
ENSMUSG00000000078.6    1075

Is there a way to determine the machine used based on that?

Thanks

RNA-Seq count • 2.0k views

ADD COMMENT • link updated 6.9 years ago by mkulecka ▴ 380 • written 6.9 years ago by MarjoryMollusc ▴ 50

3

What do you mean by machine?

If you are talking about the sequencing machine, then no, it's not super obvious. After a lot of processing, your RNA-seq data ends up looking like what you're handling now: a raw counts file. If you meant machine = tool, I'm guessing featureCounts was used to calculate the raw counts.

ADD REPLY • link 6.9 years ago by venu 7.1k

2

That's probably not possible. This looks like a file generated by htseq-count, featureCounts, or a similar tool, which contains only counts per gene/transcript and no metadata on the run. That metadata should be in the FASTQ and/or BAM file.
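To see for yourself that a two-column count file carries nothing beyond IDs and counts, a minimal sketch along these lines may help. The function name `parse_counts` and the sample lines are made up for illustration, and the sketch assumes a whitespace-separated, two-column layout like the one in the question. As far as I know, htseq-count also appends summary rows whose names start with `__` (for example `__no_feature`), so those are collected separately.

```python
def parse_counts(lines):
    """Split a two-column count file into gene counts and '__' summary rows."""
    genes = {}
    special = {}
    for raw in lines:
        fields = raw.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 2:
            raise ValueError(f"expected 2 columns, got {len(fields)}: {raw!r}")
        name, count = fields[0], int(fields[1])
        if name.startswith("__"):
            special[name] = count  # htseq-count style summary row
        else:
            genes[name] = count
    return genes, special


# Illustrative input: two rows from the question plus a hypothetical summary row.
example_lines = [
    "ENSMUSG00000000001.4    3906\n",
    "ENSMUSG00000000003.11   0\n",
    "__no_feature\t12\n",
]
genes, special = parse_counts(example_lines)
print(len(genes), "genes;", "summary rows:", sorted(special))
```

Every row is just an identifier and an integer, so there is no field in which sequencer information could be stored.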

ADD REPLY • link 6.9 years ago by WouterDeCoster 47k

0

6.9 years ago

mkulecka ▴ 380

Don't you have a BAM file with the alignments? If you're lucky, the platform info will be in the read group (@RG) lines of the header, under the PL (platform) or PM (platform model) tag (as per the SAM format specification: https://samtools.github.io/hts-specs/SAMv1.pdf).
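As a rough sketch of that check: the header text can be dumped with `samtools view -H your.bam`, and the @RG lines can then be scanned for the PL and PM tags. The function name `read_group_platforms` and the example header below are made up for illustration, and many BAM files have no @RG line at all, in which case the result is simply empty.

```python
def read_group_platforms(header_text):
    """Return the ID, PL and PM tags of every @RG line in a SAM header."""
    groups = []
    for line in header_text.splitlines():
        if not line.startswith("@RG"):
            continue
        tags = {}
        # Header fields are tab-separated TAG:VALUE pairs; split on the first colon only.
        for field in line.split("\t")[1:]:
            key, _, value = field.partition(":")
            tags[key] = value
        groups.append({"ID": tags.get("ID"), "PL": tags.get("PL"), "PM": tags.get("PM")})
    return groups


# Hypothetical header, as `samtools view -H` might print it.
example_header = (
    "@HD\tVN:1.6\tSO:coordinate\n"
    "@SQ\tSN:chr1\tLN:195471971\n"
    "@RG\tID:sample1\tSM:sample1\tPL:ILLUMINA\tPM:HiSeq 2500\n"
)
for rg in read_group_platforms(example_header):
    print(rg)
```

A missing tag comes back as `None`, which is the common case for PM since it is optional and whoever ran the aligner has to supply it.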

ADD COMMENT • link 6.9 years ago by mkulecka ▴ 380

score 4 · Accepted Answer · 2018-06-08

4

6.9 years ago

grant.hovhannisyan ★ 2.6k

I am not sure it is possible with count files. Usually the machine info can be found in the FASTQ file, so you might try to find the raw data and have a look. Alternatively, you can ask the people who generated the data.

ADD COMMENT • link 6.9 years ago by grant.hovhannisyan ★ 2.6k

1

This answer is not wrong, but it is not completely accurate either (and it is not really an answer to the original question). You need to do some additional sleuthing to find the sequencer information for Illumina data, and that assumes the FASTQ header has not been changed from the default; changing it is easy to do. You can refer to this thread for more information: Illumina Instrument Type from fastq?
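As a starting point for that sleuthing, here is a minimal sketch that pulls the instrument and flowcell IDs out of a read header. It assumes the default Illumina (Casava 1.8+) layout, `@instrument:run:flowcell:lane:tile:x:y`, optionally followed by a space and read-level fields. The function name `parse_illumina_header` is made up for illustration, and the example header is the commonly cited one from FASTQ format descriptions, not real data from this thread.

```python
def parse_illumina_header(header):
    """Extract instrument, run, flowcell and lane from a Casava 1.8+ read header.

    Returns None when the header does not have the expected seven
    colon-separated fields (older or rewritten headers).
    """
    if not header.startswith("@"):
        raise ValueError("FASTQ header lines start with '@'")
    read_name = header[1:].split(" ", 1)[0]  # drop the part after the first space
    parts = read_name.split(":")
    if len(parts) != 7:
        return None
    return {
        "instrument": parts[0],
        "run": parts[1],
        "flowcell": parts[2],
        "lane": parts[3],
    }


example = "@EAS139:136:FC706VJ:2:2104:15343:197393 1:Y:18:ATCACG"
print(parse_illumina_header(example))
```

The instrument field is a machine-specific ID rather than a model name, so going from it (or from the flowcell ID) to an actual sequencer model is the extra sleuthing step, and none of it works if the headers were rewritten.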

ADD REPLY • link 6.9 years ago by GenoMax 150k

0

Well, I said "usually" :) and the answer to the original question should be just "NO", but that wouldn't have helped the OP much.

ADD REPLY • link 6.9 years ago by grant.hovhannisyan ★ 2.6k

0

If a "no" tells the OP to not waste time trying to do the impossible, it's helpful.