Question

NextSeq 500 RNA-Seq results

0

7.0 years ago

realnewbie ▴ 30

Hi, I have RNA-seq results from a NextSeq 500 platform. I was given two types of files for the same samples, with exactly the same base name: blablabla.txt.gz and blablabla.bam. I viewed them in 010 Editor, and they look the same. What is the difference between the txt.gz and bam files? Are the bam files the aligned/mapped reads? They are not labeled "sorted"; could they be sorted anyway? If these are the aligned reads, how can I convert them into count matrices with tools like featureCounts or htseq-count? Thanks for your help.

RNA-Seq bam txt.gz • 2.4k views

ADD COMMENT • link 7.0 years ago by realnewbie ▴ 30

0

Hi realnewbie,

Tags do not require a # sign. I have now changed your post, but please keep this in mind for your next posts. Tags make sure that those who can answer your question can easily find it.

Cheers,
Wouter

ADD REPLY • link 7.0 years ago by WouterDeCoster 47k

0

Okay, thanks a lot. I will take this into consideration for next posts.

ADD REPLY • link 7.0 years ago by realnewbie ▴ 30

score 2 · Answer 1 · 2017-12-02

2

7.0 years ago

WouterDeCoster 47k

A bam file usually contains the aligned reads, which you can sort using samtools. You can use featureCounts (recommended) for counting reads. If you read the manual you should be able to figure out how to do it; it's quite clear.
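As a rough illustration of the sort-then-count step, here is a minimal Python sketch that only builds the command lines, so you can inspect them before running anything. The file names and the function name are placeholders I made up, and the strandedness value (`-s 0`, unstranded) is an assumption you would need to check against your library prep. As far as I know, featureCounts does not need a coordinate-sorted BAM for single-end reads; the sorted and indexed copy is mainly useful for viewing the alignments.

```python
import shlex


def build_counting_commands(bam, gtf, out_prefix, threads=4, strandedness=0):
    """Return argv lists for sorting, indexing and counting one BAM file.

    All paths are hypothetical placeholders. strandedness follows the
    featureCounts -s convention: 0 unstranded, 1 stranded, 2 reverse.
    """
    sorted_bam = f"{out_prefix}.sorted.bam"
    counts_file = f"{out_prefix}.counts.txt"

    # Coordinate-sort the alignments and index the sorted copy.
    sort_cmd = ["samtools", "sort", "-@", str(threads), "-o", sorted_bam, bam]
    index_cmd = ["samtools", "index", sorted_bam]

    # Count reads overlapping exons, summarised per gene_id.
    count_cmd = [
        "featureCounts",
        "-T", str(threads),
        "-s", str(strandedness),
        "-t", "exon",
        "-g", "gene_id",
        "-a", gtf,
        "-o", counts_file,
        sorted_bam,
    ]
    return [sort_cmd, index_cmd, count_cmd]


if __name__ == "__main__":
    # Print the commands instead of running them.
    for cmd in build_counting_commands("sample1.bam", "annotation.gtf", "sample1"):
        print(shlex.join(cmd))
```

Each list could be passed to `subprocess.run(cmd, check=True)` once samtools and featureCounts are installed and the paths are real.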

I have no idea what your .txt.gz file looks like. Note that you can also have an "unaligned" bam of reads, but with the information you provided here I can't tell.
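One way to find out what the two files actually are is to look at their first bytes rather than at the extension. The sketch below uses only the Python standard library and relies on three facts: gzip files start with the bytes `1f 8b`, a BAM file is gzip-compatible and its decompressed content starts with `BAM\1`, and FASTQ records start with `@`. The function name is hypothetical, and treating "zero reference sequences in the header" as a hint for an unaligned BAM is a heuristic, not a guarantee.

```python
import gzip
import struct


def describe_sequencing_file(path):
    """Guess whether a file is a BAM or a gzip-compressed FASTQ."""
    with open(path, "rb") as fh:
        if fh.read(2) != b"\x1f\x8b":
            return {"format": "not gzip-compressed"}

    with gzip.open(path, "rb") as fh:
        magic = fh.read(4)
        if magic == b"BAM\x01":
            # BAM header: int32 text length, header text, int32 reference count.
            (l_text,) = struct.unpack("<i", fh.read(4))
            text = fh.read(l_text).decode("ascii", errors="replace")
            (n_ref,) = struct.unpack("<i", fh.read(4))

            # The sort order, if recorded, is the SO: field of the @HD line.
            sort_order = "unknown"
            for line in text.splitlines():
                if line.startswith("@HD"):
                    for field in line.split("\t")[1:]:
                        if field.startswith("SO:"):
                            sort_order = field[3:]
            # n_references == 0 suggests (but does not prove) an unaligned BAM.
            return {"format": "BAM", "sort_order": sort_order, "n_references": n_ref}
        if magic.startswith(b"@"):
            return {"format": "FASTQ (gzip-compressed)"}
        return {"format": "gzip-compressed, content not recognised"}
```

Calling `describe_sequencing_file("blablabla.bam")` on a coordinate-sorted, aligned BAM should report `sort_order` as `coordinate` and a non-zero `n_references`; `samtools view -H` shows the same header information.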

ADD COMMENT • link 7.0 years ago by WouterDeCoster 47k

0

Thanks, but I did not understand what you mean by "manual".

ADD REPLY • link 7.0 years ago by realnewbie ▴ 30

0

Manual = the guidelines and explanation of the featureCounts commands

ADD REPLY • link 7.0 years ago by WouterDeCoster 47k

0

I thought you meant a manual supplied with the RNA-seq results :') I was overthinking it. Thanks a lot! By the way, I learned that the txt.gz is a compressed FASTQ file.

ADD REPLY • link 7.0 years ago by realnewbie ▴ 30

1

Renaming a compressed fastq to a .txt.gz is cruel. It should just be a .fastq.gz.

ADD REPLY • link 7.0 years ago by WouterDeCoster 47k

0

Totally agree. It also causes trouble for newbies like me (I was searching for a "txt.gz format").

ADD REPLY • link 7.0 years ago by realnewbie ▴ 30

score 1 · Answer 2 · 2017-12-04

1

7.0 years ago

colindaven 7.0k

If I were you I'd try to get some sort of raw data, e.g. FASTQ (this is a completely new platform), and go through a similar pipeline.

FastQC
Alignment (STAR? BWA-MEM?)
BAM conversion (samtools)
Visualize
With a GTF, get counts (featureCounts, htseq-count, etc.)
Differential expression, e.g. Degust
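The first two steps of the list above (QC and alignment) could look roughly like the following for single-end reads. This is only a sketch that builds the command lines: the function name, directory names and file names are hypothetical, and it assumes STAR is the chosen aligner and that a STAR genome index has already been generated. `-f fastq` is passed to FastQC because the file carries a `.txt.gz` name instead of `.fastq.gz`.

```python
import shlex


def build_qc_and_alignment_commands(fastq_gz, star_index_dir, out_prefix,
                                    qc_dir="fastqc_out", threads=4):
    """Return argv lists for FastQC and a STAR single-end alignment.

    All paths are placeholders; qc_dir must exist before FastQC is run.
    """
    # Quality control; force FASTQ parsing regardless of the file extension.
    qc_cmd = ["fastqc", "-f", "fastq", "-o", qc_dir, fastq_gz]

    # Align single-end reads and write a coordinate-sorted BAM.
    align_cmd = [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", star_index_dir,
        "--readFilesIn", fastq_gz,
        "--readFilesCommand", "zcat",  # reads are gzip-compressed
        "--outSAMtype", "BAM", "SortedByCoordinate",
        "--outFileNamePrefix", out_prefix,
    ]
    return [qc_cmd, align_cmd]


if __name__ == "__main__":
    # Print the commands instead of running them.
    for cmd in build_qc_and_alignment_commands(
            "sample1.txt.gz", "star_index", "sample1_"):
        print(shlex.join(cmd))
```

With these options STAR writes the alignments to `<out_prefix>Aligned.sortedByCoord.out.bam`, which can then go into the counting step.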

Please tell us about your experience with the GeneReader. I have not yet seen data from this platform. What is the read length, for example?

ADD COMMENT • link 7.0 years ago by colindaven 7.0k

0

I'm sorry, it was a NextSeq 500: NextSeq High-75SE (single-end).

ADD REPLY • link 7.0 years ago by realnewbie ▴ 30

0

Actually, I haven't used the raw sequencing data (FASTQ files). I used the bam files and continued with them through HTSeq. HTSeq gave me the raw counts (for annotation I used gencode.v19.annotation.gtf; the other annotation files, from Ensembl, did not work, and I don't know why).
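For reference, per-sample htseq-count outputs can be combined into a single count matrix with a few lines of Python. This is a minimal sketch using only the standard library; it assumes the default htseq-count output (tab-separated, gene ID first and count last, with summary rows such as `__no_feature` prefixed by two underscores). The function names and file names are made up for the example.

```python
import csv


def merge_htseq_counts(sample_files):
    """Merge htseq-count outputs into {gene_id: {sample: count}}.

    sample_files maps a sample name to the path of its count file.
    """
    samples = list(sample_files)
    matrix = {}
    for sample, path in sample_files.items():
        with open(path, newline="") as fh:
            for row in csv.reader(fh, delimiter="\t"):
                if not row or row[0].startswith("__"):
                    continue  # skip blank lines and htseq-count summary rows
                matrix.setdefault(row[0], {})[sample] = int(row[-1])
    return samples, matrix


def write_count_matrix(samples, matrix, out_path):
    """Write a genes x samples table; genes missing in a sample get 0."""
    with open(out_path, "w", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t", lineterminator="\n")
        writer.writerow(["gene_id", *samples])
        for gene_id in sorted(matrix):
            writer.writerow(
                [gene_id, *(matrix[gene_id].get(s, 0) for s in samples)])


if __name__ == "__main__":
    # Hypothetical file names; replace with your own htseq-count outputs.
    names, counts = merge_htseq_counts(
        {"sample1": "sample1.counts.txt", "sample2": "sample2.counts.txt"})
    write_count_matrix(names, counts, "count_matrix.tsv")
```

The resulting table has one row per gene and one column per sample, which is the raw-count layout that tools such as DESeq2 expect as input.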

ADD REPLY • link 7.0 years ago by realnewbie ▴ 30

1

This is rather beyond the scope of the original question. And no one really wants to teach you anything as large as "DE analysis" from scratch. Pick a program, like DESeq2, try some things, then come back with a single precise question which demonstrates that you've put some effort into learning yourself.

ADD REPLY • link 7.0 years ago by swbarnes2 14k

1

You are god damn right :) Actually, I am trying very hard. I have not taken any course that teaches RNA-seq data analysis from scratch, and I am trying to understand and learn every detail in a very short period of time. Combining all the details to produce something meaningful and correct can be hard at times. I am not perfect at this, but I am trying to be good at least. Thanks for your advice.

ADD REPLY • link 7.0 years ago by realnewbie ▴ 30