De novo assembly of transcriptome
8.9 years ago
rakeshmbb • 0

Hi everyone. I am interested in finding microsatellites in publicly available transcriptome data, using sequencing data from the NCBI SRA. I am using CLC Bio workbench to process the data, and I am already having trouble at the very first step: are the sequence reads in an SRA file adapter-trimmed or not? I also ran into a second problem with Illumina paired-end data: fastq-dump from the SRA Toolkit does not split the data into two files.

To check if the sequences are adapter trimmed you can use FastQC tool: http://www.bioinformatics.babraham.ac.uk/projects/fastqc/
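
As a rough complement to FastQC, the sketch below counts how many reads in a FASTQ file contain the start of a common Illumina adapter. The ADAPTER value and the function name adapter_hits are assumptions for illustration: the 13-base sequence is the prefix shared by TruSeq-style adapters, and other library kits use different adapters. It also assumes plain 4-line FASTQ records (no wrapped sequence lines).

```shell
#!/usr/bin/env bash
# Assumption: 13-base prefix of the TruSeq-style Illumina adapter.
ADAPTER="AGATCGGAAGAGC"

# adapter_hits FILE -> prints "<reads_with_adapter> <total_reads>"
adapter_hits() {
    awk -v ad="$ADAPTER" '
        NR % 4 == 2 {                 # sequence line of each 4-line record
            total++
            if (index($0, ad) > 0) hits++
        }
        END { printf "%d %d\n", hits, total }
    ' "$1"
}

# Example (hypothetical file name):
#   adapter_hits SRR2163549_1.fastq
```

A count close to zero suggests the reads were already trimmed; a noticeable fraction suggests they were not. FastQC's adapter content plot remains the more thorough check.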

Passing the data through a trimming program is not a bad idea. If the data was already trimmed, it should come through unchanged; the only thing you have to invest is some time. This should apply to the trimming tool in CLC as well.

As for fastq-dump, you need to use the --split-files option to get the two reads in two separate files. While you are at it, you may as well use the --origfmt option to recover the original fastq file headers.
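
A minimal sketch of that call, plus a sanity check on the result, could look like this. The function names dump_pairs, count_reads and check_pairs are hypothetical helpers, and the check assumes that fastq-dump names the mate files ACCESSION_1.fastq and ACCESSION_2.fastq and writes plain 4-line records.

```shell
#!/usr/bin/env bash
# dump_pairs ACCESSION: run fastq-dump with the two options discussed here.
dump_pairs() {
    fastq-dump --split-files --origfmt "$1"
}

# count_reads FILE -> number of 4-line FASTQ records in FILE
count_reads() {
    awk 'END { print int(NR / 4) }' "$1"
}

# check_pairs PREFIX: succeed only if both mate files exist, are non-empty,
# and hold the same number of reads.
check_pairs() {
    local r1="${1}_1.fastq" r2="${1}_2.fastq"
    [ -s "$r1" ] && [ -s "$r2" ] || {
        echo "missing or empty mate file for $1" >&2
        return 1
    }
    local n1 n2
    n1=$(count_reads "$r1")
    n2=$(count_reads "$r2")
    [ "$n1" -eq "$n2" ] || {
        echo "read counts differ: $n1 vs $n2" >&2
        return 1
    }
    echo "$1: $n1 pairs"
}

# Example:
#   dump_pairs SRR2163549 && check_pairs SRR2163549
```

If check_pairs fails because the second file is missing, the run did not yield a usable read 2, whatever its layout is declared to be.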

Hi genomax2. Thank you. When I use fastq-dump --split-files, it says it rejected 2567436 reads because of filtering out non-biological reads, and it gives just one fastq file. What could be the problem?

Which SRA accession are you looking at?

Hi genomax, it is SRR2163549. It is paired-layout data.

Get the fastq files from EBI: http://www.ebi.ac.uk/ena/data/view/ERR1203908

Did you check the project metadata? Information on whether the reads have been trimmed may be available there. Also, maybe fastq-dump is not splitting the files because it is a single-end run?
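
One scriptable way to look at the run-level metadata is the ENA Portal API file report, sketched below. The helper names ena_report_url and column_value are hypothetical, and the chosen field list (run_accession, library_layout, read_count, fastq_ftp) is just one plausible selection. Whether trimming is documented at all depends on what the submitter provided.

```shell
#!/usr/bin/env bash
# ena_report_url ACCESSION -> URL of a tab-separated ENA file report for a run
ena_report_url() {
    printf 'https://www.ebi.ac.uk/ena/portal/api/filereport?accession=%s&result=read_run&fields=run_accession,library_layout,read_count,fastq_ftp&format=tsv\n' "$1"
}

# column_value NAME: read a TSV with a header row on stdin and print the
# NAME column of the first data row.
column_value() {
    awk -F '\t' -v name="$1" '
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == name) col = i; next }
        NR == 2 && col { print $col }
    '
}

# Example (needs curl and network access):
#   curl -s "$(ena_report_url SRR2163549)" | column_value library_layout
```

A declared layout of PAIRED combined with a single FASTQ link in fastq_ftp would hint that the metadata and the stored reads disagree.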

Hi h.mon. The SRA file has a paired layout. Can you tell me how to retrieve the project metadata?
8.9 years ago
h.mon 35k

I look at project (and run, sample, etc.) metadata on the respective SRA pages, e.g. the page for this run. Which version of fastq-dump are you using? Running:

fastq-dump --split-files SRR2163549

gave me just one fastq file, but also the following output:

Rejected 6856409 READS because READLEN < 1
Read 6856409 spots for SRR2163549
Written 6856409 spots for SRR2163549

So I suspect that either read 2 was trimmed away for some reason, or the metadata is incorrect, or this record is corrupted.
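
To probe these possibilities, it can help to tabulate the read lengths in whatever FASTQ file was written. The sketch below assumes plain 4-line records; length_table and empty_reads are hypothetical helper names.

```shell
#!/usr/bin/env bash
# length_table FILE -> one "length count" line per read length, sorted by length
length_table() {
    awk 'NR % 4 == 2 { n[length($0)]++ }
         END { for (l in n) print l, n[l] }' "$1" | sort -n
}

# empty_reads FILE -> number of records whose sequence line is empty
empty_reads() {
    awk 'NR % 4 == 2 && length($0) == 0 { c++ }
         END { print c + 0 }' "$1"
}

# Example (hypothetical file name):
#   length_table SRR2163549_1.fastq
#   empty_reads  SRR2163549_1.fastq
```

A single file whose reads all have the expected read 1 length, with no second file at all, would be consistent with read 2 being stored as zero-length in the archive.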
