RNA seq FASTX quality trimming
3
0
8.7 years ago
Rahul ▴ 30

Hello,

I have filtered my Illumina paired-end reads (forward library: 24 million reads; reverse library: 24 million reads) using FASTX_Quality_Filter, requiring a quality score of at least Q20 in at least 90 percent of the bases of each read. The reads are 75 bp with an insert size of 200 bp.

After filtering, however, I am left with around 18 million reads in the forward library and 20 million reads in the reverse library, a difference of 2 million reads between the two. Can I use these libraries for transcriptome assembly even though the numbers of reads are unequal?

Regards Rahul

RNA-Seq Assembly next-gen alignment rna-seq • 3.2k views
4
8.7 years ago

fastx-toolkit is not pair-aware and should never be used for paired reads. There are many modern tools (such as BBDuk, which I wrote) that properly handle paired reads, and will give you paired reads as output, along with singletons in which the mate was discarded.
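
As a rough illustration of what "pair-aware" means, here is a minimal Python sketch. It is not BBDuk's implementation; the function names are hypothetical, Phred+33 quality encoding is assumed, and the Q20/90% rule is taken from the question. The point is that both mates are judged together, so the paired output stays in sync and a surviving mate goes to a singleton list instead of silently shifting one file.

```python
from typing import Iterable, List, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality string)


def passes_filter(qual: str, min_q: int = 20, min_percent: float = 90.0,
                  offset: int = 33) -> bool:
    """True if at least min_percent of bases have Phred quality >= min_q."""
    if not qual:
        return False
    good = sum(1 for ch in qual if ord(ch) - offset >= min_q)
    return 100.0 * good / len(qual) >= min_percent


def filter_pairs(pairs: Iterable[Tuple[Record, Record]]):
    """Split mate pairs into (kept_pairs, forward_singletons, reverse_singletons)."""
    kept: List[Tuple[Record, Record]] = []
    single_fwd: List[Record] = []
    single_rev: List[Record] = []
    for fwd, rev in pairs:
        ok_fwd = passes_filter(fwd[2])
        ok_rev = passes_filter(rev[2])
        if ok_fwd and ok_rev:
            kept.append((fwd, rev))   # both mates survive: stay paired
        elif ok_fwd:
            single_fwd.append(fwd)    # reverse mate discarded
        elif ok_rev:
            single_rev.append(rev)    # forward mate discarded
        # if neither mate passes, the whole pair is dropped
    return kept, single_fwd, single_rev
```

Running a filter like `passes_filter` on each file independently is what produces unequal read counts: each file loses a different set of reads.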

Q20 is too high for RNA-seq filtering (or pretty much anything), anyway - that will increase the bias of your output. Trimming to, say, Q10 is a much better idea.
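
To make the difference between filtering and trimming concrete, here is a minimal sketch of 3'-end quality trimming. It simply removes trailing bases below a threshold; it is not the algorithm any particular tool uses, the function name is hypothetical, and Phred+33 encoding is assumed.

```python
from typing import Tuple


def trim_3prime(seq: str, qual: str, min_q: int = 10,
                offset: int = 33) -> Tuple[str, str]:
    """Remove trailing bases whose Phred quality is below min_q."""
    end = len(qual)
    # Walk back from the 3' end until a base meets the threshold.
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


# In Phred+33, 'I' is Q40, '+' is Q10 and '#' is Q2.
print(trim_3prime("ACGTACGT", "IIIII+##", min_q=10))
print(trim_3prime("ACGTACGT", "IIIII+##", min_q=20))
```

With a lower threshold the read keeps more of its length, and the read itself is retained rather than discarded outright.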

0

FastX is not for paired-end data; it is for single-end data.

You can also try Cutadapt.

1

Can I use Trimmomatic or PRINSEQ for paired-end reads?

Thanks

2
8.7 years ago

The simple answer is "Yes, you can". Just check how the program you are going to use treats singleton reads (i.e. the 2 million extra reads in one of the files) and how to provide them as input.

P.S. My answer was to the original question: whether we can use singletons for assembly along with paired-end reads. The context (and title?) of the question changed later.

0

Thank you very much for commenting on my query. I am using SOAPdenovo-Trans (on the iPlant Collaborative site) to assemble the reads with default parameters.

When I ran the assembly (scaffolding) with the trimmed and quality-filtered reads, CEGMA reported about 50% completeness. When I ran the assembly with the raw reads, CEGMA reported 81% completeness. Hence I am confused about whether I am giving the right input. Even after proper cleanup steps, my results are still not up to the mark.

0

I don't think that's the best practice, though...

1

I edited my answer. The original question was different. It was about using singleton reads in assembly.

2
8.7 years ago

If you are using Illumina data, try comparing the results you get with fastq_quality_trimmer. It may keep your files synchronized, with the same number of sequences, simply because it shortens reads instead of discarding them, preserving more high-quality sequence.
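
Whichever tool is used, it is worth checking that the two files really are still synchronized. The sketch below is one way to do that in Python, assuming strict four-line FASTQ records and read names that are identical between mates apart from an optional `/1` or `/2` suffix; the function names are hypothetical.

```python
from itertools import zip_longest
from typing import Iterable, Iterator, Optional


def read_ids(lines: Iterable[str]) -> Iterator[str]:
    """Yield the read ID of each four-line FASTQ record, without mate suffix."""
    for i, line in enumerate(lines):
        if i % 4 != 0:
            continue  # only header lines carry the read name
        fields = line.split()
        if not fields:
            continue  # tolerate a trailing blank line
        name = fields[0].lstrip("@")
        if name.endswith("/1") or name.endswith("/2"):
            name = name[:-2]
        yield name


def first_desync(fwd_lines: Iterable[str],
                 rev_lines: Iterable[str]) -> Optional[int]:
    """Return the 0-based index of the first mismatched record, or None if in sync."""
    ids = zip_longest(read_ids(fwd_lines), read_ids(rev_lines))
    for idx, (a, b) in enumerate(ids):
        if a != b:  # different names, or one file ended early
            return idx
    return None
```

Both arguments can be open file handles, since iterating over a text file yields its lines. A return value of `None` means every record lines up with its mate.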

0

Thanks for your valuable comments and suggestions.

A) After assembling scaffolds from the trimmed and quality-filtered reads, I get Average_number_of_contigs_per_scaffold: 1.0

B) For the untrimmed raw reads, Average_number_of_contigs_per_scaffold: 1.2-1.4

C) The assembly in the published paper shows around Average_number_of_contigs_per_scaffold: 1.9

I don't know whether the problem in my scaffolds is due to the input reads or something else.

Any suggestions will be highly appreciated.

Regards Rahul

0

If you look at the Assemblathon 2 contest, you will notice that the number of contigs depends on the source of the DNA. In Assemblathon 2 you will read that some assemblers work better with the fish than with the boa, while the opposite holds for other assemblers. The source of the DNA, in particular its complexity and number of repeated sequences, plays a key role in the formation of contigs and scaffolds. If the statistics in the published paper were computed for a different genome, I believe you cannot compare them.

0

The published study used SOAPdenovo2 for assembly. I am trying an assembly with SOAPdenovo-Trans on the same published reads, with almost the same parameters except for the quality-trimming parameters.
