fastq files do not look as they should (running Hisat2)
6.2 years ago
luzglongoria ▴ 50

Hi there, I'm working with bird samples that are infected with malaria parasites. I need to do a de novo assembly, but before that I have to filter my reads (distinguish between parasite and host RNA). Trinity needs fastq input, so I need the HISAT2 output to be fastq files. For this purpose I ran HISAT2 with this command:

hisat2 --dta -x /home/luz_garcia_longoria/workspace/reference_genomes/hisat2/parasitereference.fasta --al-conc-gz aligned_read_hisat2.fastq -1 /home/luz_garcia_longoria/workspace/s21_1.fq.gz,s22_1.fq.gz,s23_1.fq.gz,s24_1.fq.gz,s25_1.fq.gz,s31_1.fq.gz,s32_1.fq.gz,s33_1.fq.gz,s34_1.fq.gz,s35_1.fq.gz -2 /home/luz_garcia_longoria/workspace/s21_2.fq.gz,s22_2.fq,s23_2.fq.gz,s24_2.fq.gz,s25_2.fq.gz,s31_2.fq.gz,s32_2.fq.gz,s33_2.fq.gz,s34_2.fq.gz,s35_2.fq.gz

When the program finishes I get these two files:

ls
aligned_read_hisat2.1.fastq
aligned_read_hisat2.2.fastq

and if I try to see what there is inside:

head -8 aligned_read_hisat2.1.fastq 
[unreadable binary output: many lines of non-printable characters]

If I look at my raw data, it looks fine:

zcat s34_1.fq.gz | head -8
@ST-E00494:182:HCT5FALXX:6:1101:5213:1309 1:N:0:NAAGGAGC
NGGCTCCTTAGTGGTACTTGCGGGCCAGGGCGTGGGCGACCACACGCACCAGCTTCTGCCAGGCAACCTGGCAGTCGGGAGTGAAGTCCTTGCCGAAGTGGGCGGCCAGGACAACGACCAGGATGTCACTGAGGAGCCTGAAGTTCTCGG
+
#AAAFAJFJFJJAJFFJFFJJJFJFJFFFJJJJJJJFF<JJJJJFJFAJJJJJJJJFJJJJJJJ7<JFJFJJFF-7FAJFJFJJFFJJJAJ<JJJF<7AJAAAA<JJF7FAFAFJJ<A7<JF-<FJJFJ---7)-A<<)A)AA--<7AAF
@ST-E00494:182:HCT5FALXX:6:1101:5233:1309 1:N:0:NAAGGAGC
NGGGTGGCCAGTGTCACCTGCAAGCACTGCGACAGGAACTTGAAGTTGACGGGGTCCACACGCCGGTTGTAGGCGTGCAGGTTGCTGAGCTCAGACAGAGCCTGGCTCAGGTTGTCCCGGTTCTTGATGGCATTGCCCAGGGCAGCCACC
+
#AAA<<<JFJJJJJAFJJJJJFFAF7FFFJF<<JJJAJAFF<<JFJJJ7JFA<JJJAFJF<AF-77<-AJ-7<JJAAJ7<AAAFJJA-7FAAAAFJFFFJJ<-A<J<JJJ-7-7AJJ-7AJJJAA777F--A-7-AAF)-<AA<-AF<AF

I wonder if it has something to do with the option '--al-conc-gz'. Should I use '--al-conc' instead? My input files are compressed, which is why I used '--al-conc-gz' and not '--al-conc'. So frustrating.

Any help is more than welcome :)

hisat2 RNA-Seq fastq • 3.0k views

Maybe they are still zipped. What do you see if you check this?

zcat aligned_read_hisat2.1.fastq | head -8

Your files aligned_read_hisat2.1.fastq and aligned_read_hisat2.2.fastq look like either indexed or compressed files. Did you try this?

zcat aligned_read_hisat2.1.fastq | head -10
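
If you want to confirm the compression without relying on the file name, one option is to look at the first two bytes: gzip data starts with the magic bytes 1f 8b. A minimal sketch, assuming a POSIX shell with head, od and gzip available; is_gzip is a made-up helper name and the demo file is a throwaway stand-in for the real HISAT2 output:

```shell
# Hypothetical helper (not part of HISAT2 or gzip): succeeds if the
# file starts with the gzip magic bytes 1f 8b, whatever its name is.
is_gzip() {
    # Read the first two bytes and print them as hex, without offsets.
    magic=$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "1f8b" ]
}

# Throwaway demo file: gzip-compressed FASTQ saved under a plain .fastq name.
tmpdir=$(mktemp -d)
printf '@read1\nACGT\n+\nFFFF\n' | gzip -c > "$tmpdir/demo.fastq"

if is_gzip "$tmpdir/demo.fastq"; then
    echo "gzip-compressed despite the .fastq name"
else
    echo "not gzip data"
fi
```

On the real data the call would be is_gzip aligned_read_hisat2.1.fastq.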

damn, too slow ! :)


Haha, do you want me to remove my answer?


Of course not! You answered while I was writing, so I didn't see your answer.


What is the output of

file *.fastq

?


"Hi there, I'm working with bird samples that are infected with malaria parasites. I need to do a de novo assembly, but before that I have to filter my reads (distinguish between parasite and host RNA)."

If you have a reference genome for at least one of the species, then you can bin those reads away from the others using bbsplit.sh from the BBMap suite.


If I do that I get:

@ST-E00494:182:HCT5FALXX:5:1101:5233:1309 1:N:0:NCAGATTC
NTTCACCTTCTGCAGAAAGTGATTTATCAAGTAGACAAAGTGACAATTATGAATATTGCGAAAGTCCCTCAACAAGTGGAAGAAGTTCACCTACTGTCAGAAGTGAGTTATCAAGTAGGTGCAGTATCCCTGGTGTTGATATAGGAGGAC
+
#-AAFF<F<A-<JJJJ7J<AJ-FJJFAJFFFFJAJJJJJFJFFJJFJJ-JJ-7FFJJJJJAF-A<7<F<J<JJJF<7FJFAJFFJA-FJJJJFFFFFAA--77-AAF<FAJF-AAF-FJFFJF<FJFF<-AFFJJA-<-777--7A<)7)
@ST-E00494:182:HCT5FALXX:5:1101:5781:1309 1:N:0:NCAGATTC
NTTTTATTTGCTATATTGGTTGCAGTTTTAATTATTTGTTGATATCCTTCTTTTAATATAGGTTCATTAATAGAAATTTCTTTATTTGTTGATTCATCATAAATAAAACAAGACAAAAAAGCACATATATAATCATTATTATATTTACTT
+
#<AAAJJJJFAF7FAJJ<FAFA<-FFFJJ--FJJJJJJJJA-FJFF<FFAJ-FFF7FFJJAF-<F<FJAJJF-AJJJJJJJJJAFJJFJJA<FJ<7F-AA77---7F<<A7FFAF7<<-FFFF-FJ7<77--FF7A-7<<-7-7AF--7-

Now the problem is how to change from .fastq to .gz, because if I do:

gunzip aligned_read_hisat2.1.fastq
gzip: aligned_read_hisat2.1.fastq: unknown suffix -- ignored

Can I do it just with the command 'cp'???


You could do mv aligned_read_hisat2.1.fastq aligned_read_hisat2.1.fastq.gz.


What about

cp aligned_read_hisat2.1.fastq aligned_read_hisat2.1.fastq.gz

?


You could do that. It will duplicate the data, and you will have two files with different names but identical content.
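
For completeness, here is a rough sketch of two ways to get plain-text fastq out of a gzip file that carries the wrong extension. It runs on a throwaway file standing in for aligned_read_hisat2.1.fastq; the demo file names are made up for illustration:

```shell
# Throwaway stand-in: gzip-compressed FASTQ with a plain .fastq name.
workdir=$(mktemp -d)
printf '@read1\nACGT\n+\nFFFF\n' | gzip -c > "$workdir/demo.1.fastq"

# Option 1: decompress from stdin to a new file. No file name is passed
# to gzip, so there is no suffix to complain about, and the compressed
# original stays in place.
gzip -dc < "$workdir/demo.1.fastq" > "$workdir/demo.1.plain.fq"

# Option 2: rename so the name ends in .gz, then gunzip in place.
# gunzip strips the .gz again, leaving demo.1.fastq as plain text.
mv "$workdir/demo.1.fastq" "$workdir/demo.1.fastq.gz"
gunzip "$workdir/demo.1.fastq.gz"

# Both routes give the same readable FASTQ.
head -n 4 "$workdir/demo.1.fastq"
```

Option 1 avoids both the rename and the duplicate compressed copy that cp would create.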

6.2 years ago
luzglongoria ▴ 50

Yes! It works! Thank you so much, guys! I wonder if I could also run the program again like this:

hisat2 --dta -x /home/luz_garcia_longoria/workspace/reference_genomes/hisat2/parasitereference.fasta --al-conc aligned_read_hisat2.fastq -1 /home/luz_garcia_longoria/workspace/s21_1.fq.gz,s22_1.fq.gz,s23_1.fq.gz,s24_1.fq.gz,s25_1.fq.gz,s31_1.fq.gz,s32_1.fq.gz,s33_1.fq.gz,s34_1.fq.gz,s35_1.fq.gz -2 /home/luz_garcia_longoria/workspace/s21_2.fq.gz,s22_2.fq,s23_2.fq.gz,s24_2.fq.gz,s25_2.fq.gz,s31_2.fq.gz,s32_2.fq.gz,s33_2.fq.gz,s34_2.fq.gz,s35_2.fq.gz

That is, asking for the output as a fastq file. This way the output would be a fastq file directly, wouldn't it?


Your reads are already in fastq format; they are just compressed (--al-conc-gz). Trinity should be fine with using that file.

Edit: Your confusion may be due to the expectation that, since you provided a .fastq extension, the file would contain uncompressed fastq reads. While some programs (e.g. bbmap) may honor those extensions (or use them to adjust input/output), they are not required on unix. So the command-line option --al-conc-gz kept your files gzip-compressed. If you had used --al-conc-bz2, the files would have been compressed using bzip2.
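
Since the content matters and the extension does not, a quick sanity check before assembly could be to count the records in both mate files straight from the compressed data. A minimal sketch, assuming standard four-line fastq records; count_reads is a made-up helper name and the two demo files are throwaway stand-ins for the .1 and .2 outputs:

```shell
# Hypothetical helper: number of FASTQ records in a gzip-compressed file,
# assuming four lines per record. Reads via stdin, so the name is irrelevant.
count_reads() {
    gzip -dc < "$1" | awk 'END { print NR / 4 }'
}

# Throwaway mate files, compressed but named .fastq like the HISAT2 output.
demo=$(mktemp -d)
printf '@r1/1\nACGT\n+\nFFFF\n@r2/1\nGGCC\n+\nFFFF\n' | gzip -c > "$demo/mates.1.fastq"
printf '@r1/2\nTTAA\n+\nFFFF\n@r2/2\nCCGG\n+\nFFFF\n' | gzip -c > "$demo/mates.2.fastq"

n1=$(count_reads "$demo/mates.1.fastq")
n2=$(count_reads "$demo/mates.2.fastq")

if [ "$n1" = "$n2" ]; then
    echo "mate files agree: $n1 read pairs"
else
    echo "mate files differ: $n1 vs $n2" >&2
fi
```

If the two counts differ on the real files, something went wrong upstream and it is worth checking before running the assembler.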


OK. But now how can I change the format? I know I can compress a file using 'zip', but my file is already compressed, so I only need to change the extension. Is it fine if I just use 'cp' (copy) to change the name of the file to aligned_read_hisat2.fq.gz and then uncompress it using 'gunzip'?
