Hi there, I'm working with bird samples that are infected with malaria parasites, and I need to do a de novo assembly. Before that I have to filter my reads (to distinguish between parasite and host RNA). For Trinity I want FASTQ input, so I need the HISAT2 output to be a FASTQ file. For this purpose I ran HISAT2 with this command:
hisat2 --dta -x /home/luz_garcia_longoria/workspace/reference_genomes/hisat2/parasitereference.fasta --al-conc-gz aligned_read_hisat2.fastq -1 /home/luz_garcia_longoria/workspace/s21_1.fq.gz,s22_1.fq.gz,s23_1.fq.gz,s24_1.fq.gz,s25_1.fq.gz,s31_1.fq.gz,s32_1.fq.gz,s33_1.fq.gz,s34_1.fq.gz,s35_1.fq.gz -2 /home/luz_garcia_longoria/workspace/s21_2.fq.gz,s22_2.fq,s23_2.fq.gz,s24_2.fq.gz,s25_2.fq.gz,s31_2.fq.gz,s32_2.fq.gz,s33_2.fq.gz,s34_2.fq.gz,s35_2.fq.gz
When the program finishes I get these two files:
ls
aligned_read_hisat2.1.fastq
aligned_read_hisat2.2.fastq
and if I try to look at what is inside:
head -8 aligned_read_hisat2.1.fastq
?[??͎?ʲ4
ο?h??
%*?n??`?h?Isp??I.??<d??Ȭ??:?p?U?)Q?*g?????????????t?>????????a??????????????O{j??????:??N??g?????g??|3?~?????????9yys?o?3??Ԩ?.o?ˋ^???2?K???^C????????J?0?
r??;??7??????Z~
Ӵ?ߎ ??!LnyÇ?9??|????2zϗ???|?a?????k?+?r?o??i???}????7?>????Ϲk??>Z??!??L?y?o?????R
??)??-b???'???W5???7?z~??(w@N?M?K?m??7??,,?40lgV
=?$?\????i{??IO[?z=\?????x9i?tXLJ{?M?K????'G_a??֝?????[
?+Z???a??-??^>(?g????x????b???50\ؼ.f??`bp???,?f???>???{^,??e?'?o???Ƚ??|????Y?Sa??|=-?~dkz?*???=???|'???/?垨I婷a?k?????utu??'B??UN?????k#<??,YӋ???K?/6i]~N?=5;? e?Nt??7$lV?b@{}?{????߄^n??S#?5?>Y??J[????٫ɜ|K???{?n?^oB??<?H?z?=??????>???b??9??>???????܉??;Y??4?kկ??Yw9,k?UK-?~N?~??
?u?at]b?Y]?:y?g?8]O,???f? ?}rA?H?~q<xK/??-??-/:?tw??e_<_??1?@?{
??#?E0??a٫!e?Û??W?w?#??#?????W??K?9?t???]6??[?Yӻ??^:??)??}/??9?#*??u?O????Q??]k??/?p?&?vdj?z?8??.??S}$?C舙???
p;\??c??'F??KgQ?;#?7??Ǔ?q?/?<??Jo{???Χ˵hi}?)H,7k?0??Pd???`p?F*??
žGz???????ok`??C?>pO??B-???I???+??b?!?????????҇{?????@???5?$?Y<4]<C??p~?~?tK??f??????+>??t?.?zA?y~?G@????cY?ww?bY???ό?]o2??<|xz??]????k?v9u?\??;??8?A??`섭????G??,6?|of\???]?g????????Ŋ??Q?z?#~??$8I|??ȁ?B????>K߳????KgV~e????|v??מ\NoJj?h?ҏ<G?e;3s@?%???c?ȇ??q???1?????=??]ni}|e?g@???`?lٛ3??h ??H"?p?b?;??e?o?0\M?G????7Н` >
V????
???ݾ?7Q.???f?4?|??TY?X
dB\?q?'I1V?~5?????s??Ǒ-]??_?????a??fcHJ:qy????Cwm?y???H??????>?
]??CWϪ?ރ?Fu?b<ŬG??w0??>??G????z?'??
?u?<?WY]Al_Q?z???\n??Tr?̃??Kq??%s3ݶ?Y??Qs?V???"G^??Ȃ??ӹ
=B? ?u5????R?1?V\f??=֫{B?1??????\\W??
?n????@?CQD?
w?#?ݒ9?????j8??Q???`M?K?M!??P??[3@?!C?d???^+?A?<~\?r???Aޭ=?s??r??@ ??Ja???C??>#??{?
J+Ni???5K???B?b]OK?Z3Y^m?]?K?I
If I check what my raw data looks like, it's fine:
zcat s34_1.fq.gz | head -8
@ST-E00494:182:HCT5FALXX:6:1101:5213:1309 1:N:0:NAAGGAGC
NGGCTCCTTAGTGGTACTTGCGGGCCAGGGCGTGGGCGACCACACGCACCAGCTTCTGCCAGGCAACCTGGCAGTCGGGAGTGAAGTCCTTGCCGAAGTGGGCGGCCAGGACAACGACCAGGATGTCACTGAGGAGCCTGAAGTTCTCGG
+
#AAAFAJFJFJJAJFFJFFJJJFJFJFFFJJJJJJJFF<JJJJJFJFAJJJJJJJJFJJJJJJJ7<JFJFJJFF-7FAJFJFJJFFJJJAJ<JJJF<7AJAAAA<JJF7FAFAFJJ<A7<JF-<FJJFJ---7)-A<<)A)AA--<7AAF
@ST-E00494:182:HCT5FALXX:6:1101:5233:1309 1:N:0:NAAGGAGC
NGGGTGGCCAGTGTCACCTGCAAGCACTGCGACAGGAACTTGAAGTTGACGGGGTCCACACGCCGGTTGTAGGCGTGCAGGTTGCTGAGCTCAGACAGAGCCTGGCTCAGGTTGTCCCGGTTCTTGATGGCATTGCCCAGGGCAGCCACC
+
#AAA<<<JFJJJJJAFJJJJJFFAF7FFFJF<<JJJAJAFF<<JFJJJ7JFA<JJJAFJF<AF-77<-AJ-7<JJAAJ7<AAAFJJA-7FAAAAFJFFFJJ<-A<J<JJJ-7-7AJJ-7AJJJAA777F--A-7-AAF)-<AA<-AF<AF
I wonder if it has something to do with the option '--al-conc-gz'. Should I use '--al-conc' instead? My input files are compressed, which is why I used this option and not '--al-conc'. So frustrating.
Any help is more than welcome :)
Maybe they are still zipped. What do you see if you check this?
Your files aligned_read_hisat2.1.fastq and aligned_read_hisat2.2.fastq look like either indexed or compressed files. Did you try to
Damn, too slow! :)
Haha, do you want me to remove my answer?
Of course not! You answered while I was writing, so I didn't see your answer.
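For readers following along: the question raised in the two answers above (are the `.fastq` outputs actually gzip-compressed?) can be checked by looking at the first two bytes of the file, which are `1f 8b` for gzip data. The snippet below is only a sketch of that idea, not the command the answerers originally posted. The file name `demo_reads.1.fastq` and the helper `is_gzip` are made up for illustration; in practice you would point the check at `aligned_read_hisat2.1.fastq`.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical demo file: gzip-compressed FASTQ saved under a plain .fastq name,
# mimicking what the question describes.
workdir=$(mktemp -d)
demo="$workdir/demo_reads.1.fastq"
printf '@read1\nACGT\n+\nFFFF\n' | gzip -c > "$demo"

# is_gzip FILE: succeeds if FILE starts with the gzip magic bytes 1f 8b.
is_gzip() {
    [ "$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')" = "1f8b" ]
}

if is_gzip "$demo"; then
    status="gzip-compressed"
    # Read the compressed content without changing the file on disk.
    first_line=$(gzip -dc "$demo" | head -n 1)
else
    status="plain text"
    first_line=$(head -n 1 "$demo")
fi

echo "$demo: $status"
echo "first line: $first_line"
```

If the check reports gzip data, the unreadable output from `head` is expected, and the reads themselves are probably intact.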
output of
?
If you have a reference genome for at least one of the species, you can bin those reads away from the others using bbsplit.sh from the BBMap suite.
If I do that I get:
Now the problem is how to change from .fastq to .gz, because if I do:
Can I do it just with the command 'cp'?
You could do
mv aligned_read_hisat2.1.fastq aligned_read_hisat2.1.fastq.gz
What about
?
You could do that. It will duplicate the data, and you will have two files with different names but identical content.
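To make the rename-versus-copy point concrete, here is a small sketch, assuming the output really is gzip data under a `.fastq` name. It uses a made-up file `demo_reads.1.fastq` in a temporary directory rather than the real `aligned_read_hisat2.*` files. Renaming with `mv` only fixes the extension, and `gzip -dc` then produces a plain-text copy for tools that need uncompressed FASTQ.

```shell
#!/usr/bin/env bash
set -eu

workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical stand-in for the compressed output: gzip data under a .fastq name.
printf '@read1\nACGT\n+\nFFFF\n' | gzip -c > demo_reads.1.fastq

# Option 1: rename so the extension matches the content (no data is copied).
mv demo_reads.1.fastq demo_reads.1.fastq.gz

# Option 2: additionally write a decompressed copy.
# -c sends the output to stdout, so the .gz file is left in place.
gzip -dc demo_reads.1.fastq.gz > demo_reads.1.fastq

# A FASTQ record is four lines, so line count / 4 gives the number of reads.
n_lines=$(wc -l < demo_reads.1.fastq | tr -d ' ')
n_reads=$((n_lines / 4))
echo "reads in decompressed file: $n_reads"
```

Another route, if rerunning the alignment is acceptable, would be to use `--al-conc` to get uncompressed output directly, or to keep `--al-conc-gz` and give it a name ending in `.fastq.gz`. That option controls how the output is written and does not depend on whether the input files are compressed.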