Number of reads with FASTQ file name
5.7 years ago
Bioinfonext ▴ 470

I want the number of reads in each FASTQ file printed together with the file name. The script below prints only the read counts.

The file names look like this:

Soil-6_S22_L001.m150-p1.join.fq

Soil-7.m150-p1.join.fq

Soil-8_S32_L001.m150-p1.join.fq

I would like to get the number of reads next to the sample name, like this:

Soil-6              384994

Soil-7              205889

How should I modify the script below?

#!/bin/bash

for i in `ls *.fq`; do echo $(cat ${i} | wc -l)/4|bc; done

Kind Regards

linux bash • 3.0k views

input fastq:

$ cat test.fq 
@SEQ_ID
GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT
+
!''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC65
@SEQ_ID
GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT
+
!''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC65
@SEQ_ID
GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT
+
!''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC65

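With awk, counting every fourth line (the header of each record) rather than matching @, which can also appear in quality strings, for one file:

$ awk -v OFS="\t" 'FNR % 4 == 1 {count++} END{print FILENAME,count}' test.fq 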
test.fq 3

From multiple files, with GNU parallel:

$ parallel awk -v "OFS='\t' 'FNR % 4 == 1 {count++} END{print FILENAME,count}'" {} ::: *.fq
test10.fq   3
test1.fq    3
test2.fq    3
test3.fq    3
test4.fq    3
test5.fq    3
test6.fq    3
test7.fq    3
test8.fq    3
test9.fq    3


Or with find:

$ find . -type f -name "*.fq" -exec awk -v OFS="\t" 'FNR % 4 == 1 {count++} END{print FILENAME,count}' {} \; 
./test5.fq  3
./test4.fq  3
./test2.fq  3
./test8.fq  3
./test7.fq  3
./test3.fq  3
./test9.fq  3
./test6.fq  3
./test.fq   3
./test10.fq 3
./test1.fq  3
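
The same per-file count can also be produced by a single awk process reading all files, without parallel or find. This is only a sketch: `count_reads` is a hypothetical function name, and it assumes uncompressed FASTQ with exactly four lines per record. Empty files produce no output line.

```shell
#!/bin/bash
# count_reads (hypothetical helper): print "<file>\t<reads>" for each FASTQ file given.
# Assumes plain-text FASTQ with exactly four lines per record.
count_reads() {
    awk -v OFS="\t" '
        # First line of a new file (but not the very first file): flush the previous count.
        FNR == 1 && NR > 1 { print prev, n / 4; n = 0 }
        # Count every line and remember which file it came from.
        { n++; prev = FILENAME }
        # Flush the count of the last file, if anything was read at all.
        END { if (NR > 0) print prev, n / 4 }
    ' "$@"
}
```

Called as `count_reads *.fq`, it prints one tab-separated line per file.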
5.7 years ago
#!/bin/bash

for i in `ls *.fq`; do file_name=$(basename -s .fq $i);  printf "$file_name\t$(cat ${i} | wc -l)/4|bc\n"; done

Explanation

file_name=$(basename -s .fq $i)

basename = linux command to strip directory and suffix from filenames

-s = SUFFIX, remove a trailing SUFFIX

file_name=$(basename -s .fq $i) = remove the suffix .fq from the given file and store the name in the variable called file_name


Thanks for this code, lakhujanivijay! But when I run it, it does not actually print the calculation. It gives me the number of reads followed by /4|bc\n. I've been trying to figure out why this is not working, but I've had no success. Can someone help? :)
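
The literal `/4|bc` shows up because it sits inside the `printf` string but outside the `$(...)`, so the shell never evaluates it; the number printed in front of it is the line count, not the read count. One possible fix is sketched below. It assumes the sample name is everything before the first `_` or `.` in the file name (inferred from the examples in the question), and `print_read_counts` is a hypothetical name, not something from the original answer.

```shell
#!/bin/bash
# print_read_counts (hypothetical helper): print "<sample>\t<reads>" per FASTQ file.
# Assumes uncompressed FASTQ with exactly four lines per record.
print_read_counts() {
    local fq lines sample
    for fq in "$@"; do
        lines=$(wc -l < "$fq")          # total number of lines in the file
        sample=$(basename "$fq")        # drop any leading directory
        sample=${sample%%[_.]*}         # Soil-6_S22_L001.m150-p1.join.fq -> Soil-6
        # Do the division in shell arithmetic instead of piping text to bc.
        printf '%s\t%d\n' "$sample" "$(( lines / 4 ))"
    done
}
```

Running `print_read_counts *.fq` in the directory with the files should give one line per sample in the format asked for in the question.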


rturba, this post is 3 years old, and many programs for such tasks are available now. Please post a new question with example data, your code, and the expected output.


Could you point me to those programs, @cpad0112? I was searching online for solutions and this was the best I could find. Thank you!

5.7 years ago
for i in *.fq; do printf '%s ' "$i"; paste - - - - < "$i" | wc -l; done
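
This works because `paste - - - -` reads four lines from standard input at a time and joins them with tabs, so each four-line FASTQ record becomes one row and `wc -l` then counts records. A small sketch that makes the intermediate rows visible; `fastq_to_rows` is a hypothetical name used only for illustration.

```shell
#!/bin/bash
# fastq_to_rows (hypothetical helper): emit one tab-separated row per FASTQ record
# (header, sequence, separator, quality). Assumes four lines per record.
fastq_to_rows() {
    paste - - - - < "$1"
}
```

For example, `fastq_to_rows test.fq | wc -l` gives the read count, and `fastq_to_rows test.fq | cut -f1` lists the read headers.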
5.7 years ago
JC 13k
for i in *.fq; do echo "$(echo "$i" | perl -pe 's/[_.].*//')  $(( $(wc -l < "$i") / 4 ))"; done