Hi everyone,
I have no information about this large sequence file (1.2 GB), which I received as a .txt file, nor about how it was sequenced. Is it advisable to proceed with the analysis? How would you go on from here?
To start with, can anyone identify the format of the following few lines and give a little background on them? Knowing how you worked it out, or how it can be worked out, would be especially helpful.
Thanks.
>S111:32:A03SG:1:1:12484:2206
NTCATTTAGATATCTGGCTTACACGCATAGCATTCTCAAGACATGGTACGCTTAATAAGTGATATNAATNTTTCAATTAAANCNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
>S111:32:A03SG:1:1:13579:2206
NTGTCGAGTCGATGTCTGATGGACGAGCATGAGTGACGCTCTGTTGTTTGTGCAGATTTGGCTGCNTTTNCGTTTTGNGTTNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
>S111:32:A03SG:1:1:12338:2206
NGGTCCAGAAATGTACTTTTGAGGGTTGTTTCAAGACCAACATGACTTTCAAACATATTCTGGAANATTNCTGGGTCATTCNGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNCNNNNNNNNNNNNN
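A quick way to see what these are: each record is a `>` header line followed by a sequence line, which is FASTA. The headers look like Illumina read names in the CASAVA 1.8+ style (instrument:run:flowcell:lane:tile:x:y). Below is a minimal Python sketch, assuming that header layout, that streams the file line by line (so the whole 1.2 GB never has to sit in memory) and splits each header into those fields. `sniff_format`, `read_fasta` and `parse_read_name` are hypothetical helpers, not functions from any library.

```python
import io
from typing import Iterator, NamedTuple, TextIO, Tuple


class IlluminaReadName(NamedTuple):
    # Field layout assumed from Illumina's CASAVA 1.8+ read naming.
    instrument: str
    run: int
    flowcell: str
    lane: int
    tile: int
    x: int
    y: int


def sniff_format(first_line: str) -> str:
    """Guess the file format from its first line (a hint, not proof)."""
    if first_line.startswith(">"):
        return "fasta"
    if first_line.startswith("@"):
        return "fastq"  # '@' also starts SAM headers, so check further if unsure
    return "unknown"


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Stream (header, sequence) pairs; multi-line sequences are joined."""
    header, chunks = None, []
    for line in handle:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def parse_read_name(header: str) -> IlluminaReadName:
    """Split an 'instrument:run:flowcell:lane:tile:x:y' name (assumed layout)."""
    name = header.split()[0]  # drop any trailing comment such as '1:N:0:...'
    fields = name.split(":")
    if len(fields) != 7:
        raise ValueError(f"unexpected read name: {name!r}")
    inst, run, cell, lane, tile, x, y = fields
    return IlluminaReadName(inst, int(run), cell, int(lane), int(tile), int(x), int(y))


if __name__ == "__main__":
    # Uses the first header from the post; on the real file, replace the
    # StringIO with open("reads.txt") (the file name is a placeholder).
    sample = io.StringIO(">S111:32:A03SG:1:1:12484:2206\nNTCATTTAGATATCTGGCTT\n")
    for header, seq in read_fasta(sample):
        print(parse_read_name(header), len(seq))
```

If every header parses and the lane/tile/x/y values vary the way they do in these three records, that is good evidence the file was exported from Illumina FASTQ with the quality lines dropped.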
Yes: as they stand, these are essentially useless data. Ask whoever produced these files for the FASTQ files they were generated from. FASTQ files carry a quality score for every base and are much more amenable to analysis with NGS tools.
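For comparison, a FASTQ record has four lines (`@name`, sequence, `+`, quality string), and each quality character encodes a Phred score Q, where the probability that the base call is wrong is 10^(-Q/10). The sketch below assumes unwrapped four-line records and the Phred+33 offset used by Sanger and Illumina 1.8+ output; older Illumina pipelines used an offset of 64, so check before trusting the numbers. `read_fastq` and `error_probability` are hypothetical names.

```python
import io
from typing import Iterator, List, TextIO, Tuple

PHRED_OFFSET = 33  # assumption: Sanger / Illumina 1.8+ encoding


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, List[int]]]:
    """Yield (name, sequence, Phred scores) from unwrapped 4-line FASTQ records."""
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:  # stops at end of file (or at a blank line)
            return
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline().rstrip("\r\n")
        qual = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield header[1:], seq, [ord(c) - PHRED_OFFSET for c in qual]


def error_probability(q: int) -> float:
    """Convert a Phred score Q into the probability that the base call is wrong."""
    return 10 ** (-q / 10)


if __name__ == "__main__":
    record = io.StringIO("@r1\nACGN\n+\nII5#\n")
    for name, seq, quals in read_fastq(record):
        for base, q in zip(seq, quals):
            print(name, base, q, error_probability(q))
```

Per-base scores like these are what trimming and variant-calling tools rely on, and they are exactly what the FASTA in the post no longer has.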
Would the trailing runs of consecutive Ns mean anything useful?
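On the trailing Ns: an N is a position where no base could be called, so the runs themselves carry no sequence information. What they can tell you is how much usable sequence each read has left. Here is a minimal sketch that measures the 3' N run, reports the overall N fraction, and trims the run before downstream analysis. The helper names are hypothetical, and the 20-base minimum length is an arbitrary illustration, not a recommended cut-off.

```python
from collections import Counter
from typing import Iterable, Optional


def trailing_n_run(seq: str) -> int:
    """Length of the run of N calls at the 3' end of a read."""
    return len(seq) - len(seq.rstrip("Nn"))


def n_fraction(seq: str) -> float:
    """Fraction of positions in the read that are N (no call)."""
    if not seq:
        return 0.0
    return sum(1 for base in seq if base in "Nn") / len(seq)


def trim_trailing_ns(seq: str, min_len: int = 20) -> Optional[str]:
    """Strip 3' Ns; return None if fewer than min_len bases remain."""
    trimmed = seq.rstrip("Nn")
    return trimmed if len(trimmed) >= min_len else None


def usable_length_histogram(seqs: Iterable[str]) -> Counter:
    """Count reads by how many bases remain after removing the 3' N run."""
    return Counter(len(s.rstrip("Nn")) for s in seqs)


if __name__ == "__main__":
    # First read from the post.
    read = ("NTCATTTAGATATCTGGCTTACACGCATAGCATTCTCAAGACATGGTACGCTTAATAAGTGATAT"
            "NAATNTTTCAATTAAANCNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN"
            "NNNNNNNNNNNNNNNNNNNNN")
    print(trailing_n_run(read), round(n_fraction(read), 2), trim_trailing_ns(read))
```

Internal Ns, such as the scattered ones just before the long 3' runs in these reads, are left untouched here; whether to mask them or split reads on them depends on the downstream tool. A histogram of usable lengths across the whole file gives a rough idea of how many cycles produced callable bases.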