Hello, I have a fastq file with 4 lines per sequence. This is an example of one of the entries:
@m54284U_200831_121835/3/ccs
ATGGGTGGGCTGGCAGTAGCCAGGGACGATGGGCTCTTCTCTGGGGATCCCAACTGGTTTCCTAAGAAATCCAAGGAGAACCCTCGGAACTTCTCGGACAACCAGTTGCAAGAGGGCAAGAACGTGATTGGGTTGCAGATGGGCACCAACCGTGGAGCATCTCAGGCCGGCATGACCGGCTATGGGATGCCACGGCAGATCCTCTGATCATACTCTCTCTCCTTCCCCTGCCCTCCATGAATGGTTAATATATATGTATATATATGTTTTAGCAGACATTCCCTGAGAGCCCCTGGATTGCTGAACCCCCCTCTGCCAGGGTCCAGGCCAGCCTATCTTGTCACCACTGGCAGGGCCTGATAATTGCCTCTCTCTCTCTCTCTCTTTCTCTCTCTCTCTCTCTCTGGGCTTACTAATGCATTCCTCCCCCCACATTATTCCCACAGTCTCAAGCACGTGGATTCTGCTGTAGTCGTACGCCGATGCGAAACATCGGCCACGTCGCTATTGCAGCGAGTAGATCGGAAGAGCACACGTCTGAACTCCA
+
5555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555
I have the following sequence: CAGACGTGTGCTCTTCCGATCT. I would like to know:
1) In which entries this sequence is present (both the number of entries and which entries).
2) Where in the sequence does this string match (which nucleotide position within the entry).
For example, I would like something like: The sequence is present in all entries. In entry 1, the sequence starts at nucleotide 10. etc.
I have tried with grep and awk but I certainly lack the knowledge to do this. Any help would be so appreciated!!!
Thank you! I used your command but I am getting the following error:
Do you have Java (1.8 and above) installed? You just unzipped the bbmap software bundle and did not move any files/folders inside it anywhere. You can use conda to install bbmap otherwise.
That worked, thank you!! I am getting the following result in a reduced version with only 250 reads:
I am assuming the "contaminants" are my sequence of interest?
You can easily check that. Add rcomp=f if you only want to search in the reads in forward direction.
This is great, thanks so much!! Do you know if this tool allows for checking where in the read this sequence was found? Or do you know of any other tool? I played around with grep but couldn't make it work...
Not easily I would think. You will need to parse alignments (CIGAR strings) or simply convert the sequence you find to fasta and do a multiple-sequence alignment.
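If exact matches are enough, a short script may be simpler than parsing alignments. Below is a minimal Python sketch, not a tested pipeline. It assumes an uncompressed FASTQ with exactly 4 lines per entry (as in the question) and reports only exact matches, so reads carrying a mismatch or indel in the sequence are missed. The function names (reverse_complement, find_all, scan_fastq) are made up for this example. It prints the entry number, read name, strand, and 1-based start position of each hit, where "-" means the reverse complement of the query was found and the position is where that reverse complement starts in the read as written.

```python
import sys

# Translation table for complementing DNA (N stays N).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_all(haystack, needle):
    """Yield the 1-based start of every exact match, including overlapping ones."""
    start = haystack.find(needle)
    while start != -1:
        yield start + 1
        start = haystack.find(needle, start + 1)


def scan_fastq(lines, motif):
    """Yield (entry_number, read_name, strand, position) for each hit.

    Assumes 4 lines per entry: header, sequence, '+', qualities.
    """
    motif = motif.upper()
    rc = reverse_complement(motif)
    name = ""
    for i, line in enumerate(lines):
        if i % 4 == 0:
            name = line.strip()[1:]  # header without the leading '@'
        elif i % 4 == 1:
            seq = line.strip().upper()
            entry = i // 4 + 1
            for pos in find_all(seq, motif):
                yield entry, name, "+", pos
            if rc != motif:  # skip the second pass for palindromic queries
                for pos in find_all(seq, rc):
                    yield entry, name, "-", pos


# Only runs when called as: python script.py reads.fastq QUERYSEQUENCE
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as handle:
        hits = list(scan_fastq(handle, sys.argv[2]))
    for entry, name, strand, pos in hits:
        print(f"{entry}\t{name}\t{strand}\t{pos}")
    n_entries = len({hit[0] for hit in hits})
    print(f"{n_entries} entries contain the sequence", file=sys.stderr)
```

Saved under a name of your choosing (say find_motif.py, a hypothetical filename), it would be run as: python find_motif.py reads.fastq CAGACGTGTGCTCTTCCGATCT. Both strands are searched because the example read in the question appears to carry the reverse complement of the query (AGATCGGAAGAGCACACGTCTG) near its 3' end rather than the query as typed. If you only want the forward orientation, keep the "+" rows.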