I am looking for a solution in both Bash/Linux and Python.
Also, if more than 40% of a sequence is [Aa] or more than 40% is [Tt], remove the sequence and its description.
For example, file.fasta:
>seq1
GGCAGAGGCCCCCTAGCCCCGCCCGCGCCATGGTCAGGCACGCCCCTCCTCATCGCGGGGCACAGCCCGGCGGGTAGCCCCAGCGCTGGAGGCGGGCGGGGCCGGCCGGCGGAGGCCTGAGCAGCAGCCCAGCGCGGGCCGCCGAGACACCATGAGAGCCCCCACACTCCTCGCCCCACCGGCCCTGGCCGCACTGGGCACCGCTGGCCGGGCGGGTGGGTGCCCC
>seq2
CCACTGCACTCACCGCACCCGGCCAATTTTTTTTGTGTTTTTAGTAGAGACTAAATACCATATAGTGAACACCTAAGACGGGGGGCCTTGGATCCAGGGCGATTCAGAGGGCCCCGGTCGGAGCTGTCGGAGATTGAGCGCGCGCGGTCCCGGGATCTCCGACGAGGCCCTGGACCCCCGGGCGGCGAAGCTGCGGCGCGGCGCCCCCTGGAGGCCGCGGGACCCCTG
After processing, the output should be:
>seq1
GGCAGAGGCCCCCTAGCCCCGCCCGCGCCATGGTCAGGCACGCCCCTCCTCATCGCGGGGCACAGCCCGGCGGGTAGCCCCAGCGCTGGAGGCGGGCGGGGCCGGCCGGCGGAGGCCTGAGCAGCAGCCCAGCGCGGGCCGCCGAGACACCATGAGAGCCCCCACACTCCTCGCCCCACCGGCCCTGGCCGCACTGGGCACCGCTGGCCGGGCGGGTGGGTGCCCC
seq1 is fine, but seq2 and its description have to be removed from the file.
More examples:
AGAGCTAGAAGGGG - ok
AGAAAAAAAAGAGG - remove -- more than 6 consecutive A's
ATTTTTTTTATGATG - remove -- more than 6 consecutive T's
AGTAGTTAGGGGGG - ok
ACAGAAACAGAATG - remove -- more than 40% A
TCTGATTTATTATTG - remove -- more than 40% T
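To make the rules above concrete, here is a minimal Python sketch of how I read them. It is only an illustration under assumptions: "more than 6 sequentially" is taken to mean a run of 7 or more, the 40% limit is applied to A and T separately, matching is case-insensitive, and the names keep_sequence, read_fasta and filter_fasta are hypothetical helpers, not part of any existing tool.

```python
import re
import sys

# Assumed thresholds, read off the examples above.
MAX_RUN = 6          # a run longer than this many A's or T's rejects the sequence
MAX_FRACTION = 0.40  # more than this fraction of A (or of T) rejects the sequence

# Matches MAX_RUN + 1 or more consecutive A's, or the same for T's.
LONG_RUN = re.compile("A{%d,}|T{%d,}" % (MAX_RUN + 1, MAX_RUN + 1))


def keep_sequence(seq):
    """Return True if seq passes both the run-length and the composition check."""
    seq = seq.upper()
    if not seq:
        return False  # assumption: empty records are dropped
    if LONG_RUN.search(seq):
        return False
    n = len(seq)
    return seq.count("A") / n <= MAX_FRACTION and seq.count("T") / n <= MAX_FRACTION


def read_fasta(lines):
    """Yield (header, sequence) pairs; a sequence may span several lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_fasta(lines):
    """Return the surviving records as a flat list of header and sequence lines."""
    out = []
    for header, seq in read_fasta(lines):
        if keep_sequence(seq):
            out.extend([header, seq])
    return out


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python filter_fasta.py file.fasta > filtered.fasta
    with open(sys.argv[1]) as handle:
        for line in filter_fasta(handle):
            print(line)
```

With these assumptions, keep_sequence accepts the two sequences marked "ok" above and rejects the four marked "remove". Each surviving sequence is written on a single line, so the original line wrapping of the FASTA file is not preserved.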
Are there any existing tools for this purpose? I would like a solution that uses such a tool, as well as one in Python or Linux shell.