1. When data are read from a text file, you can use a BufferedReader to read one line at a time. Once a line has been read, the reader cannot go back and read it again. To work around this, you can first read all the data into a structured object and process it later. Please use the DNA class (which we developed over the past few weeks, and which has the properties ID and seq and their set/get methods) to develop a Java program that reads a FASTA-format DNA sequence file and parses each sequence record into its ID and its sequence. The ID is the text between the ">" and the "|" in the header line, and the sequence is the concatenation of all lines of the sequence part into a single string. Each DNA sequence record can then be stored as an element of an array of DNA objects. Use a loop in your program to prompt the user to enter a sequence ID; if the ID exists, print out the sequence, and if it does not exist, print out a warning message. Exit the loop if the user enters "quit". Please use the sequence file (seq.fasta) as the input file. Below is a sample output of the program: (2 points)
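One possible shape for the parsing step in question 1 is sketched below. It is a minimal sketch, not a reference solution: the nested DNA class is only a stand-in for the course's DNA class (its exact method names, such as getID and setSeq, are assumptions), and the class name FastaParseSketch and the helper methods parseId, readFasta and findSeq are hypothetical names introduced for illustration. A header without a "|" is treated as if the whole header were the ID, which is also an assumption.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

public class FastaParseSketch {

    // Stand-in for the course's DNA class (assumed shape: ID, seq, set/get methods).
    static class DNA {
        private String id;
        private String seq;
        public String getID() { return id; }
        public void setID(String id) { this.id = id; }
        public String getSeq() { return seq; }
        public void setSeq(String seq) { this.seq = seq; }
    }

    // Returns the text between '>' and the first '|' of a header line.
    // Assumption: if there is no '|', everything after '>' is the ID.
    static String parseId(String header) {
        int bar = header.indexOf('|');
        String id = (bar >= 0) ? header.substring(1, bar) : header.substring(1);
        return id.trim();
    }

    // Reads every record up front so it can be looked up repeatedly later.
    static DNA[] readFasta(Reader source) throws IOException {
        List<DNA> records = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(source)) {
            DNA current = null;
            StringBuilder seq = new StringBuilder();
            String line;
            while ((line = in.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty()) {
                    continue; // skip blank lines
                }
                if (line.startsWith(">")) {
                    // A new header closes the previous record, if any.
                    if (current != null) {
                        current.setSeq(seq.toString());
                        records.add(current);
                    }
                    current = new DNA();
                    current.setID(parseId(line));
                    seq.setLength(0);
                } else if (current != null) {
                    seq.append(line); // concatenate sequence lines
                }
            }
            if (current != null) { // store the last record
                current.setSeq(seq.toString());
                records.add(current);
            }
        }
        return records.toArray(new DNA[0]);
    }

    // Linear search by ID; returns null when the ID is not present.
    static String findSeq(DNA[] records, String id) {
        for (DNA d : records) {
            if (d.getID().equals(id)) {
                return d.getSeq();
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        DNA[] records = readFasta(new FileReader("seq.fasta"));
        BufferedReader console = new BufferedReader(new InputStreamReader(System.in));
        while (true) {
            System.out.print("Enter a sequence ID (or quit): ");
            String input = console.readLine();
            if (input == null || input.trim().equalsIgnoreCase("quit")) {
                break;
            }
            String seq = findSeq(records, input.trim());
            System.out.println(seq != null ? seq : "Warning: ID not found: " + input.trim());
        }
    }
}
```

The sketch collects records in a list while reading and converts to an array at the end, because the number of records is not known until the whole file has been read.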
fastaParse.java
2. PROSITE (http://au.expasy.org/prosite/) is a database of protein domains, families and functional sites. Each PROSITE record is often associated with a pattern or profile that describes the protein domain or functional site. Please look at the record PDOC00300 (http://prosite.expasy.org/PDOC00300), which describes a GATA-type zinc finger domain that binds to DNA sites with the consensus sequence (A/T)GATA(A/G). This type of "zinc finger" domain has the consensus sequence C-x2-C-x17-C-x2-C, which means one Cys, any two amino acids, one Cys, any 17 amino acids, one Cys, any two amino acids, and one Cys. Please use this consensus sequence to write an equivalent regular expression pattern.
1) Record the regular expression
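As a hedged illustration of how the consensus C-x2-C-x17-C-x2-C might translate into a Java regular expression, the sketch below reads each "x" as "any single character" and writes it as `.` with a repeat count. This is one reasonable reading, not the only one: a stricter pattern could restrict "x" to amino-acid letters. The class name ZincFingerRegexSketch, the method firstMatchStart, and the test string are made up for illustration and do not come from a real protein.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ZincFingerRegexSketch {

    // C, any 2 residues, C, any 17 residues, C, any 2 residues, C.
    static final Pattern GATA_ZF = Pattern.compile("C.{2}C.{17}C.{2}C");

    // Returns the 0-based start index of the first match, or -1 if none.
    static int firstMatchStart(String protein) {
        Matcher m = GATA_ZF.matcher(protein);
        return m.find() ? m.start() : -1;
    }

    public static void main(String[] args) {
        // Synthetic sequence: "MK" + C + 2 residues + C + 17 residues + C + 2 residues + C + "K".
        StringBuilder sb = new StringBuilder("MKCAAC");
        for (int i = 0; i < 17; i++) {
            sb.append('A');
        }
        sb.append("CAACK");
        System.out.println(firstMatchStart(sb.toString()));
    }
}
```

Using `find()` rather than `matches()` lets the pattern be located anywhere inside a longer protein sequence instead of requiring the whole string to match.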