Prediction of RNA Polymerase Binding Sites Using Purine-Pyrimidine Encoding and Hybrid Learning Methods
Cheng-Jian Lin a, Chun-Cheng Peng b and Chi-Yung Lee c
a Department of Computer Science and Information Engineering, Chaoyang University of Technology, Wufeng, Taichung County 413, Taiwan, R.O.C.
b School of Computer Science and Information Systems, Birkbeck, University of London, London WC1E 7HX, UK
c Department of Electronic Engineering, Nan Kai College, Caotun, Nantou County 542, Taiwan, R.O.C.
Abstract:
Escherichia coli (E. coli) K12 was sequenced in 1997. Its 4,639,221-base-pair DNA sequence contains 4288 annotated protein-coding genes, 38 percent of which have no attributed function. A major problem in predicting prokaryotic promoters is locating the spacers between the -35 box and the -10 box and between the -10 box and the transcription start site. In this paper, we adopt the expectation maximization (EM) algorithm to locate the promoter regions accurately. A new purine-pyrimidine encoding method is proposed to reduce the dimensionality of the training data, so that the heavy demand on computation and memory can be avoided through the choice of coding factor. The most representative features are then used to train learning vector quantization networks. Simulation results show that the precision of promoter prediction with the proposed encoding is approximately the same as with the traditional encoding method.
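To illustrate why a purine-pyrimidine encoding reduces input dimensionality, the following is a minimal sketch and not the authors' implementation. It assumes the traditional encoding is a one-hot code with four inputs per base, and that the purine-pyrimidine code uses one input per base (1 for A or G, 0 for C or T). The function `group_bits` is a purely hypothetical reading of the "coding factor" as the number of consecutive bases packed into one integer; the abstract does not define it.

```python
# Illustrative sketch only; encodings and the "coding factor" reading are assumptions.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

ONE_HOT = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "T": (0, 0, 0, 1),
}


def one_hot_encode(seq):
    """Assumed traditional encoding: four inputs per base."""
    encoded = []
    for base in seq.upper():
        encoded.extend(ONE_HOT[base])
    return encoded


def purine_pyrimidine_encode(seq):
    """One input per base: 1 for a purine (A, G), 0 for a pyrimidine (C, T)."""
    encoded = []
    for base in seq.upper():
        if base in PURINES:
            encoded.append(1)
        elif base in PYRIMIDINES:
            encoded.append(0)
        else:
            raise ValueError(f"unexpected base: {base!r}")
    return encoded


def group_bits(bits, coding_factor):
    """Hypothetical: read each run of `coding_factor` bits as one binary integer."""
    if len(bits) % coding_factor != 0:
        raise ValueError("sequence length must be a multiple of coding_factor")
    values = []
    for start in range(0, len(bits), coding_factor):
        value = 0
        for bit in bits[start:start + coding_factor]:
            value = value * 2 + bit
        values.append(value)
    return values


if __name__ == "__main__":
    hexamer = "TATAAT"  # -10 box consensus sequence
    print(len(one_hot_encode(hexamer)))           # inputs under one-hot
    bits = purine_pyrimidine_encode(hexamer)
    print(bits)                                   # inputs under purine-pyrimidine
    print(group_bits(bits, 3))                    # inputs after grouping by 3
```

Under these assumptions, a six-base window needs 24 inputs with one-hot coding, 6 with the purine-pyrimidine code, and 2 after grouping with a factor of 3, at the cost of no longer distinguishing A from G or C from T.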
Keywords: E. coli; promoter prediction; purine-pyrimidine; expectation maximization algorithm; learning vector quantization networks.
*Corresponding author; e-mail: cjlin@mail.cyut.edu.tw
© 2004 CSME, ISSN 0257-9731