Predicting Drug-Target Interaction for New Drugs Using Enhanced Similarity Measures and Super-Target Clustering
Jian-Yu Shi a, Siu-Ming Yiu b, Yiming Li c, Henry C. M. Leung b, Francis Y. L. Chin *b
a School of Life Sciences, Northwestern Polytechnical University, No.127, Youyi Road West, Xi'an, Shaanxi, China, 710072
b Department of Computer Science, The University of Hong Kong, Pokfulam Road, Hong Kong
c Department of Psychiatry, The University of Hong Kong, Pokfulam Road, Hong Kong
Abstract: Predicting drug-target interaction using computational approaches is an important step in drug discovery and repositioning. To predict whether there will be an interaction between a drug and a target, most existing methods identify similar drugs and targets in the database. The prediction is then made based on the known interactions of these drugs and targets. This idea is promising. However, there are two shortcomings that have not yet been addressed appropriately. Firstly, most of the methods only use 2D chemical structures and protein sequences to measure the similarity of drugs and targets respectively. However, this information may not fully capture the characteristics determining whether a drug will interact with a target. Secondly, there are very few known interactions, i.e. many interactions are "missing" in the database. Existing approaches are biased towards known interactions and have no good solutions to handle possibly missing interactions which affect the accuracy of the prediction. In this paper, we enhance the similarity measures to include nonstructural (and non-sequence-based) information and introduce the concept of a "super-target" to handle the problem of possibly missing interactions. Based on evaluations on real data, we show that our similarity measure is better than the existing measures and our approach is able to achieve higher accuracy than the two best existing algorithms, WNN-GIP and KBMF2K.
INPUT The INPUT to our method is the set of similarities between the new drug and the known drugs in the datasets. The drug similarity is measured by chemical-structure alignment and ATC, and the target similarity is measured by protein-sequence alignment and FC. Both the similarities between the known drugs and those between the known targets are pre-calculated and built in.
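As an illustration of the protein-sequence alignment component, the following is a minimal sketch of Smith-Waterman local alignment scoring. It is not the authors' implementation: the match, mismatch and linear gap scores are placeholder assumptions (practical protein alignment typically uses a substitution matrix and affine gaps), and the normalization by self-alignment scores in `normalized_similarity` is likewise an assumption made so that the result falls in [0, 1].

```python
from math import sqrt


def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local alignment score between sequences a and b.

    The scoring values are illustrative assumptions, not taken from the paper.
    """
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming table
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def normalized_similarity(a: str, b: str) -> float:
    """Scale the raw score by the geometric mean of the self-alignment scores (assumed)."""
    denom = sqrt(smith_waterman(a, a) * smith_waterman(b, b))
    return smith_waterman(a, b) / denom if denom else 0.0
```

With these placeholder scores, `normalized_similarity("ACGT", "ACGA")` evaluates to 0.75 and any sequence compared with itself scores 1.0.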
OUTPUT The OUTPUT from our method is a set of confidence scores, one for each known target, that denote how likely the new drug is to interact with that target.
Computational Methods Involved The computational methods used in our method include Hattori et al.'s method [1] for drug chemical-structure similarity, the Smith-Waterman protein-sequence alignment algorithm [2] for protein-sequence similarity, agglomerative hierarchical clustering [3] for building the Super-Target, and the KNN classifier [4] for performing drug-target interaction (DTI) predictions. The datasets used for training and evaluating our model were collected by Yamanishi et al. [5] and can be downloaded from their webpage (http://web.kuicr.kyoto-u.ac.jp/supp/yoshi/drugtarget/). Yamanishi et al. [5] collected and integrated drug-target interactions from the KEGG BRITE [6], BRENDA [7], SuperTarget & Matador [8] and DrugBank [9] databases. These datasets have also served as the benchmark for comparing DTI prediction algorithms in subsequent works (e.g. KBMF2K [10] and WNN-GIP [11]). Details of how the drugs, targets and their interactions were collected can be found in the original paper [5]. All benchmark DTIs are split into four datasets according to the type of protein target: enzyme, ion channel (IC), G protein-coupled receptor (GPCR) and nuclear receptor (NR). The numbers of drugs, targets and their known interactions in the corresponding datasets are listed in Table 1.
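To make the clustering step concrete, here is a minimal sketch of agglomerative hierarchical clustering over a target-similarity matrix. The linkage criterion and stopping rule used to build the Super-Target are not stated in this summary, so average linkage and the `threshold` parameter below are assumptions chosen purely for illustration.

```python
from itertools import combinations


def average_linkage_clusters(sim: list[list[float]], threshold: float) -> list[list[int]]:
    """Greedily merge the two most similar clusters until no pair reaches `threshold`.

    `sim` is a symmetric target-by-target similarity matrix. Average linkage and
    the threshold stopping rule are illustrative assumptions.
    """
    clusters = [[i] for i in range(len(sim))]  # start with one cluster per target

    def linkage(c1: list[int], c2: list[int]) -> float:
        # Mean pairwise similarity between members of the two clusters.
        return sum(sim[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        (x, y), best = max(
            (((p, q), linkage(clusters[p], clusters[q]))
             for p, q in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best < threshold:
            break  # the remaining clusters are too dissimilar to merge
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)] + [merged]
    return sorted(sorted(c) for c in clusters)
```

Each returned group of target indices could then be treated as one candidate super-target; how the interactions of its members are pooled is outside the scope of this sketch.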
All ATC codes of the drugs can be found in KEGG (http://www.kegg.jp/kegg/drug/) and all FC codes of the targets can be obtained from the HUGO Gene Nomenclature Committee (http://www.genenames.org/).
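ATC codes are hierarchical, with five levels given by code prefixes of length 1, 3, 4, 5 and 7 (for example, N, N02, N02B, N02BE, N02BE01). One simple way to turn such codes into a drug similarity is to count the levels two codes share. The sketch below does this; the scoring rule is a hypothetical illustration and should not be read as the ATC-based measure actually used in the paper.

```python
ATC_LEVEL_LENGTHS = (1, 3, 4, 5, 7)  # prefix length at each of the five ATC levels


def shared_atc_levels(code_a: str, code_b: str) -> int:
    """Count how many leading ATC levels two codes have in common."""
    shared = 0
    for length in ATC_LEVEL_LENGTHS:
        too_short = len(code_a) < length or len(code_b) < length
        if too_short or code_a[:length] != code_b[:length]:
            break
        shared += 1
    return shared


def atc_similarity(codes_a: list[str], codes_b: list[str]) -> float:
    """Best shared-level fraction over all code pairs (hypothetical scoring rule).

    A drug may carry several ATC codes, so every pair is compared.
    """
    best = max((shared_atc_levels(a, b) for a in codes_a for b in codes_b), default=0)
    return best / len(ATC_LEVEL_LENGTHS)
```

For instance, `atc_similarity(["N02BE01"], ["N02BA01"])` yields 0.6 because the two codes agree on the first three levels only.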
Implementation The input of our method includes pairwise drug similarities, pairwise target similarities and known drug-target interactions. The output is a set of confidence scores that denote how likely each pair of a known target and a new drug is to interact. The whole workflow is illustrated in Fig 1.
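The input/output relationship described above can be sketched with a simple similarity-weighted nearest-neighbour scorer. This is a deliberately reduced illustration: it uses only the drug similarities and the known interactions, omits the target similarities and the Super-Target step, and the weighting scheme and the default `k` are assumptions rather than the settings of the released source code.

```python
def knn_confidence_scores(
    new_drug_sims: list[float],
    interactions: list[list[int]],
    k: int = 3,
) -> list[float]:
    """Score every known target for a new drug from its k most similar known drugs.

    new_drug_sims[i] is the similarity between the new drug and known drug i.
    interactions[i][t] is 1 if known drug i interacts with target t, else 0.
    The similarity-weighted vote is an illustrative assumption.
    """
    n_targets = len(interactions[0])
    # Indices of the k known drugs most similar to the new drug.
    neighbours = sorted(
        range(len(new_drug_sims)), key=lambda i: new_drug_sims[i], reverse=True
    )[:k]
    total = sum(new_drug_sims[i] for i in neighbours)
    if total == 0:
        return [0.0] * n_targets  # no informative neighbour
    return [
        sum(new_drug_sims[i] * interactions[i][t] for i in neighbours) / total
        for t in range(n_targets)
    ]
```

Sorting the targets by the returned scores gives a ranked list of candidate interactions for the new drug.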
Download: SourceCodes
References