
A Study of Pattern-Based Clustering

吳崢榕 (Wu, Jeng-Rung); 李建億 (Chien-I Lee); Master's Program, Graduate Institute of Information Education; 2004

Available online

  • Title:
    A Study of Pattern-Based Clustering
  • Author: 吳崢榕 (Wu, Jeng-Rung)
  • 李建億 (Chien-I Lee); Master's Program, Graduate Institute of Information Education
  • Subjects: data mining; similarity; pattern-based; clustering; pCluster
  • Description: Clustering groups objects according to the "similarity" between them, so that similar objects are gathered in the same cluster as far as possible. Similarity is usually defined in terms of the distance between objects, for example Euclidean distance. In gene clustering, however, distance-based similarity is not suitable, because genes that are not close to one another in distance may still exhibit a similar, coherent pattern. A new clustering model, pattern-based clustering (called "pCluster"), was therefore proposed to address this problem. Whether two objects belong to the same pCluster depends on whether they show a coherent pattern on a subset of their attributes. Finding pClusters by such pairwise comparison requires a great deal of computation time, so finding them accurately and efficiently in large volumes of data is a question well worth investigating. Existing methods perform poorly on large data sets and on data sets whose clusters overlap heavily. This thesis therefore proposes a new algorithm, PCP (pCluster plus), which speeds up pCluster discovery mainly by reducing unnecessary comparisons between objects. It also proposes methods for finding pClusters efficiently when the data are shifted (Shift) and when new data are added (Incremental). Experiments show that the proposed methods achieve better performance than existing ones.
    Master's thesis
  • Date created: 2004
  • Format: 121 bytes
    text/html
  • Language: Chinese
  • Identifier: http://nutnr.lib.nutn.edu.tw/handle/987654321/4654
  • Source: NUTN IR
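
The description contrasts distance-based similarity with coherent patterns on a subset of attributes. A minimal Python sketch of that contrast follows. It assumes the pScore definition commonly used for the pCluster model (the absolute difference between two objects' changes across a pair of attributes, bounded by a threshold delta). The record itself does not state this formula, and the sketch is not the thesis's PCP algorithm. The names `p_score`, `is_delta_pcluster`, and the sample gene vectors are hypothetical.

```python
from itertools import combinations
import math


def p_score(x, y, a, b):
    """Assumed pScore of the 2x2 submatrix of objects x, y on attributes a, b."""
    return abs((x[a] - x[b]) - (y[a] - y[b]))


def is_delta_pcluster(objects, attrs, delta):
    """True if every pair of objects is coherent (pScore <= delta)
    on every pair of the chosen attributes. Brute-force pairwise check."""
    return all(
        p_score(x, y, a, b) <= delta
        for x, y in combinations(objects, 2)
        for a, b in combinations(attrs, 2)
    )


# Hypothetical expression levels over four conditions.
gene_1 = [10.0, 40.0, 20.0, 35.0]
gene_2 = [110.0, 140.0, 120.0, 135.0]  # gene_1 shifted by +100
gene_3 = [10.0, 12.0, 40.0, 11.0]      # numerically closer, different shape

# gene_2 is far from gene_1 in Euclidean distance, gene_3 is nearer ...
far = math.dist(gene_1, gene_2)
near = math.dist(gene_1, gene_3)

# ... yet only gene_1 and gene_2 share a coherent (shifted) pattern.
coherent = is_delta_pcluster([gene_1, gene_2], range(4), delta=1.0)
incoherent = is_delta_pcluster([gene_1, gene_3], range(4), delta=1.0)
```

The check compares every pair of objects on every pair of attributes, which illustrates why the description calls pairwise comparison expensive and why reducing unnecessary comparisons matters.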
