Chinese Academy of Sciences Institutional Repositories Grid: P2P Protocol Identification Based on Semi-Supervised Learning

Chinese Academy of Sciences Institutional Repositories Grid

P2P Protocol Identification Based on Semi-Supervised Learning (基于半监督学习的P2P协议识别)

Document type: Thesis


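Author	谭炜 (Tan Wei)
Degree	Master's
Defense date	2008-06-03
Degree-granting institution	Graduate School of the Chinese Academy of Sciences (中国科学院研究生院)
Place conferred	Institute of Software, Chinese Academy of Sciences (中国科学院软件研究所)
Advisor	吴健 (Wu Jian)
Keywords	clustering; semi-supervised; P2P applications; protocol identification; feature weights; Newton-Raphson; KD-Tree
Major	Computer Software and Theory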
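Abstract (translated from Chinese)	Protocol identification is an important prerequisite for effective network management and control. Newer P2P software (represented by Skype, Emule, BitComet, and 迅雷 (Xunlei)) has begun to use encrypted protocols, protocol masquerading, and similar techniques to avoid being detected, identified, and blocked by network administrators, so the traditional approach of identifying protocols by their signatures can no longer reliably identify the traffic these applications generate. Identifying P2P protocols from traffic characteristics is currently the main research direction, and applying machine learning theory and models to protocol identification is a growing trend. These methods analyze transport-layer packets (both TCP and UDP) and combine the analysis with the traffic characteristics that P2P systems exhibit in order to decide whether a given network flow belongs to P2P. They include TCP/UDP port identification, network diameter analysis, node role analysis, protocol pair analysis, and address-port pair analysis, but their accuracy and identification rate are lower than those of signature-based identification. This thesis analyzes and experimentally evaluates the feasibility of applying a semi-supervised clustering model to the identification of specific P2P applications, and proposes a training method that learns a feature weight matrix with the Newton-Raphson method, which further improves the system's precision and recall on top of connection features selected according to the characteristics of P2P applications. In the experimental environment of this thesis, the identifiers for the specific applications BitComet and Emule both reached an identification rate and a recall of about 85%, a good result for encrypted protocols. How to optimize the system's precision and recall and improve its efficiency is the central problem this thesis studies and attempts to solve. The main results fall into three areas. First, the thesis tests and analyzes how well a clustering model based on semi-supervised learning identifies encrypted P2P applications, and summarizes a method for analyzing P2P protocol features. Second, it introduces the Newton-Raphson method into the selection of connection features and uses the feature weight matrix in the distance computation, which further improves training and identification results. Third, implementing the identifier on a KD-Tree allows the whole online identification process to run efficiently at the protocol layer inside the kernel, which effectively keeps the system's computational complexity under control.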
Call number	Not available
English abstract	Network traffic identification is important for network administration, and P2P identification especially so. In recent years, however, some P2P applications have adopted encrypted protocols, which makes protocol identification harder. Payload-based methods can no longer classify all protocols, and more and more techniques based on machine learning are being used for protocol identification. Transport-layer features, especially flow features, are the most commonly used features today, and many identification frameworks are built on them, using port, node role, IP/port features, network diameter, and so on. But low accuracy and high system complexity hinder the development of these systems. In this thesis, a semi-supervised clustering model is built for classifying P2P applications. The thesis concentrates on how to achieve higher accuracy and on feature selection. First, we analyze the system's performance in identifying specific P2P applications. A method of feature selection for P2P protocols is summarized, and a detailed test achieves an accuracy of 85%. Second, a matrix A is applied to the distance function so that the features contribute differently to the distance, which yields higher accuracy. The Newton-Raphson method is used to find this A. Third, the aim of this system is to build an online protocol classifier in the kernel, so complexity and identification speed are very important. A KD-Tree is used in the identification process to keep the computational complexity low.
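Release date	2011-03-17
Classification number	Not available
Source URL	[http://124.16.136.157/handle/311060/6726]
Collection	Institute of Software_National Engineering Research Center of Fundamental Software_Theses
Recommended citation (GB/T 7714)	谭炜 (Tan Wei). P2P Protocol Identification Based on Semi-Supervised Learning (基于半监督学习的P2P协议识别)[D]. Institute of Software, Chinese Academy of Sciences. Graduate School of the Chinese Academy of Sciences. 2008.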

Ingest method: OAI harvesting

Source: Institute of Software
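
The abstract in this record describes a distance function weighted by a matrix A and a KD-Tree used for fast online identification. The thesis's actual implementation is not reproduced here, so the following is only a minimal sketch under stated assumptions: A is taken to be diagonal, the feature values, weights, and cluster centroids are hypothetical, and the functions `scale`, `build_kdtree`, `nearest`, and `classify` are illustrative names. With a diagonal A, scaling each feature by the square root of its weight turns the weighted distance into an ordinary Euclidean distance, so a plain KD-Tree can answer nearest-cluster queries.

```python
import math

def scale(point, weights):
    """Map x to sqrt(A) x for an assumed diagonal weight matrix A = diag(weights)."""
    return tuple(math.sqrt(w) * v for w, v in zip(weights, point))

def build_kdtree(items, depth=0):
    """Build a KD-Tree from a list of (point, label) pairs; returns None when empty."""
    if not items:
        return None
    axis = depth % len(items[0][0])
    items = sorted(items, key=lambda it: it[0][axis])
    mid = len(items) // 2
    return {
        "item": items[mid],
        "axis": axis,
        "left": build_kdtree(items[:mid], depth + 1),
        "right": build_kdtree(items[mid + 1:], depth + 1),
    }

def nearest(node, query, best=None):
    """Return (squared distance, label) of the stored point closest to query."""
    if node is None:
        return best
    point, label = node["item"]
    d2 = sum((p - q) ** 2 for p, q in zip(point, query))
    if best is None or d2 < best[0]:
        best = (d2, label)
    diff = query[node["axis"]] - point[node["axis"]]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = nearest(near, query, best)
    # Visit the far side only if the splitting plane is closer than the best match so far.
    if diff ** 2 < best[0]:
        best = nearest(far, query, best)
    return best

# Hypothetical feature weights and labelled cluster centroids (not taken from the thesis).
weights = (1.0, 0.25, 4.0)
centroids = [
    ((0.2, 0.9, 0.1), "BitComet"),
    ((0.8, 0.1, 0.5), "Emule"),
    ((0.5, 0.5, 0.9), "other"),
    ((0.1, 0.2, 0.7), "other"),
]
tree = build_kdtree([(scale(p, weights), label) for p, label in centroids])

def classify(flow_features):
    """Label a flow with the label of its nearest centroid under the weighted distance."""
    return nearest(tree, scale(flow_features, weights))[1]

print(classify((0.25, 0.8, 0.15)))
```

In this sketch the weights are fixed by hand; the record states that the thesis learns them with the Newton-Raphson method, but it does not give the objective being optimized, so that step is left out.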


Unless otherwise stated, all content in this system is protected by copyright, and all rights are reserved.