Chinese Academy of Sciences Institutional Repositories Grid
语音驱动的人脸动画关键技术研究

Document Type: Dissertation

Author: Chen Ke
Degree Level: Doctoral
Defense Date: 2005
Degree Grantor: Institute of Acoustics, Chinese Academy of Sciences
Place of Conferral: Institute of Acoustics, Chinese Academy of Sciences
Keywords: 3D Reconstruction; Mesh Fitting; Facial Texture Mapping; Lip Modeling; Coarticulation in Lip Animation
Alternative Title: Key Technologies Research for Speech Driven Facial Animation
Chinese Abstract: In human-machine interaction, traditional single-modal speech technology is severely limited by noise, which degrades interaction performance. Research has shown that visual information highly correlated with speech strongly complements human auditory perception and improves the naturalness of human-machine interaction. Speech-driven facial animation therefore receives great international attention and is a research focus in areas such as multimodal human-machine interfaces, multimedia video conferencing, and networked virtual anchors. This dissertation studies the key technologies of speech-driven facial animation; its main contributions are as follows:
1. A method for 3D reconstruction of the human head from photos taken from different viewpoints. A set of feature points is selected in 2-5 photos of different viewpoints and a projective reconstruction of these points is computed; the projective reconstruction is then upgraded to a metric reconstruction using the dual absolute conic as the camera-calibration entity, and the result is refined with the Levenberg-Marquardt (LM) algorithm. The method achieves robust and efficient reconstruction while avoiding prior camera calibration.
2. A mesh-fitting method based on Dirichlet Free-Form Deformation (DFFD). Exploiting the fact that DFFD control points can be placed freely, three sets of control points are chosen: facial feature points defined by the MPEG-4 standard, points on the head contour, and the vertices of a cube enclosing the face model. Deforming the control-point sets interpolates the deformation of the mesh, giving control over the deformation of the 3D face model.
3. A facial texture-mapping method based on scattered-data interpolation. All mesh vertices are projected onto a plane, the face region is partitioned by the Delaunay triangulation of the feature points, and the texture coordinates of the remaining vertices are interpolated from their relative positions within the triangulation. With well-chosen, accurate feature points, satisfactory texture mapping is achieved independently of mesh-fitting accuracy.
4. A parametric deformable-template method for lip modeling. Using an active-shape-model approach, PCA is applied to the annotated lip points, yielding a deformable template that needs as few as 3 to 4 control modes as parameters and supports flexible, large-range lip deformation; it is suitable not only as a model for lip animation but also extends to the 3D case for deforming the whole face.
5. A modified speech-driven coarticulation model for lip animation. Building on improvements to Cohen's dominance function, a silence dominance function describes pauses between articulation segments, a temporal-resistance function handles the lip shapes of labiodentals, fricatives, and similar phonemes, and a shape function models certain special lip movements, so that the modified dominance function produces more natural and fluent lip animation.
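The coarticulation model in contribution 5 builds on the Cohen-Massaro dominance-function scheme, in which each phoneme segment exerts an exponentially decaying influence around its center time and the lip target at any instant is the dominance-weighted average of the segment targets. A minimal sketch of that baseline blending (all parameter values and the lip-opening targets are illustrative assumptions, not the dissertation's):

```python
import math

def dominance(t, center, alpha=1.0, theta=0.05):
    """Cohen-Massaro style dominance: peaks at the segment's center time
    and decays exponentially with temporal distance (illustrative params)."""
    return alpha * math.exp(-theta * abs(t - center))

def blended_lip_opening(t, segments):
    """Lip-opening target at time t (ms): dominance-weighted average over
    segments given as (center_ms, target_opening, alpha, theta) tuples."""
    num = den = 0.0
    for center, target, alpha, theta in segments:
        d = dominance(t, center, alpha, theta)
        num += d * target
        den += d
    return num / den

# a closed /m/ at t=0 ms followed by an open /a/ at t=200 ms
segments = [(0.0, 0.1, 1.0, 0.05), (200.0, 0.9, 1.0, 0.05)]
print(blended_lip_opening(0.0, segments))    # dominated by /m/: near 0.1
print(blended_lip_opening(100.0, segments))  # midway, equal weights: 0.5
```

The dissertation's modifications (silence dominance, temporal resistance, shape functions) would enter as additional segment types and multiplicative terms on `dominance`.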
English Abstract: During human-machine interaction, traditional single-modal speech technology shows its disadvantages in noisy environments. Research has shown that speech-correlated visual information can greatly improve human auditory perception and increase the naturalness of human-machine interaction. The technology of speech-driven facial animation has therefore been highly regarded in recent years, and it is a hot research field for multi-modal human-machine interfaces, multimedia video conferencing, web virtual anchors, and so on. The main contributions of this dissertation are:
1. An algorithm to reconstruct the 3D human head from several photos taken from different viewpoints. After feature points are picked in 2-5 photos of different viewpoints, a projective reconstruction of the feature points is computed. The projective reconstruction is then upgraded to a metric reconstruction by taking the dual absolute conic as the calibration object. Finally, the reconstruction result is optimized with the Levenberg-Marquardt (LM) algorithm. The algorithm reconstructs robustly and efficiently while avoiding pre-calibration of the camera.
2. A face-mesh fitting algorithm based on Dirichlet Free-Form Deformation (DFFD). The control points in DFFD can be placed arbitrarily, which is its greatest advantage. Three sets of control points are selected: the MPEG-4 facial definition points (FDP), points on the edge of the head, and the vertices of a cube enclosing the model. The mesh deformation is obtained by interpolating the control points' displacements, yielding the deformation of the 3D facial model.
3. A facial texture-mapping method based on the interpolation of scattered points. First, all mesh vertices are projected onto a plane and a Delaunay triangulation of the projected feature points is imposed on the whole face area. The remaining vertices obtain their texture coordinates from their natural-neighbor coordinates in the triangulation, so the mapping result does not depend on the accuracy of mesh fitting. As long as the feature points are well distributed, satisfying texture mapping can be achieved.
4. A lip model for speech-driven animation based on the active shape model. Principal component analysis of the points marked on the lips yields an active shape model that needs only 3 to 4 parameters to control the deformation. The model describes a wide range of lip geometries and is easy to control, which makes it not only a good model for lip animation but also suitable, in the 3D case, for describing the whole human face.
5. A modified coarticulation model for speech-driven lip animation that improves Cohen's dominance function. First, a silence dominance function describes the pauses between articulation segments. Second, a temporal-resistance function guarantees the articulated lip positions of labiodentals and fricatives. Third, a shape function shapes certain specific lip trajectories. The modified dominance function synthesizes lip animation more naturally and fluently.
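The deformable lip template of contribution 4 reduces annotated lip landmarks to a handful of PCA modes. A self-contained sketch of that idea on synthetic landmark data (the dimensions, noise level, and latent-mode construction are invented for illustration; the dissertation trains on real annotated lip images):

```python
import numpy as np

rng = np.random.default_rng(0)
n_shapes, n_coords = 200, 40                  # 20 lip landmarks x (x, y)

# synthetic training set: mean shape + 3 latent deformation modes + noise
true_modes = rng.standard_normal((3, n_coords))
coeffs = rng.standard_normal((n_shapes, 3))
mean_shape = rng.standard_normal(n_coords)
shapes = (mean_shape + coeffs @ true_modes
          + 0.01 * rng.standard_normal((n_shapes, n_coords)))

# PCA via SVD of the centered data matrix (the active-shape-model step)
mean = shapes.mean(axis=0)
U, S, Vt = np.linalg.svd(shapes - mean, full_matrices=False)
explained = S**2 / np.sum(S**2)               # per-mode variance fraction

def synthesize(params):
    """New lip shape from a few control parameters: the deformable template."""
    return mean + params @ Vt[:len(params)]

# project a training shape onto the first 3 modes and reconstruct it
params = (shapes[0] - mean) @ Vt[:3].T
recon = synthesize(params)
```

Because the data really is driven by 3 latent modes, the first 3 principal components capture nearly all of the variance, mirroring the thesis's finding that 3 to 4 control parameters suffice for a wide range of lip deformations.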
Language: Chinese
Date Available: 2011-05-07
Pages: 99
Source URL: [http://159.226.59.140/handle/311008/1044]
Collection: Institute of Acoustics, Doctoral and Master's Dissertations, 1981-2009
Recommended Citation (GB/T 7714):
Chen Ke. Key Technologies Research for Speech Driven Facial Animation [D]. Institute of Acoustics, Chinese Academy of Sciences, 2005.

Deposit Method: OAI Harvesting

Source: Institute of Acoustics


Unless otherwise noted, all content in this system is protected by copyright, with all rights reserved.