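Unity’s official manual has a page on optimizing graphics performance, and it is without doubt one of the best guides to optimizing Unity projects. Unity圣典 (a Chinese-language Unity documentation site) once translated an older version, but that translation is badly out of date and differs a lot from the current page. Below I have tried my hand at the latest version.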
Optimizing Graphics Performance
Good performance is critical to the success of many games. Below are some simple guidelines for maximizing the speed of your game’s graphical rendering.
The graphical parts of your game mainly consume resources on two parts of the computer: the GPU and the CPU. The first rule of any optimization is to find where the performance problem is, because the strategies for optimizing for the GPU and for the CPU are quite different. They can even be opposite: it is quite common to make the GPU do more work while optimizing for the CPU, and vice versa.
Typical bottlenecks and ways to check for them:
Of course, these are only rules of thumb; the bottleneck could just as well be somewhere else. Less typical bottlenecks:
To render any object on the screen, the CPU has some work to do: figuring out which lights affect that object, setting up the shader and its parameters, and sending drawing commands to the graphics driver, which then prepares the commands to be sent to the graphics card. All this per-object CPU cost is not cheap, so if you have many visible objects, it adds up.
For example, if you have a thousand triangles, it is much, much cheaper to have them all in one mesh than to have a thousand individual meshes of one triangle each. The GPU cost of the two scenarios is very similar, but the CPU work needed to render a thousand objects instead of one is significant.
To make the CPU do less work, it is good to reduce the visible object count:
Combine objects so that each mesh has at least several hundred triangles and uses only one Material for the entire mesh. It is important to understand that combining two objects which don’t share a material gives you no performance increase at all. The most common reason for having multiple materials is that two meshes don’t share the same textures, so to optimize CPU performance, make sure that any objects you combine share the same textures.
However, when using many pixel lights in the Forward rendering path, there are situations where combining objects may not make sense, as explained below.
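To see why combining only pays off when materials match, here is a toy draw-call count in Python. It assumes, as a simplification, that every distinct material on a mesh costs one draw call, both before and after combining; the function names and the scene are hypothetical and this is not Unity’s batching code.

```python
def draw_calls_separate(objects):
    """Nothing combined: one draw call per distinct material on each object."""
    return sum(len(set(materials)) for _, materials in objects)


def draw_calls_combined(objects):
    """Everything merged into one mesh (assumed simplification).

    The merged mesh still needs one draw call per material it uses, so
    merging objects that share no material saves nothing.
    """
    return len({material for _, materials in objects for material in materials})


# Hypothetical scene: the two rocks share the "stone" material.
scene = [
    ("rock_a", ["stone"]),
    ("rock_b", ["stone"]),
    ("crate", ["wood"]),
    ("barrel", ["metal"]),
]
print(draw_calls_separate(scene), draw_calls_combined(scene))  # 4 3
```

Only the two rocks collapse into one call; the crate and barrel keep their own calls because their materials differ. A thousand one-triangle meshes sharing a single material would go from 1000 calls to 1.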
When optimizing the geometry of a model, there are two basic rules:
Note that the actual number of vertices that graphics hardware has to process is usually not the same as the number reported by a 3D application. Modeling applications usually display the geometric vertex count, i.e. the number of distinct corner points that make up a model. For a graphics card, however, some geometric vertices need to be split into two or more logical vertices for rendering. A vertex must be split if it has multiple normals, UV coordinates or vertex colors. Consequently, the vertex count in Unity is almost always higher than the count given by the 3D application.
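The split can be counted directly. The sketch below assumes each face corner is stored as a tuple of attributes (here position and normal; UVs or colors would just be more tuple entries), so the modeling-tool count is the number of distinct positions while the GPU count is the number of distinct attribute tuples. The helper names are hypothetical, and a real importer has more rules than this.

```python
from itertools import product


def flat_shaded_cube():
    """Return the 24 face corners of a cube as (position, normal) pairs.

    Each of the 6 faces has 4 corners and its own face normal, so every
    geometric corner is shared by 3 faces with 3 different normals.
    """
    corners = []
    for axis in range(3):
        for sign in (-1, 1):
            normal = tuple(sign if i == axis else 0 for i in range(3))
            u_axis, v_axis = (i for i in range(3) if i != axis)
            for u, v in product((-1, 1), repeat=2):
                pos = [0, 0, 0]
                pos[axis], pos[u_axis], pos[v_axis] = sign, u, v
                corners.append((tuple(pos), normal))
    return corners


def geometric_vertex_count(corners):
    """What a modeling tool typically reports: distinct positions."""
    return len({position for position, *_ in corners})


def render_vertex_count(corners):
    """What the GPU processes: distinct combinations of all attributes."""
    return len(set(corners))


cube = flat_shaded_cube()
print(geometric_vertex_count(cube), render_vertex_count(cube))  # 8 24
```

With hard edges every corner splits three ways; giving each corner a single shared normal (smooth shading) brings the render count back down to 8.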
While the amount of geometry in the models matters mostly for the GPU, some features in Unity also process models on the CPU, for example mesh skinning.
Lighting that is not computed at all is always the fastest! Use Lightmapping to “bake” static lighting just once, instead of computing it every frame. Generating a lightmapped environment takes only a little longer than just placing a light in the scene in Unity, but:
In many cases, simple tricks in shaders and content can stand in for adding more lights all over the place. For example, instead of adding a light that shines straight into the camera to get a “rim lighting” effect, consider adding a dedicated “rim lighting” computation directly into your shaders.
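A dedicated rim term is cheap: a common formulation is `pow(1 - saturate(dot(N, V)), rimPower)`, where N is the surface normal and V the direction to the camera. The Python below only reproduces that per-pixel arithmetic so its behavior is easy to check; in a real project it would live in the fragment or surface shader, and `rim_power` is an assumed tuning parameter. It costs a single pow per pixel, which stays within a budget of one transcendental function per pixel.

```python
import math


def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def rim_term(normal, view_dir, rim_power=3.0):
    """Rim factor: 0 where the surface faces the camera, 1 at the silhouette."""
    n = normalize(normal)
    v = normalize(view_dir)
    n_dot_v = max(0.0, min(1.0, sum(a * b for a, b in zip(n, v))))  # saturate(dot(N, V))
    return (1.0 - n_dot_v) ** rim_power


print(rim_term((0, 0, 1), (0, 0, 1)))  # 0.0, facing the camera
print(rim_term((1, 0, 0), (0, 0, 1)))  # 1.0, at the silhouette
```

Multiplying this factor by a rim color and adding it to the shader’s output gives the edge glow without an extra light, and therefore without an extra per-pixel lighting pass.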
Per-pixel dynamic lighting adds significant rendering overhead to every affected pixel and can cause objects to be rendered in multiple passes. On less powerful devices, such as mobile or low-end PC GPUs, avoid having more than one Pixel Light illuminate any single object, and use lightmaps to light static objects instead of calculating their lighting every frame. Per-vertex dynamic lighting can add significant cost to vertex transformations. Try to avoid situations where multiple lights illuminate any given object.
If you use pixel lighting, each mesh has to be rendered as many times as there are pixel lights illuminating it. If you combine two meshes that are very far apart, you increase the effective size of the combined object. All pixel lights that illuminate any part of the combined object are taken into account during rendering, so the number of rendering passes needed can increase. Generally, the number of passes needed to render the combined object is the sum of the passes for each of the separate objects, so nothing is gained by combining. For this reason, you should not combine meshes that are far enough apart to be affected by different sets of pixel lights.
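The pass arithmetic can be sketched with a deliberately simplified model: objects and lights live on a 1D line, a light affects an object when its radius reaches the object’s bounds, and each object needs one pass per affecting pixel light (at least one). All names are hypothetical; Unity’s real test uses 3D bounds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PointLight:
    name: str
    x: float       # position along a 1D track (simplification)
    radius: float  # distance at which the light stops having an effect


def lights_affecting(bounds, lights):
    lo, hi = bounds
    return [l for l in lights if l.x + l.radius >= lo and l.x - l.radius <= hi]


def passes(bounds, lights):
    # Assumed model: one pass per pixel light touching the bounds, minimum 1.
    return max(1, len(lights_affecting(bounds, lights)))


def combine(*bounds_list):
    """Bounds of a combined mesh: the span covering all parts."""
    return (min(lo for lo, _ in bounds_list), max(hi for _, hi in bounds_list))


lights = [PointLight("lamp_a", 1.0, 5.0),
          PointLight("lamp_b", 101.0, 5.0),
          PointLight("lamp_mid", 50.0, 10.0)]
mesh_a, mesh_b = (0.0, 2.0), (100.0, 102.0)

separate = passes(mesh_a, lights) + passes(mesh_b, lights)
combined = passes(combine(mesh_a, mesh_b), lights)
print(separate, combined)  # 2 3
```

Apart, each mesh is touched by one lamp. Combined, the merged bounds reach all three lamps, including `lamp_mid`, which lit neither mesh, and every one of those passes now draws the whole merged mesh.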
During rendering, Unity finds all lights surrounding a mesh and calculates which of them affect it most. The Quality Settings control how many of those lights end up as pixel lights and how many as vertex lights. Each light’s importance is calculated from how far away it is from the mesh and how intense its illumination is. Furthermore, some lights are more important than others purely because of the game context. For this reason, every light has a Render Mode setting that can be set to Important or Not Important; lights marked Not Important typically have a lower rendering overhead.
As an example, consider a driving game where the player’s car is driving in the dark with its headlights switched on. The headlights are likely to be the most visually significant light sources in the game, so their Render Mode would probably be set to Important. On the other hand, there may be other, less important lights in the game (other cars’ rear lights, say) that don’t improve the visual effect much by being pixel lights. The Render Mode for such lights can safely be set to Not Important to avoid wasting rendering capacity where it gives little benefit.
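The selection logic for the driving-game example can be sketched as follows. Assumptions, labeled loudly: Important lights are always per-pixel, Not Important lights are always per-vertex, and lights left on an automatic setting fill whatever remains of the Quality Settings pixel-light budget, ranked by a made-up `intensity / (1 + distance**2)` score. Unity’s actual ranking formula is not given in this text, and all names here are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SceneLight:
    name: str
    render_mode: str   # "Important", "Auto" or "NotImportant"
    intensity: float
    distance: float    # distance from the mesh being lit


def assign_lights(lights, pixel_light_count):
    """Split lights into (pixel, vertex) name lists for one mesh."""
    pixel = [l for l in lights if l.render_mode == "Important"]
    vertex = [l for l in lights if l.render_mode == "NotImportant"]
    auto = sorted(
        (l for l in lights if l.render_mode == "Auto"),
        key=lambda l: l.intensity / (1.0 + l.distance ** 2),  # assumed score
        reverse=True,
    )
    budget = max(0, pixel_light_count - len(pixel))
    pixel += auto[:budget]
    vertex += auto[budget:]
    return [l.name for l in pixel], [l.name for l in vertex]


car_lights = [
    SceneLight("headlight_left", "Important", 3.0, 1.0),
    SceneLight("headlight_right", "Important", 3.0, 1.0),
    SceneLight("rear_light", "NotImportant", 0.5, 8.0),
    SceneLight("street_lamp", "Auto", 2.0, 10.0),
]
print(assign_lights(car_lights, pixel_light_count=2))
```

With a budget of two, the headlights take both pixel slots and the street lamp falls back to per-vertex; the rear light stays per-vertex regardless of the budget.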
Optimizing per-pixel lighting saves work on both the CPU and the GPU: the CPU has fewer draw calls to issue, and the GPU has fewer vertices to process and fewer pixels to rasterize for all those additional object renders.
Using Compressed Textures decreases the size of your textures (resulting in faster load times and a smaller memory footprint) and can also dramatically increase rendering performance. Compressed textures use only a fraction of the memory bandwidth needed for uncompressed 32-bit RGBA textures.
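The “fraction” is easy to quantify. The sketch assumes the desktop block-compressed formats DXT1 (BC1, 4 bits per pixel) and DXT5 (BC3, 8 bits per pixel); mobile formats such as ETC or PVRTC have their own rates and are not modeled. It computes only the top mip level and ignores padding of tiny textures to whole 4x4 blocks.

```python
BITS_PER_PIXEL = {
    "RGBA32": 32,  # uncompressed: 8 bits x 4 channels
    "DXT5": 8,     # BC3: 16-byte block per 4x4 pixels
    "DXT1": 4,     # BC1: 8-byte block per 4x4 pixels
}


def texture_bytes(width, height, fmt):
    """Size in bytes of the top mip level of a width x height texture."""
    return width * height * BITS_PER_PIXEL[fmt] // 8


for fmt in ("RGBA32", "DXT5", "DXT1"):
    print(fmt, texture_bytes(1024, 1024, fmt))
# RGBA32 4194304
# DXT5 1048576
# DXT1 524288
```

A 1024x1024 texture drops from 4 MiB to 1 MiB (DXT5) or 512 KiB (DXT1), and each texel the GPU fetches reads correspondingly fewer bytes, which is where the bandwidth saving comes from.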
As a rule of thumb, always enable Generate Mip Maps for textures used in a 3D scene. In the same way that texture compression helps limit the amount of texture data transferred when the GPU is rendering, a mipmapped texture lets the GPU use a lower-resolution version of the texture for smaller triangles.
The only exception to this rule is when a texel (texture pixel) is known to map 1:1 to a rendered screen pixel, as with UI elements or in a 2D game.
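The sketch below shows the cost and the benefit of a mip chain for a square power-of-two texture: the extra memory is about one third, and the level chosen grows with the number of texels that land on each screen pixel. The 1D level estimate is a simplification (GPUs derive it per pixel from UV derivatives), and the function names are hypothetical.

```python
import math


def mip_level_count(size):
    """Levels in a full mip chain for a square power-of-two texture."""
    return int(math.log2(size)) + 1


def mip_chain_texels(size):
    """Total texels across all levels; each level halves width and height."""
    return sum((size >> level) ** 2 for level in range(mip_level_count(size)))


def approx_mip_level(texture_size, on_screen_size):
    """Rough level choice: log2 of texels per screen pixel, clamped at 0."""
    return max(0.0, math.log2(texture_size / on_screen_size))


print(mip_level_count(1024))                    # 11
print(mip_chain_texels(1024) / 1024 ** 2)       # ~1.333, about one third extra
print(approx_mip_level(1024, 256))              # 2.0: a 256x256 version is used
print(approx_mip_level(256, 256))               # 0.0: 1:1 mapping, as in UI
```

The last line is the exception described above: when texels map 1:1 to screen pixels, level 0 is always chosen, so the extra mip levels only cost memory.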
In some games, it may be appropriate to cull small objects more aggressively than large ones, to reduce both CPU and GPU load. For example, small rocks and debris could be made invisible at long distances while large buildings remain visible.
This can be achieved either with the Level of Detail (LOD) system or by setting manual per-layer culling distances on the camera. You could put small objects into a separate layer and set up per-layer cull distances using the Camera.layerCullDistances script property.
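In a project, Camera.layerCullDistances is assigned from a C# script as an array with one entry per layer (32 in total), where 0 means “use the camera’s far clip plane”. The Python below only models that visibility rule so it can be checked in isolation; the layer number for small props is a hypothetical choice, and plain distance from the camera stands in for whatever metric the engine actually uses.

```python
NUM_LAYERS = 32  # Unity has 32 layers


def is_culled(layer, distance, layer_cull_distances, far_clip_plane):
    """True if an object on `layer` at `distance` from the camera is culled.

    An entry of 0 means the layer has no custom limit and falls back to the
    far clip plane.
    """
    limit = layer_cull_distances[layer] or far_clip_plane
    return distance > limit


SMALL_PROPS_LAYER = 8          # hypothetical layer for rocks and debris
cull = [0.0] * NUM_LAYERS
cull[SMALL_PROPS_LAYER] = 50.0  # small props vanish beyond 50 units
FAR = 1000.0

print(is_culled(SMALL_PROPS_LAYER, 80.0, cull, FAR))  # True: distant debris
print(is_culled(0, 80.0, cull, FAR))                  # False: building still drawn
```

Only the small-props layer gets a short limit; everything else keeps the far clip plane, which matches the rocks-versus-buildings example.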
Realtime shadows look nice, but they can cost a lot of performance, both in extra draw calls for the CPU and in extra processing on the GPU. For further details, see the Shadows page.
A high-end PC GPU and a low-end mobile GPU can literally be hundreds of times apart in performance. The same is true even within a single platform: on PC, a fast GPU can be dozens of times faster than a slow integrated GPU, and on mobile platforms you can see just as large a difference between GPUs.
So keep in mind that GPU performance on mobile platforms and low-end PCs will be much lower than on your development machine. Typically, shaders need to be hand-optimized to reduce calculations and texture reads in order to get good performance. For example, some built-in Unity shaders have “mobile” equivalents that are much faster (but have some limitations or approximations; that is what makes them faster).
Below are some guidelines that matter most for mobile and low-end PC graphics cards:
Transcendental mathematical functions (such as pow, exp, log, cos, sin and tan) are quite expensive, so a good rule of thumb is to have no more than one such operation per pixel. Consider using lookup textures as an alternative where applicable.
It is not advisable, however, to write your own normalize, dot or inversesqrt operations. If you use the built-in ones, the driver will generate much better code for you.
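A lookup texture trades a transcendental call for a texture fetch: the function is evaluated once at build time, and the shader samples it with hardware filtering. The Python below mimics that with a 256-entry table for `x ** 16` (a hypothetical specular exponent) and linear interpolation standing in for texture filtering; names and table size are illustrative assumptions.

```python
def build_lut(fn, size=256):
    """Precompute fn at `size` evenly spaced points on [0, 1], like a 1D texture."""
    return [fn(i / (size - 1)) for i in range(size)]


def sample_lut(lut, x):
    """Clamp x to [0, 1] and interpolate linearly, as texture filtering would."""
    x = min(max(x, 0.0), 1.0)
    pos = x * (len(lut) - 1)
    i = int(pos)
    if i >= len(lut) - 1:
        return lut[-1]
    t = pos - i
    return lut[i] * (1.0 - t) + lut[i + 1] * t


SPECULAR_POWER = 16  # hypothetical exponent baked into the table
specular_lut = build_lut(lambda x: x ** SPECULAR_POWER)

samples = [k / 1000 for k in range(1001)]
max_error = max(abs(sample_lut(specular_lut, x) - x ** SPECULAR_POWER) for x in samples)
print(max_error < 1e-3)  # True
```

For this curve, linear interpolation over 256 entries keeps the error under about 5e-4, well below what an 8-bit color channel can show. The exponent is fixed when the table is built, so a lookup texture fits best when the parameters do not change per material.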
Keep in mind that the alpha test (discard) operation will make your fragment shaders slower.
You should always specify the precision of floating-point variables when writing custom shaders. It is critical to pick the smallest possible floating-point format to get the best performance. Precision of operations is completely ignored on many desktop GPUs, but it is critical for performance on many mobile GPUs.
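If the shader is written in Cg/HLSL, precision is specified with the types float (full, 32-bit precision), half (medium, 16-bit precision) and fixed (low precision, typically 11 bits with a range of about -2.0 to +2.0).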
If the shader is written in GLSL ES, the floating-point precision is specified as highp, mediump and lowp respectively.
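To get a feel for what medium precision gives up, the sketch rounds values through IEEE 754 half precision (16-bit) using the standard `struct` module’s `e` format, next to single precision (`f`). It assumes that half and mediump map to 16-bit floats, which is common on mobile GPUs but not guaranteed by every driver; the low-precision fixed/lowp format is not modeled.

```python
import struct


def to_half(x):
    """Round a float to IEEE 754 half precision (16-bit) and back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_single(x):
    """Round a float to IEEE 754 single precision (32-bit) and back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


print(to_half(0.5))       # 0.5: colors and other 0..1 values survive well
print(to_half(0.2))       # 0.199951171875
print(to_half(1000.3))    # 1000.5: steps are 0.5 apart near 1000
print(to_single(1000.3))  # ~1000.29999, fine for world positions
```

Values in the 0 to 1 range, such as colors and normalized vectors, lose almost nothing at 16 bits, while world-space positions near 1000 can only move in half-unit steps. That is why color math can usually use half while positions keep full float.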
For further details about shader performance, please read the Shader Performance page.
Original post: http://blog.csdn.net/ynnmnm/article/details/39003147