Pedal to the Metal
In my post to Seeking Alpha I wrote:
- The final sleeper feature with deep implications for Apple’s future is Metal.
- Metal is a new technology for writing graphics programs, particularly animations and games. It allows programmers to write code at a much lower level than was previously possible, and this yields a real performance improvement of up to 10x.
Clearly the newest Apple hardware (iPhone, iPad, and possibly an Apple TV box) will be getting an upgraded processor (system on a chip, or SoC), the A8. I wrote in detail about the current A7 in this post, where I also presented my speculations on the A8. Apple's creative chip design keeps it a step ahead in the processor race, but there is more to it: because Apple controls both the operating system (iOS) and the processor technology, it is in a unique position to optimize system graphics performance.
So what is it that they have done?
The process is a bit complicated, but the overall concept is less so. Basically, Apple has taken a large part of the graphics workload and moved it from run time to compile time and load time.
First – The earlier problem
Even before the advent of the smartphone, PCs were built with a wide variety of hardware, each with a different GPU (Graphics Processing Unit). Those who wrote graphics-based applications (drawing, video editing, photo editing, games, etc.) had a tough time writing for all the different hardware options. So a standard called OpenGL was created. It provides one common set of subroutines that programmers can “call” in their programs to perform particular graphics operations, and these run on any GPU that supports the OpenGL standard. Now programmers could write for one system only, and the program would run on any supporting hardware.
Computer programs direct the system on precisely what to do at any given instant. They are written in high-level computing languages that are somewhat like natural languages, so that they can be easily understood by humans. But computer processors work in their own arcane language of numeric codes (machine code). So the high-level language program needs to be compiled, or translated, into machine code by another program called a compiler.
So, a program is written, then compiled into a machine code package by the developers, and this compiled package (along with any required resources) is delivered to the customer as the Application.
The problem here is that GPUs differ radically in their architecture, and even in their processing models, so one cannot simply ship a single set of machine code that will run OpenGL routines. The workaround is that portions of the GPU instructions are compiled at runtime (i.e. while the computer is running the application). Unfortunately, when graphics are generated on the fly, particularly in games, this work can end up being repeated for every frame of the game.
And this is very costly work!
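To make that cost concrete, here is a loose analogy in Python, not real GPU code. Python's built-in `compile()` stands in for a shader compiler, and a toy expression string stands in for shader source; the names `SHADER_SOURCE`, `render_compiling_every_frame` and `render_compiling_once` are hypothetical. The first function repeats the expensive translation step on every frame, while the second does it once and only runs the result per frame.

```python
# Loose analogy: Python's built-in compile() stands in for a GPU shader compiler.
# SHADER_SOURCE and both render functions are hypothetical, for illustration only.

SHADER_SOURCE = "x * 0.5 + 0.25"  # toy "shader": maps an input value to a brightness


def render_compiling_every_frame(frames, pixels):
    """Recompile the 'shader' on every frame (the costly per-frame model)."""
    compilations = 0
    output = []
    for _ in range(frames):
        code = compile(SHADER_SOURCE, "<shader>", "eval")  # expensive step, repeated each frame
        compilations += 1
        output.append([eval(code, {}, {"x": p}) for p in pixels])
    return output, compilations


def render_compiling_once(frames, pixels):
    """Compile the 'shader' once up front, then only execute it each frame."""
    code = compile(SHADER_SOURCE, "<shader>", "eval")  # expensive step, done once
    compilations = 1
    output = []
    for _ in range(frames):
        output.append([eval(code, {}, {"x": p}) for p in pixels])
    return output, compilations


if __name__ == "__main__":
    pixels = [0.0, 0.5, 1.0]
    slow, n_slow = render_compiling_every_frame(60, pixels)
    fast, n_fast = render_compiling_once(60, pixels)
    print(slow == fast, n_slow, n_fast)  # True 60 1
```

Both versions draw identical frames; the only difference is how many times the translation work is paid for, which is the whole point of moving it out of the frame loop.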
You can get around this problem if and only if you control precisely what the hardware is, and what it will be in the future. Only Apple can do this. As Craig Federighi, Apple's senior vice president of Software Engineering, put it:
“…OpenGL ends up being a thick layer of overhead between the game and the hardware.”
In particular, there are two tasks that are very expensive, yet must be done in an OpenGL system: shader compilation and state validation. OpenGL must do both of these operations at frame render time. If they could be moved to stages that happen less frequently, performance would improve. Because Apple will always know precisely what its graphics units are, it can do exactly that, as shown in the following slide.
This shows how compute-intensive tasks that used to be performed while the application was running can now be performed either at app compile time (i.e. by the developer, before the app is installed) or once, when a particular scene is loaded into the app. During actual gameplay, only the frame generation itself needs to be done.
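The three stages described above can be sketched as a toy model in Python. This is a minimal sketch under stated assumptions: every name here (`Stats`, `PipelineState`, `build_app`, `load_scene`, `render_frame`) is hypothetical and only loosely mirrors Metal's idea of a precompiled shader library plus an immutable, pre-validated pipeline state; the real Metal API (for example `MTLRenderPipelineState`) looks quite different. The counters show which stage pays for which kind of work.

```python
# Toy model of the "up-front work" idea: compile at build time, validate at load time,
# and only draw at frame time. All names are hypothetical, not Metal API calls.
from dataclasses import dataclass


@dataclass
class Stats:
    """Counts how often each kind of work is performed."""
    compiles: int = 0
    validations: int = 0
    draws: int = 0


@dataclass(frozen=True)
class PipelineState:
    """Immutable, pre-validated render state; it cannot change after creation."""
    vertex_shader: str
    fragment_shader: str
    blend_enabled: bool


def build_app(shader_sources, stats):
    """App build time (developer's machine): compile every shader exactly once."""
    library = {}
    for name, source in shader_sources.items():
        library[name] = f"compiled({source})"  # stand-in for real GPU machine code
        stats.compiles += 1
    return library


def load_scene(library, vertex, fragment, blend, stats):
    """Scene load time: check the state combination once, then freeze it."""
    if vertex not in library or fragment not in library:
        raise ValueError("unknown shader")
    stats.validations += 1
    return PipelineState(library[vertex], library[fragment], blend)


def render_frame(pipeline, draw_calls, stats):
    """Frame time: issue draws with the frozen pipeline; nothing is compiled or re-checked."""
    for _ in range(draw_calls):
        stats.draws += 1  # stand-in for encoding one draw call that uses `pipeline`


if __name__ == "__main__":
    stats = Stats()
    library = build_app({"vs": "vertex src", "fs": "fragment src"}, stats)
    pipeline = load_scene(library, "vs", "fs", False, stats)
    for _ in range(60):  # one second of frames at an assumed 60 fps
        render_frame(pipeline, 100, stats)
    print(stats)  # Stats(compiles=2, validations=1, draws=6000)
```

However many frames are rendered, compilation and validation counts stay fixed; only the draw count grows with gameplay time.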
Electronic Arts: Plants vs. Zombies
- Over 1.3 million triangles on screen at a time
- Frostbite console engine ported
- Console-level Geom Cache ported
- Up to 4000 draw calls per frame
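A rough budget shows why per-draw-call overhead matters at this scale. Assuming, for illustration only, a 60 frames-per-second target (the demo figures do not state one) and, unrealistically, that the CPU spends its entire frame budget issuing draw calls:

```latex
t_{\text{frame}} = \frac{1000\ \text{ms}}{60} \approx 16.7\ \text{ms}, \qquad
t_{\text{draw}} \le \frac{16.7\ \text{ms}}{4000} \approx 4.2\ \mu\text{s}, \qquad
4000 \times 60 = 240{,}000\ \text{draw calls per second}
```

Any fixed validation cost paid on every draw call comes straight out of that roughly 4 µs allowance, which is why moving such work to load time pays off when draw counts are this high.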
Epic Games: Founder Tim Sweeney – Zen Garden
- 5000 flower petals being individually simulated
- 3500 individually animated butterflies
- 10-fold increase in rendering efficiency
- “to have this capability is a stunning breakthrough.”
It is important to understand that this is an ongoing advantage for Apple. It cannot be overcome simply by upgrading GPU power, because the gain comes from how the software talks to the GPU rather than from the GPU hardware itself.
Comments greatly appreciated!