
In my current client project, we’re developing an AIR application targeted at iOS (Android will follow). We wanted to make use of some iOS SDK features, so I had to write my first native extension (ANE). Developing the Objective-C part is pretty straightforward (if you know C++ and Objective-C), and so is the ActionScript part. There are some good examples and tutorials on the Adobe site about all kinds of extensions.

The hard part was getting this thing to work, so I just wanted to share my settings here. They might be useful if you’re starting to develop your first ANE. I had strange crashes when I packaged the app with my ANE and I couldn’t figure out what was wrong. The app just crashed every time I launched it on the device, and the crash log wasn’t very helpful. After quite a search, I found out that I hadn’t set an apparently important compiler flag for the LLVM compiler in my Xcode project. So, be sure to set:

Enable Linking With Shared Libraries: No

And if you want to get rid of the warnings:

Warnings: Missing Function Prototypes: No

The second part was packaging the ANE correctly. The working command for my case is:

adt -package -target ane MyExtension.ane extension.xml -swc MyExtension.swc -platform iPhone-ARM library.swf libMyNativeExtensionIOS.a

The annoying thing about packaging the ANE is that after you have built your swc, you have to extract the library.swf from it (by renaming it to .zip and extracting the swf). So you need both the swc AND the swf. I haven’t written an ANT task to automate the process yet, and I don’t know the reason for this strange step, since ADT has everything it needs within the swc. Only Adobe knows ;)
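The manual rename-and-unzip step can be scripted. Here is a minimal sketch in Python (not an ANT task), assuming the swc is a plain zip archive with library.swf at its root, as described above. The function name extract_library_swf is made up for this example.

```python
import zipfile
from pathlib import Path


def extract_library_swf(swc_path, out_dir="."):
    """Copy library.swf out of a .swc, which is a plain zip archive.

    Hypothetical helper for illustration; it raises KeyError if the
    archive has no library.swf entry.
    """
    with zipfile.ZipFile(swc_path) as swc:
        # extract() writes the file and returns the path it wrote to.
        return Path(swc.extract("library.swf", path=out_dir))
```

Calling extract_library_swf("MyExtension.swc") before running adt leaves the swc untouched and puts library.swf next to it.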

Obviously you can’t test on the device every time, because the deployment process to iOS is more or less manual and just takes too long at the moment. I found out that I could link the ANE as a regular library (SWC) in my Flash Builder project and launch the app on my desktop machine. When the native extension tries to create the context on the desktop machine, it fails and returns null, because it was only built for the iOS platform:

context = ExtensionContext.createExtensionContext(EXTENSION_ID, null);

So I could implement a fallback for the extension that mocks the behaviour in AS3 when running on the desktop. To package the application for iOS, I wrote a small ANT task. This way we can easily test on the device and have a fallback when testing on the desktop, without writing desktop extensions as well.
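The fallback idea boils down to "use the native context if there is one, otherwise use a mock". A rough sketch of that pattern, written in Python instead of AS3 so it can be run anywhere. All class and method names here (MockExtension, NativeExtension, device_name, the "deviceName" call) are invented for illustration and are not part of any real extension.

```python
class MockExtension:
    """Desktop stand-in that fakes the native behaviour in plain code."""

    def device_name(self):
        return "desktop-mock"


class NativeExtension:
    """Thin wrapper that forwards calls to the native context."""

    def __init__(self, context):
        self.context = context

    def device_name(self):
        # "deviceName" is a hypothetical native function name.
        return self.context.call("deviceName")


def create_extension(create_context):
    """Pick the implementation based on whether a context could be created."""
    context = create_context()
    if context is None:
        # No native library for this platform: fall back to the mock.
        return MockExtension()
    return NativeExtension(context)
```

The rest of the app only talks to whatever create_extension() returns, so it doesn’t need to know which platform it runs on.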

So, maybe someone will find this useful…

ND2D – Blur

December 7th, 2011 | Posted by lars in Molehill / Stage3D | ND2D | Pixelshader - (6 Comments)

Good news everyone. I found a little time to implement a blur shader for ND2D, and I’ll try to explain how to implement a shader like this:

First of all: How does a blur work? To blur an image, you sample the neighbouring pixels of each pixel in the image and compute the average color. For example, take a 3×3 image where the pixel in the middle is black and the rest are white. You sample all neighbours of the middle pixel (r: 1.0 g: 1.0 b: 1.0) * 8 plus the pixel itself (r: 0.0 g: 0.0 b: 0.0) * 1 and compute the average (divide by 9). The resulting pixel will be roughly (r: 0.89 g: 0.89 b: 0.89). Just do that for every pixel in the image and you’ll have a blur.
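The 3×3 example is easy to check by hand or in a few lines of code. This is only a Python sketch of the averaging step, not shader code, and average_color is a name made up for it.

```python
def average_color(pixels):
    """Average a list of (r, g, b) tuples channel by channel."""
    count = len(pixels)
    return tuple(sum(p[channel] for p in pixels) / count for channel in range(3))


white = (1.0, 1.0, 1.0)
black = (0.0, 0.0, 0.0)

# Eight white neighbours plus the black pixel in the middle.
neighbourhood = [white] * 8 + [black]
blurred = average_color(neighbourhood)  # each channel is 8 / 9
```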

To implement this in a shader we have to consider a few things: First, you want to save as many texture sampling calls as possible. For example, if you want to blur your image 4 pixels horizontally and vertically, you would have to take 9 x 9 = 81 samples (4 to the left, right, up and down, plus the pixel that should be blurred itself). This is way too much and you could never squeeze this into a fragment shader with AGAL. But there is a trick: First blur your image horizontally, then take the result and blur it vertically. This way, you only have to take 9 + 9 = 18 samples (see Article: Gaussian Blur Shader). Implementing it this way means we have to do a horizontal blur, write the output to a texture and do a vertical blur with the already horizontally blurred texture. In other words, a two-pass rendering. A nice side effect of this approach is that we can not only blur in x AND y direction, but in x OR y individually.
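To make the two-pass trick concrete, here is a small CPU-side sketch in Python using a simple box blur on a grayscale image stored as a list of rows. It is not the ND2D shader. The function names and the edge handling (reusing the border pixel) are assumptions made for this example.

```python
def blur_rows(image, radius):
    """Box-blur each row; samples past the border reuse the edge pixel."""
    width = len(image[0])
    taps = 2 * radius + 1
    blurred = []
    for row in image:
        new_row = []
        for x in range(width):
            total = 0.0
            for offset in range(-radius, radius + 1):
                # Clamp the sample position to the valid range 0..width-1.
                sx = min(max(x + offset, 0), width - 1)
                total += row[sx]
            new_row.append(total / taps)
        blurred.append(new_row)
    return blurred


def transpose(image):
    return [list(column) for column in zip(*image)]


def blur_two_pass(image, radius):
    """Horizontal pass first, then a vertical pass on the result."""
    horizontal = blur_rows(image, radius)
    # Blurring the columns is the same as blurring the rows of the transpose.
    return transpose(blur_rows(transpose(horizontal), radius))


image = [
    [1.0, 1.0, 1.0],
    [1.0, 0.0, 1.0],
    [1.0, 1.0, 1.0],
]
result = blur_two_pass(image, 1)

samples_one_pass = (2 * 4 + 1) ** 2   # 81 samples for a single 9 x 9 pass
samples_two_pass = 2 * (2 * 4 + 1)    # 18 samples for two 9-tap passes
```

For the 3×3 image the middle pixel ends up at 8/9, the same value the plain "average all nine pixels" approach gives.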

So we’ve implemented our blur now and are happy that everything is blurry with a 4×4 blur, but how do we animate it now? We could generate the shader dynamically, so that we would have a different shader for different blur values, but space is limited in a fragment shader. A program can’t exceed a certain size. What if we want to have a blur of 50 x 50? We can’t write a shader that does this. The program would be just too big, since we don’t have loops in AGAL.

One part of the answer is good old Carl Friedrich Gauß. He invented a formula about two hundred years ago that lets us weight the sampled pixels (see Article: Gaussian Blur and an Implementation). So our shader can remain static and always sample 9 pixels, but the Gaussian function tells us how the samples are treated. So instead of dividing all samples by 9, we have a factor for each sample. Now not only is the blur dynamic, it even looks a lot better with the Gauss values than our simple “divide by 9” approach. Neat! Now we can animate a blur from 0 to 4 pixels. That’s ok, but we wanted 50 or more, remember?
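The per-sample factors can be computed on the CPU and passed to the shader as constants. Below is a hedged Python sketch of nine normalized Gaussian weights. How ND2D maps its blur value to sigma is not described here, so the sigma parameter and the sigma <= 0 shortcut are assumptions of this example.

```python
import math


def gaussian_weights(sigma, radius=4):
    """Return 2 * radius + 1 weights that sum to 1.

    sigma controls how wide the bell curve is. For sigma <= 0 only the
    center sample counts, which means no blur at all.
    """
    offsets = range(-radius, radius + 1)
    if sigma <= 0:
        return [1.0 if d == 0 else 0.0 for d in offsets]
    raw = [math.exp(-(d * d) / (2.0 * sigma * sigma)) for d in offsets]
    total = sum(raw)
    # Normalize so the blurred image keeps its overall brightness.
    return [w / total for w in raw]


weights = gaussian_weights(2.0)
```

Animating the blur then only means recomputing these nine numbers every frame; the shader itself never changes.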

The last part of our fully dynamic blur shader is: Just repeat what we’ve done already! If you want to have a blur of 10, just blur two times by 4 pixels, followed by a 2 pixel blur. Implementing this is also straightforward: setRenderToTexture(), renderBlur(), switchTextures(). All done in a loop.
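Splitting a large blur into passes the shader can handle is a tiny loop. A Python sketch, where split_blur and max_per_pass are names invented for this example and 4 is the per-pass limit mentioned above:

```python
def split_blur(amount, max_per_pass=4):
    """Split a blur amount into passes of at most max_per_pass pixels."""
    passes = []
    remaining = amount
    while remaining > 0:
        step = min(remaining, max_per_pass)
        passes.append(step)
        remaining -= step
    return passes


passes_for_10 = split_blur(10)  # two passes of 4, then one pass of 2
```

Each entry would then drive one render-to-texture round trip in the loop described above.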

Enough of the tech talk, here’s the result (move your mouse to blur the sprites in x and/or y):

You’ll notice the ugly edges in the middle image. This happens if the blur is larger than the transparent space available in the texture, so the blur is “cut off”. I haven’t found a good solution for this, except: leave enough space in your textures if you want to blur them ;)

I found some time to add a little bit more “D” to ND2D. Besides the regular “rotation” property, which rotates around the z-axis, all nodes now have rotationX, rotationY and rotationZ properties and are displayed via a perspective projection. It works similarly to the Flash 10 2.5D API (planes in space) and could be useful for some fancy transition effects.

Second, I added a few properties to change the appearance of textures. You can stretch textures now and define how they should be sampled. The API lets you choose how the texture is filtered, whether mipmapping should be used and how the mipmap filtering should work. I created four predefined quality settings: LOW, MED, HIGH and ULTRA. Have fun:

ND2D – Speed tests

October 23rd, 2011 | Posted by lars in Molehill / Stage3D | ND2D | Talk - (15 Comments)

When talking about accelerated 2D in Flash, everybody is always asking for performance comparisons. So I threw together a little speed test for ND2D, mainly to give you some numbers, but also to test the different implementations of ND2D’s objects. After selecting one of the four different options, the test will keep adding sprites until the framerate drops below 60hz. While adding sprites, it’s likely that the framerate drops below 60hz for a short while, because adding and creating objects is expensive too. But what counts is the end result.

This test allows you to compare four different types of objects / rendering:

  • Sprite2D with a shared texture. Every sprite is drawn in a separate drawCall, but there’s only one texture in memory
  • Sprite2D with individual textures. A drawCall for every sprite is used as well and there are as many textures in memory as there are sprites
  • Sprite2DCloud. All sprites have a shared texture and are drawn in a single drawCall. All movement is calculated on the CPU and the vertexbuffer is uploaded to the GPU every frame
  • Sprite2DBatch. Shared texture as well, but most of the work is done by the GPU with batch processing.


Hit ‘F’ for fullscreen

The results on my machine in Chrome at fullscreen resolution (1680 x 1050) and the Flash Player 11 Release (Please, don’t try it in the debug player, it’s way slower) are:

  • Sprite2D shared Texture: 2157
  • Sprite2D individual Textures: 1881
  • Sprite2DCloud: 14579
  • Sprite2DBatch: 6180

There are still a lot of things that can be optimized. For example, I’m not saving and comparing state changes in the context (texture bind / unbind checks, etc.). At least the first test could be optimized a lot with this technique, I think. Even though there is still room for optimization, I’d say that ND2D is fast enough to build some stunning games! Who needs 15 thousand moving sprites in a game? That should be more than enough ;)

A few people were wondering why they can’t control individual particles in the ND2D particle system. Let me explain why:

The ParticleSystem2D is built for speed. This means that everything, and really everything, for each particle is calculated on the GPU. When you create a system, the starting values for each particle are created once and uploaded to the GPU. From then on, everything is calculated in shaders based on the current time step. This way ND2D is able to render 10,000 (or even more) particles at 60hz without any CPU usage. The drawback is that you don’t have control over each particle, but you’ll have a lot of CPU time left for more important stuff. The ParticleSystem2D can be used for effects like rain, fire or water, but you won’t be able to animate a swarm of birds with it. You can play around with the system below, but be careful. Depending on the size of the particles you can display 10,000 at 60hz or nearly freeze your machine. The larger, the slower.
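The key point is that a particle’s state is a pure function of its start values and the current time, so nothing has to be stored or updated per frame. A simplified Python sketch of that idea follows. The actual ND2D shader formula is not shown in this post, so the linear motion and the looping life time used here are assumptions.

```python
def particle_position(start, velocity, start_time, life_time, now):
    """Position derived only from the start values and the current time.

    The particle restarts at its start position every life_time seconds.
    """
    age = (now - start_time) % life_time
    return (start[0] + velocity[0] * age, start[1] + velocity[1] * age)


position = particle_position((0.0, 0.0), (10.0, -5.0), 0.0, 2.0, 0.5)
```

Because there is no per-particle state that changes over time, there is also nothing you could reach in and modify for a single particle, which is exactly the drawback described above.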

If you want to have control over individual particles, you can use one of the batch nodes provided by ND2D: the Sprite2DCloud or the Sprite2DBatch. With these batch nodes you’re able to move each child, but they are slower, because all the positional information has to be uploaded to the GPU every single frame. When I say slower, I mean that you can still display 1000 (or a lot more) alpha-blended particles at 60hz. This should be enough for a whole army of knights or a fancy mouse follower. Play around with it here:

And if you haven’t installed the new Flash Player 11 that was released yesterday, grab it here.

One really cool thing about textures on the GPU is the different wrap modes available when sampling pixels from them. In Molehill, there are two different types available:

  • CLAMP – if UV coordinates are lower than zero or greater than one, the coordinates are clamped to 0..1, so the edge pixels are repeated
  • REPEAT – if UV coordinates are lower than zero or greater than one, the whole texture is repeated. So for a UV of (1.2, 1.4) the pixel of (0.2, 0.4) is sampled
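What the two wrap modes do to a single coordinate can be written down in a few lines. This is a Python sketch of the arithmetic only (the GPU does this in hardware), with function names made up for the example.

```python
import math


def wrap_clamp(coordinate):
    """CLAMP: anything outside 0..1 sticks to the nearest edge."""
    return min(max(coordinate, 0.0), 1.0)


def wrap_repeat(coordinate):
    """REPEAT: only the fractional part is used, so the texture tiles."""
    return coordinate - math.floor(coordinate)


clamped = (wrap_clamp(1.2), wrap_clamp(1.4))     # both end up at the edge, 1.0
repeated = (wrap_repeat(1.2), wrap_repeat(1.4))  # close to 0.2 and 0.4
```

Note that with REPEAT negative coordinates wrap around as well, which is what makes scrolling in both directions work.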

Simply put, if you set the wrap mode to REPEAT, animate the UV coordinates and have a seamlessly tiling texture, you’ll have the simplest endless scroller you can imagine. Don’t worry, everything is built into ND2D, so you don’t have to care about what I just told you. Just watch the example:

This example is included in the ND2D Examples on Github. This scene just consists of two sprites with a fixed position in the middle of the screen. The only thing that is done on the CPU in the step loop is this:

override protected function step(elapsed:Number):void {
    starfield1.material.uvOffsetX -= (stage.stageWidth * 0.5 - mouseX) * 0.00002;
    starfield1.material.uvOffsetY -= (stage.stageHeight * 0.5 - mouseY) * 0.00002;
    starfield2.material.uvOffsetX -= (stage.stageWidth * 0.5 - mouseX) * 0.00004;
    starfield2.material.uvOffsetY -= (stage.stageHeight * 0.5 - mouseY) * 0.00004;
}

This can come in handy if you want to animate a waterfall, waves or a space field background in your game. Have fun!

I never really introduced the TextureRenderer of ND2D and what possibilities you have, when using it. The TextureRenderer does what the name suggests: It renders a display object (Sprite2D, etc.) and all subsequent objects onto a Context3D texture. The cool thing is, that you are able to draw your entire scene to a (fullscreen) texture and add some post processing effects, by writing a new material / shader and displaying it via a standard Sprite2D.

Here’s the plain scene without post processing:

… and here with a small “dizzyness” post process shader:

I’ve added this test to the examples included in the ND2D sources. You can see the live running example here (test #18).

ND2D – Stage3D Masks

September 2nd, 2011 | Posted by lars in Actionscript | Molehill / Stage3D | ND2D | Source - (6 Comments)

Another feature I really wanted to implement in ND2D was masks, just like the setMask() method in Flash. In Stage3D (OpenGL), there is no such thing as a mask. You can display textured triangles, that’s it, but you know that nearly everything is possible with a pixel shader. So let’s start:

The idea of masking in a fragment shader is to grab the pixel color of your texture, then grab the pixel color of your mask, multiply the two colors and display the result. But how do we find the correct pixel in the mask? Our task is to find the right UV coordinates for the mask texture.

If you look at the above image, the mask is rotated and overlaps the sprite we want to mask. How do we find the correct pixel (UV coordinate) of the mask that overlaps this orange pixel in the sprite? Somehow we have to map the position of the pixel in the sprite to the pixel in the mask, and we can do that by transforming it between the different coordinate systems. In a vertex shader we calculate the final pixel position from local space to world space. The idea is to map this pixel in world space back to the local coordinate system of the mask. This way it’s pretty easy to find the correct UV coordinates. Let’s do a simple ActionScript test:

import flash.geom.Matrix3D;
import flash.geom.Rectangle;
import flash.geom.Vector3D;

// this is the top right corner of our sprite quad.
var v:Vector3D = new Vector3D(128, -128, 0, 1);

// this is the sprite's matrix, translated a bit
var clipSpaceMatrix:Matrix3D = new Matrix3D();
clipSpaceMatrix.appendTranslation(100, 0, 0);
// this is the mask's matrix, it's in the same position as the sprite
var maskClipSpaceMatrix:Matrix3D = new Matrix3D();
maskClipSpaceMatrix.appendTranslation(100, 0, 0);
// this is the mask's size
var maskBitmap:Rectangle = new Rectangle(0, 0, 256, 256);

// invert the matrix, because we want to map back from world space to local mask space
maskClipSpaceMatrix.invert();

// transform our vertex from local sprite space to world space
v = clipSpaceMatrix.transformVector(v);
trace("moved to clipspace: " + v); // [trace] moved to clipspace: Vector3D(228, -128, 0)

// transform world space vertex back to local mask space
// the result is the same vector of course, because the positions of mask and sprite are equal
v = maskClipSpaceMatrix.transformVector(v);
trace("moved to local mask space: " + v); // [trace] moved to local mask space: Vector3D(128, -128, 0)

// calculate the uv coordinates from the local pixel position
v = new Vector3D((v.x + (maskBitmap.width * 0.5)) / maskBitmap.width,
                 (v.y + (maskBitmap.height * 0.5)) / maskBitmap.height,
                  0.0, 1.0);

// the result is what we expect, the top right uv coordinate:
trace("local mask uv: " + v); // [trace] local mask uv: Vector3D(1, 0, 0)

Porting this idea to a shader is pretty straightforward. Let’s code a PB3D Material Shader:

void evaluateVertex()
{
     interpolatedUV = float4(uvCoord.x + uvOffset.x, uvCoord.y + uvOffset.y, 0.0, 0.0);
 
     float4 worldSpacePos = float4(vertexPos.x, vertexPos.y, 0.0, 1.0) * objectToClipSpaceTransform;
     // maskObjectToClipSpaceTransform is the inverted clipspace matrix of the mask
     float4 localMaskSpacePos = worldSpacePos * maskObjectToClipSpaceTransform;
 
     // halfMaskSize.xy is maskBitmap.width/height * 0.5 passed as a parameter
     // invertedMaskSize.xy = 1.0 / maskBitmap.width/height passed as a parameter, because divisions are not properly working in the current pb3d release
     interpolatedMaskUV = float4((localMaskSpacePos.x + halfMaskSize.x) * invertedMaskSize.x,
                                 (localMaskSpacePos.y + halfMaskSize.y) * invertedMaskSize.y,
                                  0.0, 0.0);
}
 
void evaluateFragment()
{
    float4 texel = sample(textureImage, float2(interpolatedUV.x, interpolatedUV.y), PB3D_2D | PB3D_MIPNEAREST | PB3D_CLAMP);
    float4 texel2 = sample(textureMaskImage, float2(interpolatedMaskUV.x, interpolatedMaskUV.y), PB3D_2D | PB3D_MIPNEAREST | PB3D_CLAMP);
 
    result = float4(texel.r * color.r * texel2.r,
                    texel.g * color.g * texel2.g,
                    texel.b * color.b * texel2.b,
                    texel.a * color.a * texel2.a);
}

If you don’t want to use PixelBender3D and like to ‘torture’ yourself with AGAL, you can write the same shader this way:

/*
vertex shader:
 
vc0-vc3 = clipspace matrix of sprite
vc4-vc7 = inverted clipspace matrix of mask
vc8.xy = half mask width / height
vc8.zw = mask width / height
va0 = vertex
va1 = uv
*/
 
m44 vt0, va0, vc0           // vertex * clipspace
m44 vt1, vt0, vc4           // clipspace to local pos in mask
add vt1.xy, vt1.xy, vc8.xy  // add half masksize to local pos
div vt1.xy, vt1.xy, vc8.zw  // local pos / masksize
mov v0, va1                 // copy uv
mov v1, vt1                 // copy mask uv
mov op, vt0                 // output position
 
/*
fragment shader:
*/
 
mov ft0, v0                                // get interpolated uv coords
tex ft1, ft0, fs0 <2d,clamp,linear,nomip>  // sample texture
mov ft2, v1                                // get interpolated uv coords for mask
tex ft3, ft2, fs1 <2d,clamp,linear,nomip>  // sample mask
mul ft1, ft1, ft3                          // mult mask color with tex color
mov oc, ft1                                // output color

The result is visible here: ND2D – alpha masks (Move your mouse over the crates). I added one more feature: You can set the alpha of a mask, which means that you can specify how much the mask affects the sprite. In the demo above the alpha fades from 0.0 to 1.0. Since we’re using all four color components in our calculations (r,g,b,a), we can not only mask the alpha, but all color channels. I don’t know if this is a “nice thing to have” or if it will get annoying when you use sprites as masks in your game and need to provide an extra image for that. Just let me know :) Here is the example: ND2D – disco color masks.

ND2D – Pixel Bleeding

August 30th, 2011 | Posted by lars in Molehill / Stage3D | ND2D | OpenGL | Talk - (9 Comments)

This post is more of a note to myself, but you might find it interesting.

There was a bug in ND2D that had been annoying me for a while, but I didn’t have the time to fix it: When you use spritesheets and the sprites are packed without any space between them like this one:

It’s likely that you run into issues where the GPU is drawing the pixels of another sprite around your sprite. It then looks like this (the lower image is the fixed version):

If you use mip-mapping it gets even worse, but that’s another story…

This happens, because OpenGL / DirectX needs to have the center of uv-coordinates on the pixel and not on the edge of the pixel. The solution is pretty simple: Instead of calculating the uv-coordinates from 0 to screenwidth, you’re technically supposed to calculate from 0.5 to screenwidth – 0.5. This way the edge pixels are “cropped” out and the bleeding stops :)
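One common way to apply the half-pixel rule to a spritesheet frame is to inset the frame rectangle by 0.5 pixels before converting it to UV coordinates. A Python sketch under that assumption follows. The exact fix used in ND2D is not shown in this post, and the function and parameter names are made up for the example.

```python
def frame_uvs(x, y, width, height, texture_width, texture_height):
    """UV rectangle for a frame, sampled at pixel centers.

    x, y, width and height describe the frame in texture pixels. The
    half-pixel inset keeps the sampler away from neighbouring frames.
    """
    u0 = (x + 0.5) / texture_width
    v0 = (y + 0.5) / texture_height
    u1 = (x + width - 0.5) / texture_width
    v1 = (y + height - 0.5) / texture_height
    return (u0, v0, u1, v1)


uvs = frame_uvs(0, 0, 64, 64, 256, 256)
```

The price is that half a pixel of each frame’s border is cropped, which is usually invisible compared to the bleeding.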

Operation successful, patient alive & breathing. Nurse, I need a drink, cheers!

Hi there,

I just updated ND2D to the latest public beta of the Flash Player 11. I’m totally amazed how much faster the new player is. Without any code changes I get twice as many FPS in most of my demos. Check it out:

ND2D – Demo

April 29th, 2011 | Posted by lars in Molehill / Stage3D | ND2D | Talk - (20 Comments)

I needed some more serious game scenarios to test ND2D. So I created this little sidescroller demo:

Be patient, there is no preloader… The visuals I created, were heavily inspired by Glit. I hope they will release a playable version of the game soon!

It features most of the effects currently implemented in ND2D:
- 2D Sprites (floor and ceiling)
- Particles (fire and moving dust)
- 2D Grid (Distortion effect on the ‘cloud’ layer)
- 2D SpriteSheets (waving grass)

This little demo runs in full screen at 60hz on my machine! Yay! I’ll add it to the examples, along with the latest improvements I made for ND2D, in the next few days.

ND2D – Box2D Tests

April 27th, 2011 | Posted by lars in Molehill / Stage3D | ND2D | Talk - (1 Comments)

Good news everyone. Sven was so kind as to create a little demo with Box2D and ND2D. The performance is already pretty good, but there are still a lot of things I have to optimize. I’ll include the source code of the Box2D example in the sources and post some more details on the latest ND2D features in the next few days.


(Note: The demo is broken with the latest Flashplayer 11 Release due to API changes)

ND2D – beta released

April 12th, 2011 | Posted by lars in Molehill / Stage3D | ND2D | Talk - (7 Comments)

Yay! I just released the first beta of ND2D. You can grab the sources via my github account: nulldesign/nd2d.

There is still a lot to do, especially in terms of performance. Since I decided to use PixelBender3D and not AGAL as my shader language, I have to wait for the next release, because a lot of features are still missing that are available in the AGAL opcodes (No KIL instruction, no Arrays, etc…).

Please play around with it, fork it, use it and send me feedback!!!

Update: You can try out a few live demos here.

One of the biggest challenges in modern computer graphics is still the high cost of rendering thousands of different objects, no matter how simple they are. While developing ND2D, I’m experimenting and trying out different techniques to get good performance.

To optimize the rendering you have to know its weaknesses. As a simple rule you can say: Every state change on the graphics context (Context3D), and especially the drawTriangles() call, uses a lot of processing power. You’ll notice pretty fast that if you try to render 2000 sprites (a sprite is just two textured triangles, so 4000 tris in total) and you’re doing a draw call for every single sprite, the overhead will be so high that the output looks more like a slideshow than a smooth animation. The possible solution is simple: Just do as few state changes and draw calls as possible. The implementation is a bit more work…

So how do you save draw calls? The answer is geometry batching. Instead of drawing one sprite per draw call you just draw multiple sprites in a single call. To get it to work, you have to dig a bit deeper into pixel shader programming and the graphics hardware:

Single sprite per draw call:
A sprite consists of two triangles, a triangle consists of three vertices, and each vertex has the following attributes: x,y,z, u,v, which will be the format for our vertex buffer. The shader input parameters (constants) will be the mvp matrix, a color (to tint a sprite and to enable transparency) and of course the texture image (image4). This way you’re able to draw one sprite per call, pretty easy and straightforward… but slow.

Improvement, batching calls:
You can only batch calls if the sprites you want to draw all have the same texture (setting a texture is also pretty expensive). The main idea is that you pass multiple mvp matrices and multiple colors to the shader instead of just one. Within the shader, a different mvp matrix is used depending on which sprite is drawn. But how many values can you pass to the shader? Today’s graphics hardware has at least 128 constant registers available in the GPU, so to be compatible with all the different graphics cards out there, it’s limited to 128 in the Molehill API. In the following picture you can see the different inputs that are available for the vertex shader. We won’t bother with temp registers and input vectors now, because it’s unlikely that we’ll run out of registers while drawing sprites. So just keep in mind that the vertex shader has limited storage space. In our case we’re limited to 128 constants.

(Image taken from the DX8 SDK documentation)

A single register can hold a float4. So, let’s do some simple math. The matrix uses 4 registers (4 x float4) and the color just one: 128 / 5 = 25.6, rounded down to 25. We should be able to batch 25 draw calls in a single call. But how does the shader know which matrix to use? To provide this information in the shader, we simply add a batch identifier to the vertex buffer: x,y,z, u,v, batchID. The vertex shader could then look like this:

...
parameter float4x4 clipSpaceMatrix[25];
 
void evaluateVertex()
{
    vertexClipPosition = vertexPosition * clipSpaceMatrix[batchID];
}

Yay! We just batched our draw calls and the engine will run a lot faster for sprites with the same texture.
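The register math and its effect on the number of draw calls can be checked quickly. A Python sketch; the function names are made up, and the numbers are the ones used above (128 constant registers, 4 per matrix, 1 per color).

```python
def sprites_per_batch(constant_registers=128, matrix_registers=4, color_registers=1):
    """How many sprites fit into the shader constants of one draw call."""
    return constant_registers // (matrix_registers + color_registers)


def draw_calls_needed(sprite_count, batch_size):
    """Ceiling division: the last batch may be only partly filled."""
    return -(-sprite_count // batch_size)


batch_size = sprites_per_batch()            # 128 // 5
calls = draw_calls_needed(2000, batch_size)
```

For the 2000 sprites from the slideshow example this means 80 draw calls instead of 2000.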

But there is more… Right now, we can only batch sprites that share the same texture. Wouldn’t it be great if we could batch just everything? There is an idea called texture atlas. Basically it’s pretty simple as well: Instead of using different textures, you just “bake” every texture used in your game into a single big texture like this: Pocket God Texture Atlas. All you have to do then, is to adjust the UV coordinates of your sprites to match the original texture in the big one. Generating a texture atlas at runtime and adjusting the UV coords is in fact a bit more work…
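Adjusting the UV coordinates for an atlas is a scale and an offset. A minimal Python sketch, assuming each original texture occupies an axis-aligned rectangle (x, y, width, height in pixels) inside the atlas. The names atlas_uv, region and atlas_size are invented for this example.

```python
def atlas_uv(u, v, region, atlas_size):
    """Map a UV in the original texture (0..1) to a UV in the atlas."""
    x, y, width, height = region
    atlas_width, atlas_height = atlas_size
    return ((x + u * width) / atlas_width, (y + v * height) / atlas_height)


region = (256, 128, 64, 64)  # where the small texture sits in the atlas
top_left = atlas_uv(0.0, 0.0, region, (512, 512))
bottom_right = atlas_uv(1.0, 1.0, region, (512, 512))
```

The hard part at runtime is not this mapping but packing the rectangles into the big texture in the first place.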

Have fun exploring the GPU ;)

Cocos2D Particle System Sources

March 21st, 2011 | Posted by lars in Cocos2D | iPhone | Particles | Source - (1 Comments)

Since I’ve been receiving questions about the particle system I used in my FluidToy 2 quite often lately, I thought I’d just release the sources here.

The particle system is based on a CCNode and works similarly to the existing Cocos2D particle systems. The main difference is that you can control the position and velocity of each particle. Please note that this system is not very optimized and lacks flexibility, since I just built it for FluidToy 2 and it was never meant to be more general. In other words: The code is a bit dirty! But feel free to play around with it, modify it and make use of it (and let me know what you’re making out of it).

There are two different systems: The first one is the SimpleParticleSystem, which draws points of any size or a point sprite using a texture. The second one is the LineParticleSystem, which draws line particles. If you initialize the system with a size of 1000, you’ll have 500 lines, because a line consists of 2 points (Wow, you never imagined that, did you? ;)). So if you loop through the particles, be sure to move only every second particle; the other one will follow with a small delay.

Download: Cocos2DSimpleParticleSystem

It works like every other CCNode:

particles = [SimpleParticleSystem node];
[particles initialize: 1000 width: size.width height: size.height];
[particles setTextureByString: @"particle_small.png"];
[self addChild: particles];
 
Loop through particles:
 
while(count < particles.particleCount)
{
   p = &particleAr[count];
   p->dir.x += CCRANDOM_MINUS1_1();
   p->dir.y += CCRANDOM_MINUS1_1();
   ++count;
   ...
}

Have fun!