Project Dogwaffle - How's it made?

Information from Dan Ritchie, for Developers - in case you are curious
                        Howler - beyond Project Dogwaffle!

 (written by Dan Ritchie Sept. 2011)
Here's a small amount of information on the architecture of Dogwaffle/PD Pro/Howler...

The program is written in straight C, no C++, with the interface, glue code, and some functions in VB (classic, not .NET).

The .NET framework was added as a requirement later, to support plugins that could make use of it, since VB classic was becoming outdated and .NET tools were available in free downloadable versions such as VB Express.

The plugin interface is an ActiveX server with a Dogwaffle class and also a 3D version of the Dogwaffle class. Each class adds numerous functions for accessing internal data such as the image buffer, which can be copied and used by the plugin as desired. There are no strict guidelines for plugin development because the interface is very general and flexible. There is however a getting started section and fairly complete function reference available on

The interface uses VB-style data types, namely 32-bit longs, 16-bit integers, and 8-bit bytes. There are no 64-bit data types in the plugin interface. For historical reasons, the width and height of an image returned by the plugin functions are 1 pixel larger than indicated by the width and height properties.
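As a rough illustration of what those types look like from the C side, here is a minimal sketch. The typedef names and the plane_bytes helper are hypothetical and are not part of the Dogwaffle plugin interface. It also assumes one reading of the size quirk, namely that a returned buffer spans (width + 1) by (height + 1) samples, and that a plane holds one byte per sample.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* VB classic types expressed as fixed-width C types. */
typedef int32_t vb_long;    /* VB Long:    32-bit signed   */
typedef int16_t vb_integer; /* VB Integer: 16-bit signed   */
typedef uint8_t vb_byte;    /* VB Byte:    8-bit unsigned  */

/* Hypothetical helper: bytes needed to hold one 8-bit plane as returned
   by the plugin functions, ASSUMING the returned image is one pixel
   larger in each direction than the width/height properties say. */
size_t plane_bytes(vb_long width_prop, vb_long height_prop)
{
    assert(width_prop >= 0 && height_prop >= 0);
    /* Widen before adding 1 so the addition cannot overflow 32 bits. */
    return ((size_t)width_prop + 1) * ((size_t)height_prop + 1)
           * sizeof(vb_byte);
}
```

Sizing a copy buffer from the returned dimensions, rather than from the properties alone, avoids coming up one row and one column short if that reading is correct.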

Examples of plugins that use the newer .NET framework include the Batch Browser and Matte Cutter. The .NET framework is not used by the core program at all. It is however used by the graphics server applet, which sits in the background and waits for messages through a window-based message port in order to render simple graphics. It is currently used by the curve tool. Excluding that, there is currently no core dependence on the .NET framework. The program will "probably" work without it, but that hasn't been tested at this time, and some features would fail, such as the aforementioned Batch Browser.

The C code used in PD is contained in a handful of DLLs. In version 6 these DLLs were split further, separating newer functions that were compiled with SSE2 support from older functions that had hand-written MMX code and assembler code.

There is very little assembler code left over from the days when C by itself wasn't fast enough. Also, MMX code was written in assembler by necessity, and there was no compelling reason to abandon it since the SSE code is strictly for floating point. There is no hand-written SSE code at this time. All SSE code is generated by the C compiler using scalar math.

Numerous functions inside the app are implemented as plugins. These can be programs written in just about any language, but currently most are written in VB classic. An example of these plugins is the Store Image feature. There are several advantages to this approach. One is that plugins are parallel by nature; another is that the main program is protected if a plugin should crash. The plugins run in their own address space, so the mothership (Dogwaffle) is safe.

On the subject of designing plugins that run parallel to the main program, some care should be taken to avoid problems that could result in deadlock, although these are rare.

Other features are implemented as Lua scripts. These scripts are executed by an external JIT compiler. DogLua is roughly compatible with the Lua scripting engines in several other graphics editors. (see

Graphically, the program makes use of GDI, and doesn't use any of the higher-end APIs such as OpenGL. Acceleration is achieved through multi-threading and the aforementioned MMX and SSE, along with a lot of plain old hand optimizing and the occasional "special case" optimization. Multi-threading doesn't use OpenMP at this time.

Multi-threading currently works like this in most cases, especially in the case of filters. A task is split into several segments, one for each core on the CPU. All threads are then started at the same time, letting the scheduler decide how to execute them until the task is complete. Each segment is of the same size, so it is possible that one will complete before another. For tasks that take a long time to execute, it would be better to use a more dynamic approach, but in the case of filters that often execute in 1/10 or 1/15 of a second or less, the extra little bit of savings would probably get lost in the scheduler's quantum, or that's the theory anyway.
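The static split described above can be sketched roughly as follows. This is not the code used in the program: it uses POSIX threads purely for illustration (the real program runs on Windows and its threading calls are not described here), and the segment struct, invert_segment, invert_parallel, the MAX_THREADS limit and the one-byte-per-pixel layout are all assumptions made for the example. The "filter" is a trivial invert.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

enum { MAX_THREADS = 64 }; /* arbitrary cap chosen for this sketch */

/* One contiguous band of rows handed to one thread. */
typedef struct {
    uint8_t *pixels;    /* start of the whole image, one byte per pixel */
    size_t   width;     /* pixels per row */
    size_t   row_begin; /* first row of this segment (inclusive) */
    size_t   row_end;   /* end of this segment (exclusive) */
} segment;

/* Stand-in filter: invert every pixel in the segment's rows. */
static void *invert_segment(void *arg)
{
    segment *s = arg;
    assert(s->row_begin <= s->row_end);
    for (size_t y = s->row_begin; y < s->row_end; ++y)
        for (size_t x = 0; x < s->width; ++x) {
            uint8_t *p = &s->pixels[y * s->width + x];
            *p = (uint8_t)(255 - *p);
        }
    return NULL;
}

/* Split the rows into `cores` near-equal segments, start one thread per
   segment, then wait for all of them. Returns 0 on success, -1 on a bad
   core count or if a thread could not be created (in which case only
   the segments already started have been processed). */
int invert_parallel(uint8_t *pixels, size_t width, size_t height,
                    size_t cores)
{
    pthread_t tid[MAX_THREADS];
    segment   seg[MAX_THREADS];
    size_t    started = 0;
    int       rc = 0;

    if (cores == 0 || cores > MAX_THREADS)
        return -1;

    for (size_t i = 0; i < cores; ++i) {
        seg[i].pixels    = pixels;
        seg[i].width     = width;
        seg[i].row_begin = height * i / cores;       /* static split */
        seg[i].row_end   = height * (i + 1) / cores;
        if (pthread_create(&tid[i], NULL, invert_segment, &seg[i]) != 0) {
            rc = -1;
            break;
        }
        ++started;
    }
    for (size_t i = 0; i < started; ++i)
        pthread_join(tid[i], NULL);
    return rc;
}
```

Because the row ranges are fixed up front, no locking is needed: each thread writes only its own band. The cost is the one noted above, that a thread finishing early simply idles until the slowest segment is done.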

 Yeah, well.

There are other special cases where a specific number of threads is created, when dividing an image into segments doesn't make sense. In that case each thread may work on one image plane (r, g, or b), for example.
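A fixed-thread-count case like that might look roughly like the sketch below, with exactly three threads, one per plane. As before this is illustrative only: it uses POSIX threads rather than whatever the program actually calls, and plane_job, offset_plane, offset_rgb_planes and the planar one-byte-per-sample layout are assumptions made for the example. The per-plane work is a simple clamped brightness offset.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

enum { PLANES = 3 }; /* r, g, b */

/* Work description for one thread: a whole colour plane. */
typedef struct {
    uint8_t *plane;  /* one colour plane, one byte per pixel */
    size_t   count;  /* number of pixels in the plane */
    int      offset; /* added to every sample, expected in [-255, 255] */
} plane_job;

/* Add the offset to every sample of one plane, clamping to 0..255. */
static void *offset_plane(void *arg)
{
    plane_job *j = arg;
    assert(j->offset >= -255 && j->offset <= 255);
    for (size_t i = 0; i < j->count; ++i) {
        int v = j->plane[i] + j->offset;
        j->plane[i] = (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
    }
    return NULL;
}

/* Start one thread per plane regardless of core count, then wait for
   all of them. Returns 0 on success, -1 if a thread could not be
   created (planes already started are still processed). */
int offset_rgb_planes(uint8_t *planes[PLANES], size_t count,
                      const int offsets[PLANES])
{
    pthread_t tid[PLANES];
    plane_job job[PLANES];
    int started = 0;
    int rc = 0;

    for (int p = 0; p < PLANES; ++p) {
        job[p].plane  = planes[p];
        job[p].count  = count;
        job[p].offset = offsets[p];
        if (pthread_create(&tid[p], NULL, offset_plane, &job[p]) != 0) {
            rc = -1;
            break;
        }
        ++started;
    }
    for (int p = 0; p < started; ++p)
        pthread_join(tid[p], NULL);
    return rc;
}
```

Here the thread count follows the shape of the data rather than the number of cores, so the planes never need to be cut into row segments.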

Happy coding!

