GammaCV: a simple custom operation example
A couple of weeks ago, I was building a proof of concept for a new feature and needed to perform some computer vision operations along the way. I wanted to use this as an opportunity to try out a JavaScript computer vision library other than jsfeat, which helped me build a camera-controlled musical instrument two years ago. I personally love jsfeat’s API, but it’s, well, dead. The last commit in the repository is, as of November 2022, almost five years old, so I wanted to try an alternative with more active contributors. And so, enter GammaCV!
But let’s look at the task at hand. Let’s say that we have a (very) simple image:
For the feature we’re building, we need to grayscale, blur, and extract edges from this and similar images. Let’s start with preparing a standard GammaCV workflow. First, we need the input:
import * as gm from "gammacv";
import img from "../assets/bitmap.js"; // our image in base64 format
// the width and height of the image
const WIDTH = 820;
const HEIGHT = 462;
const process = async () => {
  const input = await gm.imageTensorFromURL(img, "uint8", [HEIGHT, WIDTH, 4]);
};
process();
What is a tensor? It’s an N-dimensional (in this case, 3-dimensional) data structure that holds all the information about an image. Here, pretty intuitively, the number of rows equals the height of the image in pixels, the number of columns equals the width, and each pixel holds 4 values: red, green, blue, and alpha. Note that the shape lists the height first: [HEIGHT, WIDTH, 4].
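To make the shape more concrete, here is a minimal sketch of how a [HEIGHT, WIDTH, 4] tensor could map onto a flat array of bytes. The row-major layout and the offsetOf helper are assumptions made for illustration only, not a description of GammaCV's internals.

```javascript
// Dimensions of the example image used throughout the article.
const WIDTH = 820;
const HEIGHT = 462;
const CHANNELS = 4; // red, green, blue, alpha

// Hypothetical helper: position of one channel value in a flat,
// row-major buffer (all of row 0 first, then row 1, and so on).
const offsetOf = (y, x, channel) => (y * WIDTH + x) * CHANNELS + channel;

// One byte per channel value, so the buffer holds HEIGHT * WIDTH * 4 bytes.
const data = new Uint8Array(HEIGHT * WIDTH * CHANNELS);

// Write the alpha channel of the pixel in row 1, column 2.
data[offsetOf(1, 2, 3)] = 255;
```

The point of the sketch is only that three coordinates (row, column, channel) identify exactly one number.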
Next, we need to define the operations that will be applied to the image. The official documentation shows how this can be done by reassigning a variable, but I prefer pipelining them using a functional approach and ramda (or rambda, the faster alternative to ramda).
import * as gm from "gammacv";
import * as R from "rambda";
import img from "../assets/bitmap.js"; // our image in base64 format
// the width and height of the image
const WIDTH = 820;
const HEIGHT = 462;
const process = async () => {
  const input = await gm.imageTensorFromURL(img, "uint8", [HEIGHT, WIDTH, 4]);
  const operations = R.pipe(
    gm.grayscale,
    (previous) => gm.gaussianBlur(previous, 10, 3),
    gm.sobelOperator,
    (previous) => gm.cannyEdges(previous, 0.25, 0.5)
  )(input);
};
process();
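For readers who haven't used pipe before, the idea can be sketched in a few lines of plain JavaScript. The pipe and step functions below are simplified, hypothetical stand-ins (the real R.pipe is more general), and the toy steps only record their names so that the order of application is visible.

```javascript
// A minimal stand-in for R.pipe: the result of each function
// becomes the argument of the next one, left to right.
const pipe = (...fns) => (initial) =>
  fns.reduce((value, fn) => fn(value), initial);

// Toy "operation" that appends its name to the list it receives.
const step = (name) => (previous) => [...previous, name];

const applied = pipe(
  step("grayscale"),
  step("gaussianBlur"),
  step("sobelOperator"),
  step("cannyEdges")
)([]);

console.log(applied.join(" -> "));
```

This is also why operations that need extra arguments are wrapped in arrow functions in the pipeline: every step has to accept exactly one value, the result of the previous step.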
This is starting to look good, but it won’t do anything. GammaCV’s API requires a few additional steps to actually do the processing. We need to have an output tensor that will receive the outcome. GammaCV also requires us to create and start a session to process the image.
import * as gm from "gammacv";
import * as R from "rambda";
import img from "../assets/bitmap.js"; // our image in base64 format
// the width and height of the image
const WIDTH = 820;
const HEIGHT = 462;
const process = async () => {
  const input = await gm.imageTensorFromURL(img, "uint8", [HEIGHT, WIDTH, 4]);
  const operations = R.pipe(
    gm.grayscale,
    (previous) => gm.gaussianBlur(previous, 10, 3),
    gm.sobelOperator,
    (previous) => gm.cannyEdges(previous, 0.25, 0.5)
  )(input);
  const output = gm.tensorFrom(operations);
  const session = new gm.Session();
  session.init(operations);
  session.runOp(operations, 0, output);
};
process();
OK, so the operations have now run. Logging the output object will give us a non-empty array of pixels. How can we see the result? GammaCV also provides a simple way to create canvases; we just need to attach them to the DOM:
import * as gm from "gammacv";
import * as R from "rambda";
import img from "../assets/bitmap.js"; // our image in base64 format
// the width and height of the image
const WIDTH = 820;
const HEIGHT = 462;
const process = async () => {
  const input = await gm.imageTensorFromURL(img, "uint8", [HEIGHT, WIDTH, 4]);
  const operations = R.pipe(
    gm.grayscale,
    (previous) => gm.gaussianBlur(previous, 7, 3),
    gm.sobelOperator,
    (previous) => gm.cannyEdges(previous, 0.25, 0.5)
  )(input);
  const output = gm.tensorFrom(operations);
  const session = new gm.Session();
  session.init(operations);
  session.runOp(operations, 0, output);
  const inputCanvas = gm.canvasCreate(WIDTH, HEIGHT);
  const outputCanvas = gm.canvasCreate(WIDTH, HEIGHT);
  document.body.append(inputCanvas);
  document.body.append(outputCanvas);
  gm.canvasFromTensor(inputCanvas, input);
  gm.canvasFromTensor(outputCanvas, output);
};
process();
Ta-da! We can now check the result:
OK, what happened here? It’s a bit hard to debug, as we’re running several operations in one batch. Let’s limit the pipeline to the grayscale operation for now. The result is:
OK, so we know what happened: even though we think that we have some text on a white background, GammaCV interprets it as text on a black background, and the difference between the colors is too small to detect edges. Why does that happen? Let’s check the first pixel of the source image. To get the data from a tensor, we need to use its get method. Since it’s a 3-dimensional tensor, each call returns the value of a single channel, and we need to supply three coordinates (row, column, and channel) for each call:
console.log(
  input.get(0, 0, 0),
  input.get(0, 0, 1),
  input.get(0, 0, 2),
  input.get(0, 0, 3)
);
// 0 0 0 0
That’s not very fascinating, is it? Still, it gets the job done. As it turns out, the background of our image is transparent, as indicated by the zero in the last (alpha) channel. What’s the problem? The color channels of that pixel are also zero, so it is stored as black (which is harmless as long as the opacity is equal to 0), and GammaCV’s grayscale (as well as any other operation, actually) ignores the opacity, hence turning the seemingly white pixels into black ones and, in effect, ruining our output.
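The effect is easy to reproduce on a single pixel in plain JavaScript. This is only a sketch of the idea: the toGray helper is hypothetical, and the weights below are the commonly used Rec. 601 luma coefficients, which is an assumption and not necessarily what GammaCV's grayscale uses.

```javascript
// Hypothetical grayscale conversion that, like the behavior described
// above, looks only at the color channels and ignores alpha.
// The weights are assumed (Rec. 601), not taken from GammaCV.
const toGray = ([r, g, b]) => Math.round(0.299 * r + 0.587 * g + 0.114 * b);

const transparentPixel = [0, 0, 0, 0]; // what the background really contains
const whitePixel = [255, 255, 255, 255]; // what we assumed it contains

const grayTransparent = toGray(transparentPixel);
const grayWhite = toGray(whitePixel);

console.log(grayTransparent, grayWhite);
```

Whatever the exact weights are, a pixel whose color channels are all zero ends up black once the alpha channel is ignored.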
How to handle that? Well, unfortunately, GammaCV doesn’t have a ready-to-use method to tackle that kind of a situation, but it does have a great API to build our own, custom operations. Let’s give it a try!
First, we need to register an operation. As we will want to use this custom operation in our pipeline, let’s wrap it in an arrow function:
import * as gm from "gammacv";
const removeTransparency = () => new gm.RegisterOperation("removeTransparency");
In this step, we simply create an instance of the operation, on which we can chain the methods that define its behavior. Let’s continue with defining the input:
import * as gm from "gammacv";
const removeTransparency = () => new gm.RegisterOperation("removeTransparency")
  .Input("tSrc", "uint8");
The Input method accepts two arguments: the first one is the name that is going to be used to retrieve the data later on, and the second one denotes the type of data, which in our case is uint8. As for the name, the Src part is arbitrary and can be whatever describes the data best, while the t prefix is a good practice that helps to distinguish the inputs later on. It’s also worth mentioning that an operation can accept multiple inputs. Next, let’s define the output:
import * as gm from "gammacv";
const removeTransparency = () => new gm.RegisterOperation("removeTransparency")
  .Input("tSrc", "uint8")
  .Output("uint8");
This is pretty straightforward. An operation can only have one return value, and it’s going to be uint8. We also need to define the shape of the returned value, similar to the tensors we already defined:
import * as gm from "gammacv";
const removeTransparency = () => new gm.RegisterOperation("removeTransparency")
  .Input("tSrc", "uint8")
  .Output("uint8")
  .SetShapeFn(() => [HEIGHT, WIDTH, 4]);
Here, we’re going to iterate over the pixels one by one, processing each if needed, and return a tensor of the same shape as the input: [HEIGHT, WIDTH, 4], where 4 again represents the number of channels. We could think of a case where we’d like to calculate the average value of all the pixels in an image; then we’d only want to return one pixel, so the shape would be [1, 1, 4]. If we wanted a custom grayscale function that returns one grayscale value instead of RGBA channels, the returned shape would be [HEIGHT, WIDTH, 1].
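To compare the three shapes mentioned above, here is a small plain-JavaScript sketch. The function names are hypothetical and simply mirror the cases from the text; elementCount is an illustrative helper, not part of GammaCV.

```javascript
const WIDTH = 820;
const HEIGHT = 462;

// Hypothetical shape functions for the three cases described in the text.
const sameShape = () => [HEIGHT, WIDTH, 4]; // per-pixel RGBA operation
const averagePixel = () => [1, 1, 4]; // a single RGBA pixel for the whole image
const singleChannel = () => [HEIGHT, WIDTH, 1]; // one value per pixel

// Number of values a tensor of the given shape would hold.
const elementCount = (shape) => shape.reduce((total, size) => total * size, 1);

console.log(
  elementCount(sameShape()),
  elementCount(averagePixel()),
  elementCount(singleChannel())
);
```

The shape function describes only the size of the output; what goes into each value is decided by the kernel.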
Next up, since custom operations use GLSL shaders to do the processing, we’ll need to load a chunk, which is a predefined function that we’ll be able to use in the shader kernel itself. In this case, we’re going to use the pickValue function (more on it later on):
import * as gm from "gammacv";
const removeTransparency = () => new gm.RegisterOperation("removeTransparency")
  .Input("tSrc", "uint8")
  .Output("uint8")
  .SetShapeFn(() => [HEIGHT, WIDTH, 4])
  .LoadChunk("pickValue");
What does pickValue do? Our GLSL entry function is going to receive the coordinates of a pixel, and pickValue allows us to retrieve that pixel’s values. Let’s move on to the GLSL function itself. Let’s put the code in a separate kernel.glsl file:
vec4 operation(float y, float x) {
  vec4 data = pickValue_tSrc(y, x);
  // the return value is added in the next step
}
There are a couple of things happening here, so let’s stop for a moment. First, vec4. This is a vector of four single-precision floating-point numbers. Why four? Because a vector in this use case represents a pixel in RGBA format, which consists of four numbers. The operation function receives two arguments, the y and x positions of the current pixel in the image. To get the actual value, we need to use pickValue. Please note that every input has its own function to retrieve the value: we don’t pass the input to the function, but rather GammaCV generates a separate function for each input, with the input’s name as a suffix (here, pickValue_tSrc). More information on the pickValue function can be found here.
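As a rough mental model, the lookup can be imitated on the CPU in plain JavaScript. This is an analogy, not GammaCV's implementation: the pickValue function below is hypothetical, it assumes a flat row-major RGBA buffer, and it assumes that uint8 values reach the kernel as floats between 0.0 and 1.0.

```javascript
// CPU analogy of pickValue_tSrc(y, x): reads one RGBA pixel from a flat
// uint8 buffer and returns its four channels scaled to the 0.0 - 1.0 range.
const pickValue = (data, width, y, x) => {
  const offset = (y * width + x) * 4;
  return Array.from(data.subarray(offset, offset + 4), (value) => value / 255);
};

// A 2x1 image: one fully transparent pixel followed by one opaque red pixel.
const image = new Uint8Array([0, 0, 0, 0, 255, 0, 0, 255]);

const first = pickValue(image, 2, 0, 0);
const second = pickValue(image, 2, 0, 1);

console.log(first, second);
```

In the real kernel, the buffer and its width are implied by the input name, which is why the generated function only takes the y and x coordinates.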
Once we have the RGBA value of the pixel, we can proceed with removing the transparency. We need to assume the color of the background (it’s also possible to pass the color of the background as an argument to the operation — if you’d like to learn how, please let me know in the comments!). For our use case, let’s assume that it will be white. We need a value for each of the RGBA channels:
vec4 operation(float y, float x) {
  vec4 data = pickValue_tSrc(y, x);
  return vec4(
    ?,
    ?,
    ?,
    1.0
  );
}
The alpha channel already has a value of 1.0. That’s because we want to remove transparency completely, and each channel in the resulting pixel should have a value from the closed interval from 0.0 to 1.0. What about the RGB channels? In order to remove transparency completely, we need to blend the original color with white proportionally to the original opacity. So for each channel, the result should be:
const channelValue =
(1.0 - pixelOpacity) * 1.0 + pixelOpacity * originalValue;
What’s happening here? We take the original channel value and multiply it by the original pixel opacity. If the opacity is not 100% (1.0), we fill the remainder with the maximum channel value (so, white, if we take all the channels into account). We can now add that logic to our function, but we need to know one more thing: how do we access the channel values from the data object? It’s pretty straightforward: it’s data.r, data.g, data.b, and data.a for RGBA, respectively. Now we’re ready to put all of this together and complete the GLSL kernel:
vec4 operation(float y, float x) {
  vec4 data = pickValue_tSrc(y, x);
  return vec4(
    (1.0 - data.a) * 1.0 + data.a * data.r,
    (1.0 - data.a) * 1.0 + data.a * data.g,
    (1.0 - data.a) * 1.0 + data.a * data.b,
    1.0
  );
}
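To sanity-check the arithmetic of the kernel above without a GPU, the same blend can be sketched in plain JavaScript. The removeTransparencyCpu function is a hypothetical reference implementation written for this check; it assumes straight (non-premultiplied) alpha and a flat RGBA byte buffer.

```javascript
// Blends RGBA bytes over a white background, mirroring the kernel:
// result = (1 - alpha) * 1.0 + alpha * channel, with alpha forced to opaque.
const removeTransparencyCpu = (rgba) => {
  const out = new Uint8Array(rgba.length);
  for (let i = 0; i < rgba.length; i += 4) {
    const alpha = rgba[i + 3] / 255;
    for (let c = 0; c < 3; c += 1) {
      const value = rgba[i + c] / 255;
      out[i + c] = Math.round(((1 - alpha) * 1.0 + alpha * value) * 255);
    }
    out[i + 3] = 255;
  }
  return out;
};

const blended = removeTransparencyCpu(
  new Uint8Array([
    0, 0, 0, 0, // fully transparent: becomes white
    255, 0, 0, 255, // opaque red: stays unchanged
    0, 0, 0, 128, // half-transparent black: becomes mid gray
  ])
);

console.log(Array.from(blended));
```

A fully transparent pixel turns white, an opaque pixel keeps its color, and everything in between is mixed with white in proportion to its transparency.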
From here, we can move on to finish our custom operation:
import * as gm from "gammacv";
import kernel from "./kernel.glsl";
const removeTransparency = () => new gm.RegisterOperation("removeTransparency")
  .Input("tSrc", "uint8")
  .Output("uint8")
  .SetShapeFn(() => [HEIGHT, WIDTH, 4])
  .LoadChunk("pickValue")
  .GLSLKernel(kernel);
If our project’s setup doesn’t handle .glsl files, and we can’t be bothered to add support for them, the kernel can be supplied as a string:
import * as gm from "gammacv";
const removeTransparency = () => new gm.RegisterOperation("removeTransparency")
  .Input("tSrc", "uint8")
  .Output("uint8")
  .SetShapeFn(() => [HEIGHT, WIDTH, 4])
  .LoadChunk("pickValue")
  .GLSLKernel(`
    vec4 operation(float y, float x) {
      vec4 data = pickValue_tSrc(y, x);
      return vec4(
        (1.0 - data.a) * 1.0 + data.a * data.r,
        (1.0 - data.a) * 1.0 + data.a * data.g,
        (1.0 - data.a) * 1.0 + data.a * data.b,
        1.0
      );
    }
  `);
We’re only missing one more thing to complete the custom operation — we need to supply the data!
import * as gm from "gammacv";
import kernel from "./kernel.glsl";
const removeTransparency = (previous) =>
  new gm.RegisterOperation("removeTransparency")
    .Input("tSrc", "uint8")
    .Output("uint8")
    .SetShapeFn(() => [HEIGHT, WIDTH, 4])
    .LoadChunk("pickValue")
    .GLSLKernel(kernel)
    .Compile({ tSrc: previous });
We can now check if the operation works correctly. For now, let’s check what happens when we, like previously, apply grayscale, but this time to the result of our operation. This is what our operation pipeline will look like in this case:
const operations = R.pipe(
  (previous) => removeTransparency(previous),
  gm.grayscale
)(input);
And here’s the effect:
Just for the sake of checking how the colors blend, let’s see what it would look like without the grayscale operation:
It looks like the original image, which is exactly what we wanted. Now, let’s apply all the operations and check if the final result is satisfactory. Here’s our final operations pipeline:
const operations = R.pipe(
  (previous) => removeTransparency(previous),
  gm.grayscale,
  (previous) => gm.gaussianBlur(previous, 7, 3),
  gm.sobelOperator,
  (previous) => gm.cannyEdges(previous, 0.25, 0.75)
)(input);
And here’s the effect:
And this is exactly what we wanted to achieve. For complete code and a working demo, check out this code sandbox.