blur detection in z-stacks

Z-stacking

Tissue presented on a glass slide can be obtained in three fashions: tissue slides by a microtome from a FFPE tissue block is typically completely flat. This means a scanner can usually obtain an optimal focus point and scan a large area of tissue in a single pass.

The two other types of samples however are typically not flat: both cytological samples and needle aspirations contain clumps of cell (a pap smear test is a good example); if samples from FFPE tissue blocks are mirror-like surfaces (at least with respect to smoothness), then cytological samples are more like the gravel surface on a tennis court.

Slide scanners can’t decide on a single layer within the tissue to obtain a “one size fits all” focus, so they typically scan a number of sequential planes. This is your z-stack. Pathomation software supports z-stack data from various scanner vendors.

Beyond viewing

Allowing an end-user to navigate through a z-stack is nice, but can be very tedious and time consuming. Imagine that we have a z-stack with 4 images, then each individual layer can look like this:

A skilled pathologist or researcher can scan through a complete slide in under 10 minutes, but it takes time to acquire this skill, and even so: much of that time is still spent zooming in and out, looking for the optimal sharpness in many regions of the slide.

Can we then use an algorithm to at least select out the most out-of-focus and in-focus tiles in a tiled z-stack? Of course the answer is yes (or we wouldn’t be writing about it)!

To follow along with the rest of this blog, you’ll need a Jupyter environment (we use Anaconda), a Python programming environment, as well as the PMA_python, Matplotlib, Numpy, and OpenCV packages. You can learn here about setting up PMA_python in Anaconda; take care that you install OpenCV with the correct package name, too (it’s PyPI package name is actually opencv-python, even though you import it into your code as “cv2”):

Let’s first see about reading in all images in an array (and visualize them as a sanity check):

import matplotlib.pyplot as plt
import numpy as np
import cv2

folder = "C:/Users/Yves/blog/"
image_names = ["z-stack1.jpg", "z-stack2.jpg", "z-stack3.jpg", "z-stack4.jpg"]

images = [plt.imread(folder + i_name) for i_name in image_names]

for idx in range(1, len(images)+1):   
    plt.subplot(1, len(images), idx)
    plt.imshow(images[idx-1])

Next, let’s apply an OpenCV blurring filter on each image and compute the difference between the pixel-values in the original image and the blurred image.

blur = []
diff = []

for idx in range(0, len(images)):
    blur.append(cv2.blur(images[idx],(5,5)))
    diff.append(np.abs(blur[idx] - images[idx]))

And plot the result (although admittedly visualization of the diff-arrays doesn’t necessarily reveal much information):

for idx in range(0, len(images)):
    plt.subplot(len(images), 3, 1),plt.imshow(images[idx]),plt.title('Original tile ' + str(idx+1))
    plt.xticks([]), plt.yticks([])
    plt.subplot(len(images), 3, 2),plt.imshow(blur[idx]),plt.title('Blurred tile ' + str(idx+1))
    plt.xticks([]), plt.yticks([])
    plt.subplot(len(images), 3, 3),plt.imshow(diff[idx]),plt.title('Difference tile ' + str(idx+1))
    plt.xticks([]), plt.yticks([])
    plt.show()

The idea behind our exercise is the following: a blurred image that is blurred, will show less difference in pixel-values, compared to a sharp images that is blurred with the same filter properties.

Therefore, we can now compute the sum of all the values in the diff-array and identify the indices of the lowest and highest values. For our z-stack, the index that contains the smallest summed differences will point to the blurriest (or “most blurred”?) image; the index that contains the largest summed difference will point to the sharpest image. Let’s see if this works:

diff_sums = [np.sum(arr) for arr in diff]

idx_blurriest = diff_values.index(np.min(diff_sums))
idx_sharpest = diff_values.index(np.max(diff_sums))

plt.subplot(121)
plt.imshow(images[idx_blurriest])
plt.title("Blurriest tile")

plt.subplot(122)
plt.imshow(images[idx_sharpest])
plt.title("Sharpest tile")

Success! We now have a way to identify the sharpest image in a z-stack, using only free and open source software.

You can download the Jupyter notebook with the code so far here.

Sharpness heatmap

For any z-stacked slide, we can now write a loop that goes over all tiles in the slide. At each location, we retrieve all the z-stacked tiles:

def get_z_stack(path, x, y):
    z_stack = []
    for i in range(0, num_z_stacks):
        # print("getting tile from", path, " at ", x, y, " (zoomlevel", sel_zl, ") / z-stack level ", i)
        z_stack.append(core.get_tile(path, x, y, zoomlevel=sel_zl, zstack=i))
    return z_stack

Now we can calculate at each location independently the sharpest tile.

def determine_sharpest_tile_idx(tiles):
    blur = []
    diff = []

    for idx in range(0, len(tiles)):
        blur.append(cv2.blur(np.array(tiles[idx]),(5,5)))
        diff.append(np.abs(blur[idx] - tiles[idx]))
        diff_sums = [np.sum(arr) for arr in diff]

    return diff_sums.index(np.max(diff_sums))

And we can invoke these method repeatedly for each location in the slide:

zoomlevels = core.get_zoomlevels_dict(slide)
sel_zl = int(round(len(zoomlevels) / 2)) + 2     # arbitrary selection
max_x, max_y = zoomlevels[sel_zl][0], zoomlevels[sel_zl][1]

sharpness_map = []
for x in range(0, max_x):
    print (".", end="")
    for y in range(0, max_y):
        sharpness_map.append(determine_sharpest_tile_idx(get_z_stack(slide, x, y)))

Say that the thumbnail of our image look as follows:

We can then print the selected optimal layers at each position as a heatmap with matplotlib.

The resulting image illustrates the fact that indeed throughout the slide, different clusters of cells are sharper in one plane or another.

The output for one of our slides looks like this:

 

Constructing a new single-plane image

Instead of returning the index of the sharpest tile in a stack of tiles, we can return the sharpest tiles itself. For this, we only need to make a small modification in our earlier determine_sharpest_tile_idx function.

def determine_sharpest_tile(tiles):
    blur = []
    diff = []

    for idx in range(0, len(tiles)):
        blur.append(cv2.blur(np.array(tiles[idx]),(5,5)))
        diff.append(np.abs(blur[idx] - tiles[idx]))
        diff_sums = [np.sum(arr) for arr in diff]

    return tiles[diff_sums.index(np.max(diff_sums))]

We can subsequently paste all of the sharpest tiles into a new PIL image object that is made up of the dimensions of the selected zoomlevel from which the tiles are selected:

def get_focused_slide(slide, zoomlevel):
    x_count, y_count, n = core.get_number_of_tiles(slide, zoomlevel)
    x_tile_size, y_tile_size = core.get_tile_size()
    focused_tiles = []
 
    image = Image.new('RGB', (x_count * x_tile_size, y_count * y_tile_size))
 
    coords = list(map(lambda x: (x[0], x[1], zoomlevel), itertools.product(range(y_count), range(x_count))))

    for x in range(0, max_x):
        print (".", end="")
        for y in range(0, max_y):
            focused_tiles.append(determine_sharpest_tile_idx(get_z_stack(slide, x, y)))

    i=0
    for y in range(y_count):
        for x in range(x_count):
            tile = focused_tiles[i]
            i+=1
            image.paste(tile, (x*x_tile_size, y*y_tile_size))

    return image

If you’ve run the code so far, you’ve already found out (the hard way) that it can actually take quite some time to loop over all tiles in the slide this way.

Therefore, let’s introduce some parallellization logic here as well:

def get_focused_tile(c):
    return determine_sharpest_tile(get_z_stack(slide, c[1], c[0], c[2]))

def get_focused_slide(slide, zoomlevel):
    x_count, y_count, n = core.get_number_of_tiles(slide, zoomlevel)
    x_tile_size, y_tile_size = core.get_tile_size()
    focused_tiles = []
 
    image = Image.new('RGB', (x_count * x_tile_size, y_count * y_tile_size))
 
    coords = list(map(lambda x: (x[0], x[1], zoomlevel), itertools.product(range(y_count), range(x_count))))

    tiles = Parallel(n_jobs=-1, verbose=2, backend="threading")(
        map(delayed(get_focused_tile), tqdm(coords)))
 
    i=0
    for y in range(y_count):
        for x in range(x_count):
            tile = tiles[i]
            i+=1
            image.paste(tile, (x*x_tile_size, y*y_tile_size))

    return image

Once you have the new image, saving the image to TIFF output is trivial:

image = get_focused_slide(slide, 6)    # do this at zoomlevel 6, can take a while...
image.save('focused_image.tiff')

Which in practice can end up looking like this (including parallellization progression bars):

The resulting image can now be opened in turn (in PMA.start, of course) to navigate without the need to continuously having to scan for the sharpest plane.

Here’s comparison of a random zoomlevel in the original image with the same region of interest in the new optimized image:

The sourcecode for this procedure so far can be downloaded as another Jupyter notebook (thanks Oleg Kulaev for helping with the finer details of parallellization and image orientation handling in matplotlib).

Cool huh? Well, sort of, kinda… read on.

Libvips to save our TIFFs

There’s an implicit problem with the TIFFs that we generate: they’re big! Most importantly… they’re flat.

There’s some irony to this. We spent all this effort flattening the z-stacked CZI file for a flat TIFF, which is infinitely slower to navigate than the original CZI!

So saving as a pyramidal TIFF instead of a flat tiff would definitely help our case here.

The quickest way to do this is through libvips, with pyvips bindings installed (make sure that you install libvips before getting pyvips with pip; also, if you’re on Windows, make sure that the libvips’ \bin folder is included in your SYSTEM path variable).

Our final step therefore is getting our “flat” PIL image object to a pyramidal tiff (if you need to freshen up about why the pyramid exists in these files, see our section on Whole Slide Images and pyramid stacks in https://realdata.pathomation.com/digital-microscopy-imaging-data-with-python/)

import pyvips

dtype_to_format = {
 'uint8': 'uchar',
 'int8': 'char',
 'uint16': 'ushort',
 'int16': 'short',
 'uint32': 'uint',
 'int32': 'int',
 'float32': 'float',
 'float64': 'double',
 'complex64': 'complex',
 'complex128': 'dpcomplex',
}

img_3d_array = np.asarray(image)
height, width, bands = img_3d_array.shape
linear = img_3d_array.reshape(width * height * bands)
pyvips_image = pyvips.Image.new_from_memory(linear, width, height, bands, dtype_to_format[str(img_3d_array.dtype)])

tile_width, tile_height = core.get_tile_size()
pyvips_image.write_to_file('pyramidal_image.tiff', tile=True, tile_width=tile_width, tile_height=tile_height, pyramid=True)

This final procedure is available as yet another separate Jupyter notebook. Once again much gratitude goes to Oleg Kulaev for figuring out the libvips interactions.

Optimization

The procedure as described can still take quite a while to process. The time needed is highly dependent on the selected zoomlevel. Deeper zoomlevels give you better accuracy, but at the expense of having to process more pixels.

We wrote this post mostly to explain the principle of how blur detection and z-stack optimization can work. By no means do we claim that is production-ready code. If you need professional, scalable blur-detection methods, check out e.g. David Ameisen‘s start-up company ImginIT (just don’t expect his process to be as clearly explained as we just did).

Here are a couple of ideas for optimization:

  • Do we really need to consider all pixels in a tile? Can we get by with selecting perhaps every other 2nd, 4th, 8th… pixel perhaps?
  • Do we need to run the blur detection on tiles from zoomlevel x in order to determine the right layer to select from? Perhaps we can get by with zoomlevel x – 1, x – 2…
  • Do we need to process tiles in their entirety? Perhaps we can select tiles from a coarser zoomlevel, artificially break up an individual tiles in smaller tiles, and use those as a reference.
  • Last but not least, do we even need all contingent tiles? This is easiest to illustrate in a 1-dimensional world: if the optimal layer to select from for tile 1 and tile 3 are the same, is there still any point to examine tile 2 as well? It’s easy now to conceive a recursive algorithm that starts with a (possibly even random) selection of tiles that only analyzes intermediate tiles as the need arises.

In a next post we’ll be discussing how you can use PMA.start to generate DICOM-compliant image objects.

 

 

 

Integrity for all

Transferring whole slide images

The following is a cartoon from XKCD dating from a few years back. If you’ve ever been involved in the transfer of whole slide images, this will be very recognizable.

With whole slide imaging data, the above actually is more complicated. The step that is being skipped in the cartoon is the file integrity check. And with whole slide images, as a single virtual slide can consist of multiple files, it can become really complex really fast.

If you’re a professional user of PMA.core, you no doubt have already appreciated the convenience of the “integrity check” tool: it allows you to select a set of slides, and check if they actually really whole slide imaging data.

Can your image management solution do this? 🙂

But we admit, buying PMA.core just to verify virtual slide integrity is probably overkill.

DIY slide integrity

It turns out, PMA.start is just a well capable to verify virtual slide integrity. But first some background about why you would even want to do this.

The fundamental reason why you want to route all of your slide manipulation operations through PMA.start or its bigger brother PMA.core is that the software knows and understands what a slide is. When you copy a number a multi-file slides from one device to another and you refer to our software components in your code for verification purposes, you cannot make any mistakes in terms of accidentally copying only parts of files the slides. If a mistake does crop up in your report, you can simultaneously check whether something was perhaps wrong with the original slides and files. Digital pathology is hard enough as it is; at least do it on a unit of data (the “slide” rather than the “file”) that makes sense!

As much of slide transfers occur “ad hoc” (on a “need to transfer” basis sort of speak), we wrote an integrity check procedure ourselves as an Anaconda Jupyter notebook.

In order to get PMA.python to work with Anaconda, you’ll first have to get it through PyPI. Start an anadonda prompt and type in “pip install pma_python”.

Great, now you’re ready to import our nifty library in your Jupyter notebooks.

Here’s a good way to start all of your notebooks:

The first block confirms the availability of the library; the second block confirms that PMA.start is running on your system.

This is where Jupyter really shines: you can subdivide your Python (or R) code in separate discrete parts and execute them in an order you want. It becomes really convenient to tweak your scripts and even faster to debug them.

Our strategy for doing an integrity check on a folder an its subfolders is straightforward: loop over all the slides in the folder and see whether we can request the slide information from each one:

def loop_over_directories(dir):
    slides = {}
    for subdir in core.get_directories(dir):
        slides.update(loop_over_directories(subdir))
        for slide in core.get_slides(dir):
            slides[slide] = core.get_slide_info(slide)
    return slides

The result of this function is a dictionary that as keys has paths to all the slides, and as values has information objects (or None in case the slide turned out to be corrupt).

We can now invoke the function on an external hard disk’s folder after copying to see if it goes ok. By means of a sanity check, we’ll also output some parameters on the first slide encountered, as well as the number of items in the dictionary:

slides = loop_over_directories("LaCie (F:)/Pathomation")
topslide = slides[list(slides.keys())[0]]
print(topslide["Filename"], topslide["Width"], topslide["Height"])
len(slides)

The output in Jupiter looks like this:

Our next step is to convert part of the dictionary data to a pandas DataFrame, which in turn it written out as an Excel file. The Excel file is subsequently stored on the external hard disk to provide a check-list for the client and confirm data integrity on their end upon receipt.

df = pd.DataFrame(columns=["SlideName", "PixelWidth", "PixelHeight", "ppmX", "ppmY", "SizeOnDisk"])
df.set_index(["SlideName"])
for slide in slides:
    info = slides[slide]
    if info == None:
        print("Unable to get slide information for", slide)
    else:
        df = df.append({"SlideName": slide,
            "PixelWidth": info["Width"],
            "PixelHeight": info["Height"],
            "ppmX": info["MicrometresPerPixelX"],
            "ppmY": info["MicrometresPerPixelY"],
            "SizeOnDisk": info["PhysicalSize"]}, ignore_index=True)
print (df.shape[0], " out of ", len(slides), " check out OK")

Great, but what if things don’t work out they’re supposed to?

As it turns out, we had been copying a collection of slides (about 60GB in total) overnight from an FTP site. Chances are that in the time that takes, something will go wrong. This turned out have happened in our case, as well, and the output flagged correctly that there was a mistake.

 

The flagged item was minor actually; one of the files had indeed failed to copy and had resulted in a file size with 0 bytes. We downloaded the file anew, and now everything checked out.

Did we need PMA.start to find out that a file was the wrong size? No, we could have used a range of other tools. But the point is that many things can go wrong with whole slide image transfer. The final goal is to check slide integrity, and once a problem is detected, other tools can be used to dig further and resolve them.

There are other potential problems, too: you may be sending someone slides who’s unable to read them. By including a test report from your end with the data, you’re telling them that the files are indeed all right, and that other problems may be at play (perhaps they’re not using the right viewer or import module for the file format that you sent them).

Troubleshooting and debugging is all about eliminating possibilities. We like to think PMA.start (and it’s bigger brother PMA.core) can help you get there faster than anything else on the market today.

You can download the Jupyter notebook discussed in this post. Just modify the folder_to_check variable in the third codeblock to point to the location that you want to validate.

 

Our SDK is now available in Java, too!

SDK for Java

Following the release of our SDK for python, our next step is to implement the same set of functionalities for Java.
Similarly to what we already did for Python, this Java package comes with a set of methods to do basic tasks such as obtaining lists of slides, navigating a hierarchical folder structure, and, of course, extracting tiles. The code was deposited in GitHub.

To make using the SDK easier, we submitted the package to the Maven central repository

Getting an interface to Pathomation software in Java is now as easy as including this dependency tag into your pom.xml (version tag set to always retrieve latest version) :

<dependency>
    <groupId>com.pathomation</groupId>
    <artifactId>pma-java</artifactId>
    <version>[2.0,)</version>
</dependency>

You should then be able to invoke Core methods :

Getting started

What can you do with our Java SDK today? If you have PMA.start installed on your system, you can go right ahead and try out the following code:

import com.pathomation.*;

public class Test {

    public static void main(String[] args) throws Exception {
        if (Core.pmaIsLite()) {
            System.out.print("Congratulations; PMA.start is running on your system");
            System.out.print("You’re running PMA.core.lite version " + core.getVersionInfo());
        } else {
            System.out.print("PMA.start not found. Either you don’t have it installed, or you don’t have the server-component running currently");
            throw new Exception("PMA.start not detected");   
        }
    }
}

You can use the same isLite() method by the way to ask your end-user to make sure PMA.start IS running before continuing the script:

import com.pathomation.*;

public class Test {

    public static void main(String[] args) {
        if (Core.pmaIsLite()) {
            System.out.print("PMA.core.lite is NOT running.");
            System.out.print("Make sure PMA.core.lite is running and press <enter> to continue");
            System.in.read();
        }
        if (Core.pmaIsLite()) {
            System.out.print("PMA.core.lite is NOT running.");
            System.exit(1);
        }
    }
}

Slides

Now that you know how to establish the availability of PMA.start as a back-end for whole slide imaging (WSI) data, you can start looking for slides:

import com.pathomation.*;

public class Test {

    public static void main(String[] args) throws Exception {
        if (!Core.pmaIsLite()) {
            throw new Exception("PMA.start not detected");
        }
        // assume that you have slides in C:\my_slides (note the capital C)
        for (String slide : Core.getSlides("C:/my_slides")) {
            System.out.print(slide);
        }
    }
}

But you knew already that you had slides in that folder, of course. By, the way, if NO data shows up, check the specified path. It’s case sensitive, and drive letters have to be capitalized. Also make sure to use a forward slash instead of the more traditional (on Windows at least) backslash.

Now what you probably didn’t know yet is the dimensions of the slide, both in pixels as well as micrometers.

System.out.print("Pixel dimensions of slide:");
Integer xDimPix = Core.getPixelDimensions(slide).get(0),
        yDimPix = Core.getPixelDimensions(slide).get(1);
System.out.print(xDimPix.toString() + " x " + yDimPix.toString());

System.out.print("Slide surface area represented by image:");
Float xDimPhys = Core.getPhysicalDimensions(slide).get(0),
      yDimPhys = Core.getPhysicalDimensions(slide).get(1);
System.out.print(xDimPhys.toString() + "µm x " + yDimPhys.toString() + "µm = ");
System.out.print((xDimPhys * yDimPhys / 1E6) + " mm2");

Below is the output on our computer, having 3 3DHistech MRXS slides in the c:\my_slides folder. You can use this type of output as a sanity check, too.

While the numbers in µm seems huge, they start to make more sense once translated to the surface area captured. As a reminder: 1 mm2 = 1,000,000 µm2, which explains why we divide by 1E6 to get the area in mm2. 1020 mm2 still not saying much? Then keep in mind that 100 mm2 equals 1 cm2, and that 10 cm2 can very will constitute a 2 cm x 5 cm piece of tissue. A physical slide’s dimensions are typically 10 cm x 4 cm. Phew, glad the data matches reality!

Determining a slide’s magnification

We can also determine the magnification at which an image was registered. The getMagnification function has a Boolean exact= parameter that works as follows: when set to True, getMagnification will round to the nearest “whole number” magnification that’s typically mentioned on a microscope’s objective lens. This could be 5X, 20X, 40X… But bear in mind that when a microscopist looks through his device, he can fine-focus on a sample, thereby slightly modifying the actual magnification used, perhaps from 40X to 38X (even though the label on the lens still says 40X of course). Scanners work in the same manner; because of auto-focusing, the end-result of a scan may be in 38X instead of 40X, or 21X instead of 20X. And this is the number that is returned when the exact= parameter is set to True.

 

Of course, when building a Dataframe (using library TableSaw), you might as well include columns for both measurements (perhaps using the rounded measurement later for a classification task):

import java.util.ArrayList;
import java.util.HashMap;
import java.util.Map;

import com.pathomation.*;
import tech.tablesaw;

public class Test {

    public static void main(String[] args) throws Exception {

        if (!Core.pmaIsLite()) {
            throw new Exception("PMA.start not detected");
        }
        // create blank list (to be converted into a pandas DataFrame later)
        List<Map<String, Object>> slideInfos = new ArrayList<Map<String, Object>>();

        //Initialize the components for TableSaw
        StringColumn colSlide = StringColumn.create("slide");
        DoubleColumn colApproxMag = DoubleColumn.create("approxMag");
        DoubleColumn colExactMag = DoubleColumn.create("exactMag");
        BooleanColumn colIsFluo = BooleanColumn.create("isFluo");
        BooleanColumn colIsZStack = BooleanColumn.create("isZStack");
        // assume that you have slides in C:\my_slides (note the capital C)
        for (String slide : Core.getSlides("C:/my_slides")) {
            colSlide.append(Core.getSlideFileName(slide));
            colApproxMag.append(Core.getMagnification(slide, false));
            colExactMag.append(Core.getMagnification(slide, true));
            colIsFluo.append(Core.isFluorescent(slide));
            colIsZStack.append(Core.isZStack(slide));
        }
        Table results = Table.create("Results", colSlide, colApproxMag, colExactMag, colIsFluo, colIsZStack);
        results.print();
    }
}

The output of this script on our computer is as follows:

Note that for one slide, both the exact and the approximate magnification is 0. This is because that particular slide is a .jpg-file, which doesn’t contain any useful (pixels per micron) metadata to use to determine the magnification.

Representation

PMA.start is a free desktop viewer for whole slide images. Earlier, we introduced you to Core java, a novel package that serves as a wrapper-library and helps interface with PMA.start’s back-end API.

The images PMA.start typically deals with are called whole slide images, so how about we show some pixels? As it turns out, this is really easy. Just invoke the showSlide() call. Assuming you have a slide at c:\my_slides\alk_stain.mrxs, we get:

import Core.java;

public class Test {

    public static void main(String[] args)  {
        
        String slide = "C:/my_slides/alk_stain.mrxs";
        Core.showSlide(slide);
    }
}

The result depends on whether you’re using PMA.start or a full version of PMA.core. If you’re using PMA.start, you’re taken to the desktop viewer:

If you’re using PMA.core, you’re presented with an interface with less frills: the webbrowser is still involved, but nothing more than scaffolding code around a PMA.UI.View.Viewport is offered (which actually allows for more powerful applications):

Associated images

But there’s more to these images; if you only wanted to view a slide, you wouldn’t bother with Java in the first place. So let’s see what else we can get out of these?

Assuming you have a slide at c:\my_slides\alk_stain.mrxs, you can execute the following code to obtain a thumbnail image representing the whole slide:

import java.awt.BorderLayout;
import java.awt.Image;
import javax.swing.ImageIcon;
import javax.swing.JFrame;
import javax.swing.JLabel;
import javax.swing.JPanel;

import com.pathomation.*;

public class Test {

    public static void main(String[] args) {

        String slide = "C:/my_slides/alk_stain.mrxs";
        Image thumb = Core.getThumbnailImage(slide);
        displayImage(thumb);
    }

    public static void displayImage(Image image) {
        JFrame frame = new JFrame();
        JLabel lblImage = new JLabel(new ImageIcon(image));
        JPanel mainPanel = new JPanel(new BorderLayout());
        mainPanel.add(lblImage);
        frame.add(mainPanel);
        frame.setVisible(true);
    }
}

But this thumbnail presentation alone doesn’t give you the whole picture. You should know that a physical glass slide usually consists of two parts: the biggest part of the slide contains the specimen of interest and is represented by the thumbnail image. However, near the end, a label is usually pasted on with information about the slide: the stain used, the tissue type, perhaps even the name of the physician. More recently, oftentimes the label has a barcode printed on it, for easy and automated identification of a slide. The label is therefore sometimes also referred to as “barcode”. Because the two terms are used so interchangeably, we decided to support them in both forms, too. This makes it easier to write code that not only syntactically makes sense, but also applies semantically in your work-environment.

A systematic representation of a physical glass slide can then be given as follows:

The Core java library then has three methods to obtain slide representations, two of which are aliases of one another:

core.getThumbnailImage() returns the thumbnail image

core.getLabelImage() returns the label image

core.getBarcodeImage() is an alias for getLabelImage

All of the above methods return PIL Image-objects. It actually took some discussion to figure out how to package the data. Since the SDK wraps around an HTTP-based API, we settled on representing pixels through Pillows. Pillows is the successor to the Python Image Library (PIL). The package should be installed for you automatically when you obtained Core java.

The following code shows all three representations of a slide side by side:

import java.awt.BorderLayout;
import java.awt.Image;
import javax.swing.ImageIcon;
import javax.swing.JFrame;
import javax.swing.JLabel;
import javax.swing.JPanel;

import com.pathomation.*;

public class Test {

    public static void main(String[] args) {

        String slide = "C:/my_slides/alk_stain.mrxs";
        // displaying thumbnail image
        Image thumb = Core.getThumbnailImage(slide);
        displayImage(thumb);
        // displaying label image
        Image label = Core.getLabelImage(slide);
        displayImage(label);
        // displaying barcode image
        Image barcode = Core.getBarcodeImage(slide);
        displayImage(barcode);
    }
    
    public static void displayImage(Image image) {
        JFrame frame = new JFrame();
        JLabel lblImage = new JLabel(new ImageIcon(image));
        JPanel mainPanel = new JPanel(new BorderLayout());
        mainPanel.add(lblImage);
        frame.add(mainPanel);
        frame.setVisible(true);
    }    
}

The output is as follows:

Note that not all WSI files have label / barcode information in them. In order to determine what kind of associated images there are, you can inspect a SlideInfo dictionary first to see what’s available:

Map<String, Object> info = Core.getSlideInfo(slide);
System.out.print(Collections.singletonList((HashMap<String, Object>) info.get("AssociatedImageTypes")));

AssociatedImageTypes may refer to more than thumbnail or barcode images, depending on the underlying file format. The most common use of this function is to determine whether a barcode is included or not.

You could write your own function to determine whether your slide has a barcode image:

protected Boolean slideHasBarcode(String slide) {
    Map<String, Object> info = Core.getSlideInfo(slide);
    return ((HashMap<String, Object>) info.get("AssociatedImageTypes")).keySet().contains("barcode");
}

Whole Slide Images

If you already know about pyramidical image files, feel free to skip this paragraph. If you don’t, sticks around; it’s important to understand how microscopy data coming out of slide scanners is structured to be able to manipulate it.

It all starts with a physical slide: a physical slide is a thin piece of glass, with the dimensions

When a physical slide is registered in a digital fashion, it is translated into a 2-dimensional pixel matrix. At a 40X magnification, it takes a grid of4 x 4 pixels to represent 1 square micrometer. We can also say that the image has a resolution of 0.25 microns per pixel. This is also expressed as 4 pixels per micron (PPM).

All of this means that in order to present our 5 cm x 2 cm physical specimen from the first part of this tutorial series in a 40X resolution we need (5 * 10 * 1000 * 4) * (2 * 10 * 1000 * 4) = 200k x 80k = 16B pixels

Now clearly that image is way too big to load in memory all at once, and even with advanced compression techniques, the physical sizes of these is roughly around one gigabyte per slide. So what people have thought of is to package the image data as a pyramidal stack.

Pyramid of Cestius as a metaphore for pyramidal stack images. By Francesco Gasparetti from Senigallia, Italy – Piramide Cestia, CC BY 2.0, https://commons.wikimedia.org/w/index.php?curid=2614848

Ok, perhaps not that kind of pyramid…

But you can imagine a very large image being downsampled a number of times until it receives a manageable size. You just keep dividing the number of pixels by two, and eventually you get a single image that still represents the whole slide, but is only maybe 782 x 312 pixels in size. This then becomes the top of your pyramid and we label it as zoomlevel 0.

At zoomlevel 1, we get a 1562 x 624 pixel image etc. It turns out that our original image of 200k x 80k pixels is found at zoomlevel 8. Projected onto our pyramid, we get something like this:

Worked out example (showing the different zoomlevels) of a pyramidal stack for a concrete whole slide image.

So the physical file representing the slide doesn’t just store the high-resolution image, it stored a pyramidal stack with as many zoomlevels as needed to reach the deepest level (or highest resolution). The idea is that depending on the magnification that you want to represent on the screen, you read data from a different zoomlevel.

Tiles

The pyramid stack works great up to certain resolution. But pretty quick we get into trouble and the images become too big once again to be shown in one pass. And of course, that is eventually what we want to do: Look at the images in their highest possible detail.

In order to work around this problem, the concept of tiles is introduced. The idea is that at each zoomlevel, a grid overlays the image data, arbitrarily breaking the image up in tiles. This leads to a representation like this:

Now, for any area of the slide that we want to display at any given time to the end-user, we can determine the optimal zoomlevel to select from, as well a select number of tiles that are sufficient to show the desired “field of view”, rather than asking the user to wait to download the entire (potentially huge!) image. This goes as follows:

Or, put the other way around (from the browser’s point of view):

So there you have it: whole slide images are nothing but tiled pyramid-shaped stacks of image data.

Tiles in PMA.start

We can access individual tiles within the tiled stack using PMA.start, but before we do that we should first look some more at a slide’s metadata.

We can start by making a table of all zoomlevels the tiles per zoomlevel, along with the magnification represented at each zoomlevel (using library TableSaw) :

import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;

import com.pathomation.*;
import tech.tablesaw;

public class Test {

    public static void main(String[] args) {

        List<Map<String, Object>> levelInfos = new ArrayList<Map<String, Object>>();
        String slide = "C:/my_slides/alk_stain.mrxs";
        List<Integer> levels = Core.getZoomLevelsList(slide);
        //Initialize the components for TableSaw
        DoubleColumn colResX = DoubleColumn.create("resX");
        DoubleColumn colResY = DoubleColumn.create("resY");
        DoubleColumn colTilesX = DoubleColumn.create("tilesX");
        DoubleColumn colTilesY = DoubleColumn.create("tilesY");
        DoubleColumn colApproxMag = DoubleColumn.create("approxMag");
        DoubleColumn colExactMag = DoubleColumn.create("exactMag");
        for (int lvl : levels) {
            int resX = Core.getPixelDimensions(slide, lvl).get(0);
            int resY = Core.getPixelDimensions(slide, lvl).get(1);
            List<Integer> tilesXyz = Core.getNumberOfTiles(slide, lvl);            
            colResX.append(resX);
            colResY.append(resY);
            colTilesX.append(tilesXyz.get(0));
            colTilesY.append(tilesXyz.get(1));
            colApproxMag.append(Core.getMagnification(slide, lvl, false));
            colExactMag.append(Core.getMagnification(slide, lvl, true));
        }
        System.out.print(slide);
        Table results = Table.create("Results", colResX, colResY, colTilesX, colTilesY, colApproxMag, colExactMag);
        results.print();
    }
}

The result for our alk_stain.mrxs slide looks as follows:

Now that we have an idea of the number of zoomlevels to expect and how many tiles there are at each zoomlevel, we can request an individual tile easily. Let’s say that we wanted to request the middle tile at the middle zoomlevel:

import java.awt.BorderLayout;
import java.awt.Image;
import java.util.ArrayList;
import javax.swing.ImageIcon;
import javax.swing.JFrame;
import javax.swing.JLabel;
import javax.swing.JPanel;

import com.pathomation.*;

public class Test {

    public static void main(String[] args) {

        String slide = "C:/my_slides/alk_stain.mrxs";
        List<Integer> levels = Core.getZoomLevelsList(slide);
        int lvl = levels.get((int) (levels.size() / 2));
        List<Integer> tilesXyz = Core.getNumberOfTiles(slide, lvl);
        int x = (int) (tilesXyz.get(0) / 2);
        int y = (int) (tilesXyz.get(1) / 2);
        Image tile = Core.getTile(slide, x, y, lvl);
        displayImage(tile);
    }
    
    public static void displayImage(Image image) {
        JFrame frame = new JFrame();
        JLabel lblImage = new JLabel(new ImageIcon(image));
        JPanel mainPanel = new JPanel(new BorderLayout());
        mainPanel.add(lblImage);
        frame.add(mainPanel);
        frame.setVisible(true);
    }
}

This should pop up a single tile:

.Ok, perhaps not that impressive.

In practice, you’ll typically want to loop over all tiles in a particular zoomlevel. The following code will show all tiles at zoomlevel 1 (increase to max_zoomlevel at your own peril):

List<Integer> tileSz = Core.getNumberOfTiles(slide, 1); // zoomlevel 1
IntStream.range(0, tileSz.get(0)).forEach(xTile -> {
    IntStream.range(0, tileSz.get(1)).forEach(yTile -> {
        Image tile = Core.getTile(slide, xTile, yTile, 1);
        displayImage(tile);
    });
});

The advantage of this approach is that you have control over the direction in which tiles are processed. You can also process row by row and perhaps print a status update after each row is processed.

However, if all you care about is to process all rows left to right, top to bottom, you can opt for a more condensed approach:

for (Image tile : Core.getTiles(slide, tileSz.get(0), tileSz.get(1), 4) {
    //converting the image into N-Dimensional Array
}

The body of the for-loop now processes all tiles at zoomlevel 4 one by one and converts them into an array structure, ready for image processing to occur, e.g. through opencv.

 

The best kind of criticism

Nothing like honest feedback. I was so happy we finally got something cooking in Pyhon, that I sent it out to couple of people, asking (hoping) for immediate feedback. I failed to provide the full context of PMA.python though. Therefore, here’s one response that I got back:

So there’s obviously a couple of assumptions (leading to misconceptions) about PMA.python that are really easy to make, specifically:

  1. It’s a universal library for whole slide imaging; moreover it’s a self-contained library that doesn’t depend on anything else besides Python itself
  2. PMA.start is a dependency, which is hard to install and set up.
  3. The most important PMA.python dependency, PMA.start, runs on Windows as well as Linux (or Mac)

Lack of information leads to assumptions. Assumptions lead to misconceptions. Misconceptions lead to disappointment. Disappointment leads to anger. Anger leads to hate. Hate leads to suffering.

Aaargh, we single-handedly just tipped the balance of power in the Universe!

But on more serious note:

I hope we successfully prevented people from bumping into 1) by providing additional verbiage in our package description on PyPI:

As for 2) we don’t think this is the case: Downloading a .exe-file and running it is about as easy as we can make it for you. Due to the nature of the software however, the user that installs the software will need administrative access to his or her own computer.

Finally, there 3). We know and we acknowledge the existence of the Linux-community. Unfortunately, porting PMA.start to Linux is not a trivial thing to do. And so, for now, we don’t. We don’t have the resources for that, and we’d rather offer stable software for a single OS, than an average and mediocre  product that nevertheless runs on both. If you’re tied to Linux and require WSI processing with Python, you’ll have to stick with OpenSlide for now (which also requires a separate library to connect to; there simply isn’t a native WSI library for Python).

By the way: the argument actually only partially applies. We offer a commercial version of PMA.core that is not limited to just listening to incoming requests on localhost. PMA.python then can be installed on a Linux datacenter and be used to communicate with the PMA.core instance that hosts the digital slide / tile data. In fact, this is exactly the approach that is being used at HistoGeneX, Pathomation‘s parent company.

In closing

That being said, we really appreciated the honesty of the poster. The point is: while it’s great to receive positive praise (PMA.start now has about 150 users per month; these can’t all be wrong), it’s the negative criticism that helps you become better and prevents you from becoming complacent.