Working with digital microscopy imaging data using our Python SDK


PMA.start is a free desktop viewer for whole slide images. In our previous post, we introduced you to pma_python, a novel package that serves as a wrapper-library and helps interface with PMA.start’s back-end API.

The images PMA.start typically deals with are called whole slide images, so how about we show some pixels? As it turns out, this is really easy. Just invoke the show_slide() call. Assuming you have a slide at c:\my_slides\alk_stain.mrxs, we get:

from pma_python import core
slide = "C:/my_slides/alk_stain.mrxs"
core.show_slide(slide)

The result depends on whether you’re using PMA.start or a full version of PMA.core. If you’re using PMA.start, you’re taken to the desktop viewer:

If you’re using PMA.core, you’re presented with an interface with less frills: the webbrowser is still involved, but nothing more than scaffolding code around a PMA.UI.View.Viewport is offered (which actually allows for more powerful applications):

Associated images

But there’s more to these images; if you only wanted to view a slide, you wouldn’t bother with Python in the first place. So let’s see what else we can get out of these.

Assuming you have a slide at c:\my_slides\alk_stain.mrxs, you can execute the following code to obtain a thumbnail image representing the whole slide:

from pma_python import core
slide = "C:/my_slides/alk_stain.mrxs"
thumb = core.get_thumbnail_image(slide)

But this thumbnail presentation alone doesn’t give you the whole picture. You should know that a physical glass slide usually consists of two parts: the biggest part of the slide contains the specimen of interest and is represented by the thumbnail image. However, near the end, a label is usually pasted on with information about the slide: the stain used, the tissue type, perhaps even the name of the physician. More recently, oftentimes the label has a barcode printed on it, for easy and automated identification of a slide. The label is therefore sometimes also referred to as “barcode”. Because the two terms are used so interchangeably, we decided to support them in both forms, too. This makes it easier to write code that not only syntactically makes sense, but also applies semantically in your work-environment.

A systematic representation of a physical glass slide can then be given as follows:

The pma_python library then has three methods to obtain slide representations, two of which are aliases of one another:

core.get_thumbnail_image() returns the thumbnail image

core.get_label_image() returns the label image

core.get_barcode_image() is an alias for get_label_image()

All of the above methods return PIL Image-objects. It actually took some discussion to figure out how to package the data. Since the SDK wraps around an HTTP-based API, we settled on representing pixels through Pillow, the successor to the Python Imaging Library (PIL). The package should be installed for you automatically when you obtained pma_python.
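Since these are ordinary PIL Image-objects, everything Pillow offers is available on them. As a quick illustration (using a synthetic in-memory image as a stand-in for the thumbnail a live PMA.start back-end would return):

```python
from PIL import Image

# Stand-in for what core.get_thumbnail_image() returns: a regular PIL Image
thumb = Image.new("RGB", (240, 160), color=(200, 200, 200))

# Anything Pillow offers is available on the returned object:
print(thumb.size)           # (240, 160)
thumb.thumbnail((120, 80))  # downscale in place, preserving aspect ratio
thumb.save("thumb.png")     # or .show() to pop it up in a viewer
```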

The following code shows all three representations of a slide side by side:

from pma_python import core
slide = "C:/my_slides/alk_stain.mrxs"
thumb = core.get_thumbnail_image(slide)
label = core.get_label_image(slide)
barcode = core.get_barcode_image(slide)

The output is as follows:

Note that not all WSI files have label / barcode information in them. In order to determine what kind of associated images there are, you can inspect a SlideInfo dictionary first to see what’s available:

info = core.get_slide_info(slide)
print(info["AssociatedImageTypes"])

AssociatedImageTypes may refer to more than thumbnail or barcode images, depending on the underlying file format. The most common use of this function is to determine whether a barcode is included or not.

You could write your own function to determine whether your slide has a barcode image:

def slide_has_barcode(slide):
    info = core.get_slide_info(slide)
    return "Barcode" in info["AssociatedImageTypes"]
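The AssociatedImageTypes entry is just a list of strings, so the membership test works on plain data. A quick illustration with hand-written (hypothetical) SlideInfo excerpts:

```python
# Hypothetical SlideInfo excerpts; real dictionaries come from core.get_slide_info()
info_with_label = {"AssociatedImageTypes": ["Thumbnail", "Barcode"]}
info_without = {"AssociatedImageTypes": ["Thumbnail"]}

def slide_info_has_barcode(info):
    return "Barcode" in info["AssociatedImageTypes"]

print(slide_info_has_barcode(info_with_label))  # True
print(slide_info_has_barcode(info_without))     # False
```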

Tiles in PMA.start

We can access individual tiles within the tiled stack using PMA.start, but before we do that, we should first take a closer look at a slide’s metadata.

We can start by making a table of all zoomlevels and the number of tiles per zoomlevel, along with the magnification represented at each zoomlevel:

from pma_python import core
import pandas as pd

slide = "C:/my_slides/alk_stain.mrxs"
level_infos = []
levels = core.get_zoomlevels_list(slide)
for lvl in levels:
    res_x, res_y = core.get_pixel_dimensions(slide, zoomlevel = lvl)
    tiles_xyz = core.get_number_of_tiles(slide, zoomlevel = lvl)
    level_infos.append({
        "res_x": round(res_x),
        "res_y": round(res_y),
        "tiles_x": tiles_xyz[0],
        "tiles_y": tiles_xyz[1],
        "approx_mag": core.get_magnification(slide, exact = False, zoomlevel = lvl),
        "exact_mag": core.get_magnification(slide, exact = True, zoomlevel = lvl)
    })

df_levels = pd.DataFrame(level_infos, columns=["res_x", "res_y", "tiles_x", "tiles_y", "approx_mag", "exact_mag"])

The result for our alk_stain.mrxs slide looks as follows:

Now that we have an idea of the number of zoomlevels to expect and how many tiles there are at each zoomlevel, we can request an individual tile easily. Let’s say that we wanted to request the middle tile at the middle zoomlevel:

slide = "C:/my_slides/alk_stain.mrxs"
levels = core.get_zoomlevels_list(slide)
lvl = levels[round(len(levels) / 2)]
tiles_xyz = core.get_number_of_tiles(slide, zoomlevel = lvl)         
x = round(tiles_xyz[0] / 2)
y = round(tiles_xyz[1] / 2)
tile = core.get_tile(slide, x = x, y = y, zoomlevel = lvl)

This should pop up a single tile:

Ok, perhaps not that impressive.

In practice, you’ll typically want to loop over all tiles in a particular zoomlevel. The following code will show all tiles at zoomlevel 1 (increase to max_zoomlevel at your own peril):

tile_sz = core.get_number_of_tiles(slide, zoomlevel = 1) # zoomlevel 1
for xTile in range(0, tile_sz[0]):
    for yTile in range(0, tile_sz[1]):
        tile = core.get_tile(slide, x = xTile, y = yTile, zoomlevel = 1)

The advantage of this approach is that you have control over the direction in which tiles are processed. You can also process row by row and perhaps print a status update after each row is processed.
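Such a row-by-row traversal with a status update could be sketched as follows; process_tile() is a placeholder for the core.get_tile() call plus whatever analysis you want to run, and the tile counts are made up so the sketch runs standalone:

```python
# Sketch of row-by-row processing with a status update after each row.
# process_tile() stands in for core.get_tile() + your own analysis.
def process_tile(x, y):
    pass  # e.g. tile = core.get_tile(slide, x=x, y=y, zoomlevel=1)

# Stand-in for core.get_number_of_tiles(slide, zoomlevel=1)
tiles_x, tiles_y = 4, 3

for y in range(tiles_y):
    for x in range(tiles_x):
        process_tile(x, y)
    print("row", y + 1, "of", tiles_y, "done")
```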

However, if all you care about is to process all rows left to right, top to bottom, you can opt for a more condensed approach:

import numpy

tile_sz = core.get_number_of_tiles(slide, zoomlevel = 4)
for tile in core.get_tiles(slide, toX = tile_sz[0], toY = tile_sz[1], zoomlevel = 4):
    data = numpy.array(tile)

The body of the for-loop now processes all tiles at zoomlevel 4 one by one and converts them into a numpy array, ready for image processing to occur, e.g. through opencv. But that will have to wait for another post.
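To give an idea of what “ready for image processing” means: once a tile is a numpy array, ordinary array operations apply. A minimal sketch, using a synthetic array as a stand-in for numpy.array(tile):

```python
import numpy

# Synthetic 256 x 256 RGB "tile" as a stand-in for numpy.array(tile)
data = numpy.zeros((256, 256, 3), dtype=numpy.uint8)
data[:128, :, :] = 255  # top half white, bottom half black

# A simple per-tile statistic, e.g. to skip mostly-empty tiles later
mean_intensity = data.mean()
print(mean_intensity)  # 127.5
```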

We did it! SDK update for Python.

SDK update

Now that I have a basic understanding of Python (including the popular packages pandas, matplotlib.pyplot, and (to a slightly lesser extent) numpy), I’m moving ahead and am putting out Java, R, Eiffel, Haskell, and F# API wrapper libraries as part of our SDK!

Ok, perhaps not.

To prevent ending up with a plethora of half-finished sourcecode files across a variety of languages, we thought it more prudent this time to work on a comprehensive library in a single programming language first, and then port it to other environments.

For Python, this means writing one or more modules and publishing them. Others have done this before us, so how hard could it be, right?

So we set off to write an initial module, with a number of procedures to do basic tasks such as obtaining lists of slides, navigating a hierarchical folder structure, and, of course, extracting tiles. The code was deposited in GitHub.

After GitHub typically comes support for the Python Package Index (PyPI). We recruited somebody through UpWork to help us with this process and here we are: getting an interface to Pathomation software in Python is now as easy as issuing the command “python3.exe -m pip install pma-python”.

Oh, and we already tested this in other environments, too. The following screenshot was taken on a Mac (thank you, Pieter-Jan Van Dam):

Getting started

What can you do with our Python SDK today? If you have PMA.start installed on your system, you can go right ahead and try out the following code:

from pma_python import pma

if pma.is_lite():
    print("Congratulations; PMA.start is running on your system")
    print("You’re running PMA.core.lite version " + pma.get_version_info())
else:
    print("PMA.start not found. Either you don’t have it installed, or you don’t have the server-component running currently")
    raise Exception("PMA.start not detected")

You can use the same is_lite() method, by the way, to ask your end-user to make sure PMA.start IS running before continuing the script:

from pma_python import pma
from sys import exit

if not pma.is_lite():
    print("PMA.core.lite is NOT running.")
    input("Make sure PMA.core.lite is running and press <enter> to continue")

if not pma.is_lite():
    print("PMA.core.lite is still not running. Exiting.")
    exit(1)


Now that you know how to establish the availability of PMA.start as a back-end for whole slide imaging (WSI) data, you can start looking for slides:

from pma_python import pma

if not pma.is_lite():
    raise Exception("PMA.start not detected")

# assume that you have slides in C:\my_slides (note the capital C)
for slide in pma.get_slides("C:/my_slides"):
    print(slide)

But you knew already that you had slides in that folder, of course. By the way, if NO data shows up, check the specified path. It’s case-sensitive, and drive letters have to be capitalized. Also make sure to use a forward slash instead of the more traditional (on Windows, at least) backslash.

Now, what you probably didn’t know yet are the dimensions of the slide, both in pixels and in micrometers.

print("Pixel dimensions of slide:")
xdim_pix, ydim_pix = pma.get_pixel_dimensions(slide)
print(str(xdim_pix) + " x " + str(ydim_pix))

print("Slide surface area represented by image:")
xdim_phys, ydim_phys = pma.get_physical_dimensions(slide)
print(str(xdim_phys) + "µm x " + str(ydim_phys) + "µm = ", end="")
print(str(xdim_phys * ydim_phys / 1E6) + " mm2")

Below is the output on our computer, with three 3DHistech MRXS slides in the c:\my_slides folder. You can use this type of output as a sanity check, too.

While the numbers in µm seem huge, they start to make more sense once translated to the surface area captured. As a reminder: 1 mm2 = 1,000,000 µm2, which explains why we divide by 1E6 to get the area in mm2. 1020 mm2 still not saying much? Then keep in mind that 100 mm2 equals 1 cm2, and that 10 cm2 can very well constitute a 2 cm x 5 cm piece of tissue. A physical slide’s dimensions are typically around 7.5 cm x 2.5 cm. Phew, glad the data matches reality!
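The unit conversion itself is plain arithmetic. A quick sketch with hypothetical µm dimensions (the real ones come from pma.get_physical_dimensions(slide)):

```python
# Hypothetical physical dimensions in µm, in the same ballpark as the output above
xdim_phys, ydim_phys = 30000.0, 34000.0  # i.e. 3 cm x 3.4 cm

area_um2 = xdim_phys * ydim_phys
area_mm2 = area_um2 / 1E6   # 1 mm2 = 1,000,000 µm2
area_cm2 = area_mm2 / 100   # 100 mm2 = 1 cm2

print(area_mm2)  # 1020.0
print(area_cm2)  # 10.2
```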

Determining a slide’s magnification

We can also determine the magnification at which an image was registered. The get_magnification function has a Boolean exact= parameter that works as follows: when set to False, get_magnification rounds to the nearest “whole number” magnification that’s typically mentioned on a microscope’s objective lens. This could be 5X, 20X, 40X… But bear in mind that when a microscopist looks through the device, they can fine-focus on a sample, thereby slightly modifying the actual magnification used, perhaps from 40X to 38X (even though the label on the lens still says 40X, of course). Scanners work in the same manner; because of auto-focusing, the end result of a scan may be at 38X instead of 40X, or 21X instead of 20X. And this exact number is what is returned when the exact= parameter is set to True.
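The rounding behavior can be mimicked with a small helper; note that round_to_objective() and its list of objectives are hypothetical illustrations, not part of pma_python:

```python
def round_to_objective(exact_mag, objectives=(5, 10, 20, 40, 63, 100)):
    """Round an exact scan magnification to the nearest common objective lens."""
    return min(objectives, key=lambda obj: abs(obj - exact_mag))

print(round_to_objective(38.2))  # 40
print(round_to_objective(21.0))  # 20
```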


Of course, when building a pandas DataFrame, you might as well include columns for both measurements (perhaps using the rounded measurement later for a classification task):

from pma_python import pma
import pandas as pd

if not pma.is_lite():
    raise Exception("PMA.start not detected")

# create blank list (to be converted into a pandas DataFrame later)
slide_infos = []

# assume that you have slides in C:\my_slides (note the capital C)
for slide in pma.get_slides("C:/my_slides"):
    slide_infos.append({
        "slide": pma.get_slide_file_name(slide),
        "approx_mag": pma.get_magnification(slide, exact=False),
        "exact_mag": pma.get_magnification(slide, exact=True),
        "is_fluo": pma.is_fluorescent(slide),
        "is_zstack": pma.is_z_stack(slide)
    })

df_slides = pd.DataFrame(slide_infos, columns=["slide","approx_mag","exact_mag", "is_fluo", "is_zstack"])

The output of this script on our computer is as follows:

Note that for one slide, both the exact and the approximate magnification are 0. This is because that particular slide is a .jpg file, which doesn’t contain the (pixels per micron) metadata needed to determine the magnification.

Almost there

In our next post, we’ll show how you can retrieve various types of image data from your digitized slides.

E-learning update

Remember my training goals for 2018? I think I’m doing okay. I completed the Excel XSeries at edX. When completing an XSeries, you get a comprehensive dedicated certificate for the full track; individual certificates are still available on a per-course basis, too.

Now that I’m done with this, I can definitely recommend an XSeries track at edX, because across the different courses, various aspects of a subject are approached from genuinely different angles. The repetition you get over time is hugely beneficial for getting a solid grasp of that subject.

Moving on then.

Since all ML-courses at one point or another converge to Python (including Ng’s new Deep Learning curriculum at Coursera), I temporarily switched to Datacamp’s Python track. This morning, I finished the Python Programmer track (consisting of 10 individual courses).

In the next two blog posts, I’ll talk more about how I’m putting my newly learned Python skills to work. Stay tuned.