Look sharp

Blurred or focused?

In an earlier post, we offered some ideas on how to detect tissue in a scanned slide.

The next step people often want to take is to examine how the sharpness of the tissue is distributed throughout the slide. No scanner gets everything in focus, and you will see blurry areas in pretty much all your scans.

It then helps to be able to differentiate the tiles that have poor focus from the tiles that are sharp. As it turns out, we already approached a similar problem when we were determining the sharpest tiles within z-stacked slides.

For this particular exercise (focus variation within a single plane), Sied Kebir in Germany was kind enough to provide us with relevant sample data.

And here are two relevant extracted tiles to illustrate the problem.

Sied is looking for a method to systematically map the blurry tiles vs the crisp ones.

Blur detection with OpenCV

A good introduction on blur detection with the OpenCV library is offered by pysource in the following video tutorial:

Let’s see what that gives when we apply it to Sied’s sample images:

Great! The numbers are not as far apart as in pysource’s video, but that makes sense: even in focused tissue we’ll find many more gradients and sloping color ranges than in the average picture of a person sitting in a room, which contains distinctive features like outlined walls and facial contours.
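To make the measure itself less of a black box, here is a dependency-free sketch of what "variance of the Laplacian" computes. It is not the OpenCV implementation: it assumes a grayscale image given as a nested list, uses the plain 4-neighbour kernel, and skips the border pixels. The function name laplacian_variance and the two 6x6 test patterns are ours, purely for illustration.

```python
from statistics import pvariance


def laplacian_variance(gray):
    """Variance of a 4-neighbour Laplacian over the interior pixels of a 2-D grid."""
    height, width = len(gray), len(gray[0])
    responses = []
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Discrete Laplacian: sum of the four neighbours minus 4x the centre.
            responses.append(
                gray[y - 1][x] + gray[y + 1][x]
                + gray[y][x - 1] + gray[y][x + 1]
                - 4 * gray[y][x]
            )
    return pvariance(responses)


# Hypothetical test patterns: hard edges everywhere versus a smooth ramp.
checkerboard = [[255 * ((x + y) % 2) for x in range(6)] for y in range(6)]
ramp = [[10 * (x + y) for x in range(6)] for y in range(6)]

sharp_score = laplacian_variance(checkerboard)
blurry_score = laplacian_variance(ramp)
print(sharp_score > blurry_score)
```

Hard edges give large positive and negative responses, so the variance is high; a smooth ramp gives the same response everywhere, so the variance collapses. Real tissue tiles sit somewhere in between, which is why the numbers are closer together than in the video.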

Pysource suggests converting your original images to grayscale. Does it make a difference? In our experiments we find different values (of course), but the trend is the same. Since retaining color leads to slightly bigger differences, we’re inclined to stick with the original color images.

If you do want to convert your color tiles to grayscale, here’s a great StackOverflow article about how this works.
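For the curious, a minimal sketch of the usual conversion: a weighted sum of the red, green and blue channels with the ITU-R BT.601 luma weights. The helper name to_gray is ours; in practice you would let your imaging library do this for a whole tile at once.

```python
def to_gray(pixel):
    """Convert one (R, G, B) pixel to a gray value using BT.601 luma weights."""
    r, g, b = pixel
    return round(0.299 * r + 0.587 * g + 0.114 * b)


# Green contributes most to perceived brightness, blue the least.
print(to_gray((255, 0, 0)), to_gray((0, 255, 0)), to_gray((0, 0, 255)))
```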

Distribution and Exploratory Data Analysis (EDA)

Our next step is to put it all in a loop and systematically examine how sharp or blurry each individual tile actually is. For semantic ease, we create a get_sharpness function:


from pma_python import core
import cv2   # OpenCV, needed for the Laplacian in get_sharpness
import matplotlib.pyplot as plt
import numpy as np
import scipy.stats
import sys

def is_tissue(tile):
    # tissue is darker than the near-white background and not wildly variable
    pixels = np.array(tile).flatten()
    mean_threshold = np.mean(pixels) < 192   # 192 = 75% of the 0-255 range
    std_threshold = np.std(pixels) < 75
    return bool(mean_threshold and std_threshold)

def is_whitespace(tile):
    return not is_tissue(tile)

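def get_sharpness(img):
    # variance of the Laplacian: higher means sharper
    # keep the tile 2-D; flattening would discard the pixel neighborhoods
    pixels = np.array(img)
    return cv2.Laplacian(pixels, cv2.CV_64F).var()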

slide = "C:/wsi/sied/test.svs"
max_zl = 5 # or set to core.get_max_zoomlevel(slide)
dims = core.get_zoomlevels_dict(slide)[max_zl]

means = []
stds = []
tissue_map = []
sharp_map = []

for x in range(0, dims[0]):
    for y in range(0, dims[1]):
        tile = core.get_tile(slide, x=x, y=y, zoomlevel=max_zl)
        tiss = is_tissue(tile)
        tissue_map.append(tiss)
        if (tiss):
            sharp_map.append(get_sharpness(tile))
        else:
            sharp_map.append(0)
    print(".", end="")
    sys.stdout.flush()
print()

After getting all the results, it is worth examining the histogram of this data.

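Ideally, we would like to see a bimodal distribution (sharp vs blurred), but that’s not what we see here. The reason is that unevenness in tissue does not come in abrupt steps: focus degrades gradually across the slide, so the sharpness values form a continuum rather than two separate clusters.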
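If you want to inspect the distribution without plotting, a plain-text histogram goes a long way. The sketch below uses made-up sharpness scores; the only thing taken from the loop above is the convention that whitespace tiles get a score of 0, so we filter those out first (otherwise they dominate the first bin).

```python
def histogram(values, bins=10):
    """Count values into equal-width bins between min(values) and max(values)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1  # avoid a zero bin width when all values are equal
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)  # the maximum goes in the last bin
        counts[idx] += 1
    return counts


# Made-up scores for illustration; zeros stand for whitespace tiles.
scores = [0, 0, 12.0, 15.5, 18.0, 21.0, 22.5, 30.0, 41.0, 60.0]
tissue_scores = [s for s in scores if s > 0]

for count in histogram(tissue_scores, bins=4):
    print("#" * count)
```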

Putting it all together

Now that we know what we can expect, it’s just a matter of putting it all together and using it to construct an image map, in similar fashion as we did for our original tissue detection.
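One detail to watch when building the map: the loop above appends values with x in the outer loop and y in the inner loop, so the flat list is ordered column by column. A minimal sketch of rearranging such a list into rows (the helper name to_grid and the toy labels are ours; we assume dims[0] is the number of tiles horizontally and dims[1] vertically, as the loop suggests):

```python
def to_grid(flat, width, height):
    """Rearrange values collected with x in the outer loop and y in the inner loop
    into rows, so that grid[y][x] belongs to tile (x, y)."""
    return [[flat[x * height + y] for x in range(width)] for y in range(height)]


# Toy example: 3 tiles wide, 2 tiles high, values appended in the loop's order.
flat = ["x0y0", "x0y1", "x1y0", "x1y1", "x2y0", "x2y1"]
grid = to_grid(flat, width=3, height=2)
for row in grid:
    print(row)
```

With NumPy, reshaping the list to (width, height) and transposing should give the same layout.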

The final result looks like this:

In closing

Sectioning a slide is a continuous operation, and except for folding artifacts, you shouldn’t expect any abrupt changes. Tissue can be expected to gradually fade in and out of focus. And while scanners have gotten better at compensating for uneven tissue thickness, we’re not quite there yet, and automated analysis based on a technique like the one we propose here can help.

Last but not least, we decided on a new way to organize our sample code. At http://host.pathomation.com/realdata/ you can from now on see all sample data in a single location. For example, the Jupyter notebook that belongs with this entry is available at http://host.pathomation.com/realdata/jupyter/realdata%20033%20-%20look%20sharp.ipynb

Exploiting the Pathomation software stack for business intelligence (BI)

A customer query

As part of our commercial offering, Pathomation offers hosting services for those organizations that don’t want to make a long-term investment in on-premise server hardware, or whose server farm is incompatible with our system requirements (PMA.core requires a Microsoft Windows stack).

One such customer came to us recently and asked how much space they were consuming on the virtual machine they rented from us.

This was our answer:

And this is how we obtained the answer. We wrote the following script:

In Python:


from pma_python import core
def get_slide_size(session, slideRef):
    info = core.get_slide_info(slideRef, sessionID = session)
    return info["PhysicalSize"]
def get_total_storage(session, path):
    total = 0
    for slide in core.get_slides(path, session, True):
        total = total + get_slide_size(session, slide)
    return total
def map_storage_usage(srv, usr, pwd):
    sess = core.connect(srv, usr, pwd)
    map = {}
    for rd in core.get_root_directories(sess):
        map[rd] = get_total_storage(sess, rd)
    return map

You can also do this in PHP, of course:


<?php
require "lib_pathomation.php"; 	// PMA.php library

use Pathomation\PmaPhp\Core;

function getTotalStorage($sessionID, $dir) {
        $slides = Core::getSlides($dir, $sessionID, $recursive = FALSE);
        $infos = Core::GetSlidesInfo($slides, $sessionID);
        
        echo "Got slides for ".$dir.PHP_EOL;

        $func = function($value) {
            return $value["PhysicalSize"];
        };

        $s = array_sum(array_map($func, $infos));
        return $s;
}

function mapStorageUsage($serverUrl, $username, $password) {
    $sessionID = Core::Connect($serverUrl, $username, $password);
    
    $map = array();
    $rootdirs = Core::getRootDirectories($sessionID);
    foreach ($rootdirs as $rd) {
        $map[$rd] = getTotalStorage($sessionID, $rd);
        
    }
    return $map;
}
?>

Creating a dictionary that contains the number of consumed bytes per root-directory now comes down to using just a single line of code:

In Python:

print(map_storage_usage("http://server/pma.core", "user", "secret_password"))

And in PHP:


print_r( mapStorageUsage ("http://server/pma.core", "user", "secret_password"));

Making our code more robust

Unfortunately, chances are that if you’ve been running your PMA.core tile server for a while, the script will come crashing down miserably. It’s the “should have, could have, would have” syndrome of programming. There’s probably a meme for this out there somewhere (in the interest of productivity, we won’t search for it ourselves but let you look for that one). What it comes down to is this: mount points and access permissions change, and at some point in any large enough data repository some files are going to end up corrupt, meaning that either the core.get_slides() call or the core.get_slide_info() call is going to go wrong.

So to make our script a bit more robust, we can add try… except… exception handling in Python:


def get_slide_size(session, slideRef):
    try:    
        info = core.get_slide_info(slideRef, sessionID = session)
        return info["PhysicalSize"]
    except Exception:   # a bare except would also swallow Ctrl+C
        print("Unable to get slide information from", slideRef)
        return 0

def get_total_storage(session, path):
    total = 0
    try:
        for slide in core.get_slides(path, session, True):
            total = total + get_slide_size(session, slide)
    except Exception:   # a bare except would also swallow Ctrl+C
        print("unable to get data from", path)
    return total

def map_storage_usage(srv, usr, pwd):
    sess = core.connect(srv, usr, pwd)
    map = {}
    for rd in core.get_root_directories(sess):
        map[rd] = get_total_storage(sess, rd)
    return map

As well as in PHP:


<?php
require_once "lib_pathomation.php";

use Pathomation\PmaPhp\Core;

function getTotalStorage($sessionID, $dir) {
    try {
        $slides = Core::getSlides($dir, $sessionID, $recursive = FALSE);
        $infos = Core::GetSlidesInfo($slides, $sessionID);
        
        $func = function($value) {
            return $value["PhysicalSize"];
        };

        $s = array_sum(array_map($func, $infos));
        return $s;
    }
    catch(Exception $e) {
        // echo "unable to get data from ".$dir.PHP_EOL;
        return 0;   // count an unreadable directory as zero bytes
    }
}

function mapStorageUsage($serverUrl, $username, $password) {
    $sessionID = Core::Connect($serverUrl, $username, $password);    
    $map = array();
    $rootdirs = Core::getRootDirectories($sessionID);
    foreach ($rootdirs as $rd) {
        $map[$rd] = getTotalStorage($sessionID, $rd);   
    }
    return $map;
}

print_r(mapStorageUsage("http://server/core/", "user", "secret"));
?>

In Python, you can also add the prettyprint library to clean up the output a bit:
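A minimal sketch of what that can look like with the standard library’s pprint module. The dictionary below is a stand-in for the result of map_storage_usage(); the root-directory names and byte counts are made up for illustration.

```python
from pprint import pformat

# Hypothetical result of map_storage_usage(); names and byte counts are made up.
usage = {"archive": 52428800, "incoming": 1073741824, "teaching": 3145728}

# A very small width forces one root-directory per line.
report = pformat(usage, width=1)
print(report)
```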

And in PHP, we add a convenient method to make the numbers a bit easier to read:


function human_filesize($bytes, $decimals = 2) {
    $size = array('B','kB','MB','GB','TB','PB','EB','ZB','YB');
    $factor = floor((strlen($bytes) - 1) / 3);
    return sprintf("%.{$decimals}f", $bytes / pow(1024, $factor)) . @$size[$factor];
}
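The same idea can be sketched in Python. This is not a line-by-line port: the PHP version picks the unit from the number of decimal digits, whereas this sketch keeps dividing by 1024 until the value drops below 1024, so the two can disagree around unit boundaries (1000 bytes, for example).

```python
def human_filesize(num_bytes, decimals=2):
    """Format a byte count with a binary (1024-based) unit suffix."""
    units = ["B", "kB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]
    size = float(num_bytes)
    index = 0
    # Step up one unit at a time, stopping at the largest unit we know.
    while size >= 1024 and index < len(units) - 1:
        size /= 1024
        index += 1
    return f"{size:.{decimals}f}{units[index]}"


print(human_filesize(1536))
print(human_filesize(1024 ** 3))
```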

Input for business intelligence (BI)

In this article we showed how you can use automation to let Pathomation’s PMA.core tile server generate an overview report of how much space your slides consume. Depending on your specific folder structure, you can further customize this for breakdowns into sizable data-morsels that fit your particular appetite.

Given the average size of a slide, this kind of information can be vital to your organization: slides can accumulate fast, and it is important to keep a handle on their growth and space occupation.

Pathomation’s software platform is more than just a slide viewing solution: it can help to generate insights into your storage resource consumption and be used as a veritable planning and evaluation tool before actual investments take place.

A look at PMA.core

The core

PMA.core is the centerpiece of the Pathomation software platform for digital microscopy. PMA.core is essentially a tile server. It does all the magic described in our article on whole slide images, and is optimized to serve you the correct field of view, any time, any place, on any device. PMA.core enables digital microscopy content when and where you want it.

Our free viewer, PMA.start, is built on top of PMA.core technology as well. PMA.start contains a stripped version of PMA.core, lovingly referred to as PMA.core.lite 🙂

PMA.core supports the same file formats as PMA.start does, and then some.

It acts as an “honest broker”, by offering pixels from as many vendor formats as possible.

Storage

With PMA.start, you’re limited to accessing slide content that’s stored on your local hard disk. External hard disks are supported as well, but at some point you end up with multiple people in your organization who need to access the same slide content. At that point, PMA.core’s central storage capabilities come into play.

PMA.core is typically installed on a (Windows) server machine and can access a wider variety of storage media than PMA.start can. You can store your virtual slides on the server’s local hard disk, but as your data grows, this is probably not the place you want to keep them. So you can offload your slides to networked storage, or even S3 bucket repositories (object storage).

Pathomation does not, in contrast with other vendors, require a formal slide registration or import process to take place. Of course our software does need to know where the slides are. This is done by defining a “root-directory”, which is in its most generic terminology “a place where your slides are stored”.

A root-directory can be a location on the server’s hard disk, like c:\wsi. You can instruct your slide scanner to drop off new virtual slides on a network share, and likewise point PMA.core to \\central_server\incoming_slides\. Finally, you can store long-term reference material in an AWS bucket and define a root-directory that points to the bucket. The below screenshot shows a mixture of S3- and HDD-derived root-directories in one of our installations:

After defining your root-directory, the slides are there, or they are not, and representation of them is instant. An implication of this is that you can manage your slides with the tools that you prefer, any way you want to. You can use the Windows explorer, or even the command line, should that end up being more convenient for you. Your S3 data can be managed through the AWS console, CloudBerry tools, or S3 explorer.

Security – Authentication

Another important aspect of PMA.core is access control. PMA.start is “always on”; no security credentials are checked when connecting to it. PMA.core in contrast requires authentication, either interactively through a login dialog, or automatically through the back-end API. In either case, upon success, a SessionID is generated that is used to track a user’s activity from then on.

User accounts can be created interactively through the PMA.core user interface, or controlled through use of the API. Depending on your environment, a number of password restrictions can be applied. Integration with LDAP providers is also possible.

User accounts can be re-used simultaneously in multiple applications. You can be logged in through the PMA.core user interface, and at the same time use the same credentials to run an interactive script in Jupyter (using the PMA.core interface to monitor progress).

The interface in PMA.core itself at all times gives an overview of which users are connected through which applications, and even allows an administrator to terminate specific sessions.

Security – Authorization

Our software supports authorization on top of authentication.

User permissions in PMA.core are kept simple and straightforward: a user account can have the Administrative flag checked or not, meaning that they can get access to PMA.core directly, or only indirectly through other downstream client applications like PMA.view, PMA.control or the API. Another useful attribute to be aware of is CanAnnotate, which is used to control whether somebody can make annotations on top of a slide or not. Finally, an account can be suspended. This can be temporary, or can be mandated from a regulatory point of view as an alternative for deletion.

A root-directory can be tagged either as “public” or “private”. A public root-directory is a root-directory that is available to all authenticated users. In contrast, when tagged as “private”, the root-directory has an accompanying Access Control List (ACL) that determines who can access content in the root-directory.
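The rule itself is easy to express in code. The sketch below is a toy model for illustration only; the RootDirectory class and can_access function are hypothetical names of ours and not part of the PMA.core API.

```python
from dataclasses import dataclass, field


@dataclass
class RootDirectory:
    """Toy model of a root-directory; not an actual PMA.core data structure."""
    name: str
    public: bool = True
    acl: set = field(default_factory=set)  # user names allowed when private


def can_access(user, authenticated, rootdir):
    """Public: any authenticated user. Private: only users listed in the ACL."""
    if not authenticated:
        return False
    return rootdir.public or user in rootdir.acl


teaching = RootDirectory("teaching")                               # public
clinical = RootDirectory("clinical", public=False, acl={"alice"})  # private

print(can_access("bob", True, teaching), can_access("bob", True, clinical))
```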

The screenshot below shows the Administrative and Suspended flags for my individual user account, as well as what public and private root-directories I do or do not have access to:

Future versions of PMA.core can be expected to offer CRUD granularity.

A powerful forms engine

Form data exists everywhere. Information can be captured informally, like the stain used, or as detailed as an Electronic Lab Request Form (ELRF). This is why Pathomation offers the possibility to define forms as structured and controllable data entities. A form can consist of a couple of simple text-fields, or be linked to pre-defined (ontology-derived) dictionaries. Various other Pathomation software platform components help in populating these forms, including PMA.view.

Forms can be accompanied by ACLs. In order to avoid redundancy, a form ACL consists of a list of root-directories rather than user accounts. In a project-oriented environment, it makes more sense that certain forms apply to certain root-directories which represent types of slides. Similarly, in a clinical environment, it makes sense to have slides organized in root-directories per application-type or by processing-stage. Freshly scanned slides that haven’t undergone a QA-check yet can be expected to have different form-data associated with them than FISH-slides.

On-slide annotations

PMA.core supports graphical on-slide annotations. We support three types:

  • Native annotations embedded within a vendor’s file format
  • Third-party annotations coming from non-specific (image analysis) software
  • Pathomation annotations

Pathomation-created annotations are the easiest to understand. You have a slide, and you want to indicate a region of interest on it. This region of interest can be necrotic tissue, or proliferated tumor cells. For teaching purposes, you could have a blood smear and highlight the different immune cell types.

Pathomation annotations are stored as WKT and can be anything that can be encoded in WKT (which is a lot). You need a downstream client to create them, but the basic viewer included in PMA.core can be used to visualize them, and our PMA.UI JavaScript framework can be used to create your own annotation workflows.

You could run an algorithm that does tissue detection and pre-annotates these regions for you.
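To give an idea of what such an annotation looks like on the storage side, here is a minimal sketch that encodes a region as a WKT polygon. The helper name polygon_wkt and the pixel coordinates are made up; how the resulting string is submitted to PMA.core is outside the scope of this sketch.

```python
def polygon_wkt(points):
    """Encode a list of (x, y) points as a WKT POLYGON with a single closed ring."""
    ring = list(points)
    if ring[0] != ring[-1]:
        ring.append(ring[0])  # WKT polygon rings must end where they start
    coords = ", ".join(f"{x} {y}" for x, y in ring)
    return f"POLYGON (({coords}))"


# Hypothetical rectangular region of interest, in pixel coordinates.
print(polygon_wkt([(100, 200), (400, 200), (400, 500), (100, 500)]))
```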

In addition to making your own annotations, Pathomation can be used to integrate annotations from other sources. Certain file formats like 3DHistech’s MRXS file format or Aperio’s SVS file format have the ability to incorporate annotations. If you have such slides, the embedded annotations should automatically show when viewing the slide using any Pathomation slide rendering engine.

Last but not least, we can integrate third-party annotations. Currently, we support three formats:

Third-party as well as native annotations are read-only; you cannot modify them using Pathomation software.

Even more slide metadata

What about other structured data?

We think our forms engine is pretty nifty, but we’re not so arrogant (or clueless) as to pretend that we foresee everything you ever want to capture in any form, shape, or size. It is also quite possible that a slide meta-database already exists in your organization.

For those instances where existing data stores are available, we offer the possibility to link external content. Rather than importing data into PMA.core (also a possibility actually), we allow you to specify an arbitrary connection string that points to an external resource, for example an Oracle database. Your next step is to define the query to run against this resource, along with a field identifier (which can be a regular expression) that is capable of matching specific records with individual slides.

Examples of external data sources can be:

  • Legacy IMS data repositories that are too cumbersome to migrate
  • Proprietary database systems developed as complement to lab experiments
  • Back-end LIMS/VNA/PACS databases that support other workflows in your organization
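To illustrate how a regular expression can serve as the field identifier that ties external records to slides, consider the sketch below. It is a conceptual illustration only: the case-number pattern, the file names and the record contents are all made up, and PMA.core’s own matching logic may work differently.

```python
import re

# Hypothetical field identifier: a case number such as "S21-00123" in the file name.
CASE_ID = re.compile(r"[A-Z]\d{2}-\d{5}")


def match_records(slide_names, records):
    """Map each slide name to the external record whose key appears in the name, if any."""
    matches = {}
    for name in slide_names:
        found = CASE_ID.search(name)
        matches[name] = records.get(found.group()) if found else None
    return matches


records = {"S21-00123": {"stain": "H&E"}}       # made-up rows from an external query
slides = ["S21-00123_A1.svs", "unlabeled.svs"]  # made-up slide file names
linked = match_records(slides, records)
print(linked)
```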

Do try this at home

In this post, we’ve highlighted the main features of our PMA.core “honest broker” WSI engine aka tile server aka pixel extractor aka Image Management Server (IMS).

Warning: sales pitch talk following below…

If you’ve liked interaction with PMA.start and work in an organization where slides are shared with various stakeholders, you should consider getting a central PMA.core server as well. PMA.core is the center-piece of the Pathomation software platform for digital microscopy, and whether you prefer all-inclusive out-of-the-box viewing software, or are developing your own integrated processing pipelines, PMA.core can be the ideal middleware that you’ve been looking for. Contact us today for a demo or sandboxed environment where you can try out our components for yourself.

Ok, we’re done. Seriously, PMA.core is cool. Let us help you in your quest for vendor-agnostic digital pathology solutions, and (amongst others) never worry about proprietary file formats again.

To index or not to index?

A question we get frequently from potential customers is “how do we import our slides into your system (PMA.core)?”. The answer is: we don’t. In contrast with other Image Management Systems (IMS), we opted to not go for a central database. In our opinion, databases only seem like a good idea at first, but inevitably cause problems down the road.

People also ask us “how easy is it to import our slides?”. This phrasing is probably more telling than the first, as it implies that registering slides is often not easy with other systems. It still puts us in an awkward position, as we then actually have to explain that there is no import process as such. Put the slides where you want them, and that’s it. You’re done. Finito.

Here are some of the reasons why you would want a database overlaying your slides:

  • Ease of data association. Form data and overlaying graphical annotation objects can be stored with the slide’s full path reference as a foreign key.
  • Ease of search for a specific slide. Running a search query on a table is decidedly faster than parsing a potentially highly hierarchical directory tree structure
  • Rapid access to slide metadata. Which is not the same as our first point: data association. Slide metadata is information that is already incorporated into the native file format itself. A database can opt to extract such information periodically and store it internally in a centralized table structure, so that it is more easily extracted in real time when needed.

When taken together, the conclusion is that such databases are nothing more than glorified indexing systems. Such an indexing system invariably turns into a gorilla… An 800 lbs gorilla for that matter… Let’s talk about it:

  • An index takes time to build
  • An index consumes resources, both during and after construction.
  • With a rapidly evolving underlying data structure, the index is at risk of being behind the curve and not reflecting the actual data
  • In order to control the index and not constantly having to rebuild it, a guided (underlying) data management approach may be needed
  • At some point, in between index builds, and outside the controlled data entry flow, someone is going to do something different to your data
  • Incremental index builds to bypass performance bottlenecks are problematic when data is updated

Now there are scenarios where all of the above doesn’t matter, or at least doesn’t matter all that much. Think of a conventional library catalog; does it really matter if your readers can only find out about the newest Dean Koontz book that was purchased a day after it was actually registered in the system? Even with rapidly moving inventory systems: when somebody orders an item that is erroneously no longer available from your webstore… Big whoop. The item is placed on back-order, or the end-user simply cancels the order. If you end up making the majority of your customers mad this way, then the problem is not in your indexing system, but in your supply chain itself. There’s no doubt that for webshops and library catalogs, indexes speed up search, and the pros on average outweigh the cons.

But digital pathology is different. Let’s look at each of the arguments against indexing and see how much weight they carry in a WSI environment:

  • An index takes time to build. When are you going to run it? Digital pathology was created so you can have round the clock availability of your WSI data. Around the clock. Across time-zones. Anything that takes time, also takes resources. CPU cycles, memory. So expect performance of your overall system to go down while this happens.
  • Resource (storage) consumption during and after construction. So be careful about what you are going to index in terms of storage. Are you going to index slide metadata? Thumbnails? How much data are you practically talking about? How much data are you going to index to begin with? And how much of your indexed data will realistically ever be accessed (more on that subject in a separate post)?
  • Rapidly evolving underlying data structure. Assume a new slide is generated once every two minutes, and a quantification algorithm (like HistoQC) takes about a minute to complete per slide. This means you have a new datapoint every minute. And guess which datapoint the physician wants to see now now NOW…
  • Guided data management approach. One of the great uses of digital pathology is the sharing of data. You can share your data, but others can also share theirs with you. So apart from your in-house scanner pipeline, what are you going to do with the external hard disk someone just sent you? Data hierarchies come in all shapes and sizes. Sometimes it’s a patient case; sometimes it’s toxicological before/after results; sometimes it’s a cohort from a drug study. Are you going to set up data import pipelines for all these separate scenarios? Who’s going to manage those?
  • Sometimes, somewhere, someone is going to do something different to your data. Because the above pipelines won’t work. No matter how carefully you design them. Sometimes something different is needed. You need to act, and there’s no time for re-design first. The slide gets replaced, and now the index is out-of-date. Or the slide is renamed because of earlier human error, and the index can’t find it anymore. And as is often the case: this isn’t about the scenarios that you can think of; but about the scenarios you can’t.
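The renamed-slide scenario from the last bullet is easy to reproduce. The sketch below is a toy illustration with a temporary folder and an empty placeholder file (the file names are made up): a listing captured once plays the role of the index, and after a rename it points at a file that no longer exists, while a fresh look at the storage is correct by construction.

```python
import os
import tempfile


def scan(root, extensions=(".svs", ".mrxs")):
    """List slide files by walking the directory tree; no stored index involved."""
    found = []
    for folder, _subdirs, files in os.walk(root):
        for name in files:
            if name.lower().endswith(extensions):
                found.append(os.path.join(folder, name))
    return sorted(found)


with tempfile.TemporaryDirectory() as root:
    original = os.path.join(root, "case1.svs")
    open(original, "w").close()               # empty placeholder for a slide
    index = scan(root)                        # the "index", built once
    renamed = os.path.join(root, "case_001.svs")
    os.rename(original, renamed)              # someone fixes a naming error
    stale = [p for p in index if not os.path.exists(p)]
    fresh = scan(root)                        # looking at the storage directly
    print(len(stale), fresh == [renamed])
```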

Safe to say that we think an indexing mechanism for digital pathology and whole slide images is not a good idea. Just don’t do it.

Do you DICOM?

Standardization efforts in digital pathology

Work on a standard DICOM description of digital pathology (DP) imaging data has been underway for a few years now. Digital pathology and whole slide imaging (WSI) are the focus of DICOM workgroup 26. A summary of its efforts can be found in David Clunie’s paper in the Journal of Pathology Informatics at https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6236926/

In 2014, we published our own conference paper on the effort (during the 12th European Conference on Digital Pathology in Paris, France). The abstract is available through the Researchgate website; the full presentation from the ECP Paris conference is available through https://www.slideshare.net/YvesSucaet/digital-pathology-information-web-services-dpiws-convergence-in-digital-pathology-data-sharing

The focus of this blog is on imaging. To be complete, readers interested in digital pathology standardization efforts should also have a look at the IHE PaLM initiative. Additional resources can also be found on slide 45 of our SlideShare publication.

Pathomation supports DICOM

At Pathomation, we’ve been supporting the DICOM supplement 145 file format extension (PDF available here) for a while now. We recently added our own “dicomizer” tool to our free PMA.start software. This is a command-line tool (CLI) that allows for the conversion of any WSI file format into a DICOM VL (Visible Light) Whole Slide Image IOD (Information Object Definition).

The Pathomation dicomizer is currently available on Windows only (other platforms pending) and can be installed as part of the regular setup process.

After installation, you have to navigate to the folder in which you installed the tool (typically c:\program files\pathomation\dicomizer), and then you can invoke the tool through the command-line, like this:

Running the validation tool from David Clunie on our generated DICOM slides results in the following:

We can also visualize the slides side by side in two browser windows, empirically “proving” that the DICOM output is equivalent to the original slide:

So there you have it. DICOM has been involved in the digital pathology standardization process for a while now. For those interested in supporting it, you can now use Pathomation’s free dicomizer tool to get hands-on experience.


PHP SDK & Packagist, making your life easier!

Over the past few years, dependency management tools have become a key component to successfully building high-level applications.

The number of third-party libraries (with all the dependencies involved) has grown so much during the last few years, and is expected to grow even more in the upcoming years, that managing the packages, the versions and the dependencies can become a serious nightmare for both developers and project managers. Dependency management tools therefore provide solutions to relieve them of the hassle of keeping dependencies up to date and organized, so they can focus on their main mission: building awesome applications.

To make libraries easily shareable online and available for downloads, submissions, updates, bug reporting and so on, dependency management tools naturally rely on central repositories (we can name the Maven central repository for Java, PyPI for Python… and the list goes on).

As introduced in our previous posts, our Java SDK is already available on Maven, our Python SDK is also available on PyPI, so the next move logically is to add the PHP SDK to a central repository for PHP to make it easily available for the public…but what do we have for PHP?

For PHP, the dependency management tool to go for is Composer. As pointed out earlier, it relies (like other dependency management tools) on its own central repository, Packagist.

Installing Composer

Like most dependency management tools, Composer is available via a command-line interface, which offers far more powerful possibilities than a GUI.
To be able to execute Composer commands, you first need to have it installed on your computer. For more information, you can refer to install composer.

Once Composer is successfully installed, you can check if your environment is well set up via the following command:

Your first sample code

The next step is to create sample code that calls the method getVersionInfo() from class Core in pma-php (once it is installed via Composer, obviously).

To install pma-php via Composer in a PHP project, you simply issue the following command in the project folder:

As you may have noticed, we didn’t specify a version to install, so Composer automatically installed the latest release (v2.0.0.40).

Targeting a specific release is straightforward via this command:

For the list of available releases of the pma-php library, you can refer to its official Packagist page.

Once the pma-php library is successfully installed in a project folder, the following files/folders are created:

The next step is to create a PHP file that calls the method getVersionInfo() of class Core in pma-php:


<?php

require_once 'vendor/autoload.php';
use Pathomation\PmaPhp\Core;

echo Core::getVersionInfo();

Running this script returns value “2.0.0.1346”, which is the version of PMA.start installed on my own computer.

That’s all it takes! Easy, isn’t it?

Run & share your Python code online with Jupyter notebooks

If you’ve had a look at previous posts on our Python SDK, you already have a pretty good idea about the scripts you can run against PMA.start (or PMA.core), such as retrieving image data and slide visualization.

Most of the time, running code locally is all that’s needed. However, there are situations where being able to run it online and share it with collaborators (students, colleagues…) on the other side of the globe can come in very handy and offer a richer and more interactive experience.
There are a couple of free/paid services to run and share Python code online. Our focus in this tutorial will be on Azure Notebooks, powered by Jupyter.

What are Azure notebooks & Jupyter?

As stated in the official documentation, Azure Notebooks is a free service for anyone to develop and run code in their browser using Jupyter. Jupyter is an open source project that enables combining markdown prose, executable code, and graphics onto a single canvas.

Azure Notebooks currently supports Python 2, Python 3, R and F# and their popular packages (e.g for Python the Anaconda distro is preinstalled), but our focus for the moment will be on Python 3 since it’s the minimum required version for the Python SDK

Your first Azure notebooks project

To create your first Azure notebooks project, navigate to the home page,
click on Try it now, then log in (with any Microsoft, Gmail,… email address).

Once successfully logged in, navigate to section My Projects

Click on button New Project :

Enter a Project name and an ID, then click on Create. (It’s worth mentioning that you have to set the project as Public if you wish to be able to share your notebooks with other users.)

Click on the newly created project :

Notebooks are organized by project. This makes it easy to create separate projects depending on the scripts, the target hosts, and the target audience you want to share notebooks with.

To create a new notebook, click on the menu, then select Notebook. Add a name for the notebook and select your Python 3 version (either 3.5 or 3.6, as both are compatible with the Python SDK).

Notebooks are organized into cells; each cell can contain one or multiple Python scripts to execute.

The first cell should always be the following one, as the pma_python package has to be installed on the running server before you can interact with the Python SDK.
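The original cell is not reproduced in this text, so what follows is only a minimal sketch of what such an installation cell could look like. It assumes the package is published on PyPI under the name pma_python; the helper names pip_install_command and install are made up for this illustration.

```python
import subprocess
import sys


def pip_install_command(package="pma_python"):
    """Build a pip command that targets the interpreter running this notebook."""
    # Using sys.executable makes sure the package lands in the same
    # environment as the kernel, not in some other Python on the server.
    return [sys.executable, "-m", "pip", "install", package]


def install(package="pma_python"):
    """Run pip in a subprocess; raises CalledProcessError if the install fails."""
    subprocess.check_call(pip_install_command(package))


# Show the command that would be executed. In the first cell of a
# notebook you would call install() instead of just printing it.
print(" ".join(pip_install_command()))
```

Calling install() once at the top of the notebook should be enough; subsequent cells can then import from pma_python as usual.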

From there on you can add as many cells as you wish to interact with the Python SDK, using the scripts we introduced in previous posts or ones you create yourself.

Once your notebook is modified and saved, you can easily share the created project with other users via the share button on the My Projects page.

The terminal

In addition to its intuitive and easy-to-use interface, Azure notebooks provides access to a complete terminal running on the server. To access the terminal, first click on the Jupyter icon in the upper left-hand corner of your notebook server. Then click on the New button on the upper right-hand side of the notebook list. Finally, click Terminal.

Your newly created notebook is accessible on the running server, and you can execute shell commands there, which can be useful for downloading data, copying files, inspecting processes, or editing files with traditional Unix tools.

What whole slide images (WSIs) are made of

Whole Slide Images

If you already know about pyramidal image files, feel free to skip this paragraph. If you don’t, stick around; it’s important to understand how microscopy data coming out of slide scanners is structured to be able to manipulate it.

It all starts with a physical slide: a physical slide is a thin piece of glass, with the dimensions

When a physical slide is registered in a digital fashion, it is translated into a 2-dimensional pixel matrix. At a 40X magnification, it takes a grid of 4 x 4 pixels to represent 1 square micrometer. We can also say that the image has a resolution of 0.25 microns per pixel. This is also expressed as 4 pixels per micron (PPM).

All of this means that in order to represent our 5 cm x 2 cm physical specimen from the first part of this tutorial series at 40X resolution, we need (5 * 10 * 1000 * 4) * (2 * 10 * 1000 * 4) = 200k x 80k = 16B pixels.
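The arithmetic above is easy to reproduce. The sketch below simply restates the calculation from the text; the function name pixels_for_slide is our own choice for this illustration.

```python
def pixels_for_slide(width_cm, height_cm, pixels_per_micron=4):
    """Return the (width, height) in pixels for a specimen of the given size."""
    microns_per_cm = 10 * 1000  # 10 mm per cm, 1000 micrometers per mm
    width_px = width_cm * microns_per_cm * pixels_per_micron
    height_px = height_cm * microns_per_cm * pixels_per_micron
    return width_px, height_px


# The 5 cm x 2 cm specimen at 40X (4 pixels per micron)
width_px, height_px = pixels_for_slide(5, 2)
print(width_px, "x", height_px, "=", width_px * height_px, "pixels")
```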

Now clearly that image is way too big to load in memory all at once, and even with advanced compression techniques, the file size is still roughly one gigabyte per slide. So what people have thought of is to package the image data as a pyramidal stack.

Pyramid of Cestius as a metaphor for pyramidal stack images. By Francesco Gasparetti from Senigallia, Italy – Piramide Cestia, CC BY 2.0, https://commons.wikimedia.org/w/index.php?curid=2614848

Ok, perhaps not that kind of pyramid…

But you can imagine a very large image being downsampled a number of times until it reaches a manageable size. You just keep dividing the number of pixels along each dimension by two, and eventually you get a single image that still represents the whole slide, but is only maybe 782 x 312 pixels in size. This then becomes the top of your pyramid and we label it as zoomlevel 0.

At zoomlevel 1, we get a 1562 x 624 pixel image etc. It turns out that our original image of 200k x 80k pixels is found at zoomlevel 8. Projected onto our pyramid, we get something like this:

Worked out example (showing the different zoomlevels) of a pyramidal stack for a concrete whole slide image.

So the physical file representing the slide doesn’t just store the high-resolution image; it stores a pyramidal stack with as many zoomlevels as needed to reach the deepest level (or highest resolution). The idea is that depending on the magnification that you want to represent on the screen, you read data from a different zoomlevel.
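To get a feel for the numbers, the dimensions at each zoomlevel can be computed by repeatedly halving the full-resolution image. This is only a sketch: rounding up at each level is our assumption, and actual file formats may round differently, so the results can be a pixel off from the approximate figures quoted above.

```python
def pyramid_levels(full_width, full_height, max_zoomlevel):
    """List (zoomlevel, width, height) from the top of the pyramid down."""
    levels = []
    for zoomlevel in range(max_zoomlevel + 1):
        # Each step up the pyramid halves both dimensions.
        factor = 2 ** (max_zoomlevel - zoomlevel)
        # Ceiling division, so a partially filled last pixel is kept.
        width = -(-full_width // factor)
        height = -(-full_height // factor)
        levels.append((zoomlevel, width, height))
    return levels


# The 200k x 80k image from the text sits at zoomlevel 8
for zoomlevel, width, height in pyramid_levels(200000, 80000, 8):
    print(f"zoomlevel {zoomlevel}: {width} x {height}")
```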

Tiles

The pyramid stack works great up to a certain resolution. But pretty quickly we get into trouble, and the images once again become too big to be shown in one pass. And of course, that is eventually what we want to do: look at the images in their highest possible detail.

In order to work around this problem, the concept of tiles is introduced. The idea is that at each zoomlevel, a grid overlays the image data, arbitrarily breaking the image up in tiles. This leads to a representation like this:

Now, for any area of the slide that we want to display at any given time to the end-user, we can determine the optimal zoomlevel to select from, as well as a select number of tiles that are sufficient to show the desired “field of view”, rather than asking the user to wait to download the entire (potentially huge!) image. This goes as follows:

Or, put the other way around (from the browser’s point of view):

So there you have it: whole slide images are nothing but tiled pyramid-shaped stacks of image data.
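The tile selection described above can be sketched in a few lines. This is an illustration only and not how PMA.core does it internally: the 512-pixel tile size, the function name tiles_for_view, and the convention that the field of view is given in pixel coordinates of the chosen zoomlevel are all assumptions.

```python
def tiles_for_view(x, y, width, height, tile_size=512):
    """Return the (column, row) indices of all tiles overlapping a field of view.

    x, y, width and height are pixel coordinates at the selected zoomlevel.
    """
    first_col = x // tile_size
    first_row = y // tile_size
    # Subtract 1 so a view ending exactly on a tile border
    # does not pull in the next, untouched tile.
    last_col = (x + width - 1) // tile_size
    last_row = (y + height - 1) // tile_size
    return [
        (col, row)
        for row in range(first_row, last_row + 1)
        for col in range(first_col, last_col + 1)
    ]


# A full-HD viewport somewhere in the middle of a zoomlevel
needed = tiles_for_view(1000, 600, 1920, 1080)
print(len(needed), "tiles needed:", needed)
```

Only these tiles have to travel over the network, no matter how large the slide is at that zoomlevel.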

A look at PMA.view

Architecture

Apart from PMA.start, Pathomation also offers a professional range of products. Yes, professional is a euphemism for “not free”, but we do feel you get quite some value in return. And some of the money flows back to our developers so they can also keep working diligently on improving PMA.start, and the free product offering around it, including our SDKs and software plugin for ImageJ/FIJI.

At the core of it all always sits PMA.core. Even PMA.start runs on top of PMA.core, albeit a restricted version that can only access local data on your personal system; hence the name PMA.core.lite. The professional version, PMA.core, can do loads more, including making annotations, capturing form metadata, and tracking user activity in a 21 CFR Part 11 compliant manner. PMA.core has been validated in conformance with GAMP 5 guidelines.

In a different article on this blog we explained how big (and why!) these whole slide images get. PMA.core then is responsible for extracting tiles from the original images when the user wants them. These tiles can be extracted via one of our language-specific SDKs, or end-users can use viewer software built on top of PMA.core that understands how user (mouse) operations need to be translated into tile requests.

At Pathomation, our viewer software is PMA.view. Like PMA.core, it comes in two flavors: PMA.view and PMA.view.lite. The distinction is made in order to provide better interaction with the respective underlying versions of PMA.core. One could also say that PMA.start as a product is the combination of PMA.view.lite and PMA.core.lite. PMA.view in turn interacts with (multiple instances of) PMA.core.

As you might suspect, Pathomation also offers other applications next to PMA.view that are also built on top of PMA.core. But that’s the focus of a different post (sneak preview of what we mean through our YouTube channel).

PMA.view features

Below is a screenshot of PMA.view. The main elements of the user interface are a ribbon, a central viewing panel for slide visualization, and two side panels, which in turn may contain one or more sections.

The content of the ribbon (as well as the number of tabs etc.) is completely configurable through an XML file. Similarly, the content and sections of the side panels are configurable through XML configuration files. Editors for all are provided in the PMA.view administrative interface. Syntax highlighting and restore options are provided as well.

The central viewport for slide viewing is a Zooming User Interface (ZUI); you navigate slides by panning left and right, up and down, and by zooming in and out. You can use the mouse scrollwheel or drag a rectangle with the mouse while holding down the shift-key (on your keyboard) to zoom in on a specific area of your choosing.

In the left panel you typically see a navigation tree, representing the slides hosted by PMA.core. PMA.view can connect to multiple PMA.core instances simultaneously. This is useful when involved in international collaboration, or even in a situation where you have a central hospital hub with several smaller satellite offices spread throughout a region. Just put a tile server in each location to prevent having to transport (digital or – worse – physical) slides around.

Apart from convenient slide management across multiple sites, PMA.view offers many other features that people have come to expect from modern slide viewers, including:

  • Capture structured or free text meta-data
  • Seamless support for different scanning modalities (brightfield, fluorescence, and z-stacking)
  • Brightness and contrast controls
  • On-slide annotations in arbitrary colors and shapes (rectangle, circle, freehand etc.)
  • Annotation toggling based on type and author

Slide sharing

One of the big selling points of digital pathology is sharing slide content, without the need to physically distribute the slides via regular mail. Apart from the obvious improvement this brings regarding speed, there’s a secondary advantage that you can send a slide to multiple parties at the same time. The third advantage is that the slides can’t get lost in the mail or damaged during transport anymore. In return for that of course, we occasionally encounter over-eager spam filters.

Two important impediments that prevent slide sharing however are the following:

  • When I share a slide with you, I have to make sure to specify which file format I’m sharing with you, so you can get the appropriate viewer
  • When I share a slide with you, I have to upload a LOT of data to WeTransfer, Aspera, or a good old-fashioned FTP site, where you in turn can download… a LOT of data… again.

Pathomation’s PMA.core and PMA.view combo solves both problems for you. PMA.core abstracts any proprietary slide file format to “just” pixels, and PMA.view allows you to share slides with a counter-party in the form of HTML hyperlinks.

How does this work? PMA.view has a dedicated “Share” button on its ribbon to create links that point directly to selected content. There are different kinds of content that you can share:

  • You can share all slides in a selected folder, thus mimicking a patient case
  • You can share an individual slide
  • You can share a pre-selected region of interest within a slide

Share links are always formatted the same, but they can be used in multiple ways. You can:

  • Use links directly as they are. You can share them with your buddies via email, during a Skype chat session, WebEx, GoToMeeting, whatever.
  • Convert links into scannable QR codes. When you’re giving a presentation during a conference, or in a classroom setting, text-based links are cumbersome to present. Ironically, text-based links are not well suited for print media, either, for the same reason: it’s too easy to make a typo when copying the link character by character. It’s more convenient to present a QR code that people can scan with their smartphones or tablets, and immediately view, or convert into the actual text link for use elsewhere.
  • Embed them into your own web-content. If you still have an actual website, that is, a place on a server somewhere where you deposit your own HTML code, you can now sprinkle live slides throughout the site and have them embedded in an <iframe> tag. Because not everybody knows how these work, PMA.view will give you the necessary HTML code that you can paste directly into your own website. You’ll notice that within the HTML snippet, the plain old original link from above resurfaces. And it gets better: Whether you use plain old notepad to make your website in the traditional sense, or you use a CMS like WordPress or Drupal, an LMS like Canvas, Moodle, or Blackboard, or a social media platform: these too boil down to sending HTML code to the browser, so there’s usually a way to use <iframe>s there, too.
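To illustrate the embedding option, here is a rough sketch of how a share link could be wrapped in an <iframe> tag yourself. The exact snippet that PMA.view generates is not shown here and may well use other attributes; the width and height values and the function name iframe_snippet are assumptions made for this example.

```python
from html import escape


def iframe_snippet(share_link, width=800, height=600):
    """Wrap a share link in a minimal <iframe> tag."""
    # Escape the link so characters such as & or " cannot break the attribute.
    safe_link = escape(share_link, quote=True)
    return '<iframe src="{}" width="{}" height="{}"></iframe>'.format(
        safe_link, int(width), int(height)
    )


# Placeholder link; a real one comes from the Share button in PMA.view
print(iframe_snippet("http://yourserver/view/Embed/abc123"))
```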

Remember the Zooming User Interface (ZUI) terminology we introduced you to earlier? Well, last but not least, when you click on a PMA.view slide-link, you’re essentially instantiating our ZUI. There are no plugins required, nothing to download, it’s just all basic JavaScript and HTML 5. As a consequence, it’s also easy to configure the layout of the ZUI. And that’s what the last set of options at the bottom of the share dialog is about.

How do you want your audience to experience your slide when they go to it? Do you want them to see the barcode? The overview?… It’s all in your hands, and we think this level of control and flexibility is pretty awesome.

Organizing pipelines

So as awesome as we think ourselves to be, there’s always room for improvement, right? Here’s a scenario that a customer of ours came across recently:

  • We have a large number of slides that we want to embed throughout various pages of our proprietary customer portal. We like the PMA.view slide embedding <iframe> capability, but it’s really a pain to generate all these links one by one. Because there are so many, it’s also rather tedious making sure that they are ALL clicked on.

Is there a better way? Yes, there is.

When you look at the links that are generated, it’s not rocket science to figure out how they’re built. The customer wanted to have a link to a thumbnail of a slide, which always looks like this:

http://yourserver/view/EmbedThumbnail/{seemingly random characters}

As well as a link to the actual slide ZUI, which always looks like this:

http://yourserver/view/Embed/{seemingly random characters}

The character string at the end of these links is a particular slide’s unique identifier (UID). When we switch over to our PHP SDK, we can write just a few lines of code that gets all UIDs from all slides in a particular directory:

<?php
require_once "lib_pathomation.php";
?>
<html><head><title>All thumbnails for all slides</title></head><body>
<?php
$session = connect("http://yourserver/core", "username", "secret");
echo "SessionID for universal viewer account = ".$session."<br>";
foreach (getSlides("rootdir/subdir", $session) as $slide) {
    echo "<h3>$slide</h3>";
    $uid = getUID($slide, $session);
    $thumb = "http://yourserver/view/EmbedThumbnail/$uid";
    echo "<a href='$thumb'><img border=0 src='$thumb' height='50' align='left' /></a>";
    echo "<tt>$thumb</tt><br />";
    echo "<br clear='all'>";
}
?>
</body></html>

 

Of course, you can modify this script any way you want to accommodate your particular directory hierarchy and structure.

Then, it was just a matter of simple string concatenation to provide the client with a custom website where they were able to retrieve all of the links to their slides in batch. As the page interacts with PMA.core directly at that point,

So, for our client, we figured out how to organize a pipeline to facilitate their content production process. We use our PHP SDK on top of PMA.core to generate links that in turn exploit the slide sharing capabilities of PMA.view. Now that’s cool!
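The string concatenation step is language-agnostic. As a sketch, this is roughly what it looks like in Python, using the two link patterns shown earlier. The UIDs below are made-up placeholders (real ones come from PMA.core), and the function names embed_links and link_table are our own.

```python
def embed_links(server, uid):
    """Build the thumbnail and viewer links for one slide UID."""
    base = server.rstrip("/")  # tolerate a trailing slash on the server URL
    return {
        "thumbnail": f"{base}/view/EmbedThumbnail/{uid}",
        "viewer": f"{base}/view/Embed/{uid}",
    }


def link_table(server, uids):
    """Return one line of HTML per slide: a clickable thumbnail."""
    rows = []
    for uid in uids:
        links = embed_links(server, uid)
        rows.append(
            f"<a href='{links['viewer']}'><img src='{links['thumbnail']}' height='50' /></a>"
        )
    return "\n".join(rows)


# Placeholder UIDs for illustration only
print(link_table("http://yourserver", ["ABC123", "DEF456"]))
```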

But we still want more

Do you have a scenario that you have difficulties with or want to see optimized? Let us know; we’ll be happy to talk to you.

 

Slide visualization in Python

Now that both PHP and Java have methods for embedded slide visualization, we can’t leave Python out. Originally we didn’t think there would be much need for this, but it’s at least confusing to have certain methods available in one version of our SDK, while not in others.

In addition, interactive visualization is definitely a thing in Python; just have a look at what you can do with Bokeh. Ideally and ultimately, we’d like to add digital pathology capabilities to an already existing framework like Bokeh, but in this blog post we’ll just explore how you can embed a slide into your IPython code as is.

As PMA.python is not a standard library, it pays to start your notebooks with the necessary detection code for the library. If it doesn’t work, it’s bad manners to leave your users in the dark, so we’ll provide some pointers on what needs to be done, too:


try:
    from pma_python import core
    print("PMA.python loaded: " + core.__version__)
except ImportError:
    print("PMA.python not found")
    print("You don't have the PIP PMA.python package installed.\n"
        + "Please obtain the library through 'python -m pip install pma_python'")

If all goes well, you’ll see something like this:

Once you’re assured the PMA.python library is good to go, you should probably verify that you can connect to your PMA.core instance (which can be PMA.start, too, of course; just leave the username and password out in that case):


server = "http://yourserverhere/pma.core/"
user = "your_username"
pwd = "your_password"
slide = "rootdir/subdir/test.scn"
session = core.connect(server, user, pwd)
if (session):
    print("Successfully connected to " + server + ": " + session)
else:
    print("Unable to connect to PMA.core " + server + ". Did you specify the right credentials?")

If all goes well, you should get a message that reads like this:

Successfully connected to http://yourserverhere/pma.core

Finally, the visualization part. Note that Pathomation provides a complete front-end JavaScript framework for digital pathology. In order to bring these capabilities into (I)Python then, it is sufficient to write some encapsulation code around this basic demonstration code:


def show_slide(server, session, slide):
    try:
        from IPython.display import display, HTML
    except ImportError:
        print("Unable to render slide inline. Make sure that you are running this code within IPython")
        return
    
    render = """
        <script src='""" + server + """scripts/pma.ui/pma.ui.view.min.js' type="text/javascript"></script>

<div id="viewer" style="height: 500px;"></div>
<script type="text/javascript">
            // initialize the viewport
            var viewport = new PMA.UI.View.Viewport({
                    caller: "Jupyter",
                    element: "#viewer",
                    image: '""" + slide + """',
                    serverUrls: ['"""+ server + """'],
                    sessionID: '""" + session + """'
                },
                function () {
                    console.log("Success!");
                },
                function () {
                    console.log("Error! Check the console for details.");
                });
        </script>"""
    display(HTML(render))

Our method is a bit bulkier than strictly needed; it’s robust in the sense that it verifies that it is actually running in an IPython environment like Anaconda, and it also writes output to the JavaScript console in case the slide can’t load for some reason.

Rendering a slide inline within a Python / Jupyter notebook is now trivial (just make sure you ran the cell in which you define the above method, before invoking the following piece of code):


show_slide(server, session, slide)

The result looks like this:

There is never an excuse not to use exploratory data analysis to get initial insights into your data. As switching environments, browsers, screens… can be tedious, and notebooks are meant to encapsulate complete experiments, interactive visualization of select whole slide images may just be one more thing you want to include.

The .ipynb file can be downloaded here and used as a starting point in your own work.

By studying the PMA.UI framework, you can learn more about how to further modify and customize your interactive views.

Now, anybody out there who wants to pick up our Bokeh challenge?