More business intelligence

The story of the three Qs

In this article, we explain how Pathomation was recently able to assist one of its customers with Performance Qualification tests for a new slide scanner.

At our customer’s lab, each piece of equipment that is taken into operation goes through a rigorous qualification pipeline before it is put to work in day-to-day lab activities.

This goes for slide scanners as well.

The qualification pipeline (the “validation procedure”) consists of three steps:

  • Installation Qualification (IQ)
  • Operational Qualification (OQ)
  • Performance Qualification (PQ)

The first step is straightforward: you’re asking yourself if the scanner can scan your slides or not. Is the type of glass slide that your lab uses, in combination with the coverslip, compatible with the new scanner? Is the barcode on the slide label scanned properly, too? What about (automated) tissue detection?…

The second step is operational qualification: can the new equipment handle the edge parameters of your day-to-day operations? If you plan to feed the scanner up to 400 slides per day, will that work?

The third step in the validation can be somewhat subjective, and it’s here that Pathomation’s software platform came in particularly handy.

Performance Qualification

Performance Qualification (PQ) is a test procedure that takes place in order to verify if a new piece of equipment is good enough to be put to work in the daily workflow of a company. It happens after the hardware has already gone through IQ and OQ.

Our customer recently wanted to know how their various scanners compare to a new one in terms of speed. They took a representative set of slides, divided it into three groups (“requests”), and then put each scanner to work.

In the vendor’s viewer software, one of the parameters that could be seen was “scanning duration”. Considering the size of the dataset, however, opening each slide individually in the viewer software, noting down the scanning time, recording it in an Excel sheet etc. would have been a very tedious task.

So they turned to Pathomation for help: Can Pathomation’s software be used to create a table with all scan duration values for all scanned slides?

Our API

Pathomation’s API offers a get_slide_info() call. We used it in our first article on business intelligence to extract specific bits of information. By default, the method returns a nested hierarchy of information.

But not every scanner exports the same information. Some of the slide information we expose ourselves through ImageInfo, some not. This is because vendor 1 exposes (A, B, C, D, E), vendor 2 exposes (A, B, F, G, H), vendor 3 exposes (A, B, C, I, J, K, L) etc. Therefore, Pathomation offers the common denominator information only, something like (A, B, C).

In order to maintain some standardization across the different vendors, part of the information returned by get_slide_info is a MetaData array that contains key-value pairs that may or may not be provided by your scanner vendor.

As it turns out, we didn’t have the scanning time in there yet, but because of the already provided structure, it was straightforward to add it.

As in our first business intelligence exercise, we then wrote a wrapper method to extract the scanning duration as we need it:

def get_scanning_duration(slide_ref):
    # Look up the slide's metadata and return its scanning duration, or -1 if absent
    info = core.get_slide_info(slide_ref)
    meta = info["MetaData"]
    for meta_el in meta:
        if meta_el["Name"] == "ScanningDuration":
            return meta_el["Value"]
    return -1
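
If you need more than one metadata field, it can be handy to flatten the whole MetaData array into a dictionary once. The sketch below assumes only the structure visible in the wrapper above (a list of entries with "Name" and "Value" keys). The helper name metadata_to_dict and the sample values are made up for illustration and do not come from a real slide.

```python
def metadata_to_dict(info):
    """Flatten the MetaData list of {"Name": ..., "Value": ...} entries into a dict."""
    return {el["Name"]: el["Value"] for el in info.get("MetaData", [])}

# Hypothetical slide info, shaped like the structure used by get_scanning_duration()
sample_info = {
    "MetaData": [
        {"Name": "ScanningDuration", "Value": 93},
        {"Name": "Scanner", "Value": "example-scanner"},
    ]
}

meta = metadata_to_dict(sample_info)
# .get() lets us choose a fallback for fields a vendor does not provide
print(meta.get("ScanningDuration", -1))
print(meta.get("BarcodeQuality", -1))
```

The dictionary form keeps the "may or may not be provided" nature of the metadata explicit: every lookup has to state what happens when the key is missing.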

The remainder of the script is straightforward:

  • Loop over the output of the different scanners
    • For each scanner (“request”), get all the slides (recursively in our case)
      • For each slide, extract the scanning duration
  • Wrap all output into a Pandas DataFrame structure
  • Export the DataFrame to a spreadsheet

A word about that last step. On occasion, people have called us old-fashioned for this one. Surely Excel is spread wide and far enough by now so that it can be considered a de facto “standard” file format, too, can’t it?

I disagree. I still prefer to use CSV instead of Excel. Why? Because CSV files are simple, and portable to many other platforms and applications. Our data in this case consists of a single table with three columns. It’s simple. We don’t need a complex data format to store this kind of data.

Generating fancy file format output is not part of the assignment here. Keep It Simple.

Our final code looks like this and can be downloaded as a Jupyter notebook, so you can play around with it yourself.


from pma_python import core
import pandas as pd

server = "http://***/***"
user = "***"
pwd = "***"
print("Session initiated ", core.connect(server, user, pwd))
requests = ["RQ105", "RQ204", "RQ695"]
base_folder = "Images/pq"
print(len(core.get_directories(base_folder)), " subfolders detected in root base folder", base_folder)
s_times = []
for req in requests:
    print(base_folder + "/" + req)
    for slide in core.get_slides(base_folder + "/" + req, recursive=True):
        s_times.append({"slide": str(slide).replace(base_folder, ""), "scan_time": get_scanning_duration(slide), "request": str(req)})
scan_times = pd.DataFrame(s_times, columns=["request", "slide", "scan_time"])
scan_times.to_csv("scanning duration.csv")

What about the results?

We can import the resulting CSV file in Excel and see what it looks like:

As we’re interested in comparing the different scanners to one another, one direct way to do this is with a pivot-chart.
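
If you would rather stay in Python than build the pivot in Excel, a comparable per-request summary can be sketched with the standard library alone. The CSV text below is a made-up stand-in for the exported file (the scan times are invented, not our customer’s results), using the same three columns as our export. Slides for which get_scanning_duration() returned -1 are skipped.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Invented sample data with the same columns as the exported file
csv_text = """request,slide,scan_time
RQ105,/RQ105/a.svs,100
RQ105,/RQ105/b.svs,120
RQ204,/RQ204/c.svs,90
RQ204,/RQ204/d.svs,110
RQ695,/RQ695/e.svs,150
"""

def summarize_scan_times(handle):
    """Group scan times per request; skip slides without a duration (-1)."""
    per_request = defaultdict(list)
    for row in csv.DictReader(handle):
        scan_time = float(row["scan_time"])
        if scan_time >= 0:
            per_request[row["request"]].append(scan_time)
    # For each request: number of slides, mean duration, longest duration
    return {req: (len(t), mean(t), max(t)) for req, t in per_request.items()}

summary = summarize_scan_times(io.StringIO(csv_text))
for req, (count, avg, longest) in sorted(summary.items()):
    print(f"{req}: {count} slides, mean {avg:.1f}, max {longest:.1f}")
```

To run this on the real export, pass an open file handle instead of the in-memory text, for example open("scanning duration.csv", newline="").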

These are the results. What do you think? Do all three scanners perform equally? Does the new scanner pass performance qualification (PQ)?

Your challenge here

This is how Pathomation works with its customers.

Our philosophy has been and remains to develop locally, and then scale as you handle more complex scenarios. You do not need the commercial version of PMA.core to get to work with the code in this post. The Jupyter notebook that comes with this blog post is suitable for use with both PMA.start and PMA.core.

Do you use Pathomation software in your daily workflows already? Tell us your business intelligence challenge and perhaps we’ll address it in an upcoming post!

A look at PMA.slidebox

So what if you “just” want to show your slides?

Education is one of the main application domains of digital pathology. And there are many instances where you just have a couple of slides that you want people to look at. When Pathomation first became involved in deploying its software for facilitating seminars, we used PMA.view.

But while it’s possible to do this, PMA.view is not a good solution for this particular problem:

  • People still need to log in to PMA.view
  • PMA.view requires telling people where to navigate to (which root-directories / paths); your root-directories on the PMA.core side of things may not exactly reflect the content that you want people to see.
  • PMA.core uses folder-based navigation, and PMA.view does too. This means that the concept of a case (a group of slides belonging to a patient or experiment) is not intuitively represented
  • Neither PMA.view nor PMA.core supports any of the visual cue elements that we’ve all gotten accustomed to in recent years, such as avatars.
  • The learning curve of PMA.view is still too steep for people who just need to look at slides. PMA.view is overkill for what you want people to actually experience

How did people use to do it? They traveled to conferences with a slidebox in their hand luggage. Within the slidebox: neatly organized slides, sorted by case. We don’t want to be sensational here and say that the slides got lost all the same, or that they broke all the time, or got confiscated by security in a post-9/11 world (glass slides + ninja pathologist = impromptu shuriken?).

But: things could happen when traveling with physical slides, and the most likely issue was probably still somebody forgetting to take their slides with them in the first place!

Also: when traveling with physical slides you depend heavily on the organizers’ ability to provide and calibrate multi-headed microscope equipment. As more people attend, aligning all the optics becomes ever harder, and for large groups this is just impractical.

Cue PMA.slidebox

So we got thinking… If people are used to physical slideboxes, why not just make a virtual slidebox? This is exactly what PMA.slidebox is and does!

PMA.slidebox, like all our software, relies on PMA.core. It means that you can use all the great features of PMA.core (different root-directories, access control), without having to explain it to your audience.

All your audience sees, without having to register, log in, install or download anything (zero footprint), is this:

So how does it work? PMA.slidebox shows up to four collections in the top-left corner of the screen (screenshot only shows three). When you select a collection, you see the “cases” appear underneath it, along with the slides for each “case”.

We put the word “case” between quotes deliberately because you don’t have to set it up this way. You can have a simple list of slides without any hierarchy or structure to it, and PMA.slidebox will pick it up. Similarly, if you only have a couple of cases, you could turn those into individual collections, and present them as such.

What you want to show and how you want to show it is completely up to you. It’s just like a real slidebox: you put the slides in that you want, and you organize them the way you want them, too.

Want to give it a go yourself? Here’s an example of how such a virtual slidebox works in practice: http://host.pathomation.com/p0022_slidebox/

Configuration

PMA.slidebox is flexible. It is hosted on a website, somewhere (can be on your infrastructure, or on ours). If you don’t have PHP, we can configure it for you; but if you do, you can configure everything by yourself via a configuration panel.

What you first need to do is decide how you want to have everything structured. You can build a hierarchy of up to three levels deep, with the following levels:

As you have only 4 cases in the above screenshot, you could simplify your hierarchy like this:

While PMA.slidebox is flexible, we should point out that it is necessary to have some kind of hierarchy at least. You cannot just dump all your slides into a single folder and expect the software to figure it out from there.

Note: if you do have large repositories of slides, and you want structured case creation and organization, you should have a look at PMA.control.

Here’s what the setup looks like when you spread your cases across only two collections:

And here’s what that same group of slides looks like, but this time with each case being defined as its own separate collection:

In closing

With our PMA.slidebox product, Pathomation solves the problem of mass-distribution of slide collections. When you just want to share your slides with people in a somewhat organized (collection and cases) fashion, PMA.slidebox is the perfect solution for you. The easy to use configuration panel behind the front-end makes it a breeze to point to the exact content that you want to display, and under no circumstances does the end-user have to do anything else except click on the URL that you provide them with.

Find out more about PMA.slidebox at our website at http://www.pathomation.com/pma-slidebox.

Pathomation on the web

Web presence

We want to talk a bit more about our different communication channels this week.

When you’re reading this article, you’ve obviously found one of them.

The RealData blog at http://realdata.pathomation.com is a WordPress website that we set up a couple of years ago to allow us to communicate about or explain topics that don’t necessarily have a dedicated place yet on our “main” company website at http://www.pathomation.com.

Pathomation is a small company, and things can move quickly. We simply don’t have time to re-do our website each month or so because our product offering changes, or because there’s a spurt in creative writing that needs to find a landing spot and reach an audience. A free-form blog then seemed like a good idea.

And we still think it is 😊

There are companies whose website is a blog, but we do think there’s still a need to offer structured information and a general product overview as well.

So while you can’t constantly rewrite your website, we did manage to re-work http://www.pathomation.com this month and we’re pretty proud of the result. If you haven’t checked it out yet, go ahead and do so. It’s a lot more comprehensive than anything we’ve had up before.

And if you’ve read this blog, of course you’ve heard about PMA.start before, our free whole slide image / digital pathology viewer software that can be used by anybody for anything to manage their local slide content. Our http://free.pathomation.com website is the third axis of our online web-presence strategy.

PMA.start comes with no limitations, except the one that is built-in: you can only use it on local content. If you want to share data with colleagues via a network, you need to upgrade to our professional PMA.core product. If you’re not quite sure what that’s all about, you can still sign up for our beta program until the end of this month (just a few days left, so be quick).

Check out our beta landing page at https://www.pathomation.com/beta

And there you have it: our three-pronged strategy to provide you, our valued customer and end-user, with background information about the Pathomation universe and our great products.

A look at PMA.control

Educational needs and virtual microscopy

Pathomation’s software platform includes software for a number of digital pathology applications.

At the most basic end of the spectrum we offer PMA.start for local viewing and research purposes. Even with PMA.start, you get full access to our front-end JavaScript-based visualization framework and back-end automation API (more on those in a separate blog post).

At the opposite end of the spectrum, we offer a sophisticated training software package called PMA.control. Consider this:

  • Current slide-based approaches to microscopy teaching face the logistical challenge of transporting people, slides and training material.
  • The size, location and number of instructional sessions are limited (in time, place, and size)
  • Concurrent training on the same material is not possible. One microscope can offer one unique slide. Musical chair… erh… microscopes, anyone?

We identified that a need has evolved to train students and professionals alike to accurately evaluate tissue material with a broad range of (assay-specific) algorithms. Systematically organizing training materials and bringing training participants together in a virtual setting is what it’s all about.

Projects

You start in PMA.control by setting up a project. A project describes what it is that you want to organize instructional material around. A project can represent a course at a university, or a drug for a pharma company. A project can delineate a geographical territory. It’s totally up to you.

Projects have various properties. Apart from their name, you can identify them with an icon. This is convenient as your list of projects becomes larger and more people become involved in your project. Speaking of involved people: you can identify one or more project managers. This is particularly useful for larger organizations, where one person is seldom available all the time, but it’s relatively easy to find a replacement in case of absence.

Training sessions

A project consists of training sessions. Again, what these mean semantically is completely up to you and your imagination:

  1. One client of ours uses PMA.control to organize weekly seminars in various places across the globe. Each seminar/country combination translates into its own training session in PMA.control, with specific start- and end-dates
  2. One medical school uses PMA.control to train residents. A training session can refer to the class coming in on a particular week, but it can also be linked to small research projects that students participate in.
  3. Another client integrates PMA.control into a web-portal, so all training sessions are by definition open-ended. The client has a drug portfolio, so rather than have them be restricted in time, training sessions refer to various indications for different drugs.

Safe to say that training sessions can be exploited for diverse applications.

Case collections

All right, we have projects and training sessions… When do we put digital slide content in them? This is where case collections come in.

The idea is that you organize your training sessions in different parts. During a three-day seminar, you could have one day dedicated to guided lectures (that’s a case collection with its own slides). On day two, you allow people to evaluate themselves through some hands-on exercises (on a second case collection, which holds different slides than the first one). On day three, it’s crunch time, and attendees take an actual test to see how well they absorbed the material (on yet another third case collection with once again unique slides).

A case collection is coupled to a project, but is independent from any training sessions, so you can re-use them throughout the curriculum that you’re building. Think about it; otherwise if you organized the same training session repeatedly, you would continuously have to re-define the case collections, too!

A case collection consists of cases, which in turn consist of slides. You can choose to construct a case such that it pre-focuses on a particular region of interest (ROI) within a slide. You can also add various meta-data at case-level as well as slide-level. Not unimportantly: you can configure the initial rotation angle for each slide in the case. This is particularly relevant if your case consists of serial sections that may not be all in the exact same orientation.

Interaction modes

Remember the three-day seminar we just mentioned? And you also remember that we called the software “PMA.control”, right?

Ok.

The name PMA.control refers to the fact that the owner of the software is in total control of what participants within a session can see and do at any given time.

Consider the following situations during our three-day seminar:

  • On the first day, the instructor wants his pupils to stay nicely in the kiddie pool. They should give their undivided attention only to the material intended for the first day.
  • However, this one person in the afternoon of the first day is taking the seminar for the second time. She asks if she can skip ahead already to the content from day two.
  • On the second day, people are learning and experimenting with a different dataset. Do they understand the material well enough to pass the test on the last day? Clearly the material from day three must not be visible to anybody yet.
  • On day three, it’s crunch time. The actual test material is now released. Depending on the intention of the instructor, earlier discussed material can now even be closed off.

For all these conditions, PMA.control offers interaction modes. An interaction mode controls if and how a case collection presents itself to the end-user.

Training sessions consist of multiple case collections and multiple users participate in a training session. At any given time, the instructor of a session can specify whether a particular case collection within the session is accessible to a specific user and how.

When signing into PMA.control, the instructor sees a grid with users and case collections. This grid can be used to control what user interacts with what case collection.

PMA.control ships with a number of default interaction modes, but these can be customized via a matrix interface where one stipulates what properties are associated with each.

Let’s see how interaction modes come into play during our three-day seminar:

  • Before leaving for the seminar, the instructor applies the interaction mode “locked” to all case collections for all participants.
  • On the morning of the first day, the instructor walks into the seminar room an hour early and sets the interaction mode of the first case collection to “browse” for everybody to see. That was easy! He goes to the hotel bar to grab a nice cup of coffee.
  • In the afternoon of the first day, an attendee asks about being allowed to skip ahead with the material a bit. The instructor asks if any other people are in the same situation. For those, he sets the interaction mode for the second case collection to “self-test”. Users can interactively fill out a pre-determined scoring form that goes with each case, and they can see each other’s results to discuss their findings amongst themselves.
  • On the second day, the second case collection is unlocked for everybody. Everybody now sees the second case collection in self-test mode. The first case collection remains in “browse” mode, so participants can use these as reference material. The third case collection remains off limits today.
  • On the morning of the third day, the first thing the instructor does is reset the first and second case collection in the training session to “locked”. Rien ne va plus. The third and last case collection is switched to “test”, and students can take their final assessment. Users can interactively fill out a pre-determined scoring form that goes with each case, but they can’t see each other’s data anymore.
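
To make the grid idea more tangible, here is a toy model of the seminar described above. It is only a sketch for reasoning about the concept: the SessionGrid class, its method and the user names are hypothetical and are not part of PMA.control’s API. Only the mode names (“locked”, “browse”, “self-test”, “test”) come from the text.

```python
# Toy model only: the class and method names below are hypothetical, not PMA.control's API
class SessionGrid:
    """Grid of (user, case collection) -> interaction mode for one training session."""

    def __init__(self, users, collections, default_mode="locked"):
        self.users = list(users)
        self.collections = list(collections)
        self.modes = {(u, c): default_mode for u in self.users for c in self.collections}

    def set_mode(self, collection, mode, users=None):
        """Apply a mode to one collection, for everybody or for selected users."""
        for user in (self.users if users is None else users):
            self.modes[(user, collection)] = mode

grid = SessionGrid(["ann", "bob"], ["day1", "day2", "day3"])  # everything starts locked
grid.set_mode("day1", "browse")                    # morning of day one
grid.set_mode("day2", "self-test", users=["ann"])  # ann skips ahead in the afternoon
print(grid.modes[("ann", "day2")], grid.modes[("bob", "day2")])

grid.set_mode("day2", "self-test")                 # day two: unlocked for everybody
grid.set_mode("day1", "locked")                    # day three: earlier material closed
grid.set_mode("day2", "locked")
grid.set_mode("day3", "test")                      # final assessment is released
print(grid.modes[("bob", "day1")], grid.modes[("bob", "day3")])
```

Each cell of the grid holds exactly one mode, which is what lets the instructor treat a single attendee differently from the rest of the room.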

It’s possible for an instructor to be an instructor for one session, but only a “regular” participant in another. This is especially useful in medical schools where different specialists consult with and train each other on various subject matters on a continuous basis.

Conclusion

PMA.control allows you to assemble whole-slide images, scoring forms, consensus scores and scoring manuals into digital training modules. Full service, no-hassle, management of the software is offered through the PathoTrainer service, which is organized through Pathomation’s parent company CellCarta.

If you’re interested in higher education teaching of histology or pathology, make sure to also have a look at Pathomation’s education landing page. For pharma and CRO applications, we have a separate landing page.

Look sharp

Blurred or focused?

In an earlier post, we offered some ideas on how to detect tissue in a scanned slide.

The next step people often want to take is to examine how the sharpness of the tissue is distributed throughout the slide. No scanner catches all, and you will see blurry areas in pretty much all your scans.

It then helps to be able to differentiate the tiles that have poor focus from the tiles that are sharp. As it turns out, we already approached a similar problem when we were determining the sharpest tiles within z-stacked slides.

For this particular exercise (focus variation within a single plane), Sied Kebir in Germany was kind enough to provide us with relevant sample data.

And here are two relevant extracted tiles to illustrate the problem.

Sied is looking for a method to systematically map the blurry tiles vs the crisp ones.

Blur detection with OpenCV

A good introduction on blur detection with the OpenCV library is offered by pysource in the following video tutorial:

Let’s see what that gives when we apply it to Sied’s sample images:

Great! The numbers are not as far apart as in pysource’s video, but that makes sense: even in focused tissue we’ll find many more gradients and sloping color ranges than in the average picture of a person sitting in a room, which contains distinctive features like outlined walls and facial contours.
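
For readers who want to see what the measure does without installing OpenCV, here is a minimal pure-Python sketch of the “variance of the Laplacian” idea. It applies a simple 4-neighbour Laplacian to the interior pixels of a tiny grayscale grid, so its numbers will not match OpenCV’s output exactly. The two test patterns are invented and the function name is our own.

```python
from statistics import pvariance

def laplacian_variance(gray):
    """Variance of a 4-neighbour Laplacian over the interior pixels of a 2D grid."""
    responses = []
    for y in range(1, len(gray) - 1):
        for x in range(1, len(gray[0]) - 1):
            responses.append(
                gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1] + gray[y][x + 1]
                - 4 * gray[y][x]
            )
    return pvariance(responses)

# Sharp pattern: a hard vertical edge. Blurred pattern: the same edge as a gentle ramp.
sharp = [[0, 0, 0, 255, 255, 255] for _ in range(6)]
blurred = [[0, 51, 102, 153, 204, 255] for _ in range(6)]

print(laplacian_variance(sharp))    # large: strong second derivative at the edge
print(laplacian_variance(blurred))  # 0: a linear ramp has no second derivative
```

The Laplacian responds to abrupt intensity changes, so a crisp tile produces a wide spread of responses while a blurred one produces responses that cluster near zero.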

Pysource suggests converting your original images to grayscale. Does it make a difference? In our experiments we find different values (of course), but the trend is the same. Since retaining color leads to slightly bigger differences, we’re inclined to stick with the original color images.

If you do want to convert your color tiles to grayscale, here’s a great StackOverflow article about how this works.
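
As a rough illustration of what such a conversion does, the sketch below applies the ITU-R BT.601 luminance weights (0.299, 0.587, 0.114), which are the weights commonly used for RGB-to-gray conversion. The helper names and the sample pixels are invented for this example.

```python
def to_gray(rgb_pixel):
    """Weighted luminance of an (R, G, B) pixel using the ITU-R BT.601 weights."""
    r, g, b = rgb_pixel
    return 0.299 * r + 0.587 * g + 0.114 * b

def tile_to_gray(tile_rgb):
    """Convert a 2D grid of (R, G, B) tuples into a 2D grid of gray values."""
    return [[to_gray(px) for px in row] for row in tile_rgb]

# Invented 1x3 tile: pure white, pure black, and a pinkish pixel
tile = [[(255, 255, 255), (0, 0, 0), (230, 150, 190)]]
gray = tile_to_gray(tile)
print(gray)
```

Because green carries the largest weight, two stains that differ mainly in red and blue can end up closer together in grayscale than in color, which is one plausible reason the color tiles gave us slightly bigger differences.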

Distribution and Exploratory Data Analysis (EDA)

Our next step is to put it all in a loop and systematically examine how sharp or blurry each individual tile actually is. For semantic ease, we create a get_sharpness function:


from pma_python import core
import cv2
import matplotlib.pyplot as plt
import numpy as np
import scipy.stats
import sys

def is_tissue(tile):
    # A tile counts as tissue when it is neither too bright nor too noisy
    pixels = np.array(tile).flatten()
    mean_below = np.mean(pixels) < 192   # 192 is 75% of the 0-255 range
    std_below = np.std(pixels) < 75
    return bool(mean_below and std_below)

def is_whitespace(tile):
    return not is_tissue(tile)

def get_sharpness(img):
    # Variance of the Laplacian on the 2D image; flattening first would discard the spatial layout
    pixels = np.array(img)
    return cv2.Laplacian(pixels, cv2.CV_64F).var()

slide = "C:/wsi/sied/test.svs"
max_zl = 5 # or set to core.get_max_zoomlevel(slide)
dims = core.get_zoomlevels_dict(slide)[max_zl]

means = []
stds = []
tissue_map = []
sharp_map = []

for x in range(0, dims[0]):
    for y in range(0, dims[1]):
        tile = core.get_tile(slide, x=x, y=y, zoomlevel=max_zl)
        tiss = is_tissue(tile)
        tissue_map.append(tiss)
        if (tiss):
            sharp_map.append(get_sharpness(tile))
        else:
            sharp_map.append(0)
    print(".", end="")
    sys.stdout.flush()
print()

After getting all the results, it is worth examining the histogram of this data.

Ideally, we would like to see a bimodal distribution (sharp vs blurred), but that’s not what we see here. The reason is that unevenness in tissue does not come in abrupt steps: focus quality shifts gradually across the slide rather than splitting into two clean groups.
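
If you just want a quick look at the distribution without a plotting library, the scores can be binned with the standard library. This is a minimal sketch: the sharpness values below are invented, and the bin width of 10 is an arbitrary choice for the example.

```python
from collections import Counter

def text_histogram(values, bin_width):
    """Count how many values fall in each bin of the given width."""
    bins = Counter(int(v // bin_width) * bin_width for v in values)
    return dict(sorted(bins.items()))

# Invented sharpness scores; whitespace tiles (score 0) are left out
sharp_values = [12.5, 18.0, 22.4, 25.1, 27.9, 31.0, 33.3, 38.6, 44.2, 57.8]
hist = text_histogram([v for v in sharp_values if v > 0], bin_width=10)
for start, count in hist.items():
    print(f"{start:>4}-{start + 10:<4} {'#' * count}")
```

Leaving out the zero scores matters here: in the loop above, whitespace tiles are stored as 0, and including them would add a spike at zero that has nothing to do with focus.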

Putting it all together

Now that we know what we can expect, it’s just a matter of putting it all together and using it to construct an image map, in similar fashion to what we did for our original tissue detection.
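
One detail to watch when building such a map: the scan loop above iterates over x in the outer loop and y in the inner loop, so the flat list is filled column by column. A minimal sketch of turning that list into rows follows. The scores, the threshold of 30 and the helper names are all made up for illustration.

```python
def to_grid(flat, width, height):
    """Rebuild a row-major 2D grid from a list filled with x as outer loop, y as inner loop."""
    return [[flat[x * height + y] for x in range(width)] for y in range(height)]

def classify(grid, threshold):
    """Mark each tile: ' ' for whitespace (score 0), '#' for sharp, '.' for blurry."""
    return ["".join(" " if v == 0 else "#" if v >= threshold else "." for v in row)
            for row in grid]

# Invented scores for a 3-wide, 2-high tile grid, in the order the scan loop appends them
flat_scores = [0, 40, 35, 12, 0, 8]
grid = to_grid(flat_scores, width=3, height=2)
for line in classify(grid, threshold=30):
    print(line)
```

A hard threshold is the simplest option. Since the scores do not fall into two clean groups, a color gradient over the raw values may represent the slide more faithfully.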

The final result looks like this:

In closing

Sectioning a slide is a continuous operation, and except for folding artifacts, you shouldn’t expect any abrupt changes. Tissue can be expected to gradually fade in and out of focus. And while scanners have gotten better at compensating for uneven tissue thickness, we’re not quite there yet, and automated analysis based on a technique like the one we’re proposing here can help.

Last but not least, we decided on a new way to organize our sample code. At http://host.pathomation.com/realdata/ you can from now on see all sample data in a single location. For example, the Jupyter notebook that belongs with this entry is available at http://host.pathomation.com/realdata/jupyter/realdata%20033%20-%20look%20sharp.ipynb

Exploiting the Pathomation software stack for business intelligence (BI)

A customer query

As part of our commercial offering, Pathomation offers hosting services for those organizations that don’t want to make a long-term investment in on-premise server hardware, or whose server farm is incompatible with our system requirements (PMA.core requires a Microsoft Windows stack).

One such customer came to us recently and asked how much space they were consuming on the virtual machine they rented from us.

This was our answer:

And this is how we obtained the answer. We wrote the following script:

In Python:


from pma_python import core
def get_slide_size(session, slideRef):
    info = core.get_slide_info(slideRef, sessionID = session)
    return info["PhysicalSize"]
def get_total_storage(session, path):
    total = 0
    for slide in core.get_slides(path, session, True):
        total = total + get_slide_size(session, slide)
    return total
def map_storage_usage(srv, usr, pwd):
    sess = core.connect(srv, usr, pwd)
    map = {}
    for rd in core.get_root_directories(sess):
        map[rd] = get_total_storage(sess, rd)
    return map

You can also do this in PHP, of course:


<?php
require "lib_pathomation.php"; 	// PMA.php library

use Pathomation\PmaPhp\Core;

function getTotalStorage($sessionID, $dir) {
        $slides = Core::getSlides($dir, $sessionID, $recursive = FALSE);
        $infos = Core::GetSlidesInfo($slides, $sessionID);
        
        echo "Got slides for ".$dir.PHP_EOL;

        $func = function($value) {
            return $value["PhysicalSize"];
        };

        $s = array_sum(array_map($func, $infos));
        return $s;
}

function mapStorageUsage($serverUrl, $username, $password) {
    $sessionID = Core::Connect($serverUrl, $username, $password);
    
    $map = array();
    $rootdirs = Core::getRootDirectories($sessionID);
    foreach ($rootdirs as $rd) {
        $map[$rd] = getTotalStorage($sessionID, $rd);
        
    }
    return $map;
}
?>

Creating a dictionary that contains the number of consumed bytes per root-directory now comes down to using just a single line of code:

In Python:

print(map_storage_usage("http://server/pma.core", "user", "secret_password"))

And in PHP:


print_r( mapStorageUsage ("http://server/pma.core", "user", "secret_password"));

Making our code more robust

Unfortunately, chances are that if you’ve been running your PMA.core tile server for a while, the script comes crashing down miserably. It’s the “should have could have would have” syndrome of programming. There’s probably a meme for this out there somewhere (in the interest of productivity, we won’t search for it ourselves but let you look for that one). What it comes down to is: mounting points and access permissions change, and at some point in time in any large enough data repository some files are going to end up corrupt, meaning either the core.get_slides() call is going to go wrong, or the core.get_slide_info() call.

So to make our script a bit more robust, we can add try… except… exception handling in Python:


def get_slide_size(session, slideRef):
    try:
        info = core.get_slide_info(slideRef, sessionID = session)
        return info["PhysicalSize"]
    except Exception:
        # A corrupt or unreadable slide counts as zero bytes
        print("Unable to get slide information from", slideRef)
        return 0

def get_total_storage(session, path):
    total = 0
    try:
        for slide in core.get_slides(path, session, True):
            total = total + get_slide_size(session, slide)
    except Exception:
        # An inaccessible folder keeps whatever total was gathered so far
        print("unable to get data from", path)
    return total

def map_storage_usage(srv, usr, pwd):
    sess = core.connect(srv, usr, pwd)
    map = {}
    for rd in core.get_root_directories(sess):
        map[rd] = get_total_storage(sess, rd)
    return map

As well as in PHP:


<?php
require_once "lib_pathomation.php";

use Pathomation\PmaPhp\Core;

function getTotalStorage($sessionID, $dir) {
    try {
        $slides = Core::getSlides($dir, $sessionID, $recursive = FALSE);
        $infos = Core::GetSlidesInfo($slides, $sessionID);
        
        $func = function($value) {
            return $value["PhysicalSize"];
        };

        $s = array_sum(array_map($func, $infos));
        return $s;
    }
    catch(Exception $e) {
        // echo "unable to get data from ".$dir.PHP_EOL;
        return 0;   // count unreadable directories as zero bytes instead of returning null
    }
}

function mapStorageUsage($serverUrl, $username, $password) {
    $sessionID = Core::Connect($serverUrl, $username, $password);    
    $map = array();
    $rootdirs = Core::getRootDirectories($sessionID);
    foreach ($rootdirs as $rd) {
        $map[$rd] = getTotalStorage($sessionID, $rd);   
    }
    return $map;
}

print_r(mapStorageUsage("http://server/core/", "user", "secret"));
?>

In Python, you can also add the prettyprint library to clean up the output a bit:
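
A minimal sketch of what that can look like, using the pprint module from the standard library. The root-directory names and byte counts below are invented. In practice you would pass the dictionary returned by map_storage_usage().

```python
from pprint import pprint

# Invented byte counts per root directory, shaped like the result of map_storage_usage()
usage = {
    "teaching": 52428800000,
    "research": 1288490188800,
    "archive": 734003200,
}

# pprint puts each key on its own line once the dictionary no longer fits the given width
pprint(usage, width=40)
```

By default pprint also sorts the keys, which makes it easier to find a particular root-directory in longer listings.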

And in PHP, we add a convenient method to make the numbers a bit easier to read:


function human_filesize($bytes, $decimals = 2) {
    $size = array('B','kB','MB','GB','TB','PB','EB','ZB','YB');
    $factor = floor((strlen($bytes) - 1) / 3);
    return sprintf("%.{$decimals}f", $bytes / pow(1024, $factor)) . @$size[$factor];
}
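
For the Python side, a comparable helper could look like the sketch below. It is not a line-by-line port: it picks the unit by repeated division by 1024 rather than by counting digits, so results near a unit boundary can differ from the PHP version.

```python
def human_filesize(num_bytes, decimals=2):
    """Format a byte count with binary (1024-based) unit steps."""
    units = ["B", "kB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]
    size = float(num_bytes)
    unit = 0
    # Move up one unit at a time until the value drops below 1024
    while size >= 1024 and unit < len(units) - 1:
        size /= 1024
        unit += 1
    return f"{size:.{decimals}f}{units[unit]}"

print(human_filesize(1536))
print(human_filesize(1288490188800))
```

Applied to the storage dictionary, this turns raw byte counts into values that are easier to compare at a glance.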

Input for business intelligence (BI)

In this article we showed how you can use automation to let Pathomation’s PMA.core tile server generate an overview report of how much space your slides consume. Depending on your specific folder structure, you can further customize this for breakdowns into sizable data-morsels that fit your particular appetite.

Given the average size of a slide, this kind of information can be vital to your organization: slides accumulate fast, and it is important to keep a handle on their growth and the space they occupy.

Pathomation’s software platform is more than just a slide viewing solution: it can help generate insights into your storage resource consumption and serve as a veritable planning and evaluation tool before actual investments take place.

A look at PMA.core

The core

PMA.core is the centerpiece of the Pathomation software platform for digital microscopy. PMA.core is essentially a tile server. It does all the magic described in our article on whole slide images, and is optimized to serve you the correct field of view, any time, any place, on any device. PMA.core enables digital microscopy content when and where you want it.

Our free viewer, PMA.start, is built on top of PMA.core technology as well. PMA.start contains a stripped version of PMA.core, lovingly referred to as PMA.core.lite 🙂

PMA.core supports the same file formats as PMA.start does, and then some.

It acts as an “honest broker”, by offering pixels from as many vendor formats as possible.

Storage

With PMA.start, you’re limited to accessing slide content that’s stored on your local hard disk. External hard disks are supported as well, but at some point you end up with multiple people in your organization who need to access the same slide content. At that point, PMA.core’s central storage capabilities come into play.

PMA.core is typically installed on a (Windows) server machine and can access a wider variety of storage media than PMA.start can. You can store your virtual slides on the server’s local hard disk, but as your data grows, this is probably not the place you want to keep them. So you can offload your slides to networked storage, or even S3 bucket repositories (object storage).

Pathomation does not, in contrast with other vendors, require a formal slide registration or import process to take place. Of course our software does need to know where the slides are. This is done by defining a “root-directory”, which is in its most generic terminology “a place where your slides are stored”.

A root-directory can be a location on the server’s hard disk, like c:\wsi. You can instruct your slide scanner to drop off new virtual slides on a network share, and likewise point PMA.core to \\central_server\incoming_slides\. Finally, you can store long-term reference material in an AWS bucket and define a root-directory that points to the bucket. The screenshot below shows a mixture of S3- and HDD-derived root-directories in one of our installations:

After defining your root-directory, the slides are there, or they are not, and representation of them is instant. An implication of this is that you can manage your slides with the tools that you prefer, any way you want to. You can use Windows Explorer, or even the command line, should that end up being more convenient for you. Your S3 data can be managed through the AWS console, CloudBerry tools, or S3 explorer.

Security – Authentication

Another important aspect of PMA.core is access control. PMA.start is “always on”; no security credentials are checked when connecting to it. PMA.core in contrast requires authentication, either interactively through a login dialog, or automatically through the back-end API. In either case, upon success, a SessionID is generated that is used to track a user’s activity from then on.

User accounts can be created interactively through the PMA.core user interface, or controlled through use of the API. Depending on your environment, a number of password restrictions can be applied. Integration with LDAP providers is also possible.

User accounts can be re-used simultaneously in multiple applications. You can be logged in through the PMA.core user interface, and at the same time use the same credentials to run an interactive script in Jupyter (using the PMA.core interface to monitor progress).

The interface in PMA.core itself at all times gives an overview of which users are connected through which applications, and even allows an administrator to terminate specific sessions.

Security – Authorization

Our software supports authorization on top of authentication.

User permissions in PMA.core are kept simple and straightforward: a user account can have the Administrative flag checked or not, meaning that they can get access to PMA.core directly, or only indirectly through other downstream client applications like PMA.view, PMA.control or the API. Another useful attribute to be aware of is CanAnnotate, which is used to control whether somebody can make annotations on top of a slide or not. Finally, an account can be suspended. This can be temporary, or can be mandated from a regulatory point of view as an alternative to deletion.

A root-directory can be tagged either as “public” or “private”. A public root-directory is a root-directory that is available to all authenticated users. In contrast, when tagged as “private”, the root-directory has an accompanying Access Control List (ACL) that determines who can access content in the root-directory.
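
The public/private rule can be modeled in a few lines. The sketch below is only a mental model, not PMA.core’s implementation; the dictionary keys `public` and `acl` and the function name are hypothetical:

```python
def can_access(username, root_directory):
    """Decide whether an authenticated user may see a root-directory.

    root_directory is a plain dict with hypothetical keys:
    'public' (bool) and 'acl' (set of usernames, used only when private).
    """
    if root_directory["public"]:
        return True  # public: every authenticated user gets in
    return username in root_directory.get("acl", set())

# Made-up root-directories for illustration.
archive = {"name": "reference", "public": True}
study = {"name": "drug_study", "public": False, "acl": {"alice"}}

print(can_access("bob", archive), can_access("bob", study))
```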

The screenshot below shows the Administrative and Suspended flags for my individual user account, as well as what public and private root-directories I do or do not have access to:

Future versions of PMA.core can be expected to offer CRUD granularity.

A powerful forms engine

Form data exists everywhere. Information can be captured informally, like the stain used, or as detailed as an Electronic Lab Request Form (ELRF). This is why Pathomation offers the possibility to define forms as structured and controllable data entities. A form can consist of a couple of simple text-fields, or be linked to pre-defined (ontology-derived) dictionaries. Various other Pathomation software platform components help in populating these forms, including PMA.view.

Forms can be accompanied by ACLs. In order to avoid redundancy, a form ACL consists of a list of root-directories rather than user accounts. In a project-oriented environment, it makes more sense that certain forms apply to certain root-directories which represent types of slides. Similarly, in a clinical environment, it makes sense to have slides organized in root-directories per application-type or by processing-stage. Freshly scanned slides that haven’t undergone a QA-check yet can be expected to have different form-data associated with them than FISH-slides.

On-slide annotations

PMA.core supports graphical on-slide annotations. We support three types:

  • Native annotations embedded within a vendor’s file format
  • Third-party annotations coming from non-specific (image analysis) software
  • Pathomation annotations

Pathomation-created annotations are the easiest to understand. You have a slide, and you want to indicate a region of interest on it. This region of interest can be necrotic tissue, or proliferated tumor cells. For teaching purposes, you could have a blood smear and highlight the different immune cell types.

Pathomation annotations are stored as WKT and can be anything that can be encoded in WKT (which is a lot). You need a downstream client to create them, but the basic viewer included in PMA.core can be used to visualize them, and our PMA.UI JavaScript framework can be used to create your own annotation workflows.
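
To give an idea of what WKT (Well-Known Text) looks like, the sketch below encodes a rectangular region of interest as a WKT polygon. The function name is hypothetical, and the assumption that coordinates are expressed in pixels is ours; the text above does not specify the coordinate system PMA.core uses:

```python
def polygon_to_wkt(points):
    """Encode a list of (x, y) coordinates as a WKT POLYGON string."""
    if len(points) < 3:
        raise ValueError("a polygon needs at least three points")
    ring = list(points)
    # WKT requires a closed ring: the last point must repeat the first.
    if ring[0] != ring[-1]:
        ring.append(ring[0])
    coords = ", ".join(f"{x} {y}" for x, y in ring)
    return f"POLYGON (({coords}))"

# Made-up rectangle, assumed to be in pixel coordinates.
region = polygon_to_wkt([(100, 100), (400, 100), (400, 300), (100, 300)])
print(region)
```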

You could run an algorithm that does tissue detection and pre-annotates these regions for you.

In addition to making your own annotations, Pathomation can be used to integrate annotations from other sources. Certain file formats like 3DHistech’s MRXS file format or Aperio’s SVS file format have the ability to incorporate annotations. If you have such slides, the embedded annotations should automatically show when viewing the slide using any Pathomation slide rendering engine.

Last but not least, we can integrate third-party annotations. Currently, we support three formats:

Third-party as well as native annotations are read-only; you cannot modify them using Pathomation software.

Even more slide metadata

What about other structured data?

We think our forms engine is pretty nifty, but we’re not so arrogant (or clueless) as to pretend that we foresee everything you ever want to capture in any form, shape, or size. It is also quite possible that a slide meta-database already exists in your organization.

For those instances where existing data stores are available, we offer the possibility to link external content. Rather than importing data into PMA.core (also a possibility actually), we allow you to specify an arbitrary connection string that points to an external resource, for example an Oracle database. Your next step is to define the query to run against this resource, along with a field identifier (which can be a regular expression) that is capable of matching specific records with individual slides.

Examples of external data sources can be:

  • Legacy IMS data repositories that are too cumbersome to migrate
  • Proprietary database systems developed as complement to lab experiments
  • Back-end LIMS/VNA/PACS databases that support other workflows in your organization
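
The idea of a regular expression that ties external records to individual slides can be sketched as follows. The file naming convention, the field name `case_id`, and the sample rows are all made up for illustration; this is not how PMA.core performs the match internally:

```python
import re

# Hypothetical naming convention: the case number appears in the file name,
# e.g. "S21-00123_HE.svs". Real conventions will differ per lab.
CASE_PATTERN = re.compile(r"(?P<case>[A-Z]\d{2}-\d{5})")

def match_records(slide_paths, records):
    """Pair each slide path with the external record that shares its case id."""
    by_case = {rec["case_id"]: rec for rec in records}
    matches = {}
    for path in slide_paths:
        found = CASE_PATTERN.search(path)
        # Slides without a recognizable id, or without a record, map to None.
        matches[path] = by_case.get(found.group("case")) if found else None
    return matches

# Made-up rows standing in for the result of a query on the external database.
rows = [{"case_id": "S21-00123", "stain": "HE"},
        {"case_id": "S21-00456", "stain": "Ki67"}]
slides = ["incoming/S21-00123_HE.svs", "incoming/unlabeled.mrxs"]
linked = match_records(slides, rows)
print(linked)
```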

Do try this at home

In this post, we’ve highlighted the main features of our PMA.core “honest broker” WSI engine aka tile server aka pixel extractor aka Image Management Server (IMS).

Warning: sales pitch talk following below…

If you’ve liked interacting with PMA.start and work in an organization where slides are shared with various stakeholders, you should consider getting a central PMA.core server as well. PMA.core is the centerpiece of the Pathomation software platform for digital microscopy, and whether you prefer all-inclusive out-of-the-box viewing software, or are developing your own integrated processing pipelines, PMA.core can be the ideal middleware that you’ve been looking for. Contact us today for a demo or sandboxed environment where you can try out our components for yourself.

Ok, we’re done. Seriously, PMA.core is cool. Let us help you in your quest for vendor-agnostic digital pathology solutions, and (amongst others) never worry about proprietary file formats again.

To index or not to index?

A question we get frequently from potential customers is “how do we import our slides into your system (PMA.core)?”. The answer is: we don’t. In contrast with other Image Management Systems (IMS), we opted to not go for a central database. In our opinion, databases only seem like a good idea at first, but inevitably cause problems down the road.

People also ask us “how easy is it to import our slides?”. The latter phrasing is probably more telltale than the first, as it suggests that registering slides is often not easy with other systems. It still puts us in an awkward position, as we then have to explain that there is no import process as such. Put the slides where you want them, and that’s it. You’re done. Finito.

Here are some of the reasons why you would want a database overlaying your slides:

  • Ease of data association. Form data and overlaying graphical annotation objects can be stored with the slide’s full path reference as a foreign key.
  • Ease of search for a specific slide. Running a search query on a table is decidedly faster than parsing a potentially highly hierarchical directory tree structure.
  • Rapid access to slide metadata. Which is not the same as our first point: data association. Slide metadata is information that is already incorporated into the native file format itself. A database can opt to extract such information periodically and store it internally in a centralized table structure, so that it is more easily extracted in real time when needed.

When taken together, the conclusion is that such databases are nothing more than glorified indexing systems. Such an indexing system invariably turns into a gorilla… An 800 lb gorilla for that matter… Let’s talk about it:

  • An index takes time to build.
  • An index consumes resources, both during and after construction.
  • With a rapidly evolving underlying data structure, the index is at risk of being behind the curve and not reflecting the actual data.
  • In order to control the index and not constantly having to rebuild it, a guided (underlying) data management approach may be needed.
  • At some point, in between index builds, and outside the controlled data entry flow, someone is going to do something different to your data.
  • Incremental index builds to bypass performance bottlenecks are problematic when data is updated.

Now there are scenarios where all of the above doesn’t matter, or at least doesn’t matter all that much. Think of a conventional library catalog; does it really matter if your readers can only find out about the newest Dean Koontz book that was purchased a day after it was actually registered in the system? Even with rapidly moving inventory systems: when somebody orders an item that is erroneously no longer available from your webstore… Big whoop. The item is placed on back-order, or the end-user simply cancels the order. If you end up making the majority of your customers mad this way, then the problem is not in your indexing system, but in your supply chain itself. There’s no doubt that for webshops and library catalogs, indexes speed up search, and the pros on average outweigh the cons.

But digital pathology is different. Let’s look at each of the arguments against indexing and see how much weight they carry in a WSI environment:

  • An index takes time to build. When are you going to run it? Digital pathology was created so you can have round the clock availability of your WSI data. Around the clock. Across time-zones. Anything that takes time, also takes resources. CPU cycles, memory. So expect performance of your overall system to go down while this happens.
  • Resource (storage) consumption during and after construction. So be careful about what you are going to index in terms of storage. Are you going to index slide metadata? Thumbnails? How much data are you practically talking about? How much data are you going to index to begin with? And how much of your indexed data will realistically ever be accessed (more on that subject in a separate post)?
  • Rapidly evolving underlying data structure. Assume a new slide is generated once every two minutes, and a quantification algorithm (like HistoQC) takes about a minute to complete per slide. This means you have a new datapoint every minute. And guess which datapoint the physician wants to see now now NOW…
  • Guided data management approach. One of the great uses of digital pathology is the sharing of data. You can share your data, but others can also share theirs with you. So apart from your in-house scanner pipeline, what are you going to do with the external hard disk someone just sent you? Data hierarchies come in all shapes and sizes. Sometimes it’s a patient case; sometimes it’s toxicological before/after results; sometimes it’s a cohort from a drug study. Are you going to set up data import pipelines for all these separate scenarios? Who’s going to manage those?
  • Sometimes, somewhere, someone is going to do something different to your data. Because the above pipelines won’t work. No matter how carefully you design them. Sometimes something different is needed. You need to act, and there’s no time for re-design first. The slide gets replaced, and now the index is out-of-date. Or the slide is renamed because of earlier human error, and the index can’t find it anymore. And as is often the case: this isn’t about the scenarios that you can think of; but about the scenarios you can’t.
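
The renaming scenario from the last bullet is easy to reproduce. The following sketch is a toy model: the “index” is nothing more than a snapshot of file names in a temporary folder, and the file names are made up. It shows how a snapshot goes stale the moment someone renames a file, while a live directory listing does not:

```python
import os
import tempfile

def build_index(folder):
    """Snapshot the file names in a folder (a toy stand-in for an index)."""
    return set(os.listdir(folder))

def live_listing(folder):
    """Ask the file system directly, as an index-free approach would."""
    return set(os.listdir(folder))

with tempfile.TemporaryDirectory() as folder:
    # Create an empty placeholder file with a mistyped case number.
    open(os.path.join(folder, "case_10.svs"), "w").close()
    index = build_index(folder)

    # Someone corrects the name outside of any controlled import pipeline.
    os.rename(os.path.join(folder, "case_10.svs"),
              os.path.join(folder, "case_01.svs"))

    stale = "case_01.svs" not in index             # the snapshot missed the rename
    fresh = "case_01.svs" in live_listing(folder)  # the live view sees it

print(stale, fresh)
```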

Safe to say that we think an indexing mechanism for digital pathology and whole slide images is not a good idea. Just don’t do it.

Do you DICOM?

Standardization efforts in digital pathology

Work on a standard DICOM description of digital pathology (DP) imaging data has been underway for a few years now. Digital pathology and whole slide imaging (WSI) are the focus of DICOM workgroup 26. A summary of its efforts can be found in David Clunie’s paper in the Journal of Pathology Informatics at https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6236926/

In 2014, we published our own conference paper on the effort (during the 12th European Conference on Digital Pathology in Paris, France). The abstract is available through the Researchgate website; the full presentation from the ECP Paris conference is available through https://www.slideshare.net/YvesSucaet/digital-pathology-information-web-services-dpiws-convergence-in-digital-pathology-data-sharing

The focus of this blog is on imaging. For completeness, readers interested in digital pathology standardization efforts should also have a look at the IHE PaLM initiative. Additional resources can also be found on slide 45 of our SlideShare publication.

Pathomation supports DICOM

At Pathomation, we’ve been supporting the DICOM supplement 145 file format extension (PDF available here) for a while now. We recently added our own “dicomizer” tool to our free PMA.start software. This is a command-line interface (CLI) tool that allows for the conversion of any WSI file format into a DICOM VL (Visible Light) Whole Slide Image IOD (Information Object Definition).

The Pathomation dicomizer is currently available on Windows only (other platforms pending) and can be installed as part of the regular setup process.

After installation, navigate to the folder in which you installed the tool (typically c:\program files\pathomation\dicomizer), and then invoke the tool through the command line, like this:

Running the validation tool from David Clunie on our generated DICOM slides results in the following:

We can also visualize the slides side by side in two browser windows, empirically “proving” that the DICOM output is equivalent to the original slide:

So there you have it. DICOM has been involved in the digital pathology standardization process for a while now. For those interested in supporting it, you can now use Pathomation’s free dicomizer tool to get hands-on experience.

A look at PMA.view

Architecture

Apart from PMA.start, Pathomation also offers a professional range of products. Yes, professional is a euphemism for “not free”, but we do feel you get quite some value in return. And some of the money flows back to our developers so they can also keep working diligently on improving PMA.start, and the free product offering around it, including our SDKs and software plugin for ImageJ/FIJI.

At the core of it all always sits PMA.core. Even PMA.start runs on top of PMA.core, albeit a restricted version that can only access local data on your personal system. Hence the name PMA.core.lite. The professional version, PMA.core, can do loads more, including making annotations, capturing form metadata, and tracking user activity in a 21 CFR Part 11 compliant manner. PMA.core has been validated in accordance with GAMP 5 guidelines.

In a different article on this blog we explained how big these whole slide images get (and why!). PMA.core then is responsible for extracting tiles from the original images when the user wants them. These tiles can be extracted via one of our language-specific SDKs, or end-users can use viewer software that is built on top of PMA.core and understands how user (mouse) operations need to be translated into tile requests.
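
To make “translating user operations into tile requests” a bit more tangible, here is a simplified sketch of the arithmetic a viewer has to do: given the rectangle currently on screen, work out which tiles overlap it. The tile size of 256 pixels, the grid starting at pixel (0, 0), and the function name are assumptions for illustration; this is not PMA.core’s actual protocol:

```python
TILE_SIZE = 256  # assumed tile edge in pixels; the real value depends on the server

def tiles_for_viewport(left, top, width, height, tile_size=TILE_SIZE):
    """Return the (column, row) indices of all tiles overlapping a viewport.

    Coordinates are pixels at the zoom level currently shown.
    """
    if width <= 0 or height <= 0:
        return []
    first_col, first_row = left // tile_size, top // tile_size
    # Subtract 1 so a viewport ending exactly on a tile edge does not pull in
    # the next tile.
    last_col = (left + width - 1) // tile_size
    last_row = (top + height - 1) // tile_size
    return [(col, row)
            for row in range(first_row, last_row + 1)
            for col in range(first_col, last_col + 1)]

# A 600x300 pixel viewport whose top-left corner sits at pixel (200, 500).
needed = tiles_for_viewport(200, 500, 600, 300)
print(len(needed), needed[0], needed[-1])
```

Every pan or zoom changes the viewport rectangle, and therefore the list of tiles that has to be requested.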

At Pathomation, our viewer software is PMA.view. Like PMA.core, it comes in two flavors: PMA.view and PMA.view.lite. The distinction is made in order to provide better interaction with the respective underlying versions of PMA.core. One could also say that PMA.start as a product is the combination of PMA.view.lite and PMA.core.lite. PMA.view in turn interacts with (multiple instances of) PMA.core.

As you may suspect, Pathomation also offers other applications next to PMA.view that are also built on top of PMA.core. But that’s the focus of a different post (for a sneak preview of what we mean, see our YouTube channel).

PMA.view features

Below is a screenshot of PMA.view. The main elements of the user interface are a ribbon, a central viewing panel for slide visualization, and two side panels, which in turn may contain one or more sections.

The content of the ribbon (as well as the number of tabs etc.) is completely configurable through an XML file. Similarly, the content and sections of the side panels are configurable through XML configuration files. Editors for all of these are provided in the PMA.view administrative interface. Syntax highlighting and restore options are provided as well.

The central viewport for slide viewing is a Zooming User Interface (ZUI); you navigate slides by panning left and right, up and down, and by zooming in and out. You can use the mouse scrollwheel or drag a rectangle with the mouse while holding down the shift-key (on your keyboard) to zoom in on a specific area of your choosing.

In the left panel you typically see a navigation tree, representing the slides hosted by PMA.core. PMA.view can connect to multiple PMA.core instances simultaneously. This is useful when involved in international collaboration, or even in a situation where you have a central hospital hub with several smaller satellite offices spread throughout a region. Just put a tile server in each location to prevent having to transport (digital or – worse – physical) slides around.

Apart from convenient slide management across multiple sites, PMA.view offers many other features that people have come to expect from modern slide viewers, including:

  • Capture structured or free text meta-data
  • Seamless support for different scanning modalities (brightfield, fluorescence, and z-stacking)
  • Brightness and contrast controls
  • On-slide annotations in arbitrary colors and shapes (rectangle, circle, freehand etc.)
  • Annotation toggling based on type and author

Slide sharing

One of the big selling points of digital pathology is sharing slide content, without the need to physically distribute the slides via regular mail. Apart from the obvious improvement this brings in speed, there’s a secondary advantage: you can send slides to multiple parties at the same time. The third advantage is that the slides can’t get lost in the mail or damaged during transport anymore. In return for that of course, we occasionally encounter over-eager spam filters.

Two important impediments that prevent slide sharing however are the following:

  • When I share a slide with you, I have to make sure to specify which file format I’m sharing with you, so you can get the appropriate viewer
  • When I share a slide with you, I have to upload a LOT of data to WeTransfer, Aspera, or a good ol’ fashioned FTP site, where you in turn can download… a LOT of data… again.

Pathomation’s PMA.core and PMA.view combo solve both problems for you. PMA.core abstracts any proprietary slide file format to “just” pixels, and PMA.view allows you to share slides with a counter-party in the form of HTML hyperlinks.

How does this work? PMA.view has a dedicated “Share” button on its ribbon to create links that point directly to selected content. There are different kinds of content that you can share:

  • You can share all slides in a selected folder, thus mimicking a patient case
  • You can share an individual slide
  • You can share a pre-selected region of interest within a slide

Share links are always formatted the same, but they can be used in multiple ways. You can:

  • Use links directly as they are. You can share them with your buddies via email, during a Skype chat session, WebEx, GoToMeeting, whatever.
  • Convert links into scannable QR codes. When you’re giving a presentation during a conference, or in a classroom setting, text-based links are cumbersome to present. Ironically, text-based links are not well suited for print media, either, for the same reason: it’s too easy to make a typo when copying the link character by character. It’s more convenient to present a QR code that people can scan with their smartphones or tablets, and immediately view, or convert into the actual text link for use elsewhere.
  • Embed them into your own web-content. If you still have an actual website, that is, a place on a server somewhere where you deposit your own HTML code, you can now sprinkle live slides throughout the site and have them embedded in an <iframe> tag. Because not everybody knows how these work, PMA.view will give you the necessary HTML code that you can paste directly into your own website. You’ll notice that within the HTML snippet, the plain old original link from above resurfaces. And it gets better: Whether you use plain old notepad to make your website in the traditional sense, or you use a CMS like WordPress or Drupal, an LMS like Canvas, Moodle, or Blackboard, or a social media platform: these too boil down to sending HTML code to the browser, so there’s usually a way to use <iframe>s there, too.
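
For the embedding option, the HTML involved is short. The sketch below wraps a share link in an `<iframe>` tag; the exact snippet PMA.view generates may differ, and the link, the width and height attributes, and the function name are assumptions for illustration:

```python
from html import escape

def iframe_snippet(share_link, width=800, height=600):
    """Wrap a share link in an <iframe> tag (attribute choices are assumptions)."""
    # escape() with quote=True protects the attribute against stray quotes.
    src = escape(share_link, quote=True)
    return f'<iframe src="{src}" width="{width}" height="{height}"></iframe>'

# Made-up share link; a real one is produced by the Share button in PMA.view.
snippet = iframe_snippet("http://yourserver/view/Embed/ABC123")
print(snippet)
```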

Remember the Zooming User Interface (ZUI) terminology we introduced you to earlier? Well, last but not least, when you click on a PMA.view slide-link, you’re essentially instantiating our ZUI. There are no plugins required, nothing to download, it’s just all basic JavaScript and HTML 5. As a consequence, it’s also easy to configure the layout of the ZUI. And that’s what the last set of options at the bottom of the share dialog is about.

How do you want your audience to experience your slide when they go to it? Do you want them to see the barcode? The overview?… It’s all in your hands, and we think this level of control and flexibility is pretty awesome.

Organizing pipelines

So as awesome as we think ourselves to be, there’s always room for improvement, right? Here’s a scenario that a customer of ours came across recently:

  • We have a large number of slides that we want to embed throughout various pages of our proprietary customer portal. We like the PMA.view slide embedding <iframe> capability, but it’s really a pain to generate all these links one by one. Because there are so many, it’s also rather tedious making sure that they are ALL clicked on.

Is there a better way? Yes, there is.

When you look at the links that are generated, it’s not rocket science to figure out how they’re built. The customer wanted to have a link to a thumbnail of a slide, which always looks like this:

http://yourserver/view/EmbedThumbnail/{seemingly random characters}

As well as a link to the actual slide ZUI, which always looks like this:

http://yourserver/view/Embed/{seemingly random characters}

The character string at the end of these links is a particular slide’s unique identifier (UID). When we switch over to our PHP SDK, we can write just a few lines of code that gets all UIDs from all slides in a particular directory:

<?php
require_once "lib_pathomation.php";
?>
<html><head><title>All thumbnails for all slides</title></head><body>
<?php
$session = connect("http://yourserver/core", "username", "secret");
echo "SessionID for universal viewer account = ".$session."<br>";
foreach (getSlides("rootdir/subdir", $session) as $slide) {
    echo "<h3>$slide</h3>";
    $uid = getUID($slide, $session);
    $thumb = "http://yourserver/view/EmbedThumbnail/$uid";
    echo "<a href='$thumb'><img border=0 src='$thumb' height='50' align='left' /></a>";
    echo "<tt>$thumb</tt><br />";
    echo "<br clear='all'>";
}
?>
</body></html>

 

Of course, you can modify this script any way you want to accommodate your particular directory hierarchy and structure.

Then, it was just a matter of simple string concatenation to provide the client with a custom website where they were able to retrieve all of the links to their slides in batch. As the page interacts with PMA.core directly at that point,

So, for our client, we figured out how to organize a pipeline to facilitate their content production process. We use our PHP SDK on top of PMA.core to generate links that in turn exploit the slide sharing capabilities of PMA.view. Now that’s cool!
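
The string concatenation itself is not tied to PHP. As a sketch, the same link construction in Python could look like this; the function names and the UIDs are made up, and in practice the UIDs would come from the SDK, as in the PHP script above:

```python
def embed_links(server, uid):
    """Build the thumbnail and viewer links for one slide UID."""
    base = server.rstrip("/")
    return {
        "thumbnail": f"{base}/view/EmbedThumbnail/{uid}",
        "viewer": f"{base}/view/Embed/{uid}",
    }

def links_for_slides(server, uids_by_slide):
    """Map each slide path to its pair of links."""
    return {slide: embed_links(server, uid)
            for slide, uid in uids_by_slide.items()}

# Made-up UIDs for illustration only.
uids = {"rootdir/subdir/slide1.svs": "AAAA1111",
        "rootdir/subdir/slide2.mrxs": "BBBB2222"}
links = links_for_slides("http://yourserver/", uids)
print(links["rootdir/subdir/slide1.svs"]["thumbnail"])
```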

But we still want more

Do you have a scenario that you have difficulties with or want to see optimized? Let us know; we’ll be happy to talk to you.