How to upload your slides to PMA.core?

It’s a question people ask us all the time: How can I upload my slides to PMA.core? And usually we tell them: you don’t. You put the slides somewhere, point PMA.core to it by means of a root-directory, and that’s it: it’s the DP/WSI equivalent of What you see if what you get (WYSIWYG).

Sometimes this is insufficient. “But how can we still upload them?” is asked next, after we explain the concept of root-directories, mounting points, and other technicalities.

To be clear: Pathomation’s PMA.core tile server is NOT like the others: there is no registration mechanism, or upload procedure to follow. You really do drop your files off in a folder, and that’s it.

But it’s the dropping off that can be misconstrued as an “upload” procedure in itself. Instead, we prefer the term “transfer” instead. It’s also why we names one of our helper programs for this PMA.transfer.

Procedure

There are two scenario whereby slidetransfer is particularly relevant:

Within an organization, slide acquisition occurs on one computer, whereas the IMS solution runs on a centralized server. The challenge is to get the slides from the original acquisition PC to the central IMS (PMA.core tile server). The upload (or transfer) operation required is highlighted with the yellow arrow in the diagram below:

The second procedure is a variation of this; a PMA.core tile server customer has his own customers that want to transfer slides to a central repository for things like quality control or AI algorithms.

The scenario changes only a little and it’s mostly a matter of labels:

The latter is also known as a multi-tenant scenario. It’s possible to imagine for something like this to happen:

In the diagrams above we illustrate manual flow with PMA.transfer, but once this process becomes repetitive, it can be further automated by combining operating system functionality along with PMA.core SDK methods.

In this article, we present how to automate uploading your slides from the slide scanner PC to your PMA.core server (the first scenario).

Let’s have a look at how this could work in a Microsoft Windows environment.

File system auditing

The first step is to enable auditing to record when a user puts a new file in the folder you want to track. This user can be “Everyone” or a specific user, like the user that the slide scanner uses to output the digital slides.

  • Open the “Run” program using the shortcut “WIN + R”
  • Enter “mmc” to open the Management console (Fig.1-1).
  • Once the Management console is open, you will follow “File → Add/Remove Snap-in”
  • Double-click on “Group Policy Object” in the “Available snap-ins” section
  • Choose “Local Computer” as the “Group Policy Object”
  • Continue by clicking on the “Finish” button
  • Followed by the “OK” button

Continue on with the following steps:

  • Expand the “Local Computer Policy” and follow “Computer Configuration → Windows Settings → Security Settings → Local Policies → Audit Policy”
  • Then, you will double-click on the object access audit in the list of audit policies
  • Check the successful attempts checkbox in the properties window.
  • You can also audit the failed attempts to do some error handling, but we will only focus on the successful attempts in this post.
  • Confirm your changes by clicking on the “OK” button

Now that you have the configuration correctly set up, you can enable audit trailing on the folder that you want to track (in other words: the folder where your whole slide images will be deposited).

  • Go to the folder you want to track; this could be the folder where the slide scanner outputs the digital slides.
  • Open the folder’s properties and the advanced security settings under the “Security” tab
  • Next, you will open the “Auditing” tab
  • Click on the “Add” button

Next you can select the user that you want to track:

  • Click on the “select a principal” link. This user can be “Everyone” or a specific user, like the user that the slide scanner uses to output the digital slides.
  • Select the success type in the “Type” dropdown, and
  • Choose the “This folder, subfolders, and files” option in the “Applies” dropdown
  • Click on the “show advanced permissions” link in the permissions section and
  • make sure only the “Create files / write data” checkbox is checked
  • Confirm your changes by clicking all the OK buttons until the properties window closes

Congratulations; you will now be notified every time a new file (slide) is dropped in the folder.

The upload script

Let’s create the upload script. This script will be triggered to upload the slides to your PMA.core. Find the whole script at the end of this post.

First, you will import the needed packages. The only third-party package is the pathomation package. You can get it by running the “pip install pma_python” command in the command prompt.

There are three modules/packages used:

The script uses timestamps, so only the recently added files get uploaded. First, it will check if the “date.txt” file exists and create it if it doesn’t. The “date.txt” file contains the timestamp of the last time the script ran. It gets read if it already exists. It gets created if it doesn’t exist, and the current timestamp gets written into it:

if os.path.exists(r"./date.txt"):
    last_timestamp = open('./date.txt','r')
    timestamp = last_timestamp.readlines()
    time.sleep(1)
    now = str(time.time())
    if timestamp[0] < now:
        print(timestamp[0] + ' ~ ' + now)
        last_timestamp = open('./date.txt','w')
        last_timestamp.write(now)
        check_connection(timestamp[0])
else:
    now = str(time.time())
    last_timestamp = open('./date.txt','w')
    last_timestamp.write(now)
    check_connection(5.5)

last_timestamp.close()

Once “date.txt” gets read or created, the “check_connection” function gets called with the timestamp or a low float number as the “last_trigger” argument. In this example, you will see that the “low float number” is “5.5”. This low float number ensures that the files in the folder get uploaded if the “date.txt” file does not exist.

The “check_connection” function checks if PMA.start is running and if you have a valid session id. If they are both true, the “last_trigger” argument gets passed to the “upload_files” function. You can expand the “check_connection” function to automatically run PMA.start, try other credentials, send emails, etc.

def check_connection(last_trigger):
    sessionID = core.connect('https://snapshot.pathomation.com/PMA.core/3.0.1', 'pma_demo_user', 'pma_demo_user')

    if core.is_lite() and sessionID is not None:
        print("PMA.start is running")
        upload_files(sessionID, last_trigger)
    else:
        print("PMA.start is not running")
        input("Press Enter to continue...")

In the “upload_files” function, you check and upload the recently added files to your PMA.core.

<code class="python"><pre>
supported_files = [".jpeg", ".jpg", ".jp2", ".png", ".vsi", ".dzi", ".vms", ".ndpi", ".ndpis", ".dcm", ".mrxs", ".svs", ".afi", ".cws", ".scn", ".lif", "DICOMDIR", ".zvi", ".czi", ".lsm", ".tf2", ".tf8", ".btf", ".ome.tif", ".nd2", ".tif", ".svslide", ".ini", ".mds", ".zif", ".szi", ".sws", ".qptiff", ".tmap", ".jxr", ".kfb", ".jp2", ".isyntax", ".rts", ".rts", ".bif", ".zarr", ".tiff", ".mdsx", ".oir"]
…
def upload_files(sessionID, last_trigger):
    list_content = os.listdir(os.getcwd())

    for list_item in list_content:  
        if os.path.isfile(list_item):
            root, ext = os.path.splitext(list_item)
            if ext in supported_files:
                creation_file = os.path.getctime(list_item)
                if float(last_trigger) < creation_file:
                    print('Uploading S3...')
                    core.upload(os.path.abspath(list_item), "_sys_aws_s3/testing_script", sessionID, True)
            else:
                print(list_item, 'is not supported')
</pre></code>

First, all the objects i.e. files and directories in the tracked folder, get put into the “list_content” array. A for loop loops over all the items and performs a couple of checks to get the recently added files. The first check checks if the item in the “list_content” array is a file. This blocks directories from being uploaded.

list_content = os.listdir(os.getcwd())

    for list_item in list_content:  
        if os.path.isfile(list_item):

Next, the file path gets split into 2 parts: The file’s root and the file’s extension. The file’s extension gets compared with the “supported_files” array to ensure that only supported files get uploaded to your PMA.core.

supported_files = [".jpeg", ".jpg", ".jp2", ".png", ".vsi", ".dzi", ".vms", ".ndpi", ".ndpis", ".dcm", ".mrxs", ".svs", ".afi", ".cws", ".scn", ".lif", "DICOMDIR", ".zvi", ".czi", ".lsm", ".tf2", ".tf8", ".btf", ".ome.tif", ".nd2", ".tif", ".svslide", ".ini", ".mds", ".zif", ".szi", ".sws", ".qptiff", ".tmap", ".jxr", ".kfb", ".jp2", ".isyntax", ".rts", ".rts", ".bif", ".zarr", ".tiff", ".mdsx", ".oir"]
…
root, ext = os.path.splitext(list_item)
            if ext in supported_files:

The last check compares the “created on” timestamp of the file with the “last_trigger” argument that got passed on from the “check_connection” function. The upload starts if the “created on” timestamp is bigger than the “last_trigger” argument using the “core.upload()” function. The arguments of the “core.upload()” function are:

  • local_source_slide: This is the absolute path of the file.
  • target_folder: This is the PMA.core folder you want to upload to.
  • target_pma_core_sessionID: This is the session id you created in the “check_connection” function.
  • callback=None: This argument will print the default progress of the upload if set to True. You can also pass a custom function that will be called on each file upload. The signature for the callback is “callback(bytes_read, bytes_length, filename)”.
creation_file = os.path.getctime(list_item)
                if float(last_trigger) < creation_file:
                    print('Uploading S3...')
                    core.upload(os.path.abspath(list_item), "_sys_aws_s3/testing_script", sessionID, True)

Save the script and put it in the track folder.

Create a scheduled task

In this last step, you will create a scheduled task in the Task Scheduler to run the script when the scanner outputs the digital slides in the track folder.

  • Open the Task Scheduler and click on “Create Task” in the “Actions” section
  • Under the “General” tab, enter the name of the task and
  • Choose a user in the “Security options” section. You do this by clicking on the “Change User or Group” button (Fig.6-3). This is the slide scanner user that outputs the digital slides in the track folder. You can keep the default user if the current user stays logged on.
  • Next, you will select the “Run only when user is logged on” radio button (Fig.6-4) and
  • Check the “Run with highest privileges” checkbox (Fig.6-5).

Under the “Triggers” tab, you can now create the task’s trigger by clicking on the “New” button:

  • In the “Begin the task” dropdown, you will select what the trigger will listen to.
  • You will choose the “On an event” option, and
  • Select the “Custom” radio button.
  • Click on the “New Event Filter” button, and
  • Go to the “XML” tab. Under the “XML” tab, you will check the “Edit query manually” checkbox

Paste the following XPath code:

<QueryList>
  <Query Id="0" Path="Security">
    <Select Path="Security">
            *[System[(EventID='4663')]] 
    and
       *[EventData[Data[@Name='AccessMask']and(Data='0x2')]]
    and
       *[EventData[Data[@Name='ProcessName']and(Data='C:\Windows\explorer.exe')]]
</Select>
  </Query>
</QueryList>

Confirm all your changes by clicking on the OK button until you get to the “Create Task” window.

The next tab is “Actions”. The task will perform this action when the event occurs. The action here is running the script in the track folder. To create an action you

  • Click on the “New” button and
  • Select “Start a program” in the “Action” dropdown
  • In the “Settings” section, you will browse the python executable which should be in "C:\Program Files\Python311\python.exe", and
  • In the “Add arguments” field you will enter the name of the script with the “.py” extension.
  • Finally, you enter the path of the track folder in the “Start in” field and
  • Confirm your changes by clicking the OK button.

The second to last tab is “Conditions”. The only thing you need to do here is to uncheck all the checkboxes in the “Power” section

Finally, you can create the task by clicking on the “OK” button. You can test the task by selecting the task and clicking on “Run” in the “Selected Item” section. This should run the script in the track folder.

The full script:

import os
import time
from pma_python import core

supported_files = [".jpeg", ".jpg", ".jp2", ".png", ".vsi", ".dzi", ".vms", ".ndpi", ".ndpis", ".dcm", ".mrxs", ".svs", ".afi", ".cws", ".scn", ".lif", "DICOMDIR", ".zvi", ".czi", ".lsm", ".tf2", ".tf8", ".btf", ".ome.tif", ".nd2", ".tif", ".svslide", ".ini", ".mds", ".zif", ".szi", ".sws", ".qptiff", ".tmap", ".jxr", ".kfb", ".jp2", ".isyntax", ".rts", ".rts", ".bif", ".zarr", ".tiff", ".mdsx", ".oir"]

def check_connection(last_trigger):
    sessionID = core.connect('pma_core_url’, 'pma_core_username', 'pma_core_pw')

    if core.is_lite() and sessionID is not None:
        print("PMA.start is running")
        upload_files(sessionID, last_trigger)
    else:
        print("PMA.start is not running")
        input("Press Enter to continue...")

def upload_files(sessionID, last_trigger):
    list_content = os.listdir(os.getcwd())

    for list_item in list_content:  
        if os.path.isfile(list_item):
            root, ext = os.path.splitext(list_item)
            if ext in supported_files:
                creation_file = os.path.getctime(list_item)
                if float(last_trigger) < creation_file:
                    print('Uploading S3...')
                    core.upload(‘local_source_slide’, ‘target_folder’, ‘target_pma_core_sessionID’, callback=None)
            else:
                print(list_item, 'is not supported')

    

if os.path.exists(r"./date.txt"):
    last_timestamp = open('./date.txt','r')
    timestamp = last_timestamp.readlines()
    time.sleep(1)
    now = str(time.time())
    if timestamp[0] < now:
        print(timestamp[0] + ' ~ ' + now)
        last_timestamp = open('./date.txt','w')
        last_timestamp.write(now)
        check_connection(timestamp[0])
else:
    now = str(time.time())
    last_timestamp = open('./date.txt','w')
    last_timestamp.write(now)
    check_connection(5.5)

last_timestamp.close()

Conclusion

In this article we looked at how you can gradually convert manual slide transfer workflows over to automated ones.

After you implement the steps and code from this program, you will be able to deposit slides in an audit-trailed folder on your hard disk, and subsequently be transferred to your PMA.core tile server (for testing purposes these can even reside on the same machine).

We wrote the code in this article from the point of view of Microsoft Windows, and we used the Python scripting language as glue. But other combinations are possible. Let us know if there's a particular scenario that you would like to see explored in a follow-up article.

Key to this procedure is the SDK's Core.upload() method. This is really where all the magic happens. In a subsequent article, it is our intention to offer a comparison look between how this particle method (and its Core.download() counterpart) functions in each of the respective language platforms we support: Python, Java, and PHP.

Multi-tenant with PMA.core and PMA.vue

In our recent posts, we’ve talked a lot about PMA.studio and high-end topics in digital pathology like AI and upload automation workflows.

In this article, we want to discuss a more mundane scenario, whereby you are a pathology service provider (or maybe a group practice).

Multi-tenant

Simply put and for our purposes: a multi-tenant scenario is where you have a single software installation, with different users from different origins. Your users don’t just make up the internal users within your own organization; they represent your customers as well.

Typical of a multi-tenant environment is that each tenant may have vastly different interests. Regardless, each tenant is very keen on its own privacy, and so its paramount that tenant A under no circumstances can see the content of tenant B.

Suppose you have a total of 7 actual users (representing physical persons), they may be distributed across your tenants and your own organization like this:

How do you deal with that?

Let’s look how do PMA.core and PMA.vue help you achieve pathology exchange nirvana.

PMA.core security provisions

Let’s start with the obvious one: security. Keeping the data from your own organization separate from your customers is achieved by defining separate root-directories and users in PMA.core.

Once you’ve defined both users and root-directories, you can use PMA.core’s built-in access control rights (ACL) to control who gets to see what.

As your own customer-base grows (because you’re using this awesome Pathomation software!) you can create groups as well, and organize your customer-accounts accordingly.

Once the accounts, root-directories and credentials are configured, you can go to PMA.vue. A tenant user that logs in, will only be able to see the tenant-specific slides. There is no way see the other tenant’s slides, either on purpose or by accident.

What about your own users?

You can set permissions on the various root-directories so that your internal users at least can have access to the respective tenant slides that they have to interact with. You could even go more granular so that internal users only have access to select tenants, based on the specific terms of for SoWs and MSAs. For tenant users themselves nothing changes because of this: they still can only their own slides.

Licensing PMA.core

Once people are convinced that our security model can fit their deployment scenarios, the next question is usually about licensing.

You don’t have to buy PMA.core licenses for all the your tenant users. It is typical for a tenant to have at least two user accounts: a primary and a backup (for when the original user is out, or on vacation or so).

PMA.core wotks with concurrent roaming licenses. This means that in the above scenario, you would only buy a single roaming seat license to accommodate access for TenantA’s users at any given time.

It gets better actually: when it’s unlikely that all of your tenants will be using your system at the same time, you can distribute the seats amongst all the tenants (as well as your internal users) together.

Let’s have a look at the following scenario: you run a small group practice with 5 pathologists, and have 20 tenants. Reading the slides typically happens overnight, and during the daytime, you estimate that about a fourth of your customer base at any given time would be uploading new content to you using PMA.transfer, and another fourth consulting analytical results. Consulting results typically happens in the morning, uploading new content in the afternoon.

Your seat occupation would therefore be about 5 users at night, about 5 users in the morning (one fourth of 20 tenants), and 5 users in the afternoon (one fourth of 20 tenants again).

So even as you have a total or 20 (tenants) x 2 (primary, backup) + 5 (internal staff) = 45 people interact with your system, at any given time you would provision 5 simultaneously occupied seats. Let’s just add one more to that number to be sure, because you may occasionally also need parallel administrative access, maybe an extra hand helps out on busy days etcetera.

You can configure PMA.core to send out automatic notification email should the number of concurrent licenses is insufficient at any given time. Do you notice at some point that you are effectively short on seats? Not a problem; contact us to purchase additional roading license seats at any given time, and we can send you an updated license key file right away.

More information about our licensing policy (and latest updates) can be found at https://www.pathomation.com/licensing/.

PMA.vue

PMA.vue is Pathomation’s lightweight centralized viewing solution. Like PMA.core, it can be installed on-premise on your end, or hosted by us on a Virtual Machine in a data center of your choice (currently we support Amazon AWS and Microsoft Azure).

With PMA.vue, you can offer powerful viewing and basic annotation capabilities, without having to compromise PMA.core itself: all registered PMA.core users can access PMA.vue, but not vice versa: in order to be allowed to log into PMA.core, a special attribute must be enabled for the user.

With PMA.vue, you can extend slide viewing features to your tenant users, without to need to invest into having your own customer portal developed. Tenants can look at slides, create screenshots, prepare publications etcetera. For your own internal employees, PMA.vue has a whole arsenal of data sharing options. So even if you don’t open up PMA.vue to tenant users, it is still a great tool for internal communication, as you can use it to share folders, slides, and even regions of interest.

Learn more?

To learn more about PMA.vue, visit https://www.pathomation.com/pma.vue

To learn more about PMA.core, visit https://www.pathomation.com/pma.core

Not sure about whether you should go with PMA.vue or build your own customer portal (using our SDK and PMA.UI visualization framework)? We have a separate blog article to help you decide who to put in the driver seat.

Quod AI (What about AI)?

A few months ago we published a blog article on how PMA.studio could be used as a platform in its own right to design bespoke organization-specific workflow and cockpit-solutions.

In the article we talked about LIMS integration, workflow facilitation and reporting.

We didn’t talk about AI.

We have separate articles though in which we hypothesize and propose how AI – Pathomation platform integration could work; see our YouTube videos on the subject.

So when we recently came across one AI provider with a stable and well documented API, we jumped at the opportunity to pick up where we left off: and we built an actual demonstrator with PMA.studio as a front-end, and the AI engine as a back-end engine.

AI and digital pathology

At a very, very, VERY high level, here’s how AI in digital pathology works:

  • The slide scanner takes a picture of your tissue at high resolution and stores it on the computer. The result of this process is a whole slide image (WSI). So when you look at the whole slide images on your computer, you’re looking at pixels. A pixel is very small (about 2 – 3 mm on your screen, if you really have to know).
  • You are a person. You look at your screen and you don’t see pixels; they blend together to see images. If you’ve never looked at histology before, WSI data can look like pink and purple art. If you’re a biology student, you see cells. If you’re a pathologist, you see tumor tissue, invasive margins, an adenocarcinoma or DCIS perhaps.
  • The challenge for the AI, is make it see the same things that the real-life person sees. And it’s actually much harder than you would think: for a computer program (algorithm), we really start at the lowest level. The computer program only sees pixels, and you have to teach it somehow to group those together and tell it how to recognize features like cells, or necrotic tissue, or the stage of cell-division.

It’s a fascinating subject really. It’s also something we don’t want to get into ourselves. Not in the algorithm design at least. Pathomation is not an AI company.

So why talk about it then even at all? Because we do have that wonderful digital pathology software platform for digital pathology. And that platform is perfectly suited to serve as a jumping board to not only rapidly enrich your LIMS with DP capabilities, but also to get your AI solution quicky off the ground.

If you’re familiar with the term, think of us a PaaS for AI in digital pathology.

Workflow

In one of our videos, we present the following abstract workflow:

Now that we’ve established connections to an AI providers ourselves, we can outline a more detailed recipe with specific steps to follow:

  • Indicate your region(s) of interest in PMA.studio
  • Configure and prepare the AI engine to accept your ROIs
  • Send off the ROI for analysis
  • Process the results and store them in Pathomation (at the level of PMA.core actually)
  • Use PMA.studio once again to inspect the results

Let’s have a more detailed look at how this works:

Step 1: Indicate your regions of interest

With the rectangle tool on the Annotations tab of the ribbon, you can draw a region of interest.

Don’t worry about the resolution (“zoomlevel”) at which you make the annotation; we’ll tell our code later on to automatically extract the pixels at the highest available zoomlevel. If you have a slide scanned at 40X, the pixels will be transferred as if you made the annotation at that level (even though you made have drawn the region while looking at the slide in PMA.studio in 10X).

You can actually configure a separate AI-tab in the back-end of PMA.studio, and repeat the most common annotation tools you’ll use to indicate your regions of interest: if you know you’ll always be drawing black rectangular regions, there’s really no point having other things like compound polygon or a color selector:

Step 2: Configure the AI engine for ingestion

We added buttons to our ribbon to interact with two different AI engines.

When clicking on one of these buttons, you can select the earlier indicated region of interest to send of for analysis, as well as what algorithm (“scenario” in their terminology) to run on the pixels:

The way this works behind the scenes: the panel that’s configured in the ribbon ends up loading a custom PHP script. Information about the current slide in PMA.studio is passed along through PMA.studio’s built-in mechanism for custom panels.

The first part of the script than interacts both with the AI provider to determine which algorithms are available, as well as with the Pathomation back-end (the PMA.core tile server that hosts the data for PMA.studio) to retrieve the available ROIs:

Step 3: Extract the pixels from the original slide, and send them off the AI engine

First, we use Pathomation’s PMA.php SDK to extract the region of pixels indicated from the annotated ROI:

We store a local copy of the extracted pixels, and pass that on to the AI engine, using PHP’s curl syntax:

Step 4: Retrieve the result and process it back into PMA.core

Finally, the returned JSON syntax from the AI engine is parsed and translated (ETL) into PMA.core multipoint annotations.

A form is also generated automatically to keep track of aggregate data if needed:

Step 5: Visualize the result in PMA.studio

The results can now be shown back into PMA.studio.

The data is also available through for the forms panel and the annotations panel:

What’s next?

How useful is this? AI is only as good as what people do with it. In this case: it’s very easy to select the wrong region on this slide where a selected algorithm doesn’t make sense to apply in the first place. Or what about running a Ki67 counting algorithm on a PR stained slide?

We’re dealing with supervised AI here, and this procedure only works when properly supervised.

Whole slide scaling is another issue to consider. In our runs, it takes about 5 seconds to get an individual ROI analyzed. What if you want to have the whole slide analyzed? In a follow-up experiment, we want to see if we can parallelize this process. The basic infrastructure at the level of PMA.core is already there.

Demo

The demo described in this article is not publicly available, due to usage restrictions on the AI engine. Do contact us for a demonstration at your earliest convenience though, and let us show you how the combination of PMA.core and PMA.studio can lift your AI endeavors to the next level.

An in-depth look at polygon annotations in PMA.studio

We’re all familiar with basic annotation patterns like rectangles, circles, lines etcetera. Pathomation’s PMA.studio support all of the through the “Simple shapes” panel on the Annotations tab:

Next to this tab, a more interesting group is shown: polygon shapes. The various icons on these come from the unique way pathologists typically make their annotations.

In this article, we’re going to distinguish between all 7 buttons in the “polygon shapes” group.

Multi point

With the multi-point annotation you can quickly annotate a great amount of individual cells. A counter runs along as you indicate additional landmarks, and it is always possible to revert back on the latest annotated point.

In WKT, the annotations are stored as “MULTIPOINT((x1 y1),(x2 y2),…” strings

You can (of course) make multiple multi-point annotations. An overview is always available via the Annotations panel:

Polygon

Drawing polygon is the typical run off the mill polygon annotation tool you find in many graphics tools like Photoshop, Paint.Net etcetera. The tool, as implemented, assumes there’s an automatic connection between your first and last drawn point. You finish drawing your shape by double-clicking (unlike the multi-point tool, where you need to press an explicit “Finish” button).

In WKT, polygons are stored as POLYGON((x1 y1,x2 y2,x3 y3,…,x1 y1)) strings. This means that when you have a polygon with 4 points, the WKT representation of it will actually indicate 5 points (the first and last one being the same).

The annotation browser this time doesn’t indicate of how many points the polygon shape is comprised.

The annotation browser this time doesn’t indicate of how many points the polygon shape is comprised.

Closed freehand

The closed freehand results is a kind of polygon shape annotation, but this time you keep the mouse button pressed down during the whole process of drawing the shape.

It’s best suited to annotate small areas at high magnification, possibly in combination with an alternative pointing device like a stylus (instead of a mouse).

Like the earlier mentioned polygon (and as the name suggests), the closed freehand is a closed form. It’s equally stored as a WKT POLYGON string:

Because it’s a closed form, the annotation browser can determine both its area and surface:

Compound freehand

Right; this is one we’re particularly proud of. What’s the problem with annotating tissue regions? Sometimes you can do it at a coarse resolution, but sometimes you need a finer (higher) resolution, to indicate the exact border between two tissue types. A magic wand tool could work, but not always.

What you want to do in these cases is pan and zoom in and out a couple of times while tracing the border of your shape. But all modes of interaction with your mouse are already taken with conventional tools like polygon and closed freehand.

So the compound freehand gives you exactly that option: you can trace a boundary one segment at the time.

You zoom in and out, and pan as needed.

When you feel you have enough line segments drawn to delineate your entire area of interest, you press the “Finish” button, and all segments are joined into a single one.

From a technical perspective, this is a PMA.UI front-end feature only (albeit a cool one!). Once completed and saved to PMA.core, the WKT representation is the same boring POLYGON:

Why is this relevant? Because this means that once the different line segments are merged, it is not possible to pry them apart again into their origins.

Freehand

The freehand tool lets you draw an arbitrary line to indicate a margin.

The way this is stored in WKT this time is by means of a LINESTRING. The final point can be whatever it is; it does not automatically revert back to the origin (that would make it a closed freehand).

Because it’s a line this time, it’s only 1-dimensional, so there are no area measurements to display (none that would make sense anyway).

Wand

We refuse to call our wand “magic”, like in other environments. Think of our wand more as a magnetic lasso (which is another indication used oftentimes for similar tools).

The wand tool can be useful to select large regions with homogeneous coloring schemes, but we think it actually has limited use in histology and pathology. There are a number of reasons for this: Oftentimes what you really want is cell segmentation (e.g. highlight all the Ki67-positive cells), but a wand tools is not smart enough for that; it only acts on neighboring pixels (and based of the difference between already selected pixels, decides whether the next pixel is to be included or not).

Our tool can still be useful to select lumens, or perhaps regions of fatty tissue. We’re including it at least in part because people have asked us for it. Like rectangles and circles, people sort of expect it (but tell us the last time you used a rectangle or indicate a cell or a region of interest… nature just doesn’t work that way).

The Wand tool therefore is probably one of the more experimental tools we have in our arsenal. Our best advice it to play around with it a bit and see if it’s useful in your particular observations. You can tweak it and set a couple of different parameters to change its sensitivity, as well. Use it at your discretion, and let us know how we can make it work (better) for you in a subsequent version of PMA.studio.

As with our compound polygon (cf. supra), technically speaking the Wand is a client-side tool. At the end of the day, it translates like all polygon-like object into the same POLYGON WKT string:

Brush

Don’t you hate it? You just spent 2 minutes painstakingly delineating a tissue feature (perhaps using the compound polygon), only to find out that you missed a couple of features.

That’s where the Brush comes in handy. You can use it on pretty much all 2D-annotations (i.e. those that have an area measurements).

The brush and the wand tool are next to each other in the ribbon, because they can actually nicely complement each other: a coarse annotation can be made with the wand, followed by a finetuning around the edges with the brush tool.

What more do you want?

PMA.studio offers advanced annotation tools, specifically designed for the whole slide image data.

We’re not saying we wrote the ultimate set of annotation tools here, but we think we’ve got the basics covered that should help you get pretty far already.

Don’t agree with how we implemented our tools? Need something else? Have an idea for something completely different? Do continue the conversation. We can only get better if you let us know what you’re looking for.

Audit trailing

Our PMA.core tile server is all about providing connectivity, and we talk a lot about how great we are in terms of:

What we don’t boast about too often though are our audit trailing capabilities. It’s sort of our secret sauce really, that overlays everything that takes place within the PMA.core environment (and by extension really almost all of our other products, too).

The need for monitoring

Yes, we know that monitoring conjures all sorts of “Big Brother is watching you” memes, but there are a number of good reasons to provide this kind of service, too:

  • When enrolling new users, you want to keep an eye out for them that they can do things as they are intended. Some of us are introverts, and some pre-emptive corrections (if needed) may actually be appreciated.
  • Too many audit events may be a clue that there’s a security breach in your system, requiring other actions to be initiated.
  • It’s a general sanity check for hard working staff that can now at the end of the day make sure that they indeed looked at everything they needed to, and that it was recorded as such.
  • During an audit, or when an outlier event takes place, an audit trail can provide supportive evidence that indeed things did take place as intended.
  • As a professor, you assign homework cases to your students. Did the student really go look at the assignment?
  • In some cases, the features of an audit trail can be prescribed in the form of legal obligations (depending on your jurisdiction). One such example are the FDA’s 21CFR.11 guidelines.

An audit trail need not be a mystery, either. In essence, it means that for each change in any data point, you keep track of:

  • Who did it
  • When it happened
  • Where it happened
  • Why it happened

Correspondingly, if there’s no audit trail record of an operation, it didn’t happen.

Creating content

Let’s see how that is implemented in PMA.core. Let’s create a new user:

After you create the user, the “Audit trail” tab immediately becomes visible. When you click on it, you see that new data was entered.

The same audit principle applies to other information types across PMA.core. For convenience, we sometimes combine different entities in a single report. An example is a root-directory: A root-directory always consists of a symbolic name (the root-directory itself), and one or more mounting points. You can’t have one without the other. So the audit trail for both is combined and shown as follows:

Note that sensitive and private information like someone’s password is still obscured, even at this level.

Editing content

Let’s make some changes to the user’s record:

The audit trail tab shows the changes:

And the same principle applies to all entities, like the aforementioned root-directories. Note that multiple subsequent edits are shown as separate records:

Deleting content

Getting rid of content is probably where the audit trail comes in the most useful.

After deleting a record, you can search for it, and the fact that no results are returned proves that as far as PMA.core is concerned, the record is indeed deleted.

It is possible to transiently see the operation in retrospect, by typing in the direct URL to the entity’s original audit trail:

The red color is used to indicate that something final happened here:

However, when logging out of PMA.core, and logging back it, it’s harder to retrieve the data, as you need to remember to entity’s original identifier, and even if you do, it may be taken over by a new entity.

We’ll show you in a minute how to get access to deleted data in a more reliable and predictable way.

Why would you want to keep track of deleted data? Do you care to find out whether the student did his homework last semester? Probably not, but at least in the context of clinical trials, as well as hospital operations, this makes sense, because:

  • Clinical trials can run for many years. For rare diseases, the phase I clinical trial can particularly stretch on for a long time. People switch jobs and roles in between, and when final approval of the drug approaches, it’s important to still have a track record of who was involved
  • Regulatory at the country level often require patient data to be kept for dozens of years. Just as important as it is to keep the patient-data, are the meta-data describing the actions performed with the patient records.

Back-end database

PMA.core only offers audit trail views for the most commonly referenced data types. Whether you can consult the data through the PMA.core end-user interface or not; all operations on any data entities in PMA.core eventually are tracked through a single table structure, which is defined in our wiki:

This means that even if there is no visual interface within PMA.core, or you can’t remember the original URL to the entity’s audit trail, there’s always the possibility to go dig into the audit trail in the back-end:

The above shows how the data from our first user record creation is represented. Below is what the update looks like:

And finally, the delete event:

Scaling and resource allocation

All this extra data means extra storage of course. Microsoft SQL Server can definitely handle a lot of records, but there are still situations where extra care is warranted.

When a lot of data passes through the system transiently, it’s possible for the logfiles (tables) to grow quicker than the rest of the database. Consider that also annotations and meta-data (form data) is audit-trailed.

In order to give some guidance as to how much data there actually is, as well as when it was generated, the installation check view gives high-level statistics on this:

If the number of records in the audit trail increases rapidly, you should be able to explain why this is (many users, lots of annotation activities taking place, lifetime of the total installation…). It’s important also at that point to go through our latest recommendations on SQL Server compliance.

In closing

In an earlier article, we talked about the differences between adapting an open-source strategy versus a commercial platform like our own.

This article adds more substance to this discussion: to be truly prepared for enterprise-level deployment of digital pathology, it’s important to know who’s doing what with your system. It’s important to be able to prove that to the necessary stakeholders, including governments and regulatory agencies.

Pathomation’s PMA.core tile server then has all the necessary infrastructure to get you started off the right foot.

New updated version of PMA.transfer released: easily manipulate whole slide images in digital pathology and virtual microscopy

Just as important as having state of the art digital pathology software, are the tools that are built and provided around such infrastructure. Today we release PMA.transfer 2.2.3. PMA.transfer is an important component in the Pathomation software platform for digital pathology when scaling up and automating your slide manipulation capabilities.

Why even build a separate tool like PMA.transfer? Virtual slides are complex in build-up. Any type of data that is stored on a computer comes down to a physical file on a hard disk. Various software vendors have come up with different strategies to store virtual slides as such physical data. A single slide can be stored as a single file, but also as multiple files. This distinction is not always obvious, and often leads to confusion when sharing slides amongst colleagues and collaborators alike

Similar in feel to FileZilla and CloudBerry Explorer, PMA.transfer helps end-users and administrators to manipulate large amounts of virtual slides. The tool allows you to manipulate virtual slides on a transaction-based scale. PMA.transfer is smart enough to figure out what physical files belong to what slides. The end-user only needs to tell the software to “transfer these slides from A to B”, and PMA.transfer takes care of it.

The specific nature of how virtual slides are physically represented on storage media is hidden from end-users. A typical application for PMA.transfer would be to upload recently scanned slides from a local hard disk to the PMA.core image management system (IMS). It doesn’t matter than in this regard whether the final storage destination is a local on-premise RAID device, or (cloud-based) S3 storage.

In addition to providing the necessary abstraction of the underlying complex data representation, much effort went into introducing the necessary checks and balances before, during, and after a transfer operation.

PMA.transfer works in close collaboration with PMA.start, a free local version of PMA.core, our CE-IVD certified tile server. This is necessary because it is PMA.start that is responsible for interacting with the local hard disk. It is this interaction that also allows for the software to perform a slide integrity check before the transfer even begins; there’s nothing more frustrating than to wait for 2 Gigabytes of data to upload, only to have to realize afterwards that the initial slide was somehow corrupt from the beginning.

Slide transfers that are interrupted during the upload (or download) process automatically resume once a connection is re-established. Upon completion, a secondary integrity check takes place, to confirm that a transferred slide’s fingerprint corresponds to the source slide.

PMA.transfer 2.2.3 was updated to interact better with My Pathomation, and is expected to be particularly like by its users. While the My Pathomation user interface is very user friendly to upload slides one by one, it is not that well suited for larger batch-based transfers.

Upon launching PMA.transfer, users can choose to either connect to an on-premise PMA.core instance, or an institutional cloud-based My Pathomation account. For power-users, different connection profiles can be tracked as well, similar to FileZilla’s host manager.

For existing Pathomation customers (including users of My Pathomation), PMA.transfer can be downloaded free of charge from https://www.pathomation.com/pma.transfer.

De-Identification and procedures to increase data privacy

This is a follow-up blog post. If you missed the first part, you can read it here: WSIs and pseudonymization.

De-identification

De-identification, is a term used for any process of reducing personal information, aiming at increasing personal privacy and data compliance. For example, pseudonymization is a de-identification procedure, further de-identification steps may lead to anonymization.

Pseudonymization

Essentially, pseudonymization consists in the replacement of direct identifiers by a pseudonym. It permits the WSI to be traced back to the patient as it is often needed in the clinical practice, while allowing it to be shared with other areas of medical research, often subject to different legal agreements. Due to its recognized importance, in practice all patient samples entering a pathology lab are pseudonymized at the lab registration step, each new sample gets a sequential biopsy number. In the specific case of research projects, pseudonymization is sufficient on the condition there is patient consent and approval of an Ethical Committee.

Anonymization

Anonymization, is the most radical de-identification action, – it ensures patient privacy. In the specific case of the WSI, it requires the erasure of any written data/metadata entry from the WSI so that the data is irreversibly altered in such a way that a data subject can no longer be identified directly or indirectly, either by the data controller alone or in collaboration with any other party (ENISA, 2019).

Additional security steps for pseudonymized data

Further measures can be employed to secure patient data from unauthorized access. Increasing the number of steps and gatekeepers needed to identify a patient, increases security. For example, Provenance information management, where a WSI can be traced back to the original biological material, without exposing patient information might be employed to further protect the likelihood of unauthorized access, despite the use of pseudonymization. (Holub et. al; 2022).

Pathomation’s PMA Core: a security and privacy gatekeeper

At Pathomation we offer a series of tools and configurable options to ensure data compliance with major personal data privacy laws (GDPR, HIPAA). As data processors, whenever we store client data in our servers, the data is received pseudonymized, – Scenario 3 of the ENISA guidelines “[…]the data controller [our clients] again performs the pseudonymisation but this time the processor is not involved in the process but only receives the pseudonymised data from the controller.”

Despite managing pseudonymized data, which confers a considerable degree of privacy, the design of our  IT infrastructure and systems, ensures that the data is kept secure and access is controlled.

For example, our Image Management Solution (IMS) PMA.core, the centerpiece in our software platform for everything slide-management, is also the guardian of sensitive information. PMA-core is CE-IVD labelled for clinical diagnostic use, offering best practices and data management tools covering among others, robust permissions, encryption, passwords, audit trailing and data anonymization tools.

Thinking of data management practices aiming at ensuring data privacy in a daily workflow, with PMA-core is now possible to incorporate  the following management practices and tools into your workflow:

De-identification tools: Beyond pseudonymization

When presenting a viewport to serve slide content, one can opt to hide  elements in the viewport that could lead to inadvertently revealing sensitive information: both the label (barcode) and filename widgets can easily be hidden with a single line of JavaScript code. Check our online viewport configuration demo for relevant technical information on this.

The viewport configuration is useful in an environment where sometimes you do, yet sometimes you don’t want to show slide identification information. At root-directory level, therefore, one can opt to stipulate that label information is never to be revealed. A hybrid environment can thus be created in a medical school, where human pathology samples for teaching can be prevented from ever exposing label information, while in diagnostic routine these remain visible.

Permissions

The above are software solutions. PMA.core never manipulates the original raw data. This means that the original whole slide image will always retain the original slide label information unless you opt to anonymize the data. Hypothetically somebody could download an original slide from one PMA.core, and open it somewhere else. In order to prevent people from downloading selected data that they’re not allowed to see, PMA.core as a final back-stop supports granular permission settings: you can hence specify that people can view a slide (which essentially means piecemeal controlled serving of individual tiles), but they’re not allowed to download it; meaning they could never gain access to the original data.

Anonymization

Eventually you decide that a particular image or slide collection can (or must be) anonymized. In fact, whenever you need to share WSIs publicly such as along a congresses and other symposia or provide pathology training, any information that could lead to patient identification should be removed. In other words, pseudonymization is no longer enough, you must anonymize the WSI.  In that case we offer you our in-house developed tool, Dicomizer. It is possible to export any WSI file format into a Dicom standard image, while at the same being able to remove even metadata embedded in the image files, together with the label on the WSI.

Trail Audit

Under GDPR,  data processors have a legal obligation to maintain records of their processing activities. We facilitate this obligation by providing real time trail audit; which translates in the ability to understand who has used a particular image and when.

On top of all these, we love to bespoke our applications and services to accommodate your very specific needs.

Glossary of terms

  • GDPR (ambit 26) The principles of data protection should apply to any information concerning an identified or identifiable natural person. Personal data which have undergone pseudonymisation, which could be attributed to a natural person by the use of additional information should be considered to be information on an identifiable natural person.
  • Datacontroller/dataprocessor: Under GDPR, there are two key entities: the data controller determines the purposes for which and the means by which personal data is processed. The ‘why’ and ‘how”. The data processor processes personal data on behalf of the controller. The controller can be also the processor.
  • Pseudonymization: GDPR (Article 4(3b) “the processing of personal data in such a way that the data can no longer be attributed to a specific data subject without the use of additional information, as long as such additional information is kept separately and subject to technical and organizational measures to ensure non-attribution to an identified or identifiable individual.”
  • Anonymization: Recital 26, EU GDPR: “(…)information which does not relate to an identified or identifiable natural person or to personal data rendered anonymous in such a manner that the data subject is not or no longer identifiable.

Data compliance: WSIs and Pseudonymization

GDPR & Patient Data: Aiming at Privacy by Design

GDPR (General Data Protection Regulation) in force in the European Europe since May 2018, regulates how personal/patient data is handled in Europe. The GDPR defines the obligations that data controllers and data processors must comply to.

Health data under the GDPR is considered sensitive data, as such, data processing is prohibited unless there is individual’s explicit consent, which clearly must define how the data must be processed, by whom and for what. On top of explicit consent, there are yet five additional lawful basis for processing data: Contract, Legal obligation, Vital Interests, Public Task and Legitimate Interest (Art. 9 GDPR).  In practice some of these lawful basis are observed in the direct therapeutic relationship.

Yet, because what is lawful might not be necessarily ethical, and by the end of the day the data controller/processor are accountable for any data breach or ethical issues, the role of the Ethical Committee within a health care organization, is often essential to define ethical safeguards and ensure the ethical management of patient data, which will impact security as well.

Essentially, the GDPR aims to safeguard the management of personal data in a lawful, fair and transparent manner, limit the purpose of data, increase integrity and confidentiality and make data controllers/processors accountable for the ethical and lawful utilisation of data.

When it comes to ensure personal privacy, fulfilling legal and ethical obligations can be complex, but is it now understood that along with sound technical and organisational infrastructures and procedures, de-identification methods, (as explorer along this article) can be an important part of meeting these legal and ethical obligations. (M. Hintze, 2016). Health care companies specifically have developed technical and organizational measures intended to protect personal health data by design in their products.

WSI: New Health Data

Whole Slide Imaging (WSI) is a relatively new category of patient data.  As such, recently, some efforts have been made to better understand how WSIs (whole slide images) can be utilized in a balancing act: maximizing individual and public benefit, while at the same time, restricting and protecting its access, whenever such poses a potential privacy risk and non-compliance with data privacy laws (e.g. GDPR). Although by itself the patient’s photographic tissue image falls under the category of de facto anonymity of the GDPR, the existence of patient-ids and/or other tags attached to the WSI, increase the likelihood of patient identification, and potential privacy issues. (Holub et al., 2022).

The novelty of this type of personal data is revealed when histopathologists describe their knowledge and confidence when dealing with WSIs as personal data. In a survey (n=198) conducted by the Oxford University Hospitals, addressed to histopathologists members of major pathology associations in the UK, 41% of the respondents did not know when WSIs would fall under relevant legal frameworks, while 47% were not confident “At all” when it came to understand the WSI consent in a research context (Coulter et al.; 2022).

Digital imaging Storage and management standards are widely adopted in the field of Radiology, which facilitates the adoption of sound data compliance practices. When it comes to digital pathology and data management standards adoption differences,  alongside with workflow specificities and the novelty of the field, another major difference lies in the role of the pathology professional societies. The ARS took the lead and through  the DICOM initiative managed to impose standards and provide clear guidelines for Radiologists and industry. The Pathology societies have not played such an active role yet, and previous efforts to make WSI part of DICOM have not been successful due to the complexity of WSI and differences in pathology clinical workflows. Despite the absence of reference literature from professional pathology societies, general guidelines and use cases published in the recent years focusing on how to lawfully and ethically process WSIs as patient data, point to pseudonymization as a relevant strategy to ensure data compliance, and perhaps most importantly, as a strategy to avoid data breaches and its negative consequences for both patients and organizations. In fact pseudonymization is often part of basic but sound strategies when it comes to protect personal data in general, as explained by Thomas Zerdick (2021).

If you do not need personal data, do not collect personal data (…)if you really need personal data, then start by pseudonymising this personal data. Thomas Zerdick.

In a follow-up to this post we’ll look at practical steps that you can take to ensure compliance, assuage your Data Protection Officer (DPO), and generally sleep better at night.

If you’re interested in finding out what features Pathomation’s PMA.core CE-IVD certified tile server offers today with regards to anonymization and pseudonymization, we refer you to our wiki overview page.

The second part of this post can be found here.


Free alternatives to Pathomation

Evil money-grabbing thieves

Pathomation is a for-profit company. We pride ourselves in offering various commercial products. Over the years, we’ve developed a comprehensive software platform for digital pathology and virtual microscopy. This platform consists of various components, which can be purchased, rented, or even licensed.

We strive to contribute to the Open Source community where we can. Some of our developers get referenced occasionally in papers, and our SDKs are publicly available through GitHub. On occasion, we contribute to Open Source projects ourselves.

Sample slides

Occasionally, people also ask us about where to find sample slides.

A great inventory of publicly available whole slide images can be found at both OpenSlide and the OMERO project:

What else do you need?

Oh yes, a way to actually read these data!

Reading WSI data

There are two well established Open Source projects for WSI data and virtual microscopy:

As you’re getting started adding digital pathology features to your own software, there are other routes to explore, too:

  • Work with slide scanner vendors
  • Adapt or translate code from BioFormats
  • Adapt or translate code from OpenSlide

All of these have downsides however:

  • Scanner vendors: You can start negotiating with any particular scanner vendor to adapt their proprietary SDK. Congrats; after months of lobbying work, you’ll have the (rather unpleasant) job of tasking one of your engineers to incorporate a new arcane DLL into your code, for a single framework (so expect to repeat this process over and over again for the next couple of years as your platform gains broader adoption)
  • Bioformats: is open source, so you could learn from their code and translate parts of it into your own. You can try it at least. Bioformats is a Java stack. If you can integrate that directly, you’re lucky. You could possibly transcribe the original Java-code to your own platform of course (C#, C++, Python, Rust, Haskell, Martian…). Same thing here: you’ll have the (still rather unpleasant, IMHO) task of tasking one of your engineers to actually do all of this, for a single file format (and again expect to repeat this process over and over again for the next couple of years as your platform gains broader adoption)
  • OpenSlide: lots of support, lots of bindings, but a bit outdated (limited file format support), limited functionality (not a server, no concept of “tiles”, …), and not adapted for today’s regulatory environment (GAMP, CE-IVD(R), 21CFR.11, …).

Of course all of this might change in the future. But as you sit and wait, we are here to help now, and  you may already consider what Pathomation has to offer as an alternative:

PMA.start

A central theme in our platform is the PMA.core tile server. A light-weight version of PMA.core is available free of change as PMA.start through https://free.pathomation.com.

PMA.start (and the entire Pathomation software stack by extension) supports any number of native file formats. So, you have to choices to go with here actually:

The advantage of this approach? Once you’re in the DICOM or TIFF world, you’re pretty much independent from any proprietary software vendor. Use Paint.Net to make snapshots of particular Regions of Interest (ROIs), use Photoshop to run a sequence of filters to detect hormone receptors in breast biopsies.

The downside: this doesn’t scale particularly well with large datasets or high throughput.

It is our view then that reading native file formats directly will most likely remain the most efficient way to offer digital pathology services in the foreseeable future.

The hidden cost of free software

Our point with all the above is: free is not free. You can always download free software, but after 10 years I daresay it’s NEVER going to do what you actually need it to do. But you’ll need to invest R&D, write your own unit and integration tests, add additional regulatory and compliance layers, combine development stacks and frameworks (the proverbial round peg in square hole)… Assuming that you have these people in-house, or can find them: these DON’T work for free.

Last but not least, there’s an opportunity cost here as well: again, if you have super-dev-guy/girl; don’t you want him/her to work on the core mission of your company (like saving patients’ lives, or developing awesome eXplainable AI)?

Plan ahead if you can

We’re not against Open Source software. We don’t advocate a radical choice between the two. We think this can be an inclusive and/and story, where both models can live in harmony.

We do think however that in many scenarios, Pathomation is the right (and even the better) course of action from the start. Adapt PMA.start early on in your projects as an individual single user solution. Then, as you learn more and expand the scope of your application, talk to us on how to scale up and how to adapt PMA.core as a tile server in your enterprise architecture or organization.

We can resolve the file format handling problem once and for all _for_ you, so you have more time and energy to focus on your _actual_ application.

Even in the early stages; the limited cost that comes with our commercial solution and proper implementation support hugely outweighs the incurred costs from trying to tackle this middleware problem yourself.

Bioinformatics is coming

The back-story of Pathomation is well-known: once upon a time, there was an organization that had many WSI scanners. All these scanners came with different pieces of software, which made it inconvenient for pathologists to operate digitally. After all, you don’t alternate between Microsoft Office and Apache OpenOffice depending on what kind of letter you want to write and which department you want to address it to.

Tadaaaah, ten years later Pathomation’s PMA.core tile server acts as a veritable Rosetta stone for a plethora of scanner vendors, imaging modalities, and storage capacities alike.

There’s a second story, too, however: Pathomation could not have been founded with a chance encounter of a couple of people with exactly the right background at the right time. As it happened, in 2012, the annual Digital Pathology Association in Baltimore (coinciding with a pass-through of hurricane Sandy) had a keynote speech by John Tomaszewski about the use of bioinformatics tools to (amongst others) determine Gleason scores on prostate slides. One of the co-founders of Pathomation was in the audience and thought “what a great idea…”.

Bioinformatics meets digital pathology

Pathomation reached out to the bioinformatics community early on. Within the ISCB and ECCB communities however, interest was initially low: only one or two talks (or even posters) at the Vienna, Berlin, and Dublin editions discussed anything even remotely related to microscopy. The few people that did operate in this intersection of both fields, expressed mostly frustration having to spend a seemingly outrageous amount of time just learning how to extract the pixels from the raw image data files.

A simple observation emerged: bioinformatics could contribute a lot more to digital pathology, if only we could make it easier to port the data back and forth between the two communities. Say: A universal honest broker for whole slide imaging.

But having such software (in the form of its CE-IVD certified PMA.core tile server and freeware “starter kit” PMA.start) is not enough. We still had to get the word out.

So Pathomation next set out on its own and started organizing its own digital / computational pathology workshops. Coinciding with the European-based bi-annual ECCB events in The Hague (The Netherlands) and Athens (Greece) The proceedings of these are still available online.

A maturing relationship

Fast forward to 2022 and things are a bit different.

As Pathomation has forged on as a middleware provider, so has bioinformatics gradually started to contribute to digital pathology. As we already hypothesized 10 years ago: the datasets in pathology (and by extension: microscopy, histology, time lapse data, MSI etc) are too interesting not to be explored with tools that have already contributed so much to other fields like genetics and medicine.

When you go to OUP’s leading Bioinformatics journal and search for “pathology”, more than 10,000 hits spring up.

Correspondingly, a search for “bioinformatics” in the Journal of Pathology Informatics (JPI) yields significantly fewer results. That’s not unexpected, as oftentimes “bioinformatics” wouldn’t be mentioned as a wholesale term, but one would rather reference a particular protocol, method or technique instead. Chubby Peewee, anyone?

The relationship between digital pathology and bioinformatics is clearly well established and maintained.

Building bridges

Occasionally we’re asked what we offer in the field of image analysis (IA). We’ve always maintained our stance that we don’t want to get into this directly ourselves by means of offering an alternative product (or software component) to well established packages like ImageJ, QuPath (open source) or Visiopharm, HALO, or Definiens (commercial).

Instead we’ve pursued our mission of becoming the one true agnostic broker for digital pathology. This means that we look at all of these (and more) and determine how we can best contribute to help people transfer information back-and-forth between different environments.

SDKs

Many in silico experiments start of as scripts. In bioinformatics, one of the first extensively supported scripting languages was Perl, in the form of the BioPerl framework. It still exists and is in use today, but (at the risk of alienating Perl-aficionados; we mean no offense) has been surpassed in popularity by Python (and BioPython).

Looking at the success of (bio)python, the first language we decided to support in the form of providing an SDK for it was Python: our PMA.python library is included in the Python Package Index (PyPI), and very easy to install through an interactive framework like Jupyter or Anaconda.

Several articles on this blog tackle digital pathology challenges by means of our own PMA.python library, and can be used as an inspiration for your own efforts:

All our SDKs are available as open source through GitHub, and since PMA.python, we’ve also added on PMA.java and PMA.php.

We have a development portal with sections on each programming language that we support, and we use the respective “best practice” mechanisms in each to provide exhaustive documentation on provided functions and methods as well.

Plugins

The plugin concept is a powerful one: many companies that build broadly used software have realized that they can’t themselves foresee all uses and provide a way for third-party providers to extend the software themselves through a plugin mechanism. In fact, software has a better chance of becoming broadly used when it offers this kind of functionality from the start. It’s called “the network effect”, and the network always wins.

Not everybody wants (or needs) to start from the ground up. There are many environments that can serve as a starting point for image analysis. Two specific ones related to digital pathology are ImageJ and QuPath. As we needed a proving ground for our own PMA.java SDK anyway, we decided to work towards these two environments. Today, we have plugins available for both.

Our plugins are available from our website, free of charge. You can download our plugins for image analysis, which are bioinformatics related. For more mundane applications, we also offer plugins to content management systems (CMS) and learning management systems (LMS).

The PMANN interface

If there is no plugin architecture available, oftentimes an intermediary file can be used to exchange data. We apply this technique ourselves in PMA.studio, where a virtual tray can persist over times by exporting it to and importing it back from a CSV file as needed.

So it is with commercial providers. Visiopharm has its own (binary) MLD file format to store annotations in, and Indica Labs’ HALO uses an XML-based file.

What you typically want to do, is run an algorithm repeatedly, with slightly different parameters. This results then in different datasets, that are subject to comparison and often even manual inspection or curation.

The PathoMation ANNotation interface assists in this: As the analytical environments are oftentimes unsuited for curators to work with (both in terms of complexity as well as monetary cost), a slide in PMA.core can be instructed to look for external annotations in various locations. You can have slide.svs associated with algo1.mld,  algo2.mld, and algo3.mld. You can interpret the overlaying datalayers computationally, or you can visualize them in an environment like PMA.studio.

What’s more: PMANN allows you to overlay data from multiple environments simultaneously. So you can do parameter optimization in one environment, but you can also run an algorithm across different environments and see which environment (or method like deep learning or random forest) performs the best.

Don’t reinvent the wheel

Bioinformatics is a much more mature field than digital pathology, with a much broader community. At Pathomation, we strongly believe that many techniques developed are transferable. Data-layers and data-sets are equally apt to enrich each other. Genetics and sequencing offer resolution, but tissue can add structure as an additional source of information to an experiment outcome, and help with interpretation. The work of Ina Koch that we referenced earlier is a typical example of this. Another great example is the TCGA dataset, which has been incorporating whole slide images as well as sequencing data for a couple of years now.

At Pathomation, we aim to offer connectivity. Connectivity to different slide scanner vendors (by supporting as many file formats as possible), but also by allowing different software tools to exchange data in a standard manner. To this end, Pathomation also recently signed up for the Empaia initiative.

We strongly believe that as a physician or researcher alike, you should be able to focus on what you want to do, rather than how you want to do it. We build middleware, SDKs, and plugins to help you do exactly that.

Do you have experience with coupling bioinformatics and digital pathology? Have you used our software already in your own research? Let us know, and you may find yourself the subject of a follow-up blog post.