PMA.UI on React.JS with Collaboration using PMA.live

PMA.UI is a JavaScript library that provides UI and programmatic components to interact with the Pathomation ecosystem. Its extended interoperability allows it to display WSI slides from PMA.core, PMA.start or Pathomation’s cloud service My Pathomation. PMA.UI can be used in any web app that uses JavaScript frameworks (Angular.JS, React.JS etc.) or in plain HTML webpages.
PMA.live (now called PMA.collaboration) is a JavaScript library that lets you share your PMA.UI app/page with other users while keeping all parts of the page synchronized. Users can draw annotations simultaneously and share multiple viewports, just like with multi-head microscopes but over the internet.
We will create an example React application and integrate both PMA.UI and PMA.live. You can use an existing React application or create a new one using create-react-app, e.g.

npx create-react-app collaboration-react

First we have to install the PMA.UI and PMA.live libraries using npm by running

npm i @pathomation/pma.ui
npm i @pathomation/pma.collaboration

inside the application’s directory.

The next step is to add jQuery to our app. Open the index.html file inside the public directory and add

<script src="https://code.jquery.com/jquery-3.6.0.min.js"
    integrity="sha256-/xUj+3OJU5yExlq6GSYGSHk7tPXikynS7ogEvDej/m4=" crossorigin="anonymous"></script>

in the head tag.

It’s time to implement the main functionality of PMA.UI and fire up 4 simultaneous slide viewports in the same page. We will use three PMA.UI components to do that: the context, the slide loader and the autologin. Go ahead and replace the default App.js with the following initialization page:

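import { createRef, useEffect, useRef, useState } from "react";
import { useLocation } from "react-router-dom";
// Assumption: the PMA.UI components are exported by the package installed earlier;
// check the package documentation for the exact import path.
import { UIContext, SlideLoader, AutoLogin } from "@pathomation/pma.ui";

const imageSet = ["Reference/Aperio/CMU-1.svs", "Reference/Aperio/CMU-2.svs", "Reference/3DHistech/CMU-1.mrxs", "Reference/3DHistech/CMU-2.mrxs"];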

const createSlideLoader = (context, element) => {
  return new SlideLoader(context, {
    element: element,
    overview: { collapsed: false },
    dimensions: { collapsed: false },
    scaleLine: true,
    annotations: { visible: true, labels: true, showMeasurements: false },
    digitalZoomLevels: 2,
    loadingBar: true,
    highQuality: true,
    filename: true,
    barcode: false
  });
}

function App() {
  const viewerContainer = useRef(null);
  const slideLoaderRefs = useRef([]);
  const context = useRef(new UIContext({ caller: "Vas APP" }));
  const [slideLoaders, setSlideLoaders] = useState([]);
  const location = useLocation();

  useEffect(() => {
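    if (slideLoaderRefs.current.length === 0) {
      // slideLoaderRefs is a ref object, so the array of refs lives on .current
      slideLoaderRefs.current = [...Array(20)].map((r, i) => slideLoaderRefs.current[i] || createRef());
    }
    // pmaCoreUrl, username and password must be defined for your own PMA.core instance
    let autoLogin = new AutoLogin(context.current, [{ serverUrl: pmaCoreUrl, username: username, password: password }]);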

  }, []);

  useEffect(() => {
    if (slideLoaderRefs.current.filter(c => !c || !c.current).length > 0) {
      return;
    }

    let slLoaders = [];
    for (let i = 0; i < slideLoaderRefs.current.length; i++) {
      slLoaders.push(createSlideLoader(context.current, slideLoaderRefs.current[i].current));
    }

    setSlideLoaders(slLoaders);
  }, [slideLoaderRefs.current]);

  return (
    <div className="App">
      <div ref={viewerContainer} className="flex-container">
        {slideLoaderRefs.current && slideLoaderRefs.current.map((r, i) =>
          <div className={"flex-item"} key={i} ref={slideLoaderRefs.current[i]}></div>)
        }
      </div>
    </div>
  );
}

export default App;

To properly show all 4 viewers on the same page we need some CSS, so add the following to index.css. This splits the page into a 2×2 grid of viewers using CSS flexbox.

.flex-container {
  display: flex;
  flex-direction: row;
  flex-wrap: wrap;
  width: 100%;
  height: 850px;
}

.flex-item.pma-ui-viewport-container {
  flex: 0 0 50%;
  height: 400px;
}

.ml-1 {
  margin-left: 15px;
}

Well that was easy to set up!

Collaboration

So let’s synchronize this page for all the users joining, so they can see and interact with the same slides. For this we will be using the PMA.live server and the pma.collaboration package we installed earlier. To collaborate, users have to join the same session. One user is the master of the session and controls the current viewports, slides and annotations (although a setting called EveryoneInControl can give control to all users).

PMA.live uses SignalR and the WebSocket protocol to achieve real-time communication between participants, so we need to include these scripts in our page. We could include them in index.html as we did for jQuery, but we need to be sure that the scripts are properly loaded before trying to initialize PMA.collaboration in our React application. So we will use the same trick used by Google Maps to load the scripts asynchronously and notify our React app when they are ready. Create a new file called collaborationHelpers.js with the following function.

export const loadSignalRHubs = (collaborationUrl, scriptId, callback) => {
    const existingScript = document.getElementById(scriptId);
    if (!existingScript) {
        const script = document.createElement('script');
        script.src = `${collaborationUrl}bundles/signalr`;
        script.id = scriptId;
        document.body.appendChild(script);
        script.onload = () => {
            const script2 = document.createElement('script');
            script2.src = `${collaborationUrl}signalr/hubs`;
            script2.id = scriptId + "hubs";
            document.body.appendChild(script2);

            script2.onload = () => {
                if (callback) {
                    callback();
                }
            };
        };
        return;
    }
    
    // The scripts were already requested by an earlier call
    if (existingScript && callback) {
        callback();
    }
};
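
The nesting in the function above grows with every extra script. A minimal sketch of the same load-in-order idea follows. The helper names loadInOrder and appendScript are our own inventions for this example and are not part of PMA.collaboration; the loader is passed in as a parameter so the ordering logic can be tried out without a browser.

```javascript
// Hypothetical helper: load URLs strictly one after another, then notify the caller.
// `loadOne(url, done)` is injected so the ordering logic can run without a DOM.
function loadInOrder(urls, loadOne, callback) {
  const next = (i) => {
    if (i >= urls.length) {
      if (callback) callback();
      return;
    }
    loadOne(urls[i], () => next(i + 1));
  };
  next(0);
}

// Browser-side loader (assumption: runs in a page, so `document` exists).
function appendScript(url, done) {
  const script = document.createElement("script");
  script.onload = done; // attach the handler before the element is inserted
  script.src = url;
  document.body.appendChild(script);
}

// Usage in the browser, mirroring loadSignalRHubs (not executed here):
// loadInOrder(
//   [`${collaborationUrl}bundles/signalr`, `${collaborationUrl}signalr/hubs`],
//   appendScript,
//   () => setLoadedScripts(true)
// );
```

Adding a third script then only means adding a third entry to the array.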


To notify our React app that the scripts are ready and that initialization can proceed, we create a new state in App.js called loadedScripts, which we set to true once the scripts are loaded in our previous useEffect function:

useEffect(() => {
  if (slideLoaderRefs.current.length === 0) {
    // slideLoaderRefs is a ref object, so the array of refs lives on .current
    slideLoaderRefs.current = [...Array(20)].map((r, i) => slideLoaderRefs.current[i] || createRef());
  }

  let autoLogin = new AutoLogin(context.current, [{ serverUrl: pmaCoreUrl, username: "zuidemo", password: "zuidemo" }]);
  loadSignalRHubs(collaborationUrl, "collaboration", () => {
    setLoadedScripts(true);
  });
}, []);

So now everything is ready to establish a connection to the PMA.live backend and to join a session (joining a non-existing session will just create a new one):

const initCollaboration = (nickname, isMaster, getSlideLoader, collaborationDataChanged, chatCallback) => {
  return Collaboration.initialize(
    {
      pmaCoreUrl: pmaCoreUrl,
      apiUrl: collaborationUrl + "api/",
      hubUrl: collaborationUrl + "signalr",
      master: isMaster,
      getSlideLoader: getSlideLoader,
      dataChanged: collaborationDataChanged,
      owner: "Demo",
      pointerImageSrc: "…",
      masterPointerImageSrc: "…"
    }, []).
    then(function () {
      var sessionName = "DemoSession";
      var sessionActive = false;
      var everyoneInControl = false;
      return Collaboration.joinSession(sessionName, sessionActive, nickname, everyoneInControl);
    }).
    then(function (session) {
      // after join session
      ////var appData = Collaboration.getApplicationData();
      if (isMaster) {
        Collaboration.setApplicationData({ a: 1 });
      }
    });
}

The initialize method tells the PMA.live Collaboration static object where to find PMA.core and the PMA.live backend, whether or not the current user is the session owner, and what icons to show for the users’ and the master’s cursors. It also accepts a couple of callback functions. The joinSession method will create a session if it does not exist and then join it. If the session doesn’t exist, it is possible to specify whether or not it is currently active and whether all users can take control of the synced viewports or only the session owner can. Once the session has been created, only the session owner can modify it and change its active status or give control to others.

In order for PMA.live to be able to sync slides, it has to know the viewports it should work with. Earlier in our code we created an array of at most 20 slide loaders, which we kept in a React ref object. Now let’s go back to the implementation of the “getSlideLoader” callback that we used during the initialization of PMA.live. This function will be called by PMA.live when it needs to attach to a viewport in order to control it. So we need to return the appropriate slide loader from this array with the following function:

const getSlideLoaderCb = (index, totalNumberOfImages) => {
  // Guests unload the viewports that the master is no longer showing
  if (!master && totalNumberOfImages < numberOfImages) {
    for (let i = totalNumberOfImages; i < numberOfImages; i++) {
      slideLoaders[i].load(null);
    }
    setNumberOfImages(totalNumberOfImages);
  }

  return slideLoaders[index];
}

So now we can initialize the collaboration in a useEffect which executes after the SignalR and Hubs scripts are properly initialized

useEffect(() => {
    if (loadedScripts && viewerContainer.current && slideLoaderRefs.current && slideLoaders.length > 0) {
      if (slideLoaderRefs.current.filter(c => !c || !c.current).length > 0) {
        return;
      }

      if (collaborationInit) {
        return;
      }

      initCollaboration("demo user", master, (index, totalNumberOfImages) => {
        return getSlideLoaderCb(index, totalNumberOfImages);
      },
        () => {
          let data = Collaboration.getApplicationData();
          let session = Collaboration.getCurrentSession();
          setCollaborationData({ data: data, session: session });
        })
        .then(() => {
          setCollaborationInit(true);
        });
    }
  }, [loadedScripts, viewerContainer, master, slideLoaders.length, collaborationInit, slideLoaderRefs, slideLoaderRefs.current.length]);

Finally, let’s talk about the “collaborationDataChanged” callback. Besides the out-of-the-box slide syncing capabilities, PMA.live gives you the ability to exchange data between the users in real time. This could be useful, for example, if you wanted to implement a chat system on top of the session. Every user can change the session’s data by invoking the Collaboration.setApplicationData method. It accepts a single object that will be shared among users. Whenever this method is called, all other users receive the new data through the collaborationDataChanged callback. To do this in a React way, we simply copy the application data into a React state object whenever the callback is called.
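
As an illustration of the chat idea, here is a minimal sketch of how a message could be added to the shared object. The appendChatMessage helper, the chat array and the { nickname, text } message shape are assumptions made up for this example; only Collaboration.getApplicationData and Collaboration.setApplicationData come from the code above.

```javascript
// Hypothetical helper: return a new application-data object with one more chat message.
// The `chat` array and the { nickname, text } shape are assumptions made for this sketch.
function appendChatMessage(appData, nickname, text) {
  const current = appData || {};
  const chat = Array.isArray(current.chat) ? current.chat : [];
  // Copy instead of mutating, so React state comparisons keep working
  return { ...current, chat: [...chat, { nickname, text }] };
}

// In the app this could be wired up roughly as follows (not executed here):
// const next = appendChatMessage(Collaboration.getApplicationData(), "demo user", "Hello");
// Collaboration.setApplicationData(next);
```

Because the helper returns a fresh object each time, storing the result in React state triggers a re-render for every new message.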

To allow other users to join our application as guests we will implement a query string parameter called master. When this parameter is set to false users joining this session will be guests. We keep this value in a React state called master. So we change our initial useEffect function to add this

var urlSP = new URLSearchParams(location.search);
if (urlSP.get("master") === "false") {
   setMaster(false);
}
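
The same check can be written as a small pure function, which makes the rule easy to verify outside of React. This is only a sketch; the name isMasterFromQuery is our own choice and not part of PMA.UI or PMA.live.

```javascript
// Returns false only when the query string contains master=false;
// in every other case the user is treated as the master.
function isMasterFromQuery(search) {
  const params = new URLSearchParams(search);
  return params.get("master") !== "false";
}

// Possible use inside the component (not executed here):
// const [master, setMaster] = useState(() => isMasterFromQuery(window.location.search));
```

A guest would then open the page with ?master=false appended to the URL.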

Congratulations, you’ve done it! You now have a working React application with PMA.UI and collaboration features enabled.


You can download a complete demo here

Additional developer resources are provided here

Custom panels and functionality in PMA.studio

Because sometimes you want more

PMA.studio has a lot of functionality out of the box, but sometimes you want more. Having PMA.studio open on one screen and your organization’s (AP)L(I)MS on another is not always ideal. And what if you don’t have multiple screens?

As we pointed out before, PMA.studio offers a panel-based layout. The standard panels can be moved around and even stacked. The standard ribbon in PMA.studio offers some convenient default layouts, too.

Further configuration of PMA.studio is available through the Configure tab, where you can enable individual panels.

But what if you can’t find the panel that you’re looking for? Maybe the content that you’re looking for is at a different website, and if you just could have that particular page available within PMA.studio as a separate panel…

Custom panels

PMA.studio offers the possibility to add a custom panel with selected content from a particular URL.

Let’s say that you want to have a reference website available next to your slide content. We’ll use our own PMA.studio wiki website at https://docs.pathomation.com/pma.studio as an example.

You could start by pulling up PMA.studio in one browser, and our wiki in another browser. You start off nice and smooth like this:

But soon your layout gets messy. The 50% screen width really gives too much to the wiki, and let’s not even get into how easy it is to have that other browser window snowed under a ton of other applications (the word processor you’re using to write a paper in, your ELN, your EPR, your LIMS…)!

There’s a straightforward solution. Click on the “More” button in the Layout group from the Home tab:

And a new dialog shows up. In addition to selecting any number of default panels, you can also define custom panels. Like so:

After clicking ok, your new panel appears, making your overall screen layout look like this:

Now that’s more like it!

This is already great for reference data, but what if you want to combine this with slide awareness? In other words: you want to have the content in the panel change automatically depending on the selected slide.

Passing parameters

The custom panel mechanism in PMA.studio automatically passes along, to the webpage shown in the panel, references that trace back to the current PMA.core tile server and the selected slide.

This mechanism is typically hidden from plain view to reduce the functional complexity of it all, but a single line of PHP brings up the necessary data:

print_r($_GET);

When we create yet another custom panel that refers to this new page, we see the following appear:

And let’s just say that we don’t like the way PMA.studio displays a slide’s thumbnail and label image. We’d rather just have those in a separate panel, too, so we have more control over how they’re displayed.

We also know that the thumbnail of slide X can be reached via:

https://server/pma.core/thumbnail?sessionID=…&pathOrUid=X…

And the label image of slide X can be reached via:

https://server/pma.core/barcode?sessionID=…&pathOrUid=X…

We can therefore make two new scripts that take the input parameters received from PMA.studio and translate them into the correct querystring variables for our respective thumbnail and label images:

thumbnail.php:
<?php
$url = $_GET["server"].
	"thumbnail?sessionID=".$_GET["sessionId"].
	"&pathOrUid=".$_GET["slideUid"];
header("location: $url");
?>

barcode.php:
<?php
$url = $_GET["server"].
	"barcode?rotation=0&sessionID=".$_GET["sessionId"].
	"&pathOrUid=".$_GET["slideUid"];
header("location: $url");
?>
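
If your panel page is served by JavaScript rather than PHP, the same translation could be sketched as below. The function name buildImageUrl is hypothetical; the parameter names server, sessionId and slideUid and the thumbnail and barcode endpoints are the ones used in the PHP scripts above. Unlike the PHP version, this sketch URL-encodes the values.

```javascript
// Translate the parameters PMA.studio passes to a custom panel into a PMA.core image URL.
// Assumes, like the PHP scripts, that `server` ends with a slash.
function buildImageUrl(endpoint, { server, sessionId, slideUid }) {
  const query = new URLSearchParams({ sessionID: sessionId, pathOrUid: slideUid });
  if (endpoint === "barcode") {
    query.set("rotation", "0"); // same fixed rotation as in barcode.php
  }
  return `${server}${endpoint}?${query.toString()}`;
}

// Placeholder values for illustration only
const panelParams = { server: "https://server/pma.core/", sessionId: "abc", slideUid: "X1" };
console.log(buildImageUrl("thumbnail", panelParams));
console.log(buildImageUrl("barcode", panelParams));
```

The page would then redirect to the resulting URL, just like the header("location: …") call does in PHP.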

We place the new files on a server, and reference them via two separate custom panels:

When we navigate the slides in a folder one by one, we now see that the panels are updated accordingly, too.

Other applications

In this post, we showed you how you can configure custom panels in PMA.studio exactly to your liking, and how the content of such panels can synchronize with currently viewed slides.

We showed you how to pass along information through some trivial examples, referring back to our own infrastructure.

Now you can build your interfaces, like we demonstrated in an earlier blog article.

PMA.studio is more than just a universal slide viewer; you can turn it into your own veritable organizational cockpit. Think e.g. about custom database queries against your back-end LIMS, bio-repository, or data warehouse. You can show the data where you want it, when you want it, all with a few configuration tweaks. No longer do you have to juggle multiple browsers, as PMA.studio simply allows you to build your own custom dashboards.

Find out more about PMA.studio through our landing page at https://www.pathomation.com/pma.studio.

Sharing facilities in PMA.studio

Let’s get together

Sharing content is arguably one of the most important applications of digital pathology, if not for the Web in general.

PMA.studio allows you to share content in a variety of ways. There is a dedicated group for sharing content on the ribbon:

When you just want to share what you’re currently looking at, chances are that you can get by with one of the quick share buttons:

  • If you want to share the current folder you’re navigating, click on the “Share folder” button
  • If you want to share the current slide that you’re looking at, click on the “Share slide” button
  • If you want to share the current grid that you’re looking at, click on the “Share grid” button
  • Etc.

If you want more control over what and how you’re sharing content, you can click on the final “Share” button of the group. You could say that that’s our “universal” share button.

It allows for further customization of your share link, including:

  • Password-protect your link
  • Expire the link (e.g. students can only access it for the duration of a test)
  • Include or exclude annotations from the shared link
  • Use a QR code instead of a plain text link
  • Etc.

Our best advice is for you to play with the various options. But do let us know when you think there are some features missing or you think something is broken.

Share administration

We’ve worked hard on making the sharing concept in PMA.studio broadly applicable to a variety of content. We’ve also worked on making it easy to share content.

So with all this sharing going on then, it’s only natural to be asking after a while “wait, what am I actually sharing?”.

On the Configure tab, in the “Panels” group, you can activate the “Shared links panel”

Once clicked, you get a new panel with an overview of everything you’ve shared so far.

The buttons behind each link allow various operations.

One application of this is to recycle a share link and re-use it as you see fit.

You can also (temporarily) invalidate links, or delete them altogether.

The history is linked to your PMA.core login, so if at first you don’t see anything, make sure that you’re connected to the PMA.core instance for which you expect to see share links.

Monitoring

On the back-end of PMA.studio, administrators can get an overview of all created shares across all users. They can also use this view to temporarily suspend or even delete shares.

Automation

While we highly advocate the implementation of the PMA.UI framework in third-party software like (AP)L(I)MS, PACS, VNA, and other digital pathology consumers, we realize that this is not trivial. In a proof-of-concept phase, all you may want to do is show a button in your own user interface that then subsequently just pops up a viewport to the content that you want to launch. Easy-peasy, as they say.

Let’s say that you have an existing synoptic reporting environment that looks like this:

In order to convince your administration that adding digital pathology to it is a really good idea, you want to upgrade the interface to this:

With PMA.studio, you can now get exactly this effect.

Let’s switch to Jupyter to see how this works:

First some housekeeping. We import the pma_python core module, and connect to the PMA.core instance that holds our slide.

Our slide is stored at “cases_eu/breast/06420637F/HE_06420637F0001S.mrxs”. Let’s make sure that the slide exists in that location by requesting its SlideInfo dictionary:

Alternatively, we can also write some exploratory code to get to the right location:

The PMA.studio API

Ok, we’ve identified our slide. Now let’s turn to PMA.studio. Unfortunately, PMA.python doesn’t have a studio module yet, so we’ll have to interface with the API directly for the time being.

The back-end call that we need is /API/Share/CreateLinkForSlide and takes the following parameters:

We create the URL that invokes the API by hand first. We can do this accordingly:

Never mind that pma._pma_q() method that we use. It’s a fast and easy way for ourselves to properly encode HTTP querystring arguments. You’re free to piggy-back on ours, or use your own preferred method.

After execution of the code, you get a URL that looks like this:

https://yourserver/pma.studio.2/api/Share/CreateLinkForSlide?userName=user1&password=user1&serverUrl=https%3A%2F%2Fyourserver%2Fpma.core.2&pathOrUid=IRVGLPWBFT
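
For readers who work in JavaScript rather than Python, a URL of this shape can be assembled with the standard URLSearchParams class. This is a minimal sketch: the function name createLinkForSlideUrl is our own, the parameter names mirror the example URL, and the values are the placeholder ones from this article.

```javascript
// Build the CreateLinkForSlide URL; URLSearchParams takes care of the encoding.
// Assumes `studioUrl` ends with a slash.
function createLinkForSlideUrl(studioUrl, { userName, password, serverUrl, pathOrUid }) {
  const query = new URLSearchParams({ userName, password, serverUrl, pathOrUid });
  return `${studioUrl}api/Share/CreateLinkForSlide?${query.toString()}`;
}

const url = createLinkForSlideUrl("https://yourserver/pma.studio.2/", {
  userName: "user1",
  password: "user1",
  serverUrl: "https://yourserver/pma.core.2",
  pathOrUid: "IRVGLPWBFT",
});
console.log(url);
```

As with the Python version, building the string only prepares the request; the share link is created when the URL is actually requested.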

Building the URL by itself doesn’t do anything; the share link is only created once you invoke it. You can do this either by copying the URL to a web browser, or by invoking it from Python as well:

Again: it’s the return result from the URL that you want to distribute to others and not the initial URL.

To confirm that it worked, you go back to PMA.studio and check your panel with the share link overview:

But you can also just pull up the resulting URL in a new browser window:

Yay, it worked!

Automating folders

You can also create links that point to folders.

That slide that we just referenced? It turns out to be an H&E slide, and along with the other slides in the folder, actually comprises a single patient case.

So you can emulate cases by organizing their slides per folder, with each folder representing a case. Your hierarchy can then be:

[patient]

    [case]

        [slides]
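
To make the folder-per-case idea concrete, here is a small sketch that groups slide paths by their parent folder. The function name groupSlidesByCase is hypothetical, and the second slide filename in the example is a made-up placeholder; only the first path comes from this article.

```javascript
// Group slide paths by their parent folder, treating each folder as one case.
function groupSlidesByCase(paths) {
  const cases = new Map();
  for (const path of paths) {
    const cut = path.lastIndexOf("/");
    const folder = cut === -1 ? "" : path.slice(0, cut);
    const file = path.slice(cut + 1);
    if (!cases.has(folder)) cases.set(folder, []);
    cases.get(folder).push(file);
  }
  return cases;
}

const cases = groupSlidesByCase([
  "cases_eu/breast/06420637F/HE_06420637F0001S.mrxs",
  "cases_eu/breast/06420637F/placeholder_second_slide.mrxs", // hypothetical filename
]);
console.log(cases.get("cases_eu/breast/06420637F"));
```

Each key of the resulting map is then a candidate for a folder share link.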

Say that we want to offer a case-representation of breast cancer patient 06420637F. We use Share/CreateLinkForFolder and point to a folder instead of a slide:

The result again appears on the PMA.studio side. And clicking on it results in a mini-browser interface:

What’s next

After PMA.core, we’re starting to provide back-end API calls into PMA.studio as well. Even though we prefer developers to integrate with PMA.UI directly, there are scenarios where automation through PMA.studio makes sense, for example when:

  • PMA.studio is your main cockpit interface to work with slide content, but there are a few other routes (like an intranet) through which you want to provide quick access to content, too.
  • You have an (AP)LI(M)S, PACS, VNA, or other system and you’re in a PoC phase to add digital pathology capabilities to your own platform. PMA.studio automation may then be a quicker route to go than adapting our SDKs.

Do keep in mind however that we’re providing the PMA.studio back-end mostly for convenience, at least for the time being. There are any number of ways in which you may want to integrate digital pathology in your infrastructure and workflows. For a high level of customization, you’re really going to have to move up to PMA.UI, as well as a back-end counterpart like PMA.python, PMA.php, or PMA.java.

Four ways to identify slides

By filename

The most straightforward way to identify a slide is by its filename.

When you request the slides that are in a subfolder, you get a list of filenames back. Each filename refers to a slide.

By Unique Identifier (UID)

Once you obtain a filename referring to a slide, you typically want to do something with it.

Using filenames as references throughout your software is problematic however, for a variety of reasons:

  • A full path reference can become really long, and may not fit a field. No matter how careful you are, at some point there’s always that 51-character string that just won’t quite fit into the varchar field that was defined with a standard field size of varchar(50)
  • Unicode-encoding can be tricky, and many languages complicate matters further by providing different methods for querystrings, querystring parameters etc. Not to mention the database field that you forgot to make nvarchar instead of varchar. Good luck chasing that one!
  • Using filename references is just not safe. Imagine that you’re passing on a URL that looks like lookAtMySlide.jsp?slide=case35%2fslide03.svs… It’s all too easy (or even tempting) for the recipient to want to try out variations on that scheme: “hmm, I wonder what slides 2 and 4 look like?” or “let’s have a look at cases 1-34 too”

For this purpose, we’ve introduced the UID-principle. A UID is a 6-character random string, tied to a particular slide in a particular location (folder). The UIDs are generated by the PMA.core engine, so there’s no collision possible between UIDs referring to different slides. By their nature, there’s no sequential logic to them either, so there’s no point in asking for information about slides YT4TGQ or YT4TGS after finding out that slide YT4TGR exists.

You can retrieve the UID of any slide through the GetUID() method. For this, you’ll need an instance of PMA.core, because slide anonymization is not supported by our free PMA.start viewer.

If you’ve had a look at our API calls and SDK methods, you’ll notice that many calls have a parameter PathOrUid, rather than just Path. This means that each time you specify a filename to identify a slide, you might as well make life a bit easier on yourself (as well as possibly your compliance department!) and use the UID instead. Have a look then at the following semantically identical calls:

The one notable exception to this would be the invocation of the GetUID() method itself, of course. But don’t worry; you can’t accidentally request the UID of a UID. If a slide reference passed on doesn’t refer to any existing content, you’ll just get a runtime error instead.

By Fingerprint

UIDs are great, but what if you want to track virtual slides? Physical slides aren’t static and move around; this real-life environment is oftentimes mimicked in the virtual world, where a lifecycle of a slide can go like this:

Different systems may be responsible for the different types of movement, making it very hard to track the virtual slide’s lifecycle in its entirety.

This is where a slide’s fingerprint can come in handy. Unlike a UID, the fingerprint is a signature string that is calculated based on a slide’s actual characteristics. We have a whole separate article on the subject.

The bottom line is that when you have new_slides/slide17.svs, and you move it to validated_slides/slide17.svs, you’ll be able to identify these slides as being identical through their fingerprint signature.

Let’s say that we have a slide slide54123.mrxs in the incoming folder, that gets subsequently moved (and renamed) to a folder related to bladder research, to finish its lifecycle in an archival folder.

Have a look at the following code then:

Note that the UID is different for all three slides, but the fingerprint remains the same, even as the filename changes!
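
Once you have collected path, UID and fingerprint for a set of slides, reconstructing a lifecycle comes down to grouping by fingerprint. The sketch below shows that step in JavaScript. The function name traceByFingerprint is hypothetical, and the UID and fingerprint values (as well as the renamed paths) are placeholders invented for illustration; real values come from PMA.core.

```javascript
// Group slide records by fingerprint to reconstruct each slide's path history.
function traceByFingerprint(records) {
  const history = new Map();
  for (const { path, fingerprint } of records) {
    if (!history.has(fingerprint)) history.set(fingerprint, []);
    history.get(fingerprint).push(path);
  }
  return history;
}

// Placeholder values for illustration only
const records = [
  { path: "incoming/slide54123.mrxs", uid: "UID-A", fingerprint: "FP-1" },
  { path: "bladder_research/sample01.mrxs", uid: "UID-B", fingerprint: "FP-1" },
  { path: "archive/sample01.mrxs", uid: "UID-C", fingerprint: "FP-1" },
];

const history = traceByFingerprint(records);
console.log(history.get("FP-1"));
```

All three locations end up under one key, even though each record carries a different UID.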

And you can use this for even more applications:

If you’re later wondering if new_slides/slide17.svs is a re-scanned slide, or whether it’s just the original slide that somebody forgot to delete, you can also use the fingerprint for this. If it is the old version that is just lingering, it will still have the same fingerprint. However, if it’s a newly re-scanned version of the physical slide, the fingerprint will be different, due to subtle changes in the image capturing process. It’s interesting to note that in the latter case, the UID will still be the same.

Why would you still use a UID instead of a fingerprint signature? Because a fingerprint takes some time to generate: a slide must actually be (at least partially) read and analyzed to obtain its fingerprint. The UID, in contrast, is only a random string that is generated in a split second. In many cases, all you want is a pointer to a slide, and the UID is then the faster alternative.

By barcode

Virtual slides come from physical slides, so how did people identify slides before the advent of technology? Well, first they invented the area on the slide that we now refer to as the slide label, and coated it with a material they could easily write on (typically in pencil). Later on, label printers were introduced, and in combination with bio-repository systems provided by the (AP)LI(M)S, random (bar)codes could now be imprinted on small stickers that could be pasted on the slides directly, so that no scribbling was necessary anymore.

The idea is that the barcodes are machine-readable, and could be used to match all sorts of information afterwards, at the same time guaranteeing anonymity of the slide itself (a barcode identifier only makes sense in the context of a particular hospital / lab / (AP)LI(M)S / biorepository).

A barcode in many ways is the real-world equivalent of a UID, but it doesn’t have to be. Consider this:

  • There can be structure to encoded barcodes, most often sequence information
  • The barcode can still encode certain patient or doctor information, though this is rather rare. People aren’t doing this anymore because of practical concerns, as a barcode only holds a limited number of characters.
  • Unlike UIDs, barcodes can take on all sorts of shapes and forms, making it difficult to provide universal identification services
  • You can have more than one barcode on a slide
  • The barcode can still be pasted on manually at an angle, which can make it hard for machines to recognize, whereas a human lab technician in such a case would just rotate her hand-scanner a bit.
  • The barcode can be applied to a slide before it goes through a series of chemical steps, that can in turn degrade the barcode and make it less legible

Even with the above caveats, we don’t argue the value of barcoding per se. It’s definitely way better than the alternative (pencil, scribbling). At the same time, barcoding makes most sense within a setting of one lab (or lab-group), one set of hardware, and one (AP)LI(M)S, so that the entire pipeline can be calibrated to a (beforehand agreed-upon) specific format.

At Pathomation, we offer the possibility to extract barcodes from label images through the GetBarcode() method. We built our basic implementation in such a way that it works on a wide variety of labels:

Another way to study this behavior is via the debugger in your webbrowser:

The thing about barcode recognition, unfortunately, is that it’s not foolproof. The resolution of your scanner may not be high enough, and there are a number of other reasons it can still go wrong. We’ve seen stupid things happen, like the number “1” being interpreted as the lowercase letter “L”, etcetera.

You can use the existing implementation for test scenarios. However, for production environments, we should be involved in the IQ / OQ / PQ loop, so we can advise properly on how best to roll this out. When you know that your identification scheme only includes numbers, we can configure the recognition engine to not accidentally pick up letters (and prevent a 1 > l switch from ever happening in the first place).
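
While you’re still in a test scenario, you could add a safety net of your own on the receiving side. The sketch below is not how the PMA.core recognition engine is configured; it is a hypothetical post-processing step (the name normalizeNumericBarcode and the look-alike table are our own assumptions) for schemes that are known to contain digits only.

```javascript
// Hypothetical post-processing step for barcodes known to contain digits only.
// Characters that are commonly confused with digits are mapped back.
const LOOKALIKES = { l: "1", I: "1", O: "0", o: "0" };

function normalizeNumericBarcode(raw) {
  const fixed = raw.trim().replace(/[lIOo]/g, (ch) => LOOKALIKES[ch]);
  // null signals that the value still isn't numeric and needs manual review
  return /^\d+$/.test(fixed) ? fixed : null;
}

console.log(normalizeNumericBarcode("l23O5"));
console.log(normalizeNumericBarcode("12A45"));
```

Values that still contain letters after the substitution are rejected rather than guessed at.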

In summary

At Pathomation, we pride ourselves on the slogan “digital pathology for pathologists, by pathologists”. We know of the struggles to identify and keep track of slides, both physically and virtually. Therefore, we offer different ways of identification. We’ve published content on this before, but this is the first article in which we neatly outline all options next to each other.

Virtualize your multi-head microscope

Syncing viewports across different users

Ever wanted to be able to have one user at the driver’s seat while others are watching, each looking at their own screen, just like with multi-head microscopes but over the internet? This can probably be accomplished with a screen sharing & conferencing tool, but the image quality results may be poor. How about allowing all users to take over the viewport at the same time? Drawing annotations simultaneously? Allowing users to share multiple viewports? Have this functionality in your application? It gets more complicated now, right? Wrong.

PMA.live enables exactly this functionality out of the box and can be integrated in your application without a lot of coding or complicated setup procedures. How does it work and why is it more efficient than your traditional screen share? Because PMA.live tells connected clients which tiles to download, and each client then retrieves said tiles directly from the tile server. This is more efficient and elegant than broadcasting pixels.
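
To get a feel for why this is cheap, consider how little information is needed to describe what a viewport shows. The sketch below is not PMA.live’s actual protocol; it is a hypothetical illustration (the function name, the 256-pixel tile size and the viewport shape are all assumptions) of how a client could derive the tiles it needs from a small viewport description.

```javascript
// Hypothetical sketch: list the tile coordinates that cover a viewport at one zoom level.
// The 256-pixel tile size and the { x, y, width, height } viewport shape are assumptions.
function tilesForViewport(viewport, tileSize = 256) {
  const firstCol = Math.floor(viewport.x / tileSize);
  const firstRow = Math.floor(viewport.y / tileSize);
  const lastCol = Math.floor((viewport.x + viewport.width - 1) / tileSize);
  const lastRow = Math.floor((viewport.y + viewport.height - 1) / tileSize);
  const tiles = [];
  for (let row = firstRow; row <= lastRow; row++) {
    for (let col = firstCol; col <= lastCol; col++) {
      tiles.push({ col, row });
    }
  }
  return tiles;
}

console.log(tilesForViewport({ x: 300, y: 300, width: 300, height: 100 }));
```

Four numbers travel between participants, and every client works out the rest locally before fetching its own tiles.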

The ingredients you need are:

  • PMA.core – Where digital slides come from
  • PMA.UI – the UI Javascript library
  • PMA.live – Pathomation’s collaboration platform

In the page where we want to add collaboration functionality we need the following JS libraries included:

Let’s start by syncing a single viewport. In the PMA.live terminology, enabling users to collaborate with each other by looking at the same slides is called a session. Therefore, a user must first create a session which the rest of the participants have to join.

The first step is to establish a connection to the PMA.live backend:

Collaboration.initialize(
{
	pmaCoreUrl: pmaCoreUrl,
	apiUrl: `${collaborationUrl}api/`,
	hubUrl: `${collaborationUrl}signalr`,
	master: isMaster,
	dataChanged: collaborationDataChanged,
	pointerImageSrc: "pointer.png",
	masterPointerImageSrc: "master-pointer.png",
	getSlideLoader: getSlideLoader,
}, [])

The initialize method tells the PMA.live Collaboration static object where to find PMA.core and the PMA.live backend, whether or not the current user is the session owner, and what icons to show for the users’ and the master’s cursor. It also accepts a couple of callback functions, which we will explain later.

Once PMA.live has been initialized, we can continue by either creating or joining a session:

Collaboration.joinSession(sessionName, sessionActive, userNickname, everyoneInControl);

The joinSession method will create a session if it does not exist and then join it. If the session doesn’t exist, it is possible to specify whether or not it is currently active and whether all users can take control of the synced viewports or only the session owner can do this. Once the session has been created, only the session owner can modify it and change its active status or give control to others.

In order for PMA.live to be able to sync slides, it has to know the viewports it should work with. In this example, we will first create a slide loader object:

const sl = new PMA.UI.Components.SlideLoader(context, {
	element: slideLoaderElementSelector,
	filename: false,
	barcode: false,
});

Now let’s tell PMA.live that we are only going to be syncing a single viewport:

Collaboration.setNumberOfImages(1);

Now let’s go back to the implementation of the “getSlideLoader” callback that we used during the initialization of PMA.live. This function will be called by PMA.live when it needs to attach to a viewport in order to control it. So the implementation in this example looks like this:

function getSlideLoader(index, totalNumberOfImages) {
	return sl;
}

We just return the one and only slide loader that we instantiated earlier.

Finally, let’s talk about the “collaborationDataChanged” callback. PMA.live uses SignalR and the WebSocket protocol to achieve real time communication between participants. Besides the out of the box slide syncing capabilities, it gives you the ability to exchange data between the users, in real time. This could be useful for example if you wanted to implement a chat system on top of the session. Every user can change the session’s data by invoking the Collaboration.setApplicationData method. It accepts a single object that will be shared among users. Whenever this method is called, all other users will receive it through the collaborationDataChanged callback, which looks like this:

function collaborationDataChanged() {
	console.log("Collaboration data changed");
	console.log(Collaboration.getApplicationData());
}

Summing it all up, PMA.live provides an easy way to enable real time collaboration between users. It takes the burden of syncing data and digital slides away from the developer and allows you to focus on the integration of the Pathomation toolbox in your application.

You can find a complete example of PMA.live here.

Who’s in the driver’s seat?

We offer a platform…

Pathomation is not just about selling software. We offer a comprehensive platform with a variety of technical components that help you build tailor-made digital pathology environments and set up custom workflows.

From simple viewing to automated back-end image analysis and data integration, we believe we have the broadest offering on the market today. Best of all: our technology is centered around PMA.core, a powerful tile server on top of which everything else can be connected and integrated.

Recently, we published a video in which we showcase how one of our customers adopted our components into their own SaaS solution:

The customer offers services for second opinion counseling. It had already built a proprietary workflow portal for patients and pathologists to log in and submit new or evaluate existing data. Until the Pathomation components were integrated however, slide exchange was limited to upload and download mechanisms.

The front-end uses primarily two controls: PMA.UI Gallery and PMA.UI Viewport.

Now that the Pathomation PMA.UI slide visualization framework is integrated in the customer’s codebase, things run a lot smoother: slides are uploaded directly to the customer’s website, visualization is instant thanks to PMA.UI’s viewport component, and even annotations can be added by various actors throughout a submitted case’s workflow process.

To help customers like Agoko get started and guide them through the process, we have our own developer portal. There, you can find articles and tutorials on how to adapt our technology both on the client (JavaScript) and server-side (Java, Python, PHP).

We’re working on a dedicated YouTube channel for Pathomation software developers, too. Head over there to get the basic skills you need to get started in the respective programming language of your choice.

… in more ways than one

While we have a number of customers that currently integrate our platform into their own online infrastructure, this route is not for everyone. If you already have an application, and you have the technical resources (people) to work on this, that’s great. But what if you’re more limited in terms of time, money, staff…? If you’re a startup, every decision has its implications. If you’re an image analysis or algorithm shop for instance, you may invest more heavily into your back end, rather than your front-end presentation (or at least postpone that for a later stage).

But you still want to allow people to upload slides. You still want to be able to allow experts (human or AI) to make annotations, and you still want _your_ end-users to see findings and results.

In that case, PMA.studio may be a more convenient solution than learning how to integrate individual components. PMA.studio is a web-based slide viewer. In its simplest form, it looks like this:

However, PMA.studio is much more than just a slide viewer. All those integration capabilities that we mentioned earlier in the context of PMA.core can be visualized through various controls and panels. So while you can use PMA.studio as a viewer, it’s more likely you end up with interfaces that look like this:

PMA.studio brings modern user interface elements like a ribbon and panel-layout to a browser-based digital pathology environment. For the panels, we use the GoldenLayout library. From the start, we realized that one size will never fit all, so we made the layout of both the ribbon and the panel orientation completely configurable via XML configuration files. You can do this both for panels:

And for the ribbon:

Not only that, but you can also create custom panels that retrieve data from your own databases, present workflows, etc. This means that our customer might as well have ended up with an interface that looks like this:

Flexibility and choices

The flexibility offered through both our SDKs and PMA.studio allows any customer to easily white-label our various software components.

Which route you decide to go with depends on you, but having had the experience with various customers on different projects, we’d be happy to guide you along the way. Contact us for a free consultation today and see how we can help take your digital pathology infrastructure to the next level.

Three ways to transfer your virtual slides

Uploading and downloading

The Pathomation software platform for digital pathology and virtual microscopy offers powerful slide presentation and interaction capabilities.

Much of the focus (especially with respect to end-users) is on slide visualization. But workflow can be organized through our platform as well.

In this article we focus on upload and download capabilities. We distinguish between three different mechanisms. We also provide background and insights into how and why we decided to provide the functionality in this fashion.

PMA.transfer

For many cases, PMA.transfer is your initial workhorse of choice. It’s a user-friendly end-user facing application with many features. For ad hoc slide transfers from your local hard disk to a PMA.core instance, PMA.transfer is the go-to tool.

There are two ways to obtain PMA.transfer: you can download it individually from its own website, or it can be installed in combination with your latest PMA.start download during the installation procedure.

If you already have PMA.start up and running and don’t want to go through the trouble of downloading the complete setup package again, you can download PMA.transfer individually through its own website at https://www.pathomation.com/pma.transfer. From there, you can also select individual versions of the software.

Regardless of how you obtain PMA.transfer, the software requires PMA.start to be up and running in order to function.

The reason for this is that PMA.transfer relies on PMA.start to tell it which slides reside on the hard disk. PMA.start already has all the logic on board for slide data processing, and we didn’t want to re-implement all this code in PMA.transfer. The result is that PMA.transfer knows what a slide is on your hard disk through PMA.start. You need not concern yourself with figuring out whether it’s a single- or multi-file file format that you’re working with. PMA.transfer deals with slides and that’s it. Do read our other blog article to find out why virtual slides are so big and complicated in the first place.

Transferring slides with PMA.transfer

Once you have PMA.transfer open, you see the slides on your own hard disk in the left-hand panel. You can connect to any PMA.core instance that you have the credentials for. If you find yourself re-connecting to the same server again and again, we recommend that you use the site manager. You can also use the site manager for complex scenarios like geo-replicated PMA.core instances.

You can both upload and download slides with PMA.transfer. Which of the two you are doing depends a bit on your point of view. Typically, upload goes from your local computer to the PMA.core server; download is the reverse (from PMA.core to your hard disk). But there are ways to rig PMA.transfer so it operates between two server instances of PMA.core instead of interfacing with just your local computer. Essentially, you’re simply transferring slides from one location to another. The same API calls are being used under the hood regardless of whether you’re doing one or the other. More on that API later in this article by the way. Stay tuned.

We tried to make PMA.transfer extremely low-threshold and user-friendly, which means that there’s usually more than one way to get something accomplished. For instance, you can transfer slides from one side to the other either with context menus or via drag and drop. Selecting multiple slides works the same way as you’re used to from the Windows Explorer, with Ctrl+Click and Shift+Click actions.

A full manual of PMA.transfer workings, interfaces, and best practices is available as an online wiki.

Pathomation PMA.transfer wiki

PMA.core

As mentioned above, PMA.transfer can be configured for site-to-site transfer. But even in that scenario, it still relies on PMA.start. This isn’t always convenient or even possible. Therefore, PMA.core 2.0 and higher contains its own slide transfer interface. It’s not as feature-rich as PMA.transfer, but if you quickly want to copy a large volume of slides from one server to another, it will get the job done. Plus, you can use this to copy slides from anywhere to anywhere: migrating your old FTP server to updated S3-based cloud storage becomes a breeze. You can do it remotely and asynchronously: just close your browser after initiating the operation and check back later.

Back-end API features

Occasionally, we get the question from people “ok, but I don’t want drag and drop stuff. I have an [incoming] folder somewhere on my system, and I want to automatically transfer all of those slides to their final destination overnight, when network use is low and I don’t feel like I’m hogging other people’s bandwidth…”. They then ask for the API call to make this happen.

If you’re one of these people: you’re on the right track. But what you’re really asking for is automation. Bear with us here. The API is part of that, but not the whole story.

See, these are the various API calls involved in slide transfer:

PMA.transfer uses these, and so does PMA.core. They work. But we still highly recommend that you do not engage in calling these methods yourself. Instead, you should rely on the SDKs.

The reason for this is related to the underlying structure of the virtual slides themselves. There can be many files, and they can be big. Therefore, the API methods are mostly involved with uploading chunks or partial slide data to PMA.core. We understand that there may be instances where it is convenient to interact with these directly, perhaps to build progress indicators that monitor a processing workflow. For those uses, we recommend studying the implementation of the Core::upload() methods in PMA.python.

Automation

We illustrate here how slide transfer automation can work for you by means of a Jupyter notebook utilizing the PMA.python SDK.

The Core module contains both upload() and download() methods, which each rely on PMA.start.

When using the upload() function, the SDK assumes you’re transferring slides from PMA.start to PMA.core. Therefore, it’s good to do some preliminary verification and make sure that everything is in place before starting your transfer:


from pma_python import core

sourceSession = core.connect()
targetSession = core.connect("https://test.pathomation.com/pma.core", "username", "scrt")
print(sourceSession, targetSession)

This block of code should result in two meaningful strings. If not, you can already interrupt the flow and send an email to a system administrator, informing him or her that something is wrong.

After confirming the connection, you should also check both your source folder on your local hard disk (this can be the folder where your scanner deposits new WSIs) and your target folder on the PMA.core server.

Since you’re handling two PMA.core instances (PMA.start utilizes a limited version of PMA.core, too), it’s a good idea to explicitly mention the PMA.core sessionID values as optional arguments in your code (we typically don’t do this when only interacting with a single instance; remember: good programmers are not only lazy but also know when to be):

core.get_directories("C:/", sessionID = sourceSession)
core.get_slides("C:/wsi", sessionID = sourceSession)
core.get_directories("_sys_ref/", sessionID = targetSession)
core.get_slides("_sys_ref/experiment", sessionID = targetSession)

Once you’re assured that your source data is available and your target folder is ready (and doesn’t have the data you’re looking for yet), it’s time to do the actual upload:

core.upload("C:/wsi/OS-3.ndpi", "_sys_ref/experiment", targetSession)

This call is spectacularly unimpressive. But behind the scenes, it figures out which files belong to the slide, and transfers them one by one to your target destination. Large files are automatically split into smaller blocks.

And of course this can be made a lot more complex. This is the step where the magic happens: you can build a loop around all the slides found in the source folder, and you can add checks to make sure the target folder is empty to begin with (or at least doesn’t contain the to be transferred slide yet).
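As a minimal sketch of such a loop, assuming core.get_slides() and core.upload() behave as in the calls shown above: the helper below only decides which source slides are still missing from the target folder, by comparing file names. plan_uploads is a hypothetical name and not part of PMA.python, and the sketch assumes forward-slash paths. The actual SDK calls are left in comments, so the helper can be tried without a running PMA.start.

```python
import posixpath


def plan_uploads(source_slides, target_slides):
    """Return the source slides whose file name is not yet present on the target.

    Both arguments are lists of slide paths, e.g. as returned by core.get_slides().
    plan_uploads is a hypothetical helper, not part of PMA.python.
    """
    # Compare on file name only, because source and target folders differ
    already_there = {posixpath.basename(s).lower() for s in target_slides}
    return [s for s in source_slides
            if posixpath.basename(s).lower() not in already_there]


# Example with literal paths; in a real script the two lists would come from
#   core.get_slides("C:/wsi", sessionID=sourceSession)
#   core.get_slides("_sys_ref/experiment", sessionID=targetSession)
source = ["C:/wsi/OS-1.ndpi", "C:/wsi/OS-2.ndpi", "C:/wsi/OS-3.ndpi"]
target = ["_sys_ref/experiment/OS-1.ndpi"]

for slide in plan_uploads(source, target):
    # In a real script: core.upload(slide, "_sys_ref/experiment", targetSession)
    print("to upload:", slide)
```

Keeping the decision logic separate from the SDK calls makes it easy to dry-run the script before letting it loose on a scanner's output folder.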

Regardless, after the transfer is complete, you want to verify that everything went right.

Your instinct probably tells you to compare the ImageInfo objects on both sides:

However, this is not a good idea, as the ImageInfo will actually differ, as it contains a couple of location-dependent and slide-specific criteria as well. Comparing the information then just becomes confusing.

Rather, we recommend that after transferring a slide, you merely pull up the fingerprint for each slide:

core.get_fingerprint('_sys_ref/experiment/OS-3.ndpi', sessionID = targetSession)
core.get_fingerprint('C:/wsi/OS-3.ndpi', sessionID = sourceSession)

Comparing these two strings is a lot easier, whether through visual (interactive) or automated detection:
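For the automated route, a sketch along these lines could work. verify_transfer is a hypothetical helper; it takes the fingerprint lookup as a parameter, so in a real script you would pass core.get_fingerprint. The stub and the fingerprint strings used here are made up and only serve to make the example runnable on its own.

```python
def verify_transfer(get_fingerprint, source_slide, source_session,
                    target_slide, target_session):
    """Return True when source and target report the same fingerprint.

    get_fingerprint is expected to accept (slide, sessionID=...), like
    core.get_fingerprint in the calls above. Hypothetical helper.
    """
    source_fp = get_fingerprint(source_slide, sessionID=source_session)
    target_fp = get_fingerprint(target_slide, sessionID=target_session)
    return source_fp == target_fp


# Stand-in for core.get_fingerprint, with made-up fingerprint strings
def fake_get_fingerprint(slide, sessionID=None):
    known = {
        ("src", "C:/wsi/OS-3.ndpi"): "abc123",
        ("dst", "_sys_ref/experiment/OS-3.ndpi"): "abc123",
        ("dst", "_sys_ref/experiment/broken.ndpi"): "zzz999",
    }
    return known[(sessionID, slide)]


ok = verify_transfer(fake_get_fingerprint,
                     "C:/wsi/OS-3.ndpi", "src",
                     "_sys_ref/experiment/OS-3.ndpi", "dst")
print("transfer verified" if ok else "fingerprints differ")
```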

Which way do I take?

Due to their size and file structure, whole slide images are complicated data-beasts. Managing them takes some time and practice.

The Pathomation software platform for digital pathology and virtual microscopy recognizes that both simple and complex slide transfer scenarios emerge from daily practice. Therefore, we offer different tools and routes to efficiently transfer slides, depending on which category of user you fall into and the scenario you wish to implement.

Annotations et al

On-slide annotations

Do you recognize the above image? It’s a rendering of the mitotic figures and algorithmic classifications as determined by Bertram et al. in their 2019 paper.

Digital pathology and virtual microscopy are concerned with slides (duh), but those are only part of the equation. External data as well as on-slide annotations are an integral part of the package, and any self-respecting software in this space supports various flavors of annotations. The various components in the Pathomation platform are no exception.

First some terminology: we use the term on-slide annotations for geometric shapes and figures presented on top of virtual slide pixels, in order to distinguish them from (text-based) slide metadata. The latter is presented as data attached to a slide, but not particularly or directly associated with a specific region on the slide. We discussed how Pathomation handles various flavors of slide metadata in an earlier blog post.

Creating on-slide annotations in PMA.studio

With the forthcoming release of PMA.studio 2.0, you can create on-slide annotations interactively.

PMA.studio offers a ribbon tab for this purpose. The first group of buttons lets you control what exactly is visible: the annotations themselves can be toggled, and you can choose whether to include the annotation labels or not.

https://realdata.pathomation.com/wp-content/uploads/2021/06/blog-post-42_20.png

The Style group in the ribbon is used to change the presentation of annotations. Annotations can have an edge color and a fill color, which can be set independently of each other. Filled shapes can have a transparency attribute, too.

Each annotation can be associated with a class and a description attribute. An example would be a set of polygon shapes that each indicate necrosis and therefore have the “necrosis” class attribute, while individual shapes would be referred to as “necrosis-region-1”, “necrosis-region-2”, etcetera.

Once you’ve made the annotations, you can follow-up on them through the annotations panel. Here, annotations are grouped per class. You can also use this panel to filter annotations per user.

PMA.studio’s ribbon can be customized. This means that custom annotations are possible, too. You can use this feature to implement protocols.

In the example below, we’ve used XML to define pre-sets for pathologists to indicate various types of TLS regions:

https://realdata.pathomation.com/wp-content/uploads/2021/06/blog-post-42_40.png

Behind the scenes sits PMA.core

Let’s spend some time on how it all works behind the scenes.

When you make annotations in PMA.studio, you interact with a PMA.UI viewport. Once you decide to save your annotations, it’s the PMA.UI viewport that sends your annotations to PMA.core, where they are saved in the back-end database.

The format in which annotations are saved is Well-Known Text (WKT).
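To give an idea of what WKT looks like: a polygon is written as a list of x y coordinate pairs, with the first point repeated at the end to close the ring. The sketch below builds such a string in Python. polygon_to_wkt is a hypothetical helper and the coordinates are made up; the coordinate system and the exact payload that PMA.core expects are not covered here.

```python
def polygon_to_wkt(points):
    """Format a list of (x, y) tuples as a WKT POLYGON string."""
    if len(points) < 3:
        raise ValueError("a polygon needs at least three points")
    ring = list(points)
    # WKT rings are closed: the last point must equal the first
    if ring[0] != ring[-1]:
        ring.append(ring[0])
    coords = ", ".join(f"{x} {y}" for x, y in ring)
    return f"POLYGON (({coords}))"


print(polygon_to_wkt([(100, 100), (200, 100), (200, 200), (100, 200)]))
# POLYGON ((100 100, 200 100, 200 200, 100 200, 100 100))
```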

PMA.core has API calls to work with annotations. PMA.UI makes heavy use of these.

External annotations

Because we totally understand that PMA.studio may not be your first environment of choice to make your on-slide annotations with (though we think it’s a really good one!), PMA.core supports various other types of on-slide annotations, too.

We distinguish between “native” and “third-party” annotations.

Native annotations are shapes and forms included in the original vendor’s file format. Several vendors provide the option of making on-slide annotations in their own native viewers. Examples include 3DHistech and Aperio. If you have a 3DHistech MRXS file that has overlaying annotations created by Case Center or the Pannoramic viewer, PMA.core will render them accordingly. The same goes for Aperio SVS files: just make sure you put the .xml file from ImageScope next to the corresponding .svs file.

Other vendors support annotations, too. If you find yourself with a vendor-specific annotation file format, do tell us, and we’ll add it to our next version of PMA.core.

Third-party annotations are a different kind from “native” annotations. Like PMA.core’s own (WKT-encoded) annotations, these are created in software that does not come from the vendor. Typically we’re talking about image analysis software. The three big names out there are Definiens, Visiopharm, and Indica Labs HALO. Each of these environments is supported.

Nothing technically limits us to supporting only the abovementioned flavors of image analysis. So if you have a different environment from the ones listed, let us know.

Transient annotations through PMA.live

Two products of Pathomation support real-time conferencing: PMA.studio and PMA.control. While in a conference, it is possible for participants to make live annotations that only exist for the duration of the conference.

Transient annotations in PMA.studio and PMA.control (through a third component referred to as PMA.live) can perhaps best be compared to the annotation toolbar that appears while giving a PowerPoint presentation: several tools are made available to temporarily highlight specific features, but these don’t become part of the original presentation.

Bringing it all together

As the “middleware for digital pathology and virtual microscopy” company, we take our job seriously. Therefore, apart from managing slides, we also allow our tile server PMA.core to be used to organize graphical annotations. The Pathomation platform offers many ways to organize

Pathomation can ingest both native and third-party annotations. The difference is that we consider native annotations to be annotations made with the original manufacturer’s (viewer) software, while third-party annotations are annotations created by independent vendors like Indica Labs or Visiopharm.

We also allow people to make annotations on top of slides using only components within the Pathomation platform. You don’t need external software to get started with annotating your slides; you can use our own PMA.studio, or even couple OpenCV output back to PMA.core through our API.

There’s more to say about annotations. In upcoming articles, we plan to show you how you can manage heterogeneous annotation data from many sources, as well as how to use the back-end API directly to feed annotations back to PMA.core.

Want to see us work out a specific use case for your annotation workflows? Let us know.

Research publications in digital pathology

At Pathomation, we’re big fans of XKCD. And Randall Munroe seems to have hit a particular sore point with scientists recently, if the follow-up article in the Atlantic is to be believed.

We’re not endorsing the Atlantic here. It’s easy to critique the team if you’re a bystander and not actually playing the game.

All the same, we conducted our own small study and found that nobody had created an adaptation for the field of digital pathology yet. So we made our own.

The following data is therefore completely made up. We post it in the public domain, free for all to share, and hope that it will benefit humanity as a whole:

Read the original version of the cartoon at https://www.xkcd.com/2456.

Fingerprinting applications

A naïve way to detect duplicate data

We often copy data for a variety of reasons. While testing a new program, we can copy data any number of times so the software is able to work on a dataset instead of a single datapoint. Temporary data unfortunately is all too often forgotten about, lingers around, and unnecessarily clutters our hard disks.

There are a number of ways to solve this problem. With Python, we can create a script that retrieves all slides, inspects the file size of each slide, and reports which two slides have the same size.

Using our PMA.python SDK, the code looks like this:

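from pma_python import core

core.connect()
all_slides = core.get_slides("C:/", recursive = True)

def get_slide_size(slide):
    # Ask PMA.start for the slide's metadata and return its physical size
    info = core.get_slide_info(slide)
    return info["PhysicalSize"]

# Group slide paths by reported size
all_sizes = {}
for slide in all_slides:
    size = get_slide_size(slide)
    if isinstance(size, dict):
        # Unwrap a {"d": ...} style value; skip the slide if there is nothing to unwrap
        if "d" not in size:
            continue
        size = size["d"]
    all_sizes.setdefault(size, []).append(slide)

# Report each size shared by more than one slide,
# unless the first slide of the group is a plain PNG or JPEG image
for (size, slides) in all_sizes.items():
    if len(slides) > 1:
        fn = slides[0]
        if not (".png" in fn.lower() or ".jpg" in fn.lower()):
            print(size, len(slides))
            print(slides)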

But wait, what if two different slides are the same size? Whole slide images are typically 100s of megabytes in size. Based on this characteristic, you could assume that it’s unlikely for two slides to result in identical file sizes. But macroscopic images are important in pathology, too, and then we’re talking about file sizes that are only a couple of megabytes at most. Now think of a scenario where tens of 1000s of cases are treated annually, with multiple macroscopic photos being taken of each resection piece… Suddenly the chance of two files having the same size becomes plausible.
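A back-of-the-envelope calculation makes this concrete. The sketch below uses the standard birthday-problem approximation, under the simplifying assumption that file sizes are spread uniformly over a range of possible values. The numbers (50,000 photos with sizes anywhere within a 5 MB range, 1,000 whole slide images with sizes within a 1 GB range) are illustrative assumptions, not measurements.

```python
import math


def collision_probability(n_files, n_possible_sizes):
    """Approximate chance that at least two of n_files share a size.

    Birthday-problem approximation: p ~ 1 - exp(-n(n-1) / (2d)),
    assuming sizes are uniformly distributed over d possible values.
    """
    exponent = n_files * (n_files - 1) / (2 * n_possible_sizes)
    return 1 - math.exp(-exponent)


# Illustrative assumption: 50,000 macroscopic photos, sizes within a 5 MB range
p_macro = collision_probability(50_000, 5_000_000)
# Illustrative assumption: 1,000 whole slide images, sizes within a 1 GB range
p_wsi = collision_probability(1_000, 1_000_000_000)
print(f"macroscopic photos: {p_macro:.4f}, whole slide images: {p_wsi:.4f}")
```

Under these assumptions, a size collision among the photos is practically certain, while it remains rare for the much larger whole slide images.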

There’s a second, albeit somewhat more hypothetical, reason why slide size is a poor indicator here. WSI data could be stored in a container format, and the container format can have certain limitations. We observed e.g. that our PMA.start installation package has not changed in size for the last 7 releases or so. But of course our code did change. So, empirically, file size is not a good discriminant for executable files. We therefore feel that we cannot assume that it would be one for image file formats. Since re-scans are a specific concern with microscopy and WSI data, something better is needed than just the file size.

Introducing fingerprinting

We can think of a way to unambiguously distinguish slides from one another by combining a number of characteristics into a digital fingerprint. These would include:

  1. Filesize (we didn’t say this was a bad one; just an insufficient one)
  2. Pixel size
  3. Pixels per micron
  4. Number of channels
  5. Number of z-stack layers

If we had infinite real-time computing power, we could think of more:

For practicality, we define a slide’s fingerprint in the Pathomation platform as a combined hash (https://en.wikipedia.org/wiki/Hash_function) of the physical file size, as well as most of the parameters returned through the GetImageInfo method.
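To illustrate the idea of a combined hash, the sketch below feeds a handful of slide characteristics into SHA-256 from Python’s standard library. This is not Pathomation’s actual algorithm, which is computed inside PMA.core; toy_fingerprint, the field names, and the values are all made up for the example.

```python
import hashlib
import json


def toy_fingerprint(characteristics):
    """Hash a dict of slide characteristics into a hex string.

    Illustration only: PMA.core computes the real fingerprint internally.
    """
    # Serialize with sorted keys so the same values always give the same hash
    canonical = json.dumps(characteristics, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


slide_a = {"file_size": 123456789, "width": 80000, "height": 60000,
           "pixels_per_micron": 4.0, "channels": 3, "z_layers": 1}
slide_b = dict(slide_a, file_size=123456790)  # one byte larger

print(toy_fingerprint(slide_a) == toy_fingerprint(dict(slide_a)))  # True
print(toy_fingerprint(slide_a) == toy_fingerprint(slide_b))        # False
```

The point of hashing the combination rather than comparing each field separately is that you end up with a single short string that is easy to store, transfer, and compare.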

We also consider this fingerprint method to be essential, and so for stability, it is incorporated at the level of PMA.core (PMA.start) itself rather than at SDK level, so it can be transferred across programming boundaries. A fingerprint for slide [foo] requested through PMA.java yields the same results as when requested through PMA.php or PMA.python.

Slide integrity

The fingerprint method is a good way to confirm the integrity of a slide itself. When a file is not what it pretends to be, the fingerprint cannot be calculated, and an error follows.

Note that the above would not be possible if we stuck to conventional CRC-like checks, since those don’t take into account the nature of a slide. Of course, you can do a CRC check on any file regardless of whether it actually is a slide or not.
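To make the contrast concrete, here is a small sketch using Python’s zlib module: a CRC can be computed over any sequence of bytes, so a matching checksum says nothing about whether the content is a readable slide. The file written here is a made-up stand-in with a slide-like extension.

```python
import os
import tempfile
import zlib


def file_crc32(path, chunk_size=1024 * 1024):
    """Compute the CRC-32 of a file, reading it in chunks."""
    crc = 0
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            crc = zlib.crc32(chunk, crc)
    return crc


# Any bytes will do: this file is clearly not a slide, yet it has a valid CRC
payload = b"this is not a whole slide image"
with tempfile.NamedTemporaryFile(delete=False, suffix=".svs") as tmp:
    tmp.write(payload)

checksum = file_crc32(tmp.name)
os.remove(tmp.name)
print(f"{checksum:08x}")
```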

Applications

We recently introduced PMA.transfer. Have you ever been frustrated by people sending you individual VSI or MRXS slides without anything else? Did you ever feel uneasy about just having transferred a gazillion number of bytes half across the world, without any reassurance of whether it actually worked? Then you should definitely have a look at PMA.transfer. It’s like FileZilla, but for slides. SlideZilla.

PMA.transfer uses fingerprinting to ensure data integrity in between transfers. Whether you’re moving slides from and to PMA.start, PMA.core, or My Pathomation, the same fingerprint calculation algorithm is used to compute a slide’s unique signature. This means that PMA.transfer can obtain the fingerprint of a source and a target instance and simply compare one with another to see that they’re identical.

Another application is found in our upcoming PMA.studio product. Of course, the actual fingerprint of a slide is shown in the slide info panel. But a string like “” is not saying a whole lot and is for purely informative purposes at best.

PMA.studio can also be used to create annotations on slides. When you make an annotation, it is stored in our back end with a reference to the original slide, as well as the slide’s fingerprint.

This serves two purposes:

  • After moving a slide to a new folder path or even physical location, you can still retrieve its annotations.
  • When you have two identical copies of slides, each annotated separately, you can use fingerprinting to combine the annotations from both in a single view.

The possibility to combine annotations from identical slides stored in different locations in a single view offers opportunities for blinded studies and validation exercises. Inter- and even intra-observer variability can be measured this way, too.

Retrieving annotations by fingerprint is not available by default; you need to invoke this with an explicit button in the ribbon. It’s a performance thing.

Last but not least, fingerprinted annotations can be used to keep track of annotations during migration processes. As your applications for digital pathology increase, you will occasionally restructure your folder structures, or perhaps move to an entire new storage device altogether.

Finding duplicates

Back to our original question: imagine that you’ve been managing a whole slide repository for a while, and as careful as you’ve been, you suspect that you now have ended up with a number of copies of a variety of slides in different locations. You know: you copy a slide to test something, pinky-promise yourself that you’ll remove the slide again afterwards, that for real you’re really not going to forget this time… and then… you forget about it.

Thanks to the fingerprinting method and a few lines of Python, however, it is easy to trace duplicates.

Here’s the basic code to build a dictionary that has all the possible fingerprints as keys. Each entry then contains a list that specifies where the exact copies that share a particular fingerprint reside:

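from pma_python import core

core.connect()
slides = core.get_slides("C:/", recursive = True)

# Map each fingerprint to the list of slides that share it
slides_by_fingerprint = {}
for slide in slides:
    fp = core.get_fingerprint(slide)
    if fp not in slides_by_fingerprint:
        slides_by_fingerprint[fp] = [slide]
    else:
        slides_by_fingerprint[fp].append(slide)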

When a slide is unique, its list in the dictionary will only have one entry. Otherwise, the list has two or more entries. So the duplicated slides are detected and flagged as follows:

for fprint in slides_by_fingerprint:
    if len(slides_by_fingerprint[fprint]) > 1:
        print(slides_by_fingerprint[fprint][0], "is copied", len(slides_by_fingerprint[fprint]), "times")

If you want, you can further automate the pruning and the deletion of these duplicates. Sometimes it’s easy; sometimes it’s not. You need to make sure that you have the original copy in its intended place. And in some cases, you may actually want to keep at least a second copy of a slide around, as one may be transient in a clinical setting, whereas its copy may have just been added to a reference repository to teach students and staff.
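As a starting point for such automation, the sketch below only proposes which copies to review; it does not delete anything. It assumes, purely for illustration, that the copy to keep is the one stored under a preferred root folder. plan_pruning and the folder names are hypothetical, and the input has the same shape as the slides_by_fingerprint dictionary built earlier.

```python
def plan_pruning(slides_by_fingerprint, preferred_root):
    """Split duplicated slides into one copy to keep and candidates to review.

    Returns a dict mapping the kept path to the list of other copies.
    Groups without a copy under preferred_root are skipped, so nothing
    is proposed for removal unless the original is in its intended place.
    """
    plan = {}
    for copies in slides_by_fingerprint.values():
        if len(copies) < 2:
            continue  # unique slide, nothing to prune
        originals = [c for c in copies if c.startswith(preferred_root)]
        if not originals:
            continue  # no copy in the intended place: leave for manual review
        keep = originals[0]
        plan[keep] = [c for c in copies if c != keep]
    return plan


# Made-up fingerprints and paths, in the same shape as slides_by_fingerprint
example = {
    "fp-1": ["C:/archive/a.svs", "C:/temp/a_copy.svs"],
    "fp-2": ["C:/archive/b.svs"],
    "fp-3": ["C:/temp/c.svs", "C:/scratch/c.svs"],
}
for keep, candidates in plan_pruning(example, "C:/archive/").items():
    print("keep", keep, "- review", candidates)
```

Producing a review list first, and deleting only after a human has looked at it, fits the caveats above: a second copy is sometimes kept on purpose.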

Coming full circle

Fingerprinting serves a triple function:

  • Detect whether a dataset is a real slide or not
  • Guard data integrity when transferring from one medium to another
  • Trace slides and associated content through a complex storage hierarchy

Fingerprinting applies the concept of hash-functions to slides. Like everything in the Pathomation platform, the slide itself is the key unit to interact with. There is only one fingerprint for a slide, whether it consists of a single, or multiple files. Consequently, you can only obtain a fingerprint for a slide. If the file is somehow corrupt or the file format isn’t recognized by PMA.core, you’re not going to get a fingerprint from it. Last but not least, fingerprints are invariant across storage media and instances of PMA.core (PMA.start), making it a useful feature for slide tracking.