Tuesday, May 19, 2009

Managing Scientific Data for Competitive Advantage

It is 2009 (right??) and I still talk to scientists in leadership positions who believe spreadsheets are the best way to capture, store, search, distribute and otherwise gain competitive advantage from all the hard work their scientists do.

The conversations are interesting... because in the short term, for any one-off experiment, the case can be made that Excel IS the best choice. After all, every scientist knows how to use Excel, and mistakes on spreadsheets are rare (right?).

Then the analysis stops. Those who hold this view ask no further questions. They make no further demands on the economics of their own research budget!

Below is a partial list of questions that forward-looking scientific data managers DO ask. They consider the entire “life-cycle” of managing valuable data:
  1. How do you track versions of the spreadsheets you are using?

    When we help groups transition to Mosaic (our platform for scientific data management), we have many cases where scientists are amazed/embarrassed to find that they were using the wrong template to analyze their data, or using the right template, incorrectly.

    How do you make sure data on a spreadsheet was not changed since the analysis was performed?

  2. When employees leave and new scientists are hired, how reliable is the knowledge transfer on each assay?

  3. Once the analysis is done and key results computed, where do they go? Do you as the manager get emails of results?

  4. How do you conveniently review previous work?

  5. How do you audit the analysis methods used? (See the formulae).

  6. How do you create reports on the data?

    In Mosaic, you can extract any time-period of data all at once to a single spreadsheet – how would you conveniently aggregate a series of results from a bunch of spreadsheets?

    We also have a Create Summary Report that extracts all data and graphs for a study into a Word document. This can be used for external reporting.

  7. How do you associate written summaries with a study and keep track of it?

    Mosaic has a document management component (that optionally integrates with Microsoft SharePoint) that allows storing versioned copies of documents with the study.

  8. How do scientists prepare for a group meeting to summarize their week’s work?

    Time spent preparing is time wasted if all you need to do is log into Mosaic and click through your results.

  9. How do you integrate results from other sources (collaborators, CROs) conveniently into your repository?

    With Mosaic, you can capture external studies along with your own data.

    If you give a collaborator access to Mosaic, they can securely upload data directly to your database.

    If a collaborator has Mosaic, you can exchange study data with one click.
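
The second question under item 1 (has the spreadsheet changed since the analysis was performed?) can be made concrete even without a data management platform. The sketch below records a SHA-256 fingerprint of a file when the analysis is done and compares it later. It is a minimal illustration using only the Python standard library, not a description of how Mosaic works; the function names `file_fingerprint` and `is_unchanged` are made up for this example.

```python
import hashlib


def file_fingerprint(path):
    """Return the SHA-256 hex digest of the file's bytes."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large data files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unchanged(path, recorded_fingerprint):
    """True if the file still matches the fingerprint recorded at analysis time."""
    return file_fingerprint(path) == recorded_fingerprint
```

You would store the fingerprint somewhere the spreadsheet's author cannot quietly edit it (a lab notebook entry, for instance) and call `is_unchanged` before trusting an old result. Note that this only tells you that something changed, not what or why, which is part of the argument for real version tracking.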

The answers to these questions are why scientific data management systems (SDMS) are growing at 20% per year. All large pharma companies have efforts well underway to manage their R&D data. Smaller biotechs are positioning themselves for success by getting serious about data management and value preservation early in their life, before the backlog of studies becomes too large.

Friday, May 1, 2009

Silverlight Multiple File Uploader

We build a lot of web applications that manage scientific data. Uploading files (data files, images, study documents) with just JavaScript and HTML takes a lot of code, and the user doesn't get a good, simple multiple file selector.

You can use an ActiveX control such as the tools at http://www.aurigma.com/, but installing ActiveX can be difficult in some environments.

We decided to use Silverlight's capabilities, and the result is the Silverlight File Uploader project which we submitted as open source to CodePlex. The user experience is quite good, and it is easy to slot this component into your own web project.

Wednesday, April 22, 2009

Colony Management - Build versus Buy versus Both

So you've been managing your mouse colony for a while now, and you've decided that pencil and paper or even Excel spreadsheets just aren't cutting it. You're having "issues" and you've decided that you need a "real solution" to the bookkeeping associated with tracking mice (and other animals).

If you haven't come to that conclusion yet, this post is not for you. (Don't worry, you'll be back.)

Many people have already come to the conclusion that there is a better way to manage these data. Perhaps you are:
  • an individual investigator at a university studying disease models, or
  • an animal study manager at a small biotech, or
  • a facility manager at a pharmaceutical company...
Your task is now to find a cost-effective solution that will make your life easier and save your organization money (or just give you more time to do science). You decide to dig out those old marketing emails you've been stashing for a rainy day and learn a bit about some of the commercial solutions. You find that features and prices vary widely, but a couple of solutions rise above the rest. They seem to do 95% of what you want.

OK - let's talk price. Hold on a minute! These things cost some money. How much? Some prices seem crazy, but some seem... fair. But then a new thought crosses your mind.

Investigator:
Hmm... maybe I could just have an undergrad put something together for me. Kids are pretty smart with computers these days!
Study Manager:
Hey - we've got an IT guy - he says he can put together an Access database to do what I want. Maybe we'll just do that.
Facility Manager:
I've been around the block, I know this solution does what I want... but I sure don't want to deal with budget justification and battles with our IT group, which is telling me that everything needs to fit our corporate IT roadmap. I'll let them create something.
So far so good. This decision implies that the cost of the commercial solution exceeds the total cost of creating new software from scratch. If you just want the bottom line, here's the simplest way to summarize this decision. To believe you should build rather than buy, you must believe:
I can run a research group (or department, or facility) and create, support and maintain software in a timely and economical fashion for one customer: my group (or my department, or my facility). Furthermore, I can do this for less money and time than a vendor that has spread costs out over multiple customers (of which I could be one if I pay the license fee).
If you believe this economic statement, then you should build your own software. The rest of this post examines this position in more detail.

There are several kinds of costs associated with a technology decision:
  1. direct costs (cost of people's salaries; cost of software licenses)
  2. opportunity costs (cost of waiting; cost of choosing wrong)
  3. efficiency costs (cost of using a poor system, like paper - the reason you're reading this)
The most objective costs are the direct costs. Opportunity costs and efficiency costs can be measured, but doing so is more difficult and sometimes more subjective. They are still important.

Investigator:

Let's say you can get an undergrad for $12/hour (cheap!) who promises to build what you need in a solid month (160 hours) of work. So, if a company quotes you more than $1920 for a commercial solution, then you've got a winner, right?

Sure, as long as you can honestly reconcile yourself to the implicit assumptions of this decision:
  1. The commercial software you reviewed probably had person-years invested in its development. Its developers are probably reasonably smart people, and they probably have diverse experience in managing this kind of data (since, presumably, they've worked with many customers). The student must be up to the challenge of reinventing the functionality that you need within your budget.

  2. What is your tolerance for error? Presumably, you were quoted a fixed price for the commercial software. Can your budget tolerate an overage if the student requires more time than advertised? Even professional software projects have a tendency to run long...

  3. Now the student is a busy person, and can't really be working 40 hours a week on this project. Ten hours per week seems more likely, so the month of billable work might be four months of elapsed time. If you've already decided you want a software solution, then there must be a cost to waiting for something that you could get right now from the software vendor for a fixed price. What is that cost? (And what if it is eight months instead of four?)

  4. Well, you say, this stuff is not rocket science, and even if it takes 4-8 months, so be it. Presumably, you'll be using this solution for a while - what do you do when there are bugs and the student has left? Hire a new student. OK - but that's more cost, and less efficient spending, since the new student has to figure out what the previous one did.

  5. What about backups? (One of the commercial solutions that looked really good offered the software online via the web, and the vendor takes care of all backups...)

  6. What about software maintenance? We just installed Windows Vista and the student's software no longer runs, and I can't see my mice! How to deal with that? And, it will happen. (That online software offering never required me to do any installation or updates...)
How about having a student who is already paid for on a grant do this? Hmm... OK. But even if the $1920 is hidden, you still need to be comfortable with issues 1-6.
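
The Investigator's arithmetic can be sketched in a few lines of Python. The $12/hour rate, the 160-hour estimate and the ten-hours-per-week availability come from the discussion above; the `overrun_factor` parameter and the function names are hypothetical additions so you can try your own assumptions about issue 2.

```python
def build_cost(hours, hourly_rate, overrun_factor=1.0):
    """Direct labor cost of building in-house.

    overrun_factor is a hypothetical multiplier for schedule overruns
    (1.0 means the estimate was exactly right).
    """
    return hours * hourly_rate * overrun_factor


def elapsed_weeks(hours, hours_per_week):
    """Calendar weeks needed when only hours_per_week are available."""
    return hours / hours_per_week


ESTIMATED_HOURS = 160  # "a solid month" of work
STUDENT_RATE = 12      # dollars per hour

naive_cost = build_cost(ESTIMATED_HOURS, STUDENT_RATE)
weeks_part_time = elapsed_weeks(ESTIMATED_HOURS, 10)

print(f"Naive build cost: ${naive_cost:,.0f}")
print(f"Elapsed time at 10 hours/week: {weeks_part_time:.0f} weeks")
print(f"Cost if the job takes twice as long: "
      f"${build_cost(ESTIMATED_HOURS, STUDENT_RATE, overrun_factor=2.0):,.0f}")
```

The direct cost is only the first line of the output. The sixteen weeks of waiting, the bugs after the student leaves, the backups and the maintenance are not in this calculation at all, which is the point of issues 1-6.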

Study Manager:

If you've read through the Investigator's analysis, you probably realize that just about everything is similar, except your IT people cost more. Let's say you have a genius working for you on a $60K salary (forget overhead). Your professional's time costs your company $30/hour. (Software engineers are laughing now - the required skill set doesn't exist at that rate, but roll with me.) So that software vendor better not charge more than $4800, or I'll just have my guys build it. Furthermore, we can task the engineer full-time, so I only have to wait a month!

True, as far as it goes. But let's be honest; if you are looking for colony management software for more than a single investigator's group, you probably want a greater fraction of the commercial software's functionality. Maybe you want robust security, task-based privileges, web access, etc. Perhaps you can agree this is really a two month job, and the vendor could really ask $9600, and it would be well worth your company's money.

(I should reiterate here that these time estimates are absurd and chosen only as lower-bound figures that are impossible to reasonably argue with. Don't go tell your boss or your IT group that you heard that robust colony management software can be built in two months.)
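
For the record, here is the Study Manager's lower-bound arithmetic as a short Python sketch. The $60K salary, the $30/hour rate and the one- and two-month estimates are from the text above; the 2,000 working hours per year is my assumption (it is what makes $60K come out to $30/hour), and the function names are made up for illustration.

```python
WORK_HOURS_PER_YEAR = 2000   # assumption: roughly 50 weeks x 40 hours
HOURS_PER_MONTH = 160        # one "solid month" of full-time work


def hourly_rate(annual_salary, hours_per_year=WORK_HOURS_PER_YEAR):
    """Salary-only hourly cost (no overhead, as in the text)."""
    return annual_salary / hours_per_year


def break_even_quote(annual_salary, months_of_work):
    """Vendor price at which buying costs the same as building in-house,
    counting direct salary cost only."""
    return hourly_rate(annual_salary) * HOURS_PER_MONTH * months_of_work


print(f"Hourly rate: ${hourly_rate(60_000):.0f}")
print(f"Break-even quote, one month: ${break_even_quote(60_000, 1):,.0f}")
print(f"Break-even quote, two months: ${break_even_quote(60_000, 2):,.0f}")
```

Plug in a realistic salary, overhead and a realistic number of months, and the break-even quote moves well above these figures.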

Considerations 1-6 above come into play, but being at a company you have a little more infrastructure support. For example, you have people to do software updates, and hardware to do backups. Nevertheless, these things are costing your company money, and you can put a dollar figure on them.

You also have a greater challenge: in dealing with more people, you need to make sure the software will be adopted by all relevant parties. This may seem obvious, but many internal IT projects fail because the engineers lack the domain experience needed to produce software that works well for the users. A single research group with fewer animals may be more tolerant of quirks or inefficiencies of user interface, but a larger set of users may simply reject or ignore the software they are being asked to use. The result is a failed project and wasted money.

You also need to consider software maintenance carefully: if you are successful, and your software is adopted internally, your users will generate a constant stream of requests for bug fixes and feature enhancements. How about when the original developer leaves? Your company will pay a price for bringing a new person up to speed.

Facility Manager:

You've got a good handle on your requirements, and you know the costs of running your facility. Consider your costs in terms of a fraction of your facility per diem. The industry norm ranges from $1.00 to $2.00 per cage-day. If a commercial solution will improve the accuracy and efficiency of your facility for a penny or even a nickel of that per diem, it should be a no-brainer.
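
To put the penny-or-nickel comparison in numbers, the sketch below computes the software cost as a share of the per diem, using the $1.00 to $2.00 range quoted above. Amounts are in cents to keep the arithmetic exact; the function name is hypothetical.

```python
def share_of_per_diem(software_cents, per_diem_cents):
    """Fraction of the cage-day per diem that goes to software."""
    return software_cents / per_diem_cents


for software_cents in (1, 5):            # a penny, a nickel
    for per_diem_cents in (100, 200):    # $1.00 and $2.00 per cage-day
        share = share_of_per_diem(software_cents, per_diem_cents)
        print(f"{software_cents} cent(s) of a "
              f"${per_diem_cents / 100:.2f} per diem = {share:.1%}")
```

So the question for the facility is whether better accuracy and efficiency is worth somewhere between half a percent and five percent of what each cage already costs per day.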

However, if you are requested to perform a build-vs.-buy analysis, you can assume several things:
  1. You will be faced with everything discussed above and more.

  2. The features and functions you require will be enterprise level: development estimates of less than a year are not credible.

  3. Your company is not in the business of software development. Internal IT personnel can and should identify commercial solutions that can be integrated with other systems in order to create competitive advantage for your company. The not-invented-here syndrome is too costly given the economic pressures on the pharmaceutical industry as well as academic centers.
This build-versus-buy discussion summarizes key issues we've seen over and over again during a decade of creating data management solutions for life sciences, including Mosaic Vivarium. There is perhaps one more "trump card" that sometimes gets pulled out. We call it the "Pragmatic versus Perfect Problem," and when it appears, it is usually in medium to larger organizations, sometimes when an existing solution can no longer be maintained.

The Pragmatic versus Perfect Problem

The Pragmatic versus Perfect Problem occurs when a set of influential users is not happy until the software does precisely what they want. They may be used to an existing process, or to part of a legacy solution that behaves in some specific way. The premise of this argument is that the process of these users is so special that they cannot modify their workflow at all in order to gain the broader benefit of a complete software solution.

To accept this, you must accept the argument that this particular facility or set of users is significantly different from all other facilities of comparable size (large or small) that are already using the software to great advantage. Generally, this is not true: the argument is an excuse not to change. In these cases, the users have not truly made the decision to modernize their processes.

Build versus Buy versus.... Both

There are real business cases that make changing to a commercial solution difficult. For example, perhaps there is a special requirement for a non-standard integration with another piece of software at the company. This brings us to the "Both" part of our "Build versus Buy versus Both" discussion: the economical solution should be to seek a vendor that has:
  • a platform which can be efficiently extended,
  • a development team that can work with internal IT to accomplish the integration or feature extension, and
  • the ability to provide on-going support
The concept of finding a vendor as an informatics partner is not new. See for example this article on the issues by Daniel C. Weaver of Array Biopharma.

The considerations we've presented in this post apply more generally to other kinds of software as well. For example, we work with customers seeking to manage the study data coming downstream of the colony management / husbandry. Interestingly, the smallest groups (i.e., investigators) tend to see this need first because they end up managing both the maintenance of their colony and the acquisition of scientific data. Larger sites tend to compartmentalize these software functions, largely because different sets of users perform the husbandry versus the experiments. Nevertheless, it makes sense to have an integrated system, and the same Build versus Buy versus Both arguments apply to the entire animal informatics domain.

Saturday, April 4, 2009

Scientific Data Management

The topics on this blog will explore the business case and tools available for scientific data management.