Unstructured Data – Episode 2

Video Interview with Chris Dale

Chris Dale of the eDisclosure Information Project interviewed Brendan Sullivan, CEO of S2|DATA, and Fred Moore, President of Horison Information Strategies, Inc., this past summer.

Over the course of three interviews, they cover the growing problem of managing legacy data and the increasing expectation that organizations comprehend what data they have and how they can find what matters using the solutions offered by S2|DATA.

In Episode 1, Brendan and Fred share with Chris how backup tapes have evolved from their introduction to the solutions now provided by S2|DATA.

In Episode 2 of this 3-part series, Brendan and Fred share with Chris why the use of tape as an archive medium declined and then began to increase again in 2000, as tape evolved to meet the demands of fast-changing technology. Learn more about how tape could be an efficient and valuable means of archiving data.

Watch Episode 2


Chris Dale: Tape’s got a relatively poor reputation as an enterprise level storage medium. Why is that?


Fred Moore: You know, the tape industry in the latter half of the 1990s was troublesome, to be quite honest about it. The numerous formats in that period of time (4 mm, 8 mm, DAT, DDS, DLT tape) all had a variety of issues that were very troublesome for customers. There was edge stretch and tear damage. The servo track was written on the edge of the tape, so if you dropped the cartridge the servo could be damaged and you couldn’t read the tape anymore. And these things got people very disappointed with non-mainframe tape technologies in that time.


Chris Dale: I remember all that time at an individual business level.


Fred Moore: Yeah, you know how that got there, and that was where the bad image came from. It was the difficult issues people had in handling all that tape. So going into the year 2000, which is when things began to change, the tape industry had suffered a pretty big blow, disk was doing really well, and as a result people said, “I’m going to use disk instead of tape as my backup and/or archive target.”


Brendan Sullivan: I think I’d add to that. You know, Chris, you and I have worked in the Discovery or the Disclosure market for the last 20 or so years, and outside of the core technology issues, the disparity of backup formats, the use of helical scan media, and the use of iron oxide and chrome dioxide (that’s no longer there) all made restoration of data from tape cumbersome. But I think that’s no fun.


Chris Dale: Each of these new developments seemed great at the time, but was gradually overtaken by others, is what really happened.


Brendan Sullivan: Yeah, and I think there’s another factor as well, in that the use of data for legal production really came in 20 years ago. So, if you were an IT department 20 years ago, you were used to a typical budgetary cycle of: How much data am I creating? Where do I need to put it? How long do I need to keep it? And you had a typical budget-cycle type of environment within your data center. I think Enron changed all that. It ushered in a shift from that normal cycle of how you finance and how fast you can get data to a much more reactive, time-sensitive production environment, where budgets were not easy to follow. They were not easy to implement, and consequently they pushed up the cost of restoration of data and the variation of requirements for that data, and as a result that gave tape backups a problematic reputation. And then you throw further into that the use of tape in Discovery. It’s rarely the first port of call. The first port of call might typically be a forensic collection or analysis of a network or a server environment, and when the data from those fairly readily available sources does not cover the investigation, you go back to tape. When you go back to tape, you are a litigant that is faced with the typical box of tapes that has 20 different types of tape, 20 different types of backup software, and IT administrators that probably left the company 10 years ago, and as a result of that it gets a poor reputation, because, frankly, it’s been complex to get data because of it. It’s not all too fair. It’s really a lack of knowledge as to how to go and ask, what you can achieve, what you should expect, and what resources you may have to go and get that data that has added to that bad reputation.


Chris Dale: We also had the Zubulake case, of course, where everybody drew the wrong message, I think, that they must keep everything.


Brendan Sullivan: Yes, and you know, we have technicians here that actually worked on that. So, you know, we remember the time of producing that data, and I think everybody was learning processes at the same time. And the result from that was “keep everything”. When you keep everything, you don’t necessarily spend too much time on well-thought-out records management, or well-thought-out classifications and categorizations of data, and consequently, typically, the faster and easier it is to back up, the more complex it will be to restore down the road.


Chris Dale: So much for tape’s bad reputation in the past, then. Why is that no longer fair, Brendan?


Brendan Sullivan: It’s standardized. We’ve got a real technology guru here, so I’m gonna pass this to Fred in a second, but it’s standardized. There was four millimeter, eight millimeter, quarter-inch cartridge, mainframe tape types (three or four different mainframe tape types), and multiple different AS/400 and Windows or distributed-based tape technologies. Ninety percent, probably 95 percent, of what we see now is LTO, Linear Tape-Open, the open standard that came in around ’99, I think it was. It’s a standardized form factor. There’s no more iron oxide; there’s no more chrome dioxide particle. It’s all metal-based particle that is denser. The latest barium ferrite particles are extremely robust. So in terms of the actual tape format types it’s very much standardized, and that makes things easier for any third-party vendor or data center: they know what they’re gonna get when they get that box of unknown tapes. They largely know what the form factor is gonna be, and that helps enormously. Backup software has somewhat standardized; there are three or four big players. So we’re all used to working with the way that they back up data, and therefore we’re used to working with the way to get data off those typical environments. Encryption was a problem probably 8-10 years ago, but third-party devices have more or less disappeared, and encryption methodologies have now somewhat standardized on library-based encryption and software-based encryption. So when we come across encrypted media, we get the key name, get the key phrase, get the handshake with the library, and that’s fairly straightforward to deal with. And there’s also the use of image-based backups like NDMP, Network Data Management Protocol, where large NAS images are backed up. We now process those NDMP images in a non-native way. That means our software can receive a tape with an NDMP image, and we don’t need to rebuild that native environment at all.
We can first restore the data, and then we can process the images directly with our piece of software as well. So as a result of that it’s much more robust. It’s much more reliable. It’s not less technical, it’s probably more technical, but we’re 20 years experienced in that now, and our software engineers have continuously created solutions for all the different variants that we’ve seen. So it’s much more reliable these days.


Chris Dale: Fred, if you’ve got anything to add to that.


Fred Moore: Around the year 2000, the tape industry again was suffering from a bad image, as we said earlier, and that was sort of the beginning of the tape renaissance. The LTO consortium began operation. It created a standardized format called LTO, Linear Tape-Open, and the goal of the LTO family of products was to do away with all the other various formats for tape that people were just fed up with. So LTO began to borrow from the disk industry many of the key features that disk drives used. The servo tracks I mentioned earlier were moved to the middle of the tape, so when you dropped a cartridge the cartridge was still readable. The tape industry borrowed GMR (giant magnetoresistive) heads from the disk industry to write a much narrower track with a much better signal-to-noise ratio. As for the error recovery techniques on tape, at that time tape was viewed to be less reliable than disk, so the tape industry borrowed PRML, which is partial response maximum likelihood, an error recovery technique; it’s the best there is still today. It employed that on tape, and that made tape more reliable than it was before. It added quite a bit of robustness to tape, but the big game changer for tape, out of the many capabilities borrowed from disk, was when tape went from metal particle media to barium ferrite. Metal particle media was not oxidized. So until the year 2005 all the tapes made were not oxidized. They had metal particle on them, and if you had a tape that sat on a shelf for 20 years or 15 years, the particles could literally come off. You couldn’t read the tape. And that was a huge concern for most of the people that had tape stored in vaults that hadn’t been used in a long time. Would it be readable? Is all the data still there? But in 2005 barium ferrite began to appear, and that actually made the tape already oxidized.
Barium ferrite doesn’t oxidize further, so the particles wouldn’t fall off, and that’s what gave all modern tape technology a 30-plus-year media life. So the media has a 30-plus-year life, which is longer than the drives that underlie it, so the media is going to be in good shape if it does happen to sit on a shelf for a long time. This is what the tape renaissance was all about: borrowing from disk, new oxidized media, better media, better substrates, ruggedized cartridges, intelligent tape libraries. And all of a sudden the tape library had gone through a renaissance. By 2010 tape was more reliable than disk; it had a higher capacity than disk and a faster raw data rate than disk. It couldn’t do random access, which disk can do, and that’s a big plus for disk. Tape has made leaps and bounds now to recover its image. It’s not a technology issue today. Tape technology is absolutely very robust today. That’s the good news. The rest of the news is that not everybody knows that yet, and there’s an evangelism effort that the tape industry really needs to embark on to let the world know this.


Chris Dale: Brendan, we have a new world. We have new privacy regulations, like the General Data Protection Regulation and the California Consumer Privacy Act, which we’ve mentioned. How will they affect the treatment of unstructured data backed up to tapes in vaults?


Brendan Sullivan: So I don’t think we know…yet. GDPR has been in effect since May 2018, I think. Certainly within my circles of discussion, there’s reference within GDPR that suggests that it does indeed affect backup. I think we’re talking specifically about Article 17 requests there, the right to erasure.


Chris Dale: Well, it is partly that, but it’s partly also the rights of data subjects to make access requests, where people are entitled to know, in a very short timeframe, what data you’re holding. That doesn’t sit well with the historic idea, at least, of tape.


Brendan Sullivan: Yeah, and it probably doesn’t play well with traditional potential solutions for this as well. It does look, in the reading of those articles, that it’s covered. It covers backups, and therefore if it covers backups it covers unstructured data on tape, and of course there’s a ton of data out there. I think it kind of plays well for us. We like it because anytime you have to go and find out what’s on a backup tape, then of course that’s a technology that can be used. So from our perspective, we welcome it. I don’t know how sharp the regulators’ teeth are going to be on this. I think time will tell. I haven’t heard of any massive great sanctions against companies that have had to tackle, and not been able to tackle, data on backup that is subject to those provisions.


Chris Dale: Can I just interrupt you there? I don’t think the problem for the organizations is simply, “How quickly can I react to this request?” It’s more a preemptive organizational thing, of the kind that you seem well-placed to deal with: how can I set up my data so that as and when anybody makes a request I can find if I’ve got anything?


Brendan Sullivan: It sets up beautifully for us with the post-extraction of intelligence, metadata, or backup session-level information that resides on currently unstructured tape that is outside of the management of an active piece of backup software. So, if you have backup software and you’ve got tapes offsite, you can learn about what’s on those backup tapes, but as we know there are huge amounts of data from backup software where the license has lapsed, the database is no longer active, and you don’t have any information about what might be on there. In addition to that, there’s embedded data. So if we’re talking about individual people’s data, that’s likely going to be within a database that’s backed up on a piece of media as well, so it’s compounded that way. I do think it plays in with recent changes as well in terms of a corporation’s strategy. I think it plays in well with the last round of changes that were made to the Federal Rules, the FRCP, as it relates to Rule 37(e), which softened, from my perspective. It reads now that inadvertent spoliation is not going to be looked at as aggressively by the courts should there be policies in place, good intent, and fair effort to actually remediate data. So I think that means that companies should be less fearful of tackling and purging these large unstructured data archives. That’s the pull. The push from the regulators is: shouldn’t you proactively go and figure out (a) what you have and (b) what you shouldn’t have, and tackle those archives as well? And of course there’s the move to take metadata from mountains of unstructured data that’s out at offsite storage vaults. Figure out what’s on that. Put that in a database, provide some level of analytics to determine what is there, what isn’t there, what do we want to keep, what don’t we want to keep… We have the tools. We have the services. Those problems can be solved.
I’ve also read recently online that it’s impossible to delete data that is within a backup, particularly within a database, and while I would agree with that, that’s not the only means to solve this problem. Our approach to solving this problem, purging these environments, and learning about what’s on there is gaining intelligence about that unstructured data while migrating it. So we’re not landing anything, and therefore we don’t need to delete anything. We’re moving stuff, and we’re discussing and agreeing up front while we’re moving it: this is what you’ve got; what do you want to pick up during that move? What do you want to leave behind during that move? So, you know, we’re quite bullish about where this might take us, because it solves so many problems in one go. It compacts the environments. It dramatically reduces the number of tapes that you have to keep in offsite storage. It gives you a level of intelligence that allows you to drop off stuff that you don’t want while you’re migrating it, and ultimately, if you want to go to the cloud, then you’ve got good data. When you go to the cloud, you’re not going to dump 90 percent junk to the cloud, not knowing it’s junk, just because it’s a physical volume. You’ve actually implemented a good process, which means you’re not going to overpay for storage wherever you want to leave it. Whether you want to leave it on tape or put it on the cloud afterwards, there’s a good process.