Showing posts with label deduplication. Show all posts

Thursday, November 25, 2010

Quantum refreshes high end of data deduplication backup line at SNW

GRAPEVINE, Texas -- Quantum Corp. completed a refresh of its DXi data deduplication disk backup portfolio today, launching the DXi8500 virtual tape library (VTL) at Storage Networking World (SNW).

The DXi8500 replaces the DXi7500 at the top of Quantum's disk backup portfolio and caps a year-long overhaul of the DXi platform. Quantum launched the midrange DXi6500 network-attached storage (NAS) appliances last October, the midrange DXi6700 deduplication backup appliance in August, and the DXi4500 appliances for small- to medium-sized businesses (SMBs) in May.

The overhaul came after EMC acquired Data Domain and ended its OEM agreement to sell Quantum's dedupe software. EMC Data Domain is now Quantum's main competitor in the deduplication backup market, and Quantum positions the DXi8500 as a competitor to the EMC Data Domain DD8800.

A DXi8500 box scales from 20 TB to 200 TB usable. It is built with six Nehalem processors, and Quantum claims it can perform at 6.4 TB per hour -- three times as fast as the DXi7500. The DXi8500 supports RAID 6, 6 Gbps SAS drives, and 8 Gbps Fibre Channel and 10 Gigabit Ethernet connectivity.

Steve Whitner, Quantum's product marketing manager for disk products, said faster processors and 15,000 rpm SAS drives are the main reasons for the performance boost. Quantum also changed the way it indexes metadata and searches for redundancies on disk to take better advantage of the faster SAS drives, he said.

"What we do is placing part of fast disk data to verify dedupe," added Whitner. "Certain data need many e/S and if speak you to fast disk then it speeds up the whole operation.»

Server and StorageIO analyst Greg Schulz said early discussions surrounding deduplication systems centered on reduction ratios, but now more customers and vendors recognize that time and performance matter as well.

"For most customers, performance issues", he said. "" ""Transfer rate question as well as reduction ratios. Quantum, like other providers, is aware of the need for speed, especially for larger customers, issues and they do some chose.Toute application has a window backup and data recovery runs, you must move data quickly.»

The DXi8500 supports VTL, NAS and Symantec Corp. OST interfaces simultaneously, and its direct-to-tape feature bypasses the backup server and writes data directly to tape.

"People still use tape, the DXi8500 is able to migrate to the band," said Whitner. " But he did not go back to the client backup server.We have ties with backup applications.Backup software knows that data had been displaced.»

Quantum's Advanced Reporting and Quantum Vision 4.0 management software are part of the base license package. The reporting shows the amount of data passing through each port and displays CPU operations, I/O usage, disk capacity and the amount of deduplicated data stored. Vision is a central console for overall management of disk and tape.

The DXi8500 with 90 TB usable and the VTL interface carries a list price of $430,000, rising to $731,000 with additional capacity and interface fees.

No global deduplication for the DXi8500 backup appliance

Quantum still does not support global deduplication, which would let customers deduplicate data across multiple boxes. Most deduplication software applications support global deduplication, as do Sepaton Inc. and IBM Diligent systems. EMC added its Global Deduplication Array in April, which deduplicates across two nodes.

Whitner said Quantum uses replication to achieve comparable benefits across systems. "We can go up to 200 TB per controller," he said. "If there's a second box, we replicate across systems."

Huawei Symantec introduces SAN and unified storage

Huawei Symantec Technologies, a joint venture between Chinese networking company Huawei and Symantec, has launched two disk storage systems in North America. The Oceanspace S2600 SAN is targeted at SMBs, and the Oceanspace N8300 unified storage product can be used in enterprise-level environments as well as smaller businesses.

Both products have already been tested in other markets, including Europe, Asia and South America, said Jane Li, general manager of North America for Huawei Symantec.

Huawei Symantec uses its own hardware for the Oceanspace N8300, a device that supports clustered NAS file storage as well as SAN and iSCSI. It runs in active-active cluster mode, has 48 GB of cache and scales to 8 TB of capacity. The file system scales up to 256 TB, with dynamic storage tiering that manages different disk types: SSD, SAS, SATA and FC.

The Oceanspace S2600 SAN product handles multi-mode, multi-level data protection and multi-site disaster recovery with disk replication. It has a 64-bit processor, 4 GB of expandable cache memory and holds 96 disks. It supports SATA and SAS drives and can handle 256 host connections.



Sunday, November 21, 2010

Implementing data deduplication technology in a virtualized environment

More and more businesses are showing an interest in implementing data deduplication technology in their virtualized environments because of the amount of redundant data in virtual server environments.

In this Q&A with Jeff Boles, senior analyst with the Taneja Group, learn about why organizations are more interested in data dedupe for server virtualization, whether target or source deduplication is better for a virtualized environment, what to watch out for when using dedupe for virtual servers, and what VMware's vStorage APIs have brought to the scene. Read the Q&A or listen to the MP3 below.

Listen to the data deduplication in virtualized environments FAQ

Table of contents:

>> Have you seen more interest in data deduplication technology among organizations with a virtualized environment?
>> Is source or target deduplication being used more? Does one have benefits over the other?
>> Does deduplication introduce any complications when you use it in a virtual server environment?
>> Are vendors taking advantage of vStorage APIs for Data Protection?

Have you seen more interest in data deduplication technology among organizations that have deployed server virtualization? And, if so, can you explain what's driving that interest and the benefits people might see from using dedupe when they're backing up virtual servers?

Absolutely. There's lots of interest in using deduplication for virtualized environments because there's so much redundant data in virtual server environments. Over time, we've become more disciplined as IT practitioners in how we deploy virtual servers.

We've done something we should've done a number of years ago with our general infrastructures, and that's creating a better separation of our core OS data from our application data. And consequently, we see virtualized environments that are following best practices today with these core OS images that contain most operating system files and configuration stuff. They separate that data out from application and file data in their virtual environments, and there are so many virtual servers that use very similar golden image files with similar core OS image files behind a virtual machine. So you end up with lots of redundant data across all those images. If you start deduplicating across that pool you get even better deduplication ratios even with simple algorithms than you do in a lot of non-virtualized production environments. There can be lots of benefits from using deduplication in these virtual server environments just from a capacity-utilization perspective.
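To make that capacity effect concrete, here is a minimal sketch of fixed-size chunk hashing across a pool of cloned images -- a toy illustration under an assumed 8 KB chunk size, not any vendor's actual engine. Because the clones share the golden image's bytes, the store of unique chunks stays close to the size of a single image:

```python
import hashlib

CHUNK_SIZE = 8 * 1024  # 8 KB chunks, a typical granularity for hash-based dedupe

def chunks(data, size=CHUNK_SIZE):
    """Split a byte string into fixed-size chunks."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def dedupe_ratio(images):
    """Return logical vs. stored chunk counts across a pool of images."""
    store = set()                      # unique chunks, keyed by SHA-256 digest
    logical = 0
    for image in images:
        for chunk in chunks(image):
            logical += 1
            store.add(hashlib.sha256(chunk).hexdigest())
    return logical, len(store)

# Three hypothetical VMs cloned from one golden image, each with a small
# amount of machine-specific configuration appended.
golden = b"base-os-files" * 10_000
vms = [golden + f"host-{n}-config".encode() * 50 for n in range(3)]

logical, stored = dedupe_ratio(vms)
print(f"logical chunks: {logical}, stored chunks: {stored}, "
      f"ratio: {logical / stored:.1f}:1")
```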

What kind of data deduplication is typically being used for this type of application? Do you see source dedupe or target, and does one have benefits over the other?

There are some differences in data deduplication technologies today. You can choose to apply it in two places -- either the backup target (generally the media server), or you can choose to apply it at the source through the use of technologies like Symantec's PureDisk, EMC Avamar or some of the other virtualization-specialized vendors out there today.

Source deduplication is being adopted more today than it ever has before and it's particularly useful in a virtual environment. First you have a lot of contention for I/O in a virtualization environment, and what you see when you start doing backup jobs there. Generally, when folks start virtualizing, they try to stick with the same approach, and that's with a backup agent that's backing up data to an external media server to a target, following the same old backup catalog jobs, and doing it the same way they were in physical environments. But you end up packing all that stuff in one piece of hardware that has all these virtual machines (VMs) on it, so you're writing a whole bunch of backup jobs across one piece of hardware. You get a whole lot of I/O contention, especially across the WANs, and more so across LANs. But any time you're going out to the network you're getting quite a bit of I/O bottlenecking at that physical hardware layer. So the traditional backup approach ends up stretching out your backup windows and messes with your recovery time objectives (RTOs) and recovery point objectives (RPOs) because everything is a little slower going through that piece of hardware.

So source deduplication has some interesting applications because it can chunk all that data down to non-duplicate data before it comes off the VM. Almost all of these agent approaches that are doing source-side deduplication push out a very continuous stream of changes. You can back it up more often because there's less stuff to be pushed out, and they're continually tracking changes in the background; they know what the deltas are, and so they can minimize the data they're pushing out.

Also, with source-side deduplication you get a highly optimized backup stream for the virtual environment. You're pushing very little data from your VMs, so much less data is going through your physical hardware layer, and you don't have to deal with those I/O contention points, and consequently you can get much finer grained RTOs and RPOs and much smaller backup windows in a virtual environment.
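A rough sketch of that source-side flow -- not PureDisk's or Avamar's actual protocol, just the general idea under assumed names: the client fingerprints its chunks, asks the server which fingerprints are new, and ships only those chunks plus a recipe for reassembly:

```python
import hashlib

CHUNK = 4096

def fingerprint(block: bytes) -> str:
    return hashlib.sha256(block).hexdigest()

class DedupeServer:
    """Stand-in for a dedupe-aware media server; stores chunks by fingerprint."""
    def __init__(self):
        self.chunks = {}

    def missing(self, fingerprints):
        return [fp for fp in fingerprints if fp not in self.chunks]

    def put(self, fp, block):
        self.chunks[fp] = block

def source_side_backup(vm_bytes: bytes, server: DedupeServer):
    """Send only chunks the server does not already have; return the recipe."""
    recipe = []
    blocks = {}
    for i in range(0, len(vm_bytes), CHUNK):
        block = vm_bytes[i:i + CHUNK]
        fp = fingerprint(block)
        recipe.append(fp)
        blocks[fp] = block
    for fp in server.missing(set(recipe)):   # only unique data crosses the wire
        server.put(fp, blocks[fp])
    return recipe                            # enough to rebuild the image later

server = DedupeServer()
recipe1 = source_side_backup(b"OS" * 50_000 + b"app-data-v1", server)
recipe2 = source_side_backup(b"OS" * 50_000 + b"app-data-v2", server)
print("chunks stored on server:", len(server.chunks))
```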

Does data deduplication introduce any complications when you use it in a virtualized environment? What do people have to look out for?

When you're going into any environment with a guest-level backup and pushing full strings of data out, you can end up stretching out your backup windows. The other often-overlooked dimension of deduplicating behind the virtual server environment is that you are dealing with lots of primary I/O that's pushed into one piece of hardware now in a virtual environment. You may have many failures behind one server at any point in time. Consequently, you may be pulling a lot of backup streams off of the deduplicated target or out of the source-side system. And, you may be trying to push that back on the disk or into a recovery environment very rapidly.

Dedupe can have lots of benefits in capacity but it may not be the single prong that you want to attack your recovery with because you're doing lots of reads from this deduplicated repository. Also, you're pulling a batch of disks simultaneously in many different threads. There may be 20 or 40 VMs behind one piece of hardware, and you're likely not going to get the recovery window that you want -- or not the same recovery window you could've gotten when pulling from multiple different targets into multiple pieces of hardware. So think about diversifying your recovery approach for those "damn my virtual environment went away" incidents. And think about using more primary protection mechanisms. Don't rely just on backup, but think about doing things like snapshots where you can fall back to the latest good snapshot in a much narrower time window. You obviously don't want to try to keep 30 days of snapshots around, but have something there you can fall back to if you've lost a virtual image, blown something up, had a bad update happen or something else. Depending on the type of accident, you may not want to rely on pulling everything out of the dedupe repository, even though it has massive benefits for optimizing the capacity you're using in the backup layer.

Last year VMware released the vStorage APIs for Data Protection and some other APIs as a part of vSphere. Are you seeing any developments in the deduplication world taking advantage of those APIs this year?

The vStorage APIs are where it started getting interesting for backup technology in the virtual environment. We were dealing with a lot of crutches before then, but the vStorage APIs brought some interesting technology to the table. They have implications for all types of deduplication technology, but I think they made particularly interesting implications for source-side deduplication, as well as making source-side more relevant. One of the biggest things about vStorage APIs was the use of Changed Block Tracking (CBT); with that you could tell what changed between different snapshots of a VM image. Consequently, it made this idea of using a proxy very useful inside a virtual environment, and source-side has found some application there, too. You could use a proxy with some source-side technology so you can get the benefits of deduplicating inside this virtual environment after taking a snapshot, but it only deduplicates the changed blocks that have happened since the last time you took a snapshot.
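The sketch below illustrates the changed-block idea in simplified form; changed_extents() is a hypothetical stand-in for whatever CBT query a real backup proxy would issue through the vStorage APIs, not the actual VMware call:

```python
import hashlib

def changed_extents(disk_id, since_snapshot):
    """Hypothetical stand-in for a CBT query: returns (offset, length) extents
    that changed since the given snapshot. A real proxy would ask the
    hypervisor; here we simply hard-code an example answer."""
    return [(0, 4096), (1_048_576, 8192)]

def backup_changed_blocks(disk, disk_id, since_snapshot, chunk_store):
    """Read only the changed extents, dedupe them, and record a manifest."""
    manifest = []
    for offset, length in changed_extents(disk_id, since_snapshot):
        block = disk[offset:offset + length]
        fp = hashlib.sha256(block).hexdigest()
        if fp not in chunk_store:          # source-side dedupe of the delta
            chunk_store[fp] = block
        manifest.append((offset, length, fp))
    return manifest

disk_image = bytearray(2 * 1_048_576)      # a 2 MB toy "virtual disk"
store = {}
manifest = backup_changed_blocks(disk_image, "vm-disk-1", "snap-42", store)
print(len(manifest), "extents captured,", len(store), "unique chunks stored")
```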

Some of these vStorage API technologies have had massive implications in speeding up the time data can be extracted from a virtual environment. Now you can recognize what data has changed between a given point in time and you can blend your source-side deduplication technologies with your primary virtual environment protection technologies and get the best of both worlds. The problem with proxies before was that they were kind of an all-or-nothing approach. You use the snapshot, and then you come out through a proxy in the virtual environment through this narrow bottleneck that will make you do a whole bunch of steps and cause compromises with the way you were getting data out of your virtual environment.

You could choose to go with source-side, but you have lots of different operations going on in your virtual environment. Now you can blend technologies with the vStorage APIs. You can use a snapshot plus source-side against it and get rapid extraction inside your virtual environment, and a finer application of the deduplication technology that's still using source-side to this one proxy pipe, which mounts up this snapshot image, deduplicates stuff and pushes it out of the environment. vStorage APIs have a lot of implications for deduping an environment and blending deduplication technologies with higher performing approaches inside the virtual environment. And you should check with your vendors about what potential solutions you might acquire out there in the marketplace to see how they implemented vStorage APIs in their products to speed the execution of backups and to speed the extraction of backups from your virtual environment.



Friday, November 19, 2010

Common data deduplication system problems


By W. Curtis Preston

It's been said that we never really solve any problems in IT -- we just move them. Data deduplication is no exception to that rule. While deduplication systems have helped make data backup and recovery much easier, they also come with a number of challenges. The savvy storage or backup administrator will familiarize themselves with these challenges and do whatever they can to work around them.

Your backup system creates duplicate data in three different ways: repeated full backups of the same file system or application; repeated incremental backups of the same file or application; and backups of files that happen to reside in multiple places (e.g., the same OS/application on multiple machines). Hash-based deduplication systems (e.g., CommVault Systems Inc., EMC Corp., FalconStor Software, Quantum Corp., Symantec Corp.) will identify and eliminate all three types of duplicate data, but their level of granularity is limited to their chunk size, which is typically 8K or larger. Delta-differential-based deduplication systems (e.g., IBM Corp., ExaGrid Systems, Sepaton Inc.) will only identify and eliminate the first two types of duplicate data, but their level of granularity can be as small as a few bytes. These differences typically result in a dedupe ratio draw, but can yield significant differences in certain environments, which is why most experts suggest you test multiple products.

Because roughly half of the duplicate data in most backup data comes from multiple full backups, people using IBM Tivoli Storage Manager (TSM) as their backup product will experience lower deduplication ratios than customers using other backup products. This is due to TSM's progressive incremental feature that allows users to never again do a full backup on file systems being backed up by TSM. However, because TSM users perform full backups on their databases and applications, and because full backups aren't the only place where duplicate data is found, TSM users can still benefit from deduplication systems -- their dedupe ratios will simply be smaller.

The second type of duplicate data comes from incremental backups, which contain versions of files or applications that have changed since the most recent full backup. If a file is modified and backed up every day, and the backup system retains backups for 90 days, there will be 90 versions of that file in the backup system. A deduplication system will identify the segments of data that are unique and redundant among those 90 different versions and store only the unique segments. However, there are file types that do not have different versions (video, audio, photos or imaging data, and PDF files); every file is unique unto itself and is not a version of a previous iteration of the same file. An incremental backup that contains these types of files contains completely unique data, so there is nothing to deduplicate them against. Since there is a cost associated with deduplicated storage, customers with significant portions of such files should consider not storing them on a deduplication system, as they will gain no benefit and only increase their cost.
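A small sketch of that contrast, using a hypothetical 8 KB chunk size and synthetic data: 90 daily versions of one document share almost every chunk, while 90 unrelated media-like files share none, so the second pool gains nothing from deduplication:

```python
import hashlib
import os

CHUNK = 8192

def unique_chunks(files):
    """Count unique fixed-size chunks across a pool of files."""
    seen = set()
    for data in files:
        for i in range(0, len(data), CHUNK):
            seen.add(hashlib.sha256(data[i:i + CHUNK]).hexdigest())
    return len(seen)

# 90 daily versions of one document: only the final chunk changes each day.
base = os.urandom(CHUNK * 20)
versions = [base + f"edit on day {day}".encode().ljust(CHUNK, b".") for day in range(90)]

# 90 files that are "unique unto themselves" (think photos or PDFs): nothing shared.
media = [os.urandom(CHUNK * 21) for _ in range(90)]

print("versioned document -> unique chunks:", unique_chunks(versions), "of", 90 * 21)
print("unique media files -> unique chunks:", unique_chunks(media), "of", 90 * 21)
```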

Data deduplication systems and encryption: What to watch out for

Data deduplication systems work by finding and eliminating redundant patterns; encryption systems work by removing any discernible patterns from data. Do not encrypt backup data before it is sent to the deduplication system -- your deduplication ratio will be 1:1.
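A toy demonstration of why the ratio collapses: the toy_encrypt() below is just an XOR with a fresh random keystream, standing in for any real cipher that uses a per-block IV, but it shows the effect -- identical plaintext chunks yield different ciphertexts, so chunk fingerprints never repeat:

```python
import hashlib
import os

def toy_encrypt(block: bytes) -> bytes:
    """Stand-in for a real cipher: XOR with a random per-block keystream,
    mimicking the effect of a fresh IV on every write."""
    keystream = os.urandom(len(block))
    return bytes(a ^ b for a, b in zip(block, keystream))

plain_chunks = [b"identical 8K backup chunk" * 300] * 10   # 10 copies of one chunk

plain_fps = {hashlib.sha256(c).hexdigest() for c in plain_chunks}
cipher_fps = {hashlib.sha256(toy_encrypt(c)).hexdigest() for c in plain_chunks}

print("unique fingerprints before encryption:", len(plain_fps))   # 1 -> dedupes 10:1
print("unique fingerprints after encryption:", len(cipher_fps))   # 10 -> 1:1, no dedupe
```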

Compression works a little like encryption in that it also finds and eliminates patterns, just in a very different way. The way most compression systems work results in a scrambling of the data that has a similar effect as encryption; it can also completely remove all abilities of your deduplication system to deduplicate the data.

The compression challenge often results in a stalemate between database administrators who want their backups to go faster and backup administrators who want their backups to get deduped. Since databases are often created with very large capacities and very small actual amounts of data, they tend to compress very well. This is why turning on the backup compression feature often results in database backups that go two to four times faster than they do without compression. The only way to get around this particular challenge is to use a backup software product that has integrated source deduplication and client compression, such as CommVault Simpana, IBM TSM or Symantec NetBackup.

Multiplexing and deduplication systems

The next dedupe challenge with backups only applies to companies using virtual tape libraries (VTLs) and backup software that supports multiplexing. Multiplexing several different backups to the same tape drive can also scramble the data and completely confounds all dedupe. Even products that are able to decipher the different backup streams from a multiplexed image (e.g., FalconStor, Sepaton) tell you not to multiplex backups to their devices because it simply wastes time.

Consider the dedupe tax

The final backup dedupe challenge has to do with the backup window. The way that some deduplication systems perform the dedupe task actually results in a slow-down of the incoming backup. Most people don't notice this because they are moving from tape to disk, and a dedupe system is still faster. However, users who are already using disk staging may notice a reduction in backup performance and an increase in the amount of time it takes to back up their data. Not all products have this particular characteristic, and the ones that do demonstrate it in varying degrees -- only a proof-of-concept test in your environment will let you know for sure.

The restore challenge is much easier to understand; the way most deduplication systems store data results in the most recent backups being written in a fragmented way. Restoring deduplicated backups may therefore take longer than it would have taken if the backup had not been deduplicated. This phenomenon is referred to as the dedupe tax.

When considering the dedupe tax, think about whether or not you're planning to use the dedupe system as the source for tape copies, because it is during large restores and tape copies that the dedupe tax is most prevalent. Suppose, for example, that you plan on using LTO-5 drives that have a native speed of 140 MBps and a native capacity of 1.5 TB. Suppose also that you have examined your full backup tapes and have discovered that you consistently fit 2.25 TB of data on your 1.5 TB tapes, meaning that you're getting a 1.5:1 compression ratio. This means that your 140 MBps tape drive should be running at roughly 210 MBps during copies. Make sure that during your proof of concept you verify that the dedupe system is able to provide the required performance (210 MBps in this example). If it cannot, you may want to consider a different system.
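Spelled out, the arithmetic from that example looks like this (the LTO-5 figures and 1.5:1 ratio come straight from the text; any other drive follows the same formula):

```python
native_capacity_tb = 1.5     # LTO-5 native capacity
observed_data_tb = 2.25      # data actually fitting on one tape
native_speed_mbps = 140      # LTO-5 native throughput, MB per second

compression_ratio = observed_data_tb / native_capacity_tb          # 1.5:1
required_copy_speed = native_speed_mbps * compression_ratio        # 210 MBps

print(f"compression ratio: {compression_ratio:.1f}:1")
print(f"dedupe system must sustain about {required_copy_speed:.0f} MBps per drive")
```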

The final challenge with deduplicated restores is that they are still restores, which is why dedupe is not a panacea. A large system that must be restored still requires a bulk copy of data from the dedupe system to the production system. Only a total architectural change of your backup system from traditional backup to something like continuous data protection (CDP) or near-CDP can address this particular challenge, as they offer restore times measured in seconds not hours.

Data deduplication systems offer the best hope for making significant enhancements to your current backup and recovery system without making wholesale architectural changes. Just be sure that you are aware of the challenges of dedupe before you sign a purchase order.

W. Curtis Preston (a.k.a. "Mr. Backup"), Executive Editor and Independent Backup Expert, has been singularly focused on data backup and recovery for more than 15 years. From starting as a backup admin at a $35 billion credit card company to being one of the most sought-after consultants, writers and speakers in this space, it's hard to find someone more focused on recovering lost data. He is the webmaster of BackupCentral.com, the author of hundreds of articles, and the books "Backup and Recovery" and "Using SANs and NAS."



Monday, November 8, 2010

Data deduplication backup gets a few new twists

Arkeia Software and ExaGrid Systems this week brought new approaches to data deduplication backup products. Arkeia added what it calls "progressive deduplication" to its backup software, while ExaGrid Systems added an algorithm to expand the backup applications supported by its disk appliances.

Arkeia launched Network Backup 9 with data deduplication technology the vendor acquired when it bought Kadena Systems last November. Arkeia had no data deduplication in previous versions.

The Kadena technology uses progressive dedupe, which differs from the fixed-block and variable-block deduplication approaches most other deduplication providers use.

Network Backup 9 is multi-application and creates a window of a different size depending on the application it is deduping. It takes a two-stage approach, first looking for likely matches with a lightweight algorithm and then confirming the matches with a hash. Arkeia CEO Bill Evans calls it a sliding window approach and argues his method is faster than using a hash alone.

Fixed-block deduplication examines chunks that are all the same size and achieves a lower reduction rate than variable-block deduplication, which examines blocks of different sizes but takes longer to dedupe. "Progressive deduplication with a window gives you the best of both worlds," Evans said.
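The sliding-window idea Evans describes is reminiscent of the rolling-checksum technique rsync popularized: a cheap check finds likely matches and a strong hash confirms them. The sketch below illustrates that general approach with an assumed window size; it is not Arkeia's actual algorithm:

```python
import hashlib

WINDOW = 64  # size of the sliding window, an arbitrary choice for this sketch

def weak_sum(data: bytes) -> int:
    """Cheap additive checksum (recomputed here; a real implementation would roll it)."""
    return sum(data) & 0xFFFFFFFF

def index_known(data: bytes):
    """Index already-stored data by weak checksum -> set of strong hashes."""
    table = {}
    for i in range(0, len(data) - WINDOW + 1, WINDOW):
        block = data[i:i + WINDOW]
        table.setdefault(weak_sum(block), set()).add(hashlib.sha256(block).digest())
    return table

def find_duplicates(new_data: bytes, table) -> int:
    """Slide a window over new data; confirm weak-checksum hits with SHA-256."""
    hits, i = 0, 0
    while i + WINDOW <= len(new_data):
        block = new_data[i:i + WINDOW]
        strong = table.get(weak_sum(block))
        if strong and hashlib.sha256(block).digest() in strong:
            hits += 1
            i += WINDOW          # skip past the matched block
        else:
            i += 1               # slide one byte and try again
    return hits

stored = b"ABCDEFGH" * 64
incoming = b"xxx" + stored + b"yyy"   # the same data, shifted by three bytes
print("duplicate windows found:", find_duplicates(incoming, index_known(stored)))
```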

Arkeia also offers a Profiler application that calculates deduplication compression rates by measuring a customer's specific data, producing accurate reports. The tool is intended for planning deduplication projects.

Arkeia's deduplication option will cost $2,000 per server. It will be available at the beginning of 2011, with the profiling tool due out next month.

Evans said Network Backup upgrades usually come about 18 months apart, but version 9 took two years to develop because of the integration of the Kadena deduplication. He said it was important to get deduplication right in this version because most of Arkeia's competitors already have it, and he considers it part of the larger picture of replicating on-site backups to the cloud, a capability Arkeia has added.

"We believe that the battle of Deduplication is not even more," he said. "We wanted a compelling approach de-duplication differentiate on part of our strategy in the long term to have replicated to the nuage.La backups done marché.Il long pole in the tent of this model is deduplication because rare resource is the bandwidth in the cloud."

Enterprise Strategy Group analyst Lauren Whitehouse said Arkeia has put a twist on an established technology.

"It is difficult to create something completely different," she says. "Arkeia evolves what others have fait.Hachage deduplication and compare are not new.Length fixed- and variable-length are not nouveaux.lorsque you take the best of these approaches and combine them, you can do something better.It takes what is already out and is included and seeks to improve on it.»

Whitehouse said the ability to replicate deduped data is an important element Arkeia is still missing.

"There is no replication storage storage", she says. "If I don't want to use the band, how can I get the data off site?If I optimize disk, I would like to move to a secondary site and make sure it is optimized.I could use my storage system functionality, but I can having a - it to move in de-duplication thread. »

ExaGrid goes "generic" with DeltaZone data deduplication

ExaGrid introduced DeltaZone, a new deduplication algorithm that lets customers use generic bytewise deduplication as well as content-aware byte-level deduplication.

ExaGrid has always offered content-aware dedupe for its midrange EX series data backup appliances. Content-aware deduplication scales better and is tuned for individual applications, but ExaGrid VP of product marketing Marc Crespi said it is less flexible because it limits the applications the product can support, and it took ExaGrid longer to release upgrades and support new applications.

Application-aware deduplication requires the vendor to optimize its software for specific applications before it can support those applications for its customers. ExaGrid supports backups from CA ARCserve, CommVault Simpana, IBM Corp. System i (iSeries/AS400), Hewlett-Packard (HP) Co. Data Protector, Symantec Corp. Backup Exec and NetBackup, Veeam Software, Vizioncore vRanger Pro (now Quest Software Inc.) and VMware backup.

"Bytewise methods require more knowledge of the content," said Crespi. "We must do more work to develop applications on the marché.Maintenant you'll see accelerating produit.Nous announcements are going to add backup applications and venture into the archive and nearline use cases.

For now, the use cases exclude primary storage, Crespi said. "We will remain focused on secondary storage -- backup, archiving and nearline data," he said. "However, the DeltaZone technology holds promise for primary storage. You can imagine what a byte-level deduplication variant could do for primary storage."

ExaGrid customers can have their systems analyze data in content-aware or generic mode. Crespi said DeltaZone mode has been tested by a small group of customers since January and is now part of the ExaGrid OS.

Whitehouse said ExaGrid's generic mode may cause less disruption for its customers when it adds upgrades and new features.

"End users don't want to risk failure once the solution is in place," says. "Scalability for performance and capacity is a key factor in purchasing end users age in their use of that early adopters are OK with a proposal for high-risk/high-reward at the first stage of the adoption of deduplication technologie.alors, today the majority of customers want high reward without risk.»



Saturday, April 17, 2010

Restoring deduped data from data deduplication systems

By W. Curtis Preston

Recovery from some data deduplication systems can be slower than modern tape drives, and restores from other deduplication systems can be faster than tape could ever dream of being. In this tip, you'll learn what you need to do to ensure your deduplication system lands on the right side of these two extremes.

To explain why there is what many call a "dedupe tax," we need to go back in time to when there was no data deduplication technology. Before dedupe, we wrote data to tape or disk in contiguous blocks. (Writing contiguous blocks of backup data to disk requires an empty file system or a disk system designed for that purpose, such as a virtual tape library.) All of the blocks containing a given backup were located near each other.

Backup software may also perform a full backup, a synthetic full backup, or otherwise colocate the files needed for a full restore (for example, IBM TSM's active data pools). These typical backup system behaviors meant that most of the blocks needed to restore a system would all be placed contiguously on disk or tape, making a full restore of that system very easy to accomplish.


