StorSimple uses three layers of storage: SSD, HDD and cloud storage. Reads and writes of fresh data always happen in the SSD tier. As data ages and is accessed less often, it is tiered to the HDD layer. Cold data, i.e. the least accessed data, is tiered to the Azure cloud storage tier. With this architecture, organizations need not worry about local storage capacity management and planning, since cloud storage is integrated with the solution and archival data is automatically tiered to it.
Now let us take a look at the automated tiering workflow. When data is first written to the device, it goes to the SSD tier. Inline deduplication and compression are active, but the archival procedure doesn't kick in until much later. Data continues to be written to this tier until first a defined low threshold and later a high threshold is reached. At this point the system starts identifying the non-working set of data, i.e. the oldest data, and spills it over to the next tier, the HDD. The space reserved by the low and high thresholds is always kept free, so that buffer space remains available if the user wants to restore archived data at a later point. The threshold limit is 94% for the 8000 series, after which data is migrated from SSD to HDD and from HDD to Azure Storage. These processes are transparent to users and applications, and there is no impact on how they access the data.
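The spill-over behaviour described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how the device is implemented: the Tier class, the block counts and the use of a single 94% threshold for both SSD and HDD are assumptions made for the example, and the low threshold is left out for brevity.

```python
from collections import OrderedDict

HIGH_THRESHOLD = 0.94  # figure quoted above for the 8000 series


class Tier:
    """A storage tier holding blocks oldest-first (illustrative only)."""

    def __init__(self, name, capacity, next_tier=None):
        self.name = name
        self.capacity = capacity      # capacity in blocks; None means unbounded (cloud)
        self.next_tier = next_tier
        self.blocks = OrderedDict()   # block_id -> data, oldest entry first

    def utilisation(self):
        return 0.0 if self.capacity is None else len(self.blocks) / self.capacity

    def write(self, block_id, data):
        self.blocks[block_id] = data
        self.blocks.move_to_end(block_id)  # mark as the most recently written block
        self._spill()

    def _spill(self):
        # Once usage passes the threshold, push the oldest blocks down one tier.
        while self.next_tier is not None and self.utilisation() > HIGH_THRESHOLD:
            oldest_id, oldest_data = self.blocks.popitem(last=False)
            self.next_tier.write(oldest_id, oldest_data)


# Hypothetical capacities, chosen only to make the numbers easy to follow.
cloud = Tier("cloud", capacity=None)
hdd = Tier("hdd", capacity=200, next_tier=cloud)
ssd = Tier("ssd", capacity=100, next_tier=hdd)

for i in range(400):
    ssd.write(i, b"x")

print(len(ssd.blocks), len(hdd.blocks), len(cloud.blocks))
```

In this toy run the newest blocks stay on the SSD, the older ones settle on the HDD, and the oldest end up in the cloud tier, while the caller only ever writes to the SSD tier.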
In the case of the StorSimple Virtual Array, there is no concept of SSD and HDD tiers. Hence the data is tiered directly to the cloud from the local storage, i.e. the virtual hard disk. Tiering is based on a data heat map, which tracks the usage of data, its age and its relationship with other data. The active or hot data is stored locally, and cold or inactive data is tiered to cloud storage.
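To make the heat map idea a little more concrete, here is a minimal sketch of ranking blocks by "heat". The scoring formula is purely hypothetical (access count divided by age) and the names BlockStats, heat and pick_cold_blocks are made up for this example; the real heat map also considers relationships between data, which is not modelled here.

```python
from dataclasses import dataclass


@dataclass
class BlockStats:
    block_id: str
    access_count: int       # how often the block was read or written
    days_since_access: int  # age of the last access, in days


def heat(stats: BlockStats) -> float:
    # Hypothetical score: frequent and recent access gives a higher value.
    return stats.access_count / (1 + stats.days_since_access)


def pick_cold_blocks(all_stats, how_many):
    """Return the ids of the `how_many` coldest blocks, coldest first."""
    ranked = sorted(all_stats, key=heat)
    return [s.block_id for s in ranked[:how_many]]


stats = [
    BlockStats("a", access_count=50, days_since_access=0),
    BlockStats("b", access_count=2, days_since_access=30),
    BlockStats("c", access_count=10, days_since_access=4),
]

# The coldest blocks are the candidates for tiering to cloud storage.
cold = pick_cold_blocks(stats, 2)
print(cold)
```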
Deduplication, Compression and Encryption
Deduplication is enabled by default in StorSimple devices and there are no special licenses associated with it. When data comes into the StorSimple device, it is written as 64 KB blocks. For every block, hash keys are built and a metadata map is created. The SSD tier holds the raw storage and this metadata map. Deduplication happens in the SSD layer, thereby ensuring performance. When data comes in, it is matched against the metadata map. If the block already exists, the incoming copy is discarded and the pointers are updated. The same map is used when data is read. This helps in optimal utilization of local capacity and makes operations like data migrations time efficient.
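The block-level deduplication described above can be sketched as follows. This is a simplified illustration under stated assumptions: SHA-256 is used as the hash only because it is readily available (the hash StorSimple actually uses is not named here), and the DedupStore class with its blocks and volume_map fields is a hypothetical stand-in for the metadata map.

```python
import hashlib

BLOCK_SIZE = 64 * 1024  # 64 KB blocks, as described above


class DedupStore:
    def __init__(self):
        self.blocks = {}      # hash key -> unique block contents
        self.volume_map = []  # ordered hash keys, i.e. the pointers for the written data

    def write(self, data: bytes):
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            key = hashlib.sha256(block).hexdigest()  # assumed hash function
            if key not in self.blocks:
                self.blocks[key] = block   # new block: store it once
            self.volume_map.append(key)    # known block: only a pointer is added

    def read(self) -> bytes:
        # Reads follow the pointers back to the stored unique blocks.
        return b"".join(self.blocks[key] for key in self.volume_map)


store = DedupStore()
payload = b"A" * BLOCK_SIZE * 3 + b"B" * BLOCK_SIZE
store.write(payload)

print(len(store.volume_map), "blocks written,", len(store.blocks), "blocks stored")
```

Four blocks are written here but only two unique blocks are kept, and reading the data back still returns the original payload.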
As mentioned earlier, data is spilled over to the HDD layer once the high watermark is reached in the SSD tier. It is in the HDD tier that lossless compression of the data sets is done. The type of compression used is deflate compression. This means that data residing in the HDD tier is fully deduplicated and compressed. However, users can continue to access the data on the device without any noticeable difference, as the entire process is transparent.
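As a small illustration of what lossless deflate compression means, the snippet below uses Python's standard zlib module, which wraps a deflate stream in a thin zlib header. The sample data and compression level are arbitrary choices for the example and say nothing about the settings used inside the device.

```python
import zlib

# Repetitive sample data compresses well; real-world ratios depend on the content.
original = b"StorSimple tiered block " * 1000

compressed = zlib.compress(original, level=6)
restored = zlib.decompress(compressed)

print(len(original), "bytes before,", len(compressed), "bytes after")
print("lossless round trip:", restored == original)
```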
When data is tiered out from the StorSimple array's local storage to the cloud, it is encrypted using AES-256 encryption. The customer holds the encryption key. Data is converted to iSCSI blocks, deduplicated, compressed and then encrypted before being sent to Azure. The data is sent to Azure over HTTPS. Data residing in Azure is further protected by mechanisms such as RBAC, login passwords, auditing and access keys. These are the different layers of security for your data in StorSimple.
To summarize the process, deduplication happens in the SSD tier, and when the SSD reaches capacity the data is compressed and moved to the HDD tier. When the data is ready to be tiered to the cloud, it is encrypted and sent to cloud storage over HTTPS. In the case of the Virtual Array there is a small difference: the data that resides in local storage is not deduplicated or compressed. Deduplication, compression and encryption happen just before the data is tiered to cloud storage.
Local Snapshots and Cloud Snapshots
Snapshots are the built-in backup mechanism of StorSimple devices. There are two types of backups: local snapshots and cloud snapshots.
Local snapshots are point-in-time copies of data in StorSimple local storage. They are usually scheduled on a daily and weekly basis with shorter retention periods. They are useful for restoring recently deleted data. Local snapshots use the Copy Reference On Write (CROW) method. This makes use of volume metadata references to create storage-efficient snapshots, which are stored locally on the device.
Cloud snapshots are point-in-time copies of data in Azure cloud storage. Cloud snapshots are typically scheduled with longer retention periods, such as weeks or months, and are useful in DR scenarios. With cloud snapshots, the entire data set and its metadata are copied to the cloud when the snapshot is taken for the first time. All subsequent snapshots are incremental, i.e. only the changed data and metadata are copied to the cloud, thereby optimizing cloud storage usage.
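The "full first, incremental afterwards" behaviour can be sketched like this. It is a toy model, not the StorSimple snapshot format: the volume is a plain dictionary of blocks, changes are detected by comparing SHA-256 hashes (an assumption), deleted blocks are not handled, and the function names are invented for the example.

```python
import hashlib


def block_hashes(volume: dict) -> dict:
    """Map block id -> content hash for a volume held as {block_id: bytes}."""
    return {bid: hashlib.sha256(data).hexdigest() for bid, data in volume.items()}


def cloud_snapshot(volume: dict, previous_hashes=None):
    """Return (blocks_to_upload, hashes). Without a previous snapshot, everything is uploaded."""
    current = block_hashes(volume)
    previous_hashes = previous_hashes or {}
    changed = {bid: volume[bid] for bid, h in current.items()
               if previous_hashes.get(bid) != h}
    return changed, current


volume = {0: b"alpha", 1: b"beta", 2: b"gamma"}
first_upload, hashes = cloud_snapshot(volume)            # first snapshot: full copy

volume[1] = b"beta-v2"   # modify one block
volume[3] = b"delta"     # add one block
second_upload, hashes = cloud_snapshot(volume, hashes)   # later snapshot: changes only

print(sorted(first_upload), sorted(second_upload))
```

The first snapshot uploads every block, while the second uploads only the modified block and the new one.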
The StorSimple physical array supports both local and cloud snapshots. The Virtual Array, however, supports only cloud snapshots.
In the next part of this blog series we will look at the different management tools and some important StorSimple terminology and concepts.