HDFS Snapshots

Submitted by admin on Thu, 04/18/2019 - 12:52

HDFS Snapshots are read-only point-in-time copies of the file system.

An HDFS snapshot is a saved copy of an existing directory. Snapshots are used to restore data that has been corrupted or accidentally deleted.

 

1) Create a local file containing a list of numbers.

[hadoop@hadoop1 ~]$ cat numbers.txt
1
2
3
4
5
6
7
89
12
13
14
15
45
67
78
89

 

2) Create a directory in HDFS and upload the local file to it

The following commands create a directory called numbers under the HDFS directory /user/hdfs and upload the local file numbers.txt to it.

hdfs dfs -mkdir /user/hdfs/numbers
hdfs dfs -ls /user/hdfs/numbers
hdfs dfs -put numbers.txt /user/hdfs/numbers

 

3) Try to create a snapshot on an HDFS directory

Snapshots cannot be created on a directory until snapshots have been enabled on it. If they are not enabled, createSnapshot fails with a "Directory is not a snapshottable directory" error.

hdfs dfs -createSnapshot /user/hdfs/numbers

 

[hadoop@hadoop1 ~]$ hdfs dfs -createSnapshot /user/hdfs/numbers

createSnapshot: Directory is not a snapshottable directory: /user/hdfs/numbers

 

4) Allow snapshots and create snapshots

The allowSnapshot command enables snapshots on an HDFS directory.

The following commands first enable snapshots on /user/hdfs/numbers and then create a snapshot of it.

 

[hadoop@hadoop1 ~]$ hdfs dfsadmin -allowSnapshot /user/hdfs/numbers

Allowing snaphot on /user/hdfs/numbers succeeded

 

[hadoop@hadoop1 ~]$ hdfs dfs -createSnapshot /user/hdfs/numbers

Created snapshot /user/hdfs/numbers/.snapshot/s20190418-175417.343
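
The two commands above can be wrapped in one small shell function. This is only a sketch: take_snapshot is a hypothetical helper name, not an HDFS command, and it assumes the calling user is allowed to run allowSnapshot, which requires superuser privilege.

```shell
# take_snapshot is a hypothetical helper, not part of Hadoop.
# Usage: take_snapshot <hdfs-dir> [snapshot-name]
take_snapshot() {
    local dir="$1"       # HDFS directory, e.g. /user/hdfs/numbers
    local name="${2:-}"  # optional snapshot name

    # Enable snapshots on the directory; stop if that fails.
    hdfs dfsadmin -allowSnapshot "$dir" || return 1

    # Without a name, HDFS generates a timestamp-based one.
    if [ -n "$name" ]; then
        hdfs dfs -createSnapshot "$dir" "$name"
    else
        hdfs dfs -createSnapshot "$dir"
    fi
}
```

For example, take_snapshot /user/hdfs/numbers runs the same two commands as the session above.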

 

 

5) List snapshots using the ls command

We can list the snapshots of a directory with the ls command. The snapshots of a snapshottable directory are stored under its .snapshot subdirectory.

 

[hadoop@hadoop1 ~]$ hdfs dfs -ls /user/hdfs/numbers/.snapshot

Found 1 items

drwxr-xr-x   - hadoop supergroup          0 2019-04-18 17:54 /user/hdfs/numbers/.snapshot/s20190418-175417.343

[hadoop@hadoop1 ~]$  hdfs dfs -ls /user/hdfs/numbers/.snapshot/s20190418-175417.343

Found 1 items

-rw-r--r--   3 hadoop supergroup         42 2019-04-18 17:50 /user/hdfs/numbers/.snapshot/s20190418-175417.343/numbers.txt

 

If the numbers.txt file in /user/hdfs/numbers is corrupted, we can restore it from the /user/hdfs/numbers/.snapshot/s20190418-175417.343 directory.

 

6) List all snapshottable directories in HDFS

The lsSnapshottableDir command lists all HDFS directories that have snapshots enabled.

[hadoop@hadoop1 ~]$ hdfs lsSnapshottableDir

drwxr-xr-x 0 hadoop supergroup 0 2019-04-18 17:54 1 65536 /user/hdfs/numbers
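
The same listing can be used in a script to check whether a directory is snapshottable before trying to snapshot it. The function below is a sketch: is_snapshottable is a hypothetical helper name, and it assumes the path is the last whitespace-separated field of each output line, as in the listing above, so paths containing spaces are not handled.

```shell
# is_snapshottable is a hypothetical helper, not an HDFS command.
# Returns 0 if the given path appears in the lsSnapshottableDir output.
is_snapshottable() {
    hdfs lsSnapshottableDir |
        awk -v dir="$1" '$NF == dir { found = 1 } END { exit !found }'
}
```

It could be used as: is_snapshottable /user/hdfs/numbers && hdfs dfs -createSnapshot /user/hdfs/numbers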

 

7) Create a snapshot with a specific name

By default, a snapshot is named after the time at which it was created, in the format s followed by yyyyMMdd-HHmmss.SSS. We can also supply our own name when creating a snapshot.

The command below creates a snapshot called secondSS of the HDFS directory /user/hdfs/numbers.

[hadoop@hadoop1 ~]$ hdfs dfs -createSnapshot /user/hdfs/numbers secondSS

Created snapshot /user/hdfs/numbers/.snapshot/secondSS

 

[hadoop@hadoop1 ~]$  hdfs dfs -ls /user/hdfs/numbers/.snapshot

Found 2 items

drwxr-xr-x   - hadoop supergroup          0 2019-04-18 17:54 /user/hdfs/numbers/.snapshot/s20190418-175417.343

drwxr-xr-x   - hadoop supergroup          0 2019-04-18 18:05 /user/hdfs/numbers/.snapshot/secondSS
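
A snapshot name can also be generated in a script, for example from the current date. The sketch below assumes a naming convention of our own: daily_snapshot_name is a hypothetical helper and the daily- prefix is arbitrary.

```shell
# daily_snapshot_name is a hypothetical helper; the "daily-" prefix is
# just a convention chosen for this sketch.
daily_snapshot_name() {
    date +"daily-%Y%m%d"
}

# Usage against a snapshottable directory:
#   hdfs dfs -createSnapshot /user/hdfs/numbers "$(daily_snapshot_name)"
```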

 

 

8) Delete a file from the HDFS directory

The command below deletes numbers.txt from /user/hdfs/numbers so that we can see how to restore it.

[hadoop@hadoop1 ~]$ hdfs dfs -rm /user/hdfs/numbers/numbers.txt

 

9) Restore the file from a snapshot

A file is restored from a snapshot by copying it back with the HDFS cp command.

 

[hadoop@hadoop1 ~]$  hdfs dfs -cp /user/hdfs/numbers/.snapshot/secondSS/numbers.txt /user/hdfs/numbers/
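
The restore step can be made a little safer by checking that the file exists in the snapshot before copying it. The function below is a sketch under that assumption: restore_from_snapshot is a hypothetical helper name, and it uses the -f and -p options of cp to overwrite an existing destination and to preserve timestamps, ownership and permissions.

```shell
# restore_from_snapshot is a hypothetical helper, not an HDFS command.
# Usage: restore_from_snapshot <hdfs-dir> <snapshot-name> <file>
restore_from_snapshot() {
    local dir="$1" snap="$2" file="$3"
    local src="$dir/.snapshot/$snap/$file"

    # "hdfs dfs -test -e" returns 0 when the path exists.
    if ! hdfs dfs -test -e "$src"; then
        echo "not found in snapshot: $src" >&2
        return 1
    fi

    # -f overwrites the destination if it already exists;
    # -p preserves timestamps, ownership and permissions.
    hdfs dfs -cp -f -p "$src" "$dir/$file"
}
```

For example, restore_from_snapshot /user/hdfs/numbers secondSS numbers.txt copies the file back as in the command above.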

 

10) Try to disable snapshots

All snapshots of an HDFS directory must be deleted before snapshots can be disabled on it.

[hadoop@hadoop1 ~]$ hdfs dfsadmin -disallowSnapshot /user/hdfs/numbers

disallowSnapshot: The directory /user/hdfs/numbers has snapshot(s). Please redo the operation after removing all the snapshots.

 

11) Delete snapshots and disallow snapshots

The commands below delete every snapshot and then disable snapshots on the directory.

 hdfs dfs -deleteSnapshot /user/hdfs/numbers secondSS
 hdfs dfs -deleteSnapshot /user/hdfs/numbers s20190418-175417.343
 hdfs dfsadmin -disallowSnapshot /user/hdfs/numbers

 

[hadoop@hadoop1 ~]$ hdfs dfsadmin -disallowSnapshot /user/hdfs/numbers

disallowSnapshot: The directory /user/hdfs/numbers has snapshot(s). Please redo the operation after removing all the snapshots.

[hadoop@hadoop1 ~]$

[hadoop@hadoop1 ~]$  hdfs dfs -deleteSnapshot /user/hdfs/numbers secondSS

[hadoop@hadoop1 ~]$  hdfs dfsadmin -disallowSnapshot /user/hdfs/numbers

disallowSnapshot: The directory /user/hdfs/numbers has snapshot(s). Please redo the operation after removing all the snapshots.

[hadoop@hadoop1 ~]$  hdfs dfs -deleteSnapshot /user/hdfs/numbers s20190418-175417.343

[hadoop@hadoop1 ~]$  hdfs dfsadmin -disallowSnapshot /user/hdfs/numbers

Disallowing snaphot on /user/hdfs/numbers succeeded
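
When a directory has many snapshots, the delete-then-disallow sequence can be scripted. The function below is a sketch: remove_all_snapshots is a hypothetical helper name, and it assumes a Hadoop version whose ls supports the -C option (print paths only). It deletes every snapshot of the directory, so use it with care.

```shell
# remove_all_snapshots is a hypothetical helper, not an HDFS command.
# It deletes EVERY snapshot of the directory and then disables snapshots.
remove_all_snapshots() {
    local dir="$1" paths path

    # -C prints only the paths, one per line.
    paths=$(hdfs dfs -ls -C "$dir/.snapshot") || return 1

    while IFS= read -r path; do
        [ -n "$path" ] || continue
        # The snapshot name is the last component of the path.
        hdfs dfs -deleteSnapshot "$dir" "${path##*/}" </dev/null || return 1
    done <<EOF
$paths
EOF

    hdfs dfsadmin -disallowSnapshot "$dir"
}
```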

 

12) Rename a snapshot

The renameSnapshot command changes the name of a snapshot. The snapshot must still exist, so run it before the snapshots are deleted in step 11.

[hadoop@hadoop1 ~]$   hdfs dfs -renameSnapshot /user/hdfs/numbers secondSS thirds

 

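13) Get a snapshot difference report

The snapshotDiff command lists the differences between two snapshots of a snapshottable directory. Both names must refer to existing snapshots; a "." can be used in place of a snapshot name to refer to the current state of the directory.

 hdfs snapshotDiff <path> <fromSnapshot> <toSnapshot>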

 

[hadoop@hadoop1 ~]$   hdfs snapshotDiff /user/hdfs/numbers firstSS secondSS
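
In the difference report, each entry starts with a marker: + for a created file or directory, - for a deleted one, M for a modified one and R for a renamed one. The function below is a sketch that counts the entries of each kind in a report read from standard input. summarize_snapshot_diff is a hypothetical helper name, and it assumes the marker is the first character of each entry line.

```shell
# summarize_snapshot_diff is a hypothetical helper, not an HDFS command.
# It reads a snapshotDiff report on stdin and counts entries by marker.
summarize_snapshot_diff() {
    awk '
        /^[+]/ { created++ }
        /^-/   { deleted++ }
        /^M/   { modified++ }
        /^R/   { renamed++ }
        END {
            printf "created=%d deleted=%d modified=%d renamed=%d\n",
                   created, deleted, modified, renamed
        }'
}

# Usage:
#   hdfs snapshotDiff /user/hdfs/numbers firstSS secondSS | summarize_snapshot_diff
```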

 
