What is FS Image and Edit Log in Hadoop?
These are the two on-disk files that together hold the NameNode's persistent metadata. When the NameNode starts up, it reads both of these files to reconstruct the file system namespace in memory.
FS Image:
- The FS Image is a persistent snapshot of the file system metadata at a specific point in time. It stores:
  - Directory structure and file names
  - Block IDs and how many blocks each file has
  - File permissions and ownership
  - Replication factor per file
- It does NOT store DataNode locations (which DataNode holds which block). The NameNode rebuilds those in memory from the block reports that DataNodes send when they register at startup and periodically afterwards.
- Think of FS Image as a photograph of the file system at the last checkpoint.
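The split between what is persisted and what is rebuilt can be sketched as two separate structures. This is a minimal illustration, not the real on-disk format: `FileMeta`, `fs_image`, `block_locations`, and `apply_block_report` are hypothetical names invented for this example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FileMeta:
    """Per-file metadata of the kind the FS Image persists (illustrative fields)."""
    path: str
    owner: str
    permissions: int              # e.g. 0o644
    replication: int
    block_ids: list[int] = field(default_factory=list)


# Persisted part: this is what a checkpoint writes to disk.
fs_image = {
    "/data/a.txt": FileMeta("/data/a.txt", "alice", 0o644, 3, [101, 102]),
}

# In-memory only: block ID -> set of DataNodes holding a replica.
# It starts empty after every restart and is never written to the FS Image.
block_locations: dict[int, set[str]] = {}


def apply_block_report(datanode: str, reported_blocks: list[int]) -> None:
    """Record which blocks a DataNode says it holds."""
    for block_id in reported_blocks:
        block_locations.setdefault(block_id, set()).add(datanode)


# Block reports arrive from DataNodes and fill in the locations.
apply_block_report("dn1", [101, 102])
apply_block_report("dn2", [101])
```

Note that `fs_image` knows the file has blocks 101 and 102, but only the block reports tell the NameNode where those blocks live.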
Edit Log:
- The Edit Log is a transaction log that records every single change made to the file system after the last FS Image was taken — every file creation, deletion, rename, permission change. It keeps growing continuously until a checkpoint happens.
- Think of Edit Log as a diary of everything that happened after the last photograph.
Together on startup:
- NameNode loads FS Image (base state) + replays Edit Log (recent changes) = complete current state of the file system.
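The "load snapshot, then replay the log" idea can be sketched in a few lines. This is a simplified model under stated assumptions: the namespace is a plain dict, each log entry is a tuple, and the operation names (`create`, `delete`, `rename`, `chmod`) and the `replay` function are hypothetical, not Hadoop's actual edit log format.

```python
import copy


def replay(fs_image: dict, edit_log: list) -> dict:
    """Return the current namespace: the snapshot plus every logged change, in order."""
    state = copy.deepcopy(fs_image)      # start from the base state
    for op, *args in edit_log:           # apply changes oldest-first
        if op == "create":
            path, meta = args
            state[path] = meta
        elif op == "delete":
            (path,) = args
            state.pop(path, None)
        elif op == "rename":
            src, dst = args
            state[dst] = state.pop(src)
        elif op == "chmod":
            path, mode = args
            state[path] = {**state[path], "permissions": mode}
        else:
            raise ValueError(f"unknown op: {op}")
    return state


fs_image = {"/a": {"permissions": 0o644}}
edit_log = [
    ("create", "/b", {"permissions": 0o600}),
    ("rename", "/a", "/c"),
    ("chmod", "/c", 0o755),
    ("delete", "/b"),
]
current = replay(fs_image, edit_log)
```

The order of the log matters: replaying the same entries in a different order could give a different namespace, which is why the Edit Log is an ordered transaction log.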
Checkpointing:
Over time the Edit Log gets very large, making restarts slow because every logged transaction has to be replayed. Checkpointing solves this by merging the Edit Log into the FS Image to produce a fresh, compact FS Image.
It is triggered when either condition is met — whichever comes first:
- Every 1 hour (default time interval, dfs.namenode.checkpoint.period = 3600 seconds)
- Every 1 million transactions in the Edit Log (default, dfs.namenode.checkpoint.txns = 1000000)
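The "whichever comes first" rule is a simple OR of the two conditions. A minimal sketch, assuming the default values; the function name `should_checkpoint` and the constants are hypothetical, and the exact boundary handling (`>=`) is an assumption rather than a statement about Hadoop's code.

```python
# Defaults corresponding to dfs.namenode.checkpoint.period and
# dfs.namenode.checkpoint.txns; both can be overridden in hdfs-site.xml.
CHECKPOINT_PERIOD_SECONDS = 3600
CHECKPOINT_TXNS = 1_000_000


def should_checkpoint(seconds_since_last: float, uncheckpointed_txns: int) -> bool:
    """True when either the time limit or the transaction limit has been reached."""
    return (
        seconds_since_last >= CHECKPOINT_PERIOD_SECONDS
        or uncheckpointed_txns >= CHECKPOINT_TXNS
    )
```

For example, a busy cluster that logs a million transactions in ten minutes checkpoints on the transaction limit, while an idle cluster still checkpoints once an hour on the time limit.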
Who does it:
- Non-HA (Non-High Availability) setup → Secondary NameNode performs checkpointing.
- HA (High Availability) setup → Standby NameNode performs checkpointing.
It depends on which setup you are using — there is no "both at the same time."
Non-HA setup (older/simpler clusters):
- There is NO Standby NameNode
- Secondary NameNode does the merging
- It pulls the FS Image + Edit Log from the NameNode, merges them on its own machine, and sends the new FS Image back to the NameNode
- Secondary NameNode is not a backup: it cannot take over if the NameNode fails. It only does checkpointing.
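The pull, merge, send-back cycle described above can be sketched as follows. This is a toy model, not Hadoop's implementation: the classes, the two-operation log format, and the in-process "transfer" are all assumptions made for illustration (a real cluster transfers the files over the network).

```python
class NameNode:
    """Holds the snapshot and the log of changes made since that snapshot."""

    def __init__(self, fs_image: dict, edit_log: list):
        self.fs_image = fs_image
        self.edit_log = edit_log


class SecondaryNameNode:
    """Only performs checkpoints; it has no ability to serve clients."""

    def checkpoint(self, nn: NameNode) -> None:
        # 1. Pull copies of the FS Image and Edit Log from the NameNode.
        image = dict(nn.fs_image)
        edits = list(nn.edit_log)

        # 2. Merge on this machine: fold each logged change into the image.
        for op, path in edits:
            if op == "create":
                image[path] = {}
            elif op == "delete":
                image.pop(path, None)

        # 3. Send the new FS Image back. The NameNode keeps only the edits
        #    that are not yet covered by the new image.
        nn.fs_image = image
        nn.edit_log = nn.edit_log[len(edits):]


nn = NameNode({"/a": {}}, [("create", "/b"), ("delete", "/a")])
SecondaryNameNode().checkpoint(nn)
```

After the checkpoint the namespace is unchanged, but it is now fully contained in the FS Image and the Edit Log is short again, so the next restart has little to replay.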
HA setup (modern production clusters):
- There is NO Secondary NameNode
- Standby NameNode does the merging
- It continuously reads the Edit Log from the Journal Nodes, so its namespace is already up to date; it merges locally and sends the fresh FS Image back to the Active NameNode
- Standby NameNode does TWO jobs — checkpointing AND acting as failover backup