Archival node - Recovery of missing data
Incident description
In early 2024 there have been a few incidents on archival node storage. As result of these incidents archival node might have lost some data from January to March.
The practical effect of this issue is that requests querying the state of an account may fail, returning instead an internal server error.
Check if my node has been impacted
The simplest way to check whether or not a node has suffered data loss is to run one of the following queries:
replace <RPC_URL> with the correct URL (example: http://localhost:3030)
curl -X POST <RPC_URL> \
        -H "Content-Type: application/json" \
        -d '
        { "id": "dontcare", "jsonrpc": "2.0", "method": "query", "params": { "account_id": "b001b461c65aca5968a0afab3302a5387d128178c99ff5b2592796963407560a", "block_id": 109913260, "request_type": "view_account" } }'
curl -X POST <RPC_URL> \
        -H "Content-Type: application/json" \
        -d '
        { "id": "dontcare", "jsonrpc": "2.0", "method": "query", "params": { "account_id": "token2.near", "block_id": 114580308, "request_type": "view_account" } }'
curl -X POST <RPC_URL> \
        -H "Content-Type: application/json" \
        -d '
        { "id": "dontcare", "jsonrpc": "2.0", "method": "query", "params": { "account_id": "timpanic.tg", "block_id": 115185110, "request_type": "view_account" } }'
curl -X POST <RPC_URL> \
        -H "Content-Type: application/json" \
        -d '
        { "id": "dontcare", "jsonrpc": "2.0", "method": "query", "params": { "account_id": "01.near", "block_id": 115514400, "request_type": "view_account" } }'
If for any of the above requests you get an error of the kind MissingTrieValue it means that the node presents the issue.
Remediation steps
Option A (recommended): download a new DB snapshot
- 
Stop neard
- 
Delete the existing hotandcolddatabases. Example assuming default configuration:rm -rf ~/.near/hot-data && rm -rf ~/.near/cold-data
- 
Download an up-to-date snapshot, following this guide: Storage snapshots 
Option B: manually run recovery commands
Follow the instructions below if, for any reason, you prefer to perform the manual recovery steps on your node instead of downloading a new snapshot.
Requirements:
- neardmust be stopped while recovering data
- coldstorage must be mounted on an SSD disk or better
- The config resharding_config.batch_delaymust be set to 0.
After the recovery is finished the configuration changes can be undone and a hard disk drive can be used to mount the cold storage.
Important considerations:
- The recovery procedure will, most likely, take more than one week
- Since neardmust be stopped in order to execute the commands, the node won't be functional during this period of time
 
- Since 
- The node must catch up to the chain's head after completing the data recovery, this could take days as well
- During catch up the node can answer RPC queries. However, the chain head is still at the point where the node was stopped; for this reason recent blocks won't be available immediately
 
We published a reference recovery script in the nearcore repository. Your neard setup might be different, so the advice is to thoroughly check the script before running it. For completeness, here we include the set of commands to run:
neard view-state -t cold --readwrite apply-range --start-index 109913254 --end-index 110050000 --shard-id 2 --storage trie-free --save-state cold sequential
RUST_LOG=debug neard database resharding --height 114580307 --shard-id 0 --restore
RUST_LOG=debug neard database resharding --height 114580307 --shard-id 1 --restore
RUST_LOG=debug neard database resharding --height 114580307 --shard-id 2 --restore
RUST_LOG=debug neard database resharding --height 114580307 --shard-id 3 --restore
RUST_LOG=debug neard database resharding --height 115185107 --shard-id 0 --restore
RUST_LOG=debug neard database resharding --height 115185107 --shard-id 1 --restore
RUST_LOG=debug neard database resharding --height 115185107 --shard-id 2 --restore
RUST_LOG=debug neard database resharding --height 115185107 --shard-id 3 --restore
RUST_LOG=debug neard database resharding --height 115185107 --shard-id 4 --restore
Verify if remediation has been successful
Run the queries specified in the section: Check if my node has been impacted. All of them should return a successful response now.