Hadoop: Fixing the HDFS Block Error "Cannot obtain block length for LocatedBlock"

Error Log

```
Caused by: org.apache.hadoop.hdfs.CannotObtainBlockLengthException: Cannot obtain block length for LocatedBlock{BP-1529808326-127.0.0.1-1581044994027:blk_1075632921_1893326; getBlockSize()=1702; corrupt=false; offset=0; locs=[DatanodeInfoWithStorage[10.101.179.206:50010,DS-66edcaf3-52d8-47fa-b6a8-17b8d80cdd43,DISK], DatanodeInfoWithStorage[10.101.179.209:50010,DS-bfa64a38-cd7a-4ab4-a3cc-16376d2705e4,DISK]]}
at org.apache.hadoop.hdfs.DFSInputStream.readBlockLength(DFSInputStream.java:360)
at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:267)
at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:198)
at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:182)
at org.apache.hadoop.hdfs.DFSClient.openInternal(DFSClient.java:1042)
at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1005)
at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:326)
at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:322)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:334)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:899)
at com.hadoop.mapred.DeprecatedLzoLineRecordReader.<init>(DeprecatedLzoLineRecordReader.java:57)
at com.hadoop.mapred.DeprecatedLzoTextInputFormat.getRecordReader(DeprecatedLzoTextInputFormat.java:156)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:418)
... 22 more
```
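The stack trace bottoms out in FileSystem.open(): any client reading the file hits this before a single byte comes back, because open() has to ask the DataNodes for the visible length of a last block the NameNode still considers under construction. A minimal reproduction sketch (it reuses one of the problem files identified later in this post and assumes fs.defaultFS points at the affected cluster):

```java
// Sketch: a plain read that fails the same way as the Hive job above.
// open() itself throws CannotObtainBlockLengthException; we never reach read().
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReadStuckFile {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path file = new Path(
            "/warehouse/hive/aiotclouddb/ods/ods_gateway/dt=2020-06-16/event-node8.1592294423698.lzo");
        try (FSDataInputStream in = fs.open(file)) {
            byte[] buf = new byte[1024];
            int n = in.read(buf);
            System.out.println("read " + n + " bytes");
        }
    }
}
```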

Fixing the Problem

  • Cause

1. A Flume restart left HDFS blocks in an abnormal state.

2. A Hadoop cluster restart left HDFS blocks in an abnormal state.

  • Root cause

These blocks are not actually corrupt; the files' leases were simply never released, which prevents other programs from reading or writing them. We can either recover the lease, or take the brute-force route and delete the file outright.

  • Find the problem files

```
hdfs fsck /warehouse/hive/aiotclouddb/ods/ods_gateway/dt=2020-06-16 -openforwrite
```
  • Locate the blocks

```
[hdfs@vm10-101-179-203 root]$ hdfs fsck /warehouse/hive/aiotclouddb/ods/ods_gateway/dt=2020-06-16 -openforwrite
Connecting to namenode via http://vm10-101-179-203.ksc.com:50070/fsck?ugi=hdfs&openforwrite=1&path=%2Fwarehouse%2Fhive%2Faiotclouddb%2Fods%2Fods_gateway%2Fdt%3D2020-06-16
FSCK started by hdfs (auth:SIMPLE) from /10.101.179.203 for path /warehouse/hive/aiotclouddb/ods/ods_gateway/dt=2020-06-16 at Thu Jun 18 19:27:07 CST 2020
/warehouse/hive/aiotclouddb/ods/ods_gateway/dt=2020-06-16/event-node7.1592294424296.lzo 1702 bytes, replicated: replication=2, 1 block(s), OPENFORWRITE:
/warehouse/hive/aiotclouddb/ods/ods_gateway/dt=2020-06-16/event-node8.1592294423698.lzo 2917 bytes, replicated: replication=2, 1 block(s), OPENFORWRITE:
Status: HEALTHY
Number of data-nodes: 10
Number of racks: 1
Total dirs: 1
Total symlinks: 0

Replicated Blocks:
Total size: 280527873 B
Total files: 50
Total blocks (validated): 50 (avg. block size 5610557 B)
Minimally replicated blocks: 48 (96.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 2
Average block replication: 1.92
Missing blocks: 0
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)

Erasure Coded Block Groups:
Total size: 0 B
Total files: 0
Total block groups (validated): 0
Minimally erasure-coded block groups: 0
Over-erasure-coded block groups: 0
Under-erasure-coded block groups: 0
Unsatisfactory placement block groups: 0
Average block group size: 0.0
Missing block groups: 0
Corrupt block groups: 0
Missing internal blocks: 0
FSCK ended at Thu Jun 18 19:27:07 CST 2020 in 8 milliseconds
```
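fsck's OPENFORWRITE flag is one way to find such files. If your Hadoop version has it (2.9+/3.x), the NameNode also exposes the same information through the client API via DistributedFileSystem.listOpenFiles(); a sketch, assuming that API is available, filtered for this post's partition:

```java
// Sketch: enumerate files the NameNode still considers under construction,
// keeping only those under the problem partition from this post.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.RemoteIterator;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.OpenFileEntry;

public class ListOpenFiles {
    public static void main(String[] args) throws Exception {
        String prefix = "/warehouse/hive/aiotclouddb/ods/ods_gateway/dt=2020-06-16";
        DistributedFileSystem dfs =
                (DistributedFileSystem) FileSystem.get(new Configuration());
        RemoteIterator<OpenFileEntry> it = dfs.listOpenFiles();
        while (it.hasNext()) {
            OpenFileEntry entry = it.next();
            if (entry.getFilePath().startsWith(prefix)) {
                // getClientName() identifies the (dead) writer still holding the lease
                System.out.println(entry.getFilePath()
                        + " held by " + entry.getClientName());
            }
        }
    }
}
```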
  • Repair command

```
hdfs debug recoverLease -path /warehouse/hive/aiotclouddb/ods/ods_gateway/dt=2020-06-16/event-node8.1592294423698.lzo -retries 3
```
  • Repair result

```
recoverLease returned false.
Retrying in 5000 ms...
Retry #1
recoverLease SUCCEEDED on /warehouse/hive/aiotclouddb/ods/ods_gateway/dt=2020-06-16/event-node8.1592294423698.lzo
```
  • Rerun the MR job

No exceptions this time; the job ran cleanly.
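With many stuck files, running the debug command one file at a time gets tedious. The debug command drives the client API DistributedFileSystem.recoverLease(), which returns true once the NameNode has closed the file and false if recovery has started but is not finished; a sketch that mirrors the CLI's retry behavior (the 5-second backoff imitates the output above):

```java
// Sketch: programmatic equivalent of
//   hdfs debug recoverLease -path <file> -retries 3
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class RecoverLease {
    public static void main(String[] args) throws Exception {
        Path file = new Path(
            "/warehouse/hive/aiotclouddb/ods/ods_gateway/dt=2020-06-16/event-node8.1592294423698.lzo");
        DistributedFileSystem dfs =
                (DistributedFileSystem) FileSystem.get(new Configuration());
        int retries = 3;
        for (int i = 0; i < retries; i++) {
            if (dfs.recoverLease(file)) {
                System.out.println("recoverLease SUCCEEDED on " + file);
                return;
            }
            System.out.println("recoverLease returned false, retrying in 5000 ms...");
            Thread.sleep(5000); // give block recovery time to finish
        }
        System.out.println("recoverLease still pending after " + retries + " attempts");
    }
}
```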

The HDFS Lease Mechanism

In HDFS, whenever a client is writing data to a file, no other client is allowed to write to that same file at the same time; this is what keeps the data consistent. How does HDFS enforce it? With leases. A lease is, in effect, a temporary write permit that HDFS grants a client: without it, a client may not write the file. The client acquires the lease when it opens the file for writing and releases it once the write is finished.

  • Each client user holds one lease.
  • Each lease records its holder, plus the list of file IDs that holder is currently writing.
  • Each lease carries a most-recent-renewal timestamp that decides whether the lease has expired. Once it expires, the holder can no longer write to those files unless the lease is renewed.
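This exclusivity is easy to observe with two clients: while one holds a write stream (and thus the lease) open, a second writer is rejected. A minimal sketch (the path is a hypothetical scratch file, and it assumes fs.defaultFS is an HDFS cluster with append enabled):

```java
// Sketch: two clients contending for the write lease on one file.
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class LeaseDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path file = new Path("/tmp/lease-demo.txt"); // placeholder path

        // Client A: create the file and keep the stream open, so A holds the lease.
        FileSystem clientA = FileSystem.newInstance(conf);
        FSDataOutputStream out = clientA.create(file);
        out.write("hello".getBytes());
        out.hflush(); // data is visible, but the file stays under construction

        // Client B: a second writer is rejected while A's lease is live.
        FileSystem clientB = FileSystem.newInstance(conf);
        try {
            clientB.append(file).close(); // should not get here
        } catch (IOException e) {
            // Typically an AlreadyBeingCreatedException wrapped in a RemoteException
            System.out.println("append rejected: " + e.getMessage());
        }

        out.close(); // closing releases the lease; a new writer could now append
        clientA.close();
        clientB.close();
    }
}
```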

References

https://blog.csdn.net/androidlushangderen/article/details/52850349

