Hadoop Hive Cluster: Fixing "Operation category READ is not supported in state standby"

By hadoophive

The test team reported that the Hive environment was broken: logging into the server and running the hive command fails with:

Exception in thread "main" java.lang.RuntimeException:
	org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): 
	Operation category READ is not supported in state standby

The full output:

[hadoop@hd-node1 sbin]$ hive
which: no hbase in (/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/opt/jdk1.8.0_112/bin:/opt/mysql/bin:/home/hadoop/app/zookeeper/bin:/home/hadoop/app/hadoop/bin:/home/hadoop/app/hive/bin:/home/hadoop/app/sqoop/bin:/home/hadoop/tools:/home/hadoop/.local/bin:/home/hadoop/bin)
19/03/30 00:07:28 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/03/30 00:07:28 WARN conf.HiveConf: HiveConf of name hive.jdbc_passwd.auth.hive does not exist

Exception in thread "main" java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby. Visit https://s.apache.org/sbnn-error
	at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:88)
	at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:1835)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1449)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:4270)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:901)
	at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getFileInfo(AuthorizationProviderProxyClientProtocol.java:528)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:839)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2216)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2212)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1796)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2210)

	at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:556)
	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:689)
	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:628)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby. Visit https://s.apache.org/sbnn-error
	at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:88)
	at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:1835)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1449)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:4270)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:901)
	at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getFileInfo(AuthorizationProviderProxyClientProtocol.java:528)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:839)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2216)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2212)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1796)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2210)

	at org.apache.hadoop.ipc.Client.call(Client.java:1472)
	at org.apache.hadoop.ipc.Client.call(Client.java:1409)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
	at com.sun.proxy.$Proxy19.getFileInfo(Unknown Source)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:771)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
	at com.sun.proxy.$Proxy20.getFileInfo(Unknown Source)
	at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2123)
	at org.apache.hadoop.hdfs.DistributedFileSystem$20.doCall(DistributedFileSystem.java:1253)
	at org.apache.hadoop.hdfs.DistributedFileSystem$20.doCall(DistributedFileSystem.java:1249)
	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
	at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1249)
	at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1417)
	at org.apache.hadoop.hive.ql.session.SessionState.createRootHDFSDir(SessionState.java:643)
	at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:591)
	at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:534)
	... 8 more
[hadoop@hd-node1 sbin]$ 

The test cluster hosting Hive is configured along the lines of a production Hadoop deployment: a two-node NameNode HA pair plus three DataNodes.
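As a quick sanity check of that topology, hdfs getconf can list the NameNode hosts straight from the client configuration (a sketch; it reads hdfs-site.xml and works even when no NameNode is active):

# List the NameNode hosts known to the client configuration.
hdfs getconf -namenodes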

The initial suspicion is that the exception comes from the HA NameNodes sitting in standby state. First, run hdfs haadmin -getServiceState <serviceId> (where <serviceId> is the NameNode's service ID, hd-node1 or hd-node2 here) to confirm.

[hadoop@hd-node1 sbin]$ hdfs haadmin -getServiceState hd-node1
standby
[hadoop@hd-node1 sbin]$ hdfs haadmin -getServiceState hd-node2
standby
[hadoop@hd-node1 sbin]$

Sure enough, both NameNodes in the HA pair are in standby state.
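To avoid querying each service ID by hand (or on clusters with more NameNodes), the states can be read in a loop. A minimal sketch, assuming hdfs-site.xml defines a single nameservice; the variable names are illustrative, and the service IDs resolve to hd-node1 and hd-node2 on this cluster:

# Query the HA state of every NameNode in the (single) nameservice.
NS=$(hdfs getconf -confKey dfs.nameservices)
for nn in $(hdfs getconf -confKey "dfs.ha.namenodes.${NS}" | tr ',' ' '); do
    printf '%s: ' "$nn"
    hdfs haadmin -getServiceState "$nn"
done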

We can force a node into the active state with hdfs haadmin -transitionToActive --forcemanual <serviceId>. Be careful, though: hdfs itself warns that this operation can induce split-brain, so think twice before using it.

[hadoop@hd-node1 sbin]$ hdfs haadmin -transitionToActive --forcemanual hd-node1
You have specified the forcemanual flag. This flag is dangerous, as it can induce a split-brain scenario that WILL CORRUPT your HDFS namespace, possibly irrecoverably.

It is recommended not to use this flag, but instead to shut down the cluster and disable automatic failover if you prefer to manually manage your HA state.

You may abort safely by answering 'n' or hitting ^C now.

Are you sure you want to continue? (Y or N) Y
19/03/30 00:24:50 WARN ha.HAAdmin: Proceeding with manual HA state management even though
automatic failover is enabled for NameNode at hd-node2/192.190.20.14:9000
19/03/30 00:24:50 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/03/30 00:24:50 WARN ha.HAAdmin: Proceeding with manual HA state management even though
automatic failover is enabled for NameNode at hd-node1/192.190.20.13:9000
[hadoop@hd-node1 sbin]$ 

Checking the state of hd-node1 again, it has now transitioned to active.

[hadoop@hd-node1 sbin]$ hdfs haadmin -getServiceState hd-node1
active
[hadoop@hd-node1 sbin]$
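Because --forcemanual bypasses the failover controller, it is also worth confirming that the other NameNode stayed in standby, i.e. that exactly one node is active (a suggested check, not captured from the original session):

# Expected: "standby". Two nodes answering "active" would be the
# split-brain scenario the forcemanual warning describes.
hdfs haadmin -getServiceState hd-node2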

Running hive again, everything works normally; problem solved.
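A brief smoke test (suggested here, not part of the original session) confirms both layers are healthy:

# HDFS reads should no longer raise StandbyException.
hdfs dfs -ls /
# Hive should start a session, print the database list, and exit cleanly.
hive -e 'show databases;'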
