Migrating the storage layer from HDFS to MinIO with Hadoop DistCp
Dependencies
Update the AWS SDK version (optional)
# Remove the old AWS jars
find /opt/cdh -name '*aws*.jar' | grep hadoop | xargs -n1 rm
# Download the AWS dependencies
cd /opt/cdh/lib/hadoop
wget https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk/1.7.4/aws-java-sdk-1.7.4.jar
wget https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/2.7.7/hadoop-aws-2.7.7.jar
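After the download it is worth confirming that only the new pair of jars is left. hadoop-aws 2.7.x was built against aws-java-sdk 1.7.4, so a leftover jar of a different version can cause class or method errors at runtime. The sketch below assumes the jars sit directly in /opt/cdh/lib/hadoop as in the commands above; the function name check_aws_jars is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: verify that exactly one hadoop-aws jar and one
# aws-java-sdk jar are present directly under the given directory.
# Usage: check_aws_jars /opt/cdh/lib/hadoop
check_aws_jars() {
  local dir="$1"
  local aws_count sdk_count
  # -maxdepth 1: only look at jars in $dir itself, not in subdirectories.
  aws_count=$(find "$dir" -maxdepth 1 -name 'hadoop-aws-*.jar' | wc -l)
  sdk_count=$(find "$dir" -maxdepth 1 -name 'aws-java-sdk-*.jar' | wc -l)
  if [ "$aws_count" -eq 1 ] && [ "$sdk_count" -eq 1 ]; then
    echo "ok"
  else
    echo "expected 1 of each, found hadoop-aws: $aws_count, aws-java-sdk: $sdk_count" >&2
    return 1
  fi
}
```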
Configuration
Add the following properties to core-site.xml:
<property>
<name>fs.s3a.access.key</name>
<value>DYaDwXsj8VRtWYPSbr7A</value>
</property>
<property>
<name>fs.s3a.secret.key</name>
<value>z7HAEhdyseNX9AVyzDLAJzEjZChJsnAf1f7VehE</value>
</property>
<property>
<name>fs.s3a.endpoint</name>
<value>http://10.199.150.160:32030</value>
</property>
<property>
<name>fs.s3a.path.style.access</name>
<value>true</value>
</property>
<property>
<name>fs.s3a.impl</name>
<value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
</property>
<property>
<name>fs.s3a.fast.upload</name>
<value>false</value>
</property>
<property>
<name>fs.s3a.multipart.size</name>
<value>104857600</value>
</property>
<property>
<name>fs.s3a.multipart.threshold</name>
<value>268435456</value>
</property>
<property>
<name>fs.s3a.fast.buffer.size</name>
<value>1048576</value>
</property>
<property>
<name>fs.s3a.threads.core</name>
<value>15</value>
</property>
<property>
<name>fs.s3a.threads.max</name>
<value>256</value>
</property>
<property>
<name>fs.s3a.block.size</name>
<value>33554432</value>
</property>
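The byte values above are easier to review in MiB. A minimal sketch that decodes them; the values are copied from the properties above, and the helper name to_mib is illustrative:

```bash
#!/usr/bin/env bash
# Convert a byte count to MiB. Integer division is exact here because
# every value below is a whole multiple of 1 MiB (1048576 bytes).
to_mib() {
  echo $(( $1 / 1048576 ))
}

# Byte values taken from the core-site.xml properties above.
for prop in \
  "fs.s3a.multipart.size=104857600" \
  "fs.s3a.multipart.threshold=268435456" \
  "fs.s3a.fast.buffer.size=1048576" \
  "fs.s3a.block.size=33554432"; do
  name=${prop%%=*}   # text before the first "="
  bytes=${prop#*=}   # text after the first "="
  echo "$name = $(to_mib "$bytes") MiB"
done
```

This prints 100, 256, 1 and 32 MiB respectively. Uploads larger than 256 MiB are therefore split into 100 MiB parts, and S3A reports a 32 MiB block size for its files. According to the Hadoop 2.7 core-default.xml description, fs.s3a.fast.buffer.size has no effect while fs.s3a.fast.upload is false, as it is here.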
hadoop distcp
Copy from HDFS to S3A:
hadoop distcp hdfs://ha/user/geosmart/spark s3a://bucket/spark
Log output:
2021-12-02 06:09:11,482 INFO [main] tools.OptionsParser (OptionsParser.java:parseBlocksPerChunk(205)) - parseChunkSize: blocksperchunk false
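For a large migration it helps if the copy can simply be re-run after an interruption. A sketch of a small wrapper, assuming the same source and target as above. The function name run_distcp, the default of 20 map tasks and the DRY_RUN switch are illustrative choices, not part of the original setup; -update and -m are standard DistCp options.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around hadoop distcp.
# Args: source URI, target URI, optional number of map tasks (default 20).
run_distcp() {
  local src="$1" dst="$2" maps="${3:-20}"
  # -update copies only files that are missing at the target or differ
  # from it, so an interrupted migration can be resumed by re-running.
  # -m caps the number of simultaneous copy tasks.
  local cmd=(hadoop distcp -update -m "$maps" "$src" "$dst")
  if [ -n "${DRY_RUN:-}" ]; then
    echo "${cmd[*]}"   # print the command instead of running it
  else
    "${cmd[@]}"
  fi
}

DRY_RUN=1 run_distcp hdfs://ha/user/geosmart/spark s3a://bucket/spark
```

Note that -update also changes how the target path is interpreted: DistCp copies the contents of the source directory into the target rather than creating the source directory under it.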
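Troubleshooting
Remote debugging
Add a JDWP agent option to /bin/hdfs and pass it on the final exec line, so the client JVM can be debugged remotely:
DEBUG_OPTS=" -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=19999 "
exec "$JAVA" $DEBUG_OPTS -Dproc_$COMMAND $JAVA_HEAP_MAX $HADOOP_OPTS $CLASS "$@"
In IntelliJ IDEA, add the HDFS jars to the project, create a Remote JVM Debug configuration that uses the following option, and set breakpoints:
-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=19999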
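The same agent string can also be generated instead of hard-coded. A sketch with the port and suspend mode as parameters; the function name jdwp_opts is hypothetical. With suspend=y the JVM waits for the debugger before running main, which helps when the failure happens early in startup. The sketch also assumes that, as in the Hadoop 2.x launcher scripts, client commands such as hdfs dfs append HADOOP_CLIENT_OPTS to the JVM options, which would avoid editing bin/hdfs; check this against your distribution's script. On JDK 9 and later, address=19999 listens on localhost only, so a debugger on another machine needs address=*:19999.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build the JDWP agent option used above.
# Args: port (default 19999), suspend flag y/n (default n).
jdwp_opts() {
  local port="${1:-19999}" suspend="${2:-n}"
  echo "-agentlib:jdwp=transport=dt_socket,server=y,suspend=${suspend},address=${port}"
}

# Assumption: the hdfs/hadoop launcher appends HADOOP_CLIENT_OPTS for
# client commands. suspend=y makes the JVM wait until the IDE attaches.
HADOOP_CLIENT_OPTS=$(jdwp_opts 19999 y)
export HADOOP_CLIENT_OPTS
# hdfs dfs -ls s3a://bucket/spark
```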
Name or service not known
java.net.UnknownHostException: ThinkT14: ThinkT14: Name or service not known
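Fix: add an "IP hostname" entry to the hosts file (e.g. /etc/hosts) so that the machine's hostname can be resolved, for example:
10.199.121.12 ThinkT14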
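A sketch that adds the entry idempotently, so repeated runs do not duplicate it. The function name ensure_host_entry is hypothetical; the hosts file path is a parameter (defaulting to /etc/hosts, which needs root to edit), and the IP and hostname are the example values from above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: append "IP HOSTNAME" to a hosts file unless the
# hostname already appears on a non-comment line.
# Usage (as root): ensure_host_entry 10.199.121.12 ThinkT14 /etc/hosts
ensure_host_entry() {
  local ip="$1" name="$2" hosts_file="${3:-/etc/hosts}"
  # awk exits 0 if any hostname field (field 2 onwards) equals $name.
  if awk -v n="$name" '$1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
       END { exit !found }' "$hosts_file"; then
    echo "exists"
    return 0
  fi
  # If the file does not end with a newline, add one first so the new
  # entry starts on its own line.
  if [ -s "$hosts_file" ] && [ -n "$(tail -c 1 "$hosts_file")" ]; then
    echo >> "$hosts_file"
  fi
  printf '%s %s\n' "$ip" "$name" >> "$hosts_file"
  echo "added"
}
```

Afterwards, getent hosts ThinkT14 should print the new mapping.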