Hadoop version: 2.10.2
JDK version: 1.8.0_291
I’m trying to run a MapReduce job with Python via Hadoop streaming.
I’ve configured Hadoop under a new user, hduser_.
After running this command in the terminal:
hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-2.10.2.jar \
-input orders.txt \
-output ccp-output \
-file /tmp/cross/pairs/mapper.py \
-mapper mapper.py \
-file /tmp/cross/pairs/reducer.py \
-reducer reducer.py
I got the following exception. The job log shows that everything starts fine, but after that I get these exceptions.
Exception log:
23/10/18 10:41:22 INFO mapreduce.Job: Task Id : attempt_1697637865786_0004_m_000000_0, Status : FAILED
Error: java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:113)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:79)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:137)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:456)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:344)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:177)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1938)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:171)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:110)
... 9 more
Caused by: java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:113)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:79)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:137)
at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38)
... 14 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:110)
... 17 more
Caused by: java.lang.RuntimeException: configuration exception
at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:221)
at org.apache.hadoop.streaming.PipeMapper.configure(PipeMapper.java:66)
... 22 more
Caused by: java.io.IOException: Cannot run program "/home/hduser_/hdfs/hadoop-tmp/nm-local-dir/usercache/hduser_/appcache/application_1697637865786_0004/container_1697637865786_0004_01_000002/./mapper.py": error=2, No such file or directory
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:208)
... 23 more
Caused by: java.io.IOException: error=2, No such file or directory
at java.lang.UNIXProcess.forkAndExec(Native Method)
at java.lang.UNIXProcess.<init>(UNIXProcess.java:247)
at java.lang.ProcessImpl.start(ProcessImpl.java:134)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
... 24 more
and the resulting log is:
23/10/18 10:41:40 INFO mapreduce.Job: Counters: 14
Job Counters
Failed map tasks=7
Killed map tasks=1
Killed reduce tasks=1
Launched map tasks=8
Other local map tasks=6
Data-local map tasks=2
Total time spent by all maps in occupied slots (ms)=25675
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=25675
Total vcore-milliseconds taken by all map tasks=25675
Total megabyte-milliseconds taken by all map tasks=26291200
Map-Reduce Framework
CPU time spent (ms)=0
Physical memory (bytes) snapshot=0
Virtual memory (bytes) snapshot=0
23/10/18 10:41:40 ERROR streaming.StreamJob: Job not successful!
Streaming Command Failed!
The interesting part is that both the mapper and the reducer are actually present in appcache, in directories numbered 10, 11, etc.
mapper.py:
#!/usr/bin/python3
"""mapper.py"""
import sys

for line in sys.stdin:
    items = line.strip().split()
    if len(items) >= 2:
        for i in items:
            for j in items:
                if i != j:
                    print(f"{i} {j}\t1")
reducer.py:
#!/usr/bin/python3
"""reducer.py"""
import sys

(lastKey, full_sum) = (None, 0)
keys_list = set()
for line in sys.stdin:
    (key, value) = line.strip().split("\t")
    if lastKey and lastKey != key:
        print(lastKey + '\t' + str(full_sum))
        (lastKey, full_sum) = (key, int(value))
    else:
        (lastKey, full_sum) = (key, full_sum + int(value))
if lastKey:
    print(lastKey + '\t' + str(full_sum))
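For reference, here is a rough sketch of how the two scripts can be exercised locally, outside Hadoop (this assumes mapper.py and reducer.py are in the current directory and that python3 is on the PATH; the sample input is made up since orders.txt isn't shown above):

#!/usr/bin/python3
# Rough local smoke test: run mapper.py, simulate the shuffle/sort step, then
# run reducer.py, all without Hadoop. Paths and sample data are assumptions.
import subprocess

sample = "a b c\nb c\n"  # made-up order lines, whitespace-separated items

# Run the mapper on the sample input.
mapped = subprocess.run(
    ["python3", "mapper.py"], input=sample,
    capture_output=True, text=True, check=True,
).stdout

# Hadoop streaming sorts mapper output by key before the reducer sees it.
shuffled = "\n".join(sorted(mapped.splitlines())) + "\n"

# Run the reducer on the sorted pairs and print the summed counts.
reduced = subprocess.run(
    ["python3", "reducer.py"], input=shuffled,
    capture_output=True, text=True, check=True,
).stdout
print(reduced, end="")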
core-site.xml:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hduser_/hdfs/hadoop-tmp</value>
  </property>
</configuration>