Hadoop MapReduce doesn’t use the copied file

Hadoop version: 2.10.2
JDK version: 1.8.0_291

I’m trying to run a MapReduce streaming job with Python.
I’ve configured Hadoop under a new user, hduser_.

After running this command in terminal:

hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-2.10.2.jar \
-input orders.txt \
-output ccp-output \
-file /tmp/cross/pairs/mapper.py \
-mapper mapper.py \
-file /tmp/cross/pairs/reducer.py \
-reducer reducer.py

The log shows that everything starts fine, but after that the map tasks fail with the exception below.

Exception log:

23/10/18 10:41:22 INFO mapreduce.Job: Task Id : attempt_1697637865786_0004_m_000000_0, Status : FAILED
Error: java.lang.RuntimeException: Error in configuring object
        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:113)
        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:79)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:137)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:456)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:344)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:177)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1938)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:171)
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:110)
        ... 9 more
Caused by: java.lang.RuntimeException: Error in configuring object
        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:113)
        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:79)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:137)
        at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38)
        ... 14 more
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:110)
        ... 17 more
Caused by: java.lang.RuntimeException: configuration exception
        at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:221)
        at org.apache.hadoop.streaming.PipeMapper.configure(PipeMapper.java:66)
        ... 22 more
Caused by: java.io.IOException: Cannot run program "/home/hduser_/hdfs/hadoop-tmp/nm-local-dir/usercache/hduser_/appcache/application_1697637865786_0004/container_1697637865786_0004_01_000002/./mapper.py": error=2, No such file or directory
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
        at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:208)
        ... 23 more
Caused by: java.io.IOException: error=2, No such file or directory
        at java.lang.UNIXProcess.forkAndExec(Native Method)
        at java.lang.UNIXProcess.<init>(UNIXProcess.java:247)
        at java.lang.ProcessImpl.start(ProcessImpl.java:134)
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
        ... 24 more

and the resulting job log is:

23/10/18 10:41:40 INFO mapreduce.Job: Counters: 14
        Job Counters 
                Failed map tasks=7
                Killed map tasks=1
                Killed reduce tasks=1
                Launched map tasks=8
                Other local map tasks=6
                Data-local map tasks=2
                Total time spent by all maps in occupied slots (ms)=25675
                Total time spent by all reduces in occupied slots (ms)=0
                Total time spent by all map tasks (ms)=25675
                Total vcore-milliseconds taken by all map tasks=25675
                Total megabyte-milliseconds taken by all map tasks=26291200
        Map-Reduce Framework
                CPU time spent (ms)=0
                Physical memory (bytes) snapshot=0
                Virtual memory (bytes) snapshot=0
23/10/18 10:41:40 ERROR streaming.StreamJob: Job not successful!
Streaming Command Failed!

The interesting part is that both mapper.py and reducer.py are actually stored in the appcache, in directories numbered 10, 11, etc.
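
For example, the localized files can be seen with something like this (path assembled from hadoop.tmp.dir in core-site.xml below and the application id from the log; the exact layout may vary):

ls /home/hduser_/hdfs/hadoop-tmp/nm-local-dir/usercache/hduser_/appcache/application_1697637865786_0004/filecache/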

mapper.py:

#!/usr/bin/python3
"""mapper.py"""

import sys

# emit every ordered pair of distinct items on a line, each with a count of 1
for line in sys.stdin:
    items = line.strip().split()
    if len(items) >= 2:
        for i in items:
            for j in items:
                if i != j:
                    print(f"{i} {j}\t1")
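
For illustration, a hypothetical input line with three items produces every ordered pair, tab-separated, each with a count of 1:

echo "A B C" | python3 mapper.py
A B	1
A C	1
B A	1
B C	1
C A	1
C B	1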

reducer.py:

#!/usr/bin/python3
"""reducer.py"""

import sys

# running key/sum for consecutive identical keys (streaming sorts mapper output by key)
(lastKey, full_sum) = (None, 0)

for line in sys.stdin:
    (key, value) = line.strip().split("\t")

    if lastKey and lastKey != key:
        print(lastKey + '\t' + str(full_sum))
        (lastKey, full_sum) = (key, int(value))
    else:
        (lastKey, full_sum) = (key, full_sum + int(value))

if lastKey:
    print(lastKey + '\t' + str(full_sum))
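
For a local sanity check outside of Hadoop (assuming orders.txt is in the current directory), the two scripts can be piped together the same way streaming does, with a sort between map and reduce:

cat orders.txt | python3 mapper.py | sort | python3 reducer.py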

core-site.xml:

<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hduser_/hdfs/hadoop-tmp</value>
</property>
</configuration>
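
For reference, the same job expressed with the generic -files option (the non-deprecated replacement for -file) and an explicit interpreter, assuming python3 is available on the worker nodes, would look roughly like this:

hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-2.10.2.jar \
-files /tmp/cross/pairs/mapper.py,/tmp/cross/pairs/reducer.py \
-input orders.txt \
-output ccp-output \
-mapper "python3 mapper.py" \
-reducer "python3 reducer.py"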
