java.lang.OutOfMemoryError

# on yarn
org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl - Container [<edited>] is running beyond physical memory limits. Current usage: 18.0 GB of 18 GB physical memory used; 19.4 GB of 37.8 GB virtual memory used. Killing container.
Container exit code: 137
Task process exit with nonzero status of 137.
Apart from exit code 137, the other OOM messages here are self-explanatory. As for the YARN container's 137 exit code, a Stack Overflow answer puts it well: "Exit code 137 is a typical sign of the infamous OOM killer."
Solution:
Ref: hadoop-streaming-job-failure-task-process-exit-with-nonzero-status-of-137
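Exit code 137 means the container's total process memory, heap plus off-heap, exceeded its allocation and was killed. A common fix is to leave more headroom for off-heap usage via the executor memory-overhead setting (named spark.yarn.executor.memoryOverhead in Spark 1.x, spark.executor.memoryOverhead since Spark 2.3). A hedged sketch with illustrative sizes:

```shell
# Illustrative sizes only; tune for your cluster.
# executor-memory + memoryOverhead must stay below the YARN container
# limit (yarn.scheduler.maximum-allocation-mb).
spark-submit \
  --master yarn \
  --executor-memory 14g \
  --conf spark.yarn.executor.memoryOverhead=4096 \
  your-app.jar
```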
org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output location for shuffle 0
org.apache.spark.shuffle.FetchFailedException: Failed to connect to ip-xxxxxxxx
org.apache.spark.shuffle.FetchFailedException: Error in opening FileSegmentManagedBuffer
java.io.FileNotFoundException of shuffle files in HDFS/S3

# if the write-ahead log (WAL) is enabled
org.apache.spark.SparkException: Could not read data from write ahead log record FileBasedWriteAheadLogSegment
All of the above are possible symptoms of the same class of failure: after a shuffle, an executor reads more data than its memory limit allows, dies, and its intermediate temporary files are cleaned up. When other executors then try to exchange data with it, they either see the executor as lost or fail to read the shuffle files it wrote.
If any single shuffle block of yours exceeds 2 GB and you see errors like those listed above, you may have hit the following issue:
Solution:
spark.reducer.maxMbInFlight (renamed spark.reducer.maxSizeInFlight in Spark 1.4+)
For the relevant source code, see Spark技术内幕: Shuffle详解(二). Because the shuffle fetch phase processes data as it is fetched, lowering this value appropriately helps reduce memory usage during the shuffle phase. For descriptions of the shuffle-related parameters, Ref:
what-are-the-likely-causes-of-org-apache-spark-shuffle-metadatafetchfailedexcept
fetchfailedexception-or-metadatafetchfailedexception-when-processing-big-data-set
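Following the advice above, one can both shrink the in-flight fetch buffer and raise the partition count so that individual shuffle blocks stay well under the 2 GB limit. A hedged sketch (the config names are standard Spark shuffle settings; the values are illustrative, not recommendations):

```shell
# Illustrative values only.
# Spark < 1.4 uses spark.reducer.maxMbInFlight (a number of MB);
# Spark 1.4+ uses spark.reducer.maxSizeInFlight (e.g. "24m").
# More shuffle partitions means smaller blocks per partition.
spark-submit \
  --conf spark.reducer.maxSizeInFlight=24m \
  --conf spark.sql.shuffle.partitions=400 \
  your-app.jar
```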
java.lang.Exception: Could not compute split, block input-0-1412705397200 not found
In Spark Streaming, this error occurs when batches back up: the accumulated input exceeds what the receiver's memory can hold, so old, not-yet-processed blocks are dropped to make room for newly received data, and the jobs that needed them fail with the missing-block error.
Solution:
spark.streaming.backpressure.enabled=true
spark.streaming.backpressure.pid.minRate=0.001
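The two backpressure settings above can be combined with a per-receiver rate cap as an extra safety net. A hedged sketch (spark.streaming.receiver.maxRate is the standard per-receiver cap; the values are illustrative):

```shell
# Illustrative values; backpressure lets Spark adapt the ingest rate
# to observed batch processing times instead of relying on a fixed cap.
spark-submit \
  --conf spark.streaming.backpressure.enabled=true \
  --conf spark.streaming.backpressure.pid.minRate=0.001 \
  --conf spark.streaming.receiver.maxRate=10000 \
  your-app.jar
```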
There is insufficient memory for the Java Runtime Environment to continue.
Native memory allocation (malloc) failed to allocate 4088 bytes for AllocateHeap
An error report file with more information is saved as:
This simply means there is not enough physical memory left to start the JVM. For example, if the JVM requests 5 GB but only 4 GB of physical memory is actually available, this error is raised, the JVM fails to start, and the process exits.
Solution:
Open question: does "available memory" here include memory used by the OS page cache? (free -m
shows the OS's free and cached memory.)
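To check how much memory the OS actually considers available before launching a large JVM, one can read /proc/meminfo on Linux. MemAvailable (kernel 3.14+) already accounts for reclaimable page cache, which plain MemFree does not, so it addresses the question above more directly than free's raw numbers:

```shell
# Show total, free, available, and cached memory. MemAvailable includes
# memory the kernel can reclaim from the page cache, so it is the better
# estimate of what a new JVM heap can actually be backed by.
grep -E '^(MemTotal|MemFree|MemAvailable|Cached):' /proc/meminfo
```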
Ref : insufficient-memory-for-the-java-runtime-environment-message-in-eclipse
Original post: http://www.cnblogs.com/lhfcws/p/6282797.html