Spark Memory Tuning Case-Study

by Tomining 2024. 4. 14.

Spark Basic Architecture

Spark Memory

Inside the JVM

Reserved Memory
Spark Memory
- Execution Memory (operation)
- Storage Memory (cache): RDD Persistence

Outside the JVM

OffHeap Memory
External Process Memory

Example: memory regions for a 5 GB heap

Reference: https://community.cloudera.com/t5/Community-Articles/Spark-Memory-Management/ta-p/317794
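The 5 GB breakdown can be reproduced with a few lines of arithmetic. This is a minimal sketch, assuming Spark's unified memory manager with its default settings (300 MB of reserved memory, spark.memory.fraction = 0.6, spark.memory.storageFraction = 0.5). The function name and the "user memory" label for the remainder of the heap are chosen here for illustration.

```python
RESERVED_MB = 300  # reserved memory held back by the unified memory manager


def memory_regions(heap_mb, memory_fraction=0.6, storage_fraction=0.5):
    """Split an executor heap (in MB) into its on-heap regions.

    memory_fraction  ~ spark.memory.fraction (assumed default 0.6)
    storage_fraction ~ spark.memory.storageFraction (assumed default 0.5)
    """
    usable = heap_mb - RESERVED_MB
    spark_memory = usable * memory_fraction
    storage = spark_memory * storage_fraction
    return {
        "reserved": float(RESERVED_MB),
        "spark_memory": spark_memory,
        "storage": storage,                     # cache / RDD persistence
        "execution": spark_memory - storage,    # shuffles, joins, sorts
        "user_memory": usable - spark_memory,   # remainder of the usable heap
    }


regions = memory_regions(5 * 1024)  # 5 GB heap
for name, mb in regions.items():
    print(f"{name:>12}: {mb:8.1f} MB")
```

The boundary between storage and execution is soft: either side can borrow unused space from the other, so these figures are starting sizes rather than hard limits.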

Q) We use Spark because it is supposed to be fast, but it is slow?

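Is there enough memory?
- It cannot be increased indefinitely.
- Is YARN (the Resource Manager) allocating it appropriately? => Spark Properties
Is the allotted memory being used efficiently?
- Increase spark.executor.memory.
- Adjust spark.executor.cores.
  - (How many is appropriate?) See the reference on choosing the number of executors.
  - Is more better, or fewer? => There is no universal answer; the right value has to be found for each job.
- Rule-of-thumb guidelines (these vary by case, so always test!)
  - spark.executor.memory >= 4G
  - 1 < spark.executor.cores <= 5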
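The rule-of-thumb ranges above can be encoded as a small pre-flight check before submitting a job. This is a sketch only: the helper names are hypothetical, and the thresholds are the guideline values from this post, not limits enforced by Spark.

```python
def check_executor_settings(memory_gb, cores):
    """Return warnings for settings outside the rule-of-thumb ranges."""
    warnings = []
    if memory_gb < 4:
        warnings.append(f"spark.executor.memory={memory_gb}g is below the 4G guideline")
    if not (1 < cores <= 5):
        warnings.append(f"spark.executor.cores={cores} is outside the (1, 5] guideline")
    return warnings


def to_submit_flags(memory_gb, cores):
    """Render the two settings as --conf arguments for spark-submit."""
    return [
        "--conf", f"spark.executor.memory={memory_gb}g",
        "--conf", f"spark.executor.cores={cores}",
    ]


for memory_gb, cores in [(8, 4), (2, 1)]:
    problems = check_executor_settings(memory_gb, cores)
    print(" ".join(to_submit_flags(memory_gb, cores)), "->", problems or "ok")
```

A warning here is a prompt to benchmark, not a verdict; some workloads do run best outside these ranges.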

Q) Is that all?

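Adjust the number of partitions (shuffle partitions)
- spark.sql.files.maxPartitionBytes
- coalesce vs repartition
  - repartition: full shuffle. Distributes data evenly. Can increase the number of partitions. Can partition by specific columns.
  - coalesce: no full shuffle (existing partitions are merged). Can leave data skewed. Cannot increase the number of partitions. Cannot partition by column.
    - Usually faster because it avoids a full shuffle, but it can be slower when the data is unevenly distributed.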
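The difference between the two operations can be illustrated with a toy model in plain Python. This is not Spark's implementation (Spark's coalesce also takes data locality into account); it only mimics the behaviour described above: coalesce merges whole partitions and so preserves skew, while repartition redistributes every record.

```python
def coalesce(partitions, n):
    """Toy coalesce: merge whole partitions into n groups, no record-level movement."""
    if n >= len(partitions):
        # The partition count cannot be increased this way.
        return [list(p) for p in partitions]
    merged = [[] for _ in range(n)]
    for i, part in enumerate(partitions):
        merged[i * n // len(partitions)].extend(part)
    return merged


def repartition(partitions, n):
    """Toy repartition: full shuffle, records dealt out round-robin."""
    out = [[] for _ in range(n)]
    records = [r for p in partitions for r in p]
    for i, record in enumerate(records):
        out[i % n].append(record)
    return out


# Three small partitions and one large (skewed) partition.
skewed = [list(range(10)), list(range(10)), list(range(10)), list(range(970))]

coalesced_sizes = [len(p) for p in coalesce(skewed, 2)]
repartitioned_sizes = [len(p) for p in repartition(skewed, 2)]
print("coalesce(2):   ", coalesced_sizes)
print("repartition(2):", repartitioned_sizes)
```

In this model the coalesced result still has one partition holding almost all records, which is why a downstream stage can end up slower despite skipping the shuffle.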
Tune spark.memory.fraction or spark.memory.storageFraction
Prevent spill
- Split the input files and process them in smaller pieces.
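One way to split the input is to group files into batches that each stay under a size budget and run the job once per batch. The sketch below is an assumption about how that might be done; the file names, sizes, and the 1000 MB budget are made up for illustration.

```python
def batch_files(files, max_batch_size):
    """Group (name, size) pairs, in order, into batches no larger than max_batch_size.

    A single file larger than the budget still gets a batch of its own.
    """
    batches, current, current_size = [], [], 0
    for name, size in files:
        if current and current_size + size > max_batch_size:
            batches.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        batches.append(current)
    return batches


# Hypothetical input files with sizes in MB.
files = [("a.parquet", 400), ("b.parquet", 300), ("c.parquet", 500),
         ("d.parquet", 200), ("e.parquet", 900)]

batches = batch_files(files, max_batch_size=1000)
for i, batch in enumerate(batches):
    print(f"batch {i}: {batch}")
```

Each batch can then be read and processed as a separate job, so that no single run has to hold the whole dataset's shuffle data at once.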

YARN

Spark Properties

References

(Inflearn course) Spark Memory Allocation & Management
On memory management in Apache Spark
(Spark on YARN) Calculating and optimizing YARN containers, Spark cores, executor count, and memory size
Spark tuning: adjusting partition size and count to suit the job
Spark Performance Tuning & Best Practices
Spark Performance Tuning: Spill
- Handling spill
  - If the spill is due to a skew problem, solve the skew first!
  - Set the cluster with more memory per worker (increase worker size).
  - Manage spark.sql.shuffle.partitions to reduce file size per partition.
  - Perform .repartition() in transformation code explicitly.
  - Set spark.sql.files.maxPartitionBytes to fit your nature of data.
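For the spark.sql.shuffle.partitions item, a common starting point is to divide the expected shuffle volume by a target size per partition. This is a rough sketch: the 128 MB target and the 50 GB shuffle volume are assumptions for illustration, and the function name is hypothetical. The actual shuffle size should be read from the Spark UI.

```python
MB = 1024 * 1024
GB = 1024 * MB


def estimate_shuffle_partitions(shuffle_bytes, target_partition_bytes=128 * MB):
    """Smallest partition count that keeps each partition at or under the target size."""
    # Integer ceiling division avoids floating-point rounding.
    return max(1, -(-shuffle_bytes // target_partition_bytes))


suggested = estimate_shuffle_partitions(50 * GB)
print(f"spark.sql.shuffle.partitions={suggested}")
```

The estimate assumes evenly sized partitions, so it only helps once any skew has been dealt with, in line with the first item in the list.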

