# emr.sh
#!/bin/sh
TAG=$1
ENV=$2
CLASS=$3
ARGS=$4
SRC=s3://src/${ENV}/jar/batch/batch.jar
LOG=s3://log/${ENV}/batch/
SUBNET_ID=subnet-0e3653577617c98a3
# Create a transient (auto-terminating) cluster that runs a single Spark step.
# The JSON response from the AWS CLI is echoed on one line for the caller to parse.
echo $( \
aws emr create-cluster \
--auto-scaling-role EMR_AutoScaling_DefaultRole \
--instance-groups file://./batch/static/json/instance.json \
--name "${CLASS}" \
--release-label emr-6.7.0 \
--auto-terminate \
--applications Name=Spark \
--use-default-roles \
--ec2-attributes SubnetId="${SUBNET_ID}" \
--tags \
"Env=${ENV}" \
"Name=${TAG}" \
--log-uri "${LOG}" \
--configurations file://./batch/static/json/spark.json \
--steps "Type=SPARK,Name=${CLASS},ActionOnFailure=TERMINATE_CLUSTER,Args=[--class,${CLASS},${SRC},${ARGS}]" \
)
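emr.sh expects four positional arguments (tag, environment, main class, and a comma-separated argument list) and composes the `--steps` shorthand from them. The following is a minimal sketch of an argument guard and of that composition, useful for checking the step string before a cluster is actually created. `require_args` and `build_spark_step` are hypothetical helper names and are not part of the original script.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative and not from the original emr.sh.

# Return non-zero unless exactly four arguments (TAG ENV CLASS ARGS) were passed.
require_args() {
    if [ "$#" -ne 4 ]; then
        echo "usage: emr.sh TAG ENV CLASS ARGS" >&2
        return 1
    fi
}

# Print the shorthand value given to --steps, in the same format emr.sh uses.
# $1 = main class, $2 = S3 path of the jar, $3 = comma-separated job arguments
build_spark_step() {
    class=$1
    src=$2
    args=$3
    printf 'Type=SPARK,Name=%s,ActionOnFailure=TERMINATE_CLUSTER,Args=[--class,%s,%s,%s]\n' \
        "$class" "$class" "$src" "$args"
}
```

Calling `require_args "$@" || exit 1` at the top of the script would stop early when an argument is missing, instead of sending a half-empty request to the AWS CLI.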
# spark.json
[
{
"Classification": "spark-env",
"Configurations": [
{
"Classification": "export",
"Properties": {
"JAVA_HOME": "/usr/lib/jvm/java-11-amazon-corretto.x86_64"
}
}
]
},
{
"Classification": "spark-defaults",
"Properties": {
"spark.executorEnv.JAVA_HOME": "/usr/lib/jvm/java-11-amazon-corretto.x86_64",
"spark.sql.broadcastTimeout": "3600",
"spark.default.parallelism": "200",
"spark.yarn.am.memory": "2g",
"spark.executor.extraJavaOptions": "-XX:+IgnoreUnrecognizedVMOptions",
"spark.rpc.askTimeout": "600s",
"spark.sql.shuffle.partitions": "360",
"spark.sql.cbo.enabled": "True",
"spark.sql.adaptive.enabled": "True",
"spark.sql.adaptive.coalescePartitions.enabled": "True",
"spark.sql.adaptive.advisoryPartitionSizeInBytes": "128m",
"spark.dynamicAllocation.enabled": "True",
"spark.dynamicAllocation.initialExecutors": "1",
"spark.dynamicAllocation.minExecutors": "1",
"spark.dynamicAllocation.maxExecutors": "300",
"spark.dynamicAllocation.executorAllocationRatio": "1",
"spark.sql.catalogImplementation": "hive",
"spark.hadoop.hive.metastore.client.factory.class": "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory",
"spark.serializer": "org.apache.spark.serializer.KryoSerializer",
"spark.kryoserializer.buffer.max": "1024m",
"spark.sql.autoBroadcastJoinThreshold": "60mb"
}
},
{
"Classification": "hive-site",
"Properties": {
"hive.metastore.client.factory.class": "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
}
},
{
"Classification": "spark-hive-site",
"Properties": {
"hive.metastore.client.factory.class": "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
}
},
{
"Classification": "spark",
"Properties": {
"maximizeResourceAllocation": "true"
}
}
]
# instance.json
[
{
"InstanceCount": 1,
"Name": "MASTER",
"InstanceGroupType": "MASTER",
"InstanceType": "r4.xlarge",
"BidPrice": "0.1"
},
{
"InstanceCount": 1,
"Name": "CORE",
"InstanceGroupType": "CORE",
"InstanceType": "r4.xlarge",
"BidPrice": "0.1",
"AutoScalingPolicy": {
"Constraints": {
"MinCapacity": 1,
"MaxCapacity": 100
},
"Rules": [
{
"Name": "Default-scale-out",
"Description": "Replicates the default scale-out rule in the console for YARN memory.",
"Action": {
"SimpleScalingPolicyConfiguration": {
"AdjustmentType": "CHANGE_IN_CAPACITY",
"ScalingAdjustment": 1,
"CoolDown": 300
}
},
"Trigger": {
"CloudWatchAlarmDefinition": {
"ComparisonOperator": "LESS_THAN",
"EvaluationPeriods": 1,
"MetricName": "YARNMemoryAvailablePercentage",
"Namespace": "AWS/ElasticMapReduce",
"Period": 300,
"Threshold": 15,
"Statistic": "AVERAGE",
"Unit": "PERCENT",
"Dimensions": [
{
"Key": "JobFlowId",
"Value": "${emr.clusterId}"
}
]
}
}
}
]
}
}
]
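The scale-out rule in instance.json adds one core node whenever the average YARNMemoryAvailablePercentage over a 300-second period is below 15, within the capacity bounds of 1 and 100. The sketch below only illustrates that arithmetic. `next_capacity` is a hypothetical helper, it assumes integer percentages, and it ignores the cooldown and the CloudWatch alarm evaluation, which EMR performs itself.

```shell
#!/bin/sh
# Hypothetical illustration of the rule's arithmetic; not how EMR is implemented.

# next_capacity CURRENT_NODES YARN_MEMORY_AVAILABLE_PCT
# Prints the core-node count after one evaluation of the scale-out rule.
next_capacity() {
    current=$1
    pct=$2
    min=1          # Constraints.MinCapacity
    max=100        # Constraints.MaxCapacity
    threshold=15   # Threshold, compared with LESS_THAN (strict)
    step=1         # ScalingAdjustment with CHANGE_IN_CAPACITY

    next=$current
    if [ "$pct" -lt "$threshold" ]; then
        next=$((current + step))
    fi
    # Clamp the result to the configured capacity bounds.
    if [ "$next" -gt "$max" ]; then next=$max; fi
    if [ "$next" -lt "$min" ]; then next=$min; fi
    echo "$next"
}
```

For example, `next_capacity 1 10` prints 2, while `next_capacity 1 15` prints 1 because the comparison is strict. The policy defines no scale-in rule, so the node count never decreases in this model.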
# Dockerfile
FROM python:3.10.6 AS app
ENV TZ=Asia/Seoul
RUN pip install awscli
WORKDIR /app
COPY static/ /app/static/
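# Run emr.sh and capture the JSON response
export AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID}
export AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY}
export AWS_DEFAULT_REGION="ap-northeast-2"
TAG="test-emr"
ENV=prd
CLASS=test
# The fourth argument is the comma-separated list handed to the Spark job.
# Note: step arguments are stored in plain text in the step definition, so the
# credentials passed here are visible to anyone who can describe the cluster's steps.
output=$( \
./batch/static/sh/emr.sh \
"${TAG}" \
"${ENV}" \
"${CLASS}" \
"${TAG},${AWS_ACCESS_KEY_ID},${AWS_SECRET_ACCESS_KEY},${AWS_DEFAULT_REGION},${BUCKET_NAME},${S3_PATH},${MAIL},${TSV_FIELDS},\"${QUERY}\"" \
)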
# Download a standalone jq binary and extract the cluster ID (-r prints it without quotes).
curl -L https://github.com/stedolan/jq/releases/download/jq-1.6/jq-linux64 -o ./jq
chmod a+x ./jq
cluster=$(echo "${output}" | ./jq -r ".ClusterId")
echo "${cluster}"
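If downloading a jq binary is undesirable, the cluster ID can also be pulled out of the response with standard tools. This is a minimal sketch that assumes the response contains a single `"ClusterId": "..."` pair, as `aws emr create-cluster` prints by default. `extract_cluster_id` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: reads create-cluster JSON on stdin and prints the ClusterId value.
# Assumes exactly one "ClusterId" key; it is a sed pattern match, not a general JSON parser.
extract_cluster_id() {
    # Join the input into one line, then capture the quoted value after "ClusterId".
    tr -d '\n' | sed -n 's/.*"ClusterId"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}
```

With the variables from the snippet above, `cluster=$(echo "${output}" | extract_cluster_id)` would set the same value. The ID can then be passed to `aws emr wait cluster-terminated --cluster-id "${cluster}"` to block until the auto-terminating cluster has finished.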
Launching an EMR Spark cluster with the AWS CLI
A record of the command. Because this was on Windows, ^ is used as the line-continuation character. aws emr create-cluster ^ --name "emr-spark-cluster" ^ --release-label emr-5.11.1 ^ --instance-groups ^ InstanceGroupType=MASTER,InstanceType=m4...
huzz.tistory.com
Configure Spark - Amazon EMR
The spark.decommissioning.timeout.threshold setting was added in Amazon EMR release version 5.11.0 to improve Spark resiliency when you use Spot Instances. In earlier release versions, when a node uses a Spot Instance...
docs.aws.amazon.com
https://stackoverflow.com/questions/70886684/how-to-use-java-runtime-11-in-emr-cluster-aws
How to use java runtime 11 in EMR cluster AWS
I'm creating a cluter in EMR aws and when spark runs my application I'm getting error below: Exception in thread "main" java.lang.UnsupportedClassVersionError: com/example/demodriver/
stackoverflow.com
Facing error while trying to create transient cluster on AWS emr to run Python script
I am new to aws and trying to create a transient cluster on AWS emr to run a Python script. I just want to run the python script that will process the file and auto terminate the cluster post compl...
stackoverflow.com
https://docs.aws.amazon.com/ko_kr/emr/latest/ReleaseGuide/emr-configure-apps.html
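Configure applications - Amazon EMR
Amazon EMR describe and list API operations emit custom and configurable settings, which are used as part of Amazon EMR job flows, in plain text. Do not insert sensitive information such as passwords into these settings...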
docs.aws.amazon.com
https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark-configure.html
Configure Spark - Amazon EMR
The spark.decommissioning.timeout.threshold setting was added in Amazon EMR release version 5.11.0 to improve Spark resiliency when you use Spot instances. In earlier release versions, when a node uses a Spot instance, and the instance is terminated becaus
docs.aws.amazon.com
https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-automatic-scaling.html
Using automatic scaling with a custom policy for instance groups - Amazon EMR
When you create a cluster that has an automatic scaling policy, you must use the --auto-scaling-role MyAutoScalingRole command to specify the IAM role for automatic scaling. The default role is EMR_AutoScaling_DefaultRole and can be created with the create
docs.aws.amazon.com
GitHub - WorksApplications/ansible_aws_emr: Unofficial Ansible module for Amazon EMR
Unofficial Ansible module for Amazon EMR. Contribute to WorksApplications/ansible_aws_emr development by creating an account on GitHub.
github.com
https://gist.github.com/tmusabbir/34fdab6bd30fd87bcdd69cf03f54090c
AWS CLI command to create EMR cluster with default auto-scaling task group
AWS CLI command to create EMR cluster with default auto-scaling task group - create-spark-cluster.sh
gist.github.com