data -> split chunk -> loop
https://www.ibm.com/support/pages/spark-dirver-reported-outofmemoryerror - IBM Support: Spark driver reported OutOfMemoryError
DROP VIEW [ IF EXISTS ] view_identifier
https://spark.apache.org/docs/3.0.0-preview2/sql-ref-syntax-ddl-drop-view.html - DROP VIEW, Spark 3.0.0-preview2 documentation
https://eyeballs.tistory.com/245 - [Spark3] Adaptive Query Execution. A (rough) Korean summary of the Databricks post "Adaptive Query Execution: Speeding Up Spark SQL at Runtime": https://databricks.com/blog/2020/05/29/adaptive-query-execution-speeding-up-spark-sql-at-runtime.html
https://stackoverflow.com/questions/52058565/spark-sql-cbo-enabled-true-with-hive-table - spark.sql.cbo.enabled=true with Hive table. In Spark 2.2 the Cost-Based Optimizer option was added; the documentation appears to say the tables need to be analyzed in Spark before enabling it.
https://towardsdatascience.com/demystifying-joins-in-apache-spark-38589701a88e - Demystifying Joins in Apache Spark: an overall perspective on the Join operation in Spark.
https://yeo0.tistory.com/entry/Spark-BroadCast-Hash-JoinBHJ-Shuffle-Sort-Merge-JoinSMJ - [Spark] Broadcast Hash Join (BHJ) / Shuffle Sort Merge Join (SMJ)
https://stackoverflow.com/questions/60645256/how-do-you-get-batches-of-rows-from-spark-using-pyspark - How do you get batches of rows from Spark using pyspark? The asker has an RDD of over 6 billion rows to feed to train_on_batch, cannot fit them all into memory, and wants roughly 10K rows at a time.
https://www.tabnine.co..
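A minimal sketch of that batching pattern. It assumes rows arrive from an iterator such as the one returned by PySpark's `df.toLocalIterator()`, which streams rows to the driver instead of collecting them all; here it is demonstrated on a plain `range` so it runs without Spark:

```python
from itertools import islice

def batched(rows, batch_size):
    """Yield lists of up to batch_size items from any iterator.

    With a real DataFrame you would pass df.toLocalIterator() as `rows`,
    so only one batch needs to be materialized in driver memory at a time.
    """
    it = iter(rows)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

print(list(batched(range(7), 3)))  # → [[0, 1, 2], [3, 4, 5], [6]]
```

Each yielded batch could then be fed to `train_on_batch` in turn.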
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", -1)
sql("select * from table_withNull where id not in (select id from tblA_NoNull)").explain(true)
With not exists instead of not in, the query runs with a SortMergeJoin.
https://www.bigdatainrealworld.com/how-does-broadcast-nested-loop-join-work-in-spark/ - How does Broadcast Nested Loop Join work in Spark? It works by broadcasting one of the datasets to all executors.
object RestUtil : Loggable {
    const val RETRIES = 3
    const val TIMEOUT = 5 * 60 * 1000
    private const val MAX_BODY_SIZE = 0
    private const val IGNORE_CONTENT_TYPE = true

    fun connection(
        url: String,
        json: String,
        headers: Map<String, String> = emptyMap(),   // type arguments restored; bare Map is not valid Kotlin
        data: Map<String, String>? = emptyMap(),
        timeout: Int? = TIMEOUT
    ): Connection {
        var connection = Jsoup.connect(url)
        headers.forEach { connection = connection.header(it.key, it.value) }
        // ... (rest of snippet truncated in the original note)
I would recommend string if at all possible - it is very handy not to be limited by a length specifier. Even if the incoming data is only varchar(30) in length, your ELT/ETL processing will not fail if you send in 31 characters while using a string datatype.
https://community.cloudera.com/t5/Support-Questions/Hive-STRING-vs-VARCHAR-Performance/m-p/157939 - Hive STRING vs VARCHAR Performance
sudo yum install java-11-amazon-corretto
https://docs.aws.amazon.com/ko_kr/corretto/latest/corretto-11-ug/amazon-linux-install.html - Amazon Corretto 11 installation instructions
https://stackoverflow.com/questions/68878925/in-spark-how-to-check-the-date-format - In Spark, how to check the date format? How can we check the date format in the code below?
DF = DF.withColumn("DATE", to_date(trim(col("DATE")), "yyyyMMdd"))
Error: Caused by: java.time.format.
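The format check itself can be sketched in plain Python before the data ever reaches to_date (Java's yyyyMMdd pattern corresponds to %Y%m%d in strptime; the function name here is illustrative):

```python
from datetime import datetime

def is_yyyymmdd(s: str) -> bool:
    """True if s (after trimming) parses with the yyyyMMdd pattern."""
    try:
        datetime.strptime(s.strip(), "%Y%m%d")
        return True
    except ValueError:
        return False

print(is_yyyymmdd("20240229"), is_yyyymmdd("2024-02-29"))  # → True False
```

Rows failing this predicate are the ones that would make Spark's strict date parser throw, so they can be filtered or logged up front.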
val semaphore = Semaphore(5)  // kotlinx.coroutines.sync.Semaphore
coroutineScope {
    list.map {
        async {
            semaphore.withPermit {  // acquires and releases the permit even if logic() throws
                // logic(it)
            }
        }
    }.awaitAll()
}
https://stackoverflow.com/questions/55877419/how-to-launch-10-coroutines-in-for-loop-and-wait-until-all-of-them-finish/75569716#75569716 - how to launch 10 coroutines in for loop and wait until all of them finish? Fill a list of objects from the DB with a bounded number of coroutines running at once.
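For comparison, the same bounded-concurrency pattern sketched with Python's asyncio (the names and the limit of 5 are illustrative, not from the original note):

```python
import asyncio

async def run_bounded(items, limit=5):
    sem = asyncio.Semaphore(limit)

    async def worker(x):
        # `async with` releases the permit even if the work throws,
        # the same guarantee withPermit gives on the Kotlin side.
        async with sem:
            await asyncio.sleep(0)  # stand-in for real per-item work
            return x * 2

    # gather() awaits all workers and preserves input order.
    return await asyncio.gather(*(worker(x) for x in items))

print(asyncio.run(run_bounded([1, 2, 3])))  # → [2, 4, 6]
```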
> SELECT months_between('1997-02-28 10:30:00', '1996-10-30');
 3.94959677
> SELECT months_between('1997-02-28 10:30:00', '1996-10-30', false);
 3.9495967741935485
https://docs.databricks.com/sql/language-manual/functions/months_between.html - months_between function, Databricks on AWS. Returns the number of months elapsed between the dates or timestamps expr1 and expr2, as a DOUBLE.
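The 3.94959677 above can be reproduced by hand: the documented semantics treat a month as 31 days for the fractional part, return a whole number when the days of month match or both dates are month-ends, and round to 8 decimal places unless roundOff is false. A plain-Python sketch of my reading of those rules (not Spark's actual code):

```python
import calendar
from datetime import datetime

SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_MONTH = 31 * SECONDS_PER_DAY  # fractional part assumes 31-day months

def months_between(ts1: datetime, ts2: datetime, round_off: bool = True) -> float:
    month_diff = (ts1.year - ts2.year) * 12 + (ts1.month - ts2.month)
    last1 = ts1.day == calendar.monthrange(ts1.year, ts1.month)[1]
    last2 = ts2.day == calendar.monthrange(ts2.year, ts2.month)[1]
    # Whole months when the days of month match, or both are the last day.
    if ts1.day == ts2.day or (last1 and last2):
        return float(month_diff)
    secs1 = ts1.hour * 3600 + ts1.minute * 60 + ts1.second
    secs2 = ts2.hour * 3600 + ts2.minute * 60 + ts2.second
    seconds_diff = (ts1.day - ts2.day) * SECONDS_PER_DAY + secs1 - secs2
    result = month_diff + seconds_diff / SECONDS_PER_MONTH
    return round(result, 8) if round_off else result

print(months_between(datetime(1997, 2, 28, 10, 30), datetime(1996, 10, 30)))
# → 3.94959677, matching the SQL example above
```

Here the raw value is 4 months minus 135000 seconds / 2678400 seconds-per-month, i.e. 3.9495967741935485 before rounding.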
suspend fun main() {
    val job = GlobalScope.launch { hello() }
    job.join()
    print("done")
}

// or, with a blocking bridge instead of a suspend main:
fun main() = runBlocking {
    val job = launch { hello() }
    job.join()
    print("done")
}
https://stackoverflow.com/questions/55904099/how-to-wait-for-all-coroutines-to-finish - How to wait for all coroutines to finish? Launch a coroutine and have it finish before the main thread resumes.
val map = mapOf(Pair("c", 3), Pair("b", 2), Pair("d", 1))
val sorted = map.toSortedMap()
println(sorted.keys)   // [b, c, d]
println(sorted.values) // [2, 3, 1]
https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.collections/to-sorted-map.html - toSortedMap, Kotlin standard library
https://hwan-shell.tistory.com/244 - [Kotlin] for and while loops. Kotlin for loops can be written in several ways, e.g. for (i: Int in 1..10) print("$i ") or, with a bound, val len: Int = 5; for (i in 1..len) print("$i ").
https://stackoverflow.com/questions/49214684/ignore-loop-constant-in-for-loop - Ignore loop constant in for loop
127.0.0.1 HostName.local
127.0.0.1 localhost hostname
https://itholic.github.io/etc-sparkdriver-retires-err/ - [spark] fixing the "Service 'sparkDriver' failed after 16 retries (on a random free port)!" error
--conf spark.rpc.message.maxSize=2047
https://stackoverflow.com/questions/54458815/pyspark-serialized-task-exceeds-max-allowed-consider-increasing-spark-rpc-mess - Pyspark: Serialized task exceeds max allowed. Consider increasing spark.rpc.message.maxSize or using broadcast variables for large values. Seen when requesting summary statistics on a Spark dataframe on a cluster.
https://christinarok.github.io/2021/04/08/mecab.html - PYTHON: installing and using the mecab morphological analyzer (on linux, with python). Morphological analysis is an essential preprocessing step for all NLP: from simple word2vec models to heavy transformer-based models (notably BERT), input sentences must first be split into small units.
function requestUtils(method, url, payload) {
    var xhr = new XMLHttpRequest();
    xhr.open(method, url, true);
    xhr.withCredentials = true;
    xhr.setRequestHeader("Content-Type", "application/json");
    xhr.onreadystatechange = function() {
        if (this.readyState === XMLHttpRequest.DONE && this.status === 200) {
            console.log(this);
            console.log(url, payload);
        }
    };
    xhr.send(payload);
}

var NAME = "spark-seunggab..
https://github.com/awslabs/migration-hadoop-to-emr-tco-simulator - GitHub: awslabs/migration-hadoop-to-emr-tco-simulator
https://inpa.tistory.com/entry/AWS-%F0%9F%93%9A-Glue-Crawler%EB%A1%9C-%ED%85%8C%EC%9D%B4%EB%B8%94-%EB%A7%8C%EB%93%A4%EA%B3%A0-Athena%EB%A1%9C-%EC%A1%B0%ED%9A%8C%ED%95%98%EA%B8%B0 - [AWS] Creating a table with a Glue Crawler and querying it with Athena. The previous post uploaded a csv to S3 and built the table manually with a query in Athena; this one generates the S3 schema with a Glue Crawler instead.
https://docs.aws.amazon.com/ko_kr/gl..
https://stackoverflow.com/questions/38709280/how-to-limit-the-number-of-retries-on-spark-job-failure - How to limit the number of retries on Spark job failure? A job run via spark-submit is re-submitted on failure; how can attempt #2 be prevented after a YARN container failure?
https://unix.stackexchange.com/questions/402750/modify-global-variable-in-while-loop - Modify global variable in while loop. A script counts files in a folder:
i=1
find tmp -type f | while read x
do
    i=$(($i + 1))
    echo $i
done
echo $i
The final $i is always 1: the pipe runs the while loop in a subshell, so the outer shell's variable is never updated.
https://skylit.tistory.com/321 - Using for loops in the Bash shell