https://velog.io/@hsh/DBT-Data-Build-Tool - DBT: Data Build Tool. Essentially a systematic view system; the model is `ELT` (Extract -> Load -> `Transform`), not `ETL`.
https://kgw7401.tistory.com/72 - Should you use dbt? Definition, reasons to use it, and whether it is really necessary; written after hearing dbt described as a tool for managing pipelines efficiently and trying it on a data engineering project.
https://towardsdatascience.com/aws-athena-dbt-integration-4e1dce0d97fc - AWS Athena + dbt integration.
```sql
-- d = database, t = table, p = partition spec
ALTER TABLE ${d}.${t} SET TBLPROPERTIES ('EXTERNAL'='TRUE');
ALTER TABLE ${d}.${t} DROP PARTITION (${p}='');
MSCK REPAIR TABLE ${d}.${t};
```

https://118k.tistory.com/349 - [Hive] Converting between managed and external tables. Hive has MANAGED and EXTERNAL table types: dropping a managed table also deletes the underlying files, while dropping an external table keeps them.
https://stackoverflow.com/questions/46307667..
https://stackoverflow.com/questions/19750653/how-to-append-text-files-using-batch-files - How to append text files using batch files. How can I append file1 to file2 from a batch file, using only what is standard on Windows?
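The standard-tools answer to that question is the built-in `type` command with append redirection; a minimal sketch (the file names are illustrative):

```bat
type file1.txt >> file2.txt
```

`>>` appends rather than overwrites, so running the command twice appends file1 twice.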
```bash
#!/bin/bash
# Count the commas on each line of $input, writing one result line per input line.
input=$1
output=$2

rm -f "$output"
n=0
while IFS= read -r line; do
  commas="${line//[^,]}"            # delete every character that is not a comma
  echo "Line No. ${n} : ${#commas}" >> "$output"
  n=$((n+1))
done < "$input"
```

The trick in isolation:

```bash
var="text,text,text,text"
res="${var//[^,]}"
echo "$res"     # ,,,
echo "${#res}"  # 3
```

https://stackoverflow.com/questions/16679369/count-occurrences-of-a-char-in-a-string-using-bash - Count occurrences of a char in a string using Bash.
sudo systemctl list-units
https://stackoverflow.com/questions/2061439/string-concatenation-in-jinja - String concatenation in Jinja. Loop through an existing list and make a comma-delimited string out of it (`my_string = 'stuff, stuff, stuff, stuff'`), ideally without hand-rolling it with `loop.last`.
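The usual answer there is the `join` filter rather than manual concatenation in a loop; a minimal template sketch (assuming a list variable named `my_list`):

```jinja
{{ my_list | join(', ') }}
{# ['stuff', 'stuff', 'stuff'] renders as: stuff, stuff, stuff #}
```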
- yarn.nodemanager.resource.memory-mb
- yarn.nodemanager.resource.cpu-vcores
- yarn.scheduler.minimum-allocation-mb
- yarn.scheduler.maximum-allocation-mb
- yarn.scheduler.minimum-allocation-vcores
- yarn.scheduler.maximum-allocation-vcores

https://wooono.tistory.com/145 - [Spark] java.lang.IllegalArgumentException: Required executor memory (13312), overhead (2496 MB), and PySpark memory (0 MB) is a.. - start by checking the YARN resource settings above.
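These knobs live in yarn-site.xml; a hedged sketch with placeholder values (the numbers are illustrative, not recommendations):

```xml
<!-- yarn-site.xml (illustrative values) -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>16384</value> <!-- total memory YARN may allocate on each node -->
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>16384</value> <!-- ceiling for one container: executor memory + overhead must fit under this -->
</property>
```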
https://jaemunbro.medium.com/zeppelin-%EB%8B%A4%EC%A4%91-interpreter-binding%EA%B3%BC-interpreter-timeout-ce7ad4c3312c - [Zeppelin] Setting up multiple interpreter bindings and interpreter timeouts. Useful extra settings for operating a multi-tenant Zeppelin on EMR Spark where many users frequently run jobs at once.
https://aws.amazon.com/ko/premiumsupport/knowledge-center/yarn-uses-resources-after..
https://stackoverflow.com/questions/37254681/spark-throwing-filenotfoundexception-when-overwriting-dataframe-on-s3 - Spark throwing FileNotFoundException when overwriting a dataframe on S3. Partitioned parquet files with the same structure live at s3n://bucket/a/ and s3n://bucket/b/ in the same bucket.
```bash
pip3 install jq

# Read JSON from stdin and evaluate a jq expression against it,
# via the Python jq bindings.
parse() {
  key=$1
  python3 -c "
import sys
import jq
import json

doc = json.load(sys.stdin)
output = jq.compile('$key').input(doc).all()
if isinstance(output, list):
    output = ' '.join(output)
print(output)
"
}

name=$(aws emr describe-cluster --cluster-id "$id" | parse ".Cluster.Name")
echo "$name"
```

https://stackoverflow.com/questions/1955505/parsing-json-with-unix-tools?page=2&tab=..
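If the jq command-line tool itself is installed, the same lookup needs no Python at all; a sketch under that assumption (still requires the aws CLI and a valid `$id`):

```shell
aws emr describe-cluster --cluster-id "$id" | jq -r '.Cluster.Name'
```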
```kotlin
// Get the YARN application ID of the current Spark session.
fun id(): String {
    return make()
        .sparkContext()
        .applicationId()
}
```

https://knight76.tistory.com/entry/YARN%EC%97%90-%EB%B0%B0%ED%8F%AC%EB%90%9C-Spark-%EC%95%A0%ED%94%8C%EB%A6%AC%EC%BC%80%EC%9D%B4%EC%85%98%EC%9D%98-Application-ID-%EC%96%BB%EA%B8%B0 - Getting the Application ID of a Spark application deployed to YARN.
https://spark.apache.org/docs/2.3.0/a..
```sql
-- Move an existing column to the first position, or after another column.
ALTER TABLE EMP_DTLS MODIFY COLUMN EMP_ID INT(10) FIRST;
ALTER TABLE EMP_DTLS MODIFY COLUMN EMP_ID INT(10) AFTER id;
```

https://stackoverflow.com/questions/20179801/place-an-existing-column-at-first-position-in-mysql - Place an existing column (with values) at the first position in MySQL, e.g. for a table EMP_DTLS.
```kotlin
val numbers = emptyList<Int>()

// fold takes an initial value, so it works on an empty collection.
val sumFromTen = numbers.fold(10) { total, num -> total + num }
println("folded: $sumFromTen") // folded: 10

// reduce has no initial value; on an empty collection it throws.
val sum = numbers.reduce { total, num -> total + num }
println("reduced: $sum")
// java.lang.UnsupportedOperationException: Empty collection can't be reduced.
//     at kr.leocat.test.FoldTest.test(FoldTest.kt:35)
```

https://b..
```kotlin
@Test
fun toNull() {
    // given
    data class Person(
        val name: String?,
        val job: String?,
        val age: Int
    )

    val spark = SparkUtil.make()
    val data = spark.createDataFrame(
        mutableListOf(
            Person("null", "a", 25),
            Person("Bob", "null", 30),
            Person("null", "null", 35)
        ),
        Person::class.java
    ).toDF()

    val expected = spark.createDataFrame(
        mutableListOf(
            Person(null, "a", 25),
            Person("Bob", null, 30),
            Person(n..
```
https://json8.tistory.com/177 - [Android] Fixing the uses-sdk:minSdkVersion declared in library error. Cause: a library not supported on Android SDK 11. Fix: change minSdkVersion in build.gradle (appcompat-v7:26.1.0 requires min SDK 14); the old setting was minSdkVersion 11 and the error appears during manifest merging.
https://progdev.tistory.com/50 - Where flutter.minSdkVersion and flutter.targetSdkVersion are declared (inside defaultConfig).
https://stackoverflow.com/questions/59958294/how-do-i-execute-terraform-actions-without-the-interactive-prompt - How do I execute Terraform actions without the interactive prompt? Running terraform apply asks "Do you want to perform these actions?" and only 'yes' is accepted.
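The usual answer there is the `-auto-approve` flag, which skips the confirmation prompt; a sketch:

```shell
# Apply without the interactive "Do you want to perform these actions?" prompt.
terraform apply -auto-approve
```

The same flag works for `terraform destroy`; CI setups often also pass `-input=false` so Terraform never waits for input.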
https://aws.amazon.com/ko/blogs/big-data/best-practices-for-successfully-managing-memory-for-apache-spark-applications-on-amazon-emr/ - Best practices for successfully managing memory for Apache Spark applications on Amazon EMR (post reviewed for accuracy May 2022).
To avoid a driver OutOfMemoryError: data -> split into chunks -> loop over the chunks.

https://www.ibm.com/support/pages/spark-dirver-reported-outofmemoryerror - Spark driver reported OutOfMemoryError.
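The split-into-chunks-and-loop idea can be sketched in plain Python (the generator and chunk size are illustrative, not tied to any particular Spark API):

```python
def chunks(rows, size):
    """Yield successive lists of at most `size` items from an iterable."""
    buf = []
    for row in rows:
        buf.append(row)
        if len(buf) == size:
            yield buf
            buf = []
    if buf:          # trailing partial chunk
        yield buf

# Process a large sequence without materializing it all at once.
for batch in chunks(range(10), 4):
    print(batch)    # [0, 1, 2, 3] then [4, 5, 6, 7] then [8, 9]
```

Because `chunks` is a generator, only one batch is held in memory at a time, which is the whole point of the pattern.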
```sql
DROP VIEW [ IF EXISTS ] view_identifier
```

https://spark.apache.org/docs/3.0.0-preview2/sql-ref-syntax-ddl-drop-view.html - DROP VIEW - Spark 3.0.0-preview2 documentation.
https://eyeballs.tistory.com/245 - [Spark3] Adaptive Query Execution. A (self-described rough) translation based on the Databricks post "Adaptive Query Execution: Speeding Up Spark SQL at Runtime" (https://databricks.com/blog/2020/05/29/adaptive-query-execution-speeding-up-spark-sql-at-runtime.html).
https://stackoverflow.com/questions/52058565/spark-sql-cbo-enabled-true-with-hive-table - spark.sql.cbo.enabled=true with a Hive table. Spark 2.2 added the Cost Based Optimizer option; the documentation says tables must be analyzed in Spark before enabling it.
https://towardsdatascience.com/demystifying-joins-in-apache-spark-38589701a88e - Demystifying Joins in Apache Spark. An overall perspective on the foundations of the Join operation in Spark.
https://yeo0.tistory.com/entry/Spark-BroadCast-Hash-JoinBHJ-Shuffle-Sort-Merge-JoinSMJ - [Spark] Broadcast Hash Join (BHJ) / Shuffle Sort Merge Join (SMJ).
https://stackoverflow.com/questions/60645256/how-do-you-get-batches-of-rows-from-spark-using-pyspark - How do you get batches of rows from Spark using pyspark. An RDD of over 6 billion rows will not fit in memory for train_on_batch, so the question is how to pull roughly 10K rows at a time.
https://www.tabnine.co..
```scala
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", -1)
sql("select * from table_withNull where id not in (select id from tblA_NoNull)").explain(true)
```

With `not exists` instead of `not in`, the query runs with a SortMergeJoin.

https://www.bigdatainrealworld.com/how-does-broadcast-nested-loop-join-work-in-spark/ - How does Broadcast Nested Loop Join work in Spark? It works by broadcasting one of the e..