Posts in the 'All categories' category (Page 31)

(redirect) http -> https https://blog.airbrake.io/blog/http-errors/303-see-other

Study 2023. 3. 30. 20:29

[athena] schema updated & data format

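Schema updates and data formats in Athena

Formats, in column order: CSV (with and without headers) and TSV; JSON; AVRO; PARQUET: read by name (default); PARQUET: read by index; ORC: read by index (default); ORC: read by name

Expected schema update type: Rename columns
Summary: Store your data in CSV and TSV, or in ORC and Parquet if they are read by index.
Support: Y N N N Y Y N

Expected schema update type: Add columns at the beginning or in the middle of the table
Summary: Store your data in JSON, AVRO, or in Parquet and ORC if they are read by name. Do not use CSV and TSV.
Support: N Y Y Y N N Y

Expected schema update type: Add columns at the end of the table
Summary: Store your data in CSV or TSV, JSON, AVRO, ORC, or Parquet.
Support: Y Y..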
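The two complete rows listed above can be encoded as a small lookup. This is only a sketch in Python: the names FORMATS, COMPAT, and supports are hypothetical, the Y/N values are copied from the rows shown, and the truncated "add columns at the end" row is deliberately left out.

```python
# Column order follows the format list in the summary above.
FORMATS = [
    "csv_tsv",
    "json",
    "avro",
    "parquet_by_name",
    "parquet_by_index",
    "orc_by_index",
    "orc_by_name",
]

# One character per format, in FORMATS order (hypothetical encoding).
COMPAT = {
    "rename_columns": "YNNNYYN",
    "add_columns_at_start_or_middle": "NYYYNNY",
}


def supports(update: str, fmt: str) -> bool:
    """Return True if the summary marks this update type as safe for fmt."""
    return COMPAT[update][FORMATS.index(fmt)] == "Y"


print(supports("rename_columns", "parquet_by_index"))         # True
print(supports("add_columns_at_start_or_middle", "csv_tsv"))  # False
```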

Study 2023. 3. 30. 20:26

[DBT] basic

https://velog.io/@hsh/DBT-Data-Build-Tool
DBT: Data Build Tool - a kind of systematic view system - `ELT`: Extract→Load→`Transform` (NOT `ETL`)
velog.io

https://kgw7401.tistory.com/72
Do you really need dbt? dbt definition / reasons to use it / why it is needed
🔎 Should you use dbt? While working on a data engineering project, I came across a tool called dbt. I heard it was, roughly, a tool that manages pipelines efficiently, and that I should try it once on this project
kgw7401.tistory.com

https://towardsdatascience.com/aws-athena-dbt-integration-4e1dce0d97fc
AWS ..

Study 2023. 3. 30. 20:21

[python] boto3_client

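import boto3
from botocore.config import Config

# Retry limit passed to botocore's retry configuration.
_MAX_ATTEMPTS = 15


def boto3_client(service_name):
    """Return a boto3 client for service_name that uses the "standard" retry mode."""
    config = Config(
        retries={
            "max_attempts": _MAX_ATTEMPTS,
            "mode": "standard",
        }
    )
    return boto3.client(service_name, config=config)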

Study 2023. 3. 17. 15:37

[hive] partition reload

d=db
t=table
p=partition

ALTER TABLE ${d}.${t} SET TBLPROPERTIES('EXTERNAL'='TRUE');
ALTER TABLE ${d}.${t} DROP PARTITION (${p} '');
MSCK REPAIR TABLE ${d}.${t};

https://118k.tistory.com/349
[Hive] Switching between managed and external tables
Hive tables come in two types: managed (MANAGED) and external (EXTERNAL). Dropping a managed table also deletes the files it manages, whereas with an external table the files are kept
118k.tistory.com

https://stackoverflow.com/questions/46307667..

Study 2023. 3. 16. 19:18

[windows] COPY /b headers.txt+data.txt result.txt

https://stackoverflow.com/questions/19750653/how-to-append-text-files-using-batch-files How to append text files using batch files How can I append file1 to file2, from a batch file? Text files and only using what is "standard" on windows. stackoverflow.com
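For comparison, the same byte-for-byte concatenation can be sketched in Python. The function name concat_files is hypothetical; it mirrors what `COPY /b headers.txt+data.txt result.txt` is used for in the title, appending each source in order in binary mode.

```python
import shutil


def concat_files(sources, destination):
    """Write every file in sources to destination, in order, without decoding."""
    with open(destination, "wb") as out:
        for src in sources:
            with open(src, "rb") as part:
                # Stream in chunks so large files are not loaded into memory.
                shutil.copyfileobj(part, out)
```

Calling `concat_files(["headers.txt", "data.txt"], "result.txt")` would then play the role of the batch command, assuming both input files exist.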

Study 2023. 3. 16. 00:48

[sh] csv string count

#!/bin/bash
# Write the number of commas on each line of the input file to the output file.
input=$1
output=$2

rm -f "$output"
n=0
while IFS= read -r line; do
    comma="${line//[^,]}"   # keep only the commas
    cnt="${#comma}"
    echo "Line No. ${n} : ${cnt}" >> "$output"
    n=$((n+1))
done < "$input"

var="text,text,text,text"
res="${var//[^,]}"
echo "$res"
echo "${#res}"

Output:
,,,
3

https://stackoverflow.com/questions/16679369/count-occurrences-of-a-char-in-a-string-using-bash
Count occurrences of a char in a string using Bash
I..
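The same per-line comma count can be sketched in Python, where `str.count` replaces the parameter-expansion trick. The names comma_counts and write_report are hypothetical, and the report format simply copies the "Line No. N : count" lines from the script above.

```python
def comma_counts(lines):
    """Return the number of commas in each line."""
    return [line.count(",") for line in lines]


def write_report(input_path, output_path):
    """Write one 'Line No. N : count' row per input line, numbering from 0."""
    with open(input_path, encoding="utf-8") as src, \
            open(output_path, "w", encoding="utf-8") as dst:
        for n, line in enumerate(src):
            dst.write(f"Line No. {n} : {line.count(',')}\n")


print(comma_counts(["text,text,text,text"]))  # [3]
```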

Study 2023. 3. 16. 00:45

[sh] sudo systemctl list-units

sudo systemctl list-units

Study 2023. 3. 14. 12:33

[jinja2] {{ asdf|join(", ") }}

https://stackoverflow.com/questions/2061439/string-concatenation-in-jinja String concatenation in Jinja I just want to loop through an existing list and make a comma delimited string out of it. Something like this: my_string = 'stuff, stuff, stuff, stuff' I already know about loop.last, I just need to stackoverflow.com

Study 2023. 3. 14. 02:34

[airflow] variables `{{ var.value.aa }}`

https://stackoverflow.com/questions/69048706/airflow-how-to-get-an-airflow-variable-inside-the-bash-command-in-bash-operato

Study 2023. 3. 14. 02:33

[spark] show(count, false); no_truncated=false

https://stackoverflow.com/questions/33742895/how-to-show-full-column-content-in-a-spark-dataframe

Study 2023. 3. 14. 02:32

[yarn] memory & core

yarn.nodemanager.resource.memory-mb yarn.nodemanager.resource.cpu-vcores yarn.scheduler.minimum-allocation-mb yarn.scheduler.maximum-allocation-mb yarn.scheduler.minimum-allocation-vcores yarn.scheduler.maximum-allocation-vcores https://wooono.tistory.com/145 [Spark] java.lang.IllegalArgumentException: Required executor memory (13312), overhead (2496 MB), and PySpark memory (0 MB) is a 우선 YARN R..
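The numbers in the error message above add up to the size of the container Spark asks YARN for, and the request has to fit within the scheduler's maximum allocation. A rough arithmetic sketch in Python follows; the function names are hypothetical, and the 12288 MB limit is a made-up value for illustration, not taken from the linked post.

```python
def executor_container_mb(executor_memory_mb, overhead_mb, pyspark_memory_mb=0):
    """Total memory requested per executor container, in MB."""
    return executor_memory_mb + overhead_mb + pyspark_memory_mb


def fits(request_mb, max_allocation_mb):
    """True if the request is within yarn.scheduler.maximum-allocation-mb."""
    return request_mb <= max_allocation_mb


# Values from the message: 13312 MB executor memory, 2496 MB overhead, 0 MB PySpark.
request = executor_container_mb(13312, 2496, 0)
print(request)               # 15808
print(fits(request, 12288))  # False (12288 is a hypothetical cluster limit)
```

Under this assumption, either the executor memory has to shrink or the maximum allocation (and the node manager memory behind it) has to grow until the request fits.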

Study 2023. 3. 14. 02:27

What the rich who live in truly wealthy neighborhoods have in common

Things that give good energy 2023. 3. 14. 02:15

[zeppelin] lifecyclemanager

https://jaemunbro.medium.com/zeppelin-%EB%8B%A4%EC%A4%91-interpreter-binding%EA%B3%BC-interpreter-timeout-ce7ad4c3312c
[Zeppelin] Configuring multiple interpreter binding and interpreter timeout
I operate Spark Zeppelin on EMR, and multiple users frequently come in and run jobs. What other settings are needed to run this kind of multi-tenant Zeppelin?
jaemunbro.medium.com

https://aws.amazon.com/ko/premiumsupport/knowledge-center/yarn-uses-resources-after..

Study 2023. 3. 12. 23:27

[spark] Spark throwing FileNotFoundException when overwriting dataframe on S3

https://stackoverflow.com/questions/37254681/spark-throwing-filenotfoundexception-when-overwriting-dataframe-on-s3 Spark throwing FileNotFoundException when overwriting dataframe on S3 I have partitioned parquet files stored on two locations on S3 in the same bucket: path1: s3n://bucket/a/ path2: s3n://bucket/b/ The data has the same structure. I want to read the files from the... stackoverflow...

Study 2023. 3. 12. 23:25

[sh] sleep 3 (3s); sleep 0.3 (0.3s)

https://devpouch.tistory.com/127
[bash] How to use the sleep function in Linux shell scripts
The sleep command in a Linux bash shell script temporarily pauses program execution. It can be used as follows.
sleep command usage:
sleep 1 # pause for 1 second
sleep 1s # pause for 1 second
devpouch.tistory.com

Study 2023. 3. 12. 23:23

[sh] python jq

pip3 install jq

# Read JSON from stdin and print the result of the jq expression given as $1.
parse() {
    key=$1
    python3 -c "
import sys
import jq
import json

data = json.load(sys.stdin)
output = jq.compile('$key').input(data).all()
if isinstance(output, list):
    # Join multiple results with spaces; str() keeps non-string results printable.
    output = ' '.join(map(str, output))
print(output)
"
}

name=$(aws emr describe-cluster --cluster-id "$id" | parse ".Cluster.Name")
echo "$name"

https://stackoverflow.com/questions/1955505/parsing-json-with-unix-tools?page=2&tab=..
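When only a plain key path such as `.Cluster.Name` is needed, the lookup can be sketched with the standard library alone, without the jq package. The function name get_path is hypothetical, it handles only dotted object keys (no filters, arrays, or pipes), and the sample JSON below is invented for illustration.

```python
import json


def get_path(document, path):
    """Follow a simple jq-style path such as '.Cluster.Name' through nested dicts."""
    value = document
    for key in path.strip(".").split("."):
        if key:  # a bare "." yields an empty key and returns the whole document
            value = value[key]
    return value


# Invented sample shaped like a describe-cluster response.
sample = json.loads('{"Cluster": {"Name": "demo-cluster", "Id": "j-EXAMPLE"}}')
print(get_path(sample, ".Cluster.Name"))  # demo-cluster
```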

Study 2023. 3. 12. 23:22

[spark] get yarn application id

fun id(): String {
    return make()
        .sparkContext()
        .applicationId()
}

https://knight76.tistory.com/entry/YARN%EC%97%90-%EB%B0%B0%ED%8F%AC%EB%90%9C-Spark-%EC%95%A0%ED%94%8C%EB%A6%AC%EC%BC%80%EC%9D%B4%EC%85%98%EC%9D%98-Application-ID-%EC%96%BB%EA%B8%B0
Getting the Application ID of a Spark application deployed to YARN
How to get applicationId of Spark application deployed to YARN in ...

https://spark.apache.org/docs/2.3.0/a..

Study 2023. 3. 10. 04:12

[MySQL] ALTER TABLE table ADD COLUMN asdf INT(10) FIRST

ALTER TABLE EMP_DTLS MODIFY COLUMN EMP_ID INT(10) FIRST;
ALTER TABLE EMP_DTLS MODIFY COLUMN EMP_ID INT(10) AFTER id;

https://stackoverflow.com/questions/20179801/place-an-existing-column-at-first-position-in-mysql
place an existing column at first position in mysql
please tell me how to place an existing column(contained values) at first position in mysql. Suppose i have a table EMP_DTLS and there..

Study 2023. 3. 10. 04:02

[kotlin] reduce & fold

val numbers = emptyList<Int>()

// fold starts from the initial value, so an empty list is fine.
val sumFromTen = numbers.fold(10) { total, num -> total + num }
println("folded: $sumFromTen") // folded: 10

// reduce uses the first element as the start value, so an empty list throws.
val sum = numbers.reduce { total, num -> total + num }
println("reduced: $sum")

Output:
folded: 10
Empty collection can't be reduced.
java.lang.UnsupportedOperationException: Empty collection can't be reduced.
    at kr.leocat.test.FoldTest.test(FoldTest.kt:35)
    ...

https://b..
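Python's `functools.reduce` shows the same distinction, which may help as a cross-check: with an initial value it behaves like the fold above, and without one it fails on an empty sequence. This is an analogy sketched in Python, not part of the original Kotlin test; the variable names are chosen to mirror it.

```python
from functools import reduce

numbers = []

# With an initial value (like Kotlin's fold), an empty list just returns it.
sum_from_ten = reduce(lambda total, num: total + num, numbers, 10)
print(f"folded: {sum_from_ten}")  # folded: 10

# Without an initial value (like Kotlin's reduce), an empty list raises TypeError.
try:
    total = reduce(lambda total, num: total + num, numbers)
    print(f"reduced: {total}")
except TypeError as exc:
    print(f"reduce failed: {exc}")
```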

Study 2023. 3. 10. 03:59

Anyone who stops learning is old, whether at twenty or eighty.
