Anyone who stops learning is old, whether at twenty or at eighty.

[Hive] ORDER BY, DISTRIBUTE BY, SORT BY, CLUSTER BY

https://knight76.tistory.com/entry/hive [hive] Sorting keywords - order by, sort by, cluster by, distribute by https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SortBy Introduces the sorting keywords used in Hive. * ORDER BY (ASC|DESC): similar to the ORDER BY clause in an RDBMS. When an ORDER BY statement runs, a single.. knight76.tistory.com
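The difference between these keywords is easier to see in a small simulation. The sketch below is plain Python, not Hive: `num_reducers` and the helper names are hypothetical, and integer keys are assigned to reducers with a simple modulo instead of Hive's real hash. It only mimics the documented behavior: ORDER BY gives one global order, DISTRIBUTE BY sends equal keys to the same reducer, SORT BY orders rows within each reducer, and CLUSTER BY combines the last two on the same column.

```python
from collections import defaultdict

def order_by(rows, key):
    # ORDER BY: a single reducer sees every row, so the result is totally ordered.
    return sorted(rows, key=key)

def distribute_by(rows, key, num_reducers):
    # DISTRIBUTE BY: rows with the same key always land on the same reducer.
    # Assumption: integer keys and a plain modulo stand in for Hive's hashing.
    reducers = defaultdict(list)
    for row in rows:
        reducers[key(row) % num_reducers].append(row)
    return [reducers[i] for i in range(num_reducers)]

def sort_by(partitions, key):
    # SORT BY: each reducer sorts only its own rows; there is no global order.
    return [sorted(part, key=key) for part in partitions]

def cluster_by(rows, key, num_reducers):
    # CLUSTER BY x behaves like DISTRIBUTE BY x followed by SORT BY x.
    return sort_by(distribute_by(rows, key, num_reducers), key)

rows = [5, 2, 8, 1, 4, 7]
total = order_by(rows, key=lambda r: r)
clustered = cluster_by(rows, key=lambda r: r, num_reducers=2)
```

Here `total` is one sorted list, while `clustered` holds one sorted list per simulated reducer, which is why SORT BY and CLUSTER BY scale better but do not guarantee a global order.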

Study 2020. 12. 13. 01:06

[Python] thread

https://monkey3199.github.io/develop/python/2018/12/04/python-pararrel.html Nathan's Blog The blog to learn more. monkey3199.github.io
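The linked post is not reproduced here, so the following is only a generic standard-library sketch of running work on threads. The `work` function is a made-up stand-in for an I/O-bound task; `ThreadPoolExecutor` is used because it handles thread start-up and joining.

```python
from concurrent.futures import ThreadPoolExecutor
import threading

def work(n):
    # Stand-in for an I/O-bound task: returns the input squared together
    # with the name of the thread that ran it.
    return n * n, threading.current_thread().name

with ThreadPoolExecutor(max_workers=4) as pool:
    # map() yields results in input order even though tasks run concurrently.
    results = list(pool.map(work, range(5)))

squares = [value for value, _ in results]
```

Threads mainly help with I/O-bound work in CPython; for CPU-bound work, `ProcessPoolExecutor` exposes the same interface.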

Study 2020. 12. 13. 01:04

[Pyspark] df.repartition(10).write.format('jdbc')...

(
    df.repartition(10)  # number of concurrent connections from Spark to PostgreSQL
    .write.format('jdbc')
    .options(
        url=psql_url_spark,
        driver=spark_env['PSQL_DRIVER'],
        dbtable="{schema}.{table}".format(schema=schema, table=table),
        user=spark_env['PSQL_USER'],
        password=spark_env['PSQL_PASS'],
        batchsize=2000000,
        queryTimeout=690,
    )
    .mode(mode)
    .save()
)
https://stackoverflow.com/questions/58676909/how-to-speed-up-..

Study 2020. 12. 13. 01:00

[Spark] java.util.concurrent.TimeoutException: Futures timed out after [300 seconds]

from pyspark.sql import SparkSession

# Raise the broadcast timeout (in seconds) from the 300-second default.
spark = (
    SparkSession.builder
    .appName("Your App")
    .config("spark.sql.broadcastTimeout", "36000")
    .getOrCreate()
)
https://stackoverflow.com/questions/41123846/why-does-join-fail-with-java-util-concurrent-timeoutexception-futures-timed-ou Why does join fail with "java.util.concurrent.TimeoutException: Futures timed out after [300 seconds]"? I am using Spark 1.5. I have two dataframes of the form: ..

Study 2020. 12. 13. 00:55

[Spark] sort other columns

https://stackoverflow.com/questions/43415974/sort-array-order-by-a-different-column-hive sort_array order by a different column, Hive I have two columns, one of products, and one of the dates they were bought. I am able to order the dates by applying the sort_array(dates) function, but I want to be able to sort_array(products) by... stackoverflow.com
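The underlying idea, pairing each product with its date before sorting, can be sketched in plain Python. The data below is made up, and mapping this onto Spark SQL or Hive (typically by sorting an array of structs) is an assumption here, not a summary of the linked answer.

```python
dates = ["2020-03-01", "2020-01-15", "2020-02-10"]
products = ["keyboard", "mouse", "monitor"]

# Pair each product with its date, sort the pairs by date,
# then keep only the product from each pair.
pairs = sorted(zip(dates, products))
products_by_date = [product for _, product in pairs]
```

Because tuples compare element by element, the date decides the order and the product only breaks ties.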

Study 2020. 12. 13. 00:53

[Deploy] rolling, blue green, canary

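https://reference-m1.tistory.com/211 Types of deployment strategies (rolling / canary / blue-green) These days the trend leans heavily toward MSA architectures. In line with this trend, deployment strategies have also been developed and have evolved in various ways. 1. Rolling: the ordinary form of deployment, in which servers are simply.. reference-m1.tistory.com

Study 2020. 12. 13. 00:51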

[Spark] keytab & principal

spark-submit \
  --master spark://Spark master_url \
  --conf spark.yarn.keytab=path_to_keytab \
  --conf spark.yarn.principal=principal@REALM.COM \
  --class main-class application-jar hdfs://namenode:9000/path/to/input
https://www.ibm.com/support/knowledgecenter/SSZU2E_2.3.0/managing_cluster/kerberos_hdfs_keytab.html

Study 2020. 12. 13. 00:50

[Java] new ArrayList<>(Arrays.asList("a", "b", "c"))

http://dveamer.github.io/backend/InitializingJavaVariable.html Dveamer The homepage of a dreamer who lives in reality but cannot give up dreaming of ideals. It is mainly a personal record, with posts about daily life and programming. dveamer.github.io

Study 2020. 12. 13. 00:46

[Spark] hadoop.fs.default.name (default file system)

SparkContext context = new SparkContext(new SparkConf().setAppName("spark-ml").setMaster("local[*]")
    .set("spark.hadoop.fs.default.name", "hdfs://localhost:54310")
    .set("spark.hadoop.fs.defaultFS", "hdfs://localhost:54310")
    .set("spark.hadoop.fs.hdfs.impl", org.apache.hadoop.hdfs.DistributedFileSystem.class.getName())
    .set("spark.hadoop.fs.hdfs.server", org.apache.hadoop.hdfs.server.namenode.Name..

Study 2020. 12. 13. 00:46

[Spark] hive dynamic partition

spark-shell \
  --conf "spark.hadoop.hive.exec.dynamic.partition=true" \
  --conf "spark.hadoop.hive.exec.dynamic.partition.mode=nonstrict" \
  ...
https://stackoverflow.com/questions/58633753/ignoring-non-spark-config-property-hive-exec-dynamic-partition-mode

Study 2020. 12. 13. 00:43

Is this world a simulation? - 1분과학

Things that give good energy 2020. 12. 13. 00:42

[Python] dict to string

import json
json.dumps(d)  # d is the dictionary to serialize; avoid naming it dict, which shadows the built-in
https://stackoverflow.com/questions/4547274/convert-a-python-dict-to-a-string-and-back Convert a python dict to a string and back I am writing a program that stores data in a dictionary object, but this data needs to be saved at some point during the program execution and loaded back into the dictionary object when the progra... stackoverflow.com
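A minimal round trip, with made-up sample data, shows both directions: `json.dumps` turns the dictionary into a string and `json.loads` turns it back.

```python
import json

data = {"name": "spark", "partitions": 10, "tags": ["etl", "jdbc"]}

text = json.dumps(data)       # dict -> JSON string
restored = json.loads(text)   # JSON string -> dict
```

The round trip is only lossless for JSON-compatible values; for example, integer keys come back as strings.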

Study 2020. 12. 10. 11:44

[Python] call method by name of function

import foo
method_to_call = getattr(foo, 'bar')
result = method_to_call()
# or, in one line:
result = getattr(foo, 'bar')()
https://stackoverflow.com/questions/3061/calling-a-function-of-a-module-by-using-its-name-a-string Calling a function of a module by using its name (a string) What is the best way to go about calling a function given a string with the function's name in a Python program. For example, let's sa..
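Since `foo` and `bar` are placeholders, here is a runnable variant of the same `getattr` lookup against the standard `math` module. The `call_by_name` helper is hypothetical; it adds a check so that an unknown name fails with a clear message.

```python
import math

def call_by_name(module, name, *args):
    # Look the attribute up by its string name; fail clearly if it is
    # missing or is not callable.
    func = getattr(module, name, None)
    if not callable(func):
        raise ValueError(f"{module.__name__} has no callable named {name!r}")
    return func(*args)

result = call_by_name(math, "sqrt", 16)
```

If the name comes from user input, it is safer to check it against an explicit allow-list before the lookup.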

Study 2020. 12. 10. 11:43

[Python] beautify JSON

import json
print(json.dumps({'4': 5, '6': 7}, sort_keys=True, ensure_ascii=False, indent=4))
{
    "4": 5,
    "6": 7
}
def pretty(obj):
    # Return obj as an indented, key-sorted JSON string.
    return json.dumps(obj, ensure_ascii=False, sort_keys=True, indent=4)
https://stackoverflow.com/questions/9105031/how-to-beautify-json-in-python How to beautify JSON in Python? Can someone suggest how I can beautify JSON in Python or through the command line? The only ..

Study 2020. 12. 10. 11:35

[SublimeText] reverse text plugin

Tools > Developer > New Plugin...
import sublime
import sublime_plugin
class ReverseCommand(sublime_plugin.TextCommand):
    def run(self, edit):
        # Replace each selected region with its text reversed.
        for region in self.view.sel():
            stringContents = self.view.substr(region)
            self.view.replace(edit, region, stringContents[::-1])
View > Show Console
view.run_command("reverse")
https://stackoverflow.com/questions/28966185/reverse-all-line-of-text-in-sublime-..

Study/Tools 2020. 12. 10. 11:23

[Spark] string to date

SELECT date_format(to_timestamp("2019-10-22 00:00:00", "yyyy-MM-dd HH:mm:ss"), "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'")
https://stackoverflow.com/questions/58774777/how-to-format-date-in-spark-sql How to format date in Spark SQL? I need to transform this given date format: 2019-10-22 00:00:00 to this one: 2019-10-22T00:00:00.000Z I know this could be done in some DB via: In AWS Redshift, you can achieve t..
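For comparison, the same reshaping can be sketched in plain Python with `datetime`. This is an analog, not Spark code; `to_iso_millis` is a hypothetical helper name.

```python
from datetime import datetime

def to_iso_millis(text):
    # Parse "yyyy-MM-dd HH:mm:ss" and render "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'".
    # As in the SQL pattern, the trailing Z is a literal character:
    # no time zone conversion is performed.
    dt = datetime.strptime(text, "%Y-%m-%d %H:%M:%S")
    millis = dt.microsecond // 1000
    return dt.strftime("%Y-%m-%dT%H:%M:%S") + f".{millis:03d}Z"

formatted = to_iso_millis("2019-10-22 00:00:00")
```

Because the `Z` is appended literally, the result is only a correct UTC timestamp if the input was already in UTC.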

Study 2020. 12. 10. 10:31

[Spark] `foreach` vs `foreachPartition`

https://stackoverflow.com/questions/30484701/apache-spark-foreach-vs-foreachpartitions-when-to-use-what Apache Spark - foreach Vs foreachPartitions When to use What? I would like to know if the foreachPartitions will results in better performance, due to an higher level of parallelism, compared to the foreach method considering the case in which I'm flowing th... stackoverflow.com https://stacko..
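The practical difference is usually about per-record versus per-partition setup cost rather than parallelism. The sketch below is plain Python, not Spark: `FakeConnection`, `foreach`, and `foreach_partition` are hypothetical stand-ins that only count how often an expensive resource would be opened in each style.

```python
class FakeConnection:
    # Hypothetical stand-in for an expensive resource such as a DB connection.
    opened = 0

    def __init__(self):
        FakeConnection.opened += 1
        self.sent = []

    def send(self, record):
        self.sent.append(record)

def foreach(partitions, func):
    # func is called once per record.
    for part in partitions:
        for record in part:
            func(record)

def foreach_partition(partitions, func):
    # func is called once per partition and receives an iterator of records.
    for part in partitions:
        func(iter(part))

def write_partition(records):
    conn = FakeConnection()  # one connection for the whole partition
    for record in records:
        conn.send(record)

partitions = [[1, 2, 3], [4, 5], [6]]

FakeConnection.opened = 0
foreach(partitions, lambda record: FakeConnection().send(record))
opened_per_record = FakeConnection.opened

FakeConnection.opened = 0
foreach_partition(partitions, write_partition)
opened_per_partition = FakeConnection.opened
```

With six records in three partitions, the per-record style opens six connections and the per-partition style opens three, which is the usual reason to prefer `foreachPartition` when writing to an external system.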

Study 2020. 12. 10. 10:29

[Sh] HDFS exists directory

hdfs dfs -test -d "$yourdir"
if [ $? -eq 0 ]; then
    echo "exists"
else
    echo "dir does not exist"
fi
https://stackoverflow.com/questions/26513861/checking-if-directory-in-hdfs-already-exists-or-not Checking if directory in HDFS already exists or not I am having following directory structure in HDFS, /analysis/alertData/logs/YEAR/MONTH/DATE/HOURS That is data is coming on houly basis and stored in fo..

Study 2020. 12. 10. 10:26

[Spark] Encounter SparkException “Cannot broadcast the table that is larger than 8GB”

'spark.sql.autoBroadcastJoinThreshold': '-1' https://stackoverflow.com/questions/49567066/encounter-sparkexception-cannot-broadcast-the-table-that-is-larger-than-8gb Encounter SparkException "Cannot broadcast the table that is larger than 8GB" I am using Spark 2.2.0 to do data processing. I am using Dataframe.join to join 2 dataframes together, however I encountered this stack trace: 18/03/29 11..

Study 2020. 12. 10. 10:19

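The Software Craftsman - Sandro Mancuso

Professionalism. Stories about the development organization as a whole that could help a team or organization. It gives advice on plans, strategy, attitude, principles, and more from several perspectives. How to practice test-driven development (TDD). How to cope with tight schedules. How to write job postings & interview developers. How to collaborate with colleagues and managers. Mentor. I am confident it will serve as a mirror for reflecting on your career direction and on your own competence and attitude as a developer. I believe you can find solutions and leads in it. A three-hour technical test. Interesting work. For someone whose job is coding, producing working code is the baseline. Come closer, we are going to look at the code together. Do you know what happens if you allocate/free memory here? Code like this can potentially cause a memory leak. If you think a little more, these eight lines can be reduced to two..

Books read 2020. 11. 29. 01:21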

[Spring] security actuator disable

https://stackoverflow.com/questions/53190729/springboot-disable-actuator-root SpringBoot disable Actuator root i'm using springboot and i'm exposing metrics with actuator and prometheus. I want to expose "info", "health", "metrics", "prometheus", "shutdown" and nothing more. But even if i specify into the stackoverflow.com

Study 2020. 11. 28. 07:33

[Docker] docker attach ${name}

https://github.com/docker/for-mac/issues/4273 Docker Desktop Dashboard open default terminal app · Issue #4273 · docker/for-mac [ x] I have tried with the latest version of my channel (Stable or Edge) I have uploaded Diagnostics Diagnostics ID: Expected behavior When clicking the Cli button the default terminal should open ... github.com

Study 2020. 11. 28. 07:32

[Kafka] describe & delete

kafka-consumer-groups --bootstrap-server url \
  --group ${consumer} \
  --describe
kafka-consumer-groups --bootstrap-server url \
  --group ${consumer} \
  --delete
https://github.com/yahoo/CMAK/issues/562 Kafka-manager shows deleted consumers · Issue #562 · yahoo/CMAK When I explicitly delete a console-consumer with the command below, it still shows up in Kafka-Manager. I think the list of consumers i..

Study 2020. 11. 28. 07:29

[Presto] convert date string to date

select date_parse('7/22/2016 6:05:04 PM','%m/%d/%Y %h:%i:%s %p') https://stackoverflow.com/questions/39880540/presto-sql-converting-a-date-string-to-date-format Presto SQL - Converting a date string to date format I'm on presto and have a date formatted as varchar that looks like - 7/14/2015 8:22:39 AM I've looked the presto docs and tried various things(cast, date_format, using split_part to pa..

Study 2020. 11. 27. 10:48

[Java] Jackson Json <-> Object & hash MD5

public String hash() {
    try {
        String json = new ObjectMapper().writeValueAsString(this);
        return CryptUtils.md5(json);
    } catch (JsonProcessingException e) {
        e.printStackTrace();
    }
    return "";
}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
public class CryptUtils {
    private static final Logger LOGGER = Logg..
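The same "serialize to JSON, then hash" idea can be sketched in Python with the standard library. This is an analog, not the truncated `CryptUtils` above: `json_md5` is a hypothetical name, and sorting the keys is an added assumption so that equal objects always serialize to the same string.

```python
import hashlib
import json

def json_md5(obj):
    # Serialize with sorted keys and fixed separators so that equal objects
    # always yield the same string, then hash its UTF-8 bytes.
    text = json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.md5(text.encode("utf-8")).hexdigest()

a = json_md5({"id": 1, "name": "kafka"})
b = json_md5({"name": "kafka", "id": 1})
```

MD5 is fine as a change-detection fingerprint like this, but it should not be relied on for security purposes.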

Study 2020. 11. 27. 10:25

[Django] timezone.localtime(timezone.now())

from django.utils import timezone
timezone.localtime(timezone.now())
https://stackoverflow.com/questions/16037020/djangos-timezone-now-does-not-show-the-right-time django's timezone.now does not show the right time My server is in London. In my settings.py I have: TIME_ZONE = 'Europe/Moscow' USE_TZ = True But when I do this: from django.utils import timezone print timezone.now().hour It prints L..

Study 2020. 11. 27. 09:52

[Java] StringUtils.split takes separator characters, so use it with care

"abcdabcedee".split("cd") -> ["ab", "abcedee"]
StringUtils.split("abcdabcedee", "cd") -> ["ab", "ab", "e", "ee"]
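The same contrast can be reproduced in Python, which makes the two interpretations explicit. This is an analog, assuming the `StringUtils` in question is the Apache Commons Lang one, whose second argument is a set of separator characters rather than a whole delimiter.

```python
import re

text = "abcdabcedee"

# Splitting on the substring "cd": only the exact sequence "cd" separates.
by_substring = text.split("cd")

# Splitting on the characters 'c' and 'd': any run of either one separates,
# and empty pieces are dropped, mirroring the StringUtils.split result.
by_chars = [part for part in re.split("[cd]+", text) if part]
```

Here `by_substring` has two pieces and `by_chars` has four, matching the two results above.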

Study 2020. 11. 26. 14:00

[Slack] API `+` sign is `%2B`
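A likely reason, stated here as an assumption because the note gives none, is that in form-encoded request data a raw `+` is decoded as a space. The sketch below uses only `urllib.parse` and a made-up message; it does not call the Slack API.

```python
from urllib.parse import quote, unquote_plus

message = "1+1=2"

# In form-encoded data a raw "+" decodes to a space...
decoded_raw = unquote_plus(message)

# ...so percent-encode it first: "+" becomes "%2B" (and "=" becomes "%3D").
encoded = quote(message, safe="")
decoded_encoded = unquote_plus(encoded)
```

HTTP client libraries usually apply this encoding automatically when parameters are passed as a mapping instead of a hand-built query string.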

Study 2020. 11. 26. 01:59

[Sh] find keyword in files `grep -rnw '/Users/seunggabi/Desktop' -e 'naver+'`

grep -rnw '/Users/seunggabi/Desktop' -e 'naver+' https://stackoverflow.com/questions/16956810/how-do-i-find-all-files-containing-specific-text-on-linux How do I find all files containing specific text on Linux? I'm trying to find a way to scan my entire Linux system for all files containing a specific string of text. Just to clarify, I'm looking for text within the file, not in the file name. Wh..

Study 2020. 11. 23. 03:58

[Hive] ALTER TABLE table_name SET LOCATION "hdfs://mycluster:8020/jsam/j1";

https://community.cloudera.com/t5/Support-Questions/How-to-change-location-of-the-external-table-in-hive/td-p/134391 How to change location of the external table in hive . 1) CREATE EXTERNAL TABLE IF NOT EXISTS jsont1( json string ) LOCATION '/jsam'; Now I need to change the location from where above json1 points to. I tried this command - ALTER TABLE jsont1 SET LOCATION "/jsam/j2" ; but getting..

Study 2020. 11. 23. 03:52
