Amazon Cloud/Athena

[AWS] ๐Ÿ“š Glue Crawler๋กœ ํ…Œ์ด๋ธ” ๋งŒ๋“ค๊ณ  Athena๋กœ ์กฐํšŒํ•˜๊ธฐ

์ธํŒŒ_ 2022. 8. 9. 09:50

Glue-Crawler-Athena

Glue Crawler๋กœ S3 ์Šคํ‚ค๋งˆ ์ƒ์„ฑ

์ง€๋‚œ ํฌ์ŠคํŒ…์—์„œ๋Š” csvํŒŒ์ผ์„ S3์— ์—…๋กœ๋“œํ•˜๊ณ  Athena์—์„œ ์ง์ ‘ ํ…Œ์ด๋ธ” ์ฟผ๋ฆฌ๋ฌธ์„ ์‹คํ–‰ํ•˜์—ฌ ์ˆ˜๋™์œผ๋กœ ๋งŒ๋“ค์–ด ์กฐํšŒํ•˜๋Š” ์‹œ๊ฐ„์„ ๊ฐ€์ ธ๋ณด์•˜๋‹ค.

์ด๋ฒˆ์—๋Š” AWS Glue ์„œ๋น„์Šค๊ฐ€ ์ œ๊ณตํ•˜๋Š” Glue Crawler๋ฅผ ์‚ฌ์šฉํ•ด S3์˜ ๋ฐ์ดํ„ฐ๋ฅผ ์Šค์บ”ํ•˜๊ณ  ์ž๋™์œผ๋กœ ๋ฐ์ดํ„ฐ๋ฒ ์ด์Šค์™€ ํ…Œ์ด๋ธ”์„ ๋งŒ๋“ค์–ด์ฃผ๋Š” ์„œ๋น„์Šค๋ฅผ ์ด์šฉํ•ด Athena๋กœ ์กฐํšŒํ•ด๋ณด๋Š” ์‹œ๊ฐ„์„ ๊ฐ€์ ธ๋ณผ ๊ฒƒ์ด๋‹ค.

 

Glue ํฌ๋กค๋Ÿฌ๋กœ ํŒŒ์ผ์„ ๋ฐ์ดํ„ฐ๋ฒ ์ด์Šค๋กœ ๋ณ€ํ™˜ํ•˜๊ธฐ ์œ„ํ•ด์„  ๋‹น์—ฐํžˆ ๋Œ€์ƒ์œผ๋กœ ํ•˜๊ณ  ๋ฉ”ํƒ€ ํ…Œ์ด๋ธ”์„ ์ƒ์„ฑํ•  ๋ฐ์ดํ„ฐ ์†Œ์Šค๊ฐ€ ํ•„์š”ํ•˜๋‹ค.

RDS, S3, DynamoDB ๋“ฑ AWS์˜ ๋ฐ์ดํ„ฐ ์ €์žฅ์†Œ ๋ฟ๋งŒ ์•„๋‹ˆ๋ผ JDBC๋ฅผ ์ง€์›ํ•˜๋Š” DB, file ๋“ฑ๋“ฑ ๊ฑฐ์˜ ๋ชจ๋“  ํ˜•ํƒœ์˜ ๋ฐ์ดํ„ฐ ์ €์žฅ์†Œ์— ์žˆ๋Š” ํŒŒ์ผ์„ ์ด์šฉํ•ด ํฌ๋กค๋Ÿฌ๊ฐ€ ๊ฐ€๋Šฅํ•˜๋‹ค.

์ด๋ฒˆ ๊ฐ•์˜์—์„œ๋Š” json ํŒŒ์ผ์„ S3์— ์—…๋กœ๋“œํ•˜๊ณ  ์ด๋ฅผ ๋ฐ์ดํ„ฐ๋ฒ ์ด์Šคํ™” ํ•ด๋ณด์ž.


S3 ๋ฒ„ํ‚ท ์ƒ์„ฑ ๋ฐ json ๋กœ๊ทธ ์˜ฌ๋ฆฌ๊ธฐ

Glue-Crawler-Athena

{"time":"2022-03-05 22:06:13","user_id":"bob","board_name":"game","action":"insert"}
{"time":"2022-03-05 22:06:14","user_id":"jake","board_name":"free","action":"delete"}
{"time":"2022-03-05 22:06:15","user_id":"jake","board_name":"stock","action":"delete"}
{"time":"2022-03-05 22:06:16","user_id":"jake","board_name":"free","action":"insert"}
{"time":"2022-03-05 22:06:17","user_id":"bob","board_name":"game","action":"insert"}

log_sample.json
0.09MB


Glue ํฌ๋กค๋Ÿฌ ์ƒ์„ฑํ•˜๊ธฐ

AWS Glue Crawler๋ฅผ ์‚ฌ์šฉํ•ด S3 ๋ฒ„ํ‚ท์— ์ €์žฅ๋œ ๋‚ด์šฉ๋“ค์„ ์ญ‰ ์‚ดํŽด๋ณด๊ณ  ์Šคํ‚ค๋งˆ์™€ ํ•จ๊ป˜ ํ…Œ์ด๋ธ”์„ ์ƒ์„ฑํ•ด์ค€๋‹ค.

๊ทธ๋ฆฌ๊ณ  ์ด๋ฅผ ์ฐธ๊ณ ํ•ด์„œ Athena๊ฐ€ ์ฟผ๋ฆฌ๋ฅผ ๋‚ ๋ฆด์ˆ˜ ์žˆ๋Š” ๊ฒƒ์ด๋‹ค.

Glue-Crawler-Athena

 

1. ํฌ๋กค๋Ÿฌ ์ •๋ณด ์„ค์ •

๊ฐ„๋‹จํ•˜๊ฒŒ๋Š” ์ด๋ฆ„๋งŒ ์„ค์ •ํ•˜๊ณ  ๋„˜์–ด๊ฐ€๋„ ๋œ๋‹ค.

๋ฐ‘์˜ ํƒœ๊ทธ, ์„ค๋ช…, ๋ณด์•ˆ๊ตฌ์„ฑ, ๋ถ„๋ฅ˜์ž ์˜ต์…˜ ์„ค์ •์—์„œ ํŠน๋ณ„ํ•œ๊ฑด ์—†์ง€๋งŒ, ๋ถ„๋ฅ˜์ž๋Š” ๋•Œ์— ๋”ฐ๋ผ์„œ ํ•„์š”ํ•  ์ˆ˜ ์žˆ๋‹ค.

csv ํŒŒ์ผ์ผ ๋•Œ ํŠนํžˆ ๊ทธ๋Ÿฐ๋ฐ, ์ผ๋ฐ˜์ ์œผ๋กœ ์‰ผํ‘œ๋ฅผ ๊ตฌ๋ถ„์ž๋กœ ์‚ฌ์šฉํ•˜๊ธด ํ•˜์ง€๋งŒ ๊ตฌ๋ถ„์ž๊ฐ€ ํ†ต์ผ๋˜์ง€ ์•Š์€ ๊ฒฝ์šฐ ๋ถ„๋ฅ˜์ž๋ฅผ ์ด์šฉํ•ด ๊ตฌ๋ถ„์ž๋ฅผ ํ†ต์ผํ•  ์ˆ˜ ์žˆ๋‹ค.

Glue-Crawler-Athena

 

2. ํฌ๋กค๋Ÿฌ ์†Œ์Šค ํƒ€์ž…

Data store(๋ฐ์ดํ„ฐ ์ €์žฅ์†Œ)๋ฅผ ๋ฐ”๋ผ๋ณผ์ง€, ๊ธฐ์กด์— ์ƒ์„ฑํ•œ Data Catalog์˜ ๋ฉ”ํƒ€ ๋ฐ์ดํ„ฐ ํ…Œ์ด๋ธ”์„ ๋ฐ”๋ผ๋ณผ ์ง€ ์„ ํƒํ•œ๋‹ค.

Glue-Crawler-Athena

 

3. ๋ฐ์ดํ„ฐ ์Šคํ† ์–ด ์ถ”๊ฐ€

๋ฉ”ํƒ€ ํ…Œ์ด๋ธ”์„ ์ƒ์„ฑํ•  ๋ฐ์ดํ„ฐ ์ €์žฅ์†Œ๋ฅผ 1๊ฐœ ํ˜น์€ ์—ฌ๋Ÿฌ ๊ฐœ ์ถ”๊ฐ€ํ•  ์ˆ˜ ์žˆ๋‹ค.

์œ„์—์„œ ๋งŒ๋“  ๋ฒ„ํ‚ท๊ฒฝ๋กœ๋ฅผ ํฌํ•จ ๊ฒฝ๋กœ ๋ž€์— ์ถ”๊ฐ€ํ•ด์ค€๋‹ค.

Glue-Crawler-Athena

๋” ์ถ”๊ฐ€ํ•  ์ง€ ๋ฌผ์–ด๋ณด๋Š” ์˜ต์…˜์ด ๋‚˜์˜ค๋ฉด ์•„๋‹ˆ์˜ค๋ฅผ ์„ ํƒํ•˜๊ณ  ๋‹ค์Œ์„ ๋ˆŒ๋Ÿฌ์ค€๋‹ค.

Glue-Crawler-Athena

 

4. IAM ๊ทœ์น™ ์„ค์ •

๋ณธ๋ž˜ AWS์—์„œ๋Š” S3์— ์•ก์„ธ์Šค ํ•˜๊ธฐ ์œ„ํ•ด์„œ๋Š” IAM ๊ถŒํ•œ์ด ํ•„์š”ํ–ˆ๋‹ค.

Glue Crawler๋„ ์˜ˆ์™ธ๊ฐ€ ์•„๋‹ˆ๋ฉฐ, ์ด ๋‹จ๊ณ„์—์„œ IAM ์—ญํ• ์„ ์ƒ์„ฑํ•ด์„œ ๋“ฑ๋กํ•ด์ค„ ์ˆ˜ ์žˆ๋‹ค.

Glue-Crawler-Athena

 

5. ์Šค์ผ€์ฅด ์„ค์ •

ํฌ๋กค๋Ÿฌ์˜ ์ž‘๋™ ์Šค์ผ€์ค„์„ ์„ค์ •ํ•  ์ˆ˜ ์žˆ๋‹ค.

์‹œ๊ฐ„๋งˆ๋‹ค, ์ผ๋งˆ๋‹ค, ์ฃผ๋งˆ๋‹ค ๋“ฑ ๊ธฐ๋ณธ ์˜ต์…˜์ด ์žˆ๊ณ  ์ง์ ‘ ์ปค์Šคํ…€ํ•˜๊ฑฐ๋‚˜ ์›ํ• ๋•Œ๋งŒ ์ž‘๋™ํ•˜๋„๋ก ๊ณ ๋ฅผ ์ˆ˜๋„ ์žˆ๋‹ค.

๋ณดํ†ต ์˜จ๋””๋ฉ˜๋“œ(Run on demand)๋ฅผ ์„ค์ •ํ•œ ๋’ค ํ•„์š”ํ•œ ๊ฒฝ์šฐ์—๋งŒ ๋žŒ๋‹ค(์„œ๋ฒ„๋ฆฌ์Šค) ๋“ฑ์œผ๋กœ ์‹คํ–‰ ์‹œํ‚ค๋Š” ์‹์œผ๋กœ ์‚ฌ์šฉํ•˜๋ฉด ๋œ๋‹ค.

Glue-Crawler-Athena

 

6. ํฌ๋กค๋Ÿฌ ์ถœ๋ ฅ ๊ฒฐ๊ณผ ์ €์žฅ์†Œ ์„ค์ •

๋ฉ”ํƒ€ ํ…Œ์ด๋ธ”(Cralwer ๊ฒฐ๊ณผ)์„ ์ €์žฅํ•  ๊ณณ์„ ์ง€์ •ํ•œ๋‹ค.

Add database๋กœ ๋ฐ”๋กœ ๋งŒ๋“ค์–ด ๋ณด์ž.

Glue-Crawler-Athena
Glue-Crawler-Athena
Glue-Crawler-Athena

 

8. ์ตœ์ข… ํ™•์ธ

Glue-Crawler-Athena

 

10. ํฌ๋กค๋Ÿฌ ์‹คํ–‰

์ด์ œ ๋“ฑ๋กํ•œ ํฌ๋กค๋Ÿฌ๋ฅผ ์‹คํ–‰ํ•ด์„œ S3์— ์žˆ๋Š” ๋ฐ์ดํ„ฐ๋ฅผ ํ…Œ์ด๋ธ”ํ™” ํ•œ๋‹ค.

ํŒŒ์ผ์˜ ๋ฐ์ดํ„ฐ๊ฐ€ ํด์ˆ˜๋ก 1๋ถ„ ์ด์ƒ ๊ฑธ๋ฆด์ˆ˜ ์žˆ์œผ๋‹ˆ ์ฒœ์ฒœํžˆ ๊ธฐ๋‹ค๋ฆฌ์ž.

kb ๋‹จ์œ„์˜ ์ž‘์€ ํŒŒ์ผ๋กœ ์‹ค์Šต์„ ํ•˜๊ณ  ์žˆ์–ด ํฌ๊ฒŒ ์ƒ๊ด€์—†์ง€๋งŒ, glue๋Š” ์„œ๋น„์Šค ์š”๊ธˆ์ด ๊ฝค ๋น„์‹ผ ํŽธ์— ์†ํ•˜๊ธฐ ๋•Œ๋ฌธ์— ํ˜น์‹œ ๋ชจ๋ฅผ ์š”๊ธˆํญํƒ„์— ์ฃผ์˜ํ•˜์ž.

Glue-Crawler-Athena

 

11. ํ…Œ์ด๋ธ” ํ™•์ธํ•˜๊ธฐ

์ƒ์„ฑ์ด ์™„๋ฃŒ๋˜๋ฉด ๋ฐ์ดํ„ฐ๋ฒ ์ด์Šค ํ…Œ์ด๋ธ” ๋ฉ”๋‰ด์—์„œ ์ƒ์„ฑํ•œ ์Šคํ‚ค๋งˆ๋ฅผ ํ™•์ธ ํ•  ์ˆ˜ ์žˆ๋‹ค.

Athena์—์„œ ์ง์ ‘ ์ฟผ๋ฆฌ๋ฌธ์œผ๋กœ ๋งŒ๋“  ํ…Œ์ด๋ธ”๋„ AWS Glue ํ…Œ์ด๋ธ” ์นดํƒˆ๋กœ๊ทธ์— ์ €์žฅ๋˜๊ฒŒ ๋œ๋‹ค.
ํ•œ๋งˆ๋””๋กœ Athena์˜ ํ…Œ์ด๋ธ”์€ ์—ฌ๊ธฐ์—์„œ ๊ด€๋ฆฌ๋œ๋‹ค๋ผ๊ณ  ๋ณด๋ฉด ๋œ๋‹ค.

Glue-Crawler-Athena
Glue-Crawler-Athena
Glue-Crawler-Athena


Athena๋กœ ํ…Œ์ด๋ธ” ์กฐํšŒํ•˜๊ธฐ

Glue ํฌ๋กค๋Ÿฌ๋กœ ๋งŒ๋“  ๋ฐ์ดํ„ฐ๋ฒ ์ด์Šค ํ…Œ์ด๋ธ”์„ Athena ์„œ๋น„์Šค๋กœ ์กฐํšŒํ•ด๋ณด์ž.

๊ทธ ์ด์ „์—, ๋จผ์ € ์ฟผ๋ฆฌ ๊ฒฐ๊ณผ๋ฅผ ์ €์žฅํ•  ์œ„์น˜๋ฅผ ์„ค์ •ํ•ด์ฃผ์–ด์•ผ ํ•œ๋‹ค. ์šฐ์ธก์— ๋ณด๊ธฐ ์„ค์ •์„ ๋ˆŒ๋Ÿฌ S3 ๊ฒฝ๋กœ๋ฅผ ์„ค์ •ํ•ด ์ฃผ์ž.

Glue-Crawler-Athena
Glue-Crawler-Athena
Glue-Crawler-Athena

 

์ด์ œ Glue ํฌ๋กค๋Ÿฌ์—์„œ ์ƒ์„ฑํ•œ ๋ฐ์ดํ„ฐ๋ฒ ์ด์Šค๋ฅผ ์„ ํƒํ•ด์ฃผ๋ฉด, ํ•˜๋‹จ์— ํ…Œ์ด๋ธ”๋ช…์ด ๋‚˜ํƒ€๋‚˜๊ฒŒ ๋œ๋‹ค.

ํ…Œ์ด๋ธ” ๋‚ด์šฉ์„ ์กฐํšŒํ•ด ๋ณด์ž.

Glue-Crawler-Athena
Glue-Crawler-Athena

 

json์˜ ๋ฐ์ดํ„ฐ๊ฐ€ ํ…Œ์ด๋ธ” ํ˜•์‹์œผ๋กœ ์ž˜ ๋ณ€ํ™˜๋˜์–ด ์กฐํšŒ๋จ์„ ํ™•์ธ ํ•  ์ˆ˜ ์žˆ๋‹ค.

์ด๋ฒˆ์—๋Š” ์ฟผ๋ฆฌ์˜ ๋‚ด์šฉ์„ ๋ฐ”๊พธ์–ด ์งˆ์˜ ํ•ด๋ณด์ž.

SELECT * FROM "demo-athena-log"."athena_log_test_11" where user_id = 'bob'

Glue-Crawler-Athena

 

๋‹ค์Œ๊ณผ ๊ฐ™์ด time ํ•„๋“œ๋ฅผ ์ˆœ์„œ๋Œ€๋กœ ์ •๋ ฌ์‹œ์ผœ ์งˆ์˜ ํ•˜๋ฉด ์‹œ๊ฐ„๋งˆ๋‹ค ๋กœ๊ทธ์˜ ์ƒํƒœ๋ฅผ ๋ณผ์ˆ˜ ์žˆ๋‹ค.

SELECT * FROM "demo-athena-log"."athena_log_test_11" 
where user_id = 'bob' 
order by time desc

Glue-Crawler-Athena

 

์ด๋ฒˆ์—” action ํ•„๋“œ๊ฐ€ 'insert'์ธ ์ฟผ๋ฆฌ๋ฅผ ์กฐํšŒํ•ด์„œ ์ง‘๊ณ„ํ•จ์ˆ˜๋กœ ์กฐํšŒํ•ด๋ณด์ž.

SELECT count(*) 
FROM "demo-athena-log"."athena_log_test_11" 
where user_id = 'bob' and action = 'insert'

Glue-Crawler-Athena