[NODE / Security] 📚 How to Use the sanitize-html Module

...

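The sanitize-html module

Sanitizing addresses the following problem: when a user types a string such as <script>text</script> into an input, a textarea, or any other field that accepts user input, the web browser interprets it as script rather than as plain text. sanitize-html is a module that prevents this.

Without it, a user can abuse such fields by inserting tags such as <script> or <a> and have them run as malicious script.

For example, suppose you run a bulletin-board service and someone writes <script>location.href='malicious website url'</script> in the body of a post and publishes it.

If the server takes no action, any other user who opens that post is immediately redirected to the malicious website and exposed to an attack.

This kind of attack is called XSS (Cross-Site Scripting), and sanitize-html is meant to prevent it.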

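What is XSS?
XSS is short for cross-site scripting.
It is one of the most common vulnerabilities in web applications: a user who is not the site administrator is able to insert malicious script into a web page. For this reason, web developers usually build with an XSS filter in mind.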
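
To make the idea of an XSS filter concrete, here is a minimal sketch of the simplest kind: escaping. Note that this is a different technique from what sanitize-html does (escaping keeps every character but renders it as text, while sanitize-html removes markup that is not allowed). The helper name `escapeHtml` and the example URL are made up for illustration.

```javascript
// Hypothetical helper (not part of sanitize-html): escape the five characters
// that are significant in HTML so that user input is rendered as plain text.
const HTML_ESCAPES = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&#39;',
};

function escapeHtml(input) {
  // Replace each significant character with its entity.
  return String(input).replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch]);
}

const payload = "<script>location.href='http://evil.example'</script>";
console.log(escapeHtml(payload));
// → &lt;script&gt;location.href=&#39;http://evil.example&#39;&lt;/script&gt;
```

Escaping like this is enough when no markup at all should survive. When users are supposed to be able to write some HTML (headings, links, lists), an allow-list sanitizer such as sanitize-html is the better fit.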

How to use sanitize-html

sanitize-html (www.npmjs.com)

Clean up user-submitted HTML, preserving allowlisted elements and allowlisted attributes on a per-element basis. Latest version: 2.7.0. Start using sanitize-html in your project by running `npm i sanitize-html`.

> npm install sanitize-html

const sanitizeHtml = require('sanitize-html');

const dirty = `Will the script
    <script>some really tacky HTML</script> be ignored?
    Will the h1 tag <h1>link</h1> be ignored?`;
    
const clean = sanitizeHtml(dirty);

console.log(clean);

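With sanitizeHtml used as above, a tag such as <script> is removed together with its contents. The <h1> tag, however, is on the default allow list, so it is kept as-is, which is why the text still renders at heading size when you view the result in a browser.

This is how malicious scripts are blocked before they ever reach the page.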

Allowing tags

By default, sanitize-html filters out every tag that is not on its allow list, but you can specify the tags to allow yourself.

const sanitizeHtml = require('sanitize-html');

const dirty = `Will the h1 tag <h1>link</h1> be ignored?`;

const sanitizedDescription = sanitizeHtml(dirty, {
    allowedTags: ['h1', 'a'], // allow the h1 and a tags
    allowedAttributes: { a: ['href'] }, // allow the href attribute on a tags
    allowedIframeHostnames: ['www.youtube.com'] // only takes effect once iframe and its src attribute are also allowed: restricts iframe src to YouTube
});

console.log(sanitizedDescription); // prints: Will the h1 tag <h1>link</h1> be ignored?

If you pass allowedTags: false, allowedAttributes: false, all tags and attributes are allowed.
Conversely, allowedTags: [], allowedAttributes: {} disallows all tags and attributes.
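
The two extremes can be written down as option presets, roughly like this. This is only a sketch: the preset names `allowEverything` and `textOnly` are made up for this post, and the empty-object form for allowedAttributes follows the library's README.

```javascript
// Option presets to pass as the second argument of sanitizeHtml().
// The names are hypothetical; only the option keys come from sanitize-html.

// Allows every tag and attribute, so nothing is filtered out.
const allowEverything = { allowedTags: false, allowedAttributes: false };

// Allows no tags and no attributes, so only the text content remains.
const textOnly = { allowedTags: [], allowedAttributes: {} };

// Usage (assumes `sanitizeHtml` was required as in the examples above):
//   sanitizeHtml('<h1>Title</h1>', textOnly);        // tags removed, text kept
//   sanitizeHtml('<h1>Title</h1>', allowEverything); // markup passed through
```

A text-only preset like this suits fields where no markup is expected at all, such as a post title.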

Allowing extra tags on top of the defaults

If you do not specify allowedTags or any other option and simply call sanitizeHtml(dirty), the allowed tags and the other default settings are, as of the current version, as follows.

sanitizeHtml(dirty, {
    allowedTags: [
      "address", "article", "aside", "footer", "header", "h1", "h2", "h3", "h4",
      "h5", "h6", "hgroup", "main", "nav", "section", "blockquote", "dd", "div",
      "dl", "dt", "figcaption", "figure", "hr", "li", "main", "ol", "p", "pre",
      "ul", "a", "abbr", "b", "bdi", "bdo", "br", "cite", "code", "data", "dfn",
      "em", "i", "kbd", "mark", "q", "rb", "rp", "rt", "rtc", "ruby", "s", "samp",
      "small", "span", "strong", "sub", "sup", "time", "u", "var", "wbr", "caption",
      "col", "colgroup", "table", "tbody", "td", "tfoot", "th", "thead", "tr"
    ],

    disallowedTagsMode: 'discard',

    allowedAttributes: {
      a: [ 'href', 'name', 'target' ],
      // We don't currently allow img itself by default, but
      // these attributes would make sense if we did.
      img: [ 'src', 'srcset', 'alt', 'title', 'width', 'height', 'loading' ]
    },

    // Lots of these won't come up by default because we don't allow them
    selfClosing: [ 'img', 'br', 'hr', 'area', 'base', 'basefont', 'input', 'link', 'meta' ],

    // URL schemes we permit
    allowedSchemes: [ 'http', 'https', 'ftp', 'mailto', 'tel' ],
    allowedSchemesByTag: {},
    allowedSchemesAppliedToAttributes: [ 'href', 'src', 'cite' ],
    allowProtocolRelative: true,
    enforceHtmlBoundary: false
});

Looking at the tags listed under allowedTags, you can see that quite a lot of tags are allowed. If you want to keep these defaults and allow some additional tags on top of them, declare it as follows.

const clean = sanitizeHtml(dirty, {
  allowedTags: sanitizeHtml.defaults.allowedTags.concat([ 'img' ]) // allow the img tag in addition to the default allowedTags above
});

You are not required to follow the default settings above.
You can decide which tags and attributes to allow or disallow based on the web service you are building.
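
One way to build such a service-specific configuration is a small helper that starts from a set of defaults and adds to it. The sketch below is an assumption-laden illustration: `extendDefaults` and `sampleDefaults` are hypothetical names, and `sampleDefaults` is only a stand-in so the snippet runs on its own. In real code you would pass `sanitizeHtml.defaults` instead.

```javascript
// Hypothetical helper: build an options object that keeps a set of defaults
// and adds extra tags and attributes. `defaults` is expected to have the same
// shape as sanitizeHtml.defaults ({ allowedTags: [...], allowedAttributes: {...} }).
function extendDefaults(defaults, extraTags = [], extraAttributes = {}) {
  return {
    // The Set removes duplicates when an extra tag is already allowed.
    allowedTags: [...new Set([...defaults.allowedTags, ...extraTags])],
    // An entry in extraAttributes replaces (does not merge with) the
    // default attribute list for the same tag.
    allowedAttributes: { ...defaults.allowedAttributes, ...extraAttributes },
  };
}

// Stand-in for sanitizeHtml.defaults so the snippet is self-contained.
const sampleDefaults = {
  allowedTags: ['p', 'a', 'b'],
  allowedAttributes: { a: ['href', 'name', 'target'] },
};

const options = extendDefaults(sampleDefaults, ['img', 'p'], { img: ['src', 'alt'] });
console.log(options.allowedTags); // [ 'p', 'a', 'b', 'img' ]
console.log(options.allowedAttributes); // the a entry is unchanged, img is added
```

The helper returns a new object and leaves the defaults untouched, so the same defaults can be reused for several differently configured routes.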

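Building a sanitize-html middleware

Let's turn sanitize-html into a custom middleware and apply it to a router.

For example, when a user writes a post (content) on a bulletin board and publishes it, a POST request arrives at the server's router. The middleware will filter the problematic tags out of the content string and then hand over to the next middleware.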

const { sanitizer } = require('./sanitizer');

// ...

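// When a POST request arrives at /article/post, the custom sanitizer middleware sanitizes req.body.content and passes control to the next middleware
app.post('/article/post', sanitizer, async (req, res, next) => {
    try {
        const post = await Post.create({
            title: req.body.title, // post title
            content: req.filtered, // post body: the sanitized string the sanitizer middleware produced from req.body.content
            UserId: req.user.id // post author
        });
        res.status(201).json(post); // respond so the request does not hang
    } catch (err) {
        next(err); // forward errors to the Express error handler
    }
});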

[ sanitizer.js ]

const sanitizeHtml = require('sanitize-html');

const sanitizeOption = {
   allowedTags: ['h1', 'h2', 'b', 'i', 'u', 's', 'p', 'ul', 'ol', 'li', 'blockquote', 'a', 'img'],
   allowedAttributes: {
      a: ['href', 'name', 'target'],
      img: ['src'],
      li: ['class'],
   },
   allowedSchemes: ['data', 'http'],
};

exports.sanitizer = (req, res, next) => {
   const filtered = sanitizeHtml(req.body.content, sanitizeOption); // sanitize the post body req.body.content and keep the resulting string
   // if post bodies are limited to 200 characters, cut the string and append an ellipsis
   req.filtered = filtered.length <= 200 ? filtered : `${filtered.slice(0, 200)}...`; // store the sanitized string on the new req.filtered property
   next(); // on to the next middleware
};
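
To check a middleware like this without starting a server, one option is to pass the sanitize function in as an argument and call the middleware with stub objects. The following is a sketch under stated assumptions, not the code used above: `createSanitizer` and `fakeSanitize` are hypothetical names, and `fakeSanitize` is a stand-in that only strips script blocks so the snippet runs without the package installed. In real use you would pass `require('sanitize-html')` as the first argument.

```javascript
// Hypothetical factory: `sanitize` is any function (html, options) => string,
// for example require('sanitize-html'). Passing it in keeps the middleware testable.
function createSanitizer(sanitize, options, maxLength = 200) {
  return (req, res, next) => {
    const content = req.body && req.body.content;
    if (typeof content !== 'string') {
      // Let the error-handling middleware deal with malformed requests.
      return next(new Error('content must be a string'));
    }
    const filtered = sanitize(content, options);
    // Truncate to maxLength characters and mark the cut with an ellipsis.
    req.filtered =
      filtered.length <= maxLength ? filtered : `${filtered.slice(0, maxLength)}...`;
    next();
  };
}

// Stand-in sanitize function for this demo only (NOT a real sanitizer).
const fakeSanitize = (html) => html.replace(/<script[\s\S]*?<\/script>/gi, '');
const middleware = createSanitizer(fakeSanitize, {}, 10);

// Call the middleware directly with a stub request and a stub next().
const req = { body: { content: 'hello <script>x()</script>world!!' } };
let nextArg = 'not called';
middleware(req, {}, (err) => { nextArg = err; });

console.log(req.filtered); // hello worl...
console.log(nextArg);      // undefined (next was called without an error)
```

Keep in mind that cutting sanitized HTML at a fixed length can split a tag in half, so a length limit is safer to enforce on text-only content or before the markup is rendered.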