Good Yunmorning

[EuroSys '19] Parallax: Sparsity-aware Data Parallel Training of Deep Neural Networks

Problems To Solve: In distributed training, DL frameworks provide good support for models used in image classification tasks, but they scale less well for NLP models because they do not account for differences in the sparsity of model parameters. How to Solve: To reduce the amount of data transferred by taking sparsity into account, Parallax adopts a hybrid appr..

[ICLR 2019] ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware

Directly applying NAS to a large-scale task (e.g. ImageNet) is computationally expensive or impossible. To solve this problem, some works propose searching for building blocks on proxy tasks, such as training for fewer epochs, starting with a smaller dataset (e.g. CIFAR-10), or learning with fewer blocks. $\to$ These cannot guarantee optimality on the target task. ProxylessNAS directly learns the ..

[SOSP '11] PTask: Operating System Abstractions To Manage GPUs as Compute Devices

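The paper reviewed in this post is "PTask: Operating System Abstractions To Manage GPUs as Compute Devices", published at SOSP 2011. It is a fairly old paper, but it contains many concepts and ideas that are considered important for GPU-based heterogeneous systems, so I will review it in detail. One thing to keep in mind is that this work does not discuss GPU scheduling in the context of deep learning, and that the situation has changed considerably between then and now. ABSTRACTION This paper introduces the PTask API, an OS abstraction (OS API) that lets the GPU be used as a first-class compute resource like the CPU. In the PTask API, the various objects managed by the OS..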

[CV Study] Accelerating the Super-Resolution Convolutional Neural Network

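Paper Link: https://arxiv.org/pdf/1608.00367.pdf SRCNN achieved good performance on the image super-resolution (SR) task, but its computational cost is too high for real-time serving. This paper focuses on making the existing SRCNN lighter and faster by using an hourglass-shaped CNN structure (think of an encoder-decoder structure). To this end, it takes the following three approaches. A transposed convolution (deconv), as used in DCGAN, pix2pix, and others, is placed in the latter part of the network, giving a network structure in which the mapping between the low-resolution (LR) input image and the high-resolution (HR) output image can be learned end-to-end..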

[Facebook Recommendation System: DLRM] Deep Learning Recommendation Model for Personalization and Recommendation Systems

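This post covers DLRM, a recommendation system paper published by Facebook in 2019. Introduction Prior work that applies deep learning to personalization and recommendation tasks can be broadly divided into two groups. 1. Recommendation systems In the most primitive recommendation systems, a few experts grouped products into several categories, and users then chose categories according to their preferences. This evolved into CF (collaborative filtering), a technique that makes recommendations based on users' past behavior (adding a product to the cart, subscribing, clicking like, ...). Beyond that, recommending by grouping together products that are highly relevant to the user..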
