libshalom's Introduction

LibShalom

Contact: Weiling Yang ([email protected])

LibShalom is a library for small and irregular-shaped matrix multiplications on ARMv8-based processors. It improves the performance of small and irregular-shaped GEMMs by addressing shortcomings of existing BLAS libraries: packing that accounts for a large portion of the runtime, inefficient edge-case processing, and unsuitable parallelization methods.

This work is still being optimized, which will take some time. Packing inside the micro-kernel is the key to improving performance, and the same technique can also be applied to large-scale GEMM.

Reference

Weiling Yang, Jianbin Fang, Dezun Dong, Xing Su, Zheng Wang. LIBSHALOM: optimizing small and irregular-shaped matrix multiplications on ARMv8 multi-cores (SC 2021). DOI: https://dl.acm.org/doi/10.1145/3458817.3476217

Software dependencies

Hardware platform

Phytium 2000+, Kunpeng 920, ThunderX2, or other ARMv8-based processors

Compile and install

$ cd NN_LIB && make  
$ make install PREFIX=<installation path>

These commands copy the LibShalom library and headers into the installation path PREFIX.

Compiling with LibShalom

All LibShalom definitions and prototypes may be included in your C source file by including a single header file, LibShalom.h:

#include <stdio.h>
#include <stdlib.h>
#include "LibShalom.h"

API

LibShalom_sgemm(int transa, int transb, float *C, float *A, float *B, long M, long N, long K) // Interface of small SGEMM
LibShalom_sgemm_mp(int transa, int transb, float *C, float *A, float *B, long M, long N, long K) // Interface of irregular-shaped SGEMM
LibShalom_dgemm(int transa, int transb, double *C, double *A, double *B, long M, long N, long K) // Interface of small DGEMM
LibShalom_set_thread_nums(int num) // Set the total number of threads

Running Benchmark

The command

$ cd benchmark/small_SGEMM && make  

compiles the fp32 small GEMM benchmark program and generates the executable file main. Running main reports the evaluation results for matrix sizes from 8x8x8 to 128x128x128.
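
When timing your own calls in the same spirit as the benchmark, throughput is commonly reported in GFLOP/s, counting 2*M*N*K floating-point operations per GEMM. The helpers below are a minimal sketch of that arithmetic; the names gemm_gflops and now_seconds are hypothetical and are not part of LibShalom, and the benchmark program may report its results differently.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <time.h>

/* Hypothetical helper (not part of LibShalom): GFLOP/s for one
 * M x N x K GEMM, counting one multiply and one add per inner-product
 * term, i.e. 2*M*N*K floating-point operations in total. */
double gemm_gflops(long M, long N, long K, double seconds)
{
    assert(seconds > 0.0);
    return 2.0 * (double)M * (double)N * (double)K / seconds / 1e9;
}

/* Hypothetical helper: monotonic wall-clock time in seconds. */
double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}
```

A typical use is to call now_seconds() before and after a loop of GEMM calls, divide the elapsed time by the number of iterations, and pass the per-call time to gemm_gflops.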

Getting Started

The following C code focuses on a single piece of functionality and can be considered a Hello LibShalom program.

#include <stdio.h>
#include <stdlib.h>
#include "LibShalom.h"

int main()
{
	int i, j;
	int loop = 100;
	long M, N, K;
	M = N = K = 80;

	/* row-major */
	float *A = (float *) malloc(K * M * sizeof(float));
	float *B = (float *) malloc(K * N * sizeof(float));
	float *C = (float *) malloc(M * N * sizeof(float));
	if (A == NULL || B == NULL || C == NULL)
	{
		fprintf(stderr, "malloc failed\n");
		return 1;
	}

	/* initialize input matrices A and B with values in [-1, 1) */
	for (i = 0; i < M; i++)
	{
		for (j = 0; j < K; j++)
			A[i * K + j] = 2.0 * (float) drand48() - 1.0;
	}

	/* the row stride is N so that all K * N entries of B are filled */
	for (i = 0; i < K; i++)
	{
		for (j = 0; j < N; j++)
			B[i * N + j] = 2.0 * (float) drand48() - 1.0;
	}

	/* warm up: perform C = A * B (B is transposed) */
	for (i = 0; i < 5; i++)
		LibShalom_sgemm(NoTrans, Trans, C, A, B, M, N, K);

	/* repeated runs, e.g. for timing */
	for (i = 0; i < loop; i++)
		LibShalom_sgemm(NoTrans, Trans, C, A, B, M, N, K);

	free(A);
	free(B);
	free(C);
	return 0;
}

The makefile corresponding to this program:

# Set this to the PREFIX that was passed to `make install`
LibShalom_PREFIX = <LibShalom installation path>
LibShalom_INC    = $(LibShalom_PREFIX)/SMM/include
LibShalom_LIB    = $(LibShalom_PREFIX)/SMM/lib/libsmm.a 

OTHER_LIBS  =-fopenmp

CC          = g++
CFLAGS      = -O3 -I$(LibShalom_INC)
LINKER      = $(CC)

OBJS        = Hello.o

%.o: %.c
	 $(CC) $(CFLAGS) -c -fopenmp $< -o $@

all: $(OBJS)
	$(LINKER) $(OBJS) $(LibShalom_LIB) $(OTHER_LIBS) -o a.out

Note

The matrices are stored in the row-major format in this library. We will keep this library updated and maintained.
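
To make the row-major convention concrete, here is a naive reference for the NoTrans/Trans case used in the Hello program, which can serve as a correctness check. It is only a sketch: the names ref_sgemm_nt and max_abs_diff are hypothetical and not part of LibShalom, and it assumes that a transposed B is stored as an N x K row-major array and that C is overwritten rather than accumulated. Neither assumption is confirmed by the library documentation above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference (not part of LibShalom):
 * C (M x N) = A (M x K) * B^T, all matrices row-major.
 * Assumption: the transposed operand B is stored as N x K. */
void ref_sgemm_nt(float *C, const float *A, const float *B,
                  long M, long N, long K)
{
    assert(C != NULL && A != NULL && B != NULL);
    for (long i = 0; i < M; i++) {
        for (long j = 0; j < N; j++) {
            float acc = 0.0f;
            for (long k = 0; k < K; k++)
                acc += A[i * K + k] * B[j * K + k];
            C[i * N + j] = acc;   /* overwrite, do not accumulate */
        }
    }
}

/* Largest absolute element-wise difference between two M x N matrices. */
float max_abs_diff(const float *X, const float *Y, long M, long N)
{
    float worst = 0.0f;
    for (long i = 0; i < M * N; i++) {
        float d = X[i] - Y[i];
        if (d < 0.0f)
            d = -d;
        if (d > worst)
            worst = d;
    }
    return worst;
}
```

One way to use it is to compute a second output buffer with ref_sgemm_nt and compare it against the result of LibShalom_sgemm(NoTrans, Trans, ...) using max_abs_diff, allowing a small tolerance for floating-point reordering.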

libshalom's Issues

Assembly

Hello, how much of a performance improvement would there be if everything were written in assembly?

compile error: x29 cannot be used in asm here

I am trying to compile on a Kunpeng 920-4826 machine, but I get this error:

g++ -c -fPIC test_temp_L1.c -o test_temp_L1.o
test_temp_L1.c: In function 'void SGEMM_NN_L1(float*, float*, float*, long int, long int, long int)':
test_temp_L1.c:2727:1: error: x29 cannot be used in asm here
 }
 ^
Makefile:14: recipe for target 'test_temp_L1.o' failed

It seems that register x29 is the frame pointer and cannot be used directly. Could you look into this problem? Thanks.

CPU information:

Handle 0x002B, DMI type 4, 48 bytes
Processor Information
        Socket Designation: CPU01
        Type: Central Processor
        Family: ARM
        Manufacturer: HiSilicon
        Version: Kunpeng 920-4826
        Voltage: 0.9 V
        External Clock: 100 MHz
        Max Speed: 2600 MHz
        Current Speed: 2600 MHz
        Status: Populated, Enabled
        Upgrade: Unknown
        Core Count: 48
        Core Enabled: 48
        Thread Count: 48
        Characteristics:
                64-bit capable
                Multi-Core
                Execute Protection
                Enhanced Virtualization
                Power/Performance Control

How to handle corner case in GEMM

Hello!
I'm learning from your work in this project.
I have a question about how to handle the corner case in GEMM.
From your code, I am not sure whether you handle the cases where N is not divisible by 16 and K is not divisible by 4.

Make error with benchmark/small SGEMM

I get the following error when I run this command:

cd benchmark/small_SGEMM && make

error output:

../../NN_LIB/SMM_NT_thread.c: In function ‘void SGEMM_NT_mp(float*, float*, float*, long int, long int, long int)’:
../../NN_LIB/SMM_NT_thread.c:6362:19: warning: ignoring return value of ‘int posix_memalign(void**, size_t, size_t)’, declared with attribute warn_unused_result [-Wunused-result]
     posix_memalign(&ptr, 64, T * GEMM_K * 17  *sizeof( float ));
     ~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
g++ -c -O3 ../../NN_LIB/DGEMM_NN.c -o DGEMM_NN.o
g++ LibShalom_sgemm.o test_temp_L1.o test_temp.o NN_SMM.o test_SMM_NT.o SMM_thread.o SMM_NT_thread.o DGEMM_NN.o -O2 -fopenmp  -o main
/usr/bin/ld: SMM_thread.o: in function `LibShalom_sgemm_mp(int, int, float*, float*, float*, long, long, long)':
SMM_thread.c:(.text+0x5a0): undefined reference to `Small_MGN_NN_SGEMM(float*, float*, float*, long, long, long, long)'
/usr/bin/ld: SMM_thread.c:(.text+0x620): undefined reference to `Small_NGM_NN_SGEMM(float*, float*, float*, long, long, long, long)'
/usr/bin/ld: SMM_NT_thread.o: in function `SGEMM_NT_mp(float*, float*, float*, long, long, long) [clone ._omp_fn.0]':
SMM_NT_thread.c:(.text+0x36b0): undefined reference to `SGEMM_NT_kernel_exist_1(float*, float*, float*, long, long, long, long, long, float*, long)'
/usr/bin/ld: SMM_NT_thread.c:(.text+0x3744): undefined reference to `SGEMM_NT_kernel_exist_1(float*, float*, float*, long, long, long, long, long, float*, long)'
collect2: error: ld returned 1 exit status
make: *** [Makefile:4:main] error 1
