Image classification methods have made significant progress on natural images. However, for diabetic foot ulcer (DFU) images, where data are scarce and complex, accurate classification remains a difficult problem. In this paper, we propose an Asymmetric Convolutional Transformer Network (ACTNet) for the four-class classification of DFU images. Specifically, to strengthen the expressive ability of the network, we design an asymmetric convolutional module at the front of the network that models the relationships between local pixels, extracts low-level image features, and guides the network to focus on the central region of the image, which contains more information. Furthermore, we add a novel pooling layer between the Transformer encoder and the classification head; it weights the sequence produced by the encoder so that features across the input are better correlated. Finally, to fully exploit the capacity of the model, we pretrain it on ImageNet and fine-tune it on DFU images. Evaluated on the DFUC2021 test set, the model achieves an F1-score of 0.593 and an AUC of 0.824. These experiments show that our model performs well even when trained on a small dataset.
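To make the idea of weighting the encoder's output sequence concrete, the following is a minimal sketch of one common way to do it: each token receives a learned scalar score, the scores are normalized with a softmax, and the tokens are averaged with those weights. This is an illustration under stated assumptions, not the pooling layer of ACTNet itself; the function name `weighted_sequence_pool` and the parameters `w` and `b` are hypothetical, and the exact form of the pooling layer may differ.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sequence_pool(tokens, w, b=0.0):
    """Pool n token vectors (each of length d) into one vector of length d.

    Assumption: one scalar score per token from a linear projection (w, b),
    then a softmax over the n tokens. Hypothetical helper for illustration.
    """
    # One score per token: dot(token, w) + b.
    scores = [sum(t * wi for t, wi in zip(tok, w)) + b for tok in tokens]
    weights = softmax(scores)
    d = len(tokens[0])
    # Weighted sum of the tokens, computed feature by feature.
    pooled = [
        sum(weights[i] * tokens[i][j] for i in range(len(tokens)))
        for j in range(d)
    ]
    return pooled, weights


# Toy encoder output: 3 tokens with 2 features each.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# With a zero projection every token gets the same weight (plain averaging).
pooled, weights = weighted_sequence_pool(tokens, w=[0.0, 0.0])
```

In a trained network, `w` and `b` would be learned, so informative tokens receive larger weights than in plain average pooling; the pooled vector is then passed to the classification head.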