This work addresses the increasingly serious and urgent environmental problem of wildfire by studying and testing schemes, strategies, and solutions for autonomous wildfire perception and fighting in real applications. To efficiently tackle the wildfire-fighting challenges of early detection and fast response with unmanned aerial vehicles (UAVs), several intelligent computer vision algorithms are studied, updated, and fine-tuned in this work. These algorithms are designed to work in conjunction with UAV motion-planning and path-planning algorithms to detect (early) wildfire spots in aerial images, estimate the distance between wildfire spots and the UAV(s), geographically locate these spots, and efficiently approach them for firefighting.

The main contribution of this work is the design of an intelligent wildfire perception system that utilizes both visible and infrared (VI) aerial image information and integrates deep-learning (DL) filters for oriented FAST and rotated BRIEF (ORB) features (FAST: features from accelerated segment test; BRIEF: binary robust independent elementary features) to geo-position wildfire spots. This work proposes the novel concept of combining DL models with ORB-based simultaneous localization and mapping (SLAM) technologies in UAV applications for vision-based wildfire detection, estimation, geo-positioning, and management/fighting. The VI image-based intelligent wildfire perception system has three main functional aspects; minimal illustrative sketches of its components are given after this overview.

The first aspect is wildfire detection, which includes wildfire image classification, wildfire semantic segmentation, and wildfire spot (object) detection. For wildfire image classification, an optimized ResNet-based neural network model is utilized to achieve higher classification accuracy. For wildfire semantic segmentation, this work focuses on U-shaped deep network models (UNets) and applies the original UNet, an attention-gate-enhanced UNet, and a SqueezeNet-based lightweight attention-gate UNet to early wildfire smoke and flame segmentation. For online wildfire object detection, the you only look once version 5 (YOLOv5) model and the updated version 8 (YOLOv8) model are utilized to obtain accurate bounding boxes of wildfire spots; the anchor-free YOLOv8 model avoids pre-defined anchors and directly detects and tracks the centers of wildfire spots.

The second aspect is UAV-wildfire distance estimation and wildfire geo-positioning through monocular ORB-SLAM technology (ORB-SLAM2 and ORB-SLAM3). This aspect has two main designs. The first uses an attention-gate UNet to filter ORB feature points for wildfire distance estimation, achieving more robust results and more detailed segmentation at the edges of wildfire smoke and flame spots; it can be deployed on ground workstations for detailed missions. The second uses YOLOv8 to filter ORB-SLAM3 features for online wildfire distance estimation and geo-positioning; it can be deployed on onboard computers for real-time wildfire spot recognition and geo-positioning. This lightweight, fast-responding application can be combined with UAV path and motion planning for online firefighting.
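To make the classification component concrete, the following is a minimal sketch of a ResNet-based fire/no-fire classifier, assuming a torchvision ResNet-18 backbone fine-tuned from ImageNet weights; the backbone depth, class set, and input size are illustrative stand-ins, not the exact optimized configuration of this work.

```python
import torch
import torch.nn as nn
from torchvision import models

# Hypothetical binary wildfire classifier: fine-tune a torchvision
# ResNet backbone (the exact ResNet variant in this work may differ).
def build_fire_classifier(num_classes: int = 2) -> nn.Module:
    model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
    # Replace the ImageNet head with a wildfire classification head.
    model.fc = nn.Linear(model.fc.in_features, num_classes)
    return model

model = build_fire_classifier()
model.eval()
with torch.no_grad():
    frame = torch.randn(1, 3, 224, 224)        # stand-in RGB aerial frame
    probs = torch.softmax(model(frame), dim=1)  # [p(no_fire), p(fire)]
print(probs)
```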
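The attention gate used to enhance the UNet skip connections can be sketched as the additive attention gate of Oktay et al. (2018), which is the standard formulation for attention UNets; the channel sizes below are placeholders, and the surrounding encoder/decoder is omitted.

```python
import torch
import torch.nn as nn

class AttentionGate(nn.Module):
    """Additive attention gate on a UNet skip connection: decoder
    gating signal g re-weights encoder skip features x, suppressing
    activations outside smoke/flame regions."""
    def __init__(self, f_g: int, f_l: int, f_int: int):
        super().__init__()
        self.w_g = nn.Sequential(nn.Conv2d(f_g, f_int, 1), nn.BatchNorm2d(f_int))
        self.w_x = nn.Sequential(nn.Conv2d(f_l, f_int, 1), nn.BatchNorm2d(f_int))
        self.psi = nn.Sequential(nn.Conv2d(f_int, 1, 1), nn.BatchNorm2d(1), nn.Sigmoid())
        self.relu = nn.ReLU(inplace=True)

    def forward(self, g, x):
        # Attention map in [0, 1], broadcast over the skip channels.
        a = self.psi(self.relu(self.w_g(g) + self.w_x(x)))
        return x * a

gate = AttentionGate(f_g=256, f_l=256, f_int=128)
g = torch.randn(1, 256, 32, 32)   # decoder gating features
x = torch.randn(1, 256, 32, 32)   # encoder skip features
print(gate(g, x).shape)           # torch.Size([1, 256, 32, 32])
```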
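For the anchor-free online detector, a minimal sketch with the ultralytics YOLOv8 API follows; the pretrained checkpoint and image file are placeholders, and a wildfire fine-tuned weights file (not published here) would replace the generic one in practice.

```python
from ultralytics import YOLO

# Generic pretrained weights as a stand-in; a wildfire fine-tune
# (e.g., a hypothetical "wildfire_yolov8n.pt") replaces this in practice.
model = YOLO("yolov8n.pt")

results = model("aerial_frame.jpg")  # one aerial image (placeholder path)
for box in results[0].boxes:
    x, y, w, h = box.xywh[0].tolist()  # anchor-free box with center (x, y)
    conf = float(box.conf[0])
    print(f"spot center=({x:.0f}, {y:.0f}) size=({w:.0f}x{h:.0f}) conf={conf:.2f}")
```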
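The core of the DL-filtered ORB design can be sketched with OpenCV: a binary mask from the segmentation or detection stage restricts ORB keypoint extraction to wildfire regions, so that only those points feed the SLAM-based distance estimate. The mask source and feature budget below are assumptions of this sketch.

```python
import cv2
import numpy as np

def wildfire_orb_keypoints(gray: np.ndarray, fire_mask: np.ndarray):
    """Detect ORB keypoints restricted to DL-detected wildfire regions.

    fire_mask: uint8 mask, 255 inside smoke/flame pixels, coming from
    the UNet segmentation or the YOLOv8 boxes (assumed here). In the
    SLAM pipeline, the surviving points' triangulated depths then give
    the UAV-to-wildfire distance.
    """
    orb = cv2.ORB_create(nfeatures=1000)
    keypoints, descriptors = orb.detectAndCompute(gray, fire_mask)
    return keypoints, descriptors

gray = np.random.randint(0, 255, (480, 640), dtype=np.uint8)  # toy frame
mask = np.zeros_like(gray)
mask[200:300, 250:400] = 255                                  # toy fire region
kps, des = wildfire_orb_keypoints(gray, mask)
print(len(kps), "wildfire keypoints")
```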
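Geo-positioning a filtered wildfire map point can then be sketched as transforming a triangulated camera-frame point through the SLAM pose into a GNSS-anchored local frame; the flat-earth conversion and ENU convention below are simplifying assumptions for short ranges, not the exact geo-referencing pipeline of this work.

```python
import numpy as np

def camera_point_to_latlon(p_c, R, t, lat0, lon0):
    """Map a triangulated camera-frame point to geographic coordinates.

    R, t: camera-to-world pose from ORB-SLAM3; the local world frame is
    assumed east-north-up (ENU) and anchored at GNSS origin (lat0, lon0).
    Flat-earth degrees-per-meter conversion is an approximation.
    """
    p_w = R @ np.asarray(p_c, dtype=float) + np.asarray(t, dtype=float)
    east, north = p_w[0], p_w[1]
    lat = lat0 + np.degrees(north / 6378137.0)
    lon = lon0 + np.degrees(east / (6378137.0 * np.cos(np.radians(lat0))))
    return lat, lon

R = np.eye(3)                 # toy identity pose
t = [0.0, 0.0, 0.0]
print(camera_point_to_latlon([12.0, 30.0, 1.5], R, t, 45.0, 7.0))
```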
The third aspect comprises several smaller functions that complete the wildfire perception system integration. Most of these functions use infrared information, because thermal radiation information helps the deep-learning wildfire detectors produce more confident and robust results. A geometry-based visible and infrared image alignment and registration scheme is designed in this work; this registration is the basis of visible-infrared image fusion. Infrared images are also used to estimate wildfire spot temperature and guide the safe flight of UAVs (minimal sketches of these two functions follow below). Finally, an online water retardant release mechanism is designed and briefly described.
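A minimal sketch of geometry-based VI registration, assuming a homography estimated once from calibration correspondences between the infrared and visible cameras (e.g., a heated calibration target); the point pairs and image sizes below are placeholders, not real calibration data.

```python
import cv2
import numpy as np

# Placeholder correspondences between IR and visible pixel coordinates.
pts_ir = np.float32([[50, 40], [590, 45], [585, 430], [55, 425]])
pts_vis = np.float32([[80, 60], [1200, 70], [1190, 850], [90, 845]])
H, _ = cv2.findHomography(pts_ir, pts_vis)

ir = np.zeros((480, 640), dtype=np.uint8)        # stand-in IR frame
ir_aligned = cv2.warpPerspective(ir, H, (1280, 960))  # into visible grid

# After registration the two modalities can be fused, e.g. overlaid:
vis = np.zeros((960, 1280, 3), dtype=np.uint8)   # stand-in visible frame
overlay = cv2.addWeighted(
    vis, 0.7, cv2.cvtColor(ir_aligned, cv2.COLOR_GRAY2BGR), 0.3, 0)
print(overlay.shape)
```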
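Temperature-guided safe flight can be sketched as thresholding the hottest radiometric pixel. The raw-count-to-kelvin scaling below follows a common centikelvin convention for radiometric frames and is camera-specific (emissivity and atmospheric corrections are omitted), so both the scale and the temperature limit are assumptions, not the calibration used in this work.

```python
import numpy as np

def estimate_peak_temperature_c(raw16: np.ndarray) -> float:
    """Convert a 16-bit radiometric frame to its peak Celsius value,
    assuming a linear centikelvin encoding (camera-specific)."""
    kelvin = raw16.astype(np.float32) * 0.01
    return float(kelvin.max() - 273.15)

def safe_standoff_ok(raw16: np.ndarray, t_limit_c: float = 150.0) -> bool:
    """Flag whether the hottest pixel allows the UAV to hold its
    current standoff distance (the threshold is a placeholder)."""
    return estimate_peak_temperature_c(raw16) < t_limit_c

frame = np.random.randint(27315, 45000, (240, 320), dtype=np.uint16)
print(estimate_peak_temperature_c(frame), safe_standoff_ok(frame))
```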