News Report Technology
March 18, 2026

Tether Launches Cross-Platform BitNet LoRA Framework Enabling Billion-Parameter AI Training And Inference On Consumer Devices

In Brief

Tether has introduced a cross-platform framework that reduces the cost and hardware requirements of AI model training, enabling advanced LLMs to be fine-tuned efficiently on everyday consumer devices, including smartphones and standard GPUs.

USDT stablecoin issuer Tether announced the launch of what it describes as the first cross-platform LoRA fine-tuning framework for Microsoft's BitNet models, which are based on a 1-bit large language model architecture. The capability is integrated into Tether's QVAC Fabric system and is reported to significantly reduce both memory usage and computational demands. According to the company, this development enables large language models, including those with billions of parameters, to be fine-tuned on widely available consumer hardware such as laptops, standard graphics processing units, and modern smartphones.
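The "1-bit" label refers to BitNet's extreme weight quantization: in the published BitNet b1.58 scheme, each weight is constrained to the ternary values -1, 0, or +1, which is what drives the memory savings discussed below. As a rough illustration only, and not Tether's or Microsoft's actual implementation, absmean-style ternary quantization can be sketched like this:

```python
import numpy as np

def ternary_quantize(w):
    """Quantize a float weight matrix to {-1, 0, +1} with a per-tensor scale,
    loosely following the absmean scheme described for BitNet b1.58."""
    scale = np.abs(w).mean() + 1e-8          # per-tensor scaling factor
    q = np.clip(np.round(w / scale), -1, 1)  # ternary weight values
    return q.astype(np.int8), scale

def dequantize(q, scale):
    """Recover an approximate float matrix from ternary weights."""
    return q.astype(np.float32) * scale

w = np.array([[0.5, -2.0], [0.0, 1.0]], dtype=np.float32)
q, s = ternary_quantize(w)
```

Because each weight carries under two bits of information instead of sixteen, the quantized matrix can be packed far more densely than a standard FP16 checkpoint.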

The development and maintenance of artificial intelligence systems have traditionally required enterprise-grade hardware, particularly specialized NVIDIA infrastructure or cloud-based environments. These requirements have contributed to high operational costs, limiting access to advanced AI development primarily to large organizations with substantial financial resources and access to specialized computing systems.

Tether stated that its QVAC Fabric large language model, enhanced by the newly introduced BitNet-based framework, addresses these limitations by supporting cross-platform LoRA fine-tuning and accelerating inference across a range of heterogeneous consumer GPUs. These include hardware from Intel, AMD, and Apple Silicon, among others. As a result, users are able to train and customize AI models directly on commonly available consumer devices rather than relying on centralized infrastructure.
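LoRA fine-tuning makes this feasible on consumer hardware because the large base weights stay frozen and only small low-rank adapter matrices are trained. A schematic sketch of the idea, with hypothetical shapes and rank, and not QVAC Fabric's actual API:

```python
import numpy as np

class LoRALinear:
    """A linear layer with frozen base weight W plus a trainable
    low-rank update B @ A, scaled by alpha / r. During fine-tuning
    only A and B (rank r) would receive gradients."""
    def __init__(self, d_in, d_out, r=4, alpha=8, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(size=(d_out, d_in)).astype(np.float32)  # frozen base
        self.A = rng.normal(scale=0.01, size=(r, d_in)).astype(np.float32)
        self.B = np.zeros((d_out, r), dtype=np.float32)  # zero init: no change at start
        self.scaling = alpha / r

    def forward(self, x):
        return x @ self.W.T + (x @ self.A.T @ self.B.T) * self.scaling

layer = LoRALinear(64, 64, r=4)
x = np.ones((1, 64), dtype=np.float32)
out = layer.forward(x)
```

For a 64x64 layer at rank 4, the trainable parameters number 4 x (64 + 64) = 512 versus 4,096 in the base weight, and the ratio improves further at realistic layer sizes, which is why adapters fit comfortably in mobile GPU memory.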

The company reported that its engineering team has successfully demonstrated BitNet fine-tuning on mobile graphics processing units for the first time, including platforms such as Adreno, Mali, and Apple Bionic GPUs. Internal testing indicated that a 125 million-parameter BitNet model could be fine-tuned in approximately ten minutes on a Samsung S25 device equipped with an Adreno GPU using a biomedical dataset consisting of roughly 300 documents, or about 18,000 tokens. For a 1 billion-parameter model, the same dataset required approximately one hour and eighteen minutes on the Samsung S25 and one hour and forty-five minutes on an iPhone 16. The company also reported that it was able to extend testing to models as large as 13 billion parameters on the iPhone 16 under maximum device capacity conditions.

Advancements In Edge-Based AI Training And Performance Optimization

Further findings suggest that the framework can support fine-tuning of models up to twice the size of comparable non-BitNet models operating under Q4 quantization on edge devices. This outcome is attributed to the reduced memory footprint associated with the BitNet architecture.

In addition to improvements in training, the framework also demonstrates enhanced inference performance. Tests conducted on mobile devices indicated that BitNet models run substantially faster on GPUs, with throughput two to eleven times higher than CPU-based execution. These results indicate that mobile GPUs are increasingly capable of handling workloads that previously required specialized hardware or data-center-level resources.

The system also shows notable gains in memory efficiency. Benchmark data suggests that a BitNet-1B model in the TQ1_0 quantization format requires up to 77.8 percent less VRAM than a 16-bit Gemma-3-1B model and 65.6 percent less than a 16-bit Qwen3-0.6B model during both inference and LoRA fine-tuning. These reductions free up capacity for running larger models and enabling personalization features on hardware that would previously have been considered insufficient.
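The reported savings follow directly from bits per weight. A back-of-the-envelope check, counting model weights only and ignoring activations and KV cache (which is why the measured 77.8 percent figure is lower than the weight-only reduction): the TQ1_0 format in llama.cpp packs ternary weights at roughly 1.69 bits each, versus 16 bits for FP16 and about 4.5 bits for a typical Q4 scheme.

```python
def weight_bytes(n_params, bits_per_weight):
    """Approximate storage for model weights alone (no activations/KV cache)."""
    return n_params * bits_per_weight / 8

fp16_1b = weight_bytes(1e9, 16)    # ~2.00 GB for a 16-bit 1B model
tq1_1b  = weight_bytes(1e9, 1.69)  # ~0.21 GB at ~1.69 bits/weight (TQ1_0 packing)
q4_1b   = weight_bytes(1e9, 4.5)   # ~0.56 GB at ~4.5 bits/weight (typical Q4)

print(f"weight-only reduction vs FP16: {1 - tq1_1b / fp16_1b:.1%}")
print(f"Q4-to-TQ1_0 size ratio: {q4_1b / tq1_1b:.2f}x")
```

The weight-only reduction comes out near 89 percent, and the roughly 2.7x ratio against Q4 is consistent with the article's claim that BitNet supports models "up to twice the size" of Q4-quantized alternatives once runtime overheads are included.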

Tether further indicated that the framework introduces LoRA fine-tuning capabilities for 1-bit large language models on non-NVIDIA hardware for the first time, extending compatibility to AMD, Intel, Apple Silicon, and mobile GPU platforms. By reducing reliance on specialized infrastructure and cloud services, the approach allows sensitive data to remain stored locally on user devices. The company noted that this efficiency may also support the development of federated learning systems, in which models can be trained collaboratively across distributed devices while maintaining data privacy and minimizing dependence on centralized systems.
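In the federated setting the company alludes to, each device would fine-tune its own lightweight adapter locally and share only parameter updates, never raw data. A minimal federated-averaging (FedAvg) sketch over adapter-sized arrays, purely illustrative and not a description of Tether's system:

```python
import numpy as np

def federated_average(client_updates, weights=None):
    """FedAvg: weighted mean of per-client parameter updates.
    Only the small LoRA adapter tensors would be shared with the
    aggregator; local training data never leaves each device."""
    n = len(client_updates)
    if weights is None:
        weights = [1.0 / n] * n  # equal weighting by default
    return sum(w * u for w, u in zip(weights, client_updates))

# Three devices each produce a local adapter update; a coordinator averages them.
updates = [np.full((2, 2), v, dtype=np.float32) for v in (1.0, 2.0, 3.0)]
avg = federated_average(updates)
```

Because a LoRA adapter is orders of magnitude smaller than the base model, the per-round communication cost of such a scheme stays modest even over mobile connections.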

Disclaimer

In line with the Trust Project guidelines, please note that the information provided on this page is not intended to be and should not be interpreted as legal, tax, investment, financial, or any other form of advice. It is important to only invest what you can afford to lose and to seek independent financial advice if you have any doubts. For further information, we suggest referring to the terms and conditions as well as the help and support pages provided by the issuer or advertiser. MetaversePost is committed to accurate, unbiased reporting, but market conditions are subject to change without notice.

About The Author

Alisa Davidson, a dedicated journalist at MPost, specializes in cryptocurrency, zero-knowledge proofs, investments, and the expansive realm of Web3. With a keen eye for emerging trends and technologies, she delivers comprehensive coverage to inform and engage readers in the ever-evolving landscape of digital finance.
