How Do You Create Data for a Successful AI Agent?
Original author: jlwhoo7, Crypto Kol
Original translation: zhouzhou, BlockBeats
Editor's note: This article shares tools and methods that help improve the performance of AI agents, with a focus on data collection and cleaning. It recommends a variety of no-code tools, such as tools for converting websites to LLM-friendly formats, scraping Twitter data, and summarizing documents. It also covers storage tips, emphasizing that how data is organized matters more than a complex architecture. With these tools, users can organize data efficiently and provide high-quality input for training AI agents.
The following is the original content (the original content has been reorganized for easier reading and understanding):
We see many AI agents launched today, 99% of which will disappear.
What makes successful projects stand out? Data.
Here are some tools that can make your AI agent stand out.

Good data = good AI.
Think of it like a data scientist building a pipeline:
Collect → Clean → Validate → Store.
Before optimizing your vector database, tune your few-shot examples and prompts.
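
The Collect → Clean → Validate → Store pipeline above can be sketched in a few lines of Python. This is only a minimal illustration, not the author's implementation: the `Record` fields, the whitespace and duplicate cleaning rules, and the `min_chars` threshold are all assumptions chosen for the example.

```python
import json
import re
from dataclasses import dataclass, asdict


@dataclass
class Record:
    source: str  # where the text came from (hypothetical field)
    text: str    # the content itself


def collect(raw_items):
    """Collect: wrap raw (source, text) pairs into records."""
    return [Record(source=s, text=t) for s, t in raw_items]


def clean(records):
    """Clean: collapse whitespace and drop exact duplicate texts."""
    seen = set()
    cleaned = []
    for r in records:
        text = re.sub(r"\s+", " ", r.text).strip()
        if text not in seen:
            seen.add(text)
            cleaned.append(Record(source=r.source, text=text))
    return cleaned


def validate(records, min_chars=20):
    """Validate: keep only records with enough text to be useful."""
    return [r for r in records if len(r.text) >= min_chars]


def store(records):
    """Store: serialize to JSON Lines text, one record per line."""
    return "\n".join(json.dumps(asdict(r), ensure_ascii=False) for r in records)


def run_pipeline(raw_items, min_chars=20):
    """Run the four stages in order and return the JSON Lines text."""
    return store(validate(clean(collect(raw_items)), min_chars=min_chars))
```

Keeping each stage as a separate function makes it easy to swap in a real scraper for `collect` or stricter checks for `validate` without touching the rest.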

I approach most of today’s AI problems through Steven Bartlett’s “bucket theory”: solve them piece by piece.
Start by laying a solid data foundation; a good AI agent pipeline is built on it.

Here are some great tools for data collection and cleaning:
No-code llms.txt generator: converts any website to LLM-friendly text.

Need LLM-friendly Markdown? Try JinaAI's tool:
It crawls any website and converts it to LLM-friendly Markdown.
Just prefix the URL with the following to get an LLM-friendly version:
https://r.jina.ai/<URL>
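
The prefixing step can also be scripted. The sketch below assumes the https form of the prefix with a trailing slash (`https://r.jina.ai/`); the helper names `reader_url` and `fetch_markdown` are hypothetical, and the fetch uses only Python's standard library.

```python
from urllib.request import urlopen

READER_PREFIX = "https://r.jina.ai/"  # assumption: https form with trailing slash


def reader_url(url: str) -> str:
    """Build the prefixed URL for a page."""
    url = url.strip()
    if not url.startswith(("http://", "https://")):
        # Assumption: default to https when the caller gives no scheme.
        url = "https://" + url
    return READER_PREFIX + url


def fetch_markdown(url: str, timeout: float = 30.0) -> str:
    """Fetch the LLM-friendly text for a page (makes a network request)."""
    with urlopen(reader_url(url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

For example, `reader_url("example.com")` builds the prefixed address without touching the network, while `fetch_markdown` performs the actual request.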

Want to get Twitter data?
Try ai16zdao's twitter-scraper-finetune tool:
With just one command, you can scrape data from any public Twitter account.
(See my previous tweet for the exact steps.)

Data source recommendation: elfa ai (currently in closed beta; PM tethrees to get access)
Their API provides:
Most popular tweets
Smart follower filtering
Latest $ mentions
Account reputation check (for filtering spam)
Great for high-quality AI training data!

For document summarization: Try Google's NotebookLM.
Upload any PDF/TXT file → let it generate few-shot examples for your training data.
Great for creating high-quality few-shot prompts from documents!
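
Once you have a handful of examples, turning them into a few-shot prompt is mostly string assembly. The sketch below is one possible layout, not a required format: the `Input:`/`Output:` labels and the `build_few_shot_prompt` helper are assumptions for illustration.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Assemble an instruction, worked examples, and a new query into one prompt.

    `examples` is a list of dicts with "input" and "output" keys
    (a hypothetical shape chosen for this sketch).
    """
    parts = [instruction.strip(), ""]
    for ex in examples:
        parts.append(f"Input: {ex['input']}")
        parts.append(f"Output: {ex['output']}")
        parts.append("")  # blank line between examples
    parts.append(f"Input: {query}")
    parts.append("Output:")  # left open for the model to complete
    return "\n".join(parts)
```

Because the examples are plain data, you can swap or reorder them and regenerate the prompt while tuning, without editing the prompt text by hand.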

Storage Tips:
If you use virtuals io's CognitiveCore, you can upload the generated file directly.
If you run ai16zdao's Eliza, you can store data directly into vector storage.
Pro Tip: Well-organized data is more important than fancy schemas!
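
As one way to keep data organized before uploading or indexing it, you could group records by source into JSON Lines files with consistent fields. This is a generic sketch, not something CognitiveCore or Eliza requires: the `source` and `text` field names and the `group_as_jsonl` helper are assumptions, and source names are assumed to be safe to use as file names.

```python
import json
from collections import defaultdict


def group_as_jsonl(records):
    """Group records by source; return {file name: JSON Lines text}."""
    grouped = defaultdict(list)
    for rec in records:
        grouped[rec["source"]].append(rec)
    return {
        # One file per source, one JSON object per line, keys in stable order.
        f"{source}.jsonl": "\n".join(
            json.dumps(rec, ensure_ascii=False, sort_keys=True) for rec in recs
        ) + "\n"
        for source, recs in grouped.items()
    }
```

Each returned value can be written to disk under its file name, giving one plain, inspectable file per data source.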

