How can the average person start algorithmic trading in 2026?
Original Title: How I'd Become a Quant If I Had to Start Over Tomorrow
Original Author: gemchanger, coldvision Founder
Translation & Annotations: MrRyanChi, insiders.bot
In 2026, Quantitative Trading is the Foundation Skill of Every Trader
Last week, I was invited by the Hong Kong University Artificial Intelligence and Management Association (@camo_hku) to share insights on the Money-Making Methods of the Agent Era. The biggest takeaway from the entire event for me was one thing:
AI Era = Era of Technological Empowerment
In the past, quant trading was the exclusive domain of a small number of institutions. Now, numerous studios and even individuals are involved in creating quantitative strategies and benefiting from them. In other words, if you still don't understand the essence of quant, you will face a significant disadvantage in the market.
In today's prevalent OpenClaw era, anyone can make money through quant. But this requires two prerequisites.
First, infrastructure, which is exactly what we at @insidersdotbot are trying to achieve through the development of a trading platform native to agents and algorithms, databases, and Skills. The official version of Agent-based backtesting will also be part of this ecosystem.
Second, and most importantly as an individual, is the ability to architect and design strategies. Strategies don't need to be 100% accurate, but they must be unique, sophisticated, and able to capture opportunities that others are not aware of.
As long as you have a strategy specific to you + awesome infrastructure, then, with the empowerment of Vibe Coding, you are not far from financial freedom.
And when it comes to learning about strategies and architecture, the original article by @gemchange_ltd is the most comprehensive "Quantitative Trading Knowledge Map" I have seen so far. It lays out, as a blueprint for predicting the market, every puzzle piece needed to become a top quant in the correct learning sequence, all at once.
After reading it, even if you are a novice, you should understand how to start algorithmic trading and how to design your own strategy.
If you are a market prediction trader, this is a must-read article for you.
If you trade other assets, many of the ideas in this article are universal, and I believe you will also benefit greatly.
The original text is very hardcore and academic. In order for anyone who is new to Polymarket, or even has no mathematical background, to understand, I have made a lot of revisions and additions. I assume you know nothing about complex mathematics, so I have added 20 fully Chinese illustrative diagrams and used the most down-to-earth language, everyday analogies, and practical examples to help you break down every concept.
If you want to earn money in the prediction market in the long term, rather than being a gambler, this article is your starting point.
By the way, this article has been optimized for structure for Agents. Just like the insiders.bot platform has been optimized for both human and AI traders. So, feel free to feed this article to your OpenClaw, Manus, Claude, or any AI, and immediately start building your quant model.

Foreword: Are You Trading or Gambling?
Let me ask you a question.
You saw a contract on Polymarket, where the YES price for "Trump winning the election" is $0.52. You think the probability of him winning is higher, so you spent $520 to buy 1000 shares of YES.
You think you are trading. But in reality, you are just gambling. Because you have not answered these questions:
* How did you come up with your 52%?
* Is your source of information better than that of other market participants?
* If tomorrow there is breaking news, how should your probability estimate be updated?
* How large a position should you take so that the "unlikely event of being wrong" doesn't wipe you out?
These questions cannot be answered by "feeling." They require mathematics.
In 2025, the base salary for entry-level quants at top quant firms (Jane Street, Citadel, HRT) ranges from $300K to $500K. Financial recruiting in AI and machine learning has grown by 88% year over year. And it's not because these companies love mathematicians. It's because math can actually make money through more accurate valuation models.
And Polymarket happens to be a trading market that perfectly integrates all core concepts of quantitative finance: Probability Theory, Information Theory, Convex Optimization, Integer Programming, all of which come into play.
Chapter 1: Probability, the Sole Language of an Uncertain World
Most people have a huge misunderstanding of quantitative trading. They think quantitative trading is about "picking stocks," having unique insights into a particular event.
But it's not.
The essence of quantitative trading = pure mathematics.
More specifically, what you are looking for is:
* Statistical correlation
* Market inefficiencies in pricing
* Structural advantages.
These advantages exist because the market is a complex system composed of humans, and humans always make systemic errors.
In the world of quantitative finance, all problems can ultimately be reduced to one question: What are the odds, and how much of an edge do these odds give me?
So first and foremost, you need to deeply understand the nature of "probability."
Conditional Thinking: Farewell to Absolute Right and Wrong
Ordinary people think about problems in terms of absolute right and wrong. Something either happens or it doesn't.
But a quant's way of thinking is conditional.
They ask: Given certain information, what is the likelihood of this event occurring?
The "likelihood of an event given certain information" is conditional probability.
In plain language: How does the original probability change when you get a new clue?
Sound a bit convoluted? Let's look at an actual example on Polymarket.
Imagine you are trading a contract on "Will XYZ Token rise today." Historical data shows that the token has a 60% daily rise probability. This is the Base Rate. However, if today the token's trading volume is above the historical average, the probability of it rising becomes 75%.
That 75% conditional probability is the true "signal." The isolated 60% is just noisy background data.
Let's take a more intuitive example. The probability of rain is 30%. But what if the sky is already dark with clouds? The probability of rain may increase to 85%. "Dark clouds" are your conditional information, which bumps your probability estimate from 30% to 85%. This is the essence of conditional probability.
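The gap between the base rate and the conditional probability is easy to verify in code. A minimal simulation of the token example above (the 40% high-volume frequency is a made-up assumption; the 60%/75% figures are the hypothetical numbers from the text, not real market data):

```python
import numpy as np

rng = np.random.default_rng(42)
n_days = 100_000

# Hypothetical generative model: high volume occurs on 40% of days,
# and the rise probability depends on the volume regime.
high_volume = rng.random(n_days) < 0.40
p_rise = np.where(high_volume, 0.75, 0.50)  # chosen so the base rate lands near 60%
rose = rng.random(n_days) < p_rise

base_rate = rose.mean()                 # P(rise), the noisy background number
conditional = rose[high_volume].mean()  # P(rise | high volume), the signal
print(f"Base rate:   {base_rate:.3f}")
print(f"Conditional: {conditional:.3f}")
```

The isolated base rate hides the regime split entirely; conditioning on the volume clue recovers it.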

Bayesian Theorem: How to Update Your Beliefs in Real Time
The Bayesian Theorem is the soul of quant trading. It answers the question: When you receive new data, how should you update your prior beliefs?
Its formula is as follows:
P(A|B) = P(B|A) · P(A) / P(B)
* P(A|B): the posterior, the probability of A given that B has occurred
* P(B|A): the likelihood, the probability of observing B if A is true
* P(A): the prior, your initial estimate before seeing B
* P(B): the overall probability of B occurring
The logical essence of the Bayesian Theorem is this:
* You start with an initial estimate (e.g., I think this has a 50% chance of happening).
* Suddenly, you see a piece of new evidence (e.g., a positive news article is released).
* You ask yourself two questions: What is the likelihood of this news article if the event is indeed happening? What is the likelihood if the event is not happening at all?
* Based on the answers to these two questions, you adjust your initial estimate (e.g., bumping it up from 50% to 58%).

Let's use a Polymarket scenario to understand this.
Your model calculates that the fair price for a certain market should be $0.50 (meaning you believe the probability of this event happening is 50%). This is your prior belief.
Suddenly, breaking news arrives. Economic data is 3% better than expected.
Using the Bayes' theorem, you can accurately calculate your new belief. Let's say it comes out to be 58%. Then your new fair price would be $0.58.
In the market, whoever can quickly and accurately complete this type of probability update will take away most of the money. That's why quant teams spend millions of dollars to build low-latency systems. Not because they like speed, but because being 0.1 seconds faster could mean earning tens of thousands of dollars more.
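The update step itself is a few lines of arithmetic. A minimal sketch of the news example above, where the two likelihood values (70% if the event resolves YES, 50% if NO) are made-up numbers chosen so the posterior lands near the 0.58 in the text:

```python
def bayes_update(prior: float, p_evidence_if_true: float, p_evidence_if_false: float) -> float:
    """Return the posterior P(event | evidence) via Bayes' theorem."""
    numerator = p_evidence_if_true * prior
    denominator = numerator + p_evidence_if_false * (1 - prior)
    return numerator / denominator

# Hypothetical numbers for the breaking-news example: prior fair price 0.50;
# the surprise data is 70% likely if the event resolves YES, 50% likely if NO.
posterior = bayes_update(prior=0.50, p_evidence_if_true=0.70, p_evidence_if_false=0.50)
print(f"New fair price: {posterior:.3f}")
```

In practice, estimating those two likelihoods well is the hard part; the arithmetic is trivial, which is exactly why speed becomes the differentiator.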

If you want to lay a good foundation, read Harvard University's free book "Introduction to Probability" and focus on the first 6 chapters. Then try writing some Python code to simulate flipping a coin 10,000 times and see for yourself how the law of large numbers works.
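Here is one way that coin-flip exercise might look (a minimal NumPy sketch):

```python
import numpy as np

rng = np.random.default_rng(0)
flips = rng.integers(0, 2, size=10_000)               # 0 = tails, 1 = heads
running_mean = np.cumsum(flips) / np.arange(1, 10_001)

# The running proportion of heads drifts toward 0.5 as n grows:
# that convergence is the law of large numbers in action.
for n in (10, 100, 1_000, 10_000):
    print(f"n={n:>6}: proportion of heads = {running_mean[n - 1]:.4f}")
```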
Expected Value and Variance: Your Two Best Friends
In trading, there are two numbers more important than anything else.
Expected Value (EV), your certainty.
If a trade's expected value is positive, it means that as long as you repeat it enough times, you will make money in the long run.
Variance, your risk.
It tells you how much fluctuation you will experience before reaching that profitable "long run."
For example, suppose you have a strategy where each trade has an expected return of $2, but the standard deviation is $50. This means that while you "on average" make $2 per trade, the result of a single trade could fluctuate dramatically between losing $100 and gaining $100. If your capital is only $200, you might have already busted out with three consecutive losses before the "long run" arrives.
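You can see this bust-before-the-long-run effect directly by simulation. A sketch assuming the per-trade P&L is normally distributed with the $2 mean and $50 standard deviation from the example (real P&L distributions are fatter-tailed, so this likely understates the danger):

```python
import numpy as np

rng = np.random.default_rng(7)
n_paths, n_trades = 10_000, 500
start_capital = 200.0

# Assumed normal per-trade P&L: mean $2, standard deviation $50.
pnl = rng.normal(loc=2.0, scale=50.0, size=(n_paths, n_trades))
equity = start_capital + np.cumsum(pnl, axis=1)

# A path "busts" if equity ever touches zero, even though the EV is positive.
busted = (equity.min(axis=1) <= 0).mean()
print(f"Fraction of paths that go broke despite +$2 EV: {busted:.1%}")
```

Despite positive expected value, a large fraction of paths go broke before the averaging can work, which is the whole argument for position sizing.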

Kelly Criterion: Scientifically Determine Bet Size
Now that we know the expected value and variance, when faced with a good opportunity, how much should I buy? Should I go all-in?
Absolutely not. Here we need to introduce the Kelly Criterion.
The Kelly Criterion is specifically designed to tell you: given a certain win rate and odds, what percentage of your total funds you should bet to make your money grow fastest while avoiding bankruptcy.
If the calculation result is 20%, it means you should bet at most 20% of your total funds.
In practice, because our estimation of the win rate often has errors (you may think you have a 60% win rate, but it could actually be only 55%), top sharps usually use 'Half Kelly,' which means only betting half of the result calculated by the Kelly Criterion. This can significantly reduce the volatility of your funds while retaining most of the earning speed.
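For a binary contract that pays $1, the classic Kelly formula f* = (p·b − q)/b reduces to a one-liner. A sketch with illustrative numbers (a market priced at 52 cents where you believe the true probability is 60%):

```python
def kelly_fraction(win_prob: float, price: float) -> float:
    """Kelly fraction for buying a binary contract at `price` that pays $1 on YES.

    Net odds are b = (1 - price) / price, so f* = (p*b - q) / b
    simplifies to (win_prob - price) / (1 - price).
    """
    return (win_prob - price) / (1 - price)

price, win_prob = 0.52, 0.60          # hypothetical edge, not a real market
full_kelly = kelly_fraction(win_prob, price)
half_kelly = full_kelly / 2           # the 'Half Kelly' the text recommends

print(f"Full Kelly: bet {full_kelly:.1%} of bankroll")
print(f"Half Kelly: bet {half_kelly:.1%} of bankroll")
```

Note how sensitive the output is to the win-probability input; that sensitivity is precisely why practitioners shade down to Half Kelly.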
Chapter 1 Homework (2 hours per day, about 3-4 weeks to complete):
1. Reading: Read "Introduction to Probability" co-authored by Blitzstein & Hwang (Harvard offers a free PDF version: http://probabilitybook.net; course page: https://stat110.hsites.harvard.edu/)
2. Programming Exercise 1: Simulate 10,000 coin tosses to visually confirm the 'Law of Large Numbers'.
3. Programming Exercise 2: Implement a Bayesian updater: input prior probability and likelihood function, output posterior probability.

Chapter 2: Statistics = Your Noise Detector
Once you learn the language of probability, the next step is to learn to 'listen to data'.
That's where statistics comes in.
The first lesson statistics teaches us is: the vast majority of things that look like 'signals' are actually noise.
Hypothesis Testing and the Pitfalls of Multiple Comparisons
Let's say you've written a trading bot, and backtesting data shows it can earn 15% per year. Is this real, or is it just lucky?
At this point, you need to calculate a p-value: What is the probability that this strategy, if it's actually garbage (pure luck), could coincidentally achieve a 15% return? Statistics can tell you how unlikely this probability is (e.g., less than 5%).
However, there is a huge pitfall here called the Multiple Comparisons Problem.
Imagine you have 1,000 monkeys each throw darts 100 times. Purely by luck, a few monkeys will inevitably continuously hit the bullseye, seemingly behaving like "dart-throwing masters." But you wouldn't hire them as investment managers, right?
Developing a trading strategy is similar. If you use a computer to generate 1,000 random strategies and backtest them on historical data, purely by luck, about 50 strategies will appear to be highly profitable.
Every novice who has just entered the field will significantly overestimate the "effective strategies" they discover. I can responsibly tell you that the first 10 strategies you develop are definitely among those luckily successful monkeys.

What is the solution? You need to apply a Bonferroni correction to tighten your significance threshold, or use False Discovery Rate (FDR) control. In simple terms, if you test 100 strategies, your significance threshold is no longer 0.05 but 0.05/100 = 0.0005. Only then can you filter out false signals generated by luck.
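A quick simulation makes the monkey problem concrete: test 1,000 strategies with zero true edge and count how many look "significant" before and after Bonferroni (a minimal sketch assuming i.i.d. normal daily returns):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_strategies, n_days = 1_000, 252

# 1,000 strategies with ZERO true edge: daily returns are pure noise.
returns = rng.normal(loc=0.0, scale=0.01, size=(n_strategies, n_days))

# One-sided t-test per strategy: "is the mean daily return > 0?"
t_stat, p_two_sided = stats.ttest_1samp(returns, 0.0, axis=1)
p_one_sided = np.where(t_stat > 0, p_two_sided / 2, 1 - p_two_sided / 2)

naive_hits = (p_one_sided < 0.05).sum()                      # ~50 false "winners"
bonferroni_hits = (p_one_sided < 0.05 / n_strategies).sum()  # almost always 0
print(f"'Significant' strategies at p<0.05:       {naive_hits}")
print(f"Survivors after Bonferroni (p<0.05/1000): {bonferroni_hits}")
```

Roughly 5% of pure-noise strategies pass the naive test, exactly as the monkey analogy predicts; essentially none survive the corrected threshold.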

Regression Analysis: Decompose Your Source of Returns
Linear regression is a key tool in the finance industry. In quantitative trading, you compare your strategy's returns with the market's performance.
The intercept term α (Alpha) here is your excess return. It represents the money you earned purely through your personal skills that cannot be explained by the market's movement.
For example, suppose your strategy earned 20% this year. But if the whole market could have gained 18% even with blindfolded investment, then your skill score (Alpha) is actually only 2%.
What's worse, if your strategy is merely "buying high and selling higher," then after removing market fluctuations, your Alpha could potentially become zero or even negative. This indicates that your so-called "trading edge" is merely a disguised trend-following behavior.

In financial data, there is another special issue that needs to be noted: Data often exhibit autocorrelation (today's price is related to yesterday's) and heteroscedasticity (volatility is not constant). So you need to use Newey-West standard errors to correct your regression results; otherwise, your statistical tests will give overly optimistic conclusions.

Maximum Likelihood Estimation (MLE): The Art of Reverse Inference
When you hear a top institution's quant say they are "calibrating" a model, they are almost always talking about one thing: Maximum Likelihood Estimation (MLE).
The principle of MLE is actually straightforward; it is a form of "reverse inference."
For example, suppose you see a puddle with a 2-meter diameter on the roadside. You want to know how much it rained last night. You have a "rainfall model" that tells you how different rainfall amounts would result in a puddle of a certain size.
What MLE does is reverse inference: since I've already seen a 2-meter puddle, among all possible rainfall amounts, which one is most likely to create such a large puddle?
Whether fitting a GARCH model to volatility or calibrating option pricing based on market quotes, MLE is a core tool.
The same applies to trading. You see the market price of an option (the puddle), and you want to infer the market's expectation of future volatility (rainfall amount). MLE helps you find the hidden parameter that "best explains the current price."

As an exercise, you can try downloading some real asset price data (e.g., using Python's yfinance library) and test whether they follow a normal distribution.
Spoiler: They absolutely do not. The real world is full of Fat Tails, meaning the frequency of extreme events is much higher than predicted by a normal distribution. Try fitting a t-distribution using MLE to see what real risk truly looks like.
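A self-contained version of that exercise, using synthetic fat-tailed returns as a stand-in for the yfinance download (Student-t with 3 degrees of freedom, scaled to roughly 1% daily moves; with real data the steps are identical):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)

# Stand-in for real returns: heavy-tailed t-distributed data.
returns = stats.t.rvs(df=3, loc=0, scale=0.01, size=5_000, random_state=rng)

# 1) Normality test: fat tails get rejected decisively.
jb_stat, p_normal = stats.jarque_bera(returns)
print(f"Jarque-Bera p-value: {p_normal:.2e}  (tiny => not normal)")

# 2) MLE fit of a t-distribution: scipy's .fit maximizes the likelihood numerically.
df_hat, loc_hat, scale_hat = stats.t.fit(returns)
print(f"Fitted degrees of freedom: {df_hat:.2f} (true value: 3)")
```

A low fitted degrees-of-freedom value is the quantitative signature of fat tails: extreme days are far more common than a normal distribution allows.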
Chapter 2 Homework (to be completed in about 4-5 weeks):
1. Reading: Read Chapters 1 to 13 of Wasserman's "All of Statistics." (CMU publicly available PDF version: https://www.stat.cmu.edu/~brian/valerie/617-2022/0%20-%20books/2004%20-%20wasserman%20-%20all%20of%20statistics.pdf)
2. Programming Exercise 1: Use yfinance to download actual stock returns data, conduct a normality test on it (Spoiler: it will likely be rejected, indicating that the returns do not follow a normal distribution). Then fit a t distribution using Maximum Likelihood Estimation (MLE) and compare the two.
3. Programming Exercise 2: Using the statsmodels library, run a Fama-French three-factor regression on a stock portfolio.
4. Programming Exercise 3: Implement a Permutation Test: shuffle the dates randomly 10,000 times and compare the performance after shuffling with the actual performance.

Chapter 3: Linear Algebra, the Underlying Engine of the Quantitative World
Many people find linear algebra boring, just a bunch of matrix operations. But it is actually the engine running the entire quantitative world. Portfolio construction, Principal Component Analysis (PCA), Neural Networks, Covariance Estimation, Factor Models—all rely on it.
There is even a legend in the field: Renaissance Technologies' coveted Medallion Fund, which outperformed Buffett with an annual return of around 30%, is built on hidden Markov models rooted in linear algebra.

If you cannot think fluently in terms of matrices, you cannot become a quant.
Covariance Matrix: Understanding Asset Interactions
A covariance matrix Σ (Sigma) captures how each asset moves relative to all other assets.
When looking at 500 markets, this matrix is of size 500×500, containing 125,250 unique entries. Each entry tells you: "When Asset A rises, does Asset B tend to rise or fall, and by how much."
The total portfolio variance can be collapsed into a very elegant mathematical expression:
σ²_p = w^T Σ w
* w is your holding weight vector
* Σ is the covariance matrix
This quadratic form formula is the core of Markowitz's Modern Portfolio Theory, risk management, everything.
In other words, if you are simultaneously trading in multiple related markets (e.g., "Trump Wins Election" and "GOP Wins Senate"), your total risk is not simply adding up the risk of each market. You need to consider their correlation. And the covariance matrix is the tool that helps you do this.
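A tiny numeric example of the quadratic form w^T Σ w for the two-market case (the volatilities and the 0.8 correlation below are illustrative guesses, not estimates from real Polymarket data):

```python
import numpy as np

# Two hypothetical correlated markets, e.g. "Trump wins" and "GOP wins Senate".
sigma = np.array([0.08, 0.06])          # per-market volatilities
corr = np.array([[1.0, 0.8],
                 [0.8, 1.0]])
Sigma = np.outer(sigma, sigma) * corr   # covariance matrix

w = np.array([0.5, 0.5])                # equal position weights
port_vol = np.sqrt(w @ Sigma @ w)                  # the quadratic form w^T Sigma w
indep_vol = np.sqrt(w @ np.diag(sigma**2) @ w)     # what ignoring correlation gives

print(f"Volatility assuming independence: {indep_vol:.3f}")
print(f"Volatility with correlation 0.8:  {port_vol:.3f}")
```

Ignoring the correlation materially understates the true portfolio risk, which is why the off-diagonal entries of Σ matter.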

Eigen Decomposition and PCA: Finding Hidden Drivers
When you first use eigen decomposition for Principal Component Analysis (PCA), the way you view the world will change.
PCA can be explained with the following analogy: suppose you want to describe a person's physique. You can record his height, weight, arm length, leg length, shoulder width, and many other data points. However, many of these data points are interrelated (taller people tend to have longer legs). The purpose of PCA is to condense these many correlated data points into a few core "hidden labels," such as "overall body size" and "level of leanness."
It's the same in the financial market. If you observe the price movements of 500 tokens, you will find that just the first 5 "hidden labels" (eigenvectors) can explain 70% of the market's total variance. Everything else is essentially noise.
You don't need to understand what each of the 500 tokens is doing. You only need to grasp these 5 "hidden drivers" (e.g., overall market sentiment, interest rate changes, specific sector trends). That's the magic of dimensionality reduction.

If you have enough time, it is recommended to watch MIT Professor Gilbert Strang's linear algebra lectures. Then, perform a PCA decomposition on the S&P 500 returns using Python and see for yourself what the first few principal components represent.
You will discover that the first principal component is nearly equivalent to "the overall market movement."
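You can reproduce that finding on synthetic data where a single market factor drives everything (a sketch; with real S&P 500 returns the picture is similar but noisier):

```python
import numpy as np

rng = np.random.default_rng(11)
n_days, n_assets = 1_000, 50

# Hypothetical factor structure: one market factor moves every asset
# (with varying betas), plus idiosyncratic noise.
market = rng.normal(0, 0.01, n_days)
betas = rng.uniform(0.5, 1.5, n_assets)
returns = np.outer(market, betas) + rng.normal(0, 0.005, (n_days, n_assets))

# PCA = eigen decomposition of the covariance matrix.
cov = np.cov(returns, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending order
eigvals = eigvals[::-1]                  # largest (most important) first

explained = eigvals / eigvals.sum()
print(f"Variance explained by PC1: {explained[0]:.1%}")
print(f"Variance explained by PC2: {explained[1]:.1%}")
```

The first principal component soaks up most of the variance and loads on every asset, i.e. it is "the overall market movement"; the remaining components are close to noise.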
Chapter 3 Homework (Approximately 4-6 weeks to complete):
1. Watch the video: Watch all of Gilbert Strang's MIT 18.06 linear algebra course videos without skipping a single lecture. (MIT OpenCourseWare free to watch: https://ocw.mit.edu/courses/18-06-linear-algebra-spring-2010/video_galleries/video-lectures/)
2. Reading: Read Strang's "Introduction to Linear Algebra," and complete the exercises in the book. (Textbook website: https://math.mit.edu/~gs/linearalgebra/)
3. Programming Exercise 1: Perform PCA (Principal Component Analysis) on the S&P 500's return data, plot the eigenvalue spectrum (i.e., how much variance each principal component explains), and identify the top 3 most important principal components.
4. Programming Exercise 2: Implement Markowitz Mean-Variance Optimization from scratch.

Chapter 4: Calculus and Optimization, Capturing the Language of Change
Calculus is the language of "change." In the financial markets, everything is changing: prices, volatility, correlations, the entire probability distribution shifts by the second.
Calculus is used to describe and leverage these changes.
Derivatives and Taylor Expansion: Simplifying Complexity through Approximation
Derivatives appear in every neural network's backpropagation and in the computation of every option's "Greeks."
In quantitative trading, we often use Taylor expansion for approximate calculations. Fundamentally, derivatives provide the necessary input for Taylor expansion.
The essence of Taylor expansion is to approximate any complex function with a polynomial built from its derivatives, thereby modeling the relationship between x (the key factor) and y (asset price).

Imagine you need to draw a highly intricate curved line, but you only have a ruler. What do you do?
First, you use a straight line to approximate a point on the curve (this is called a first-order approximation, known as Delta in options). Around this point, the line closely matches the curve.
Second, if you want a better fit, you can bend the line slightly to form a parabola (this is a second-order approximation, called Gamma in options).
The more you bend the line, the closer your drawing will resemble that intricate curve.
In trading, the price change of an option is an extremely complex formula. Since we can't do the math, we use Taylor expansion to break it down into several simple parts:
Price Direction Impact (Delta) + Price Curvature Impact (Gamma) + Time Decay Impact (Theta) + Volatility Change Impact (Vega).

Convex Optimization: Seeking the Best Solution
In quantitative finance, almost all "optimization" problems can be formulated as convex optimization problems. For example, given a risk budget, how should funds be allocated to maximize returns?
Imagine you are blindfolded and placed in a valley, asked to walk to the valley floor (optimal solution).
* If this valley is rugged, you might end up in a pit halfway down and can't get out (local optimum).
* But if it's a "bowl"-shaped perfect valley, you just need to keep walking downhill (gradient descent), and even with your eyes closed, you will definitely reach the single lowest point at the very bottom (global optimum).

As long as you can express a financial problem as a "bowl"-shaped mathematical formula, the computer can instantly help you find the perfect answer. That's what convex optimization does for you. The original author mentioned that Boyd and Vandenberghe of Stanford University wrote a free textbook "Convex Optimization," which is the Bible of this field. The Python library cvxpy can help you solve complex optimization problems in just a few lines of code.
Here, I also casually recommend Andrew Ng's AI course, where the early episodes will mention gradient descent and local/global optima. This will help everyone better understand the necessity of convex optimization. Link: https://www.youtube.com/watch?v=JPcx9qHzzgk
Chapter 4 Homework (Approx. 4-5 weeks to complete):
1. Reading: Read Chapters 1 to 5 of Boyd & Vandenberghe's book "Convex Optimization." (Stanford provides a free PDF version: https://web.stanford.edu/~boyd/cvxbook/bv_cvxbook.pdf, Book page: https://stanford.edu/~boyd/cvxbook/)
2. Coding Exercise 1: Implement the gradient descent algorithm from scratch to find the minimum of the Rosenbrock function. (The Rosenbrock function is one of the most classic test functions in the optimization field. It looks simple but is actually difficult to optimize, testing algorithm performance extensively.)
3. Coding Exercise 2: Solve an investment portfolio optimization problem using cvxpy and incorporate transaction cost constraints.
Chapter 5: Stochastic Calculus, Transforming from a Data Scientist Who Likes Finance to a True Quant
Before learning stochastic calculus, you were just a "data scientist who likes finance." After mastering it, you become a true quant.
This is where you learn how to model randomness in continuous time. Here, you will derive the famous Black-Scholes equation from first principles and truly understand why the trillion-dollar derivatives market operates the way it does today.
*Note 5.1: To gain a better understanding of the Black-Scholes equation and its implications, you can refer to the previous "Polymarket Market Making Bible" article. Link: https://x.com/MrRyanChi/status/2033466480067747844
*Note 5.2: Why is Itô calculus different from ordinary calculus? It is because in a stochastic process, the second-order Taylor term does not vanish. In ordinary calculus, as the time interval approaches zero, the second-order term can be ignored. However, in a stochastic process, due to the special property of Brownian motion, (dW)² = dt, the second-order term becomes first-order in magnitude and cannot be neglected. See details below.
Brownian Motion: Mathematical Expression of Pure Randomness
Brownian motion (also known as Wiener process, W_t) is a continuous-time random walk.
Imagine a drunkard walking in a square. Every step he takes, the direction is completely random. The meandering and irregular path he walks is Brownian motion. Mathematically, the price fluctuation of a stock is seen as the footsteps of this drunkard.
There are many examples of Brownian motion, such as in science, the movement of air particles is also a random Brownian motion.
Here is a key insight: in Brownian motion, the passage of time and the square of distance are equivalent (i.e., (dW)² = dt). It is precisely because of this property that stochastic calculus is different from ordinary calculus.
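The (dW)² = dt claim can be checked numerically: sum the squared increments of simulated Brownian paths and watch the total concentrate at T (a minimal sketch):

```python
import numpy as np

rng = np.random.default_rng(13)
T, n_steps, n_paths = 1.0, 10_000, 2_000
dt = T / n_steps

# Brownian increments: dW ~ Normal(0, sqrt(dt)).
dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))

# Quadratic variation: sum of (dW)^2 along each path. The claim (dW)^2 = dt
# means this random sum should concentrate tightly at T.
quad_var = (dW**2).sum(axis=1)
print(f"Mean quadratic variation: {quad_var.mean():.4f} (theory: {T})")
print(f"Std across paths:         {quad_var.std():.5f} (shrinks as dt -> 0)")
```

The sum of squared increments behaves like a deterministic quantity equal to elapsed time, which is why the second-order term survives in stochastic calculus.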

Itō's Lemma: Chain Rule of the Stochastic World
Stock prices are usually modeled using Geometric Brownian Motion (GBM):
dS_t = μS_t dt + σS_t dW_t
Translation: Price Change = Trend Caused by Expected Returns + Random Oscillation Caused by Volatility
And Itō's Lemma is the chain rule in the stochastic world.
* In regular calculus (e.g., calculating the trajectory of a car driving at a constant speed), you only need to consider velocity (first derivative).
* But in stochastic calculus (e.g., calculating the trajectory of a car driving on an extremely bumpy road), because the road itself is too bumpy (volatility), this bumpiness substantially changes the car's trajectory.
So, Itō's Lemma tells us: when computing stochastic changes, you can't just look at the direction (first-order term), you must also include the "degree of bumpiness" (second-order term) in the formula. If you don't, the price you calculate will be wrong.
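You can watch the second-order term matter in a simulation: discretize dS = μS dt + σS dW with no correction baked in, and the average log-return still comes out near μ − σ²/2, not μ (a sketch with illustrative parameters μ = 10%, σ = 40%):

```python
import numpy as np

rng = np.random.default_rng(17)
mu, sigma, T = 0.10, 0.40, 1.0
n_steps, n_paths = 252, 100_000
dt = T / n_steps

# Euler-discretized GBM, straight from the SDE: S_{t+dt} = S_t (1 + mu dt + sigma dW).
S = np.ones(n_paths)
for _ in range(n_steps):
    dW = rng.normal(0.0, np.sqrt(dt), n_paths)
    S *= 1 + mu * dt + sigma * dW

# Simple returns average near mu, but log-returns average near mu - sigma^2/2:
# the gap IS the Ito correction from the "bumpiness" term.
print(f"Mean simple return:  {S.mean() - 1:.4f}  (theory e^mu - 1 = {np.expm1(mu):.4f})")
print(f"Mean log-return:     {np.log(S).mean():.4f}  (theory mu - sigma^2/2 = {mu - 0.5 * sigma**2:.4f})")
```

Nothing in the simulation mentions the correction term, yet it emerges in the log-returns anyway: the volatility ("bumpiness") genuinely drags the compounded path below its drift.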

Black-Scholes and Risk-Neutral Pricing
When you apply Itō's Lemma to an option price and construct a hedging portfolio, a miracle happens.
In the derived Black-Scholes equation, the variable representing "expected up and down movement" actually cancels out in the formula, magically disappearing!

What does this mean? It means that the price of an option does not depend on your expectation of whether the stock will rise or fall in the future.
In other words, suppose you are buying a call option. You might think that the more expensive the option, the more bullish everyone is. Wrong! In a perfect mathematical model, the option's price is only related to one thing: how volatile the future of this stock will be. Whether it rises sharply or falls sharply is irrelevant.
When you truly grasp this concept for the first time, the feeling is extremely mind-blowing. It explains why an extremely bullish trader and an extremely bearish trader can readily trade at the same option price. Because what they are trading is not the direction but the volatility.
Greek Letters (The Greeks): Breaking Down the Dimensions of Risk
With the Black-Scholes pricing model, risk can be precisely broken down into several independent dimensions. These dimensions are named after Greek letters, hence called the Greeks:
* Delta (Δ) - Price Sensitivity: How much the option price changes when the underlying asset moves $1. It directly tells you how much of the underlying asset you need to buy to hedge the risk.
* Gamma (Γ) - Curvature: The rate of change of Delta. It tells you how frequently you need to adjust your hedge position. Gamma is highest when the probability of an event is close to 50%, indicating the highest risk.
* Theta (Θ) - Time Decay: How much value the option loses each day. You can think of it as the "rent" you pay for holding the option every day.
* Vega (V) - Volatility Sensitivity: How much the option price changes when the volatility changes by 1%. This is where most Wall Street derivative trading desks really make (or lose) money.
* Rho (ρ) - Interest Rate Sensitivity: The impact of an interest rate change on the price. This sensitivity is usually small and can be ignored.
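A compact sketch tying the Greeks back to Taylor expansion: price a call with the Black-Scholes formula, then recover Delta and Vega by bump-and-reprice finite differences (standard textbook inputs: S = K = 100, T = 1 year, r = 5%, σ = 20%):

```python
import numpy as np
from scipy.stats import norm

def bs_call(S, K, T, r, sigma):
    """Black-Scholes price of a European call option."""
    d1 = (np.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * np.sqrt(T))
    d2 = d1 - sigma * np.sqrt(T)
    return S * norm.cdf(d1) - K * np.exp(-r * T) * norm.cdf(d2)

S, K, T, r, sigma = 100.0, 100.0, 1.0, 0.05, 0.20
price = bs_call(S, K, T, r, sigma)

# Greeks as first-order Taylor sensitivities, via central finite differences.
h = 1e-4
delta = (bs_call(S + h, K, T, r, sigma) - bs_call(S - h, K, T, r, sigma)) / (2 * h)
vega = (bs_call(S, K, T, r, sigma + h) - bs_call(S, K, T, r, sigma - h)) / (2 * h)

print(f"Call price: {price:.4f}")   # ~10.45 for these inputs
print(f"Delta:      {delta:.4f}")   # ~0.64
print(f"Vega:       {vega:.2f}")    # ~37.5 per unit of vol (~0.375 per 1% point)
```

The bump-and-reprice approach is exactly the first-order Taylor term in action, and it generalizes to models with no closed-form Greeks.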

Chapter 5 Homework (Estimated Completion Time: 6-8 Weeks):
1. Reading Assignment: Read Shreve's "Stochastic Calculus for Finance II," which is widely recognized as the gold standard textbook in the field. (PDF Version: Link)
2. Alternative Textbook: If you find Shreve too challenging, you can consider reading Arguin's "A First Course in Stochastic Calculus," a more updated and reader-friendly book. (AMS Official Page: Link)
3. Derivation Exercise 1: Apply Itô's Lemma to f(S) = ln(S), where S follows Geometric Brownian Motion (GBM). Derive the key −σ²/2 correction term. (This correction term is crucial for understanding the relationship between log-returns and continuous compounding.)
4. Derivation Exercise 2: Starting from a Delta Hedging argument, fully derive the Black-Scholes partial differential equation.
5. Programming Exercise: Implement the Black-Scholes formula pricing from scratch, then price using Monte Carlo simulation, compare the two results, and validate that Monte Carlo converges to the analytical solution as the number of simulations increases.

Chapter 6: Polymarket and LMSR, the Mathematical Engine of Prediction Markets
Now, let's bring back all our mathematical weaponry to one of the most fascinating trading venues in the world today: Polymarket.
The math behind Polymarket beautifully ties together everything mentioned in this article: probability theory, information theory, convex optimization, and integer programming.
LMSR = Softmax of Neural Networks
In early prediction markets, Automated Market Makers (AMMs) typically use a mechanism called the Logarithmic Market Scoring Rule (LMSR). This was invented by economist Robin Hanson.
Its cost function is: C(q) = b · ln(Σ e^(q_i/b)) where:
* q_i is the outstanding shares of a particular outcome
* b is the liquidity parameter (larger b implies a "thicker" market, where prices are harder to move with large trades).
Based on this cost function, we can calculate the market price for any outcome i:
p_i = e^(q_i/b) / Σ_j e^(q_j/b)
If you have some machine learning background, upon seeing the price calculation formula for LMSR, you would immediately exclaim: Isn't this just the Softmax function!
What is Softmax? Imagine you have three apples weighing 100g, 50g, and 20g, respectively. You want to convert their weights into "percentage probabilities." Softmax is a "probability converter." It not only transforms these numbers into probabilities that add up to 100%, but it also amplifies the differences. A slightly heavier apple will receive a much larger share of the probability.

The formula driving prediction market pricing is mathematically equivalent to the formula driving predictions for any artificial intelligence (such as ChatGPT) predicting the next word. This is not a coincidence. The underlying logic of both is the same: elegantly transforming a set of messy numbers into a valid probability distribution.
This mechanism ensures several extremely elegant properties.
* The prices of all possible outcomes always add up to 1, perfectly following the laws of probability. Prices are always between 0 and 1.
* It can provide infinite liquidity (there is always someone to trade with you).
* The maximum potential loss for market makers is strictly limited to b × ln(n), where n = the number of possible outcomes.
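To make these properties concrete, here is a minimal numerical sketch of LMSR (my own illustration, not Polymarket production code). It computes the softmax prices, prices a trade as a difference of cost-function values, and checks the b · ln(n) loss bound:

```python
import math

def lmsr_cost(q, b):
    # C(q) = b * ln(sum_i e^(q_i / b))
    return b * math.log(sum(math.exp(qi / b) for qi in q))

def lmsr_prices(q, b):
    # p_i = e^(q_i/b) / sum_j e^(q_j/b) -- the softmax of q/b
    m = max(qi / b for qi in q)  # subtract the max for numerical stability
    exps = [math.exp(qi / b - m) for qi in q]
    total = sum(exps)
    return [e / total for e in exps]

b = 100.0
q = [30.0, 10.0, 0.0]          # outstanding shares of three outcomes
prices = lmsr_prices(q, b)
assert abs(sum(prices) - 1.0) < 1e-12   # prices form a valid distribution

# The cost of buying d shares of outcome 0 is C(q + d*e_0) - C(q)
d = 50.0
cost = lmsr_cost([q[0] + d, q[1], q[2]], b) - lmsr_cost(q, b)

# Worst-case market-maker loss is bounded by b * ln(n)
max_loss = b * math.log(len(q))
```

Note that `cost` comes out well below d dollars, because each share is priced below 1 along the way; the cost function, not a per-share quote, is the primitive object in LMSR.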
Polymarket's CLOB Mechanism: From Theory to Practice
It is worth noting that while LMSR is the classic theoretical foundation of prediction market AMMs, today's Polymarket has evolved to use the CLOB (Central Limit Order Book) mechanism.
For details, you can refer to my article from October last year: https://x.com/MrRyanChi/status/1977932511775760517
In CLOB mode, the price is no longer calculated by a fixed mathematical formula but is entirely generated through order book competition between buyers and sellers in the market (Bids and Asks). This is similar to traditional stock trading platforms or Binance's futures market.
Why is this important? Because in the CLOB mechanism, the role of market makers has undergone a revolutionary change.
Core Differences Between LMSR (Traditional AMM) and CLOB (Current Polymarket):
* Price Formation: LMSR is automatically calculated by a mathematical formula; CLOB is generated through an order book game between buyers and sellers.
* Liquidity Source: LMSR is automatically provided by a system liquidity pool; CLOB must be provided by market makers placing orders actively.
* Market Maker Role: Professional market makers are not required in LMSR; however, in CLOB, market makers are the lifeblood of the market.
* Spread Control: The buy/sell spread in LMSR is determined by system parameters; in CLOB, the spread is determined by intense competition among market makers.
* Hedging Demand: Under the CLOB model, market makers face high unilateral exposure risk and must engage in extremely complex cross-market hedging.
In simpler terms, in the LMSR model, an AMM automatically provides liquidity, and you only need to trade against the formula. However, in the CLOB model, liquidity is entirely provided by market makers. You need to calculate the fair probability yourself (using the previously mentioned Bayesian updates and statistical models), then place buy and sell orders based on this probability to earn the spread.
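As a rough illustration of that workflow (all names and numbers here are hypothetical, and real order semantics belong to Polymarket's API docs), a CLOB market maker quotes a bid and an ask around their own fair-probability estimate:

```python
def make_quotes(fair_prob, half_spread=0.01, tick=0.01):
    """Quote a bid and ask around your model's fair probability.

    fair_prob: your independent estimate of the event probability
    half_spread: compensation you demand for adverse-selection risk
    tick: prices are assumed to move in 0.01 increments
    """
    bid = max(tick, round((fair_prob - half_spread) / tick) * tick)
    ask = min(1 - tick, round((fair_prob + half_spread) / tick) * tick)
    return round(bid, 2), round(ask, 2)

# If your Bayesian model says the true probability is 0.62,
# you buy below it and sell above it, earning the spread.
bid, ask = make_quotes(0.62)   # -> (0.61, 0.63)
```

The entire edge lives in `fair_prob`: if your estimate is stale or wrong, the spread will not save you, which is exactly the point of the paragraph above.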
If you miscalculate a probability on Polymarket, or fail to hedge correlated risk correctly, smarter quant funds will instantly pick off your stale orders, and you will end up serving as their liquidity provider.
Chapter 7: The Mercenary's Professional Landscape and Toolbox
If you want to turn this system into your profession or build your own quant team, you need to understand the industry's ecosystem.
Four Core Roles
* Quantitative Researcher: Someone who searches for patterns in massive datasets and builds predictive models. They require exceptional math and statistics skills. In the context of Polymarket, they are responsible for constructing probability models to determine the "fair price" of a contract.
* Quantitative Developer: The individual who builds infrastructure. They need to be proficient in C++, Rust, or Python to create low-latency trading systems. In the Polymarket context, they are responsible for developing a trading engine that interfaces with APIs, ensuring orders can be submitted and executed within milliseconds.
* Quantitative Trader: Individuals who manage funds, control risks, and make real-time decisions. Their income variance is the highest. On Polymarket, they are the ones providing liquidity in multiple markets, adjusting spreads and positions in real time.
* Risk Quant: Guardians of the team. They are responsible for model validation, calculating Value at Risk (VaR) under extreme scenarios, and conducting stress tests.
Salary Levels at Top Institutions
* Top Companies (e.g., Jane Street, Citadel, HRT): Entry-level salaries for new hires range from $300K to over $500K per year; senior staff salaries range from $1M to over $3M per year; star traders can earn from $3M to over $30M.
* Mid to Upper-tier Companies (e.g., Two Sigma, DE Shaw): New hires earn between $250K and $350K per year; senior staff earn between $575K and $1.2M.
*Note: Jane Street's average employee compensation in the first half of 2025 reached as high as $1.4 million per year.
Recommended Reading List (In Learning Sequence)
* Probability and Statistics - Blitzstein & Hwang's "Introduction to Probability": Conditional probability, Bayes, distributions
* Advanced Statistics - Wasserman's "All of Statistics": Hypothesis testing, regression, MLE
* Linear Algebra - Strang's "Introduction to Linear Algebra": Matrices, eigenvalues, PCA
* Optimization Theory - Boyd & Vandenberghe's "Convex Optimization": Convex optimization theory and practice
* Stochastic Calculus - Shreve's "Stochastic Calculus for Finance II": Brownian motion, Ito's lemma, BS model
* Quantitative Finance - Hull's "Options, Futures, and Other Derivatives": a panoramic survey of derivatives pricing
* Real-World Strategy - Ernest Chan's "Quantitative Trading": a pitfall-avoidance guide from backtesting to live trading
Epilogue: Three Truths I Wish I'd Known Sooner
At the end of the article, the original author shared three profoundly insightful observations. These are also the pieces of advice I'd like to offer to all Polymarket traders.
1. Estimation Error Is Your Real Enemy
Many people like to use the full Kelly criterion, unconstrained Markowitz optimization, or machine learning models stuffed with hundreds of features. They all fail for the same reason: overfitting to noisy historical data.
Math is perfect under perfect parameters. But in reality, you never get perfect parameters. The gap between theory and practice is always the estimation error.
The sharpest players are not those using the most complex models, but those who maintain a healthy respect for estimation error. They deliberately reduce position size (half Kelly instead of full Kelly), simplify models (3 core features instead of 30), and add constraints.
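For instance, half Kelly on a binary contract can be sketched as follows (a toy illustration; `kelly_fraction` is my own helper, and the derivation assumes a contract that pays $1 on a win):

```python
def kelly_fraction(p, price):
    """Full-Kelly fraction of bankroll for buying a $1-payout binary contract.

    p: your estimated probability that the contract pays out
    price: the market price per share (between 0 and 1)
    Net odds are (1 - price) / price, so the Kelly formula
    f* = (p * odds - (1 - p)) / odds reduces to (p - price) / (1 - price).
    """
    return (p - price) / (1 - price)

full = kelly_fraction(p=0.60, price=0.50)   # 0.20: bet 20% of bankroll
half = 0.5 * full                           # 0.10: half Kelly
```

The point of halving is that `p` is itself an estimate: if your true edge is smaller than you think, full Kelly overbets catastrophically, while half Kelly gives up little growth in exchange for much lower variance.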

2. Tools Have Been Democratized, but "Conviction" Has Not
Today, anyone can download PyTorch for free. Anyone can access Polymarket's API. Technology is a necessary condition but no longer a sufficient one.
True trading edge exists in unique data, unique models, or unique execution capabilities. Not in having a few more Python libraries than others.
This is also why we delayed @insidersdotbot's updates by a whole month to prioritize improving our smart money vault and a better PNL calculation algorithm (e.g., a more accurate Split revenue calculation method than the official one). Because unique data and models can truly help you earn more money or turn losses into gains.
What does this mean on Polymarket?
This means you need to find information sources that others don't have (such as a network of experts in a niche field), or build models that others don't have (such as a pricing engine that can handle real-time multi-market correlations), or possess execution capabilities that others don't have (such as a trading system that can perform cross-market arbitrage in 10 milliseconds).
3. Mathematics is the True Moat
AI can help you write code, and can even suggest trading strategies. What it cannot hand you is the ability to derive why Itô's lemma has an extra term, to prove that the discounted price is a martingale under the risk-neutral measure, or to judge when a convex relaxation is tight in a portfolio problem.
This deep mathematical intuition is the fundamental watershed between a trader who creates an edge and a trader who borrows one. A borrowed edge expires sooner or later.
Prediction markets are undergoing the same transformation the options market went through in 1973. Those who first bring rigorous mathematical models, volatility pricing, and sophisticated arbitrage algorithms to this market will reap the greatest rewards.
Stop relying on intuition. Go learn probability, write code, and build your mathematical moat.

Complete Toolbox
Python Stack
Data Processing: pandas, polars (Polars is 10 to 50 times faster than pandas when handling large datasets)
Numerical Computing: numpy, scipy
Machine Learning (Tabular Data Direction): xgboost, lightgbm, catboost
Machine Learning (Deep Learning Direction): pytorch
Optimization: cvxpy
Derivatives Pricing: QuantLib (an industrial-grade library, written in C++ for strong performance)
Statistical Analysis: statsmodels
Backtesting Framework: NautilusTrader
Backtesting Frameworks (simpler, more beginner-friendly): backtrader, vectorbt
Quantitative Research Platform: Microsoft Qlib (with over 17,000 stars on GitHub, leaning towards AI)
Reinforcement Learning Trading: FinRL (with over 10,000 stars on GitHub)
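As a hypothetical taste of how the core of this stack fits together, here is a tiny vectorized moving-average backtest on synthetic data, using only pandas and numpy (a real workflow would use one of the frameworks above):

```python
import numpy as np
import pandas as pd

# Synthetic geometric random walk standing in for a price series
rng = np.random.default_rng(42)
prices = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.01, 500))))

fast = prices.rolling(10).mean()
slow = prices.rolling(50).mean()

# Long when the fast MA is above the slow MA.
# shift(1) trades on yesterday's signal, avoiding look-ahead bias.
position = (fast > slow).astype(float).shift(1).fillna(0.0)

returns = prices.pct_change().fillna(0.0)
strategy_returns = position * returns
sharpe = strategy_returns.mean() / strategy_returns.std() * np.sqrt(252)
```

Everything here is column-wise arithmetic on whole series, which is what "vectorized backtesting" means in practice; the same habit is why polars and numpy dominate the data layer above.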
C++ and Rust
Common C++ Libraries: QuantLib, Eigen, Boost
On the Rust side: RustQuant can be used for option pricing, NautilusTrader uses a Rust + Python hybrid architecture (Rust at the core for speed, Python API on top for research purposes).
Data Sources
Free: yfinance, Finnhub (60 requests per minute rate limit), Alpha Vantage
Moderately Priced: Polygon.io ($199 per month, latency below 20 milliseconds), Tiingo
Enterprise Level: Bloomberg Terminal (approximately $32,000 per year), Refinitiv, FactSet
Blockchain Data: Alchemy (offers a free plan, supports historical archival data access)
In addition, @insidersdotbot is about to open-source an API. It will include a ready-made Smart Money database and trading functionality. Feel free to click the little bell and stay tuned.
Solvers
Gurobi: The fastest commercial mixed-integer programming solver, students and academic users can apply for a free license. It is indispensable for combinatorial arbitrage problems.
Google OR-Tools: Among the strongest solvers in the free category.
PuLP / Pyomo: Python modeling interfaces used to easily define and invoke various solvers.
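As a minimal sketch of the kind of problem these solvers exist for, here is a Dutch-book arbitrage LP over mutually exclusive outcomes whose quoted prices sum to less than 1 (prices and budget are hypothetical; I use SciPy's `linprog` to keep it dependency-light, while PuLP or Gurobi would be the tools for the integer variants):

```python
import numpy as np
from scipy.optimize import linprog

prices = np.array([0.35, 0.30, 0.28])   # hypothetical quotes, sum = 0.93 < 1
budget = 100.0
n = len(prices)

# Variables: x_0..x_{n-1} (shares bought of each outcome), g (guaranteed profit).
# Maximize g  ->  linprog minimizes, so the objective is -g.
c = np.zeros(n + 1)
c[-1] = -1.0

# Budget constraint: sum_i p_i * x_i <= budget
A = [np.append(prices, 0.0)]
b = [budget]

# In every state k the winner pays $1/share: profit x_k - cost >= g,
# rewritten as  g - x_k + sum_i p_i x_i <= 0.
for k in range(n):
    row = np.append(prices.copy(), 1.0)
    row[k] -= 1.0
    A.append(row)
    b.append(0.0)

res = linprog(c, A_ub=np.array(A), b_ub=np.array(b),
              bounds=[(0, None)] * n + [(None, None)])
guaranteed = -res.fun   # risk-free profit locked in on a $100 budget
```

The optimum buys equal shares of every outcome, spending the whole budget, so `guaranteed` equals 100/0.93 - 100, about 7.53. Once you add lot sizes, fees, and cross-market legs, the x_i become integers and this turns into exactly the mixed-integer problems Gurobi and OR-Tools are built for.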
References
[1] gemchanger. (2025). How I'd Become a Quant If I Had to Start Over Tomorrow. X.
https://x.com/gemchange_ltd/status/2028904166895112617
[2] Blitzstein, J. K., & Hwang, J. (2014). Introduction to Probability. CRC Press.
https://projects.iq.harvard.edu/stat110
[3] Markowitz, H. (1952). Portfolio Selection. The Journal of Finance.
[4] Strang, G. MIT 18.06 Linear Algebra. MIT OpenCourseWare.
https://ocw.mit.edu/courses/18-06-linear-algebra-spring-2010/
[5] Boyd, S. & Vandenberghe, L. (2004). Convex Optimization. Cambridge University Press.
https://web.stanford.edu/~boyd/cvxbook/
[6] Hanson, R. (2003). Logarithmic Market Scoring Rules for Modular Combinatorial Information Aggregation.
[7] Polymarket Documentation. CLOB Overview & API.
https://docs.polymarket.com/trading/overview
[8] Black, F., & Scholes, M. (1973). The Pricing of Options and Corporate Liabilities. Journal of Political Economy.
[9] Shreve, S. (2004). Stochastic Calculus for Finance II: Continuous-Time Models. Springer.