Bitcoin Core source code analysis
Bitcoin architecture
Bitcoin Core code structure analysis
Directories
Bitcoin Core code repository: https://github.com/bitcoin/bitcoin
$ tree -L 1 -d # v0.21.0
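2.2.1 P2P network management:
addrdb: P2P network address database (peers.dat) and banned addresses (banlist.dat);
addrman: addresses held in memory, dumped asynchronously to peers.dat
Net: network node (peer) management
Net_processing: peer communication, broadcast notifications, state validation, etc.
Netaddress: network address objects
Netbase: base classes for network communication
Protocol: network communication protocol
Random: SSL random number seed
Timedata: P2P network time synchronization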
2.2.2 Asymmetric key management
Key: public/private key handling via secp256k1
Key_io: re-encoding after signing and encryption
Keystore: key manager
Pubkey: public key management
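2.2.3 Mining
Init: system initialization, including startup of each thread
Txmempool: transaction pool
merkleblock: builds the Merkle-tree form of a block body
miner: selects transactions from the transaction pool into a block being prepared for writing
Pow: proof-of-work algorithm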
2.2.4 Thread management
Scheduler: thread scheduler
Sync: deadlock handling
ThreadInterrupt: thread interruption
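2.2.5 Chain
Chain: the blockchain object, which maintains the state of the whole chain and its parameters;
Chainparams: tunable parameters of the blockchain object (collision-resistant hash function), covering the main chain, the public test chain, and the private chain
Chainparamsbase: base parameters of the blockchain object
Chainparamsseeds: DNS seed nodes of the P2P network, used to resolve and discover peers
Checkpoints: chain index checkpoints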
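2.2.6 Algorithms
Base58: encoder used to produce Bitcoin wallet addresses. Compared with Base64, Base58 omits the digit "0", uppercase "O", uppercase "I", lowercase "l", and the "+" and "/" symbols
Bech32: string encoder, a base32 variant; bech32 is a Bitcoin address format
Hash: SHA-256 hashing
Compressor: compressed encoding of output scripts
bloom: Bloom filter, an O(1) lookup algorithm for very large data sets, used to look up blockchain data;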
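To make the Base58 entry above concrete, here is a minimal sketch of a plain Base58 encoder. The name EncodeBase58Sketch is hypothetical, and the sketch leaves out the version byte and checksum that real addresses add; it only shows the alphabet and the base conversion.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// The 58-character alphabet: no '0', 'O', 'I', 'l', '+' or '/'.
static const char* const kBase58Alphabet =
    "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz";

// Encode raw bytes as a Base58 string (no checksum, no version byte).
std::string EncodeBase58Sketch(const std::vector<uint8_t>& input)
{
    // Each leading zero byte is encoded as one leading '1' character.
    size_t zeros = 0;
    while (zeros < input.size() && input[zeros] == 0) ++zeros;

    // Treat the remaining bytes as a big-endian number and convert it to base 58.
    std::vector<uint8_t> digits; // base-58 digits, least significant first
    for (size_t i = zeros; i < input.size(); ++i) {
        int carry = input[i];
        for (size_t j = 0; j < digits.size(); ++j) {
            carry += 256 * digits[j];
            digits[j] = static_cast<uint8_t>(carry % 58);
            carry /= 58;
        }
        while (carry > 0) {
            digits.push_back(static_cast<uint8_t>(carry % 58));
            carry /= 58;
        }
    }

    std::string out(zeros, '1');
    for (auto it = digits.rbegin(); it != digits.rend(); ++it) {
        out += kBase58Alphabet[*it];
    }
    return out;
}
```

For example, the single byte 0x61 encodes to "2g", and prepending zero bytes only prepends '1' characters.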
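2.2.7 Data types and type conversion
amount: maximum transaction amount, a system constant (CAmount MAX_MONEY = 21000000 * COIN)
Arith_uint256: 256-bit unsigned big integer
Uint256: 256-bit opaque binary object
amountarith_uint256: template base class for unsigned big integers
limitedmap: a size-limited (map-like) hash structure
Undo: serialization and deserialization
Utilmoneystr: converts amounts to strings
Utilstrencoding: strips unsafe characters from strings
Core_io: type conversion among hex, string, and hash; conversion between tx objects and strings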
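The amount entry above can be sketched in a few lines. This is an illustration modeled on the names in amount.h (CAmount, COIN, MAX_MONEY, MoneyRange), not a copy of that file; the point is that MAX_MONEY acts as an upper bound that every amount is checked against.

```cpp
#include <cstdint>

// Amounts are counted in satoshis.
using CAmount = int64_t;
constexpr CAmount COIN = 100000000;            // satoshis per bitcoin
constexpr CAmount MAX_MONEY = 21000000 * COIN; // upper bound used for sanity checks

// True if the value is a plausible amount: non-negative and not above MAX_MONEY.
constexpr bool MoneyRange(CAmount value)
{
    return value >= 0 && value <= MAX_MONEY;
}
```

Using a signed 64-bit integer leaves plenty of headroom: MAX_MONEY is 2.1e15 satoshis, far below the int64_t limit.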
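2.2.8 Process entry-point files
Bitcoind: service node daemon
Bitcoin-cli: command-line client (RPC client)
Bitcoin-tx: Bitcoin transaction utility
bitcoin-wallet: wallet entry point
bitcoind: application initialization entry point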
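2.2.9 Transactions and blocks
Blockencodings: filling transactions into blocks
Checkqueue: queue of items awaiting verification; the main thread enqueues data and multiple threads check it in parallel
Coins: Unspent Transaction Output entry, i.e. the coin entity of a Bitcoin transaction, since every Bitcoin transaction is funded from UTXOs
psbt: PSBT implementation;
core_read: in practice script parsing; checks whether all input and output scripts of a transaction contain valid opcodes
core_write
Validation: validates and accepts new blocks
Versionbits: blockchain version maintenance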
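2.2.10 System and network
Cuckoocache: in-memory cache based on cuckoo hashing
compat: compatibility across 32/64-bit and Windows/Linux
memusage: memory usage monitoring
net_permissions: network permission parameter control
Netmessagemaker
threadsafety: thread safety
torcontrol: control of Tor (the onion network), i.e. the anonymous communication network
banman: detects misbehaving peers, then bans or disconnects them so they cannot disrupt the network
Httprpc: HTTP wrapper for RPC, web communication
httpserver: communication server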
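2.2.11 Reimplementations and low-level utilities
serialize: serialization for file reads and writes
span: implements the functionality of C++20 std::span
optional: replacement for C++17 std::optional and std::make_optional
tinyformat: a type-safe printf replacement library
streams: file streams
reverse_iterator: reverse iteration
reverselock: RAII-style reverse lock; unlocks on construction and re-locks on destruction
prevector: C++ template class implementing a resizable array that can store N elements, used as a replacement for std::vector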
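The reverselock entry above describes a small RAII idiom that is easy to show in code. The class name ReverseLockSketch is hypothetical; this is a sketch of the idea on top of std::unique_lock, not the class in reverselock.h.

```cpp
#include <mutex>

// RAII "reverse lock": releases an already-held lock on construction
// and re-acquires it on destruction.
template <typename Lock>
class ReverseLockSketch
{
public:
    explicit ReverseLockSketch(Lock& lock) : m_lock(lock) { m_lock.unlock(); }
    ~ReverseLockSketch() { m_lock.lock(); }

    // Copying would unlock and re-lock twice, so forbid it.
    ReverseLockSketch(const ReverseLockSketch&) = delete;
    ReverseLockSketch& operator=(const ReverseLockSketch&) = delete;

private:
    Lock& m_lock;
};
```

A typical use is a worker that holds a std::unique_lock<std::mutex> while inspecting a queue, then opens a scope with a ReverseLockSketch around the slow part of the work so other threads can take the mutex in the meantime.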
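2.2.12 Interfaces
walletinitinterface: abstract wallet interface
Ui_interface: UI interface
validationinterface: block validation interface; subscribes to events produced by validation
Second level:
Dbwrapper: higher-level database interface
Fs: file operation interface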
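2.2.13 Other
outputtype: special output types, set automatically according to the address type;
rest:
Txdb: DBview
timedata: time data format
Utiltime: gets the time and converts time formats
Warnings: potential errors
blockfilter: probabilistic data structure for testing "membership"; serialized to match "cfilter" messages;
core_memusage
dbwrapper
dummywallet
flatfile: file management
indirectmap: a map keyed by pointers whose keys are compared by their dereferenced values;
Clientversion: client version verification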
User interfaces
User interfaces > P2P
- Bitcoin forms a TCP overlay network of nodes passing messages to one another
  - Messages defined in src/protocol.h
- Each node has a set of outbound and inbound peers they exchange data with
  - -addnode=<addr>, -maxconnections=<n>
  - net.h: MAX_OUTBOUND_CONNECTIONS, DEFAULT_MAX_PEER_CONNECTIONS
- Peers can be manually added (-addnode) or are discovered from DNS seeds: DNS servers that randomly resolve to known Bitcoin nodes
- DoS protection is implemented to prevent malicious peers from disrupting the network
  - -banscore=<n> configures sensitivity, defaults to 100
- SPV (simplified payment verification) nodes retrieve txout proofs
User interfaces > RPC/HTTP
A remote procedure call (RPC) interface allows users to programmatically interact with Bitcoin Core over HTTP
- Block explorers can query blockchain and mempool data
- External wallets can construct and sign transactions
- Miners and pool operators use getblocktemplate for block construction
bitcoin-cli provides a way to access this interface on the command line
User interfaces > Qt
The Qt interface reveals
- wallet functionality
- basic network statistics
- RPC console
User interfaces > ZMQ
The ZMQ interface publishes notifications over a socket upon receipt of a
- new block (raw): rawblock
- new block (hash): hashblock
- new transaction (raw): rawtx
- new transaction (hash): hashtx
which allows external software to perform some action on these events:
# From `contrib/zmq/zmq_sub.py`
See also: -blocknotify=<cmd_str %s>
Concurrency model
- Bitcoin Core performs a number of tasks simultaneously
- It has a model of concurrent execution to support this based on {std,boost}::thread, shared state, and a number of locks
- Most threads are started (directly or indirectly) in init.cpp
- P2P networking is enabled by a single select loop (CConnman::ThreadSocketHandler)
  - We may replace select with poll (#14336) to avoid file descriptor limits
However, all changes to chainstate are effectively single-threaded. Thanks, cs_main.
Concurrency model > threads
Purpose | # threads | Task run |
---|---|---|
Script verification | nproc or 16* | ThreadScriptCheck() |
Loading blocks | 1 | ThreadImport() |
Servicing RPC calls | 4* | ThreadHTTP() |
Load peer addresses from DNS seeds | 1 | ThreadDNSAddressSeed() |
Send and receive messages to and from peers | 1 | ThreadSocketHandler() |
Initializing network connections | 1 | ThreadOpenConnections() |
Opening added network connections | 1 | ThreadOpenAddedConnections() |
Process messages from net -> net_processing | 1 | ThreadMessageHandler() |
Tor control | 1 | TorControlThread() |
Wallet notify (-walletnotify) | 1 | user-specified |
txindex building | 1 | ThreadSync() |
Block notify (-blocknotify) | 1 | user-specified |
Upnp connectivity | 1 | ThreadMapPort() |
CScheduler service queue (powers ValidationInterface) | 1 | CScheduler::serviceQueue() |
* can be overridden
Concurrency model > ValidationInterface
Allows the asynchronous decoupling of chainstate events from various responses.
Uses SingleThreadedSchedulerClient to queue responses and execute them out-of-band w.r.t. things like network communications, though still often blocked on lock acquisition, e.g. cs_main.
Used (subclassed) for many things:
- Index building (src/index/base.h:BaseIndex)
- Triggering announcements to peers (net_processing:PeerLogicValidation)
- Triggering wallet updates (wallet/wallet.h:CWallet)
- Sending ZMQ publications (CZMQNotificationInterface)
Events you can make use of:
- UpdatedBlockTip
- TransactionAddedToMempool
- TransactionRemovedFromMempool
- BlockConnected
- BlockDisconnected
- ChainStateFlushed
- ResendWalletTransactions
- BlockChecked
- NewPoWValidBlock
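The decoupling described above is a publish/subscribe pattern with a queue in the middle. The sketch below is single-threaded and uses hypothetical names (ValidationSubscriberSketch, SignalQueueSketch, TipRecorderSketch); it only shows that raising an event enqueues callbacks, and that they run later when the queue is drained, which in Bitcoin Core happens on the scheduler thread.

```cpp
#include <functional>
#include <queue>
#include <string>
#include <vector>

// Hypothetical subscriber interface with a single event.
class ValidationSubscriberSketch
{
public:
    virtual ~ValidationSubscriberSketch() = default;
    virtual void UpdatedBlockTip(const std::string& block_hash) = 0;
};

// Queues callbacks instead of running them inline, so the code that raised
// the event does not wait for subscribers.
class SignalQueueSketch
{
public:
    void Register(ValidationSubscriberSketch* sub) { m_subscribers.push_back(sub); }

    // Called by the event producer: only enqueues work.
    void UpdatedBlockTip(const std::string& block_hash)
    {
        for (ValidationSubscriberSketch* sub : m_subscribers) {
            m_pending.push([sub, block_hash] { sub->UpdatedBlockTip(block_hash); });
        }
    }

    // Called later: runs the queued callbacks in the order they were raised.
    void Drain()
    {
        while (!m_pending.empty()) {
            m_pending.front()();
            m_pending.pop();
        }
    }

private:
    std::vector<ValidationSubscriberSketch*> m_subscribers;
    std::queue<std::function<void()>> m_pending;
};

// Example subscriber that records the tips it was told about.
class TipRecorderSketch : public ValidationSubscriberSketch
{
public:
    void UpdatedBlockTip(const std::string& block_hash) override { tips.push_back(block_hash); }
    std::vector<std::string> tips;
};
```

Because callbacks run in order from one queue, a subscriber sees events in the order they were raised, but possibly some time after the state change that caused them.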
Regions
“Regions” of code (in my terms) consist of state and procedures necessary for Bitcoin’s operation.
Each region is a subsystem within Bitcoin Core that handles a certain domain of tasks at a certain layer of abstraction.
Starting here will give us a high-level but specific sense of which parts of the system do which tasks.
Regions summary
Name | Purpose |
---|---|
net | Handles socket networking, tracking of peers |
net_processing | Routes P2P messages into validation calls and response P2P messages |
validation | Defines how we update our validated state (chain, mempool) |
txmempool | Mempool data structures |
coins & txdb | Interface for in-memory view of the UTXO set |
script/ | Script execution and caching |
consensus/ | Consensus params, Merkle roots, some TX validation |
policy/ | Fee estimation, replace-by-fee |
indexes/ | Peripheral index building (e.g. txindex) |
wallet/ | Wallet db, coin selection, fee bumping |
rpc/ | Defines the RPC interface |
Regions > net.{h,cpp}
net is the “bottom” of the Bitcoin Core stack. It handles network communication with the P2P network.
It contains addresses and statistics (CNodeStats) for peers (CNodes) that the running node is aware of.
CConnman is the main class in this region. It manages socket connections (and network interaction more generally) for each peer, and forwards messages to the net_processing region (via CConnman::ThreadMessageHandler).
The globally-accessible CConnman instance is called g_connman.
Regions > net_processing.{h,cpp}
net_processing adapts the network layer to the chainstate validation layer. It translates network messages into calls for local state changes.
“Validation”-specific (i.e. information relating to chainstate) data is maintained per-node using CNodeState instances.
Much of this region is ProcessMessage(): a giant conditional for rendering particular network message types to calls deeper into Bitcoin, e.g.
- NetMsgType::BLOCK -> validation:ProcessNewBlock()
- NetMsgType::HEADERS -> validation:ProcessNewBlockHeaders()
- …
Peers are also penalized here based on the network messages they send (see Misbehaving and its usages).
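The shape of that “giant conditional” can be sketched as follows. Everything here is a hypothetical stand-in (ValidationSketch, PeerStateSketch, ProcessMessageSketch, the payload_ok flag and the penalty of 10); only the command strings "block" and "headers" correspond to real message types. The real ProcessMessage() deserializes payloads and handles many more commands.

```cpp
#include <string>

// Hypothetical stand-in for the validation layer; it only counts calls.
struct ValidationSketch {
    int blocks_processed = 0;
    int header_batches_processed = 0;
    void ProcessNewBlock() { ++blocks_processed; }
    void ProcessNewBlockHeaders() { ++header_batches_processed; }
};

// Per-peer bookkeeping, loosely analogous to CNodeState.
struct PeerStateSketch {
    int misbehavior_score = 0;
};

// Route one message by its command string.
// Returns true if the message was handed to the validation layer.
bool ProcessMessageSketch(const std::string& command, bool payload_ok,
                          ValidationSketch& validation, PeerStateSketch& peer)
{
    if (!payload_ok) {
        // Illustrative penalty only; real scores depend on the offense.
        peer.misbehavior_score += 10;
        return false;
    }
    if (command == "block") {
        validation.ProcessNewBlock();
        return true;
    }
    if (command == "headers") {
        validation.ProcessNewBlockHeaders();
        return true;
    }
    return false; // unknown command: ignored in this sketch
}
```

The design point is that the network layer never touches chainstate directly: it only decides which validation entry point a message maps to and keeps a per-peer score on the side.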
Regions > validation.{h,cpp}
validation handles modifying in-memory data structures for chainstate and transactions (mempool) on the basis of certain acceptance rules.
It both defines some of these data structures (CChainState, mapBlockIndex) as well as procedures for validating them, e.g. CheckBlock().
Oddly, it also contains some utility functions for marshalling data to and from disk, e.g. ReadBlockFromDisk(), FlushStateToDisk(), {Dump,Load}Mempool(). This is probably because validation.{h,cpp} is the result of refactoring main.{h,cpp} into smaller pieces.
It contains the instantiation of the infamous cs_main lock.
Regions > txmempool.{h,cpp}
txmempool provides a definition for the in-memory data structure that manages the set of transactions this node has seen, CTxMemPool.
This data structure provides a helpful view of transactions sorted in various ways (e.g. by fee rate, txid, entry time, fee rate with ancestors). See CTxMemPool::indexed_transaction_set.
The mempool has a bounded size, so eviction logic is defined here too.
An index of the UTXO set which includes unconfirmed mempool transactions is also defined here (CCoinsViewMemPool).
This region is used in validation, src/policy (fee estimation), miner, and others.
Regions > coins.{h,cpp} & txdb.{h,cpp}
Basically, they just provide this interface:
/** Abstract view on the open txout dataset. */
Regions > dbwrapper.{h,cpp}
Abstracts access to leveldb databases.
Adds ability to obfuscate data - we do this to avoid spurious anti-virus detection and illegal data in the chainstate getting people in trouble.
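One simple way to obfuscate stored values is to XOR them with a repeating key, which is the approach sketched below. The function name XorWithKeySketch and the key used in the example are hypothetical, and how the real wrapper generates and stores its key is not shown here.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// XOR every byte of `data` with a repeating key.
// Applying the same key twice restores the original bytes.
void XorWithKeySketch(std::vector<uint8_t>& data, const std::vector<uint8_t>& key)
{
    if (key.empty()) return; // nothing to do without a key
    for (size_t i = 0; i < data.size(); ++i) {
        data[i] ^= key[i % key.size()];
    }
}
```

This is not encryption: the goal is only that the bytes on disk do not match any recognizable pattern, so the transformation must be cheap and exactly reversible on read.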
Regions > script/
The script subtree contains procedures for defining and executing Bitcoin scripts, as well as signing transactions (script/sign.*).
It also maintains data structures which cache script execution and signature verification (script/sigcache.*).
Script evaluation happens in script/interpreter.cpp::EvalScript().
Regions > consensus/
Contains procedures for obviously consensus-critical actions like computing Merkle trees and checking transaction validity.
Contains chain validation parameters, e.g. Params::BIP66Height, nMinimumChainWork.
Defines BIP9 deployment description struct (consensus/params.h:BIP9Deployment).
CValidationState is defined in consensus/validation.h and is used broadly (but mostly in validation) as a small piece of state that tracks validity and likelihood of DoS when examining blocks.
Regions > policy/
Policy contains logic for making various assessments about transactions (does this tx signal replace-by-fee?).
It contains logic for doing fee estimation (policy/fees.*).
Regions > interfaces/
Defines interfaces for interacting with the major subsystems in Bitcoin: node, wallet, GUI (eventually).
This is part of an ongoing effort by Russ Yanofsky to decompose the different parts of Bitcoin into separate systems that communicate using more formalized messages.
Eventually, we might be able to break Bitcoin Core into a few smaller repositories which can be maintained at different cadences.
Regions > wallet/
Contains
- logic for marshalling wallet data to and from disk via BerkeleyDB.
- utilities for fee-bumping transactions.
- coin selection.
- RPC interface for the wallet.
- bookkeeping for wallet owners (CWalletTx, address book).
Regions > qt/
Contains all the code for doing the graphical user interface.
qt/bitcoin.cpp:main() is an alternate entrypoint for starting Bitcoin.
Regions > rpc/
Defines RPC interface and provides related utilities (UniValue mangling).
Example
static UniValue getconnectioncount(const JSONRPCRequest& request)
Then referenced in Register.*RPCCommands() later.
Regions > miner.{h,cpp}
Includes utilities for generating blocks to be mined (e.g. BlockAssembler). Used in conjunction with rpc/mining.cpp by miners:
- getblocktemplate
- submitblock
Regions > zmq/
Registers events with ValidationInterface to forward on notifications about new blocks and transactions to ZMQ sockets.
Storage
$ tree ~/.bitcoin/regtest/
├── banlist.dat
Storage > .dat files
Bitcoin stores some data in .dat files, which are just the raw bytes of some serialized data structure.
See serialize.h for details on how serialization often works.
A brief digression into serialization
class CBlockHeader
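As a rough picture of what “raw bytes of a serialized data structure” means for a block header, the sketch below writes the six header fields in order, with integers in little-endian byte order, giving 80 bytes. HeaderSketch, WriteLE32 and SerializeHeaderSketch are hypothetical names; the real code goes through the generic machinery in serialize.h rather than a hand-written function like this.

```cpp
#include <array>
#include <cstdint>
#include <vector>

// Hypothetical header type with the six fields of a block header.
struct HeaderSketch {
    int32_t version = 0;
    std::array<uint8_t, 32> prev_block{};
    std::array<uint8_t, 32> merkle_root{};
    uint32_t time = 0;
    uint32_t bits = 0;
    uint32_t nonce = 0;
};

// Append a 32-bit integer in little-endian byte order.
void WriteLE32(std::vector<uint8_t>& out, uint32_t value)
{
    for (int shift = 0; shift < 32; shift += 8) {
        out.push_back(static_cast<uint8_t>((value >> shift) & 0xff));
    }
}

// Write the fields one after another, in declaration order.
std::vector<uint8_t> SerializeHeaderSketch(const HeaderSketch& h)
{
    std::vector<uint8_t> out;
    out.reserve(80);
    WriteLE32(out, static_cast<uint32_t>(h.version));
    out.insert(out.end(), h.prev_block.begin(), h.prev_block.end());
    out.insert(out.end(), h.merkle_root.begin(), h.merkle_root.end());
    WriteLE32(out, h.time);
    WriteLE32(out, h.bits);
    WriteLE32(out, h.nonce);
    return out;
}
```

There are no field names or delimiters in the output: a reader must know the layout in advance, which is why the same declaration drives both writing and reading in the real serialization code.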
- blocks/blk?????.dat: serialized block data
  - validation.cpp:WriteBlockToDisk()
  - src/primitives/block.h:CBlock::SerializationOp()
- blocks/rev?????.dat: “undo” data (UTXOs added and removed by a block)
  - validation.cpp:UndoWriteToDisk()
  - src/undo.h:CTxUndo
- mempool.dat: serialized list of mempool contents
  - src/txmempool.cpp:CTxMemPool::infoAll()
  - Dumped in src/init.cpp:Shutdown()
- peers.dat: serialized peers
  - src/addrman.h:CAddrMan::Serialize()
- banlist.dat: banned node IPs/subnets
  - See src/addrdb.cpp for serialization details
Storage > leveldb
Leveldb is a fast, sorted key value store used for a few things in Bitcoin.
It allows bulk writes and snapshots.
It is bundled with the source tree in src/leveldb/ and maintained in bitcoin-core/leveldb.
- blocks/index: the complete tree of valid(ish) blocks the node has seen
  - Serializes mapBlockIndex, or a list of CBlockIndexes
  - CBlockIndex is a block header plus some important metadata, for example validation status of the block (nStatus) and position on disk (nFile/nDataPos)
- chainstate/: holds UTXO set
  - COutPoint -> CCoinsCacheEntry
    - Basically (txid, index) -> Coin (i.e. [CTxOut, is_coinbase, height])
  - CCoinsViewCache::BatchWrite()
Storage > berkeleydb
BerkeleyDB is basically like leveldb but worse.
We still use it for the wallet.
Some want to replace it with SQLite.
- Wallet
  - wallets/wallet.dat: BerkeleyDB wallet file
  - src/wallet/db.cpp
Data structures
The storage formats we just covered are deserialized into in-memory data structures during runtime.
Here are a few of the important ones.
Data structures > chainstate > blocks
src/primitives/block.h:CBlockHeader
The block header attributes you know and love: nVersion, hashPrevBlock, hashMerkleRoot, nTime, nBits, nNonce.
src/primitives/block.h:CBlock
It’s CBlockHeader, but with transactions attached.
src/primitives/block.h:CBlockLocator
A list of <32 block hashes starting with a given block and going back through a chain in sort-of logarithmic distribution.
Used to quickly find the divergence point between two tips.
Construction logic lives in src/chain.cpp:CChain::GetLocator().
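The “sort-of logarithmic distribution” of a locator can be illustrated by listing which heights get sampled. The sketch below works on heights rather than block hashes, and the function name LocatorHeightsSketch is hypothetical. It assumes the first entries step back one block at a time and the step doubles after roughly ten entries, which is my reading of the construction logic rather than a copy of it.

```cpp
#include <algorithm>
#include <vector>

// Heights sampled for a locator built from a tip at `tip_height`:
// recent blocks densely, then with exponentially growing gaps,
// always ending at the genesis block (height 0).
std::vector<int> LocatorHeightsSketch(int tip_height)
{
    std::vector<int> heights;
    int step = 1;
    int height = tip_height;
    while (height >= 0) {
        heights.push_back(height);
        if (height == 0) break;
        height = std::max(height - step, 0);
        // Once more than ten entries are collected, double the gap each time.
        if (heights.size() > 10) step *= 2;
    }
    return heights;
}
```

For a tip at height 100 this yields 18 entries: 100 down to 90 one block at a time, then 89, 87, 83, 75, 59, 27, and finally 0. Two nodes comparing such lists can find a common ancestor quickly even when their chains diverged far in the past.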
src/chain.h:CBlockIndex
A block header plus some important metadata, for example validation status of the block (nStatus) and position on disk (nFile/nDataPos).
The entire blockchain (and orphaned, invalid parts of the tree) is stored this way.
Data structures > chainstate > CChainState
Defined in validation. Contains logic and storage for maintaining the activeChain, the most-work valid chain.
mapBlockIndex is an attribute.
Makes liberal use of cs_main.
ActivateBestChain{Step}() contains logic for incorporating new blocks and setting the most-work chain.
AddToBlockIndex() incorporates all block headers encountered.
Exploring the codebase
- Ensure your editor is set up with goto-definition and find-usages functionality
  - rtags is very helpful; hooks into LLVM to generate symbol information
    - good rtags setup guide by @eklitzke here
- Bitcoin’s doxygen output
- Read doc/developer-notes.md
https://jameso.be/dev++2018/#1
http://diyhpl.us/wiki/transcripts/scalingbitcoin/tokyo-2018/edgedevplusplus/overview-bitcoin-core-architecture/
https://juejin.cn/post/6844903598522892302