PeerTree: A Peer-To-Peer Cloud Storage And Information Retrieval System.
Peter@bitmonky.com
shardnet.bitmonky.com
Abstract
I am proposing a cloud storage system where “The Cloud” is not owned by any entity. Similar to Bitcoin, the cloud is made up of trustless nodes in a peer-to-peer network. Unlike Bitcoin, the cloud is not a blockchain where everyone keeps a copy of a ledger. Instead, information will be ripped into shards and stored randomly on nodes across the network. To retrieve a file you need only store the hash of the file and a sequential list of hashes, one for each shard of the file. To read the file from the network you broadcast a request for each shard using its hash. Nodes on the network that have a saved copy of the shard will respond to your request and send you the data. To verify that the data is correct, the requester only has to hash the data retrieved and compare it to the hash used in the original request.
Introduction
Cloud computing on the Internet has come to rely almost exclusively on large institutions serving as trusted third parties. While the system works very well from a technical perspective, it still suffers from the inherent weakness of all large entities: they become monopolies. Everyone understands what that implies, so I will not digress. What is needed is a distributed cloud architecture based on cryptographic proof instead of trust, allowing data to be stored randomly across many nodes that are not under the control of any individual. Shards of data can be encrypted to avoid leakage of information, and the contents of an entire file can only be assembled by the file owner(s). Sending a file becomes as simple as sending the hash of the file along with the sequential list of shard hashes.
The Network
A self-organizing peer-to-peer network where peers can send, receive, and broadcast JSON messages over HTTPS using only self-signed certificates.
Messages are digitally signed by each node using an elliptic curve (EC) private/public key pair.
The peers form a tree structure where new nodes are added left to right. The first node is the root of the tree. Each node keeps a list of the root peer group and its own peer group.
Nodes that leave or time out are replaced by the last node to join. Messages that cannot be sent are pushed onto a queue and are delivered as soon as the connection returns or the node is replaced.
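The join and replace rules above can be sketched with an array-backed tree. This is only an illustration, not the code in the prototype: the class name PeerTreeSketch and the branching factor are hypothetical, since the text does not say how many children each node has.

```python
from typing import List, Optional


class PeerTreeSketch:
    """Array-backed tree: index 0 is the root and levels fill left to right."""

    def __init__(self, branching: int = 3) -> None:
        self.branching = branching  # hypothetical fan-out, not given in the text
        self.nodes: List[str] = []  # node ids in slot order

    def join(self, node_id: str) -> int:
        """Place a new node in the next free slot and return its position."""
        self.nodes.append(node_id)
        return len(self.nodes) - 1

    def parent(self, index: int) -> Optional[str]:
        """Return the id of the parent of the node at index (None for the root)."""
        if index == 0:
            return None
        return self.nodes[(index - 1) // self.branching]

    def children(self, index: int) -> List[str]:
        """Return the ids of the children of the node at index."""
        first = index * self.branching + 1
        return self.nodes[first:first + self.branching]

    def leave(self, node_id: str) -> None:
        """Fill the slot of a departed node with the last node to join."""
        index = self.nodes.index(node_id)
        last = self.nodes.pop()
        if index < len(self.nodes):
            self.nodes[index] = last


tree = PeerTreeSketch(branching=2)
for name in ["a", "b", "c", "d", "e"]:
    tree.join(name)
tree.leave("b")
print(tree.nodes)  # ['a', 'e', 'c', 'd']
```

Moving the last node into the vacated slot keeps the tree compact, so every remaining node can still compute its parent and children from its position alone.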
I have a working prototype for the network at
https://github.com/bitmonky/PeerTree
The project also includes a proof-of-concept blockchain application that runs on top of the PeerTree network object.
No work has been started on the shardnet application.
File Storage
To store a file in the cloud one needs to be a member node in the network.
- Encrypt the file (optional), hash the file, and store the hash.
- Rip the file into shards, hashing each shard.
- Your node will have a database of all other nodes in the network. Randomly select n nodes from the database, where n is the redundancy you require for each shard.
- Broadcast the shard to those nodes and listen for a response from each node. If any requests are denied or time out, send fq more requests to other randomly selected nodes, where fq is the number of failed requests, so that n copies of the shard are still stored.
- Repeat until all shards are successfully stored, then save your file's hash and the shard hash list locally, for example: {fileHash: string, filename: string, shards: [{sequence: number, hash: string}]}
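The storage steps above can be sketched as follows. This is a minimal illustration under several assumptions: SHA-256 as the hash function (the text does not name one), a tiny hypothetical SHARD_SIZE, a "shards" key for the hash list, and a caller-supplied send function standing in for the real network request.

```python
import hashlib
import random
from typing import Callable, Dict, List

SHARD_SIZE = 4  # bytes; hypothetical and tiny, for illustration only


def sha256_hex(data: bytes) -> str:
    """Hash helper; SHA-256 is an assumption, the text does not name a hash."""
    return hashlib.sha256(data).hexdigest()


def make_shards(data: bytes, shard_size: int = SHARD_SIZE) -> List[bytes]:
    """Rip the (optionally encrypted) file bytes into fixed-size shards."""
    return [data[i:i + shard_size] for i in range(0, len(data), shard_size)]


def build_manifest(filename: str, data: bytes, shards: List[bytes]) -> Dict:
    """Build the record that is saved locally once every shard is stored."""
    return {
        "fileHash": sha256_hex(data),
        "filename": filename,
        "shards": [  # the key name "shards" is an assumption
            {"sequence": i, "hash": sha256_hex(s)} for i, s in enumerate(shards)
        ],
    }


def store_shard(shard: bytes, peers: List[str], n: int,
                send: Callable[[str, bytes], bool]) -> List[str]:
    """Ask random peers to store the shard until n accept or no peers remain.

    send(peer, shard) is a hypothetical network call that returns True when
    the peer stored the shard and False when it refused or timed out.
    """
    untried = list(peers)
    random.shuffle(untried)
    holders: List[str] = []
    while len(holders) < n and untried:
        # The first round asks n peers; each later round asks one
        # replacement peer per failed request.
        needed = n - len(holders)
        batch, untried = untried[:needed], untried[needed:]
        holders.extend(p for p in batch if send(p, shard))
    return holders
```

A caller would run store_shard once per shard and write the manifest to disk only after every call returned n holders.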
File Retrieval
- Open the file you saved locally and read the fileHash and the list of shards.
- Iterate through the list and broadcast a request for each shard hash to the entire network.
- Nodes that have the required shard stored will send you a notification that they have the data.
- Pick the first responder and ask for the data directly from that peer.
- Check each responder's data by hashing the response data and matching it to your hash for that shard. If the hashes do not match, ask the next responder.
- Once all shards have been retrieved, stitch them together into a single file. Do a hash sum check on the entire file against the stored fileHash.
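The retrieval steps above can be sketched as follows. This is an illustration only: SHA-256 and the manifest key "shards" are assumptions, and find_responders and fetch are hypothetical callables standing in for the broadcast and the direct peer request.

```python
import hashlib
from typing import Callable, Dict, Iterable


def sha256_hex(data: bytes) -> str:
    """Hash helper; SHA-256 is an assumption, the text does not name a hash."""
    return hashlib.sha256(data).hexdigest()


def fetch_verified_shard(shard_hash: str, responders: Iterable[str],
                         fetch: Callable[[str, str], bytes]) -> bytes:
    """Try responders in order; return the first data whose hash matches."""
    for peer in responders:
        data = fetch(peer, shard_hash)  # hypothetical direct request to a peer
        if sha256_hex(data) == shard_hash:
            return data
    raise ValueError(f"no responder returned valid data for shard {shard_hash}")


def reassemble(manifest: Dict,
               find_responders: Callable[[str], Iterable[str]],
               fetch: Callable[[str, str], bytes]) -> bytes:
    """Fetch every shard in sequence order, stitch, and check the file hash."""
    parts = []
    for entry in sorted(manifest["shards"], key=lambda e: e["sequence"]):
        responders = find_responders(entry["hash"])  # hypothetical broadcast
        parts.append(fetch_verified_shard(entry["hash"], responders, fetch))
    data = b"".join(parts)
    if sha256_hex(data) != manifest["fileHash"]:
        raise ValueError("reassembled file does not match fileHash")
    return data
```

Because every shard is checked against its own hash, a peer that returns wrong data is simply skipped, which is what lets the requester use untrusted nodes.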
Distributed Search
ShardNet can be applied to searching large volumes of documents in the following way. First, a relational database is used to build a metadata index. The index stores information about each document's location, for example {url} for a web page document, or {key field: recID, table: table name} for relational data. To prepare this information for search storage on the network, the following is required.
- Create and store a checksum hash for the data package (we will call this a memory item).
- Prepare the data as a JSON string that includes the checksum hash.
- Broadcast the memory item to a random selection of n peers on the network.
- Listen for success responses from all the peers (same as the file store method above).
- Save the memory checksum in your relational database.
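Preparing a memory item could look like the sketch below. It assumes SHA-256 for the checksum and uses hypothetical field names "location" and "text" for the payload; the text only specifies that the JSON string carries the checksum hash.

```python
import hashlib
import json
from typing import Dict


def _canonical(payload: Dict) -> bytes:
    """Serialize with sorted keys so the same payload always hashes the same."""
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")


def make_memory_item(location: Dict, text: str) -> str:
    """Return a JSON string holding the payload and its checksum hash."""
    payload = {"location": location, "text": text}  # hypothetical field names
    checksum = hashlib.sha256(_canonical(payload)).hexdigest()
    return json.dumps({"hash": checksum, **payload}, sort_keys=True)


def verify_memory_item(message: str) -> bool:
    """Recompute the checksum on the receiving side and compare."""
    item = json.loads(message)
    checksum = item.pop("hash")
    return hashlib.sha256(_canonical(item)).hexdigest() == checksum


message = make_memory_item({"url": "https://example.com/page"}, "peer to peer storage")
print(verify_memory_item(message))  # True
```

The checksum returned inside the message is the value the sender would save in its own relational database.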
Memory Storage Method On Peer Nodes
Peers will parse the JSON and create a relational table to store all of the words in each memory package: {hash: (the memory hash), wordList: [{word: (word from the metadata provided)}]}
Store the list in the node's relational database and respond with success or fail (peers can refuse to store a memory if they are full, etc.).
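On the peer side, storing a memory could be sketched with SQLite as below. The table name memory_words, the "text" field, the word-splitting rule, and the max_rows capacity limit are all assumptions made for illustration.

```python
import json
import re
import sqlite3


def open_store() -> sqlite3.Connection:
    """Create an in-memory database with one row per (memory, word) pair."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE memory_words ("
        " memory_hash TEXT NOT NULL,"
        " word TEXT NOT NULL,"
        " PRIMARY KEY (memory_hash, word))"
    )
    return db


def store_memory(db: sqlite3.Connection, message: str,
                 max_rows: int = 100_000) -> bool:
    """Store the word list of one memory; return False to refuse when full."""
    item = json.loads(message)
    # Lowercase and split on non-alphanumerics (an assumed tokenization rule).
    words = set(re.findall(r"[a-z0-9]+", item["text"].lower()))
    (count,) = db.execute("SELECT COUNT(*) FROM memory_words").fetchone()
    if count + len(words) > max_rows:
        return False  # the peer is full, so it refuses the memory
    with db:  # commits on success, rolls back on error
        db.executemany(
            "INSERT OR IGNORE INTO memory_words VALUES (?, ?)",
            [(item["hash"], w) for w in words],
        )
    return True
```

The boolean result maps directly onto the success or fail response the peer sends back to the broadcasting node.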
Executing A Search
- Prepare a request using the search string: {searchID: unique, searchStr: string}
- Broadcast the search to all peers.
- Each node on the network will do a relational search, grouping by memoryID and word and selecting (memoryID, nWordsMatching/nWordsTotal). It then sends a response that consists only of memories with a high score, or sends no response if the list is empty.
- The node that made the search request will get back responses only from peers that found relevant (high-scoring) results.
- Use this short list to retrieve your data from your relational database, sorted by score in descending order.
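The scoring query on each peer and the merge on the requesting node can be sketched as below. This assumes nWordsTotal means the number of distinct words in the search string, a hypothetical min_score threshold for "high score", and a memory_words table with one row per (memory_hash, word) pair.

```python
import re
import sqlite3
from typing import Dict, Iterable, List, Tuple

Result = Tuple[str, float]  # (memory hash, score)


def search(db: sqlite3.Connection, search_str: str,
           min_score: float = 0.5) -> List[Result]:
    """Score each memory as matching words / total search words."""
    words = sorted(set(re.findall(r"[a-z0-9]+", search_str.lower())))
    if not words:
        return []
    placeholders = ",".join("?" for _ in words)
    sql = (
        "SELECT memory_hash, COUNT(DISTINCT word) * 1.0 / ? AS score "
        f"FROM memory_words WHERE word IN ({placeholders}) "
        "GROUP BY memory_hash "
        "HAVING COUNT(DISTINCT word) * 1.0 / ? >= ? "
        "ORDER BY score DESC"
    )
    params = [len(words), *words, len(words), min_score]
    return db.execute(sql, params).fetchall()


def merge_responses(responses: Iterable[List[Result]]) -> List[Result]:
    """Combine peer responses, keeping the best score per memory, highest first."""
    best: Dict[str, float] = {}
    for response in responses:
        for memory_hash, score in response:
            best[memory_hash] = max(score, best.get(memory_hash, 0.0))
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE memory_words (memory_hash TEXT, word TEXT)")
db.executemany(
    "INSERT INTO memory_words VALUES (?, ?)",
    [("m1", "peer"), ("m1", "storage"), ("m1", "cloud"),
     ("m2", "peer"), ("m2", "banana")],
)
print(search(db, "peer storage"))  # [('m1', 1.0), ('m2', 0.5)]
```

A peer whose search returns an empty list would send no response at all, so the requester only merges the lists it actually receives.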