The language of the network

Another quick post, I hope this makes sense, it is perhaps the most exiting aspect of MaidSafe I have found so far and the consequences of this are far reaching indeed. It will not be obvious, but please ask questions on maidsafe.org and I will try and explain more of why this is very important, not only to launch with but to move forward and also model system components very quickly.

The notes we never heard

MaidSafe, that strangely named bunch of folks who are looking to change the world with something that sounds:

  • Amazing
  • Impossible
  • Frightening

Yes the vision that is rock solid and unmoving against all odds and all comers no matter what. The vision to give to all the people of the world Privacy Security and Freedom, but how do they do it? how do they explain it?

Well until now, not very well. There are tests, experimental data, research, 18 months of network simulations modelling, papers, interviews, videos, blog posts, comments and answers, email trails, presentations and all in all it still sounds … too good to be true! For many it sound’s like an impossibility. Well it works, it produces results and we still cannot explain it. Is this about to change ?

The network critical parts

Routing is a critical and central part of the codebase, it provides routing information and security of the network, handles churn and ensures the xor relationships are intact and accurate. It is the workhorse of the network relying heavily on a very efficient reliable UDP implementation underneath. Crux, our new rudp is that implementation. I wanted to check this code in detail. I was particularly interested in the consensus checks and ensuring they were solid and as efficient as possible (they will always be more efficient as time goes by, but for now they need to be rock solid, so worth the effort), Consensus groups and checks are a very important part of the network for group based actions to ensure the correct authority is network agreed. This along with client digital signatures makes all the various authorities work in unison and provide harmony out of apparent chaos when you look at the network.

In routing we have a GroupSize and QuorumSize, the Quorum is the number we require to validate a groups intentions as an acceptable level of comfort the intent is agreed with the group.

Software development

Many who know me or know of me will know I am an Engineer, as happy with a hammer or soldering iron as I am with a compiler. I like Engineering and I think I act like an Engineer, what cannot be fixed? give it here, I accept the challenge!, Impossible great that will be fun, lets work out how to do it!.

Software development to me is Engineering, I care not if I am trying to loosen a nut with a damp dish-towel or making code do something for me. I see a problem, look at what is available and attack! Then though you have to build amazing things and it is different, you plan, build, test, build, plan, build test, build and so on, getting to the place where it makes sense. Then start all over again and implement the answer, neat repeatable and stunning.

With MaidSafe the problem we took on is, the Internet (web etc.) is fundamentally broken and taking the wrong route, answer, well fix it! OK yer on.

So we did, with many years of build, test, plan, build etc. we got a system that could fix this issue and went though testnets. On testnet2 I decided with a lot of coaxing to get back to code and re-factor routing for stability and security analysis in particular. In doing so I went though all sorts of hell, no sleep and fighting logic every day asking why like this? why like that? (not the code the sheer complexity) Ok rewrite part of routing, I can do this fast, I always can when I focus. So no xmas for me, there’s work to be done (and the lines of “the band played waltzing Matilda” resonates, especially “never knew there were worse things than dyin”).

Tough days, harder nights and I was in a familiar place, deep deep thought, what are these types in the code (c++ has types, they are called classes), why do they exist, what fundamental facets make them unique enough to exist as a type? Yes questioning God, gravity and C being a limiting factor of the universe, this was all familiar. I was in innovation mode again, that dark place where nothing can get in the way of your thoughts, not even a million blog posts and comments, questions etc. nothing. You are in cruise control, focussed on a burning question, why so complex? why? What am I missing?

The MaidSafe issue was again prevalent, I Cannot explain this to anyone, it is stunning and beautiful, but so so complex.

Back to school

Then it was back to school thinking, I remembered the huge equations of experimental data on a blackboard and what we did to it. We factored it. What starts out as a massive multi variable mess of junk, eventually factors to x = 2y + 6.

This is what MaidSafe is, un-factored (well not factored enough). This is why we cannot explain it, it’s why changes to the code are so hard. It is doing something nothing else has done and doing it well, but with such huge complexity. I am not happy, we never factored the equation. How though, how?

Back to types, what are these and why?

I deviate from some software developers here, I see types not as useful code helpers to stop you typing the wrong name or something into a functions. I see types as a thing and that thing has a particular Genus, you are making something with a type, something unique and with purpose, different from any other thing you have made. So it needs thought. Really think hard, what is it and why.

So back to routing, I am looking at the network as an electron or message would, the vast array of connections all held together in xor space, I know I can do certain things, but what can the network tell of me. Can it recognise where I have been, or where I am and if so what am I when I am here at this place in space and time?

I know we have consensus groups, I know folk do not find it easy,  but they are there all over the place inter-lapping, constantly changing and deterministic (well if you can stop time and just analyse the network fully, you may be able to identify the state, but like Zeno’s arrow it will give more questions than answers). I know these groups exist, I know what they do and I know there are lots of them, each interwoven with others in a dance that makes your head spin in this cable.

I then notice something, something that makes everything stop. I see a familiar thing, a pattern, a repeatable, explainable pattern and it is describable. In the mist there is a view coming to me and it’s taking a form, what is it?

It’s a consensus group type, not only that but it stamps itself on a message until the next group gets to it. There is something I can see and more importantly name. These patterns are everywhere. I look at the vault code. Over 260 c++ files, beautiful code, amazing skill, but it’s doing something I can see. Most of the time this code is trying to work out all the functions and features available at any stage in a messages travel. It’s looking for the answer, but not always the same way. It’s not hearing the music and it makes up for it with more code and more algorithms.

Now I am in shock, I always knew we would uncover the secrets of the network, I knew it was our path to follow and find this, but was this it? I had to take time out, Dude needs a walk on the beach, off we go. I end up running back to the code, I was right this is it. I need to think.

Ok what is it

So the revelation is this and it is quite simple. There are several persona types, all personas are actually one of these regardless of what code we put in it. A persona group is a certain and guaranteed thing that the network in the real world can see, validate and secure. This all happens very dynamically and the network knows these things. Even though we have calculated this, the network knows and it knows better that any amount of advanced code we can throw at it. This is one of those, “of course it is“! moments. The network can tell us who we are with respect to a message, we do not need to calculate it in upper layers, routing knows already it just needs to tell us and we must listen. This is such an under-researched area that this is a truly amazing find, in the group consensus model we have this really focusses thinking and allows significant improvements in explaining also everything about the network. Of course self authentication, encryption and data types etc. are all still very new, but this is a huge aspect (several man years of effort to find this) and it will transform not only us, but hopefully this industry of decentralised approaches to software and services based on it.

People will (and should) look at this now and say, well that’s obvious. I can assure you, it only is after you see it (like a wheel or boat).  So what are these types and why do they exist?

Here we go.

We have (4 type, only 4, no more no less)

Client Manager – This is a type of persona this is the group of routing nodes closest to a client node. They can tell as they have a connection that is not a routing table node, so it must be a client! Examples of these types are MaidManager (the group that looks after a Maid account), MpidManager (the group that looks after public name and public shares/drive for public clients)

Nae Manager – Network Addressable Element manager groups. They know they are this type of group as they are close to the address that equals the name of the network addressable element (not a Network Addressable Node, but data or function elements). Examples of these are DataManagers (look after data pointers) and VersionManagers (looks after directory versions and any other mutating directly addressable node)

Node Manager – This is the group surrounding a node and they know as the node appears in their routing table. An example is a PmidManager (the group looking after nodes holding data).

Managed Node – This is a routing node in a group of Node Managers such as a PmidNode (i.e. a node actually supposed to be holding a data element). Future, ComputeNode to handle computations (using zk-snark etc.)

That’s is it. Now we call these authorities as they represent a particular authority. So lets look at a client putting data on the network.

  • Client sends message to his own address on the network
  • ClientManagers receive this and check it is a client he has enough storage available (paid enough safecoin)
  • They send to the DataManagers
  • DataManagers check the from Authority is a ClientManager group and look to see if data is already stored. If so then Finish, else
  • DataManagers send to a PmidManager group.
  • PmidManagers check from authority was DataManagers and they store in a node close to the address of the group.
  • PmidNode receives this Put and checks the from authority was his PmidManager group.
  • Pmid Node stores data. Finish.

“Hold on! I spot a problem”. It is OK these nodes can confirm who they are, but here we mention from authority. It seems the from authority is required to confirm the chain of deterministic actions is correct, otherwise people can claim to be anyone.

This is the other neat part actually. We know and have proven experimentally many times the quorum of a group is secure. So you know what group you are (remember the network can absolutely tell you with incredible accuracy what persona you are). So you send your persona type in the message to the next group. The next group then accumulate results, check your signature and validate all your group said they were persona type X. So this seals the chain of events into a secure deterministic chain. This provides great security, but more importantly the language now becomes simple.

Put 

Client -> ClientManager<->NeaManager->NodeManager->ManagedNode

Get

Client <->NaeManager<->ManagedNode

We can add in symbols such as ->>>> to represent going to a group and some more small symbols (such as <-> meaning may return result from here) and we have a complete language to describe the vault code (to be published soon). Importantly given such strict rules and types this can be coded very quickly with a strict specification. We have tried to do some things that would break security and guess what, this language stops it dead in it’s tracks. So we can develop very quickly new persona types for these categories for many things, safecoin, compute, AI engine, Search and many more. This speeds up development to a point we should be able to test different ideas in days, not weeks and months as it is with the huge un-factored code base. Please do not misunderstand this does not mean code in days (although a massive reduction in time for code is obvious), I mean we can write down data flows and analyse them in a manner people understand easily. This allows  many people to be involved in such thought experiments prior to implementation. This is a huge step, no longer does one or perhaps two people understand what is happening, but everyone can (within reason).

This is huge and the most significant find I have had in 9 years working hard on these problems. It is still very exciting to me, even now. For instance our vault code will now go from over 260 template heavy and complex source files to a handful of very simple header files.  Testnet3 will include this, Crux and routing_V2 which adds in a sentinel, this is a crucial accumulation and security check. So we can confirm cryptographic authority (a client has signed and action on something he owns) and group / consensus based authority. A description of the sentinel is probably required now, so sorry for the inordinately long post.

Sentinel overview

Quick intro to network consensus, authority and crypto usage.

In a decentralised autonomous network there are many challenges to face. One such challenge is the range of attacks that consist of Sybil / Spartacus and forgery attacks (did the message really come from who you think). One of the simplest attack to foil is the forgery attack, thanks to asymmetric cryptography. This allows for a public key to be known by everyone and when anything is encrypted with this key it can only (supposedly) be decrypted by the private key of that keypair. Assuming a reasonable algorithm, keysize and implementation this holds true.

This also removes the Spartacus type attacks (claim to be another identity)., but not necessarily sybil attacks, where an attack on a bit of a network is enough to persuade the rest of the network that any request is valid or indeed anything asked of that part of the network is done as expected. To overcome this MaidSafe have several techniques used in parallel. These boil down to

  1. Have nodes create key chains (a chain of keys each signing the next until one is selected). We call these Fobs. A Publicfob consists of a public_key & signature as well as a name field. The name is the SHA512HASH(public_key+signature) making forging a key crypto hard (we can confirm the signature is also signed by a valid key pair by checking the signature there, where this ‘pure key’ is self signed). The Fob type is this PublicFob + the private key.
  2. Ask network to store the PublicFob for a node. The network will accept this if the node has certain characteristics (based on rank – later discussion) and the key is less than 3 leading bits different from the current group of nodes. This makes key placement distribute equally across the address range (as for rank consider only a single non ranked node allowed per group, and failure to increase rank means the key is deleted from the network and has to be re-stored if possible).
  3. This now resembles a PKI network where to speak to node ABC you go get the PublicFob at ABC and either encrypt a message to the node or check a message from the node is signed by using that PublicFob.public_key. The difference being no central authority exists and the network distributes and collects keys as any DHT would (except in this case the DHT is secured by the very PKI it manages). So this is very secure and does not require any human intervention (unlike a certificate authority)
  4. Assemble nodes into groups that will act in unison on any request/response. So these groups are selected to be large enough to ensure a Sybil attack would require at least 3X network size of attackers to be able to join (a single attacker with no other node types joining). The magic number here is 28, realistically this number is closer to 17.
  5. Allow a failure rate as failures will definitely happen. This is done by having a GroupSize of say 32 and set the QuorumSize to 28. This means for any action we require 28 nodes close to a target address to agree and carry out an action.

This Quorum creates a mechanism where another group or node can believe the action is correct and valid. This is called group consensus.

The group consensus provides the network a way to request or carry out actions and ensure such requests are valid and actions actually done. This is required as the network looks after itself (autonomous).

A client has a close group and requires to persuade this group to request the network take an action which Puts something on the network (a data element/message etc.) Clients create data and messages, the network handles these. As the client cannot just connect to an arbitrary group and demand something be done, they connect to their close group and register themselves (with their Fob) an account. The close group can then be persuaded by the client to request another part of the network create something (a Put). In the case of Maidsafe the close group request the client pay via safecoin (it used to be they paid with storage that another group managed an agreed). So the client persuades the close group to put data. (ignore how payment is made, it actually requires client send safecoin to a provable recycle part of the network (another group confirms this)).

So a client can sign request to the group (crypto secure) and the group can check validity of the request and then ask the appropriate group close to the address of the data or message to accept this request.

After anything is put the client can mutate this thing directly (if they have signed it). This is the case for directory entries, where a client can add versions to a list (StructuredDataVersion) as it was Put signed by the client. So the owner of the signature can sign a request to alter this. This is crypto secured authority and does not require the close group for consensus.

In the case of groups requesting actions then we have group based consensus and the network grants authority based on a group that can be measured as a valid close group, where each member has signed a request as being from that close group. This authority is what the sentinel confirms prior to the routing object processing an action request. Almost all messages are Sentinel checked, with the exception of get_group as this fetches Fob’s which are self validating and it fetches a copy of all Fobs from all group members and confirms they agree and validate. Get_group is only used for making sure we connect to our close group as a client or node.

Sentinel components

The sentinel consists of few components and requires no network connection. This is to allow for such a crucial element to be fully tested. The elements are limited to two (2)accumulator pairs. There are two pairs for two different authority types,

  1. Node level direct authority (i.e. could be a client)
  2. Group base consensus

In 1 we just accumulate a single message and get the Fob to check a signature. In 2 We require to get at least QuorumSize messages and this is for group based consensus and then we get the Fobs and again check signature to confirm the group. We also check the nodes are as close to each other in xor space as our own group is (with varying error rate)

To achieve this the process is

  1. Message Arrives
  2. Check Accumulator has seen it, if not Send a GetKey request (for a group or single node)
  3. Add to accumulator. If return is true then check the key accumulator of that pair -> if true then confirm the signature with the Fob (asymm::CheckSignature(Fob.public_key, message)

If the key accumulator did not have the key(s) accumulated (i.e. accumulator.CheckQuorumReached) then we do nothing and continue with other work. Then though

  1. Key arrives (from GetKey response)
  2. Check value accumulators have(address) the address is the source_id+message_id of the request, may be a group ID or nodeId
  3. If not found then ignore message
  4. Otherwise accumulator.Add(key) to the proper key accumulator of the pair
  5. If this returns true, then we get the keys and values (via accumulator.GetAll() calls from both accumulators and confirm signatures. group and return a valid message to the object holding the sentinel (the sentinel add call will be async)

The accumulators are LRU cache based and will empty old messages that do not confirm in this manner.

Conclusion

This language which describes the vaults and consensus logic will help us get closer to formal proofs, explaining the network and moving much much faster with a significant reduction in codebase. It also allows with a crtp template pattern a complete separation of vault logic and the network, so at last we can test fully logic and then attach it to the network. The last few months have been a special hell for me as we are in launch mode and then this. The opportunity is way to large to bypass so we are implementing this as we speak ( at breakneck speed ) and it’s looking fantastic. I will be posting to the community forum with updates, but next week I am taking a few days off and heading North for a break. I will do so in a huge sigh of relief and inner peace after this discovery. It is still something that has shook me to the bones in its elaborate beauty and simplicity. Truly amazing.

Enthusiastic human :-)

Posted in complex systems, MaidSafe, strategy
14 comments on “The language of the network
  1. Mark says:

    Genius… this is a wonderful read, and great to share in such an amazing feat and discovery. Congratulations David. I can recall the process and feelings during a creative revelation (though never of this significance). Very excited for you and how much creating this language means for Project SAFE and decentralised networks using this approach. I think Crux would be a very appropriate name!

    Time to dig out my Skids album with Waltzing Matilda on it.. Oh damn, that was a tape, in my loft. Scared To Dance will have to do.. I’m gonna shake it too! 😉 Enjoy your break.

  2. dyamanaka says:

    Great job on this discovery. I’m really looking forward to seeing the benefits in TestNet3!

  3. […] This new chapter in our development progress all started back in early February when David, again desperate to reduce complexity, spotted a pattern in the code, a repeatable and describable pattern that kickstarted the simplification of the decision-making logic within the Vaults (the series of processes that help look after all data on the network) and also improved network security. You can read David’s detailed blog post about it here. […]

  4. cadmar1944 says:

    Reblogged this on Alternativeaction Blog and commented:
    This was received today via email – A great leap forward.

    Early(ish) on Friday (26th June) morning, the tireless MaidSafe dev team managed to get the SAFE Network functioning from end to end. This means that a User is able to self-authenticate (create their own username and password, and access the network without the permission of a third party) using the Authentication API via the SAFE Client. During the process, the network’s transport/connection layer (Crust) connects the peer nodes, allowing Routing (the layer that verifies the identity of each node cryptographically) to establish and maintain connections to other network nodes (for more information on how all the components fit together, please visit our wiki). The clients were also able to PUT (store) and GET (retrieve) data.

    Congratulations to the dev team.

  5. […] 致力于降低网络复杂性的戴维在代码中发现了一个模式,这是一个可重复和描写的模式,它能够在保管库(帮助照看网络上所有数据的一系列过程)中简化决策逻辑,并能提高网络安全性。你可以在这里阅读戴维博客中关于它的详细描述。 […]

  6. Sam says:

    “…What am I missing…”

    One of the big disadvantages of all of the transport, chains is they can’t upgrade without breaking the whole system. You need to put a a byte at the beginning of each message (or somewhere), 0-255 when it gets to 255 you add a new byte. Each byte will be for a different type encryption or a different algorithm for routing or whatever. Like a Multipurpose Internet Mail Extensions (MIME) that is used to encode, messages, etc. You might have several of these. One byte for the transport or networking routing scheme. Why would you do such a thing. Open ended expansion. You could keep adding bytes for schemes or new algorithms.

    Example: You have 255 schemes filled for different type transport, network topology type, encryption, it could be whatever. The point is it is the basic way the system communicates and passes information. Ok now that the first byte if filled with schemes you now have 255,0 next 255, 001, etc.. and when you get to 255, 255, then the next one is 255, 255, 001. Anyone who doesn’t have the new scheme on their router could be ignored or communication could drop back to a older scheme. This could also be done for different type managers or nodes.

    How many hops? How many clients will something search through until it stops? Freenet has a number that decremented for that. Obvious defect into who is asking for information. Could you have a mathematical equation to do the same that would NOT tell you how close a node is? For example the first node would do the math but the answer would have nothing to do with how close it is to the original requester. Let’s say the number of hops to die is 20, when the mathematical equation gets to twenty hops and has been operated on twenty times then it tells the last server that computed the operation,”ok that’s it end of hops” but it can’t know what number of hops it’s at anywhere in the chain. Only when the equation is solved. I have no idea if there is such a “hidden number of operations” equation, blockchain?

    • David Irvine says:

      In our case we have gone for a very accurate DHT, so this means (we hope, but it does work in tests and testnet works) that when you get close to the info it’s there or not and we will return Errors. We do not know how many hops it took as we do not count them (there is the logn limit though). I am not sure I am specifically talking about the same issue here though, so I apologise if this is the case. Anyway does this help at all?

      • Sam says:

        Yes your answers are helpful. I’m not trying troll you I want you to think about the future and how the network could be upgraded without pulling the whole thing. Hence the byte that encodes schemes(bad choice of wording) or in actuality changes algorithms. The algorithm I speak of could be anything but let me give you an example. What if the encryption you’re using is suddenly found to have grievous error? What if another network topology is found to be much better? With a byte at the front, or anywhere, you could have a network upgrade without changing the whole system at once and starting over. Each bit in the byte stands for a different algorithm. It could be extended infinitely. So the first byte gets you 255 algorithms as 11111111 binary means you need another byte (ex. of next algorithm) 11111111 00000001 for the next algorithm. After you fill up the two bytes you would need three. That’s a lot of expansion though and would take a long time to reach. The basic idea is exactly the same as Multipurpose Internet Mail Extensions (MIME) where what the algorithm is exactly, (in the MIME case for JPG or MPG or Whatever). In this case it would a routing standard or encryption standard. Each server would read the byte it was sent and know what library function to preform on the stream sent. Of course it could cost you a byte for each connection. The advantage would be a huge flexibility. Have you seen Ethereum? Which is a sort of routing/computing network. Adding a byte could allow you to use this sort of flexibility in the future. You don’t have to use it now. A question you should ask yourself is “Have I thought of all the possible routing and computing options possible right now that could ever come up”? As computing gets faster then more work could be done at each node while hardly touching the speed of the interconnects. The network is the slow part once the data is in the processor it can be operated on quite quickly. Kind of like how Truecrypt, a encryption tool, decrypts data off of a hard drive in real time. It doesn’t really slow getting data much at all because the processor is so fast compared to the latency of the hard drive. The network is in a similar position as the hard drive in terms of latency and it may be that intelligent computation can drastically speed up the function or the “utility” of the network.

        I understand DHT’s. I think I how know Kademlia works sorta, the XOR part I understand to test if a hash or address is at a location.I’m not so sure yet about “how or where” data is stored. If the data hash is stored in a location that matches the node hash then that seems unsafe. An simple made up example. Let’s say Tom Cruise new movie “The Huge Mass Attack” has a hash of 11000110 and your node has a close hash of 110001101 so it stores Tom’s movie there couldn’t the movie companies know to look at your node and fine the hell out of you? To get close to you they could keep changing their node number until they got your IP as the first IP.

        I need to do more reading to understand how the data hash relates to the address of the node hash. It would seem that if the data hash is not close to the node hash number then the log n search would fail as the data could not be found just by XOR of the node’s hash. I’ve been reading this post also,

        https://metaquestions.me/2014/09/03/maidsafe-part-iii-joining-anonymity/

        I’ve been reading but this does get a little complicated in parts. I’ll read a little more and maybe ask more questions. Probably in the other linked thread. My hops question I see is ill-founded(thanks for answering it). I do believe the question on having a type byte relating to the algorithm is reasonable.

  7. David Irvine says:

    Trolling, no, questions are great when it’s clear we are looking for answers. Happy you are helping.

    In terms of upgrade, I see your point, similar to multihash etc. Our wire format is not formalised yet, but the intention is to have a version id at least. Nodes should be able to negotiate the version on connection with new nodes being backwards compatible. This may only be a single version at a time.

    In Kad (traditional) data is stored near the hash, he issues there are not identification as much as they are looseness. By that I mean no delete/edit as data is loose and not known with certainty if an operation like a mutate has worked and none of the older data exists any more.

    As for trying to work out which chunks are where, logically you can get a location in the overlay (if we ignore caching) but then getting the IP address is harder. Well in normal kad you just ask and you will get it. We have gone for a negotiated connection. So a asks b to connect and b calculates if it wants/needs to. If so then ip info is sent encrypted between the two of them,

    Then the nodes relocate in groups every so often (see our rfc github, node aging). So this becomes pretty obfuscated.

    If you check the community forum https://safenetforum.org/ these issues are discussed in great detail actually. Lots of good info there.

  8. […] As many mini compute devices appear we need to be able to share data between them, securely and (again) irrefutably. Importantly here we wish the network to spot and remove bad actors. The early network nodes that we have right now are not incentivised to operate for the benefit of us all. Providing these incentives (safecoin) means that the provision of resources to run IOT devices is rewarded, whether the “owner” benefits from the device capability at all times, or not. This alters the market significantly where devices can help others and not only the current owner. With these devices, security is important, but ensuring valid data is essential. An autonomous network can and should ensure good behaviour of nodes via node managers as described in the language of the network. […]

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Member of The Internet Defense League

Follow Metaquestions on WordPress.com

Enter your email address to follow this blog and receive notifications of new posts by email.

Join 2,426 other followers

BTC address
13YSCv8SLrBw27AdyRQY8adfsGa56viQcJ
%d bloggers like this: