Data is the currency, literally!

freddie-collins-309833.jpg
Photo by Freddie Collins on Unsplash

In the last post, I discussed “the impossible network“, an autonomous network designed to protect the worlds people and their data.  Before moving on to use cases for such a network, I thought it required a little more clarification.

Many people have said that sounds like project … (insert many blockchain products that store data), however I think this could not be further from the case.  SAFE is an autonomous network for a start, I do not think any other project that manages private and public data claims this (private means it must provide some method of self authentication), but would love to hear of any that did. Never mind one with an inbuilt incentive mechanism.

The currency on such a network would obviously be data, but I do not mean that in some abstract sense, I am literally stating the currency in such a network is a data type. Not a separate network or an add on component, but a data type like any other on the network.  From a crypto currency perspective the network is the ledger, but in the case of SAFE that ledger is private and most certainly not an add-on, it’s an implicit function,

Another misunderstanding is related to projects that store some data identifier (e.g. hash) in a blockchain, but store the data “somewhere”. That “somewhere” could be some persons computer that they control, or some company servers etc.  This is the very thing we are supposed to be removing, Those pesky corruptible intermediaries. To us in MaidSafe this is a huge red flag and almost as bad as just not storing the data at all. If the data can be corrupted, or peoples access denied etc. then it’s simply not physically secured, there is no debate here.  If you store the data, then you have the identifier, so there is a dichotomy with those networks that is hard to reason.

SAFE stores data identifiers (in data chains) and also stores and protects the data itself, this is what farmers get paid to do. That is how we commercialise the world’s unused disk space automatically for ordinary people across the globe and provide the cheapest possible secure storage (and computation eventually) for everyone on the planet.

So if you are to remember one thing about SAFE, it is the network, not any humans, which protects our data, this includes controlling all costs, rewards, storage, access, communications and currency.

Now I will get on with use cases, I hope this makes the impossible network a bit more clear.  It sounds huge, but it’s actually much bigger than that 😉

 

Posted in complex systems, MaidSafe, Personal Opinion, safecoin

The Impossible Network

pexels-photo-210199

In recent weeks and months, the MaidSafe team have been very quietly progressing something quite amazing.  The dedication and commitment of the team is admirable, but the task is so great that we forget how huge the prospects are. Not only that we also at times forget to talk publicly about it. This will change I am sure, but in this personal blog I do intend to get the message across and try to really explain the potential here, it is quite astounding and can change our world, but it needs better understood, even by early adopters.

The Back Story

We have had several discussion in house recently about the SAFE network or the MaidSafe design. These discussion are surprising, we are thinking about what we are offering, not the vision, not the design and not the roadmap, these don’t change. The issue we have been discussing is one of perception and by extension imagination.

I had better explain this rambling introduction slightly better. As always some historical perspective helps. In 2014 we aligned with the crypto community, with similar drive and goals in many respects, however very different products and this is where the issue happens. We choose to define our goals as decentralising the Internet. Of course all the “it’s already decentralised” arguments aside it was a message we felt folk could follow. “We will do for data what bitcoin is doing for money” (again ignoring the side arguments about wealth, stores and all that jazz) was a mantra we used to drive home our message.

It worked and worked well, but with issues. Being us, we got to work, heads down, bums up designing, coding and testing. Nothing could stop us, we were and are focused on delivery of this “impossible network“. However, as society does, never letting a good message go to waste, the decentralised Internet message gathered pace and simply became overused and no longer differentiated us. Some projects even started calling miners farmers and such like, the confusion was not good for us, but we ignored it, mostly. Recently we have noticed we are actually perceived as another version of several projects. Initially we were compared to tor, BitTorrent, freenet etc. but now it’s any crypto project with storage or some networking. so there was a shift. Decentralised Internet became, use some crypto and (maybe) store data somewhere.

Not disrespecting any project here, this is our project I am thinking about and it’s, just like 2014, being considered just the same as project X, where X now is anything crypto with storage, where previously it was any project that stored stuff, regardless of crypto.  To be fair I have not deeply looked at many of these newer projects past the headlines.

So what’s the problem? Competition is great, it shows we are on the right track, or at least not going it alone. Well there is a problem and it’s our problem. The Decentralised Internet does not describe us any more, well no more than “some computer thing”. We have failed to really explain to people what MaidSafe is and what the SAFE network will be able to achieve. The interesting thing I see here is that decentralised Internet mantra used by projects that have pivoted. This shows how many projects stick this description on their products/goals even when those products change quite dramatically. Therefore it is essential we do not use an overused and frankly abused phrase to describe in a few words what we do.

Ok then what is SAFE?

pexels-photo-210158 (1)

This is the critical point, SAFE at it’s core is an “Autonomous Network“, not a set of federated servers, or owned storage locations, or identifiable nodes, but actually an autonomous network. This means no human intervention, no humans setting prices, no altering configurations to make things work, no tweaking data on disk, no altering rules of the nodes, absolutely no human intervention apart from running a piece of self configuring, self healing software. The network decides prices, rewards and how to protect data,  communications and calculations, not any human!

By the very definition of an autonomous network it must be secured against all known threats, otherwise it would not live long at all and be rendered useless almost immediately. Such threats could be collusion based Sybil attacks (people “buy” other peoples vaults etc.) as well as many DOS attacks (so no leader based consensus) have to be defended against, an example of a couple of difficult problems we had to solve. There are a great many threats and challenges, way too many to mention here. In short this is extremely difficult to achieve and naive implementations that ignore even one of these threats will certainly fail. Many answers are not in current papers, research or literature, Engineers need to just suck that up and get innovating and that’s great.

The network alone must know the user accounts, nobody else should. If you think about this, it’s critical and sounds simple, but some thought experiments will show it’s far from simple, otherwise we have intermediaries and by extension corruption. Unless the user wishes to let folk know then they are truly anonymous. The network masks IP addresses to prevent snooping. The network also balances supply of resources and demand on those resources, via “paying” a human somewhere to run this piece of software.

 

This autonomous network works in simulations and the wider network is being proven by running and measuring testnets and releases. This means nobody owns SAFE, nobody controls SAFE and nobody can manipulate SAFE, groups cannot collude to successfully game or attack SAFE and SAFE does what SAFE is programmed to do, protect our privacy, security and give us freedom to communicate, think privately, share privately, learn and teach others. SAFE is the network and the network is safe.

So Autonomous network, that is a mouthful and will confuse many people, what do we really mean in simple terms? Well it’s the Impossible Network, that thing that should not exist, the flying car, the self driving truck, the moon landing, it’s just the impossible network, until we make it possible. A network that secures data, communications and calculations for us all, without owners, intermediaries or third parties. Its a living thing, if you like, something we switch on and the world uses. People power the network for people and get paid for it, people read, watch movies, publish information and hopefully innovate, collaborate and further society, without borders, control or fear. That’s worth doing and is worth working very hard to achieve, however it’s complex and is very easily misunderstood. It’s also, unfortunately not easy to release in parts, autonomous means no intervention, it requires a whole working foundation. That is no easy feat, but if it was then none of us would find it interesting to work on, I certainly wouldn’t. If you want to terrify Engineers put them on projects like this, if they have integrity they will be terrified, however if they relish a challenge they will be satisfied in all the pressure and apparently impossible issues that absolutely require to be solved.

What are we doing with SAFE right now?

At this time the testnets are rolling out, Alpha 2 is a few weeks away, or may be launched before I finish this post. This is replicating current Internet based applications. This seemingly strange task is quite essential, but in some ways is difficult to do. We are forcing a very intelligent huge smart contract system to behave like a normal Internet. The reasons though are varied, but mainly allows the application developers to start on a familiar path. Then as we dive deeper the realisation that these apps (although amazing and competing globally without infrastructure costs etc.) are actually only the tip of the iceberg of possibilities. A conundrum akin to Tim Berners Lee using the internet to stick stamps on emails. In our case though it is valuable, but it is 100% a stepping stone to a much wider and probably more useful set of services that developers can create for people.

What lies between now and launch/Beta

chain

There has been quite a bit of work on data chains which is basically a design to allow the network to validate data was stored securely and transactions happened on the real live network and was not introduced. This provides important features that are required for networks to restart and the network to  manage network partitions. Nodes can then republish data as individuals, but backed by  digitally signed proofs etc.  Data chains and disjoint sections allow a lot of features including secure message transport and more. It’s available to read about  in the RFC‘s this blog and more on implementation design of part 1 in the dev forum. We have split this task into 2 parts, Data Chains part 1 secures groups of nodes. This all happens at the routing layer and actually creates an autonomous network, although one with no data. This is Alpha 3. There is a simulation framework built that allows us to test the data chains design, here. Node ageing is also a requirement of this network and those who like to know more can read about that here. The above requires disjoint sections in the network and again that is described here for the avid reader, this particular implementation was a surprisingly mammoth task in itself, but has proven successful..

Then we add the data layer again to reintroduce user run vaults (the software people run to create the network and get paid) in Alpha 4. The data element is much more straightforward after data chains part 1 is in place.

After that it’s a lot of testing and the introduction of test safecoin. This is simply a data type on the network, so already secured. The addition is the networks contract, some will call this a “smart contract”, but the difference is the oracle is in the network and secured in the network.  Safecoin is the mechanism of tokenising the provision of resources by the network. The network measures and rewards good behaviour of nodes by allocating safecoin to such nodes that behave and have proven to have handled user requests properly.

That is the last components in the very long journey, not that they will be completed immediately, it’s many (many) months of work and testing at least. We are in a magnificent position right now though a we do not have any unanswered remaining blockers between us and launch, no huge design issues to face, just some implementation decisions. After launch the focus in the backend will be simplification and formalisation of as much of these innovations as possible.

Consequences of a true autonomous network

This is where, understanding the “Autonomous Network” really matters. The non ownership of the data storage and computation capability is vital for a future where we can actually integrate innovations and allow “ideas to have sex” much more instantly and effectively. Autonomous network(s) also allow people and companies to move from “private owned” silo’s that inevitably get hacked.  A good (probably vital for society) outcome is the invalidating of privacy as a product moving profits to providing great value, without taking control from people. This is much more lucrative, less risky and without the intrusion and fear that customers are feeling increasingly for their children these days, never mind the fines for data theft and costs/upheaval of ransom-ware etc..

Examples of how some existing technology may change (or be fixed)

  • Autonomous transport
    • Collaborating and sharing important information like accidents, road, rail or sea issues, weather etc. who owns the servers the data sits on, what if its manipulated?. Autonomous networks removes the ownership problem and allows true provable (non refutable) sharing of data. They also enforce industry wide rules that are set in code that is also tested industry wide. Autonomous networks also remove or at least vastly reduce reduce manipulation of results as well as preventing theft or false reporting of company data and industry test results (I am looking at you pharma).
  • IOT
    • As many mini compute devices appear we need to be able to share data between them, securely and (again) irrefutably. Importantly here we wish the network to spot and remove bad actors. The early network nodes that we have right now are not incentivised to operate for the benefit of us all. Providing these incentives (safecoin) means that the provision of resources to run IOT devices is rewarded, whether the “owner” benefits from the device capability at all times, or not. This alters the market significantly where devices can help others and not only the current owner. With these devices, security is important, but ensuring valid data is essential. An autonomous network can and should ensure good behaviour of nodes via node managers as described in the language of the network.
  • Home automation tools and assistants (Alexa, Google home et’ al)
    • The market and advances in home automation, including voice and video recondition amazes and also terrifies consumers. This is an area where we all know we do not wish any company knowing all that information about us. However with autonomous networks  entities create their own accounts without intermediaries, (see self authentication, a very important and simple requirement of an autonomous network) and these accounts can be used to communicate with other nodes, importantly with zero requirement to share who the node belongs to.
  • Cyber warfare or server hacking
    • Simply removing the “target” in terms of servers is a significant reduction in the attack vector of cyber warfare or industrial spying and data theft. The market for this level of security is extremely large, but autonomous networks just remove the problem and therefor are the solution. Anything else that has been tried until now is purely a cat and mouse game between black hat hackers and companies.
  • Password thefts
    • As with improving server security by removing the server,  autonomous networks like SAFE remove the requirement to store a password at all, not locally or stored on disks the we remove passwords form the network. Therefor no password theft on the network.

Of course some of these can be considered changing the status quo to a point where the above points seem like new technology. However, that would be new in our Engineering eyes, but consumers would see no difference, although they may feel safer, the end product in terms of features already exists or these improvement would be invisible to them. More importantly consumers of today’s network probably expect some of the above to already be the case, sadly though it is not and will not be without autonomous networks in my opinion.

Examples of new technology that will be possible

I have written this section and deleted it, replacing with more ideas several times now. I think it’s for another post, but readers may wish to read this whole post a few times and the linked documents and come to their own conclusions. No doubt much better ideas than I can provide will be found and the world changing products of tomorrow should be discovered this way. The worlds full of inspiring people and with the right tools and removal of infrastructure costs etc. I believe we can as a society make great use of these types of network for a safer future.

I hope this has proven to be a bit of an insight into SAFE and the potential it has. The exciting thing for me is not replacing the Internet, but removing this crazy ownership of data by large corporations. They should be providing value, not taking our privacy. I also am excited by the networks ability to ignore silly laws like weaken encryption or snoop on your customers. When we use logic to create things of beauty like this, then those silly governance issue become nothing to fear, discuss or think about. We all get on with our lives then, privately securely and we gain the freedom to communicate, collaborate and innovate all as one. Then it will get very exciting to watch the progress that will hopefully benefit every person on the planet, in one way or another.

In future posts I will attempt to break down the technical parts in a bit more detail and in addition will investigate more solid use cases, one by one.

Posted in complex systems, MaidSafe, Personal Opinion, Privacy

Data chains: what? why? how?

What?


A chain is an image we all know. It represents strength and reliability. Bitcoin has made use of such imagery in the implementation of a blockchain. A blockchain is a chain of links where each link couples transactions (blocks of transactions) in a reliable and cryptographically secured manner.

chain

Data chains represent, instead, links that couples together data identifiers. A data identifier is not the data. As the blockchain does not hold bitcoins a data chain does not hold data.

This post discusses an overview of data chains, there is also an RFC covering some basic use of DataChains and also a codebase to implement DataChains, as well as Data handling (recognition) and long term storage.

Why?


What use is that then? If these things hold no data then what use are they?

This is a critical question to answer and resist writing off as irrelevant, just yet. If we can be assured a transaction happened in a blockchain, then it allows us to know where a bitcoin should exist as if it were a real thing. It is the same with a data chain, if we can be sure a piece of data should exist and where it should exist. However with data chains, identified data is real (documents, videos, etc). That is, if we know these files should exist and can identify them, then we just need to get the data and validate it.With a network that both stores and validates data and their IDs, we gain a lot in efficiency and simplicity as compared to blockchains which cannot store significant amounts of data such as files. Data chains would additionally allow cross network/blockchain patterns, but one must ask why do that, duplication is not efficient?

Data handling 

Therefore if a block identifies indisputable information about a file, such as (naively) a hash of the content, then we can be presented with data and compare to a valid block. If the block says the data is recognised, then it was created by whatever mechanism the network agreed to assume responsibility for the data.

Now we can historically validate that data belongs on the network and it was paid for or otherwise accepted, i.e. it is valid network data. This is a great step, some will think, oh I can do that with XYZ, but hang on for a few minutes.

Network handling

Here, we separate from a standard one truth chain and look at chain parts, or a decentralised chain on a decentralised network. A chain where only a very small number of nodes will know of the validity of some small sets of data, but the network as a whole knows everything and all data. Ok, we can see now this is a bit different.

We need now to look again at the chain a little closer, here is another picture to help. We can see here that there seems to be links and blocks. This is an important issue to cement in our thinking.  In a blockchain for instance the link is simple (simple is good, in security the simpler the stronger in many cases).

datachain_diagram

Here though, a link is another agreement block. These links are actually a collection of names and signatures of the group who agreed on the chain at this time. Each link must contain at least a majority of the previous link members. With every change to the group then a new link is created and added to the chain. Please see the consensus overview blog for an overview of these groups and their importance.

Between these links, lie the data blocks, these represent data identifiers (hash, name, type etc.) of real data. Again these are signed by a majority of the previous link (in actual fact they can be slightly out of order as required in an out of order network, or planet 🙂 ).

Now the picture of a DataChain should be building. But how does it split into a decentralised chain?

Splitting up

The easy way here is to consider just the above chain picture, but also remember back to a xor network13706-20c, or plain binary tree, here is one as a reminder.

Remember also we have a notion of groups of nodes. So lets take a group size of 2 as an example.

The process would be:

  • Node 1 starts the network, there is a link with one member.
  • Node 2 starts and gets the chain so far (it’s small with only node1 listed). Node 1 then sends a link to node 2 signed by node 1.
  • node 2 then sends a link to node 1 signed by node 2.

So now the chain has two links, one with node 1 alone and the next with nodes 1 and 2, both signed by each other. This just continues, so no need for me to bore you here,

However, then node 4 joins and assuming all nodes have a purely even distribution (meaning 2 nodes address starts with 0 and the other two nodes address start with 1) the chain splits! two nodes go down the 0 branch and two go down the 1 branch. The chain has split, but maintains the same “genesis” block. The link with node 1 only. Between these links the data blocks are inserted as data is added (Put), edited (POST) or Deleted from the network, each data block again is signed by the majority of the previous link block (to be valid).

So as more nodes join then this process continues, with the chain splitting as the network grows (for detail see RFC and code). This allows some very powerful features, which we will not dive too deeply in but to name a few as example:

  • Nodes can be shown to exist on the network (i.e. not fake nodes).
  • Nodes can prove group membership.
  • Nodes can have a provable history on the network.
  • Nodes can restart and republish data securely that the network can agree should exist.
  • Network can react to massive churn and still recover.
  • Network can recover from complete outage.
  • Network can handle Open ledgers for publicly verifiable data.
  • Network can individually remember a transaction (so for instance a single safecoin transaction can be remembered as a receipt, at a cost though).
  • Network can handle versioning of any data type (each version is payed for though).
  • Fake nodes cannot forge themselves onto a chain without actually taking over a group (meaning they cannot be fake nodes 🙂 ).
  • As data blocks are held between verifiable links then data validity is guaranteed.
  • As data is guaranteed some nodes can hold only identifiers and not the data, or a small subset of data.
  • As not all nodes are required to store data, churn events will produce significantly less traffic as not all nodes need all data all the time.
  • The network can handle nodes of varying capabilities and use the strongest node in a group to handle the action it is strongest with (i.e. nodes with large storage, store lots, nodes with high bandwidth transfer a lot etc.).
  • Archive nodes become a simple state of a capable node that will be rewarded well in safecoin, but have to fight for its place in the group, too many reboots or missed calls and others will take its place.
  • Nodes can be measured and ranked easily by the network (important security capability)

There is a lot more this pattern allows, an awful lot more, such as massive growth or an immediate contraction in the number of nodes. It is doubtful the full capability of such a system can be quantified easily and certainly not in a single blog post like this, but now it’s time to imagine what we can do?

How? 

Moving forward we need many things to happen, these include, but are not limited to:

  1. Design in detail must be complete for phase 1 (data security and persistence)
  2. Open debates, presentations and discussion must take place.
  3. Code must be written.
  4. Code must be tested.
  5. Integration into existing system.
  6. End to End testing
  7. Move to public use.

Of these points 1, 2, 3 & 4 are ongoing right now. Point 5 requires changes to the existing SAFE routing table starting with an RFC. Point 4 will be enhanced as point 2 gets more attention and input. Point 6 is covered by community testing and point 7 is the wider community tests (i.e. Alpha stages).

Conclusion

Data chains would appear to be something that is a natural progression for decentralised systems. They allow data of any type, size or format to be looked after and maintained in a secure and decentralised manner. Not only the physical data itself (very important), but the validity of such data on the network.

Working on the data chains component has been pretty tough with all the other commitments a typical bleeding edge project has to face, but it looks like it has been very much worth the effort and determination.  The pattern is so simple (remember simple is good) and when I tried to tweak it for different problems I thought may happen, the chain fixed itself. So it is a very simple bit of code really, but provides incredible amount of security to a system as well as historical view of a network, republish capability etc. and all with ease. The single biggest mistake in this library/design would be to try any specialisation for any special situation, almost every time, it takes care of itself, at least so far.

This needs a large scale simulation, which we will do in MaidSafe for sure prior to user testing, but seems to provide a significant missing link from many of today’s networks and decentralised efforts. I hope others find this as useful, intriguing and interesting as I do.

We think there will be many posts depicting various aspects of this design pattern over the next while as we get the ability to explain what it offers in many areas. (i.e. when we understand it well enough to explain it easily)

 

 

Posted in Uncategorized

Introduction & Technical Overview of SAFE Consensus

The features included in decentralized networks can be quite varied based on the proposed goals of the technology. From the sharing capabilities offered by Bittorrent to user privacy enabled via To…

Source: Introduction & Technical Overview of SAFE Consensus

Posted in Uncategorized

Structuring Networks with XOR

MaidSafe

A prerequisite to understanding the SAFE Network on a technical level, including the consensus process, requires knowledge of the underlying structure which powers it as a decentralized, autonomous system. Peer-to-peer networks can be categorised into two general types: structured and unstructured. In unstructured networks, there is no explicit organization of nodes and the process of joining and forming connections to others is random. In structured networks, like those which use a distributed hash table (DHT) such as Bittorrent or the SAFE Network, the overlay protocol includes a structure for organizing the network and makes purposeful connections to specific nodes more efficient.

One of the most widely adopted DHT’s, named Kademlia, was originally popularised through its implementation in Bittorrent’s Mainline DHT which removed dependence on central trackers for finding the locations of nodes and data stored on the network. Kademlia employs a rather simple operation called “exclusive or” (XOR) to establish…

View original post 1,574 more words

Posted in Uncategorized

The language of the network

Another quick post, I hope this makes sense, it is perhaps the most exiting aspect of MaidSafe I have found so far and the consequences of this are far reaching indeed. It will not be obvious, but please ask questions on maidsafe.org and I will try and explain more of why this is very important, not only to launch with but to move forward and also model system components very quickly.

The notes we never heard

MaidSafe, that strangely named bunch of folks who are looking to change the world with something that sounds:

  • Amazing
  • Impossible
  • Frightening

Yes the vision that is rock solid and unmoving against all odds and all comers no matter what. The vision to give to all the people of the world Privacy Security and Freedom, but how do they do it? how do they explain it?

Well until now, not very well. There are tests, experimental data, research, 18 months of network simulations modelling, papers, interviews, videos, blog posts, comments and answers, email trails, presentations and all in all it still sounds … too good to be true! For many it sound’s like an impossibility. Well it works, it produces results and we still cannot explain it. Is this about to change ?

The network critical parts

Routing is a critical and central part of the codebase, it provides routing information and security of the network, handles churn and ensures the xor relationships are intact and accurate. It is the workhorse of the network relying heavily on a very efficient reliable UDP implementation underneath. Crux, our new rudp is that implementation. I wanted to check this code in detail. I was particularly interested in the consensus checks and ensuring they were solid and as efficient as possible (they will always be more efficient as time goes by, but for now they need to be rock solid, so worth the effort), Consensus groups and checks are a very important part of the network for group based actions to ensure the correct authority is network agreed. This along with client digital signatures makes all the various authorities work in unison and provide harmony out of apparent chaos when you look at the network.

In routing we have a GroupSize and QuorumSize, the Quorum is the number we require to validate a groups intentions as an acceptable level of comfort the intent is agreed with the group.

Software development

Many who know me or know of me will know I am an Engineer, as happy with a hammer or soldering iron as I am with a compiler. I like Engineering and I think I act like an Engineer, what cannot be fixed? give it here, I accept the challenge!, Impossible great that will be fun, lets work out how to do it!.

Software development to me is Engineering, I care not if I am trying to loosen a nut with a damp dish-towel or making code do something for me. I see a problem, look at what is available and attack! Then though you have to build amazing things and it is different, you plan, build, test, build, plan, build test, build and so on, getting to the place where it makes sense. Then start all over again and implement the answer, neat repeatable and stunning.

With MaidSafe the problem we took on is, the Internet (web etc.) is fundamentally broken and taking the wrong route, answer, well fix it! OK yer on.

So we did, with many years of build, test, plan, build etc. we got a system that could fix this issue and went though testnets. On testnet2 I decided with a lot of coaxing to get back to code and re-factor routing for stability and security analysis in particular. In doing so I went though all sorts of hell, no sleep and fighting logic every day asking why like this? why like that? (not the code the sheer complexity) Ok rewrite part of routing, I can do this fast, I always can when I focus. So no xmas for me, there’s work to be done (and the lines of “the band played waltzing Matilda” resonates, especially “never knew there were worse things than dyin”).

Tough days, harder nights and I was in a familiar place, deep deep thought, what are these types in the code (c++ has types, they are called classes), why do they exist, what fundamental facets make them unique enough to exist as a type? Yes questioning God, gravity and C being a limiting factor of the universe, this was all familiar. I was in innovation mode again, that dark place where nothing can get in the way of your thoughts, not even a million blog posts and comments, questions etc. nothing. You are in cruise control, focussed on a burning question, why so complex? why? What am I missing?

The MaidSafe issue was again prevalent, I Cannot explain this to anyone, it is stunning and beautiful, but so so complex.

Back to school

Then it was back to school thinking, I remembered the huge equations of experimental data on a blackboard and what we did to it. We factored it. What starts out as a massive multi variable mess of junk, eventually factors to x = 2y + 6.

This is what MaidSafe is, un-factored (well not factored enough). This is why we cannot explain it, it’s why changes to the code are so hard. It is doing something nothing else has done and doing it well, but with such huge complexity. I am not happy, we never factored the equation. How though, how?

Back to types, what are these and why?

I deviate from some software developers here, I see types not as useful code helpers to stop you typing the wrong name or something into a functions. I see types as a thing and that thing has a particular Genus, you are making something with a type, something unique and with purpose, different from any other thing you have made. So it needs thought. Really think hard, what is it and why.

So back to routing, I am looking at the network as an electron or message would, the vast array of connections all held together in xor space, I know I can do certain things, but what can the network tell of me. Can it recognise where I have been, or where I am and if so what am I when I am here at this place in space and time?

I know we have consensus groups, I know folk do not find it easy,  but they are there all over the place inter-lapping, constantly changing and deterministic (well if you can stop time and just analyse the network fully, you may be able to identify the state, but like Zeno’s arrow it will give more questions than answers). I know these groups exist, I know what they do and I know there are lots of them, each interwoven with others in a dance that makes your head spin in this cable.

I then notice something, something that makes everything stop. I see a familiar thing, a pattern, a repeatable, explainable pattern and it is describable. In the mist there is a view coming to me and it’s taking a form, what is it?

It’s a consensus group type, not only that but it stamps itself on a message until the next group gets to it. There is something I can see and more importantly name. These patterns are everywhere. I look at the vault code. Over 260 c++ files, beautiful code, amazing skill, but it’s doing something I can see. Most of the time this code is trying to work out all the functions and features available at any stage in a messages travel. It’s looking for the answer, but not always the same way. It’s not hearing the music and it makes up for it with more code and more algorithms.

Now I am in shock, I always knew we would uncover the secrets of the network, I knew it was our path to follow and find this, but was this it? I had to take time out, Dude needs a walk on the beach, off we go. I end up running back to the code, I was right this is it. I need to think.

Ok what is it

So the revelation is this and it is quite simple. There are several persona types, all personas are actually one of these regardless of what code we put in it. A persona group is a certain and guaranteed thing that the network in the real world can see, validate and secure. This all happens very dynamically and the network knows these things. Even though we have calculated this, the network knows and it knows better that any amount of advanced code we can throw at it. This is one of those, “of course it is“! moments. The network can tell us who we are with respect to a message, we do not need to calculate it in upper layers, routing knows already it just needs to tell us and we must listen. This is such an under-researched area that this is a truly amazing find, in the group consensus model we have this really focusses thinking and allows significant improvements in explaining also everything about the network. Of course self authentication, encryption and data types etc. are all still very new, but this is a huge aspect (several man years of effort to find this) and it will transform not only us, but hopefully this industry of decentralised approaches to software and services based on it.

People will (and should) look at this now and say, well that’s obvious. I can assure you, it only is after you see it (like a wheel or boat).  So what are these types and why do they exist?

Here we go.

We have (4 type, only 4, no more no less)

Client Manager – This is a type of persona this is the group of routing nodes closest to a client node. They can tell as they have a connection that is not a routing table node, so it must be a client! Examples of these types are MaidManager (the group that looks after a Maid account), MpidManager (the group that looks after public name and public shares/drive for public clients)

Nae Manager – Network Addressable Element manager groups. They know they are this type of group as they are close to the address that equals the name of the network addressable element (not a Network Addressable Node, but data or function elements). Examples of these are DataManagers (look after data pointers) and VersionManagers (looks after directory versions and any other mutating directly addressable node)

Node Manager – This is the group surrounding a node and they know as the node appears in their routing table. An example is a PmidManager (the group looking after nodes holding data).

Managed Node – This is a routing node in a group of Node Managers such as a PmidNode (i.e. a node actually supposed to be holding a data element). Future, ComputeNode to handle computations (using zk-snark etc.)

That’s is it. Now we call these authorities as they represent a particular authority. So lets look at a client putting data on the network.

  • Client sends message to his own address on the network
  • ClientManagers receive this and check it is a client he has enough storage available (paid enough safecoin)
  • They send to the DataManagers
  • DataManagers check the from Authority is a ClientManager group and look to see if data is already stored. If so then Finish, else
  • DataManagers send to a PmidManager group.
  • PmidManagers check from authority was DataManagers and they store in a node close to the address of the group.
  • PmidNode receives this Put and checks the from authority was his PmidManager group.
  • Pmid Node stores data. Finish.

“Hold on! I spot a problem”. It is OK these nodes can confirm who they are, but here we mention from authority. It seems the from authority is required to confirm the chain of deterministic actions is correct, otherwise people can claim to be anyone.

This is the other neat part actually. We know and have proven experimentally many times the quorum of a group is secure. So you know what group you are (remember the network can absolutely tell you with incredible accuracy what persona you are). So you send your persona type in the message to the next group. The next group then accumulate results, check your signature and validate all your group said they were persona type X. So this seals the chain of events into a secure deterministic chain. This provides great security, but more importantly the language now becomes simple.

Put 

Client -> ClientManager<->NeaManager->NodeManager->ManagedNode

Get

Client <->NaeManager<->ManagedNode

We can add in symbols such as ->>>> to represent going to a group and some more small symbols (such as <-> meaning may return result from here) and we have a complete language to describe the vault code (to be published soon). Importantly given such strict rules and types this can be coded very quickly with a strict specification. We have tried to do some things that would break security and guess what, this language stops it dead in it’s tracks. So we can develop very quickly new persona types for these categories for many things, safecoin, compute, AI engine, Search and many more. This speeds up development to a point we should be able to test different ideas in days, not weeks and months as it is with the huge un-factored code base. Please do not misunderstand this does not mean code in days (although a massive reduction in time for code is obvious), I mean we can write down data flows and analyse them in a manner people understand easily. This allows  many people to be involved in such thought experiments prior to implementation. This is a huge step, no longer does one or perhaps two people understand what is happening, but everyone can (within reason).

This is huge and the most significant find I have had in 9 years working hard on these problems. It is still very exciting to me, even now. For instance our vault code will now go from over 260 template heavy and complex source files to a handful of very simple header files.  Testnet3 will include this, Crux and routing_V2 which adds in a sentinel, this is a crucial accumulation and security check. So we can confirm cryptographic authority (a client has signed and action on something he owns) and group / consensus based authority. A description of the sentinel is probably required now, so sorry for the inordinately long post.

Sentinel overview

Quick intro to network consensus, authority and crypto usage.

In a decentralised autonomous network there are many challenges to face. One such challenge is the range of attacks that consist of Sybil / Spartacus and forgery attacks (did the message really come from who you think). One of the simplest attack to foil is the forgery attack, thanks to asymmetric cryptography. This allows for a public key to be known by everyone and when anything is encrypted with this key it can only (supposedly) be decrypted by the private key of that keypair. Assuming a reasonable algorithm, keysize and implementation this holds true.

This also removes the Spartacus type attacks (claim to be another identity)., but not necessarily sybil attacks, where an attack on a bit of a network is enough to persuade the rest of the network that any request is valid or indeed anything asked of that part of the network is done as expected. To overcome this MaidSafe have several techniques used in parallel. These boil down to

  1. Have nodes create key chains (a chain of keys each signing the next until one is selected). We call these Fobs. A Publicfob consists of a public_key & signature as well as a name field. The name is the SHA512HASH(public_key+signature) making forging a key crypto hard (we can confirm the signature is also signed by a valid key pair by checking the signature there, where this ‘pure key’ is self signed). The Fob type is this PublicFob + the private key.
  2. Ask network to store the PublicFob for a node. The network will accept this if the node has certain characteristics (based on rank – later discussion) and the key is less than 3 leading bits different from the current group of nodes. This makes key placement distribute equally across the address range (as for rank consider only a single non ranked node allowed per group, and failure to increase rank means the key is deleted from the network and has to be re-stored if possible).
  3. This now resembles a PKI network where to speak to node ABC you go get the PublicFob at ABC and either encrypt a message to the node or check a message from the node is signed by using that PublicFob.public_key. The difference being no central authority exists and the network distributes and collects keys as any DHT would (except in this case the DHT is secured by the very PKI it manages). So this is very secure and does not require any human intervention (unlike a certificate authority)
  4. Assemble nodes into groups that will act in unison on any request/response. So these groups are selected to be large enough to ensure a Sybil attack would require at least 3X network size of attackers to be able to join (a single attacker with no other node types joining). The magic number here is 28, realistically this number is closer to 17.
  5. Allow a failure rate as failures will definitely happen. This is done by having a GroupSize of say 32 and set the QuorumSize to 28. This means for any action we require 28 nodes close to a target address to agree and carry out an action.

This Quorum creates a mechanism where another group or node can believe the action is correct and valid. This is called group consensus.

The group consensus provides the network a way to request or carry out actions and ensure such requests are valid and actions actually done. This is required as the network looks after itself (autonomous).

A client has a close group and requires to persuade this group to request the network take an action which Puts something on the network (a data element/message etc.) Clients create data and messages, the network handles these. As the client cannot just connect to an arbitrary group and demand something be done, they connect to their close group and register themselves (with their Fob) an account. The close group can then be persuaded by the client to request another part of the network create something (a Put). In the case of Maidsafe the close group request the client pay via safecoin (it used to be they paid with storage that another group managed an agreed). So the client persuades the close group to put data. (ignore how payment is made, it actually requires client send safecoin to a provable recycle part of the network (another group confirms this)).

So a client can sign request to the group (crypto secure) and the group can check validity of the request and then ask the appropriate group close to the address of the data or message to accept this request.

After anything is put the client can mutate this thing directly (if they have signed it). This is the case for directory entries, where a client can add versions to a list (StructuredDataVersion) as it was Put signed by the client. So the owner of the signature can sign a request to alter this. This is crypto secured authority and does not require the close group for consensus.

In the case of groups requesting actions then we have group based consensus and the network grants authority based on a group that can be measured as a valid close group, where each member has signed a request as being from that close group. This authority is what the sentinel confirms prior to the routing object processing an action request. Almost all messages are Sentinel checked, with the exception of get_group as this fetches Fob’s which are self validating and it fetches a copy of all Fobs from all group members and confirms they agree and validate. Get_group is only used for making sure we connect to our close group as a client or node.

Sentinel components

The sentinel consists of few components and requires no network connection. This is to allow for such a crucial element to be fully tested. The elements are limited to two (2)accumulator pairs. There are two pairs for two different authority types,

  1. Node level direct authority (i.e. could be a client)
  2. Group base consensus

In 1 we just accumulate a single message and get the Fob to check a signature. In 2 We require to get at least QuorumSize messages and this is for group based consensus and then we get the Fobs and again check signature to confirm the group. We also check the nodes are as close to each other in xor space as our own group is (with varying error rate)

To achieve this the process is

  1. Message Arrives
  2. Check Accumulator has seen it, if not Send a GetKey request (for a group or single node)
  3. Add to accumulator. If return is true then check the key accumulator of that pair -> if true then confirm the signature with the Fob (asymm::CheckSignature(Fob.public_key, message)

If the key accumulator did not have the key(s) accumulated (i.e. accumulator.CheckQuorumReached) then we do nothing and continue with other work. Then though

  1. Key arrives (from GetKey response)
  2. Check value accumulators have(address) the address is the source_id+message_id of the request, may be a group ID or nodeId
  3. If not found then ignore message
  4. Otherwise accumulator.Add(key) to the proper key accumulator of the pair
  5. If this returns true, then we get the keys and values (via accumulator.GetAll() calls from both accumulators and confirm signatures. group and return a valid message to the object holding the sentinel (the sentinel add call will be async)

The accumulators are LRU cache based and will empty old messages that do not confirm in this manner.

Conclusion

This language which describes the vaults and consensus logic will help us get closer to formal proofs, explaining the network and moving much much faster with a significant reduction in codebase. It also allows with a crtp template pattern a complete separation of vault logic and the network, so at last we can test fully logic and then attach it to the network. The last few months have been a special hell for me as we are in launch mode and then this. The opportunity is way to large to bypass so we are implementing this as we speak ( at breakneck speed ) and it’s looking fantastic. I will be posting to the community forum with updates, but next week I am taking a few days off and heading North for a break. I will do so in a huge sigh of relief and inner peace after this discovery. It is still something that has shook me to the bones in its elaborate beauty and simplicity. Truly amazing.

Posted in complex systems, MaidSafe, strategy

Experts and the corruption of truth!

This is another observation post and not a member of MaidSafe series. I need to get around to writing more on that. This is related though for sure. It’s also a rant which may not be very useful. This should perhaps be entitled Why David Irvine is absolutely rubbish at answering questions. As usual, like presenting I do not research, read up or prepare this stuff so expect mistakes.

Show me the maths proof!

I hear this a lot with respect to MaidSafe and generally from ‘experts’. I get very frustrated by it which seems to show a lack of ability to some. I argue this is not the case. We have recently developed some maths equations for the security of close groups in MaidSafe, they are far from simple, but agree with results. I must add they are probably not understandable by many, not really. (we will publish these in papers, but believe me it’s a tiny part of a large story, a headline, not full research). I am not putting it in this story for the reason I am writing this story, it’s a badge, like standing in front of a whiteboard with prepared equations, or people who speak in unknown languages of maths to people, in attempt to meet an ‘expert’ status. I will place these equations in papers where they fit for maths people, to portray part of a story, but only part.

You see, to me it seems a complex equation, preferably with some calculus and well placed greek letters can look amazing. It also can be a complete bluff, smoke and mirrors. A good example of this is I was recently chatting to a lecturer/researcher in a local Uni, he is a leading expert in DHT technology and well read and published (even in the peer-to-peer handbook – a bible of DHT tech). This person who for years, along with the community of industry ‘experts’ figured D1HT was a truly amazing, scalable, fast and very efficient DHT. This was published in many places and well-regarded, the maths was impressive and ‘proved’ the case.

For years we had maths that proved bees could not fly. We had maths to prove aeroplanes flew by splitting molecules of air and the longer areas of the top of the wing made these go faster lifting the wing up (well I am reducing this a little). Prior to this we had maths that showed how everything worked (Newton provided much physics to prove this),  we also had a ton of maths to prove zero did not exist (in fact was the devils number) and the world was flat.

Prove it by measurement!

Another field of physics/maths to prove things by experimentation. I was watching a Stephen Hawkins series there and heard a few times, “this measurement proved my thesis”, I also read that a lot in a book called why does e=mc^2 by the professor Brian Cox (the pop band chap, very entertaining and great). These tests do not prove any theory, they agree with a thesis within the confines of the variables involved in this test, which necessarily is limited. So it’s a limited agreement with a thesis, at best.

Make it repeatable!

OK, repeatable experiments are really good, but !

We repeatedly seen a bee fly, we repeated measured planes flying, we repeatedly built things using Newtonian maths/physics, we repeatedly worked for 600 years after Aristotle without the number zero (lets not even discuss complex numbers of negative, and infinity, well …), well maybe not all of us, the Chinese had gaps in numbers to represent zero, but not a name for it (as the mafia would say).

So the question is repeat what ?

Make it intelligent!

Aha, lets prove stuff by giving them, either a complex name or better a complex name with a muckle big equation with loads of components and if it does not fit, shove in a constant or two (it was good enough for Einstein). Now there you go we are experts, nearly nobody knows what we are talking about and we can stand beside a blackboard full of chalk, or even publish a ton of scientific peer-reviewed papers. We are set, experts, proof and above all a satisfied (for now) ego.

Hold on

  • The bee flew (we now believe it flaps its wings really hard)
  • D1HT failed (yes somebody built it and it does not work)
  • Aircraft wings now apparently are believed to behave like a sail on a boat (are we surprised molecules are not splitting in some weird dance with matter ?)
  • Einstein ‘proves’ Newton was wrong
  • Hawkings ‘proves’ Einstein was wrong
  • Recent ‘proof’ shows Newton was right

A single test or maths equations that agrees with a theory does not prove a theory and make a law.

So does e=MC^2, well we don’t know! but we all believe it to be true as though it were a law or proven theory, we teach it as such. Well all this is a problem, there seems to be no agreement. We also need to be careful of peer review and field testing as the NSA / NIST have shown that what is reported and agreed is not necessarily the truth, so this is a huge problem. A huge talk on its own, suffice to say, you need to believe the people who measure and give results (Pharma companies are already known to misrepresent results and actually throw away bad results)

Make it simple

This is the crux of the matter, it just is not simple.

We all need to be able to understand our first principles are probably wrong. We need to realise field testing only tells us a bit of what we need to know within limits. It is just not simple, if its simple it need to be a belief like religion and that leads to people holding onto something that is not proven. Those people will stop innovation and improvement on our global knowledge base.

What do we need to know

Maths is hugely important and is a must, but do not use it as a badge, it’s a tool like a spanner, it may be beautiful to you, but don’t shove it in folks faces.

Measurements are also hugely important to home in on improving the validity of a theory, but should never be stated as proof of a theory in their own right, unless they cover 100% of variables. We live in a pretty variable universe so it would be a brave person …

Repeated measurements really help, but repeat with what variables, do you use the same seed values or make them random (for many tests)?  This is why experimental physics is hard (very hard), there is a huge amount of things to consider.

Another important thing to consider is change, not only of your field but all around you. For instance RSA was very secure and beyond feasible for a computer to break. Now there are at least three things I know of that influence this a lot

  • Computers are exponentially more capable of solving factors via brute force.
  • Discreet maths is moving very fast, so the amount of work to solve these primes is way less than was thought possible when the RSA algorithm was created.
  • Cloud / grid computing allows huge resources to be applied and it can be costed (so we know its circa $5 of amazon time to solve a 256 RSA key)

Never mind far off quantum computing and all that clever sounding chat about qubits and shor’s algorithms etc. (oh nearly became an expert there!), it’s all part of a large pot of stuff we need to know.

Conclusion

There is no short-cut, there is no final proof, we are merely allowed to see only part of what makes the universe work, we do not know gravity, anti matter, black holes or even if any of these things exist! We cannot even tell why prime numbers happen in the order they do, what we do not know is very simple things that we really should know. So how on earth could we know the big stuff? Bottom line, this game we are playing only shows us a few of the rules at any one time, things will change as we discover more rules, but they force us to reconsider all our previous moves continually and as nature shows us more rules it will force us to be humble and start again.

The number of unknowns is enormous and trying to ignore them by simply an equation, fancy word for something, a measurement or even a series of experiments is simply not enough. All together they offer an ability to start to ask, none of them offer a final answer (and never will). So don’t be an expert, be an explorer and if you are nice to your fellow explorers they may even show you ways you have not yet considered. If they talk in maths riddles and hide behind fancy papers and equations then they are safe to ignore. There is no easy answer, only more information and potentially all you know may be wrong, certainly the majority is certainly wrong, so don’t be a believer, be ready to infer new conclusions as you find out more info, which may not even look related. So look at everything and prepare for massive surprises, they will happen!

This is why I struggle to give simple answers to folk, I hate lies and part of a truth is closer to a lie that saying nothing. Many understand this position, but many don’t yet. It is interesting to see though that the experts seem to be the very people who are the believers and not explorers, when folk also realise what they know is trivial and likely wrong then perhaps things will move along faster.

In saying that I also agree that the inability to easily explain something is an indication of a lack of understanding. A quandary, well yes … Just another thing I don’t know, I wish I did.

Posted in Uncategorized

Member of The Internet Defense League

Follow Metaquestions on WordPress.com

Enter your email address to follow this blog and receive notifications of new posts by email.

Join 2,426 other followers

BTC address
13YSCv8SLrBw27AdyRQY8adfsGa56viQcJ