Beta testing of the TQL editor is now in progress at a live site. Stay tuned.
Updates on the world of high performance organizations and the information they use.
A cautionary note. AncelusDB is not supported on CentOS 6.7. This OS has a bug that corrupts the memory map and is guaranteed to cause the application to fail unpredictably.
Users of 6.7 should immediately upgrade to CentOS 6.8 or higher.
The barriers have finally been broken down. The primary architecture is defined and the prototype is functional. We had been distracted by an attempt to keep the query language completely independent of the physical data store, which is the foundational assumption of SQL. Once we let go of that, it didn't take long.
Since more than 90% of queries do not require that degree of independence, a new data store concept can get us out of the trap of being held hostage to the needs of the other 10%, with no performance penalties. The 10% of queries that are truly ad hoc (not defined by the application) will be slower than the 90% cases, but still faster than all systems built on the relational assumption.
The packaging and clean up to get this to beta level will take several months, but it now looks like we're on the right track.
Our development work on TQL continues but not without some setbacks. Several strategies have been abandoned in the past 18 months, and the idea of keeping it an arms-length toolset (independent of the database) is now on the shelf.
TQL development will now move down a new path, but with less disclosure. We think we're on a track that will result in a new operating model.
Watch this space. We'll let you know as soon as we can.
A common objection heard from IT organizations when discussing Ancelus is "Our database and technology stack is fast enough."
So how fast is "fast enough"? This usually means the user response time doesn't generate too many complaints. But that's a red herring.
When it comes to database performance Speed = $$$. Accepting a relational system as "fast enough" usually means "lots of hardware solved it." We had one customer that reduced server count from 252 to 26.
The more precise question would seem to be "How much technology bloat would you like to fund?"
What would happen if we tested the world's fastest database on the world's fastest server?
We usually ignore the announcements of the hardware and chip manufacturers. A debate about a 3 GHz clock vs. a 3.1 GHz clock has no significant effect on Ancelus performance. Our unique architecture eliminates most of the operating system and CPU functions.
That's why we chose to publish our benchmarks on a 1U pizza box, mail order server that cost under $8,000.
But something happened last week that is changing our attitude. We received a new server based on the Intel Broadwell CPU. This is the unquestioned leader in server performance. We're now in the process of running new benchmarks. We're suddenly paying a lot of attention to this hardware.
First results suggest that Intel has done something very fundamental. Slower clock speed (2.2 GHz) but much higher performance....
The NoSQL movement has received a lot of attention over the past few years. But the reality is turning out to be different from the hype.
All of these systems start with an assumption of a relational model. Most are table based, a few are columnar/relational. They all fall into the same trap. They assume the issue is with SQL and that they need to find a faster way to do SQL. They haven't yet gotten to the real problem.
The relational assumption is the core problem. It doesn't match the native state of information in nature, so it requires translation routines. The problem isn't that SQL is badly designed; SQL is forced to do massive transforms because of the relational model. Trying to make SQL faster misses the point that SQL does massive amounts of unproductive work. In normalized relational structures (many tables) the amount of unproductive work expands exponentially. In de-normalized relational structures (one or a few tables) the schema structure is an add-on. In either case scaled performance decays exponentially.
Ancelus eliminates all these problems by eliminating the relational storage structure and replacing it with a mathematical model. The physical storage model is purely abstract. The logical structure is purely native information - linked and recursive lists. Columns and tables are abstractions, mathematically derived.
The result is that instead of handling data many times and discarding the uninteresting, Ancelus handles only the interesting stuff: the final result set. Dramatic reduction in computing and network load....
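The contrast above can be sketched in a few lines. This is a hypothetical illustration only (the data, names, and layout are invented, not the Ancelus API): the same order data as flat relational-style rows versus a nested, linked list structure that stores each fact once and materializes only the result set of interest.

```python
# Relational-style rows: the customer name repeats on every row.
rows = [
    ("Acme", "order-1", "bolt"),
    ("Acme", "order-1", "nut"),
    ("Acme", "order-2", "washer"),
]

# Nested list structure: relationships are links, each value stored once.
store = {"Acme": {"order-1": ["bolt", "nut"], "order-2": ["washer"]}}

def items_for(customer):
    """Materialize only the final result set, on demand."""
    return [item for order in store[customer].values() for item in order]
```

A query against the nested store touches each fact once; the row form has already duplicated the parent values before any query runs.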
Ancelus 6 is released. Includes performance improvements. New benchmarks will publish soon.
The latest update for the Ancelus database, version 6.0, has been installed in beta mode at a customer site. Includes major improvements in varchar speed and memory utilization.
GA release target date is June 1.
There seem to be two trends in dealing with the explosion of data from the Internet of Things. The dominant one focuses on making the administrative tasks of handling bigger datasets more efficient. This approach has no end game that we can see, since the most successful examples are de-normalized data structures struggling to deal with write-speed constraints. That means duplicating massive amounts of data. So you solve the excess-data problem by increasing the amount of data?
Hadoop and Cassandra use two different approaches, but they both involve a vast array of hardware. And neither has a reputation for addressing the real issue of cutting "time-to-insight."
The second is the still-small use of streaming analytics: do the analysis on the fly and focus on the efficiency of the data scientist rather than the admin. Most of these solutions deliver limited scope, but we've taken a different approach. First, statistics operate on the live data stream, using the extreme performance of Ancelus to flag interesting trends more precisely. Then we support executable stored procedures to accelerate the data scientist's in-depth analysis, but only after pointing to the area of interest. No fishing expeditions needed.
Our streaming toolset is called A3: Ancelus Adaptive Analytics and it serves both purposes without the hardware explosion implied by physically de-normalized data. The Ancelus logical structure is 100% normalized and 100% de-normalized at the same time. No data duplication, no pre-defined storage structure, so the logical structure is unconstrained. It can even be changed on the fly without downtime.
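The flag-then-drill-down pattern can be sketched generically. This is a hedged illustration, not A3 itself: the window size and 3-sigma threshold are invented, and `deep_analysis()` is a stand-in for an executable stored procedure that runs only on flagged samples.

```python
from collections import deque
from statistics import mean, stdev

window = deque(maxlen=20)   # rolling window of recent live samples

def deep_analysis(value):
    return {"flagged": value}   # placeholder for the in-depth procedure

def on_sample(value, k=3.0):
    """Cheap statistics run on every live sample; drill down only on flags."""
    result = None
    if len(window) == window.maxlen:
        m, s = mean(window), stdev(window)
        if s > 0 and abs(value - m) > k * s:
            result = deep_analysis(value)   # only the interesting samples
    window.append(value)
    return result
```

The expensive analysis never touches the ordinary samples, which is the point: the stream statistics narrow the search before the data scientist gets involved.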
New technology, new game.
The Ancelus database has always been available to run on hosted servers, Amazon Web Services being the most common. But we've launched a project this week to modify the architecture to support a new pricing model.
Our first step is to implement our demo system on AWS to eliminate the need for a local install. Local systems often ran into version conflicts between PHP and Apache that took time to sort out. The new system will be pre-installed on Red Hat and immediately operable: sign up and start using it. By using AWS we can provision large memory, disk, and CPU configurations for temporary use without having to own the hardware.
The second step will be to add tracking functions to support transaction pricing and other usage based pricing strategies. This seems to be the other major complaint against site installed systems.
This will also let us offer specific benchmark data comparing various cloud offerings. We continue to hear of major performance differences, but with no current way to validate how it affects Ancelus.
Stay tuned. This could be fun.
One of the common questions we get in our monthly webinars relates to the behavior of Hadoop in big data analytics. The root cause of the problem that Hadoop purports to solve is found in the nature of disk drives and how they interact with all structured data storage systems.
First, some perspective on the origins of the Big Data discussion. The following chart shows the response characteristic of large data sets over the past 30 years. Over the life of relational databases we have seen exponential growth in the storage density of disk drives, followed by a proportional increase in the size of large databases. Unfortunately, the response time of these systems has degraded along the same exponential curve as storage density growth, because there has been almost no improvement in retrieval time from a disk. Storage density (and dataset size) grows according to Moore's law; retrieval time is constrained by Newton's laws of motion. The crossover point about a decade ago marked an irreversible tipping point for all disk-based systems.
To solve this problem the analytics industry has moved away from relational databases in favor of de-normalized storage structures like Hadoop and SAS. This approach eliminates the time consuming joins of the relational world, but does so at the expense of storage efficiency. This explodes the size of the stored image from duplicate data and sparse matrix issues (the reason for relational databases in the first place). The solution is at best temporary. In most cases it's a reversion to concepts of the late 1950s.
The entire design goal of the Ancelus database was to eliminate the need for these Hobson's choices. Extreme speed, extreme size, extreme complexity, extreme scaling, non-stop operation, live time-series pipelines, and much more. All in the same record-setting, patented system.
This question comes up often, so we probably need a better explanation.
The traditional list of database types generally includes the following:
A concise description of each can be found HERE. Each defines a different way of organizing data elements and relationships in a way that allows it to fit the two-dimensional structure of computer storage. In essence it is a model of the physical structure of the storage.
There is a new class of database that does away with the mapping and transforms:
This class uses the native logical model of information directly, without transform or mapping. The physical store is decoupled, abstract and unpredictable....
This subject is as old as multi-tasking computers. Ancelus boasts a 20 nanosecond lock/unlock time, so there should be few collisions. Coupled with the 100 nanosecond latency for atomic R/W, collisions should be rare indeed. But that doesn't mean they can't happen, or that there aren't some other gotchas.
Stef Dawson is one of the most knowledgeable developers on the Ancelus platform. Based in the Midlands of the UK, Stef has worked with multiple versions of this special technology for over 20 years. He has a great short piece on some things to watch for. Check it out here:
In order to keep raw performance as high as possible, Ancelus takes over many OS functions and substitutes high performance algorithms. But when a spin lock times out the OS takes over and handles the queue as Stef describes. His solution is the preferred method, but it does require the developer to maintain "situational awareness" during development.
Another point of contention is the "kill" command. It should never be used in a database application since it might leave the lock structures in an unknown state. A potential repair utility is pending in the next few weeks, but it's better to avoid this brute force method of exiting a process thread....
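The discipline Stef describes can be sketched in generic form. This is not the Ancelus API, just a minimal Python illustration of the two habits above: acquire with a timeout so a stuck lock is detected, and release in a `finally` block so the lock can never be left held. A process killed mid-critical-section skips the release entirely, which is exactly the hazard of the "kill" command.

```python
import threading

lock = threading.Lock()   # stand-in for a database-level lock

def guarded_update(shared, key, value, timeout=1.0):
    # Timeout instead of waiting forever: detect contention and back off.
    if not lock.acquire(timeout=timeout):
        raise TimeoutError("lock contention: back off and retry")
    try:
        shared[key] = value          # critical section
    finally:
        lock.release()               # runs even if the update raises
```

The `finally` clause is the "situational awareness": every exit path from the critical section, including exceptions, leaves the lock structures in a known state.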
This question came up in a recent meeting. The answer depends on the frame of reference.
In the logical sense, all databases are associative. It's what they do. The question usually arises when attempting to categorize various physical structures. In that sense Ancelus is not associative. It's worse than that since Ancelus has no physical storage model. It's essentially random.
The closest description of the Ancelus storage model is tokenized (as noted by Phillip Howard, Director of Research at Bloor Research). But even that isn't completely correct. The Ancelus "tokens" don't contain all the information needed to completely define the context of the data. The Ancelus "hidden word" points to a three-level indirection that dynamically changes and completes the cycle. This is more complex than simple tokens, but offers major advantages.
Bottom line: Ancelus is new and unique. It eliminates any semblance of a pre-defined storage structure. It uses fixed-time algorithmic methods to store and retrieve data. It eliminates the need to transform the natural structure of information, and instead logically stores information in its natural (list-based) structure. Further, the lists can be dynamically updated (inserts and deletes). In a key list, an item can be inserted or deleted in the middle of the list if that's where the sorted order dictates.
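The key-list behavior just described can be shown with a toy example. This is purely illustrative (not the Ancelus internals): a sorted key list where inserts and deletes land mid-list wherever the sort order dictates.

```python
import bisect

keys = ["apple", "cherry", "plum"]   # a sorted key list

bisect.insort(keys, "banana")   # inserted between "apple" and "cherry"
keys.remove("cherry")           # deleted from the middle of the list

print(keys)
```

The list stays sorted through both operations without ever being rebuilt, which is the property the post is describing.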
You can probably see why we have such difficulty describing it. The IT community has no language for describing how Ancelus works under the hood. That's the penalty you pay when you ask a quantum physicist to solve the database problem. Spectacular results, but not using the half-century old concepts of structured data storage.
Everyone is now paying attention. Seems unfortunate that it took a hack in the entertainment industry to get awareness of this increasingly serious threat. But at least a discussion is now possible.
What started out as mischief, then morphed into commercial gain, has now transformed into malicious attacks doing damage in the tens of millions.
The commercial hacks didn't stir attention because consumers didn't lose much money. The banks bore the loss. Now the threat could damage or bring down a large company. We're late to the game. What if the hack had brought down the power grid? Or the energy supply chain? Or the food supply chain?
Our supply chains are pretty efficient today. But that makes the economy vulnerable to an upset. We need to harden both the industrial controls and the communication networks that are the backbone of all supply chains.
The reason why these hacks are increasing is that the entire IT security scheme is built on assumptions that will ultimately fail. The methods of writing code must move away from the vision of a technology stack assembled for a task and toward an integrated structure with security built in at every level. Truly secure encryption at the deepest level, random key generation, no stored keys, no credentials stored in the clear, time-restricted credentials, deep data security - all are on the wish list. Some already exist....
Things don't always work as planned. The TQL benchmark tests produced a surprise: TQL came in 20% faster than the native API.
It took a while to figure out why, but it turns out to be in the format/screen-print function. TQL is inherently more efficient than the API version because the latter renders the output in row/column format, which necessarily causes the massive data duplication built into the de-normalized format.
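The duplication cost of row/column rendering is easy to see with toy counts (these are invented numbers, not benchmark data): every parent field is re-materialized once per child row.

```python
# A tiny two-level join: one customer, two orders, three items.
customers = {1: "Acme"}
orders = {10: 1, 11: 1}                 # order_id -> customer_id
items = [(10, "bolt"), (10, "nut"), (11, "washer")]

# Row/column rendering repeats the customer name on every output row.
rows = [(customers[orders[oid]], oid, item) for oid, item in items]

extra_copies = len(rows) - len(customers)   # redundant "Acme" values
```

At billion-row scale those redundant copies dominate the formatting work, which is why skipping the row/column render changes the numbers so much.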
It looks like the non-formatted version (i.e. deliver the query result to a program) will be quite close to the same performance (API vs. TQL). Great news since it means that Dave got the core code structure right.
The important thing is that we have eliminated the massive overhead built into the table/SQL model of the relational calculus.
We set out to develop TQL to bypass the inherent assumptions built into the SQL process. The performance issues with SQL apply to all table-based structures, and also exist in some form for all non-table structured storage.
We will shortly find out if the project succeeded. Our billion-row, three-table join benchmark is the acid test for ACID databases, and we'll run it next week.