A few days ago I had some exposure to Apache Drill and I found it rather interesting. Even if it can be considered a bit legacy and "not so cool" nowadays (there are so many open-source MPP alternatives with SQL dialects), Drill is still a good piece of software.

We can define Drill as an open-source distributed SQL query engine, inspired by Google Dremel/BigQuery, that is able to run distributed queries over large-scale datasets. It is designed to be scalable, flexible, and efficient, allowing us to quickly and easily access and analyze data from a wide variety of sources.

It can run on anything from a single node to hundreds or thousands of nodes, coordinated using ZooKeeper. When a Drill client issues a query (using JDBC, ODBC, a command-line interface or the REST API), any Drillbit service in the cluster can accept it. The Drillbit that accepts the query becomes the driving Drillbit for just that request. It parses the query, optimizes it, and generates a distributed query plan that, most of the time, is optimized for fast and efficient execution. It then determines the appropriate nodes to execute the various query plan fragments so as to maximize data locality. When the individual nodes finish their execution, they return the data to the driving Drillbit, and from there the results are streamed back to the client.

One of the key advantages of Apache Drill is its flexibility. Drill provides plug-and-play integration with existing Apache Hive and Apache HBase deployments. It also supports accessing data across a variety of NoSQL databases and file systems, including HBase, MongoDB, MapR-DB, HDFS, MapR-FS, Amazon S3, Azure Blob Storage, Google Cloud Storage, Swift, SMB and local files.
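
To make the client side of that flow a bit more concrete, here is a minimal sketch of submitting a query through the REST API from Python. It assumes a Drillbit is reachable at `http://localhost:8047` (8047 is Drill's default web port) with no authentication enabled. The helper names `build_query_request` and `run_query` are my own for illustration and are not part of Drill.

```python
import json
import urllib.request

# Assumption: a Drillbit is listening here; adjust host and port for your cluster.
DRILL_URL = "http://localhost:8047"


def build_query_request(sql, base_url=DRILL_URL):
    """Build (but do not send) a POST request for Drill's /query.json endpoint."""
    payload = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/query.json",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(sql, base_url=DRILL_URL, timeout=30):
    """Send the query to the Drillbit at base_url and return the result rows.

    Whichever Drillbit receives this request acts as the driving Drillbit
    for the query and streams the final result back in the HTTP response.
    """
    request = build_query_request(sql, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # The JSON response carries the result set under "rows".
    return body.get("rows", [])
```

With a Drillbit running, something like ``run_query("SELECT full_name FROM cp.`employee.json` LIMIT 3")`` should return a few rows from the sample data that ships on Drill's classpath. Building the request separately from sending it keeps the payload easy to inspect without a live cluster.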