My Data Tech Directory

DOWNLOAD THE TECH DIRECTORY:

Feel free to download the Markdown file with all the data shown in the directory below. Enjoy!
Download the Directory

I recently started writing a directory table of data-related technologies. Because of my job, having a directory of all the hot data technologies I use was becoming a priority. So I wrote this table, where you can sort technologies by name, developer, license, or category and read a short summary of each one.

Data Tech Directory

Technology	Developer	Licence	Category	Use
Airflow	Apache	Open Source	Workflow engine	Workflow automation and scheduling system. Airflow is written in Python, and workflows are created via Python scripts.
ADLS (Azure Data Lake Storage)	Microsoft	Commercial	Distributed File Storage	Fully managed, elastic, scalable and secure file system that supports HDFS semantics and works with the Apache Hadoop ecosystem.
Amazon CloudSearch	Amazon	Commercial	Search Engine	Search large collections of data such as web pages, document files, forum posts, or product information.
Amazon Data Pipeline	Amazon	Commercial	Data ingestion	It is an ETL service that you can use to automate the movement and transformation of data. It launches an Amazon EMR cluster for each scheduled interval, submits jobs as steps to the cluster, and terminates the cluster after tasks have completed.
Amazon EC2	Amazon	Commercial	VM & Containers	Allows users to rent virtual computers on which to run their own computer applications. EC2 encourages scalable deployment of applications by providing a web service through which a user can boot an Amazon Machine Image (AMI) to configure a virtual machine, which Amazon calls an "instance", containing any software desired.
Amazon EMR	Amazon	Commercial	Data Processing	Amazon EMR uses Hadoop to distribute your data and processing across a resizable cluster of Amazon EC2 instances.
Amazon GLUE	Amazon	Commercial	Data Processing	AWS Glue is a fully managed ETL service to categorize your data, clean it, enrich it, and move it reliably between various data stores and data streams. AWS Glue consists of a central metadata repository known as the AWS Glue Data Catalog, an ETL engine that automatically generates Python or Scala code, and a flexible scheduler.
Amazon Lambda	Amazon	Commercial	Data Processing	AWS Lambda was designed for use cases such as image or object uploads to Amazon S3, updates to DynamoDB tables, responding to website clicks, or reacting to sensor readings from an IoT connected device.
Amazon ML	Amazon	Commercial	ML & AI	A managed cloud service for building machine learning models and generating predictions from them.
Amazon Neptune	Amazon	Commercial	Data Storage	A managed graph database service from Amazon. It is offered as a web service and is part of Amazon Web Services (AWS).
Amazon SageMaker	Amazon	Commercial	ML & AI	A fully managed machine learning service. It provides an integrated Jupyter authoring notebook instance for easy access to your data sources for exploration and analysis, so you don't have to manage servers.
Ambari	Apache	Open Source	Cluster Management	Ambari enables system administrators to provision, manage and monitor a Hadoop cluster, and also to integrate Hadoop with the existing enterprise infrastructure.
Anaconda	Anaconda	Commercial	Framework	Anaconda is a distribution of the Python and R programming languages for scientific computing that aims to simplify package management and deployment.
Apigee	Google	Commercial	REST API	It was an API management and predictive analytics software provider before its merger into Google Cloud.
Arrow	Apache	Open Source	Data Processing	Software framework for developing data analytics applications that process columnar data. It contains a standardized column-oriented memory format that is able to represent flat and hierarchical data for efficient analytic operations on modern CPU and GPU hardware.
Athena	Amazon	Commercial	Query Engine	Amazon Athena is an interactive query service to query data and analyze big data in Amazon S3 using standard SQL. Athena uses Presto, a distributed SQL engine to run queries. It also uses Apache Hive to create, drop, and alter tables and partitions.
Atlas	Apache	Open Source	Data Catalog	An enterprise-scale data governance and metadata framework for Hadoop. Atlas provides open metadata management and governance capabilities for organizations to build a catalog of their data assets.
Aurora	Apache	Open Source	Workflow engine	A Mesos framework for both long-running services and cron jobs, originally developed by Twitter starting in 2010 and open sourced in late 2013.
Avro	Apache	Open Source	Data Format	Avro is an open source project that provides data serialization and data exchange services for Apache Hadoop. It uses JSON for defining data types and protocols, and serializes data in a compact binary format.
BigML	BigML	Commercial	ML & AI	BigML is a consumable, programmable, and scalable Machine Learning platform that makes it easy to solve and automate Classification, Regression, Time Series Forecasting, Cluster Analysis, Anomaly Detection, Association Discovery, and Topic Modeling tasks.
BigQuery	Google	Commercial	Query Engine	BigQuery leverages a columnar storage format and compression algorithm to store data in Colossus, optimized for reading large amounts of structured data. BigQuery presents data in tables, rows, and columns and provides full support for database transaction semantics (ACID).
Cassandra	Apache	Open Source	Data Storage	A distributed, wide-column NoSQL database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure.
Chronos	Airbnb	Open Source	Workflow engine	A distributed cron-like system for Mesos which is elastic and capable of expressing dependencies between jobs.
Cloudera	Cloudera	Commercial	Data Platform	Big data platform for on-premises clusters. Storage: HDFS file system. Processing: MapReduce, Hive, Spark...
Cosmos DB	Microsoft	Commercial	Data Storage	A fully managed NoSQL database for modern app development. Single-digit millisecond response times, and automatic and instant scalability, guarantee speed at any scale.
Data Factory	Microsoft	Commercial	Data Processing	It is the cloud-based ETL and data integration service that allows you to create data-driven workflows for orchestrating data movement and transforming data at scale. Using Azure Data Factory, you can create and schedule data-driven workflows (called pipelines) that can ingest data from disparate data stores.
Databricks	Databricks	Commercial	Data Platform	Databricks provides a unified, open platform for all your data. It empowers data scientists, data engineers, and data analysts with a simple collaborative environment to run interactive, and scheduled data analysis workloads.
Databricks SQL	Databricks	Commercial	Query Engine	Databricks SQL provides simple and secure access to data, and the ability to create or reuse SQL queries to analyze the data that sits directly on your data lake.
Django	Django	Open Source	Framework	A high-level, full-stack Python web framework that encourages rapid development and clean, pragmatic design. Django is a collection of Python libraries that lets you quickly and efficiently create a quality web application, and is suitable for both frontend and backend.
Docker	Docker, Inc.	Open Source	Container Platform	A set of platform as a service (PaaS) products that use OS-level virtualization to deliver software in packages called containers. Containers are isolated from one another and bundle their own software, libraries and configuration files; they can communicate with each other through well-defined channels.
Docker Swarm	Docker, Inc.	Open Source	Container Platform	A container orchestration tool, meaning that it allows the user to manage multiple containers deployed across multiple host machines. One of the key benefits of operating a Docker swarm is the high level of availability offered for applications.
DocumentDB	Amazon	Commercial	Data Storage	A managed proprietary NoSQL database service that supports document data structures and has limited support for MongoDB.
Drill	Apache	Open Source	Query Engine	Drill is an innovative distributed SQL engine designed to enable data exploration and analytics on non-relational datastores. Users can query the data using standard SQL and BI tools without having to create and manage schemas.
DynamoDB	Amazon	Commercial	Data Storage	A fully managed proprietary NoSQL database service that supports key–value and document data structures and is offered by Amazon.
Elasticsearch	Elastic	Open Source	Search Engine	A search engine based on the Lucene library. It provides a distributed, multitenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents.
Event Hubs	Microsoft	Commercial	Data Processing	A big data streaming platform and event ingestion service. It can receive and process millions of events per second. Data sent to an event hub can be transformed and stored by using any real-time analytics provider or batching/storage adapters.
Falcon	Apache	Open Source	Data Processing	A feed processing and feed management system aimed at making it easier for end consumers to onboard their feed processing and feed management on Hadoop clusters.
Flask	Armin Ronacher	Open Source	Framework	A micro web framework written in Python. It is classified as a microframework because it does not require particular tools or libraries.
Flink	Apache	Open Source	Data Processing	A unified stream-processing and batch-processing framework. The core of Apache Flink is a distributed streaming data-flow engine written in Java and Scala. Flink executes arbitrary dataflow programs in a data-parallel and pipelined manner.
Flume	Apache	Open Source	Data Processing	Distributed, reliable, and available software for efficiently collecting, aggregating, and moving large amounts of log data. It is used to collect log data from log files on web servers and aggregate it into HDFS for analysis.
Grafana	Grafana Labs	Open Source	Data Visualization	Grafana is a multi-platform analytics and interactive visualization web application. It provides charts, graphs, and alerts for the web when connected to supported data sources.
GraphDB	Ontotext	Commercial	Data Storage	An online database management system with Create, Read, Update and Delete (CRUD) operations working on a graph data model.
Graphite	Chris Davis	Open Source	Data Visualization	A tool that monitors and graphs numeric time-series data such as the performance of computer systems. It collects, stores, and displays time-series data in real time.
Hadoop Yarn	Apache	Open Source	Cluster Management	YARN sits between HDFS and the processing engines being used to run applications. It combines a central resource manager with containers, application coordinators and node-level agents that monitor processing operations in individual cluster nodes.
HBase	Apache	Open Source	Data Storage	HBase is a column-oriented non-relational database management system that runs on top of Hadoop Distributed File System (HDFS). HBase provides a fault-tolerant way of storing sparse data sets, which are common in many big data use cases.
HDFS	Apache	Open Source	Distributed File Storage	Hadoop Distributed File System is a distributed, scalable, and portable file system written in Java for the Hadoop framework.
HDInsight	Microsoft	Commercial	Cluster Management	Azure HDInsight is a cloud distribution of the Hadoop components from the Hortonworks Data Platform (HDP). It makes it easy, fast, and cost-effective to process massive amounts of data.
Hive	Apache	Open Source	Query Engine	Open source data warehouse software for reading, writing and managing large data set files that are stored directly in either the Apache Hadoop Distributed File System (HDFS) or other data storage systems such as Apache HBase or Amazon S3.
Hortonworks	Cloudera	Open Source	Data Platform	A security-rich, enterprise-ready, open source Apache Hadoop distribution based on a centralized architecture (YARN), with Flow Management, Stream Processing, and Management Services components.
Hudi	Apache	Open Source	Data Processing	A data management framework used to simplify incremental data processing and data pipeline development.
Hue	Cloudera	Open Source	Query Engine	Hue provides a web user interface along with the file path to browse HDFS. The most important features of Hue are Job browser, Hadoop shell, User admin permissions, Impala editor, HDFS file browser, Pig editor, Hive editor, Oozie web interface, and Hadoop API Access.
Iceberg	Apache	Open Source	Data Format	Apache Iceberg is a new table format for storing large, slow-moving tabular data. It is designed to improve on the de-facto standard table layout built into Hive, Presto, and Spark.
Ignite	Apache	Open Source	Data Storage	A distributed in-memory database that scales horizontally across memory and disk tiers. Ignite supports ACID transactions, ANSI-99 SQL, key-value, compute, machine learning, and other data processing APIs.
Impala	Apache	Open Source	Query Engine	A SQL query engine for data stored in a computer cluster running Apache Hadoop. Impala brings scalable parallel database technology to Hadoop, enabling users to issue low-latency SQL queries to data stored in HDFS and Apache HBase without requiring data movement or transformation.
Jupyter	Jupyter	Open Source	Notebook	An interactive web tool known as a computational notebook, which researchers can use to combine software code, computational output, explanatory text and multimedia resources in a single document.
Kafka	Apache	Open Source	Data ingestion	A distributed event streaming platform. Requires periodic dumps to systems such as Hadoop or a data warehouse (DWH). Often combined with Storm. Uses ZooKeeper.
Kafka Streams	Apache	Open Source	Data Processing	Kafka Streams is a client library for building streaming applications, specifically applications that transform input Kafka topics into output Kafka topics stored in an Apache Kafka cluster. It combines the simplicity of writing and deploying standard Java and Scala applications on the client side with the benefits of Kafka's server-side cluster technology.
Kibana	Elastic	Open Source	Data Visualization	A data visualization and exploration tool used for log and time-series analytics, application monitoring, and operational intelligence use cases. It offers powerful and easy-to-use features such as histograms, line graphs, pie charts, heat maps, and built-in geospatial support.
Kinesis	Amazon	Commercial	Data Processing	A managed, scalable AWS cloud service for real-time processing of large amounts of streaming data. It is designed for real-time applications and lets developers take in any amount of data from several sources, scaling up and down as needed; the processing applications can run on EC2 instances.
Kubernetes	Google	Open Source	Container Platform	An open-source container orchestration platform that enables the operation of an elastic web server framework for cloud applications. Starting with a collection of Docker containers, Kubernetes can control resource allocation and traffic management for cloud applications and microservices.
Livy	Apache	Open Source	REST API	A service that enables easy interaction with a Spark cluster over a REST interface. It enables easy submission of Spark jobs or snippets of Spark code, synchronous or asynchronous result retrieval, as well as Spark Context management, all via a simple REST interface or an RPC client library.
Lucene	Apache	Open Source	Search Engine	An inverted full-text index. This means that it takes all the documents, splits them into words, and then builds an index for each word. Since the index is an exact string-match, unordered, it can be extremely fast.
Machine Learning Studio	Microsoft	Commercial	ML & AI	A cloud-based service used to build, test and deploy predictive analytics solutions based on your data. Machine Learning Studio (MLS) is a drag-and-drop tool that can be used to build ML models and publish them as web services.
Marathon	Mesosphere	Open Source	Container Platform	A platform as a service or container orchestration system that scales to thousands of physical servers. It is fully REST-based and allows canary-style deployments and deployment topologies. It is written in Scala.
MariaDB	MariaDB	Open Source	Data Storage	An open source relational database management system (DBMS) that is a compatible drop-in replacement for the widely used MySQL database technology.
Mesos	Apache	Open Source	Cluster Management	A cluster manager that handles workloads in a distributed environment through dynamic resource sharing and isolation. Mesos is suited for the deployment and management of applications in large-scale clustered environments.
Microsoft SQL Server	Microsoft	Commercial	Data Storage	A relational database management system developed by Microsoft. As a database server, it is a software product with the primary function of storing and retrieving data as requested by other software applications.
MongoDB	MongoDB	Open Source	Data Storage	A document-oriented database that stores data in JSON-like documents with a dynamic schema. This means you can store your records without worrying about the data structure, such as the number of fields or the types of fields used to store values. MongoDB documents are similar to JSON objects.
MySQL	Oracle	Open Source	Data Storage	A relational database management system based on SQL. The application is used for a wide range of purposes, including data warehousing, e-commerce, and logging applications. The most common use for MySQL, however, is as a web database.
Neo4j	Neo Technology	Commercial	Data Storage	A graph database management system developed by Neo4j, Inc. Described by its developers as an ACID-compliant transactional database with native graph storage and processing.
NiFi	Apache	Open Source	Data ingestion	Open source software for automating and managing the data flow between systems. It is a powerful and reliable system to process and distribute data. It provides a web-based user interface to create, monitor, and control data flows.
NuoDB	Nimbus DB	Commercial	Data Storage	A distributed relational database management system. Unlike traditional shared-disk or shared-nothing architectures, NuoDB uses a peer-to-peer messaging protocol to route queries to nodes. NuoDB splits its architecture into two layers: a transactional tier and a storage tier.
Nutch	Apache	Open Source	Search Engine	A web crawler software product that can be used to aggregate data from the web. It is used in conjunction with other Apache tools, such as Hadoop, for data analysis. Nutch provides extensible interfaces such as Parse, Index and ScoringFilter for custom implementations.
Oozie	Apache	Open Source	Workflow engine	A workflow scheduler system to manage Apache Hadoop jobs. Oozie Workflow jobs are Directed Acyclic Graphs (DAGs) of actions.
OpenShift	Red Hat	Open Source	Container Platform	A Kubernetes distribution that helps you to develop, deploy, and manage container-based applications. It provides you with a self-service platform to create, modify, and deploy applications on demand, thus enabling faster development and release life cycles.
Oracle Database	Oracle	Commercial	Data Storage	A multi-model relational database management system, mainly designed for enterprise grid computing and data warehousing.
Oracle NoSQL	Oracle	Commercial	Data Storage	A NoSQL-type distributed key-value database from Oracle Corporation. It provides transactional semantics for data manipulation, horizontal scalability, and simple administration and monitoring.
ORC	Apache	Open Source	Data Format	A column-oriented data storage format of the Apache Hadoop ecosystem. It provides a highly efficient way to store Hive data and was designed to overcome limitations of the other Hive file formats. Using ORC files improves performance when Hive is reading, writing, and processing data.
Parquet	Apache	Open Source	Data Format	A column-oriented data storage format of the Apache Hadoop ecosystem. Reading and querying are much more efficient than writing. It is well optimized for use with Apache Spark.
Podman	Red Hat	Open Source	Container Platform	A daemonless, Linux native tool designed to make it easy to find, run, build, share and deploy applications using Open Containers Initiative (OCI) Containers and Container Images. Podman provides a command line interface (CLI) familiar to anyone who has used the Docker Container Engine.
PostgreSQL	PostgreSQL	Open Source	Data Storage	An advanced, enterprise-class, and open-source relational database system. PostgreSQL supports both SQL (relational) and JSON (non-relational) querying. PostgreSQL is used as a primary database for many web applications as well as mobile and analytics applications.
PowerBI	Microsoft	Commercial	Data Visualization	A suite of business intelligence (BI), reporting, and data visualization products and services for individuals and teams. Power BI stands out with streamlined publication and distribution capabilities, as well as integration with other Microsoft products and services.
Presto	Facebook	Open Source	Query Engine	A distributed SQL query engine that is best suited to running interactive analytic workloads in your big data environment. Presto allows you to query many different data sources, whether it's HDFS, MySQL, Cassandra, or Hive.
Purview	Microsoft	Commercial	Data Governance	A unified data governance service that helps you manage and govern your on-premises, multicloud, and software-as-a-service (SaaS) data. Easily create a holistic, up-to-date map of your data landscape with automated data discovery, sensitive data classification, and end-to-end data lineage.
PyCharm	JetBrains	Commercial	IDE	A dedicated Python Integrated Development Environment (IDE) providing a wide range of essential tools for Python developers, tightly integrated to create a convenient environment for productive Python, web, and data science development.
Qlik Sense	Qlik Tech	Commercial	Data Visualization	A self-service analytical tool based on the same in-memory technology as QlikView. Its associative engine allows for snappy selections, filtering and prompt re-calculation of all charts and aggregations on the fly.
Qlik View	Qlik Tech	Commercial	Data Visualization	A traditional, technical tool for shared business intelligence, data analytics and reporting.
RabbitMQ	Pivotal	Open Source	Message Broker	A message broker: an intermediary for messaging. It gives your applications a common platform to send and receive messages, and your messages a safe place to live until received. It is also used between microservices, where it serves as a means of communicating between applications.
Ranger	Apache	Open Source	Data Governance	A framework to enable, monitor and manage comprehensive data security across the Hadoop platform. The vision with Ranger is to provide comprehensive security across the Apache Hadoop ecosystem. With the advent of Apache YARN, the Hadoop platform can now support a true data lake architecture.
Redshift	Amazon	Commercial	Data Processing	A managed service provided by Amazon. Raw data flows into Redshift (ETL), where it’s converted and transformed at a regular cadence, or on an ad hoc basis. It is designed to crunch large amounts of data as a data warehouse.
Rekognition	Amazon	Commercial	Data Processing	A cloud-based software as a service (SaaS) computer vision platform that automatically extracts metadata from your image and video files, capturing objects, faces, text and more. This metadata can be used to easily search your images and videos with keywords, or to find the right assets for content syndication.
RStudio	RStudio	Open Source	IDE	An Integrated Development Environment (IDE) for R, a programming language for statistical computing and graphics.
S3	Amazon	Commercial	Data Storage	A cloud IaaS (infrastructure as a service) solution from AWS for object storage via a convenient web-based interface. The basic storage unit of S3 is the "object", which consists of a file with an associated ID number and metadata. These objects are stored in buckets, which function similarly to folders or directories. S3 scales vertically and automatically according to your current data usage, without any need for action on your part.
SageMaker	Amazon	Commercial	ML & AI	A fully-managed service that enables data scientists and developers to quickly and easily build, train, and deploy machine learning models at any scale. Amazon SageMaker includes modules that can be used together or independently to build, train, and deploy your machine learning models.
Snowflake	Snowflake	Commercial	Data Storage	A data warehouse built on top of the Amazon Web Services or Microsoft Azure cloud infrastructure. Its architecture allows storage and compute to scale independently, so customers can use and pay for storage and computation separately. And the sharing functionality makes it easy for organizations to quickly share governed and secure data in real time.
Solr	Apache	Open Source	Search Engine	Solr performs text analysis on certain content and search queries in order to determine similar words, understand and match synonyms, remove syncategorematic words, and score each result based on how well it matches the query. It is built on top of Lucene to provide a search platform. Solr is a wrapper over the Lucene index.
Spark	Apache	Open Source	Data Processing	An open-source, distributed processing engine used for big data workloads and compatible with Hadoop data. It can run in Hadoop clusters through YARN or Spark's standalone mode, and it can process data in HDFS, HBase, Cassandra, Hive, and any Hadoop InputFormat.
Spark SQL	Apache	Open Source	Query Engine	A Spark module for structured data processing. It provides a programming abstraction called DataFrames and can also act as a distributed SQL query engine. It enables unmodified Hadoop Hive queries to run up to 100x faster on existing deployments and data.
Spark Streaming	Apache	Open Source	Data Processing	An extension of the core Spark API that allows processing of real-time data from various sources including Kafka, Flume, and Amazon Kinesis. The processed data can be pushed out to file systems, databases, and live dashboards. It provides the DStream API, which is powered by Spark RDDs.
Spark Structured Streaming	Apache	Open Source	Data Processing	This model of streaming is based on the DataFrame and Dataset APIs. With this library, we can easily apply any SQL query (using the DataFrame API) or Scala operations (using the Dataset API) on streaming data. In Structured Streaming, there is no concept of a batch. The data received in a trigger is appended to the continuously flowing data stream.
Spyder	Anaconda	Open Source	IDE	An open-source cross-platform IDE that is included with Anaconda. The Python Spyder IDE is written completely in Python.
Sqoop	Apache	Open Source	Data ingestion	A tool designed for efficiently transferring bulk data between Apache Hadoop and external datastores such as relational databases and enterprise data warehouses. Sqoop is used to import data from external datastores into the Hadoop Distributed File System or related Hadoop ecosystem components like Hive and HBase.
Stinger	Stinger	Open Source	Data Processing	A package designed to support streaming graph analytics by using in-memory parallel computation to accelerate the computation. STINGER is composed of the core data structure and the STINGER server, algorithms, and an RPC server that can be used to run queries and serve visualizations.
Storm	Apache	Open Source	Data Processing	A free and open source distributed realtime computation system. Apache Storm makes it easy to reliably process unbounded streams of data, doing for realtime processing what Hadoop did for batch processing. Storm runs on YARN and integrates perfectly with the Hadoop ecosystem.
Synapse Analytics	Microsoft	Commercial	Data Platform	A cloud-based enterprise data warehouse that leverages massively parallel processing (MPP) to quickly run complex queries across petabytes of data. Use Azure as a key component of a big data solution.
Tableau	Salesforce	Commercial	Data Visualization	A powerful, fast-growing data visualization tool used in the business intelligence industry. It helps simplify raw data into an easily understandable format.
Talend Data Platform	Talend	Commercial	Data Processing	A data integration solution that helps companies deal with growing system complexities by addressing both ETL for analytics and ETL for operational integration needs, and by offering industrialization features and extended monitoring capabilities.
Thrift	Apache	Open Source	Framework	An interface definition language and binary communication protocol used for defining and creating services for numerous programming languages. It forms a remote procedure call (RPC) framework and was developed at Facebook for "scalable cross-language services development".
Watson Studio	IBM	Commercial	Data Platform	A platform to build, run and manage AI models, and optimize decisions anywhere on IBM Cloud Pak® for Data. Unite teams, automate AI lifecycles and speed time to value on an open multicloud architecture.
Zeppelin	Apache	Open Source	Notebook	Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.
ZooKeeper	Apache	Open Source	Cluster Management	An open source Apache project that provides a centralized service for providing configuration information, naming, synchronization and group services over large clusters in distributed systems. The goal is to make these systems easier to manage with improved, more reliable propagation of changes.
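
The sorting and filtering described above can also be done offline on the downloaded data. Below is a minimal Python sketch, assuming the table is available as tab-separated text with the same header row as the directory. The `SAMPLE` rows are shortened copies of a few entries, and the helper names (`load_directory`, `sort_by`, `filter_by`) are made up for this illustration; they are not part of the download.

```python
import csv
import io

# A few shortened rows in the same tab-separated layout as the directory.
# These are illustrative stand-ins, not the full downloaded file.
SAMPLE = (
    "Technology\tDeveloper\tLicence\tCategory\tUse\n"
    "Flink\tApache\tOpen Source\tData Processing\tStream and batch processing framework.\n"
    "Athena\tAmazon\tCommercial\tQuery Engine\tInteractive SQL queries over data in Amazon S3.\n"
    "Drill\tApache\tOpen Source\tQuery Engine\tDistributed SQL engine for non-relational datastores.\n"
    "PowerBI\tMicrosoft\tCommercial\tData Visualization\tBusiness intelligence and reporting suite.\n"
)


def load_directory(text):
    """Parse tab-separated directory text into a list of dicts keyed by header."""
    # QUOTE_NONE keeps double quotes inside descriptions as literal characters.
    reader = csv.DictReader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    return list(reader)


def sort_by(rows, column):
    """Return the rows sorted case-insensitively by the given column."""
    return sorted(rows, key=lambda row: row[column].lower())


def filter_by(rows, column, value):
    """Return the rows whose column equals value, ignoring case."""
    return [row for row in rows if row[column].lower() == value.lower()]


if __name__ == "__main__":
    rows = load_directory(SAMPLE)
    # List the query engines alphabetically by technology name.
    for row in sort_by(filter_by(rows, "Category", "Query Engine"), "Technology"):
        print(row["Technology"], "-", row["Developer"], "-", row["Licence"])
```

To use it on the real data, read the downloaded file into a string and pass it to `load_directory` in place of `SAMPLE`; if the download is a Markdown pipe table rather than tab-separated text, the parsing step would need to be adapted.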

Other references and links