Python is the preferred programming language in AI. However, most organizations can’t incorporate their Python developers into legacy data infrastructure, which means they miss out on bringing the benefits of AI into the org. Part of the problem has been the lack of an open source Python library for developers that is designed around AI workflows.
dltHub, a Berlin-based startup, thinks it may have the solution. It’s building exactly that: an open source Python library called dlt (short for data load tool) which, it claims, is designed for this new wave of AI.
The startup says its library will integrate into existing workflows, including Python workflows where previously data loading did not happen, such as a Google Colab notebook, an AWS Lambda function, an Airflow DAG or GPT-4-assisted docs or GPT-4 development playgrounds.
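To make the idea of loading data from inside an ordinary Python workflow concrete, here is a minimal sketch that uses only the standard library. It is not dlt's API: the `load_records` helper, the `users` table and the sample records are all hypothetical, and the in-memory SQLite database merely stands in for whatever destination a real pipeline would target.

```python
import sqlite3


def load_records(conn, table, records):
    """Load a list of dicts into a SQLite table (hypothetical helper, not dlt)."""
    if not records:
        return 0
    # Infer the column list from the union of keys, keeping first-seen order.
    columns = []
    for record in records:
        for key in record:
            if key not in columns:
                columns.append(key)
    quoted = ", ".join(f'"{c}"' for c in columns)
    # SQLite allows untyped columns, so no schema has to be declared up front.
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({quoted})')
    placeholders = ", ".join("?" for _ in columns)
    # Keys missing from a record are stored as NULL.
    rows = [tuple(record.get(c) for c in columns) for record in records]
    conn.executemany(
        f'INSERT INTO "{table}" ({quoted}) VALUES ({placeholders})', rows
    )
    conn.commit()
    return len(rows)


# Made-up sample data, standing in for the output of an API call or a script.
records = [
    {"id": 1, "name": "Ada"},
    {"id": 2, "name": "Grace", "team": "compilers"},
]

conn = sqlite3.connect(":memory:")  # stand-in destination
loaded = load_records(conn, "users", records)
```

The same few lines could run in a notebook cell or a serverless function, which is the kind of setting the startup says it is targeting. A production library would also have to deal with typing, nested data and schema changes, none of which this sketch attempts.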
The startup has now raised $1.5 million in pre-seed funding from Dig Ventures, founded by Ross Mason, who created the Mule Project and founded MuleSoft (MULE:NYSE). Joining the round are AI and enterprise founders from companies such as Hugging Face, Instana, Miro and Matillion.
CEO Matthaus Krzykowski told me via email: “Most of the GPT-4 apps that are showcased in the media are demoware. Users that try them abandon them quickly. Other AI tooling where VCs recently have been piling on a lot of money (vector databases / frameworks) have a lot of similar challenges.”
He says dlt now has a growing community of Python developers and is being “deployed in production in several scale-up tech companies,” including San Francisco-based software delivery company Harness, which we previously covered.
In a statement, Alexander Butler, senior data engineer at Harness, said: “Leveraging dlt has changed our data operations. It has… accelerated the pace of our DataOps team: we spend less time on the EL (extract and load) and more on the T whilst still being able to deeply customise our extractors as business requirements evolve.”
Julien Chaumond, CTO/co-founder at Hugging Face and an angel investor at dltHub added: “The current machine learning revolution has been enabled by the Cambrian explosion of Python open-source tools that have become so accessible that a wide range of practitioners can use them. As a simple-to-use Python library, dlt is the first tool that this new wave of people can use.”
With regard to potential competitors, Krzykowski admits that the startup “competes with ETL companies such [as] Meltano, Stitch Data, Airbyte and to a lesser degree Fivetran.”
However, he says that “from a bigger picture we operate in the space of data warehouse companies such [as] Snowflake, Databricks, Microsoft Fabric who also want to build to bring AI to the enterprise.”