Beyond Your Own Data: Why Public Data is Essential for Complete Business Insight

Vidhi Chugh
7 min read · Jun 18, 2024


Reference: “State of Public Web Data Report 2024”

The Rise of a Data-Driven World

The onset of the pandemic gave a new dimension to the term “digital transformation”: it accelerated the rate of digitalization across many industries. The impact was not limited to enterprises; it was equally visible among customers, as seen in the increased user traffic to businesses’ digital properties. Inevitably, data has become a key enabler in understanding ever-evolving user behavior.

Notably, the industry has moved away from the first-principles approach and has increasingly adopted data-driven frameworks to better understand customer preferences. There is a growing focus on data-centric measures to analyze market trends, improve operational efficiency, and make informed strategic decisions.

Leveraging data is on every business’s agenda

So, what are the various ways organizations can monetize their data assets to drive business intelligence and innovation?

Source: Adapted from Bright Data’s report on public web data

The State of Public Web Data Report 2024 highlights various ways businesses can make data-informed decisions, including but not limited to:

  • Sentiment analysis by analyzing social media discussions, reviews, and search trends
  • Competitive intelligence by observing market trends to create a differentiator
  • Pricing optimization through competitor analysis, and more.

The question then arises: where can organizations get such data? The easier approach is to gather insights from traditional in-house data repositories. However, relying solely on these datasets may limit the depth of the analysis.

Why do we need public web data?

So, one thing is clear. Just having data is not enough; it must be exhaustive enough to provide meaningful insights.

But where would such exhaustive data come from? Think of public data, i.e., data that is not behind any login. It could come from social media platforms and other websites and, when integrated with enterprise data, provides crucial business context and real-time insights.

Interestingly, 89% of the report’s respondents recognize the contribution of such public web data to the global economy.

Manufacturing and Transportation

Speaking from my experience working with a manufacturing and transportation company, one of its key objectives was to deliver goods to customers on time. We built an AI model to predict the lead time and, from it, compute the expected shipping time, i.e., the ETA.

However, myriad challenges such as heavy traffic or accident-prone routes disrupted the ETA, forcing revisions to shipping timelines.

The situation became concerning when frequent delays in expected shipping times resulted in SLA breaches, putting future contracts with customers at risk.

As is common practice in any AI project, we investigated why the model could not predict the ETA accurately and found that its performance improved significantly once we included real-time traffic conditions sourced from public web data.

This is not specific to our business problem. As per the report, 56% of organizations would use additional public web data to enhance current AI models or start a new AI program.

What’s noteworthy is that the improved ETA prediction not only led to a happy customer but also saved fuel costs by creating operational efficiencies, a win-win for us.

And it did not end there. Having seen first-hand how real-time traffic data improved ETA prediction, we focused the second phase of AI model iteration on incorporating weather forecasts (for situations like storms, heavy rain, or snow) and geopolitical events. The inclusion of such public web data further improved the model’s accuracy, creating a significant business impact.
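To make the idea concrete, here is a minimal Python sketch of how in-house shipment records might be enriched with public traffic and weather signals before an ETA is computed. The field names (`route_id`, `base_hours`), the `congestion` and `weather_delay` factors, the sample values, and the simple multiplicative adjustment are all illustrative assumptions; this is not the model we actually built, where a learned model would take the place of the hand-written formula.

```python
from dataclasses import dataclass


@dataclass
class Shipment:
    shipment_id: str
    route_id: str
    base_hours: float  # transit time estimated from in-house history only


def enrich(shipments, traffic_by_route, weather_by_route):
    """Attach public signals to each shipment.

    Routes with no public signal fall back to a neutral factor of 0.0.
    """
    rows = []
    for s in shipments:
        rows.append({
            "shipment": s,
            "congestion": traffic_by_route.get(s.route_id, 0.0),
            "weather_delay": weather_by_route.get(s.route_id, 0.0),
        })
    return rows


def estimate_eta_hours(row):
    """Hypothetical adjustment: scale the base time by the public signals."""
    factor = 1.0 + row["congestion"] + row["weather_delay"]
    return round(row["shipment"].base_hours * factor, 2)


# Made-up sample data, for illustration only.
shipments = [Shipment("S1", "R1", 10.0), Shipment("S2", "R2", 8.0)]
traffic = {"R1": 0.25}  # assumed 25% slowdown from congestion on route R1
weather = {"R2": 0.5}   # assumed 50% slowdown from a storm on route R2

etas = {
    row["shipment"].shipment_id: estimate_eta_hours(row)
    for row in enrich(shipments, traffic, weather)
}
print(etas)
```

The point of the sketch is the join step: the in-house record alone knows nothing about today’s road or weather conditions, so the enrichment is what gives the estimate something to react to.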

In retrospect, I believe having a public web data strategy would have made our efforts much easier. Encouragingly, according to the State of Public Web Data Report 2024, the industry has come a long way: 90% of organizations have plans to collect and use public web data.

Sentiment Analysis

Customer-centric businesses prioritize their customers’ opinions and feedback, which are reflected in customer reviews, ratings, and comments on various aspects of products or services.

Even the numbers of likes, shares, and mentions are strong indicators of how customers perceive a product. Organizations analyze such signals, especially those highlighting complaints, feedback, and suggestions about customer service, to iron out existing issues and gain customer trust.
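As a rough illustration of how such signals might be analyzed, the sketch below labels review text with a tiny word-list scorer. The `POSITIVE` and `NEGATIVE` word lists and the sample reviews are made up for this example; a production system would more likely use a trained sentiment model, so treat this as a toy under those assumptions.

```python
import re
from collections import Counter

# Hypothetical mini-lexicons, for illustration only.
POSITIVE = {"great", "love", "fast", "helpful", "excellent"}
NEGATIVE = {"slow", "broken", "late", "rude", "terrible"}


def score(text):
    """Count positive words minus negative words in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return pos - neg


def label(text):
    """Map the numeric score to a coarse sentiment label."""
    s = score(text)
    if s > 0:
        return "positive"
    if s < 0:
        return "negative"
    return "neutral"


def summarize(reviews):
    """Tally how many reviews fall under each label."""
    return Counter(label(r) for r in reviews)


reviews = [
    "Great product, love it",
    "Delivery was late and support was rude",
    "It arrived on Tuesday",
]
summary = summarize(reviews)
print(summary)
```

Even a tally this simple shows why volume matters: the more public reviews and comments feed the count, the less any single outlier skews the picture.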

Such data is a gold mine for any business, highlighting the need to leverage public web data to capture the full spectrum of customer sentiments and behaviors openly available on social media.

Source: brightdata.com

And it is not a one-time process. Businesses act on the latest real-time information to drive decisions and seldom rely on delayed or outdated insights, which are of little help in a fast-moving world.

Hence, the comprehensive, real-time, and diverse nature of public web data allows organizations to stay agile, informed, and competitive in today’s fast-paced business environment.

In addition to optimizing the shipping process and better understanding customer behavior, there are various ways organizations can leverage public web data, as shown below:

I hope these use cases bring out the power of public web data, which augments existing organizational datasets and provides deeper and richer insights.

But the next question that immediately arises is: why can’t organizations access public data by themselves? What’s stopping them from leveraging this data?

Let’s find out!

If only extracting meaningful public data were that simple!

Considering the need for continuous real-time or periodic data updates, organizations often face several challenges accessing public web data on their own, such as:

  • Building web scraping tools to extract data from websites is technically challenging, especially when websites serve dynamic content or present CAPTCHAs.
  • Building scraping tools is a non-trivial effort, and maintaining them adds to the burden, especially when websites frequently update their content.
  • Scaling data operations, i.e., scraping data at scale, becomes a challenge because websites may block or ban IP addresses that send too many requests.
  • Organizations do not just need data; they require reliable data to derive trustworthy results. I have seen first-hand how errors in extracted data immediately erode trust among business leaders. Hence, it is advisable to work with a partner accountable for the accuracy and reliability of the extracted data.
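To give a feel for the scaling challenge in the list above, here is a minimal sketch of one common mitigation: retrying a throttled request with exponential backoff. The `Blocked` exception, the `fetch` callable, and the delay values are assumptions made for this example; the fetcher is passed in so the sketch runs without any network access, and real scrapers also have to deal with CAPTCHAs, proxies, and page changes, which this does not cover.

```python
import time


class Blocked(Exception):
    """Raised by the fetcher when a site throttles or blocks the request."""


def fetch_with_backoff(fetch, url, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url), doubling the wait after each blocked attempt.

    Re-raises Blocked if the final attempt is also blocked.
    """
    for attempt in range(max_attempts):
        try:
            return fetch(url)
        except Blocked:
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)


# Demo with a fake fetcher that is blocked twice before succeeding.
attempts = {"count": 0}


def flaky_fetch(url):
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise Blocked(url)
    return "<html>ok</html>"


waits = []  # record the delays instead of actually sleeping
page = fetch_with_backoff(flaky_fetch, "https://example.com", sleep=waits.append)
print(page, waits)
```

Injecting `sleep` keeps the demo instant and testable; in real use the default `time.sleep` applies. Even this small piece hints at the maintenance cost: every target site may need its own limits and failure handling.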

Now, organizations end up making significant investments in infrastructure and expertise to handle large volumes of data, but should that be their core focus?

A better option would be to partner with a trusted and reliable public web data provider to facilitate data operations at scale. The next section covers details on the organizational strategy.

What’s your core focus?

We now know how resource-intensive public data extraction is, be it in terms of time, skills, or technology infrastructure.

Undoubtedly, specialized tools, technologies, and expertise are necessary to ensure accurate and timely data collection. However, this is often not the core focus of business leaders (for example, the manufacturing and transportation enterprise aiming to deliver shipments on time).

Hence, it becomes easier to outsource public data extraction to specialized vendors with the necessary infrastructure and skills to extract and process the public data. Among several benefits as shown in the image above, such outsourcing also allows businesses to innovate and benefit from high-quality data insights without needing to invest in building these capabilities in-house.

Talking about investments, 93% of the organizations have, in fact, increased their budgets for public web data collection in 2024.

Not to forget that these vendors offer scalable services and are well-versed in compliance requirements and data security protocols, reducing the risk of legal issues.

Bright Data — leading public data vendor

When talking about core business focus, think of how easy it becomes for a company to simply focus on driving innovation that helps it stay ahead of the competition and adapt to changing market dynamics.

Hence, partnering with a reputable, trusted, and reliable web data provider such as Bright Data can address continuous real-time or periodic data requirements.

By offloading the public data requirements to Bright Data, which is the most technologically advanced company in the market with its “unstoppable” proxy network and tools, organizations can tap into their full potential, drive innovation, and gain a competitive edge in their respective industries.


Vidhi Chugh

Data Transformist and AI Strategist | International Speaker | AI Ethicist and Data-Centric Scientist | Global Woman Achiever https://allaboutscale.com/