
In my previous blog, I covered:
👉 From LLMs to Agents: Build Smart AI Systems with Tools in LangChain
We learned how to:
- build custom tools
- create AI agents
- fetch real-world data
🔥 What’s Next?
Now let’s take it further.
👉 Instead of just querying tools, we will make AI work with real data sources:
In this blog, we will learn how to:
- 📄 Load and analyze text files
- 📊 Process CSV data
- 🌐 Fetch and analyze web URLs (web scraping)
- ⚡ Optimize using semantic search (vector DB)
📄 1. Load Text File Using TextLoader
We can directly load a *.txt file into LangChain:
from langchain_community.document_loaders import TextLoader
loader = TextLoader("tata_motors.txt")
docs = loader.load()
docs
Output
👉 This converts your text file into structured documents.
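Under the hood, each item in `docs` is a `Document`: the file's text plus metadata such as the source path. A minimal pure-Python sketch of what a text loader does (an illustrative stand-in, not the real LangChain implementation — LangChain returns `Document` objects with the same two fields):

```python
from pathlib import Path

def load_text(path: str) -> list[dict]:
    """Read a text file and wrap it in a document-like dict:
    the content plus metadata recording where it came from."""
    text = Path(path).read_text(encoding="utf-8")
    return [{"page_content": text, "metadata": {"source": path}}]

# Usage with a small sample file:
Path("sample.txt").write_text("Tata Motors is an automaker.", encoding="utf-8")
docs = load_text("sample.txt")
print(docs[0]["metadata"]["source"])  # sample.txt
```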

Ask questions about the .txt file
from langchain_community.document_loaders import TextLoader
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

llm = ChatOpenAI(model="gpt-4o-mini")  # any chat model works here

loader = TextLoader("tata_motors.txt", encoding="utf-8")
docs = loader.load()

# Combine all documents into one context string
context = "\n\n".join(doc.page_content for doc in docs)

# Ask a question
query = """
What is the value of the guarantees Tata Motors has provided
on behalf of its Singapore holding company?
"""
prompt = ChatPromptTemplate.from_template("""
You are a stock research assistant.
Use only the context below.
Do not invent missing values.
User query:
{query}
Context:
{context}
""")
chain = prompt | llm
response = chain.invoke({
    "query": query,
    "context": context
})
print(response.content)
Output
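The `prompt | llm` pipe is LangChain's LCEL composition: the prompt formats the variables, then the model receives the formatted text. The same idea in plain Python, with toy stand-in classes (not the real LangChain types):

```python
class Prompt:
    """Toy prompt template: formats variables into a string."""
    def __init__(self, template):
        self.template = template
    def __or__(self, nxt):          # enables `prompt | llm` piping
        return Chain(self, nxt)
    def invoke(self, variables):
        return self.template.format(**variables)

class Chain:
    """Toy chain: runs the first step, feeds its output to the second."""
    def __init__(self, first, second):
        self.first, self.second = first, second
    def invoke(self, variables):
        return self.second.invoke(self.first.invoke(variables))

class EchoLLM:
    """Toy 'model' that just echoes what it received."""
    def invoke(self, text):
        return f"LLM saw: {text}"

chain = Prompt("Q: {query}") | EchoLLM()
print(chain.invoke({"query": "hello"}))  # LLM saw: Q: hello
```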
📊 2. Load CSV Data Using CSVLoader
from langchain_community.document_loaders import CSVLoader
loader = CSVLoader("cars.csv")
data = loader.load()
data
Output
👉 cars.csv file contents
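Note that `CSVLoader` emits one document per row, with the columns flattened into `key: value` lines. A rough pure-Python equivalent (illustrative only, not the LangChain source):

```python
import csv
import io

def load_csv_rows(csv_text: str) -> list[str]:
    """Turn each CSV row into a 'column: value' block, one doc per row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return ["\n".join(f"{k}: {v}" for k, v in row.items()) for row in reader]

sample = "model,price\nNexon,9.8\nFortuner,35.0"
rows = load_csv_rows(sample)
print(rows[0])  # model: Nexon\nprice: 9.8
```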
You can also use Pandas for better control:
pip install -U pandas
👉 This allows LLMs to behave like a data analyst on your CSV.
import pandas as pd
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")

df = pd.read_csv("cars.csv")
question = "List the cars within a 10 lakh budget."
csv_text = df.to_string(index=False)
prompt = f"""
You are answering questions from this CSV data.
CSV data:
{csv_text}
Question:
{question}
Answer clearly using only the CSV data.
"""
response = llm.invoke(prompt)
print(response.content)
Output
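Dumping the whole CSV into the prompt scales poorly as the file grows. One option is to let pandas do the filtering first and send only the matching rows to the model. A sketch using hypothetical car data in place of `cars.csv`:

```python
import pandas as pd

# Hypothetical data standing in for cars.csv
df = pd.DataFrame({
    "model": ["Alto", "Nexon", "Fortuner"],
    "price_lakhs": [5.5, 9.8, 35.0],
})

# Pre-filter in pandas so the prompt carries only relevant rows
within_budget = df[df["price_lakhs"] <= 10]
csv_text = within_budget.to_string(index=False)
print(csv_text)
```

The LLM then answers from two rows instead of the full table, which cuts token usage and avoids distracting it with irrelevant data.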
🌐 3. Load URLs & Perform Web Scraping
Now comes the powerful part.
pip install -U unstructured
👉 LLM will read web content and generate structured analysis.
from langchain_community.document_loaders import UnstructuredURLLoader
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")

urls = [
    "https://www.tickertape.in/stocks/tata-motors-TMC",
    "https://groww.in/stocks/tata-motors-ltd",
]
loader = UnstructuredURLLoader(urls=urls)
documents = loader.load()

# Combine the scraped pages into one context string
context = "\n\n".join(doc.page_content for doc in documents)

query = """
Analyze valuation, profitability, entry point, red flags,
and overall whether Tata Motors stock looks attractive.
"""
prompt = f"""
You are a stock research assistant.
Use only the context below. Do not invent missing values.
User query:
{query}
Context:
{context}
Return the answer in this exact format:
# Tata Motors Stock Analysis
## 1. Quick View
- Overall view:
- Reason:
## 2. Key Metrics Found
| Metric | Value | Interpretation |
|---|---:|---|
| Market Cap | | |
| PE Ratio | | |
| PB Ratio | | |
| Dividend Yield | | |
| Risk / Volatility | | |
| Red Flags | | |
## 3. Valuation
## 4. Profitability / Quality
## 5. Entry Point
## 6. Red Flags / Risks
## 7. Final Tentative View
"""
response = llm.invoke(prompt)
print(response.content)
Output
⚠️ Problem: Slow Performance
If you load many URLs:
- ⏳ Processing becomes slow
- 📉 Context becomes too large
- 💸 Cost increases
⚡ Solution: Semantic Search (Vector DB)
Instead of passing all the data to the model, we:
- Split the content into chunks
- Convert the chunks into embeddings
- Store them in a vector DB
- Retrieve only the relevant chunks
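The splitting step can be pictured with a naive fixed-window chunker (the real `RecursiveCharacterTextSplitter` is smarter: it prefers to break on paragraph and sentence boundaries before falling back to raw characters):

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Naive sliding-window splitter: each chunk overlaps the previous
    one by `chunk_overlap` characters so context isn't lost at the seams."""
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = chunk_text("a" * 2500, chunk_size=1000, chunk_overlap=200)
print(len(chunks), [len(c) for c in chunks])  # 4 [1000, 1000, 900, 100]
```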
⚡ 4. Optimize using semantic search (vector DB)
from langchain_community.document_loaders import UnstructuredURLLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_chroma import Chroma

llm = ChatOpenAI(model="gpt-4o-mini")
urls = [
"https://www.tickertape.in/stocks/tata-motors-TMC",
"https://groww.in/stocks/tata-motors-ltd",
]
loader = UnstructuredURLLoader(urls=urls)
documents = loader.load()
# Step 1: Split into Chunks
text_splitter = RecursiveCharacterTextSplitter(
chunk_size=1000,
chunk_overlap=200
)
chunks = text_splitter.split_documents(documents)
# Step 2: Create Embeddings + Store
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vector_db = Chroma.from_documents(
documents=chunks,
embedding=embeddings,
persist_directory="./chroma_url_db"
)
query = """
Analyze valuation, profitability, entry point, red flags,
and overall whether Tata Motors stock looks attractive.
"""

# Step 3: Retrieve Relevant Data (the query must be defined first)
retriever = vector_db.as_retriever(search_kwargs={"k": 4})
retrieved_docs = retriever.invoke(query)
context = "\n\n".join(
    doc.page_content for doc in retrieved_docs
)
prompt = ChatPromptTemplate.from_template("""
You are a stock research assistant.
Use only the context below. Do not invent missing values.
User query:
{query}
Context:
{context}
Return the answer in this exact format:
# Tata Motors Stock Analysis
## 1. Quick View
- Overall view:
- Reason:
## 2. Key Metrics Found
| Metric | Value | Interpretation |
|---|---:|---|
| Market Cap | | |
| PE Ratio | | |
| PB Ratio | | |
| Dividend Yield | | |
| Risk / Volatility | | |
| Red Flags | | |
## 3. Valuation
## 4. Profitability / Quality
## 5. Entry Point
## 6. Red Flags / Risks
## 7. Final Tentative View
""")
# Step 4: Final Analysis
chain = prompt | llm
response = chain.invoke({
    "query": query,
    "context": context,
})
print(response.content)
Output
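Why does retrieval find the right chunks? The retriever ranks chunks by embedding similarity, typically cosine similarity. A toy sketch with made-up 3-dimensional vectors (real embeddings from text-embedding-3-small have 1536 dimensions):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: 1.0 for identical directions, 0.0 for orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy vectors: the query points in roughly the same direction
# as the chunk about valuation, and away from the unrelated one.
query_vec = [1.0, 0.0, 1.0]
chunk_vecs = {
    "valuation chunk": [0.9, 0.1, 0.8],
    "recipe chunk": [0.0, 1.0, 0.0],
}
best = max(chunk_vecs, key=lambda k: cosine(query_vec, chunk_vecs[k]))
print(best)  # valuation chunk
```

The vector DB does exactly this comparison (with optimized index structures) and hands back only the top-`k` chunks, which is why the final prompt stays small.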
🚀 What You Learned
In this blog, we moved from:
👉 AI Agents → AI + Data Intelligence
You learned how to:
- Load text and CSV data
- Scrape and analyze web content
- Handle large data efficiently
- Use vector databases for semantic search