Beautiful Soup
Beautiful Soup is indeed beautiful!
I wanted to parse an HTML page containing a table and import it into a MySQL table in an automated way. On my friend Kumar’s advice, I looked into Beautiful Soup, and today was the day to explore it. Being new to Python, I had to do a bit of Python reading alongside. In the end, I was able to pass an HTML file to my script and get CSV output.
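from bs4 import BeautifulSoup  # Beautiful Soup 4; Beautiful Soup 3 used "from BeautifulSoup import BeautifulSoup"

f = open("input_file.html", "r")
g = open("outfile_file.csv", "w")
soup = BeautifulSoup(f, "html.parser")
t = soup.findAll('table')
for table in t:
    rows = table.findAll('tr')
    for tr in rows:
        cols = tr.findAll('td')
        for td in cols:
            # First text node in the cell; an empty cell gives None, so write "" instead
            g.write(td.find(text=True) or "")
            g.write(",")
        # End the CSV line once every cell of the row has been written
        g.write("\n")
f.close()
g.close()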
This script parses a simple HTML table without looking for any special tags. Now that this is working, I have to make it more robust so that it can parse an uglier table; that is my task for tomorrow.
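As a rough idea of what "more robust" could look like, here is a minimal sketch, not the script I actually ran. It assumes Beautiful Soup 4 (the bs4 package) and Python's standard csv module; the function names table_rows and html_to_csv and the sample table are made up for illustration. It picks up header cells, collects text from nested tags, and lets csv.writer quote cells that contain commas.

```python
import csv
import io

from bs4 import BeautifulSoup  # assumption: Beautiful Soup 4 is installed


def table_rows(html):
    """Yield one list of cell strings per <tr> of every table in html."""
    soup = BeautifulSoup(html, "html.parser")
    for table in soup.find_all("table"):
        for tr in table.find_all("tr"):
            # Include <th> header cells as well as <td> data cells.
            cells = tr.find_all(["td", "th"])
            if cells:
                # get_text() gathers text from nested tags such as <b> or <a>.
                yield [cell.get_text(" ", strip=True) for cell in cells]


def html_to_csv(html, out):
    """Write every table row found in html to the file-like object out."""
    writer = csv.writer(out)  # quotes cells that contain commas or quotes
    for row in table_rows(html):
        writer.writerow(row)


# Made-up sample input, only to exercise the two helpers above.
sample = """
<table>
  <tr><th>Item</th><th>Note</th></tr>
  <tr><td><b>apples</b></td><td>3, large</td></tr>
  <tr><td>pears</td><td></td></tr>
</table>
"""
buffer = io.StringIO()
html_to_csv(sample, buffer)
print(buffer.getvalue())
```

One known gap in this sketch: a table nested inside another table would have its rows written more than once, because find_all searches recursively. Really ugly pages would need extra handling for that.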
Prasanna Ramaswamy is doing his B.Tech at IIT Madras. He likes music, photography and astronomy, and uses GNU/Linux.
Naveen 9:28 pm on May 28, 2008
Awesome… Beautiful Soup is really amazing!
A very nice idea too…
Continue writing such posts…
Akarsh Simha 10:14 pm on May 28, 2008
Amazing! I didn’t know it would be THAT simple. I like the for…in funda of Python a lot for its simplicity.
Kumar Appaiah 6:31 pm on May 30, 2008
I’m glad you’ve also joined the “Python for making life simple” bandwagon!
Prasanna 4:41 pm on May 31, 2008
@Naveen
Thanks
@Akarsh
Yes. That’s true
@Kumar
Thank you for making me dive in