Automatic email retrieval and SQL insertion with Perl
In the next lines you will see how to use Perl and
- read parameters from a config file
- retrieve all the e-mails from a pop3 account
- insert data in a remote MySQL database using the retrieved e-mails
I had to read some e-mails and insert data in a remote database every day all the time. When a standard procedure must be followed with no exceptions it is better to be done automatically. The "no exception" rule excludes the need of human involvement. So I decided to develop a small script and delegate this job to cron. I consider that the most appropriate language for cron jobs is Perl.
The script will be split in three parts.
In the first part , we will use a config file for the basic parameters. The user must not interfere in the main code but he must be able to change parameters such as the username, password etc without any possibility of causing troubles.
In the second part, the script will login to an e-mail account and retrieve all the e-mails. According to the content of the e-mails the MySQL commands will be created. If there are no e-mails then the program will stop. Due to the countless spam mails we will verify that the retrieved e-mail is appropriate. The spam mails will be deleted after the logout.
Last but not least, we will login to the remote database and execute the query we created based on the verified e-mails.Only after the query execution the e-mails will be deleted.
As usual we will have a log file.
Let's get to work now. We will use the Net::POP3 and DBI modules. To install them you can type
perl -MCPAN -e "install Net::POP3"
perl -MCPAN -e "install DBI"
If you try to install DBD::Mysql you may have a lot of "problems". The actual problem is that after the installation it runs a test that is using the user root (with no pass) to login in the database test and do some select, insert etc. In most cases you don't have such a user and database. The fact is that there is no problem with the module but with the test so don't mind for that.
#!/usr/bin/perl -w
use Net::POP3;
use DBI;
#declaring the name of the file
$INIFILE = "access.ini";
#opening the file
open(INIFILE) or die("Could not open log file.");
#initializing variables.
my ($pophost, $popuser, $poppass, $userfrom, $sqlhost, $sqluser, $sqlpass, $sqldb , $sqltable);
#Read some lines of the file
$pophost=<INIFILE>;
$popuser=<INIFILE>;
$poppass=<INIFILE>;
$userfrom=<INIFILE>;
$sqlhost=<INIFILE>;
$sqluser=<INIFILE>;
$sqlpass=<INIFILE>;
$sqldb=<INIFILE>;
$sqltable=<INIFILE>;
#You can add other parameters such as initial value of dbinsert, insertvalues etc
#Remove the \n of each file line
chomp($pophost, $popuser, $poppass,$userfrom, $sqlhost, $sqluser, $sqlpass,$sqldb , $sqltable);
#Close the config file
close(INIFILE);
#create the timestap for the log file
my $timedate;
my ($sec,$min,$hour,$mday,$mon,$year,$wday, $yday,$isdst)=localtime(time);
$year=$year+1900;
$mon=$mon+1;
$timedate= "$year-$mon-$mday $hour:$min:$sec";
#initializing the $dbinsert variable
my $dbinsert .= "INSERT INTO `$sqltable` ( ...Write your own staff ...) VALUES ";
#Count the real emails
my $inserts=0;
The first part has been completed. Now we will deal with the POP3 account.
# initiate connection
# default timeout = 120 sec
$conn = Net::POP3->new($pophost) or die("$timedate ERROR: Unable to connect.\n");
# login
$numMsg = $conn->login($popuser, $poppass) or die("$timedate ERROR: Unable to login.\n");
# display number of messages
if ($numMsg > 0) {
#You don't need the following line in the log file
print "$timedate Your mailbox has $numMsg message(s).\n\n";
} else {
die( "$timedate Your mailbox is empty.\n");
}
#Retrieve every e-mail one by one
foreach my $msg_id (1 .. $numMsg) {
my $header = $conn -> top($msg_id, 0);
#The headers are analyzed. For now we use the subject, from, status
my ($subject, $from, $status) = analyze_header($header);
#We use the sender to verify the e-mails.
if ($subject ne $from) {
#the spam mails will be deleted after the logout. For the moment they are marked
$pop3->delete($msg_id); # not really deleted until quit is called
}
else
{
#if not spam mail it is analyzed further
$msg = $conn->top($msg_id, 9999);
# grab the header fields of this message
$content ="";
$inHeader = 1;
for $line (@$msg)
{
#mails to be inserted
$inserts++;
chomp $line;
# A regular expression that replaces the ' with "
$line =~ s/.\'/\"/g;
#You can add any other regex you like in order to modify the strings
$content .= "$line\n" if (!$inHeader);
# header stops at first blank line
$inHeader = 0 if ($line =~ /^\s*$/);
}
#The content is gathered in order to be inserted in the database later on
$dbinsert .= "(...... '$subject', '$content', '$timedate', ......')";
#A problem appears if the query contains many single INSERT so we create one INSERT with many data
if ($numMsg>1 && $msg_id<$numMsg) {$dbinsert .= " , ";}
else { $dbinsert .= ";"; }
}
}
#If not all the e-mails were spam
if ($inserts<1) {
$conn->quit() or die("$timedate The pop3 quit was unsuccesfull.\n");
die("$timedate No mail to insert.\n");
}
Now we will connect to the remote database and execute one single INSERT query. If we had many INSERTs inside the query then a problem will appear.
$sqlconnect = DBI->connect("DBI:mysql:database=$sqldb;host=$sqlhost", $sqluser, $sqlpass, {RaiseError => 1});$statement = $sqlconnect->prepare($dbinsert);
$statement->execute() or die("$timedate ERROR: Unable to execute query.\n"); # close the POP3 connection. Now the e-mails will be really deleted
$conn->quit() or die("$timedate The pop3 quit was unsuccesfull");
#for the log file
printf "$timedate, e-mails=$numMsg, inserted=$inserts\n";
#A modified sub to analyze the headers
#Original from http://www.adp-gmbh.ch/perl/net/pop3.html
sub analyze_header {
my $header_array_ref = shift;
my $header = join "", @$header_array_ref;
my ($subject) = $header =~ /Subject: (.*)/m;
my ($from ) = $header =~ /From: (.*)/m;
my ($status ) = $header =~ /Status: (.*)/m;
if (defined $status) {
$status = "Unread" if $status eq 'O';
$status = "Read" if $status eq 'R';
$status = "Read" if $status eq 'RO';
$status = "Ne $status = "-";w" if $status eq 'NEW';
$status = "New" if $status eq 'U';
}
else {
$status = "-";
}
return ($subject, $from, $status);
}
The file is saved as popsqlcron.sh . After we exit the editor we must type
chmod u+x popsqlcron.sh
in order to make it executable. Now it is time to delegate the execution to cron. I will use cron every 30 minutes. Open the crontab with
crontab -e
and add the line
30 * * * * sh -c $'popsqlcron.sh >> popsqlcron.log'
And you are done!
Some comments about the program.As you can see the die(message) is after every important command. The reason is simple. We don't want to continue if one of these commands fails. For example if there is no message in the mailbox we don't have to continue to the database part.
Don't forget to write the access.ini file where you can pass the parameters you need for the pop3 and mysql login.
A nice tutorial for regex is here. You can add whatever you like to remove words/phrases, line breaks, spaces etc.
The Academic Site of Velonis I. Petros


Comments
For SPAM:These e-mails are not send in my regular e-mail account and I used a list with recognized senders. Otherwise I should use a different way.