3. A)
Observations and Explanations:
Numerous objects were downloaded from both nytimes.com and vox.com. However, vox.com had many image files to download, so the total size of the objects downloaded from vox.com (around 25 MB) is far larger than that from nytimes.com (2 MB).
Considering vox.com:
o Almost all the image files were downloaded from the hostnames cdn0.vox-cdn.com, cdn1.vox-cdn.com, cdn2.vox-cdn.com and cdn3.vox-cdn.com. Three of these four (cdn0, cdn2 and cdn3) had a common IP address, so most of the image traffic was sent to one particular server.
o (From analysis of the pcap file) All four hostnames are aliases of another domain, namely “ddrgqsxlcy7wq.cloudfront.net”. Thus the corresponding DNS resource records are of type CNAME (canonical name).
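A minimal sketch of how a resolver follows such an alias to addresses. The record tables `CNAME` and `A` are hypothetical: the alias targets are the ones seen in the pcap, but the A-record list is only an assumption built from IPs observed in the capture, not the authoritative record set. A recursive resolver normally returns the whole chain (the CNAME plus the final A records) in one answer section, which is what makes the alias visible in the pcap.

```python
# Hypothetical record tables. CNAME maps alias -> canonical name;
# A maps a name -> list of IPv4 addresses (illustrative values).
CNAME = {f"cdn{i}.vox-cdn.com": "ddrgqsxlcy7wq.cloudfront.net" for i in range(4)}
A = {"ddrgqsxlcy7wq.cloudfront.net": ["54.230.0.10", "54.230.0.69", "54.230.0.238"]}


def resolve(name, max_hops=8):
    """Follow CNAME records from `name` until a name with A records is found.

    Returns (list of names visited, list of addresses)."""
    chain = [name]
    for _ in range(max_hops):
        if name in A:
            return chain, A[name]
        if name not in CNAME:
            raise LookupError(f"no CNAME or A record for {name}")
        name = CNAME[name]
        chain.append(name)
    raise LookupError("CNAME chain too long")


for host in sorted(CNAME):
    chain, addrs = resolve(host)
    print(" -> ".join(chain), addrs)
```

All four aliases end at the same canonical name, so which address each one ends up using depends only on the answers returned for that canonical name.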
o Although this domain had multiple IP addresses assigned (perhaps for load balancing, since most queries to it are for images, which are large and can slow down the network), the DNS responses for three of the hostnames (cdn0, cdn2, cdn3) returned the same IP (54.230.0.10) most of the time (it differed for cdn2 and cdn0 in a few cases), while a different IP was returned for cdn1.
Ideally, the DNS should have followed a round-robin configuration when returning IP addresses, so that there is no risk of skewing the load across the target servers. This can also, to some extent, help with fault tolerance in networked systems.
o For cdn1, two different IPs (54.230.0.69 and 54.230.0.238) were returned. Both carried an equal load: 3 objects were downloaded from one and the remaining 3 from the other. This is an example of load balancing using a round-robin configuration in the DNS responses.
o Note: The table has multiple entries for a domain if multiple IPs were returned for it (as for cdn1). This is done so that one can infer which domains were assigned multiple IPs and how the downloaded objects were distributed among them.
In the screenshot above, cdn2 has a different IP (54.230.2.234) in one case, and cdn0 also has a different IP (54.230.3.190) for three different objects.
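The round-robin behaviour described for cdn1 can be illustrated with a toy server that rotates its address list by one position on every query. This is a sketch under stated assumptions, not a model of the real CloudFront DNS: the class name `RoundRobinResolver` is hypothetical, and it assumes each fetch triggers a fresh query and the client connects to the first address in the answer (real browsers and stub resolvers cache answers).

```python
from collections import Counter, deque


class RoundRobinResolver:
    """Toy server for one name: every answer lists all addresses,
    but the list is rotated by one position after each query."""

    def __init__(self, addrs):
        self._addrs = deque(addrs)

    def query(self):
        answer = list(self._addrs)
        self._addrs.rotate(-1)  # the next answer starts with the next address
        return answer


# Assumption: six fetches, each with a fresh query, each using answer[0].
resolver = RoundRobinResolver(["54.230.0.69", "54.230.0.238"])
chosen = [resolver.query()[0] for _ in range(6)]
print(Counter(chosen))
```

Under these assumptions the six fetches alternate between the two addresses and split 3/3, the same distribution observed for cdn1.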
o Because the har file is parsed by hostname and the pcap file by IP, cdn0, cdn2 and cdn3 all show the same TCP connections in the table, since they share the same IPs.
o A better way to read this is to consider only the connections with a non-zero download size for each of these three domains. This has not been implemented, to avoid irregularity in the table; it is effectively applied while building the download tree, so no problems arise there.
o As expected, the connections are exhaustive, and no two domains share the same (src port, dst port) tuple.
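A minimal sketch contrasting the IP-keyed join used for the table with the stricter non-zero-size view. The record layout, port numbers, byte counts and the `host` field are hypothetical; `host` is assumed to come from the Host header of the requests seen on each connection, which is one possible way to attribute a shared-IP connection to a single alias.

```python
# Hypothetical data: three aliases resolving to one IP, three connections to it.
dns_answers = {
    "cdn0.vox-cdn.com": {"54.230.0.10"},
    "cdn2.vox-cdn.com": {"54.230.0.10"},
    "cdn3.vox-cdn.com": {"54.230.0.10"},
}
conns = [
    {"sport": 4001, "dst": "54.230.0.10", "bytes": 120_000, "host": "cdn3.vox-cdn.com"},
    {"sport": 4002, "dst": "54.230.0.10", "bytes": 45_000, "host": "cdn0.vox-cdn.com"},
    {"sport": 4003, "dst": "54.230.0.10", "bytes": 0, "host": None},
]


def join_by_ip(dns_answers, conns):
    """IP-keyed join: every alias of a shared IP gets every connection to that IP."""
    return {h: [c["sport"] for c in conns if c["dst"] in ips]
            for h, ips in dns_answers.items()}


def join_by_host_nonzero(conns):
    """Keep only connections that carried data and attribute each one
    to the host named in its requests."""
    out = {}
    for c in conns:
        if c["bytes"] > 0 and c["host"] is not None:
            out.setdefault(c["host"], []).append(c["sport"])
    return out


print(join_by_ip(dns_answers, conns))
print(join_by_host_nonzero(conns))
```

The first view repeats all three connections under every alias; the second leaves each data-carrying connection under exactly one hostname and drops the empty one.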
o Any inconsistency in the table is mostly due to mismatches between the har file and the pcap file.
E.g. 1:- The total size of objects downloaded according to the Wireshark data is smaller for ping.chartbeat.com. The har file recorded 4 objects, while the pcap has data for only one. All four are different objects.
E.g. 2:- The opposite is also observed, i.e. there are cases where Wireshark captures an HTTP GET request but there is no separate entry in the har file for that request; it only shows up among the objects referred to by another entity. So maybe Firebug missed something there.
This happens for cdn0 (/community_logos/52517/voxv.png), and thus the total number of objects downloaded according to the pcap (15, the sum of all non-zero-size objects across all IPs, with cdn0 contributing two as mentioned in the first screenshot) differs from that according to the har file (13).
A strange thing happens here: according to the pcap file, this same object is requested twice, on two different TCP connections, yet there is not a single corresponding request in the har file.
E.g. 3:- There are cases where the har dump has HTTPS entries but there are no corresponding application-layer packets in the Wireshark dump, since HTTPS payloads are encrypted and Wireshark cannot show the HTTP layer for them. One such instance is youtube.com. The har file has entries for both https://www.youtube.com/embed/F4U0SXz2DJs and http://www.youtube.com/embed/F4U0SXz2DJs, but the Wireshark dump has data only for the latter.
Its size is zero (w.r.t. the pcap file) because nothing was downloaded over the HTTP request; it was redirected to the HTTPS one.
The same thing happens with s.ytimg.com and i.ytimg.com.
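The mismatches in E.g. 2 and E.g. 3 can be surfaced mechanically by treating both sources as multisets of URLs and subtracting them. A minimal sketch: the function name `compare_requests` is hypothetical, the full voxv.png URL is assembled from the hostname and path mentioned above, and the short URL lists are illustrative rather than the full captures.

```python
from collections import Counter


def compare_requests(har_urls, pcap_urls):
    """Multiset difference between URLs listed in the har file and
    plain-HTTP GET requests seen in the pcap."""
    har, pcap = Counter(har_urls), Counter(pcap_urls)
    return {
        # e.g. HTTPS entries: encrypted, so no GET is visible in the pcap
        "only_in_har": har - pcap,
        # e.g. GETs that the har file did not record as separate entries
        "only_in_pcap": pcap - har,
    }


VOXV = "http://cdn0.vox-cdn.com/community_logos/52517/voxv.png"
har_urls = [
    "https://www.youtube.com/embed/F4U0SXz2DJs",
    "http://www.youtube.com/embed/F4U0SXz2DJs",
]
pcap_urls = [
    "http://www.youtube.com/embed/F4U0SXz2DJs",
    VOXV,
    VOXV,
]
diff = compare_requests(har_urls, pcap_urls)
print(diff)
```

`Counter` subtraction keeps only positive counts, so the HTTP youtube entry present in both sources cancels out, while the HTTPS entry and the twice-requested voxv.png remain.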
o There were also cases where the same URL was requested multiple times. One such instance in the vox har file was the GET request for http://ox-d.sbnation.com/w/1.0/jstag
The cache headers for both requests were empty, so it is most likely not the case that this is a non-static object. Also, the downloaded text was identical in both requests. So one possibility is that the object was requested from multiple frames in the referrer webpage and the browser treats these as independent (hence duplicate) GET requests. There are multiple requests for this object in the referrer page, but I could not determine whether they come from different frames.
o One more instance of such multiple requests is http://cdn3.vox-cdn.com/assets/4395631/Durand_line_crop.jpg. Although the har file had only 1 GET request for this object, the pcap had 2. Moreover, these requests were made on different TCP connections, and I can't see a reason why, so this might be attributable to a browser bug.
TCP connections (3529, 80) and (3530, 80) both had a request for this object.
So subtracting one from the pcap total reproduces the number of objects downloaded according to the har file. Here, only ports 3529, 3531, 3530, 3554 and 3558 have non-zero-size downloads, so only these are considered when counting the total number of objects downloaded. The remaining TCP connections belong to the other domains (cdn0 or cdn2).
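Repeated GETs like the Durand_line_crop.jpg case can be found by grouping pcap requests by URL and listing the source ports that carried each one. A minimal sketch: `duplicate_requests` is a hypothetical helper, ports 3529/3530 and the image URL come from the capture described above, and the other two entries are placeholders. Subtracting the repeats only reproduces the har count when the har file lists such a URL once, which holds for this image but not for the jstag case.

```python
from collections import defaultdict

DURAND = "http://cdn3.vox-cdn.com/assets/4395631/Durand_line_crop.jpg"


def duplicate_requests(gets):
    """gets: (src_port, url) pairs for GET requests seen in the pcap.
    Returns url -> sorted source ports, for URLs requested more than once."""
    ports = defaultdict(list)
    for sport, url in gets:
        ports[url].append(sport)
    return {url: sorted(p) for url, p in ports.items() if len(p) > 1}


gets = [
    (3529, DURAND),
    (3530, DURAND),
    (3554, "other-object-1"),  # hypothetical placeholder
    (3558, "other-object-2"),  # hypothetical placeholder
]
dups = duplicate_requests(gets)
extra = sum(len(p) - 1 for p in dups.values())
print(dups, "pcap count:", len(gets), "after removing repeats:", len(gets) - extra)
```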
Very similar things were observed for nytimes.com as well.
o Analogous to cdn0, cdn1, cdn2 and cdn3 are the pair [a1.nyt.com, typeface.nytimes.com], which share the IP 180.149.59.147; the triplet [int.nyt.com, ds.serving-sys.com, b.scorecardresearch.com], which share the IP 180.149.59.155; the pair [graphics8.nytimes.com, js.moatads.com], which share the IP 180.149.59.146; and a few more. The table should be read accordingly.
o Again, multiple GET requests were made for the same object, and nytimes.com had more such instances than vox.com. http://p.rfihub.com/cm?in=1&pub=6919, http://typeface.nytimes.com/zam5nzz.js, http://cdn.krxd.net/controltag?confid=HrUwtkcl and http://ds.serving-sys.com/BurstingCachedScripts//Ad_2_29_3_0/ebStdBanner.js were a few of them.
o Screenshots and explanations for nytimes.com are omitted, as the observations and inferences were exactly the same as for vox.com. All the inconsistencies in the nytimes.com table can also be attributed to one of the explanations above.
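Groups of hostnames that share an IP, and therefore show identical TCP connections in an IP-keyed table, can be listed directly from the host-to-IP mapping. A minimal sketch using the nytimes.com pairs reported above; the function name `shared_ip_groups` is hypothetical, and the "few more" groups are not included because their hosts are not given.

```python
from collections import defaultdict

# Host -> IP pairs for nytimes.com as reported above.
host_ip = {
    "a1.nyt.com": "180.149.59.147",
    "typeface.nytimes.com": "180.149.59.147",
    "int.nyt.com": "180.149.59.155",
    "ds.serving-sys.com": "180.149.59.155",
    "b.scorecardresearch.com": "180.149.59.155",
    "graphics8.nytimes.com": "180.149.59.146",
    "js.moatads.com": "180.149.59.146",
}


def shared_ip_groups(host_ip):
    """Return ip -> sorted hostnames, keeping only IPs shared by 2+ hosts."""
    groups = defaultdict(list)
    for host, ip in host_ip.items():
        groups[ip].append(host)
    return {ip: sorted(hosts) for ip, hosts in groups.items() if len(hosts) > 1}


for ip, hosts in sorted(shared_ip_groups(host_ip).items()):
    print(ip, hosts)
```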
3. B)
Observations and Explanation:
Considering the object trees of both nytimes.com and vox.com, as one would expect, the trees are quite flat, with most of the objects referred to by the root (the index page). The root node has been given a parent node ID of 0. Apart from that, node IDs are unique for each URL; a URL requested multiple times is shown multiple times, but with the same ID.
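The ID rules above (root with parent 0, one ID per distinct URL, repeated URLs reusing their ID) can be sketched from (url, referer) pairs. This is a sketch under stated assumptions, not the report's actual script: `build_object_tree` is hypothetical, the root URL is assumed, the referer values are illustrative, and a referer not seen earlier is attached to the root.

```python
def build_object_tree(root_url, requests):
    """requests: (url, referer) pairs in request order, excluding the root page.
    Returns (node_id, parent_id, url) rows: the root is node 1 with parent 0,
    each distinct URL gets the next ID, and a repeated URL reuses its ID."""
    ids = {root_url: 1}
    rows = [(1, 0, root_url)]
    for url, referer in requests:
        node_id = ids.setdefault(url, len(ids) + 1)
        # Assumption: a referer that has not been seen yet is attached to the root.
        parent_id = ids.get(referer, 1)
        rows.append((node_id, parent_id, url))
    return rows


ROOT = "http://www.vox.com/"  # assumed root URL
JSTAG = "http://ox-d.sbnation.com/w/1.0/jstag"
DURAND = "http://cdn3.vox-cdn.com/assets/4395631/Durand_line_crop.jpg"
rows = build_object_tree(ROOT, [(DURAND, ROOT), (JSTAG, ROOT), (JSTAG, ROOT)])
for row in rows:
    print(row)
```

Every non-root row here has parent 1, which is the flat shape described for both sites, and the two jstag rows share ID 3.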
Considering the download tree, the tuples follow the har file, i.e. objects whose data is not available in the har file are not included. Again, each TCP connection has a unique, auto-incremented ID.
The download tree follows directly from table 2 in 3. A), so there are no new observations to cite.
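A minimal sketch of the auto-incremented connection IDs: objects missing from the har file are skipped, and each connection gets the next integer the first time it carries a kept object. The function name, object names and ports are hypothetical; a real script could key connections on (src port, dst port) or on the optional `connection` field that HAR 1.2 entries may carry.

```python
from itertools import count


def assign_connection_ids(objects, har_urls):
    """objects: (url, conn_key) pairs in download order; conn_key is any hashable
    connection identifier, here (src_port, dst_port). Objects without data in
    the har file are skipped; each new connection gets the next ID from 1."""
    next_id = count(1)
    conn_ids = {}
    rows = []
    for url, conn in objects:
        if url not in har_urls:
            continue
        if conn not in conn_ids:
            conn_ids[conn] = next(next_id)
        rows.append((conn_ids[conn], url))
    return rows


# Hypothetical object names and ports.
objects = [("obj-a", (4001, 80)), ("obj-b", (4002, 80)),
           ("obj-c", (4001, 80)), ("obj-x", (4003, 80))]
rows = assign_connection_ids(objects, har_urls={"obj-a", "obj-b", "obj-c"})
print(rows)
```

Connection (4003, 80) never receives an ID because its only object has no har data.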
3. C)
Observations and answers to the various questions:
ii) In numerous cases, the time spent even on the first DNS query is ZERO. This was checked on the har file provided as well as on a self-created one (loading the page on the IIT network). Possible reasons:
o In the case of the har file provided by the instructor, the entries may be cached at the Wi-Fi router (the router is the DNS server in this case). Clearing the local DNS cache does not clear the router's cache. The round-trip time from the router to the PC would be very small (0-3 ms), and hence the output obtained is 0.
o Where a subsequent DNS query time is non-zero, it is usually close to zero (ranging from 1 ms to 3 ms). This is exactly the range of the round-trip latency to the Wi-Fi router, so this non-zero time may be due to delayed processing of data or instrumentation error.
o In the case of the har file I generated, the DNS server is the IIT server. Again, clearing the local cache makes no difference to the IIT server, so the IIT DNS server may well have the entry cached. If so, the DNS time output is again just the RTT to the IIT server. A ping to 10.10.2.2 verified that the RTT is indeed close to 0 ms.
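The first-query DNS times per host can be pulled straight from a HAR file and compared against the 0-3 ms round-trip range quoted above. A minimal sketch: `first_dns_times` is hypothetical, it assumes the entries are already in request order, and the sample HAR fragment and its timing values are made up for illustration. In HAR 1.2, `timings.dns` is -1 when the timing does not apply to the request.

```python
import json
from urllib.parse import urlsplit


def first_dns_times(har_text):
    """Map hostname -> timings.dns (ms) of its first entry in a HAR log.
    Assumes entries are in request order; -1 means 'does not apply'."""
    first = {}
    for entry in json.loads(har_text)["log"]["entries"]:
        host = urlsplit(entry["request"]["url"]).hostname
        if host not in first:
            first[host] = entry["timings"].get("dns", -1)
    return first


RTT_MS = 3  # upper end of the router round trip quoted above

# Made-up HAR fragment, reduced to the fields this sketch reads.
sample = json.dumps({"log": {"entries": [
    {"request": {"url": "http://www.vox.com/"}, "timings": {"dns": 0}},
    {"request": {"url": "http://cdn0.vox-cdn.com/community_logos/52517/voxv.png"},
     "timings": {"dns": 2}},
    {"request": {"url": "http://ox-d.sbnation.com/w/1.0/jstag"}, "timings": {"dns": 41}},
    {"request": {"url": "http://www.vox.com/fonts/vox/voxicon.woff"}, "timings": {"dns": -1}},
]}})
times = first_dns_times(sample)
near_rtt = sorted(h for h, t in times.items() if 0 <= t <= RTT_MS)
print(times, near_rtt)
```

Hosts whose first DNS time falls within the RTT range are the ones consistent with an answer served from an upstream cache.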
iii) All the timing information is given in tabular form.
o iii). 1 On vox.com there are cases where multiple objects on the same TCP connection have different connect times. One such case is http://www.vox.com/fonts/vox/harriet_text_regular.woff and http://www.vox.com/fonts/vox/voxicon.woff. The GET requests for these two were made at the same time by the browser (verified from the har file), so ideally the connect time should have been the same. This is not the case; what happened here can be attributed to a bug in the HAR output (given by Firebug).
The first object fetched after the TCP connection was established was voxicon.woff.
Its connect time was 265 ms and its total download time was 629 ms.
Right after the first object was downloaded, processing of the other .woff file started (since within a TCP connection, objects have to be downloaded sequentially).
The 629 ms delay from the initiation of the connection, during which voxicon.woff was being downloaded, should have been reported as blocking time, but it is shown as connect time.
This happened only for requests whose start time lies between the start and end time of another request on the same TCP connection, not for all consecutive requests. When no simultaneous requests are issued, the connect time of a subsequent request is zero (as expected).
o iii). 4 As expected, the send time comes out to be zero in all cases.
o iii). 6 The total active time of a TCP connection is defined as
“(end time of the last request sent through that TCP connection) - (start time of the first request sent through that TCP connection) + (time taken for the last object on that TCP connection to download).”
o iii). 7 In some cases the idle percentage comes out negative because the (send + wait + receive) time exceeds the active time of the TCP connection. This can happen only when the requests sent on that connection overlap.
o iii). 9 Tried the direct download of the max-size URL for about 10 different TCP connections.
The same pattern was observed in all 10 cases: the direct download gave a much higher max goodput than the har output, i.e. the direct download took much less time.
A likely reason is that when all the objects are downloaded together, multiple objects are downloaded over parallel connections, which reduces the bandwidth available to any one TCP connection for a particular file. Since the total bandwidth of the network is fixed, this division slows down the download of each object.
For comparatively small objects there is not much of a difference, but for files of a few hundred KB (such as images), the difference is clearly visible.
o iii). 10 The average achieved goodput is (total size of objects downloaded / total time spent receiving).
However, multiple overlapping TCP connections can be open to a particular domain, so the best approximation of the average goodput must account for these simultaneous downloads. The overlap window in which the largest number of TCP connections to a domain are active can be obtained by sweeping through the start and end times of each TCP connection to that domain.
Within this overlap window, objects are received simultaneously. Let T1 be the earliest receive start time of any object received within this window, and T2 the latest receive end time of any object received within this window. Among the objects downloaded outside the window, let T3 be the maximum receive time.
The total time spent receiving can then be taken as T = T2 - T1 + T3.
Thus, if the total download size is S,
Average goodput = S/T.
o iii). 11 As in iii). 10, consider the overlap window across all the TCP connections for all domains, define T1, T2 and T3 in the same manner, and apply the same formula.
o iii). 12 If the average achieved goodput of the network is comparable to the maximum of the maximum achieved goodputs, we can say that the download capacity was well utilized when accessing the webpages.
In our case, the maximum of the maximum achieved goodputs (1 KB = 1024 bytes) was:
Vox.com = 1012 bytes/msec ≈ 988 KB/sec
Nytimes.com = 1983 bytes/msec ≈ 1937 KB/sec
and the average achieved goodput of the network was:
Vox.com = 143 KB/sec
Nytimes.com = 65.4 KB/sec
The maximum of the maximum goodputs is closely related to the channel capacity: it approximates the largest quantity of data that can be transmitted under ideal circumstances. (Maximum theoretical throughput is more precisely reported taking format and specification overhead into account, under best-case assumptions.) The average goodput, by contrast, reflects the asymptotic throughput when the load (the amount of incoming data) is very large.
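The timing quantities of iii). 6, 7 and 10 can be written as small functions. This is a sketch under stated assumptions, not the script that produced the tables: all function names are hypothetical, "end time of the last request sent" is read as the time the latest request was issued, the idle-percentage formula is inferred from iii). 7, and the demo numbers are made up.

```python
def active_time(requests):
    """iii.6 for one TCP connection. requests: (start_ms, duration_ms) pairs.
    Reading 'end time of the last request sent' as the latest request's start:
    active = last start - first start + duration of the last request."""
    first_start = min(s for s, _ in requests)
    last_start, last_duration = max(requests)
    return last_start - first_start + last_duration


def idle_percent(active_ms, phases):
    """iii.7 (assumed definition): share of active time not spent in
    send + wait + receive. phases: (send, wait, receive) per request.
    Negative when overlapping requests make the phase sum exceed active_ms."""
    busy = sum(send + wait + receive for send, wait, receive in phases)
    return 100.0 * (active_ms - busy) / active_ms


def avg_goodput(total_bytes, t1, t2, t3):
    """iii.10/iii.11: S / T with T = T2 - T1 + T3 (times in ms, result in bytes/ms)."""
    return total_bytes / (t2 - t1 + t3)


def bytes_per_ms_to_kb_per_s(rate):
    """Unit conversion, taking 1 KB = 1024 bytes."""
    return rate * 1000 / 1024


# Made-up demo: two overlapping requests on one connection.
active = active_time([(0, 300), (100, 629)])
print(active, idle_percent(active, [(0, 50, 250), (0, 200, 300)]))
print(avg_goodput(1_000_000, t1=100, t2=900, t3=200))
```

In the demo the two requests overlap, so their phases add up to more than the active time and the idle percentage comes out negative, which is the situation described in iii). 7.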
iv) The maximum number of TCP connections opened simultaneously for each domain is included in the table of DNS times (the first table in 3c_output). Just below that table are the simultaneous TCP connections across domains and the number of outstanding requests.
o In most cases, wherever possible, the browser opened parallel connections.
o Considering vox.com, five domains had more than one TCP connection, and all five opened these connections in parallel.
o iv). 4 The cap imposed by the browser on the number of TCP connections per domain depends on the browser type and version. In our case it was Firefox 3, which imposes a limit of 6 TCP connections per domain. This held in almost all cases, except for the domain “nytimes.com”, which had 14 TCP connections, 13 of which were active in parallel. All other domains had at most 6 connections. No inference could be made about the total number of TCP connections opened, though from the two files it appears that the more domains a page uses, the more connections are opened: around 40 for vox.com and 85 for nytimes.com. The maximum number of objects downloaded on a single TCP connection was generally not more than 5-6. But for some domains such as static01.nyt.com, from which 44 objects were downloaded, the maximum number of objects per connection reached 9, with the average above 7.
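The maximum number of simultaneous connections per domain, and whether it exceeds the per-domain cap of 6, can be computed with a sweep over connection start and end times. A minimal sketch: the function names and the `(domain, start_ms, end_ms)` record layout are hypothetical, the demo data is made up, and intervals are treated as half-open, so a connection that closes exactly when another opens is not counted as overlapping.

```python
from collections import defaultdict

PER_DOMAIN_CAP = 6  # Firefox 3 per-domain limit stated above


def max_concurrent(intervals):
    """Sweep line: max number of [start, end) intervals open at once."""
    events = []
    for start, end in intervals:
        events.append((start, 1))
        events.append((end, -1))
    # At equal times (t, -1) sorts before (t, 1): a closing connection frees its slot first.
    events.sort()
    open_now = best = 0
    for _, delta in events:
        open_now += delta
        best = max(best, open_now)
    return best


def per_domain_report(conns):
    """conns: (domain, start_ms, end_ms) per TCP connection.
    Returns domain -> (total connections, max simultaneous connections)."""
    by_domain = defaultdict(list)
    for domain, start, end in conns:
        by_domain[domain].append((start, end))
    return {d: (len(iv), max_concurrent(iv)) for d, iv in by_domain.items()}


def over_cap(report, cap=PER_DOMAIN_CAP):
    """Domains whose peak concurrency exceeds the cap."""
    return {d for d, (_, peak) in report.items() if peak > cap}


# Made-up demo data.
conns = [("a.example", 0, 100), ("a.example", 50, 150),
         ("a.example", 100, 200), ("b.example", 0, 10)]
report = per_domain_report(conns)
print(report, over_cap(report))
```

Applied to the real connection list, a domain like “nytimes.com” with 13 parallel connections would be the one flagged by `over_cap`.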